Chapter 6 Constants and Literal Pools
6.3 Loading Constants into Registers
We covered the topic of memory in detail in the last chapter, and we saw that there are specific instructions for loading data from memory into a register—the LDR instruction. You can create the address required by this instruction in a number of different ways, and so far we’ve examined addresses loaded directly into a register.
Now the idea of an address created from the Program Counter is introduced, where register r15 (the PC) is used with a displacement value to create an address. And we’re also going to bend the LDR instruction a bit to create a pseudo-instruction that the assembler understands.
FIGURE 6.5 MOV operation using a 32-bit Thumb instruction.
Encoding T2 (ARMv7-M): MOV{S}<c>.W <Rd>, #<const>
First halfword (bits 15-0): 1 1 1 1 0 i 0 0 0 1 0 S 1 1 1 1
Second halfword (bits 15-0): 0 imm3 Rd imm8
First, the shortcut: When writing assembly, you should use the following pseudo-instruction to load constants into registers, as this is by far the easiest, safest, and most maintainable way, assuming that your assembler supports it:
LDR <Rd>, =<numeric constant>
or for floating-point numbers
VLDR.F32 <Sd>, =<numeric constant>
VLDR.F64 <Dd>, =<numeric constant>
so you could say something like
LDR r8, =0x20000040 ; start of my stack
or
VLDR.F32 s7, =3.14159265 ; pi
It may seem unusual to use a pseudo-instruction, but there’s a valid reason to do so. For most programmers, constants are declared at the start of sections of code, and it may be necessary to change values as code is written, modified, and maintained by other programmers. Suppose that a section of code begins as
SRAM_BASE EQU 0x04000000
AREA EXAMPLE, CODE, READONLY
;
; initialization section
;
ENTRY
MOV r0, #SRAM_BASE
MOV r1, #0xFF000000
.
.
.
If the value of SRAM_BASE were ever changed to a value that couldn’t be generated using the byte rotation scheme, the assembler would generate an error. If the code were written using
LDR r0, =SRAM_BASE
instead, the code would always assemble no matter what value SRAM_BASE takes.
This immediately raises the question of how the assembler handles those “unusual” constants.
When the assembler sees the LDR pseudo-instruction, it will try to use either a MOV or MVN instruction to perform the given load before going further. Recall that we can generate classes of numbers, but not every number, using the rotation schemes mentioned earlier. For those numbers that cannot be created, a literal pool, or a block of constants, is created to hold them in memory, usually very near the instructions that asked for the data, along with a load instruction that fetches the constant from memory. By default, a literal pool is placed at every END directive, so a load instruction would look just beyond the last instruction in a block of code for your number. However, the addressing mode that is used to do this, called a PC-relative address, only has a range of 4 kilobytes (since the offset is only 12 bits), which means that a very large block of code can cause a problem if we don’t correct for it. In fact, even a short block of code can potentially cause problems. Suppose we have the following ARM7TDMI code in memory:
        AREA    Example, CODE
        ENTRY                       ; mark first instruction
        BL      func1               ; call first subroutine
        BL      func2               ; call second subroutine
stop    B       stop                ; terminate the program
func1   LDR     r0, =42             ; => MOV r0, #42
        LDR     r1, =0x12345678     ; => LDR r1, [PC, #N]
                                    ; where N = offset to literal pool 1
        LDR     r2, =0xFFFFFFFF     ; => MVN r2, #0
        BX      lr                  ; return from subroutine
        LTORG                       ; literal pool 1 has 0x12345678
func2   LDR     r3, =0x12345678     ; => LDR r3, [PC, #N]
                                    ; N = offset back to literal pool 1
        ;LDR    r4, =0x87654321     ; if this is uncommented, it fails.
                                    ; Literal pool 2 is out of reach!
        BX      lr                  ; return from subroutine
BigTable
        SPACE   4200                ; clears 4200 bytes of memory,
                                    ; starting here
        END                         ; literal pool 2 empty
This contrived program first calls two very short subroutines via the branch and link (BL) instruction. The next instruction is merely to terminate the program, so for now we can ignore it. Notice that the first subroutine, labeled func1, loads the number 42 into register r0, which is quite easy to do with a byte rotation scheme. In fact, there is no rotation needed, since 0x2A fits within a byte. So the assembler generates a MOV instruction to load this value. The next value, 0x12345678, is too “odd” to create using a rotation scheme; therefore, the assembler is forced to generate a literal pool, which you might think would start after the 4200 bytes of space we’ve reserved at the end of the program. However, the load instruction cannot reach this far, and if we do nothing to correct for this, the assembler will generate an error. The second load instruction in the subroutine, the one setting all the bits in register r2, can be performed with an MVN instruction. The final instruction in the subroutine transfers the value from the Link Register (r14) back into the Program Counter (register r15), thereby forcing the processor to return to the instruction following the first BL instruction. Don’t worry about subroutines just yet, as there is an entire chapter covering their operation.
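The choice the assembler makes among MOV, MVN, and a literal-pool load can be modeled in a few lines of Python. This is only a sketch of the ARM-state rule (an 8-bit value rotated right by an even number of bits); the function names are made up for illustration, and the additional constant forms available in Thumb-2 are not modeled.

```python
MASK32 = 0xFFFFFFFF

def is_rotated_byte(value):
    """True if value is an 8-bit number rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate right by rot.
        undone = ((value << rot) | (value >> (32 - rot))) & MASK32
        if undone <= 0xFF:
            return True
    return False

def expand_ldr_pseudo(value):
    """Hypothetical helper: name what LDR Rd, =value would turn into."""
    value &= MASK32
    if is_rotated_byte(value):
        return "MOV"                      # constant fits the rotation scheme
    if is_rotated_byte(~value & MASK32):
        return "MVN"                      # its bitwise complement fits instead
    return "LDR literal"                  # must be fetched from a literal pool

# The three constants from func1, plus SRAM_BASE from the earlier example.
for constant in (42, 0x12345678, 0xFFFFFFFF, 0x04000000):
    print(f"0x{constant:08X} -> {expand_ldr_pseudo(constant)}")
```

Under this model, 42 and 0x04000000 come out as MOV, 0xFFFFFFFF as MVN, and 0x12345678 as a literal-pool load, which matches the comments in the listing.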
By inserting an LTORG directive just at the end of our first subroutine, we have forced the assembler to build its literal pool between the two subroutines in memory, as shown in Figure 6.6, which shows the memory addresses, the instructions, and the actual mnemonics generated by the assembler. You’ll also notice that the LDR instruction at address 0x10 in our example appears as
LDR r1, [PC,#0x0004]
which needs some explanation as well. As we saw in Chapter 5, this particular type of load instruction tells the processor to use the Program Counter (which always contains the address of the instruction being fetched from memory), modify that number (in this case, add the number 4 to it), and then use this as an address.
When we used the LTORG directive and told the assembler to put our literal pool between the subroutines in memory, we fixed the placement of our constants, and the assembler can then calculate how far those constants lie from the address in the Program Counter. The important thing to note in all of this is where the Program Counter is when the LDR instruction is in the pipeline’s execute stage. Again, referring to Figure 6.6, you can see that if the LDR instruction is in the execute stage of the ARM7TDMI’s pipeline, the MVN is in the decode stage, and the BX instruction is in the fetch stage. Therefore, the difference between the address 0x18 (what’s in the PC) and where we need to be to get our constant, which is 0x1C, is 4, which is the offset used to modify the PC in the LDR instruction. The good news is that you don’t ever have to calculate these offsets yourself—the assembler does that for you.
There are two more constants in the second subroutine, only one of which actually gets turned into an instruction, since we commented out the second load instruction. You will notice that in Figure 6.6, the instruction at address 0x20 uses another PC-relative address, but this time the offset is negative. It turns out that the instructions can share the data already in a literal pool. Since the assembler just generated this constant for the first subroutine, and it just happens to be very near our instruction (within 4 kilobytes), you can just subtract 12 from the value of the Program Counter when the LDR instruction is in the execute stage of the pipeline. (For those readers really paying attention: the Program Counter seems to have fetched the next instruction from beyond our little program. Is this a problem or not?) The second load instruction has been commented out to prevent an assembler error. As we’ve put a table of 4200 bytes just at the end of our program, the nearest literal pool is now more than 4 kilobytes away, and the assembler cannot build an instruction to reach that value in memory. To fix this, another LTORG directive would need to be added just before the table begins.
Address      Instruction
0x00000000   EB000001   BL   0x0000000C
0x00000004   EB000005   BL   0x00000020
0x00000008   EAFFFFFE   B    0x00000008
0x0000000C   E3A0002A   MOV  R0,#0x0000002A
0x00000010   E59F1004   LDR  R1,[PC,#0x0004]     EXECUTE
0x00000014   E3E02000   MVN  R2,#0x00000000      DECODE
0x00000018   E12FFF1E   BX   R14                 FETCH    ← PC
0x0000001C   12345678                                     ← PC + 4
0x00000020   E51F300C   LDR  R3,[PC,#-0x000C]
0x00000024   E12FFF1E   BX   R14
FIGURE 6.6 Disassembly of ARM7TDMI program.
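The offset arithmetic the assembler performs for the ARM7TDMI can be checked with a short Python sketch. The function name is hypothetical; the only facts it relies on are that the PC reads as the instruction’s address plus 8 in ARM state and that the load has a 12-bit offset with a separate add/subtract (U) bit. The address used for literal pool 2 is our own assumption about where the pool would land if the r4 load were enabled.

```python
def arm_ldr_literal(instr_addr, literal_addr, rt):
    """Encode ARM-state LDR Rt, [PC, #offset]; return None if out of reach."""
    offset = literal_addr - (instr_addr + 8)   # PC is two instructions ahead
    if abs(offset) > 0xFFF:                    # only 12 bits of offset exist
        return None
    u_bit = 1 if offset >= 0 else 0            # 1 = add offset, 0 = subtract
    return 0xE51F0000 | (u_bit << 23) | (rt << 12) | abs(offset)

# The two loads that appear in Figure 6.6 (literal pool 1 is at 0x1C).
print(hex(arm_ldr_literal(0x10, 0x1C, 1)))    # LDR r1 in func1
print(hex(arm_ldr_literal(0x20, 0x1C, 3)))    # LDR r3 in func2

# Assumed layout with the r4 load enabled: LDR at 0x24, BX at 0x28,
# 4200-byte table at 0x2C, so literal pool 2 would start at 0x2C + 4200.
print(arm_ldr_literal(0x24, 0x2C + 4200, 4))
```

The first two calls reproduce the words E59F1004 and E51F300C from Figure 6.6. The third returns None because the required offset, 4200 bytes, is larger than the 4095 bytes that 12 bits can express.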
If you tried to run this same code on a Cortex-M4, you would notice several things. First, the assembler would generate code using a combination of 16-bit and 32-bit instructions, so the disassembly would look very different. More importantly, you would get an error when you tried to assemble the program, since the second subroutine, func2, tries to create the constant 0x12345678 in a second literal pool, but it would be beyond the 4 kilobyte limit due to that large table we created. It cannot initially use the value already created in the first literal pool like the ARM7TDMI did because the assembler creates the shorter (16-bit) version of the LDR instruction.
Looking at Figure 6.7, you can see the offset allowed in the shorter instruction is only 8 bits, which is scaled by 4 for word accesses, and it cannot be negative. So now that the Program Counter has progressed beyond the first literal pool in memory, a PC-relative load instruction that cannot subtract values from the Program Counter to create an address will not work. In effect, we cannot see backwards. To correct this, a very simple modification of the instruction consists of adding a “.W” (for wide) extension to the LDR mnemonic, which forces the assembler to use a 32-bit Thumb-2 instruction, giving the instruction more options for creating addresses. The code below will now run without any issues.
        BL      func1               ; call first subroutine
        BL      func2               ; call second subroutine
stop    B       stop                ; terminate the program
func1   LDR     r0, =42             ; => MOV r0, #42
        LDR     r1, =0x12345678     ; => LDR r1, [PC, #N]
                                    ; where N = offset to literal pool 1
        LDR     r2, =0xFFFFFFFF     ; => MVN r2, #0
        BX      lr                  ; return from subroutine
        LTORG                       ; literal pool 1 has 0x12345678
func2   LDR.W   r3, =0x12345678     ; => LDR r3, [PC, #N]
                                    ; N = offset back to literal pool 1
        ;LDR    r4, =0x98765432     ; if this is uncommented, it fails.
                                    ; Literal pool 2 is out of reach!
        BX      lr                  ; return from subroutine
BigTable
        SPACE   4200                ; clears 4200 bytes of memory,
                                    ; starting here
Encoding T1 (all versions of the Thumb ISA): LDR<c> <Rt>, <label>
Bits 15-0: 0 1 0 0 1 Rt imm8
Encoding T2 (ARMv7-M): LDR<c>.W <Rt>, <label>
                       LDR<c>.W <Rt>, [PC, #-0] (special case)
First halfword (bits 15-0): 1 1 1 1 1 0 0 0 U 1 0 1 1 1 1 1
Second halfword (bits 15-0): Rt imm12
FIGURE 6.7 LDR instruction in Thumb and Thumb-2.
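To see why the 16-bit encoding cannot look backwards, the reach of the two encodings in Figure 6.7 can be modeled in Python. This is a sketch under stated assumptions: the function name and the addresses in the example calls are hypothetical, and the function reports which encoding could reach a literal, not which one a particular assembler would pick by default.

```python
def reachable_thumb_encoding(instr_addr, literal_addr, rt):
    """Return "T1", "T2", or None for a Thumb PC-relative LDR."""
    # In Thumb state the PC reads as the instruction address plus 4,
    # and the base for a literal load is rounded down to a word boundary.
    base = (instr_addr + 4) & ~3
    offset = literal_addr - base
    # T1: 3-bit Rt (r0-r7), imm8 scaled by 4, forward only.
    if rt <= 7 and 0 <= offset <= 1020 and offset % 4 == 0:
        return "T1"
    # T2: 4-bit Rt, imm12 with a U bit, so the offset may be negative.
    if -4095 <= offset <= 4095:
        return "T2"
    return None

# Hypothetical addresses: a literal just ahead of the load, one just
# behind it, and one on the far side of a large table.
print(reachable_thumb_encoding(0x10, 0x1C, 1))
print(reachable_thumb_encoding(0x20, 0x1C, 3))
print(reachable_thumb_encoding(0x20, 0x20 + 5000, 4))
```

With these made-up addresses the forward load fits in T1, the backward load needs T2 (which is what the .W suffix requests), and the load across the table cannot be encoded at all, so an LTORG would be needed.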
So to summarize:
Use LDR <Rd>, =<numeric constant> to put a constant into an integer register.
Use VLDR <Sd>, =<numeric constant> to put a constant into a floating-point register. We’ll see this again in Section 9.9.
Literal pools are generated at the end of each section of code.
The assembler will check if the constant is available in a literal pool already, and if so, it will attempt to address the existing constant.
On the Cortex-M4, if an error is generated indicating a constant is out of range, check the width of the LDR instruction.
The assembler will attempt to place the constant in the next literal pool if it is not already available. If the next literal pool is out of range, the assembler will generate an error and you will need to fix it, probably with an LTORG or adjusting the width of the instruction used.
If you do use an LTORG, place the directive after the failed LDR pseudo-instruction and within ±4 kilobytes. You must place literal pools where the processor will not attempt to execute the data as an instruction, so put the literal pools after unconditional branch instructions or at the end of a subroutine.