Chapter 9 Introduction to Floating-Point: Basics, Data Types,
9.9 Loading Data into Floating-Point Registers
We have seen the various data types and formats available in the Cortex-M4 FPU, but how is data loaded into the register file and stored to memory? Fortunately, the instructions for loading and storing data to the FPU registers share features with the integer instructions seen in Chapter 5. We will first consider transfers to and from memory, then with the integer register file, and finally between FPU registers.
9.9.1 Floating-Point Loads and Stores: The Instructions
Memory is accessed in the same way for floating-point data and integer data. The instructions and the format for floating-point loads and stores are given below.
VLDR|VSTR{<cond>}.32 <Sd>, [<Rn>{, #+/-<imm>}]
VLDR|VSTR{<cond>}.64 <Dd>, [<Rn>{, #+/-<imm>}]
The <cond> is an optional condition field, as discussed in Chapter 8. Notice that these instructions do not follow the convention of naming the destination first: for both loads and stores, the FPU register is named first and the addressing follows. All FPU instructions may be predicated by a condition field; however, as described in Chapter 8, selecting a predicate, such as NE, introduces an IT instruction to effect the predicated execution. The <Sd> value is a single-precision register, the <Dd> register is a double-precision register occupying a pair of single-precision registers, the <Rn> register is an integer register, and the <imm> field is a byte offset. The offset is encoded as an 8-bit word count with a separate add/subtract bit, so it must be a multiple of 4 in the range -1020 to +1020. This addressing mode is referred to as pre-indexed addressing, since the offset is added to the address in the index register to form the effective address. For example, the instruction
VLDR s5, [r6, #08]
loads the 32-bit value located in memory into FPU register s5. The address is created from the value in register r6 plus the offset value of 8. Only fixed offsets and a single index register are available in the FPU load and store instructions. An offset from an index register is useful in accessing constant tables and stacked data. Stacks will be covered in Chapter 13, and we will see an example of floating-point tables in Chapter 12.
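The address calculation can be checked on a host machine. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes the encoding rule that the offset is held as an 8-bit word count plus an add/subtract bit, which limits offsets to multiples of 4 between -1020 and +1020.

```python
def vldr_effective_address(base, offset):
    """Return the address accessed by VLDR/VSTR Sd, [Rn, #offset].

    Assumption: the offset is encoded as an 8-bit word count plus an
    add/subtract bit, so it must be a multiple of 4 in -1020..+1020.
    """
    if offset % 4 != 0 or not -1020 <= offset <= 1020:
        raise ValueError("offset must be a multiple of 4 in -1020..+1020")
    # The base register itself is not modified; addresses wrap at 32 bits.
    return (base + offset) & 0xFFFFFFFF


# VLDR s5, [r6, #8] with r6 = 0x20000000 (an arbitrary example value)
print(hex(vldr_effective_address(0x20000000, 8)))  # prints 0x20000008
```

An offset such as 6 is rejected by the sketch, mirroring the fact that no encoding exists for it.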
VLDR may also be used to create literal pools of constants. This use is referred to as a pseudo-instruction, meaning the instruction as written in the source file is not a valid Cortex-M4 instruction, but is used by the assembler as a shortcut. The VLDR pseudo-instruction used with immediate data creates a constant table and generates VLDR PC-relative addressed instructions. The format of the instruction is:
VLDR{<cond>}.F32 Sd, =constant
VLDR{<cond>}.F64 Dd, =constant
Any value representable by the precision of the register to be loaded may be used as the constant. The format of the constants in the Keil tools may be any of the following:
[+/-]number.number (e.g., -5.873, 1034.77)
[+/-]number[e[+/-]number] (e.g., 6e-5, -123e12)
[+/-]number.number[e[+/-]number] (e.g., 1.25e-18, -5.77e8)
For example, to load Avogadro’s constant, the molar gas constant, and Boltzmann’s constant in single-precision, the following pseudo-instructions are used to create a literal pool and generate the VLDR instructions to load the constants into the destination registers.
VLDR.F32 s14, =6.0221415e23 ; Avogadro’s number
VLDR.F32 s15, =8.314462 ; molar gas constant
VLDR.F32 s16, =1.3806505e-23 ; Boltzmann’s constant
The following code is generated:
41: VLDR.F32 s14, =6.0221415e23 ; Avogadro’s number
0x0000001C ED9F7A03 VLDR s14,[pc,#0x0C]
42: VLDR.F32 s15, =8.314462 ; molar gas constant
0x00000020 EDDF7A03 VLDR s15,[pc,#0x0C]
43: VLDR.F32 s16, =1.3806505e-23 ; Boltzmann’s constant
0x00000024 ED9F8A03 VLDR s16,[pc,#0x0C]
The memory would be populated as shown below.
0x0000002C 0C30 DCW 0x0C30
0x0000002E 66FF DCW 0x66FF
0x00000030 0809 DCW 0x0809
0x00000032 4105 DCW 0x4105
0x00000034 8740 DCW 0x8740
0x00000036 1985 DCW 0x1985
You should convince yourself these constants and offsets are correct.
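One way to carry out this check is with a few lines of Python on a host machine. This is a sketch rather than a toolchain feature: the helper names are hypothetical, and it assumes that the low 8 bits of each encoding hold the offset in words, that the add/subtract bit is set (as it is in all three encodings), and that the PC reads as the instruction address plus 4, rounded down to a multiple of 4.

```python
import struct


def f32_bits(value):
    """Round a Python float to single precision; return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def literal_address(instr_addr, encoding):
    """Address read by a PC-relative VLDR located at instr_addr.

    Assumptions: offset = 4 * (low 8 bits of the encoding), added to
    the PC, which reads as (instr_addr + 4) rounded down to a multiple of 4.
    """
    pc = (instr_addr + 4) & ~3
    return pc + 4 * (encoding & 0xFF)


# (instruction address, encoding, constant) copied from the listing
listing = [
    (0x1C, 0xED9F7A03, 6.0221415e23),   # Avogadro's number
    (0x20, 0xEDDF7A03, 8.314462),       # molar gas constant
    (0x24, 0xED9F8A03, 1.3806505e-23),  # Boltzmann's constant
]
for addr, enc, const in listing:
    print(f"0x{literal_address(addr, enc):08X}: 0x{f32_bits(const):08X}")
```

The three words printed should match the memory contents once each pair of halfwords is read with the low halfword at the lower address.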
For hexadecimal constants, the following may be used:
VLDR{<cond>}.F32 Sd, =0f_xxxxxxxx
where xxxxxxxx is an 8-character hex constant. For example,
VLDR.F32 s17, =0f_7FC00000
will load the default NaN value into register s17.
Note that Code Composer Studio does not support VLDR pseudo-instructions.
See Section 6.3.
9.9.2 The VMOV Instruction
Often we want to copy data between ARM registers and the FPU. The VMOV instruction handles this, along with moving data between FPU registers and loading constants into FPU registers. The first of these instructions transfers a 32-bit operand from an ARM register to an FPU register; the second transfers a 32-bit operand from an FPU register to an ARM register:
VMOV{<cond>}.F32 <Sd>, <Rt>
VMOV{<cond>}.F32 <Rt>, <Sn>
The format of the data type is given in the .F32 extension. When it could be unclear which data format the instruction is transferring, the data type must be included. The data type may be any of those shown in Table 9.14.
We referred to the operand simply as a 32-bit operand because what is contained in the source register could be any 32-bit value, not necessarily a single-precision operand. For example, it could contain two half-precision operands. However, it does not have to be a floating-point operand at all. The FPU registers could be used as temporary storage for any 32-bit quantity.
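To make the point concrete, the Python sketch below builds a single 32-bit word that holds two half-precision values. The helper name is hypothetical, and placing the first value in the low halfword is simply a choice made for this illustration.

```python
import struct


def pack_two_halves(low, high):
    """Pack two half-precision values into one 32-bit word.

    `low` occupies bits 15-0 and `high` bits 31-16 (an arbitrary
    choice for this example).
    """
    # 'e' is the struct format code for IEEE 754 half precision.
    low_bits, high_bits = struct.unpack("<HH", struct.pack("<ee", low, high))
    return (high_bits << 16) | low_bits


# 1.0 is 0x3C00 and -2.0 is 0xC000 in half precision.
print(hex(pack_two_halves(1.0, -2.0)))  # prints 0xc0003c00
```

A VMOV between an ARM register and an FPU register would move this word unchanged, since the instruction does not interpret its contents.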
The VMOV instruction may also be used to transfer data between FPU registers.
The syntax is
VMOV{<cond>}.F32 <Sd>, <Sn>
One important thing to remember in any data transfer operation is that the content of the source register is not interpreted in the transfer; the data is simply copied bit for bit. This means that if the data in the source register is a signaling NaN (sNaN), the IOC flag will not be set. This is true for any data transfer operation, whether between FPU registers, between an FPU register and memory, or between an FPU register and an ARM register.
TABLE 9.14
Data Type Identifiers
Data Type           Identifier
Half-precision      .F16
Single-precision    .F32 or .F
Double-precision    .F64 or .D
As a legacy of the earlier FPUs that processed double-precision operands, the following VMOV instructions transfer to or from an ARM register and the upper or lower half of a double-precision register. The x is replaced with either a 1, for the top half, or a 0, for the lower half. This is necessary to identify which half of the double-precision register is being transferred.
VMOV{<cond>}.F32 <Dd[x]>, <Rt>
VMOV{<cond>}.F32 <Rt>, <Dn[x]>
It is not necessary to include the .F32 in the instruction format above, but it is good practice to make the data type explicit whenever possible. The use of this form of the VMOV instruction is common in routines which process double-precision values using integer instructions, such as routines that emulate double-precision operations. You may have access to integer routines that emulate the double-precision instructions that are defined in the IEEE 754-2008 specification but are not implemented in the Cortex-M4.
Two sets of instructions allow moving data between two ARM registers and two FPU registers. One key thing to note is that the ARM registers may be independently specified but the FPU registers must be contiguous. As with the instructions above, these are useful in handling double-precision operands or simply moving two 32-bit quantities in a single instruction. The first set is written as
VMOV{<cond>} <Sm>, <Sm1>, <Rt>, <Rt2>
VMOV{<cond>} <Rt>, <Rt2>, <Sm>, <Sm1>
The transfer is always between Sm and Rt, and Sm1 and Rt2. Sm1 must be the next contiguous register from Sm, so if Sm is register s6 then Sm1 is register s7. For example, the following instruction
VMOV s12, s13, r6, r11
would copy the contents of register r6 into register s12 and register r11 into register s13. The reverse operation is also available. The second set of instructions substitutes the two single-precision registers with a reference to a double-precision register. This form is a bit more limiting than the instructions above, but is often more useful in double-precision emulation code. The syntax for these instructions is shown below.
VMOV{<cond>} <Dm>, <Rt>, <Rt2>
VMOV{<cond>} <Rt>, <Rt2>, <Dm>
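For readers writing emulation code, it can help to see what the two ARM registers would hold. The Python sketch below is a host-side model only, with hypothetical helper names; it assumes that Rt corresponds to the lower 32 bits of the double-precision register and Rt2 to the upper 32 bits.

```python
import struct


def split_double(value):
    """Model VMOV Rt, Rt2, Dm: return (low_word, high_word) of a double.

    Assumption: Rt receives bits 31-0 and Rt2 receives bits 63-32.
    """
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def join_double(low_word, high_word):
    """Model VMOV Dm, Rt, Rt2: rebuild the double from two 32-bit words."""
    bits = ((high_word & 0xFFFFFFFF) << 32) | (low_word & 0xFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


low, high = split_double(1.0)
print(f"0x{low:08X} 0x{high:08X}")  # prints 0x00000000 0x3FF00000
```

The sign, exponent, and upper fraction bits all land in the high word, which is why emulation routines typically examine that register first.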
One final VMOV instruction is often very useful when a simple constant is needed. This is the immediate form of the instruction,
VMOV{<cond>}.F32 <Sd>, #<imm>
For many constants, the VMOV immediate form loads the constant without a memory access. Forming the constant can be a bit tricky, but fortunately for us, the assembler will do the heavy lifting. The format of the instruction contains two immediate fields, imm4H and imm4L, as we see in Figure 9.14.
The destination must be a single-precision register, meaning this instruction cannot be used to create half-precision constants. It’s unusual for the programmer to need to determine whether the constant can be represented, but if code space or speed is an issue, using immediate constants saves space and executes faster than the PC-relative loads generated by the VLDR pseudo-instruction.
The single-precision operand is formed from the eight bits contained in the two 4-bit fields, imm4H and imm4L. The imm4H contains bits 7-4, and imm4L bits 3-0.
The bits contribute to the constant as shown in Figure 9.15.
While at first glance this does look quite confusing, many of the more common constants can be formed this way. The range of available constants is
±(1.0 … 1.9375) × 2^(-3 … +4)
For example, the constant 1.0, or 0x3F800000, is formed when the immediate field is imm4H = 0111 and imm4L = 0000. When these bits are inserted as shown in Figure 9.15, we have the bit pattern shown in Figure 9.16.
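The expansion can be modeled in a few lines of Python. This is a sketch under a stated assumption, with hypothetical function names: bit 7 of the immediate is the sign, the 8-bit exponent is the inverse of bit 6 followed by five copies of bit 6 and then bits 5-4, and bits 3-0 become the top four fraction bits with the remaining 19 fraction bits zero.

```python
import struct


def vmov_imm_bits(imm8):
    """Expand the 8-bit immediate (imm4H:imm4L) to a single-precision pattern."""
    sign = (imm8 >> 7) & 1
    b6 = (imm8 >> 6) & 1
    # Exponent: NOT(bit 6), five copies of bit 6, then bits 5-4.
    exponent = ((b6 ^ 1) << 7) | ((0b11111 if b6 else 0) << 2) | ((imm8 >> 4) & 0b11)
    # Bits 3-0 are the top four fraction bits; the other 19 are zero.
    fraction = (imm8 & 0xF) << 19
    return (sign << 31) | (exponent << 23) | fraction


def vmov_imm_value(imm8):
    """Return the floating-point value produced by the immediate."""
    return struct.unpack(">f", struct.pack(">I", vmov_imm_bits(imm8)))[0]


# imm4H = 0111, imm4L = 0000 gives 1.0
print(f"0x{vmov_imm_bits(0b01110000):08X}")  # prints 0x3F800000

# Survey every immediate to confirm the range of magnitudes.
magnitudes = sorted({abs(vmov_imm_value(i)) for i in range(256)})
print(magnitudes[0], magnitudes[-1], len(magnitudes))  # prints 0.125 31.0 128
```

The survey shows 128 distinct magnitudes from 0.125 to 31.0, each available with either sign, and zero is not among them.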
Some other useful constants suitable for the immediate VMOV include those listed in Table 9.15. Notice that 0 and infinity cannot be represented, and if the constant cannot be constructed by this instruction, the assembler will create a literal pool.