ARM DDI 0100E Copyright © 1996-2000 ARM Limited. All rights reserved.
5.1.3 Scalar operations
If the destination register lies in the first bank of eight registers, the instruction specifies a scalar operation:
if d_bank == 0 then
    vec_len = 1
    Sd[0] = d_num
    Sn[0] = n_num
    Sm[0] = m_num
Note
Source operands
The source operands are always scalars, regardless of which bank they are in. This allows individual elements of vectors to be used as scalars.
5.1.4 Mixed vector/scalar operations
If the destination register specified in the instruction does not lie in the first bank of eight registers, but the second source register does, then the destination register and first source register specify vectors and the second source register specifies a scalar:
if d_bank != 0 and m_bank == 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Sd[i] = (d_bank << 3) | d_index
        Sn[i] = (n_bank << 3) | n_index
        Sm[i] = m_num
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 7 then
            d_index = d_index - 8
        n_index = n_index + (vector stride specified by FPSCR)
        if n_index > 7 then
            n_index = n_index - 8
Notes
First source operand
The first operand is always a vector, regardless of which bank it is in. This allows a set of consecutive registers in the first bank to be treated as a vector.
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this is not a restriction, because the vector length is at most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 4.
Operand overlap
If two operands overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that:
• If the set of register numbers generated in Sd[i] overlaps the set of register numbers generated in Sn[i], then d_num and n_num must be identical.
• If the set of register numbers generated in Sn[i] includes m_num, the vector length must be 1.
It is impossible for the set of register numbers generated in Sd[i] to include m_num, because they lie in different banks.
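The mixed vector/scalar rule can be sketched in Python as follows. This is only an illustration: the function name `mixed_sp_registers` and its parameters are hypothetical, `vec_len` and `stride` stand in for the FPSCR vector length and stride fields, and the bank and index are assumed to be the upper two and lower three bits of the register number, consistent with `(d_bank << 3) | d_index`.

```python
def mixed_sp_registers(d_num, n_num, m_num, vec_len, stride):
    """Register numbers used by a mixed vector/scalar single-precision operation.

    vec_len and stride are assumed inputs standing in for the FPSCR fields.
    Returns three lists: destination, first source, second source.
    """
    d_bank, d_index = d_num >> 3, d_num & 7
    n_bank, n_index = n_num >> 3, n_num & 7
    if d_bank == 0 or (m_num >> 3) != 0:
        raise ValueError("not a mixed vector/scalar operation")
    sd, sn, sm = [], [], []
    for _ in range(vec_len):
        sd.append((d_bank << 3) | d_index)
        sn.append((n_bank << 3) | n_index)
        sm.append(m_num)  # the second source register stays fixed
        d_index += stride
        if d_index > 7:   # wrap to the bottom of the same bank
            d_index -= 8
        n_index += stride
        if n_index > 7:
            n_index -= 8
    return sd, sn, sm
```

For example, with d_num = 14, n_num = 20, m_num = 3, a vector length of 4 and a stride of 1, the destination registers wrap within their bank as 14, 15, 8, 9.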
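5.1.5 Vector operations
If neither the destination register nor the second source register lies in the first bank of eight registers, then all register operands specify vectors:
if d_bank != 0 and m_bank != 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Sd[i] = (d_bank << 3) | d_index
        Sn[i] = (n_bank << 3) | n_index
        Sm[i] = (m_bank << 3) | m_index
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 7 then
            d_index = d_index - 8
        n_index = n_index + (vector stride specified by FPSCR)
        if n_index > 7 then
            n_index = n_index - 8
        m_index = m_index + (vector stride specified by FPSCR)
        if m_index > 7 then
            m_index = m_index - 8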
Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this is not a restriction, since the vector length is at most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 4.
Operand overlap
If two operands overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that:
• If the set of register numbers generated in Sd[i] overlaps the set of register numbers generated in Sn[i], then d_num and n_num must be identical.
• If the set of register numbers generated in Sd[i] overlaps the set of register numbers generated in Sm[i], then d_num and m_num must be identical.
• If the set of register numbers generated in Sn[i] overlaps the set of register numbers generated in Sm[i], then n_num and m_num must be identical.
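The wrap-around and overlap restrictions can be expressed as two small checks. The following is a minimal sketch, not part of the architecture definition: the helper names `wraps_around` and `overlap_is_permitted` are hypothetical, and only strides of 1 and 2 are assumed to occur.

```python
def wraps_around(vec_len, stride, bank_size=8):
    """True if a vector of vec_len elements would re-use its first register.

    bank_size is 8 for single-precision banks; pass 4 for double-precision.
    """
    return vec_len * stride > bank_size


def overlap_is_permitted(a, b):
    """True if two operand register lists are either disjoint or identical,
    both in the registers accessed and in the order of access."""
    return not (set(a) & set(b)) or list(a) == list(b)
```

With a stride of 2 and eight-register banks, `wraps_around` first reports a problem at a vector length of 5, which matches the stated limit of 4.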
5.2 Addressing Mode 2 - Double-precision vectors (non-monadic)
When the vector length indicated by the FPSCR is greater than 1, the double-precision two-operand instructions FADDD, FDIVD, FMULD, FNMULD, and FSUBD can specify three different types of behavior:
• One arithmetic operation between two scalar values, yielding a scalar:
ScalarA op ScalarB → ScalarD
When this case is selected (see Scalar operations on page C5-11), it causes just one operation to be performed, overriding the vector length specified in the FPSCR. This allows scalar operations and vector operations to be mixed without the need to reprogram the FPSCR between them.
• A set of N arithmetic operations, where N is the vector length specified in the FPSCR, with the first operand scanning through a vector, the second operand remaining constant and the destination scanning through a vector:
VectorA[0] op ScalarB → VectorD[0]
VectorA[1] op ScalarB → VectorD[1]
VectorA[N-1] op ScalarB → VectorD[N-1]
This can be abbreviated to:
VectorA op ScalarB → VectorD
• A set of N arithmetic operations, where N is the vector length specified in the FPSCR, with both operands and the destination scanning through vectors:
VectorA[0] op VectorB[0] → VectorD[0]
VectorA[1] op VectorB[1] → VectorD[1]
VectorA[N-1] op VectorB[N-1] → VectorD[N-1]
This can be abbreviated to:
VectorA op VectorB → VectorD
The double-precision three-operand instructions FMACD, FMSCD, FNMACD and FNMSCD each use the same register for their addition/subtraction operand as for their destination. So they have three forms corresponding to the above three:
• A pure scalar form:
± (ScalarA * ScalarB) ± ScalarD → ScalarD
• A form in which the second multiplication operand is a scalar and everything else scans through vectors:
± (VectorA[0] * ScalarB) ± VectorD[0] → VectorD[0]
± (VectorA[1] * ScalarB) ± VectorD[1] → VectorD[1]
± (VectorA[N-1] * ScalarB) ± VectorD[N-1] → VectorD[N-1]
This can be abbreviated to:
± (VectorA * ScalarB) ± VectorD → VectorD
• A form in which everything scans through a vector:
± (VectorA[0] * VectorB[0]) ± VectorD[0] → VectorD[0]
± (VectorA[1] * VectorB[1]) ± VectorD[1] → VectorD[1]
± (VectorA[N-1] * VectorB[N-1]) ± VectorD[N-1] → VectorD[N-1]
This can be abbreviated to:
± (VectorA * VectorB) ± VectorD → VectorD
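The element-wise behavior of the three-operand forms can be illustrated with ordinary Python lists. This is a sketch only: the function `three_operand` and its `prod_sign` and `acc_sign` parameters are hypothetical stand-ins for the ± signs, whose values depend on the instruction and are not given in this section, and the rounding behavior of real hardware is not modeled.

```python
def three_operand(vec_a, b, vec_d, prod_sign=1.0, acc_sign=1.0):
    """Compute prod_sign * (A[i] * B[i]) + acc_sign * D[i] for each element.

    b may be a single float (second multiplication operand is a scalar)
    or a list (everything scans through a vector). A pure scalar operation
    is the case where every list has length 1.
    """
    vec_b = b if isinstance(b, list) else [b] * len(vec_a)
    return [prod_sign * (x * y) + acc_sign * d
            for x, y, d in zip(vec_a, vec_b, vec_d)]
```

Passing a plain float for `b` gives the form in which ScalarB is held constant while VectorA and VectorD are scanned.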
5.2.1 Register banks
To allow these various forms to be specified, the set of 16 double-precision registers is split into four banks, each of four registers. The form used by an instruction depends on which operands are in the first bank. The general principle behind the rules is that the first bank must be used to hold scalar operands while the other banks are used to hold vector operands. All destination register writes and many source register reads adhere to this principle, but some source register reads can result in scalar access to vector elements or vector accesses to groups of scalars.
A vector operand consists of 2-4 registers from a single bank, with the number of registers being specified by the vector length field of the FPSCR (see Vector length/stride control on page C2-22). The register number in the instruction specifies the register that contains the first element of the vector. Each successive element of the vector is formed by incrementing the register number by the value specified by the vector stride field of the FPSCR. If this causes the register number to overflow the top of the register bank, the register number wraps around to the bottom of the bank, as shown in Figure 5-2.
Figure 5-2 Double-precision register banks
Bank 0: d0 d1 d2 d3
Bank 1: d4 d5 d6 d7
Bank 2: d8 d9 d10 d11
Bank 3: d12 d13 d14 d15
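As a rough illustration of how a double-precision vector operand is formed, the following Python sketch lists the registers of a vector given its first register. The function name `dp_vector_registers` is hypothetical, `vec_len` and `stride` stand in for the FPSCR fields, and the bank is assumed to be the register number divided by four, which matches the four banks of four registers described above.

```python
def dp_vector_registers(first_reg, vec_len, stride):
    """List the double-precision register numbers making up a vector operand."""
    bank, index = first_reg >> 2, first_reg & 3
    regs = []
    for _ in range(vec_len):
        regs.append((bank << 2) | index)
        # Step by the stride, wrapping to the bottom of the same bank.
        index = (index + stride) % 4
    return regs
```

For instance, a vector of length 3 starting at d6 with a stride of 1 uses d6, d7 and then wraps to d4.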
5.2.2 Operation
The following pages describe each of the three possible forms of the addressing mode:
• Scalar operations on page C5-11
• Mixed vector/scalar operations on page C5-12
• Vector operations on page C5-13.
In each case, the following values are generated:
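vec_len
The number of individual operations specified by the instruction.
Dd[0] ... Dd[vec_len-1]
Destination registers of the individual operations.
Dn[0] ... Dn[vec_len-1]
First source registers of the individual operations.
Dm[0] ... Dm[vec_len-1]
Second source registers of the individual operations.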
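The register numbers specified in the instruction are broken up into bank numbers and indices within the banks as follows:
d_bank = Dd[3:2]
d_index = Dd[1:0]
n_bank = Dn[3:2]
n_index = Dn[1:0]
m_bank = Dm[3:2]
m_index = Dm[1:0]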
5.2.3 Scalar operations
If the destination register lies in the first bank of four registers, the instruction specifies a scalar operation:
if d_bank == 0 then
    vec_len = 1
    Dd[0] = Dd
    Dn[0] = Dn
    Dm[0] = Dm
Notes
Source operands
The source operands are always scalars, regardless of which bank they are in. This allows individual elements of vectors to be used as scalars.
5.2.4 Mixed vector/scalar operations
If the destination register specified in the instruction does not lie in the first bank of four registers, but the second source register does, then the destination register and first source register specify vectors and the second source register specifies a scalar:
if d_bank != 0 and m_bank == 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Dd[i] = (d_bank << 2) | d_index
        Dn[i] = (n_bank << 2) | n_index
        Dm[i] = Dm
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 3 then
            d_index = d_index - 4
        n_index = n_index + (vector stride specified by FPSCR)
        if n_index > 3 then
            n_index = n_index - 4
Notes
First source operand
The first operand is always a vector, regardless of which bank it is in. This allows a set of consecutive registers in the first bank to be treated as a vector.
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this implies that the vector length must be at most 4. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 2.
Operand overlap
If two operands overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that:
• If the set of register numbers generated in Dd[i] overlaps the set of register numbers generated in Dn[i], then Dd and Dn must be identical.
• If the set of register numbers generated in Dn[i] includes Dm, then the vector length must be 1.
It is impossible for the set of register numbers generated in Dd[i] to include Dm, because they lie in different banks.
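5.2.5 Vector operations
If neither the destination register nor the second source register lies in the first bank of four registers, then all register operands specify vectors:
if d_bank != 0 and m_bank != 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Dd[i] = (d_bank << 2) | d_index
        Dn[i] = (n_bank << 2) | n_index
        Dm[i] = (m_bank << 2) | m_index
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 3 then
            d_index = d_index - 4
        n_index = n_index + (vector stride specified by FPSCR)
        if n_index > 3 then
            n_index = n_index - 4
        m_index = m_index + (vector stride specified by FPSCR)
        if m_index > 3 then
            m_index = m_index - 4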
Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this implies that the vector length must be at most 4. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 2.
Operand overlap
If two operands overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that:
• If the set of register numbers generated in Dd[i] overlaps the set of register numbers generated in Dn[i], then Dd and Dn must be identical.
• If the set of register numbers generated in Dd[i] overlaps the set of register numbers generated in Dm[i], then Dd and Dm must be identical.
• If the set of register numbers generated in Dn[i] overlaps the set of register numbers generated in Dm[i], then Dn and Dm must be identical.
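The way the destination and second source register numbers select between the three double-precision forms can be summarized in a few lines of Python. This is a sketch under the assumption that the bank is the register number shifted right by two; the function name `dp_form` and the returned strings are illustrative only.

```python
def dp_form(d_num, m_num):
    """Classify a double-precision two- or three-operand instruction.

    d_num and m_num are the destination and second source register numbers
    (0 to 15). The first source register plays no part in the choice.
    """
    d_bank = d_num >> 2
    m_bank = m_num >> 2
    if d_bank == 0:
        return "scalar"
    if m_bank == 0:
        return "mixed vector/scalar"
    return "vector"
```

Note that the destination bank is tested first, so a destination in the first bank gives a scalar operation whatever banks the source registers lie in.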
5.3 Addressing Mode 3 - Single-precision vectors (monadic)
When the vector length indicated by the FPSCR is greater than 1, the single-precision one-operand instructions FABSS, FCPYS, FNEGS, and FSQRTS can specify three different types of behavior:
• An operation on a scalar value, yielding a scalar:
Op(ScalarB) → ScalarD
When this case is selected (see Scalar-to-scalar operations on page C5-16), it causes just one operation to be performed, overriding the vector length specified in the FPSCR. This allows scalar operations and vector operations to be mixed without the need to reprogram the FPSCR between them.
• An operation on a scalar value, whose result is written to each of the N elements of a vector, where N is the vector length specified in the FPSCR:
To allow these various forms to be specified, the set of 32 single-precision registers is split into four banks, each of eight registers. For a description of this, see Register banks on page C5-3.
5.3.1 Operation
The following pages describe each of the three possible forms of the addressing mode:
• Scalar-to-scalar operations on page C5-16
• Scalar-to-vector operations on page C5-17
• Vector-to-vector operations on page C5-18.
In each case, the following values are generated:
vec_len
The number of individual operations specified by the instruction.
Sd[0] ... Sd[vec_len-1]
Destination registers of the individual operations.
Sm[0] ... Sm[vec_len-1]
Source registers of the individual operations.
In all cases, the registers specified by the instruction are determined by concatenating the Fd and Fm fields of the instruction with the D and M bits respectively:
d_num = (Fd << 1) | D
m_num = (Fm << 1) | M
These register numbers are then broken up into bank numbers and indices within the banks as follows:
d_bank = d_num[4:3]
d_index = d_num[2:0]
m_bank = m_num[4:3]
m_index = m_num[2:0]
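A short Python sketch of this decoding step is given below. The function name `decode_sp_register` is hypothetical. The bank is taken from bits [4:3] as stated; taking the index from bits [2:0] is an assumption, chosen to be consistent with the `(d_bank << 3) | d_index` expressions used for the individual operations.

```python
def decode_sp_register(field, low_bit):
    """Combine a 4-bit register field (Fd or Fm) with its extra bit (D or M).

    Returns (register number, bank number, index within the bank).
    """
    num = (field << 1) | low_bit
    bank = (num >> 3) & 0b11   # num[4:3]
    index = num & 0b111        # num[2:0], assumed from the bank layout
    return num, bank, index
```

For example, a field value of 0b0111 with the extra bit set gives register 15, which is the last register of bank 1.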
5.3.2 Scalar-to-scalar operations
If the destination register lies in the first bank of eight registers, the instruction specifies a scalar operation:
if d_bank == 0 then
    vec_len = 1
    Sd[0] = d_num
    Sm[0] = m_num
Notes
Source operands
The source operand is always a scalar, regardless of which bank it lies in. This allows individual elements of vectors to be used as scalars.
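5.3.3 Scalar-to-vector operations
If the destination register specified in the instruction does not lie in the first bank of eight registers, but the source register does, then the destination register specifies a vector and the source register specifies a scalar:
if d_bank != 0 and m_bank == 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Sd[i] = (d_bank << 3) | d_index
        Sm[i] = m_num
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 7 then
            d_index = d_index - 8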
Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this is not a restriction, because the vector length is at most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 4.
Operand overlap
If the source and destination overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. This implies that if the set of register numbers generated in Sn[i] includes m_num, the vector length must be 1.
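5.3.4 Vector-to-vector operations
If neither the destination register nor the source register lies in the first bank of eight registers, then both register operands specify vectors:
if d_bank != 0 and m_bank != 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Sd[i] = (d_bank << 3) | d_index
        Sm[i] = (m_bank << 3) | m_index
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 7 then
            d_index = d_index - 8
        m_index = m_index + (vector stride specified by FPSCR)
        if m_index > 7 then
            m_index = m_index - 8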
Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element. Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of 1, this is not a restriction, since the vector length is at most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector length must be at most 4.
Operand overlap
If the source and destination overlap, they must be identical both in terms of which registers are accessed and the order in which they are accessed. Otherwise, the results of the instruction are UNPREDICTABLE. This implies that if the set of register numbers generated in Sd[i] overlaps the set of register numbers generated in Sm[i], d_num and m_num must be identical.
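The three monadic single-precision forms can be combined into one illustrative Python function. This is a sketch under stated assumptions rather than a definitive model: the name `monadic_sp_registers` is hypothetical, `vec_len` and `stride` stand in for the FPSCR fields, and the bank and index are assumed to be the upper two and lower three bits of each register number.

```python
def monadic_sp_registers(d_num, m_num, vec_len, stride):
    """Return (Sd, Sm) register lists for a single-precision monadic operation."""
    d_bank, d_index = d_num >> 3, d_num & 7
    m_bank, m_index = m_num >> 3, m_num & 7
    if d_bank == 0:
        # Scalar-to-scalar: one operation, whatever the FPSCR vector length.
        return [d_num], [m_num]
    sd, sm = [], []
    for _ in range(vec_len):
        sd.append((d_bank << 3) | d_index)
        d_index = (d_index + stride) % 8      # wrap within the bank
        if m_bank == 0:
            sm.append(m_num)                  # scalar-to-vector: fixed source
        else:
            sm.append((m_bank << 3) | m_index)
            m_index = (m_index + stride) % 8  # vector-to-vector
    return sd, sm
```

With a destination of s30, a source of s16, a vector length of 3 and a stride of 1, the destination list wraps as 30, 31, 24 while the source list runs 16, 17, 18.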
5.4 Addressing Mode 4 - Double-precision vectors (monadic)
When the vector length indicated by the FPSCR is greater than 1, the double-precision one-operand instructions FABSD, FCPYD, FNEGD, and FSQRTD can specify three different types of behavior:
• An operation on a scalar value, yielding a scalar:
Op(ScalarB) → ScalarD
When this case is selected (see Scalar-to-scalar operations on page C5-21), it causes just one operation to be performed, overriding the vector length specified in the FPSCR. This allows scalar operations and vector operations to be mixed without the need to reprogram the FPSCR between them.
• An operation on a scalar value, whose result is written to each of the N elements of a vector, where N is the vector length specified in the FPSCR:
To allow these various forms to be specified, the set of 16 double-precision registers is split into four banks, each of four registers. For a description of this, see Register banks on page C5-9.