Fig 5.7 Carry Delayed Adder (inputs $A_i$, $B_i$, $C_i$ for $i = 3, 2, 1, 0$)
combined; in other words, $S' = A + B$ and $S'' = A + B - n$ can be computed
at the same time. Then we perform a sign detection to decide whether to
take S' or S'' as the correct sum. We will review algorithms of this type when
we study modular multiplication algorithms.
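As an illustration of this kind of modular addition, here is a minimal Python sketch. It assumes both operands are already reduced ($0 \le A, B < n$); the function name `modadd_select` is hypothetical. Hardware would form the two candidate sums with two adders at the same time, whereas here they are just two expressions.

```python
def modadd_select(a: int, b: int, n: int) -> int:
    """Return (a + b) mod n by forming S' = a + b and S'' = a + b - n.

    Assumes 0 <= a, b < n. In hardware both candidates would be produced
    in parallel; the sign of S'' then selects the result.
    """
    assert 0 <= a < n and 0 <= b < n
    s1 = a + b        # S'  = A + B
    s2 = a + b - n    # S'' = A + B - n
    # Sign detection on S'': nonnegative means A + B >= n, so S'' is the reduced sum.
    return s2 if s2 >= 0 else s1


print(modadd_select(20, 30, 39))  # 11
```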
5.2.1 Omura's Method
An efficient method for computing the modular addition, which is especially useful for multioperand modular addition, was proposed by Omura in [260]. Let $n < 2^k$. This method allows a temporary value to grow larger than n; however, it
is always kept less than $2^k$. Whenever a sum reaches $2^k$, the carry-out is ignored
and a correction is performed. The correction factor is $m = 2^k - n$, which
is precomputed and saved in a register. Thus, Omura's method performs the
following steps, given the integers $A, B < 2^k$ (which may be larger than n):
1. First compute $S' = A + B$.
2. If there is a carry-out (of the kth bit), then $S = S' + m$; else $S = S'$.
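The two steps can be sketched in Python on k-bit values as follows. The function name `omura_add` is hypothetical, and the restriction that the addend `b` is already reduced ($b < n$) is our assumption, not the text's. Under it, a single correction always keeps the result below $2^k$, since with a carry-out the result is $A + B - n \le (2^k - 1) + (n - 1) - n < 2^k$.

```python
def omura_add(a: int, b: int, n: int, k: int) -> int:
    """One Omura modular addition step on k-bit values (a sketch).

    The result stays below 2**k but not necessarily below n. A carry-out
    of bit k is dropped and compensated by adding m = 2**k - n.
    Assumption: the addend b is a least residue (b < n).
    """
    mask = (1 << k) - 1
    m = (1 << k) - n               # correction factor, precomputed once
    assert 0 < n < (1 << k) and 0 <= a <= mask and 0 <= b < n
    s_prime = a + b                # Step 1: S' = A + B (may need k + 1 bits)
    if s_prime >> k:               # Step 2: carry-out of the kth bit?
        s = (s_prime & mask) + m   # ignore the carry (-2**k), then add m
    else:
        s = s_prime
    assert s <= mask               # the result still fits in k bits
    return s


print(omura_add(40, 30, 39, 6))  # 31, the worked example below
```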
The correctness of Omura's algorithm follows from the observations that:
• If there is no carry-out, then $S = A + B$ is returned. The sum S is less than $2^k$ but may be larger than n. In a future computation, it will be brought below n if necessary.
• If there is a carry-out, then we ignore the carry-out, which means we compute
$$S' = A + B - 2^k.$$
The result, which needs to be reduced modulo n, is in effect reduced
modulo $2^k$. We correct the result by adding m back to it, and thus compute
Trang 2100 5 Prime Finite Field Arithmetic
$$S = S' + m = A + B - 2^k + 2^k - n = A + B - n.$$
After all additions are completed, the final result is reduced modulo n by using
the standard technique. As an example, let us assume n = 39, so that k = 6. Thus, we have
$m = 2^6 - 39 = 25 = (011001)$. The modular addition of A = 40 and B = 30
is performed using Omura's method as follows:
A = 40 = (101000)
B = 30 = (011110)
S' = A + B = 1(000110)   Carry-out
m = (011001)
S = S' + m = (011111)   Correction
Thus, we obtain the result as S = (011111) = 31, which is equal to 70 (mod 39),
as required. On the other hand, the addition of A = 23 and B = 26 is performed
as
A = 23 = (010111)
B = 26 = (011010)
S' = A + B = 0(110001)   No carry-out
S = S' = (110001)
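This leaves the result as S = (110001) = 49, which is larger than the modulus
39. It will be reduced in a further step of the multioperand modular addition.
After all additions are completed, the final result can be corrected by
adding m to it: a carry-out indicates that $S \ge n$, in which case the lower k bits hold $S - n$. For example, we correct the above result S = (110001) as
S + m = (110001) + (011001) = 1(001010)   Carry-out
S = (001010) = 10 = 49 mod 39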
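A whole multioperand sum, including the final correction by m, might look like the sketch below. The name `omura_sum` is hypothetical. We assume every operand is a least residue ($0 \le v < n$) and that n is a k-bit number ($2^{k-1} \le n < 2^k$); the latter guarantees that one final correction suffices, because the accumulator stays below $2^k \le 2n$.

```python
def omura_sum(values, n: int, k: int) -> int:
    """Multioperand modular addition with Omura's method plus a final fix-up.

    Assumptions: every operand satisfies 0 <= v < n, and 2**(k-1) <= n < 2**k.
    """
    mask = (1 << k) - 1
    m = (1 << k) - n
    assert (1 << (k - 1)) <= n < (1 << k)
    acc = 0
    for v in values:
        assert 0 <= v < n
        s = acc + v
        # Omura step: on a carry-out, drop 2**k and add m (net effect: subtract n).
        acc = (s & mask) + m if s >> k else s
    # Final correction: acc + m carries out exactly when acc >= n, and then its
    # low k bits equal acc - n, which is below n because acc < 2**k <= 2n.
    t = acc + m
    return t & mask if t >> k else acc


print(omura_sum([23, 26], 39, 6))  # 10 = 49 mod 39
```

The final step uses the same adder as the main loop, so no separate subtractor is needed; the carry-out alone decides whether $S - n$ or S is kept.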
5.3 Modular Multiplication Operation
The modular multiplication problem is defined as the computation of $P = AB \bmod n$, given the integers A, B, and n. It is usually assumed that A and B are nonnegative integers with $0 \le A, B < n$, i.e., they are the least positive residues.
There are basically four approaches for computing the product P:
• Multiply and then divide
Trang 35.3 Modular Multiplication Operation 101
The result P is a k-bit or s-word number.
The reduction is accomplished by dividing P' by n; however, we are not
interested in the quotient; we only need the remainder. The steps of the division algorithm can be somewhat simplified in order to speed up the process.
5.3.1 Standard Multiplication Algorithm
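Let A and B be two s-digit (s-word) numbers expressed in radix W as:
$$A = (A_{s-1} A_{s-2} \cdots A_0) = \sum_{i=0}^{s-1} A_i W^i, \qquad B = (B_{s-1} B_{s-2} \cdots B_0) = \sum_{i=0}^{s-1} B_i W^i,$$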
where the digits of A and B are in the range $[0, W - 1]$. In general, W can be
any integer greater than 1. For reconfigurable hardware implementations, we often
select $W = 2^w$, where w is the word size or granularity of the device, e.g.,
w = 4. The standard (pencil-and-paper) algorithm for multiplying A and B
produces the partial products by multiplying a digit of the multiplier (B)
by the entire number A, and then sums these partial products to obtain
the final 2s-word number P'. Let $P_{ij}$ denote the (Carry, Sum) pair produced from the product $A_i \cdot B_j$. For example, when W = 10, $A_i = 7$, and $B_j = 8$, then $P_{ij} = (5, 6)$. The $P_{ij}$ pairs can be arranged in a table as
In order to save space, a single partial-product variable P' is used. The initial value of the partial product is zero; we then take a digit of B, multiply it by the entire number A, and add the result to the partial product P'.
The partial-product variable P' contains the final product $A \cdot B$ at the end of
the computation. Algorithm 5.1 shows the standard procedure for computing $P' = A \cdot B$.
In the following, we show the steps of the computation of $A \cdot B = 348 \cdot 857$
using the standard algorithm.
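A minimal Python sketch of the standard algorithm on digit lists (least significant digit first) reproduces this computation for W = 10. The function names `standard_multiply`, `to_digits`, and `from_digits` are hypothetical helpers; each inner-loop iteration is one inner-product step producing a (C, S) pair, and, as in the text, a single array `p` serves as the running partial product P'.

```python
def to_digits(x, s, W=10):
    """Split x into s radix-W digits, least significant first."""
    return [(x // W**i) % W for i in range(s)]


def from_digits(d, W=10):
    """Recombine radix-W digits (least significant first) into an integer."""
    return sum(v * W**i for i, v in enumerate(d))


def standard_multiply(a_digits, b_digits, W=10):
    """Pencil-and-paper multiplication of two s-digit numbers into 2s digits."""
    s = len(a_digits)
    assert len(b_digits) == s
    p = [0] * (2 * s)                     # partial product P', initially zero
    for i in range(s):                    # digit B_i of the multiplier
        c = 0
        for j in range(s):                # inner-product step
            t = p[i + j] + a_digits[j] * b_digits[i] + c
            c, p[i + j] = divmod(t, W)    # (C, S) = (t // W, t % W)
        p[i + s] = c                      # last carry of this row
    return p


digits = standard_multiply(to_digits(348, 3), to_digits(857, 3))
print(digits, from_digits(digits))  # [6, 3, 2, 8, 9, 2] 298236
```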
where the variables $P'_{i+j}$, $A_j$, $B_i$, C, and S each hold a single word, i.e., a
w-bit number. This step is termed an inner-product operation, which is common in many arithmetic and number-theoretic calculations. The inner-product operation above requires that we multiply two w-bit numbers, add this product to the previous 'carry', which is also a w-bit number, and then add this result to the running partial-product word $P'_{i+j}$. From these three operations we obtain a 2w-bit number, since the maximum value is
$$(2^w - 1) + (2^w - 1)(2^w - 1) + (2^w - 1) = 2^{2w} - 1.$$
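The bound can be checked with a small Python sketch of one inner-product step on w-bit words. The name `inner_product_step` and its argument order are hypothetical; the assertion encodes the $2^{2w} - 1$ maximum derived above, and the worst case $w = 8$ with every input equal to 255 reaches it exactly.

```python
def inner_product_step(p_word, a_word, b_word, carry, w):
    """One step (C, S) := P'_{i+j} + A_j * B_i + C on w-bit words."""
    W = 1 << w
    for x in (p_word, a_word, b_word, carry):
        assert 0 <= x < W                  # each input fits in a single word
    t = p_word + a_word * b_word + carry
    assert t <= W * W - 1                  # the 2w-bit bound: at most 2**(2w) - 1
    return t >> w, t & (W - 1)             # (C, S) = (high word, low word)


print(inner_product_step(255, 255, 255, 255, 8))  # (255, 255), the worst case for w = 8
```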
Also, since the inner-product step is within the innermost loop, it needs to run
as fast as possible. Of course, the best thing is to have a single microprocessor instruction for this computation; unfortunately, none of the currently available microprocessors and signal processors offers such a luxury. A brief inspection
of the steps of this algorithm reveals that the total number of inner-product
steps is equal to $s^2$. Since $s = k/w$ and w is a constant on a given computer, the standard multiplication algorithm requires $O(k^2)$ bit operations in order
to multiply two k-bit numbers.
in Step 5 of Algorithm 5.2 may be 1 bit longer than a single-precision number,
which requires w bits. Since
$$(2^w - 1) + 2(2^w - 1)(2^w - 1) + (2^w - 1) = 2^{2w+1} - 2^{w+1}$$
and
$$2^{2w} - 1 < 2^{2w+1} - 2^{w+1} < 2^{2w+1} - 1 \quad (w \ge 1),$$
the carry-sum pair requires 2w + 1 bits instead of 2w bits for its representation.
Thus, we need to accommodate this 'extra' bit during the execution of the operations in Steps 5, 6, and 7 of Algorithm 5.2. The resolution of this carry may depend on the way the carry bits are handled by the particular processor's architecture. This issue, being rather implementation-dependent, will not be discussed here.
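The extra bit can be seen in a Python sketch of squaring that computes each cross product $A_i A_j$ ($i < j$) once and doubles it inside the inner-product step. This is an illustration under our own assumptions (w-bit words, least significant first), not necessarily identical to Algorithm 5.2. `square_words` and `doubled_step_max` are hypothetical names, and Python integers simply absorb the carry bit that a hardware word would have to accommodate.

```python
def doubled_step_max(w: int) -> int:
    """Largest value of P'_{i+j} + 2*A_i*A_j + C when every input is a w-bit word."""
    top = (1 << w) - 1
    return top + 2 * top * top + top


def square_words(a, w):
    """Square a number given as a list of w-bit words (least significant first)."""
    mask = (1 << w) - 1
    s = len(a)
    p = [0] * (2 * s)
    for i in range(s):
        t = p[2 * i] + a[i] * a[i]                # diagonal term A_i^2
        c, p[2 * i] = t >> w, t & mask
        for j in range(i + 1, s):
            t = p[i + j] + 2 * a[i] * a[j] + c    # doubled inner-product step
            assert t < 1 << (2 * w + 1)           # needs up to 2w + 1 bits
            c, p[i + j] = t >> w, t & mask        # the carry c may now need w + 1 bits
        pos = i + s
        while c:                                  # propagate the remaining carry
            t = p[pos] + c
            c, p[pos] = t >> w, t & mask
            pos += 1
    return p


print(doubled_step_max(8), doubled_step_max(8).bit_length())  # 130560 17
```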
5.3.3 Modular Reduction
The multiply-and-reduce modular multiplication algorithm first computes the
product $A \cdot B$ (or $A \cdot A$) using one of the multiplication algorithms given
above. The multiplication step is then followed by a division algorithm in order to compute the remainder. However, as we have mentioned before, we are not interested in the quotient; we only need the remainder. Therefore, the steps of the division algorithm can be somewhat simplified in order to speed
up the process. The reduction step can be achieved by using one of the well-known sequential division algorithms. In the rest of this subsection, we describe the restoring and the nonrestoring division algorithms for computing
the remainder of P' when divided by n, where n is a general modulus.¹
Division is the most complex of the four basic arithmetic operations. First
of all, it has two results: the quotient and the remainder. Given a dividend
P' and a divisor n, a quotient Q and a remainder R have to be calculated in
order to satisfy
$$P' = Q \cdot n + R \quad \text{with} \quad R < n.$$
If P' and n are positive, then the quotient Q and the remainder R will be nonnegative. The sequential division algorithm successively shifts and subtracts n from P' until a remainder R with the property $0 \le R < n$ is found. However,
after a subtraction we may obtain a negative remainder. The restoring and nonrestoring algorithms take different actions when a negative remainder is obtained.
Restoring Division Algorithm
Let $R_i$ be the remainder obtained during the ith step of the division algorithm.
Since we are not interested in the quotient, we ignore the generation of the quotient bits in the following algorithm. The procedure given below
first left-aligns the operands P' and n. Since P' is a 2k-bit number and n is a
k-bit number, the left alignment implies that n is shifted k bits to the left,
i.e., we start with $2^k n$. Furthermore, the initial value of R is taken to be P', i.e., $R_0 = P'$. We then subtract the shifted n from P' to obtain $R_1$; if $R_1$ is positive or zero, we continue to the next step. If it is negative, the remainder
is restored to its previous value, as shown in Algorithm 5.3 below.
¹ It is noted that Solinas proposed in [338] primes of special form for which the reduction step can be accomplished with high efficiency. However, the material on Solinas' special primes is not covered in this book. The interested reader may consult [37].
Algorithm 5.3 The Restoring Division Algorithm
Require: P', n
Ensure: R = P' mod n
In Step 5 of Algorithm 5.3, we check the sign of the remainder; if it is
negative, the previous remainder is taken to be the new remainder, i.e., a
restore operation is performed. If the remainder $R_i$ is positive or zero, it remains as
the new remainder, i.e., we do not restore. The restoring division algorithm
performs k subtractions in order to reduce the 2k-bit number P' modulo the k-bit number n. Thus, it takes much longer than the standard multiplication algorithm, which requires s = k/w inner-product steps, where w is the word
size or granularity being employed.
In the following, we give an example of the restoring division algorithm for computing 3019 mod 53, where $3019 = (101111001011)_2$ and $53 = (110101)_2$. The result is $51 = (110011)_2$.
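The restoring procedure can be traced with a short Python sketch. The function name `restoring_remainder` is hypothetical, and we assume n is a k-bit number ($2^{k-1} \le n < 2^k$) and $P' < 2^{2k}$. As in the text, n is first left-aligned to $2^k n$ and then halved after each trial subtraction until it is back to n. The returned trace lists $R_0, R_1, \ldots$ for the example above.

```python
def restoring_remainder(p, n, k):
    """Compute p mod n by restoring division; quotient bits are not formed.

    Assumes 2**(k-1) <= n < 2**k and 0 <= p < 2**(2k).
    Returns (remainder, list of successive remainders R_0, R_1, ...).
    """
    assert (1 << (k - 1)) <= n < (1 << k) and 0 <= p < (1 << (2 * k))
    shifted = n << k              # left-aligned divisor 2**k * n
    r = p                         # R_0 = P'
    trace = [r]
    while shifted >= n:
        trial = r - shifted       # trial subtraction
        if trial >= 0:
            r = trial             # nonnegative: keep the new remainder
        # negative: restore, i.e., leave r unchanged
        trace.append(r)
        shifted >>= 1             # the shifted n is halved for the next step
    return r, trace


print(restoring_remainder(3019, 53, 6))
# (51, [3019, 3019, 1323, 475, 51, 51, 51, 51])
```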
Nonrestoring Division Algorithm
The nonrestoring division algorithm allows a negative remainder. In order to correct the remainder, a subtraction or an addition is performed during the next cycle, depending on whether the sign of the remainder is positive
or negative, respectively. This is based on the following observation (here n denotes the current shifted value of the divisor, which is halved in every cycle): suppose
$R_i = R_{i-1} - n < 0$; then the restoring algorithm assigns $R_i := R_{i-1}$ and
performs a subtraction with the shifted n, obtaining
$$R_{i+1} = R_i - n/2 = R_{i-1} - n/2.$$
However, if $R_i = R_{i-1} - n < 0$, then one can instead let $R_i$ remain negative
and add the shifted n in the following cycle. Thus, one obtains
$$R_{i+1} = R_i + n/2 = (R_{i-1} - n) + n/2 = R_{i-1} - n/2,$$
which is the same value. The steps of the nonrestoring algorithm, which implements this observation, are given in Algorithm 5.4.
Note that the nonrestoring division algorithm requires a final restoration cycle in which a negative remainder is corrected by adding the last value of n back to it.
Algorithm 5.4 The Nonrestoring Division Algorithm
Return($R_k$)
In the following, we compute 51 = 3019 mod 53 using the nonrestoring division algorithm. Since the remainder is allowed to stay negative, we use 2's complement coding to represent such numbers.
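The same example can be run through a Python sketch of the nonrestoring variant. We use the same hypothetical conventions as for the restoring case: `nonrestoring_remainder` is our name, n is a k-bit number, and the divisor starts left-aligned at $2^k n$. Python's signed integers play the role of the 2's complement coding. The trace shows the negative intermediate remainders, and the final restoration adds n once.

```python
def nonrestoring_remainder(p, n, k):
    """Compute p mod n by nonrestoring division (a sketch).

    Assumes 2**(k-1) <= n < 2**k and 0 <= p < 2**(2k).
    Returns (remainder, list of successive remainders R_0, R_1, ...).
    """
    assert (1 << (k - 1)) <= n < (1 << k) and 0 <= p < (1 << (2 * k))
    shifted = n << k
    r = p
    trace = [r]
    while shifted >= n:
        if r >= 0:
            r -= shifted          # nonnegative remainder: subtract the shifted n
        else:
            r += shifted          # negative remainder: add it instead of restoring
        trace.append(r)
        shifted >>= 1
    if r < 0:
        r += n                    # final restoration: the last shifted value is n itself
    return r, trace


print(nonrestoring_remainder(3019, 53, 6))
# (51, [3019, -373, 1323, 475, 51, -161, -55, -2])
```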
5.3.4 Interleaving Multiplication and Reduction
The interleaving algorithm has long been known; the details of the method are
sketched in papers [27, 334]. Let $A_i$ and $B_i$ be the bits of the k-bit positive integers A and B, respectively. The product P' can be written as
$$P' = A \cdot B = A \cdot \sum_{i=0}^{k-1} B_i 2^i = \sum_{i=0}^{k-1} (A \cdot B_i) 2^i = 2\Big(\cdots 2\big(2(0 + A \cdot B_{k-1}) + A \cdot B_{k-2}\big) + \cdots\Big) + A \cdot B_0.$$
This formulation yields the shift-add multiplication algorithm. Notice that we
also reduce the partial product modulo n at each step of Algorithm 5.5.
Algorithm 5.5 The Interleaving Multiplication Algorithm
5: end for
6: Return(P)
Assuming that A, B, P < n, we have
$$P := 2P + A \cdot B_j \le 2(n - 1) + (n - 1) = 3n - 3.$$
Thus, the new P will be in the range $0 \le P \le 3n - 3$, and at most 2
subtractions are needed to reduce P to the range $0 \le P < n$. We can use the following algorithm to bring P back to this range:
$P' := P - n$; if $P' \ge 0$ then $P := P'$
$P' := P - n$; if $P' \ge 0$ then $P := P'$
The computation of P requires k steps; at each step we perform a left shift, a partial-product generation, an addition, and the subtractions above.
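In software, the interleaving algorithm with its two conditional subtractions can be sketched in Python as follows. The name `interleaved_modmul` is hypothetical. We assume $0 \le A, B < n$ and that B is scanned from bit $k-1$ down to bit 0, matching the nested form of P' above.

```python
def interleaved_modmul(a, b, n, k):
    """Interleaving modular multiplication: P := 2P + A*B_j, then reduce.

    Assumes 0 <= a, b < n and b < 2**k; bits of b are scanned MSB first.
    """
    assert 0 <= a < n and 0 <= b < n and b < (1 << k)
    p = 0
    for j in range(k - 1, -1, -1):
        bit = (b >> j) & 1
        p = 2 * p + a * bit          # shift-add step; now 0 <= p <= 3n - 3
        for _ in range(2):           # at most two subtractions are needed
            t = p - n                # P' := P - n
            if t >= 0:               # if P' >= 0 then P := P'
                p = t
    return p


print(interleaved_modmul(47, 48, 50, 6))  # 6 = 2256 mod 50
```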
The left shift operation is easily performed by wiring. The partial products,
on the other hand, are generated using an array of AND gates. The most crucial operations are the addition and subtraction operations: they need to
be performed fast. We have the following avenues to explore:
• We can use the carry propagate adder, introducing O(k) delay per step.
However, Omura's method can be used to avoid unnecessary subtractions:
3a: $P := 2P$
3b: if carry-out, then $P := P + m$
3c: $P := P + A \cdot B_j$
3d: if carry-out, then $P := P + m$
• We can use the carry save adder, introducing only O(1) delay per step.
However, recall that the sign information is not immediately available in the CSA. We need to perform fast sign detection in order to determine whether the partial product needs to be reduced modulo n.
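Steps 3a to 3d can be sketched in Python with P held in a k-bit register. The name `omura_interleaved_modmul` and the helper `fold` are hypothetical. We assume $2^{k-1} \le n < 2^k$ and $0 \le A < n$. One further assumption is ours: the correction is repeated while a carry-out remains, because a single correction does not always bring 2P back below $2^k$ (for example, n = 39, k = 6, P = 63). A final subtraction brings P into [0, n).

```python
def omura_interleaved_modmul(a, b, n, k):
    """Interleaving multiplication with Omura's correction (steps 3a-3d).

    P lives in a k-bit register and may exceed n. A carry-out of bit k is
    dropped and m = 2**k - n is added, i.e., n is subtracted. Assumption:
    the correction is repeated while a carry-out remains.
    """
    mask = (1 << k) - 1
    m = (1 << k) - n
    assert (1 << (k - 1)) <= n < (1 << k) and 0 <= a < n and 0 <= b <= mask

    def fold(v):
        while v >> k:                # carry-out of the kth bit
            v = (v & mask) + m       # ignore the carry, add the correction m
        return v

    p = 0
    for j in range(k - 1, -1, -1):
        p = fold(2 * p)                      # 3a and 3b
        p = fold(p + a * ((b >> j) & 1))     # 3c and 3d
    if p >= n:                               # final reduction: p < 2**k <= 2n
        p -= n
    return p


print(omura_interleaved_modmul(47, 48, 50, 6))  # 6
```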
5.3.5 Utilization of Carry Save Adders
In order to utilize carry save adders in performing the modular
multiplication operations, we represent the numbers as carry save pairs (C, S),
where the value of the number is the sum C + S. The carry save adder version
of the interleaving algorithm is given in Algorithm 5.6.
Algorithm 5.6 The Carry-Save Interleaving Multiplication Algorithm
7: end if
8: end for
9: Return(C, S)
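The carry-save representation can be illustrated in Python with a 3:2 carry save adder and an interleaving loop that keeps the partial product as a pair (C, S). The names `csa` and `cs_interleaved_modmul` are hypothetical. The main simplification is that SIGN is computed exactly here, by a full addition C + S, which sidesteps the sign estimation discussed next. Python's bitwise operators treat negative integers as infinite 2's complement, so the identity $x + y + z = C + S$ also holds when $-n$ is fed into the adder.

```python
def csa(x, y, z):
    """3:2 carry save adder on Python ints: returns (C, S) with C + S == x + y + z."""
    s = x ^ y ^ z                               # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1      # majority bits, moved one position up
    return c, s


def cs_interleaved_modmul(a, b, n, k):
    """Interleaving multiplication keeping P as a carry-save pair (C, S).

    Simplifying assumption: SIGN(C + S - n) is evaluated exactly with a full
    addition instead of the estimation technique described in the text.
    """
    assert 0 <= a < n and 0 <= b < (1 << k)
    c, s = 0, 0
    for j in range(k - 1, -1, -1):
        # Shift-add without carry propagation: (C, S) := 2C + 2S + A*B_j
        c, s = csa(2 * c, 2 * s, a * ((b >> j) & 1))
        for _ in range(2):                      # at most two subtractions of n
            c2, s2 = csa(c, s, -n)              # tentative (C, S) := C + S - n
            if c2 + s2 >= 0:                    # exact SIGN test (our simplification)
                c, s = c2, s2
    return c, s


c, s = cs_interleaved_modmul(47, 48, 50, 6)
print(c + s)  # 6
```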
The function SIGN gives the sign of the carry save number C + S. Since
the exact sign is available only when a full addition is performed, we calculate
an estimated sign with the SIGN function. A sign estimation algorithm was introduced in [185]. Here, we briefly review this algorithm, which is based on
the addition of the most significant t bits of C and S to estimate the sign of
C + S. For example, let C = (011110) and S = (001010); then the function
SIGN produces
C = 011110
S = 001010
(t = 1) SIGN = 0
(t = 2) SIGN = 01
(t = 3) SIGN = 100
(t = 4) SIGN = 1001
(t = 5) SIGN = 10100
(t = 6) SIGN = 101000
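The table can be reproduced with a Python sketch that adds only the top t bits of two width-bit words. The name `estimate_sign_bits` is hypothetical, and truncating the sum to t bits models a t-bit adder (an assumption). The leading bit of the result serves as the estimated sign. Carries coming from the discarded low-order bits are ignored, which is why the result is only an estimate.

```python
def estimate_sign_bits(c, s, t, width):
    """Add only the t most significant bits of two width-bit words.

    Returns the t-bit truncated sum; its leading bit is the estimated sign.
    Carries out of the discarded low-order bits are ignored.
    """
    assert 1 <= t <= width
    total = (c >> (width - t)) + (s >> (width - t))
    return total & ((1 << t) - 1)


for t in range(1, 7):
    print(t, format(estimate_sign_bits(0b011110, 0b001010, t, 6), f"0{t}b"))
# 1 0 / 2 01 / 3 100 / 4 1001 / 5 10100 / 6 101000
```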
In the worst case, the exact sign is produced only after adding all k bits. If the
exact sign of C + S is computed, we can obtain the result of the multiplication operation in the correct range [0, n). If an estimation of the sign is used, then
we will prove that the range of the result becomes $[0, n + \Delta)$, where $\Delta$ depends
on the precision of the estimation. Furthermore, since the sign is used to decide whether some multiple of n should be subtracted from the partial product,
an error in the decision causes only an error of a multiple of n in the partial
product, which is corrected later. We define function T(X) on an n-bit integer
J. If $T(C) + T(S) > 0$, then set C := C and S := S.
In Step J, the computation of the sign bit R of $T(C) + T(S)$ involves the $n - t$ most significant bits of C and S. The above procedure reduces a carry-sum
pair from the range
If the estimated sign in Step J is positive for all Q iterations, then QN is
subtracted from the initial pair; therefore
Since in Step I we perform $(C, S) := C + S - N$ and, in the last iteration, the
carry-sum pair is not reduced (because the estimated sign is negative), we must have
For example, if Q = 3, then k = 2 can be used. Instead of subtracting N three times, we first subtract 2N and then N. This observation is utilized in
Algorithm 5.7.
The parameter t controls the precision of estimation; the accuracy of the
estimation and the total amount of logic required to implement it decrease
as t increases. After Step 7 of Algorithm 5.7, we have
$$C^{(i)} + S^{(i)} < N + 2^t,$$
which implies that after the next shift-add step the range of $C^{(i+1)} + S^{(i+1)}$
will be $[0, 3N + 2^{t+1})$. Assuming Q = 3, we have
$$3N + 2^{t+1} < (Q + 1)N + 2^t = 4N + 2^t,$$
which implies $2^t < N$, or $t \le n - 1$. The range of $C^{(i+1)} + S^{(i+1)}$ becomes
$$0 \le C^{(i+1)} + S^{(i+1)} < 3N + 2^{t+1} \le 3N + 2^n < 2^{n+2}.$$
Algorithm 5.7 The Carry-Save Interleaving Multiplication Algorithm
if $T(C^{(i)}) + T(S^{(i)}) > 0$ then $C^{(i)} := C^{(i)}$ and $S^{(i)} := S^{(i)}$;
end if
$(C^{(i)}, S^{(i)}) := C^{(i)} + S^{(i)} - N$;
if $T(C^{(i)}) + T(S^{(i)}) > 0$ then $C^{(i)} := C^{(i)}$ and $S^{(i)} := S^{(i)}$;
end if
end for
Return(C, S)
$$-2^{n+1} < -2N < C^{(i+1)} + S^{(i+1)} < N + 2^t < 2^{n+1}$$
In order to contain the temporary results, we use (n + 3)-bit carry save adders,
which can represent integers in the range $[-2^{n+2}, 2^{n+2})$. When t = n − 1, the sign estimation technique checks the 5 most significant bits of $C^{(i)}$ and $S^{(i)}$,
from bit locations n − 2 to n + 3. This algorithm produces a pair of
integers (C, S) such that P = C + S is in the range [0, 2N).
The final result in the correct range [0, N) can be obtained by computing
$P = C + S$ and $\bar{P} = C + S - N$ using carry propagate adders. If $\bar{P} < 0$,
we have $P = \bar{P} + N < N$, and thus P is in the correct range. Otherwise, we choose $\bar{P}$, because $0 \le \bar{P} = P - N < 2N - N = N$ implies $\bar{P} \in [0, N)$. The steps of
the algorithm for computing 47 · 48 (mod 50) are illustrated in the following figure. Here we have
in 3k = 18 clock cycles. The value C + S = 184 − 128 = 56 lies in the range [0, 2 · 50).
The final result is found by computing C + S = 56 and C + S − N = 6, and
selecting the latter since it is positive.
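The final conversion step can be sketched in Python as follows. The name `final_correction` is hypothetical, and the two ordinary additions stand in for the carry propagate adders. We assume the carry-save pair satisfies $0 \le C + S < 2N$, as the algorithm guarantees; the carry-save value (C, S) = (184, −128) from the example gives 6.

```python
def final_correction(c, s, modulus):
    """Map a carry-save pair with C + S in [0, 2N) to the least residue in [0, N).

    Both candidates are formed with full additions; the sign of the
    second one selects the result.
    """
    p = c + s                  # P     = C + S
    p_bar = c + s - modulus    # P-bar = C + S - N
    assert 0 <= p < 2 * modulus
    return p if p_bar < 0 else p_bar


print(final_correction(184, -128, 50))  # 6 = 47 * 48 mod 50
```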