Document: Cryptographic Algorithms on Reconfigurable Hardware - P5


5 Prime Finite Field Arithmetic

5.2 Modular Addition Operation

Fig. 5.7 Carry Delayed Adder

combined, in other words, $S' = A + B$ and $S'' = A + B - n$ can be computed at the same time. Then, we perform a sign detection to decide whether to take $S'$ or $S''$ as the correct sum. We will review algorithms of this type when we study modular multiplication algorithms.
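The parallel computation described above can be sketched in a few lines of Python. This is only an illustration under the assumption that both operands are already reduced, i.e. $0 \le A, B < n$; the function name `mod_add` is hypothetical and does not come from the text.

```python
def mod_add(a: int, b: int, n: int) -> int:
    """Return (a + b) mod n, assuming 0 <= a, b < n."""
    s1 = a + b        # S'  = A + B
    s2 = a + b - n    # S'' = A + B - n (a second adder in hardware)
    # Sign detection: a negative S'' means S' was already below n.
    return s1 if s2 < 0 else s2
```

In hardware the two sums would come from two adders working side by side, with the sign of $S''$ driving a multiplexer; the sequential code only mirrors the selection logic.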

5.2.1 Omura's Method

An efficient method for computing the modular addition, which is especially useful for multioperand modular addition, was proposed by Omura in [260]. Let $n < 2^k$. This method allows a temporary value to grow larger than $n$; however, it is always kept less than $2^k$. Whenever it reaches or exceeds $2^k$, the carry-out is ignored and a correction is performed. The correction factor is $m = 2^k - n$, which is precomputed and saved in a register. Thus, Omura's method performs the following steps given the integers $A, B < 2^k$ (but they can be larger than $n$):

1. First compute $S' = A + B$.

2. If there is a carry-out (of the $k$th bit), then $S = S' + m$, else $S = S'$.

The correctness of Omura's algorithm follows from the observations that

• If there is no carry-out, then $S = A + B$ is returned. The sum $S$ is less than $2^k$, but may be larger than $n$. In a future computation, it will be brought below $n$ if necessary.

• If there is a carry-out, then we ignore the carry-out, which means we compute

$$S' = A + B - 2^k.$$

The result, which needs to be reduced modulo $n$, is in effect reduced modulo $2^k$. We correct the result by adding $m$ back to it, and thus compute

$$\begin{aligned}
S &= S' + m \\
  &= A + B - 2^k + 2^k - n \\
  &= A + B - n.
\end{aligned}$$

After all additions are completed, a final result is reduced modulo $n$ by using the standard technique. As an example, let us assume $n = 39$. Thus, we have $m = 2^6 - 39 = 25 = (011001)$. The modular addition of $A = 40$ and $B = 30$ is performed using Omura's method as follows:

A = 40        = (101000)
B = 30        = (011110)
S' = A + B    = 1(000110)   Carry-out
m             = (011001)
S = S' + m    = (011111)    Correction

Thus, we obtain the result $S = (011111) = 31$, which is equal to 70 (mod 39) as required. On the other hand, the addition of $A = 23$ and $B = 26$ is performed as

A = 23        = (010111)
B = 26        = (011010)
S' = A + B    = 0(110001)   No carry-out
S = S'        = (110001)

This leaves the result as $S = (110001) = 49$, which is larger than the modulus 39. It will be reduced in a further step of the multioperand modulo addition.

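After all additions are completed, a final result that is still not below $n$ can be corrected by adding $m$ to it and ignoring the carry-out. For example, we correct the above result $S = (110001)$ as $S + m = (110001) + (011001) = 1(001010)$; dropping the carry-out leaves $(001010) = 10$, which is equal to 49 (mod 39).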
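As a rough sketch, Omura's addition step and its use in a multioperand sum might look as follows in Python. The names `omura_add` and `omura_sum` are hypothetical. One assumption goes beyond the text: the correction is repeated while a carry-out remains, because for operands close to $2^k$ a single correction can itself overflow the $k$-bit register.

```python
def omura_add(a: int, b: int, n: int, k: int) -> int:
    """Add two k-bit values; the result is < 2**k and congruent to a + b mod n."""
    mask = (1 << k) - 1
    m = (1 << k) - n            # correction factor m = 2^k - n
    s = a + b                   # S' = A + B
    while s >> k:               # carry-out of the k-th bit
        s = (s & mask) + m      # ignore the carry-out, then add m
    return s


def omura_sum(operands, n: int, k: int) -> int:
    """Multioperand modular addition with one final standard reduction."""
    acc = 0
    for x in operands:
        acc = omura_add(acc, x, n, k)
    return acc % n              # final reduction by the standard technique


print(omura_add(40, 30, 39, 6))   # 31, as in the first worked example
print(omura_add(23, 26, 39, 6))   # 49, left unreduced as in the second example
```

The intermediate values may exceed $n$, which is the point of the method: only the cheap carry-out test is performed inside the loop, and the full reduction happens once at the end.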

5.3 Modular Multiplication Operation

The modular multiplication problem is defined as the computation of $P = AB \pmod{n}$ given the integers $A$, $B$, and $n$. It is usually assumed that $A$ and $B$ are positive integers with $0 \le A, B < n$, i.e., they are the least positive residues. There are basically four approaches for computing the product $P$.

• Multiply and then divide.


The result $P$ is a $k$-bit or $s$-word number.

The reduction is accomplished by dividing the product $P' = A \cdot B$ by $n$; however, we are not interested in the quotient and only need the remainder. The steps of the division algorithm can be somewhat simplified in order to speed up the process.

5.3.1 Standard Multiplication Algorithm

Let $A$ and $B$ be two $s$-digit ($s$-word) numbers expressed in radix $W$ as:

$$A = \sum_{i=0}^{s-1} A_i W^i, \qquad B = \sum_{i=0}^{s-1} B_i W^i,$$

where the digits of $A$ and $B$ are in the range $[0, W - 1]$. In general, $W$ can be any integer greater than 1. For reconfigurable hardware implementations, we often select $W = 2^w$, where $w$ is the word size or granularity of the device, e.g., $w = 4$. The standard (pencil-and-paper) algorithm for multiplying $A$ and $B$ produces the partial products by multiplying a digit of the multiplier ($B$) by the entire number $A$, and then summing these partial products to obtain the final $2s$-word number $P'$. Let $P_{ij}$ denote the (Carry, Sum) pair produced from the product $A_i \cdot B_j$. For example, when $W = 10$, $A_i = 7$, and $B_j = 8$, then $P_{ij} = (5, 6)$. The $P_{ij}$ pairs can be arranged in a table as

In order to save space, a single partial product variable $P'$ is used. The initial value of the partial product is zero; we then take a digit of $B$, multiply it by the entire number $A$, and add the result to the partial product $P'$. The partial product variable $P'$ contains the final product $A \cdot B$ at the end of the computation. Algorithm 5.1 shows the standard procedure for computing the product $A \cdot B$.

In the following, we show the steps of the computation of $A \cdot B = 348 \cdot 857$ using the standard algorithm.
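The procedure described above can be sketched in Python as follows. This is an illustrative reconstruction from the prose, not a transcription of Algorithm 5.1: the function name `standard_multiply`, the little-endian digit lists, and the requirement that both operands have the same number of digits are assumptions made for this sketch.

```python
def standard_multiply(a_digits, b_digits, W):
    """Multiply two s-digit numbers given as little-endian digit lists in radix W."""
    s = len(a_digits)
    assert len(b_digits) == s, "this sketch assumes equal-length operands"
    p = [0] * (2 * s)                 # partial product P', initially zero
    for i in range(s):                # one digit of the multiplier B per pass
        c = 0
        for j in range(s):
            # inner-product step: (C, S) := P'[i+j] + A[j] * B[i] + C
            c, p[i + j] = divmod(p[i + j] + a_digits[j] * b_digits[i] + c, W)
        p[i + s] = c                  # final carry of this pass
    return p


# 348 * 857 in radix 10, least significant digit first
print(standard_multiply([8, 4, 3], [7, 5, 8], 10))   # [6, 3, 2, 8, 9, 2], i.e. 298236
```

The inner loop body runs $s^2$ times in total, once per pair of digits.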

The central step of the algorithm is

$$(C, S) := P'_{i+j} + A_j \cdot B_i + C,$$

where the variables $P'_{i+j}$, $A_j$, $B_i$, $C$, and $S$ each hold a single-word, or a $w$-bit, number. This step is termed an inner-product operation, which is common in many arithmetic and number-theoretic calculations. The inner-product operation above requires that we multiply two $w$-bit numbers, add this product to the previous 'carry', which is also a $w$-bit number, and then add this result to the running partial product word $P'_{i+j}$. From these three operations we obtain a $2w$-bit number, since the maximum value is

$$(2^w - 1) + (2^w - 1)(2^w - 1) + (2^w - 1) = 2^{2w} - 1.$$

Also, since the inner-product step is within the innermost loop, it needs to run as fast as possible. Of course, the best thing is to have a single microprocessor instruction for this computation; unfortunately, none of the currently available microprocessors and signal processors offers such a luxury. A brief inspection of the steps of this algorithm reveals that the total number of inner-product steps is equal to $s^2$. Since $s = k/w$ and $w$ is a constant on a given computer, the standard multiplication algorithm requires $O(k^2)$ bit operations in order to multiply two $k$-bit numbers.


in Step 5 of Algorithm 5.2 may be 1 bit longer than a single-precision number, which requires $w$ bits. Since

$$(2^w - 1) + 2(2^w - 1)(2^w - 1) + (2^w - 1) = 2^{2w+1} - 2^{w+1}$$

and

$$2^{2w} - 1 < 2^{2w+1} - 2^{w+1} < 2^{2w+1} - 1,$$

the carry-sum pair requires $2w + 1$ bits instead of $2w$ bits for its representation. Thus, we need to accommodate this 'extra' bit during the execution of the operations in Steps 5, 6, and 7 of Algorithm 5.2. The resolution of this carry may depend on the way the carry bits are handled by the particular processor's architecture. This issue, being rather implementation-dependent, will not be discussed here.
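The extra bit can be checked numerically. The short Python sketch below compares the largest value of the multiplication step (word + word · word + word) with the largest value of the squaring step (word + 2 · word · word + word), as given by the formula above; the function name `max_step_value` is hypothetical.

```python
def max_step_value(w: int, squaring: bool = False) -> int:
    """Largest value of P + A*B + C, or of P + 2*A*B + C when squaring, for w-bit words."""
    d = (1 << w) - 1                              # largest w-bit word, 2^w - 1
    product = 2 * d * d if squaring else d * d
    return d + product + d


for w in (4, 8, 16, 32):
    mul = max_step_value(w)
    sqr = max_step_value(w, squaring=True)
    # the multiplication step fits in 2w bits, the squaring step needs 2w + 1
    print(w, mul.bit_length(), sqr.bit_length())
```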

5.3.3 Modular Reduction

The multiply-and-reduce modular multiplication algorithm first computes the product $A \cdot B$ (or $A \cdot A$) using one of the multiplication algorithms given above. The multiplication step is then followed by a division algorithm in order to compute the remainder. However, as we have mentioned before, we are not interested in the quotient; we only need the remainder. Therefore, the steps of the division algorithm can be somewhat simplified in order to speed up the process. The reduction step can be achieved by using one of the well-known sequential division algorithms. In the rest of this subsection, we describe the restoring and the nonrestoring division algorithms for computing the remainder of $P'$ when divided by $n$, where $n$ is a general modulus.¹

Division is the most complex of the four basic arithmetic operations. First of all, it has two results: the quotient and the remainder. Given a dividend $P'$ and a divisor $n$, a quotient $Q$ and a remainder $R$ have to be calculated in order to satisfy

$$P' = Q \cdot n + R \quad \text{with} \quad R < n.$$

If $P'$ and $n$ are positive, then the quotient $Q$ and the remainder $R$ will be nonnegative. The sequential division algorithm successively shifts and subtracts $n$ from $P'$ until a remainder $R$ with the property $0 \le R < n$ is found. However, after a subtraction we may obtain a negative remainder. The restoring and nonrestoring algorithms take different actions when a negative remainder is obtained.

Restoring Division Algorithm

Let $R_i$ be the remainder obtained during the $i$th step of the division algorithm. Since we are not interested in the quotient, we ignore the generation of the bits of the quotient in the following algorithm. The procedure given below first left-aligns the operands $P'$ and $n$. Since $P'$ is a $2k$-bit number and $n$ is a $k$-bit number, the left alignment implies that $n$ is shifted $k$ bits to the left, i.e., we start with $2^k n$. Furthermore, the initial value of $R$ is taken to be $P'$, i.e., $R_0 = P'$. We then subtract the shifted $n$ from $P'$ to obtain $R_1$; if $R_1$ is positive or zero, we continue to the next step. If it is negative, the remainder is restored to its previous value, as shown in Algorithm 5.3 below.

¹ It is noted that Solinas proposed in [338] primes of special form for which the reduction step can be accomplished with high efficiency. However, the material on Solinas special primes is not covered in this book. The interested reader may consult [37].

Algorithm 5.3 The Restoring Division Algorithm

Require: $P'$, $n$
Ensure: $R = P' \bmod n$

In Step 5 of Algorithm 5.3, we check the sign of the remainder; if it is negative, the previous remainder is taken to be the new remainder, i.e., a restore operation is performed. If the remainder $R_i$ is positive, it remains as the new remainder, i.e., we do not restore. The restoring division algorithm performs $k$ subtractions in order to reduce the $2k$-bit number $P'$ modulo the $k$-bit number $n$. Thus, it takes much longer than the standard multiplication algorithm, which requires $s^2 = (k/w)^2$ inner-product steps, where $w$ is the word size or granularity being employed.

In the following, we give an example of the restoring division algorithm for computing 3019 mod 53, where $3019 = (101111001011)_2$ and $53 = (110101)_2$. The result is $51 = (110011)_2$.
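The restoring procedure can be sketched in Python as shown below. This is an illustration based on the description in the text, not a transcription of Algorithm 5.3. Two details are assumptions of the sketch: the alignment shift is derived from the bit lengths of the operands, and the loop runs over every shift amount down to zero so that the remainder is guaranteed to end below $n$. The name `restoring_mod` is hypothetical.

```python
def restoring_mod(p: int, n: int) -> int:
    """Return p mod n by restoring division; quotient bits are not generated."""
    shift = max(p.bit_length() - n.bit_length(), 0)
    d = n << shift                 # left-aligned modulus
    r = p                          # R_0 = P'
    for _ in range(shift + 1):
        t = r - d                  # trial subtraction
        if t >= 0:
            r = t                  # nonnegative: accept the new remainder
        # otherwise restore, i.e. keep the previous remainder r
        d >>= 1                    # n := n / 2
    return r


print(restoring_mod(3019, 53))     # 51
```

For 3019 mod 53 the accepted subtractions are those of $32 \cdot 53$, $16 \cdot 53$, and $8 \cdot 53$; all other trial subtractions produce a negative value and are restored.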


Nonrestoring Division Algorithm

The nonrestoring division algorithm allows a negative remainder. In order to correct the remainder, a subtraction or an addition is performed during the next cycle, depending on whether the sign of the remainder is positive or negative, respectively. This is based on the following observation: Suppose $R_i = R_{i-1} - n < 0$; then the restoring algorithm assigns $R_i := R_{i-1}$ and performs a subtraction with the shifted $n$, obtaining

$$R_{i+1} = R_i - n/2 = R_{i-1} - n/2.$$

However, if $R_i = R_{i-1} - n < 0$, then one can instead let $R_i$ remain negative and add the shifted $n$ in the following cycle. Thus, one obtains

$$R_{i+1} = R_i + n/2 = (R_{i-1} - n) + n/2 = R_{i-1} - n/2,$$

which would be the same value. The steps of the nonrestoring algorithm, which implements this observation, are given in Algorithm 5.4.

Note that the nonrestoring division algorithm requires a final restoration cycle in which a negative remainder is corrected by adding the last value of $n$ back to it.
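A minimal Python sketch of the nonrestoring idea follows. It is written from the observation above rather than copied from Algorithm 5.4; the name `nonrestoring_mod` is hypothetical, and the alignment shift is again derived from the operand bit lengths as an assumption of the sketch. Python integers are signed, so no explicit 2's complement encoding is needed here.

```python
def nonrestoring_mod(p: int, n: int) -> int:
    """Return p mod n by nonrestoring division; the remainder may go negative."""
    shift = max(p.bit_length() - n.bit_length(), 0)
    d = n << shift                 # left-aligned modulus
    r = p - d                      # the first cycle always subtracts
    for _ in range(shift):
        d >>= 1                    # n := n / 2
        if r >= 0:
            r -= d                 # nonnegative remainder: subtract
        else:
            r += d                 # negative remainder: add instead of restoring
    if r < 0:
        r += d                     # final restoration with the last value of n
    return r


print(nonrestoring_mod(3019, 53))  # 51
```

For 3019 mod 53 the remainder before the final restoration is $-2$, and adding 53 gives 51.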


Algorithm 5.4 The Nonrestoring Division Algorithm

Return($R_k$)

In the following, we compute $51 = 3019 \bmod 53$ using the nonrestoring division algorithm. Since the remainder is allowed to stay negative, we use 2's complement coding to represent such numbers.

5.3.4 Interleaving Multiplication and Reduction

The interleaving algorithm is well known; the details of the method are sketched in [27, 334]. Let $A_i$ and $B_i$ be the bits of the $k$-bit positive integers $A$ and $B$, respectively. The product $P'$ can be written as

$$\begin{aligned}
P' = A \cdot B &= A \cdot \sum_{i=0}^{k-1} B_i 2^i = \sum_{i=0}^{k-1} (A \cdot B_i) 2^i \\
&= 2(\cdots 2(2(0 + A \cdot B_{k-1}) + A \cdot B_{k-2}) + \cdots) + A \cdot B_0.
\end{aligned}$$

This formulation yields the shift-add multiplication algorithm. Notice that we also reduce the partial product modulo $n$ at each step of Algorithm 5.5.

Algorithm 5.5 The Interleaving Multiplication Algorithm

5: end for
6: Return($P$)

Assuming that $A, B, P < n$, we have

$$P := 2P + A \cdot B_j \le 2(n - 1) + (n - 1) = 3n - 3.$$

Thus, the new $P$ will be in the range $0 \le P \le 3n - 3$, and at most 2 subtractions are needed to reduce $P$ to the range $0 \le P < n$. We can use the following algorithm to bring $P$ back to this range:

$P' := P - n$; If $P' \ge 0$ then $P := P'$
$P' := P - n$; If $P' \ge 0$ then $P := P'$

The computation of $P$ requires $k$ steps; at each step we perform a left shift ($2P$), a partial product generation ($A \cdot B_j$), an addition ($P := 2P + A \cdot B_j$), and at most two subtractions.

The left shift operation is easily performed by wiring. The partial products, on the other hand, are generated using an array of AND gates. The most crucial operations are the addition and subtraction operations: they need to be performed fast. We have the following avenues to explore:

• We can use the carry propagate adder, introducing $O(k)$ delay per step. However, Omura's method can be used to avoid unnecessary subtractions:

3a. $P := 2P$
3b. If carry-out then $P := P + m$
3c. $P := P + A \cdot B_j$
3d. If carry-out then $P := P + m$

• We can use the carry save adder, introducing only $O(1)$ delay per step. However, recall that the sign information is not immediately available in the CSA. We need to perform fast sign detection in order to determine whether the partial product needs to be reduced modulo $n$.
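The basic interleaving scheme of this section, with a shift-add step followed by at most two conditional subtractions, can be sketched in Python as follows. The sketch uses ordinary integer arithmetic in place of the adders discussed above, and the function name `interleaved_modmul` is hypothetical.

```python
def interleaved_modmul(a: int, b: int, n: int, k: int) -> int:
    """Return a*b mod n for 0 <= a, b < n < 2**k, scanning b from its top bit."""
    p = 0
    for j in range(k - 1, -1, -1):
        p = 2 * p + a * ((b >> j) & 1)   # shift, then add the partial product A * B_j
        for _ in range(2):               # at most two subtractions bring P below n
            t = p - n
            if t >= 0:
                p = t
    return p


print(interleaved_modmul(47, 48, 50, 6))   # 6
```

Because $P < n$ holds before every step, the intermediate value never exceeds $3n - 3$, which is why two trial subtractions are always enough.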

5.3.5 Utilization of Carry Save Adders

In order to utilize the carry save adders in performing the modular multiplication operations, we represent the numbers as carry save pairs $(C, S)$, where the value of the number is the sum $C + S$. The carry save adder method of the interleaving algorithm is given in Algorithm 5.6.

Algorithm 5.6 The Carry-Save Interleaving Multiplication Algorithm

7: end if
8: end for
9: Return($C$, $S$)
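The carry save representation itself can be illustrated with a small Python sketch. It shows only the basic carry save addition of a third operand to a pair $(C, S)$ for nonnegative integers, not Algorithm 5.6; the name `carry_save_add` is hypothetical.

```python
def carry_save_add(c: int, s: int, x: int):
    """Add x to the carry-save pair (c, s); the returned pair has the same total."""
    new_s = c ^ s ^ x                              # bitwise sum without carry propagation
    new_c = ((c & s) | (c & x) | (s & x)) << 1     # carries, moved one position to the left
    return new_c, new_s


c, s = carry_save_add(0b011110, 0b001010, 0b010111)
print(c, s, c + s)   # 60 3 63
```

Every bit position is handled independently, so the delay does not depend on the word length; a carry propagate addition $C + S$ is needed only when the value must be read out in ordinary form.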

The function SIGN gives the sign of the carry save number $C + S$. Since the exact sign is available only when a full addition is performed, we calculate an estimated sign with the SIGN function. A sign estimation algorithm was introduced in [185]. Here, we briefly review this algorithm, which is based on the addition of the most significant $t$ bits of $C$ and $S$ to estimate the sign of $C + S$. For example, let $C = (011110)$ and $S = (001010)$; then the function SIGN produces

C = 011110
S = 001010

(t = 1) SIGN = 0
(t = 2) SIGN = 01
(t = 3) SIGN = 100
(t = 4) SIGN = 1001
(t = 5) SIGN = 10100
(t = 6) SIGN = 101000
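The table above can be reproduced with a short Python sketch that adds the $t$ most significant bits of two $k$-bit words and keeps $t$ bits of the sum. The function name `sign_estimate` and its string return type are choices made for this illustration only; the leading bit of each result is the estimated sign in 2's complement notation.

```python
def sign_estimate(c: int, s: int, k: int, t: int) -> str:
    """Add the top t bits of the k-bit words c and s; return the t-bit sum as a bit string."""
    top = ((c >> (k - t)) + (s >> (k - t))) & ((1 << t) - 1)
    return format(top, f"0{t}b")


C, S, k = 0b011110, 0b001010, 6
for t in range(1, k + 1):
    print(t, sign_estimate(C, S, k, t))
```

With $t = 1$ or $t = 2$ the estimate reports a nonnegative value, while from $t = 3$ onward the leading bit agrees with the exact 6-bit sum $(101000)$.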

In the worst case the exact sign is produced after adding all $k$ bits. In the remainder of this section, $N$ denotes the modulus and $n$ denotes its length in bits. If the exact sign of $C + S$ is computed, we can obtain the result of the multiplication operation in the correct range $[0, N)$. If an estimation of the sign is used, then we will prove that the range of the result becomes $[0, N + \Delta)$, where $\Delta$ depends on the precision of the estimation. Furthermore, since the sign is used to decide whether some multiple of $N$ should be subtracted from the partial product, an error in the decision causes only an error of a multiple of $N$ in the partial product, which is corrected later. We define function $T(X)$ on an $n$-bit integer

J. If $T(\hat{C}) + T(\hat{S}) \ge 0$ then set $C := \hat{C}$ and $S := \hat{S}$.

In Step J, the computation of the sign bit $R$ of $T(\hat{C}) + T(\hat{S})$ involves the $n - t$ most significant bits of $\hat{C}$ and $\hat{S}$. The above procedure reduces a carry-sum pair from the range

If the estimated sign in Step J is positive for all $Q$ iterations, then $QN$ is subtracted from the initial pair; therefore

Since in Step I we perform $(\hat{C}, \hat{S}) := C + S - N$ and in the last iteration the carry-sum pair is not reduced (because the estimated sign is negative), we must have

For example, if $Q = 3$, then $k = 2$ can be used. Instead of subtracting $N$ three times, we first subtract $2N$ and then $N$. This observation is utilized in Algorithm 5.7.

The parameter $t$ controls the precision of estimation; the accuracy of the estimation and the total amount of logic required to implement it decrease as $t$ increases. After Step 7 of Algorithm 5.7, we have

$$C^{(i)} + S^{(i)} < N + 2^t,$$

which implies that after the next shift-add step the range of $C^{(i+1)} + S^{(i+1)}$ will be $[0, 3N + 2^{t+1})$. Assuming $Q = 3$, we have

$$3N + 2^{t+1} \le (Q + 1)N + 2^t = 4N + 2^t,$$

which implies $2^t \le N$, or $t \le n - 1$. The range of $C^{(i+1)} + S^{(i+1)}$ becomes

$$0 \le C^{(i+1)} + S^{(i+1)} < 3N + 2^{t+1} \le 3N + 2^n < 2^{n+2}.$$

Algorithm 5.7 The Carry-Save Interleaving Multiplication Algorithm

if $T(\hat{C}^{(i)}) + T(\hat{S}^{(i)}) \ge 0$ then $C^{(i)} := \hat{C}^{(i)}$ and $S^{(i)} := \hat{S}^{(i)}$;
end if
$(\hat{C}^{(i)}, \hat{S}^{(i)}) := C^{(i)} + S^{(i)} - N$;
if $T(\hat{C}^{(i)}) + T(\hat{S}^{(i)}) \ge 0$ then $C^{(i)} := \hat{C}^{(i)}$ and $S^{(i)} := \hat{S}^{(i)}$;
end if
end for
Return($C^{(k)}$, $S^{(k)}$)

$$-2^{n+1} < -2N \le C^{(i+1)} + S^{(i+1)} < N + 2^t < 2^{n+1}$$

In order to contain the temporary results, we use $(n + 3)$-bit carry save adders, which can represent integers in the range $[-2^{n+2}, 2^{n+2})$. When $t = n - 1$, the sign estimation technique checks 5 most significant bits of $C^{(i)}$ and $S^{(i)}$ from the bit locations $n - 2$ to $n + 3$. This algorithm produces a pair of integers $(C, S) = (C^{(k)}, S^{(k)})$ such that $P = C + S$ is in the range $[0, 2N)$. The final result in the correct range $[0, N)$ can be obtained by computing $P = C + S$ and $\hat{P} = C + S - N$ using carry propagate adders. If $\hat{P} < 0$, we have $P = \hat{P} + N < N$, and thus $P$ is in the correct range. Otherwise, we choose $\hat{P}$, because $0 \le \hat{P} = P - N < 2^t \le N$ implies $\hat{P} \in [0, N)$. The steps of the algorithm for computing $47 \cdot 48 \pmod{50}$ are illustrated in the following figure. Here we have

in $3k = 18$ clock cycles. The value $C + S = 184 - 128 = 56$ lies in the range $[0, 2 \cdot 50)$. The final result is found by computing $C + S = 56$ and $C + S - N = 6$, and selecting the latter since it is positive.

Title	Cryptographic Algorithms on Reconfigurable Hardware - P5 doc
School	University of Science and Technology of Ho Chi Minh City
Major	Cryptography
Type	Thesis
City	Ho Chi Minh City

Format
Pages	30
Size	0.99 MB