Document: Cryptographic Algorithms on Reconfigurable Hardware, Part 7


6.1 Field Multiplication 159

Algorithm 6.5 Modular Reduction Using General Irreducible Polynomials

Require: The degree m of the irreducible polynomial; the operand C to be reduced; and k, the number of bits that can be reduced at once.

Ensure: The field polynomial defined as C = C mod P, with a length of m bits.

First, the amount of shift needed to properly apply the method outlined in Figure 6.7 is computed. Then, in each iteration of the loop in lines 3-9, k bits of C are reduced. In line 4 the k bits of C to be reduced are obtained. This information is used in line 5 to compute the appropriate scalar S needed to obtain the result of Equation (6.23). In line 6 the S-th entry of the table Paddedtable is left shifted shift positions, so that in line 7 the operation $C + 2^{shift}(S \cdot P)$ can finally be computed, allowing the effective reduction of k bits at once. Then, in line 8 the variable shift is updated in order to continue the reduction process.

Algorithm 6.5 performs a number of iterations determined by m and k. At each iteration of the algorithm the look-up tables Highdivtable and Paddedtable are accessed once each. In line 7 an XOR addition is executed, so each iteration of the general reduction method discussed in this section costs two table look-ups and one XOR addition.
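The idea of clearing k bits per iteration with a precomputed table can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 6.5: polynomials are packed into integers (bit i holds the coefficient of x^i), and a single hypothetical table `table[t] = (t * x^m) mod P` stands in for the Highdivtable and Paddedtable pair. The function names are also illustrative.

```python
def reduce_bitwise(c, p, m):
    """Reference reduction: clear one high-order bit of C at a time."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= p << (i - m)
    return c


def build_table(p, m, k):
    """table[t] = (t * x^m) mod P for every k-bit value t (hypothetical helper)."""
    return [reduce_bitwise(t << m, p, m) for t in range(1 << k)]


def reduce_k_bits(c, p, m, k, table):
    """Reduce C modulo P, clearing up to k high-order bits per iteration."""
    top = max(c.bit_length(), m)        # one past the highest bit still to clear
    while top > m:
        width = min(k, top - m)         # the last window may be narrower than k
        j = top - width                 # lowest bit position of the window
        t = (c >> j) & ((1 << width) - 1)
        # t*x^j = (t*x^m)*x^(j-m), so swap the window for its reduced image
        c ^= (t << j) ^ (table[t] << (j - m))
        top = j
    return c


if __name__ == "__main__":
    P, M, K = 0x11B, 8, 4               # x^8 + x^4 + x^3 + x + 1, chosen for illustration
    tbl = build_table(P, M, K)
    print(hex(reduce_k_bits(0x3F7E, P, M, K, tbl)))
```

Each loop iteration performs one table look-up and one XOR of shifted operands, which mirrors the per-iteration cost described above.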

Multiplication by a Primitive Element

Let $P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_{m-1}x^{m-1} + x^m$ be an m-degree irreducible polynomial over GF(2). Let also $\alpha$ be a root of $P(x)$, i.e., $P(\alpha) = 0$. Then, the set $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$ is a basis for $GF(2^m)$, commonly called the polynomial (canonical) basis of the field [221]. An element $A \in GF(2^m)$ is expressed in this basis as $A = \sum_{i=0}^{m-1} a_i \alpha^i$. Let $A(\alpha)$ be an arbitrary element of $GF(2^m)$.

160 6 Binary Finite Field Arithmetic

Then, the product $C = \alpha \cdot A(\alpha)$ can be expressed as,

$$C = \alpha\,(a_0 + a_1\alpha + \cdots + a_{m-1}\alpha^{m-1}) = a_0\alpha + a_1\alpha^2 + \cdots + a_{m-1}\alpha^{m} \qquad (6.25)$$

Fig. 6.8 $\alpha \cdot A(\alpha)$ Multiplication

Using the fact that $\alpha$ is a primitive root of the irreducible polynomial, we can write,

$$\alpha^m = p_0 + p_1\alpha + \cdots + p_{m-1}\alpha^{m-1} \qquad (6.26)$$

Substituting Eq. (6.26) into Eq. (6.25) we obtain,

$$C = c_0 + c_1\alpha + \cdots + c_{m-1}\alpha^{m-1},$$

where $c_0 = a_{m-1}p_0$ and $c_i = a_{i-1} + a_{m-1}p_i$, for $i = 1, \ldots, m-1$. A realization of the above operation is shown in Fig. 6.8. The main building block is an m-tap LFSR. That register is initially loaded with the m coordinates of the field element A, namely, $(a_0, a_1, a_2, \ldots, a_{m-1})$. The signals $p_i$ represent the coefficients of the irreducible polynomial. Notice that whenever a given polynomial coefficient is on, i.e., $p_i = 1$, the corresponding branch of the circuit will be a short circuit. Otherwise, if $p_i = 0$, the branch acts as an open circuit. After m clock cycles, the new register content will be the value of the field element C.
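The coefficient update $c_0 = a_{m-1}p_0$, $c_i = a_{i-1} + a_{m-1}p_i$ amounts to one shift followed by a conditional XOR. A minimal Python sketch, assuming elements are packed into integers (bit i holds $a_i$) and that `p_low` holds only the coefficients $p_0, \ldots, p_{m-1}$ without the leading $x^m$ term; both names are illustrative:

```python
def mul_alpha(a, p_low, m):
    """Return alpha * A in GF(2^m), polynomial basis.

    a      -- element A, bit i is the coefficient a_i
    p_low  -- coefficients p_0..p_(m-1) of P(x), leading term x^m omitted
    m      -- extension degree
    """
    carry = (a >> (m - 1)) & 1          # a_(m-1), the coefficient shifted out
    a = (a << 1) & ((1 << m) - 1)       # c_i gets a_(i-1); c_0 starts at 0
    if carry:
        a ^= p_low                      # add a_(m-1) * p_i to every position
    return a


if __name__ == "__main__":
    # GF(2^4) with P(x) = x^4 + x + 1, so p_low = 0b0011 (illustrative choice)
    print(bin(mul_alpha(0b1000, 0b0011, 4)))   # alpha^3 * alpha = alpha + 1
```

The XOR with `p_low` plays the role of the feedback branches of the LFSR: a branch contributes only where the corresponding $p_i$ is 1.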

Serial Multiplication

Using the multiplication procedure outlined above, the multiplication of two arbitrary field elements can be accomplished with a procedure inspired by the well-known Horner's scheme.

Let us consider two arbitrary field elements A and B expressed in polynomial basis as,

$$A = \sum_{i=0}^{m-1} a_i\alpha^i, \qquad B = \sum_{i=0}^{m-1} b_i\alpha^i.$$

Then, the product $A \cdot B$ can be expressed as,

$$C(\alpha) = A(\alpha)B(\alpha) \bmod P(\alpha) = A(\alpha)\left(\sum_{i=0}^{m-1} b_i\alpha^i\right) \bmod P(\alpha) = \left(\sum_{i=0}^{m-1} b_i A(\alpha)\alpha^i\right) \bmod P(\alpha).$$

Therefore,

$$C(\alpha) = \left(b_0A(\alpha) + b_1A(\alpha)\alpha + b_2A(\alpha)\alpha^2 + \cdots + b_{m-1}A(\alpha)\alpha^{m-1}\right) \bmod P(\alpha).$$

Algorithm 6.6 shows the standard procedure for computing the above equation using Horner's rule.

Algorithm 6.6 LSB-First Serial/Parallel Multiplier

Require: An irreducible polynomial $P(\alpha)$ of degree m, two elements $A, B \in GF(2^m)$.

Ensure: $C(\alpha) = A(\alpha)B(\alpha) \bmod P(\alpha)$.

As mentioned previously, the signals $p_i$ in the first LFSR block represent the coefficients of the irreducible polynomial, and their values (either ones or zeroes) determine the LFSR structure. Furthermore, a gate array is included in order to compute the multiplication operation, as explained below. Initially the register C is set to zero, whereas the register in the upper part of Fig. 6.9 is loaded with the m coefficients of the field element A. Thereafter, when the clock signal is applied to the registers, the value of $A\alpha$ is generated. Then, the B coefficients, namely, $b_0, b_1, b_2, \ldots, b_{m-1}$, are serially introduced in that order, thus generating the values $b_iA\alpha^i$, for $i = 0, 1, \ldots, m-1$, which are accumulated in register C until all the m product coefficients $c_0, c_1, c_2, \ldots, c_{m-1}$ are collected.
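The accumulation of the terms $b_iA\alpha^i$ can be sketched in software as a loop that consumes the bits of B from least to most significant. This is a minimal sketch of the LSB-first idea rather than a reproduction of Algorithm 6.6; the integer bit packing and the names `mul_alpha`, `mul_lsb_first` and `p_low` are assumptions made for illustration.

```python
def mul_alpha(a, p_low, m):
    """One LFSR step: return alpha * A in GF(2^m)."""
    carry = (a >> (m - 1)) & 1
    a = (a << 1) & ((1 << m) - 1)
    return a ^ p_low if carry else a


def mul_lsb_first(a, b, p_low, m):
    """Return A * B mod P, scanning the coefficients b_0, b_1, ..., b_(m-1)."""
    c = 0                               # register C starts at zero
    for i in range(m):
        if (b >> i) & 1:
            c ^= a                      # accumulate b_i * A * alpha^i
        a = mul_alpha(a, p_low, m)      # advance the upper register to A * alpha^(i+1)
    return c


if __name__ == "__main__":
    # GF(2^4) with P(x) = x^4 + x + 1 (illustrative choice)
    print(bin(mul_lsb_first(0b1000, 0b1000, 0b0011, 4)))   # alpha^3 * alpha^3
```

One pass of the loop corresponds to one clock cycle of the circuit: the upper register is multiplied by $\alpha$ while the gate array conditionally adds it into C.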

6.1.7 Matrix-Vector Multipliers

The $GF(2^m)$ multiplication given by (6.1) can be described in terms of vector operations. There are mainly two different approaches based on matrix-vector operations to compute a field product:

2. The polynomial multiplication and modular reduction parts are performed in a single step by using the so-called Mastrovito matrix.

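Let $a(x)$ and $b(x)$ denote two polynomials of degree at most $m-1$ representing the elements in $GF(2^m)$. Let $c(x) = a(x)b(x) \bmod P(x)$ denote their field product. The coefficient vectors of these polynomials are given by

$$\mathbf{a} = [a_0, a_1, \ldots, a_{m-1}]^T, \qquad \mathbf{b} = [b_0, b_1, \ldots, b_{m-1}]^T, \qquad \mathbf{c} = [c_0, c_1, \ldots, c_{m-1}]^T.$$

Also, let us define the polynomials

$$d(x) = a(x)b(x) = d_0 + d_1x + \cdots + d_{2m-2}x^{2m-2},$$
$$d^{(L)}(x) = d_0 + d_1x + \cdots + d_{m-1}x^{m-1}, \qquad (6.27)$$
$$d^{(U)}(x) = d_m + d_{m+1}x + \cdots + d_{2m-2}x^{m-2}.$$

The coefficient vectors representing these polynomials are

$$\mathbf{d} = [d_0, d_1, \ldots, d_{2m-2}]^T, \qquad \mathbf{d}^{(L)} = [d_0, d_1, \ldots, d_{m-1}]^T, \qquad \mathbf{d}^{(U)} = [d_m, d_{m+1}, \ldots, d_{2m-2}]^T.$$

The work in [284] reduces the polynomial multiplication $d(x)$ using an $(m \times (m-1))$ reduction matrix $\mathbf{Q}$ to obtain the field product $c(x)$ as below:

$$\mathbf{c} = \mathbf{d}^{(L)} + \mathbf{Q} \cdot \mathbf{d}^{(U)} \qquad (6.28)$$

Mastrovito Multiplier

The so-called Mastrovito matrix is constructed from the coefficients of the first multiplicand and the irreducible polynomial defining the field. Then, the polynomial multiplication and modulo reduction steps are performed together using this matrix. The papers [351, 128, 401] follow the Mastrovito multiplication scheme outlined below,

$$\mathbf{c} = \mathbf{M}\,\mathbf{b} \qquad (6.29)$$

where $\mathbf{M}$ is the $(m \times m)$ Mastrovito matrix whose entries are functions of the coefficients of $a(x)$ and $P(x)$. The Mastrovito matrix $\mathbf{M}$ is related to the reduction matrix $\mathbf{Q}$ by

$$\mathbf{M} = \mathbf{L} + \mathbf{Q}\,\mathbf{U}, \qquad (6.30)$$

where $\mathbf{L}$ and $\mathbf{U}$ are the following $(m \times m)$ and $((m-1) \times m)$ matrices:

$$\mathbf{L} = \begin{bmatrix}
a_0 & 0 & 0 & \cdots & 0 \\
a_1 & a_0 & 0 & \cdots & 0 \\
a_2 & a_1 & a_0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m-1} & a_{m-2} & a_{m-3} & \cdots & a_0
\end{bmatrix}, \qquad
\mathbf{U} = \begin{bmatrix}
0 & a_{m-1} & a_{m-2} & \cdots & a_2 & a_1 \\
0 & 0 & a_{m-1} & \cdots & a_3 & a_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & a_{m-1}
\end{bmatrix},$$

so that $\mathbf{d}^{(L)} = \mathbf{L}\,\mathbf{b}$ and $\mathbf{d}^{(U)} = \mathbf{U}\,\mathbf{b}$. Then,

$$\mathbf{c} = \mathbf{d}^{(L)} + \mathbf{Q} \cdot \mathbf{d}^{(U)} = \mathbf{L}\,\mathbf{b} + \mathbf{Q}\,\mathbf{U}\,\mathbf{b} = (\mathbf{L} + \mathbf{Q}\,\mathbf{U})\,\mathbf{b} = \mathbf{M}\,\mathbf{b}.$$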

The Mastrovito and the reduction matrices are studied thoroughly in [284, 401] for various types of irreducible polynomials. In [351] a comprehensive study of the Mastrovito multiplier for irreducible trinomials was presented. The authors in [401] proposed a practical and systematic design approach for a general Mastrovito multiplier. In [388] it was shown that non-Mastrovito multipliers using direct modular reduction also provide competitive performance. Moreover, efficient non-Mastrovito multipliers for irreducible trinomials were also proposed.
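The relation between the Mastrovito matrix, the reduction matrix and the two triangular matrices can be checked numerically. The following Python sketch builds all four matrices for a small field and multiplies by the coefficient vector of b. It assumes that column j of Q holds the coordinates of $x^{m+j} \bmod P(x)$, which is the natural reading of Eq. (6.28) but is not spelled out in this text. All function names are illustrative, and coefficient vectors are plain Python lists with index i holding the coefficient of $x^i$.

```python
def poly_mod(c, p, m):
    """Reduce the bit-packed polynomial c modulo p (deg p = m) over GF(2)."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= p << (i - m)
    return c


def reduction_matrix(p, m):
    """Q[i][j] = coefficient of x^i in x^(m+j) mod P, for j = 0..m-2 (assumed layout)."""
    cols = [poly_mod(1 << (m + j), p, m) for j in range(m - 1)]
    return [[(cols[j] >> i) & 1 for j in range(m - 1)] for i in range(m)]


def lower_upper(a):
    """Build L (m x m) and U ((m-1) x m) from the coefficient list a."""
    m = len(a)
    L = [[a[i - j] if i >= j else 0 for j in range(m)] for i in range(m)]
    U = [[a[m + i - j] if j > i else 0 for j in range(m)] for i in range(m - 1)]
    return L, U


def mastrovito_matrix(a, p):
    """Return M = L + Q*U over GF(2) for the multiplicand a and modulus p."""
    m = len(a)
    L, U = lower_upper(a)
    Q = reduction_matrix(p, m)
    return [[(L[i][j] + sum(Q[i][k] & U[k][j] for k in range(m - 1))) & 1
             for j in range(m)] for i in range(m)]


def mat_vec(M, b):
    """Matrix-vector product over GF(2)."""
    return [sum(x & y for x, y in zip(row, b)) & 1 for row in M]


if __name__ == "__main__":
    # GF(2^4), P(x) = x^4 + x + 1; a = alpha^3 + 1, b = alpha (illustrative values)
    M = mastrovito_matrix([1, 0, 0, 1], 0b10011)
    print(mat_vec(M, [0, 1, 0, 0]))
```

Because M depends only on a and P, a hardware design can fix P and wire the entries of M as XOR networks of the $a_i$, leaving a single matrix-vector product per multiplication.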


6.1.8 Montgomery Multiplier

In this section we explain the Montgomery multiplication method in $GF(2^m)$. Once again, let $P(x)$ be an irreducible polynomial over GF(2) that defines the field $GF(2^m)$. Rather than computing Eq. (6.1), the Montgomery multiplication calculates

$$C(x) = A(x)B(x)R^{-1}(x) \bmod P(x) \qquad (6.32)$$

where $R(x)$ is a fixed element and $\gcd(R(x), P(x)) = 1$.

Because of Bezout's identity, one can find two polynomials $R^{-1}(x)$ and $P'(x)$ such that

$$R(x)R^{-1}(x) + P(x)P'(x) = 1 \qquad (6.33)$$

where $R^{-1}(x)$ is the inverse of $R(x)$ modulo $P(x)$. These two polynomials can be calculated with the extended Euclidean algorithm. Koç and Acar [182, 388] selected $R(x) = x^m$ for high performance modular reduction in the Montgomery multiplication algorithm, which can be given as follows:

Algorithm 6.7 Montgomery Modular Multiplication Algorithm

Require: $A(x), B(x), R(x), P'(x)$.

Ensure: $C(x) = A(x)B(x)R^{-1}(x) \bmod P(x)$.

into our last expression

The degree of $C(x)$ can be verified from Step 3 as follows:

$$\deg[C(x)] \le \max\{\deg[T(x)],\ \deg[U(x)] + \deg[P(x)]\} - \deg[R(x)]$$
$$\le \max\{2m-2,\ \deg[R(x)] - 1 + m\} - \deg[R(x)]$$
$$\le \max\{2m-2-\deg[R(x)],\ m-1\}.$$

Then, it can be concluded that $\deg[C(x)] \le m-1$ if $\deg[R(x)] \ge m-1$. If we choose $R(x) = x^m$, the result $C(x)$ will be of degree $m-1$ at most.

It can be shown [182] that Algorithm 6.7 has an associated computational cost of $2m^2$ coefficient multiplications (ANDs) and $2m^2 - 3m - 1$ coefficient additions (XORs), whereas the total time complexity is $3T_A + (2\lceil\log_2 m\rceil + \lceil\log_2(m-1)\rceil)T_X$.
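A small Python sketch can make the computation concrete for $R(x) = x^m$. The step list of Algorithm 6.7 is not reproduced in this text, so the sketch assumes the commonly stated three-step form, $T = AB$, $U = TP' \bmod R$, $C = (T + UP)/R$, which is consistent with the symbols T and U used in the degree argument above. Polynomials are packed into integers, and `clmul`, `inverse_mod_xm` and `montgomery_mul` are illustrative names. Instead of the extended Euclidean algorithm, $P'(x)$ is obtained here by a simple bit-by-bit inversion of $P(x)$ modulo $x^m$.

```python
def clmul(a, b):
    """Carry-less (GF(2)[x]) product of two bit-packed polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def inverse_mod_xm(p, m):
    """Return P'(x) with P(x)*P'(x) = 1 mod x^m; requires p_0 = 1."""
    inv = 1
    for i in range(1, m):
        if (clmul(p, inv) >> i) & 1:    # coefficient of x^i is still set
            inv |= 1 << i               # adding x^i to inv cancels it
    return inv


def montgomery_mul(a, b, p, m, p_prime):
    """Return A*B*x^(-m) mod P, assuming R(x) = x^m."""
    mask = (1 << m) - 1
    t = clmul(a, b)                         # step 1: T = A*B
    u = clmul(t & mask, p_prime) & mask     # step 2: U = T*P' mod x^m
    return (t ^ clmul(u, p)) >> m           # step 3: C = (T + U*P) / x^m


if __name__ == "__main__":
    P, M = 0b10011, 4                       # x^4 + x + 1, illustrative choice
    PP = inverse_mod_xm(P, M)
    print(bin(montgomery_mul(0b0010, 0b1000, P, M, PP)))
```

Reduction modulo $x^m$ and division by $x^m$ are a mask and a shift, which is why this choice of $R(x)$ is attractive: no trial division by $P(x)$ is needed.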

6.1.9 A Comparison of Field Multiplier Designs

Table 6.3 Fastest Reconfigurable Hardware $GF(2^m)$ Multipliers

Work                                          Platform    bits / (Slices × timings)
KOM variant by [47], implemented by [326]     Virtex 2    2.445M
KOM variant by [85], implemented by [326]     Virtex 2    2.254M
KOM variant by [293], implemented by [326]    Virtex 2    1.895M
KOM [106]                                     Virtex 2    0.429M
Recursive Classical [106]                     Virtex 2    0.290M
KOM [117]                                     Virtex 2    0.221M
Massey-Omura [118]                            Virtex 2    0.0336M (est.)

Cost entries (row assignment unclear): 5409 CLBs, 5840 CLBs, 1480 CLBs, 1582 CLBs, 1660 CLBs, 36857 LUTs

In this Subsection we compare some of the most representative designs of $GF(2^m)$ multipliers considering three metrics: speed, compactness and efficiency. Table 6.3 shows the fastest designs reported to date for $GF(2^m)$ field multiplication. It can be observed that Karatsuba-Ofman Multipliers (KOM) are much faster than other schemes such as the recursive classical multiplier or the Massey-Omura scheme. From a theoretical point of view, this can be explained by the fact that KOM algorithms enjoy sub-quadratic complexity.

In Table 6.4 we show a selection of some of the most compact reconfigurable hardware multiplier designs. It is noted that this category is dominated by the interleaved and Montgomery multiplier schemes.

Table 6.4 Most Compact Reconfigurable Hardware $GF(2^m)$ Multipliers

Work                   Platform    Cost               bits / (Slices × timings)
Interleaved [104]      Virtex      359 CLBs           0.215M
Montgomery [97]        Virtex      425 CLBs (est)     0.195M
Class.+Montg [18]      Virtex      1049 CLBs          0.137M
Montgomery [118]       Virtex      1427 CLBs          0.0675M
Interleaved [266]      Virtex      420 CLBs (est)     0.042M

We measure efficiency by taking the ratio of the number of bits processed over the number of slices multiplied by the time delay achieved by the design, namely,

$$\frac{\text{bits}}{\text{Slices} \times \text{timings}}$$

For instance, consider the KOM variant design proposed by [47] and implemented by [326]. As is shown in Table 6.3, working over $GF(2^{163})$, that design achieved a time delay of just 12.56 ns at a cost of 5307 slices. Therefore its efficiency is calculated as,

$$\frac{\text{bits}}{\text{Slices} \times \text{timings}} = \frac{163}{5307 \times 12.56\,\text{ns}} = 2.445\text{M}$$
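The efficiency figure is simple enough to recompute. The short Python sketch below does so for the KOM example; the field size of 163 bits is an assumption, inferred because it is the value that reproduces the reported 2.445M from 5307 slices and a 12.56 ns delay. The function name `efficiency` is illustrative.

```python
def efficiency(bits, slices, delay_ns):
    """Bits processed per slice per second: bits / (slices * delay)."""
    return bits / (slices * delay_ns * 1e-9)


if __name__ == "__main__":
    # KOM variant of [47] as implemented by [326]; 163 bits is an inferred value
    value = efficiency(163, 5307, 12.56)
    print(f"{value / 1e6:.3f}M")
```

Because the metric divides by both area and delay, a design can rank well either by being small or by being fast, which is why the same metric is reported in both Table 6.3 and Table 6.4.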

When comparing the designs featured in Tables 6.3 and 6.4, it is noticed that the most efficient multiplier designs are the Karatsuba-Ofman multiplier variants as they were reported in [47, 85, 293]. This is quite a remarkable feature, which implies that the Karatsuba-Ofman multipliers represent both the fastest and the most efficient of all multiplier designs studied in this Chapter.

6.2 Field Squaring and Field Square Root for Irreducible Trinomials

Let us consider binary extension fields constructed using irreducible trinomials of the form $P(x) = x^m + x^n + 1$, with $m > 2$. It is convenient to consider, without loss of generality, the additional restriction $1 \le n \le \lfloor \frac{m}{2} \rfloor$.

Footnote: It is known that if $P(x) = x^m + x^n + 1$ is irreducible over GF(2), so is $P(x) = x^m + x^{m-n} + 1$ [228]. Hence, provided that at least one irreducible trinomial of degree m exists, it is always possible to find another irreducible trinomial such that its middle coefficient n satisfies the restriction $1 \le n \le \lfloor \frac{m}{2} \rfloor$.

The rest of this Section is organized as follows. First, in Subsection 6.2.1, we give the corresponding formulae needed for computing the field squaring operation when considering arbitrary irreducible trinomials. Those equations are then used in Subsection 6.2.2 to find the corresponding ones for the field square root operator.

6.2.1 Field Squaring Computation

Let $A = \sum_{i=0}^{m-1} a_i x^i$ be an arbitrary element of $GF(2^m)$. Then, according to Eq. (6.16) its square, $A^2$, can be represented by the 2m-coefficient vector

$$A^2(x) = [\,0 \;\; a_{m-1} \;\; 0 \;\; a_{m-2} \;\; 0 \;\cdots\; a_1 \;\; 0 \;\; a_0\,] = [\,a'_{2m-1} \;\; a'_{2m-2} \;\cdots\; a'_{m+1} \;\; a'_m \;\; a'_{m-1} \;\cdots\; a'_2 \;\; a'_1 \;\; a'_0\,] \qquad (6.35)$$

where $a'_i = 0$ for i odd. Hence, the upper half of $A^2$ (i.e., the m most significant bits) in Eq. (6.35) is mapped into the first m coordinates by performing addition and shift operations only.

In order to investigate the exact cost of the field squaring operation, we

categorize all the irreducible trinomials over GF{2) into four different types

For all four types considered and by means of Eqs (6.35) and (6.21), the following explicit formulae for the field squaring operation were found
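Before looking at the type-specific formulae, the generic procedure they are derived from can be sketched in Python: interleave zeros between the coefficients as in Eq. (6.35), then fold the upper half back using $x^m = x^n + 1$. The bit packing (bit i holds $a_i$) and the function names are assumptions made for illustration.

```python
def spread_bits(a, m):
    """Insert a zero between consecutive coefficients: a_i moves to position 2i."""
    s = 0
    for i in range(m):
        if (a >> i) & 1:
            s |= 1 << (2 * i)
    return s


def square_mod_trinomial(a, m, n):
    """Return A^2 mod (x^m + x^n + 1) for a bit-packed element A."""
    c = spread_bits(a, m)
    for i in range(2 * m - 2, m - 1, -1):
        if (c >> i) & 1:
            # x^i = x^(i-m) * x^m = x^(i-m+n) + x^(i-m): clear bit i, toggle the two images
            c ^= (1 << i) | (1 << (i - m + n)) | (1 << (i - m))
    return c


if __name__ == "__main__":
    # GF(2^4) with P(x) = x^4 + x + 1 (m = 4, n = 1), illustrative choice
    print(bin(square_mod_trinomial(0b1000, 4, 1)))   # (alpha^3)^2
```

Every output coordinate is an XOR of a few input coordinates chosen only by m and n, which is what makes a closed-form expression per trinomial type possible.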

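Type I: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m even, n odd and $n < \frac{m}{2}$,

$$c_i = \begin{cases}
a_{\frac{i}{2}} + a_{\frac{m+i}{2}} & i \text{ even},\ i < n \text{ or } i \ge 2n,\\
a_{\frac{i}{2}} + a_{\frac{m+i}{2}} + a_{\frac{i}{2}+m-n} & i \text{ even},\ n < i < 2n,\\
a_{m-\frac{n-i}{2}} & i \text{ odd},\ i < n,\\
a_{\frac{m-n+i}{2}} & i \text{ odd},\ i \ge n,
\end{cases} \qquad (6.36)$$

for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.36) has an associated cost of $\frac{m+n-1}{2}$ XOR gates and $2T_X$ delays.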

Type II: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m even, n odd and $n = \frac{m}{2}$,

(6.37)

for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.37) has an associated cost of $\frac{m+2}{4}$ XOR gates and one $T_X$ delay.

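Type III: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m, n odd numbers and $n < \frac{m+1}{2}$,

$$c_i = \begin{cases}
a_{\frac{i}{2}} & i \text{ even},\ i < n,\\
a_{\frac{i}{2}} + a_{\frac{m-n+i}{2}} + a_{\frac{i}{2}+m-n} & i \text{ even},\ n < i < 2n,\\
a_{\frac{i}{2}} + a_{\frac{m-n+i}{2}} & i \text{ even},\ i \ge 2n,\\
a_{\frac{m+i}{2}} + a_{m-\frac{n-i}{2}} & i \text{ odd},\ i < n,\\
a_{\frac{m+i}{2}} & i \text{ odd},\ i \ge n,
\end{cases} \qquad (6.38)$$

for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.38) has an associated cost of $\frac{m-1}{2}$ XOR gates and $2T_X$ delays.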

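Type IV: Computing $C = A^2 \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m odd and n even,

$$c_i = \begin{cases}
a_{\frac{i}{2}} + a_{m-\frac{n-i}{2}} & i \text{ even},\ i < n,\\
a_{\frac{i}{2}} + a_{\frac{i}{2}+m-n} & i \text{ even},\ n \le i < 2n,\\
a_{\frac{i}{2}} & i \text{ even},\ i \ge 2n,\\
a_{\frac{m+i}{2}} & i \text{ odd},\ i < n,\\
a_{\frac{m+i}{2}} + a_{\frac{m-n+i}{2}} & i \text{ odd},\ i > n,
\end{cases} \qquad (6.39)$$

for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.39) has an associated cost of $\frac{m+n-1}{2}$ XOR gates and one $T_X$ delay.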

The complexity costs found in Equations (6.36) through (6.39) are in consonance with the ones analytically derived in [386, 387].

6.2.2 Field Square Root Computation

In the following, we keep the assumption that the middle coefficient n of the generating trinomial $P(x) = x^m + x^n + 1$ satisfies the restriction $1 \le n \le \lfloor \frac{m}{2} \rfloor$.

Clearly, Eqs. (6.36)-(6.39) are a consequence of the fact that in binary extension fields, squaring is a linear operation. The linear nature of binary extension field squaring allows us to describe this operator in terms of an $(m \times m)$ matrix as,

$$C = A^2 = MA \qquad (6.40)$$

Furthermore, based on Eq. (6.40), it follows that computing the square root of an arbitrary field element A means finding a field element $D = \sqrt{A}$ such that $D^2 = MD = A$. Hence,

$$D = M^{-1}A \qquad (6.41)$$

Thus, for the trinomial types I, II, III and IV as described above, the element $D = \sqrt{A}$ given by Eq. (6.41) can be found by computing the inverse of the corresponding matrix M. Then, using $\sqrt{A} = D = M^{-1}A$, we can determine the m coordinates of the field element as described below.
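The matrix view of Eqs. (6.40) and (6.41) can be sketched in Python: build M column by column from $x^{2j} \bmod P(x)$, invert it over GF(2) by Gauss-Jordan elimination, and apply the inverse to obtain the square root. This is a minimal illustration, not the derivation used for the closed-form formulae. Matrices are stored as lists of bit-packed columns, the function names are illustrative, and the trinomial $x^{15} + x + 1$ in the usage lines is an assumed example.

```python
def poly_mod(c, p, m):
    """Reduce the bit-packed polynomial c modulo p (deg p = m) over GF(2)."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= p << (i - m)
    return c


def squaring_matrix(p, m):
    """Columns of M: column j holds the coordinates of x^(2j) mod P."""
    return [poly_mod(1 << (2 * j), p, m) for j in range(m)]


def apply_matrix(cols, a):
    """Matrix-vector product over GF(2): XOR the columns selected by the bits of a."""
    c = 0
    for j, col in enumerate(cols):
        if (a >> j) & 1:
            c ^= col
    return c


def invert_gf2(cols, m):
    """Gauss-Jordan inversion; each pair (v, combo) keeps the invariant v = M * combo."""
    pairs = [(cols[j], 1 << j) for j in range(m)]
    for bit in range(m):
        piv = next(k for k in range(bit, m) if (pairs[k][0] >> bit) & 1)
        pairs[bit], pairs[piv] = pairs[piv], pairs[bit]
        pv, pc = pairs[bit]
        for k in range(m):
            if k != bit and (pairs[k][0] >> bit) & 1:
                pairs[k] = (pairs[k][0] ^ pv, pairs[k][1] ^ pc)
    return [combo for _, combo in pairs]    # column q solves M * combo = e_q


def xor_count(cols, m):
    """XOR gates without sharing: ones in the matrix minus m (every row is nonzero)."""
    return sum(bin(col).count("1") for col in cols) - m


if __name__ == "__main__":
    m, p = 15, (1 << 15) | (1 << 1) | 1     # x^15 + x + 1, assumed example
    M = squaring_matrix(p, m)
    M_inv = invert_gf2(M, m)
    a = 0b101100111000101
    d = apply_matrix(M_inv, a)              # D = M^(-1) A
    print(apply_matrix(M, d) == a, xor_count(M, m), xor_count(M_inv, m))
```

Counting the ones in M and in its inverse gives a quick upper bound on the XOR gates of each circuit before any sharing of common terms is exploited.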

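Type I: Computing D such that $D^2 = A \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m even, n odd, and $n < \frac{m}{2}$:

$$d_i = \begin{cases}
a_{2i} + a_{(2i+n) \bmod m} + a_{2i-n} & \lceil \frac{n}{2} \rceil \le i < n,\\
a_{2i} + a_{(2i+n) \bmod m} & i < \lceil \frac{n}{2} \rceil \text{ or } n \le i < \frac{m}{2},\\
a_{(2i+n) \bmod m} & \frac{m}{2} \le i < m,
\end{cases} \qquad (6.42)$$

for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.42) has an associated cost of $\frac{m+n-1}{2}$ XOR gates and $2T_X$ delays.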

Type II: Computing D such that $D^2 = A \bmod P(x)$, with $P(x) = x^m + x^n + 1$, m even, n odd and $n = \frac{m}{2}$:

for $i = 0, 1, \ldots, m-1$. It can be verified that Eq. (6.43) has an associated cost of $\frac{m+2}{4}$ XOR gates and one $T_X$ delay.

Type III: Computing D such that D"^ = A mod P{x), with P{x) = a:"' +

x^^-l, m, n odd numbers and n < ^^^^,

di = <

a2i 0-21 + 0.2i-n

Type IV: Computing D such that D'^ = A mod P[x), with P[x) = x'^ + x'^ +

1, m, odd, n even and [ ^ ^ 1 <n< L ^ ^ J


^21 + a2i^{rn-n) + <^2i+(m-2n) + ^2i-\-{m-3n)

0>2i + a2i^{rn-n) + <^2i+(m-2n) + ^ 2 i + ( m - 3 n )

^21 + G^2i+(m-2n) + ^ 2 i + ( m - 3 n ) + «2i-f-(m-4n) 0^2i + ^ 2 i + ( m - 2 n ) + ^ 2 i + ( m - 3 n ) + ^2i+(7Ti-4n)

di—{ +a2i4-(m-5n)

^21 0^2i-m

However, taking advantage of the high redundancy of the terms involved in Eq. (6.45), it can be shown (after a tedious, long derivation) that actually $\frac{m+n-1}{2}$ XOR gates are sufficient to implement it with a $2T_X$ gate delay.

Table 6.5 Summary of Complexity Results

Operation      Type    XOR gates          Time delay
Squaring       I       (m + n - 1)/2      2T_X
Squaring       II      (m + 2)/4          T_X
Squaring       III     (m - 1)/2          2T_X
Squaring       IV      (m + n - 1)/2      T_X
Square root    I       (m + n - 1)/2      2T_X
Square root    II      (m + 2)/4          T_X
Square root    III     (m - 1)/2          T_X
Square root    IV      (m + n - 1)/2      2T_X

Table 6.5 summarizes the area and time complexities just derived for the cases considered. Furthermore, in Table 6.6 we list all preferred irreducible trinomials $P(x) = x^m + x^n + 1$ of degree $m \in [160, 571]$ with m a prime number. In all the instances considered, the computational complexity of computing the square root operator is comparable to or better than that of field squaring.

6.2.3 Illustrative Examples

In order to illustrate the approach just outlined, we include in this Section

several examples using first the artificially small finite field GF{2^^) and then

more realistic fields, in terms of practical cryptographic applications

Example 6.1 Field Square Root Computation over GF{2^^) Let us consider GF{2}^) generated with the irreducible Type III trinomial P(x) = x^^ 4- x^ + 1 As it was discussed before, one can find the square root

of any arbitrary field element $A \in GF(2^{15})$ by applying Eq. (6.41). In order to follow this approach, based on Eq. (6.38), we first determine the matrix M of Eq. (6.40) as shown in Table 6.7. Then, the inverse matrix of M modulo two, $M^{-1}$, is obtained as shown in Table 6.8. Afterwards, the polynomial coefficients, in terms of the coefficients of A, corresponding to the field square $C = A^2$ and the field square root $D = \sqrt{A}$ elements can be found from Eqs. (6.40) and (6.41) as shown in Table 6.9.

As predicted by Eq. (6.38), field squaring can be computed at a cost of $(m-1)/2 = (15-1)/2 = 7$ XOR gates and one $T_X$ delay. In the same way, the square root operation can be computed at a cost of $\frac{m-1}{2} = \frac{15-1}{2} = 7$ XOR gates with an incurred delay time of one $T_X$, which matches the prediction of Eq. (6.44). It is noticed that in this binary extension field, computing a field square root requires the same computational effort as field squaring.

Example 6.2 Field Square Root Computation over GF{2^^'^) Let us consider GF(2}^'^) generated using the irreducible Type II trinomial, P{x) = x^^'^-{-x^^ -\-1 Using the same approach as for the precedent example,

Table 6.6 Irreducible Trinomials $P(x) = x^m + x^n + 1$ of Degree $m \in [160, 571]$ Encoded as m(n), with m a Prime Number

m(n)       Type    m(n)        Type    m(n)        Type
167(35)    III     281(93)     III     439(49)     III
191(9)     III     313(79)     III     449(167)    III
193(15)    III     337(55)     III     457(61)     III
199(67)    III     353(69)     III     463(93)     III
223(33)    III     359(117)    III     479(105)    III
233(74)    IV      367(21)     III     487(127)    III
239(81)    III     383(135)    III     503(3)      III
241(70)    IV      401(152)    IV      521(158)    IV
257(41)    III     409(87)     III     569(77)     III
263(93)    III     431(120)    IV
271(70)    IV      433(33)     III

we can obtain the square root polynomial coefficients of an arbitrary element

A from the field GF{2^^^) as,

gates with an incurred delay time of one Tx

Example 6.3 Field Square Root Computation over $GF(2^{233})$. Let $GF(2^{233})$ be a field generated with the Type IV irreducible trinomial $P(x) = x^{233} + x^{74} + 1$. The square root of any arbitrary field element A is

$$d_i = \begin{cases}
a_{2i} + a_{2i+159} + a_{2i+85} + a_{2i+11} & i < 32,\\
a_{2i} + a_{2i+159} + a_{2i+85} + a_{2i+11} + a_{2i-63} & 32 \le i < 37,\\
a_{2i} + a_{2i+85} + a_{2i+11} + a_{2i-63} & 37 \le i < 69,\\
a_{2i} + a_{2i+85} + a_{2i+11} + a_{2i-63} + a_{2i-137} & 69 \le i < 74,\\
\quad\vdots
\end{cases} \qquad (6.47)$$

for $i = 0, 1, \ldots, 232$. Eq. (6.47) can be implemented with an XOR gate cost of $\frac{m+n-1}{2} = 153$ XOR gates with a $4T_X$ gate delay, which agrees with the value predicted by Eq. (6.45).

6.3 Multiplicative Inverse

Among customary finite field arithmetic operations, namely, addition, subtraction, multiplication and inversion of nonzero elements, the computation of the latter is the most time-consuming one. Multiplicative inversion of a nonzero element $a \in GF(2^m)$ is defined as the process of finding the unique element $a^{-1} \in GF(2^m)$ such that $a \cdot a^{-1} = 1$.

Several algorithms for computing the multiplicative inverse in $GF(2^m)$ have been proposed in the literature [153, 93, 356, 135, 399, 127, 296, 122]. In [135], the multiplicative inverse is computed using an improved modification of

Table 6.8 Square Root Matrix M"^ of Eq (6.41)

Title	Cryptographic Algorithms on Reconfigurable Hardware
University	University of Science and Technology of China
Major	Cryptographic Algorithms
Type	Thesis
Year of publication	2023
City	Hefei
