
Cryptography for Developers, Part 8


Title: Message Authentication Code Algorithms
Field: Cryptography
Type: Chapter
Year of publication: 2006
Pages: 45
Size: 203.46 KB


Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the “Ask the Author” form.

Q: What is a MAC function?

A: A MAC, or message authentication code, function is a function that accepts a secret key and a message and reduces them to a MAC tag.

Q: What is a MAC tag?

A: A tag is a short string of bits that is used to prove that the secret key and message were processed together through the MAC function.

Q: What does that mean? What does authentication mean?

A: Being able to prove that the message and secret key were combined to produce the tag can directly imply one thing: that the holder of the key produced, vouches for, or simply wishes to convey an unaltered original message. A forger not possessing the secret key should have no significant advantage in producing verifiable MAC tags for messages. In short, the goal of a MAC function is to be able to conclude that if the MAC tag is correct, the message is intact and was not modified during transit. Since only a limited number of parties (typically only one or two) have the secret key, the ownership of the message is rather obvious.

Q: What standards are there?

A: There are two NIST standards for MAC functions currently worth considering. The CMAC standard is SP 800-38B and specifies a method of turning a block cipher into a MAC function. The HMAC standard is FIPS-198 and specifies a method of turning a hash function into a MAC. An older standard, FIPS-113, specifies CBC-MAC (a precursor to CMAC) using DES, and should be considered insecure.

Q: Should I use CMAC or HMAC?

A: Both CMAC and HMAC are secure when keyed and implemented safely. CMAC is typically more efficient for very short messages. It is also ideal for instances where a cipher is already deployed and space is limited. HMAC is more efficient for larger messages, and ideal when a hash is already deployed. Of course, you should pick whichever matches the standard you are trying to adhere to.

Q: What is advantage?

A: We have seen the term advantage several times in our discussion already. Essentially, the advantage of an attacker refers to the probability of forgery gained by a forger through analysis of previously authenticated messages. In the case of CMAC, for instance, the advantage is approximately (mq)^2 / 2^126 for CMAC-AES, where m is the number of messages authenticated and q is the number of AES blocks per message. As the ratio approaches one, the probability of a successful forgery approaches one as well.

Advantage is a little different in this context than in the symmetric encryption context. An advantage of 2^-40 is not the same as using a 40-bit encryption key. An attack on the MAC must take place online. This means an attacker has but one chance to guess the correct MAC tag. In the latter context, an attacker can guess encryption keys offline and does not run the risk of exposure.

Q: How do key lengths play into the security of MAC functions?

A: Key lengths matter for MAC functions in much the same way they matter in symmetric cryptography. The longer the key, the longer a brute force key determination will take. If an attacker can guess a key, he can forge messages.

Q: How does the length of the MAC tag play into the security of MAC functions?

A: The length of the MAC tag is often variable (at least it is in HMAC and CMAC) and can limit the security of the MAC function. The shorter the tag, the more likely a forger is to guess it correctly. Unlike hash functions, the birthday paradox attack does not apply. Therefore, short MAC tags are often ideally secure for particular applications.

Q: How do I match up key length, MAC tag length, and advantage?

A: Your key length should ideally be as large as possible. There is often little practical value to using shorter keys. For instance, padding an AES-128 key with 88 bits of zeroes, effectively reducing it to a 40-bit key, may seem like it requires fewer resources. In fact, it saves no time or space and weakens the system. Ideally, for a MAC tag length of w bits, you wish to give your attacker an advantage of no more than 2^-w. For instance, if you are going to send 2^40 blocks of message data with CMAC-AES, the attacker’s advantage is no less than 2^-46, since (2^40)^2 / 2^126 = 2^-46. In this case, a tag longer than 46 bits is actually wasteful as you approach the 2^40-th block of message data. On the other hand, if you are sending a trivial amount of message blocks, the advantage is very small and the tag length can be customized to suit bandwidth needs.
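The arithmetic in this answer can be sketched in a few lines of C. This is only an illustration of the approximate bound (mq)^2 / 2^126 quoted above, not part of any library; the function names are hypothetical, and the code works with base-2 logarithms so that large message counts do not overflow.

```c
#include <assert.h>

/* log2 of the approximate CMAC-AES forgery advantage (m*q)^2 / 2^126,
 * for m = 2^log2_m messages of q = 2^log2_q blocks each.
 * Hypothetical helper for illustration only. */
static double cmac_aes_log2_advantage(double log2_m, double log2_q)
{
    assert(log2_m >= 0.0 && log2_q >= 0.0);
    return 2.0 * (log2_m + log2_q) - 126.0;
}

/* Tag length (in bits) beyond which extra tag bits add nothing once the
 * bound is reached; clamped to the 0..128 range of an AES block. */
static int useful_tag_bits(double log2_m, double log2_q)
{
    double bits = -cmac_aes_log2_advantage(log2_m, log2_q);
    if (bits > 128.0) bits = 128.0;
    if (bits < 0.0)   bits = 0.0;
    return (int)bits;
}
```

With 2^40 single-block messages, `cmac_aes_log2_advantage(40.0, 0.0)` evaluates to -46, which matches the 46-bit figure in the answer above.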

Q: Why can I not use hash(key || message) as a MAC function?

A: Such a construction is not resistant to offline attacks and is also vulnerable to message extension attacks. Forging messages is trivial with this scheme.

Q: What is a replay attack?

A: A replay attack can occur when you break a larger message into smaller independent pieces (e.g., packets). The attacker exploits the fact that unless you correlate the order of the packets, the attacker can change the meaning of the message simply by re-arranging the order of the packets. Each individual packet is still authenticated and has not been modified, so the attack goes unnoticed.

Q: Why do I care?

A: Without replay protection, an attacker can change the meaning of the overall message. Often, this implies the attacker can re-issue statements or commands. An attacker could, for instance, re-issue shell commands sent by a remote login shell.

Q: How do I defeat replay attacks?

A: The most obvious solution is to have a method of correlating the packets to their overall (relative) order within the larger stream of packets that make up the message. The most obvious solutions are timestamp counters and simple incremental counters. In both cases, the counter is included as part of the message authenticated. Filtering based on previously authenticated counters prevents an attacker from re-issuing an old packet or issuing packets out of stream order.
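A minimal sketch of the incremental-counter filter described in this answer follows. It assumes the counter travels inside the authenticated message and that the MAC has already been verified before the filter is consulted; the type and function names are hypothetical and are not taken from the book's frame encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver state: the highest counter accepted so far. */
typedef struct {
    uint64_t last_seen;
    int      have_seen;   /* 0 until the first packet is accepted */
} replay_filter;

static void replay_filter_init(replay_filter *f)
{
    assert(f != NULL);
    f->last_seen = 0;
    f->have_seen = 0;
}

/* Call only after the MAC over (counter || payload) has verified.
 * Returns 1 if the counter is strictly newer than anything accepted
 * before, 0 if the packet is a replay or arrives out of stream order. */
static int replay_filter_accept(replay_filter *f, uint64_t counter)
{
    assert(f != NULL);
    if (f->have_seen && counter <= f->last_seen) {
        return 0;
    }
    f->last_seen = counter;
    f->have_seen = 1;
    return 1;
}
```

This strict filter rejects any packet that arrives late, which suits a stream transport; a lossy medium might tolerate gaps in the counter but would still reject anything at or below the last accepted value.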

Q: How do I deal with packet loss or re-ordering?

A: Occasionally, packet loss and re-ordering are part of the communication medium. For example, UDP is a lossy protocol that tolerates packet loss. Even when packets are not lost, they are not guaranteed to arrive in any particular order (this is often a warning that does not arise in most networks). Out of order UDP is fairly rare on non-congested IPv4 networks. The meaning of the error depends on the context of the application. If you are working with UDP (or another lossy medium), packet loss and re-ordering are usually not malicious acts. The best practice is to reject the packet, possibly issue a synchronization message, and resume the protocol. Note that an attacker may exploit the resynchronization step to have a victim generate authenticated messages. On a relatively stable medium such as TCP, packet loss and reordering are usually a sign of malicious interference and should be treated as hostile. The usual action here is to drop the connection. (Commonly, this is argued to be a denial of service (DoS) attack vector. However, anyone with the ability to modify packets between you and another host can also simply filter all packets anyways.) There is no added threat by taking this precaution.

In both cases, whether the error is treated as hostile or benign, the packet should be dropped and not interpreted further up the protocol stack.

Q: What libraries provide MAC functionality?

A: LibTomCrypt provides a highly modular HMAC function for C developers. Crypto++ provides similar functionality for C++ developers. Limited HMAC support is also found in OpenSSL. LibTomCrypt also provides modular support for CMAC. At the time of this writing, neither Crypto++ nor OpenSSL provides support for CMAC. By “modular,” we mean that the HMAC and CMAC implementations are not tied to underlying algorithms. For instance, the HMAC code in LibTomCrypt can use any hash function that LibTomCrypt supports without changes to the API. This allows future upgrades to be performed in a more timely and streamlined fashion.

Q: What patents cover MAC functions?

A: Both HMAC and CMAC are patent free and can be used for any purpose. Various other MAC functions such as PMAC are covered by patents but are also not standard.

Chapter 7

Encrypt and Authenticate Modes

Solutions in this chapter:

■ Encrypt and Authenticate Modes

■ Security Goals

■ Standards

■ Design of GCM and CCM Modes

■ Putting It All Together

■ Summary

■ Solutions Fast Track

■ Frequently Asked Questions

In Chapter 6, “Message Authentication Code Algorithms,” we saw how we could use message authentication code (MAC) functions to ensure the authenticity of messages between two or more parties. The MAC function takes a message and secret key as input and produces a MAC tag as output. This tag, combined with the message, can be verified by any party who has the same secret key.

We saw how MAC functions are integral to various applications to avoid various attacks. That is, if an attacker can forge messages he could perform tasks we would rather he could not. We also saw how to secure a message broken into smaller packets for convenience. Finally, our example program combined both encryption and authentication into a frame encoder to provide both privacy and authentication. In particular, we used PKCS #5, a key derivation function, to accept a master secret key and produce a key for encryption and another key for the MAC function.

Would it not be nice if we had some function F(K, P) that accepts a secret key K and message P and returns the pair (C, T) corresponding to the ciphertext and MAC tag (respectively)? Instead of having to create, or otherwise supply, two secret keys to accomplish both goals, we could defer that process to some encapsulated standard.

Encrypt and Authenticate Modes

This chapter introduces a relatively new set of standards in the cryptographic world known as encrypt and authenticate modes. These modes of operation encapsulate the tasks of encryption and authentication into a single process. The user of these modes simply passes a single key, IV (or nonce), and plaintext. The mode will then produce the ciphertext and MAC tag. By combining both tasks into a single step, the entire operation is much easier to implement.

The catalyst for these modes is from two major sources. The first is to extract any performance benefits to be had from combining the modes. The second is to make authentication more attractive to developers who tend to ignore it. You are more likely to find a product that encrypts data than to find one that authenticates data.

Security Goals

The security goals of encrypt and authenticate modes are to ensure the privacy and authenticity of messages. Ideally, breaking one should not weaken the other. To achieve these goals, most combined modes require a secret key long enough such that an attacker could not guess it. They also require a unique IV per invocation to ensure replay attacks are not possible. These unique IVs are often called nonces in this context. The term nonce actually comes from N-once, which means to use N once and only once.

We will see later in this chapter that we can use the nonce as a packet counter when the secret key is randomly generated. This allows for ease of integration into existing protocols.

Standards

Even though encrypt and authenticate modes are relatively new, there are still a few good standards covering their design. In May 2004, NIST specified CCM as SP 800-38C, the first NIST encrypt and authenticate mode. Specified as a mode of operation for block ciphers, it was intended to be used with a NIST block cipher such as AES. CCM was selected as the result of a design contest in which various proposals were sought out. Among the more likely contestants to win were Galois Counter Mode (GCM), EAX mode, and CCM.

GCM was designed originally to be put to use in various wireless standards such as 802.16 (WiMAX), and later submitted to NIST for the contest. GCM is not yet a NIST standard (it is proposed as SP 800-38D), but as it is used through IEEE wireless standards it is a good algorithm to know about. GCM strives to achieve hardware performance by being massively parallelizable. In software, as we shall see, GCM can achieve high performance levels with the suitable use of the processor’s cache.

Finally, EAX mode was proposed after the submission of CCM mode to address some of the shortcomings in the design. In particular, EAX mode is more flexible in terms of how it can be used and strives for higher performance (which turns out to not be true in practice). EAX mode is actually a properly constructed wrapper around CTR encryption mode and CMAC authentication mode. This makes the security analysis easier, and the design more worthy of attention. Unfortunately, EAX was not, and is currently not, considered for standardization. Despite this, EAX is still a worthy mode to know about and understand.

Design and Implementation

We shall consider the design, implementation, and optimization of three popular algorithms. We will first explore the GCM algorithm, which has already found practical use in the IEEE 802 series of standards. The reader should take particular interest in this design, as it is also likely to become a NIST standard. After GCM, we will explore the design of CCM, the only NIST standardized mode at the time of this writing. CCM is both efficient and secure, making it a mode worth using and knowing about.

Additional Authentication Data

All three algorithms include an input known as the additional authentication data (AAD, also known as header data in CCM). This allows the implementer to include data that accompanies the ciphertext and must be authenticated but does not have to be encrypted; for example, metadata such as packet counters, timestamps, user and host names, and so on.

AAD is unique to these modes and is handled differently in all three. In particular, EAX has the most flexible AAD handling, while GCM and CCM are more restrictive. All three modes accept empty AAD strings, which allows developers to ignore the AAD facilities if they do not need them.

Design of GCM

GCM (Galois Counter Mode) is the design of David McGrew and John Viega. It is the product of universal hashing and CTR mode encryption for security. The original motivation for GCM mode was fast hardware implementation. As such, GCM employs the use of GF(2^128) multiplication, which can be efficient in typical FPGA and other hardware implementations.

To properly discuss GCM, we have to unravel an implementer’s worst nightmare: bit ordering. That is, which bit is the most significant bit, how are they ordered, and so on. It turns out that GCM is not one of the most straightforward designs in this respect. Once we get past the Galois field math, the rest of GCM is relatively easy to specify.

GCM GF(2) Mathematics

GCM employs multiplications in the field GF(2^128)[x]/v(x) to perform a function it calls GHASH. Effectively, GHASH is a form of universal hashing, which we will discuss next. The multiplication we are performing here is not any different in nature than the multiplications used within the AES block cipher. The only differences are the size of the field and the irreducible polynomial used.

GCM uses a bit ordering that does not seem normal upon first inspection. Instead of storing the coefficients of the polynomials from the least significant bit upward, they store them backward. For example, from AES we would see that the polynomial p(x) = x^7 + x^3 + x + 1 would be represented by 0x8B. In the GCM notation, the bits are reversed: x^7 would be 0x01 instead of 0x80, so our polynomial p(x) would be represented as 0xD1 instead. In effect, the bits within each byte are in little endian fashion. The bytes themselves are arranged in big endian fashion, which further complicates things. That is, byte number 15 is the least significant byte, and byte number 0 is the most significant byte.
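The bit reversal described above can be checked with a short sketch. The helper names below are hypothetical and exist only to reproduce the 0x8B to 0xD1 example from the text; they are not part of LibTomCrypt.

```c
#include <assert.h>

/* Mask of the bit that holds the coefficient of x^power inside one byte
 * in GCM's reversed ordering: x^0 is 0x80 and x^7 is 0x01. */
static unsigned char gcm_bit_for_power(int power)
{
    assert(power >= 0 && power <= 7);
    return (unsigned char)(0x80u >> power);
}

/* Reverse the eight bits of a byte, converting between the AES-style
 * ordering (x^7 is 0x80) and the GCM ordering (x^7 is 0x01). */
static unsigned char gcm_reverse_bits(unsigned char v)
{
    unsigned char r = 0;
    int i;
    for (i = 0; i < 8; i++) {
        r = (unsigned char)((r << 1) | (v & 1));
        v >>= 1;
    }
    return r;
}
```

Calling `gcm_reverse_bits(0x8B)` gives 0xD1, the GCM representation of x^7 + x^3 + x + 1 from the example above.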

The multiplication routine is then implemented with the following routines:

    static void gcm_rightshift(unsigned char *a)

This performs what GCM calls a right shift operation. Numerically, it is equivalent to a left shift (multiplication by 2), but since we order the bits in each byte in the opposite direction, we use a right shift to perform this. We are shifting from byte 15 down to byte 0.

    static const unsigned char mask[] = {
        0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
    };
    static const unsigned char poly[] = { 0x00, 0xE1 };

The mask is a simple way of masking off bits in the byte in reverse order. The poly array is the least significant byte of the polynomial: the first element is a zero, and the second element is the byte of the polynomial. In this case, 0xE1 maps to p(x) = x^128 + x^7 + x^2 + x + 1, where the x^128 term is implicit.

    void gcm_gf_mult(const unsigned char *a,
                     const unsigned char *b, unsigned char *c) {
        /* ... */
                }
            }
            z = V[15] & 0x01;
            gcm_rightshift(V);
            V[0] ^= poly[z];
        }
        memcpy(c, Z, 16);
    }

This routine accomplishes the operation c = ab in the Galois field chosen by GCM. It is effectively the same algorithm we used for the multiplication in AES, except here we are using an array of bytes to represent the polynomials. We use Z to accumulate the product as we produce it. We use V as a copy of a, which we can double and selectively add to Z based on the bits of b.
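Because the listing above is truncated in this excerpt, the following is a sketch of how the complete bit-serial routine could look. It is a reconstruction from the description (Z accumulates, V is a copy of a that is doubled, bits of b select the additions) and from the multiplication algorithm in the GCM specification; the lines that are missing from the excerpt are an assumption, not a quotation of the book or of LibTomCrypt.

```c
#include <assert.h>
#include <string.h>

/* Shift the 16-byte value right by one bit, from byte 15 down to byte 0.
 * In GCM's reversed bit order this multiplies the polynomial by x. */
static void gcm_rightshift(unsigned char *a)
{
    int x;
    for (x = 15; x > 0; x--) {
        a[x] = (unsigned char)((a[x] >> 1) | ((a[x - 1] << 7) & 0x80));
    }
    a[0] >>= 1;
}

static const unsigned char mask[] = {
    0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};
static const unsigned char poly[] = { 0x00, 0xE1 };

/* c = a * b in GCM's field; c may alias a or b because the result is
 * built in the local accumulator Z and copied out at the end. */
void gcm_gf_mult(const unsigned char *a,
                 const unsigned char *b, unsigned char *c)
{
    unsigned char Z[16], V[16];
    unsigned x, y, z;

    assert(a != NULL && b != NULL && c != NULL);
    memset(Z, 0, 16);
    memcpy(V, a, 16);
    for (x = 0; x < 128; x++) {
        /* add V to the product if bit x of b (leftmost bit first) is set */
        if (b[x >> 3] & mask[x & 7]) {
            for (y = 0; y < 16; y++) {
                Z[y] ^= V[y];
            }
        }
        /* double V, folding in the polynomial if a bit falls off the end */
        z = V[15] & 0x01;
        gcm_rightshift(V);
        V[0] ^= poly[z];
    }
    memcpy(c, Z, 16);
}
```

In this representation the multiplicative identity is the block whose first byte is 0x80 and whose remaining bytes are zero, so multiplying any value by that block returns the value unchanged.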

This multiplication routine accomplishes numerically what we require, but is horribly slow. Fortunately, there is more than one way to multiply field elements. As we shall see during the implementation phase, a table-based multiplication routine will be far more profitable.

The curious reader may wish to examine the GCM source of LibTomCrypt for the variety of tricks that are optionally used depending on the configuration. In addition to the previous routine, LibTomCrypt provides an alternative to gcm_gf_mult() (see src/encauth/gcm/gcm_gf_mult.c in LibTomCrypt) that uses a windowed multiplication on whole words (Darrel Hankerson, Alfred Menezes, Scott Vanstone, “Guide to Elliptic Curve Cryptography,” p. 50, Algorithm 2.36). This becomes important during the setup phase of GCM, even when we use a table-based multiplication routine for bulk data processing. Before we can show you a table-based multiplication routine, we must show you the constraints on GCM that make this possible.

Universal Hashing

Universal hashing is a method of creating a function f(x) such that for distinct values of x and y, the probability of f(x) = f(y) is that of any proper random function. The simplest example of such a universal hash is the mapping

    f(x) = ((ax + b) mod p) mod n

for random values of a and b and random primes p and n (n < p). Universal MAC functions, such as those in GCM (and other algorithms such as Daniel Bernstein’s Poly1305), use a variation of this to achieve a secure MAC function:

    H[i] = (H[i - 1] * K) + M[i]

where the last H[i] value is the tag, K is a unit in a finite field and the secret key, and M[i] is a block of the message. The multiplication and addition must be performed in a finite field of considerable size (e.g., 2^128 units or more). In the case of GCM, we will create the MAC functionality, called GHASH, with this scheme using our GF(2^128) multiplication routine.
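The recurrence H[i] = (H[i - 1] * K) + M[i] can be illustrated with a toy example. The sketch below uses the small prime field of integers modulo 2^31 - 1 purely so the arithmetic fits in ordinary integers; that field is far too small to be secure, and the function and constant names are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_P 2147483647u   /* the prime 2^31 - 1; far too small for real use */

/* Evaluate H[i] = (H[i-1] * K + M[i]) mod p over all blocks, with H[0] = 0.
 * The final value plays the role of the tag in this toy setting. */
static uint32_t toy_poly_mac(uint32_t key, const uint32_t *msg, size_t nblocks)
{
    uint64_t h = 0;
    size_t i;

    assert(key > 0 && key < TOY_P);
    assert(msg != NULL || nblocks == 0);
    for (i = 0; i < nblocks; i++) {
        /* h and key are both below 2^31, so the product fits in 64 bits */
        h = (h * key + (msg[i] % TOY_P)) % TOY_P;
    }
    return (uint32_t)h;
}
```

With key 3 and the message blocks {1, 2}, the recurrence gives (0 * 3 + 1) = 1 and then (1 * 3 + 2) = 5; swapping the blocks to {2, 1} gives 7, which shows that the result depends on block order.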

GCM Definitions

The entire GCM algorithm can be specified by a series of equations. First, let us define the various symbols we will be using in the equations (Figure 7.1).

■ Let K represent the secret key.

■ Let A represent the additional authentication data; there are m blocks of data in A.

■ Let P represent the plaintext; there are n blocks of data in P.

■ Let C represent the ciphertext.

■ Let Y represent the CTR counters.

■ Let T represent the MAC tag.

■ Let E(K, P) represent the encryption of P with the secret key K and block cipher E (e.g., E = AES).

■ Let IV represent the IV for the message to be processed.

Figure 7.1 GCM Data Processing

Input

    P:  Plaintext
    K:  Secret Key
    A:  Additional Authentication Data
    IV: GCM Initial Vector

    2. Y_0 = GHASH(H, {}, IV)
    3. Y_i = Y_(i-1) + 1, for i = 1, ..., n
    4. C_i = P_i XOR E(K, Y_i), for i = 1, ..., n - 1
    5. C_n = P_n XOR E(K, Y_n), truncated to the length of P_n
    6. T = GHASH(H, A, C) XOR E(K, Y_0)
    7. Return C and T

The first step is to generate the universal MAC key H, which is used solely in the GHASH function. Next, we need an IV for the CTR mode. If the user-supplied IV is 96 bits long, we use it directly by padding it with 31 zero bits and 1 one bit. Otherwise, we apply the GHASH function to the IV and use the returned value as the CTR IV.
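The 96-bit IV case and the counter update of step 3 can be sketched as follows. The function names are hypothetical; the increment shown follows the GCM specification, in which only the rightmost 32 bits of the counter block are incremented (modulo 2^32), and it may differ in detail from the LibTomCrypt code the chapter presents.

```c
#include <assert.h>
#include <string.h>

/* Y_0 = IV || 0^31 || 1 for a 96-bit (12-byte) IV. */
static void gcm_y0_from_iv96(const unsigned char *iv, unsigned char *y0)
{
    assert(iv != NULL && y0 != NULL);
    memcpy(y0, iv, 12);
    y0[12] = 0x00;
    y0[13] = 0x00;
    y0[14] = 0x00;
    y0[15] = 0x01;   /* 31 zero bits followed by a single one bit */
}

/* Y_i = Y_(i-1) + 1: treat bytes 12..15 as a big endian 32-bit counter
 * and increment it, leaving the IV portion in bytes 0..11 untouched. */
static void gcm_incr32(unsigned char *y)
{
    int i;
    assert(y != NULL);
    for (i = 15; i >= 12; i--) {
        if (++y[i] != 0) {
            break;   /* no carry out of this byte, so stop */
        }
    }
}
```

For an all-zero 96-bit IV this produces a Y_0 whose last byte is 0x01, and one increment yields the first encryption counter Y_1 with a last byte of 0x02.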

Once we have H and the initial Y_0 value, we can encrypt the plaintext. The encryption is performed in CTR mode using the counter in big endian fashion. Oddly enough, the bits per byte of the counter are treated in the normal ordering. The last block of ciphertext is not expanded if it does not fill a block with the cipher. For instance, if P_n is 32 bits, the output of E(K, Y_n) is truncated to 32 bits, and C_n is the 32-bit XOR of the two values.

Finally, the MAC tag is produced by performing the GHASH function on the additional authentication data and ciphertext. The output of GHASH is then XORed with the encryption of the initial Y_0 value. Next, we examine the GHASH function (Figure 7.2).

Figure 7.2 GCM GHASH Function

Input

    H: Secret Parameter (derived from the secret key)
    A: Additional Authentication Data (m blocks)
    C: Ciphertext (also used as an additional input source, n blocks)

The GHASH function compresses the additional authentication data and ciphertext to a final MAC tag. The multiplication by H is a GF(2^128)[x] multiplication as mentioned earlier. The length encodings are 64-bit big endian strings concatenated to one another. The length of A is stored in the first 8 bytes and the length of C in the last 8.
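Since the steps of Figure 7.2 are missing from this excerpt, here is a minimal sketch of GHASH as defined in the GCM specification: each zero-padded block of A and then of C is XORed into the state and the state is multiplied by H, and the final block holds the two 64-bit big endian lengths in bits. All function names are hypothetical, and the bit-serial multiplier is the slow reference form rather than the optimized routines discussed in this chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-serial c = a * b in GCM's field (c may alias a or b). */
static void gf128_mul(const unsigned char *a, const unsigned char *b,
                      unsigned char *c)
{
    unsigned char Z[16] = {0}, V[16];
    int x, y, carry;

    memcpy(V, a, 16);
    for (x = 0; x < 128; x++) {
        if (b[x >> 3] & (0x80 >> (x & 7))) {
            for (y = 0; y < 16; y++) Z[y] ^= V[y];
        }
        carry = V[15] & 0x01;
        for (y = 15; y > 0; y--) {
            V[y] = (unsigned char)((V[y] >> 1) | ((V[y - 1] << 7) & 0x80));
        }
        V[0] >>= 1;
        if (carry) V[0] ^= 0xE1;
    }
    memcpy(c, Z, 16);
}

/* XOR data into X one 16-byte block at a time (a short final block is
 * implicitly zero padded) and multiply by H after each block. */
static void ghash_update(unsigned char *X, const unsigned char *H,
                         const unsigned char *data, size_t len)
{
    size_t i, n;
    while (len > 0) {
        n = (len < 16) ? len : 16;
        for (i = 0; i < n; i++) X[i] ^= data[i];
        gf128_mul(X, H, X);
        data += n;
        len  -= n;
    }
}

/* Store v as an 8-byte big endian string. */
static void store64_be(unsigned char *out, uint64_t v)
{
    int i;
    for (i = 0; i < 8; i++) out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* out = GHASH(H, A, C); alen and clen are byte lengths. */
static void ghash(const unsigned char *H,
                  const unsigned char *A, size_t alen,
                  const unsigned char *C, size_t clen,
                  unsigned char *out)
{
    unsigned char X[16] = {0}, L[16];

    assert(H != NULL && out != NULL);
    ghash_update(X, H, A, alen);
    ghash_update(X, H, C, clen);
    store64_be(L, (uint64_t)alen * 8);       /* length of A in bits */
    store64_be(L + 8, (uint64_t)clen * 8);   /* length of C in bits */
    ghash_update(X, H, L, 16);
    memcpy(out, X, 16);
}
```

The final tag would then be this GHASH output XORed with E(K, Y_0), as in step 6 of Figure 7.1; the block cipher call is left out so the sketch stays self-contained.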

To demonstrate GCM, we used the implementation of LibTomCrypt. This implementation is public domain, freely accessible on the project’s Web site, optimized, and easy to follow. We will omit various administrative portions of the code to reduce the size of the code listings. Readers are strongly encouraged to use the routines found in LibTomCrypt (or similar libraries) instead of rolling their own if they can get away with it.

Interface

Our GCM interface has several functions that we will discuss in turn. The high level of abstraction allows us to use the GCM implementation to the full flexibility warranted by the GCM specification. The functions we will discuss are:

1. gcm_gf_mult()    Generic GF(2^128)[x] multiplication

2. gcm_mult_h()     Multiplication by H (usually optimized since H is fixed after setup)

3. gcm_init()       Initialize a GCM state

4. gcm_add_iv()     Add IV data to the GCM state

5. gcm_add_aad()    Add AAD to the GCM state

6. gcm_process()    Add plaintext to the GCM state

7. gcm_done()       Terminate the GCM state and return the MAC tag

These functions all combine to allow a caller to process a message through the GCM algorithm. For any message, the functions 3 through 7 are meant to be called in that order to process the message. That is, one must add the IV before the AAD, and the AAD before the plaintext. GCM does not allow for processing the distinct data elements in other orders. For example, you cannot add AAD before the IV. The functions can be called multiple times as long as the order of appearance is intact. For example, you can call gcm_add_iv() twice before calling gcm_add_aad() for the first time.
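The ordering rule can be pictured as a small state machine. The sketch below is not the LibTomCrypt implementation: the `demo_` names, the return codes, and the decision to let a caller skip the AAD stage (an empty AAD) are assumptions made for illustration. It only tracks which stage the state is in and rejects calls that arrive out of order.

```c
#include <assert.h>
#include <stddef.h>

/* Stages in the order they must appear (hypothetical names). */
enum { DEMO_MODE_IV = 0, DEMO_MODE_AAD = 1, DEMO_MODE_TEXT = 2 };

typedef struct {
    int mode;      /* stage the state is currently in */
    int have_iv;   /* nonzero once some IV data has been added */
} demo_gcm_state;

static void demo_init(demo_gcm_state *s)
{
    assert(s != NULL);
    s->mode = DEMO_MODE_IV;
    s->have_iv = 0;
}

/* IV data may be added repeatedly, but only before AAD or text. */
static int demo_add_iv(demo_gcm_state *s)
{
    assert(s != NULL);
    if (s->mode != DEMO_MODE_IV) return -1;
    s->have_iv = 1;
    return 0;
}

/* AAD requires an IV and is refused once text processing has begun. */
static int demo_add_aad(demo_gcm_state *s)
{
    assert(s != NULL);
    if (!s->have_iv || s->mode > DEMO_MODE_AAD) return -1;
    s->mode = DEMO_MODE_AAD;
    return 0;
}

/* Text requires an IV; calling it closes the IV and AAD stages. */
static int demo_process(demo_gcm_state *s)
{
    assert(s != NULL);
    if (!s->have_iv) return -1;
    s->mode = DEMO_MODE_TEXT;
    return 0;
}
```

A caller that adds the IV twice, then AAD, then text succeeds at every step, while an attempt to add AAD before any IV, or to add IV data after AAD, returns -1.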

All the functions make use of the structure gcm_state, which contains the current working state of the GCM algorithm. It fully determines how the functions should behave, which allows the functions to be fully thread safe (Figure 7.3).

Figure 7.3 GCM State Structure

    /* ... */
                       Y_0[16],   /* initial counter */
                       buf[16];   /* buffer for stuff */

    int                cipher,    /* which cipher */
                       ivmode,    /* Which mode is the IV in? */
                       mode,      /* mode the GCM code is in */
                       buflen;    /* length of data in buf */

    ulong64            totlen,    /* 64-bit counter used for IV and AAD */
                       pttotlen;  /* 64-bit counter for the PT */

Table 7.1 gcm_state Members and Their Functions

    Member Name   Purpose
    -----------   -------
    K             Scheduled cipher key, used to encrypt counters.
    Y             CTR mode counter value (incremented as text is processed).
    Y_0           The initial counter value used to encrypt the GHASH output.
    buf           Used in various places; for example, holds the encrypted
                  counter values.
    cipher        ID of which cipher we are using with GCM.
    ivmode        Specifies whether we are working with a short IV. It is set
                  to nonzero if the IV is longer than 12 bytes.
    mode          Current mode GCM is in. Can be one of the following:
                  GCM_MODE_IV, GCM_MODE_AAD, GCM_MODE_TEXT.
    buflen        Current length of data in the buf array.
    totlen        Total length of the IV and AAD data.
    pttotlen      Total length of the plaintext.
    PC            A 16x256x16 table such that PC[i][j][k] is the kth byte of
                  H * j * x^(8i) in GF(2^128)[x]. This table is pre-computed
                  by gcm_init() based on the secret H value to accelerate the
                  multiplication by H required by the GHASH function.

The PC table is an optional table only included if GCM_TABLES was defined at build time. As we will see shortly, it can greatly speed up the processing of data through GHASH; however, it requires a 64 kilobyte table, which could easily be prohibitive in various embedded platforms.

GCM Generic Multiplication

The following code implements the generic GF(2^128)[x] multiplication required by GCM. It is designed to work with any multiplier values and is not optimized to the GHASH usage pattern of multiplying by a single value (H).

gcm_gf_mult.c:

    /* this is x*2^128 mod p(x) ... the results are 16 bytes
     * each stored in a packed format.  Since only the
     * lower 16 bits are not zero'ed I removed the upper 14 bytes */
    const unsigned char gcm_shift_table[256*2] = {
        0x00, 0x00, 0x01, 0xc2, 0x03, 0x84, 0x02, 0x46,
        0x07, 0x08, 0x06, 0xca, 0x04, 0x8c, 0x05, 0x4e,
        /* ... */
        0xbc, 0xf8, 0xbd, 0x3a, 0xbf, 0x7c, 0xbe, 0xbe };

This table contains the residue of the value of k * x^128 mod p(x) for all 256 values of k. Since the value of p(x) is sparse, only the lower two bytes of the residue are nonzero. As such, we can compress the table. Every pair of bytes is the lower two bytes of the residue for the given value of k. For instance, gcm_shift_table[4] and gcm_shift_table[5] are the value of the least significant bytes of 2 * x^128 mod p(x).
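The visible entries of the table can be regenerated with a few lines of code, which is a useful check on the packing. The sketch below is an assumption about how such a table could be derived, not the generator used by LibTomCrypt: it treats each bit of k in GCM's reversed order and XORs in the correspondingly shifted low terms of p(x), written as the 16-bit value 0xE100.

```c
#include <assert.h>

/* Compute the two packed bytes for index k, as they would appear at
 * positions 2*k and 2*k + 1 of the table (hypothetical helper).
 * Bit 0x80 of k stands for x^0 and bit 0x01 for x^7, so a set bit for
 * x^j contributes x^j * (x^7 + x^2 + x + 1), which in GCM bit order is
 * 0xE100 shifted right by j places. */
static void gcm_shift_entry(unsigned k, unsigned char *out)
{
    unsigned r = 0, j;

    assert(k < 256 && out != 0);
    for (j = 0; j < 8; j++) {
        if (k & (0x80u >> j)) {
            r ^= 0xE100u >> j;
        }
    }
    out[0] = (unsigned char)(r >> 8);
    out[1] = (unsigned char)(r & 0xFF);
}
```

For k = 1 this gives the pair 0x01, 0xc2, for k = 2 the pair 0x03, 0x84, and for k = 255 the pair 0xbe, 0xbe, in agreement with the first and last rows of the listing.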

This table is only used if LTC_FAST is defined. This define instructs the implementation to use fast parallel XOR operations on words instead of on the byte level. In our case, we can exploit it to perform the generic multiplication much faster.

    /**
      GCM GF multiplier (internal use only) bitserial
      @param a   First value
      @param b   Second value
      @param c   Destination for a * b
     */
    void gcm_gf_mult(const unsigned char *a,
                     const unsigned char *b,
    /* ... */

    #else

    /* map normal numbers to "ieee" way ... e.g. bit reversed */
    #define M(x) (((x&8)>>3) | ((x&4)>>1) | ((x&2)<<1) | ((x&1)<<3))

... platforms, it is an unsigned long. The data type has to overlap perfectly with the unsigned char data type. It is used to allow parallel XOR operations.

The BPD macro is the number of bits per LTC_FAST_TYPE (the code shifts by BPD-1 and extracts BPD/4 nibbles per word). Clearly, this only works if CHAR_BIT is 8, which is why LTC_FAST is not enabled by default. The WPV macro is the number of words per 128-bit value plus a word.

    /**
      GCM GF multiplier (internal use only) word oriented
      @param a   First value
      @param b   Second value
      @param c   Destination for a * b
     */
    void gcm_gf_mult(const unsigned char *a,
                     const unsigned char *b,
    /* ... */

The B array contains the computed values of ka for k = 0...15. It allows us to perform a 4x128 multiplication with a table lookup. The tmp array contains the product (before it has been reduced). The pB array contains the loaded and converted copy of b with the appropriate treatment for the GCM order of the bits.

    /* create simple tables */
    /* ... */

... is loaded in big endian fashion to adhere to the GCM specs. The b value is loaded in the opposite fashion so we can use a more straightforward digit extraction expression.

In fact, we could load both as big endian, and merely rewrite the order in which we fetch nibbles to compensate.

    /* now create 2, 4 and 8 */
    B[M(2)][0] = B[M(1)][0] >> 1;
    B[M(4)][0] = B[M(1)][0] >> 2;
    B[M(8)][0] = B[M(1)][0] >> 3;
    for (i = 1; i < (int)WPV; i++) {
        B[M(2)][i] = (B[M(1)][i-1] << (BPD-1)) | (B[M(1)][i] >> 1);
        B[M(4)][i] = (B[M(1)][i-1] << (BPD-2)) | (B[M(1)][i] >> 2);
        B[M(8)][i] = (B[M(1)][i-1] << (BPD-3)) | (B[M(1)][i] >> 3);
    }

This block of code creates the entries for ax, ax^2, and ax^3. Note that we do not perform any reductions. This is why WPV has an extra word appended to it, since we are dealing with values that have more than 128 bits in them.

    /* now all values with two bits which are
     * 3, 5, 6, 9, 10, 12 */
    for (i = 0; i < (int)WPV; i++) {
        B[M(3)][i]  = B[M(1)][i] ^ B[M(2)][i];
        B[M(5)][i]  = B[M(1)][i] ^ B[M(4)][i];
        B[M(6)][i]  = B[M(2)][i] ^ B[M(4)][i];
        B[M(9)][i]  = B[M(1)][i] ^ B[M(8)][i];
        B[M(10)][i] = B[M(2)][i] ^ B[M(8)][i];
        B[M(12)][i] = B[M(8)][i] ^ B[M(4)][i];

        /* ...
         * 7, 11, 13, 14, 15 */
        B[M(7)][i]  = B[M(3)][i] ^ B[M(4)][i];
        B[M(11)][i] = B[M(3)][i] ^ B[M(8)][i];
        B[M(13)][i] = B[M(1)][i] ^ B[M(12)][i];
        B[M(14)][i] = B[M(6)][i] ^ B[M(8)][i];
        B[M(15)][i] = B[M(7)][i] ^ B[M(8)][i];
    }

These two blocks construct the rest of the entries word per word. We first construct the values that only have two bits set (3, 5, 6, 9, 10, and 12), and then from those we construct the values that have three bits set. Note the use of the M() macro, which evaluates to a constant at compile time.

    zeromem(tmp, sizeof(tmp));

    /* compute product four bits of each word at a time */
    /* for each nibble */
    for (i = (BPD/4)-1; i >= 0; i--) {
        /* for each word */
        for (j = 0; j < (int)(WPV-1); j++) {
            /* grab the 4 bits recall the nibbles are
               backwards so it's a shift by (i^1)*4 */
            u = (pB[j] >> ((i^1)<<2)) & 15;

Here we are extracting a nibble of b to multiply a by. Note the use of (i^1) to extract the nibbles in reverse order, since GCM stores bits in each byte in reverse order.

            /* add offset by the word count the table
               looked up value to the result */
    /* ... */

    /* reduce by taking most significant byte and adding the
       appropriate two byte sequence 16 bytes down */

(kx16)x j–16 by a table look up and shift.This routine adds the residue of the product from the

high byte to the lower bytes

Each loop of the preceding for loop removes one byte from the product at a time. We perform the shift inline by adding the lookup values to pTmp[i-16] and pTmp[i-15].

... 100-percent portable. It requires a data type that is a multiple of an unsigned char data type in size, which is not always guaranteed.

Now that we have a generic multiplier, we have to implement an optimized multiplier

to be used by GHASH

GCM Optimized Multiplication

The following multiplication routine is optimized solely for performing a multiplication by the secret H value. It takes advantage of the fact we can precompute tables for the multiplication.

gcm_mult_h.c:

    /**
      GCM multiply by H
      @param gcm   The GCM state which holds the H value
      @param I     The value to multiply H by
     */
    void gcm_mult_h(gcm_state *gcm, unsigned char *I)
    {
    /* ... */
        int x, y;
        XMEMCPY(T, &gcm->PC[0][I[0]][0], 16);

If GCM_TABLES has been defined, we will use the tables approach. The PC table contains 16 8x128 tables, one for each byte of the input and for each of their respective possible values. The first thing we must do is copy the 0th entry to T (our accumulator). The rest of the lookups will be XORed into this value.

Here we see the use of LTC_FAST to optimize parallel XOR operations. For each byte of I, the input, we look up the 128-bit value and XOR it against the accumulator. Since the entries in the table have already been reduced, our accumulator never grows beyond 128 bits.
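The table lookup idea can be sketched end to end. The code below is a simplified illustration under stated assumptions: the `demo_` names are hypothetical, the table is a file-scope array rather than a member of gcm_state, and every entry is filled by calling a slow bit-serial multiplier instead of deriving later tables from the first one as the book's gcm_init() does. It relies only on the fact that the field multiplication is linear, so H times the input equals the XOR of H times each input byte placed at its own position.

```c
#include <assert.h>
#include <string.h>

/* Bit-serial c = a * b in GCM's field (c may alias a or b). */
static void gf128_mul(const unsigned char *a, const unsigned char *b,
                      unsigned char *c)
{
    unsigned char Z[16] = {0}, V[16];
    int x, y, carry;

    memcpy(V, a, 16);
    for (x = 0; x < 128; x++) {
        if (b[x >> 3] & (0x80 >> (x & 7))) {
            for (y = 0; y < 16; y++) Z[y] ^= V[y];
        }
        carry = V[15] & 0x01;
        for (y = 15; y > 0; y--) {
            V[y] = (unsigned char)((V[y] >> 1) | ((V[y - 1] << 7) & 0x80));
        }
        V[0] >>= 1;
        if (carry) V[0] ^= 0xE1;
    }
    memcpy(c, Z, 16);
}

/* demo_PC[i][j] = H * (the block whose byte i is j and all others 0).
 * 16 x 256 x 16 bytes = 64 kilobytes. */
static unsigned char demo_PC[16][256][16];

static void demo_build_tables(const unsigned char *H)
{
    unsigned char B[16];
    int i, j;

    assert(H != NULL);
    for (i = 0; i < 16; i++) {
        for (j = 0; j < 256; j++) {
            memset(B, 0, 16);
            B[i] = (unsigned char)j;
            gf128_mul(H, B, demo_PC[i][j]);
        }
    }
}

/* I = I * H using one table lookup and one 16-byte XOR per input byte. */
static void demo_mult_h(unsigned char *I)
{
    unsigned char T[16];
    int x, y;

    assert(I != NULL);
    memcpy(T, demo_PC[0][I[0]], 16);
    for (x = 1; x < 16; x++) {
        for (y = 0; y < 16; y++) {
            T[y] ^= demo_PC[x][I[x]][y];
        }
    }
    memcpy(I, T, 16);
}
```

After `demo_build_tables(H)`, calling `demo_mult_h(I)` should leave the same 16 bytes in I that the bit-serial multiplier would produce for I and H, at the cost of 64 kilobytes of memory.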

    /**
      ...
      @param gcm      The GCM state to initialize
      @param cipher   The index of the cipher to use
      @param key      The secret key
      @param keylen   The length of the secret key
      @return CRYPT_OK on success
     */
    int gcm_init(gcm_state *gcm, int cipher,
                 const unsigned char *key, int keylen)
    {
    /* ... */

This block of code initializes the GCM state to the default empty and zero state. After this point, we are ready to process IV, AAD, or plaintext (provided GCM_TABLES was not defined).

    #ifdef GCM_TABLES
        /* setup tables */

        /* generate the first table as it has no shifting
         * (from which we make the other tables) */
    /* ... */
        /* now generate the rest of the tables
         * based the previous table */

Date posted: 12/08/2014, 16:20