Q: What is a MAC function?
A: A MAC, or message authentication code, function is a function that accepts a secret key and a message and reduces them to a MAC tag.
Q: What is a MAC tag?
A: A tag is a short string of bits that is used to prove that the secret key and message were processed together through the MAC function.
Q: What does that mean? What does authentication mean?
A: Being able to prove that the message and secret key were combined to produce the tag directly implies one thing: that the holder of the key produced, vouches for, or simply wishes to convey an unaltered original message. A forger not possessing the secret key should have no significant advantage in producing verifiable MAC tags for messages. In short, the goal of a MAC function is to be able to conclude that if the MAC tag is correct, the message is intact and was not modified during transit. Since only a limited number of parties (typically only one or two) have the secret key, the ownership of the message is rather obvious.
Q: What standards are there?
A: There are two NIST standards for MAC functions currently worth considering. The CMAC standard is SP 800-38B and specifies a method of turning a block cipher into a MAC function. The HMAC standard is FIPS-198 and specifies a method of turning a hash function into a MAC. An older standard, FIPS-113, specifies CBC-MAC (a precursor to CMAC) using DES, and should be considered insecure.
Q: Should I use CMAC or HMAC?
A: Both CMAC and HMAC are secure when keyed and implemented safely. CMAC is typically more efficient for very short messages. It is also ideal for instances where a cipher is already deployed and space is limited. HMAC is more efficient for larger messages, and ideal when a hash is already deployed. Of course, you should pick whichever matches the standard you are trying to adhere to.
Q: What is advantage?
A: We have seen the term advantage several times in our discussion already. Essentially, the advantage of an attacker refers to the probability of forgery gained by a forger through analysis of previously authenticated messages. In the case of CMAC, for instance, the advantage is roughly (mq)^2/2^126 for CMAC-AES, where m is the number of messages authenticated, and q is the number of AES blocks per message. As the ratio approaches one, the probability of a successful forgery approaches one as well.
Advantage is a little different in this context than in the symmetric encryption context. An advantage of 2^-40 is not the same as using a 40-bit encryption key. An attack on the MAC must take place online. This means an attacker has but one chance to guess the correct MAC tag. In the latter context, an attacker can guess encryption keys offline and does not run the risk of exposure.
Q: How do key lengths play into the security of MAC functions?
A: Key lengths matter for MAC functions in much the same way they matter in symmetric cryptography. The longer the key, the longer a brute force key determination will take. If an attacker can guess the key, he can forge messages.
Q: How does the length of the MAC tag play into the security of MAC functions?
A: The length of the MAC tag is often variable (at least it is in HMAC and CMAC) and can limit the security of the MAC function. The shorter the tag, the more likely a forger is to guess it correctly. Unlike hash functions, the birthday paradox attack does not apply. Therefore, short MAC tags are often adequately secure for particular applications.
Q: How do I match up key length, MAC tag length, and advantage?
A: Your key length should ideally be as large as possible. There is often little practical value to using shorter keys. For instance, padding an AES-128 key with 88 bits of zeroes, effectively reducing it to a 40-bit key, may seem like it requires fewer resources. In fact, it saves no time or space and weakens the system. Ideally, for a MAC tag length of w bits, you wish to give your attacker an advantage of no more than 2^-w. For instance, if you are going to send 2^40 blocks of message data with CMAC-AES, the attacker’s advantage is no less than 2^-46. In this case, a tag longer than 46 bits is actually wasteful as you approach the 2^40th block of message data. On the other hand, if you are sending a trivial amount of message blocks, the advantage is very small and the tag length can be customized to suit bandwidth needs.
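The arithmetic in this answer can be checked with a few lines of C. The sketch below works in base-2 logarithms and assumes the approximate CMAC-AES bound (mq)^2/2^126 quoted earlier; the function names are hypothetical and are not part of any library.

```c
#include <assert.h>

/* log2 of the approximate CMAC-AES forgery advantage (mq)^2 / 2^126,
 * given log2 of the total number of AES blocks processed, log2(m*q).
 * Hypothetical helper for illustration only. */
int cmac_aes_log2_advantage(int log2_total_blocks)
{
    assert(log2_total_blocks >= 0);
    return 2 * log2_total_blocks - 126;
}

/* Longest tag length w (in bits) that is still worthwhile for that much
 * data, i.e. the w for which the advantage equals 2^-w. Returns 0 once
 * the bound has already reached 1 or more. */
int cmac_aes_useful_tag_bits(int log2_total_blocks)
{
    int log2_adv = cmac_aes_log2_advantage(log2_total_blocks);
    return (log2_adv < 0) ? -log2_adv : 0;
}
```

For 2^40 blocks, cmac_aes_log2_advantage(40) gives -46, matching the 46-bit tag length mentioned in the answer.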
Q: Why can I not use hash(key || message) as a MAC function?
A: Such a construction is not resistant to offline attacks and is also vulnerable to message extension attacks. Forging messages is trivial with this scheme.
Q: What is a replay attack?
A: A replay attack can occur when you break a larger message into smaller independent pieces (e.g., packets). The attacker exploits the fact that unless you correlate the order of the packets, the attacker can change the meaning of the message simply by re-arranging the order of the packets. While each individual packet may be authenticated, it is not being modified. Thus, the attack goes unnoticed.
Q: Why do I care?
A: Without replay protection, an attacker can change the meaning of the overall message. Often, this implies the attacker can re-issue statements or commands. An attacker could, for instance, re-issue shell commands sent by a remote login shell.
Q: How do I defeat replay attacks?
A: The most obvious solution is to have a method of correlating the packets to their overall (relative) order within the larger stream of packets that make up the message. The most obvious solutions are timestamp counters and simple incremental counters. In both cases, the counter is included as part of the message authenticated. Filtering based on previously authenticated counters prevents an attacker from re-issuing an old packet or issuing them out of stream order.
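A simple incremental counter filter of the kind described in this answer might look like the following. This is a minimal sketch under stated assumptions: the replay_filter type and its functions are hypothetical names, the counter is assumed to be covered by the MAC tag, and the tag is assumed to have been verified before the filter is consulted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver-side replay filter: remembers the highest packet
 * counter accepted so far on one connection. */
typedef struct {
    uint64_t last_seen;   /* highest counter accepted so far */
    int      have_any;    /* zero until the first packet is accepted */
} replay_filter;

void replay_filter_init(replay_filter *f)
{
    assert(f != NULL);
    f->last_seen = 0;
    f->have_any  = 0;
}

/* Call only AFTER the packet's MAC tag (which covers the counter) has
 * been verified. Returns 1 if the packet is fresh and may be processed,
 * 0 if it is a replay or out of stream order and must be dropped. */
int replay_filter_accept(replay_filter *f, uint64_t counter)
{
    assert(f != NULL);
    if (f->have_any && counter <= f->last_seen) {
        return 0;
    }
    f->last_seen = counter;
    f->have_any  = 1;
    return 1;
}
```

A strictly increasing filter like this also rejects packets that are merely late; whether that is acceptable depends on the medium, as the next question discusses.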
Q: How do I deal with packet loss or re-ordering?
A: Occasionally, packet loss and re-ordering are part of the communication medium. For example, UDP is a lossy protocol that tolerates packet loss. Even when packets are not lost, they are not guaranteed to arrive in any particular order (although in most networks this is a warning that rarely comes into play). Out of order UDP is fairly rare on non-congested IPv4 networks. The meaning of the error depends on the context of the application. If you are working with UDP (or another lossy medium), packet loss and re-ordering are usually not malicious acts. The best practice is to reject the packet, possibly issue a synchronization message, and resume the protocol. Note that an attacker may exploit the resynchronization step to have a victim generate authenticated messages. On a relatively stable medium such as TCP, packet loss and reordering are usually a sign of malicious interference and should be treated as hostile. The usual action here is to drop the connection. (Commonly, this is argued to be a denial of service (DoS) attack vector. However, anyone with the ability to modify packets between you and another host can also simply filter all packets anyway.) There is no added threat in taking this precaution.
In both cases, whether the error is treated as hostile or benign, the packet should be dropped and not interpreted further up the protocol stack.
Q: What libraries provide MAC functionality?
A: LibTomCrypt provides a highly modular HMAC function for C developers. Crypto++ provides similar functionality for C++ developers. Limited HMAC support is also found in OpenSSL. LibTomCrypt also provides modular support for CMAC. At the time of this writing, neither Crypto++ nor OpenSSL provides support for CMAC. By “modular,” we mean that the HMAC and CMAC implementations are not tied to underlying algorithms. For instance, the HMAC code in LibTomCrypt can use any hash function that LibTomCrypt supports without changes to the API. This allows future upgrades to be performed in a more timely and streamlined fashion.
Q: What patents cover MAC functions?
A: Both HMAC and CMAC are patent free and can be used for any purpose. Various other MAC functions such as PMAC are covered by patents but are also not standard.
Chapter 7
Encrypt and Authenticate Modes
Solutions in this chapter:
■ Encrypt and Authenticate Modes
■ Security Goals
■ Standards
■ Design of GCM and CCM Modes
■ Putting It All Together
Summary
Solutions Fast Track
Frequently Asked Questions
In Chapter 6, “Message Authentication Code Algorithms,” we saw how we could use message authentication code (MAC) functions to ensure the authenticity of messages between two or more parties. The MAC function takes a message and secret key as input and produces a MAC tag as output. This tag, combined with the message, can be verified by any party who has the same secret key.
We saw how MAC functions are integral to various applications to avoid various attacks. That is, if an attacker can forge messages he could perform tasks we would rather he could not. We also saw how to secure a message broken into smaller packets for convenience. Finally, our example program combined both encryption and authentication into a frame encoder to provide both privacy and authentication. In particular, we used PKCS #5, a key derivation function, to accept a master secret key and produce a key for encryption and another key for the MAC function.
Would it not be nice if we had some function F(K, P) that accepts a secret key K and message P and returns the pair (C, T) corresponding to the ciphertext and MAC tag (respectively)? Instead of having to create, or otherwise supply, two secret keys to accomplish both goals, we could defer that process to some encapsulated standard.
Encrypt and Authenticate Modes
This chapter introduces a relatively new set of standards in the cryptographic world known as encrypt and authenticate modes. These modes of operation encapsulate the tasks of encryption and authentication into a single process. The user of these modes simply passes a single key, IV (or nonce), and plaintext. The mode will then produce the ciphertext and MAC tag. By combining both tasks into a single step, the entire operation is much easier to implement.
The catalyst for these modes comes from two major sources. The first is to extract any performance benefits to be had from combining the modes. The second is to make authentication more attractive to developers who tend to ignore it. You are more likely to find a product that encrypts data than to find one that authenticates data.
Security Goals
The security goals of encrypt and authenticate modes are to ensure the privacy and authenticity of messages. Ideally, breaking one should not weaken the other. To achieve these goals, most combined modes require a secret key long enough such that an attacker could not guess it. They also require a unique IV per invocation to ensure replay attacks are not possible. These unique IVs are often called nonces in this context. The term nonce is commonly read as “N once,” meaning to use the value N once and only once.
We will see later in this chapter that we can use the nonce as a packet counter when the secret key is randomly generated. This allows for ease of integration into existing protocols.
Standards
Even though encrypt and authenticate modes are relatively new, there are still a few good standards covering their design. In May 2004, NIST specified CCM as SP 800-38C, the first NIST encrypt and authenticate mode. Specified as a mode of operation for block ciphers, it was intended to be used with a NIST block cipher such as AES. CCM was selected as the result of a design contest in which various proposals were sought out. Among the more likely contestants to win were Galois Counter Mode (GCM), EAX mode, and CCM.
GCM was designed originally to be put to use in various wireless standards such as 802.16 (WiMAX), and later submitted to NIST for the contest. At the time of this writing, GCM is not yet a NIST standard (it is proposed as SP 800-38D), but as it is used through IEEE wireless standards it is a good algorithm to know about. GCM strives to achieve hardware performance by being massively parallelizable. In software, as we shall see, GCM can achieve high performance levels with the suitable use of the processor’s cache.
Finally, EAX mode was proposed after the submission of CCM mode to address some of the shortcomings in the design. In particular, EAX mode is more flexible in terms of how it can be used and strives for higher performance (which turns out to not be true in practice). EAX mode is actually a properly constructed wrapper around CTR encryption mode and CMAC authentication mode. This makes the security analysis easier, and the design more worthy of attention. Unfortunately, EAX was not, and is currently not, considered for standardization. Despite this, EAX is still a worthy mode to know about and understand.
Design and Implementation
We shall consider the design, implementation, and optimization of three popular algorithms. We will first explore the GCM algorithm, which has already found practical use in the IEEE 802 series of standards. The reader should take particular interest in this design, as it is also likely to become a NIST standard. After GCM, we will explore the design of CCM, the only NIST standardized mode at the time of this writing. CCM is both efficient and secure, making it a mode worth using and knowing about.
Additional Authentication Data
All three algorithms include an input known as the additional authentication data (AAD, also known as header data in CCM). This allows the implementer to include data that accompanies the ciphertext, and must be authenticated but does not have to be encrypted; for example, metadata such as packet counters, timestamps, user and host names, and so on.
AAD is unique to these modes and is handled differently in all three. In particular, EAX has the most flexible AAD handling, while GCM and CCM are more restrictive. All three modes accept empty AAD strings, which allows developers to ignore the AAD facilities if they do not need them.
Design of GCM
GCM (Galois Counter Mode) is the design of David McGrew and John Viega. It is the product of universal hashing and CTR mode encryption for security. The original motivation for GCM mode was fast hardware implementation. As such, GCM employs GF(2^128) multiplication, which can be efficient in typical FPGA and other hardware implementations.
To properly discuss GCM, we have to unravel an implementer’s worst nightmare: bit ordering. That is, which bit is the most significant bit, how are they ordered, and so on. It turns out that GCM is not one of the most straightforward designs in this respect. Once we get past the Galois field math, the rest of GCM is relatively easy to specify.
GCM GF(2) Mathematics
GCM employs multiplications in the field GF(2^128), constructed as GF(2)[x]/v(x), to perform a function it calls GHASH. Effectively, GHASH is a form of universal hashing, which we will discuss next. The multiplication we are performing here is not any different in nature than the multiplications used within the AES block cipher. The only differences are the size of the field and the irreducible polynomial used.
GCM uses a bit ordering that does not seem normal upon first inspection. Instead of storing the coefficients of the polynomials from the least significant bit upward, they store them backward. For example, from AES we would see that the polynomial p(x) = x^7 + x^3 + x + 1 would be represented by 0x8B. In the GCM notation, the bits are reversed: x^7 would be 0x01 instead of 0x80, so our polynomial p(x) would be represented as 0xD1 instead. In effect, the bits within each byte are in little endian fashion. The bytes themselves are arranged in big endian fashion, which further complicates things. That is, byte number 15 is the least significant byte, and byte number 0 is the most significant byte.
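The bit reversal in the 0x8B versus 0xD1 example can be reproduced with a short helper. The sketch below is illustrative only; reverse_bits8() and gcm_coeff_mask() are hypothetical names and are not part of LibTomCrypt.

```c
#include <assert.h>

/* Reverse the order of the 8 bits in a byte, mapping the AES-style
 * coefficient order to the GCM-style order (and back again). */
unsigned char reverse_bits8(unsigned char v)
{
    unsigned char r = 0;
    int i;
    for (i = 0; i < 8; i++) {
        r = (unsigned char)((r << 1) | (v & 1)); /* lowest bit of v becomes next bit of r */
        v >>= 1;
    }
    return r;
}

/* Mask selecting the coefficient of x^degree (0..7) within a byte when
 * the byte is stored in GCM order: x^0 is 0x80 and x^7 is 0x01. */
unsigned char gcm_coeff_mask(int degree)
{
    assert(degree >= 0 && degree <= 7);
    return (unsigned char)(0x80 >> degree);
}
```

Calling reverse_bits8(0x8B) yields 0xD1, the GCM form of x^7 + x^3 + x + 1 given in the text.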
The multiplication routine is then implemented with the following routines:
static void gcm_rightshift(unsigned char *a)
This performs what GCM calls a right shift operation. Numerically, it is equivalent to a left shift (multiplication by 2), but since we order the bits in each byte in the opposite direction, we use a right shift to perform this. We are shifting from byte 15 down to byte 0.
static const unsigned char mask[] = {
0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};
static const unsigned char poly[] = { 0x00, 0xE1 };
The mask is a simple way of masking off bits in the byte in reverse order. The poly array is the least significant byte of the polynomial: the first element is a zero, and the second element is the byte of the polynomial. In this case, 0xE1 maps to p(x) = x^128 + x^7 + x^2 + x + 1, where the x^128 term is implicit.
void gcm_gf_mult(const unsigned char *a,
                 const unsigned char *b, unsigned char *c) {
   ...
         }
      }
      z = V[15] & 0x01;
      gcm_rightshift(V);
      V[0] ^= poly[z];
   }
   memcpy(c, Z, 16);
}
This routine accomplishes the operation c = ab in the Galois field chosen by GCM. It effectively is the same algorithm we used for the multiplication in AES, except here we are using an array of bytes to represent the polynomials. We use Z to accumulate the product as we produce it. We use V as a copy of a, which we can double and selectively add to Z based on the bits of b.
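The bit-serial multiplication described above can be sketched in full as follows. This is a minimal reconstruction that follows the description of Z, V, mask, and poly in this section; it is not a verbatim copy of the LibTomCrypt source, and the body of gcm_rightshift() and the start of the main loop are assumptions filled in from that description.

```c
#include <assert.h>
#include <string.h>

/* Shift the 128-bit value one bit toward byte 15 (GCM's "right shift"). */
static void gcm_rightshift(unsigned char *a)
{
    int x;
    for (x = 15; x > 0; x--) {
        a[x] = (unsigned char)((a[x] >> 1) | ((a[x - 1] << 7) & 0x80));
    }
    a[0] >>= 1;
}

static const unsigned char mask[] = {
    0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};
static const unsigned char poly[] = { 0x00, 0xE1 };

/* c = a * b in GF(2^128) using the GCM bit order, one bit of b at a time. */
void gcm_gf_mult(const unsigned char *a,
                 const unsigned char *b, unsigned char *c)
{
    unsigned char Z[16], V[16];
    unsigned x, y, z;

    assert(a != NULL && b != NULL && c != NULL);
    memset(Z, 0, 16);              /* Z accumulates the product */
    memcpy(V, a, 16);              /* V holds a * x^i for the current bit i */
    for (x = 0; x < 128; x++) {
        if (b[x >> 3] & mask[x & 7]) {
            for (y = 0; y < 16; y++) {
                Z[y] ^= V[y];      /* add V to the product if bit x of b is set */
            }
        }
        z = V[15] & 0x01;          /* the term that is about to overflow */
        gcm_rightshift(V);         /* V = V * x */
        V[0] ^= poly[z];           /* reduce modulo p(x) when needed */
    }
    memcpy(c, Z, 16);
}
```

Because the result is built in Z and copied out at the end, c may point to the same buffer as a or b.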
This multiplication routine accomplishes numerically what we require, but is horribly slow. Fortunately, there is more than one way to multiply field elements. As we shall see during the implementation phase, a table-based multiplication routine will be far more profitable.
The curious reader may wish to examine the GCM source of LibTomCrypt for the variety of tricks that are optionally used depending on the configuration. In addition to the previous routine, LibTomCrypt provides an alternative to gcm_gf_mult() (see src/encauth/gcm/gcm_gf_mult.c in LibTomCrypt) that uses a windowed multiplication on whole words (Darrel Hankerson, Alfred Menezes, Scott Vanstone, “Guide to Elliptic Curve Cryptography,” p. 50, Algorithm 2.36). This becomes important during the setup phase of GCM, even when we use a table-based multiplication routine for bulk data processing. Before we can show you a table-based multiplication routine, we must show you the constraints on GCM that make this possible.
Universal Hashing
Universal hashing is a method of creating a function f(x) such that for distinct values of x and y, the probability of f(x) = f(y) is that of any proper random function. The simplest example of such a universal hash is the mapping
f(x) = ((ax + b) mod p) mod n
for random values of a and b and random primes p and n (n < p). Universal MAC functions, such as those in GCM (and other algorithms such as Daniel Bernstein’s Poly1305), use a variation of this to achieve a secure MAC function:
H[i] = (H[i - 1] * K) + M[i]
where the last H[i] value is the tag, K is a unit in a finite field and the secret key, and M[i] is a block of the message. The multiplication and addition must be performed in a finite field of considerable size (e.g., 2^128 elements or more). In the case of GCM, we will create the MAC functionality, called GHASH, with this scheme using our GF(2^128) multiplication routine.
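To make the recurrence H[i] = (H[i - 1] * K) + M[i] concrete, here is a toy version over the integers modulo the prime 2^31 - 1. It is only a sketch: the field is far too small to be secure, it is not how GHASH or Poly1305 are actually defined, and the name toy_poly_hash() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_P 2147483647u   /* the prime 2^31 - 1; far too small for real use */

/* Evaluate H[i] = (H[i-1] * K) + M[i] in the integers modulo TOY_P,
 * starting from H[0] = 0. The final H value plays the role of the tag. */
uint32_t toy_poly_hash(uint32_t key, const uint32_t *msg, size_t nblocks)
{
    uint64_t h = 0;
    size_t i;

    assert(key > 0 && key < TOY_P);   /* K must be a nonzero field element */
    for (i = 0; i < nblocks; i++) {
        /* h and key are both below 2^31, so h * key fits in 64 bits */
        h = (h * key + (msg[i] % TOY_P)) % TOY_P;
    }
    return (uint32_t)h;
}
```

With K = 3 and message blocks 1, 2, 3 the successive values are 1, 5, and 18, which shows how each block is folded into the running value.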
GCM Definitions
The entire GCM algorithm can be specified by a series of equations. First, let us define the various symbols we will be using in the equations (Figure 7.1).
■ Let K represent the secret key.
■ Let A represent the additional authentication data; there are m blocks of data in A.
■ Let P represent the plaintext; there are n blocks of data in P.
■ Let C represent the ciphertext.
■ Let Y represent the CTR counters.
■ Let T represent the MAC tag.
■ Let E(K, P) represent the encryption of P with the secret key K and block cipher E (e.g., E = AES).
■ Let IV represent the IV for the message to be processed.
Figure 7.1 GCM Data Processing
Input
   P: Plaintext
   K: Secret Key
   A: Additional Authentication Data
   IV: GCM Initial Vector
1. H = E(K, 0^128)
2. Y_0 = IV || 0^31 || 1 if the IV is 96 bits long; otherwise Y_0 = GHASH(H, {}, IV)
3. Y_i = Y_(i-1) + 1, for i = 1, ..., n
4. C_i = P_i XOR E(K, Y_i), for i = 1, ..., n - 1
5. C_n = P_n XOR E(K, Y_n), truncated to the length of P_n
6. T = GHASH(H, A, C) XOR E(K, Y_0)
7. Return C and T
The first step is to generate the universal MAC key H, which is used solely in the GHASH function. Next, we need an IV for the CTR mode. If the user-supplied IV is 96 bits long, we use it directly by padding it with 31 zero bits and 1 one bit. Otherwise, we apply the GHASH function to the IV and use the returned value as the CTR IV.
Once we have H and the initial Y_0 value, we can encrypt the plaintext. The encryption is performed in CTR mode using the counter in big endian fashion. Oddly enough, the bits per byte of the counter are treated in the normal ordering. The last block of ciphertext is not expanded if it does not fill a block with the cipher. For instance, if P_n is 32 bits, the output of E(K, Y_n) is truncated to 32 bits, and C_n is the 32-bit XOR of the two values.
Finally, the MAC tag is produced by performing the GHASH function on the additional authentication data and ciphertext. The output of GHASH is then XORed with the encryption of the initial Y_0 value. Next, we examine the GHASH function (Figure 7.2).
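The counter handling described above can be sketched without a block cipher. The helpers below are hypothetical names, not LibTomCrypt functions. They assume a 96-bit IV and, following the GCM specification, increment only the last 32 bits of the counter in big endian order; the keystream block E(K, Y_i) is assumed to be supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Form Y_0 from a 96-bit IV: IV || 31 zero bits || 1 one bit. */
void gcm_y0_from_iv96(const unsigned char iv[12], unsigned char y0[16])
{
    assert(iv != NULL && y0 != NULL);
    memcpy(y0, iv, 12);
    y0[12] = 0x00;
    y0[13] = 0x00;
    y0[14] = 0x00;
    y0[15] = 0x01;
}

/* Y_i = Y_(i-1) + 1: big endian increment of the last 32 bits. */
void gcm_increment_counter(unsigned char y[16])
{
    int i;
    assert(y != NULL);
    for (i = 15; i >= 12; i--) {
        y[i] = (unsigned char)((y[i] + 1) & 0xFF);
        if (y[i] != 0) {
            break;                 /* no carry into the next byte */
        }
    }
}

/* C = P XOR keystream for a block of len <= 16 bytes; a short final
 * block simply uses the first len bytes of the keystream. */
void gcm_xor_block(const unsigned char *pt, const unsigned char keystream[16],
                   unsigned char *ct, size_t len)
{
    size_t i;
    assert(pt != NULL && ct != NULL && len <= 16);
    for (i = 0; i < len; i++) {
        ct[i] = (unsigned char)(pt[i] ^ keystream[i]);
    }
}
```

For a 32-bit final block, gcm_xor_block() would be called with len set to 4, which is the truncation described in the text.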
Figure 7.2 GCM GHASH Function
Input
   H: Secret Parameter (derived from the secret key)
   A: Additional Authentication Data (m blocks)
   C: Ciphertext (also used as an additional input source, n blocks)
The GHASH function compresses the additional authentication data and ciphertext to a final MAC tag. The multiplication by H is a GF(2^128) multiplication as mentioned earlier. The length encodings are 64-bit big endian strings concatenated to one another. The length of A is stored in the first 8 bytes and the length of C in the last 8.
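The GHASH description above can be sketched as follows. This is a minimal version under stated assumptions: it uses a slow bit-serial multiplier, zero pads the last block of A and of C, and encodes the two lengths in bits, as in the GCM specification. The names gf_mult(), ghash_update(), store64(), and ghash() are hypothetical and do not correspond to LibTomCrypt functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* c = a * b in GF(2^128), GCM bit order (bit-serial, slow but simple). */
static void gf_mult(const unsigned char *a, const unsigned char *b,
                    unsigned char *c)
{
    unsigned char Z[16] = {0}, V[16];
    int i, j, carry;

    memcpy(V, a, 16);
    for (i = 0; i < 128; i++) {
        if (b[i >> 3] & (0x80 >> (i & 7))) {
            for (j = 0; j < 16; j++) Z[j] ^= V[j];
        }
        carry = V[15] & 1;
        for (j = 15; j > 0; j--) {
            V[j] = (unsigned char)((V[j] >> 1) | ((V[j - 1] & 1) << 7));
        }
        V[0] >>= 1;
        if (carry) V[0] ^= 0xE1;
    }
    memcpy(c, Z, 16);
}

/* X = (X XOR block) * H for every 16-byte block of data; a short final
 * block is treated as if it were padded with zero bytes. */
static void ghash_update(unsigned char X[16], const unsigned char H[16],
                         const unsigned char *data, size_t len)
{
    size_t i, n;
    while (len > 0) {
        n = (len < 16) ? len : 16;
        for (i = 0; i < n; i++) X[i] ^= data[i];
        gf_mult(X, H, X);
        data += n;
        len  -= n;
    }
}

/* Store a 64-bit value in big endian order. */
static void store64(unsigned char *out, unsigned long long v)
{
    int i;
    for (i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* out = GHASH(H, A, C); alen and clen are byte counts. */
void ghash(const unsigned char H[16],
           const unsigned char *A, size_t alen,
           const unsigned char *C, size_t clen,
           unsigned char out[16])
{
    unsigned char X[16] = {0}, L[16];

    assert(H != NULL && out != NULL);
    ghash_update(X, H, A, alen);
    ghash_update(X, H, C, clen);
    store64(L, (unsigned long long)alen * 8);      /* length of A in bits */
    store64(L + 8, (unsigned long long)clen * 8);  /* length of C in bits */
    ghash_update(X, H, L, 16);
    memcpy(out, X, 16);
}
```

The MAC tag of Figure 7.1 would then be this output XORed with E(K, Y_0), which requires a block cipher and is left out of the sketch.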
To demonstrate GCM, we used the implementation of LibTomCrypt. This implementation is public domain, freely accessible on the project’s Web site, optimized, and easy to follow. We will omit various administrative portions of the code to reduce the size of the code listings. Readers are strongly encouraged to use the routines found in LibTomCrypt (or similar libraries) instead of rolling their own if they can get away with it.
Interface
Our GCM interface has several functions that we will discuss in turn. The high level of abstraction allows us to use the GCM implementation to the full flexibility warranted by the GCM specification. The functions we will discuss are:
1. gcm_gf_mult()  Generic GF(2^128) multiplication
2. gcm_mult_h()   Multiplication by H (usually optimized since H is fixed after setup)
3. gcm_init()     Initialize a GCM state
4. gcm_add_iv()   Add IV data to the GCM state
5. gcm_add_aad()  Add AAD to the GCM state
6. gcm_process()  Add plaintext to the GCM state
7. gcm_done()     Terminate the GCM state and return the MAC tag
These functions all combine to allow a caller to process a message through the GCM algorithm. For any message, the functions 3 through 7 are meant to be called in that order to process the message. That is, one must add the IV before the AAD, and the AAD before the plaintext. GCM does not allow for processing the distinct data elements in other orders. For example, you cannot add AAD before the IV. The functions can be called multiple times as long as the order of the appearance is intact. For example, you can call gcm_add_iv() twice before calling gcm_add_aad() for the first time.
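The ordering rule can be pictured as a small state machine. The sketch below is not the LibTomCrypt implementation; demo_state, demo_init(), demo_add(), and the DEMO_MODE_* values are hypothetical stand-ins that only model which kind of data may be added next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the modes a GCM state moves through. */
enum { DEMO_MODE_IV = 0, DEMO_MODE_AAD = 1, DEMO_MODE_TEXT = 2 };

typedef struct {
    int mode;      /* kind of data most recently accepted */
    int iv_seen;   /* nonzero once any IV data has been added */
} demo_state;

void demo_init(demo_state *s)
{
    assert(s != NULL);
    s->mode    = DEMO_MODE_IV;
    s->iv_seen = 0;
}

/* Returns 0 if data of the requested kind may be added now, -1 if the
 * call is out of order. Repeating the current kind is allowed, moving
 * forward (IV, then AAD, then text) is allowed, going back is not. */
int demo_add(demo_state *s, int kind)
{
    assert(s != NULL);
    if (kind < DEMO_MODE_IV || kind > DEMO_MODE_TEXT) return -1;
    if (kind < s->mode) return -1;                       /* cannot go backward */
    if (kind != DEMO_MODE_IV && !s->iv_seen) return -1;  /* IV must come first */
    if (kind == DEMO_MODE_IV) s->iv_seen = 1;
    s->mode = kind;
    return 0;
}
```

In this model the AAD step can be skipped entirely, which mirrors the fact that the modes accept empty AAD strings.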
All the functions make use of the structure gcm_state, which contains the current working state of the GCM algorithm. It fully determines how the functions should behave, which allows the functions to be fully thread safe (Figure 7.3).
Figure 7.3 GCM State Structure
   ...
                 Y_0[16],  /* initial counter */
                 buf[16];  /* buffer for stuff */
   int           cipher,   /* which cipher */
                 ivmode,   /* Which mode is the IV in? */
                 mode,     /* mode the GCM code is in */
                 buflen;   /* length of data in buf */
   ulong64       totlen,   /* 64-bit counter used for IV and AAD */
                 pttotlen; /* 64-bit counter for the PT */
Table 7.1 gcm_state Members and Their Functions
Member Name   Purpose
K             Scheduled cipher key, used to encrypt counters.
Y             CTR mode counter value (incremented as text is processed).
Y_0           The initial counter value used to encrypt the GHASH output.
buf           Used in various places; for example, holds the encrypted counter values.
cipher        ID of which cipher we are using with GCM.
ivmode        Specifies whether we are working with a short IV. It is set to nonzero if the IV is longer than 12 bytes.
mode          Current mode GCM is in. Can be one of the following: GCM_MODE_IV, GCM_MODE_AAD, GCM_MODE_TEXT.
buflen        Current length of data in the buf array.
totlen        Total length of the IV and AAD data.
pttotlen      Total length of the plaintext.
PC            A 16x256x16 table such that PC[i][j][k] is the kth byte of H * j * x^(8i) in GF(2^128). This table is pre-computed by gcm_init() based on the secret H value to accelerate the multiplication by H required by the GHASH function.
The PC table is an optional table only included if GCM_TABLES was defined at build time. As we will see shortly, it can greatly speed up the processing of data through GHASH; however, it requires a 64 kilobyte table, which could easily be prohibitive in various embedded platforms.
GCM Generic Multiplication
The following code implements the generic GF(2^128) multiplication required by GCM. It is designed to work with any multiplier values and is not optimized to the GHASH usage pattern of multiplying by a single value (H).
gcm_gf_mult.c:
/* this is x*2^128 mod p(x) ... the results are 16 bytes
 * each stored in a packed format.  Since only the
 * lower 16 bits are not zero'ed I removed the upper 14 bytes */
const unsigned char gcm_shift_table[256*2] = {
0x00, 0x00, 0x01, 0xc2, 0x03, 0x84, 0x02, 0x46,
0x07, 0x08, 0x06, 0xca, 0x04, 0x8c, 0x05, 0x4e,
...
0xbc, 0xf8, 0xbd, 0x3a, 0xbf, 0x7c, 0xbe, 0xbe };
This table contains the residue of the value of k * x^128 mod p(x) for all 256 values of k. Since the value of p(x) is sparse, only the lower two bytes of the residue are nonzero. As such, we can compress the table. Every pair of bytes are the lower two bytes of the residue for the given value of k. For instance, gcm_shift_table[4] and gcm_shift_table[5] are the value of the least significant bytes of 2 * x^128 mod p(x).
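The entries of this table can be regenerated rather than typed in. The sketch below is an illustration, not the method used to produce the LibTomCrypt table: build_shift_table() is a hypothetical name, and the derivation assumes the GCM bit order, in which the byte mask 0x80 >> i selects the coefficient of x^i, so that each set bit contributes the 16-bit pattern 0xE100 shifted right by i.

```c
#include <assert.h>

/* Fill table[2k] and table[2k+1] with the two nonzero bytes of
 * k * x^128 mod p(x), where p(x) = x^128 + x^7 + x^2 + x + 1. */
void build_shift_table(unsigned char table[512])
{
    unsigned k, bit, acc;

    assert(table != 0);
    for (k = 0; k < 256; k++) {
        acc = 0;
        for (bit = 0; bit < 8; bit++) {
            /* In GCM bit order, mask 0x80 >> bit is the coefficient of
             * x^bit. Since x^128 = x^7 + x^2 + x + 1 (mod p), the term
             * x^bit * x^128 reduces to the pattern 0xE100 >> bit. */
            if (k & (0x80u >> bit)) {
                acc ^= 0xE100u >> bit;
            }
        }
        table[2 * k]     = (unsigned char)(acc >> 8);
        table[2 * k + 1] = (unsigned char)(acc & 0xFF);
    }
}
```

The generated values agree with the entries that are visible in the listing, for example 0x01, 0xc2 for k = 1 and 0xbe, 0xbe for k = 255.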
This table is only used if LTC_FAST is defined. This define instructs the implementation to use fast parallel XOR operations on words instead of on the byte level. In our case, we can exploit it to perform the generic multiplication much faster.
/**
   GCM GF multiplier (internal use only) bitserial
   @param a First value
   @param b Second value
   @param c Destination for a * b
 */
void gcm_gf_mult(const unsigned char *a,
                 const unsigned char *b,
...
#else

/* map normal numbers to "ieee" way ... e.g. bit reversed */
#define M(x) (((x&8)>>3) | ((x&4)>>1) | ((x&2)<<1) | ((x&1)<<3))
LTC_FAST_TYPE is the word type used for these operations; on some platforms, it is an unsigned long. The data type has to overlap perfectly with the unsigned char data type. It is used to allow parallel XOR operations.
The BPD macro is the number of bits per LTC_FAST_TYPE. Clearly, this only works if CHAR_BIT is 8, which is why LTC_FAST is not enabled by default. The WPV macro is the number of words per 128-bit value plus a word.
/**
   GCM GF multiplier (internal use only) word oriented
   @param a First value
   @param b Second value
   @param c Destination for a * b
 */
void gcm_gf_mult(const unsigned char *a,
                 const unsigned char *b,
...
The B array contains the computed values of ka for k = 0...15. It allows us to perform a 4x128 multiplication with a table lookup. The tmp array contains the product (before it has been reduced). The pB array contains the loaded and converted copy of b with the appropriate treatment for the GCM order of the bits.
/* create simple tables */
...
The a value is loaded in big endian fashion to adhere to the GCM specs. The b value is loaded in the opposite fashion so we can use a more straightforward digit extraction expression.
In fact, we could load both as big endian, and merely rewrite the order in which we fetch nibbles to compensate.
/* now create 2, 4 and 8 */
B[M(2)][0] = B[M(1)][0] >> 1;
B[M(4)][0] = B[M(1)][0] >> 2;
B[M(8)][0] = B[M(1)][0] >> 3;
for (i = 1; i < (int)WPV; i++) {
   B[M(2)][i] = (B[M(1)][i-1] << (BPD-1)) | (B[M(1)][i] >> 1);
   B[M(4)][i] = (B[M(1)][i-1] << (BPD-2)) | (B[M(1)][i] >> 2);
   B[M(8)][i] = (B[M(1)][i-1] << (BPD-3)) | (B[M(1)][i] >> 3);
}
This block of code creates the entries for ax, ax^2, and ax^3. Note that we do not perform any reductions. This is why WPV has an extra word appended to it, since we are dealing with values that have more than 128 bits in them.
/* now all values with two bits which are
 * 3, 5, 6, 9, 10, 12 */
for (i = 0; i < (int)WPV; i++) {
   B[M(3)][i]  = B[M(1)][i] ^ B[M(2)][i];
   B[M(5)][i]  = B[M(1)][i] ^ B[M(4)][i];
   B[M(6)][i]  = B[M(2)][i] ^ B[M(4)][i];
   B[M(9)][i]  = B[M(1)][i] ^ B[M(8)][i];
   B[M(10)][i] = B[M(2)][i] ^ B[M(8)][i];
   B[M(12)][i] = B[M(8)][i] ^ B[M(4)][i];

   /* ...
    * 7, 11, 13, 14, 15 */
   B[M(7)][i]  = B[M(3)][i] ^ B[M(4)][i];
   B[M(11)][i] = B[M(3)][i] ^ B[M(8)][i];
   B[M(13)][i] = B[M(1)][i] ^ B[M(12)][i];
   B[M(14)][i] = B[M(6)][i] ^ B[M(8)][i];
   B[M(15)][i] = B[M(7)][i] ^ B[M(8)][i];
}
These two blocks construct the rest of the entries word per word. We first construct the values that only have two bits set (3, 5, 6, 9, 10, and 12), and then from those we construct the values that have three bits set, along with 15, the only value with four. Note the use of the M() macro, which evaluates to a constant at compile time.
zeromem(tmp, sizeof(tmp));

/* compute product four bits of each word at a time */
/* for each nibble */
for (i = (BPD/4)-1; i >= 0; i--) {
   /* for each word */
   for (j = 0; j < (int)(WPV-1); j++) {
      /* grab the 4 bits recall the nibbles are
         backwards so it's a shift by (i^1)*4 */
      u = (pB[j] >> ((i^1)<<2)) & 15;
Here we are extracting a nibble of b to multiply a by. Note the use of (i^1) to extract the nibbles in reverse order since GCM stores bits in each byte in reverse order.
      /* add offset by the word count the table
         looked up value to the result */
...
/* reduce by taking most significant byte and adding the
   appropriate two byte sequence 16 bytes down */
The reduction computes (kx^16)x^(j-16) by a table lookup and shift. This routine adds the residue of the product from the high byte to the lower bytes.
Each iteration of the preceding for loop removes one byte from the product at a time. We perform the shift inline by adding the lookup values to pTmp[i-16] and pTmp[i-15].
This word-oriented code is not 100-percent portable. It requires a data type that is a multiple of an unsigned char data type in size, which is not always guaranteed.
Now that we have a generic multiplier, we have to implement an optimized multiplier to be used by GHASH.
GCM Optimized Multiplication
The following multiplication routine is optimized solely for performing a multiplication by the
secret H value It takes advantage of the fact we can precompute tables for the multiplication.
gcm_mult_h.c:
/**
   GCM multiply by H
   @param gcm The GCM state which holds the H value
   @param I The value to multiply H by
 */
void gcm_mult_h(gcm_state *gcm, unsigned char *I)
{
...
   int x, y;
   XMEMCPY(T, &gcm->PC[0][I[0]][0], 16);
If GCM_TABLES has been defined, we will use the tables approach. The PC table contains 16 8x128 tables, one for each byte of the input and for each of their respective possible values. The first thing we must do is copy the 0th entry to T (our accumulator). The rest of the lookups will be XORed into this value.
Here we see the use of LTC_FAST to optimize parallel XOR operations. For each byte of I, the input, we look up the 128-bit value and XOR it against the accumulator. Since the entries in the table have already been reduced, our accumulator never grows beyond 128 bits.
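The table lookup idea can be demonstrated end to end with a small self-contained sketch. This is not the LibTomCrypt code: h_tables, h_tables_init(), and mult_h() are hypothetical names, the tables are built the slow way with one bit-serial multiplication per entry, and the XORs are done a byte at a time rather than with LTC_FAST words.

```c
#include <assert.h>
#include <string.h>

/* c = a * b in GF(2^128), GCM bit order (bit-serial reference version). */
static void gf_mult(const unsigned char *a, const unsigned char *b,
                    unsigned char *c)
{
    unsigned char Z[16] = {0}, V[16];
    int i, j, carry;

    memcpy(V, a, 16);
    for (i = 0; i < 128; i++) {
        if (b[i >> 3] & (0x80 >> (i & 7))) {
            for (j = 0; j < 16; j++) Z[j] ^= V[j];
        }
        carry = V[15] & 1;
        for (j = 15; j > 0; j--) {
            V[j] = (unsigned char)((V[j] >> 1) | ((V[j - 1] & 1) << 7));
        }
        V[0] >>= 1;
        if (carry) V[0] ^= 0xE1;
    }
    memcpy(c, Z, 16);
}

/* PC[i][j] holds H times the element whose byte i equals j and whose
 * other bytes are zero: 16 * 256 * 16 bytes = 64 kilobytes. */
typedef struct {
    unsigned char PC[16][256][16];
} h_tables;

void h_tables_init(h_tables *t, const unsigned char H[16])
{
    unsigned char B[16];
    int i, j;

    assert(t != NULL && H != NULL);
    for (i = 0; i < 16; i++) {
        for (j = 0; j < 256; j++) {
            memset(B, 0, 16);
            B[i] = (unsigned char)j;
            gf_mult(H, B, t->PC[i][j]);
        }
    }
}

/* I = I * H using 16 table lookups and XORs. */
void mult_h(const h_tables *t, unsigned char I[16])
{
    unsigned char T[16];
    int x, y;

    assert(t != NULL && I != NULL);
    memcpy(T, t->PC[0][I[0]], 16);          /* entry for byte 0 seeds T */
    for (x = 1; x < 16; x++) {
        for (y = 0; y < 16; y++) {
            T[y] ^= t->PC[x][I[x]][y];      /* entries are already reduced */
        }
    }
    memcpy(I, T, 16);
}
```

The approach works because multiplication by H is linear over XOR: the input is the XOR of its 16 single-byte components, so the product is the XOR of the 16 precomputed entries.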
/**
   ...
   @param gcm The GCM state to initialize
   @param cipher The index of the cipher to use
   @param key The secret key
   @param keylen The length of the secret key
   @return CRYPT_OK on success
 */
int gcm_init(gcm_state *gcm, int cipher,
             const unsigned char *key, int keylen)
{
This block of code initializes the GCM state to the default empty and zero state. After this point, we are ready to process IV, AAD, or plaintext (provided GCM_TABLES was not defined).
#ifdef GCM_TABLES
   /* setup tables */

   /* generate the first table as it has no shifting
    * (from which we make the other tables) */
...
   /* now generate the rest of the tables
    * based the previous table */