5.14 Parallelizing Encryption and Decryption in Arbitrary Modes (Breaking Compatibility)
5.14.1 Problem
You are using a cipher mode that is not intrinsically parallelizable, but you have a large data set and want to take advantage of multiple processors at your disposal.
5.14.2 Solution
Treat the data as multiple streams of interleaved data.
5.14.3 Discussion
Parallelizing encryption and decryption does not necessarily result in a speed improvement. To provide any chance of a speedup, you will certainly need to ensure that multiple processors are working in parallel. Even in such an environment, data sets may be too small to run faster when they are processed in parallel.
Recipe 5.13 demonstrates how to parallelize CTR mode encryption on a per-block level using a single encryption context. Instead of having spc_pctr_do_even( ) and spc_pctr_do_odd( ) share a key and nonce, you could use two separate encryption contexts. In such a case, there is no need to limit your choice of mode to one that is intrinsically parallelizable. However, note that you won't get the same results when using two separate contexts as you do when you use a single context, even if you use the same key and IV or nonce.
One consideration is how much to interleave. There's no need to interleave on a block level. For example, if you are using two parallel encryption contexts, you could encrypt the first 1,024 bytes of data with the first context, then alternate every 1,024 bytes.
Generally, it is best to use a different key for each context. You can derive multiple keys from a single base key, as shown in Recipe 4.11.
It's easiest to consider interleaving only at the plaintext level, particularly if you're using a block-based mode, where padding will generally be added for each cipher context. In such a case, you would send the encrypted data in multiple independent streams and reassemble it after decryption.
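The plaintext-level interleaving described above can be sketched as a pair of helpers: one that deals a buffer out into two streams in fixed-size chunks, and one that reassembles it. This is only an illustration under our own assumptions; the names demo_split( ) and demo_merge( ) are hypothetical and are not part of the recipe's code. Each stream would then be encrypted with its own context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Deal `in` out into two streams, switching streams every `chunk` bytes.
 * s0 receives chunks 0, 2, 4, ...; s1 receives chunks 1, 3, 5, ...
 * Each output buffer must be able to hold `len` bytes. */
void demo_split(const unsigned char *in, size_t len, size_t chunk,
                unsigned char *s0, size_t *l0,
                unsigned char *s1, size_t *l1) {
  size_t off = 0;
  int which = 0;

  assert(chunk > 0);
  *l0 = *l1 = 0;
  while (off < len) {
    size_t n = (len - off < chunk) ? len - off : chunk;
    if (which == 0) { memcpy(s0 + *l0, in + off, n); *l0 += n; }
    else            { memcpy(s1 + *l1, in + off, n); *l1 += n; }
    off += n;
    which ^= 1;
  }
}

/* Inverse of demo_split(): rebuild the original buffer from the two
 * (already decrypted) streams. `out` must hold l0 + l1 bytes. */
void demo_merge(const unsigned char *s0, size_t l0,
                const unsigned char *s1, size_t l1, size_t chunk,
                unsigned char *out) {
  size_t p0 = 0, p1 = 0, off = 0;
  int which = 0;

  assert(chunk > 0);
  while (p0 < l0 || p1 < l1) {
    if (which == 0) {
      size_t n = (l0 - p0 < chunk) ? l0 - p0 : chunk;
      memcpy(out + off, s0 + p0, n);
      p0 += n; off += n;
    } else {
      size_t n = (l1 - p1 < chunk) ? l1 - p1 : chunk;
      memcpy(out + off, s1 + p1, n);
      p1 += n; off += n;
    }
    which ^= 1;
  }
}
```

With a 2,500-byte message and 1,024-byte chunks, the first stream ends up with 1,476 bytes and the second with 1,024. The receiver needs to know the chunk size to reassemble the data.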
5.14.4 See Also
Recipe 4.11, Recipe 5.13
5.13 Parallelizing Encryption and Decryption in Modes That Allow It (Without Breaking Compatibility)
5.13.1 Problem
You want to parallelize encryption, decryption, or keystream generation.
5.13.2 Solution
Only some cipher modes are naturally parallelizable in a way that doesn't break compatibility. In particular, CTR mode is naturally parallelizable, as are decryption with CBC and CFB. There are two basic strategies: one is to treat the message in an interleaved fashion, and the other is to break it up into a single chunk for each parallel process.
The first strategy is generally more practical. However, it is often difficult to make either technique result in a speed gain when processing messages in software.
5.13.3 Discussion
Parallelizing encryption and decryption does not necessarily result in a speed improvement. To provide any chance of a speedup, you'll certainly need to ensure that multiple processors are working in parallel. Even in such an environment, data sets may be too small to run faster when they are processed in parallel.
potential for parallelization. For example, with CTR mode, the keystream is computed in blocks, where each block of keystream is generated by encrypting a unique plaintext block. Those blocks can be computed in any order.
In CBC, CFB, and OFB modes, encryption can't really be parallelized because the ciphertext for a block is necessary to create the ciphertext for the next block; thus, we can't compute ciphertext out of order. However, for CBC and CFB, when we decrypt, things are different. Because we only need the ciphertext of a block to decrypt the next block, we can decrypt the next block before we decrypt the first one.
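To make the CBC case concrete, here is a minimal sketch showing that each plaintext block depends only on two ciphertext blocks (or on the IV and one ciphertext block), so blocks can be decrypted in any order. Everything named demo_ or toy_ is our own assumption for illustration; in particular, the toy cipher is an insecure stand-in for a real block cipher and exists only so the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BLOCK 8

/* Toy invertible "block cipher": XOR with the key, then rotate each byte.
 * A stand-in for a real cipher. NOT secure. */
static void toy_encrypt(const unsigned char *key, const unsigned char *in,
                        unsigned char *out) {
  int i;
  for (i = 0; i < DEMO_BLOCK; i++) {
    unsigned char v = (unsigned char)(in[i] ^ key[i]);
    out[i] = (unsigned char)((v << 3) | (v >> 5));
  }
}

static void toy_decrypt(const unsigned char *key, const unsigned char *in,
                        unsigned char *out) {
  int i;
  for (i = 0; i < DEMO_BLOCK; i++) {
    unsigned char v = (unsigned char)((in[i] >> 3) | (in[i] << 5));
    out[i] = (unsigned char)(v ^ key[i]);
  }
}

/* Serial CBC encryption: C[b] = E(P[b] XOR C[b-1]), with C[-1] = IV. */
void demo_cbc_encrypt(const unsigned char *key, const unsigned char *iv,
                      const unsigned char *pt, size_t nblocks,
                      unsigned char *ct) {
  const unsigned char *prev = iv;
  unsigned char tmp[DEMO_BLOCK];
  size_t b;
  int i;

  assert(key != NULL && iv != NULL);
  for (b = 0; b < nblocks; b++) {
    for (i = 0; i < DEMO_BLOCK; i++) tmp[i] = pt[b * DEMO_BLOCK + i] ^ prev[i];
    toy_encrypt(key, tmp, ct + b * DEMO_BLOCK);
    prev = ct + b * DEMO_BLOCK;
  }
}

/* Decrypt block b on its own: P[b] = D(C[b]) XOR C[b-1].
 * No other plaintext is needed, so any thread can take any block. */
void demo_cbc_decrypt_block(const unsigned char *key, const unsigned char *iv,
                            const unsigned char *ct, size_t b,
                            unsigned char *pt_block) {
  const unsigned char *prev = (b == 0) ? iv : ct + (b - 1) * DEMO_BLOCK;
  unsigned char tmp[DEMO_BLOCK];
  int i;

  toy_decrypt(key, ct + b * DEMO_BLOCK, tmp);
  for (i = 0; i < DEMO_BLOCK; i++) pt_block[i] = tmp[i] ^ prev[i];
}
```

Calling demo_cbc_decrypt_block( ) for blocks 3, 2, 1, and 0, in that order, recovers the same plaintext as decrypting from the front.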
There are two reasonable strategies for parallelizing the work. When a message shows up all at once, you might divide it roughly into equal parts and handle each part separately. Alternatively, you can take an interleaved approach, where alternating blocks are handled by different threads. That is, the actual message is separated into two different plaintexts, as shown in Figure 5-5.
Figure 5-5. Encryption through interleaving
If done correctly, both approaches will result in the correct output. We generally prefer the interleaving approach, because all threads can do work with just a little bit of data available. This is particularly true in hardware, where buffers are small. With a noninterleaving approach, you must wait at least until
known in advance, you must wait for a large percentage of the data to show up before the second thread can be launched.
Even the interleaved approach is a lot easier when the size of the message is known in advance because it makes it easier to get the message all in one place. If you need the whole message to come in before you know the length, parallelization may not be worthwhile, because in many cases, waiting for an entire message to come in before beginning work can introduce enough latency to thwart the benefits of parallelization.
If you aren't generally going to get an entire message all at once, but you are able to determine the biggest message you might get, another reasonably easy approach is to allocate a result buffer big enough to hold the largest possible message.
For the sake of simplicity, let's assume that the message arrives all at once and you might want to process a message with two parallel threads. The following code provides an example API that can handle CTR mode encryption and decryption in parallel (remember that encryption and decryption are the same operation in CTR mode).
Because we assume the message is available up front, all of the information we need to operate on a message is passed into the function spc_pctr_setup( ), which requires a context object (here, the type is SPC_CTR2_CTX), the key, the key length in bytes, a nonce SPC_BLOCK_SZ - SPC_CTR_BYTES in length, the input buffer, the length of the message, and the output buffer. This function does not do any of the encryption and decryption, nor does it copy the input buffer anywhere.
To process the first block, as well as every second block after that, call spc_pctr_do_odd( ), passing in a pointer to the context object. Nothing else is required because the input and output buffers used are the ones passed to the
spc_memset(ctx->ctr_even + SPC_BLOCK_SZ - SPC_CTR_BYTES, 0, SPC_CTR_BYTES);
pctr_increment(ctx->ctr_even);
pctr_increment(ctx->ctr_even);
for (j = 0; j < SPC_BLOCK_SZ / sizeof(int); j++)
((int *)ctx->outptr_even)[j] ^= ((int *)ctx->inptr_even)[j];
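Because most of the recipe's listing is not reproduced here, the following is a separate, minimal sketch of the same interleaving idea rather than the spc_pctr_* code itself. It assumes a 16-byte block, a 6-byte big-endian counter that starts at zero, and a toy, insecure stand-in for the block cipher; all demo_ and toy_ names are hypothetical. One call handles every other block starting at block 0 and another handles every other block starting at block 1, so the two calls could run in separate threads without sharing any mutable state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BLOCK_SZ  16
#define DEMO_CTR_BYTES 6

/* Toy stand-in for a block cipher's encryption operation. NOT secure. */
static void toy_encrypt(const unsigned char *key, const unsigned char *in,
                        unsigned char *out) {
  unsigned char acc = 0x5a;
  int i;

  for (i = 0; i < DEMO_BLOCK_SZ; i++) {
    acc = (unsigned char)(acc * 31 + (in[i] ^ key[i]));
    out[i] = acc;
  }
  for (i = DEMO_BLOCK_SZ - 1; i >= 0; i--) {
    acc = (unsigned char)(acc * 17 + out[i]);
    out[i] = acc;
  }
}

/* Process blocks first, first + stride, first + 2 * stride, ... of a CTR
 * message. The counter block for block b is nonce || b (big-endian), so
 * every block can be handled independently of the others. */
void demo_ctr_blocks(const unsigned char *key, const unsigned char *nonce,
                     const unsigned char *in, unsigned char *out, size_t len,
                     size_t first, size_t stride) {
  unsigned char ctr[DEMO_BLOCK_SZ], ks[DEMO_BLOCK_SZ];
  size_t nblocks = (len + DEMO_BLOCK_SZ - 1) / DEMO_BLOCK_SZ;
  size_t b, i, off, n;

  assert(stride > 0);
  for (b = first; b < nblocks; b += stride) {
    memcpy(ctr, nonce, DEMO_BLOCK_SZ - DEMO_CTR_BYTES);
    for (i = 0; i < DEMO_CTR_BYTES; i++)
      ctr[DEMO_BLOCK_SZ - 1 - i] =
          (unsigned char)(((unsigned long long)b >> (8 * i)) & 0xff);
    toy_encrypt(key, ctr, ks);               /* one keystream block */
    off = b * DEMO_BLOCK_SZ;
    n = (len - off < DEMO_BLOCK_SZ) ? len - off : DEMO_BLOCK_SZ;
    for (i = 0; i < n; i++) out[off + i] = in[off + i] ^ ks[i];
  }
}
```

Calling demo_ctr_blocks( ) once with first = 0 and stride = 2, and once with first = 1 and stride = 2, fills the same output buffer that a single call with stride = 1 would produce, which is why this kind of parallelism does not break compatibility.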
4.11 Algorithmically Generating Symmetric Keys from One Base Secret
4.11.1 Problem
You want to generate a key to use for a short time from a long-term secret (generally a key, but perhaps a password). If a short-term key is compromised, it should be impossible to recover the base secret. Multiple entities in the system should be able to compute the same derived key if they have the right base secret.
For example, you might want to have a single long-term key and use it to create daily encryption keys or session-specific keys.
derived (called a distinguisher) and pass those two items through a pseudo-random function. The PRF acts very much like a cryptographic one-way hash from a theoretical security point of view, and indeed, such a one-way hash is often good as a PRF.
derivation, ranging from the simple to the complex. On the simple side of the spectrum, you can concatenate a base key with unique data and pass the string through SHA1. On the complex side is the PBKDF2 function from PKCS #5 (described in Recipe 4.10).
purpose requirements. In particular, there are cases where one might need a key that is larger than the SHA1 output
generated
Fortunately, it is easy to build an n-bit to m-bit PRF that is secure for key derivation. The big difficulty is often in selecting good distinguishers (i.e., information that differentiates parties). Generally, it is okay to send differentiating information that one side does not already have and cannot compute in the clear, because if an attacker tampers with the information in traffic, the two sides will not be able to agree on a working key. (Of course, you do need to be prepared to handle such attacks.) Similarly, it is okay to send a salt. See the sidebar, "Distinguisher Selection," for a discussion.
The basic idea behind a distinguisher is that it must be unique.
If you want to create a particular derived key, we recommend that you string together in a predetermined order any interesting information about that key, separating data items with a unique separation character (i.e., not a character that would be valid in one of the data items). You can use alternate formats, as long as your data representation is unambiguous, in that each possible distinguisher is generated by a single, unique set of information.
As an example, let's say you want to have a different session key that you change once a day. You could then use the date as a unique distinguisher. If you want to change keys every time there's a connection, the date is no longer unique. However, you could use the date concatenated with the number of times a connection has been established on that date. The two together constitute a unique value.
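The date-plus-connection-count example can be sketched as follows. The function name demo_make_distinguisher( ), the '|' separator, and the date format are our own assumptions for illustration; the only requirement taken from the text is that the separator never appears inside a data item, so that each distinguisher maps back to exactly one set of inputs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_SEP '|'   /* assumed never to be valid inside a data item */

/* Build "<date>|<connection count>" in out. Returns the string length,
 * or -1 if the date contains the separator or the buffer is too small. */
int demo_make_distinguisher(char *out, size_t outlen, const char *date,
                            unsigned int conn_count) {
  int n;

  assert(out != NULL && date != NULL);
  if (strchr(date, DEMO_SEP) != NULL) return -1;   /* would be ambiguous */
  n = snprintf(out, outlen, "%s%c%u", date, DEMO_SEP, conn_count);
  if (n < 0 || (size_t)n >= outlen) return -1;     /* truncated */
  return n;
}
```

For the third connection on a date written as 2003-07-14, this yields the string 2003-07-14|3. Rejecting a date that contains the separator is what keeps the representation unambiguous.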
There are many potential data items you might want to include in a distinguisher, and they do not have to be unique to be useful, as long as there is a guarantee that the distinguisher itself is unique Here is a list of some common data items you could use:
The encryption algorithm and any parameters for which the derived key will be used
The number of times the base key has been used, either overall or in the context of other interesting data items
A unique identifier corresponding to an entity in the system, such as a username or email address
Recipe 6.15 and Recipe 6.16)
More specifically, key HMAC with the base secret. Then, for every block of output you need (where the block size is the size of the HMAC output), MAC the distinguishers concatenated with a fixed-size counter at the end. The counter should indicate the number of blocks of output previously processed. The basic idea is to make sure that each MAC input is unique.
If the desired output length is not a multiple of the MAC output length, simply generate blocks until you have sufficient bytes, then truncate.
The security level of this solution is limited by the minimum of the number of bits of entropy in the base secret and the output size of the MAC. For example, if you use a key with 256 bits of entropy, and you use HMAC-SHA1 to produce a 256-bit derived key, never assume that you have more than 160 bits of effective security (that is the output size of HMAC-SHA1).
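The counter construction just described can be sketched independently of any particular MAC library. In the code below, the MAC is supplied through a callback; the names demo_derive_key( ), demo_mac_fn, and demo_toy_mac( ), the 4-byte big-endian counter, and the 256-byte limit on the distinguisher are all our own assumptions. The toy MAC is a placeholder that lets the sketch run without a crypto library and must never be used for real keys; a real program would supply a callback that wraps an HMAC implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAC_LEN  20    /* e.g., the output size of HMAC-SHA1 */
#define DEMO_MAX_DIST 256   /* arbitrary limit chosen for this sketch */

/* Hypothetical callback: writes DEMO_MAC_LEN bytes of MAC(key, msg) to tag. */
typedef void (*demo_mac_fn)(const unsigned char *key, size_t kl,
                            const unsigned char *msg, size_t ml,
                            unsigned char *tag);

/* Placeholder so the sketch is self-contained. NOT a real MAC. */
void demo_toy_mac(const unsigned char *key, size_t kl,
                  const unsigned char *msg, size_t ml, unsigned char *tag) {
  unsigned char acc = 0x3c;
  size_t i;

  for (i = 0; i < kl; i++) acc = (unsigned char)(acc * 33 + key[i]);
  for (i = 0; i < ml; i++) acc = (unsigned char)(acc * 33 + msg[i]);
  for (i = 0; i < DEMO_MAC_LEN; i++) {
    acc = (unsigned char)(acc * 33 + i + 1);
    tag[i] = acc;
  }
}

/* Fill out[0..ol) with MAC(base, dist || counter) blocks, where counter is
 * the number of blocks already produced. Returns 0 on success, -1 on error. */
int demo_derive_key(demo_mac_fn mac, const unsigned char *base, size_t bl,
                    const unsigned char *dist, size_t dl,
                    unsigned char *out, size_t ol) {
  unsigned char msg[DEMO_MAX_DIST + 4], tag[DEMO_MAC_LEN];
  unsigned long ctr = 0;
  size_t n;

  assert(mac != NULL);
  if (dl > DEMO_MAX_DIST) return -1;
  memcpy(msg, dist, dl);
  while (ol > 0) {
    /* Fixed-size counter appended after the distinguisher. */
    msg[dl]     = (unsigned char)((ctr >> 24) & 0xff);
    msg[dl + 1] = (unsigned char)((ctr >> 16) & 0xff);
    msg[dl + 2] = (unsigned char)((ctr >> 8) & 0xff);
    msg[dl + 3] = (unsigned char)(ctr & 0xff);
    mac(base, bl, msg, dl + 4, tag);
    n = (ol < DEMO_MAC_LEN) ? ol : DEMO_MAC_LEN;  /* truncate last block */
    memcpy(out, tag, n);
    out += n;
    ol -= n;
    ctr++;
  }
  return 0;
}
```

Asking for 32 bytes with a 20-byte MAC produces one full block followed by the first 12 bytes of a second block, which is the "generate, then truncate" rule from the text.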
SHA1, using the OpenSSL API for HMAC (discussed in Recipe 6.10):
void spc_make_derived_key(unsigned char *base, size_t bl, unsigned char *dist,
                          size_t dl, unsigned char *out, size_t ol) {
5.9.1 Problem
You want to use counter (CTR) mode and your library doesn't provide an interface, or you want to use a more high-level interface than your library provides. Alternatively, you would like a portable CTR interface, or you have only a block cipher implementation and you would like to use CTR mode.
5.9.2 Solution
CTR mode encrypts by generating keystream, then combining the keystream with the plaintext via XOR. This mode generates keystream one block at a time by encrypting plaintexts that are the same, except for an ever-changing counter, as shown in Figure 5-4. Generally, the counter value starts at zero and is incremented sequentially.
Figure 5-4. Counter (CTR) mode
Few libraries provide a CTR implementation, because it has only recently come into favor, despite the fact that it is a very old mode with great properties. We provide code implementing this
5.9.3 Discussion
You should probably use a higher-level abstraction, such as the one discussed in Recipe 5.16. Use a raw mode only when absolutely necessary, because there is a huge potential for introducing a security vulnerability by accident. If you still want to use CTR mode, be sure to use a message authentication code with it.
CTR mode is a stream-based mode. Encryption occurs by XOR'ing the keystream bytes with the plaintext bytes. The keystream is generated one block at a time by encrypting a plaintext block that includes a counter value. Given a single key, the counter value must be unique for every encryption.
This mode has many benefits over the "standard" modes (e.g., ECB, CBC, CFB, and OFB). However, we recommend a higher-level mode, one that provides stronger security guarantees (i.e., message integrity detection), such as CWC or CCM modes. Most high-level modes use CTR mode as a component.
In Recipe 5.4, we discuss the advantages and drawbacks of CTR mode and compare it to other popular modes.
Like most other modes, CTR mode requires a nonce (often called an IV in this context). Most modes use the nonce as an input to encryption, and thus require something the same size as the algorithm's block length. With CTR mode, the input to encryption is generally the concatenation of the nonce and a counter. The counter is usually at least 32 bits, depending on the maximum amount of data you might want to encrypt with a single {key, nonce} pair. We recommend using a good random value for the nonce.
In the following sections we present a reasonably optimized
cipher interface presented in Recipe 5.5. It also requires the spc_memset( ) function from Recipe 13.2. By default, we use a 6-byte counter, which leaves room for a nonce of SPC_BLOCK_SZ - 6 bytes. With AES and other ciphers with 128-bit blocks, this is sufficient space.
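As a rough illustration of that layout, the sketch below builds a 16-byte counter block from a 10-byte nonce followed by a 6-byte counter, and increments the counter portion with carry. The names demo_ctr_init( ) and demo_ctr_increment( ), the big-endian byte order, and the convention of reporting wraparound through the return value are our own assumptions, not the recipe's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BLOCK_SZ  16
#define DEMO_CTR_BYTES 6

/* Lay out the counter block as nonce || counter, with the counter zeroed.
 * The nonce must be DEMO_BLOCK_SZ - DEMO_CTR_BYTES (here, 10) bytes long. */
void demo_ctr_init(unsigned char *block, const unsigned char *nonce) {
  assert(block != NULL && nonce != NULL);
  memcpy(block, nonce, DEMO_BLOCK_SZ - DEMO_CTR_BYTES);
  memset(block + DEMO_BLOCK_SZ - DEMO_CTR_BYTES, 0, DEMO_CTR_BYTES);
}

/* Add one to the big-endian counter in the last DEMO_CTR_BYTES bytes,
 * leaving the nonce untouched. Returns 0 normally, or -1 if the counter
 * wrapped around to zero, at which point the {key, nonce} pair must not
 * be used for any more data. */
int demo_ctr_increment(unsigned char *block) {
  int i;

  for (i = DEMO_BLOCK_SZ - 1; i >= DEMO_BLOCK_SZ - DEMO_CTR_BYTES; i--)
    if (++block[i] != 0) return 0;   /* no carry out of this byte */
  return -1;
}
```

A 6-byte counter allows 2^48 blocks before it wraps; with 16-byte blocks, that is 2^52 bytes under a single {key, nonce} pair.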
CTR mode with 64-bit blocks is highly susceptible to birthday attacks unless you use a large random portion to the nonce, which limits the message you can send with a given key In short, don't use CTR mode with 64-bit block ciphers.
unsigned char *spc_ctr_decrypt(unsigned char *key, size_t kl, unsigned char *nonce, unsigned char *in, size_t il)
#include <string.h>
unsigned char *spc_ctr_encrypt(unsigned char *key, size_t kl, unsigned char *nonce, unsigned char *in, size_t il) {
The function spc_ctr_update( ) has the following signature:
int spc_ctr_update(CTR_CTX *ctx, unsigned char *in, size_t il, unsigned char *out);
This function has the following arguments:
Because this API is developed with PKCS #11 in mind, it's somewhat more low-level than it needs to be, and therefore is a bit difficult to use properly. First, you need to be sure the output buffer is big enough to hold the input; otherwise, you will have a buffer overflow. Second, you need to make sure the out argument always points to the first unused byte in the output buffer. Otherwise, you will keep overwriting the same data every time spc_ctr_update( ) outputs data.
5.5.1 Problem
You're trying to make one of our implementations for other block cipher modes work. They all use raw encryption operations as a foundation, and you would like to understand how to plug in third-party implementations.
5.5.2 Solution
Raw operations on block ciphers consist of three operations: key setup, encryption of a block, and decryption of a block. In other recipes, we provide three macros that you need to implement to use our code. In the discussion for this recipe, we'll look at several desirable bindings for these macros.
5.5.3 Discussion
Do not use raw encryption operations in your own designs! Such operations should only be used as a fundamental building block by skilled cryptographers.
Raw block ciphers operate on fixed-size chunks of data. That size is called the block size. The input and output are of this same fixed length. A block cipher also requires a key, which may be of a different length than the block size. Sometimes an algorithm will allow variable-length keys, but the block size is generally fixed.
into a key schedule. Basically, the key schedule is just a set of keys derived from the original key in a cipher-dependent manner. You need to create the key schedule only once; it's good for every use of the underlying key because raw encryption always gives the same result for any {key, input} pair (the same is true for decryption).
Once you have a key schedule, you can generally pass it, along with an input block, into the cipher encryption function (or the decryption function) to get an output block.
To keep the example code as simple as possible, we've written it assuming you are going to want to use one and only one cipher with it (though it's not so difficult to make the code work with multiple ciphers).
To get the code in this book working, you need to define several macros:
SPC_BLOCK_SZ
Denotes the block size of the cipher in bytes.
SPC_KEY_SCHED
This macro must be an alias for the key schedule type that goes along with your cipher. This value will be library-specific and can be implemented by typedef instead of through a macro. Note that the key schedule type should be an array of bytes of some fixed size, so that we can ask for the size of the key schedule using sizeof(SPC_KEY_SCHED).
SPC_ENCRYPT_INIT(sched, key, keybytes) and
SPC_DECRYPT_INIT(sched, key, keybytes)
Both of these macros take a pointer to a key schedule to write into, the key used to derive that schedule, and the
initializing for decryption are the same operation.
SPC_DO_ENCRYPT(sched, in, out) and
SPC_DO_DECRYPT(sched, in, out)
Both of these macros are expected to take a pointer to a key schedule and two pointers to memory corresponding to the input block and the output block. Both blocks are expected to be of size SPC_BLOCK_SZ.
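To show only the shape of such a binding, here is a minimal sketch that binds the macros to a toy cipher. The toy_ functions and the toy_key_sched type are hypothetical and deliberately insecure; they stand in for whatever key setup and block functions a real library provides. The macro names are the ones listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPC_BLOCK_SZ 8

/* Fixed-size key schedule, so sizeof(SPC_KEY_SCHED) is meaningful. */
typedef struct { unsigned char bytes[SPC_BLOCK_SZ]; } toy_key_sched;
typedef toy_key_sched SPC_KEY_SCHED;

/* Toy key setup: fold the key into the schedule. NOT secure. */
void toy_init(toy_key_sched *sched, const unsigned char *key, size_t keybytes) {
  size_t i;

  assert(sched != NULL && key != NULL && keybytes > 0);
  memset(sched->bytes, 0, sizeof(sched->bytes));
  for (i = 0; i < keybytes; i++) sched->bytes[i % SPC_BLOCK_SZ] ^= key[i];
}

/* Toy block encryption: XOR with the schedule, then rotate each byte. */
void toy_encrypt(const toy_key_sched *sched, const unsigned char *in,
                 unsigned char *out) {
  int i;
  for (i = 0; i < SPC_BLOCK_SZ; i++) {
    unsigned char v = (unsigned char)(in[i] ^ sched->bytes[i]);
    out[i] = (unsigned char)((v << 3) | (v >> 5));
  }
}

/* Toy block decryption: the exact inverse of toy_encrypt(). */
void toy_decrypt(const toy_key_sched *sched, const unsigned char *in,
                 unsigned char *out) {
  int i;
  for (i = 0; i < SPC_BLOCK_SZ; i++) {
    unsigned char v = (unsigned char)((in[i] >> 3) | (in[i] << 5));
    out[i] = (unsigned char)(v ^ sched->bytes[i]);
  }
}

/* The bindings: for this toy cipher, initializing for encryption and for
 * decryption happen to be the same operation. */
#define SPC_ENCRYPT_INIT(sched, key, keybytes) toy_init((sched), (key), (keybytes))
#define SPC_DECRYPT_INIT(sched, key, keybytes) toy_init((sched), (key), (keybytes))
#define SPC_DO_ENCRYPT(sched, in, out)         toy_encrypt((sched), (in), (out))
#define SPC_DO_DECRYPT(sched, in, out)         toy_decrypt((sched), (in), (out))
```

Code written against the macros never names the underlying cipher, so swapping in a real implementation should only require changing these definitions.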
In the following sections, we'll provide some bindings for these macros for Brian Gladman's AES implementation and for the OpenSSL API. Unfortunately, we cannot use Microsoft's CryptoAPI because it does not allow for exchanging symmetric encryption keys without encrypting them (see Recipe 5.26 and Recipe 5.27 to see how to work around this limitation), and that would add significant complexity to what we're trying to achieve with this recipe. In addition, AES is only available in the .NET framework, which severely limits portability across various Windows versions. (The .NET framework is available only for Windows XP and Windows .NET Server 2003.)
5.5.3.1 Brian Gladman's AES implementation
Brian Gladman has written the fastest freely available AES implementation to date. He has a version in x86 assembly that works with Windows and a portable C version that is faster than the assembly versions other people offer. It's available from his web page at http://fp.gladman.plus.com/AES/.
To bind his implementation to our macros, do the following:
Cipher    Header file      Key schedule type
RC2       openssl/rc2.h    RC2_KEY
Table 5-3 provides implementations of the SPC_ENCRYPT_INIT macro for each of the block ciphers listed in Table 5-2.
AES_set_decrypt_key(key, keybytes * 8, sched)
For IDEA:
13.2.1 Problem
You want to minimize the exposure of data such as passwords and cryptographic keys to local attacks.
13.2.2 Solution
You can only guarantee that memory is erased if you declare it to be volatile at the point where you write over it. In addition, you must not use an operation such as realloc( ) that may silently move sensitive data. In any event, you might also need to worry about data being swapped to disk; see Recipe 13.3.
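A minimal sketch of the volatile approach follows. The function name demo_secure_memset( ) is hypothetical, and this is not the recipe's own spc_memset( ) listing; it only illustrates writing through a volatile-qualified pointer so that the compiler treats every store as an observable side effect.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite len bytes at dst with the byte value c, writing through a
 * volatile pointer so the stores are not treated as dead code. */
volatile void *demo_secure_memset(volatile void *dst, int c, size_t len) {
  volatile unsigned char *p = (volatile unsigned char *)dst;

  assert(dst != NULL || len == 0);
  while (len--) p[len] = (unsigned char)c;
  return dst;
}
```

A caller would typically invoke it as demo_secure_memset(password, 0, sizeof(password)) as soon as the sensitive buffer is no longer needed.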
13.2.3 Discussion
Securely erasing data from memory is a lot easier in C and C++ than it is in languages where all memory is managed behind the programmer's back. There are still some nonobvious pitfalls, however.
One pitfall, particularly in C++, is that some API functions may silently move data behind the programmer's back, leaving behind a copy of the data in a different part of memory. The most prominent example in the C realm is realloc( ), which will sometimes move a piece of memory, updating the programmer's pointer. Yet the old memory location will generally still have the unaltered data, up until the point where the memory manager reallocates the data and the program overwrites the value.
get_password_from_user_somehow(user_password, sizeof(user_password));
result = !strcmp(user_password, real_password);
cdst = (volatile char *)dst;
csrc = (volatile char *)src;
while (len--) cdst[len] = csrc[len];
}
volatile void *spc_memmove(volatile void *dst, volatile void *src, size_t len) {
  size_t i;
4.10.1 Problem
You do not want passwords to be stored on disk. Instead, you would like to convert a password into a cryptographic key.
4.10.2 Solution
Use PBKDF2, the password-based key derivation function 2, specified in PKCS #5.[3]
[3] This standard is available from RSA Security at http://www.rsasecurity.com/rsalabs/pkcs/pkcs-5/.
You can also use this recipe to derive keys from other keys. See Recipe 4.1 for considerations; that recipe also discusses considerations for choosing good salt values.
4.10.3 Discussion
Passwords can generally vary in length, whereas symmetric keys are almost always a fixed size. Passwords may be