Cryptography for Internet and Database Applications: Developing Secret and Public Key Techniques with Java


Nick Galbreath

Cryptography for Internet and Database

Applications

Developing Secret and Public Key

Techniques with Java™


Developmental Editor: Adaobi Obi

Managing Editor: Micheline Frederick

New Media Editor: Brian Snapp

Text Design & Composition: Wiley Composition Services

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where Wiley Publishing, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

This book is printed on acid-free paper. ∞

Published by Wiley Publishing, Inc., Indianapolis, Indiana

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspointe Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4447, E-mail: permcoordinator@wiley.com.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data:

ISBN: 0-471-21029-3

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Contents

Preface
Introduction
Booleans and BitFields
Chars
Conversion of Integral Types to Byte Arrays
BigInteger
Blowfish
RC5
Rijndael and the Advanced Encryption Standard (AES)
Twofish
RC6
Cipher Block Chaining Mode (CBC)
Hashes
Collisions
Attacks
Algorithms
Hashed Message Authentication Codes (HMACs)
Summary
Public Key Encryption and Major PKCS Categories
Factoring
Elliptic Curves
Underlying Mathematics: Elliptic Curves
Other Public Key Cryptographic Systems
NTRU
Summary
Randomness and Security
Win32 CryptoAPI CryptGenRandom
Java and Random Numbers
java.util.Random
java.security.SecureRandom
Developer Issues
Reseeding
An Entropy Pool Implementation
Parameters, Keys, and Certificates
MessageDigest
MAC
SecureRandom
Ciphers
Additional Cipher-Related Objects
Signatures
SignedObject
Parameters, Keys, and Certificates
AlgorithmParameters
AlgorithmParameterGenerators
Keys
General Compression and java.util.zip.Deflate
Small Message Encryption
Encoding for Customer-Usable Data
Selecting a Base Representation
Encoding for Machines and Customer-Visible Applications
Base 128 and Java Source Encoding
Chapter 7 Application and Data Architecture
Database Architecture for Encrypted Data
Data
Passwords
Payment, Credit Card, and Other Account Numbers
Searching, Indexing, and Constraints
Null Values and Database Applications
Secure Memory Management in Java
Storage
Key Access and Distribution
Using Keys with Ciphers and MACs
Logging
Cryptographic Tokens and Applications
Expirations and Time Quantization
URL Tokens
A Simple URL MAC Implementation
URL Encryption
Decimal and Monetary Computations
BigDecimal
BigDecimal Alternatives and Wrappers
Appendix A Java Cryptography Class Reference

Preface

I wrote this book for software engineers with little or no exposure to cryptography. Most other books fall into one of two categories: the encyclopedia and description, or the purely API descriptive. The goal was to try to bridge the two by providing a solid introduction to cryptography while providing solid examples and uses. In addition, many books focus overwhelmingly on public key techniques. In my experience, the most common uses for public key ciphers are handled by third-party applications (VPNs, email) or off-the-shelf protocols such as SSL and SSH.

While there are a number of excellent cryptographic libraries in C and C++, Java serves as the reference since:

■■ It’s very popular for new business and server applications

classes of bugs (stack smashing and buffer overflows)

■■ It provides a standard cryptographic API that, while not perfect or complete, is about as “universal” as we can get

The cryptographic API for Java is scattered among several packages. Instead of listing classes by package, as is commonly done, Appendix A lists the classes alphabetically. I found this to be much more useful than flipping between different package sections.

I’ve tried to limit source code to examples that are relatively simple or that demonstrate main points or self-contained utilities. More complicated (and perhaps more useful) examples were excluded simply because the error and exception handling for a security application can be quite long and tedious and didn’t really add much value. I have always disliked CD-ROMs glued into books and, likewise, never found chapters and chapters of source code to be very useful. Instead, full source code and more examples are presented at the companion Web site at www.wiley.com/compbooks.galbreath.

Unfortunately, time and space didn’t permit examination of many topics. In particular:

■■ Privacy issues, including privacy “seals,” the Gramm-Leach-Bliley Act of 1999, and the HIPAA privacy rule

■■ More detailed database tips and techniques

Perhaps these will be detailed in future editions. Until then, I hope you find this book useful.

Finally, this book would not have been possible without the many fruitful conversations and support from Dave Andre, Jeff Bussgang, Ann Calvit, Roy Dixit, Jim Finucane, Bill French, Venkat Gaddipati, Nabil Hachem, Sam Haradhvala, Adam Hirsch, Steve Morris, Jack Orenstein, Rich O’Neil, Matt Rowe, and many others from Upromise, SightPath, and Open Market. Special thanks to my family and friends who had to put up with my social hiatus while working on this book. Thank you all.

Nick Galbreath

Plum Island, MA

June 2002

Introduction

The goal of the book is to present the systems programmer with a full introduction into the science and the application of cryptography using Java. Unlike many texts, this book does not focus on cryptography for transmission (allowing two parties to communicate securely). For most applications, those requirements are handled by the SSL or SSH protocols, or a third-party application. Instead, this book focuses on cryptography for storage, message integrity, and authentication (the so-called “single-point” techniques), which is more common with custom applications. Besides pure cryptography, many “auxiliary” techniques are covered that make cryptography useful.

The first chapter covers basic logical and numeric operations, both at the theoretical level and with the specifics of the Java numeric model. While it may be a review to many readers, I’ve found that many system programmers just do not have exposure to these low-level details. Either they never learned them or, more likely, the material has been forgotten because it just isn’t normally used in day-to-day work. Those readers who are more familiar with C or C++ should find this chapter especially useful for translating code one way or another from Java.

The next two chapters introduce the reader to the science, mathematics, and standards of both secret key and public key cryptography. Here you’ll learn why the algorithms work and the various tradeoffs between them. While the mathematics aren’t essential in day-to-day programming tasks, the goal was to make the reader familiar with the terminology that is often tossed around, so as to be able to make informed decisions.

Chapter 4 discusses random numbers and random number generators. While not technically cryptography, random numbers are essential to it. Those readers developing gaming systems should find this section especially interesting. Java is used for examples of algorithms, but they could be easily converted to any other programming language.

Finally, in Chapter 5 we introduce Java’s cryptographic APIs. The Java SDK already comes with some excellent documentation, but I’ve tried to pull it together, consolidating the JCA and JCE into a coherent format. Likewise, I made Appendix A the Java cryptography reference I wanted when I’m programming. Instead of listing classes by package, I list them all alphabetically. You’ll find you’ll do a lot less page flipping this way and get a much better understanding of the API with this organization.

Chapter 6 is on small message encoding and, like Chapter 4, isn’t technically cryptography, but it’s always an issue when using cryptography. Here you’ll learn how to convert binary data (perhaps encrypted) into various ASCII formats. This is critical when embedding data in a URL or creating passwords and cryptographic tokens.

Chapter 7 pulls everything together and discusses many topics: application and database design, working with passwords and tokens, key management, and logging.

Numerous source code examples are used, and the source itself can be found at the companion Web site www.modp.com. Many of the examples in the book will require a little reworking for production use. You’ll want to modify the error handling to suit your needs. In some cases, if the code is especially long and not particularly illuminating, I decided not to list it and instead just refer to the Web site. I find page after page of printed source code in a book to not be particularly useful.

Unfortunately, due to time and space requirements, a lot of important topics are not covered. Digital signatures are only mentioned, and they deserve more, but I’ve found that for basic application work, they just aren’t that useful. Either most people won’t use them, or there is already a trust relationship in place eliminating their usefulness, or encryption with a hash or MAC is preferable. And finally, when I do need to use them, the process is normally handled by a third-party application. Other topics not given their due are the new XML and SAML standards for creating documents that contain encrypted data, and embedding cryptography within the database (instead of the application), as the newer Oracle database can do. A lot more could be said for key management and database design as well. Perhaps future editions will remedy this.

Chapter 1 Bits and Bytes

Before getting into the details of cryptographic operators, we’ll review some basics of working with bits and number bases. For those of you who have worked on operating systems, low-level protocols, and embedded systems, the material in this chapter will probably be a review; for the rest of you, whose day-to-day interactions with information technology and basic software development don’t involve working directly with bits, this information will be an important foundation for the rest of the book.

General Operations

Most cryptographic functions operate directly on the machine representation of numbers. What follows are overviews of how numbers are presented in different bases, how computers internally represent numbers as a collection of bits, and how to directly manipulate bits. This material is presented in a computer- and language-generic way; the next section focuses specifically on the Java model.

Number Bases

In day-to-day operations, we represent numbers using base 10. Each digit is 0 through 9; thus, given a string of decimal digits $d_n d_{n-1} \ldots d_2 d_1 d_0$, the numeric value would be:

$$10^n d_n + 10^{n-1} d_{n-1} + \cdots + 10^2 d_2 + 10 d_1 + d_0$$

This can be generalized to any base $x$, where there are $x$ different digits and a number is represented by:

$$x^n d_n + x^{n-1} d_{n-1} + \cdots + x^2 d_2 + x d_1 + d_0$$

In computer science, the most important base is base 2, or the binary representation of the number. Here each digit, or bit, can either be 0 or 1. The decimal number 30 can be represented in binary as 11110, or 16 + 8 + 4 + 2. Also common is hexadecimal, or base 16, where the digits are 0 to 9 and A, B, C, D, E, and F, representing 10 to 15, respectively. The number 30 can now be represented in hexadecimal as 1E, or 16 + 14. The relationship between digits in binary, decimal, and hexadecimal is listed in Table 1.1.

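Table 1.1 Binary, Decimal, and Hexadecimal Representations

BINARY    DECIMAL    HEXADECIMAL
0000      0          0
0001      1          1
0010      2          2
0011      3          3
0100      4          4
0101      5          5
0110      6          6
0111      7          7
1000      8          8
1001      9          9
1010      10         A
1011      11         B
1100      12         C
1101      13         D
1110      14         E
1111      15         F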

When you are working with different bases, the base of a number may be ambiguous. For instance, is 99 the decimal or the hexadecimal 99 (= 9 × 16 + 9)? In this case, it’s common to prefix a hexadecimal number with 0x or just x (e.g., 99 becomes 0x99 or x99). The same confusion can happen with binary numbers: Is 101 the decimal 101 or the binary 101 (4 + 1 = 5)? When a number in binary may be confused, it’s customary to add a subscript 2 at the end of a string of binary digits, for example, $101_2$.

Any positive number can be represented in another base b; however, doing the conversion isn’t necessarily easy. We’ll discuss general-purpose base conversion in a later section, but it’s useful to note that conversion between two bases is especially easy if one of the bases is a power of the other. For instance, the decimal number 1234 has the canonical representation in base 10 as 1 × 1000 + 2 × 100 + 3 × 10 + 4. However, 1234 can also be thought of as two “digits” in base 100: (12) and (34), with a value of 12 × 100 + 34 × 1. It’s the same number with the same value; the digits have just been regrouped. This property is especially useful for base 2. Given a binary string, it’s possible to convert to hexadecimal by grouping 4 bits at a time and computing the hexadecimal value of each group:

10011100 = 1001 1100 = 9C
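As a minimal sketch of the grouping idea in Java, the helper below converts a binary string to hexadecimal four bits at a time. The class and method names (BaseConversionDemo, binaryToHex) are illustrative only and are not part of the book's source; the Integer methods are standard JDK calls.

```java
class BaseConversionDemo {
    /** Converts a binary string to hexadecimal by grouping 4 bits at a time. */
    static String binaryToHex(String bits) {
        // Left-pad with zeros so the length is a multiple of 4.
        int pad = (4 - bits.length() % 4) % 4;
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < pad; i++) {
            padded.append('0');
        }
        padded.append(bits);

        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < padded.length(); i += 4) {
            // Each group of 4 bits is exactly one hexadecimal digit.
            int nibble = Integer.parseInt(padded.substring(i, i + 4), 2);
            hex.append(Character.toUpperCase(Character.forDigit(nibble, 16)));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(30)); // 11110
        System.out.println(Integer.toHexString(30));    // 1e
        System.out.println(Integer.parseInt("101", 2)); // 5
        System.out.println(Integer.parseInt("99", 16)); // 153
        System.out.println(binaryToHex("10011100"));    // 9C
    }
}
```

For everyday work the built-in radix arguments of Integer.parseInt and Integer.toString are usually enough; the hand-written grouping is shown only to mirror the regrouping argument above.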

Bits and Bytes

A bit is the smallest unit of information that a computer can work on and can take two values, “1” and “0,” although sometimes, depending on context, the values of “true” and “false” are used. On most modern computers, we do not work directly on bits but on a collection of bits, sometimes called a word, the smallest of which is a byte. Today, a byte by default means 8 bits, but technically it can range from 4 to 10 bits. The odd values are from either old or experimental CPU architectures that aren’t really in use anymore. To be precise, many standards use octet to mean 8 bits, but we’ll use the more common byte. Modern CPUs operate on much larger word sizes: The term 32-bit microprocessor means the CPU operates primarily on 32-bit words in one clock cycle. It can, of course, operate on 8-bit words, but that doesn’t mean it happens any faster. Many CPUs also have special instructions that can sometimes operate on larger words, such as SSE and similar instructions for multimedia, as well as vector processing, such as on the PowerPC.

A byte has a natural numeric interpretation as an integer from 0 to 255 using base 2, as described earlier. Bit n represents the value $2^n$, and the value of the byte becomes the sum of these bits. The bits are laid out exactly as expected for a numeric representation, with bit 0 on the right and bit 7 on the left, but the layout is backward when compared to Western languages:

$$(b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0) = 2^7 b_7 + 2^6 b_6 + 2^5 b_5 + 2^4 b_4 + 2^3 b_3 + 2^2 b_2 + 2^1 b_1 + 2^0 b_0$$

or using decimal notation

$$(b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0) = 128 b_7 + 64 b_6 + 32 b_5 + 16 b_4 + 8 b_3 + 4 b_2 + 2 b_1 + b_0$$

For example, 00110111 = 32 + 16 + 4 + 2 + 1 = 55.

Bits on the left are referred to as the most-significant bits, since they contribute the most to the overall value of the number. Likewise, the rightmost bits are called the least-significant bits. This layout is also known as Big-Endian, which we’ll discuss later.

Signed Bytes

Negative numbers can be represented in a few ways. The simplest is to reserve one bit to represent the sign of the number, either positive or negative. Bits 0 through 6 would represent the number, and bit 7 would represent the sign. Although this allows a range from –127 to 127, it has the quirk of two zeros: a “positive” zero and a “negative” zero. Having the two zeros is odd, but it can be worked around. The bigger problem is when an overflow occurs. For instance, adding 127 + 2 is 129 in unsigned arithmetic, or 10000001. However, in signed arithmetic, the value is –1.

The most common representation is known as two’s complement. Given x, its negative is represented by flipping all the bits (turning 1s into 0s and vice versa, which yields –1 – x) and then adding 1. For example, note in Table 1.2 that adding 1 to 127 makes the value –128. While this method is a bit odd, there are many benefits. Microprocessors can encode just an addition circuit and a complementation circuit to do both addition and subtraction (in fact, many CPUs carry around the complement with the original value just in case subtraction comes up). The other main benefit is when casting occurs, or converting a byte into a larger word, as described in the following section.
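Java's byte type uses two's complement, so the rule can be checked directly. The following is only a sketch; the class and helper names (TwosComplementDemo, negate, toUnsigned) are made up for illustration.

```java
class TwosComplementDemo {
    /** Negates a byte by flipping all bits and adding 1. */
    static byte negate(byte x) {
        return (byte) (~x + 1);
    }

    /** Reads a signed Java byte as its unsigned value, 0 to 255. */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte x = 55;
        System.out.println(negate(x));               // -55
        System.out.println(~x == -1 - x);            // true
        System.out.println((byte) (127 + 1));        // -128
        System.out.println(toUnsigned((byte) -128)); // 128
        System.out.println(toUnsigned((byte) -1));   // 255
    }
}
```

The b & 0xFF idiom in toUnsigned is the usual way to recover the 0 to 255 interpretation of a byte in Java, since the language has no unsigned byte type.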

Bitwise Operators

The usual arithmetic functions such as addition and multiplication interpret words as numbers and perform the appropriate operations. However, other operations work directly on the bits without regard to their representation as numbers. These are bitwise or logical operations. While examples shown in the next sections use 8-bit bytes, they naturally extend to any word size.

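Table 1.2 Two’s Complement Representation

UNSIGNED VALUE   SIGNED VALUE   HEXADECIMAL REPRESENTATION   BINARY REPRESENTATION
0                0              0x00                         00000000
1                1              0x01                         00000001
2                2              0x02                         00000010
126              126            0x7E                         01111110
127              127            0x7F                         01111111
128              –128           0x80                         10000000
129              –127           0x81                         10000001
130              –126           0x82                         10000010
253              –3             0xFD                         11111101
254              –2             0xFE                         11111110
255              –1             0xFF                         11111111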

sym-of XOR, since the caret symbol (^) is used to denote exponents in some systems

Table 1.3 Bitwise Operations and Notation

OPERATION               C-STYLE NOTATION   C-STYLE SELF-ASSIGNMENT   TYPOGRAPHICAL
Shift a right by        a >> n             a >>= n                   None; all shifts and
significant bits of a                                                >>> shift

Complementation or Bitwise NOT

The simplest bit operation is complementation, or the bitwise NOT. This simply flips bits within a word, where 0s become 1s and 1s become 0s; for example, ~11001 = 00110. Cryptographically, this operation is not used much; it is used primarily for implementing basic arithmetic in hardware.

Bitwise AND

AND is useful in bit testing and bit extraction. It’s based on the usual truth table for logical AND, as shown in Table 1.4. You can remember the truth table by observing that it’s the same as binary multiplication: anything times zero is zero.

Table 1.4 Bitwise AND

AND   0   1
0     0   0
1     0   1

To test if a bit is set, create a mask with 1s in the positions you wish to test and perform the logical AND operation. If the result is nonzero, the bit is set; otherwise, the bit is not:

    01011101
AND 00001000
    00001000 // not == 0, bit 3 is set

It’s also useful for bit clearing. A mask is created with 1s in positions to preserve and 0s where you want to clear. For instance, to clear the four least-significant bits in a byte, use the mask 11110000:

    01011101
AND 11110000
    01010000 // the four least-significant bits are cleared
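Bit testing and bit clearing with AND might look like the following in Java. This is a sketch with hypothetical helper names (BitMaskDemo, isBitSet, clearLowNibble); bit positions are counted from 0 at the least-significant end, as in the byte layout described earlier.

```java
class BitMaskDemo {
    /** Returns true if bit n (0 = least significant) of value is set. */
    static boolean isBitSet(int value, int n) {
        return (value & (1 << n)) != 0;
    }

    /** Clears the four least-significant bits of an 8-bit value. */
    static int clearLowNibble(int value) {
        return value & 0xF0; // mask 11110000
    }

    public static void main(String[] args) {
        int v = 0x5D; // 01011101
        System.out.println(isBitSet(v, 3)); // true
        System.out.println(isBitSet(v, 1)); // false
        // toBinaryString drops leading zeros, so 01010000 prints as 1010000.
        System.out.println(Integer.toBinaryString(clearLowNibble(v))); // 1010000
    }
}
```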

Bitwise OR

Bitwise OR extends the logical OR to a series of bits. In English, or means one or the other but not both. The bitwise version means one or the other or both (the same as the logical OR). See Table 1.5.

Table 1.5 Bitwise OR

OR   0   1
0    0   1
1    1   1

Logical OR is useful in joining nonoverlapping words. The following example joins two 4-bit words together:

   01010000
OR 00001101
   01011101

Bitwise Exclusive OR (XOR)

Exclusive OR is abbreviated as XOR, and its operation is denoted by ⊕. It is less common in normal programming operations but very common in cryptographic applications. Table 1.6 shows the bitwise XOR table. This table is easy to remember, since it is the same as addition but without any carry (this is also known as addition modulo 2). It’s also equivalent to the English usage of or: one or the other but not both.

Table 1.6 Bitwise XOR

XOR   0   1
0     0   1
1     1   0

XOR tests to see if two bits are different: If the bits are both 0s or both 1s, the result is 0; if they are different, the result is 1. XOR is useful in cryptographic applications, since unlike AND and OR, it’s an invertible operation, so that A ^ B ^ B = A:

    11010011
XOR 10101010
    01111001
XOR 10101010
    11010011
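The invertibility property A ^ B ^ B = A can be demonstrated on byte arrays. The sketch below uses an illustrative helper named xor in a made-up class XorDemo; it exists only to show that applying the same mask twice restores the input and should not be read as a secure cipher.

```java
import java.util.Arrays;

class XorDemo {
    /** XORs each byte of data with the matching byte of key, repeating the key. */
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            // The ^ operator works on ints, so the result is cast back to byte.
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0xD3, 0x41, 0x7F};
        byte[] b = {(byte) 0xAA, 0x55};
        byte[] masked = xor(a, b);
        byte[] restored = xor(masked, b);
        System.out.println(Arrays.equals(a, restored)); // true
    }
}
```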

Left-Shift

As its name implies, the left-shift operator shifts the bits within a word to the left by a certain number of positions. Left-shifts are denoted by a << b, where a is the value and b denotes the number of positions to shift left. Zeros are filled in on the least-significant positions:

00000101 << 2 = 00010100
11110101 << 2 = 11010100

When you are working with large word sizes (such as 32-bit integers), left-shifts are used to compute large powers of 2, since $2^n$ = 1 << n. A trick to test whether only one bit is set (or rather, whether the value is a power of 2) is using (x & -x) == x; note that this also holds for x = 0, which must be excluded separately. In many cryptographic operations, we need to extract a certain number of the least-significant bits from a word. Normally you are working with word sizes much bigger than a byte. However, say you needed to extract the six least-significant bits from a byte B. You could perform B & 0x3f (B & 00111111). Computing this for larger word sizes is a bit clumsy, and worse, some algorithms may extract a variable number of bits, so the mask has to be dynamic. Hardwiring a hexadecimal mask is also prone to errors, and someone reading your code would have to think about how many bits you are extracting. How many bits are in 0x7fffff? The answer is 23, but it’s not automatically clear, and staring at a computer monitor all day where the text is in a small font makes the task harder. Instead, a left-shift can be used, since $2^n - 1$, or (1 << n) – 1, has a binary representation of n digits of “1.” Instead of 0x7fffff, we can use ((1 << 23) – 1), which will be perfectly clear to someone familiar with bit operations. Even better, you can make your intentions clearer by making the mask a constant:

public static final int mask23lsb = (1 << 23) - 1; // 23-bit mask
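A dynamic version of the mask, together with the power-of-2 test, could be sketched as follows. The names ShiftMaskDemo, lowMask, and isPowerOfTwo are hypothetical; Integer.bitCount is a standard JDK method that counts the 1 bits in an int.

```java
class ShiftMaskDemo {
    /** Mask with the n least-significant bits set; assumes 0 <= n <= 31. */
    static int lowMask(int n) {
        return (1 << n) - 1;
    }

    /** True if exactly one bit of x is set, that is, x is a power of 2. */
    static boolean isPowerOfTwo(int x) {
        // The parentheses matter: in Java, == binds tighter than &.
        return x != 0 && (x & -x) == x;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(lowMask(23))); // 7fffff
        System.out.println(Integer.toHexString(lowMask(6)));  // 3f
        System.out.println(Integer.bitCount(0x7fffff));       // 23
        System.out.println(isPowerOfTwo(1 << 20));            // true
        System.out.println(isPowerOfTwo(48));                 // false
    }
}
```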

Right-Shift

Not surprisingly, right-shifts, denoted by >>, are the opposite of left-shifts, and positions are shifted to the right. However, there is another significant difference. With a left-shift, zeros are placed in the least-significant positions. With a right-shift, the value of the most-significant bit is used to fill in the shifted positions. For example, 10000000 >> 2 = 11100000, but 01000000 >> 2 = 00010000. This is designed so that the sign is preserved and each position shifted is equivalent to division by 2 (rounding toward negative infinity), regardless of the sign. For example, we’ll “undo” the previous left-shift examples:

00010100 >> 2 = 00000101
11010100 >> 2 = 11110101

For cryptographic work, you’ll want to use the unsigned right-shift, denoted by >>> (three greater-than symbols). This just shifts the bits to the right and fills in zeros for the most-significant bits. Cryptographic papers generally assume the shift is unsigned and normally use the plain >>. When coding an algorithm using signed types, make sure you convert >> to >>>.
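The difference between the two right-shift operators is easy to see on a 32-bit int whose top bit is set. This is a small sketch; the class and method names (RightShiftDemo, signedShift, unsignedShift) are invented for the example.

```java
class RightShiftDemo {
    /** Signed right-shift: copies the sign bit into the vacated positions. */
    static int signedShift(int x, int n) {
        return x >> n;
    }

    /** Unsigned right-shift: fills the vacated positions with zeros. */
    static int unsignedShift(int x, int n) {
        return x >>> n;
    }

    public static void main(String[] args) {
        int x = 0x80000000; // only the most-significant bit is set
        System.out.println(Integer.toHexString(signedShift(x, 4)));   // f8000000
        System.out.println(Integer.toHexString(unsignedShift(x, 4))); // 8000000

        // The signed shift rounds toward negative infinity; integer division truncates.
        System.out.println(signedShift(-7, 1)); // -4
        System.out.println(-7 / 2);             // -3
    }
}
```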

Special Operations and Abbreviations

The previous operations, while useful, are fairly basic and typically directly implemented in hardware. By combining and composing them, you can create new higher-level bit operations. For cryptography, the most useful operations are rotations, concatenation, and extraction of most- and least-significant bits.

Rotations

Bit rotations are fairly self-explanatory: Bits are shifted either left or right, and the bits that “fall off the edge” roll over onto the other side. These are commonly used in cryptography, since this method provides an invertible operation that can aid in scrambling bits. The problem is that they aren’t used for much else, so many CPUs do not have rotation instructions, or if they do, they are frequently slow. Even if they have a fast rotation, the only way to access them is by writing assembly language, since rotations do not have a standard operator like the shifts do. In common programming languages, rotations can be accomplished with a combination of shifts and logical operators, which is the slowest method of all. In its most general case, a rotation will take one subtraction, two shifts, and a logical OR operation. Remember to use the unsigned right-shift operator:

(x >>> n) | (x << (32 - n)); // rotate right n positions, x 32-bit int
(x << n) | (x >>> (32 - n)); // rotate left n positions, x 32-bit int

There is no special typographical symbol for rotations; we normally use the abbreviations ROTR or ROTL, although certain technical papers may define their own symbols in their work.
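Wrapped in methods, the two rotation expressions can be checked against each other. The helper names rotl and rotr and the class RotateDemo are illustrative; Integer.rotateLeft and Integer.rotateRight are standard JDK methods available since Java 5, which postdates much of the code this chapter was written against.

```java
class RotateDemo {
    /** Rotates a 32-bit int left by n positions using shifts and OR. */
    static int rotl(int x, int n) {
        return (x << n) | (x >>> (32 - n));
    }

    /** Rotates a 32-bit int right by n positions using shifts and OR. */
    static int rotr(int x, int n) {
        return (x >>> n) | (x << (32 - n));
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        System.out.println(Integer.toHexString(rotl(x, 8))); // 34567812
        System.out.println(Integer.toHexString(rotr(x, 8))); // 78123456
        // The hand-written version agrees with the JDK method.
        System.out.println(rotl(x, 8) == Integer.rotateLeft(x, 8)); // true
        // Rotation is invertible: rotating back restores the value.
        System.out.println(rotr(rotl(x, 13), 13) == x); // true
    }
}
```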

Bit Concatenation

Bit concatenation is the process of simply joining together two sets of bits into one word. If b is n bits long, then a || b is (a << n) | b:

a = 101
b = 0011
a || b = (101 << 4) | 0011 = 1010011

MSB and LSB operations

Two other common operations are extracting a certain number of most-significant bits (MSB) or least-significant bits (LSB). We’ll use the notation $\mathrm{MSB}_n(a)$ to denote extracting the n most-significant bits of a, and likewise $\mathrm{LSB}_n(a)$ for the n least-significant bits. Both can be computed with a mask and, in the MSB case, by shifting appropriately:

$\mathrm{MSB}_3(10111111)$ = (10111111 & 11100000) >>> 5 = 101

$\mathrm{LSB}_2(11111101)$ = 11111101 & 00000011 = 00000001
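Concatenation and MSB/LSB extraction can be sketched in Java as below. All names (BitExtractDemo, concat, msb8, lsb) are hypothetical, and msb8 assumes an 8-bit word; for the MSB case the shift alone discards the low bits, so no separate mask is needed once the value is limited to 8 bits.

```java
class BitExtractDemo {
    /** a || b, where b is nBits wide; assumes the result fits in 32 bits. */
    static int concat(int a, int b, int nBits) {
        return (a << nBits) | b;
    }

    /** The n most-significant bits of an 8-bit value, for 1 <= n <= 8. */
    static int msb8(int value, int n) {
        return (value & 0xFF) >>> (8 - n);
    }

    /** The n least-significant bits of a value, for 0 <= n <= 31. */
    static int lsb(int value, int n) {
        return value & ((1 << n) - 1);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(concat(0x5, 0x3, 4))); // 1010011
        System.out.println(Integer.toBinaryString(msb8(0xBF, 3)));       // 101
        System.out.println(Integer.toBinaryString(lsb(0xFD, 2)));        // 1
    }
}
```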

Packed Words

Shifts are useful in creating packed words, where you treat the word as an array of bits rather than as a number, and where the bits represent anything you like. When doing this, you should use an unsigned numeric type. Unfortunately, Java and many scripting languages do not have unsigned types, so you must be extremely careful. As an example, suppose you have two variables that are numbers between 0 and 15. When transmitting or storing them, you might want to use the most space-efficient representation. To do this, you represent them not as 8-bit numbers from 0 to 255, but as two numbers each with 4 bits. For this example, we’ll use pseudocode and use the type int to represent an unsigned 8-bit value:

int a = 5; // 00000101
int b = 13; // 00001101
int packed = a << 4 | b; // = 00000101 << 4 | 00001101
                         // = 01010000 | 00001101
                         // = 01011101

To undo the operation:

b = packed & 0x0F; // 01011101 & 00001111 = 00001101
a = packed >>> 4; // note unsigned right-shift

If you were using an array, you could also do this dynamically in a loop:

for (int i = 0; i < 8; ++i)
    b[i] = (c >>> i) & mask;
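In real Java the packed value would usually live in a signed byte, which adds one wrinkle the pseudocode hides. The sketch below uses made-up names (PackedWordDemo, pack, high, low) and shows why the byte is masked with 0xFF before shifting.

```java
class PackedWordDemo {
    /** Packs two values in the range 0 to 15 into one byte. */
    static byte pack(int high, int low) {
        return (byte) ((high << 4) | (low & 0x0F));
    }

    /** Upper 4 bits; the & 0xFF undoes sign extension of the byte. */
    static int high(byte packed) {
        return (packed & 0xFF) >>> 4;
    }

    /** Lower 4 bits. */
    static int low(byte packed) {
        return packed & 0x0F;
    }

    public static void main(String[] args) {
        byte p = pack(5, 13);
        System.out.println(high(p) + " " + low(p)); // 5 13

        byte q = pack(13, 5); // most-significant bit set, so q is negative
        System.out.println(q);                       // -43
        System.out.println(high(q) + " " + low(q)); // 13 5
        // Without the mask, the byte is sign-extended to an int first.
        System.out.println(q >>> 4);                 // 268435453
    }
}
```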

It’s quite easy to make mistakes, especially when the packed word has a complicated structure. If you are writing code and the answers aren’t what you’d expect:

■■ Check to make sure you are using the unsigned right-shift operator >>> instead of the signed version >>.

■■ Check that the language isn’t doing some type of automatic type conversion of signed values (e.g., turning bytes into integers before the shift operation happens).

■■ Check to make sure your masks are correct.

■■ If you are using dynamic shifts, make sure the shift amount isn’t larger than the word size. For instance, if you are shifting a byte, make sure you aren’t doing >> 9. How this is interpreted depends on the environment.
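Two of these checks behave in specific ways in Java, as the following sketch illustrates: the language uses only the low 5 bits of the shift distance for an int (the low 6 bits for a long), and a byte is promoted to a signed int before any shift. The class and method names here are hypothetical.

```java
class ShiftPitfallDemo {
    /** Plain int left-shift, to show how Java treats large shift distances. */
    static int shiftLeft(int x, int n) {
        return x << n;
    }

    /** Shifts a byte without masking; the byte is sign-extended to int first. */
    static int shiftByteNaive(byte b, int n) {
        return b >> n;
    }

    /** Masks the byte to its unsigned value before shifting. */
    static int shiftByteMasked(byte b, int n) {
        return (b & 0xFF) >> n;
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(1, 32)); // 1, since 32 acts like 0
        System.out.println(shiftLeft(1, 33)); // 2, since 33 acts like 1
        System.out.println(1L << 32);         // 4294967296

        byte b = (byte) 0x80;
        System.out.println(shiftByteNaive(b, 9));  // -1
        System.out.println(shiftByteMasked(b, 9)); // 0
    }
}
```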

Integers and Endian Notation

Endian refers to how a string of bytes should be interpreted as an integer, and this notation comes in two flavors: Little-Endian and Big-Endian. The names come from Jonathan Swift’s Gulliver’s Travels, where the tiny people, the Lilliputians, were divided into two camps based on how they ate eggs. The Little-Endians opened the eggs from the small end, and the Big-Endians opened theirs from the big end. As you might expect with that reference, there are pros and cons to Big-Endian and Little-Endian representations, but overall it doesn’t make any difference except when you have to convert between the two formats. A comparison of the two is shown in Table 1.7.

Table 1.7 Comparison of Little-Endian versus Big-Endian Representation

ENDIAN TYPE     B0 B1 B2 B3 = 0xAABBCCDD   SAMPLE MICROPROCESSORS
Little-Endian   dd cc bb aa                Intel x86, Digital (VAX, Alpha)
Big-Endian      aa bb cc dd                Sun, HP, IBM RS6000, SGI, “Java”

Big-Endian, also known as most-significant byte order (MSB) or network order, puts the most-significant or highest byte value first. This is equivalent to how we write decimal numbers: left to right. The downside is that many numerical algorithms have to work backward, starting at the end of the array and working forward (just as you would manually with pencil and paper).

The Little-Endian, or least-significant byte (LSB), format is the opposite. This makes it harder for humans to read hex dumps, but numeric algorithms are a little easier to implement. Adding capacity (or widening conversions) is also easier, since you just add bytes to the end of the array of bytes (i.e., the byte sequence ff becomes ff 00).

Fortunately, regardless of what the byte Endian order is, the bits within bytes are always in Big-Endian format. For instance, 1 is always stored in a byte as $00000001_2$ no matter what platform.

The Endian issue becomes critical when you are working with heterogeneous systems, that is, systems that use different Endian models. When shipping bytes between these machines, you must use a standard Endian format or an ASCII format. In many other programming languages, you must determine in advance what the Endian architecture is and adjust subsequent bit operations appropriately. For cryptographic applications this becomes critical, since you are often manipulating bits directly.

With Java, the underlying architecture is hidden and you program using a Big-Endian format. The Java virtual machine does any Endian conversion needed behind the scenes.

For C and C++ programmers, normally a BIG_ENDIAN or LITTLE_ENDIAN macro is defined by the compiler or from an include file. If not, you can use code similar to this for testing. It sets raw memory and then converts it to an integer type. The value will be different depending on the CPU’s Endian type. This C code assumes an int is a standard 4 bytes or 32 bits, but you may wish to generalize:

int isBigEndian(void) {
    static unsigned char test[sizeof(unsigned int)] = {0, 0, 0, 1};
    /* reinterpret the raw bytes as an integer */
    unsigned int i = *(unsigned int *) test;
    if (i == 0x00000001) return 1; // true, big-endian
    return 0; // false, little-endian
}
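On the Java side, byte order becomes visible when integers are serialized to byte arrays. The sketch below uses the standard java.nio.ByteBuffer and java.nio.ByteOrder classes; the class EndianDemo and the helper toHexBytes are illustrative names only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    /** Serializes value with the given byte order; returns hex bytes in array order. */
    static String toHexBytes(int value, ByteOrder order) {
        byte[] bytes = ByteBuffer.allocate(4).order(order).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask with 0xFF so negative bytes print as two hex digits.
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toHexBytes(0xAABBCCDD, ByteOrder.BIG_ENDIAN));    // aa bb cc dd
        System.out.println(toHexBytes(0xAABBCCDD, ByteOrder.LITTLE_ENDIAN)); // dd cc bb aa
        System.out.println(ByteOrder.nativeOrder()); // depends on the CPU
    }
}
```

A ByteBuffer defaults to Big-Endian regardless of the host CPU, so the order only needs to be set explicitly when exchanging data with a Little-Endian format.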

Java Numerics

We’ll now apply the information in the previous section to the Java numeric model. While the notation is similar to the C or C++ model, there are Java-specific issues with using signed and unsigned types, specifically with byte arrays. Java also provides class wrappers around the native types, as well as unlimited-capacity integer arithmetic.

Basic Types

Java provides the standard basic numeric types: integers with 8, 16, 32, and 64 bits, and floating-point types with 32- and 64-bit representations. Unlike C and other languages, all the types are signed; there are no native unsigned types. Table 1.8 shows Java’s primitive numeric types.

For integer types, a literal integer can be expressed in decimal or hexadecimal formats (using lower- or uppercase letters for hex digits):

int i1 = 1000;

int i2 = 0x3e8; // == 1000 decimal

int i3 = 0x3E8; // same

For literal long values, a suffix of L or l should always be used, even if it appears unnecessary:

long l1 = 1000; // compiles but is not recommended
long l2 = 1000L; // recommended
long l3 = 0xfffffffff; // won’t compile: the literal is too large for an int
long l4 = 0xfffffffffL; // correct

Table 1.8 Primitive Numeric Types in Java

NAME    TYPE            LOGICAL SIZE  RANGE
byte    signed integer  8 bits        –128 to 127
short   signed integer  16 bits       –32,768 to 32,767
int     signed integer  32 bits       –2,147,483,648 to 2,147,483,647 (± 2.1 billion)
long    signed integer  64 bits       –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (± 9.2 × 10^18)
float   ANSI/IEEE 754   32 bits       ±1.4 × 10^-45 to ±3.4 × 10^38
double  ANSI/IEEE 754   64 bits       ±4.9 × 10^-324 to ±1.8 × 10^308


Specifying literal values for bytes can be tricky because a byte is signed, from –128 to 127, while very often you’ll be using constants specified from 0 to 255. If the value is within –128 to 127, the code will compile. If the value is from 128 to 255, you can either convert it to its negative equivalent or use a cast operation. The same principles apply to the short type.
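The two options for out-of-range byte constants can be sketched as follows. The class and constant names are illustrative assumptions; the behavior shown is standard Java.

```java
class ByteLiterals {
    static final byte IN_RANGE = 127;               // within -128..127, compiles as-is
    static final byte NEGATIVE = -2;                // negative equivalent of 254
    static final byte CAST = (byte) 254;            // the cast keeps the low 8 bits
    static final byte HEX_CAST = (byte) 0xfe;       // same bit pattern, written in hex
    static final short SHORT_CAST = (short) 0xffff; // same idea for short
    // static final byte BAD = 254;                 // would not compile without a cast

    public static void main(String[] args) {
        System.out.println(CAST == NEGATIVE);  // both hold the bit pattern 11111110
        System.out.println(HEX_CAST);
        System.out.println(SHORT_CAST);
    }
}
```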

Floating-type literals are assumed to be a double type unless suffixed by an f for float. Floating types can also be represented using scientific notation, where valEscale = val × 10^scale.

float f1 = 0.1; // compiler error: by default “0.1” is a double type
float f2 = 0.1f; // ok
double d1 = 0.1;
double d2 = 1.0E2; // == 100

In practice, the short type is rarely used because it almost always is converted into an int type before anything else is done. The float type also isn’t used because it doesn’t provide enough precision for most applications.

Type Conversion

Conversions between types can either be widening or narrowing. A widening conversion is where the new type can represent larger numbers and can “contain” the old type. For an integer-to-integer type conversion (for example, int to long), the same value is simply placed in a larger container. Integer-to-floating-point conversions are also considered widening, but some of the least-significant digits may be scrambled or zeroed due to how floating-point numbers are represented. These types of conversion happen automatically and silently either at compile time or at run time. For example:

int i1 = 123;
long l1 = i1; // ok, l1 = 123
float f1 = i1; // ok, f1 = 123
int i2 = 123456789;
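The loss of low-order digits in an int-to-float widening can be seen with a value that needs more than the 24 significant bits a float carries. This is a small sketch; the class and method names are illustrative assumptions.

```java
class WideningDemo {
    // Implicit widening: no cast is required in either method.
    static long toLong(int i) { return i; }
    static float toFloat(int i) { return i; }

    public static void main(String[] args) {
        int small = 123;
        int large = 123456789;
        System.out.println(toLong(large));         // exact: 123456789
        System.out.println((long) toFloat(small)); // exact: 123
        System.out.println((long) toFloat(large)); // 123456792, low digits changed
    }
}
```

An int-to-long conversion is always exact, and so is int-to-double, since a double carries 53 significant bits.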


Narrowing conversions may result in a loss of magnitude and precision. Any conversion from floating-point to an integer type is considered narrowing, as is any conversion from a larger integer type to a smaller integer type. Java will not automatically do narrowing conversions, and a compiler error will be issued if the compiler detects such a conversion. To do these conversions, you must explicitly declare them with the cast operator.

Converting floating point to integers drops or truncates any digits to the right of the decimal point:

int i5 = 0xffL; // compiler error; long to int conversion

For narrowing conversions from one integer type to another, the least-significant bits (or bytes) of the larger type are kept:

int i1 = 0xfffff01;
byte b1 = (byte) i1; // b1 = 0x01
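Both kinds of narrowing cast can be sketched together. The class and method names below are illustrative assumptions; the results follow from the Java language rules for casts.

```java
class NarrowingDemo {
    // Floating point to int: the fractional part is dropped (rounds toward zero).
    static int truncate(double d) { return (int) d; }

    // int to byte: only the least-significant 8 bits survive.
    static byte lowByte(int i) { return (byte) i; }

    public static void main(String[] args) {
        System.out.println(truncate(3.99));   // 3
        System.out.println(truncate(-3.99));  // -3
        System.out.println(lowByte(0x1234));  // 52, i.e. 0x34
        System.out.println(lowByte(0x12fe));  // -2, i.e. bit pattern 0xfe
        System.out.println(truncate(1e20));   // too large: clamps to Integer.MAX_VALUE
    }
}
```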

These rules are summarized in Table 1.9, with N representing a narrowing conversion, W for a widening conversion, and W* for a widening conversion that may result in a loss of precision.

Table 1.9 Primitive Type Conversion

BYTE SHORT INT LONG FLOAT DOUBLE


Unsigned to Signed Conversions

There are no unsigned types in Java; however, you can still use the signed types as an unsigned container. Even though a byte is signed, it still has 8 bits and can represent an integer from 0 to 255 even if Java thinks otherwise. To use the unsigned value, it must be converted into a larger type using an AND mask.

To convert an unsigned byte:

byte x = (byte) 254; // x = -2 in Java
short s = (short)(x & 0xff); // s = 254
int i = x & 0xff; // i = 254
long l = x & 0xffL; // l = 254; the L suffix makes the mask a long

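Unsigned short values are converted the same way, using a 0xffff mask. Take care when the masked value is then shifted; the mask must be a long literal if the shift is meant to happen in 64 bits:

byte b = (byte)0xff;
long r1 = (b & 0xff) << 56; // Wrong: the shift is done on an int, giving 0xffffffffff000000
long r2 = (long)((int)(b & 0xff) << 56); // same as previous
long r3 = (b & 0xffL) << 56; // Correct: r3 = 0xff00000000000000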
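The masking idiom extends to each signed type. The helper names below are illustrative assumptions; the masks themselves are the standard ones for 8-, 16-, and 32-bit values.

```java
class UnsignedDemo {
    static int unsignedByte(byte b)    { return b & 0xff; }
    static int unsignedShort(short s)  { return s & 0xffff; }
    static long unsignedInt(int i)     { return i & 0xffffffffL; } // L keeps the mask 64 bits wide

    // Place a byte in the top 8 bits of a long; the long mask forces a 64-bit shift.
    static long topByte(byte b) { return (b & 0xffL) << 56; }

    public static void main(String[] args) {
        System.out.println(unsignedByte((byte) 254));     // 254
        System.out.println(unsignedShort((short) 65534)); // 65534
        System.out.println(unsignedInt(-1));              // 4294967295
        System.out.println(Long.toHexString(topByte((byte) 0xff))); // ff00000000000000
    }
}
```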

Overflow

When a computation exceeds an integer type’s bounds, the value is “rolled over” silently; no exception is thrown:

byte b = (byte) 127; // b = 01111111
b++; // b = 10000000, which has the value -128
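When silent rollover is not acceptable, one option is to redo the arithmetic in a wider type and check the range. The following is a minimal sketch under that assumption; the class and method names are illustrative and not from the text.

```java
class OverflowDemo {
    // Plain int addition: wraps around silently on overflow.
    static int wrapAdd(int a, int b) { return a + b; }

    // Detects overflow by computing the sum as a long, which cannot overflow here.
    static boolean addOverflows(int a, int b) {
        long wide = (long) a + b;
        return wide < Integer.MIN_VALUE || wide > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(wrapAdd(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE); // true
        System.out.println(addOverflows(Integer.MAX_VALUE, 1));                 // true
        System.out.println(addOverflows(1, 2));                                 // false
    }
}
```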

While the silence may seem disturbing, in practice overflow is rarely a problem. For floating-point computations overflow is still silent, but the result is not rolled over. Instead, the type takes on a special Infinity value that can be checked using normal comparison operators or by using the Double.isInfinite method. Also note that Java does not think 0.0 is the same as 0 (zero). Instead, 0.0 is treated much like a number that is very close to zero, so division of a nonzero value by 0.0 results in an infinite value. Keep in mind, however, that Java throws an exception if you try to divide by the integer 0. In the event that a computation doesn’t make sense, the value is NaN, which stands for “not a number.” If you wish to see if a value is NaN, then you must use the Double.isNaN method, because NaN never compares equal to anything, including itself. Various examples of manipulating double types are shown as follows:

double d1 = 1.7E308 * 2.0; // overflow = Infinity
double d2 = 1.0/0.0; // = Infinity
double d3 = -1.0/0.0; // = -Infinity
int i = 1 / 0; // throws an ArithmeticException
double d4 = 0.0/0.0; // = NaN
boolean b1 = (d4 == d4); // = false, always
boolean b2 = Double.isNaN(d4); // true
boolean b3 = Double.isInfinite(d3); // true

Arrays

In Java, native-type arrays are treated as if they were objects and are created with the new operator, as if using a constructor. Arrays can also be set with initial values, as shown in the following code snippet.

int[] a = new int[3]; // all three elements set to zero
int[] b = {1, 2, 3}; // a pre-set 3 element array.

Once constructed, an array is not resizable. If dynamically allocated storage is needed, you should use one of the collections in the java.util package. Arrays are indexed with int types, and the first element starts at 0 (as in C). Small types will have widening conversions done to them, and the long type cannot be used without a cast operation. Thus, single-dimensional arrays cannot have more than 2^31 – 1, or roughly 2.1 billion, entries. Hopefully, this will be sufficient for your needs.

Since arrays are objects, assignment is done by reference, as shown in the following:

int[] a = {1,2};

int[] b = a;

b[0] = 100; // modifies the object that both a and b point to.

System.out.println(a[0]); // will print 100, not 1


In addition, they can be copied using the clone method. To use this, cast the result to the correct array type:

int[] a = {1, 2};

int[] b = (int[]) a.clone(); // b is an independent copy of a

The System.arraycopy method is extremely useful for copying parts of an array or concatenating arrays. It makes a native system call to copy memory directly instead of copying each element individually and is much faster than writing a custom for loop:

System.arraycopy(Object input, int inputOffset,
                 Object output, int outputOffset, int length)
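Concatenating two byte arrays, a common step when assembling messages or keys, can be sketched with two calls to this method. The class and method names are illustrative assumptions.

```java
class ConcatDemo {
    // Returns a new array holding the bytes of a followed by the bytes of b.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);        // copy a to the front
        System.arraycopy(b, 0, out, a.length, b.length); // copy b right after it
        return out;
    }

    public static void main(String[] args) {
        byte[] joined = concat(new byte[] {1, 2}, new byte[] {3, 4, 5});
        System.out.println(joined.length); // 5
        System.out.println(joined[2]);     // 3
    }
}
```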

Even though the method has Object in its signature, this method only works on array types. Other useful methods for manipulating native arrays can be found in the class java.util.Arrays and are summarized in Table 1.10.

Table 1.10 Summary of Methods from java.util.Arrays

JAVA.UTIL.ARRAYS METHOD                                   DESCRIPTION
static boolean equals(type[] a, type[] b)                 Returns true if and only if the arrays are the same length and every element is equal.
static void fill(type[] a, type val)                      Fills an array with a single value.
static void sort(type[] a)                                Performs a fast numeric sort. The array is sorted in place.
static void sort(type[] a, int fromIndex, int toIndex)    Sorts only part of an array.
static int binarySearch(type[] a, type val)               Performs a binary search of a sorted array. Returns the array index if a match is found and a negative value if no match. Arrays must be sorted first.
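A short sketch of these java.util.Arrays methods in use follows. The class and method names (ArraysDemo, find) are illustrative assumptions; the library calls are the ones listed in the table.

```java
import java.util.Arrays;

class ArraysDemo {
    // Sorts a copy of the data, then searches it; the caller's array is untouched.
    static int find(int[] data, int key) {
        int[] sorted = (int[]) data.clone();
        Arrays.sort(sorted);
        return Arrays.binarySearch(sorted, key);
    }

    public static void main(String[] args) {
        int[] data = {30, 10, 20};
        System.out.println(find(data, 20));      // index of 20 in the sorted copy
        System.out.println(find(data, 25) < 0);  // not found: result is negative

        int[] zeros = new int[3];
        int[] filled = new int[3];
        Arrays.fill(filled, 0);
        System.out.println(Arrays.equals(zeros, filled)); // same length, same elements
    }
}
```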


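ArrayList, HashMap). The wrapper classes also provide basic string formatting and parsing from strings for numbers. Since they are objects, they can also be null. Note, however, that object references are themselves passed by value and the wrapper objects are immutable, so a method cannot change the caller’s value by reassigning its parameter:

public void changeInt(Integer i) {
    i = new Integer("1"); // rebinds only the local parameter
}

Integer i = new Integer("0");
changeInt(i); // i is still 0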

All of the classes share some common traits (examples are shown only for Integer and Long):

want to memorize the previous table

a native type

■■ A static method valueOf that accepts a string and an optional radix to parse a string and return an object. The radix can be 2 to 36.

static Integer Integer.valueOf(String s)
static Integer Integer.valueOf(String s, int radix)
static Long Long.valueOf(String s)
static Long Long.valueOf(String s, int radix)

■■ A static method parseType (where Type is the capitalized name of the native type, such as Byte or Int) that also accepts a string and an optional radix to parse a string, but returns the native type instead of an object.

static int Integer.parseInt(String s)
static int Integer.parseInt(String s, int radix)
static long Long.parseLong(String s)
static long Long.parseLong(String s, int radix)
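The two families of parsing methods can be compared in a short sketch. The class and helper names are illustrative assumptions; the wrapper-class calls are standard.

```java
class ParseDemo {
    // Returns a native int parsed from a hexadecimal string.
    static int parseHex(String s) { return Integer.parseInt(s, 16); }

    // Returns an Integer object parsed from a binary string.
    static Integer boxedBinary(String s) { return Integer.valueOf(s, 2); }

    public static void main(String[] args) {
        System.out.println(Integer.parseInt("1000"));          // 1000
        System.out.println(parseHex("3e8"));                   // 1000
        System.out.println(boxedBinary("101"));                // 5
        System.out.println(Long.parseLong("fffffffff", 16));   // 68719476735
    }
}
```

Both families throw a NumberFormatException if the string is not a valid number in the given radix.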


Table 1.11 Java Class Wrappers for Native Types

NATIVE TYPE MATCHING JAVA.LANG CLASS

The Long and Integer classes have a few other useful static methods that provide an unsigned representation of the number in binary, hex, or octal formats as shown:

static String Integer.toBinaryString(int val)

static String Integer.toHexString(int val)

static String Integer.toOctalString(int val)

static String Long.toBinaryString(long val)

static String Long.toHexString(long val)

static String Long.toOctalString(long val)

Binary representation is especially useful for debugging bit fields. These methods do not add any “leading zeros” to the output, so new
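Because the output carries no leading zeros, fixed-width dumps need padding by hand. One way to do this for a single byte is sketched below; the class and method names are illustrative assumptions.

```java
class BitStringDemo {
    // Formats a byte as exactly 8 binary digits, treating it as unsigned.
    static String toBinary8(byte b) {
        StringBuilder sb = new StringBuilder(Integer.toBinaryString(b & 0xff));
        while (sb.length() < 8) {
            sb.insert(0, '0'); // left-pad with zeros
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(5)); // 101
        System.out.println(toBinary8((byte) 5));       // 00000101
        System.out.println(toBinary8((byte) -2));      // 11111110
        System.out.println(Integer.toHexString(-1));   // ffffffff
    }
}
```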

Booleans and BitFields

Java provides a native boolean type that can either be true or false. Unlike C and C++, it is not a numeric type and does not have a numeric value, and it is not automatically cast. Like the numeric types, there is also a Boolean class wrapper. If you want to create an array of boolean values, you could use Boolean with one of the collection classes (e.g., ArrayList). However, the java.util.BitSet class is specially designed for use with boolean types and provides a huge performance increase and memory savings. More specialized applications will convert a native type or byte array into a bit field directly.
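One standard-library class that fits this description is java.util.BitSet. The sketch below, which assumes that class is the one meant and uses illustrative names (BitSetDemo, fromByte), loads the bits of a byte into a BitSet with bit 0 as the least-significant bit.

```java
import java.util.BitSet;

class BitSetDemo {
    // Copies the 8 bits of b into a BitSet; index 0 is the least-significant bit.
    static BitSet fromByte(byte b) {
        BitSet bits = new BitSet(8);
        for (int i = 0; i < 8; i++) {
            if (((b >> i) & 1) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet bits = fromByte((byte) 5); // binary 00000101
        System.out.println(bits.get(0));        // true
        System.out.println(bits.get(1));        // false
        System.out.println(bits.get(2));        // true
        System.out.println(bits.cardinality()); // number of bits set: 2
    }
}
```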
