HandBooks Professional Java-C-Scrip-SQL part 128 ppsx

extra space to prevent the hashing from getting too slow means that the file would end up taking up about 437 Kbytes. For this application, disk storage space would not be a problem; however, the techniques we will use to reduce the file size are useful in many other applications as well. Also, searching a smaller file is likely to be faster, because the heads have to move a shorter distance on the average to get to the record where we are going to start our search.

If you look back at Figure initstruct, you will notice that the upc field is ten characters long. Using the ASCII code for each digit, which is the usual representation for character data, takes one byte per digit, or 10 bytes in all. I mentioned above that we would be using a limited character set to reduce the size of the records. UPC codes are limited to the digits 0 through 9; if we pack two digits into one byte, by using four bits to represent each digit, we can cut that down to five bytes for each UPC value stored. Luckily, this is quite simple, as you will see when we discuss the BCD (binary-coded decimal) conversion code below.

The other data compression method we will employ is to convert the item descriptions from strings of ASCII characters, limited to the sets 0-9, A-Z, and the special characters comma, period, minus, and space, to Radix40 representation, mentioned in Chapter prologue.htm. The main difference between Radix40 conversions and those for BCD is that in the former case we need to represent 40 different characters, rather than just the 10 digits, and therefore the packing of data must be done in a slightly more complicated way than just using four bits per character.

The Code

Now that we have covered the optimizations that we will use in our price lookup system, it's time to go through the code that implements these algorithms. This specific implementation is set up to handle a maximum of FILE_CAPACITY items, defined in superm.h (Figure superm.00a). Each of these items, as defined in the ItemRecord structure in the same file, has a price, a description, and a key, which is the UPC code. The key would be read in by a bar-code scanner in a real system, although our test program will read it in from the keyboard.

Some User-Defined Types

Several of the fields in the ItemRecord structure definition require some explanation, so let's take a closer look at that definition, shown in Figure superm.00.

ItemRecord struct definition (from superm\superm.h) (Figure superm.00)

codelist/superm.00

The upc field is defined as a BCD (binary-coded decimal) value of ASCII_KEY_SIZE digits (contained in BCD_KEY_SIZE bytes). The description field is defined as a Radix40 field DESCRIPTION_WORDS in size; each of these words contains three Radix40 characters.

A BCD value is stored as two digits per byte, each digit being represented by a four-bit code between 0000 (0) and 1001 (9). Function ascii_to_BCD in bcdconv.cpp (Figure bcdconv.00) converts a decimal number, stored as ASCII digits, to a BCD value by extracting each digit from the input argument and subtracting the code for '0' from the digit value; BCD_to_ascii (Figure bcdconv.01) does the reverse.
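To make the packing concrete, here is a minimal sketch of the idea. It is not the code from bcdconv.cpp: the names pack_bcd and unpack_bcd, the use of std::string and std::vector, and the choice to put the first digit of each pair in the high four bits are all assumptions made for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from bcdconv.cpp): pack an even-length string of
// ASCII digits into BCD, two digits per byte. Assumption: the first digit of
// each pair goes in the high four bits.
std::vector<std::uint8_t> pack_bcd(const std::string& digits)
{
    if (digits.size() % 2 != 0)
        throw std::invalid_argument("digit count must be even");

    std::vector<std::uint8_t> packed(digits.size() / 2);
    for (std::size_t i = 0; i < digits.size(); ++i)
    {
        char c = digits[i];
        if (c < '0' || c > '9')
            throw std::invalid_argument("non-digit character");

        // Subtracting the code for '0' yields the digit's value, 0 through 9.
        std::uint8_t value = static_cast<std::uint8_t>(c - '0');
        if (i % 2 == 0)
            packed[i / 2] = static_cast<std::uint8_t>(value << 4);
        else
            packed[i / 2] |= value;
    }
    return packed;
}

// Hypothetical inverse: expand each byte back into two ASCII digits.
std::string unpack_bcd(const std::vector<std::uint8_t>& packed)
{
    std::string digits;
    digits.reserve(packed.size() * 2);
    for (std::uint8_t byte : packed)
    {
        digits.push_back(static_cast<char>('0' + (byte >> 4)));
        digits.push_back(static_cast<char>('0' + (byte & 0x0F)));
    }
    return digits;
}
```

With this sketch, the ten-digit string "0123456789" packs into the five bytes 01 23 45 67 89 (in hexadecimal), and unpacking those bytes gives back the original string.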

ASCII to BCD conversion function (from superm\bcdconv.cpp) (Figure bcdconv.00)

codelist/bcdconv.00

BCD to ASCII conversion function (from superm\bcdconv.cpp) (Figure bcdconv.01)

codelist/bcdconv.01

A UPC code is a ten-digit number between 0000000000 and 9999999999, which unfortunately is too large to fit in a long integer of 32 bits. Of course, we could store it in ASCII, but that would require 10 bytes per UPC code. So BCD representation saves five bytes per item compared to ASCII.
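The arithmetic behind these claims can be checked at compile time. The constant names in the following sketch are made up for the illustration and do not come from the program's headers.

```cpp
#include <cstdint>
#include <limits>

// The largest ten-digit UPC value.
constexpr std::uint64_t kMaxUpc = 9999999999ULL;

// The largest value an unsigned 32-bit integer can hold (4,294,967,295).
constexpr std::uint64_t kMaxU32 = std::numeric_limits<std::uint32_t>::max();

static_assert(kMaxUpc > kMaxU32, "a ten-digit UPC does not fit in 32 bits");

constexpr int kAsciiBytes = 10;             // one byte per digit
constexpr int kBcdBytes = kAsciiBytes / 2;  // two digits per byte

static_assert(kAsciiBytes - kBcdBytes == 5, "BCD saves five bytes per key");
```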

A Radix40 field, as mentioned above, stores three characters (from a limited set of possibilities) in 16 bits. This algorithm (like some other data compression techniques) takes advantage of the fact that the number of bits required to store a character depends on the number of distinct characters to be represented. The BCD functions described above are an example of this approach. In this case, however, we need more than just the 10 digits. If our character set can be limited to 40 characters (think of a Radix40 value as a "number" in base 40), we can fit three of them in 16 bits, because 40^3 (64,000) is less than 2^16 (65,536).
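The capacity argument can also be checked at compile time. The constant names below are invented for this sketch; the second assertion shows that the fit is a tight one, since a character set of 41 symbols would already be too large.

```cpp
#include <cstdint>

constexpr std::uint32_t kRadix = 40;

// Number of distinct three-character combinations: 40 * 40 * 40 = 64,000.
constexpr std::uint32_t kCombinations = kRadix * kRadix * kRadix;

// Number of distinct values a 16-bit word can hold: 65,536.
constexpr std::uint32_t kWordValues = 1u << 16;

static_assert(kCombinations <= kWordValues,
              "three base-40 characters fit in 16 bits");

// With 41 characters we would need 68,921 values, which is too many.
static_assert(41u * 41u * 41u > kWordValues,
              "three base-41 characters would not fit in 16 bits");
```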

Let's start by looking at the header file for the Radix40 conversion functions, which is shown in Figure radix40.00a.

The header file for Radix40 conversion (superm\radix40.h) (Figure radix40.00a)

codelist/radix40.00a

The legal_chars array, shown in Figure radix40.00, defines the characters that can be expressed in this implementation of Radix40. The variable weights contains the multipliers to be used to construct a two-byte Radix40 value from the three characters that we wish to store in it.

The legal_chars array (from superm\radix40.cpp) (Figure radix40.00)

codelist/radix40.00

As indicated in the comment at the beginning of the ascii_to_radix40 function (Figure radix40.01), the job of that function is to convert a null-terminated ASCII character string to Radix40 representation. After some initialization and error checking, the main loop begins by incrementing the index to the current word being constructed, after every third character is translated. It then translates the current ASCII character by indexing into the lookup_chars array, which is shown in Figure radix40.02. Any character that translates to a value with its high bit set is an illegal character and is converted to a hyphen; the result flag is changed to S_ILLEGAL if this occurs.

The ascii_to_radix40 function (from superm\radix40.cpp) (Figure radix40.01)

codelist/radix40.01

The lookup_chars array (from superm\radix40.cpp) (Figure radix40.02)

codelist/radix40.02

In the line radix40_data[current_word_index] += weights[cycle] * j;, the character is added into the current output word after being multiplied by the power of 40 that is appropriate to its position. The first character in a word is represented by its position in the legal_chars string. The second character is represented by 40 times that value and the third by 1600 times that value, as you would expect for a base-40 number.
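The following sketch shows the same accumulation step in a simplified form. It is not the ascii_to_radix40 function itself: the name encode_radix40, the ordering of characters in kLegalChars, and the use of std::string::find in place of a lookup table are assumptions made for the example. The weights follow the description in the paragraph above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed character set and ordering (40 characters in all); the real
// legal_chars array in radix40.cpp may order them differently.
const std::string kLegalChars = " ,-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Multipliers for the three character positions in a word: 40^0, 40^1, 40^2.
const unsigned kWeights[3] = {1, 40, 1600};

// Hypothetical encoder: packs three characters into each 16-bit word.
// Returns false if any illegal character had to be replaced by a hyphen.
bool encode_radix40(const std::string& text, std::vector<std::uint16_t>& words)
{
    bool all_legal = true;
    words.assign((text.size() + 2) / 3, 0);

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        // j is the character's position in the legal character set.
        std::size_t j = kLegalChars.find(text[i]);
        if (j == std::string::npos)
        {
            j = kLegalChars.find('-');  // substitute a hyphen
            all_legal = false;
        }

        // Add the character in, scaled by the weight for its position.
        words[i / 3] = static_cast<std::uint16_t>(
            words[i / 3] + kWeights[i % 3] * j);
    }
    return all_legal;
}
```

Under these assumptions, 'A' is at position 14, 'B' at 15, and '1' at 5, so the string "AB1" becomes the single word 14 + 15 * 40 + 5 * 1600 = 8614. Unused positions in the last word are left at zero, which in this character set stands for a space.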

The complementary function radix40_to_ascii (Figure radix40.03) decodes each character unambiguously. First, the current character is extracted from the current word by dividing by the weight appropriate to its position; then the current word is updated so the next character can be extracted. Finally, the ASCII value of the character is looked up in the legal_chars array.
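A simplified decoder along these lines might look like the sketch below. Again, this is not the radix40_to_ascii function from the program: the name decode_radix40 and the ordering of kLegalChars are assumptions, and the weights are taken to be 1, 40, and 1600 for the first, second, and third characters, so the third character is extracted first.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed character set and ordering (40 characters in all); the real
// legal_chars array in radix40.cpp may order them differently.
const std::string kLegalChars = " ,-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Multipliers for the three character positions in a word: 40^0, 40^1, 40^2.
const unsigned kWeights[3] = {1, 40, 1600};

// Hypothetical decoder: expands each 16-bit word into three characters.
std::string decode_radix40(const std::vector<std::uint16_t>& words)
{
    std::string text;
    for (std::uint16_t word : words)
    {
        if (word >= 64000)
            throw std::out_of_range("not a valid Radix40 word");

        unsigned remaining = word;
        char chars[3];
        for (int pos = 2; pos >= 0; --pos)
        {
            // Divide by the weight to get this character's index...
            unsigned j = remaining / kWeights[pos];
            // ...then remove it so the next character can be extracted.
            remaining -= j * kWeights[pos];
            chars[pos] = kLegalChars[j];
        }
        text.append(chars, 3);
    }
    return text;
}
```

For example, the word 8614 decodes as 8614 / 1600 = 5 ('1'), leaving 614; then 614 / 40 = 15 ('B'), leaving 14 ('A'); the result is "AB1".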

The radix40_to_ascii function (from superm\radix40.cpp) (Figure radix40.03)

codelist/radix40.03

Preparing to Access the Price File

Now that we have examined the user-defined types used in the ItemRecord structure, we can go on to the PriceFile structure, which is used to keep track of the data for a particular price file. The best way to learn about this structure is to follow the program as it creates, initializes, and uses it. The function main, which is shown in Figure superm.01, after checking that it was called with the correct number of arguments, calls the initialize_price_file function (Figure suplook.00) to set up the PriceFile structure.

The main function (from superm\superm.cpp) (Figure superm.01)

codelist/superm.01

The initialize_price_file function (from superm\suplook.cpp) (Figure suplook.00)

codelist/suplook.00

The initialize_price_file function allocates storage for and initializes the PriceFile structure, which is used to control access to the price file. This structure contains pointers to the file, to the array of cached records that we have in memory, and to the array of record numbers of those cached records. As we discussed earlier, the use of a cache can reduce the amount of time spent reading records from the disk by maintaining copies of a number of those records in memory, in the hope that they will be needed again. Of course, we have to keep track of which records we have cached, so that we can tell whether we have to read a particular record from the disk or can retrieve a copy of it from the cache instead.

When execution starts, we don't have any records cached; therefore, we initialize each entry in these arrays to an "invalid" state (the key is set to INVALID_BCD_VALUE). If file_mode is set to CLEAR_FILE, we write such an "invalid" record to every position in the price file as well, so that any old data left over from a previous run is erased.
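The following sketch illustrates the idea of starting with every cache entry marked invalid. It is not the PriceFile structure or the initialize_price_file function: the type and function names, the cache size, and in particular the value 0xFF used to mark an invalid key are assumptions made for this example, since the actual value of INVALID_BCD_VALUE is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kBcdKeySize = 5;          // bytes in a packed ten-digit key
const std::uint8_t kInvalidKeyByte = 0xFF;  // assumed stand-in for INVALID_BCD_VALUE
const std::size_t kCacheSize = 8;           // assumed; deliberately tiny
const long kNoRecord = -1;                  // assumed "no record number" marker

// Hypothetical, simplified record: just a key and a price.
struct CachedItem
{
    std::uint8_t upc[kBcdKeySize];
    unsigned price;
};

// Hypothetical cache: copies of records plus the file position of each copy.
struct CacheSketch
{
    std::vector<CachedItem> records;
    std::vector<long> record_numbers;
};

// Build a cache in which every entry is marked invalid.
CacheSketch make_empty_cache()
{
    CachedItem invalid;
    std::fill(invalid.upc, invalid.upc + kBcdKeySize, kInvalidKeyByte);
    invalid.price = 0;

    CacheSketch cache;
    cache.records.assign(kCacheSize, invalid);
    cache.record_numbers.assign(kCacheSize, kNoRecord);
    return cache;
}

// A real key never starts with 0xFF, because F is not a BCD digit.
bool is_cached_entry_valid(const CachedItem& item)
{
    return item.upc[0] != kInvalidKeyByte;
}
```

The marker works because each half of a valid BCD byte lies between 0 and 9, so a byte of 0xFF can never occur in a genuine key.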

Now that access to the price file has been set up, we can call the process function (Figure superm.02). This function allows us to enter items and/or look up their prices and descriptions, depending on mode.

The process function (from superm\superm.cpp) (Figure superm.02)

codelist/superm.02

First, let's look at entering a new item (INPUT_MODE). We must get the UPC code, the description, and the price of the item. The UPC code is converted to BCD, the description to Radix40, and the price to unsigned. Then we call write_record (Figure suplook.01) to add the record to the file.

The write_record function (from superm\suplook.cpp) (Figure suplook.01)

codelist/suplook.01

In order to write a record to the file, write_record calls lookup_record_number (Figure suplook.02) to determine where the record should be stored so that we can retrieve it quickly later. The lookup_record_number function does almost the same thing as lookup_record (Figure suplook.03), except that the latter returns a pointer to the record rather than its number. Therefore, they are implemented as calls to a common function: lookup_record_and_number (Figure suplook.04).
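The pattern of two thin wrappers around one common worker can be sketched as follows. This is only an illustration of the structure, not the code in suplook.cpp: the names, the simplified Item type, and the in-memory vector are assumptions, and a linear scan stands in for the hashed search that the real program performs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified record with an unpacked key.
struct Item
{
    std::string upc;
    unsigned price;
};

const long kNotFound = -1;  // assumed "no such record" marker

// Common worker: reports both where the record is and a pointer to it.
// A linear scan stands in for the hashed search used by the real program.
bool find_record_and_number(std::vector<Item>& file, const std::string& upc,
                            Item*& record, long& number)
{
    for (std::size_t i = 0; i < file.size(); ++i)
    {
        if (file[i].upc == upc)
        {
            record = &file[i];
            number = static_cast<long>(i);
            return true;
        }
    }
    record = nullptr;
    number = kNotFound;
    return false;
}

// Wrapper that keeps only the record number.
long find_record_number(std::vector<Item>& file, const std::string& upc)
{
    Item* record;
    long number;
    find_record_and_number(file, upc, record, number);
    return number;
}

// Wrapper that keeps only the pointer to the record.
Item* find_record(std::vector<Item>& file, const std::string& upc)
{
    Item* record;
    long number;
    find_record_and_number(file, upc, record, number);
    return record;
}
```

Keeping the search logic in one place means that a change to the way records are located has to be made only once.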

The lookup_record_number function (from superm\suplook.cpp) (Figure suplook.02)

codelist/suplook.02

The lookup_record function (from superm\suplook.cpp) (Figure suplook.03)

codelist/suplook.03

The lookup_record_and_number function (from superm\suplook.cpp) (Figure suplook.04)

codelist/suplook.04

After a bit of setup code, lookup_record_and_number determines whether the record we want is already in the cache, in which case we don't have to search the file for it. To do this, we call compute_cache_hash (Figure suplook.05), which in turn calls compute_hash (Figure suplook.06) to do most of the work of calculating the hash code.
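The division of labor between the two functions might be pictured as in the sketch below. The actual hash calculation is not shown in this section, so everything here is an assumption: the names sketch_hash and sketch_cache_hash, the multiply-and-add folding of the key bytes, and the cache size are invented for the illustration.

```cpp
#include <cstddef>
#include <cstdint>

const std::size_t kBcdKeySize = 5;  // bytes in a packed ten-digit key
const unsigned kCacheSize = 8;      // assumed; deliberately tiny

// Hypothetical general-purpose hash: fold the key bytes into one number.
// The real compute_hash function may use a different calculation.
unsigned sketch_hash(const std::uint8_t* key)
{
    unsigned h = 0;
    for (std::size_t i = 0; i < kBcdKeySize; ++i)
        h = h * 31u + key[i];
    return h;
}

// Hypothetical cache hash: let the general hash do most of the work, then
// reduce its result to a valid index into the cache.
unsigned sketch_cache_hash(const std::uint8_t* key)
{
    return sketch_hash(key) % kCacheSize;
}
```

Whatever the details of the calculation, the result of the cache hash has to be a valid index into the cache arrays, which is why the sketch reduces the general hash modulo the cache size.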

The compute_cache_hash function (from superm\suplook.cpp) (Figure suplook.05)

codelist/suplook.05
