

The first test determines whether the highest frequency value allocated to our message is less than one-half of the total frequency range. If it is, we know that the first output bit is 0, so we call the bit_plus_follow_0 macro (Figure arenc.07) to output that bit. Let's take a look at that macro.

First we call output_0, which adds a 0 bit to the output buffer and writes it out if it is full. Then, in the event that bits_to_follow is greater than 0, we call output_1 and decrement bits_to_follow until it reaches 0. Why do we do this?

The bit_plus_follow_0 macro (from compress\arenc.cpp) (Figure arenc.07)

codelist/arenc.07

The reason is that bits_to_follow indicates the number of bits that could not be output up till now because we didn't know the first bit to be produced ("deferred" bits). For example, if the range of codes for the message had been 011 through 100, we would be unable to output any bits until the first bit was decided. However, once we have enough information to decide that the first bit is 0, we can output three bits, "011". The value of bits_to_follow would be 2 in that case, since we have two deferred bits. Of course, if the first bit turns out to be 1, we would emit "100" instead. The reason that we know that the following bits must be the opposite of the initial bit is that the only time we have to defer bits is when the code range is split between codes starting with 0 and those starting with 1; if both low and high started with the same bit, we could send that bit out.
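The deferred-bit rule can be sketched in a few lines. This is a minimal illustration, not the macro from arenc.cpp: the single function bit_plus_follow, the helper emit_with_deferred, and the use of a std::string as the output buffer are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the bit_plus_follow_0 / bit_plus_follow_1 macros:
// emit the decided bit, then flush every deferred bit as its opposite.
// The std::string "buffer" is for illustration only; real code packs bits.
void bit_plus_follow(int bit, int& bits_to_follow, std::string& out) {
    assert(bit == 0 || bit == 1);
    out += (bit ? '1' : '0');            // the bit we have just decided
    while (bits_to_follow > 0) {         // deferred bits are the complement
        out += (bit ? '0' : '1');
        --bits_to_follow;
    }
}

// Hypothetical convenience wrapper so the result is easy to inspect.
std::string emit_with_deferred(int bit, int deferred) {
    std::string out;
    bit_plus_follow(bit, deferred, out);
    return out;
}
```

With two deferred bits, emit_with_deferred(0, 2) produces "011" and emit_with_deferred(1, 2) produces "100", which are the two outcomes described above.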

The values of HALF, FIRST_QTR, and THIRD_QTR are all based on TOP_VALUE (Figure arith.00). In our example, FIRST_QTR is 64, HALF is 128, and THIRD_QTR is 192. The current value of high is 255, which is more than HALF, so we continue with our tests.
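One way these constants could be derived is shown below. The value 255 for TOP_VALUE is an assumption that is consistent with the numbers in the example, and the formulas follow a common arithmetic-coding convention; the actual definitions in Figure arith.00 may be written differently.

```cpp
// Assumption: TOP_VALUE is 255 (an 8-bit code range), which agrees with the
// quarter points 64, 128 and 192 quoted in the text.
constexpr unsigned TOP_VALUE = 255;
constexpr unsigned FIRST_QTR = TOP_VALUE / 4 + 1;  // 64
constexpr unsigned HALF      = 2 * FIRST_QTR;      // 128
constexpr unsigned THIRD_QTR = 3 * FIRST_QTR;      // 192

// Compile-time check that the derived values match the worked example.
static_assert(FIRST_QTR == 64 && HALF == 128 && THIRD_QTR == 192,
              "quarter points should match the worked example");
```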

Assuming that we haven't obtained the current output bit yet, we continue by testing the complementary condition. If low is greater than or equal to HALF, we know that the entire frequency range allocated to our message so far is in the top half of the total frequency range; therefore, the first bit of the message is 1. If this occurs, we output a 1 via bit_plus_follow_1. The next two lines reduce high and low by HALF, since we know that both are at or above that value.

In our example, low is 223, which is more than HALF. Therefore, we can call bit_plus_follow_1 to output a 1 bit. Then we adjust both low and high by subtracting HALF, to account for the 1 bit we have just produced; low is now 95 and high is 127.

If we haven't passed either of the tests to output a 0 or a 1 bit, we continue by testing whether the range is small enough but in the wrong position to provide output at this time. This is the situation labeled "Defer" in Figure initcode. We know that the first two bits of the output will be either 01 or 10; we just don't know which of these two possibilities it will be. Therefore, we defer the output, incrementing bits_to_follow to indicate that we have done so; we also reduce both low and high by FIRST_QTR, since we know that both are at or above that value. If this seems mysterious, remember that the encoding information we want is contained in the differences between low and high, so we want to remove any redundancy in their values.[16]

If we get to the break statement near the end of the loop, we still don't have any idea what the next output bit(s) will be. This means that the range of frequencies corresponding to our message is still greater than 50% of the maximum possible frequency; we must have encoded an extremely frequent character, since it hasn't even contributed one bit! If this happens, we break out of the output loop; obviously, we have nothing to declare.

In any of the other three cases, we now have arrived at the statement low <<= 1; with the values of low and high guaranteed to be less than half the maximum possible frequency.[17] Therefore, we shift the values of low and high up one bit to make room for our next pass through the loop. One more detail: we increment high after shifting it because the range represented by high actually extends almost to the next frequency value; we have to shift a 1 bit in rather than a 0 to keep this relationship.

In our example, low is 95 and high is 127, which represents a range of frequencies from 95 to slightly less than 128. The shifts give us 190 for low and 255 for high, which represents a range from 190 to slightly less than 256. If we hadn't added 1 to high, the range would have been from 190 to slightly less than 255.

Since we have been over the code in this loop once already, we can continue directly with the example. We start out with low at 190 and high at 255. Since high is not less than HALF (128), we proceed to the second test, where low turns out to be greater than HALF. So we call bit_plus_follow_1 again as on the first loop and then reduce both low and high by HALF, producing 62 for low and 127 for high. At the bottom of the loop, we shift a 0 into low and a 1 into high, resulting in 124 and 255, respectively.

On the next pass through the loop, high is not less than HALF, low isn't greater than or equal to HALF, and high isn't less than THIRD_QTR (192), so we hit the break and exit the loop. We have sent two bits to the output buffer.
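The whole output loop just traced can be sketched as follows. This is an illustration under stated assumptions rather than the code in arenc.cpp: the CoderState struct, the emit and run_output_loop helpers, and the std::string bit buffer are hypothetical, and the constants are the example values quoted in the text.

```cpp
#include <cassert>
#include <string>

constexpr unsigned FIRST_QTR = 64, HALF = 128, THIRD_QTR = 192;

// Hypothetical bundle of the encoder variables named in the text.
struct CoderState {
    unsigned low;
    unsigned high;
    int bits_to_follow;
    std::string out;   // illustrative bit buffer, one character per bit
};

// Emit a decided bit followed by any deferred (opposite) bits.
void emit(CoderState& s, int bit) {
    s.out += (bit ? '1' : '0');
    for (; s.bits_to_follow > 0; --s.bits_to_follow)
        s.out += (bit ? '0' : '1');
}

void output_loop(CoderState& s) {
    assert(s.low <= s.high && s.high <= 255);
    for (;;) {
        if (s.high < HALF) {                       // range in lower half: bit is 0
            emit(s, 0);
        } else if (s.low >= HALF) {                // range in upper half: bit is 1
            emit(s, 1);
            s.low -= HALF;
            s.high -= HALF;
        } else if (s.low >= FIRST_QTR && s.high < THIRD_QTR) {
            ++s.bits_to_follow;                    // straddles the middle: defer
            s.low -= FIRST_QTR;
            s.high -= FIRST_QTR;
        } else {
            break;                                 // range too wide: no bit yet
        }
        s.low <<= 1;                               // shift a 0 into low
        s.high = (s.high << 1) + 1;                // shift a 1 into high
    }
}

// Hypothetical helper: run the loop from a given starting interval.
CoderState run_output_loop(unsigned low, unsigned high) {
    CoderState s{low, high, 0, ""};
    output_loop(s);
    return s;
}
```

Starting from low = 223 and high = 255, run_output_loop emits the two bits "11" and leaves low at 124 and high at 255, which are the values reached in the walkthrough above.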

We are finished with encode_symbol for this character. Now we will start processing the next character of our example message, which is 'B'. This character has a symbol value of 1, as it is the second character in our character set. First, we set prev_cum to 0. The frequency accumulation loop in Figure arenc.03 will not be executed at all, since symbol/2 evaluates to 0; we fall through to the adjustment code in Figure arenc.04 and select the odd path after the else, since the symbol code is odd. We set current_pair to (1,3), since that is the first entry in the frequency table. Then we set total_pair_weight to the corresponding entry in the both_weights table, which is 54. Next, we set cum to 0 + 54, or 54. The high part of the current pair is 1, so high_half_weight becomes entry 1 in the translate table, or 2; we add this to prev_cum, which becomes 2 as well.

Now we have reached the first line in Figure arenc.05. Since the current value of low is 124 and the current value of high is 255, the value of range becomes 131. Next, we recalculate high as 124 + (131*54)/63 - 1, or 235. The new value of low is 124 + (131*2)/63, or 128. We are ready to enter the output loop.
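The arithmetic in this step can be checked with a short sketch. The Interval struct and the narrow function are hypothetical names introduced for illustration; the formula is reconstructed from the numbers in the text, where the quoted range of 131 implies that range is computed as high - low and the divisor 63 is the total frequency of the table in use.

```cpp
#include <cassert>

// Hypothetical pair of interval bounds.
struct Interval {
    unsigned low;
    unsigned high;
};

// prev_cum and cum are the cumulative frequencies just below and just above
// the symbol being encoded; total is the sum of all frequencies in the table.
Interval narrow(Interval cur, unsigned prev_cum, unsigned cum, unsigned total) {
    assert(cur.low <= cur.high && prev_cum < cum && cum <= total);
    // Assumption: range = high - low, as the value 131 in the text implies.
    const unsigned long range = cur.high - cur.low;
    Interval next;
    next.high = static_cast<unsigned>(cur.low + (range * cum) / total - 1);
    next.low  = static_cast<unsigned>(cur.low + (range * prev_cum) / total);
    return next;
}
```

Calling narrow(Interval{124, 255}, 2, 54, 63) gives low = 128 and high = 235, the same values computed by hand above. Integer division truncates, so (131*54)/63 is 112 and (131*2)/63 is 4.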

First, high is not less than HALF, so the first test fails. Next, low is equal to HALF, so the second test succeeds. Therefore, we call bit_plus_follow_1 to output a 1 bit; it would also output any deferred bits that we might have been unable to send out before, although there aren't any at present. We also adjust low and high by subtracting HALF, to account for the bit we have just sent; their new values are 0 and 107, respectively.

Next, we proceed to the statements beginning at low <<= 1;, where we shift low and high up, injecting a 0 and a 1, respectively; the new values are 0 for low and 215 for high. On the next pass through the loop we will discover that these values are too far apart to emit any more bits, and we will break out of the loop and return to the main function.


We could continue with a longer message, but I imagine you get the idea. So let's return to the main function (Figure encode.00).[18]

update_model

The next function called is update_model (Figure adapt.01), which adjusts the frequencies in the current frequency table to account for having seen the most recent character.

The update_model function (from compress\adapt.cpp) (Figure adapt.01)

codelist/adapt.01

The arguments to this function are symbol, the internal value of the character just encoded, and oldch, the previous character encoded, which indicates the frequency table that was used to encode that character. What is the internal value of the character? In the current version of the program, it is the same as the ASCII code of the character; however, near the end of the chapter we will employ an optimization that involves translating characters from ASCII to an internal code to speed up translation.

This function starts out by adding the character's ASCII code to char_total, a hash total which is used in the simple pseudorandom number generator that we use to help decide when to upgrade a character's frequency to the next frequency index code. We use the symbol_translation table to get the ASCII value of the character before adding it to char_total; this is present for compatibility with our final version which employs character translation.

The next few lines initialize some variables: old_weight_code, which we set when changing a frequency index code or "weight code", so that we can update the frequency total for this frequency table; temp_freq_info, a pointer to the frequency table structure for the current character; and freq_ptr, the address of the frequency table itself.

Next, we compute the index into the frequency table for the weight code we want to examine. If the symbol value is even, that means that this symbol is in the high part of that byte in the frequency table. In this case, we execute the code in the true branch of the if statement "if (symbol % 2 == 0)". This starts by setting temp_freq to the high four bits of the table entry. If the result is 0, this character has the lowest possible frequency value; we assume that this is because it has never been encountered before and set its frequency index code to INIT_INDEX. Then we update the total_freq element in the frequency table.

However, if temp_freq is not 0, we have to decide whether to upgrade this character's frequency index code to the next level. The probability of this upgrade is inversely proportional to the ratio of the current frequency to the next frequency; the larger the gap between two frequency code values, the lower the probability of the upgrade. So we compare char_total to the entry in upgrade_threshold; if char_total is greater, we want to do the upgrade, so we record the previous frequency code in old_weight_code and add HIGH_INCREMENT to the byte containing the frequency index for the current character. We have to use HIGH_INCREMENT rather than 1 to adjust the frequency index, since the frequency code for the current character occupies the high four bits of its byte.

Of course, the character is just as likely to be in the low part of its byte; in that case, we execute the code in the false branch of that if statement, which corresponds exactly to the above code. In either case, we follow up with the if statement whose condition is "(old_weight_code != -1)", which tests whether a frequency index code was incremented. If it was, we add the difference between the new code and the old one to the total_freq entry in the current frequency table, unless the character previously had a frequency index code of 0; in that case, we have already adjusted the total_freq entry.
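The packing of two four-bit frequency index codes into each byte can be sketched as follows. This is a minimal illustration, not the code in adapt.cpp: the functions get_index and bump_index and the constant LOW_INCREMENT are hypothetical names, the value 0x10 for HIGH_INCREMENT is inferred from the statement that the code occupies the high four bits, and the guard against exceeding index 15 is an addition made for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t HIGH_INCREMENT = 0x10;  // +1 in the high four bits (value inferred)
constexpr std::uint8_t LOW_INCREMENT  = 0x01;  // +1 in the low four bits (name assumed)

// Read the 4-bit frequency index code for a symbol from a table that packs
// two codes per byte: even symbols in the high half, odd symbols in the low.
unsigned get_index(const std::uint8_t* table, unsigned symbol) {
    const std::uint8_t pair = table[symbol / 2];
    return (symbol % 2 == 0) ? (pair >> 4) : (pair & 0x0F);
}

// Raise a symbol's index code by one level without disturbing its neighbor.
void bump_index(std::uint8_t* table, unsigned symbol) {
    assert(get_index(table, symbol) < 15);     // guard added for this sketch
    const std::uint8_t inc = (symbol % 2 == 0) ? HIGH_INCREMENT : LOW_INCREMENT;
    table[symbol / 2] = static_cast<std::uint8_t>(table[symbol / 2] + inc);
}
```

For a byte holding the pair (1,3), stored as 0x13, get_index returns 1 for symbol 0 and 3 for symbol 1; bumping symbol 0 changes the byte to 0x23, leaving the low code untouched.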

The last operation in update_model is to make sure that the total of all frequencies in the frequency table does not exceed the limit of MAX_FREQUENCY; if that were to happen, more than one character might map into the same value between high and low, so that unambiguous decoding would become impossible. Therefore, if temp_total_freq exceeds MAX_FREQUENCY, we have to reduce the frequency indexes until this is no longer the case. The while loop whose continuation expression is "(temp_total_freq > MAX_FREQUENCY)" takes care of this problem in the following way.

First, we initialize temp_total_freq to 0, as we will use it to accumulate the frequencies as we modify them. Then we set freq_ptr to the address of the first entry in the frequency table to be modified. Now we are ready to step through all the bytes in the frequency table; for each one, we test whether both indexes in the current byte are 0. If so, we can't reduce them, so we just add the frequency value corresponding to the translation of two 0 indexes (BOTH_WEIGHTS_ZERO) to temp_total_freq.

Otherwise, we copy the current index pair into freq. If the high index is nonzero, we decrement it. Similarly, if the low index is nonzero, we decrement it. After handling either or both of these cases, we add the translation of the new index pair to temp_total_freq. After we have processed all of the index values in this way, we retest the while condition, and when temp_total_freq is no longer out of range, we store it back into the frequency table and return to the main program.
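The rescaling loop described above can be sketched as follows. Everything outside the loop's structure is an assumption made for illustration: weight_of is a hypothetical stand-in for the translate table (its power-of-two values are invented), pair_weight plays the role of a both_weights lookup, and the limit is passed as a parameter because the value of MAX_FREQUENCY is not given here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical translation from a 4-bit index code to a frequency value.
// The real translate table has its own values; these are invented.
unsigned weight_of(unsigned index) { return 1u << index; }

// Combined weight of the two codes packed in one byte.
unsigned pair_weight(std::uint8_t pair) {
    return weight_of(pair >> 4) + weight_of(pair & 0x0F);
}

// Decrement every nonzero index code until the total fits the limit,
// then return the new total so the caller can store it in the table.
unsigned rescale(std::vector<std::uint8_t>& table, unsigned total,
                 unsigned max_frequency) {
    // Guard added for this sketch: an all-zero table must fit the limit,
    // otherwise the loop could never terminate.
    assert(table.size() * pair_weight(0) <= max_frequency);
    while (total > max_frequency) {
        total = 0;                                  // re-accumulate from scratch
        for (std::uint8_t& pair : table) {
            if (pair & 0xF0) pair = static_cast<std::uint8_t>(pair - 0x10);
            if (pair & 0x0F) pair = static_cast<std::uint8_t>(pair - 0x01);
            total += pair_weight(pair);             // zero pairs add the floor weight
        }
    }
    return total;
}
```

With the invented weights, a table of bytes 0x13, 0x50, 0x00 has a total of 45. Rescaling it against a limit of 20 takes two passes and leaves the bytes 0x01, 0x30, 0x00 with a total of 14.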

Finally, we have returned to the main function (Figure encode.00), where we copy ch to oldch, so that the current character will be used to select the frequency table for the next character to be encoded; then we continue in the main loop until all characters have been processed.

When we reach EOF in the input file, the main loop terminates; we use encode_symbol to encode EOF_SYMBOL, which tells the receiver to stop
