The first test determines whether the highest frequency value allocated to our message is less than one-half of the total frequency range. If it is, we know that the first output bit is 0, so we call the bit_plus_follow_0 macro (Figure arenc.07) to output that bit. Let's take a look at that macro.
First we call output_0, which adds a 0 bit to the output buffer and writes it out if it is full. Then, in the event that bits_to_follow is greater than 0, we call output_1 and decrement bits_to_follow until it reaches 0. Why do we do this?
The bit_plus_follow_0 macro (from compress\arenc.cpp) (Figure arenc.07)
codelist/arenc.07
The reason is that bits_to_follow indicates the number of bits that could not be output up till now because we didn't know the first bit to be produced ("deferred" bits). For example, if the range of codes for the message had been 011 through 100, we would be unable to output any bits until the first bit was decided. However, once we have enough information to decide that the first bit is 0, we can output three bits, "011". The value of bits_to_follow would be 2 in that case, since we have two deferred bits. Of course, if the first bit turns out to be 1, we would emit "100" instead. The reason that we know that the following bits must be the opposite of the initial bit is that the only time we have to defer bits is when the code range is split between codes starting with 0 and those starting with 1; if both low and high started with the same bit, we could send that bit out.
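The deferred-bit behavior described above can be sketched in a few lines. This is only an illustration, not the macro from Figure arenc.07: the BitSink structure and the single bit_plus_follow function are hypothetical names, and the sketch collects bits as characters in a string rather than packing them into an output buffer.

```cpp
#include <string>

// Hypothetical stand-in for the encoder's output state. The real code
// packs bits into a buffer; here they are kept as '0'/'1' characters.
struct BitSink {
    std::string bits;        // bits emitted so far
    int bits_to_follow = 0;  // deferred bits still waiting for a decision
};

// Emit the decided bit, then flush every deferred bit as its opposite.
inline void bit_plus_follow(BitSink& sink, int bit) {
    sink.bits += static_cast<char>('0' + bit);
    while (sink.bits_to_follow > 0) {
        sink.bits += static_cast<char>('0' + (1 - bit));
        --sink.bits_to_follow;
    }
}
```

With two deferred bits pending, deciding on 0 appends "011" and deciding on 1 appends "100", which matches the example in the text.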
The values of HALF, FIRST_QTR, and THIRD_QTR are all based on TOP_VALUE (Figure arith.00). In our example, FIRST_QTR is 64, HALF is 128, and THIRD_QTR is 192. The current value of high is 255, which is more than HALF, so we continue with our tests.
Assuming that we haven't obtained the current output bit yet, we continue by testing the complementary condition. If low is greater than or equal to HALF, we know that the entire frequency range allocated to our message so far is in the top half of the total frequency range; therefore, the first bit of the message is 1. If this occurs, we output a 1 via bit_plus_follow_1. The next two lines reduce high and low by HALF, since we know that both are above that value.
In our example, low is 223, which is more than HALF. Therefore, we can call bit_plus_follow_1 to output a 1 bit. Then we adjust both low and high by subtracting HALF, to account for the 1 bit we have just produced; low is now 95 and high is 127.
If we haven't passed either of the tests to output a 0 or a 1 bit, we continue by testing whether the range is small enough but in the wrong position to provide output at this time; that is, it straddles HALF but lies between FIRST_QTR and THIRD_QTR. This is the situation labeled "Defer" in figure initcode. We know that the first two bits of the output will be either 01 or 10; we just don't know which of these two possibilities it will be. Therefore, we defer the output, incrementing bits_to_follow to indicate that we have done so; we also reduce both low and high by FIRST_QTR, since we know that both are above that value. If this seems mysterious, remember that the encoding information we want is contained in the differences between low and high, so we want to remove any redundancy in their values.16
If we get to the break statement near the end of the loop, we still don't have any idea what the next output bit(s) will be. This means that the range of frequencies corresponding to our message is still greater than 50% of the maximum possible frequency; we must have encoded an extremely frequent character, since it hasn't even contributed one bit! If this happens, we break out of the output loop; obviously, we have nothing to declare.
In any of the other three cases, we now have arrived at the statement low <<= 1; with the values of low and high guaranteed to be less than half the maximum possible frequency.17 Therefore, we shift the values of low and high up one bit to make room for our next pass through the loop. One more detail: we increment high after shifting it because the range represented by high actually extends almost to the next frequency value; we have to shift a 1 bit in rather than a 0 to keep this relationship.
In our example, low is 95 and high is 127, which represents a range of frequencies from 95 to slightly less than 128. The shifts give us 190 for low and 255 for high, which represents a range from 190 to slightly less than 256. If we hadn't added 1 to high, the range would have been from 190 to slightly less than 255.
Since we have been over the code in this loop once already, we can continue directly with the example. We start out with low at 190 and high at 255. Since high is not less than HALF (128), we proceed to the second test, where low turns out to be greater than HALF. So we call bit_plus_follow_1 again as on the first loop and then reduce both low and high by HALF, producing 62 for low and 127 for high. At the bottom of the loop, we shift a 0 into low and a 1 into high, resulting in 124 and 255, respectively.
On the next pass through the loop, high is not less than HALF, low isn't greater than or equal to HALF, and high isn't less than THIRD_QTR (192), so we hit the break and exit the loop. We have sent two bits to the output buffer.
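The three tests and the final shift can be replayed with a small sketch of the output loop. This is not the listing from compress\arenc.cpp: the EncoderState structure, the emit and output_loop functions, and the formulas deriving the quarter points from TOP_VALUE are assumptions chosen to reproduce the 8-bit values used in the example (64, 128, 192, 255).

```cpp
#include <string>

// Assumed definitions that yield the example's values.
constexpr int TOP_VALUE = 255;
constexpr int FIRST_QTR = TOP_VALUE / 4 + 1;  // 64
constexpr int HALF = 2 * FIRST_QTR;           // 128
constexpr int THIRD_QTR = 3 * FIRST_QTR;      // 192

// Hypothetical encoder state; output bits are kept as characters.
struct EncoderState {
    int low = 0;
    int high = TOP_VALUE;
    int bits_to_follow = 0;
    std::string out;
};

// Emit one decided bit followed by any deferred (opposite) bits.
inline void emit(EncoderState& s, int bit) {
    s.out += static_cast<char>('0' + bit);
    for (; s.bits_to_follow > 0; --s.bits_to_follow) {
        s.out += static_cast<char>('0' + (1 - bit));
    }
}

// Emit every bit that the current low/high pair already determines.
inline void output_loop(EncoderState& s) {
    for (;;) {
        if (s.high < HALF) {
            emit(s, 0);                      // range is in the bottom half
        } else if (s.low >= HALF) {
            emit(s, 1);                      // range is in the top half
            s.low -= HALF;
            s.high -= HALF;
        } else if (s.low >= FIRST_QTR && s.high < THIRD_QTR) {
            ++s.bits_to_follow;              // defer: next bits are 01 or 10
            s.low -= FIRST_QTR;
            s.high -= FIRST_QTR;
        } else {
            break;                           // nothing can be decided yet
        }
        s.low <<= 1;                         // shift a 0 into low
        s.high = (s.high << 1) + 1;          // shift a 1 into high
    }
}
```

Starting from low = 223 and high = 255, this loop emits two 1 bits and stops with low = 124 and high = 255, the same values reached in the walkthrough.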
We are finished with encode_symbol for this character. Now we will start processing the next character of our example message, which is 'B'. This character has a symbol value of 1, as it is the second character in our character set. First, we set prev_cum to 0. The frequency accumulation loop in Figure arenc.03 will not be executed at all, since symbol/2 evaluates to 0; we fall through to the adjustment code in Figure arenc.04 and select the odd path after the else, since the symbol code is odd. We set current_pair to (1,3), since that is the first entry in the frequency table. Then we set total_pair_weight to the corresponding entry in the both_weights table, which is 54. Next, we set cum to 0 + 54, or 54. The high part of the current pair is 1, so high_half_weight becomes entry 1 in the translate table, or 2; we add this to prev_cum, which becomes 2 as well.
Now we have reached the first line in Figure arenc.05. Since the current value of low is 124 and the current value of high is 255, the value of range becomes 131. Next, we recalculate high as 124 + (131*54)/63 - 1, or 235. The new value of low is 124 + (131*2)/63, or 128. We are ready to enter the output loop.
First, high is not less than HALF, so the first test fails. Next, low is equal to HALF, so the second test succeeds. Therefore, we call bit_plus_follow_1 to output a 1 bit; it would also output any deferred bits that we might have been unable to send out before, although there aren't any at present. We also adjust low and high by subtracting HALF, to account for the bit we have just sent; their new values are 0 and 107, respectively.
Next, we proceed to the statements beginning at low <<= 1;, where we shift low and high up, injecting a 0 and a 1, respectively; the new values are 0 for low and 215 for high. On the next pass through the loop we will discover that these values are too far apart to emit any more bits, and we will break out of the loop and return to the main function.
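The interval arithmetic for 'B' can be checked with a short sketch. The Interval structure and the narrow function are hypothetical names, not code from Figure arenc.05, and the sketch assumes that range is computed as high minus low, because that is what reproduces the value 131 used above; the divisor 63 is taken directly from the worked numbers.

```cpp
// Hypothetical pair of bounds for the current code range.
struct Interval {
    int low;
    int high;
};

// Narrow the current interval to the slice belonging to one symbol.
// prev_cum is the cumulative frequency below the symbol, cum is the
// cumulative frequency through the symbol, and total_freq is the divisor.
// Assumption: range = high - low, matching the figures in the text.
inline Interval narrow(Interval cur, int prev_cum, int cum, int total_freq) {
    const int range = cur.high - cur.low;
    Interval next;
    next.low = cur.low + (range * prev_cum) / total_freq;
    next.high = cur.low + (range * cum) / total_freq - 1;
    return next;
}
```

Calling narrow with the interval (124, 255), prev_cum = 2, cum = 54, and a divisor of 63 gives low = 128 and high = 235, the values from which the output loop then produces one more 1 bit.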
We could continue with a longer message, but I imagine you get the idea. So let's return to the main function (Figure encode.00).18
update_model
The next function called is update_model (Figure adapt.01), which adjusts the frequencies in the current frequency table to account for having seen the most recent character.
The update_model function (from compress\adapt.cpp) (Figure adapt.01)
codelist/adapt.01
The arguments to this function are symbol, the internal value of the character just encoded, and oldch, the previous character encoded, which indicates the frequency table that was used to encode that character. What is the internal value of the character? In the current version of the program, it is the same as the ASCII code of the character; however, near the end of the chapter we will employ an optimization that involves translating characters from ASCII to an internal code to speed up translation.
This function starts out by adding the character's ASCII code to char_total, a hash total which is used in the simple pseudorandom number generator that we use to help decide when to upgrade a character's frequency to the next frequency index code. We use the symbol_translation table to get the ASCII value of the character before adding it to char_total; this is present for compatibility with our final version which employs character translation.
The next few lines initialize some variables: old_weight_code, which we set when changing a frequency index code or "weight code", so that we can update the frequency total for this frequency table; temp_freq_info, a pointer to the frequency table structure for the current character; and freq_ptr, the address of the frequency table itself.
Next, we compute the index into the frequency table for the weight code we want to examine. If the symbol value is even, the symbol occupies the high part of its byte in the frequency table. In this case, we execute the code in the true branch of the if statement "if (symbol % 2 == 0)". This starts by setting temp_freq to the high four bits of the table entry. If the result is 0, this character has the lowest possible frequency value; we assume that this is because it has never been encountered before and set its frequency index code to INIT_INDEX. Then we update the total_freq element in the frequency table.
However, if temp_freq is not 0, we have to decide whether to upgrade this character's frequency index code to the next level. The probability of this upgrade is inversely proportional to the ratio of the current frequency to the next frequency; the larger the gap between two frequency code values, the less the probability of the upgrade. So we compare char_total to the entry in upgrade_threshold; if char_total is greater, we want to do the upgrade, so we record the previous frequency code in old_weight_code and add HIGH_INCREMENT to the byte containing the frequency index for the current character. We have to use HIGH_INCREMENT rather than 1 to adjust the frequency index, since the frequency code for the current character occupies the high four bits of its byte.
Of course, the character is just as likely to be in the low part of its byte; in that case, we execute the code in the false branch of that if statement, which corresponds exactly to the above code. In either case, we follow up with the if statement whose condition is "(old_weight_code != -1)", which tests whether a frequency index code was incremented. If it was, we add the difference between the new code and the old one to the total_freq entry in the current frequency table, unless the character previously had a frequency index code of 0; in that case, we have already adjusted the total_freq entry.
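The packing of two four-bit frequency index codes into one byte can be illustrated as follows. This is a sketch under stated assumptions rather than the code from Figure adapt.01: the value 0x10 for HIGH_INCREMENT is inferred from the statement that the code occupies the high four bits, and LOW_INCREMENT, weight_code, and bump_weight_code are hypothetical names.

```cpp
#include <cstdint>

// Assumed value: adding 0x10 raises the high four bits by one.
constexpr std::uint8_t HIGH_INCREMENT = 0x10;
// Hypothetical counterpart for the low four bits.
constexpr std::uint8_t LOW_INCREMENT = 0x01;

// Read the 4-bit code for a symbol: even symbols use the high half of
// byte symbol/2, odd symbols use the low half.
inline int weight_code(const std::uint8_t* table, int symbol) {
    const std::uint8_t pair = table[symbol / 2];
    return (symbol % 2 == 0) ? (pair >> 4) : (pair & 0x0F);
}

// Raise the code for a symbol by one level. The caller must make sure
// the code is below 15 so that the addition cannot spill into the
// neighboring half of the byte.
inline void bump_weight_code(std::uint8_t* table, int symbol) {
    table[symbol / 2] += (symbol % 2 == 0) ? HIGH_INCREMENT : LOW_INCREMENT;
}
```

For a byte holding the pair (1,3), stored here as 0x13, symbol 0 reads as code 1 and symbol 1 reads as code 3; bumping symbol 0 changes the byte to 0x23 and leaves the code for symbol 1 untouched.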
The last operation in update_model is to make sure that the total of all frequencies in the frequency table does not exceed the limit of MAX_FREQUENCY; if that were to happen, more than one character might map into the same value between high and low, so that unambiguous decoding would become impossible. Therefore, if temp_total_freq exceeds MAX_FREQUENCY, we have to reduce the frequency indexes until this is no longer the case. The while loop whose continuation expression is "(temp_total_freq > MAX_FREQUENCY)" takes care of this problem in the following way.
First, we initialize temp_total_freq to 0, as we will use it to accumulate the frequencies as we modify them. Then we set freq_ptr to the address of the first entry in the frequency table to be modified. Now we are ready to step through all the bytes in the frequency table; for each one, we test whether both indexes in the current byte are 0. If so, we can't reduce them, so we just add the frequency value corresponding to the translation of two 0 indexes (BOTH_WEIGHTS_ZERO) to temp_total_freq.
Otherwise, we copy the current index pair into freq. If the high index is nonzero, we decrement it. Similarly, if the low index is nonzero, we decrement it. After handling either or both of these cases, we add the translation of the new index pair to temp_total_freq. After we have processed all of the index values in this way, we retest the while condition, and when temp_total_freq is no longer out of range, we store it back into the frequency table and return to the main program.
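The reduction pass described above might look roughly like this. Everything here is illustrative: the weight function (a power of two per index) is a made-up stand-in for the book's translate table, whose actual values are not shown in this section, and pair_weight and rescale are hypothetical names. The early exit when nothing can be reduced is an addition for safety in the sketch, not something the text describes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical translation from a 4-bit index to a frequency weight.
inline int weight(int index) { return 1 << index; }

// Combined weight of the two indexes packed into one byte.
inline int pair_weight(std::uint8_t pair) {
    return weight(pair >> 4) + weight(pair & 0x0F);
}

// Decrement every nonzero index until the table total fits the limit.
// Returns the new total, which the caller stores back with the table.
inline int rescale(std::vector<std::uint8_t>& table, int total,
                   int max_frequency) {
    while (total > max_frequency) {
        bool changed = false;
        total = 0;  // re-accumulate as the entries are modified
        for (std::uint8_t& pair : table) {
            if (pair & 0xF0) { pair -= 0x10; changed = true; }  // high index
            if (pair & 0x0F) { pair -= 0x01; changed = true; }  // low index
            total += pair_weight(pair);
        }
        if (!changed) break;  // all indexes are 0; nothing left to reduce
    }
    return total;
}
```

With the hypothetical weights, a table holding 0x53, 0x00, and 0x21 totals 48; rescaling against a limit of 30 takes one pass, leaves 0x42, 0x00, and 0x10, and returns 25.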
Finally, we have returned to the main function (Figure encode.00), where we copy ch to oldch, so that the current character will be used to select the frequency table for the next character to be encoded; then we continue in the main loop until all characters have been processed.
When we reach EOF in the input file, the main loop terminates; we use
encode_symbol to encode EOF_SYMBOL, which tells the receiver to stop