I resolved to improve the speed of this conversion (from 1300 microseconds/12 character string) as much as was practical, before proposing further hardware upgrades to the system. The first problem was to determine which operation was consuming the most CPU time. Examination of the code (Figure ascrad1.cpp) disclosed that the toupper function was being called for every character in the string, every time the character was being examined. This seemed an obvious place to start.
First version of ascii_to_Radix40 routine (from intro\ascrad1.cpp) (Figure ascrad1.cpp)
codelist/ascrad1.00
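The listing itself lives in the codelist file above and is not reproduced here; the following is only a rough sketch of the pattern being described, with illustrative names and an assumed 40-character legal set, to make the point about toupper concrete: the call sits inside the inner search loop, so it runs once per comparison rather than once per input character.

/* Sketch only -- not the book's ascrad1.cpp. Names and the legal-character
   set below are assumptions for illustration; packing of the codes into
   Radix40 words is omitted. */
#include <ctype.h>
#include <string.h>

static const char legal_chars[] = " ,-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

static void ascii_to_radix40_v1_sketch(const char *input, int *codes)
{
    for (size_t i = 0; i < strlen(input); i++)
    {
        codes[i] = 0;
        for (size_t j = 0; j < strlen(legal_chars); j++)
        {
            /* toupper is re-evaluated on every comparison, so it runs
               many times for each input character */
            if (toupper((unsigned char)input[i]) == legal_chars[j])
            {
                codes[i] = (int)j;
                break;
            }
        }
    }
}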
The purpose of writing the loop in this way was to avoid making changes to the input string; after all, it was an input variable. However, a more efficient way to leave the input string unaltered was to make a copy of the input string and convert the copy to uppercase, as indicated in Figure ascrad2.cpp. This reduced the time to 650 microseconds/12 character string, but I suspected that more savings were possible.
Second version of ascii_to_Radix40 routine (from intro\ascrad2.cpp) (Figure ascrad2.cpp)
codelist/ascrad2.00
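Again, the actual second version is in the codelist file above; here is a minimal sketch of the copy-and-uppercase idea, under the same assumptions as the previous sketch. Now toupper runs once per character, up front, and the caller's string is left untouched.

/* Sketch only -- not the book's ascrad2.cpp. Same assumed legal-character
   set as in the previous sketch. */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static const char legal_chars[] = " ,-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

static void ascii_to_radix40_v2_sketch(const char *input, int *codes)
{
    size_t n = strlen(input);
    char *copy = (char *)malloc(n + 1);        /* leave the input string unaltered */

    for (size_t i = 0; i <= n; i++)
        copy[i] = (char)toupper((unsigned char)input[i]);   /* once per character */

    for (size_t i = 0; i < n; i++)
    {
        const char *p = strchr(legal_chars, copy[i]);        /* still a linear search */
        codes[i] = p ? (int)(p - legal_chars) : 0;
    }
    free(copy);
}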
Another possible area of improvement was to reduce the use of dynamic string allocation to get storage for the copy of the string to be converted to uppercase. In my application, most of the strings would be less than 100 characters, so I decided to allocate room for a string of 99 characters (plus the required null at the end) on the stack and to call the dynamic allocation routine only if the string was larger than that. However, this change didn't affect the time significantly, so I removed it.
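For what it's worth, the stack-buffer idea just described (and then discarded) looks something like the following sketch; the names are illustrative and the actual conversion step is elided.

/* Sketch of the discarded stack-buffer optimization: use an automatic
   99-character buffer when it is big enough, and fall back to malloc
   only for longer strings. */
#include <stdlib.h>
#include <string.h>

static void convert_using_local_buffer(const char *input)
{
    char local[100];                    /* 99 characters plus the required null */
    size_t n = strlen(input);
    char *copy = (n < sizeof(local)) ? local : (char *)malloc(n + 1);

    memcpy(copy, input, n + 1);
    /* ... uppercase and convert the copy exactly as before ... */

    if (copy != local)
        free(copy);                     /* only free what we actually allocated */
}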
I couldn't see any obvious way to increase the speed of this routine further, until I noticed that if the data had about the same number of occurrences of each
character, the loop to figure out the code for a single character would be executed
an average of 20 times per character! Could this be dispensed with?
Yes, by allocating 256 bytes for a table of conversion values.10 Then I could index into the table rather than searching the string of legal values (see Figure ascrad4.cpp). Timing this version revealed an impressive improvement: 93 microseconds/12 character string. This final version is 14 times the speed of the original.11
Fourth version of ascii_to_Radix40 routine (from intro\ascrad4.cpp) (Figure ascrad4.cpp)
codelist/ascrad4.00
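The fourth version itself is in the codelist file above; the sketch below shows the general shape of the table-driven approach, with an assumed build step that also maps lowercase letters so that no toupper call remains in the conversion loop. The book's actual table is the one referred to in footnote 10 (Figure radix40.00).

/* Sketch only -- not the book's ascrad4.cpp. The 256-byte table maps any
   character value directly to its Radix40 code; mapping illegal characters
   to 0 is an assumption for illustration. */
#include <string.h>

static unsigned char lookup_chars[256];     /* 256 bytes: one entry per character value */

static void build_lookup_table(void)
{
    static const char legal_chars[] = " ,-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    memset(lookup_chars, 0, sizeof(lookup_chars));
    for (int j = 0; legal_chars[j] != '\0'; j++)
    {
        lookup_chars[(unsigned char)legal_chars[j]] = (unsigned char)j;
        if (legal_chars[j] >= 'A' && legal_chars[j] <= 'Z')   /* fold in lowercase */
            lookup_chars[(unsigned char)(legal_chars[j] + ('a' - 'A'))] = (unsigned char)j;
    }
}

static void ascii_to_radix40_v4_sketch(const char *input, int *codes)
{
    size_t n = strlen(input);
    for (size_t i = 0; i < n; i++)
        codes[i] = lookup_chars[(unsigned char)input[i]];     /* one indexed load per character */
}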
The use of a profiler would have reduced the effort needed to determine the major causes of the inefficiency. Even without such an aid, attention to which lines were being executed most frequently enabled me to remove the major bottlenecks in the conversion to Radix40 representation. It is no longer a significant part of the time needed to access a record.
Summary
In this chapter, I have given some guidelines and examples of how to determine whether optimization is required and how to apply your optimization effort effectively. In the next chapter we will start to examine the algorithms and other solutions that you can apply once you have determined where your program needs improvement.
Footnotes
1 If you don't have the time to read this book in its entirety, you can turn to Figures ioopt-processoropt in Chapter artopt.htm to find the algorithms best suited to your problem.
2 Actually, you will never be ahead; five minutes saved 23 years from now is not as valuable as five minutes spent now. This is analogous to the lottery in which you win a million dollars, but the prize is paid as one dollar a year for a million years!
3 This is especially true on a multiuser system.
4 My previous computer was also a Radio Shack computer, but it had only a cassette recorder/player for "mass storage"!
5 Microsoft is the most prominent example at the moment; the resource consumption of Windows NT™ is still a matter of concern to many programmers, even though the machines that these programmers own have increased in power at a tremendous rate.
6 This example is actually quite conservative. The program that took one hour to run on a timesharing terminal would probably take much less than that on a current desktop computer; we are also neglecting the time value of the savings, as noted above.
7 Of course, if your old machine is more than two or three years old, you might want to replace it anyway, just to get the benefit of the improved technology available today.
8 Perhaps this could be referred to as optimizing the design.
9 This is worse than it may sound; the actual hardware on which the system runs is much slower than the i386 development machine I was using at the time.
10 This table of conversion values can be found in Figure radix40.00
11 I also changed the method of clearing the result array to use memset rather than a loop.
A Supermarket Price Lookup System
Introduction
In this chapter we will use a supermarket price lookup system to illustrate how to save storage by using a restricted character set and how to speed up access to records by employing hash coding (or "scatter storage") and caching (or keeping copies of recently accessed records in memory). We will look items up by their UPC (Universal Product Code), which is printed in the form of a "bar code" on virtually all supermarket items other than fresh produce. We will emphasize rapid retrieval of prices, as maintenance of such files is usually done after hours, when speed would be less significant.
Algorithms Discussed
Algorithms discussed: Hash Coding, Radix40 Data Representation, BCD Data Representation, Caching
Up the Down Staircase
To begin, let us assume that we can describe each item by the information in the structure definition in Figure initstruct.
Item information (Figure initstruct)
typedef struct {
  char upc[10];          /* 10-character Universal Product Code */
  char description[21];  /* item description: up to 20 characters plus a null */
  float price;           /* item price */
} ItemRecord;
One solution to our price-retrieval problem would be to create a file with one record for each item, sorted by UPC code. This would allow us to use a binary search to locate the price of a particular item. How long would it take to find a record in such a file containing 10,000 items?
To answer this question, we have to analyze the algorithm of the binary search in some detail. We start the search by looking at the middle record in the file. If the key we are looking for is greater than the key of the middle record, we know that the record we are looking for must be in the second half of the file (if it is in the file at all). Likewise, if our key is less than the one in the middle record, the record we are looking for must be in the first half of the file (again, if it is there at all). Once we have decided which half of the file to examine next, we look at the middle record in that half and proceed exactly as we did previously. Eventually, either we will find the record we are looking for or we will discover that we can no longer divide the segment we are looking at, as it has only one record (in which case the record we are looking for is not there).
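As a concrete (if simplified) illustration, here is a sketch of that procedure in code, assuming for the moment that the sorted records are held in a memory array rather than on disk; a disk-based version would read one record per probe but follow exactly the same logic.

/* Sketch of a binary search by UPC over a sorted array of ItemRecords
   (structure from Figure initstruct). Assumes the key occupies all 10
   bytes of the upc field. */
#include <string.h>

typedef struct {
    char upc[10];
    char description[21];
    float price;
} ItemRecord;

/* returns the index of the matching record, or -1 if it is not in the array */
static int find_item(const ItemRecord *items, int count, const char *upc)
{
    int low = 0;
    int high = count - 1;

    while (low <= high)                         /* segment still has at least one record */
    {
        int middle = low + (high - low) / 2;    /* look at the middle of the segment */
        int cmp = memcmp(upc, items[middle].upc, sizeof(items[middle].upc));

        if (cmp == 0)
            return middle;                      /* found it */
        else if (cmp > 0)
            low = middle + 1;                   /* must be in the second half, if present */
        else
            high = middle - 1;                  /* must be in the first half, if present */
    }
    return -1;                                  /* segment exhausted: not in the file */
}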
Probably the easiest way to figure out the average number of accesses that would be required to find a record in the file is to start from the other end of the problem: how many records could be found with one access? Obviously, only the middle record. With another access, we could find either the record in the middle of the first half of the file or the record in the middle of the second half. The next access adds another four records, in the centers of the first, second, third, and fourth quarters of the file. In other words, each added access doubles the number of added records that we can find.
Binary search statistics (Figure binary.search)
Number of    Number of newly       Total accesses        Total records
accesses     accessible records    to find all records   accessible

     1   x          1                        1                   1
     2   x          2                        4                   3
     3   x          4                       12                   7
     4   x          8                       32                  15
     5   x         16                       80                  31
     6   x         32                      192                  63
     7   x         64                      448                 127
     8   x        128                     1024                 255
     9   x        256                     2304                 511
    10   x        512                     5120                1023
    11   x       1024                    11264                2047
    12   x       2048                    24576                4095
    13   x       4096                    53248                8191
    14   x       1809                    25326               10000
                -----                   ------
                10000                   123631

Average number of accesses per record = 12.3631 accesses/record
Figure binary.search shows the calculation of the average number of accesses for a 10,000 item file. Notice that each line represents twice the number of records as the one above, with the exception of line 14. The entry for that line (1809) is the number of 14-access records needed to reach the capacity of our 10,000 record file.

As you can see, the average number of accesses is approximately 12.4 per record. Therefore, at a typical hard disk speed of 10 milliseconds per access, we would need almost 125 milliseconds to look up an average record using a binary search. While this lookup time might not seem excessive, remember that a number of checkout terminals would probably be attempting to access the database at the same time, and the waiting time could become noticeable. We might also be concerned about the amount of wear on the disk mechanism that would result from this approach.
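If you want to check the figures in the table, the arithmetic is easy to reproduce; the following short program (a sketch, not from the book) recomputes the totals for a 10,000 record file and prints the same 12.3631 average.

/* Reproduces the arithmetic of Figure binary.search: each additional access
   doubles the number of newly reachable records until the file is exhausted. */
#include <stdio.h>

int main(void)
{
    const long file_size = 10000;
    long reachable = 0;            /* total records accessible so far */
    long total_accesses = 0;       /* accesses summed over all records */
    long newly = 1;                /* records reachable with exactly this many accesses */

    for (int accesses = 1; reachable < file_size; accesses++)
    {
        if (newly > file_size - reachable)
            newly = file_size - reachable;        /* last, partial level (1809 records) */
        total_accesses += accesses * newly;
        reachable += newly;
        newly *= 2;
    }
    printf("average accesses per record = %.4f\n",
           (double)total_accesses / file_size);   /* prints 12.3631 */
    return 0;
}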