
HandBooks Professional Java-C-Scrip-SQL part 162 ppt




Originally, I wasn't planning to make any changes to the program from the previous version to this one other than increasing the buffer size. However, when running tests with the 64 MB memory configuration, I discovered that making just that one change caused the program to fail with a message telling me I was out of memory. This was hard to understand at first, because I was allocating only 16 MB at any one time; surely a 64 MB machine, even one running Windows 95, should be able to handle that without difficulty!

However, the program was crashing at the same place every time I ran it with the same data, after a number of passes through the main loop, so I had to figure out what the cause might be. At first, I didn't see anything questionable about the program. On further examination, however, I did notice something that was cause for concern: I was allocating and freeing those memory buffers every time through the main loop. While it seems reasonable to me that allocating a number of buffers and then freeing all of them should return the memory allocation map to its original state, apparently this was not the case. At least, that's the only explanation I can find for why the available memory displayed by the debugger should drop suddenly after a number of passes through the main loop in which it remained nearly constant.

Actually, even if allocating and freeing the buffers every time through the loop did work properly, it really isn't the right way to handle the memory allocation task. It's much more efficient to allocate one large buffer and just keep pointers to the places in that buffer where our smaller, logically distinct, buffers reside. Once I made those changes to the program, the crashes went away, so I apparently identified the problem correctly. The new, improved version is shown in Figure zen03.cpp.

Zensort version 3 (Zensort\zen03.cpp) (Figure zen03.cpp)

zensort/zen03.cpp

I think the changes in the program are relatively self-explanatory. Basically, the only changes are the allocation of a new variable called BigBuffer, which is used to hold all the data for the records being sorted, and the change of the previously existing Buffer variable to an array of char* rather than an array of char. Rather than allocating and deleting the individual buffers on every pass through the main loop, we merely recalculate the position in the large buffer where the logical buffer for each character begins.
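In outline, the pointer scheme looks something like the following sketch; the names BigBuffer and Buffer come from the text, but the sizes and the helper function are illustrative assumptions rather than the actual zen03.cpp code.

    #include <cstddef>

    const std::size_t BUFFER_COUNT = 256;             // one logical buffer per key character
    const std::size_t TOTAL_SIZE = 16 * 1024 * 1024;  // total allocation, e.g. 16 MB (assumed)

    char *BigBuffer = new char[TOTAL_SIZE];            // allocated once, before the main loop
    char *Buffer[BUFFER_COUNT];                        // pointers into BigBuffer, not separate allocations

    void ResetBufferPointers(const std::size_t BufferSize[])
    {
        // Recompute where each logical buffer starts; no new/delete on any pass.
        std::size_t offset = 0;
        for (std::size_t i = 0; i < BUFFER_COUNT; ++i)
        {
            Buffer[i] = BigBuffer + offset;
            offset += BufferSize[i];
        }
    }

Because the large block is allocated only once, the allocation map can't be fragmented by repeated allocation and freeing inside the main loop.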

The performance results for this version of the program are shown in Figure timings.03.


Performance of Zensort version 3 (Figure timings.03)

zensort/timings.03

While we didn't get as much of an increase in performance from making more room available for the buffers as we did from improving the algorithm in the previous stage, we did get about a 13 percent increase in throughput on the largest file with a 64 MB system, and about 17 percent on the 192 MB system, which isn't negligible. Now let's take a look at another way of speeding up this algorithm that will have considerably more effect: sorting on two characters at a time.

The Fourth Version

Every pass we make through the file requires a significant amount of disk activity, both reading and writing. Therefore, anything that reduces the number of passes should help speed the program up noticeably. The simplest way of accomplishing this goal in a general way is to sort on two characters at a time rather than one, as we have been doing previously.

This requires a number of changes to the program, none of which is particularly complicated. The new version is shown in Figure zen04.cpp.

Zensort version 4 (Zensort\zen04.cpp) (Figure zen04.cpp)

zensort/zen04.cpp

We'll start by examining a new function called CalculateKeySegment, which, as its name suggests, calculates the segment of the key that we're going to use for sorting. In this case, because we're going to be sorting on two characters at a time, this function calculates a key segment value by combining two characters of the input key, with the more significant character contributing more to the resulting value than the less significant character.

A simple way to think of this optimization is that we're going to sort on an alphabet consisting of 65536 characters, each of which is composed of two characters from the regular ASCII set. Because a character can take on 256 possible values, we can calculate the buffer in which we will store a particular record according to two characters of its key by multiplying the first character of the key by 256 and adding the second character of the key. This value will never be more than 65535 or less than 0, so we will allocate 65536 buffers, one for each possible combination of two characters.
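As a rough illustration of that calculation (the function name CalculateKeySegment comes from the text, but this signature and key layout are assumptions made for the example, not the actual zen04.cpp code):

    unsigned CalculateKeySegment(const unsigned char *key, unsigned position)
    {
        // Combine two adjacent key characters into a single value in the range
        // 0..65535, with the more significant character supplying the high-order byte.
        unsigned more_significant = key[position];
        unsigned less_significant = key[position + 1];
        return more_significant * 256 + less_significant;   // selects one of 65536 buffers
    }

Consuming two characters per pass means that, for example, a ten-character key needs only five passes over the file instead of ten.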


Besides the substitution of the key segment for the individual character of the key, the other major change to the program is in the handling of a full buffer. In the old program, whenever a new record would cause the output buffer to overflow, we would write out the previous contents of the buffer and then store the new record at the beginning of the buffer. However, this approach has the drawback that it is possible in some cases to have a record that is larger than the allocated buffer, in which case the program will fail if we attempt to store the record in that buffer.

This wasn't too much of a problem in the previous version of the program, because with only 256 buffers, each of them would be big enough to hold any reasonably sized record. However, now that we have 65536 buffers, this is a real possibility. With the current implementation, as long as the record isn't more than twice the size of the buffer, the program will work correctly. If we're worried about records that are larger than that, we can change the code to handle the record in any number of segments by using a while loop that will continue to store segments of the record in the buffer and write them out until the remaining segment will fit in the buffer.
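A minimal sketch of that while-loop approach might look like this; the function and parameter names are invented for illustration and are not taken from zen04.cpp.

    #include <cstddef>
    #include <cstdio>
    #include <cstring>

    // Store a record in an output buffer, flushing full buffers to disk as needed,
    // so that even a record larger than the buffer is handled correctly.
    void StoreRecord(char *buffer, std::size_t buffer_size, std::size_t &used,
                     const char *record, std::size_t record_size, std::FILE *out)
    {
        while (used + record_size > buffer_size)
        {
            std::size_t space = buffer_size - used;        // room left in the buffer
            std::memcpy(buffer + used, record, space);     // fill the buffer to the brim
            std::fwrite(buffer, 1, buffer_size, out);      // write the full buffer out
            record += space;                               // advance past the stored segment
            record_size -= space;
            used = 0;                                      // start filling an empty buffer again
        }
        std::memcpy(buffer + used, record, record_size);   // the remaining segment now fits
        used += record_size;
    }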

So how does this fourth version of the program actually perform? Figure timings.04 answers that question.

Performance of Zensort version 4 (Figure timings.04)

zensort/timings.04

If you compare the performance of this fourth version of the program to the previous version on the small file, you'll notice that it has nearly doubled on both memory configurations. As usual, however, what is more important is how well it performs when we have a lot of data to sort. As you can see from the performance results, the throughput when sorting the large file has improved by over 50 percent on both the small and large memory configurations.

We've just about reached the end of the line with incremental changes to the implementation. To get any further significant increases in performance, we'll need a radically different approach, and that's what the next version of this program provides.


The Fifth Version

Before we get to this change in the program, though, it might be instructive if I explain how I arrived at the conclusion that such a change was either necessary or even possible.

Unlike some well-known authors who shall remain nameless, I take technical writing very seriously. I test my code before publishing it, I typeset my books myself to reduce the likelihood of typesetting errors, and I even create my master CDs myself, to minimize the chance that errors will creep in somewhere in between my development system and your computer. Of course, this doesn't guarantee that there aren't any bugs in my programs; if major software development companies can't guarantee that, my one-man development and quality assurance organization certainly can't! However, I do a pretty good job, and when I miss something, I usually hear from readers right away and can get the correction into the next printing of the book in question.

You may be wondering what this has to do with the changes to the implementation of this sorting algorithm. The answer is that, although I thought I had discovered something important when I broke through the limited memory problem with distribution sorting, I decided it would be a good idea to see how its performance compares with other available sorts. Therefore, I asked a friend if he knew of any resources on sorting performance. After he found a page about sorting on the Internet, I followed up and found a page referring to a sorting contest.

Before I could tell how my implementation would compare to those in the contest, I had to generate some performance figures. Although the page about the contest was somewhat out of date, it gave me enough information so that I was able to generate a test file similar to the one described in the contest. The description was "one million 100-byte records, with a 10-byte random key". I wasn't sure what they meant by "random": was it a string of ten random digits, or 10 random binary bytes, or 10 random ASCII values? I decided to assume that 10 random decimal digits would be close enough to start with, so that's how I created an initial version of a test file. When I ran my latest, greatest version on this file, I was pretty happy when I discovered I could sort about 500,000 records in a minute, because the figures on the contest page indicated that this was quite cost-competitive in the "minute sort" category, which was based on the number of records sorted in a minute; although I was certainly not breaking any speed records as such, my system was much cheaper than the one that had set the record, so on a cost-performance basis I was doing quite well. However, I did need some more recent information to see how the latest competition was going.

So I contacted Jim Gray, who was listed on that page as a member of the contest committee, and heard back from him the next day. Imagine my surprise when I discovered that my "fast" sorting algorithm wasn't even in the ballpark. My best throughput of approximately 800 KB/sec or so was less than one third of that of the leading competitors. Obviously, I had a lot more work to do if I wanted to compete in any serious way.

The first priority was to find out exactly why these other programs were so much faster than my program was. My discussion with Jim Gray gave me the clue when he told me that all of the best programs were limited by their disk I/O throughput. Obviously, if we have to make five passes through the file, reading and writing all of the data on each pass, we aren't going to be competitive with programs that do much less disk I/O, if that is the limiting factor on sorting speed.

Obviously, any possible sorting algorithm must read the entire input file at least once and write an output file of the same size. Is there any way to reduce the amount of I/O that our sorting algorithm uses so that it can approach that ideal? Although we can't get to that limiting case, it is possible to do much better than we have done. However, doing so requires more attention to the makeup of the keys that we are sorting. Until now, we haven't cared very much about the distribution of the keys, except that we would get larger buffers if there were fewer different characters in the keys, which would reduce the number of disk write operations needed to create the output file and thereby improve performance.

However, if the keys were composed of reasonably uniformly distributed characters (or sets of characters) that we could use to divide up the input file into a number of segments of similar size based on their key values, then we could use a "divide and conquer" approach to sorting that can improve performance significantly. That's what the next version of this program, shown in Figure zen05.cpp, does.
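To give a feel for what dividing the file by key values might involve, here is a rough sketch under the assumption that we first count how many records begin with each possible key character and then group consecutive key values into segments of roughly equal size; this illustrates only the general idea, not the actual logic of zen05.cpp.

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Given counts[c] = number of records whose key begins with character c
    // (gathered on a preliminary counting pass), group consecutive key values
    // into segments of roughly records_per_segment records each.
    std::vector<std::pair<int, int>> PlanSegments(const std::size_t counts[256],
                                                  std::size_t records_per_segment)
    {
        std::vector<std::pair<int, int>> segments;   // inclusive [first, last] key ranges
        int start = 0;
        std::size_t in_segment = 0;
        for (int c = 0; c < 256; ++c)
        {
            in_segment += counts[c];
            if (in_segment >= records_per_segment || c == 255)
            {
                segments.push_back(std::make_pair(start, c));  // one segment of similar size
                start = c + 1;
                in_segment = 0;
            }
        }
        return segments;
    }

Each such segment could then be handled on its own, which is one way a divide-and-conquer approach can cut down on repeatedly reading and writing the entire file.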

Zensort version 5 (Zensort\zen05.cpp) (Figure zen05.cpp)

zensort/zen05.cpp

This new version of the algorithm works in a different way from the ones we've seen before. Instead of moving from right to left through the keys, sorting on the less significant positions to prepare the way for the more significant positions, we

