Historical Notes and Further Findings

Part of the document Thuật toán và cấu trúc dữ liệu (pages 134–138)

In later chapters, we shall discuss several generalizations of sorting. Chapter 6 discusses priority queues, a data structure that supports insertion of elements and removal of the smallest element. In particular, inserting n elements followed by repeated deletion of the minimum amounts to sorting. Fast priority queues result in quite good sorting algorithms. A further generalization is the search trees introduced in Chap. 7, a data structure for maintaining a sorted list that allows searching, inserting, and removing elements in logarithmic time.

We have seen several simple, elegant, and efficient randomized algorithms in this chapter. An interesting question is whether these algorithms can be replaced by deterministic ones. Blum et al. [25] described a deterministic median selection algorithm that is similar to the randomized algorithm discussed in Sect. 5.5. This deterministic algorithm makes pivot selection more reliable using recursion: it splits the input set into subsets of five elements, determines the median of each subset by sorting the five-element subset, then determines the median of the n/5 medians by calling the algorithm recursively, and finally uses the median of the medians as the splitter. The resulting algorithm has linear worst-case execution time, but the large constant factor makes the algorithm impractical. (We invite the reader to set up a recurrence for the running time and to show that it has a linear solution.)
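As a sketch of the recurrence the reader is invited to set up: at least half of the roughly n/5 subset medians are less than or equal to the median of medians, and each such median brings two further elements of its subset with it, so about 3n/10 elements are guaranteed to lie on each side of the splitter. The recursion therefore continues on at most about 7n/10 elements, besides the recursive call on the n/5 medians, with linear work cn per call:

```latex
% Worst-case running time of deterministic median selection (sketch,
% ignoring floors/ceilings and small additive constants):
T(n) \;\le\; T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + cn
```

Since 1/5 + 7/10 = 9/10 < 1, the guess T(n) ≤ 10cn closes by induction: 10c(n/5) + 10c(7n/10) + cn = 2cn + 7cn + cn = 10cn.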

There are quite practical ways to reduce the expected number of comparisons required by quicksort. Using the median of three random elements yields an algorithm with about 1.188n log n comparisons. The median of three medians of three-element subsets brings this down to 1.094n log n [20]. The number of comparisons can be reduced further by making the number of elements considered for pivot selection dependent on the size of the subproblem. Martinez and Roura [123] showed that for a subproblem of size m, the median of Θ(√m) elements is a good choice for the pivot.

With this approach, the total number of comparisons becomes (1 + o(1))n log n, i.e., it matches the lower bound of n log n − O(n) up to lower-order terms. Interestingly, the above optimizations can be counterproductive. Although fewer instructions are executed, it becomes impossible to predict when the inner while loops of quicksort will be aborted. Since modern, deeply pipelined processors only work efficiently when they can predict the directions of branches taken, the net effect on performance can even be negative [102]. Therefore, in [167], a comparison-based sorting algorithm that avoids conditional branch instructions was developed. An interesting deterministic variant of quicksort is proportion-extend sort [38].

A classical sorting algorithm of some historical interest is Shell sort [174, 100], a generalization of insertion sort that gains efficiency by also comparing nonadjacent elements. It is still open whether some variant of Shell sort achieves O(n log n) average running time [100, 124].

There are some interesting techniques for improving external multiway mergesort. The snow plow heuristic [112, Sect. 5.4.1] forms runs of expected size 2M using a fast memory of size M: whenever an element is selected from the internal priority queue and written to the output buffer and the next element in the input buffer can extend the current run, we add it to the priority queue. Also, the use of tournament trees instead of general priority queues leads to a further improvement of multiway merging [112].
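The run-formation idea behind the snow plow heuristic can be sketched in memory as follows (a minimal model with `heapq` standing in for the internal priority queue; the name `form_runs` and the list-of-runs output are illustrative, a real external sorter would write runs to disk):

```python
import heapq

def form_runs(stream, M):
    """Replacement selection: form sorted runs (expected length about 2*M
    on random input) from an iterable, keeping at most M elements in memory."""
    it = iter(stream)
    heap = [x for _, x in zip(range(M), it)]  # fill the fast memory
    heapq.heapify(heap)
    runs, run, pending = [], [], []           # `pending` collects the next run
    while heap:
        smallest = heapq.heappop(heap)
        run.append(smallest)                  # "write" to the current run
        nxt = next(it, None)
        if nxt is not None:
            if nxt >= smallest:               # can still extend the current run
                heapq.heappush(heap, nxt)
            else:                             # too small: defer to the next run
                pending.append(nxt)
        if not heap:                          # current run is finished
            runs.append(run)
            run, heap, pending = [], pending, []
            heapq.heapify(heap)
    if run:
        runs.append(run)
    return runs
```

Each run comes out sorted, and an element smaller than the last one written is held back for the following run, which is exactly why runs grow beyond the memory size M.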

Parallelism can be used to improve the sorting of very large data sets, either in the form of a uniprocessor using parallel disks or in the form of a multiprocessor. Multiway mergesort and distribution sort can be adapted to D parallel disks by striping, i.e., any D consecutive blocks in a run or bucket are evenly distributed over the disks.

Using randomization, this idea can be developed into almost optimal algorithms that also overlap I/O and computation [49]. The sample sort algorithm of Sect. 5.7.2 can be adapted to parallel machines [24] and results in an efficient parallel sorter.

We have seen linear-time algorithms for highly structured inputs. A quite general model, for which the n log n lower bound does not hold, is the word model. In this model, keys are integers that fit into a single memory cell, say 32- or 64-bit keys, and the standard operations on words (bitwise-AND, bitwise-OR, addition, ...) are available in constant time. In this model, sorting is possible in deterministic time O(n log log n) [11]. With randomization, even O(n√(log log n)) is possible [85]. Flash sort [149] is a distribution-based algorithm that works almost in-place.

Exercise 5.36 (Unix spellchecking). Assume you have a dictionary consisting of a sorted sequence of correctly spelled words. To check a text, you convert it to a sequence of words, sort it, scan the text and dictionary simultaneously, and output the words in the text that do not appear in the dictionary. Implement this spellchecker using Unix tools in a small number of lines of code. Can you do this in one line?
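One possible solution sketch using standard Unix tools (`tr`, `sort`, `comm`); the file names `dict.txt` and `text.txt` and their contents are illustrative, and all stages must use the same collation (hence `LC_ALL=C`) because `comm` requires both of its inputs to be sorted consistently:

```shell
# Illustrative data: a tiny sorted dictionary and a text to check.
printf 'apple\nbanana\ncherry\n' > dict.txt
printf 'I ate an Apple and a bananna\n' > text.txt
# The spellcheck itself is one pipeline: split the text into one lowercase
# word per line, sort and deduplicate, then let comm -23 print the lines
# unique to its first input, i.e., the words not found in the dictionary.
LC_ALL=C tr -cs 'A-Za-z' '\n' < text.txt | LC_ALL=C tr 'A-Z' 'a-z' \
  | LC_ALL=C sort -u | LC_ALL=C comm -23 - dict.txt
```

With a system word list such as the classic /usr/share/dict/words in place of `dict.txt` (re-sorted under the same collation), the last pipeline is indeed a one-liner.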

6 Priority Queues

The company TMG markets tailor-made first-rate garments. It organizes marketing, measurements, etc., but outsources the actual fabrication to independent tailors. The company keeps 20% of the revenue. When the company was founded in the 19th century, there were five subcontractors. Now it controls 15% of the world market and there are thousands of subcontractors worldwide.

Your task is to assign orders to the subcontractors. The rule is that an order is assigned to the tailor who has so far (in the current year) been assigned the smallest total value of orders. Your ancestors used a blackboard to keep track of the current total value of orders for each tailor; in computer science terms, they kept a list of values and spent linear time to find the correct tailor. The business has outgrown this solution. Can you come up with a more scalable solution where you have to look only at a small number of values to decide who will be assigned the next order?

In the following year the rules are changed. In order to encourage timely delivery, the orders are now assigned to the tailor with the smallest value of unfinished orders, i.e., whenever a finished order arrives, you have to deduct the value of the order from the backlog of the tailor who executed it. Is your strategy for assigning orders flexible enough to handle this efficiently?

Priority queues are the data structure required for the problem above and for many other applications. We start our discussion with the precise specification. Priority queues maintain a set M of Elements with Keys under the following operations:

M.build({e1, ..., en}): M := {e1, ..., en}.

M.insert(e): M := M ∪ {e}.

M.min: return min M.

M.deleteMin: e := min M; M := M \ {e}; return e.

This is enough for the first part of our example. Each year, we build a new priority queue containing an Element with a Key of zero for each contract tailor. To assign an order, we delete the smallest Element, add the order value to its Key, and reinsert it.
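A minimal sketch of this yearly scheme on top of a binary heap (Python's `heapq`; the function name `assign_orders` is illustrative):

```python
import heapq

def assign_orders(tailors, order_values):
    """Assign each order to the tailor with the smallest total value so far."""
    # build: one (total_value, tailor) entry per tailor, all keys zero
    heap = [(0, t) for t in tailors]
    heapq.heapify(heap)
    assignment = []
    for value in order_values:
        total, tailor = heapq.heappop(heap)            # deleteMin
        assignment.append(tailor)
        heapq.heappush(heap, (total + value, tailor))  # reinsert with new key
    return assignment
```

Each order costs two logarithmic-time queue operations instead of a linear scan over all tailors. (Ties between equal totals are broken here by tailor name, an artifact of Python's tuple comparison.)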

Section 6.1 presents a simple, efficient implementation of this basic functionality.

(The chapter-opening photograph shows a queue at the Mao Mausoleum; V. Berger, see http://commons.wikimedia.org/wiki/Image:Zhengyangmen01.jpg.)

Addressable priority queues additionally support operations on arbitrary elements addressed by an element handle h:

insert: as before, but return a handle to the element inserted.

remove(h): remove the element specified by the handle h.

decreaseKey(h,k): decrease the key of the element specified by the handle h to k.

M.merge(Q): M := M ∪ Q; Q := ∅.

In our example, the operation remove might be helpful when a contractor is fired because he/she delivers poor quality. Using this operation together with insert, we can also implement the “new contract rules”: when an order is delivered, we remove the Element for the contractor who executed the order, subtract the value of the order from its Key value, and reinsert the Element. DecreaseKey streamlines this process to a single operation. In Sect.6.2, we shall see that this is not just convenient but that decreasing keys can be implemented more efficiently than arbitrary element updates.
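A lazy-deletion sketch of an addressable queue on top of a binary heap (the class name `AddressablePQ` is illustrative; note that decreaseKey here is simply remove followed by insert, which is exactly the naive approach that the dedicated structures of Sect. 6.2 improve upon):

```python
import heapq
import itertools

class AddressablePQ:
    """Addressable min-priority queue: binary heap plus lazy deletion.
    remove() only invalidates the heap entry; stale entries are skipped
    when the minimum is deleted."""
    def __init__(self):
        self._heap = []                  # entries: (key, seq, handle)
        self._live = {}                  # handle -> its current entry
        self._seq = itertools.count()    # tie-breaker for equal keys

    def insert(self, handle, key):
        entry = (key, next(self._seq), handle)
        self._live[handle] = entry
        heapq.heappush(self._heap, entry)

    def remove(self, handle):
        del self._live[handle]           # heap entry becomes stale

    def decrease_key(self, handle, key):
        self.remove(handle)              # naive: remove + reinsert
        self.insert(handle, key)

    def delete_min(self):
        while True:
            entry = heapq.heappop(self._heap)
            if self._live.get(entry[2]) is entry:   # skip stale entries
                del self._live[entry[2]]
                return entry[2], entry[0]           # (handle, key)
```

With this, the "new contract rules" update for a delivered order is `decrease_key(tailor, backlog - order_value)`; firing a contractor is `remove(tailor)`.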

Priority queues have many applications. For example, in Sect. 12.2, we shall see that our introductory example can also be viewed as a greedy algorithm for a machine-scheduling problem. Also, the rather naive selection-sort algorithm of Sect. 5.1 can be implemented efficiently now: first, insert all elements into a priority queue, and then repeatedly delete the smallest element and output it. A tuned version of this idea is described in Sect. 6.1. The resulting heapsort algorithm is popular because it needs no additional space and is worst-case efficient.
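The insert-all-then-deleteMin idea from the paragraph above, as a sketch (with `heapq` standing in for the priority queue; tuned heapsort works in place, which this sketch does not attempt):

```python
import heapq

def pq_sort(elements):
    """Sort by inserting all elements into a priority queue and then
    repeatedly deleting the minimum -- the idea behind heapsort."""
    pq = list(elements)
    heapq.heapify(pq)        # build the queue in linear time
    return [heapq.heappop(pq) for _ in range(len(pq))]
```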

In a discrete-event simulation, one has to maintain a set of pending events. Each event happens at some scheduled point in time and creates zero or more new events in the future. Pending events are kept in a priority queue. The main loop of the simu- lation deletes the next event from the queue, executes it, and inserts newly generated events into the priority queue. Note that the priorities (times) of the deleted elements (simulated events) increase monotonically during the simulation. It turns out that many applications of priority queues have this monotonicity property. Section 10.5 explains how to exploit monotonicity for integer keys.
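The simulation main loop described above can be sketched as follows (the names `simulate` and `handler`, and the cutoff parameter `until`, are illustrative):

```python
import heapq

def simulate(initial_events, handler, until):
    """Discrete-event simulation main loop.  `handler(time, event)` executes
    an event and returns a list of (future_time, future_event) pairs; the
    times of deleted events increase monotonically."""
    pending = list(initial_events)             # (time, event) pairs
    heapq.heapify(pending)
    trace = []
    while pending:
        time, event = heapq.heappop(pending)   # next event = deleteMin
        if time > until:
            break
        trace.append((time, event))
        for t, e in handler(time, event):      # zero or more new events
            heapq.heappush(pending, (t, e))
    return trace
```

Because handlers only schedule events in the future, the priorities popped from the queue never decrease, i.e., this is a monotone use of the priority queue.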

Another application of monotone priority queues is the best-first branch-and- bound approach to optimization described in Sect. 12.4. Here, the elements are par- tial solutions of an optimization problem and the keys are optimistic estimates of the obtainable solution quality. The algorithm repeatedly removes the best-looking partial solution, refines it, and inserts zero or more new partial solutions.

We shall see two applications of addressable priority queues in the chapters on graph algorithms. In both applications, the priority queue stores nodes of a graph. Dijkstra's algorithm for computing shortest paths (Sect. 10.3) uses a monotone priority queue where the keys are path lengths. The Jarník–Prim algorithm for computing minimum spanning trees (Sect. 11.2) uses a (nonmonotone) priority queue where the keys are the weights of edges connecting a node to a partial spanning tree. In both algorithms, there can be a decreaseKey operation for each edge, whereas there is at most one insert and deleteMin for each node. Observe that the number of edges may be much larger than the number of nodes, and hence the implementation of decreaseKey deserves special attention.
