Historical Notes and Further Findings


There is an interesting Internet survey¹ of priority queues. It lists the following applications: (shortest-)path planning (see Chap. 10), discrete-event simulation, coding and compression, scheduling in operating systems, computing maximum flows, and branch-and-bound (see Sect. 12.4).

In Sect. 6.1 we saw an implementation of deleteMin by top-down search that needs about 2 log n element comparisons, and a variant using binary search that needs only log n + O(log log n) element comparisons. The latter is mostly of theoretical interest. Interestingly, a very simple “bottom-up” algorithm can be even better: the old minimum is removed and the resulting hole is sifted down all the way to the bottom of the heap. Only then does the rightmost element fill the hole, after which it is sifted up. When used for sorting, the resulting bottom-up heapsort requires (3/2)n log n + O(n) comparisons in the worst case and n log n + O(1) in the average case [204, 61, 169].
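To make the bottom-up trick concrete, here is a minimal sketch in C++ (our own illustration, not the book's code; all names are ours). Phase 1 moves the hole from the root to the bottom using only one comparison per level, namely the comparison that selects the smaller child; a conventional top-down siftDown would spend a second comparison per level against the sinking element. Phase 2 places the last heap element into the hole and sifts it up, which on average moves it only O(1) levels.

#include <cassert>
#include <utility>
#include <vector>

// deleteMin on a binary min-heap stored in a 0-based vector,
// using the bottom-up trick. (Sketch; names are ours.)
template <typename T>
T deleteMinBottomUp(std::vector<T>& h) {
    assert(!h.empty());
    T min = h[0];
    std::size_t n = h.size() - 1;   // index of the last element
    // Phase 1: sift the hole left by the minimum down to the bottom,
    // always following the smaller child (one comparison per level).
    std::size_t hole = 0;
    while (2 * hole + 1 < n) {
        std::size_t child = 2 * hole + 1;                  // left child
        if (child + 1 < n && h[child + 1] < h[child]) ++child;
        h[hole] = std::move(h[child]);
        hole = child;
    }
    // Phase 2: fill the hole with the rightmost (last) element and
    // sift it up; on average it rises only O(1) levels.
    T x = std::move(h[n]);
    h.pop_back();
    if (hole < h.size()) {
        while (hole > 0 && x < h[(hole - 1) / 2]) {
            h[hole] = std::move(h[(hole - 1) / 2]);
            hole = (hole - 1) / 2;
        }
        h[hole] = std::move(x);
    }
    return min;
}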

While bottom-up heapsort is simple and practical, our own experiments indicate that it is not faster than the usual top-down variant (for integer keys). This surprised us. The explanation might be that the outcomes of the comparisons saved by the bottom-up variant are easy to predict. Modern hardware executes such predictable comparisons very efficiently (see [167] for more discussion).

The recursive buildHeap routine in Exercise 6.6 is an example of a cache-oblivious algorithm [69]. This algorithm is efficient in the external-memory model even though it does not explicitly use the block size or cache size.

¹ http://www.leekillough.com/heaps/survey_results.html

Pairing heaps [67] have constant amortized complexity for insert and merge [96] and logarithmic amortized complexity for deleteMin. The best analysis is that due to Pettie [154]. Fredman [65] has given operation sequences consisting of O(n) insertions and deleteMins and O(n log n) decreaseKeys that require time Ω(n log n log log n) for a family of addressable priority queues that includes all previously proposed variants of pairing heaps.

The family of addressable priority queues is large. Vuillemin [202] introduced binomial heaps, and Fredman and Tarjan [68] invented Fibonacci heaps. Høyer [94] described additional balancing operations that are akin to the operations used for search trees. One such operation yields thin heaps [103], which have performance guarantees similar to Fibonacci heaps and do without parent pointers and mark bits. It is likely that thin heaps are faster in practice than Fibonacci heaps. There are also priority queues with worst-case bounds asymptotically as good as the amortized bounds that we have seen for Fibonacci heaps [30]. The basic idea is to tolerate violations of the heap property and to continuously invest some work in reducing these violations. Another interesting variant is fat heaps [103].

Many applications need priority queues for integer keys only. For this special case, there are more efficient priority queues. The best theoretical bounds so far are constant time for decreaseKey and insert and O(log log n) time for deleteMin [193, 136]. Using randomization, the time bound can even be reduced to O(√(log log n)) [85]. The algorithms are fairly complex. However, integer priority queues that also have the monotonicity property can be simple and practical. Section 10.3 gives examples. Calendar queues [33] are popular in the discrete-event simulation community. These are a variant of the bucket queues described in Sect. 10.5.1.

Sorted Sequences

All of us spend a significant part of our time on searching, and so do computers: they look up telephone numbers, balances of bank accounts, flight reservations, bills and payments, and so on. In many applications, we want to search dynamic collections of data. New bookings are entered into reservation systems, reservations are changed or cancelled, and bookings turn into actual flights. We have already seen one solution to the problem, namely hashing. It is often desirable to keep a dynamic collection sorted. The “manual data structure” used for this purpose is the filing-card box. We can insert new cards at any position, we can remove cards, we can go through the cards in sorted order, and we can use some kind of binary search to find a particular card. Large libraries used to have filing-card boxes with hundreds of thousands of cards.

Formally, we want to maintain a sorted sequence, i.e., a sequence of Elements sorted by their Key value, under the following operations:

M.locate(k : Key): return min {e ∈ M : e ≥ k}.

M.insert(e : Element): M := M ∪ {e}.

M.remove(k : Key): M := M \ {e ∈ M : key(e) = k}.

Here, M is the set of elements stored in the sequence. For simplicity, we assume that the elements have pairwise distinct keys. We shall reconsider this assumption in Exercise 7.10. We shall show that these operations can be implemented to run in time O(log n), where n denotes the size of the sequence. How do sorted sequences compare with the data structures known to us from previous chapters? They are more flexible than sorted arrays, because they efficiently support insert and remove. They are slower but also more powerful than hash tables, since locate also works when there is no element with key k in M. Priority queues are a special case of sorted sequences; they can locate and remove only the smallest element.
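As a concrete reference point, here is a minimal sketch using the C++ standard library (our own illustration, not the book's pseudocode). std::set is backed by a balanced search tree, and its lower_bound is exactly locate; for simplicity, the elements are their own keys.

#include <cstdint>
#include <iostream>
#include <set>

// The three basic operations realized with std::set, whose underlying
// balanced search tree gives O(log n) time for each of them.
int main() {
    std::set<std::uint64_t> m = {2, 3, 5, 7, 11, 13, 17, 19};

    // locate(k): smallest element >= k. end() plays the role of the
    // dummy item returned when k exceeds all keys in m.
    auto it = m.lower_bound(8);
    if (it != m.end()) std::cout << "locate(8) = " << *it << '\n';   // 11

    m.insert(6);   // insert(e)
    m.erase(13);   // remove(k)

    for (auto e : m) std::cout << e << ' ';   // 2 3 5 6 7 11 17 19
    std::cout << '\n';
    return 0;
}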

Our basic realization of a sorted sequence consists of a sorted doubly linked list with an additional navigation data structure supporting locate. Figure 7.1 illustrates this approach. Recall that a doubly linked list for n elements consists of n + 1 items,



Fig. 7.1. A sorted sequence as a doubly linked list plus a navigation data structure (the list items store the keys 2, 3, 5, 7, 11, 13, 17, 19)

one for each element and one additional “dummy item”. We use the dummy item to store a special key value +∞ which is larger than all conceivable keys. We can then define the result of locate(k) as the handle to the smallest list item e ≥ k. If k is larger than all keys in M, locate will return a handle to the dummy item. In Sect. 3.1.1, we learned that doubly linked lists support a large set of operations; most of them can also be implemented efficiently for sorted sequences. For example, we “inherit” constant-time implementations for first, last, succ, and pred. We shall see constant-amortized-time implementations for remove(h : Handle), insertBefore, and insertAfter, and logarithmic-time algorithms for concatenating and splitting sorted sequences. The indexing operator [·] and finding the position of an element in the sequence also take logarithmic time. Before we delve into a description of the navigation data structure, let us look at some concrete applications of sorted sequences.
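First, though, a minimal sketch of the dummy-item convention (our own illustration; the names Item and SortedList are hypothetical, and +∞ is approximated by the largest representable integer, so real keys are assumed strictly smaller):

#include <cstdint>
#include <limits>

// A sorted doubly linked list whose dummy item carries the key
// "+infinity". The list is circular: the dummy item acts as both
// head and past-the-end.
struct Item {
    std::uint64_t key;
    Item* next;
    Item* prev;
};

struct SortedList {
    Item dummy;
    SortedList() {
        dummy.key = std::numeric_limits<std::uint64_t>::max();   // "+infinity"
        dummy.next = dummy.prev = &dummy;                        // empty list
    }
    // locate(k) by linear scan, O(n). The navigation data structure of
    // this chapter exists to replace this scan by an O(log n) search.
    // Note that the loop needs no end-of-list test: it always stops at
    // the dummy item, because dummy.key >= k for every key k.
    Item* locate(std::uint64_t k) {
        Item* it = dummy.next;
        while (it->key < k) it = it->next;
        return it;
    }
};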

Best-fit heuristic. Assume that we want to pack some items into a set of bins. The items arrive one at a time and have to be put into a bin immediately. Each item i has a weight w(i), and each bin has a maximum capacity. The goal is to minimize the number of bins used. One successful heuristic solution to this problem is to put item i into the bin that fits best, i.e., the bin whose residual capacity is smallest among all bins with residual capacity at least as large as w(i) [41]. To implement this algorithm, we can keep the bins in a sequence q sorted by their residual capacity. To place an item, we call q.locate(w(i)), remove the bin that we have found, reduce its residual capacity by w(i), and reinsert it into q. See also Exercise 12.8.
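Here is a minimal sketch of the heuristic (ours; a std::multiset of residual capacities stands in for q, with duplicates allowed because several bins may have the same residual capacity):

#include <cstdio>
#include <set>
#include <vector>

// Best-fit bin packing. Returns the number of bins used.
int bestFit(const std::vector<double>& weights, double capacity) {
    std::multiset<double> q;   // residual capacities, sorted
    for (double w : weights) {
        auto it = q.lower_bound(w);   // locate(w): tightest feasible bin
        if (it == q.end()) {
            q.insert(capacity - w);   // no bin fits: open a new one
        } else {
            double residual = *it - w;
            q.erase(it);          // remove the bin we found ...
            q.insert(residual);   // ... and reinsert it with reduced capacity
        }
    }
    return static_cast<int>(q.size());
}

int main() {
    std::printf("%d bins\n", bestFit({0.5, 0.7, 0.5, 0.2, 0.4, 0.2}, 1.0));
}

On this example the heuristic uses three bins, which is optimal here: the total weight is 2.5, so at least ⌈2.5⌉ = 3 unit-capacity bins are needed.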

Sweep-line algorithms. Assume that you have a set of horizontal and vertical line segments in the plane and want to find all points where two segments intersect. A sweep-line algorithm moves a vertical line over the plane from left to right and maintains the set of horizontal segments that intersect the sweep line in a sorted sequence q. When the left endpoint of a horizontal segment is reached, it is inserted into q, and when its right endpoint is reached, it is removed from q. When a vertical segment is reached at a position x, spanning the vertical range [y, y′], we call q.locate(y) and scan q until we reach the key y′.² All horizontal segments discovered during this scan define an intersection. The sweep-line algorithm can be generalized to arbitrary line segments [21], curved objects, and many other geometric problems [46].

² This range query operation is also discussed in Sect. 7.3.
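The following sketch (ours; it treats segments as closed, so touching endpoints count as intersections) implements the orthogonal case, with a std::multiset of y-coordinates playing the role of q:

#include <algorithm>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// Orthogonal segment intersection by plane sweep.
struct Horizontal { double y, x1, x2; };   // y, left x, right x
struct Vertical   { double x, y1, y2; };   // x, lower y, upper y

std::vector<std::pair<double, double>>
intersections(const std::vector<Horizontal>& hs, const std::vector<Vertical>& vs) {
    // Event: (x, type, index). Types order ties at equal x so that
    // inserts (0) come before queries (1), which come before removals (2);
    // this realizes closed-segment semantics.
    std::vector<std::tuple<double, int, int>> events;
    for (int i = 0; i < (int)hs.size(); ++i) {
        events.push_back({hs[i].x1, 0, i});
        events.push_back({hs[i].x2, 2, i});
    }
    for (int i = 0; i < (int)vs.size(); ++i) events.push_back({vs[i].x, 1, i});
    std::sort(events.begin(), events.end());

    std::multiset<double> q;   // y-coordinates of active horizontal segments
    std::vector<std::pair<double, double>> out;
    for (auto [x, type, i] : events) {
        if (type == 0) q.insert(hs[i].y);
        else if (type == 2) q.erase(q.find(hs[i].y));
        else   // vertical segment: locate(y1), then scan up to y2
            for (auto it = q.lower_bound(vs[i].y1); it != q.end() && *it <= vs[i].y2; ++it)
                out.push_back({vs[i].x, *it});
    }
    return out;
}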

Database indexes. A key problem in databases is to make large collections of data efficiently searchable. A variant of the (a, b)-tree data structure described in Sect. 7.2 is one of the most important data structures used for databases.

The most popular navigation data structure is the search tree. We shall frequently use the name of the navigation data structure to refer to the entire sorted-sequence data structure. We shall introduce search tree algorithms in three steps. As a warm-up, Sect. 7.1 introduces (unbalanced) binary search trees that support locate in O(log n) time under certain favorable circumstances. Since binary search trees are somewhat difficult to maintain under insertions and removals, we then switch to a generalization, (a, b)-trees, which allow search tree nodes of larger degree. Section 7.2 explains how (a, b)-trees can be used to implement all three basic operations in logarithmic worst-case time. In Sects. 7.3 and 7.5, we shall augment search trees with additional mechanisms that support further operations. Section 7.4 takes a closer look at the (amortized) cost of update operations.
