2.11.1 C++
Our pseudocode can be viewed as a concise notation for a subset of C++. The memory management operations allocate and dispose are similar to the C++ operations new and delete. C++ calls the default constructor for each element of an array, i.e., allocating an array of n objects takes time Ω(n), whereas allocating an array of n ints takes constant time. In contrast, we assume that all arrays which are not explicitly initialized contain garbage. In C++, you can obtain this effect using the C functions malloc and free. However, this is a deprecated practice and should only be used when array initialization would be a severe performance bottleneck. If memory management of many small objects is performance-critical, you can customize it using the allocator class of the C++ standard library.
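The difference between the two allocation styles can be made visible with a small experiment. The following is a minimal sketch, not a recommended practice; the type Counted and the functions constructorCallsForNew and fillWithMalloc are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical element type that counts how often its default constructor runs.
struct Counted {
    static std::size_t constructed;
    int value;
    Counted() : value(0) { ++constructed; }
};
std::size_t Counted::constructed = 0;

// Allocate n objects with new[]: the default constructor runs once per element.
std::size_t constructorCallsForNew(std::size_t n) {
    Counted::constructed = 0;
    Counted* a = new Counted[n];
    std::size_t calls = Counted::constructed;
    delete[] a;  // memory obtained with new[] is released with delete[]
    return calls;
}

// Allocate raw storage for n ints with malloc: nothing is initialized, so the
// contents are garbage until we write to them. Returns true on success.
bool fillWithMalloc(std::size_t n) {
    assert(n > 0);
    int* a = static_cast<int*>(std::malloc(n * sizeof(int)));
    if (a == nullptr) return false;  // allocation failed
    for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<int>(i);  // write before reading
    bool ok = (a[n - 1] == static_cast<int>(n - 1));
    std::free(a);  // memory obtained with malloc is released with free
    return ok;
}
```

Calling constructorCallsForNew(n) reports n constructor calls, which is where the Ω(n) allocation cost comes from. The malloc variant performs no per-element work until the caller writes to the array.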
Our parameterization of classes using the keyword of is a special case of the C++ template mechanism. The parameters added in brackets after a class name correspond to the parameters of a C++ constructor.
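As a rough illustration of this correspondence, consider the following sketch. The class Point is a hypothetical example and not taken from our pseudocode: its template parameter plays the role of the type named after of, and its constructor arguments play the role of the parameters in brackets.

```cpp
#include <cassert>

// Hypothetical example class: a pair of coordinates over an arbitrary number type.
template <class Number>
class Point {
public:
    // The constructor parameters correspond to the bracketed class parameters.
    Point(Number x, Number y) : x_(x), y_(y) {}

    Number x() const { return x_; }
    Number y() const { return y_; }

    // Component-wise sum; works for any Number that supports operator+.
    Point operator+(const Point& other) const {
        return Point(x_ + other.x_, y_ + other.y_);
    }

private:
    Number x_, y_;
};
```

The same source text then serves for Point<int>, Point<double>, and any other number type; the compiler generates a separate class for each instantiation.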
Assertions are implemented as C macros in the include file assert.h. By default, violated assertions trigger a runtime error and print their position in the program text. If the macro NDEBUG is defined, assertion checking is disabled.
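A typical use of assertions is to check a loop invariant while a program is being developed. The sketch below does this for a binary search; the function lowerBound is a hypothetical helper written for this illustration, and cassert is the C++ name of the header assert.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: binary search in a sorted vector.
// Returns the index of the first element >= key, or a.size() if there is none.
std::size_t lowerBound(const std::vector<int>& a, int key) {
    std::size_t lo = 0, hi = a.size();
    while (lo < hi) {
        // Invariant: every element before lo is < key,
        // and every element from hi onwards is >= key.
        assert(lo == 0 || a[lo - 1] < key);
        assert(hi == a.size() || a[hi] >= key);
        std::size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}
```

Compiling with NDEBUG defined (for example, by passing -DNDEBUG to the compiler) removes both checks, so the invariant costs nothing in a production build.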
For many of the data structures and algorithms discussed in this book, excellent implementations are available in software libraries. Good sources are the standard template library STL [157], the Boost [27] C++ libraries, and the LEDA [131, 118] library of efficient algorithms and data structures.
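To give an impression of how little code is needed when a library is used, here is a small sketch based on the algorithms of the C++ standard library. The wrapper functions sortedCopy and containsSorted are hypothetical names chosen for this illustration; std::sort, std::is_sorted, and std::binary_search are standard.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return a sorted copy of v using the library sort.
std::vector<int> sortedCopy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Membership test on a sorted vector using the library binary search.
bool containsSorted(const std::vector<int>& sorted, int key) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));  // precondition
    return std::binary_search(sorted.begin(), sorted.end(), key);
}
```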
2.11.2 Java
Java has no explicit memory management. Rather, a garbage collector periodically recycles pieces of memory that are no longer referenced. While this simplifies programming enormously, it can be a performance problem. Remedies are beyond the scope of this book. Generic types provide parameterization of classes. Assertions are implemented with the assert statement.
Excellent implementations for many data structures and algorithms are available in the package java.util and in the JDSL [78] data structure library.
2.12 Historical Notes and Further Findings
Shepherdson and Sturgis [179] defined the RAM model for use in algorithmic analysis. The RAM model restricts cells to holding a logarithmic number of bits. Dropping this assumption has undesirable consequences; for example, the complexity classes P and PSPACE collapse [87]. Knuth [113] has described a more detailed abstract machine model.
Floyd [62] introduced the method of invariants to assign meaning to programs and Hoare [91, 92] systemized their use. The book [81] is a compendium on sums and recurrences and, more generally, discrete mathematics.
Books on compiler construction (e.g., [144, 207]) tell you more about the compilation of high-level programming languages into machine code.
3 Representing Sequences by Arrays and Linked Lists
Perhaps the world’s oldest data structures were the tablets in cuneiform script¹ used more than 5 000 years ago by custodians in Sumerian temples. These custodians kept lists of goods, and their quantities, owners, and buyers. The picture on the left shows an example. This was possibly the first application of written language. The operations performed on such lists have remained the same – adding entries, storing them for later, searching entries and changing them, going through a list to compile summaries, etc. The Peruvian quipu [137] that you see in the picture on the right served a similar purpose in the Inca empire, using knots in colored strings arranged sequentially on a master string. It is probably easier to maintain and use data on tablets than to use knotted string, but one would not want to haul stone tablets over Andean mountain trails. It is apparent that different representations make sense for the same kind of data.
The abstract notion of a sequence, list, or table is very simple and is independent of its representation in a computer. Mathematically, the only important property is that the elements of a sequence s = ⟨e_0, ..., e_{n−1}⟩ are arranged in a linear order – in contrast to the trees and graphs discussed in Chaps. 7 and 8, or the unordered hash tables discussed in Chap. 4. There are two basic ways of referring to the elements of a sequence.
One is to specify the index of an element. This is the way we usually think about arrays, where s[i] returns the i-th element of a sequence s. Our pseudocode supports static arrays. In a static data structure, the size is known in advance, and the data structure is not modifiable by insertions and deletions. In a bounded data structure, the maximal size is known in advance. In Sect. 3.2, we introduce dynamic or unbounded arrays, which can grow and shrink as elements are inserted and removed. The analysis of unbounded arrays introduces the concept of amortized analysis.
¹ The 4 600 year old tablet at the top left is a list of gifts to the high priestess of Adab (see commons.wikimedia.org/wiki/Image:Sumerian_26th_c_Adab.jpg).
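The distinction between static, bounded, and unbounded arrays can be sketched in C++ as follows. This is only an illustration under our own assumptions: BoundedArray and firstSquares are hypothetical names, and std::vector merely stands in for an unbounded array without any claim about how a particular library implements it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Static: the size is fixed at compile time; no insertions or deletions.
typedef std::array<int, 4> StaticArray;

// Bounded (hypothetical helper class): the size may change,
// but it never exceeds the capacity fixed in advance.
template <std::size_t Capacity>
class BoundedArray {
public:
    std::size_t size() const { return size_; }

    // Append x; returns false if the array is already full.
    bool pushBack(int x) {
        if (size_ == Capacity) return false;
        data_[size_++] = x;
        return true;
    }

    void popBack() { assert(size_ > 0); --size_; }

    int& operator[](std::size_t i) { assert(i < size_); return data_[i]; }

private:
    std::array<int, Capacity> data_{};
    std::size_t size_ = 0;
};

// Unbounded: a std::vector grows as elements are appended.
std::vector<int> firstSquares(std::size_t n) {
    std::vector<int> v;
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i * i));
    return v;
}
```

A BoundedArray<2> accepts two calls of pushBack and rejects the third, whereas firstSquares(n) succeeds for any n that fits into memory.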
The second way of referring to the elements of a sequence is relative to other elements. For example, one could ask for the successor of an element e, the predecessor of an element e, or for the subsequence ⟨e′, ..., e″⟩ of elements between e′ and e″. Although relative access can be simulated using array indexing, we shall see in Sect. 3.1 that a list-based representation of sequences is more flexible. In particular, it becomes easier to insert or remove arbitrary pieces of a sequence.
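Relative access of this kind can be tried out with the doubly linked list of the C++ standard library. In the sketch below, which is an illustration and not the list implementation developed in Sect. 3.1, elements are referred to by iterators rather than by indices; the function moveSublist is a hypothetical helper.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Remove the subsequence from *first to *last (both ends inclusive) from src
// and append it to dst. Only links between list nodes change; no element is copied.
void moveSublist(std::list<int>& src,
                 std::list<int>::iterator first,
                 std::list<int>::iterator last,
                 std::list<int>& dst) {
    assert(!src.empty());
    // splice expects a half-open range, so we pass the successor of last.
    dst.splice(dst.end(), src, first, std::next(last));
}
```

The successor and the predecessor of the element behind an iterator it are std::next(it) and std::prev(it). Iterators to the moved elements remain valid and afterwards refer to positions in dst.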
Many algorithms use sequences in a quite limited way. Only the front and/or the rear of the sequence are read and modified. Sequences that are used in this restricted way are called stacks, queues, and deques. We discuss them in Sect. 3.4. In Sect. 3.5, we summarize the findings of the chapter.
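The three restricted access patterns can be illustrated with the container adaptors of the C++ standard library. The following sketch makes no claim about the implementations discussed in Sect. 3.4; the functions balanced, drain, and trimEnds are hypothetical examples.

```cpp
#include <cassert>
#include <deque>
#include <queue>
#include <stack>
#include <string>

// Stack: only one end is used (last in, first out).
// Example: check that round and square brackets are properly nested.
bool balanced(const std::string& text) {
    std::stack<char> open;
    for (char c : text) {
        if (c == '(' || c == '[') {
            open.push(c);
        } else if (c == ')' || c == ']') {
            if (open.empty()) return false;
            char o = open.top();
            open.pop();
            if ((c == ')' && o != '(') || (c == ']' && o != '[')) return false;
        }
    }
    return open.empty();
}

// Queue: insert at the rear, remove at the front (first in, first out).
std::string drain(std::queue<char> q) {
    std::string out;
    while (!q.empty()) {
        out += q.front();
        q.pop();
    }
    return out;
}

// Deque: both ends can be read and modified.
std::deque<int> trimEnds(std::deque<int> d) {
    if (!d.empty()) d.pop_front();
    if (!d.empty()) d.pop_back();
    return d;
}
```

None of the three functions ever touches an element in the middle of the sequence, which is what allows simple and fast implementations.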