With such values, MAX-HEAPIFY will be called h times, where h is the heap height, which is the number of edges in the longest path from the root to a leaf, so its running time will be Θ(h).
Analysis: constant-time assignments plus the time for HEAP-INCREASE-KEY.
Time: O(lg n).
Min-priority queue operations are implemented similarly with min-heaps.
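As a sketch of these two operations (a hypothetical Python rendering, not the book's pseudocode; note it uses 0-based array indices instead of the text's 1-based ones):

```python
import math

def heap_increase_key(a, i, key):
    # O(lg n): raise a[i] to key, then float it up toward the root,
    # swapping with smaller parents along the way.
    if key < a[i]:
        raise ValueError("new key is smaller than current key")
    a[i] = key
    while i > 0 and a[(i - 1) // 2] < a[i]:
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2

def max_heap_insert(a, key):
    # Constant-time assignment (append a -infinity sentinel) plus
    # one call to heap_increase_key: O(lg n) total.
    a.append(-math.inf)
    heap_increase_key(a, len(a) - 1, key)
```

The standard-library `heapq` module provides the min-heap analogues of these operations.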
tree, possibly even at more than one location. Let m be the index at which the
maximum appears (the lowest such index if the maximum appears more than once).
Since the maximum is not at the root of the subtree, node m has a parent. Since the parent of a node has a lower index than the node, and m was chosen to be the smallest index of the maximum value, A[PARENT(m)] < A[m]. But by the max-heap property, we must have A[PARENT(m)] ≥ A[m]. So our assumption is false,
and the claim is true.
Solution to Exercise 6.2-6
If you put a value at the root that is less than every value in the left and right subtrees, then MAX-HEAPIFY will be called recursively until a leaf is reached. To make the recursive calls traverse the longest path to a leaf, choose values that make MAX-HEAPIFY always recurse on the left child. It follows the left branch when the left child is ≥ the right child, so putting 0 at the root and 1 at all the other nodes, for example, will accomplish that. With such values, MAX-HEAPIFY will be called h times (where h is the heap height, which is the number of edges in the longest path
from the root to a leaf), so its running time will be Θ(h) (since each call does Θ(1)
work), which is Θ(lg n). Since we have a case in which MAX-HEAPIFY's running time is Θ(lg n), its worst-case running time is Ω(lg n).
Solution to Exercise 6.3-3
Let H be the height of the heap.
Two subtleties to beware of:
• Be careful not to confuse the height of a node (longest distance from a leaf) with its depth (distance from the root).
• If the heap is not a complete binary tree (bottom level is not full), then the nodes
at a given level (depth) don't all have the same height. For example, although all
nodes at depth H have height 0, nodes at depth H − 1 can have either height 0
or height 1.
For a complete binary tree, it's easy to show that there are ⌈n/2^(h+1)⌉ nodes of
height h. But the proof for an incomplete tree is tricky and is not derived from the
proof for a complete tree.
Proof By induction on h.
For the base case (h = 0), we'll show that the # of leaves = ⌈n/2⌉.
The tree leaves (nodes at height 0) are at depths H and H − 1. They consist of
• all nodes at depth H, and
• the nodes at depth H − 1 that are not parents of depth-H nodes.
Let x be the number of nodes at depth H, that is, the number of nodes in the
bottom (possibly incomplete) level.
Note that n − x is odd, because the n − x nodes above the bottom level form a
complete binary tree, and a complete binary tree has an odd number of nodes (1
less than a power of 2). Thus if n is odd, x is even, and if n is even, x is odd.
To prove the base case, we must consider separately the case in which n is even (x is odd) and the case in which n is odd (x is even). Here are two ways to do
this: the first requires more cleverness, and the second requires more algebraic manipulation.
1. First method of proving the base case:
• If n is odd, then x is even, so all nodes have siblings, i.e., all internal
nodes have 2 children. Thus (see Exercise B.5-3), # of internal nodes =
# of leaves − 1.
So, n = # of nodes = # of leaves + # of internal nodes = 2 · # of leaves − 1. Thus, # of leaves = (n + 1)/2 = ⌈n/2⌉. (The latter equality holds because n
is odd.)
• If n is even, then x is odd, and some leaf doesn't have a sibling. If we gave
it a sibling, we would have n + 1 nodes, where n + 1 is odd, so the case
we analyzed above would apply. Observe that we would also increase the number of leaves by 1, since we added a node to a parent that already had
a child. By the odd-node case above, # of leaves + 1 = ⌈(n + 1)/2⌉ =
⌈n/2⌉ + 1. (The latter equality holds because n is even.)
In either case, # of leaves = ⌈n/2⌉.
2. Second method of proving the base case:
Note that at any depth d < H there are 2^d nodes, because all such tree levels are complete.
• If x is even, there are x/2 nodes at depth H − 1 that are parents of depth-H
nodes, hence 2^(H−1) − x/2 nodes at depth H − 1 that are not parents of depth-H
nodes. Thus,
total # of height-0 nodes = x + 2^(H−1) − x/2
= 2^(H−1) + x/2
= (2^H + x)/2
= ⌈(2^H + x − 1)/2⌉ (because x is even)
= ⌈n/2⌉.
(n = 2^H + x − 1 because the complete tree down to depth H − 1 has 2^H − 1
nodes and depth H has x nodes.)
• If x is odd, by an argument similar to the even case, we see that
# of height-0 nodes = x + 2^(H−1) − (x + 1)/2
= 2^(H−1) + (x − 1)/2
= (2^H + x − 1)/2
= n/2
= ⌈n/2⌉ (because x odd ⇒ n even).
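The ⌈n/2⌉ claim is easy to check by brute force; a small sketch (in the 1-based array representation, a node is a leaf exactly when its left child index 2i exceeds n):

```python
def count_leaves(n):
    # 1-based array heap of n nodes: node i is a leaf iff 2i > n.
    return sum(1 for i in range(1, n + 1) if 2 * i > n)

# The number of leaves is ceil(n/2) for every heap size n,
# whether or not the bottom level is full.
for n in range(1, 200):
    assert count_leaves(n) == (n + 1) // 2
```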
Inductive step: Let n_h be the number of nodes at height h in the n-node tree T.
Consider the tree T′ formed by removing the leaves of T. It has n′ = n − n₀ nodes.
We know from the base case that n₀ = ⌈n/2⌉, so n′ = n − n₀ = n − ⌈n/2⌉ = ⌊n/2⌋.
Note that the nodes at height h in T would be at height h − 1 if the leaves of the
tree were removed; that is, they are at height h − 1 in T′. Letting n′_{h−1} denote the
number of nodes at height h − 1 in T′, we have
n_h = n′_{h−1} ≤ ⌈n′/2^h⌉ = ⌈⌊n/2⌋/2^h⌉ ≤ ⌈(n/2)/2^h⌉ = ⌈n/2^(h+1)⌉,
where the middle inequality follows from the inductive hypothesis.
Solution to Exercise 6.4-1
[Figure: the sequence of heaps produced by HEAPSORT on the array ⟨5, 13, 2, 25, 7, 17, 20, 8, 4⟩; the tree diagrams are not recoverable from the extracted text.]
Solution to Exercise 6.5-2
[Figure: the sequence of heaps produced by MAX-HEAP-INSERT, starting with the new node's key set to −∞; the tree diagrams are not recoverable from the extracted text.]

Solution to Problem 6-1
a. The procedures BUILD-MAX-HEAP and BUILD-MAX-HEAP′ do not always create the same heap when run on the same input array. For example, on the input A = ⟨1, 2, 3⟩, BUILD-MAX-HEAP produces the heap ⟨3, 2, 1⟩, whereas BUILD-MAX-HEAP′ produces ⟨3, 1, 2⟩.
b. An upper bound of O(n lg n) time follows immediately from there being n − 1
calls to MAX-HEAP-INSERT, each taking O(lg n) time. For a lower bound of
Ω(n lg n), consider the case in which the input array is given in strictly increasing order. Each call to MAX-HEAP-INSERT causes HEAP-INCREASE-KEY to
go all the way up to the root. Since the depth of node i is ⌊lg i⌋, the total time is
Θ(∑_{i=1}^{n} ⌊lg i⌋) ≥ Θ((n/2) lg(n/2)) = Ω(n lg n).
In the worst case, therefore, BUILD-MAX-HEAP′ requires Θ(n lg n) time to
build an n-element heap.
Solution to Problem 6-2
a. A d-ary heap can be represented in a 1-dimensional array as follows. The root
is kept in A[1], its d children are kept in order in A[2] through A[d + 1], their
children are kept in order in A[d + 2] through A[d² + d + 1], and so on. The following two procedures map a node with index i to its parent and to its jth
child (for 1 ≤ j ≤ d), respectively.
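The index arithmetic can be sketched as follows (a hypothetical Python rendering keeping the text's 1-based indices; the closed forms are the natural ones implied by the layout described above, not quoted from the manual):

```python
def d_ary_parent(i, d):
    # ceil((i - 1) / d), written with floor division; valid for i >= 2.
    return (i - 2) // d + 1

def d_ary_child(i, j, d):
    # j-th child (1 <= j <= d) of the node at index i.
    return d * (i - 1) + j + 1
```

As a sanity check, the two maps are inverses: `d_ary_parent(d_ary_child(i, j, d), d) == i` for every valid i and j.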
c. The procedure HEAP-EXTRACT-MAX given in the text for binary heaps works
fine for d-ary heaps too. The change needed to support d-ary heaps is in
MAX-HEAPIFY, which must compare the argument node to all d children instead of just 2 children. The running time of HEAP-EXTRACT-MAX is still the running time for MAX-HEAPIFY, but that now takes worst-case time proportional to the product of the height of the heap and the number of children examined at each
node (at most d), namely Θ(d log_d n) = Θ(d lg n / lg d).
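The modified MAX-HEAPIFY might look like this (a Python sketch, 0-based, so the children of node i sit at indices d·i+1 through d·i+d):

```python
def d_ary_max_heapify(a, i, d):
    # Compare node i against all d of its children: at most d
    # comparisons per level, over at most log_d(n) levels.
    n = len(a)
    largest = i
    for j in range(d * i + 1, min(d * i + d + 1, n)):
        if a[j] > a[largest]:
            largest = j
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        d_ary_max_heapify(a, largest, d)

def d_ary_extract_max(a, d):
    # Same structure as binary HEAP-EXTRACT-MAX; O(d log_d n) overall.
    top = a[0]
    a[0] = a[-1]
    a.pop()
    if a:
        d_ary_max_heapify(a, 0, d)
    return top
```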
d. The procedure MAX-HEAP-INSERT given in the text for binary heaps works
fine for d-ary heaps too. The worst-case running time is still Θ(h), where h
is the height of the heap. (Since only parent pointers are followed, the number
of children a node has is irrelevant.) For a d-ary heap, this is Θ(log_d n) =
Θ(lg n / lg d).
e. D-ARY-HEAP-INCREASE-KEY can be implemented as a slight modification
of MAX-HEAP-INSERT (only the first couple lines are different). Increasing an element may make it larger than its parent, in which case it must be moved higher up in the tree. This can be done just as for insertion, traversing a path from the increased node toward the root. In the worst case, the entire height of the tree must be traversed, so the worst-case running time is
Θ(h) = Θ(log_d n) = Θ(lg n / lg d).
D-ARY-HEAP-INCREASE-KEY(A, i, k)
  A[i] ← max(A[i], k)
  while i > 1 and A[PARENT(i)] < A[i]
      exchange A[i] ↔ A[PARENT(i)]
      i ← PARENT(i)
Chapter 7 overview
[The treatment in the second edition differs from that of the first edition. We use
a different partitioning method, known as "Lomuto partitioning," in the second edition, rather than the "Hoare partitioning" used in the first edition. Using Lomuto partitioning helps simplify the analysis, which uses indicator random variables in the second edition.]
Quicksort
• Worst-case running time: Θ(n²).
• Expected running time: Θ(n lg n).
• Constants hidden in Θ(n lg n) are small.
• Sorts in place.
Description of quicksort
Quicksort is based on the three-step process of divide-and-conquer.
• To sort the subarray A[p..r]:
Divide: Partition A[p..r] into two subarrays A[p..q−1]
and A[q+1..r], such that each element in the first subarray A[p..q−1]
is ≤ A[q] and A[q] is ≤ each element in the second subarray A[q+1..r].
Conquer: Sort the two subarrays by recursive calls to QUICKSORT.
Combine: No work is needed to combine the subarrays, because they are sorted
in place.
• Perform the divide step by a procedure PARTITION, which returns the index q that marks the position separating the subarrays.
• PARTITION always selects the last element A[r] in the subarray A[p..r] as the
pivot (the element around which to partition).
• As the procedure executes, the array is partitioned into four regions, some of which may be empty:
Loop invariant:
1. All entries in A[p..i] are ≤ pivot.
2. All entries in A[i+1..j−1] are > pivot.
3. A[r] = pivot.
It's not needed as part of the loop invariant, but the fourth region is A[j..r−1],
whose entries have not yet been examined, and so we don't know how they compare to the pivot.
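Lomuto's PARTITION can be rendered as a Python sketch (0-based indices; the inline comment mirrors the loop invariant above):

```python
def partition(a, p, r):
    # Lomuto partitioning: the last element a[r] is the pivot.
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        # Invariant here: a[p..i] <= pivot and a[i+1..j-1] > pivot.
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Move the pivot between the two regions and return its index.
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1
```

On the 8-element example used later in these notes, ⟨8, 1, 6, 4, 0, 3, 9, 5⟩ with pivot 5, this returns index 4 with {1, 4, 0, 3} to the pivot's left and {8, 9, 6} to its right.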
Example: On an 8-element subarray.
[Figure: the regions at loop exit: A[p..i] known to be ≤ pivot, A[i+1..j−1] known to be > pivot.]
[The index j disappears because it is no longer needed once the for loop is exited.]
Correctness: Use the loop invariant to prove correctness of PARTITION:
Initialization: Before the loop starts, all the conditions of the loop invariant are
satisfied, because r is the pivot and the subarrays A[p..i] and A[i+1..j−1]
are empty.
Maintenance: While the loop is running, if A[j] ≤ pivot, then A[j] and A[i+1]
are swapped and then i and j are incremented. If A[j] > pivot, then increment
only j.
Termination: When the loop terminates, j = r, so all elements in A are partitioned into one of the three cases: A[p..i] ≤ pivot, A[i+1..r−1] > pivot,
and A[r] = pivot.
The last two lines of PARTITION move the pivot element from the end of the array
to between the two subarrays. This is done by swapping the pivot and the first
element of the second subarray, i.e., by swapping A[i+1] and A[r].
Time for partitioning: Θ(n) to partition an n-element subarray.
Performance of quicksort
The running time of quicksort depends on the partitioning of the subarrays:
• If the subarrays are balanced, then quicksort can run as fast as merge sort.
• If they are unbalanced, then quicksort can run as slowly as insertion sort.
Worst case
• Occurs when the subarrays are completely unbalanced.
• Have 0 elements in one subarray and n − 1 elements in the other subarray.
• Get the recurrence
T(n) = T(n − 1) + T(0) + Θ(n)
= T(n − 1) + Θ(n)
= Θ(n²).
• Same running time as insertion sort.
• In fact, the worst-case running time occurs when quicksort takes a sorted array
as input, but insertion sort runs in O(n) time in this case.
Best case
• Occurs when the subarrays are completely balanced every time.
• Each subarray has ≤ n/2 elements.
• Get the recurrence
T(n) = 2T(n/2) + Θ(n)
= Θ(n lg n).

Balanced partitioning
• Imagine that PARTITION always produces a 9-to-1 split.
• Get the recurrence
T(n) ≤ T(9n/10) + T(n/10) + Θ(n)
= O(n lg n).
• Intuition: look at the recursion tree.
• It's like the one for T(n) = T(n/3) + T(2n/3) + O(n) in Section 4.2.
• Except that here the constants are different; we get log_10 n full levels and
log_{10/9} n levels that are nonempty.
• As long as it's a constant, the base of the log doesn't matter in asymptotic notation.
• Any split of constant proportionality will yield a recursion tree of depth
Θ(lg n).
Intuition for the average case
• Splits in the recursion tree will not always be constant.
• There will usually be a mix of good and bad splits throughout the recursion tree.
• To see that this doesn't affect the asymptotic running time of quicksort, assume that levels alternate between best-case and worst-case splits.
• The extra level in the left-hand figure only adds to the constant hidden in the
Θ-notation.
• There are still the same number of subarrays to sort, and only twice as much work was done to get to that point.
• Both figures result in O(n lg n) time, though the constant for the figure on the
left is higher than that of the figure on the right.
Randomized version of quicksort
• We have assumed that all input permutations are equally likely.
• This is not always true.
• To correct this, we add randomization to quicksort.
• We could randomly permute the input array.
• Instead, we use random sampling, or picking one element at random.
• Don't always use A[r] as the pivot. Instead, randomly pick an element from the
subarray that is being sorted.
We add this randomization by not always using A[r] as the pivot, but instead
randomly picking an element from the subarray that is being sorted.
RANDOMIZED-PARTITION(A, p, r)
  i ← RANDOM(p, r)
  exchange A[r] ↔ A[i]
  return PARTITION(A, p, r)
Randomly selecting the pivot element will, on average, cause the split of the input array to be reasonably well balanced.
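Putting the pieces together as a Python sketch (Lomuto partitioning as in Section 7.1, with the pivot chosen uniformly at random from the subarray):

```python
import random

def partition(a, p, r):
    # Lomuto partitioning with pivot a[r].
    pivot, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def randomized_partition(a, p, r):
    # Swap a uniformly random element into the pivot slot, then
    # partition as usual.
    i = random.randint(p, r)
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)

def randomized_quicksort(a, p, r):
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)
```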
Worst-case analysis
• Recurrence for the worst case:
T(n) = max_{0≤q≤n−1} (T(q) + T(n − q − 1)) + Θ(n).
• Guess: T(n) ≤ cn², for some c.
• Substituting our guess into the above recurrence:
T(n) ≤ max_{0≤q≤n−1} (cq² + c(n − q − 1)²) + Θ(n)
= c · max_{0≤q≤n−1} (q² + (n − q − 1)²) + Θ(n).
• The maximum value of q² + (n − q − 1)² occurs when q is either 0 or n − 1.
(Second derivative with respect to q is positive.) This means that
max_{0≤q≤n−1} (q² + (n − q − 1)²) ≤ (n − 1)² = n² − 2n + 1,
and so
T(n) ≤ cn² − c(2n − 1) + Θ(n) ≤ cn².
• Pick c so that c(2n − 1) dominates Θ(n).
• Therefore, the worst-case running time of quicksort is O(n²).
• Can also show that the recurrence's solution is Ω(n²). Thus, the worst-case
running time is Θ(n²).
Average-case analysis
• The dominant cost of the algorithm is partitioning.
• PARTITION removes the pivot element from future consideration each time.
• Thus, PARTITION is called at most n times.
• QUICKSORT recurses on the partitions.
• The amount of work that each call to PARTITION does is a constant plus the
number of comparisons that are performed in its for loop.
• Let X = the total number of comparisons performed in all calls to PARTITION.
• Therefore, the total work done over the entire execution is O(n + X).
We will now compute a bound on the overall number of comparisons.
For ease of analysis:
• Rename the elements of A as z_1, z_2, ..., z_n, with z_i being the ith smallest element.
• Define the set Z_ij = {z_i, z_{i+1}, ..., z_j} to be the set of elements between z_i and z_j, inclusive.
Now all we have to do is find the probability that two elements are compared.
• Think about when two elements are not compared.
• For example, numbers in separate partitions will not be compared.
• In the previous example, ⟨8, 1, 6, 4, 0, 3, 9, 5⟩ is partitioned around the pivot 5, so that none
of the set {1, 4, 0, 3} will ever be compared to any of the set {8, 6, 9}.
• Once a pivot x is chosen such that z_i < x < z_j, then z_i and z_j will never be compared at any later time.
• If either z_i or z_j is chosen before any other element of Z_ij, then it will be
compared to all the elements of Z_ij, except itself.
• The probability that z_i is compared to z_j is the probability that either z_i or z_j is the first element chosen from Z_ij.
• There are j − i + 1 elements, and pivots are chosen randomly and independently.
Thus, the probability that any particular one of them is the first one chosen is
1/(j − i + 1).
Therefore,
Pr{z_i is compared to z_j} = Pr{z_i or z_j is the first pivot chosen from Z_ij}
= Pr{z_i is the first pivot chosen from Z_ij}
+ Pr{z_j is the first pivot chosen from Z_ij}
= 1/(j − i + 1) + 1/(j − i + 1)
= 2/(j − i + 1).
[The second line follows because the two events are mutually exclusive.]
Substituting into the equation for E[X]:
E[X] = ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} 2/(j − i + 1)
= ∑_{i=1}^{n−1} ∑_{k=1}^{n−i} 2/(k + 1)   (substituting k = j − i)
< ∑_{i=1}^{n−1} ∑_{k=1}^{n} 2/k
= ∑_{i=1}^{n−1} O(lg n)
= O(n lg n),
where the next-to-last line uses the bound on the harmonic series in equation (A.7).
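The double sum E[X] = ∑_{i<j} 2/(j − i + 1) can also be evaluated numerically to see the O(n lg n) behavior; a small sketch:

```python
import math

def expected_comparisons(n):
    # Exact expectation: sum over all pairs i < j of 2 / (j - i + 1).
    return sum(2.0 / (j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

# The exact expectation sits below the 2n * H_n ~ 2n ln n bound.
for n in (10, 100, 1000):
    assert expected_comparisons(n) <= 2 * n * math.log(n)
```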
Solution to Exercise 7.2-3
PARTITION does a "worst-case partitioning" when the elements are in decreasing order. It reduces the size of the subarray under consideration by only 1 at each step, which we've seen has running time Θ(n²).
In particular, PARTITION, given a subarray A[p..r] of distinct elements in
decreasing order, produces an empty partition in A[p..q−1], puts the pivot
(originally in A[r]) into A[p], and produces a partition A[p+1..r] with only one fewer element than A[p..r]. The recurrence for QUICKSORT becomes T(n) =
T(n − 1) + Θ(n), which has the solution T(n) = Θ(n²).
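The quadratic behavior can be seen by counting the for-loop comparisons directly; a sketch using the Lomuto PARTITION of Section 7.1:

```python
def quicksort_count(a, p, r):
    # Sorts a[p..r] in place and returns the number of pivot
    # comparisons performed in PARTITION's for loop.
    if p >= r:
        return 0
    pivot, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    q = i + 1
    return (r - p) + quicksort_count(a, p, q - 1) \
                   + quicksort_count(a, q + 1, r)

# On n distinct elements in decreasing order, each call strips off one
# element, giving (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons.
```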
Solution to Exercise 7.2-5
The minimum depth follows a path that always takes the smaller part of the partition, i.e., that multiplies the number of elements by α. One iteration reduces
the number of elements from n to αn, and i iterations reduce the number of
elements to α^i n. At a minimum-depth leaf of depth m, there is just one remaining element, and so α^m n = 1. Thus, α^m = 1/n. Taking logs, we get
m lg α = −lg n, or m = −lg n / lg α.
Similarly, maximum depth corresponds to always taking the larger part of the partition, i.e., keeping a fraction 1 − α of the elements each time. The maximum depth M is reached when there is one element left, that is, when (1 − α)^M n = 1. Thus, M = −lg n / lg(1 − α).
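The two formulas can be sanity-checked numerically; a sketch that assumes splits of exactly α and 1 − α at every level:

```python
import math

def split_depths(n, alpha):
    # Follow the smaller (alpha) side, then the larger (1 - alpha)
    # side, until one element remains; return (min_depth, max_depth).
    m, size = 0, float(n)
    while size > 1:
        size *= alpha
        m += 1
    M, size = 0, float(n)
    while size > 1:
        size *= (1 - alpha)
        M += 1
    return m, M

n, alpha = 2 ** 20, 0.25
m, M = split_depths(n, alpha)
# m matches -lg n / lg alpha = 10; M matches -lg n / lg(1 - alpha),
# rounded up to the next integer.
```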
Solution to Exercise 7.4-2
To show that quicksort's best-case running time is Ω(n lg n), we use a technique
similar to the one used in Section 7.4.1 to show that its worst-case running time
is O(n²).
Let T(n) be the best-case time for the procedure QUICKSORT on an input of size n.
We have the recurrence
T(n) = min_{1≤q≤n−1} (T(q) + T(n − q − 1)) + Θ(n).
As we'll show below, the expression q lg q + (n − q − 1) lg(n − q − 1) achieves a
minimum over the range 1 ≤ q ≤ n − 1 when q = n − q − 1, or q = (n − 1)/2, since the first derivative of the expression with respect to q is 0 when q = (n − 1)/2 and the second derivative of the expression is positive. (It doesn't matter that q is not
an integer when n is even, since we're just trying to determine the minimum value
of a function, knowing that when we constrain q to integer values, the function's
value will be no lower.)
Choosing q = (n − 1)/2 gives us the bound
T(n) ≥ 2T((n − 1)/2) + Θ(n).
The substitution method, with the guess T(n) ≥ cn lg n for a suitably small constant c > 0, shows that this recurrence has the solution T(n) = Ω(n lg n), provided the Θ(n) term dominates the
quantity 2cn + c lg(n − 1) − c. Thus, the best-case running time of quicksort is
Ω(n lg n).
Letting f(q) = q lg q + (n − q − 1) lg(n − q − 1), we now show how to find
the minimum value of this function in the range 1 ≤ q ≤ n − 1. We need to find the value of q for which the derivative of f with respect to q is 0. We rewrite this
function as
f(q) = (q ln q + (n − q − 1) ln(n − q − 1)) / ln 2,
and so its derivative is
f′(q) = (ln q − ln(n − q − 1)) / ln 2.
The derivative f′(q) is 0 when q = n − q − 1, or when q = (n − 1)/2. To verify
that q = (n − 1)/2 is indeed a minimum (not a maximum or an inflection point),
we need to check that the second derivative of f is positive at q = (n − 1)/2:
f″(q) = (1/ln 2) (1/q + 1/(n − q − 1)),
f″((n − 1)/2) = (1/ln 2) · 4/(n − 1) > 0.
Solution to Problem 7-4
a. QUICKSORT′ does exactly what QUICKSORT does, so it sorts correctly; the two differ only in how the second recursive call is made. QUICKSORT calls
itself with arguments A, p, q − 1. QUICKSORT then calls itself again, with
arguments A, q + 1, r. QUICKSORT′ instead sets p ← q + 1 and performs
another iteration of its while loop. This executes the same operations as calling
itself with A, q + 1, r, because in both cases, the first and third arguments (A
and r) have the same values as before, and p has the old value of q + 1.
b. The stack depth of QUICKSORT′ will be Θ(n) on an n-element input array if
there are Θ(n) recursive calls to QUICKSORT′. This happens if every call to PARTITION(A, p, r) returns q = r. The sequence of recursive calls in this
scenario is QUICKSORT′(A, 1, n), QUICKSORT′(A, 1, n−1), ..., QUICKSORT′(A, 1, 1), which occurs, for example, when the array is already sorted in increasing order.
c. The problem demonstrated by the scenario in part (b) is that each invocation of
QUICKSORT′ calls QUICKSORT′ again with almost the same range. To avoid such behavior, we must change QUICKSORT′ so that the recursive call is on a smaller interval of the array. The following variation of QUICKSORT′ checks which of the two subarrays returned from PARTITION is smaller and recurses
on the smaller subarray, which is at most half the size of the current array. Since the array size is reduced by at least half on each recursive call, the number of recursive calls, and hence the stack depth, is Θ(lg n) in the worst case. Note
that this method works no matter how partitioning is performed (as long as the PARTITION procedure has the same functionality as the procedure given in Section 7.1).
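The variation described above can be sketched in Python (Lomuto partitioning assumed; the while loop replaces the second recursive call, and recursion goes only to the smaller side):

```python
def partition(a, p, r):
    # Lomuto partitioning with pivot a[r].
    pivot, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def quicksort_bounded_stack(a, p, r):
    # Recurse only on the smaller side (size <= half the subarray),
    # and loop on the larger: stack depth is O(lg n) in the worst case.
    while p < r:
        q = partition(a, p, r)
        if q - p < r - q:
            quicksort_bounded_stack(a, p, q - 1)
            p = q + 1
        else:
            quicksort_bounded_stack(a, q + 1, r)
            r = q - 1
```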
Sorting in Linear Time
Chapter 8 overview
How fast can we sort?
We will prove a lower bound, then beat it by playing a different game
Comparison sorting
• The only operation that may be used to gain order information about a sequence
is comparison of pairs of elements.
• All sorts seen so far are comparison sorts: insertion sort, selection sort, merge sort, quicksort, heapsort, treesort.
Lower bounds for sorting
Lower bounds
• Ω(n) to examine all the input.
• All sorts seen so far are Ω(n lg n).
• We'll show that Ω(n lg n) is a lower bound for comparison sorts.
Decision tree
• Abstraction of any comparison sort.
• Represents comparisons made by
• a specific sorting algorithm
• on inputs of a given size.
• Abstracts away everything else: control and data movement.
• We're counting only comparisons.