Sorting and Algorithm Analysis
Figure 3.1: Illustrations of (a) a single iteration of insertion sort, where A[j + 1] is inserted within the sorted prefix A[1 . . . j], and (b) a snapshot in the middle of the process of merging two sorted arrays A and B.
When sorting a stack of papers, you may instinctively use an insertion sort, where you maintain two stacks of papers: those sorted so far, and the remaining unsorted ones. In every step, you insert a paper from the unsorted stack into its proper position within the sorted stack.
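To make the analogy concrete, here is a minimal sketch of insertion sort (our own illustration, written array-style rather than with two physical stacks), in which the prefix A[0 . . . j − 1] plays the role of the sorted stack:

def insertion_sort(A):
    # A[:j] plays the role of the sorted stack; A[j:] is the unsorted stack.
    for j in range(1, len(A)):
        x = A[j]                  # next "paper" taken from the unsorted stack
        i = j - 1
        while i >= 0 and A[i] > x:
            A[i + 1] = A[i]       # shift larger sorted elements one slot right
            i -= 1
        A[i + 1] = x              # insert x into its proper position
    return A

Each insertion costs time proportional to how far the new element must travel, which is why insertion sort is fast on nearly-sorted inputs, a fact we will use again when discussing how to stop quicksort's recursion early.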
Figure 3.2: Running time of merge sort computed three ways: (a) by noting that we spend Θ(log n) time per element, (b) by solving the recurrence T(n) = 2T(n/2) + Θ(n), and (c) via an iterative perspective where the algorithm runs in log n phases, each taking Θ(n) time.
There are several methods we can use to determine the total running time of an
algorithm, depending on its structure.
Solving a Recurrence. For recursive algorithms, we usually compute running
time by solving a recurrence. For example, if T (n) denotes the running time of
merge sort on n elements, we know that T(n) = 2T(n/2) + Θ(n), since we are performing two recursive sorts each on n/2 elements, followed by a Θ(n) merge operation. The solution of this recurrence gives the running time of merge sort, T(n) = Θ(n log n).
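As a concrete reference point, here is a minimal sketch of merge sort (our own illustration, not code from the text) whose structure mirrors the recurrence: two recursive calls on halves of the array, followed by a linear-time merge.

def merge(B, C):
    # Merge two sorted lists into one sorted list in linear time.
    out, i, j = [], 0, 0
    while i < len(B) and j < len(C):
        if B[i] <= C[j]:
            out.append(B[i]); i += 1
        else:
            out.append(C[j]); j += 1
    out.extend(B[i:])
    out.extend(C[j:])
    return out

def merge_sort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    return merge(merge_sort(A[:mid]),    # T(n/2)
                 merge_sort(A[mid:]))    # T(n/2), followed by a Theta(n) merge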
Loop Counting. For iterative algorithms, simple loop counting often suffices. As
shown in Figure 3.2, we can think of merge sort as an iterative algorithm that takes
n singleton elements and merges them pairwise to obtain n/2 sorted lists of length
2, then merges these pairwise to obtain n/4 sorted lists of length 4, and so on for
log n such phases. Since merging is a linear-time operation, each phase (merging n/k sorted lists of length k) takes Θ(n) time, for a total of Θ(n log n) time1.
Running Time Spent per Element. Instead of adding up the total time spent in
each step of an algorithm, summed over all steps during the algorithm's execution, it is sometimes more convenient to consider the total running time spent on each individual input element, summed over all elements. This adds up the same amount of total work, just in a different, more convenient order. Using merge sort as our example again (Figure 3.2), it takes Θ(n) time to merge two sorted arrays of
combined length n, which is O(1) time per element taking part in the merge. The
total amount of work merge sort spends on a single element of data is therefore
proportional to the number of merge operations in which the element takes part.
How many merges happen to a single element? If you put yourself in the perspective
of an element of data, you will find that every time you take part in a merge, you
end up in a sorted subarray twice as large as before. This will happen log n times
before you end up in a sorted array of length n. We therefore spend Θ(log n) time per element, for a total running time of Θ(n log n). This sort of per-element outlook on running time can be highly useful in the analysis of many algorithms.
Another common and powerful technique for proving correctness and running time
is the use of a potential function argument. A potential function maps the state of
our algorithm to a nonnegative integer. If we can guarantee that this must decrease
during each iteration of the algorithm, then termination is inevitable. This approach
is commonly used with algorithms based on iterative refinement, where our potential
function usually reflects the amount of incorrectness inherent in the algorithm's
current solution. There is a natural physical analogy suggested by the use of the
term potential: we can think of a potential function as telling us the amount of
energy stored in our current state, with our algorithm acting in the role of gravity,
pulling in a direction that decreases this potential energy.
For sorting, a natural potential function is the number of inversions in our array.
A pair of elements A[i] and A[j] with i < j constitutes an inversion if A[i] > A[j];
that is, the elements are ordered incorrectly with respect to their positions. A
sorted array has no inversions, and a reverse-sorted array has the maximum possible number of inversions, n(n − 1)/2, since every pair is an inversion. Inversions show up often
in the study of permutations and sorting, and since inversion count is a natural way
to measure unsortedness, it is often a good choice for a potential function. By
1 Recall from Section 2.3 (Figure 2.5) that solving a recurrence involves adding up the work
done by an algorithm at each level of recursion. If you look at Figure 3.2 closely, you will also
see a tree of recursive subproblems whose work we have added up level by level. Hence, in this
case, the analysis of our iterative outlook on merge sort turns out to be just another way of doing
the same math we used to solve the merge sort recurrence.
swapping two adjacent out-of-order elements A[i] > A[i + 1], we correct a single
inversion, leaving all others unchanged. For example, this tells us that bubble sort
must terminate, since each scan through the array (except the last) performs at
least one such swap, decreasing the inversion count.
Potential functions are often useful for analyzing running time. For example:
Bubble Sort. Let A′ denote the sorted version of our array A, and suppose our potential function tells us the size of the largest suffix of our array such that A[j . . . n] = A′[j . . . n] (i.e., the number of elements at the end of A that
are in their correct final positions). Each scan of bubble sort increases this
potential by at least one. The first scan pulls the largest element, through
repeated swaps, to its correct final position at the end of the array, then
the second scan pulls the second-largest element back to the second-to-last
position, and so on. Since our potential cannot exceed n, we perform at most
n scans, each taking Θ(n) time, for a total running time of O(n²).
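A minimal sketch of bubble sort matching this analysis (our own illustration): each scan bubbles the largest remaining element into the sorted suffix, and we may stop as soon as a scan performs no swaps, since at that point no inversions remain.

def bubble_sort(A):
    n = len(A)
    for scan in range(n - 1):
        swapped = False
        # After `scan` full scans, the last `scan` positions already hold
        # their correct final values (the potential is at least `scan`).
        for i in range(n - 1 - scan):
            if A[i] > A[i + 1]:
                A[i], A[i + 1] = A[i + 1], A[i]   # corrects exactly one inversion
                swapped = True
        if not swapped:
            break                                 # no inversions left; sorted
    return A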
When we study amortized analysis in the next chapter, we will make somewhat
more sophisticated running time arguments by using potential functions that can
both increase and decrease, rather than moving in a single monotonic direction.
We can easily show an upper bound of O(n log n) on the running time required to
sort by demonstrating an O(n log n) algorithm, like merge sort. Arguing a lower
bound is trickier, however, since this involves proving that any comparison-based
sorting algorithm, no matter how bizarre or clever, must take Ω(n log n) steps in the worst case. To make such a claim, we first show how any comparison-based sorting algorithm can be abstractly represented using a simple algorithmic model called a decision tree, then we show that any decision tree must make Ω(n log n)
comparisons in the worst case in order to properly sort n elements.
A sorting algorithm in the form of a decision tree is shown in Figure 3.3. At every
step, it compares two elements and branches based on the result. The more comparisons the algorithm makes, the more it learns about the ordering of the input
elements. When we reach a leaf, the algorithm terminates and declares how to rear-
range the input in sorted order. The height of the tree is the number of comparisons
required in the worst case. Any deterministic (not randomized) comparison-based
sorting algorithm can be abstractly represented this way, as a series of decision
trees, one for each input size n.
If we are sorting n distinct input elements, there are n! different orderings we might receive as input. Each one needs to be re-arranged differently in order to sort them all properly. Our decision tree therefore must have at least n! leaves. Otherwise, two different input orderings would find their way to the same leaf3, and one of them would therefore be sorted incorrectly, since the same permutation would be applied to both. Since a decision tree of height h can have at most 2^h leaves, we need 2^h ≥ n!, so h ≥ log(n!) = Θ(n log n) by Stirling's approximation. The height of our decision tree (i.e., the number of comparisons needed to sort n elements in the worst case) is therefore Ω(n log n).
3 This is an example of the so-called pigeonhole principle: if there are more pigeons than pigeonholes, then two pigeons must end up in the same pigeonhole. This obvious yet important
concept is used in many combinatorial proofs.
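For reference, the log(n!) = Θ(n log n) estimate can be checked without the full Stirling formula (a standard bound, sketched here):

\[
\log(n!) \;=\; \sum_{i=1}^{n}\log i \;\le\; n\log n,
\qquad
\log(n!) \;\ge\; \sum_{i=\lceil n/2\rceil}^{n}\log i \;\ge\; \frac{n}{2}\log\frac{n}{2} \;=\; \Omega(n\log n),
\]

so any decision tree with at least n! leaves has height h ≥ log(n!) = Ω(n log n).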
Set Disjointness. Given two sets A and B each with at most n elements, the
set disjointness problem asks whether A and B share any elements in common
(i.e., we want to test if A ∩ B = ∅).
Set Equality. Given two sets A and B with |A| = |B| = n, the set equality
problem involves testing whether A = B.
For each of these, we can still use decision trees to argue an Ω(n log n) worst-case lower bound in the comparison model, although they seem to require slightly more nuanced proofs4 than the one above for sorting [Details]. Interestingly, we will see in Chapter 7 how to use hashing to solve all three problems in Θ(n) expected time in the RAM model, illustrating well the importance of our underlying computational model.
want to read ahead to Section 3.8 before listening to the proof details.
Now that we have non-trivial lower bounds for a small handful of problems, we
can transfer these to other problems using reductions, somewhat like we used
polynomial-time reductions to transfer hardness results from one problem to another in Section 1.6.3.
As an example, consider the problem of rearranging an array to group equal elements
into contiguous blocks (not necessarily appearing in sorted order). An instance of
the element uniqueness problem can be reduced to this problem, by grouping equal elements and then performing a Θ(n) scan to test whether each element is distinct from its neighbors. If we could group equal elements in time faster than O(n log n), the same would therefore be true for element uniqueness, contradicting its Ω(n log n) worst-case lower bound in the comparison and real RAM models. Hence, it must take Ω(n log n) time in the worst case to group equal elements.
5 Surprisingly, even though it seems to involve Θ(n²) terms, this product can be evaluated in only Θ(n log² n) time using the Fast Fourier Transform, as we shall see in Chapter ??.
Figure 3.4: Partitioning an array A around a pivot element A[i] via partition(A, i).
(a) Consider the following problems in the comparison model: (i) counting the number of occurrences of a most-frequently-occurring element in an n-element array, and (ii) given two sets A and B with |A| + |B| = n, compute A ∩ B, A ∪ B, or A − B. [Solution]
(b) Many problems in computational geometry inherit lower bounds from sorting, element
uniqueness, or their relatives. Consider the following problems in the real RAM model,
since they involve points with numeric coordinates: (i) the closest pair problem asks
us to find the closest pair of points within a set of n points in the plane, and (ii)
the congruence testing problem gives us two n-point sets A and B in the 2D plane, and asks whether B can be obtained from A by applying a rigid transformation: a translation plus a rotation plus (possibly) a reflection. [Solution]
3.4 Quicksort
One of the most popular and widely-used sorting algorithms is quicksort, which like
merge sort is based on the principle of divide and conquer. In fact, quicksort is
almost symmetric to merge sort in its operation: merge sort performs two recursive
sorts followed by a linear-time merge, whereas quicksort performs a linear-time
partition followed by two recursive sorts. Partitioning is shown in Figure 3.4. Given
a specific array element (known as a pivot), we rearrange the array into a block of
elements less than or equal to the pivot, followed by the pivot element, followed
by a block of elements greater than or equal to the pivot. The pivot element will
Figure 3.5: Two methods for partitioning an array in place. In (a), we scan
pointers i and j inward from the ends of the array, stopping when A[i] is larger
than the pivot and A[j] is smaller than the pivot. We then swap A[i] and A[j] and
continue scanning inward, until the pointers meet. In (b), we scan two pointers
from left to right. In each step, if A[j + 1] is larger than the pivot, we simply
advance j; otherwise, we swap A[i + 1] and A[j + 1] and advance both i and j.
then be in its correct final location, and quicksort finishes by recursively sorting the
subarrays left and right of the pivot.
Partitioning is very straightforward if we allocate a temporary buffer the same size as our array to hold the output. In this case, we simply scan through our array and copy out all the elements less than the pivot to the beginning of the buffer, and all those greater than the pivot to the end of the buffer. However, partitioning is more commonly performed in place, without the need for an auxiliary buffer.
Two common approaches for doing this are explained in Figure 3.5.
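As a rough illustration (our own sketch, not the book's code), here is an in-place partition in the spirit of the left-to-right scan of Figure 3.5(b), together with the quicksort driver built on top of it; it pivots on the last element purely for brevity, a choice the following paragraphs caution against:

def partition(A, lo, hi):
    # Rearrange A[lo..hi] around the pivot A[hi] and return the pivot's
    # final index; A[lo..i] always holds the elements found to be <= pivot.
    pivot = A[hi]
    i = lo - 1
    for j in range(lo, hi):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[hi] = A[hi], A[i + 1]
    return i + 1

def quicksort(A, lo=0, hi=None):
    if hi is None:
        hi = len(A) - 1
    if lo < hi:
        p = partition(A, lo, hi)   # linear-time partition...
        quicksort(A, lo, p - 1)    # ...followed by two recursive sorts
        quicksort(A, p + 1, hi)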
Choosing a Good Pivot. The tricky aspect of quicksort is how to choose the
pivot. Ideally, we should choose the median element, which we will shortly learn
how to find in only Θ(n) time. Since the median evenly splits our array into two recursive subproblems of size n/2, the running time of quicksort is then described by the recurrence T(n) = 2T(n/2) + Θ(n), with the Θ(n) term representing both the time spent finding the median and the time spent partitioning. In total, this solves to T(n) = Θ(n log n). Unfortunately, the Θ(n) running time required to find
the median element has a rather high hidden constant, so this version of quicksort
is likely to run significantly slower than merge sort in practice, even though the two
share the same asymptotic running time.
Our partition doesn't need to be perfectly balanced. Even if we could only guarantee at least a 1% / 99% split, we would still end up with a running time of Θ(n log n), albeit with a much higher hidden constant (since the solution of T(n) = T(n/100) + T(99n/100) + Θ(n) is still Θ(n log n)). On the other hand, if we always pick a very bad pivot like the minimum or maximum element, the running time can be as slow as Θ(n²), since each Θ(n) partition operation makes very little progress, leaving us with a subproblem of size n − 1.
A simple but flawed strategy for selecting a pivot element is just to pick whichever
element happens to be at the beginning or end of the array. Unfortunately, if our
array is already sorted (a common occurrence in practice), this is a very bad choice.
A slightly more popular heuristic is to look at the elements that appear first, last,
and in the middle of the array and to use the median of these three as a pivot. This
median of three approach can still lead to a running time of Θ(n²) on contrived
inputs, but in practice it has been observed to perform reasonably well.
For a high probability bound, we observe from the last approach that the randomized reduction lemma also tells us that we spend O(log n) time on each element
with high probability. By taking a union bound over all elements, the total running
time is therefore also O(n log n) with high probability.
Problem 43 (Matching Nuts and Bolts). We are given n nuts of different sizes and n corresponding bolts, and we must match these together, determining for each nut the unique bolt of the same size. However, it is difficult to compare two nuts or two bolts; it is only possible to compare a nut against a bolt to see if the nut fits the bolt, if it
is too small, or if it is too large. Describe a randomized algorithm running in O(n log n)
time with high probability that will properly match the nuts and bolts. [Solution]
Stopping Early. Quicksort spends a lot of time on recursive function call overhead
for very small subproblems. Therefore, a common trick to improve performance in
practice is to bottom out of its recursion earlier by leaving sufficiently small
arrays (say, with at most 4 elements) unsorted. This gives a final array that is
nearly sorted, to which we apply insertion sort as a postprocessing step.
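A sketch of how this trick might look in code (our own illustration, reusing the partition and insertion_sort routines sketched earlier; the cutoff of 4 follows the example in the text):

CUTOFF = 4

def quicksort_with_cutoff(A, lo=0, hi=None):
    if hi is None:
        hi = len(A) - 1
    if hi - lo + 1 > CUTOFF:           # leave tiny subarrays unsorted
        p = partition(A, lo, hi)
        quicksort_with_cutoff(A, lo, p - 1)
        quicksort_with_cutoff(A, p + 1, hi)

def hybrid_sort(A):
    quicksort_with_cutoff(A)           # produces a nearly sorted array
    insertion_sort(A)                  # one cheap postprocessing pass
    return A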
Quicksort and Binary Search Trees. In Chapter 6, we will learn about a
powerful and versatile data structure called a binary search tree, which essentially
encodes the recursive tree of all subproblems generated by quicksort. By saving this
information, we can dynamize the sorting process, quickly computing changes to
a sorted ordering in response to insertion or deletion of elements. Many fundamental algorithms related to sorting, such as binary search, quicksort, and quickselect
(discussed shortly) are elegantly expressed in the context of binary search trees; we
postpone further discussion until Chapter 6.
6 Or, equivalently, we can use a simple deterministic pivoting strategy such as choosing the first
element in the array, after randomly permuting the array as a preprocessing step.
Problem 44 (In-Place Merge Sort). Standard merge sort does not operate in
place, since it merges two arrays of combined length n into a newly-allocated memory buffer of length n. In this problem, your goal is to develop an in-place O(n log n) sorting algorithm based fundamentally on merging (it is of course easy to achieve this goal using other methods like quicksort). As a hint, your algorithm may need to be somewhat asymmetric, either by splitting into subproblems that are different in size, or by performing different tasks on different subproblems. As another hint, you can use the standard
merging procedure with yet-unsorted parts of an array serving as scratch space. For exam-
ple, if A, B, C, and D denote four quarters of an array, you can merge A and B together
into the space occupied by C and D, while the elements of C and D become scrambled as
they are swapped back into the space formerly occupied by A and B. [Solution]
date, and a subject line. If you sort according to date, and then stably by "from" address, then messages with the same "from" address would remain sorted by date.
Bubble sort, insertion sort, and merge sort are all stable, provided we implement
them carefully. With quicksort, all methods we know for performing fast (linear-time) in-place partitioning seem inherently unstable, so stability seems hard to achieve without sacrificing in-place operation. Stability and in-place operation seem somewhat at odds with each other when it comes to fast sorting algorithms: merge
sort is stable but not in-place, and quicksort, in-place merge sort (Problem 44), and
heap sort (introduced in Chapter 5) are in-place but not stable. The question of
whether stable and in-place sorting is possible in the comparison-based model in
O(n log n) time remained open until the late 1970s, when algorithms fulfilling all of
these requirements were finally discovered. However, to this day, all such algorithms
remain extremely complicated. By contrast, as we see in the next problem, we can
sort stably and in place in a very simple and elegant fashion if we are willing to
settle for an O(n log2 n) running time.
(a) To begin with, consider the problem of swapping two adjacent blocks in memory in
place. That is, we have an array A[1 . . . n] in which we identify a left block L =
A[1 . . . k] and a right block R = A[k + 1 . . . n], and we would like to rearrange the
contents of the array from LR to RL, with minimal additional memory usage. How
can we do this in Θ(n) time? Note that L and R do not necessarily have the same
size. [Solution]
(b) Using divide and conquer, design an O(n log n) stable in-place algorithm for merging.
Building on this, we obtain an O(n log² n) stable in-place version of merge sort. The
result from (a) may be of help. [Solution]
(c) Using divide and conquer, design an O(n log n) stable in-place algorithm for partition-
ing. Building on this, we obtain an O(n log² n) stable in-place version of randomized
quicksort. The result from (a) may again be of help. [Solution]
(d) Using the O(n log n) stable in-place merging algorithm from part (b), give a simple
alternative solution to problem 44. As a hint, start by dividing your array in half.
[Solution]
Any sorting algorithm can be made stable by using additional memory (thus sacrificing in-place operation). To do this, we augment each element with its initial
index within the array and modify our comparison operator to break ties between
equal elements using these indices.
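A minimal sketch of this index-augmentation trick (our own illustration): tuples compare first by value and then by original index, which is exactly the modified comparison operator described above.

def make_stable(A, sort_in_place):
    # sort_in_place is any comparison sort that sorts a list in place,
    # stable or not (e.g., the quicksort sketched earlier).
    tagged = [(x, i) for i, x in enumerate(A)]  # ties broken by original index
    sort_in_place(tagged)
    return [x for x, _ in tagged]

For instance, make_stable(A, quicksort) returns a stably sorted copy of A even though quicksort itself is not stable.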
3.7 Selection
Selection is a close relative of the sorting problem, asking us to find the kth largest
element (some might say kth smallest) in an unordered array A[1 . . . n]. This element is said to have rank k, and is also called the kth order statistic of the sequence.
since finding the median requires selection, the problem we are currently trying to solve!
8 This is one of the few randomized algorithms we will study where an expected running time
analysis gives a stronger bound than a high probability analysis. If we want a high probability
result for quickselect, O(n log n) is actually the best bound we can claim (and it is trivial to show
this, since the algorithm is performing a subset of the work of randomized quicksort).
Figure 3.6: (a) A poset of clothing items (socks, shoes, shirt, tie, and so on) represented as a DAG, and (b) the same DAG arranged according to a topological ordering.
So far, we have made the fundamental assumption that every pair of input elements
is comparable. In this case, we say our input is a totally-ordered set or a total
ordering. In many cases, however, our input may instead be a partially-ordered set
(poset, also called a partial ordering), where only certain pairs of input elements can
be meaningfully compared. The result of a comparison in a poset can be <, >, or
incomparable.
The poset in Figure 3.6(a) shows constraints governing the order in which you can
put on different articles of clothing. It also shows how we can represent a poset by a
directed acyclic graph, or DAG. A directed path in this graph between two elements
i and j indicates that i is less than j (in our example, this means i must precede
j when getting dressed). Our graph cannot have any directed cycles, since if two
nodes i and j were on a directed cycle, there would exist directed paths from i to
j and also from j to i, leaving it unclear which element is smaller.
In the example above, an edge from underwear to shoes is unnecessary since it is
implied by transitivity: underwear is less than pants, which is in turn less than
shoes. When we draw the DAG representation of a poset, we typically omit as
many implied edges as possible to simplify the diagram. The DAG with the fewest
edges that represents a given poset is known as the transitive reduction of the poset,
and the DAG with all implied edges present is called the transitive closure. We will
see how to compute both of these in Chapter ??.
Every DAG can be topologically sorted to produce a total ordering of its nodes
(known as a topological ordering) that is consistent with the partial ordering implied
by the DAG. As seen in Figure 3.6(b), all edges point consistently from left to right
when we arrange the DAG according to its topological ordering. Many topological
orderings may be valid for a given DAG; as an extreme example, any ordering is
valid if our DAG has no edges.
Topological orderings have many applications. The instance shown in Figure 3.6
can be viewed as a scheduling problem, where nodes represent tasks and edges
represent precedence constraints between these tasks; DAGs arise commonly in this
setting. A topological ordering here gives a linear task ordering that respects all of
the precedence constraints. Just as sorting can be a useful preprocessing step for
many algorithms, topological sorting is a common preprocessing step for algorithms
dealing with DAGs. We will see several examples of this when we study graphs in
greater detail later in the book.
There are several ways to topologically sort a DAG in linear time, which in this
case means O(m + n) time if the DAG has n nodes and m edges. We discuss
one method here and another, based on a simple graph algorithm called depth-first
search, in Chapter ??. It is easy to show that every DAG always has at least one
source node, with no incoming edge. If not, you could start at any node and walk
backward, moving from one node to the next by always stepping backward along an
arbitrary incoming edge. Since the graph has a finite number of nodes, this process
would eventually revisit a node and close a directed cycle, thereby contradicting the
fact that the graph is acyclic. Any source node is safe to place at the beginning of
a topological ordering, since no other node needs to come before it. Therefore, we
can generate a valid topological ordering by repeatedly finding and removing source
nodes. With care, we can implement this in linear time. [Details]
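A sketch of this repeated-source-removal idea in O(m + n) time (our own illustration; the DAG is assumed to be given as adjacency lists, with every node present as a key, and instead of physically deleting nodes we simply maintain remaining in-degree counts):

from collections import deque

def topological_sort(adj):
    # adj: dict mapping every node to the list of its out-neighbors.
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    sources = deque(u for u in adj if indegree[u] == 0)
    order = []
    while sources:
        u = sources.popleft()      # any current source is safe to output next
        order.append(u)
        for v in adj[u]:           # "remove" u by decrementing in-degrees
            indegree[v] -= 1
            if indegree[v] == 0:
                sources.append(v)
    return order                   # contains every node precisely when the graph is acyclic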
Suppose that we have an array A[1 . . . n] containing only zeros and ones. We can
easily sort A in linear time by simply counting the zeros and ones, and then building
a new sorted array containing the appropriate number of each.
This idea generalizes easily to the case where A contains integers in the range 0 . . . C − 1. Here, we use an auxiliary array N[0 . . . C − 1], where N[v] counts the number of copies of the value v in A. To build N after initializing its entries to zero, we increment N[A[i]] for every i = 1 . . . n. We can then scan through N to re-build A in sorted order. The resulting sorting algorithm, known as counting sort, runs in Θ(n + C) time, which is linear time as long as C = O(n). With care, we can even
implement it in a stable fashion. [Implementation details]
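A sketch of a stable counting sort along these lines (our own illustration; turning the counts in N into starting positions via a prefix-sum pass, and then scanning A in its original order, is what makes the sort stable). The key parameter is our own convenience, so the same routine can be reused digit-by-digit for radix sort below.

def counting_sort(A, C, key=lambda x: x):
    # Stable Theta(n + C) sort of A, where key(x) is an integer in 0 .. C-1.
    N = [0] * C
    for x in A:
        N[key(x)] += 1                    # N[v] = number of elements with value v
    pos = [0] * C                         # pos[v] = first output index for value v
    for v in range(1, C):
        pos[v] = pos[v - 1] + N[v - 1]
    out = [None] * len(A)
    for x in A:                           # original order preserved within each value
        out[pos[key(x)]] = x
        pos[key(x)] += 1
    return out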
Using multiple invocations of counting sort, we can build a more powerful sorting algorithm known as radix sort that sorts in linear time if C = O(n^c) for some constant c.
Figure 3.7: The operation of radix sort: first perform a stable sort on the
least-significant digit, then on the second-least-significant digit, and so on.
Radix sort is illustrated in Figure 3.7. We first write our integers down in some
number base, or radix (hence the name of the sort), which in our example is base
10. We then sort according to each successive digit position, starting with the
least significant. Radix sort actually originated as a mechanical technique for sorting stacks of punched cards. Each card contained a number written in binary, encoded
by a series of punched and non-punched holes, and the cards were repeatedly fed
through a mechanical sorting device, once for every digit, and sorted by separating
cards with punched digits from those with non-punched digits.
The subroutine we use to perform each of the digit sorts can be any stable sorting
algorithm, although we use counting sort since it is the natural choice for sorting
small integers. If we ideally write our integers in base n, then each digit assumes values from 0 . . . n − 1 and each individual counting sort runs in Θ(n) time. Our numbers will have at most log_n C digits, which is constant as long as C = O(n^c) for some constant c. The total running time is therefore linear as long as C = O(n^c) with c constant9. Stability is crucial for our individual digit sorts, since if two numbers agree in their most significant digit (our final sorting pass), we want them to
remain ordered properly by their lower-order digits, as they will be at this point
thanks to induction.
9 Subtle point: A common assumption with the RAM model of computation is that each word contains Θ(log n) bits, which is exactly large enough to hold an integer of magnitude n^c for c = O(1). If the n numbers being radix sorted each fit into a single Θ(log n)-bit machine word, then radix sort therefore takes only Θ(n) time. Furthermore, if we plan to sort larger numbers on a machine with word size Θ(log n), it takes Θ(n log C / log n) = Θ(n log_n C) words to store n numbers of magnitude C, matching the running time for radix sort (so this running time cannot be improved, since it takes the same amount of time just to examine all the input words). The only way to allow for stronger RAM-based sorting algorithms is to consider slightly larger word sizes (which is common practice in the sorting literature). For word sizes slightly larger than Θ(log n) bits, it is still unknown if linear-time sorting is possible; see the endnotes for further discussion.
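Putting the pieces together, here is a sketch of radix sort in base n for nonnegative integers below C, using the stable counting sort sketched above as the per-digit subroutine (our own illustration):

def radix_sort(A):
    n = len(A)
    if n <= 1:
        return A
    base = n                              # write the integers in base n
    C = max(A)
    digits = 1
    while base ** digits <= C:            # at most log_n C digits are needed
        digits += 1
    for d in range(digits):               # least-significant digit first
        A = counting_sort(A, base,
                          key=lambda x, d=d: (x // base ** d) % base)
    return A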
(a) Note that we typically do not know d1 . . . dn at the outset. Show how to compute these
values in only O(ndmax ) time, where dmax = max di . Once we know dmax , observe
that we can apply standard radix sort to sort A1 . . . An in O(ndmax ) time. [Solution]
(b) The ndmax term above is somewhat unsightly. Can you improve the running time for
both determining d1 . . . dn as well as sorting A1 . . . An to just O(D)? [Solution]
to drop k tests, with the final grade being determined by a set S containing only the remaining n − k tests:

g(S) = ( Σ_{i∈S} bi ) / ( Σ_{i∈S} mi ).
Our goal is to decide which n − k tests to keep so that our grade g(S) is maximized; let g* denote this maximum value. This is known as the optimal subset selection
problem, and although it may seem solvable by simply discarding the k tests with
the lowest bi /mi ratios, this is actually not the case. You may wish to pause and
convince yourself of this fact by constructing an appropriate counterexample.
Binary Search on the Answer. As with any new problem, we might approach
it by attempting to apply one of our main algorithm design techniques. Divide and
conquer, specifically in the form of binary search, yields some useful initial insight.
For many optimization problems, it turns out to be much easier to check whether
a guess x at the answer is too high (x > g*) or too low (x < g*), than to solve the problem outright. Any time this is the case, we can home in on the optimal solution value g* quickly using binary search on x. To check if x < g*, we want to know if there exists some subset S of tests with |S| = n − k such that

( Σ_{i∈S} bi ) / ( Σ_{i∈S} mi ) > x,

and by rearranging terms, we see that this is the same question as whether or not there exists some subset S with |S| = n − k such that

Σ_{i∈S} (bi − mi·x) > 0.
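In code, the test for a guess x might look like the following sketch (our own illustration; the names b, m, and k mirror the notation above, and we simply sum the n − k largest values of bi − mi·x):

def guess_is_too_low(b, m, k, x):
    # Is there a subset S of size n - k with sum over S of (b[i] - m[i]*x) > 0?
    # It suffices to check the n - k largest values of b[i] - m[i]*x.
    values = sorted((bi - mi * x for bi, mi in zip(b, m)), reverse=True)
    n = len(b)
    return sum(values[:n - k]) > 0        # True means x < g*, so the guess is too low

Sorting makes each test cost O(n log n); as the discussion below notes, selecting the n − k largest values with quickselect would bring this down to Θ(n) per test.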
involving optimization of a ratio, since through algebraic term re-arrangement it often reduces the problem to one having a simpler (non-ratio) objective. For example, given a numeric array A[1 . . . n], suppose we want to find a window i . . . j of length at least k with the largest average value (without the length-k constraint, the problem would be trivial, the answer being just the single largest element in A). To test if a guess x for the answer is too small, we ask whether there exists a window i . . . j such that (A[i] + . . . + A[j])/(j − i + 1) > x, otherwise written as A[i] + . . . + A[j] > x + . . . + x (with j − i + 1 terms), which we further simplify to (A[i] − x) + . . . + (A[j] − x) > 0. Defining B[i] = A[i] − x, we now want to know whether there exists a subarray of B of length at least k whose sum is positive, which we can answer by finding the maximum sum of a subarray in B of length at least k, a problem that is easy to solve in Θ(n) time (see, e.g., Section 8.1.1). We have therefore effectively replaced "average" with "sum" in our objective.
Figure 3.8: Visualizing the optimal subset selection problem as the geometric problem of finding the rightmost value of x at which the sum of the y values of the n − k highest lines is still nonnegative.
lines as in Figure 3.8. In this geometric context, we want to find the value of x at which the sum of the y values of the n − k highest lines evaluated at x is zero. Since our lines have negative slope, observe that the sum of the y values of the n − k highest lines decreases as we move x to the right, and increases as we move x to the
left. This monotonicity is what makes binary search possible.
Solving the Problem by Already Knowing the Answer. Binary search on x to find g* will work fine in practice, although this approach still leaves something to be desired, as it only converges to the optimal solution over time, rather than giving an exact answer after a number of steps polynomial in n. On the real RAM, it could theoretically even take infinitely long to converge if g* is irrational. Let us therefore consider a slightly different approach. Suppose we magically knew the n − k highest lines when evaluated at x = g*. Observe that this is enough to compute g* in Θ(n) time, since we can add together the linear functions representing these n − k lines to get a single linear function of x, set it equal to zero, and solve for x.
Now all we need to do is compute the highest n − k lines at x = g*. Normally, we would do this in Θ(n) time by using quickselect to select for the kth highest line at x = g*, and then partitioning on this line to obtain the n − k lines above it. This may all seem futile, however, since we don't know g*, so we don't know what value of x to plug in when evaluating our lines. Accordingly, let us keep the yi's in the form of generic linear functions (e.g., −5x + 3), while running the algorithm above. This doesn't cause problems until we reach a comparison, where we are now comparing two linear functions, such as −5x + 3 versus −7x + 9. The result of this comparison depends on x; here, the first function is smaller if x < 3 and larger if x > 3. Remembering that we are planning to evaluate these functions at x = g*, all we need to know to resolve the comparison is therefore whether or not g* < 3, and we can answer this question by running A(3).
Our approach now makes sense. As we run our high-level selection algorithm with
the yi's represented generically as linear functions, we pause at every comparison and run an invocation of A(x) to test the breakpoint value of x that is necessary to resolve the comparison. Since every comparison in the high-level Θ(n) algorithm invokes a lower-level Θ(n)-time algorithm, the total running time is Θ(n²).
A Clever Use of Divide and Conquer. One final idea gives a dramatic improvement in running time. Our high-level algorithm based on quickselect performs a number of partition operations, each one comparing all the elements in an array against a specific pivot element. Since these comparisons don't depend on each other (they could in theory be done in parallel), we can therefore use divide and conquer based on selection and binary search to resolve n such comparisons in only Θ(n) time plus only O(log n) invocations of A (instead of n invocations as before).
This drops our total running time to only O(n log n) in expectation. [Full details]
The technique above, known as parametric search, is useful in a wide range of problems, particularly in computational geometry. It is also an interesting example of how ideas from parallel computation can be used in novel ways to speed up sequential (non-parallel) algorithms. For our purposes, it serves as a wonderful example of what one can achieve using smart combinations of basic design techniques. See the endnotes for further references regarding the optimal subset selection problem, as this problem even admits a Θ(n) solution!
that the technique you are developing goes by the name of repeated squaring. Despite
its simplicity, this method is quite powerful, and it is commonly used in practice for
raising any complicated object (e.g., a matrix or a polynomial, or any other object for
which multiplication is costly) to a large power. [Solution]
(b) Median Among Two Sorted Arrays. You are given two sorted arrays A[1 . . . n/2]
and B[1 . . . n/2] as input. Please show how to compute the median among all the n
elements contained in both arrays in O(log n) time. For a challenge, what is the best
performance guarantee you can offer (in terms of n and k) if A and B contain k and
n k elements rather than n/2 elements each? What if you want to find the median
element among k sorted arrays each containing n/k elements? [Solution]
(c) Find the Missing Element. You are given two arrays: A, containing n elements, and B, containing n − 1 of the elements present in A. Design a comparison-based algorithm running in Θ(n) time that determines which element in A does not appear in B. How quickly can you solve the more general problem variant where k elements are left out of B? What about the similar problem where B contains n + 1 elements including all of the elements in A, but with one of them occurring twice, where we wish to locate the duplicated element? [Solution]
(d) Cyclic Shift of an Increasing Sequence. Suppose A[1 . . . n] is obtained by per-
forming a k-step right cyclic shift of a length-n array whose elements form a strictly
increasing sequence (a k-step right cyclic shift moves every element to an index k
steps ahead, with the k final elements wrapping around and becoming the k first
elements). Given A, how can you determine k in O(log n) time? [Solution]
(e) Monotone Matrix Searching. Let A be an n × n matrix whose entries are distinct numbers, and let M(i) denote the index of the minimum element in row i. We say A is monotone if M(1) ≤ M(2) ≤ . . . ≤ M(n). Please give an O(n log n) algorithm for computing M(1) . . . M(n) in an n × n monotone matrix11. [Solution]
(f) Approximate Binary Search. Suppose you want to find an unknown value v in the range 1 . . . n. If v is an integer, we can of course binary search for v in O(log n) time by first guessing n/2, then seeing if this guess is too high or too low, and successively halving our search range in each step. Suppose, however, that v is not necessarily an integer, and that we are content to find an approximate solution within the range (1 − ε)v . . . (1 + ε)v, where ε is some constant. As it turns out, we can solve this problem in only O(log log n) time by using the geometric mean rather than the arithmetic mean as our guess for each step in the binary search. That is, when we are searching the interval [a, b], our next guess will be √(ab). Please prove that this does indeed give us an O(log log n) running time. As a hint, consider problem 8. [Solution]
(g) Longest Line of Sight. You are given an array H[1 . . . n] containing distinct numbers that describe the heights of n people standing in a line. We say person i and person j > i can see each other if H[i] and H[j] are both larger than all of the elements in H[i + 1 . . . j − 1]. Please show how to locate a farthest pair of individuals i and j that can still see each other in O(n log n) time12. [Solution]
(h) Printing a Linked List Backwards. You are given as input a pointer to the first
element of an n-element linked list (without being told n), and asked to print out the
contents of the list in reverse order without physically modifying the elements of the
list in any way. This is trivial to do in Θ(n) time if we allow Θ(n) extra space, but
somewhat more interesting if we wish to use much less auxiliary memory while still
maintaining a fast running time. Please describe an O(n log n) time solution using
11 In Section 11.5, we will see monotone matrices with even more structure, known as totally monotone matrices, for which we can actually compute M(1) . . . M(n) in only Θ(n) time (for the case of a monotone matrix there is actually an Ω(n log n) worst-case lower bound in the comparison model). As we will see, this result has surprisingly many applications!
12 In problem 133, we improve the running time to Θ(n).
only O(log n) auxiliary words of memory. Can you generalize your solution to run in O(kn) time using O(kn^(1/k)) space for any k = 1 . . . log n? Observe that k = 1 gives the trivial solution where we use Θ(n) extra space, and that k = log n leads to the
time and space bounds of O(n log n) and O(log n) above. [Solution]
(i) Finding a Local Minimum. Let f(x) be a function defined over all integers x in the range 1 . . . n. We say x is a local minimum of f if f(x) ≤ f(x − 1) and f(x) ≤ f(x + 1), where we treat f(0) = f(n + 1) = +∞. Please show how to compute a local minimum of f by evaluating f at only O(log n) points. Next, let g(x, y) be a function defined over integers x and y in the range 1 . . . n. We say (x, y) is a local minimum if g(x, y) is no larger than any of the four neighboring values g(x ± 1, y) and g(x, y ± 1) (again, we treat g(x, y) as being infinite if x or y is not in the range 1 . . . n). Please show how to compute a local minimum of g with only O(n) function evaluations. [Solution]
(j) Popcount. The operation popcount(x) counts the number of bits set to one in the
binary representation of a nonnegative integer x. It plays a useful role in surprisingly
many algorithms. If x is a b-bit integer, popcount(x) can easily be computed in O(b)
time by summing up the individual bits in x. Please show how to use divide and
conquer to obtain an O(log b) running time. Please assume the following machine
instructions are available and take constant time each: shifting the bits of x left or
right (by any number of positions), and bitwise AND, OR, and XOR. Shifting the bits
of x left (right) by k positions corresponds to integer multiplication (division) by 2k ,
and bitwise operations on two integers x and y perform AND, OR, or XOR on each
corresponding pair of bits in x and y in parallel. For example, we can isolate the value
of the kth bit in x by shifting 1 left by k, bitwise ANDing the result with x (to mask
out just the bit we care about), and then shifting right by k. [Solution]
(k) Cake Cutting. Suppose you want to divide up a cake among n people so that every
person feels like he or she has received a fair-sized piece. This is easy if everyone values
every part of the cake equally, since the solution in this case is just to give 1/n of the
cake to everyone. However, in practice we might find that different individuals might have non-uniform preferences over different parts of the cake (e.g., I might like the
side of the cake with more frosting, and you might prefer the side of the cake with
more sprinkles on top). To be somewhat more formal, let us consider the unit interval
[0, 1] as a long one-dimensional cake. Each of our i = 1 . . . n participants must be
served a piece of cake that corresponds to a contiguous piece of this interval. The
preferences of participant i are described by a valuation function fi (x) that specifies
his/her value for the interval [0, x]. Each fi (x) function is continuous, monotonically
increasing, and ranges from fi (0) = 0 up to fi (1) = 1 (i.e., the whole cake is worth 1
unit to each participant). Note that we can find the value of any subinterval [a, b] by taking fi(b) − fi(a). In addition to fi(x), suppose we also have access to its inverse function fi⁻¹(y), so for example fi⁻¹(1/2) would tell us the unique point x such that exactly half of the value for participant i lies in [0, x] and the other half lies in [x, 1]. Suppose that it takes O(1) time to evaluate fi(x) or fi⁻¹(y). Design an O(n log n) time algorithm that computes a subinterval [ai, bi] for each participant i such that fi(bi) − fi(ai) ≥ 1/n (i.e., each participant receives what he or she perceives to be a
fair piece of the cake). [Solution]
(l) Counting Distant Pairs of Points. You are given n points (x1 , y1 ) . . . (xn , yn ),
where the xi's and yi's are integers in the range 1 . . . n. Two points (xi, yi) and (xj, yj) are said to be (a, b)-distant if |xi − xj| ≥ a and |yi − yj| ≥ b. Given a and b
as input, please describe an O(n log n) divide-and-conquer algorithm for counting the
number of pairs of (a, b)-distant points. For a warm-up, you may want to consider
a = b = n/2. [Solution]
(m) The Firing Squad Problem. This is a classical problem in parallel algorithm
design that has an elegant solution based on the principle of divide and conquer.
(a) Can you modify merge sort to solve this sorting problem in only O(n log k) time? Your
algorithm will not be told the value of k. [Solution]
(b) Show that randomized quicksort applied to this problem runs in O(n log k) time with
high probability, even though it also does not know the value of k. [Solution]
(c) Argue that there is a lower bound of Ω(n log k) on the worst-case running time of any
comparison-based algorithm that solves this problem, even if the algorithm knows the
k distinct values appearing throughout the array. [Solution]
(d) Suppose you are given two strings (arrays) A[1 . . . n] and B[1 . . . n] whose characters (elements) are drawn from an alphabet Σ (e.g., the letters A through Z). We wish to detect whether or not A and B are anagrams; that is, whether or not you can transform A into B by permuting its elements. If k = |Σ|, show that in the comparison-based model of computation one can detect anagrams in O(n log k) time, and also show that there is a matching lower bound of Ω(n log k) on the worst-case running time of any anagram detection algorithm. [Solution]
Note that another nice solution for sorting in O(n log k) time will become apparent once we learn in Chapter 6 to use balanced binary search trees as maps: in that case, we insert our n values into a balanced binary search tree where each element is augmented with a frequency count (this takes O(n log k) time, since the tree contains k elements), and then we traverse the tree to enumerate its contents in sorted order.
Problem 55 (The Zero/One Sorting Theorem). The well-known zero/one
sorting theorem is very useful for proving correctness of complicated sorting algorithms. It
states that a comparison-based sorting algorithm is correct if and only if it correctly sorts
sequences consisting of just zeros and ones. Therefore, in order to argue correctness of a
complicated sorting algorithm, it suffices to argue that it sorts any sequence of zeros and
ones. Please give a short proof of the zero/one sorting theorem. [Solution]
Problem 56 (Chain and Block Sorting). Let us think of an array A[1 . . . n] as k
interleaved chains of elements. The first chain is A[1], A[k + 1], A[2k + 1], . . ., the second
is A[2], A[k + 2], A[2k + 2], . . ., and so on. We say an array A[1 . . . n] is k-chain-sorted if
each of its k chains is sorted. We can also think of an array A[1 . . . n] in terms of n/k
blocks of size k. The first block is A[1 . . . k], the second is A[k + 1 . . . 2k], and so on. We
say A is k-block-sorted if each of its n/k blocks is sorted.
(a) What is the fastest possible running time (in the comparison model) for making an
array k-chain-sorted or k-block-sorted? What is the fastest possible running time
(also in the comparison model) for sorting an array that is already k-chain-sorted and
k-block-sorted? [Solution]
(b) Take an n-element array that is k-chain-sorted and k-block-sorted. For notational
simplicity, let A = min(k, n/k) and let B = max(k, n/k). Suppose we want to search
for a particular element based on its value. Give algorithms that perform this task in
O(A log B) time (fairly easy), O(A + B) time (slightly more difficult), and O(A log(1 +
B/A)) time (a bit more difficult). Note that the last running time dominates both of
the other two. [Solution]
(c) Show that if we k-block-sort (by sorting each of its blocks independently) an array that is already k-chain-sorted, the array remains k-chain-sorted, and vice-versa. Show also that for any k and k′, if we k′-chain-sort (by sorting each of its chains independently) an array that is already k-chain-sorted, then the array stays k-chain-sorted. Is the same true for k-block-sorting and k′-block-sorting? [Solution]
Problem 57 (Shell Sort). Shell sort, named after its creator Donald Shell, is a
method for speeding up insertion sort by allowing elements to be moved longer distances
initially. In the context of the preceding problem, it involves making an array k-chain-
sorted (by calling insertion sort on each of its k chains independently) for decreasing
values of k coming from a so-called increment sequence. For example, suppose we use the
increment sequence {1, 3, 7, 15, . . .} of integers one less than powers of two, so we start
with k being the largest increment in this sequence less than n, and end with k = 1.
Increment sequences always end with k = 1, since this ensures the final array is sorted
(note that 1-chain-sorted means the same thing as sorted). Shell sort runs in O(n^1.5) time for the increment sequence above, but a better sequence (at least in theory, although sometimes not in practice) is the set containing all integers of the form 2^x·3^y, of which there are Θ(log² n) elements at most n. Using the result from the last part of the preceding problem, we know that when it comes time to 2^x·3^y-chain-sort our array, it will already be 2^(x+1)·3^y-chain-sorted and 2^x·3^(y+1)-chain-sorted. Show that this implies that k-chain-sorting for each increment k takes only Θ(n) time, leading to an overall running time of Θ(n log² n), which is nearly the best attainable for Shell sort (see the endnotes for further details). As a hint, remember that insertion sort runs quickly when there are few inversions in the arrays being sorted. [Solution]
Figure 3.9: Diagrams of (a) a comparator, (b) a 3-stage sorting network that
sorts 4 inputs, and (c) an n-element sorting network that implements merge sort:
The blocks labeled Sn/2 recursively sort inputs of length n/2, and the block
labeled Mn merges their output into a sorted list of length n.
(a) Batcher's Odd-Even Merge Sort. Suppose we wish to merge two length-n/2 sorted
arrays A and B into a single length-n sorted array C. Consider the following somewhat
strange odd-even recursive approach for performing this merge: the odd elements
of C (i.e., C[1], C[3], etc.) are obtained by recursively merging the odd elements of
A and B, and the even elements of C are obtained by recursively merging the even
elements of A and the even elements of B. After running these recursive merges and
filling up C, we make one final postprocessing pass in which we compare successive
pairs of elements in C: C[1] versus C[2], C[3] versus C[4], and so on, swapping any
pairs that are out of order. Please use the zero/one sorting theorem to prove that this
algorithm merges correctly. Next, show how to build a sorting network based on this
merge operation that requires O(log² n) total stages. [Solution]
(b) Bitonic Sorting. Another way to merge two sorted length-n/2 sequences is to concatenate the first with the reversal of the second and then to sort the resulting bitonic sequence. We call a sequence bitonic if it increases up until some point and then decreases, or if it is a cyclic shift of such a sequence. Although it may seem like we're moving in the wrong direction by converting a merging problem back into a sorting problem, it turns out that bitonic sorting is a natural and easy problem on a sorting network. Suppose A[1 . . . n] contains a bitonic sequence. In a single stage of our sorting network, we place comparators between A[1] and A[n/2 + 1], A[2] and A[n/2 + 2], and so on. After executing this stage, show (using the zero/one sorting theorem) that the elements in A[1 . . . n/2] must all be no larger than the elements in A[n/2 + 1 . . . n], and that moreover, these two length-n/2 subarrays must also be bitonic. As a consequence, we can now continue to sort these two half-sized subproblems recursively in parallel. Show that merging according to this strategy requires O(log n) stages, and that this leads to a parallel version of merge sort that requires O(log² n) total stages. [Solution]
(a) Suppose we wish to choose the coordinates of a post office so as to minimize the sum of distances between the post office and each business. Show how to find a suitable location for the post office in Θ(n) time. If you want to start with a simpler problem, consider first the one-dimensional variant where the businesses are points on a number line. Can you generalize your solution to work in higher dimensions? [Solution]
(b) Let us now assign weights w1 . . . wn to the businesses, indicating the relative importance of the post office being close to each business. We now wish to find a location for the post office which minimizes the weighted sum of distances to the businesses. Show how to solve this problem in Θ(n) time. [Solution]
(a) Modify the merge sort algorithm to count the total number of inversions in an array
while still running in O(n log n) time. Try to extend this algorithm to count the
number of inversions per array element. We define the number of inversions for an
element A[j] as the number of elements preceding it in the array that have larger
values than A[j]. We only consider preceding elements so that we count each inversion exactly once; summing up all of the per-element inversion counts should therefore give the total number of inversions. [Solution]
(b) Suppose we would like to efficiently count inversions in an array of integers in the range 0 . . . C − 1. Show first that if C = O(1), then we can do this in linear time. If C = O(n^c) for c = O(1), then we can sort in linear time via radix sort, but it seems difficult to exactly count inversions in linear time. However, we can approximate the number of inversions in linear time. Consider summing up for each element in our array the distance between its current index in the array and its rank (i.e., its index within the array once the array is sorted). The resulting measure, F, is known as Spearman's Footrule. Show that we can compute this quantity in linear time, and that I ≤ F ≤ 2I, where I denotes the number of inversions in our array. [Solution]
(a) Please describe how to count the number of subarrays A[i . . . j] with sum at least v in
O(n log n) time. [Solution]
(b) Please describe how to count the number of subarrays A[i . . . j] with median at least
v in O(n log n) time, or better yet, O(n) time (for a subarray of even size, this means
that at least half its elements must be no smaller than v). [Solution]
(c) Please describe how to count the number of subarrays A[i . . . j] with mean at least v
in O(n log n) time. [Solution]
(a) Give an O(n log n) algorithm for computing the average distance among all pairs of
points. [Solution]
(b) Give a randomized algorithm running in O(n log n) expected time that computes the kth largest of all n(n − 1)/2 pairwise distances. [Solution]
(c) For a challenge, can you find a deterministic O(n log n) algorithm for the problem
from part (b)? [Solution]
(a) A simple approach for pancake flipping is to place the largest pancake at the bottom of the stack (if it isn't already there) by first flipping it up to the top, and then reversing the entire stack. We then proceed to place the second-largest pancake, and so on,
only moving pancakes that need to be moved. Please argue that this is actually
a 2-approximation algorithm. As a hint, a useful concept here is the notion of a
breakpoint, which exists between two pancakes if they are not neighbors in the final
sorted sequence. [Solution]
(b) Now consider the general problem of sorting with substring reversals. A natural greedy
algorithm for this problem is to reverse, repeatedly, a substring that reduces the
number of breakpoints in our sequence as much as possible. Since a reversal can only
affect breakpoints at its endpoints, the best we can hope to do is reduce the number
of breakpoints by 2. The worst we can do is fail to reduce the number of breakpoints
at all (so we need to be careful in proving that the algorithm actually terminates). As
a tie-breaking mechanism, if the algorithm can only manage to decrease the number
of breakpoints by 1, then it should make its choice in such a way that it leaves a
decreasing strip if possible. A strip is just a substring of elements between any two
consecutive breakpoints, which can be either increasing or decreasing. For a challenge,
prove that the greedy algorithm is a 2-approximation algorithm. As a starting point,
show that the greedy algorithm uses only B − 1 reversals to sort a sequence with B
breakpoints, as long as it has at least one decreasing strip (hence, if there are not any
decreasing strips initially, our first reversal will create one, so we will use at most B
reversals in total). [Solution]
be NP-hard, although somewhat surprisingly its signed variant can be solved in polynomial
time by a rather complicated algorithm (further references can be found in the endnotes). The
signed variant is particularly relevant to some biological situations where elements of our sequence
each have an inherent notion of directionality; for example, by reversing the substring BCD in ABCDE we would end up with AD′C′B′E, where X′ denotes the element X oriented in the reverse direction. Our sequence of reversals in the signed case must leave every element oriented
in the forward direction. In this problem, we consider only the unsigned variant.
15 Its signed variant is known as the burned pancake problem, where each pancake has a burned
(a) Design in-place O(n log n) algorithms for performing (i) a perfect shuffle and (ii) the inverse of a perfect shuffle. As a hint, consider problem 45. [Solution]
(b) Suppose we store an m × n matrix in row-major form as an array of length N = mn in which we list the elements of the matrix one row at a time from left to right. By taking the transpose of such a matrix, we convert it into column-major form: an array of length N listing the columns of the matrix one at a time, each one from top to bottom. The perfect shuffle is equivalent to computing the transpose of a 2 × (n/2) matrix, and more generally the transpose of an m × n matrix can be seen as an m-ary perfect shuffle, where we divide a sequence into m equal blocks and then interleave their contents. Using the result from (a) as a black box, show how to compute the transpose of an arbitrary matrix in place in O(N log² N) time. [Solution]
16 Normally, we would ensure each cycle is rotated only once by simply marking its elements
when they are rotated. However, that wouldn't give an in-place algorithm, so we use the "leader" trick instead.
17 You may want to also consider how this analysis leads to an O(n log n) time solution to problem
133 as well.