Today’s Material
• Lower Bounds on Comparison-based Sorting
• Linear-Time Sorting Algorithms
– Counting Sort
– Radix Sort
1
How Fast Can We Sort?
• Basic Sorting Algorithms
– BubleSort, InsertionSort and SelectionSort all run
in O(N^2)
• Fast Sorting Algorithms
– Mergesort, Heapsort and Quicksort all run in O(N
log N) best case running time
• Can we do any better?
– Can we come up with a sorting algorithm that will
sort in O(N log logN)? What about O(N)?
2
Lower Bound on Comparison-based Sorting
• Recall our basic assumption: we can only
compare two elements at a time
– Suppose you are given N elements
– How many possible orderings can you get?
• Example: a, b, c (N = 3)
• How many distinct sequences exist?
• Orderings:
1. a b c
2. b c a
3. c a b
4. a c b
5. b a c
6. c b a
• N = 3: We have 6 orderings = 3•2•1 = 3!
N choices N-1 choices N-2 choices 1 choice
• For N elements:
N! orderings 3
A “Decision Tree” for Sorting N=3 Elements
a < b < c, b < c < a,
Possible
c < a < b, a < c < b,
Orderings
b < a < c, c < b < a
a < b a > b
a<b<c b<c<a
Remaining
c<a<b b<a<c
Orderings
a<c<b c<b<a
a < c a > c b < c b > c
a<b<c c<a<b b<c<a c<b<a
a<c<b b<a<c
b < c b > c a < c a > c
a<b<c a<c<b b<a<c b<c<a
Leaves contain all possible orderings of a, b, c
4
Decision Trees and Sorting
• A Decision Tree for Sorting is a Binary Tree such that:
– Each node = a set of orderings
– Each edge = 1 comparison
– Each leaf = 1 unique ordering
• How many leaves for N distinct elements?
– Only 1 leaf has correct sorted ordering for given a, b, c
• Each sorting algorithm corresponds to a decision tree
– Finds correct leaf by following edges (= comparisons)
• Run time >= maximum no. of comparisons
– Depends on: depth of decision tree
– What is the depth of a decision tree for N distinct elements?
5
Decision Trees and Sorting
• Suppose you have a binary tree of depth d . How many
leaves can the tree have?
– E.g. Depth = 1 at most 2 leaves
– Depth = 2 at most 4 leaves, etc.
– Depth = d at most 2d leaves. Easy to prove.
• Number of leaves L <= 2d d >= log(L)
– Decision tree has L = N! leaves
– Depth d >= log(N!)
– What is log(N!)?
• log(N!) = log N + log(N-1) + … log(N/2) + … + log 1
>= log N + log(N-1) + … log(N/2) (N/2 terms only)
>= (N/2)•log(N/2) = (N log N)
• Result: Any sorting algorithm based on comparisons
between elements requires (N log N) comparisons 6
Using the Lower Bound for Sorting
• Lower bound that we proved for sorting is in
fact used to prove lower bound for other
problems. Here is how:
– Say you have a problem for which you want to prove
a lower bound
– If we can reduce the sorting problem to the
problem to be solved, then we have essentially
proved that the lower bound for the new problem is
nlogn. Why?
• Because if we can solve your problem in less time than
nlogn, then we can solve sorting in less time than nlogn.
And we have just proved that that is not possible!
7
Example (NlogN) Problems
• Convex Hull
– Given a set of points in the plane, find the closest
convex polygon that encloses them
• Closest Pair
– Given a list of n elements, which pair are closest in
value
• Element Uniqueness
– Given a list of n numbers, are there any duplicates
in the list
8
What about Counting Sort?
• Problem: Sort integers in the range 1 to B
• Algorithm:
1. Allocate an array Count having B slots (“buckets”)
2. Initialize: Count[i] = 0 for i = 1 to B
3. Given input integer i, Count[i]++
4. After reading all inputs, scan Count[i] for i = 1 to B
and print i if Count[i] is non-zero
• Ex: Sort the following integers in the range 1
to 9
– 4 2 5 5 9 8 8 3 1 2
9
What if A holds records with integer key?
// A[1..n]: Holds the initial input. A[I].key is the integer
// key value on which to sort on. A[I].data is the other satellite data
// B[1..n]: Holds the sorted output
// Count[1..B]: Count[I] is the rank of I, that is, the number of
// elements of A whose key value is less than or equal to i
CountingSort(A, B, n, B) {
for I=1 to B do Count[I] = 0;
for j=1 to n do Count[A[j].key]++;
for I=2 to B do Count[I] += Count[I-1];
for j=n downto 1 do {
I = A[j].key;
B[Count[I]] = A[j];
Count[I]--;
} //end-for
} //end-CountingSort 10
Counting Sort Example
1 2 3 4 5 2 3 4 3 4
1 1 2
A 1 4 3 1 3 2 0 2 1 2 2 4 5
a e r s v
1 2 3 4 1 2 3 4 5
2 2 4 5 3
B
v
2 2 3 5
B 1 3
s v
1 2 3 5
B 1 3 3
s r v
1 2 2 5
B 1 3 3 4
s r v e
1 2 2 4
B 1 1 3 3 4
0 2 2 4 a s r v e 11
Counting Sort Running Time
• What is the running time for sorting N
integers using bucket sort?
– Running Time: O(B+N)
– B to zero/scan the array and N to read the input
– If B is O(N), then running time for Bucket sort =
O(N)
– Doesn’t this violate the (N log N) lower bound
result?
– No – When we do Count[i]++, we are comparing
one element with all B elements, not just two
elements
• Not regular 2-way comparison-based sorting
12
Radix Sort: Stable Counting Sort
• Problem: What if number of buckets needed is
too large?
• Recall: Stable sort = a sort that does not
change order of items with same key
• Radix sort = stable bucket sort on “slices” of
key
1. Divide integers/strings into digits/characters
2. Bucket-sort from least significant to most
significant digit/character
– Uses linked lists
– Stability ensures keys already sorted stay sorted
– Takes O(P(B+N)) time where P = number of digits
13
Radix Sort Example
576 49[4] 9[5]4 [1]76 176
494 19[4] 5[7]6 [1]94 194
194 95[4] 1[7]6 [2]78 278
296 57[6] 2[7]8 [2]96 296
278 29[6] 4[9]4 [4]94 494
176 17[6] 1[9]4 [5]76 576
954 27[8] 2[9]6 [9]54 954
14
Radix Sort Running Time
• P = # of digits iterations over the data set
• Each iteration takes O(B+N) time
– Total: O(P(B+N)) time where P = number of digits
• E.g., N numbers in base 10 in the range
0-999999 (max. 6 digits)
– Total: O(6*(10+N))
– Total: O(60 + 6N)
15
Summary of Sorting
• Sorting choices:
– O(N2) – Bubblesort, Selection Sort, Insertion Sort
– O(Nx) – Shellsort (x = 3/2, 4/3, 2, etc. depending on
incr. seq.)
– O(N log N) average case running time:
• Mergesort: easy to code but uses O(N) extra space
• Heapsort: needs 2 comparisons to move data (between 2
children and parent) – may not be fast in practice
• Quicksort: fastest in practice but trickier to code, O(N2)
worst case
– O(P·N)
• Radix sort (using Counting sort) for special cases where
keys are P digit integers/strings
16