Algorithms
Linear-Time Sorting Algorithms

Sandeep Kumar Poonia
Head of Dept., CS/IT
B.E., M.Tech., UGC-NET
LM-IAENG, LM-IACSIT, LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE

2/19/2012
Sorting So Far

• Insertion sort:
  – Easy to code
  – Fast on small inputs (fewer than ~50 elements)
  – Fast on nearly-sorted inputs
  – O(n²) worst case
  – O(n²) average case (over equally likely inputs)
  – O(n²) on reverse-sorted input
Sorting So Far

• Merge sort (sketched in code below):
  – Divide-and-conquer:
    · Split the array in half
    · Recursively sort the subarrays
    · Merge in linear time
  – O(n lg n) worst case
  – Doesn't sort in place
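A minimal Python sketch of the merge sort recursion above (an illustrative helper, not from the slides):

    def merge_sort(A):
        if len(A) <= 1:                  # base case: already sorted
            return A
        mid = len(A) // 2                # split the array in half
        left = merge_sort(A[:mid])       # recursively sort the subarrays
        right = merge_sort(A[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):   # linear-time merge
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]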

Sorting So Far

• Heap sort:
  – Uses the very useful heap data structure
    · Complete binary tree
    · Heap property: each parent's key ≥ its children's keys
  – O(n lg n) worst case
  – Sorts in place
Sorting So Far

• Quick sort:
  – Divide-and-conquer:
    · Partition the array into two subarrays, then recursively sort each
    · Everything in the first subarray < everything in the second
    · No merge step needed!
  – O(n lg n) average case
  – Fast in practice
  – O(n²) worst case
    · A naïve implementation hits the worst case on sorted input
    · Address this with randomized quicksort, as sketched below
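A short sketch of randomized quicksort in Python (an illustrative helper, not from the slides; unlike the classic in-place partition, it builds new lists for clarity):

    import random

    def quicksort(A):
        if len(A) <= 1:
            return A
        pivot = random.choice(A)      # random pivot defeats the sorted-input worst case
        less    = [x for x in A if x < pivot]
        equal   = [x for x in A if x == pivot]
        greater = [x for x in A if x > pivot]
        return quicksort(less) + equal + quicksort(greater)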
How Fast Can We Sort?

• We will provide a lower bound, then beat it
  – How do you suppose we'll beat it?
• First, an observation: all of the sorting algorithms so far are comparison sorts
  – The only operation used to gain ordering information about a sequence is the pairwise comparison of two elements
  – Theorem: all comparison sorts are Ω(n lg n)
  – A comparison sort must do Ω(n) comparisons (why?)
  – What about the gap between Ω(n) and Ω(n lg n)?
Decision Trees

• Decision trees provide an abstraction of comparison sorts
  – A decision tree represents the comparisons made by a comparison sort; everything else is ignored
• What do the leaves represent?
• How many leaves must there be?
Decision Trees
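The original slide's figure is not preserved; a standard decision tree for sorting three elements ⟨a1, a2, a3⟩ (each internal node is a comparison, each leaf a permutation) looks like this:

    a1 : a2
    ├─ a1 ≤ a2 → a2 : a3
    │   ├─ a2 ≤ a3 → leaf ⟨a1, a2, a3⟩
    │   └─ a2 > a3 → a1 : a3
    │       ├─ a1 ≤ a3 → leaf ⟨a1, a3, a2⟩
    │       └─ a1 > a3 → leaf ⟨a3, a1, a2⟩
    └─ a1 > a2 → a1 : a3
        ├─ a1 ≤ a3 → leaf ⟨a2, a1, a3⟩
        └─ a1 > a3 → a2 : a3
            ├─ a2 ≤ a3 → leaf ⟨a2, a3, a1⟩
            └─ a2 > a3 → leaf ⟨a3, a2, a1⟩

Note the 3! = 6 leaves, one per permutation of the input.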
Decision Trees

• Decision trees can model comparison sorts. For a given algorithm:
  – One tree for each input size n
  – The paths through the tree are all possible execution traces
• What's the longest path in a decision tree for insertion sort? For merge sort?
• What is the asymptotic height of any decision tree for sorting n elements?
  – Answer: Ω(n lg n)

Lower Bound for Comparison Sorting

• Theorem: any decision tree that sorts n elements has height Ω(n lg n)
• What's the minimum # of leaves? (n!, one per permutation)
• What's the maximum # of leaves of a binary tree of height h? (2^h)
• Clearly the minimum # of leaves is less than or equal to the maximum # of leaves


Lower Bound for Comparison Sorting

• So we have: n! ≤ 2^h
• Taking logarithms: lg(n!) ≤ h
• Stirling's approximation tells us: n! > (n/e)^n
• Thus: h ≥ lg (n/e)^n

Lower Bound for Comparison Sorting

• So we have:
    h ≥ lg (n/e)^n
      = n lg n − n lg e
      = Ω(n lg n)
• Thus the minimum height of a decision tree is Ω(n lg n)
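As a quick numeric check (a worked instance, not from the original slides): for n = 4 the tree must have at least 4! = 24 leaves, so

    h ≥ lg(4!) = lg 24 ≈ 4.59,

meaning any comparison sort needs at least 5 comparisons in the worst case to sort 4 elements.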

Lower Bound for Comparison Sorts

• Thus the time to comparison sort n elements is Ω(n lg n)
• Corollary: heapsort and mergesort are asymptotically optimal comparison sorts
• But the name of this lecture is "Sorting in linear time"!
  – How can we do better than Ω(n lg n)?

Sorting in Linear Time

• Counting sort
  – No comparisons between elements!
  – But… it depends on an assumption about the numbers being sorted
    · We assume the numbers are in the range 1..k
• The algorithm:
  – Input: A[1..n], where A[j] ∈ {1, 2, …, k}
  – Output: B[1..n], sorted (notice: not sorting in place)
  – Also: an array C[1..k] for auxiliary storage

Counting Sort

CountingSort(A, B, k)
    for i = 1 to k               // clear the counts
        C[i] = 0
    for j = 1 to n               // tally each key
        C[A[j]] = C[A[j]] + 1
    for i = 2 to k               // prefix sums: C[i] = # of keys ≤ i
        C[i] = C[i] + C[i-1]
    for j = n downto 1           // place keys back to front (stability)
        B[C[A[j]]] = A[j]
        C[A[j]] = C[A[j]] - 1

Counting Sort

CountingSort(A, B, k)
    for i = 1 to k               // takes time O(k)
        C[i] = 0
    for j = 1 to n               // takes time O(n)
        C[A[j]] = C[A[j]] + 1
    for i = 2 to k               // takes time O(k)
        C[i] = C[i] + C[i-1]
    for j = n downto 1           // takes time O(n)
        B[C[A[j]]] = A[j]
        C[A[j]] = C[A[j]] - 1

What will be the running time?
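A runnable Python version of the pseudocode (an illustrative sketch, not from the slides), assuming integer keys in the range 1..k:

    def counting_sort(A, k):
        n = len(A)
        C = [0] * (k + 1)              # C[i] holds the count of key i; O(k)
        B = [0] * n                    # output array (not in place)
        for key in A:                  # tally each key; O(n)
            C[key] += 1
        for i in range(2, k + 1):      # prefix sums: C[i] = # of keys <= i; O(k)
            C[i] += C[i - 1]
        for key in reversed(A):        # back to front preserves stability; O(n)
            B[C[key] - 1] = key        # -1 turns the 1-based rank into a 0-based index
            C[key] -= 1
        return B

    print(counting_sort([3, 1, 4, 1, 5, 3], k=5))   # [1, 1, 3, 3, 4, 5]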

Counting Sort

• Total time: O(n + k)
  – Usually, k = O(n)
  – Thus counting sort runs in O(n) time
• But sorting is Ω(n lg n)!
  – No contradiction: this is not a comparison sort (in fact, there are no comparisons at all!)
  – Notice that this algorithm is stable

Counting Sort

• Cool! Why don't we always use counting sort?
• Because it depends on the range k of the elements
• Could we use counting sort to sort 32-bit integers? Why or why not?
  – Answer: no, k is too large (2^32 = 4,294,967,296)


Counting Sort

• How did IBM get rich originally?
• Answer: punched-card readers for census tabulation in the early 1900s
  – In particular, a card sorter that could sort cards into different bins
    · Each column can be punched in 12 places
    · Decimal digits use 10 of those places
  – Problem: only one column can be sorted on at a time

Radix Sort

• Intuitively, you might sort on the most significant digit, then the second-most significant, etc.
• Problem: lots of intermediate piles of cards to keep track of
• Key idea: sort on the least significant digit first

RadixSort(A, d)
    for i = 1 to d
        StableSort(A) on digit i
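A compact Python sketch of the loop above (an illustrative helper, not from the slides). Python's built-in sorted() is stable, so it stands in for StableSort; in the analysis below, the stable sort would be counting sort:

    def radix_sort(A, d, base=10):
        for i in range(d):                                    # least significant digit first
            A = sorted(A, key=lambda x: (x // base**i) % base)
        return A

    print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
    # [329, 355, 436, 457, 657, 720, 839]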

Radix Sort

• Can we prove it will work?
• Sketch of an inductive argument (induction on the number of passes):
  – Assume the lower-order digits {j : j < i} are sorted
  – Show that sorting on the next digit i leaves the array correctly sorted:
    · If two numbers differ in digit i, ordering them by that digit is correct (the lower-order digits are irrelevant)
    · If their digits at position i are the same, they are already sorted on the lower-order digits; since we use a stable sort, they stay in the right order
Radix Sort: Example
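The original example figure is not preserved; a standard trace (each pass is a stable sort on one digit, least significant first) looks like this:

    input   after pass 1    after pass 2    after pass 3
            (ones digit)    (tens digit)    (hundreds digit)
    329     720             720             329
    457     355             329             355
    657     436             436             436
    839     457             839             457
    436     657             355             657
    720     329             457             720
    355     839             657             839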

Radix Sort

• In general, radix sort based on counting sort is:
  – Fast
  – Asymptotically fast (i.e., O(n) when the number of digits d is constant)
  – Simple to code
  – A good choice

Bucket Sort

• Bucket sort
  – Assumption: input is n reals from [0, 1)
  – Basic idea (sketched in code below):
    · Create n linked lists (buckets) to divide the interval [0, 1) into subintervals of size 1/n
    · Add each input element to the appropriate bucket, then sort the buckets with insertion sort
  – Uniform input distribution → O(1) expected bucket size
    · Therefore the expected total time is O(n)
  – These ideas will return when we study hash tables
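A minimal Python sketch of bucket sort (an illustrative helper, not from the slides). Inputs are assumed to be reals in [0, 1); the built-in sorted() stands in for the per-bucket insertion sort:

    def bucket_sort(A):
        n = len(A)
        buckets = [[] for _ in range(n)]    # bucket i covers [i/n, (i+1)/n)
        for x in A:
            buckets[int(n * x)].append(x)   # bucket index = floor(n * x)
        out = []
        for b in buckets:                   # expected O(1) elements per bucket,
            out.extend(sorted(b))           # so each per-bucket sort is expected O(1)
        return out

    print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21]))
    # [0.17, 0.21, 0.26, 0.39, 0.72, 0.78, 0.94]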

