Sorting
Sorting refers to the operation of arranging data in some order, such as increasing or decreasing order for numerical data, or alphabetical order for character data.
Complexity of Sorting Algorithms
Each sorting algorithm S will be made up of the following operations, where A1, A2, …, AN contain the items to be sorted and B is an auxiliary location:
(a) Comparisons, which test whether Ai < Aj or whether Ai < B
(b) Interchanges, which switch the contents of Ai and Aj, or of Ai and B
(c) Assignments, which set B := Ai and then set Aj := B, or set Aj := Ai
Sorting algorithms are divided into two categories: internal sorts and external sorts.
Internal Sort:
Any sort algorithm which uses main memory exclusively during the sort. This assumes high-speed random access to all memory.
External Sort:
Any sort algorithm which uses external memory, such as tape or disk, during the sort.
Internal Sorting
Bubble Sort
One of the oldest and simplest sorts in use; unfortunately, also one of the slowest.
It works by comparing each item in the list with the item next to it, and swapping them if required.
This causes larger values to "bubble" to the end of the list.
Bubble Sort
Suppose the list of numbers A[1], A[2], A[3], …, A[N] is in memory. The algorithm works as follows.
[1] Compare A[1] and A[2] and arrange them in the desired order so that A[1] < A[2]. Then compare A[2] and A[3] and arrange them so that A[2] < A[3]. Continue until A[N-1] is compared with A[N] and they are arranged so that A[N-1] < A[N].
Bubble Sort
[2] Repeat Step 1. Now stop after comparing and re-arranging A[N-2] and A[N-1].
[3] Repeat Step 1. Now stop after comparing and re-arranging A[N-3] and A[N-2].
.
.
[N-1] Compare A[1] and A[2] and arrange them in sorted order so that A[1] < A[2].
After N-1 steps the list will be sorted in increasing order.
A Bubble Sort Example
Initial list: 6, 5, 4, 3, 2, 1

Pass 1: compare each pair of adjacent elements and swap whenever they are out of order:
6 5 4 3 2 1 → 5 6 4 3 2 1 → 5 4 6 3 2 1 → 5 4 3 6 2 1 → 5 4 3 2 6 1 → 5 4 3 2 1 6
As you can see, the largest number has "bubbled" (sunk) to the end of the list after the first pass through the list.

Pass 2: we again start by comparing the first two elements of the list, and we stop at element number n-1, because the largest element is already in the last position. This places the second largest element in the second-to-last spot:
5 4 3 2 1 6 → 4 3 2 1 5 6

Pass 3: we start with the first two elements again and stop comparing and swapping at element number n-2:
4 3 2 1 5 6 → 3 2 1 4 5 6

Pass 4: stops at element number n-3:
3 2 1 4 5 6 → 2 1 3 4 5 6

Pass 5 (the last pass): compares only the first two elements of the list. After this comparison and possible swap, the smallest element has "bubbled" to the front:
2 1 3 4 5 6 → 1 2 3 4 5 6
What "Swapping" Means
Swapping two elements uses a temporary variable TEMP. To swap the first two elements of the list 6, 5, 4, 3, 2, 1:
1. Place the first element into the temporary variable (TEMP = 6).
2. Replace the first element with the second element (the list begins 5, 5, …).
3. Replace the second element with the temporary variable (the list now begins 5, 6, …).
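In C, for example, such a swap of two array elements might be sketched as follows (the array and index names are illustrative, not from the slides):

```c
/* Swap the elements at positions i and j of array a,
   using a temporary variable as described above. */
void swap(int a[], int i, int j)
{
    int temp = a[i];   /* place the first element into the temporary variable  */
    a[i] = a[j];       /* replace the first element with the second element    */
    a[j] = temp;       /* replace the second element with the temporary value  */
}
```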
Bubble Sort
DATA is an array with N elements.
1. Repeat Steps 2 and 3 for K = 1 to N-1
2.     Set PTR := 1
3.     Repeat while PTR <= N - K:
       (a) If DATA[PTR] > DATA[PTR+1], interchange DATA[PTR] and DATA[PTR+1]
       (b) Set PTR := PTR + 1
4. Exit
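A C rendering of this algorithm might look as follows (a minimal sketch, using 0-based array indexing instead of the 1-based indexing of the pseudocode):

```c
/* Bubble sort: after pass k, the k largest elements occupy
   the last k positions of the array. */
void bubble_sort(int data[], int n)
{
    for (int k = 1; k <= n - 1; k++) {          /* Step 1: passes K = 1 .. N-1        */
        for (int ptr = 0; ptr < n - k; ptr++) { /* Step 3: PTR runs up to N-K         */
            if (data[ptr] > data[ptr + 1]) {    /* Step 3(a): compare neighbours      */
                int temp = data[ptr];           /* interchange DATA[PTR], DATA[PTR+1] */
                data[ptr] = data[ptr + 1];
                data[ptr + 1] = temp;
            }
        }
    }
}
```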
Analysis of Bubble Sort
Best, worst and average cases are the same:
f(n) = (n-1) + (n-2) + … + 2 + 1 = n(n-1)/2 = O(n²)
Variation of Bubble Sort
Bubble sort with early termination:
The algorithm repeats the process until it makes a pass all the way through the list without swapping any items (see the sketch below).
Alternate passes in opposite directions:
A forward pass moves the largest element in the sub-array to the end of the sub-array; this is the same as a pass in standard bubble sort. The next pass moves the smallest element in the sub-array to the beginning of the sub-array, so the largest and smallest values are settled in alternate passes. This causes larger values to "bubble" to the end of the list while smaller values "sink" towards the beginning of the list.
Insertion Sort
Insertion sort is frequently used by bridge players when they first sort their cards.
Insertion sort is frequently used when n is small.
General Idea:
An array A with N elements A[1], A[2], …, A[N] is in memory.
Insertion sort scans A from A[1] to A[N], inserting each element A[K] into its proper position in the previously sorted subarray A[1], A[2], …, A[K-1].
Pass 1: A[1] by itself is trivially sorted.
Pass 2: A[2] is inserted either before or after A[1] so that A[1], A[2] is sorted.
Pass 3: A[3] is inserted in its proper place in A[1], A[2], that is, before A[1], between A[1] and A[2], or after A[2], so that A[1], A[2], A[3] is sorted.
Pass 4: A[4] is inserted in its proper place in A[1], A[2], A[3] so that A[1], A[2], A[3], A[4] is sorted.
.
.
Pass N: A[N] is inserted in its proper place in A[1], A[2], …, A[N-1] so that A[1], A[2], …, A[N] is sorted.
Insertion Sort Example
Sort an array A with 8 elements:
77, 33, 44, 11, 88, 22, 66, 55
Pass     A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8]
K=1       -∞  (77)  33   44   11   88   22   66   55
K=2       -∞   77  (33)  44   11   88   22   66   55
K=3       -∞   33   77  (44)  11   88   22   66   55
K=4       -∞   33   44   77  (11)  88   22   66   55
K=5       -∞   11   33   44   77  (88)  22   66   55
K=6       -∞   11   33   44   77   88  (22)  66   55
K=7       -∞   11   22   33   44   77   88  (66)  55
K=8       -∞   11   22   33   44   66   77   88  (55)
Sorted    -∞   11   22   33   44   55   66   77   88

Each row shows the array at the start of pass K; the element in parentheses is A[K], the element being inserted during that pass.
Insertion Sort Algorithm
This algorithm sorts an array A with N elements.
1. Set A[0] := -∞ [initialize a delimiter]
2. Repeat Steps 3 to 5 for K = 2, 3, …, N
3.     Set TEMP := A[K] and PTR := K-1
4.     Repeat while TEMP < A[PTR]:
       (a) Set A[PTR+1] := A[PTR]
       (b) Set PTR := PTR - 1
5.     Set A[PTR+1] := TEMP
6. Exit
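In C, without the -∞ delimiter (an explicit bounds check is used instead), the algorithm might be sketched as:

```c
/* Insertion sort: a[0..k-1] is already sorted;
   a[k] is shifted left until it reaches its proper position. */
void insertion_sort(int a[], int n)
{
    for (int k = 1; k < n; k++) {
        int temp = a[k];                    /* TEMP := A[K]                          */
        int ptr = k - 1;                    /* PTR  := K - 1                         */
        while (ptr >= 0 && temp < a[ptr]) { /* bounds check replaces the -∞ sentinel */
            a[ptr + 1] = a[ptr];            /* shift element one place to the right  */
            ptr--;
        }
        a[ptr + 1] = temp;                  /* insert TEMP into its proper place     */
    }
}
```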
Complexity of Insertion Sort
Best case: when the array is already sorted, 4n-4 = O(n)
Worst case: when the array is in reverse order, n(n-1)/2 = O(n²)
Average case: n(n-1)/4 = O(n²)
Improvement in performance:
Use binary search in the inner loop to find the location where A[K] is to be inserted: about log2 K comparisons (see the sketch below).
However, (K-1)/2 movements are still required on average, so the order of complexity remains the same.
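A sketch of this variant in C: binary search locates the insertion point, but the elements after that point must still be shifted, so the running time stays O(n²) even though comparisons drop to O(n log n). (This is an illustrative variant, not part of the algorithm above.)

```c
/* Insertion sort using binary search to find each insertion point. */
void binary_insertion_sort(int a[], int n)
{
    for (int k = 1; k < n; k++) {
        int temp = a[k];
        /* binary search for the first position whose element is > temp */
        int lo = 0, hi = k;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] <= temp)
                lo = mid + 1;
            else
                hi = mid;
        }
        /* shift a[lo..k-1] one place to the right and insert temp */
        for (int j = k; j > lo; j--)
            a[j] = a[j - 1];
        a[lo] = temp;
    }
}
```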
Selection Sort
Suppose an array A with N elements is in memory. Selection sort works as follows:
First find the smallest element in the list and put it in the first position. Then find the second smallest element in the list and put it in the second position, and so on.
Pass 1: Find the location LOC of the smallest element in the list A[1], A[2],
… A[N]. Then interchange A[LOC] and A[1]. Then: A[1] is sorted
Pass 2: Find the location LOC of the smallest element in the sublist A[2],
A[3], … A[N]. Then interchange A[LOC] and A[2]. Then: A[1],A[2] is
sorted since A[1] <= A[2].
Pass 3: Find the location LOC of the smallest element in the sublist A[3],
A[4], … A[N]. Then interchange A[LOC] and A[3]. Then: A[1], A[2],A[3]
is sorted, since A[2] <= A[3].
Pass N-1: Find the location LOC of the smallest element in the sublist A[N-1], A[N]. Then interchange A[LOC] and A[N-1]. Then A[1], A[2], …, A[N] is sorted, since A[N-1] <= A[N].
Thus A is sorted after N-1 passes.
Selection Sort

Pass          A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8]
K=1  LOC=4     77   33   44   11   88   22   66   55
K=2  LOC=6     11   33   44   77   88   22   66   55
K=3  LOC=6     11   22   44   77   88   33   66   55
K=4  LOC=6     11   22   33   77   88   44   66   55
K=5  LOC=8     11   22   33   44   88   77   66   55
K=6  LOC=7     11   22   33   44   55   77   66   88
K=7  LOC=7     11   22   33   44   55   66   77   88
Sorted         11   22   33   44   55   66   77   88

Each row shows the array at the start of pass K; LOC is the location of the smallest element in A[K], …, A[8], which is then interchanged with A[K].
Algorithm
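A C sketch of selection sort corresponding to the passes described above (0-based indexing; loc plays the role of LOC):

```c
/* Selection sort: pass k finds the smallest element of a[k..n-1]
   and interchanges it with a[k]. */
void selection_sort(int a[], int n)
{
    for (int k = 0; k < n - 1; k++) {
        int loc = k;                      /* location of the smallest element so far */
        for (int j = k + 1; j < n; j++) {
            if (a[j] < a[loc])
                loc = j;
        }
        if (loc != k) {                   /* interchange A[LOC] and A[K] */
            int temp = a[k];
            a[k] = a[loc];
            a[loc] = temp;
        }
    }
}
```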
Complexity
Best, worst and average case complexities are the same:
f(n) = (n-1) + (n-2) + … + 2 + 1 = n(n-1)/2 = O(n²)
Merge Sort
Merge sort:
divide a list into two halves
sort the halves
recombine the sorted halves into a sorted whole
Merge sort is an example of a “divide and conquer” algorithm
divide and conquer algorithm: an algorithm that repeatedly divides
the given problem into smaller pieces that can be solved more easily
it’s easier to sort the two small lists than the one big list
Merge Sort Picture

Index:   0   1   2   3   4   5   6   7
Input:  22  18  12  -4  58   7  31  42

Split:  [22 18 12 -4] [58 7 31 42]
        [22 18] [12 -4] [58 7] [31 42]
        [22] [18] [12] [-4] [58] [7] [31] [42]

Merge:  [18 22] [-4 12] [7 58] [31 42]
        [-4 12 18 22] [7 31 42 58]
        [-4 7 12 18 22 31 42 58]
A second example, merging the list 5 2 4 7 1 3 2 6 bottom up:
5 2 | 4 7 | 1 3 | 2 6
→ 2 5 | 4 7 | 1 3 | 2 6     (merge pairs of single elements)
→ 2 4 5 7 | 1 2 3 6         (merge pairs of runs)
→ 1 2 2 3 4 5 6 7           (final merge)
Time complexity of merging two sorted arrays with r and s elements: O(r+s) = O(n)
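A minimal C sketch of merging sorted arrays A (r elements) and B (s elements) into an output array C (r + s elements); each element is examined once, which gives the O(r + s) bound:

```c
/* Merge sorted arrays a[0..r-1] and b[0..s-1] into c[0..r+s-1]. */
void merge(const int a[], int r, const int b[], int s, int c[])
{
    int i = 0, j = 0, k = 0;
    while (i < r && j < s)              /* copy the smaller front element  */
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < r)                       /* copy any leftover elements of A */
        c[k++] = a[i++];
    while (j < s)                       /* copy any leftover elements of B */
        c[k++] = b[j++];
}
```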
Merging: improvement when r << s
When the number of elements in A is much smaller than the number of elements in B, A can be merged into B more efficiently using binary search.
For each element of A, perform a binary search on B to find its proper location: log2 s comparisons.
For the r elements of A: r log2 s comparisons.
Since r << s, r log2 s < r + s.
Merging: further improvement
Reducing the target set:
Say the first element of A is inserted after B[16]; then the binary search for the next element only needs to cover B[17] onwards (see the sketch below).
Tabbing:
Find the tab (block of B) using linear search, then perform a binary search within the selected tab.
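A hedged C sketch of the r << s idea: each element of A is placed by a binary search over the part of B that has not yet been passed, which also illustrates "reducing the target set" (the tabbing refinement is not shown):

```c
/* Merge a[0..r-1] into c using binary search over b[0..s-1];
   useful when r is much smaller than s. The search for each
   element of a starts where the previous one left off. */
void merge_small_into_large(const int a[], int r, const int b[], int s, int c[])
{
    int lo = 0, k = 0;
    for (int i = 0; i < r; i++) {
        /* binary search in b[lo..s-1] for the first element > a[i] */
        int left = lo, right = s;
        while (left < right) {
            int mid = (left + right) / 2;
            if (b[mid] <= a[i])
                left = mid + 1;
            else
                right = mid;
        }
        while (lo < left)               /* copy the skipped part of B   */
            c[k++] = b[lo++];
        c[k++] = a[i];                  /* then place the element of A  */
    }
    while (lo < s)                      /* copy the remainder of B      */
        c[k++] = b[lo++];
}
```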
MERGE
MERGE(A, R, LBA, B, S, LBB, C, LBC)
This procedure merges the sorted arrays A and B into the array C.
1. Set NA := LBA, NB := LBB, PTR := LBC, UBA := LBA + R - 1, UBB := LBB + S - 1
2. Same as MERGING, except R is replaced by UBA and S by UBB.
3. Return
Merge Sort
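A recursive C sketch of merge sort, assuming a caller-supplied auxiliary buffer of the same size as the input (see the space complexity noted below):

```c
#include <string.h>

/* Merge the sorted halves a[lo..mid] and a[mid+1..hi] using buffer tmp. */
static void merge_halves(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid)
        tmp[k++] = a[i++];
    while (j <= hi)
        tmp[k++] = a[j++];
    memcpy(&a[lo], &tmp[lo], (size_t)(hi - lo + 1) * sizeof(int));
}

/* Merge sort: split, sort each half recursively, then merge. */
static void merge_sort_range(int a[], int tmp[], int lo, int hi)
{
    if (lo >= hi)                        /* 0 or 1 element: already sorted */
        return;
    int mid = lo + (hi - lo) / 2;
    merge_sort_range(a, tmp, lo, mid);
    merge_sort_range(a, tmp, mid + 1, hi);
    merge_halves(a, tmp, lo, mid, hi);
}

void merge_sort(int a[], int tmp[], int n)   /* tmp must hold n ints */
{
    if (n > 1)
        merge_sort_range(a, tmp, 0, n - 1);
}
```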
Complexity of Merge Sort
Time: execution time is independent of the order of the elements.
Best: O(n log n)
Worst: O(n log n)
Average: O(n log n)
Space: an auxiliary array of the same size as the input array is required, i.e. O(n).
Counting Sort
Counting sort is an integer sorting method, effective when the differences between values are not large.
Method:
Use the element value as an index into a count array.
Count the number of times each element occurs.
Use arithmetic on those counts to determine the position of each value in the output sequence.
Example: Counting Sort
Consider data in the range 0 to 9.
Input data: 1, 4, 1, 2, 7, 5, 2
1. Take a count array to store the count of each unique element:
Index: 0 1 2 3 4 5 6 7 8 9
Count: 0 2 2 0 1 1 0 1 0 0
2. Modify the count array so that each index stores the sum of the previous counts (a running total):
Index: 0 1 2 3 4 5 6 7 8 9
Count: 0 2 4 4 5 6 6 7 7 7
3. Place each input value at the position given by its count, then decrease that count by 1.
For example, the first data value 1 goes at position 2 in the output; its count is decreased to 1, so the next data value 1 goes at position 1.
Output data: 1, 1, 2, 2, 4, 5, 7
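A C sketch of counting sort for keys in the range 0..k-1, following the three steps above (0-based indexing, so positions are one less than in the 1-based description; filling from the back keeps the sort stable):

```c
#include <string.h>

/* Counting sort for non-negative keys smaller than k.
   out must have room for n elements. */
void counting_sort(const int in[], int out[], int n, int k)
{
    int count[k];                          /* C99 VLA; assumes k is modest      */
    memset(count, 0, sizeof(count));

    for (int i = 0; i < n; i++)            /* step 1: count each key            */
        count[in[i]]++;
    for (int v = 1; v < k; v++)            /* step 2: running (prefix) sums     */
        count[v] += count[v - 1];
    for (int i = n - 1; i >= 0; i--) {     /* step 3: place values from the back */
        count[in[i]]--;                    /* decrease the count ...            */
        out[count[in[i]]] = in[i];         /* ... and use it as the position    */
    }
}
```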
Complexity of Counting Sort
Time complexity: O(n+k), where n is the number of elements in the input array and k is the range of the input
Auxiliary space: O(n+k)
Bucket Sort (or Bin Sort)
Method:
Distribute the elements of the input array into a fixed number of buckets.
Buckets are usually implemented using linked lists.
Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying the bucket sort algorithm.
Bucket sort is a generalization of counting sort: if each bucket has size 1, bucket sort degenerates to counting sort.
Example: Bucket Sort
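A possible C sketch of bucket sort, assuming integer values in a known range [0, max), buckets backed by small fixed-capacity arrays instead of linked lists, and insertion sort for the individual buckets:

```c
#include <assert.h>

#define NBUCKETS   10
#define BUCKET_CAP 64     /* assumed upper bound on elements per bucket */

static void insertion_sort_bucket(int a[], int n)
{
    for (int k = 1; k < n; k++) {
        int temp = a[k], ptr = k - 1;
        while (ptr >= 0 && temp < a[ptr]) {
            a[ptr + 1] = a[ptr];
            ptr--;
        }
        a[ptr + 1] = temp;
    }
}

/* Bucket sort for values in [0, max): distribute, sort each bucket,
   then concatenate the buckets back into the input array. */
void bucket_sort(int a[], int n, int max)
{
    int bucket[NBUCKETS][BUCKET_CAP];
    int size[NBUCKETS] = {0};

    for (int i = 0; i < n; i++) {                  /* distribute */
        int b = (int)((long long)a[i] * NBUCKETS / max);  /* bucket index 0..NBUCKETS-1 */
        assert(size[b] < BUCKET_CAP);
        bucket[b][size[b]++] = a[i];
    }
    int k = 0;
    for (int b = 0; b < NBUCKETS; b++) {           /* sort each bucket, then gather */
        insertion_sort_bucket(bucket[b], size[b]);
        for (int i = 0; i < size[b]; i++)
            a[k++] = bucket[b][i];
    }
}
```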
Bucket Sort: Complexity
Bucket sort is mainly useful when the input is uniformly distributed over a range.
The worst case occurs when all the elements are placed in a single bucket: when the input contains several values that are close to each other, those elements are likely to land in the same bucket, and the overall performance is then dominated by the algorithm used to sort each bucket (O(n²) or O(n log n)).
When the input values are distributed uniformly among the buckets, the time complexity is O(n).
Average case: O(n + n²/k + k), which is O(n) when k = Θ(n).
Radix Sort
Radix = "the base of a number system"
Idea: apply counting (bucket) sort on each digit, bottom up, from the least significant digit (LSD) to the most significant digit (MSD).
Example: base 2
Input   Binary   After pass 1   After pass 2   After pass 3   Output
  2      010         010            000            000          0
  0      000         000            100            001          1
  5      101         100            101            010          2
  1      001         110            001            011          3
  7      111         101            010            100          4
  3      011         001            110            101          5
  4      100         111            111            110          6
  6      110         011            011            111          7
(pass 1 sorts on the least significant bit, pass 2 on the middle bit, pass 3 on the most significant bit)
Radix Sort
Example: base 10
Input   After pass 1   After pass 2   After pass 3
 032        031            015            015
 224        032            016            016
 016        252            123            031
 015        123            224            032
 031        224            031            123
 169        015            032            169
 123        016            252            224
 252        169            169            252
(pass 1 sorts on the units digit, pass 2 on the tens digit, pass 3 on the hundreds digit)
Radix Sort: Complexity
Each sorting pass can be performed via counting sort.
Each pass is O(n), since counting sort is applied to only a single digit.
If the numbers are m digits long (or at most m digits in any number), then there are m sorting passes.
Hence, radix sort is O(mn).
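An LSD radix sort sketch in C for non-negative integers in base 10, using a stable counting sort on one digit per pass (each pass is O(n), and there are m passes for m-digit numbers):

```c
#include <string.h>

/* One stable counting-sort pass on the digit selected by exp (1, 10, 100, ...). */
static void counting_pass(int a[], int tmp[], int n, int exp)
{
    int count[10] = {0};
    for (int i = 0; i < n; i++)
        count[(a[i] / exp) % 10]++;
    for (int d = 1; d < 10; d++)          /* prefix sums: last position per digit */
        count[d] += count[d - 1];
    for (int i = n - 1; i >= 0; i--)      /* back-to-front keeps the pass stable  */
        tmp[--count[(a[i] / exp) % 10]] = a[i];
    memcpy(a, tmp, (size_t)n * sizeof(int));
}

/* LSD radix sort for non-negative integers. tmp must hold n ints. */
void radix_sort(int a[], int tmp[], int n)
{
    int max = 0;
    for (int i = 0; i < n; i++)           /* largest value fixes the number of passes */
        if (a[i] > max)
            max = a[i];
    for (int exp = 1; max / exp > 0; exp *= 10)
        counting_pass(a, tmp, n, exp);
}
```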
In-place Sorts
An in-place algorithm is an algorithm that does not need extra space: it produces its output in the same memory that contains the input data, transforming the input "in place". However, a small constant amount of extra space (i.e. O(1)) for variables is allowed.
An algorithm which is not in-place is sometimes called not-in-place or out-of-place.
In-place: Bubble sort, Selection sort, Insertion sort, Heapsort.
Not in-place: Merge sort.
Stable Sorts
Stability mainly matters when the input contains duplicate values.
A sorting algorithm is said to be stable if objects with equal keys appear in the same order in the sorted output as they appear in the input array.
Summary of Sorting Algorithms

Algorithm   Best TC   Avg TC    Worst TC   Space   In-place?  Stable?  Remark
Bubble      n         n²        n²         1       Yes        Yes      best case O(n) requires early termination
Insertion   n         n²        n²         1       Yes        Yes
Selection   n²        n²        n²         1       Yes        No
Merge       n log n   n log n   n log n    n       No         Yes
Quick       n log n   n log n   n²         log n   Yes        No       extra space is for recursive calls
Heap        n log n   n log n   n log n    1       Yes        No
Counting    n+k       n+k       n+k        n+k     No         Yes      k is the range of input values; O(n) time if k is O(n)
Radix       mn        mn        mn         n+k     No         Yes      m is the max number of digits in any value; k is the range of digits
Bucket      n+k       n+k       n²         n+k     No         Yes      buckets implemented as arrays, counting sort used within buckets
Other Sorting Algorithms
Comparison sorts:
Shell sort, Gnome sort, Tournament sort, Tim sort, Block sort, Cube sort, Tree sort, Library sort, Smooth sort, Odd-even sort, Strand sort, etc.
Non-comparison sorts:
Pigeonhole sort, Flash sort, Spread sort, Burst sort, Postman sort, etc.
Others:
Bitonic sort, Poll sort, Bogosort, Stooge sort, etc.
Lower Bound for Comparison Sorting
Merge sort and heap sort have worst-case running time O(N log N).
Are there better algorithms?
Goal: prove that any sorting algorithm based only on comparisons takes Ω(N log N) comparisons in the worst case (worst-case input) to sort N elements.
Lower Bound for Sorting
Suppose we want to sort N distinct elements
How many possible orderings do we have for N elements?
We can have N! possible orderings (e.g., the sorted output for a,b,c
can be a b c, b a c, a c b, c a b, c b a, b c a.)
Lower Bound for Sorting
Any comparison-based sorting process can be represented as a
binary decision tree.
Each node represents a set of possible orderings, consistent with
all the comparisons that have been made
The tree edges are results of the comparisons
Decision tree for Algorithm X for sorting three elements a, b, c
Lower Bound for Sorting
A different algorithm would have a different decision tree
Decision tree for Insertion Sort on 3 elements:
Lower Bound for Sorting
The worst-case number of comparisons used by the sorting algorithm is equal to the depth of the deepest leaf.
The average number of comparisons used is equal to the average depth of the leaves.
A decision tree to sort N elements must have N! leaves.
A binary tree of depth d has at most 2^d leaves, so the tree must have depth at least log2(N!).
Therefore, any sorting algorithm based only on comparisons between elements requires at least log2(N!) comparisons in the worst case.
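The step from log2(N!) to N log N follows, for instance, by keeping only the largest N/2 factors of N!:

```latex
\log_2(N!) \;=\; \sum_{i=1}^{N} \log_2 i
           \;\ge\; \frac{N}{2}\,\log_2\frac{N}{2}
           \;=\; \Omega(N \log N)
```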
Lower Bound for Sorting
Any sorting algorithm based on comparisons between elements requires Ω(N log N) comparisons.
Better than n log n
Theoretical computer scientists have described other sorting algorithms that provide better than O(n log n) time complexity under additional constraints, including:
Thorup's algorithm, a randomized algorithm for sorting keys from a domain of finite size, taking O(n log log n) time and O(n) space
A randomized integer sorting algorithm taking O(n log log n) expected time and O(n) space