
Data Structures: Sorting

Wei-Mei Chen

Department of Electronic and Computer Engineering


National Taiwan University of Science and Technology

1/79
Introduction
Motivation

• Why are efficient sorting methods so important?


• The efficiency of a searching strategy depends on the
assumptions we make about the arrangement of records in the
list
• No single sorting technique is the “best” for all initial
orderings and sizes of the list being sorted.
• We examine several techniques, indicating when one is
superior to the others.
• Assumption
The term list here refers to a collection of records. Each record has
one or more fields. Each record has a key that distinguishes one
record from another
+ e.g., a name or a phone number.
2/79
Sequential Search

• We search the list by examining the key values


Example: 4, 15, 17, 26, 30, 46, 48, 56, 58, 82, 90, 95
int seqSearch(element a[], int k, int n)
{ int i; /* search a[1:n] for key k */
for(i=1; i<=n && a[i].key != k; i++) ;
if (i>n) return 0;
return i; /* return i if a[i].key==k; otherwise return 0 */
}

• Analysis:
• Unsuccessful search: n + 1 comparisons ⇒ O(n)
• Average successful search:
Σ_{i=1}^{n} i/n = (n + 1)/2 = O(n)

3/79
Binary Search

• Consider a sorted list: {8, 23, 32, 45, 56, 78}


Figure out if searchnum is in the list.
• YES → i where list[i]==searchnum
• NO → −1

int binSearch(int list[], int searchnum, int left, int right)
{ /* return i if list[i]==searchnum; otherwise return -1 */
int middle;
while (left <= right) {
middle = (left + right) / 2;
if (searchnum < list[middle])
right = middle - 1;
else if (searchnum == list[middle])
return middle;
else left = middle + 1;
}
return -1;
}

4/79
Interpolation Search

• Consider the telephone directory: looking for “Wilson”


We won’t start the search at the middle.
• Comparing k with a[i].key

i = ((k − a[1].key) / (a[n].key − a[1].key)) × n

Interpolation search can be used only when the file is ordered.
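As a sketch of the probe formula above (with a hypothetical helper name, assuming int keys in a 1-based sorted array), the probe index can be computed as:

```c
#include <assert.h>

/* Probe index for interpolation search: a position proportional to
   where k falls between the first and last keys of keys[1..n].
   Hypothetical helper; assumes keys[1..n] is sorted ascending. */
int probe_index(const int keys[], int n, int k)
{
    if (keys[n] == keys[1]) return 1;       /* avoid division by zero */
    long num = (long)(k - keys[1]) * n;
    int i = (int)(num / (keys[n] - keys[1]));
    if (i < 1) i = 1;                       /* clamp to the valid range */
    if (i > n) i = n;
    return i;
}
```

The clamping handles keys outside the range [a[1].key, a[n].key], which the formula alone does not.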

5/79
List Verification

• Assume two lists list1[1:n] and list2[1:m]. Determine whether there is:

1. a record with key list1[i].key, but no record with
the same key in list2
2. a record with key list2[j].key, but no record with
the same key in list1
3. a pair with list1[i].key==list2[j].key, but the two records do not
match on at least one of the other fields

6/79
Verifying Two Unsorted Lists

void verify1(element list1[], element list2[], int n, int m)


{ /* Compare list1[1:n] and list2[1:m] */
int i, j, marked[MAX_SIZE];
for (i=1; i<=m; i++)
marked[i] = FALSE;
for (i=1; i<=n; i++)
if(( j = seqSearch(list2, list1[i].key, m)) == 0)
printf("%d is not in list 2\n", list1[i].key);
else
marked[j] = TRUE;
for (i=1; i<=m; i++)
if(!marked[i])
printf("%d is not in list 1\n", list2[i].key);
}

Worst-case complexity: O(mn)

7/79
Verifying Two Sorted Lists

void verify2(element list1[], element list2[], int n, int m)


{ int i, j;
sort(list1, n); sort(list2, m);
i = j = 1;
while( i<=n && j<=m)
if (list1[i].key < list2[j].key) {
printf("%d is not in list 2\n", list1[i].key); i++;
}
else if (list1[i].key == list2[j].key) {
i++; j++;
}
else {
printf("%d is not in list 1\n", list2[j].key); j++;
}
for (; i<=n; i++) printf("%d is not in list 2\n", list1[i].key);
for (; j<=m; j++) printf("%d is not in list 1\n", list2[j].key);
}

O(t_sort(n) + t_sort(m) + n + m)
→ Worst-case complexity: O(max{n log n, m log m})
8/79
Definitions

• Given a list of records (R1 , R2 , . . . , Rn ), where each record has a
key Ki . The sorting problem is that of finding a permutation σ
such that Kσ(i) ≤ Kσ(i+1) , 1 ≤ i ≤ n − 1. The desired
ordering is (Rσ(1) , Rσ(2) , . . . , Rσ(n) ).
• If a list has several identical key values, the permutation σ is
not unique. Let σs be the permutation with the following
properties:
1. Kσs(i) ≤ Kσs(i+1) , 1 ≤ i ≤ n − 1.
2. If i < j and Ki == Kj in the input list, then Ri precedes Rj in
the sorted list.
→ A sorting method that produces σs is stable.

9/79
An Example of Stable Sorting

1. Schedule
2. Destination

10/79
Categories of Sorting Methods

• Internal Method: used when the list to be sorted is small enough
that the entire sort can be carried out in main memory.
+ Insertion sort, quick sort, merge sort, heap sort, and
radix sort.
• External Method: used on larger lists that do not fit in memory.

11/79
Insertion Sort
Insertion Sort

void insert(element e, element a[], int i)
{ /* insert e into the sorted list a[1:i] so that a[1:i+1] is sorted */
a[0] = e; /* sentinel */
while (e.key < a[i].key) {
a[i+1] = a[i];
i--;
}
a[i+1] = e;
}

void insertionSort(element a[], int n)
{ /* sort a[1:n] */
int j;
for ( j = 2; j <= n; j++) {
element temp = a[j];
insert(temp, a, j-1);
}
}

Example: 5 2 4 6 1 3
2 5 4 6 1 3
2 4 5 6 1 3
2 4 5 6 1 3
1 2 4 5 6 3
1 2 3 4 5 6

Loop invariant: at the start of each iteration of the for loop,
a[1..j − 1] contains its original elements, but sorted.

12/79
Analysis of Insertion Sort

• In the worst case, insert(e, a, i) makes i + 1 comparisons


before making the insertion.
Thus, the total cost is O( Σ_{i=1}^{n−1} (i + 1) ) = O(n²).
• Variations
• Binary insertion sort:
The number of comparisons in an insertion sort can be reduced
if we replace the sequential search by binary search. The
number of record moves remains the same.
• List insertion sort:
The elements of the list are represented as a linked list rather
than an array. The number of record moves becomes zero.
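The binary insertion sort variant can be sketched as follows (on plain int keys, 0-based, a hypothetical simplification of the element/key records used above); note that the binary search cuts comparisons to O(log i) per insertion, while the shift still moves O(i) records:

```c
#include <assert.h>

/* Binary insertion sort sketch on int keys in a[0..n-1]. */
void binaryInsertionSort(int a[], int n)
{
    for (int j = 1; j < n; j++) {
        int key = a[j];
        int lo = 0, hi = j;   /* find first position in a[0..j-1] with a[pos] > key */
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] <= key) lo = mid + 1;   /* <= keeps equal keys stable */
            else hi = mid;
        }
        for (int t = j; t > lo; t--) a[t] = a[t-1];   /* shift right */
        a[lo] = key;
    }
}
```

Because the shifts dominate, the total cost stays O(n²), as stated above.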

13/79
Quick Sort
Quick Sort

Hoare [1962]
• the best practical sorting algorithm
• in place sorting
• worst-case running time: Θ(n²) on n numbers
• average-case running time: Θ(n lg n)

14/79
Description of Quick Sort

Divide-and-conquer paradigm

• Divide:
Partition a[p..r] around a pivot x into a[p..q − 1] (keys ≤ x),
a[q] = x, and a[q + 1..r] (keys > x).
• Conquer:
Sort the two subarrays a[p..q − 1] and a[q + 1..r ] by recursive calls
to quick sort.
• Combine:
Since the subarrays are sorted in place, no work is needed to
combine them

15/79
Algorithm quickSort

void quickSort(element a[], int left, int right)
{ /* a[left].key: the pivot key */
int pivot, i, j;
element temp;
if (left < right) {
i = left; j = right+1;
pivot = a[left].key;
do { /* partition by swapping */
do i++; while(a[i].key < pivot);
do j--; while(a[j].key > pivot);
if (i<j) SWAP(a[i], a[j], temp);
} while (i<j);
SWAP(a[left], a[j], temp);
quickSort(a, left, j-1);
quickSort(a, j+1, right);
}
}

Trace (pivot = 26):
26 5 37 1 61 11 59 15 48 19
26 5 19 1 61 11 59 15 48 37
26 5 19 1 15 11 59 61 48 37
11 5 19 1 15 26 59 61 48 37
1 5 11 19 15 26 59 61 48 37
1 5 11 15 19 26 59 61 48 37
1 5 11 15 19 26 48 37 59 61
1 5 11 15 19 26 37 48 59 61

16/79
Worst-Case Partitioning

• The worst-case behavior for quick sort occurs when the


partitioning routine produces one subproblem with n − 1
elements and one with 0 elements.
• Assume T(0) = Θ(1). Then

T(n) = T(n − 1) + T(0) + n − 1
= T(n − 2) + (n − 2) + (n − 1) + 2T(0)
= · · ·
= T(0) + (0 + 1 + 2 + · · · + (n − 1)) + nT(0)
= Θ(n²)

(The recursion tree degenerates into a chain of subproblems of sizes
n, n − 1, n − 2, . . . , each paired with an empty subproblem.)

17/79
Best-Case Partitioning

• In the most even possible split, PARTITION produces two
subproblems of sizes ⌈n/2⌉ and ⌊n/2⌋ − 1.

• Then T (n) ≤ 2T (n/2) + Θ(n)


⇒ we have T (n) = O(n lg n)
• Thus the equal balancing of the two sides of the partition at
every level of the recursion produces an asymptotically faster
algorithm.

18/79
Average-Case Running Time

In the average case, PARTITION produces a mix of ”good” and


”bad” splits.
• Consider
Basic operation: the comparison of a[i] with a pivot a[r ] in
PARTITION.
Input size: n, the number of items in a
• Assume that the value of the pivot is equally likely to be any of
the numbers from 1 through n (probability 1/n each); partitioning
takes n − 1 comparisons. Then

T(n) = Σ_{p=1}^{n} (1/n) [T(p − 1) + T(n − p)] + (n − 1)
19/79
Since Σ_{p=1}^{n} (1/n)[T(p − 1) + T(n − p)] = (2/n) Σ_{p=1}^{n} T(p − 1), we have

T(n) = (2/n) Σ_{p=1}^{n} T(p − 1) + n − 1.

"×n": nT(n) = 2 Σ_{p=1}^{n} T(p − 1) + n(n − 1)    (1)

"n → n − 1": (n − 1)T(n − 1) = 2 Σ_{p=1}^{n−1} T(p − 1) + (n − 1)(n − 2)    (2)

"(1) − (2)" ⇒ nT(n) − (n − 1)T(n − 1) = 2T(n − 1) + 2(n − 1)

⇒ T(n)/(n + 1) = T(n − 1)/n + 2(n − 1)/(n(n + 1))

20/79
T(n)/(n + 1) = T(n − 1)/n + 2(n − 1)/(n(n + 1))
T(n − 1)/n = T(n − 2)/(n − 1) + 2(n − 2)/((n − 1)n)
. . .
T(2)/3 = T(1)/2 + 2/6

Using (n − 1)/(n(n + 1)) = 2/(n + 1) − 1/n and
(n − 2)/((n − 1)n) = 2/n − 1/(n − 1), summing ⇒

T(n)/(n + 1) = 4/(n + 1) + 2(1/n + 1/(n − 1) + · · · + 1/3) − 1
= 2 Σ_{k=3}^{n} 1/k + 4/(n + 1) − 1
= 2H_n + 4/(n + 1) − 4

T(n) ≈ 2(n + 1) ln n ∈ Θ(n lg n)
 

21/79
Harmonic Numbers

H_n = Σ_{k=1}^{n} 1/k = 1 + 1/2 + 1/3 + · · · + 1/n
= ln n + γ + 1/(2n) − 1/(12n²) + 1/(120n⁴) − 1/(252n⁶) + · · ·

where γ ≈ 0.5772156649 (Euler's constant)

22/79
Variations of Quick Sort

1. Median of three:
Pick the median of the first, middle, and last keys in the
current sublist as the pivot. Thus,
pivot = median{K_l , K_(l+r)/2 , K_r }
→ use this median as the pivot
2. Randomized Quick Sort:
Pick a key randomly.
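The median-of-three selection can be sketched as follows (a hypothetical helper on plain int keys rather than the element records above); it returns the index, among l, (l+r)/2, and r, holding the median key:

```c
#include <assert.h>

/* Median-of-three pivot selection sketch: for indices l, m = (l+r)/2,
   and r, the median is the key lying between the other two. */
int medianOfThree(const int a[], int l, int r)
{
    int m = (l + r) / 2;
    if ((a[l] <= a[m]) == (a[m] <= a[r])) return m;  /* a[m] between the others */
    if ((a[m] <= a[l]) == (a[l] <= a[r])) return l;  /* a[l] between the others */
    return r;
}
```

Swapping the chosen element into position left before partitioning avoids the worst case on already-sorted input.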

23/79
Optimal Sorting Time
How Fast Can We Sort?

• We will provide a lower bound, then beat it.


How do you suppose we’ll beat it?
• All of the sorting algorithms so far are
comparison sorts.
• Basic operation: the pairwise
comparison of two elements
• Theorem: all comparison sorts are Ω(n lg n)

24/79
Decision Trees

• Decision trees provide an abstraction of comparison sorts


• A decision tree represents the comparisons made by a comparison sort.
• The decision tree for insertion sort operating on three elements.
{6, 8, 5} 1:2
≤ >

2:3 1:3
≤ > ≤ >
< 1, 2, 3 > 1:3 < 2, 1, 3 > 2:3
≤ > ≤ >
< 1, 3, 2 > < 3, 1, 2 > < 2, 3, 1 > < 3, 2, 1 >

There are 3! = 6 possible permutations (leaves)

25/79
Lower Bounds

• A lower bound of a problem is the least time complexity


required for any algorithm which can be used to solve this
problem.
• worst case lower bound (lower bound)
• average case lower bound
• The lower bound for a problem is not unique.
+ as tight as possible
• e.g. Ω(1), Ω(n), Ω(n log n) are all lower bounds for sorting.
• Ω(1), Ω(n) are trivial
• If the present lower bound is Ω(n log n) and there is an
algorithm with time complexity O(n log n), then the algorithm
is optimal.

26/79
Lower Bound for Sorting

• The length of the longest path from the root of a decision


tree to any of its reachable leaves represents the worst-case
number of comparisons that the corresponding sorting
algorithm performs.
→ the worst-case number of comparisons
= the height of its decision tree.
• The worst-case time complexity:
the longest path from the top of the tree to a leaf node
• Balanced tree has the smallest depth:

dlog(n!)e = Ω(n log n)

→ lower bound for sorting: Ω(n log n)

27/79
Theorem
Any algorithm that sorts only by comparisons must have a
worst-case computing time of Ω(n log n)

log(n!) = log(n(n − 1)(n − 2) · · · 1)
= log 2 + log 3 + · · · + log n
> ∫₁ⁿ log x dx
= log e ∫₁ⁿ ln x dx
= log e [x ln x − x]₁ⁿ
= log e (n ln n − n + 1)
= n log n − n log e + log e
≈ n log n − 1.44n + 1.44
≥ n log n − 1.44n
= Ω(n log n)
28/79
Merge Sort
Example: Merging

Merge two sorted lists into one sorted list

2 4 5 7 ∞ 1 2 3 6 ∞

1 2 2 3 4 5 6 7

29/79
Merging Two Sorted Lists

void merge(element initList[], element mergedList[], int i, int m, int n)
{ /* merge initList[i:m] and initList[m+1:n] into mergedList[i:n] */
int j, k, t;
j = m+1;
k = i;
while( i<=m && j<=n) {
if (initList[i].key <= initList[j].key)
mergedList[k++] = initList[i++];
else
mergedList[k++] = initList[j++];
}
if(i>m) /* mergedList[k:n] = initList[j:n] */
for (t = j; t<=n; t++)
mergedList[t] = initList[t];
else /* mergedList[k:n] = initList[i:m] */
for (t = i; t<=m; t++)
mergedList[k+t-i] = initList[t];
}

Time: O(n − i + 1)

30/79
Iterative Merge Sort

31/79
void mergePass(element initList[], element mergedList[], int n, int s)
{
int i, j;
for( i=1; i<= n - 2 * s + 1; i+= 2*s)
merge(initList, mergedList, i, i+s-1, i+2*s-1);
if ( i+s-1 < n)
merge(initList, mergedList, i, i+s-1, n);
else
for( j=i; j<=n; j++)
mergedList[j] = initList[j];
}

void mergeSort(element a[], int n)


{ /* sort a[1:n] */
int s =1;
element extra[MAX_SIZE];
while(s<n) {
mergePass(a, extra, n, s);
s*=2;
mergePass(extra, a, n, s);
s*=2;
}
}
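The same bottom-up scheme can be sketched self-contained on plain int keys (0-based, hypothetical names, with the element records and MAX_SIZE of the slides simplified away):

```c
#include <assert.h>

/* Merge the sorted runs src[lo..mid-1] and src[mid..hi-1] into dst. */
static void mergeRun(const int src[], int dst[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Iterative (bottom-up) merge sort: merge runs of length s = 1, 2, 4, ... */
void mergeSortInts(int a[], int tmp[], int n)
{
    for (int s = 1; s < n; s *= 2) {
        for (int lo = 0; lo < n; lo += 2*s) {
            int mid = lo + s;   if (mid > n) mid = n;
            int hi  = lo + 2*s; if (hi > n)  hi = n;
            mergeRun(a, tmp, lo, mid, hi);
        }
        for (int i = 0; i < n; i++) a[i] = tmp[i];   /* copy back each pass */
    }
}
```

Copying back after every pass is simpler than the slides' ping-pong between a and extra, at the cost of one extra O(n) copy per pass.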
32/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

5 2 4 7

5 2

33/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

2 5 4 7

5 2 4 7

34/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

2 5 4 7

5 2 4 7

35/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

2 4 5 7 1 3 2 6

2 5 4 7

5 2 4 7

36/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

2 4 5 7 1 2 3 6

2 5 4 7 1 3 2 6

5 2 4 7 1 3 2 6

37/79
Example: Recursive Merge Sort

1 2 2 3 4 5 6 7

2 4 5 7 1 2 3 6

2 5 4 7 1 3 2 6

5 2 4 7 1 3 2 6

38/79
Dividing Phase

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6
 
Divide
 

39/79
Merging Phase

1 2 2 3 4 5 6 7

2 4 5 7 1 2 3 6

2 5 4 7 1 3 2 6

5 2 4 7 1 3 2 6
 
Conquer and Combine
 

40/79
Dividing Phase: Recursive Merge Sort

41/79
Merging Phase: Recursive Merge Sort

42/79
int listMerge(element a[], int link[], int start1, int start2)
{ int last1, last2, lastResult = 0;
for(last1 = start1, last2 = start2; last1 && last2; )
if (a[last1].key <= a[last2].key) {
link[lastResult] = last1;
lastResult = last1;
last1 = link[last1];
}
else {
link[lastResult] = last2;
lastResult = last2;
last2 = link[last2];
}
if(last1 == 0) link[lastResult] = last2;
else link[lastResult] = last1;
return link[0];
}

int rmergeSort(element a[], int link[], int left, int right)


{ if(left >= right) return left;
int mid = (left + right)/2;
return listMerge(a, link,rmergeSort(a, link, left, mid),
rmergeSort(a, link, mid+1, right));
}
43/79
Analysis of Merge Sort

• Time Complexity:

T(n) = O(1), if n = 1
T(n) = 2T(n/2) + O(n), if n > 1
⇒ T(n) = O(n log n)

44/79
Variations: Natural Merge Sort

45/79
Heap Sort
Recall: Heaps

• A max tree is a tree in which the key value in each node is no


smaller than the key values in its children. A max heap is a
complete binary tree that is also a max tree.

A max heap is stored as a complete binary tree in an array: the node
at index i has children at indices 2i and 2i + 1.

index: 1 2 3 4 5 6 7 8 9 10
key: 16 14 10 8 7 9 3 2 4 1

(Tree: 16 at the root; children 14 and 10; 14 has children 8 and 7;
10 has children 9 and 3; 8 has children 2 and 4; 7 has child 1.)

46/79
Inserting 21 into a Max Heap

20

15 2

14 4

20 20 21

15 2 15 15 20 15 20

14 4 14 4 2 14 4 2 14 4 2

47/79
Deleting from a Max Heap

21 20 20

15 20 15 20 15 15 2

14 4 2 14 4 2 14 4 14 4

Delete Report 21 temp ← 2 Done

Delete

15 15 15

14 2 14 2 2 15 2

4 14 14 4

Done temp ← 4 Report 20


48/79
Adjusting a Max Heap

void adjust(element a[], int root, int n)


{ int child, rootkey;
element temp;
temp = a[root];
rootkey = a[root].key;
child = 2*root; /* left subtree */
while(child <= n) {
if ((child < n) && (a[child].key < a[child+1].key))
child++;
if (rootkey > a[child].key) break;
else {
a[child/2] = a[child]; /* move to parent */
child*=2;
}
}
a[child/2] = temp;
}

for (i = n/2; i>0; i--)
adjust(a, i, n); /* establish a max heap */
49/79
Building a Max Heap (1/2)

4 4
1 3 1 3

2 16 9 10 2 16 9 10

14 8 7 14 8 7

4 4
1 3 1 10

14 16 9 10 14 16 9 3

2 8 7 2 8 7

50/79
Building a Max Heap (2/2)

4 16
16 10 4 10

14 7 9 3 14 7 9 3

2 8 1 2 8 1

16 16
14 10 14 10

4 7 9 3 8 7 9 3

2 8 1 2 4 1

51/79
Analysis of adjust

Let L = ⌊lg n⌋ be the height of the heap, and let i be the level of an
internal node; adjusting a node at level i costs at most 2(L − i)
comparisons.
→ the number of comparisons to build the heap is at most:

Σ_{i=0}^{L−1} 2(L − i)2^i = 2L Σ_{i=0}^{L−1} 2^i − 2 Σ_{i=0}^{L−1} i·2^i
= 2L(2^L − 1) − 2(2^L(L − 2) + 2)
= 4·2^L − 2L − 4
= 4·2^⌊lg n⌋ − 2⌊lg n⌋ − 4
≤ cn (if c ≥ 4)

Note: Σ_{i=0}^{L−1} i·2^i = 2^L(L − 2) + 2

52/79
Compute Σ_{i=0}^{L−1} i·2^i

Let S = Σ_{i=1}^{L−1} i·2^i. Then

S = 1·2¹ + 2·2² + 3·2³ + 4·2⁴ + · · · + (L − 1)·2^{L−1}
2S = 1·2² + 2·2³ + 3·2⁴ + · · · + (L − 2)·2^{L−1} + (L − 1)·2^L

→ S − 2S = 2 + 2² + 2³ + · · · + 2^{L−1} − (L − 1)·2^L

S = (L − 1)·2^L − 2(1 − 2^{L−1})/(1 − 2)
= 2^L(L − 1) + 2 − 2^L
= 2^L(L − 2) + 2

53/79
Algorithm heapSort

void heapSort(element a[], int n)


{ /* sort a[1:n] */
int i, j;
element temp;
for( i = n/2; i>0; i--)
adjust(a, i, n);
for( i = n-1; i>0; i--) {
SWAP(a[1], a[i+1], temp);
adjust(a, 1, i);
}
}
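The adjust/heapSort pair above can be sketched self-contained on plain int keys (hypothetical names, same 1-based layout: a[1..n] with a[0] unused):

```c
#include <assert.h>

/* Sift a[root] down in the max heap a[1..n], as in adjust() above. */
static void adjustInt(int a[], int root, int n)
{
    int temp = a[root], child = 2*root;
    while (child <= n) {
        if (child < n && a[child] < a[child+1]) child++;  /* larger child */
        if (temp >= a[child]) break;
        a[child/2] = a[child];   /* move child up */
        child *= 2;
    }
    a[child/2] = temp;
}

void heapSortInts(int a[], int n)
{
    for (int i = n/2; i > 0; i--) adjustInt(a, i, n);  /* build max heap */
    for (int i = n-1; i > 0; i--) {                    /* repeatedly extract max */
        int t = a[1]; a[1] = a[i+1]; a[i+1] = t;
        adjustInt(a, 1, i);
    }
}
```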

54/79
Example: Heap Sort (1/2)

16 14
14 10 8 10

8 7 9 3 4 7 9 3

2 4 1 2 1 16

10 9
8 9 8 3

4 7 1 3 4 7 1 2

2 14 16 10 14 16

55/79
Example: Heap Sort (2/2)

8 7
7 3 4 3

4 2 1 9 1 2 8 9

10 14 16 10 14 16

1
2 3

4 7 8 9

10 14 16 1 2 3 4 7 8 9 10 14 16

56/79
Analysis of Heap Sort

Each call adjust(a, 1, i) on a heap of i nodes makes at most 2⌊lg i⌋
comparisons, so the second phase costs at most

Σ_{i=1}^{n−1} 2⌊lg i⌋ = 2(n⌊lg n⌋ − 2^{⌊lg n⌋+1} + 2)
= 2n⌊lg n⌋ − 4·2^{⌊lg n⌋} + 4
= 2n⌊lg n⌋ − cn + 4, where 2 < c ≤ 4

57/79
Compute Σ_{i=1}^{n−1} ⌊lg i⌋

⌊lg 1⌋ = 0 → 1 term
⌊lg 2⌋ = ⌊lg 3⌋ = 1 → 2 terms
⌊lg 4⌋ = ⌊lg 5⌋ = ⌊lg 6⌋ = ⌊lg 7⌋ = 2 → 4 terms
⌊lg 8⌋ = · · · = ⌊lg 15⌋ = 3 → 8 terms
(in general, 2^k terms of value k, contributing k·2^k)

Σ_{k=0}^{m−1} k·2^k = 2^m(m − 2) + 2

Σ_{i=1}^{n−1} ⌊lg i⌋ = Σ_{i=1}^{⌊lg n⌋−1} i·2^i + (n − 2^{⌊lg n⌋})⌊lg n⌋
= 2^{⌊lg n⌋}(⌊lg n⌋ − 2) + 2 + (n − 2^{⌊lg n⌋})⌊lg n⌋
= n⌊lg n⌋ − 2^{⌊lg n⌋+1} + 2

58/79
Summary of Internal Sorting

+ No one method is best for all conditions.


• Insertion sort is good when the list is already partially ordered,
and it is best for small n.
• Merge sort has the best worst-case behavior but needs more
storage than heap sort.
• Quick sort has the best average behavior, but its worst-case
behavior is O(n²).
• The behavior of radix sort depends on the size of the keys and
the choice of r .

59/79
Comparison

Algorithm       Time Complexity       Notes
insertion sort  O(n²)                 slow; stable; for small data sets (< 1K)
quick sort      O(n log n) expected   unstable; fastest in practice (good for large inputs)
merge sort      O(n log n)            stable; sequential data access; for huge data sets (> 1M)
heap sort       O(n log n)            fast; unstable; for large data sets (1K – 1M)

60/79
Radix Sort
Sorting Several Keys

• Consider the problem of sorting records on several keys, K 1 , K 2 , . . . , K r


(K 1 is the most significant key). A list of records R1 , R2 , . . . , Rn are
said to be sorted with respect to the keys K 1 , K 2 , . . . , K r iff for
every pair of records i and j, i < j and
(Ki1 , Ki2 , . . . , Kir ) ≤ (Kj1 , Kj2 , . . . , Kjr ).
• (x1 , x2 , . . . , xr ) ≤ (y1 , y2 , . . . , yr ) iff either xi = yi , 1 ≤ i ≤ j, and
xj+1 < yj+1 for some j < r or xi = yi , 1 ≤ i ≤ r .
• Two popular ways to sort on multiple keys
• Most-significant-digit-first (MSD):
Ex: 2♣, . . . , A♣, 2♦, . . . , A♦, 2♥, . . . , A♥, 2♠, . . . , A♠
1: sort on suits → four piles
2: sort on face values for each pile independently.
• Least-significant-digit-first (LSD):
1: sort on face values
→ 2♠, 2♣, 2♥, 2♦, 3♥, 3♠, 3♣, 3♦, . . . , A♣, A♥, A♠, A♦
2: sort on suits for the whole list (stable sort)
61/79
MSD and LSD

• LSD and MSD define only the order in which the keys are to
be sorted; they do not specify how each key is sorted.
• Bin sort can be used for sorting on each key. The complexity
of bin sort is O(n) if there are n bins.
• LSD and MSD can be used even when there is only one key.
• If the keys are numeric, then each decimal digit may be
regarded as a subkey.
→ radix sort
• In radix sort, we decompose the sort key using some radix r.
• The number of bins needed is r.
• Each key has d digits in the range 0 to r − 1.

62/79
Radix Sort

int radixSort(element a[], int link[], int d, int r, int n)
{ /* keys have d digits, radix r; returns index of the first element */
int front[r], rear[r];
int i, bin, current, first, last;
first = 1;
for( i = 1; i < n; i++) link[i] = i+1;
link[n] = 0;
for( i = d-1; i >= 0; i--) { /* d passes, least significant digit first */
for(bin = 0; bin < r; bin++) front[bin] = 0;
for(current = first; current; current = link[current]) {
/* distribute into bins: O(n) */
bin = digit(a[current], i, r);
if( front[bin] == 0) front[bin] = current;
else link[rear[bin]] = current;
rear[bin] = current;
}
for(bin = 0; !front[bin]; bin++) ;
first = front[bin];
last = rear[bin];
for(bin++; bin < r; bin++) /* concatenate the bins: O(r) */
if(front[bin]) {
link[last] = front[bin];
last = rear[bin];
}
link[last] = 0;
}
return first; /* total: O(d(n + r)) */
}
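The digit() helper called above is not defined in the slides. One plausible definition (a sketch with an explicit digit-count parameter d added to the signature, assuming non-negative integer keys, digits numbered 0 = most significant through d − 1 = least significant):

```c
#include <assert.h>

/* Return digit i of key in radix r, where digit 0 is the most
   significant of d digits; pass i = d-1, ..., 0 thus extracts digits
   from least to most significant, as radixSort's loop requires. */
int digit(int key, int i, int r, int d)
{
    for (int p = 0; p < d - 1 - i; p++) key /= r;  /* drop lower digits */
    return key % r;
}
```

For example, with r = 10 and d = 3, the first pass (i = 2) of radixSort on 984 would bin it by digit 4, the ones digit.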
63/79
Example: Radix Sort (1/3)

179 208 306 93 859 984 55 9 271 33

Pass 1 (ones digit):
bin[1]: 271
bin[3]: 93, 33
bin[4]: 984
bin[5]: 55
bin[6]: 306
bin[8]: 208
bin[9]: 179, 859, 9
Concatenated: 271 93 33 984 55 306 208 179 859 9

64/79
Example: Radix Sort (2/3)

271 93 33 984 55 306 208 179 859 9

Pass 2 (tens digit):
bin[0]: 306, 208, 9
bin[3]: 33
bin[5]: 55, 859
bin[7]: 271, 179
bin[8]: 984
bin[9]: 93
Concatenated: 306 208 9 33 55 859 271 179 984 93

65/79
Example: Radix Sort (3/3)

306 208 9 33 55 859 271 179 984 93

Pass 3 (hundreds digit):
bin[0]: 9, 33, 55, 93
bin[1]: 179
bin[2]: 208, 271
bin[3]: 306
bin[8]: 859
bin[9]: 984
Concatenated: 9 33 55 93 179 208 271 306 859 984

66/79
External Sorting
External Sorting

• There are some lists that are too large to fit in the memory of a
computer. So internal sorting is not possible.
• Some records are stored in the disk (tape, etc.). System retrieves a
block of data from a disk at a time. A block contains multiple
records.
• The most popular method for sorting on external storage devices is
merge sort.
• Segments of the input list are sorted.
• Sorted segments (called runs) are written onto external
storage.
• Runs are merged until only one run is left.
• Three factors contributing to the read/write time of disk:
• seek time
• latency time
• transmission time
67/79
Example: External Sort (1/2)

• Consider a computer capable of internally sorting only 750 records
at a time, used to sort 4500 records.
• Six runs are generated, each containing 750 sorted records
=⇒ 4 passes

68/79
Example: External Sort (2/2)

ts = maximum seek time
tl = maximum latency time
trw = time to read or write one block of 250 records
tIO = time to input or output one block = ts + tl + trw
tIS = time to internally sort 750 records
n·tm = time to merge n records from input buffers to the output buffer

Analysis:
Operation Time
1 read 18 blocks of input, 18tIO , internally sort, 36tIO + 6tIS
6tIS , write 18 blocks, 18tIO
2 merge runs 1 to 6 in pairs 36tIO + 4500tm
3 merge two runs of 1500 records each, 12 blocks 24tIO + 3000tm
4 merge one run of 3000 records with one run of 36tIO + 4500tm
1500 records

132tIO +12000tm +6tIS


69/79
k-Way Merging

• To merge m runs via 2-way merging, we need ⌈log₂ m⌉ + 1 passes
over the data.
• If we use a higher-order merge, the number of passes is reduced.
• With k-way merge on m runs, we need ⌈log_k m⌉ + 1 passes.
• But is it always true that the higher the order of merging, the
less computing time we need?
• Not necessarily!
• k − 1 comparisons are needed to determine the next output.
• If a loser tree is used to reduce the number of comparisons, we
can achieve a complexity of O(n log₂ m).
• The data block size shrinks as k increases; smaller blocks
mean more disk accesses.
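The pass count can be checked with a small sketch (hypothetical helper name) that counts the run-generation pass plus the merge passes:

```c
#include <assert.h>

/* Total passes over the data for k-way merging of m initial runs:
   1 run-generation pass plus ceil(log_k m) merge passes, counted by
   repeatedly dividing the number of runs by k (rounding up). */
int mergePasses(int m, int k)
{
    int passes = 0;
    for (int runs = m; runs > 1; runs = (runs + k - 1) / k)
        passes++;
    return 1 + passes;
}
```

With m = 6 and k = 2 this gives 4 passes, matching the 4500-record example above; a 3-way merge of the same six runs would need only 3.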

70/79
Optimal Merging of Runs

Merging four runs of sizes 2, 4, 5, 15:
(a) skewed tree: merge 2 + 4, then the result with 5, then with 15
(b) balanced tree: merge 2 + 4 and 5 + 15, then merge the results

weighted external path length (a) = 2·3 + 4·3 + 5·2 + 15·1 = 43
weighted external path length (b) = 2·2 + 4·2 + 5·2 + 15·2 = 52

71/79
Huffman Algorithm

void huffman(treePointer heap[], int n)


{/* heap[1:n] */
treePointer tree;
int i;
initialize(heap, n);

for (i = 1; i < n; i++) {


MALLOC(tree, sizeof(*tree));
tree->leftChild = pop(&n);
tree->rightChild = pop(&n);
tree->weight = tree->leftChild->weight + tree->rightChild->weight;
push(tree,&n);
}
}

72/79
Example: Huffman Tree

2 3 5 7 9 13

10

5 5 5 16

2 3 2 3 9 7

(a) (b) (c)

23 39
(d) (e)

10 13 16 23

5 5 9 7 10 13

2 3 5 5

2 3
73/79
Application: File Compression

+ 100 distinct characters → ⌈log₂ 100⌉ = 7 bits per character

• Using codes for the characters in a file to reduce the file size
• Example: a file contains only the characters a, e, i, s, t, blank
spaces, and newlines
• Use three bits to code each character
• size = 3 · 58 = 174

Character Code Frequency Total Bits
a 000 10 30
e 001 15 45
i 010 12 36
s 011 3 9
t 100 4 12
space 101 13 39
newline 110 1 3
Total  58 174

• Question: can we use another coding to reduce the file size?
→ Yes, by allowing the code length to vary from character to character
• Short codes for frequently occurring characters

74/79
Tree Representation of Code

Cost: Σ_i d_i f_i
d_i: depth of the code for character i
f_i: frequency of character i

(Two code trees for a, e, i, s, t, sp, nl: one balanced tree with every
character at depth 3, and one with nl moved up to depth 2.)
75/79
File Compression Problem

• Property of the tree representation of an optimal code
→ all nodes either are leaves or have two children
• Prefix code
→ no character code is a prefix of another character code
• No character is stored in a non-leaf node
• Provides unambiguous decoding
• Problem of file compression:
find a full binary tree with minimum total cost, where all
characters are in the leaves.

76/79
Huffman’s Codes

Huffman’s algorithm for file compression


• A greedy method
• Outline of the algorithm:
• Maintain a forest of trees: Initially, each character is a tree
• Weight of tree = sum of frequencies of its leaves
• Select two trees T1 and T2 with smallest weights to form a
new tree with subtrees T1 and T2 until one tree remains

Character Frequency

a 10
e 15
i 12
s 3
t 4
space 13
newline 1

77/79
Steps of Huffman’s Algorithm

(a) initial forest: nl:1, s:3, t:4, a:10, i:12, sp:13, e:15
(b) merge nl(1) + s(3) → tree of weight 4
(c) merge 4 + t(4) → 8
(d) merge 8 + a(10) → 18
(e) merge i(12) + sp(13) → 25
(f) merge 18 + e(15) → 33
78/79
Huffman Code Assignment

(g) final tree of weight 58: merge 33 + 25; left subtree 33 = (18, e),
with 18 = (8, a), 8 = (4, t), 4 = (nl, s); right subtree 25 = (i, sp).

Character Code Frequency Total Bits
a 001 10 30
e 01 15 30
i 10 12 24
s 00001 3 15
t 0001 4 16
space 11 13 26
newline 00000 1 5
Total  58 146

100010000101 → iase
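The 146-bit total in the table is just Σ_i d_i f_i with the code lengths as depths; a minimal sketch (hypothetical helper, arrays ordered a, e, i, s, t, space, newline as above):

```c
#include <assert.h>

/* Cost of a prefix code: sum of (code length x character frequency). */
int totalBits(const int len[], const int freq[], int n)
{
    int bits = 0;
    for (int i = 0; i < n; i++) bits += len[i] * freq[i];
    return bits;
}
```

This confirms the improvement from 174 bits under the fixed 3-bit code to 146 bits under the Huffman code.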

79/79
