
Data Structures: Sorting

Wei-Mei Chen

Department of Electronic and Computer Engineering


National Taiwan University of Science and Technology

1/79
Introduction
Motivation

• Why are efficient sorting methods so important?


• The efficiency of a searching strategy depends on the
assumptions we make about the arrangement of records in the
list
• No single sorting technique is the “best” for all initial
orderings and sizes of the list being sorted.
• We examine several techniques, indicating when one is
superior to the others.
• Assumption
The term list here refers to a collection of records. Each record has
one or more fields. Each record has a key that distinguishes one
record from another
+ e.g., a name or a phone number.
2/79
Sequential Search

• We search the list by examining the key values


Example: 4, 15, 17, 26, 30, 46, 48, 56, 58, 82, 90, 95
int seqSearch(element a[], int k, int n)
{ int i; /* search a[1:n] for key k */
for(i=1; i<=n && a[i].key != k; i++) ;
if (i>n) return 0;
return i; /* return i if a[i].key==k; otherwise return 0 */
}

• Analysis:
• Unsuccessful search: n + 1 comparisons ⇒ O(n)
• Average successful search:
Σ_{i=1}^{n} i/n = (n + 1)/2 = O(n)

3/79
Binary Search

• Consider a sorted list: {8, 23, 32, 45, 56, 78}


Figure out if searchnum is in the list.
• YES → i where list[i]==searchnum
• NO → −1

int binSearch(int list[], int searchnum, int left, int right)
{ /* return i if list[i]==searchnum; otherwise return -1 */
int middle;
while (left <= right) {
middle = (left + right) / 2;
if (searchnum < list[middle])
right = middle - 1;
else if (searchnum == list[middle])
return middle;
else left = middle + 1;
}
return -1;
}

4/79
Interpolation Search

• Consider the telephone directory: looking for “Wilson”


We won’t start the search at the middle.
• Comparing k with a[i].key

i = ((k − a[1].key) / (a[n].key − a[1].key)) × n

Interpolation search can be used only when the file is ordered.
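As a sketch of the probe formula above (with a hypothetical helper name, assuming int keys in a 1-based sorted array), the probe index can be computed as:

```c
#include <assert.h>

/* Probe index for interpolation search: a position proportional to
   where k falls between the first and last keys of keys[1..n].
   Hypothetical helper; assumes keys[1..n] is sorted ascending. */
int probe_index(const int keys[], int n, int k)
{
    if (keys[n] == keys[1]) return 1;       /* avoid division by zero */
    long num = (long)(k - keys[1]) * n;
    int i = (int)(num / (keys[n] - keys[1]));
    if (i < 1) i = 1;                       /* clamp to the valid range */
    if (i > n) i = n;
    return i;
}
```

The clamping handles keys outside the range [a[1].key, a[n].key], which the formula alone does not.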

5/79
List Verification

• Assume two lists list1[1:n] and list2[1:m]. Determine whether there is:

1. a record with key list1[i].key, but no record with
the same key in list2
2. a record with key list2[j].key, but no record with
the same key in list1
3. a pair with list1[i].key==list2[j].key, but the two records do not
match on at least one of the other fields

6/79
Verifying Two Unsorted Lists

void verify1(element list1[], element list2[], int n, int m)


{ /* Compare list1[1:n] and list2[1:m] */
int i, j, marked[MAX_SIZE];
for (i=1; i<=m; i++)
marked[i] = FALSE;
for (i=1; i<=n; i++)
if(( j = seqSearch(list2, list1[i].key, m)) == 0)
printf("%d is not in list 2\n", list1[i].key);
else
marked[j] = TRUE;
for (i=1; i<=m; i++)
if(!marked[i])
printf("%d is not in list 1\n", list2[i].key);
}

Worst-case complexity: O(mn)

7/79
Verifying Two Sorted Lists

void verify2(element list1[], element list2[], int n, int m)


{ int i, j;
sort(list1, n); sort(list2, m);
i = j = 1;
while( i<=n && j<=m)
if (list1[i].key < list2[j].key) {
printf("%d is not in list 2\n", list1[i].key); i++;
}
else if (list1[i].key == list2[j].key) {
i++; j++;
}
else {
printf("%d is not in list 1\n", list2[j].key); j++;
}
for (; i<=n; i++) printf("%d is not in list 2\n", list1[i].key);
for (; j<=m; j++) printf("%d is not in list 1\n", list2[j].key);
}

O(t_sort(n) + t_sort(m) + n + m)
→ Worst-case complexity: O(max{n log n, m log m})
8/79
Definitions

• Given a list of records (R1 , R2 , . . . , Rn ), where each record has a
key Ki . The sorting problem is that of finding a permutation σ
such that Kσ(i) ≤ Kσ(i+1) , 1 ≤ i ≤ n − 1. The desired
ordering is (Rσ(1) , Rσ(2) , . . . , Rσ(n) ).
• If a list has several identical key values, the permutation σ is
not unique. Let σs be the permutation with the following
properties:
1. Kσs(i) ≤ Kσs(i+1) , 1 ≤ i ≤ n − 1.
2. If i < j and Ki == Kj in the input list, then Ri precedes Rj in
the sorted list.
→ A sorting method that produces σs is stable.

9/79
An Example of Stable Sorting

1. Schedule
2. Destination

10/79
Categories of Sorting Methods

• Internal Method: used when the list to be sorted is small enough
that the entire sort can be carried out in main memory.
+ Insertion sort, quick sort, merge sort, heap sort, and
radix sort.
• External Method: used on larger lists that do not fit in memory.

11/79
Insertion Sort
Insertion Sort

void insert(element e, element a[], int i)
{ /* insert e into the sorted list a[1:i] so that a[1:i+1] is sorted */
a[0] = e; /* sentinel */
while (e.key < a[i].key) {
a[i+1] = a[i];
i--;
}
a[i+1] = e;
}

void insertionSort(element a[], int n)
{ /* sort a[1:n] */
int j;
for ( j = 2; j <= n; j++) {
element temp = a[j];
insert(temp, a, j-1);
}
}

Example: 5 2 4 6 1 3
2 5 4 6 1 3
2 4 5 6 1 3
2 4 5 6 1 3
1 2 4 5 6 3
1 2 3 4 5 6

Loop invariant: at the start of each iteration of the for loop,
a[1..j − 1] contains its original elements, but sorted.

12/79
Analysis of Insertion Sort

• In the worst case, insert(e, a, i) makes i + 1 comparisons


before making the insertion.
Thus, the total cost is O( Σ_{i=1}^{n−1} (i + 1) ) = O(n²).
• Variations
• Binary insertion sort:
The number of comparisons in an insertion sort can be reduced
if we replace the sequential search by binary search. The
number of record moves remains the same.
• List insertion sort:
The elements of the list are represented as a linked list rather
than an array. The number of record moves becomes zero.
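The binary insertion sort variant can be sketched as follows (on plain int keys, 0-based, a hypothetical simplification of the element/key records used above); note that the binary search cuts comparisons to O(log i) per insertion, while the shift still moves O(i) records:

```c
#include <assert.h>

/* Binary insertion sort sketch on int keys in a[0..n-1]. */
void binaryInsertionSort(int a[], int n)
{
    for (int j = 1; j < n; j++) {
        int key = a[j];
        int lo = 0, hi = j;   /* find first position in a[0..j-1] with a[pos] > key */
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] <= key) lo = mid + 1;   /* <= keeps equal keys stable */
            else hi = mid;
        }
        for (int t = j; t > lo; t--) a[t] = a[t-1];   /* shift right */
        a[lo] = key;
    }
}
```

Because the shifts dominate, the total cost stays O(n²), as stated above.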

13/79
Quick Sort
Quick Sort

Hoare [1962]
• the best practical sorting algorithm
• in place sorting
• worst-case running time: Θ(n²) on n numbers
• average-case running time: Θ(n lg n)

14/79
Description of Quick Sort

Divide-and-conquer paradigm

• Divide:
Partition a[p..r] around a pivot x into a[p..q − 1] (keys ≤ x),
a[q] = x, and a[q + 1..r] (keys > x).
• Conquer:
Sort the two subarrays a[p..q − 1] and a[q + 1..r ] by recursive calls
to quick sort.
• Combine:
Since the subarrays are sorted in place, no work is needed to
combine them

15/79
Algorithm quickSort

void quickSort(element a[], int left, int right)
{ /* a[left].key: the pivot key */
int pivot, i, j;
element temp;
if (left < right) {
i = left; j = right+1;
pivot = a[left].key;
do { /* partition by swapping */
do i++; while(a[i].key < pivot);
do j--; while(a[j].key > pivot);
if (i<j) SWAP(a[i], a[j], temp);
} while (i<j);
SWAP(a[left], a[j], temp);
quickSort(a, left, j-1);
quickSort(a, j+1, right);
}
}

Trace (pivot = 26):
26 5 37 1 61 11 59 15 48 19
26 5 19 1 61 11 59 15 48 37
26 5 19 1 15 11 59 61 48 37
11 5 19 1 15 26 59 61 48 37
1 5 11 19 15 26 59 61 48 37
1 5 11 15 19 26 59 61 48 37
1 5 11 15 19 26 48 37 59 61
1 5 11 15 19 26 37 48 59 61

16/79
Worst-Case Partitioning

• The worst-case behavior for quick sort occurs when the


partitioning routine produces one subproblem with n − 1
elements and one with 0 elements.
• Assume T(0) = Θ(1). Then

T(n) = T(n − 1) + T(0) + n − 1
= T(n − 2) + (n − 2) + (n − 1) + 2T(0)
= · · ·
= T(0) + (0 + 1 + 2 + · · · + (n − 1)) + nT(0)
= Θ(n²)

(The recursion tree degenerates into a chain of subproblems of sizes
n, n − 1, n − 2, . . . , each paired with an empty subproblem.)

17/79
Best-Case Partitioning

• In the most even possible split, PARTITION produces two
subproblems of sizes ⌈n/2⌉ and ⌊n/2⌋ − 1.

• Then T (n) ≤ 2T (n/2) + Θ(n)


⇒ we have T (n) = O(n lg n)
• Thus the equal balancing of the two sides of the partition at
every level of the recursion produces an asymptotically faster
algorithm.

18/79
Average-Case Running Time

In the average case, PARTITION produces a mix of ”good” and


”bad” splits.
• Consider
Basic operation: the comparison of a[i] with a pivot a[r ] in
PARTITION.
Input size: n, the number of items in a
• Assume that the value of the pivot is equally likely to be any of
the numbers from 1 through n (probability 1/n each); partitioning
takes n − 1 comparisons. Then

T(n) = Σ_{p=1}^{n} (1/n) [T(p − 1) + T(n − p)] + (n − 1)
19/79
Since Σ_{p=1}^{n} (1/n)[T(p − 1) + T(n − p)] = (2/n) Σ_{p=1}^{n} T(p − 1), we have

T(n) = (2/n) Σ_{p=1}^{n} T(p − 1) + n − 1.

"×n": nT(n) = 2 Σ_{p=1}^{n} T(p − 1) + n(n − 1)    (1)

"n → n − 1": (n − 1)T(n − 1) = 2 Σ_{p=1}^{n−1} T(p − 1) + (n − 1)(n − 2)    (2)

"(1) − (2)" ⇒ nT(n) − (n − 1)T(n − 1) = 2T(n − 1) + 2(n − 1)

⇒ T(n)/(n + 1) = T(n − 1)/n + 2(n − 1)/(n(n + 1))

20/79
T(n)/(n + 1) = T(n − 1)/n + 2(n − 1)/(n(n + 1))
T(n − 1)/n = T(n − 2)/(n − 1) + 2(n − 2)/((n − 1)n)
. . .
T(2)/3 = T(1)/2 + 2/6

Using (n − 1)/(n(n + 1)) = 2/(n + 1) − 1/n and
(n − 2)/((n − 1)n) = 2/n − 1/(n − 1), summing ⇒

T(n)/(n + 1) = 4/(n + 1) + 2(1/n + 1/(n − 1) + · · · + 1/3) − 1
= 2 Σ_{k=3}^{n} 1/k + 4/(n + 1) − 1
= 2H_n + 4/(n + 1) − 4

T(n) ≈ 2(n + 1) ln n ∈ Θ(n lg n)
 

21/79
Harmonic Numbers

H_n = Σ_{k=1}^{n} 1/k = 1 + 1/2 + 1/3 + · · · + 1/n
= ln n + γ + 1/(2n) − 1/(12n²) + 1/(120n⁴) − 1/(252n⁶) + · · ·

where γ ≈ 0.5772156649 (Euler's constant)

22/79
Variations of Quick Sort

1. Median of three:
Pick the median of the first, middle, and last keys in the
current sublist as the pivot. Thus,
pivot = median{K_l , K_(l+r)/2 , K_r }
→ use this median as the pivot
2. Randomized Quick Sort:
Pick a key randomly.
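The median-of-three selection can be sketched as follows (a hypothetical helper on plain int keys rather than the element records above); it returns the index, among l, (l+r)/2, and r, holding the median key:

```c
#include <assert.h>

/* Median-of-three pivot selection sketch: for indices l, m = (l+r)/2,
   and r, the median is the key lying between the other two. */
int medianOfThree(const int a[], int l, int r)
{
    int m = (l + r) / 2;
    if ((a[l] <= a[m]) == (a[m] <= a[r])) return m;  /* a[m] between the others */
    if ((a[m] <= a[l]) == (a[l] <= a[r])) return l;  /* a[l] between the others */
    return r;
}
```

Swapping the chosen element into position left before partitioning avoids the worst case on already-sorted input.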

23/79
Optimal Sorting Time
How Fast Can We Sort?

• We will provide a lower bound, then beat it.


How do you suppose we’ll beat it?
• All of the sorting algorithms so far are
comparison sorts.
• Basic operation: the pairwise
comparison of two elements
• Theorem: all comparison sorts are Ω(n lg n)

24/79
Decision Trees

• Decision trees provide an abstraction of comparison sorts


• A decision tree represents the comparisons made by a comparison sort.
• The decision tree for insertion sort operating on three elements.
{6, 8, 5} 1:2
≤ >

2:3 1:3
≤ > ≤ >
< 1, 2, 3 > 1:3 < 2, 1, 3 > 2:3
≤ > ≤ >
< 1, 3, 2 > < 3, 1, 2 > < 2, 3, 1 > < 3, 2, 1 >

There are 3! = 6 possible permutations (leaves)

25/79
Lower Bounds

• A lower bound of a problem is the least time complexity


required for any algorithm which can be used to solve this
problem.
• worst case lower bound (lower bound)
• average case lower bound
• The lower bound for a problem is not unique.
+ as tight as possible
• e.g. Ω(1), Ω(n), Ω(n log n) are all lower bounds for sorting.
• Ω(1), Ω(n) are trivial
• If the present lower bound is Ω(n log n) and there is an
algorithm with time complexity O(n log n), then the algorithm
is optimal.

26/79
Lower Bound for Sorting

• The length of the longest path from the root of a decision


tree to any of its reachable leaves represents the worst-case
number of comparisons that the corresponding sorting
algorithm performs.
→ the worst-case number of comparisons
= the height of its decision tree.
• The worst-case time complexity:
the longest path from the top of the tree to a leaf node
• Balanced tree has the smallest depth:

dlog(n!)e = Ω(n log n)

→ lower bound for sorting: Ω(n log n)

27/79
Theorem
Any algorithm that sorts only by comparisons must have a
worst-case computing time of Ω(n log n)

log(n!) = log(n(n − 1)(n − 2) · · · 1)
= log 2 + log 3 + · · · + log n
> ∫₁ⁿ log x dx
= log e ∫₁ⁿ ln x dx
= log e [x ln x − x]₁ⁿ
= log e (n ln n − n + 1)
= n log n − n log e + log e
≈ n log n − 1.44n + 1.44
≥ n log n − 1.44n
= Ω(n log n)
28/79
Merge Sort
Example: Merging

Merge two sorted lists into one sorted list

2 4 5 7 ∞ 1 2 3 6 ∞

1 2 2 3 4 5 6 7

29/79
Merging Two Sorted Lists

void merge(element initList[], element mergedList[], int i, int m, int n)
{ /* merge initList[i:m] and initList[m+1:n] into mergedList[i:n] */
int j, k, t;
j = m+1;
k = i;
while( i<=m && j<=n) {
if (initList[i].key <= initList[j].key)
mergedList[k++] = initList[i++];
else
mergedList[k++] = initList[j++];
}
if(i>m) /* mergedList[k:n] = initList[j:n] */
for (t = j; t<=n; t++)
mergedList[t] = initList[t];
else /* mergedList[k:n] = initList[i:m] */
for (t = i; t<=m; t++)
mergedList[k+t-i] = initList[t];
}

Time: O(n − i + 1)

30/79
Iterative Merge Sort

31/79
void mergePass(element initList[], element mergedList[], int n, int s)
{
int i, j;
for( i=1; i<= n - 2 * s + 1; i+= 2*s)
merge(initList, mergedList, i, i+s-1, i+2*s-1);
if ( i+s-1 < n)
merge(initList, mergedList, i, i+s-1, n);
else
for( j=i; j<=n; j++)
mergedList[j] = initList[j];
}

void mergeSort(element a[], int n)


{ /* sort a[1:n] */
int s =1;
element extra[MAX_SIZE];
while(s<n) {
mergePass(a, extra, n, s);
s*=2;
mergePass(extra, a, n, s);
s*=2;
}
}
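The same bottom-up scheme can be sketched self-contained on plain int keys (0-based, hypothetical names, with the element records and MAX_SIZE of the slides simplified away):

```c
#include <assert.h>

/* Merge the sorted runs src[lo..mid-1] and src[mid..hi-1] into dst. */
static void mergeRun(const int src[], int dst[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Iterative (bottom-up) merge sort: merge runs of length s = 1, 2, 4, ... */
void mergeSortInts(int a[], int tmp[], int n)
{
    for (int s = 1; s < n; s *= 2) {
        for (int lo = 0; lo < n; lo += 2*s) {
            int mid = lo + s;   if (mid > n) mid = n;
            int hi  = lo + 2*s; if (hi > n)  hi = n;
            mergeRun(a, tmp, lo, mid, hi);
        }
        for (int i = 0; i < n; i++) a[i] = tmp[i];   /* copy back each pass */
    }
}
```

Copying back after every pass is simpler than the slides' ping-pong between a and extra, at the cost of one extra O(n) copy per pass.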
32/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

5 2 4 7

5 2

33/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

2 5 4 7

5 2 4 7

34/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

2 5 4 7

5 2 4 7

35/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

2 4 5 7 1 3 2 6

2 5 4 7

5 2 4 7

36/79
Example: Recursive Merge Sort

5 2 4 7 1 3 2 6

2 4 5 7 1 2 3 6

2 5 4 7 1 3 2 6

5 2 4 7 1 3 2 6

37/79
Example: Recursive Merge Sort

1 2 2 3 4 5 6 7

2 4 5 7 1 2 3 6

2 5 4 7 1 3 2 6

5 2 4 7 1 3 2 6

38/79
Dividing Phase

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6

5 2 4 7 1 3 2 6
 
Divide
 

39/79
Merging Phase

1 2 2 3 4 5 6 7

2 4 5 7 1 2 3 6

2 5 4 7 1 3 2 6

5 2 4 7 1 3 2 6
 
Conquer and Combine
 

40/79
Dividing Phase: Recursive Merge Sort

41/79
Merging Phase: Recursive Merge Sort

42/79
int listMerge(element a[], int link[], int start1, int start2)
{ int last1, last2, lastResult = 0;
for(last1 = start1, last2 = start2; last1 && last2; )
if (a[last1].key <= a[last2].key) {
link[lastResult] = last1;
lastResult = last1;
last1 = link[last1];
}
else {
link[lastResult] = last2;
lastResult = last2;
last2 = link[last2];
}
if(last1 == 0) link[lastResult] = last2;
else link[lastResult] = last1;
return link[0];
}

int rmergeSort(element a[], int link[], int left, int right)


{ if(left >= right) return left;
int mid = (left + right)/2;
return listMerge(a, link,rmergeSort(a, link, left, mid),
rmergeSort(a, link, mid+1, right));
}
43/79
Analysis of Merge Sort

• Time Complexity:

T(n) = O(1), if n = 1
T(n) = 2T(n/2) + O(n), if n > 1
⇒ T(n) = O(n log n)

44/79
Variations: Natural Merge Sort

45/79
Heap Sort
Recall: Heaps

• A max tree is a tree in which the key value in each node is no


smaller than the key values in its children. A max heap is a
complete binary tree that is also a max tree.

A max heap is stored as a complete binary tree in an array: the node
at index i has children at indices 2i and 2i + 1.

index: 1 2 3 4 5 6 7 8 9 10
key: 16 14 10 8 7 9 3 2 4 1

(Tree: 16 at the root; children 14 and 10; 14 has children 8 and 7;
10 has children 9 and 3; 8 has children 2 and 4; 7 has child 1.)

46/79
Inserting 21 into a Max Heap

20

15 2

14 4

20 20 21

15 2 15 15 20 15 20

14 4 14 4 2 14 4 2 14 4 2

47/79
Deleting from a Max Heap

21 20 20

15 20 15 20 15 15 2

14 4 2 14 4 2 14 4 14 4

Delete Report 21 temp ← 2 Done

Delete

15 15 15

14 2 14 2 2 15 2

4 14 14 4

Done temp ← 4 Report 20


48/79
Adjusting a Max Heap

void adjust(element a[], int root, int n)


{ int child, rootkey;
element temp;
temp = a[root];
rootkey = a[root].key;
child = 2*root; /* left subtree */
while(child <= n) {
if ((child < n) && (a[child].key < a[child+1].key))
child++;
if (rootkey > a[child].key) break;
else {
a[child/2] = a[child]; /* move to parent */
child*=2;
}
}
a[child/2] = temp;
}

for (i = n/2; i>0; i--)
adjust(a, i, n); /* establish a max heap */
49/79
Building a Max Heap (1/2)

4 4
1 3 1 3

2 16 9 10 2 16 9 10

14 8 7 14 8 7

4 4
1 3 1 10

14 16 9 10 14 16 9 3

2 8 7 2 8 7

50/79
Building a Max Heap (2/2)

4 16
16 10 4 10

14 7 9 3 14 7 9 3

2 8 1 2 8 1

16 16
14 10 14 10

4 7 9 3 8 7 9 3

2 8 1 2 4 1

51/79
Analysis of adjust

Let L = ⌊lg n⌋ be the height of the heap, and let i be the level of an
internal node; adjusting a node at level i costs at most 2(L − i)
comparisons.
→ the number of comparisons to build the heap is at most:

Σ_{i=0}^{L−1} 2(L − i)2^i = 2L Σ_{i=0}^{L−1} 2^i − 2 Σ_{i=0}^{L−1} i·2^i
= 2L(2^L − 1) − 2(2^L(L − 2) + 2)
= 4·2^L − 2L − 4
= 4·2^⌊lg n⌋ − 2⌊lg n⌋ − 4
≤ cn (if c ≥ 4)

Note: Σ_{i=0}^{L−1} i·2^i = 2^L(L − 2) + 2

52/79
Compute Σ_{i=0}^{L−1} i·2^i

Let S = Σ_{i=1}^{L−1} i·2^i. Then

S = 1·2¹ + 2·2² + 3·2³ + 4·2⁴ + · · · + (L − 1)·2^{L−1}
2S = 1·2² + 2·2³ + 3·2⁴ + · · · + (L − 2)·2^{L−1} + (L − 1)·2^L

→ S − 2S = 2 + 2² + 2³ + · · · + 2^{L−1} − (L − 1)·2^L

S = (L − 1)·2^L − 2(1 − 2^{L−1})/(1 − 2)
= 2^L(L − 1) + 2 − 2^L
= 2^L(L − 2) + 2

53/79
Algorithm heapSort

void heapSort(element a[], int n)


{ /* sort a[1:n] */
int i, j;
element temp;
for( i = n/2; i>0; i--)
adjust(a, i, n);
for( i = n-1; i>0; i--) {
SWAP(a[1], a[i+1], temp);
adjust(a, 1, i);
}
}
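The adjust/heapSort pair above can be sketched self-contained on plain int keys (hypothetical names, same 1-based layout: a[1..n] with a[0] unused):

```c
#include <assert.h>

/* Sift a[root] down in the max heap a[1..n], as in adjust() above. */
static void adjustInt(int a[], int root, int n)
{
    int temp = a[root], child = 2*root;
    while (child <= n) {
        if (child < n && a[child] < a[child+1]) child++;  /* larger child */
        if (temp >= a[child]) break;
        a[child/2] = a[child];   /* move child up */
        child *= 2;
    }
    a[child/2] = temp;
}

void heapSortInts(int a[], int n)
{
    for (int i = n/2; i > 0; i--) adjustInt(a, i, n);  /* build max heap */
    for (int i = n-1; i > 0; i--) {                    /* repeatedly extract max */
        int t = a[1]; a[1] = a[i+1]; a[i+1] = t;
        adjustInt(a, 1, i);
    }
}
```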

54/79
Example: Heap Sort (1/2)

16 14
14 10 8 10

8 7 9 3 4 7 9 3

2 4 1 2 1 16

10 9
8 9 8 3

4 7 1 3 4 7 1 2

2 14 16 10 14 16

55/79
Example: Heap Sort (2/2)

8 7
7 3 4 3

4 2 1 9 1 2 8 9

10 14 16 10 14 16

1
2 3

4 7 8 9

10 14 16 1 2 3 4 7 8 9 10 14 16

56/79
Analysis of Heap Sort

Each call adjust(a, 1, i) on a heap of i nodes makes at most 2⌊lg i⌋
comparisons, so the second phase costs at most

Σ_{i=1}^{n−1} 2⌊lg i⌋ = 2(n⌊lg n⌋ − 2^{⌊lg n⌋+1} + 2)
= 2n⌊lg n⌋ − 4·2^{⌊lg n⌋} + 4
= 2n⌊lg n⌋ − cn + 4, where 2 < c ≤ 4

57/79
Compute Σ_{i=1}^{n−1} ⌊lg i⌋

⌊lg 1⌋ = 0 → 1 term
⌊lg 2⌋ = ⌊lg 3⌋ = 1 → 2 terms
⌊lg 4⌋ = ⌊lg 5⌋ = ⌊lg 6⌋ = ⌊lg 7⌋ = 2 → 4 terms
⌊lg 8⌋ = · · · = ⌊lg 15⌋ = 3 → 8 terms
(in general, 2^k terms of value k, contributing k·2^k)

Σ_{k=0}^{m−1} k·2^k = 2^m(m − 2) + 2

Σ_{i=1}^{n−1} ⌊lg i⌋ = Σ_{i=1}^{⌊lg n⌋−1} i·2^i + (n − 2^{⌊lg n⌋})⌊lg n⌋
= 2^{⌊lg n⌋}(⌊lg n⌋ − 2) + 2 + (n − 2^{⌊lg n⌋})⌊lg n⌋
= n⌊lg n⌋ − 2^{⌊lg n⌋+1} + 2

58/79
Summary of Internal Sorting

+ No one method is best for all conditions.


• Insertion sort is good when the list is already partially ordered,
and it is best for small n.
• Merge sort has the best worst-case behavior but needs more
storage than heap sort.
• Quick sort has the best average behavior, but its worst-case
behavior is O(n²).
• The behavior of radix sort depends on the size of the keys and
the choice of r .

59/79
Comparison

Algorithm       Time Complexity       Notes
insertion sort  O(n²)                 slow; stable; for small data sets (< 1K)
quick sort      O(n log n) expected   unstable; fastest in practice (good for large inputs)
merge sort      O(n log n)            stable; sequential data access; for huge data sets (> 1M)
heap sort       O(n log n)            fast; unstable; for large data sets (1K – 1M)

60/79
Radix Sort
Sorting Several Keys

• Consider the problem of sorting records on several keys, K 1 , K 2 , . . . , K r


(K 1 is the most significant key). A list of records R1 , R2 , . . . , Rn are
said to be sorted with respect to the keys K 1 , K 2 , . . . , K r iff for
every pair of records i and j, i < j and
(Ki1 , Ki2 , . . . , Kir ) ≤ (Kj1 , Kj2 , . . . , Kjr ).
• (x1 , x2 , . . . , xr ) ≤ (y1 , y2 , . . . , yr ) iff either xi = yi , 1 ≤ i ≤ j, and
xj+1 < yj+1 for some j < r or xi = yi , 1 ≤ i ≤ r .
• Two popular ways to sort on multiple keys
• Most-significant-digit-first (MSD):
Ex: 2♣, . . . , A♣, 2♦, . . . , A♦, 2♥, . . . , A♥, 2♠, . . . , A♠
1: sort on suits → four piles
2: sort on face values for each pile independently.
• Least-significant-digit-first (LSD):
1: sort on face values
→ 2♠, 2♣, 2♥, 2♦, 3♥, 3♠, 3♣, 3♦, . . . , A♣, A♥, A♠, A♦
2: sort on suits for the whole list (stable sort)
61/79
MSD and LSD

• LSD and MSD define only the order in which the keys are to
be sorted; they do not specify how each key is sorted.
• Bin sort can be used for sorting on each key. The complexity
of bin sort is O(n) if there are n bins.
• LSD and MSD can be used even when there is only one key.
• If the keys are numeric, then each decimal digit may be
regarded as a subkey.
→ radix sort
• In radix sort, we decompose the sort key using some radix r.
• The number of bins needed is r.
• Each key has d digits in the range 0 to r − 1.

62/79
Radix Sort

int radixSort(element a[], int link[], int d, int r, int n)
{ /* keys have d digits, radix r; returns index of the first element */
int front[r], rear[r];
int i, bin, current, first, last;
first = 1;
for( i = 1; i < n; i++) link[i] = i+1;
link[n] = 0;
for( i = d-1; i >= 0; i--) { /* d passes, least significant digit first */
for(bin = 0; bin < r; bin++) front[bin] = 0;
for(current = first; current; current = link[current]) {
/* distribute into bins: O(n) */
bin = digit(a[current], i, r);
if( front[bin] == 0) front[bin] = current;
else link[rear[bin]] = current;
rear[bin] = current;
}
for(bin = 0; !front[bin]; bin++) ;
first = front[bin];
last = rear[bin];
for(bin++; bin < r; bin++) /* concatenate the bins: O(r) */
if(front[bin]) {
link[last] = front[bin];
last = rear[bin];
}
link[last] = 0;
}
return first; /* total: O(d(n + r)) */
}
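The digit() helper called above is not defined in the slides. One plausible definition (a sketch with an explicit digit-count parameter d added to the signature, assuming non-negative integer keys, digits numbered 0 = most significant through d − 1 = least significant):

```c
#include <assert.h>

/* Return digit i of key in radix r, where digit 0 is the most
   significant of d digits; pass i = d-1, ..., 0 thus extracts digits
   from least to most significant, as radixSort's loop requires. */
int digit(int key, int i, int r, int d)
{
    for (int p = 0; p < d - 1 - i; p++) key /= r;  /* drop lower digits */
    return key % r;
}
```

For example, with r = 10 and d = 3, the first pass (i = 2) of radixSort on 984 would bin it by digit 4, the ones digit.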
63/79
Example: Radix Sort (1/3)

179 208 306 93 859 984 55 9 271 33

Pass 1 (ones digit):
bin[1]: 271
bin[3]: 93, 33
bin[4]: 984
bin[5]: 55
bin[6]: 306
bin[8]: 208
bin[9]: 179, 859, 9
Concatenated: 271 93 33 984 55 306 208 179 859 9

64/79
Example: Radix Sort (2/3)

271 93 33 984 55 306 208 179 859 9

Pass 2 (tens digit):
bin[0]: 306, 208, 9
bin[3]: 33
bin[5]: 55, 859
bin[7]: 271, 179
bin[8]: 984
bin[9]: 93
Concatenated: 306 208 9 33 55 859 271 179 984 93

65/79
Example: Radix Sort (3/3)

306 208 9 33 55 859 271 179 984 93

Pass 3 (hundreds digit):
bin[0]: 9, 33, 55, 93
bin[1]: 179
bin[2]: 208, 271
bin[3]: 306
bin[8]: 859
bin[9]: 984
Concatenated: 9 33 55 93 179 208 271 306 859 984

66/79
External Sorting
External Sorting

• There are some lists that are too large to fit in the memory of a
computer. So internal sorting is not possible.
• Some records are stored in the disk (tape, etc.). System retrieves a
block of data from a disk at a time. A block contains multiple
records.
• The most popular method for sorting on external storage devices is
merge sort.
• Segments of the input list are sorted.
• Sorted segments (called runs) are written onto external
storage.
• Runs are merged until only one run is left.
• Three factors contributing to the read/write time of disk:
• seek time
• latency time
• transmission time
67/79
Example: External Sort (1/2)

• Consider a computer capable of internally sorting only 750 records
at a time, used to sort 4500 records.
• Six runs are generated, each containing 750 sorted records
=⇒ 4 passes

68/79
Example: External Sort (2/2)

ts = maximum seek time
tl = maximum latency time
trw = time to read or write one block of 250 records
tIO = time to input or output one block = ts + tl + trw
tIS = time to internally sort 750 records
n·tm = time to merge n records from input buffers to the output buffer

Analysis:
Operation Time
1 read 18 blocks of input, 18tIO , internally sort, 36tIO + 6tIS
6tIS , write 18 blocks, 18tIO
2 merge runs 1 to 6 in pairs 36tIO + 4500tm
3 merge two runs of 1500 records each, 12 blocks 24tIO + 3000tm
4 merge one run of 3000 records with one run of 36tIO + 4500tm
1500 records

132tIO +12000tm +6tIS


69/79
k-Way Merging

• To merge m runs via 2-way merging, we need ⌈log₂ m⌉ + 1 passes
over the data.
• If we use a higher-order merge, the number of passes is reduced.
• With k-way merge on m runs, we need ⌈log_k m⌉ + 1 passes.
• But is it always true that the higher the order of merging, the
less computing time we need?
• Not necessarily!
• k − 1 comparisons are needed to determine the next output.
• If a loser tree is used to reduce the number of comparisons, we
can achieve a complexity of O(n log₂ m).
• The data block size shrinks as k increases; smaller blocks
mean more disk accesses.
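The pass count can be checked with a small sketch (hypothetical helper name) that counts the run-generation pass plus the merge passes:

```c
#include <assert.h>

/* Total passes over the data for k-way merging of m initial runs:
   1 run-generation pass plus ceil(log_k m) merge passes, counted by
   repeatedly dividing the number of runs by k (rounding up). */
int mergePasses(int m, int k)
{
    int passes = 0;
    for (int runs = m; runs > 1; runs = (runs + k - 1) / k)
        passes++;
    return 1 + passes;
}
```

With m = 6 and k = 2 this gives 4 passes, matching the 4500-record example above; a 3-way merge of the same six runs would need only 3.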

70/79
Optimal Merging of Runs

Merging four runs of sizes 2, 4, 5, 15:
(a) skewed tree: merge 2 + 4, then the result with 5, then with 15
(b) balanced tree: merge 2 + 4 and 5 + 15, then merge the results

weighted external path length (a) = 2·3 + 4·3 + 5·2 + 15·1 = 43
weighted external path length (b) = 2·2 + 4·2 + 5·2 + 15·2 = 52

71/79
Huffman Algorithm

void huffman(treePointer heap[], int n)


{/* heap[1:n] */
treePointer tree;
int i;
initialize(heap, n);

for (i = 1; i < n; i++) {


MALLOC(tree, sizeof(*tree));
tree->leftChild = pop(&n);
tree->rightChild = pop(&n);
tree->weight = tree->leftChild->weight + tree->rightChild->weight;
push(tree,&n);
}
}

72/79
Example: Huffman Tree

2 3 5 7 9 13

10

5 5 5 16

2 3 2 3 9 7

(a) (b) (c)

23 39
(d) (e)

10 13 16 23

5 5 9 7 10 13

2 3 5 5

2 3
73/79
Application: File Compression

+ 100 distinct characters → ⌈log₂ 100⌉ = 7 bits per character

• Using codes for the characters in a file to reduce the file size
• Example: a file contains only the characters a, e, i, s, t, blank
spaces, and newlines
• Use three bits to code each character
• size = 3 · 58 = 174

Character Code Frequency Total Bits
a 000 10 30
e 001 15 45
i 010 12 36
s 011 3 9
t 100 4 12
space 101 13 39
newline 110 1 3
Total  58 174

• Question: can we use another coding to reduce the file size?
→ Yes, by allowing the code length to vary from character to character
• Short codes for frequently occurring characters

74/79
Tree Representation of Code

Cost: Σ_i d_i f_i
d_i: depth of the code for character i
f_i: frequency of character i

(Two code trees for a, e, i, s, t, sp, nl: one balanced tree with every
character at depth 3, and one with nl moved up to depth 2.)
75/79
File Compression Problem

• Property of the tree representation of an optimal code
→ all nodes either are leaves or have two children
• Prefix code
→ no character code is a prefix of another character code
• No character is stored in a non-leaf node
• Provides unambiguous decoding
• Problem of file compression:
find a full binary tree with minimum total cost, where all
characters are in the leaves.

76/79
Huffman’s Codes

Huffman’s algorithm for file compression


• A greedy method
• Outline of the algorithm:
• Maintain a forest of trees: Initially, each character is a tree
• Weight of tree = sum of frequencies of its leaves
• Select two trees T1 and T2 with smallest weights to form a
new tree with subtrees T1 and T2 until one tree remains

Character Frequency

a 10
e 15
i 12
s 3
t 4
space 13
newline 1

77/79
Steps of Huffman’s Algorithm

(a) initial forest: nl:1, s:3, t:4, a:10, i:12, sp:13, e:15
(b) merge nl(1) + s(3) → tree of weight 4
(c) merge 4 + t(4) → 8
(d) merge 8 + a(10) → 18
(e) merge i(12) + sp(13) → 25
(f) merge 18 + e(15) → 33
78/79
Huffman Code Assignment

(g) final tree of weight 58: merge 33 + 25; left subtree 33 = (18, e),
with 18 = (8, a), 8 = (4, t), 4 = (nl, s); right subtree 25 = (i, sp).

Character Code Frequency Total Bits
a 001 10 30
e 01 15 30
i 10 12 24
s 00001 3 15
t 0001 4 16
space 11 13 26
newline 00000 1 5
Total  58 146

100010000101 → iase
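The 146-bit total in the table is just Σ_i d_i f_i with the code lengths as depths; a minimal sketch (hypothetical helper, arrays ordered a, e, i, s, t, space, newline as above):

```c
#include <assert.h>

/* Cost of a prefix code: sum of (code length x character frequency). */
int totalBits(const int len[], const int freq[], int n)
{
    int bits = 0;
    for (int i = 0; i < n; i++) bits += len[i] * freq[i];
    return bits;
}
```

This confirms the improvement from 174 bits under the fixed 3-bit code to 146 bits under the Huffman code.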

79/79
