LECTURE 2 SORTING
Beakal Gizachew
Based on slides of Serdar Taşıran, Shafi Goldwasser, Erik Demaine, and Alptekin Küpçü
SORTING
• Sorting
   • Input: a sequence of n numbers <a1,a2,…,an>
   • Output: a re-ordering <a’1,a’2,…,a’n> of the sequence such that a’1  a’2
      …  a’n
   • Example:
       • Input 8 2 4 9 3 6
       • Output 2 3 4 6 8 9
       • Called an instance of the problem
• Check these demos:
    • http://www.iti.fh-
      flensburg.de/lang/algorithmen/sortieren/sortcontest/sortcontest.htm
    • http://www.cs.oswego.edu/~mohammad/classes/csc241/samples/sort/Sor
      t2-E.html
    • http://www.cs.ubc.ca/~harrison/Java/sorting-demo.html
    • http://www.cs.pitt.edu/~kirk/cs1501/animations/Sort3.html
                                                                            2
INSERTION SORT
• Takes array A[1..n] containing a sequence of length n to be sorted
• Sorts the array in place
  • Numbers re-arranged inside A with at most a constant number of
    them stored outside.
  • O(1) extra space
                                                                       3
4
    Insertion Sort
                     To insert 12, we need to make
                     room for it by moving first 36
                     and then 24.
5
    Insertion Sort
6
    Insertion Sort
7
    Insertion Sort
INSERTION SORT
                  INSERTION-SORT (A, n) ⊳ A[1 . . n]
                    for j ← 2 to n
                          do key ← A[ j ]
                             i←j–1
 “pseudocode”
                             while i > 0 and A[ i ] > key
                                  do A[i+1] ← A[ i]
                                      i←i–1
                             A[i+1] = key
     1     i          j                       n
A:
                    key
         sorted
                                                       8
EXAMPLE OF INSERTION SORT
     8   2   4   9   3   6
     2   8   4   9   3   6
     2   4   8   9   3   6
     2   4   8   9   3   6
     2   3   4   8   9   6
     2   3   4   6   8   9   done
                                    9
EXAMPLE OF INSERTION SORT
     8   2   4   9   3   6
     2   8   4   9   3   6
     2   4   8   9   3   6
     2   4   8   9   3   6
     2   3   4   8   9   6
     2   3   4   6   8   9   done
                                    10
CORRECTNESS OF INSERTION
SORT
               INSERTION-SORT (A, n) ⊳ A[1 . . n]
                  for j ← 2 to n
                        do key ← A[ j ]
                           i←j–1
                           while i > 0 and A[ i ] > key
                                do A[i+1] ← A[ i]
                                    i←i–1
                           A[i+1] = key
• Loop invariant:
   • At the start of each iteration, the subarray A[1..j-1] containsthe
     elements originally in A[1..j-1] but in sorted (increasing) order
• Prove initialization, maintenance, termination
• Invariant must imply interesting property about algorithm
                                                                          11
RUNNING TIME
• The running time of insertion sort depends on the input: e.g., an
  already sorted sequence is easier to sort.
• Major Simplifying Convention: Parameterize the running time by
  the size of the input, since short sequences are easier to sort than
  long ones.
  ➢ TA(n) = time of A when run with inputs of length n
       ➢ n: Number of bits required to encode input
  ➢ Ignore machine-dependent constants
  ➢ Look at growth of T(n) as n → ∞
       ➢ For small inputs, the algorithm will run fast anyway. Important
         thing is for how big inputs we can still run the algorithm given a
         reasonable amount of time.
                                                                              12
KINDS OF ANALYSES
• Worst-case: (mostly-used)
  • T(n) = maximum time of algorithm on any input of size n.
  • Generally, we seek upper bounds on the running time, to have a
    guarantee of performance.
• Average-case: (sometimes used)
   • T(n) = expected time of algorithm over all inputs of size n.
   • Need assumption of statistical distribution of real inputs.
• Best-case: (almost never used)
   • T(n) = best possible time of algorithm over any input of size n.
   • Cheat with a slow algorithm that works fast on some input.
   • May be useful when proving negative results
        • e.g., Even the best case of algorithm X is slower than the worst
          case of algorithm Y.
                                                                             13
           Analysis of Insertion Sort
INSERTION-SORT(A)                                                     cost       times
    for j ← 2 to n                                               c1                 n
         do key ← A[ j ]                                         c2                n-1
            Insert A[ j ] into the sorted sequence A[1 . . j -1] 0                 n-1
               i←j-1                                             c4                n-1
                                                                                   
                                                                                      n
               while i > 0 and A[i] > key                        c5                   j =2 j
                                                                                              t
                                                                 c6                
                                                                                      n
                    do A[i + 1] ← A[i]                                                j =2
                                                                                              (t j − 1)
                                                                 c7                
                                                                                      n
                          i←i–1                                                        j =2
                                                                                              (t j − 1)
              A[i + 1] ← key                                     c8                 n-1
           tj: # of times the while statement is executed at iteration j
T (n) = c1n + c2 (n − 1) + c4 (n − 1) + c5  t j + c6  (t j − 1) + c7  (t j − 1) + c8 (n − 1)
                                            n           n               n
                                           j =2        j =2            j =2
                                                                                              14
INSERTION SORT ANALYSIS
Worst case: Input reverse sorted.
      T(n) = σ 𝒏 Θ(𝒋) = Θ(n2)            [arithmetic series]
                𝒋=𝟐
Average case: All permutations equally likely.
      T(n) = σ 𝒏 Θ(𝒋ൗ     𝟐) =   Θ(n2)
                 𝒋=𝟐
Best case: Input already sorted. O(n)
Is insertion sort a fast sorting algorithm?
• Moderately so, for small n.
• Not at all, for large n.
                                                           15
HOW TO DESIGN ALGORITHMS?
• We’ll see many example styles throughout the semester
• Insertion sort was an example of an “incremental algorithm”
• Another common paradigm: Divide and Conquer
  • Divide into simpler/smaller sub-problems
  • Solve (conquer) sub-problems recursively
  • Combine results of sub-problems
                                                                16
MERGE SORT
MERGE-SORT A[1 . . n]
   If n = 1, return
   Recursively sort
           A[ 1 . . n/2 ] and A[ n/2+1 . . n ]
   Merge the two sorted lists
    Key subroutine: MERGE
                                                     17
MERGE SORT EXAMPLE
                     18
MERGING TWO SORTED ARRAYS
20 12
13 11
7   9
2   1
                            19
MERGING TWO SORTED ARRAYS
20 12
13 11
7       9
2       1
                            20
MERGING TWO SORTED ARRAYS
20 12       20 12
13 11       13 11
7       9   7   9
2       1   2
                            21
MERGING TWO SORTED ARRAYS
20 12       20 12
13 11       13 11
7       9   7       9
2       1   2
    1           2
                            22
MERGING TWO SORTED ARRAYS
20 12       20 12       20 12
13 11       13 11       13 11
7       9   7       9   7   9
2       1   2
    1           2
                                23
MERGING TWO SORTED ARRAYS
20 12       20 12       20 12
13 11       13 11       13 11
7       9   7       9   7       9
2       1   2
    1           2           7
                                    24
MERGING TWO SORTED ARRAYS
20 12       20 12       20 12       20 12
13 11       13 11       13 11       13 11
7       9   7       9   7       9      9
2       1   2
    1           2           7
                                            25
MERGING TWO SORTED ARRAYS
20 12       20 12       20 12       20 12
13 11       13 11       13 11       13 11
7       9   7       9   7       9        9
2       1   2
    1           2           7        9
                                             26
MERGING TWO SORTED ARRAYS
20 12       20 12       20 12       20 12    20 12
13 11       13 11       13 11       13 11    13 11
7       9   7       9   7       9        9
2       1   2
    1           2           7        9
                                                     27
MERGING TWO SORTED ARRAYS
20 12       20 12       20 12       20 12    20 12
13 11       13 11       13 11       13 11    13 11
7       9   7       9   7       9        9
2       1   2
    1           2           7        9        11
                                                     28
MERGING TWO SORTED ARRAYS
20 12       20 12       20 12       20 12    20 12   20 12
13 11       13 11       13 11       13 11    13 11   13
7       9   7       9   7       9        9
2       1   2
    1           2           7        9        11
                                                             29
MERGING TWO SORTED ARRAYS
20 12       20 12       20 12       20 12    20 12   20 12
13 11       13 11       13 11       13 11    13 11   13
7       9   7       9   7       9        9
2       1   2
    1           2           7        9        11      12
                                                             30
MERGING TWO SORTED ARRAYS
20 12       20 12        20 12       20 12    20 12   20 12
13 11       13 11        13 11       13 11    13 11   13
7       9   7       9    7       9        9
2       1   2
    1           2            7        9         11     12
                Time = O(n) to merge a total of n
                elements (linear time).
                                                              31
ANALYZING MERGE SORT
 T(n)       MERGE-SORT A[1 . . n]
 O(1)            If n = 1, return
 2T(n/2)         Recursively sort
                         A[ 1 . . n/2 ] and A[ n/2+1 . . n ]
 O(n)            Merge the two sorted lists
 Sloppiness: Should be T( n/2 ) + T( n/2 ) , but it
 turns out this does not matter asymptotically.
                                                                   32
RECURRENCE FOR MERGE SORT
                   O(1) if n = 1
        T(n) =
                   2T(n/2) + O(n) if n > 1
Note: Usually the base case T(1) = O(1)
                                             33
RECURSION TREE
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
                                                      34
RECURSION TREE
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
                           T(n)
                                                      35
RECURSION TREE
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
                            cn
              T(n/2)                 T(n/2)
                                                      36
RECURSION TREE
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
                                 cn
                 cn/2                          cn/2
        T(n/4)          T(n/4)        T(n/4)          T(n/4)
                                                               37
 RECURSION TREE
  Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
                                   cn                         cn
                     cn/2                      cn/2           cn
h = log n                               cn/4
              cn/4          cn/4                      cn/4    cn
                                                             …
            O(1)              #leaves = n                    O(n)
                                                  Total = O(n log n)
                                                                   38
CONCLUSIONS
• O(n log n) grows more slowly than O(n2).
• Therefore, merge sort asymptotically beats insertion sort in the
  worst case.
• In practice, merge sort beats insertion sort for n > 30 or so.
                                                                     39
MASTER THEOREM
• Let T(n) = a T(n/b) + f(n) where
  • a  1, b > 1 are constants
  • f(n) is an asymptotically positive function
• Then, T(n) can be bounded asymptotically as follows
  • If f(n)=O(𝒏𝐥𝐨𝐠𝒃 𝒂−𝜺) for some  > 0, then T(n)=Θ(𝒏𝐥𝐨𝐠𝒃 𝒂 )
  • If f(n)=Θ(𝒏𝐥𝐨𝐠𝒃 𝒂 ) then T(n)=Θ(𝒏𝐥𝐨𝐠𝒃 𝒂 log n)
  • If f(n)=Ω(𝒏𝐥𝐨𝐠𝒃 𝒂+ɛ) for some  > 0, and if, for all sufficiently large n
    and a constant c < 1 we have af(n/b)   cf(n) , then T(n)=Θ(f(n))
                                                                                40