Unit IV SORTING Preliminaries – Insertion Sort – Shell sort – Heap sort – Merge sort –
Quicksort
Insertion sort
Insertion sort is a simple sorting algorithm that works by iteratively inserting
each element of an unsorted list into its correct position in a sorted portion of
the list. It is like sorting playing cards in your hands. You split the cards into
two groups: the sorted cards and the unsorted cards. Then, you pick a card
from the unsorted group and put it in the right place in the sorted group.
● We start with the second element of the array as the first element is
assumed to be sorted.
● Compare the second element with the first element; if the second
element is smaller, swap them.
● Move to the third element, compare it with the first two elements,
and put it in its correct position
● Repeat until the entire array is sorted.
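The steps above can be sketched in Python (a minimal illustration; the function name `insertion_sort` is ours):

```python
def insertion_sort(arr):
    """Sort arr in place using insertion sort and return it."""
    for i in range(1, len(arr)):       # element 0 is trivially "sorted"
        key = arr[i]                   # element to insert into the sorted portion
        j = i - 1
        # Shift larger elements of the sorted portion one slot to the right
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key               # drop key into its correct position
    return arr
```

For example, `insertion_sort([5, 2, 4, 6, 1, 3])` returns `[1, 2, 3, 4, 5, 6]`.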
Complexity Analysis of Insertion Sort
Time Complexity
● Best case: O(n), if the list is already sorted, where n is the number
of elements in the list.
● Average case: O(n²), if the list is randomly ordered.
● Worst case: O(n²), if the list is in reverse order.
Space Complexity
● Auxiliary Space: O(1), Insertion sort requires O(1) additional space,
making it a space-efficient sorting algorithm.
Advantages and Disadvantages of Insertion Sort
Advantages
● Simple and easy to implement.
● Stable sorting algorithm.
● Efficient for small lists and nearly sorted lists.
● Space-efficient as it is an in-place algorithm.
● Adaptive: the number of shifts is directly proportional to the
number of inversions. For example, no shifting happens for an
already sorted array, which takes only O(n) time.
Disadvantages
● Inefficient for large lists.
● Not as efficient as other sorting algorithms (e.g., merge sort, quick
sort) for most cases.
Applications of Insertion Sort
Insertion sort is commonly used in situations where:
● The list is small or nearly sorted.
● Simplicity and stability are important.
● It is used as a subroutine in Bucket Sort, and can be useful when
the array is already almost sorted.
● Since insertion sort is efficient for small arrays, it is used in
hybrid sorting algorithms along with other efficient algorithms like
Quick Sort and Merge Sort: when the subarray size becomes small,
these recursive algorithms switch to insertion sort. For
example, IntroSort and Timsort use insertion sort.
Shell Sort
Shell Sort is mainly a variation of Insertion Sort. In insertion sort, we move
elements only one position ahead. When an element has to be moved far
ahead, many movements are involved. The idea of ShellSort is to allow the
exchange of far items. In Shell sort, we make the array h-sorted for a large
value of h. We keep reducing the value of h until it becomes 1. An array is
said to be h-sorted if all sublists of every h’th element are sorted.
Algorithm:
Step 1 − Start
Step 2 − Initialize the value of gap size, say h.
Step 3 − Divide the list into smaller sub-part. Each must have equal intervals
to h.
Step 4 − Sort these sub-lists using insertion sort.
Step 5 – Reduce the gap and repeat from Step 3 until the gap becomes 1 and the list is sorted.
Step 6 – Print a sorted list.
Step 7 – Stop.
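The algorithm above can be sketched in Python, halving the gap on each pass (a minimal illustration; the function name is ours):

```python
def shell_sort(arr):
    """Sort arr in place using shell sort with a gap-halving sequence."""
    n = len(arr)
    gap = n // 2
    while gap > 0:
        # Gapped insertion sort: each sublist of every gap'th element
        # becomes sorted, making the array gap-sorted
        for i in range(gap, n):
            key = arr[i]
            j = i
            while j >= gap and arr[j - gap] > key:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = key
        gap //= 2                      # reduce the gap for the next pass
    return arr
```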
Time Complexity: The time complexity of the above version of Shell
sort is O(n²) when the gap is reduced by half in every
iteration. There are many other gap sequences that lead to better
time complexity.
Worst Case Complexity
The worst-case complexity for shell sort is O(n²).
Best Case Complexity
When the given array is already sorted, the total count of comparisons for
each interval is equal to the size of the given array.
So the best case complexity is Ω(n log n).
Average Case Complexity
The average case complexity is around O(n log n) to O(n^1.25), depending on the gap sequence.
Space Complexity
The space complexity of the shell sort is O(1).
Shell Sort Applications
1. As a replacement for insertion sort, where insertion sort takes a long
time to complete a given task.
2. To avoid the call-stack overhead of recursive algorithms, since shell
sort is iterative.
3. When recursion exceeds a particular limit, shell sort can be used instead.
4. For medium to large-sized datasets.
5. To reduce the number of shifting operations compared to plain insertion sort.
Heap Sort
Heap sort is a comparison-based sorting technique based on the Binary Heap
data structure. It can be seen as an optimization over Selection Sort, where
we first find the max (or min) element and swap it with the last (or first). We
repeat the same process for the remaining elements. In Heap Sort, we use
Binary Heap so that we can quickly find and move the max element in O(Log
n) instead of O(n) and hence achieve the O(n Log n) time complexity.
Heap Sort Algorithm
First convert the array into a max heap using heapify; note that this
happens in place. The array elements are rearranged to follow heap
properties. Then, one by one, delete the root node of the max heap,
replace it with the last node, and heapify. Repeat this process while the size
of the heap is greater than 1.
● Rearrange array elements so that they form a Max Heap.
● Repeat the following steps until the heap contains only one
element:
○ Swap the root element of the heap (which is the
largest element in the current heap) with the last
element of the heap.
○ Remove the last element of the heap (which is now
in the correct position). We mainly reduce heap size
and do not remove elements from the actual array.
○ Heapify the remaining elements of the heap.
● Finally we get a sorted array.
Detailed Working of Heap Sort
● Step 1: Treat the Array as a Complete Binary Tree
● We first need to visualize the array as a complete binary tree. For an
array of size n, the root is at index 0, the left child of an element at
index i is at 2i + 1, and the right child is at 2i + 2.
● Step 2: Build a Max Heap by heapifying every non-leaf node,
starting from the last non-leaf node and moving up to the root.
● Step 3: Sort the array by placing the largest element at the end of
the unsorted part.
We keep repeating Step 3, shrinking the heap by one each time, until there
is only one element left in the heap.
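Steps 2 and 3 can be sketched in Python (a minimal illustration; the function names are ours):

```python
def heapify(arr, n, i):
    """Sift arr[i] down so the subtree rooted at i is a max heap of size n."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2   # children of index i in the array view
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)          # continue sifting down

def heap_sort(arr):
    """Sort arr in place using heap sort and return it."""
    n = len(arr)
    # Step 2: build a max heap, from the last non-leaf node up to the root
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Step 3: move the max to the end, shrink the heap, restore the heap
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        heapify(arr, end, 0)
    return arr
```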
Complexity Analysis of Heap Sort
Time Complexity: O(n log n)
Auxiliary Space: O(log n), due to the recursive call stack. However, auxiliary
space can be O(1) for iterative implementation.
Important points about Heap Sort
● Heap sort is an in-place algorithm.
● Its typical implementation is not stable, but it can be made stable.
● Typically 2-3 times slower than well-implemented QuickSort. The
reason for slowness is a lack of locality of reference.
Advantages of Heap Sort
● Efficient Time Complexity: Heap Sort has a time complexity of O(n
log n) in all cases. This makes it efficient for sorting large datasets.
The log n factor comes from the height of the binary heap, and it
ensures that the algorithm maintains good performance even with a
large number of elements.
● Memory Usage: Memory usage can be minimal (by writing an
iterative heapify() instead of a recursive one). So apart from what is
necessary to hold the initial list of items to be sorted, it needs no
additional memory space to work
● Simplicity: It is simpler to understand than other equally efficient
sorting algorithms because it does not use advanced computer
science concepts such as recursion.
Disadvantages of Heap Sort
● Costly: Heap sort is costly as the constants are higher compared to
merge sort, even though the time complexity is O(n log n) for both.
● Unstable: Heap sort is unstable; it might rearrange the relative
order of equal elements.
● Poor locality: Heap sort accesses the array in a scattered pattern,
so its cache performance is worse than that of merge sort or quick sort.
Merge Sort
Merge sort is a popular sorting algorithm known for its efficiency and
stability. It follows the Divide and conquer approach. It works by recursively
dividing the input array into two halves, recursively sorting the two halves
and finally merging them back together to obtain the sorted array.
How does Merge Sort work?
Here’s a step-by-step explanation of how merge sort works:
1. Divide: Divide the list or array recursively into two halves until it can
no longer be divided.
2. Conquer: Each subarray is sorted individually using the merge sort
algorithm.
3. Merge: The sorted subarrays are merged back together in sorted
order. The process continues until all elements from both subarrays
have been merged.
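The three steps can be sketched in Python (a minimal, non-in-place illustration; the function name is ours):

```python
def merge_sort(arr):
    """Return a new sorted list (stable merge sort)."""
    if len(arr) <= 1:                     # base case: already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])          # divide and conquer each half
    right = merge_sort(arr[mid:])
    # Merge: repeatedly take the smaller front element; <= keeps it stable
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # append whatever remains
    merged.extend(right[j:])
    return merged
```

For example, `merge_sort([38, 27, 43, 10])` returns `[10, 27, 38, 43]`, matching the illustration below.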
Illustration of Merge Sort:
Let’s sort the array [38, 27, 43, 10] using Merge Sort:
Divide:
● [38, 27, 43, 10] is divided into [38, 27 ] and [43, 10] .
● [38, 27] is divided into [38] and [27] .
● [43, 10] is divided into [43] and [10] .
Conquer:
● [38] is already sorted.
● [27] is already sorted.
● [43] is already sorted.
● [10] is already sorted.
Merge:
● Merge [38] and [27] to get [27, 38] .
● Merge [43] and [10] to get [10,43] .
● Merge [27, 38] and [10,43] to get the final sorted list [10, 27, 38,
43]
Therefore, the sorted list is [10, 27, 38, 43] .
Complexity Analysis of Merge Sort
● Time Complexity:
○ Best Case: O(n log n), When the array is already
sorted or nearly sorted.
○ Average Case: O(n log n), When the array is
randomly ordered.
○ Worst Case: O(n log n), When the array is sorted in
reverse order.
● Auxiliary Space: O(n), Additional space is required for the
temporary array used during merging.
Advantages and Disadvantages of Merge Sort
Advantages
● Stability : Merge sort is a stable sorting algorithm, which means it
maintains the relative order of equal elements in the input array.
● Guaranteed worst-case performance: Merge sort has a worst-case
time complexity of O(N logN) , which means it performs well even
on large datasets.
● Simple to implement: The divide-and-conquer approach is
straightforward.
● Naturally Parallel: Subarrays are merged independently, which makes
merge sort suitable for parallel processing.
Disadvantages
● Space complexity: Merge sort requires additional memory to store
the merged sub-arrays during the sorting process.
● Not in-place: Merge sort is not an in-place sorting algorithm, which
means it requires additional memory to store the sorted data. This
can be a disadvantage in applications where memory usage is a
concern.
● Merge Sort is Slower than QuickSort in general as QuickSort is
more cache friendly because it works in-place.
Quick Sort
QuickSort is a sorting algorithm based on the Divide and Conquer that picks
an element as a pivot and partitions the given array around the picked pivot
by placing the pivot in its correct position in the sorted array.
It works on the principle of divide and conquer, breaking down the problem
into smaller sub-problems.
There are mainly three steps in the algorithm, plus a base case:
1. Choose a Pivot: Select an element from the array as the pivot. The
choice of pivot can vary (e.g., first element, last element, random
element, or median).
2. Partition the Array: Rearrange the array around the pivot. After
partitioning, all elements smaller than the pivot will be on its left,
and all elements greater than the pivot will be on its right. The pivot
is then in its correct position, and we obtain the index of the pivot.
3. Recursively Call: Recursively apply the same process to the two
partitioned sub-arrays (left and right of the pivot).
4. Base Case: The recursion stops when there is only one element left
in the sub-array, as a single element is already sorted.
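The steps above can be sketched in Python, using the last element as the pivot (Lomuto partition scheme; the function names are ours):

```python
def partition(arr, low, high):
    """Lomuto partition: use arr[high] as the pivot, place it at its
    correct sorted position, and return that index."""
    pivot = arr[high]
    i = low - 1                            # boundary of the "smaller" region
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]  # pivot to its final slot
    return i + 1

def quick_sort(arr, low=0, high=None):
    """Sort arr[low..high] in place and return arr."""
    if high is None:
        high = len(arr) - 1
    if low < high:                         # base case: 0 or 1 element
        p = partition(arr, low, high)
        quick_sort(arr, low, p - 1)        # recurse on elements left of pivot
        quick_sort(arr, p + 1, high)       # recurse on elements right of pivot
    return arr
```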
Choice of Pivot
There are many different choices for picking pivots.
● Always pick the first or last element as a pivot. The problem with
this approach is that it ends up in the worst case when the array is
already sorted.
● Pick a random Element as a pivot. This is a preferred approach
because it does not have a pattern for which the worst case
happens.
● Pick the median element as the pivot. This is an ideal approach in
terms of time complexity, as we can find the median in linear time and
the partition function will always divide the input array into two halves.
But it is slower on average because median finding has high
constant factors.
Partition Algorithm
The key process in quickSort is a partition(). There are three common
algorithms to partition. All these algorithms have O(n) time complexity.
1. Naive Partition : Here we create a copy of the array. First put all
smaller elements and then all greater. Finally we copy the
temporary array back to the original array. This requires O(n) extra
space.
2. Lomuto Partition: A simple algorithm: we keep track of the boundary
index of the smaller elements and keep swapping. It is widely used
because of its simplicity.
3. Hoare’s Partition: This is the fastest of the three. Here we traverse
the array from both ends and keep swapping greater elements on the left
with smaller elements on the right until the pointers cross.
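Hoare's scheme can be sketched in Python (a minimal illustration; the function name is ours). Note that, unlike Lomuto's scheme, the returned index only separates the two halves; the pivot is not necessarily placed at its final sorted position:

```python
def hoare_partition(arr, low, high):
    """Partition arr[low..high] around pivot arr[low].
    Returns an index p such that every element of arr[low..p]
    is <= every element of arr[p+1..high]."""
    pivot = arr[low]
    i, j = low - 1, high + 1
    while True:
        i += 1
        while arr[i] < pivot:              # scan right for an element >= pivot
            i += 1
        j -= 1
        while arr[j] > pivot:              # scan left for an element <= pivot
            j -= 1
        if i >= j:                         # pointers crossed: partition done
            return j
        arr[i], arr[j] = arr[j], arr[i]    # swap the misplaced pair
```

Because fewer swaps are performed on average than with Lomuto's scheme, this variant is typically faster in practice.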
Illustration of QuickSort Algorithm
In the previous step, we looked at how the partitioning process rearranges
the array based on the chosen pivot. Next, we apply the same method
recursively to the smaller sub-arrays on the left and right of the pivot. Each
time, we select new pivots and partition the arrays again. This process
continues until only one element is left, which is always sorted. Once every
element is in its correct position, the entire array is sorted.
The recursive calls proceed independently on the smaller sub-arrays to
the left and right of the pivot.
Quick Sort is a crucial algorithm in the industry, but there are other sorting
algorithms that may be more optimal in different cases.
Complexity Analysis of Quick Sort
Time Complexity:
● Best Case: Ω(n log n), occurs when the pivot element divides the
array into two equal halves.
● Average Case: Θ(n log n), on average the pivot divides the array
into two parts, but not necessarily equal.
● Worst Case: O(n²), occurs when the smallest or largest element is
always chosen as the pivot (e.g., already sorted arrays).
Auxiliary Space: O(n), due to recursive call stack
Advantages of Quick Sort
● It is a divide-and-conquer algorithm that makes it easier to solve
problems.
● It is efficient on large data sets.
● It has a low overhead, as it only requires a small amount of memory
to function.
● It is Cache Friendly as we work on the same array to sort and do not
copy data to any auxiliary array.
● Fastest general purpose algorithm for large data when stability is
not required.
● It can be made tail recursive, so tail-call optimization can be
applied.
Disadvantages of Quick Sort
● It has a worst-case time complexity of O(n²), which occurs when the
pivot is chosen poorly.
● It is not a good choice for small data sets.
● It is not a stable sort, meaning that if two elements have the same
key, their relative order will not be preserved in the sorted output in
case of quick sort, because here we are swapping elements
according to the pivot's position (without considering their original
positions).