88 ALGORITHMS DESIGN AND ANALYSIS
4.4 Shell Sort
algorithm and is easy to
Invented in 1959 by Donald Shell, the shell sort is a relatively fast
large elements towards one end and smal
code. It attempts toroughly sort the data first, moving the data, each finer than the last. After
elements towards the other. It performs several passes over
the final pass, the data is fully sorted.
algorithm does not actually sort the data itself; it
It is important to note that the shell sort Usually an insertion or bubble sort is used to
increases the efficiency of other sorting algorithms.
the data at each step, but other algorithms can be used. Shell sort does not noticeably
arrange
and in some cases will actually reduce
benefit the faster algorithms (such as merge and quicksort),
their performance.
effectively combined with a
Sort routines that rely on directly swapping elements are most
where n can be any
shell sort. Shell sort quickly arranges data by sorting every nth elemernt,
data
number less than half the number of data. Once the initial sort is performed, n is reduced, and thethere
otherwise
is sorted again until nequals 1. It is vital that the data is finally sorted with n=l
may be out-of-order elements remaining.
So, the gap sequence is an integral part of the Shell sort algorithm. Anù increment sequence
will work, as long as the last increment is 1. The algorithm begins by performing a gap insertion
sort, with the gap being the first number in the gap sequence. It continues to perform a gap
insertion sort for each number in the sequence, until it finishes with a gap of 1. When the gap is 1,
the gap insertion sort is simply an ordinary insertion sort, guaranteeing that the final list is
sorted.
The gap sequence that was originally suggested by Donald Shell was to begin with N /2 and
to halve the number until it reaches 1. While this sequence provides significant performance
enhancements over the quadratic algorithms such as insertion sort, itcan be changed slightly to
further decrease the average and worst-case running times.
Selecting good values for n
Choosing n is not as difficult as it might seem. The only sequence you have to avoid is one
constructed with the powers of 2. Do not choose (for example) 16 as your first n, and then keep
dividing by 2 until you reach 1. It has been mathematically proven that using only numbers from
the power series (1, 2, 4,8, 16, 32,...]produces the worst sorting times.The fastest times are (on
average) obtained by choosing an initial nsomewhere close to the maximum allowed and
continually dividing by 2.2 until you reach 1or less. Remember always sort the data with n=l as
the last step.
Cxample The data in the table needs to be sorted into ascending order. For simplicity, we have chosen the
sequence (3, 2, 1) for our n. Elements in the table that have a white background are being
ignored in that step, Elements with a shaded background are the ones we are interested in. The
top line of each table shows the data before we performed that step, the bottom shows the data
afterwards.
ANALYSIS OF SIMPLE SORTING ALGORITHMS 89
In this example, we will assume there is a selection sort being used to do
the actual sorting.
8 4 5 7 8 3 2
The unsorted data
As mentioned above, we will be using an initial value of 3
for n.
maximum (which in thiscase is 4, because this is the largest number less thanThis is less than the
half the number of
elements we have).
We pretend that the only elements in the data set are
elements containing 8, 5 and 9
(shaded). Notice that this is every 3rd element (n is 3). After sorting
to the right of the ones we just looked at. Repeat the sort and these, we look at the elements
shift routine until all of the elements
have been looked at once. So if n=3, you will need to repeat this
step 3
Note that only the highlighted elements are ever changed; the ones times to sort every element.
with a white background are
totally ignored.
8 4 1 5 7 6 3 2
5 4 1 8 7 6 3 2
5 4 1 7 6 3 2
5 3 1 8 4 6 7 2
Do the same for 4, 7 and 3
5 1 8 6 9 7 2
3 1 8 4 2 9 7 6
As well as 1, 6, and 2.
Now that all of the elements have been sorted once with n=3, we repeat the process with the
next value of n i.e, 2. If you look carefully, there is a general congregation of large numbers at the
right hand side, and the smaller numbers are at the left. There are still quite a few misplaced
numbers (most notably 8, 2, 5 and 6), but it is better sorted than it was.
5 3 1 4 2 9 7 6
3 5 2 6 7
1 3 8 5 2 6 7 9
1 2 4 3 5 76 8
And the same for the even elements...
90 ALGORITHMS DESIGN AND
You can see nowthatthe datais almost completely sorted - after just .more
2 steps! Al
ANALYSIS
remains is to sort it again with n=1to fix up any elements that are still out of order.
we are just performing a normal sort, making sure every element in the dataset is in iits When n=lthat,
correct place.
You may wonder why don't we just skip to this step. Yes, doing that would work,
the selection and bubble sorts are fastest when the data is already sorted (or howevershell
close to). The
sort method orders the data in fewer steps than would be required for either of the aboove
methods,
3 5 7 6 8 9
23456789
1 2 3 4 5 678 9
Allsorted !
Algorithm
Input :an array aof length nwith array elements numbered 0to n-1
1. inc + round (n/2)
2. while inc > 0
3 for i =inc to n -1
temp - a i]
j+i
while j >inc and a [j-inc]>temp
a]- aj -inc]
j+j-inc
}
ai] temp
}
4. inc 4- round(inc/2.2)
Shell sort improves on the efficiency of insertion sort by quickly shifting values to ther
destination. Average sort time is O(n) while worst-case time is On15) In this method, instead
of sorting the entire array at once, the array is first divided into smaller segments e.g., ksegnent,
Then these segments are sorted separately using the insertion sort. The shell sort is also ca
diminishing increment sort because the value of kcontinuously decreases, where kis preterabiy*
prime number.
Shell sort is a sorting algorithm which requires O(n*) comparisons and exchangesin
the worst case.
ANALYSIS OF SIMPLE SORTING ALGORITHMS 91
4 Shell sort is a generalization of insertion sort, with
two important observations :
Insertion sort is efficient if the input is "almost sorted".
Insertion sort is inefficient, on average, because it moves values just one
at a time. position
4 Shellsort improves insertion sort by
positions.
comparing elements separated by a gap of several
a This lets an element take "bigger steps"
toward its expected position. Multiple passes
over the data are taken with smaller and smaller gap sizes.
The last step of Shell sort is a plain insertion sort, but by then,
the array of data is
guaranteed to be almost sorted.
Shell sort is built upon the concept of H-sorting. Alist is said to be
any position, every H-th item is in sorted position relative to the otherH-sorted when, starting at
items. That concept will
become clear as you work through an example. In the following example, we start by 3-sorting the
list from Figure. In other words, we will consider only every 3rd element and sort
relative to each other.
those items
Example
45 36 75 20 05 90 80 65 30 50 10 75 85
The distance between the elements to be compared = 3
The sub files generated with the distance of 3 are as follows :
Subfile 1 a [0] a [3] a [6] a [9] a [12]
Subfile 2 a {1] a [4] a [7] a [10]
Subfile 3 a [2] a [5] a [8] a [11]
Input to Pass 1 with distarnçe =3
45 36 75 20 05 80 65 30 50 75 85
Output of Pass 1 is input to Pass 2 and distance =2
20 05 30 45 I0 75 50 36 75 80 65 90 85
92 ALGORITHMS DESIGN AND ANALYSIO
Output of Pass 2 is input to Pass 3 and distance =1
45 50 75 65 90 75 90 85
10 05 20 36 30
Output of Pass 3
05 I0 20 30 36 45 50 65 75 75 80 85 90
Best Case
The best case in the shellsort is when the array is already sorted in the right order. The
number of comparisons is less.
Worst case
< The running time of Shell sort depends on the choice of increment sequence.
4 The problem with Shell's increments is that pairs of increments are not necessarily
relatively prime and smaller increments can have little effect.
160
140
120
Seconds
100
80
60
40
20
0
10 100 1000 10000 25000 50000 75000 100000
Empirical Analysis of Shell sort
Advantage of Shell sort is that its only efficient for medium size lists. For bigger lists, the
algorithm is not the best choice. Fastest of all O(N) sorting algorithms. 5 times faster than the
bubble sort and alittle over twice as fast as the insertion sort, its closest competitor.
Disadvantage of Shell sort is that it is a complex algorithm and it's not nearly as efficient as
the merge, heap, and quick sorts.
Analysis
Best Case. The best case in the shell sort is when the array is already sorted in the right
order. The number of comparisons is less.
ANALYSIS OF SIMPLE SORTING ALGORITHMS 93
The worse-case time complexity of shell sort depends on the incremnent
increments 1 4 13 40 121.., which is what is used here, the time complexity is sequence.
O(n2) ForForother
the
rements, time complexity is known to be O(n)and even O(n. lg (n), Neither tight upper
bounds on time complexity nor the best increment Sequence are known.
Because shell sort is based on insertion sort, shell sort inherits insertion sort's
properties. Theeadapation is not as dramatic because shell sort requires one pass through adaptive
the data
ar each increment, but it is significant.For the increment sequence shown
above, there are log:(n)
crements, so the time complexity for nearly sorted data is O(n. log,(n).
Because of its low overhead, relatively simple implementation, adaptive
sub-quadratic time complexity, shell sort may be a viable alternative to the O(n properties, and
lg(n)) sorting
algorithms for some applications when the data to be sorted is not very large.
4.5 Comparison of Various Sorting Algorithms
Sorting is a fundamental task that is performed by most computers. It is used frequently in a
large variety of important applications. All spreadsheet programs contain some sort of sorting
code. Database applications used by schools, banks, and other institutions all contain sorting code.
Because of the importance of sorting in these applications,dozens of sorting algorithms have been
developed over the decades with varying complexity. Slowsorting methods such as bubble sort,
insertion sort, and selection sort have theoretical complexity of O(N') Even though these
algorithms are very slow for sorting large arrays, the algorithm is simple, so they are not useless. If
an application only needs to sort small arrays, then it is satisfactory to use one of the simple slow
sorting algorithms as opposed to a faster, but more complicated sorting algorithm. For these
applications, the increase in coding time and probability of coding mistake in using the faster
sorting algorithm is not worth the speedup in execution time. Of course, if an application needs a
faster sorting algorithm, there are certainly many ones available, including quicksort, mergesort,
and heapsort. These algorithms have atheoretical complexity of O(N log N) They are much faster
than the O(N") algorithms and can sort large arrays in areasonable amount of time. However, the
cost of these fast sorting methods is that the algorithm is much more complex and is harder to
correctly code. But the result of the more complex algorithm is an efficient sorting method capable
of being used to sort very large arrays.
In addition to varying complexity, sorting algorithmsalso fallinto twobasic categories
comparison based and non-comparison based. A comparison based algorithm orders a sorting
array by weighing the value of one element against the value of other elements. Algorithms such as
quicksort, mergesort, heapsort, bubble sort, and insertion sort are comparison based.
Alternatively, anon- comparison based algorithm sorts an array without consideration of pairwise
data elements. Radix sort is a non-comparison based algorithm that treats the sorting elements as
numbers represented in a base-M number system, and then works with individual digits of M.
Description of Sorting Algorithms
Insertion sort
Insertion sort is good only for sorting small arrays (usually less than 100 items). In fact, the
Smaller the array, the faster insertion sort is compared to any other sorting algorithm. However,
being an O(n') algorithm, it becomes very slow very quick when the size of the array increases. It
Was used in the tests with arrays of size 100.