Sorting and Searching Algorithms
1. Sorting in General
• Sorting is the process that organizes collections of data into either ascending or descending order.
• Many of the applications you will deal with will require sorting; it is easier to understand data if it
is organized.
• There are two categories of sorting algorithms
- Internal sorting requires that all of your data fit into memory (an array or a LL).
- External sorting is used when your data can't fit into memory all at once (maybe you have
a very large database), and so sorting is done using disk files.
• The efficiency of our algorithms is dependent on the
- number of comparisons we have to make with our keys.
- In addition, we will learn that sorting will also depend on how frequently we must move
items around numerically or alphabetically in order.
Types of Sort Algorithms
1. Selection Sort
Selection sort is a sort algorithm, specifically an in-place comparison sort. It has O(n2) time
complexity, making it inefficient on large lists, and generally performs worse than the similar
insertion sort. Selection sort is noted for its simplicity, and it has performance advantages over
more complicated algorithms in certain situations, particularly where auxiliary memory is limited.
• Imagine the case where we can look at all of the data at once...and to sort it...find the largest item
and put it in its correct place.
• Then, we find the second largest and put it in its place, etc.
• If we were to think about cards again,
– it would be like looking at our entire hand of cards and ordering them by selecting the largest
first and putting at the end, and then selecting the rest of the cards in order of size.
• This is called a selection sort.
- It means that to sort a list, we first need to search for the largest key.
- Because we will want the largest key to be at the last position, we will need to swap the
last item with the largest item.
• The next step will allow us to ignore the last (i.e. largest) item in the list.
– This is because we know that it is the largest item...we don't have to look at it again or move it
again!
• So, we can once again search for the largest item...in a list one smaller than before.
• When we find it, we swap it with the last item (which is really the next to the last item in the
original list).
• We continue doing this until there is only 1 item left.
Here is an example of this sort algorithm sorting five elements:
64 25 12 22 11 // this is the initial, starting state of the array
11 25 12 22 64 // sorted sublist = {11}
11 12 25 22 64 // sorted sublist = {11, 12}
11 12 22 25 64 // sorted sublist = {11, 12, 22}
11 12 22 25 64 // sorted sublist = {11, 12, 22, 25}
11 12 22 25 64 // sorted sublist = {11, 12, 22, 25, 64}
C++ Implementation
void selectionSort(int A[],int n){
for(int i=0;i<n-1;i++){
int iMin=i;
for(int j=i+1;j<n;j++){
if(A[j]<A[iMin])
iMin=j;
}
if(iMin!=i){
swap(A[i],A[iMin]);
}
}
}
Example 2.
29, 64, 73, 34, 20,
20, 64, 73, 34, 29,
20, 29, 73, 34, 64
20, 29, 34, 73, 64
20, 29, 34, 64, 73
• Selection sort requires both comparisons and exchanges (i.e., swaps).
- Start analyzing it by counting the number of comparisons and exchanges for an array of N
elements.
-Remember the selection sort first searches for the largest key and swaps the last item with
the largest item found.
• Remember that means for the first time around there would be N-1 comparisons.
- The next time around there would be N-2 comparisons (because we can exclude comparing
the previously found largest! Its already in the correct spot!).
- The third time around there would be N-3 comparisons.
- So...the number of comparisons would be: (N-1)+(N-2)+...+ 1 = N*(N-1)/2
• Next, think about exchanges.
• Every time we find the largest...we perform a swap.
• This causes 3 data moves (3 assignments).
• This happens N-1 times!
• Therefore, a selection sort of N elements requires 3*(N-1) moves.
• Lastly, put all of this together: N*(N-1)/2 + 3*(N-1)
• which is: N*N/2 - N/2 + 6N/2 - 3
• which is: N*N/2 + 5N/2 - 3
• Put this in perspective of what we learned about with the BIG O method: 1/2 N*N + O(N)
• Or, O(N2)
• Given this, we can make a couple of interesting observations.
- The efficiency DOES NOT depend on the initial arrangement of the data.
- This is an advantage of the selection sort.
- However, O(N2) grows very rapidly, so the performance gets worse quickly as the number
of items to sort increases.
2. Bubble Sort
Bubble sort, is a simple sort algorithm that repeatedly steps through the list to be sorted, compares
each pair of adjacent items and swap them if they are in the wrong order. The pass through the list
is repeated until no swaps are needed, which indicates that the list is sorted. The algorithm, which is
a comparison sort, is named for the way smaller elements "bubble" to the top of the list. Although
the algorithm is simple, it is too slow and impractical for most problems even when compared to
insertion sort.
• The bubble sort simply compares adjacent elements and exchanges them if they are out of order.
• To do this, you need to make several passes over the data.
• During the first pass, you compare the first two elements in the list.
- If they are out of order, you exchange them.
- Then you compare the next pair of elements (positions 2 and 3).
- If they are out of order, you exchange them.
- This algorithm continues comparing and exchanging pairs of elements until you reach the
end of the list.
• Notice that the list is not sorted after this first pass.
- We have just "bubbled" the largest element up to its proper position at the end of the list!
- During the second pass, you do the exact same thing....but excluding the largest (last)
element in the array since it should already be in sorted order.
- After the second pass, the second largest element in the array will be in its proper position
(next to the last position).
Here is example of bubble sort
First Pass
( 5 1 4 2 8 ) ( 1 5 4 2 8 ), Here, algorithm compares the first two elements, and swaps since 5 > 1.
( 1 5 4 2 8 ) ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), algorithm does
not swap them.
Second Pass
(14258) (14258)
(14258) ( 1 2 4 5 8 ), Swap since 4 > 2
(12458) (12458)
(12458) (12458)
Now, the array is already sorted, but the algorithm does not know if it is completed. The algorithm
needs one whole pass without any swap to know it is sorted.
Third Pass
(12458) (12458)
(12458) (12458)
(12458) (12458)
(12458) (12458)
C++ Implementation
void bubbleSort(int B[],int n){
for(int k=0;k<n;k++){
int flag=0;
for(int i=0;i<n-1;i++){
if(B[i]>B[i+1]){
swap(B[i],B[i+1]);
flag=1;
}
}
if(flag==0){
break;
}
}
}
3. Insertion Sort Algorithm
Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a
time. It is much less efficient on large lists than more advanced algorithms such as quicksort,
heapsort, or merge sort. However, insertion sort provides several advantages:
Efficient for (quite) small data sets, much like other quadratic sorting algorithms
More efficient in practice than most other simple quadratic (i.e., O(n2)) algorithms such as
selection sort or bubble sort
Adaptive, i.e., efficient for data sets that are already substantially sorted: the time
complexity is O(nk) when each element in the input is no more than k places away from its
sorted position
Stable; i.e., does not change the relative order of elements with equal keys
In-place; i.e., only requires a constant amount O(1) of additional memory space
• The insertion sort divides the data into a sorted and unsorted region.
• Initially, the entire list is unsorted.
• Then, at each step, the insertion sort takes the first item of the unsorted region and places it into its
correct position in the sorted region.
Example: The following table shows the steps for sorting the sequence {3, 7, 4, 9, 5, 2, 6, 1}. In
each step, the key under consideration is underlined. The key that was moved (or left in place
because it was biggest yet considered) in the previous step is shown in bold.
37495261
37495261
37495261
34795261
34795261
34579261
23457961
23456791
12345679
2. Introduction to Search Algorithms
• A search algorithm is a method of locating a specific item of information in a larger
collection of data. This section discusses two algorithms for searching the contents of an
array.
2.1 The Linear Search
• This is a very simple algorithm.
• It uses a loop to sequentially step through an array, starting with the first element.
• It compares each element with the value being searched for and stops when that value is
found or the end of the array is reached.
Implementation linear search
Function:
int linearSearch(int A[],int n,int key){
for(int i=0;i<n;i++){
if(key==A[i]){
return i;
}
}
return -1;
}
Function for Linked List:
void addbegn(){
node *temp;
temp=new node;
cout<<"Enter Id :";
cin>>temp->id;
cout<<"Enter Age :";
cin>>temp->age;
cout<<"Enter Name :";
cin>>temp->name;
temp->next=NULL;
if(Head == NULL)
Head = temp;
else
{
temp->next = Head;
Head = temp;
}
}
void searchid(){
int k;
cout<<"Enter id you want :"<<endl;
cin>>k;
cout<<"_________________________"<<endl;
node *temp=new node;
if(Head != NULL)
temp=Head;
while(temp != NULL){
if(temp->id == k){
cout<<setw(10)<<"ID"<<setw(14)<<"AGE"<<setw(13)<<"NAME"<<endl;
cout<<setw(10)<<temp->id<<setw(15)<<temp->age<<setw(15)<<temp-
>name<<endl;
cout<<"__________________________"<<endl;
return;
}
temp=temp->next;
}
cout<<"the key is not found!!!!!\n\n";
}
void searchname(){
char k[30];
cout<<"Enter Name You Want :";
cin>>k;
node *temp=new node;
if(Head != NULL)
temp=Head;
while(temp != NULL){
if(strcmp(temp->name,k)==0){
cout<<setw(10)<<"ID"<<setw(14)<<"AGE"<<setw(13)<<"NAME"<<endl;
cout<<setw(10)<<temp->id<<setw(15)<<temp->age<<setw(15)<<temp-
>name<<endl;
cout<<"__________________________"<<endl;
return;
}
temp=temp->next;
}
cout<<"the key is not found!!!!!\n\n";
}
void display(){
node *temp=new node;
if(Head != NULL){
temp=Head;
cout<<"The Elements Are : "<<endl;
cout<<"_______________________"<<endl;
cout<<setw(11)<<"ID"<<setw(16)<<"AGE"<<setw(12)<<"NAME"<<endl;
while(temp != NULL)
{
cout<<setw(10)<<temp->id<<setw(15)<<temp->age<<setw(15)<<temp->name<<endl;
cout<<"-------------------------------------------------"<<endl;
temp=temp->next;
}
cout<<"________________________"<<endl;
}
else
cout<<"the list is empty!!!\n";
}
Efficiency of the Linear Search
• The advantage is its simplicity.
– It is easy to understand
– Easy to implement
– Does not require the array to be in order
• The disadvantage is its inefficiency
– If there are 20,000 items in the array and what you are looking for is in the 19,999th
element, you need to search through the entire list.
2.2 Binary search algorithm
Generally, to find a value in unsorted array, we should look through elements of an array one by
one, until searched value is found. In case of searched value is absent from array, we go through all
elements. In average, complexity of such an algorithm is proportional to the length of the array.
Situation changes significantly, when array is sorted. If we know it, random access capability can be
utilized very efficiently to find searched value quick. Cost of searching algorithm reduces to binary
logarithm of the array length. For reference, log2(1 000 000) ≈ 20. It means, that in worst case,
algorithm makes 20 steps to find a value in sorted array of a million elements or to say, that it
doesn't present it the array.
Binary search is a fast search algorithm with run-time complexity of Οlogn. This search algorithm
works on the principle of divide and conquer. For this algorithm to work properly the data
collection should be in sorted form.
Binary search search a particular item by comparing the middle most item of the collection. If
match occurs then index of item is returned. If middle item is greater than item then item is
searched in sub-array to the right of the middle item other wise item is search in sub-array to the
left of the middle item. This process continues on sub-array as well until the size of sub-array
reduces to zero.
Algorithm
Algorithm is quite simple. It can be done either recursively or iteratively:
1. get the middle element;
2. if the middle element equals to the searched value, the algorithm stops;
3. otherwise, two cases are possible:
• searched value is less, than the middle element. In this case, go to the step 1 for the
part of the array, before middle element.
• searched value is greater, than the middle element. In this case, go to the step 1 for
the part of the array, after middle element.
Function:
int binarySearch(int A[],int n,int key){
int low=0;
int high=n-1;
int mid;
while(low<=high){
mid=(low+high)/2;
if(A[mid]==key){
return mid;
}else if(A[mid]<key){
high=mid-1;
}else if(A[mid]>key){
low=mid+1;
}
}
return -1;
}
Tree and Graph Data Structure
1. Tree Data Structure
In computer science, a tree is a widely used abstract data type (ADT)—or data structure
implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees
of children with a parent node, represented as a set of linked nodes.
A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root
node), where each node is a data structure consisting of a value, together with a list of references to
nodes (the "children"), with the constraints that no reference is duplicated, and none points to the
root.
A simple unordered tree; in this diagram, the node labeled 7 has two children, labeled 2 and 6, and
one parent, labeled 2. The root node, at the top, has no parent.
Terminologies used in Trees
Root
The top node in a tree.
Child
A node directly connected to another node when moving away from the Root.
Parent
The converse notion of a child.
Siblings
A group of nodes with the same parent.
Descendant
A node reachable by repeated proceeding from parent to child.
Ancestor
A node reachable by repeated proceeding from child to parent.
Leaf
A node with no children.
Internal node
A node with at least one child.
External node
A node with no children.
Degree
Number of sub trees of a node.
Edge
Connection between one node to another.
Path
A sequence of nodes and edges connecting a node with a descendant.
Level
The level of a node is defined by 1 + (the number of connections between the node and the
root).
Height of node
The height of a node is the number of edges on the longest path between that node and a leaf.
Height of tree
The height of a tree is the height of its root node.
Depth
The depth of a node is the number of edges from the node to the tree's root node.
Forest
A forest is a set of n ≥ 0 disjoint trees.
The Most Commonly Used Tree Data Structure is Binary Tree.
Binary Tree
A binary tree is made of nodes, where each node contains a "left" reference, a "right" reference, and
a data element. The topmost node in the tree is called the root. Every node (excluding a root) in a
tree is connected by a directed edge from exactly one other node. This node is called a parent.
Every node (excluding a root) in a tree is connected by a directed edge from exactly one other node.
This node is called a parent. On the other hand, each node can be connected to arbitrary number of
nodes, called children. Nodes with no children are called leaves, or external nodes. Nodes which are
not leaves are called internal nodes. Nodes with the same parent are called siblings.
More tree terminology:
The depth of a node is the number of edges from the root to the node.
The height of a node is the number of edges from the node to the deepest leaf.
The height of a tree is a height of the root.
A full binary tree.is a binary tree in which each node has exactly zero or two children.
A complete binary tree is a binary tree, which is completely filled, with the possible
exception of the bottom level, which is filled from left to right.
Advantages of trees
Trees are so useful and frequently used, because they have some very serious advantages:
Trees reflect structural relationships in the data
Trees are used to represent hierarchies
Trees provide an efficient insertion and searching
Trees are very flexible data, allowing to move subtrees around with minumum effort
Binary Search Tree Traversals
A traversal is a process that visits all the nodes in the tree. Since a tree is a nonlinear data structure,
there is no unique traversal. We will consider several traversal algorithms with we group in the
following two kinds
depth-first traversal
breadth-first traversal
There are three different types of depth-first traversals, :
PreOrder traversal - visit the parent first and then left and right children;
InOrder traversal - visit the left child, then the parent and the right child;
PostOrder traversal - visit left child, then the right child and then the parent;
There is only one kind of breadth-first traversal--the level order traversal. This traversal visits nodes
by levels from top to bottom and from left to right.
As an example consider the following tree and its four traversals:
PreOrder - 8, 5, 9, 7, 1, 12, 2, 4, 11, 3
InOrder - 9, 5, 1, 7, 2, 12, 8, 4, 3, 11
PostOrder - 9, 1, 2, 12, 7, 5, 3, 11, 4, 8
LevelOrder - 8, 5, 4, 9, 7, 11, 1, 12, 3, 2
Function for Traversing:
void Preorder(BSTNode *root){
if(root==NULL) return;
cout<<"Data are : "<<root->data<<endl;
Preorder(root->left);
Preorder(root->right);
void Inorder(BSTNode *root){
if(root==NULL) return;
Inorder(root->left);
cout<<"Data are : "<<root->data<<endl;
Inorder(root->right);
void Postorder(BSTNode *root){
if(root==NULL) return;
Postorder(root->left);
Postorder(root->right);
cout<<"Data are : "<<root->data<<endl;
}
We consider a particular kind of a binary tree called a Binary Search Tree (BST). The basic idea
behind this data structure is to have such a storing repository that provides the efficient way of data
sorting, searching and retrieving.
A BST is a binary tree where nodes are ordered in the following way:
each node contains one key (also known as data)
the keys in the left subtree are less then the key in its parent node, in short L < P;
the keys in the right subtree are greater the key in its parent node, in short P < R;
duplicate keys are not allowed.
In the following tree all nodes in the left subtree of 10 have keys < 10 while all nodes in the right
subtree > 10. Because both the left and right subtrees of a BST are again search trees; the above
definition is recursively applied to all internal nodes:
Insertion
The insertion procedure is quite similar to searching. We start at the root and recursively go down
the tree searching for a location in a BST to insert a new node. If the element to be inserted is
already in the tree, we are done (we do not insert duplicates). The new node will always replace a
NULL reference.
Exercise. Given a sequence of numbers:
11, 6, 8, 19, 4, 10, 5, 17, 43, 49, 31
Draw a binary search tree by inserting the above numbers from left to right.
Function for Insertion:
BSTNode* getNewNode(int data){
BSTNode *newNode=new BSTNode;
newNode->data=data;
newNode->left=newNode->right=NULL;
return newNode;
BSTNode* InsertTree(BSTNode* root,int data){
if(root==NULL){
root=getNewNode(data);
return root;
if(data<=root->data){
root->left=InsertTree(root->left,data);
}else{
root->right=InsertTree(root->right,data);
return root;
Deletion
Deletion is somewhat more tricky than insertion. There are several cases to consider. A node to be
deleted (let us call it as toDelete)
• is not in a tree;
• is a leaf;
• has only one child;
• has two children.
If toDelete is not in the tree, there is nothing to delete. If toDelete node has only one child the
procedure of deletion is identical to deleting a node from a linked list - we just bypass that node
being deleted .
Deletion of an internal node with two children is less straightforward. If we delete such a node, we
split a tree into two subtrees and therefore, some children of the internal node won't be accessible
after deletion. In the picture below we delete 8:
Deletion starategy is the following: replace the node being deleted with the largest node in the left
subtree and then delete that largest node. By symmetry, the node being deleted can be swapped with
the smallest node is the right subtree.