
SRI RAAJA RAAJAN COLLEGE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CS3301 - DATA STRUCTURES
(REGULATION 2021 - III SEMESTER)

CS3301 DATA STRUCTURES L T P C 3 0 0 3


OBJECTIVES:
 To understand the concepts of ADTs
 To learn linear data structures – lists, stacks, and queues
 To understand non-linear data structures – trees and graphs
 To understand sorting, searching and hashing algorithms
 To apply Tree and Graph structures

UNIT I LISTS 9
Abstract Data Types (ADTs) – List ADT – Array-based implementation – Linked list
implementation – Singly linked lists – Circularly linked lists – Doubly-linked lists –
Applications of lists – Polynomial ADT – Radix Sort – Multi lists.

UNIT II STACKS AND QUEUES 9


Stack ADT – Operations – Applications – Balancing Symbols – Evaluating arithmetic
expressions- Infix to Postfix conversion – Function Calls – Queue ADT – Operations –
Circular Queue – DeQueue – Applications of Queues.

UNIT III TREES 9
Tree ADT – Tree Traversals - Binary Tree ADT – Expression trees – Binary Search Tree
ADT – AVL Trees – Priority Queue (Heaps) – Binary Heap.

UNIT IV MULTIWAY SEARCH TREES AND GRAPHS 9


B-Tree – B+ Tree – Graph Definition – Representation of Graphs – Types of Graph -
Breadth-first traversal – Depth-first traversal –– Bi-connectivity – Euler circuits –Topological
Sort – Dijkstra's algorithm – Minimum Spanning Tree – Prim's algorithm – Kruskal's
algorithm

UNIT V SEARCHING, SORTING AND HASHING TECHNIQUES 9


Searching – Linear Search – Binary Search. Sorting – Bubble sort – Selection sort – Insertion
sort – Shell sort – Merge Sort. Hashing – Hash Functions – Separate Chaining – Open
Addressing – Rehashing – Extendible Hashing.

TOTAL: 45 PERIODS

OUTCOMES:
At the end of the course, the student should be able to:
 Define linear and non-linear data structures.
 Implement linear and non–linear data structure operations.
 Use appropriate linear/non–linear data structure operations for solving a given
problem.
 Apply appropriate graph algorithms for graph applications.
 Analyze the various searching and sorting algorithms.
TEXT BOOKS:
1. Mark Allen Weiss, "Data Structures and Algorithm Analysis in C", 2nd Edition, Pearson
Education, 1997.
2. Kamthane, Introduction to Data Structures in C, 1st Edition, Pearson Education, 2007.

REFERENCES:
1. Langsam, Augenstein and Tanenbaum, Data Structures Using C and C++, 2nd Edition,
Pearson Education, 2015.
2. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction
to Algorithms", Second Edition, McGraw Hill, 2002.
3. Alfred V. Aho, Jeffrey D. Ullman, John E. Hopcroft, Data Structures and Algorithms,
1st Edition, Pearson, 2002.
4. Kruse, Data Structures and Program Design in C, 2nd Edition, Pearson Education, 2006.

UNIT I LISTS 9
Abstract Data Types (ADTs) – List ADT – Array-based implementation – Linked list
implementation – Singly linked lists – Circularly linked lists – Doubly-linked lists –
Applications of lists – Polynomial ADT – Radix Sort – Multilists.

Abstract Data Types


An abstract data type (ADT) is a set of operations. Abstract data types are
mathematical abstractions; nowhere in an ADT's definition is there any mention of how the
set of operations is implemented. This can be viewed as an extension of modular design.
Objects such as lists, sets, and graphs, along with their operations, can be
viewed as abstract data types, just as integers, reals, and booleans are data types.
Use of ADT
Reusability of the code: the implementation of these operations is written
once in the program, and any other part of the program that needs to perform an
operation on the ADT can do so by calling the appropriate function.
The List ADT
A general list is of the form a1, a2, a3, ..., an. We say that the size of this list is
n. We will call the special list of size 0 a null list. For any list except the null list,
we say that ai+1 follows (or succeeds) ai (i < n) and that ai-1 precedes ai (i > 1).
The first element of the list is a1, and the last element is an. There is no
predecessor of a1 and no successor of an. The position of element ai in the list is i.
Some operations on a list are:
1. Find, which returns the position of the first occurrence of a key;
2. Insert and delete, which generally insert and delete some key at some
position in the list;
3. Find_kth, which returns the element in some position (specified as an
argument).
Example:
1. If the list is 34, 12, 52, 16, 12, then find(52) might return 3;
2. Insert(x, 3) might make the list into 34, 12, 52, x, 16, 12 (if we insert after
the position given);
3. Delete(3) might turn that list into 34, 12, x, 16, 12.
IMPLEMENTATION OF THE LIST
1. Array implementation
2. Linked list implementation
Simple Array Implementation of Lists
Even if the array is dynamically allocated, an estimate of the maximum size of
the list is required. Usually this requires a high over-estimate, which wastes
considerable space. This could be a serious limitation, especially if there are many lists
of unknown size.
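To make the space/time trade-off concrete, a minimal sketch of an array-based list follows; the names ArrayList, MAX_SIZE and Insert are illustrative (not from the text), and the insertion shows the element shifting that makes it a linear-time operation.

#define MAX_SIZE 100            /* over-estimate of the maximum list size */

typedef struct
{
    int elements[ MAX_SIZE ];   /* contiguous storage for the list */
    int size;                   /* current number of elements */
} ArrayList;

/* Insert x at position p (0-based); every later element must shift right, so O(n) */
void Insert( int x, ArrayList *L, int p )
{
    int i;
    if( L->size == MAX_SIZE || p > L->size )
        return;                 /* list full or position invalid */
    for( i = L->size; i > p; i-- )
        L->elements[ i ] = L->elements[ i - 1 ];
    L->elements[ p ] = x;
    L->size++;
}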

Other limitations are:


 Printing the list and find are carried out in linear time, which is
as good as can be expected, and the find_kth operation takes constant
time.
 Insertion and deletion are expensive, because the running time for
insertions and deletions is slow (elements must be shifted) and the maximum list
size must be known in advance.

Linked Lists Implementation


In order to avoid the linear cost of insertion and deletion, we need to ensure that
the list is not stored contiguously, since otherwise entire parts of the list will need to
be moved.
Definition:
The linked list consists of a series of structures, which are not necessarily
adjacent in memory. Each structure contains the element and a pointer to a
structure containing its successor. We call this the next pointer. The last cell's
next pointer is always NULL.
Structure of linked list

If p is declared to be a pointer to a structure, then the value stored in p is
interpreted as the location, in main memory, where a structure can be found.
A field of that structure can be accessed by p->field_name, where field_name
is the name of the field.
Consider a list that contains five structures, which happen to reside in memory
locations 1000, 800, 712, 992, and 692 respectively. The next pointer in the first
structure has the value 800, which indicates where the second
structure is located.
Actual representation of linked list with value and pointer

Deletion from a linked list

Insertion into a linked list



 The delete command can be executed in one pointer change. The above
diagram shows the result of deleting the third element in the original
list.
 The insert command requires obtaining a new cell from the system by
using a malloc call and then changing two pointers.

Programming Details
 First, it is difficult to insert at the front of the list from the definitions given.
 Second, deleting from the front of the list is a special case, because it changes
the start of the list.
 A third problem concerns deletion in general. Although the pointer moves
above are simple, the deletion algorithm requires us to keep track of the cell
before the one that we want to delete.
In order to solve all three problems, we will keep a sentinel node, which is called a
header or dummy node. (A header node contains the address of the first node in
the linked list.)
Linked list with a header

If we use a header, then if we wish to delete the first element in the list, find_previous
will return the position of the header.

Type declarations for linked lists
typedef struct node *node_ptr;
struct node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr LIST;
typedef node_ptr position;
Empty list with header

Function to test whether a linked list is empty
int is_empty( LIST L )
{
    return( L->next == NULL );
}
Function to test whether the current position is the last in a linked list
int is_last( position p, LIST L )
{
    return( p->next == NULL );
}
Function to find an element in the list
position find( element_type x, LIST L )
{
    position p;
    p = L->next;
    while( ( p != NULL ) && ( p->element != x ) )
        p = p->next;
    return p;
}

Function to delete an element in the list


This routine will delete some element x in list L. We need to decide what to do
if x occurs more than once or not at all. Our routine deletes the first occurrence of x
and does nothing if x is not in the list. First we find p, which is the cell prior to the one
containing x, via a call to find_previous.
/* Delete from a list. Cell pointed to by p->next is wiped out. */
/* Assume that the position is legal. Assume use of a header node. */
void delete( element_type x, LIST L )
{
    position p, tmp_cell;
    p = find_previous( x, L );
    if( p->next != NULL )
    {   /* x is found: delete it */
        tmp_cell = p->next;
        p->next = tmp_cell->next;
        free( tmp_cell );
    }
}
Function to find the previous position of an element in the list
position find_previous( element_type x, LIST L )
{
    position p;
    p = L;
    while( ( p->next != NULL ) && ( p->next->element != x ) )
        p = p->next;
    return p;
}
Function to insert an element in the list
The insertion routine will insert an element after the position implied by p. It is
also possible to insert the new element into position p, which would mean before the
element currently in position p.

void insert( element_type x, LIST L, position p )
{
    position tmp_cell;
    tmp_cell = (position) malloc( sizeof( struct node ) );
    if( tmp_cell == NULL )
        fatal_error( "Out of space!!!" );
    else
    {
        tmp_cell->element = x;
        tmp_cell->next = p->next;
        p->next = tmp_cell;
    }
}
Function to delete the list (incorrect version)
/* Incorrect way to delete a list: p is used after it has been freed */
void delete_list( LIST L )
{
    position p;
    p = L->next;
    L->next = NULL;
    while( p != NULL )
    {
        free( p );
        p = p->next;
    }
}
Function to delete the list
/* Correct way to delete a list */
void delete_list( LIST L )
{
    position p, tmp;
    p = L->next;
    L->next = NULL;
    while( p != NULL )
    {
        tmp = p->next;
        free( p );
        p = tmp;
    }
}
Doubly Linked Lists
A linked list is called doubly linked when each node has two pointers, a forward
and a backward pointer. This makes it convenient to traverse the list both forwards and backwards.
An extra field in the data structure contains a pointer to the previous cell.
The cost of this is an extra link, which adds to the space requirement and also doubles
the cost of insertions and deletions because there are more pointers to fix.
Node

A doubly linked list

Structure declaration
struct node
{
    int Element;
    struct node *FLINK;
    struct node *BLINK;
};

Insertion

Insert(15,L,P)

Deletion:
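A minimal sketch of these two operations, using the Element/FLINK/BLINK node structure declared above (the function names Insert and Delete and the position argument P are illustrative; <stdlib.h> is assumed for malloc and free):

/* Sketch only: insert a new node with value X after position P in a doubly linked list */
void Insert( int X, struct node *P )
{
    struct node *tmp = (struct node *) malloc( sizeof( struct node ) );
    if( tmp == NULL )
        return;                      /* out of space */
    tmp->Element = X;
    tmp->FLINK = P->FLINK;           /* new node points forward to P's old successor */
    tmp->BLINK = P;                  /* and backward to P */
    if( P->FLINK != NULL )
        P->FLINK->BLINK = tmp;       /* old successor now points back to the new node */
    P->FLINK = tmp;
}

/* Sketch only: unlink and free node P (assumed not to be the header) */
void Delete( struct node *P )
{
    P->BLINK->FLINK = P->FLINK;      /* bypass P in the forward direction */
    if( P->FLINK != NULL )
        P->FLINK->BLINK = P->BLINK;  /* bypass P in the backward direction */
    free( P );
}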

Circularly Linked Lists


A linked list is called circular when the last node's pointer points to the first cell of
the linked list, forming a circle. It can be singly circular or doubly circular,
with a header or without a header.
Singly Circular linked list:

Structure declaration:
struct node
{
    int Element;
    struct node *Next;
};

Insert at beginning:

void Insert_beg( int X, List L )
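A minimal sketch of such a routine, assuming List and Position are pointers to the node structure above and that the list uses a header node whose Next points back to itself when the list is empty (these conventions are assumptions, not stated in the text):

/* Sketch only: insert X as the first element of a singly circular list with header L */
void Insert_beg( int X, List L )
{
    Position tmp = (Position) malloc( sizeof( struct node ) );
    if( tmp == NULL )
        return;                 /* out of space */
    tmp->Element = X;
    tmp->Next = L->Next;        /* new node points to the old first node (or back to L) */
    L->Next = tmp;              /* header now points to the new first node */
}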


Insert in middle:
void insert_mid( int X, List L, Position P )

Insert at Last

Deletion at first node:
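A minimal sketch, under the same header-node assumption as the insertion sketch above (the name Delete_first is illustrative):

/* Sketch only: delete the first element of a singly circular list with header L */
void Delete_first( List L )
{
    Position first = L->Next;
    if( first == L )
        return;                 /* empty list: header points to itself */
    L->Next = first->Next;      /* header bypasses the old first node */
    free( first );
}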

Deletion at middle

Deletion at last:

Doubly Circular Linked list


A doubly circular linked list is a doubly linked list in which the forward link of the
last node points to the first node and the backward link of the first node points to the last
node of the list.

Structure Declaration:
struct node
{
    int Element;
    struct node *FLINK;
    struct node *BLINK;
};

Insert at beginning:

Insert at Middle:

Insert at Last:

Deletion
Deleting First node
void dele_first( List L )

Deletion at middle:
void dele_mid( int X, List L )
{
    Position P, Temp;
    P = FindPrevious( X );

Deletion at Last node:

Application of linked lists


Three applications that use linked lists are:
1. The Polynomial ADT
2. Radix sort
3. Multilist
1) Polynomial ADT:
 To overcome the disadvantage of the array implementation, an alternative is to
use a singly linked list.
 Each term in the polynomial is contained in one cell, and the cells are sorted in
decreasing order of exponents.

Coeff | Exponent | Next

Example:
P1: 4x^10 + 5x^5 + 3
P2: 10x^6 - 5x^2 + 2x

Structure declaration for the linked list implementation of the polynomial ADT:

struct link
{
    int coeff;
    int pow;
    struct link *next;
};
struct link *poly1 = NULL, *poly2 = NULL, *poly = NULL;

Procedure to add two polynomials


void poly_add( struct link *poly1, struct link *poly2, struct link *poly )
{
    while( poly1->next != NULL && poly2->next != NULL )
    {
        if( poly1->pow > poly2->pow )
        {
            poly->pow = poly1->pow;
            poly->coeff = poly1->coeff;
            poly1 = poly1->next;
        }
        else if( poly1->pow < poly2->pow )
        {
            poly->pow = poly2->pow;
            poly->coeff = poly2->coeff;
            poly2 = poly2->next;
        }
        else
        {
            poly->pow = poly1->pow;
            poly->coeff = poly1->coeff + poly2->coeff;
            poly1 = poly1->next;
            poly2 = poly2->next;
        }
        poly->next = (struct link *) malloc( sizeof( struct link ) );
        poly = poly->next;
        poly->next = NULL;
    }
    /* copy any remaining terms of whichever polynomial is left over */
    while( poly1->next != NULL || poly2->next != NULL )
    {
        if( poly1->next != NULL )
        {
            poly->coeff = poly1->coeff;
            poly->pow = poly1->pow;
            poly1 = poly1->next;
        }
        else
        {
            poly->coeff = poly2->coeff;
            poly->pow = poly2->pow;
            poly2 = poly2->next;
        }
        poly->next = (struct link *) malloc( sizeof( struct link ) );
        poly = poly->next;
        poly->next = NULL;
    }
}

Finally we get the resulting polynomial C as a linked list of (coefficient, exponent) nodes.
SUBTRACTION OF TWO POLYNOMIALS
void sub( )
{
    poly *ptr1, *ptr2, *newnode;
    ptr1 = list1;
    ptr2 = list2;
    while( ptr1 != NULL && ptr2 != NULL )
    {
        newnode = malloc( sizeof( struct poly ) );
        if( ptr1->power == ptr2->power )
        {
            newnode->coeff = ( ptr1->coeff ) - ( ptr2->coeff );
            newnode->power = ptr1->power;
            newnode->next = NULL;
            list3 = create( list3, newnode );
            ptr1 = ptr1->next;
            ptr2 = ptr2->next;
        }
        else
        {
            if( ptr1->power > ptr2->power )
            {
                newnode->coeff = ptr1->coeff;
                newnode->power = ptr1->power;
                newnode->next = NULL;
                list3 = create( list3, newnode );
                ptr1 = ptr1->next;
            }
            else
            {
                newnode->coeff = -( ptr2->coeff );
                newnode->power = ptr2->power;
                newnode->next = NULL;
                list3 = create( list3, newnode );
                ptr2 = ptr2->next;
            }
        }
    }
}

POLYNOMIAL DIFFERENTIATION
void diff( )
{
    poly *ptr1, *newnode;
    ptr1 = list1;
    while( ptr1 != NULL )
    {
        newnode = malloc( sizeof( struct poly ) );
        newnode->coeff = ptr1->coeff * ptr1->power;
        newnode->power = ptr1->power - 1;
        newnode->next = NULL;
        list3 = create( list3, newnode );
        ptr1 = ptr1->next;
    }
}

Radix Sort
A second example where linked lists are used is radix sort. Radix sort is
also known as card sort, because it was used, until the advent of modern computers, to
sort old-style punch cards.
If we have n integers in the range 1 to m (or 0 to m - 1), we can use this
information to obtain a fast sort known as bucket sort. We keep an array called count,
of size m, which is initialized to zero. Thus, count has m cells (or buckets), which are
initially empty.
When ai is read, increment (by one) count[ai]. After all the input is read, scan
the count array, printing out a representation of the sorted list. This algorithm takes
O(m + n) time; if m = Θ(n), then bucket sort takes O(n) time.
The following example shows the action of radix sort on 10 numbers. The input is 64,
8, 216, 512, 27, 729, 0, 1, 343, and 125. The first step (Pass 1) bucket sorts by the least
significant digit. The buckets are as shown below, so the list, sorted by least
significant digit, is 0, 1, 512, 343, 64, 125, 216, 27, 8, 729. These are now sorted by
the next least significant digit (the tens digit here).
Pass 2 gives the output 0, 1, 8, 512, 216, 125, 27, 729, 343, 64. This list is now sorted
with respect to the two least significant digits. The final pass
bucket-sorts by the most significant digit.
The final list is 0, 1, 8, 27, 64, 125, 216, 343, 512, and 729.
The running time is O(p(n + b)), where p is the number of passes, n is the number
of elements to sort, and b is the number of buckets. In our case, b = n.
Buckets after the first pass of radix sort (by the units digit)
bucket 0: 0    bucket 1: 1    bucket 2: 512    bucket 3: 343    bucket 4: 64
bucket 5: 125  bucket 6: 216  bucket 7: 27     bucket 8: 8      bucket 9: 729

Buckets after the second pass of radix sort (by the tens digit)
bucket 0: 0, 1, 8    bucket 1: 512, 216    bucket 2: 125, 27, 729
bucket 4: 343        bucket 6: 64          (remaining buckets empty)

Buckets after the last pass of radix sort (by the hundreds digit)
bucket 0: 0, 1, 8, 27, 64    bucket 1: 125    bucket 2: 216
bucket 3: 343                bucket 5: 512    bucket 7: 729    (remaining buckets empty)
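A minimal sketch in C of the bucket-count idea described above (not the full linked-list radix sort); the function name bucket_sort is illustrative, and <stdlib.h> is assumed for calloc and free.

/* Sketch only: bucket sort for n integers known to lie in the range 0 .. m-1.
   count[] plays the role of the m buckets; the whole sort runs in O(m + n). */
void bucket_sort( int a[], int n, int m )
{
    int i, j, k = 0;
    int *count = (int *) calloc( m, sizeof( int ) );
    if( count == NULL )
        return;                     /* out of space */
    for( i = 0; i < n; i++ )
        count[ a[i] ]++;            /* drop each key into its bucket */
    for( i = 0; i < m; i++ )        /* scan the buckets in order */
        for( j = 0; j < count[i]; j++ )
            a[ k++ ] = i;           /* write the sorted keys back */
    free( count );
}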

Multilists
A university with 40,000 students and 2,500 courses needs to be able to
generate two types of reports. The first report lists the class registration for each class,
and the second report lists, by student, the classes that each student is registered for.
If we use a two-dimensional array, such an array would have 100 million entries. The
average student registers for about three courses, so only 120,000 of these entries, or
roughly 0.1 percent, would actually have meaningful data.
To avoid this waste of memory, linked lists can be used. We use two kinds of lists:
one contains the students in each class, and another contains the classes each
student is registered for.
All lists use a header and are circular. To list all of the students in class C3, we
start at C3 and traverse its list. The first cell belongs to student S1.
Multilist implementation for the registration problem

Linked List Implementation of Multilists:


 Multilists can be used to represent the above scenario.
o One list to represent each class, containing the students in the class.
o One list to represent each student, containing the classes the student is
registered for.
 All lists use a header and are circular.
 To list all the students in class C3:
o Start the traversal at C3 and traverse its list (by going right).
o The first cell belongs to student S1.
o The next cell belongs to student S3. By continuing, it is found that
students S4 and S5 also belong to class C3.
 In a similar manner, for any student, all of the classes in which the student
is registered can be determined.
 Advantage of using a linked list:
o Saves memory space.
 Disadvantage of using a linked list:
o Saves memory space only at the expense of time.

UNIT II STACKS AND QUEUES 9


Stack ADT – Operations – Applications – Balancing Symbols – Evaluating arithmetic
expressions- Infix to Postfix conversion – Function Calls – Queue ADT – Operations –
Circular Queue – DeQueue – Applications of Queues.

The Stack ADT
Stack Model
A stack is a list with the restriction that inserts and deletes can be performed in
only one position, namely the end of the list called the top. Stacks are sometimes
known as LIFO (last in, first out) lists.
The fundamental operations on a stack are
1. Push - inserts an element.
2. Pop - deletes the most recently inserted element.
3. Top - displays the topmost element in the stack.
Error conditions
Push onto a full stack and Pop or Top on an empty stack are generally
considered errors in the stack ADT.
Stack model: input to a stack is by Push, output is by Pop

The model depicted in the above figure signifies that pushes are input
operations and pops and tops are output operations.
Stack model: only the top element is accessible

Implementation of Stacks
A stack is a list, so there are two popular implementations:
1. Array implementation
2. Linked list implementation

Linked List Implementation of Stacks


The first implementation of a stack uses a singly linked list. We perform a push
by inserting at the front of the list. We perform a pop by deleting the element at the
front of the list. A top operation merely examines the element at the front of the list,
returning its value. Sometimes the pop and top operations are combined into one.
Creating an empty stack is also simple. We merely create a header node;
make_null sets the next pointer to NULL.
Type declarations for linked list implementation of the stack ADT
struct node;
typedef struct node *PtrToNode;
typedef PtrToNode Stack;
int IsEmpty( Stack S );
Stack CreateStack( void );

void DisposeStack( Stack S );
void MakeEmpty( Stack S );
void Push( ElementType X, Stack S );
ElementType Top( Stack S );
void Pop( Stack S );

struct node
{
    ElementType element;
    PtrToNode next;
};
Routine to test whether a stack is empty - linked list implementation
int IsEmpty( Stack S )
{
    return( S->next == NULL );
}
Routine to create an empty stack - linked list implementation
This routine creates a stack and returns a pointer to it; otherwise it
issues a warning to say the stack was not created.
Stack CreateStack( void )
{
    Stack S;
    S = malloc( sizeof( struct node ) );
    if( S == NULL )
        fatal_error( "Out of space!!!" );
    return S;
}

Routine to make the stack empty - linked list implementation


This routine makes the stack empty by popping every element.
void MakeEmpty( Stack S )
{
    if( S == NULL )
        error( "Must use CreateStack first" );
    else
        while( !IsEmpty( S ) )
            Pop( S );
}
Routine to push onto a stack - linked list implementation
void Push( ElementType x, Stack S )
{
    PtrToNode tmp_cell;
    tmp_cell = (PtrToNode) malloc( sizeof( struct node ) );
    if( tmp_cell == NULL )
        fatal_error( "Out of space!!!" );
    else
    {
        tmp_cell->element = x;
        tmp_cell->next = S->next;
        S->next = tmp_cell;
    }
}
Routine to return the top element in a stack - linked list implementation
This routine returns the topmost element of the stack.
ElementType Top( Stack S )
{
    if( IsEmpty( S ) )
        error( "Empty stack" );
    else
        return S->next->element;
}

Routine to pop from a stack - linked list implementation


void Pop( Stack S )
{
    PtrToNode first_cell;
    if( IsEmpty( S ) )
        error( "Empty stack" );
    else
    {
        first_cell = S->next;
        S->next = S->next->next;
        free( first_cell );
    }
}
Array Implementation of Stacks
An alternative implementation that avoids pointers is the array
implementation. One problem here is that we need to declare an array size
ahead of time. Generally this is not a problem if the actual number of elements
in the stack is known in advance. It is usually easy to declare the array to be
large enough without wasting too much space.
Associated with each stack is the top of stack, tos, which is -1 for an empty
stack. To push some element x onto the stack, we increment tos and then set
STACK[tos] = x, where STACK is the array representing the actual stack.
To pop, we set the return value to STACK[tos] and then decrement tos.
Notice that these operations are performed in not only constant time, but
very fast constant time.
Error checking:
One issue in any stack implementation is error testing. The linked list
implementation above carefully checked for errors.
A pop on an empty stack or a push on a full stack will overflow the array bounds
and cause a crash, so we must ensure that these routines do not attempt to pop an empty stack
or push onto a full stack.

A Stack is defined as a pointer to a structure. The structure
contains the TopOfStack and Capacity fields.
Once the maximum size is known, the stack array can be dynamically allocated.
Stack Declaration
struct StackRecord;
typedef struct StackRecord *Stack;
int IsEmpty( Stack S );
Stack CreateStack( int MaxElements );
void DisposeStack( Stack S );
void MakeEmpty( Stack S );
void Push( ElementType X, Stack S );
ElementType Top( Stack S );
void Pop( Stack S );
ElementType TopAndPop( Stack S );

struct StackRecord
{
    int Capacity;
    int TopOfStack;
    ElementType *Array;
};
#define EmptyTOS (-1)      /* Signifies an empty stack */
#define MinStackSize (5)
Routine to create an empty stack - array implementation
This routine creates a stack and returns a pointer to it; otherwise it
issues a warning to say the stack was not created.
Stack CreateStack( int MaxElements )
{
    Stack S;
    if( MaxElements < MinStackSize )
        error( "Stack size is too small" );
    S = malloc( sizeof( struct StackRecord ) );
    if( S == NULL )
        fatal_error( "Out of space!!!" );
    S->Array = malloc( sizeof( ElementType ) * MaxElements );
    if( S->Array == NULL )
        fatal_error( "Out of space!!!" );
    S->Capacity = MaxElements;
    MakeEmpty( S );
    return S;
}
Routine for freeing a stack - array implementation
This routine frees the stack structure by releasing the array and then the record itself.
void DisposeStack( Stack S )
{
    if( S != NULL )
    {
        free( S->Array );
        free( S );
    }
}
Routine to test whether a stack is empty - array implementation
This routine checks whether the stack is empty or not.
int IsEmpty( Stack S )
{
    return( S->TopOfStack == EmptyTOS );
}
Routine to make an empty stack - array implementation
void MakeEmpty( Stack S )
{
    S->TopOfStack = EmptyTOS;
}
Routine to push onto a stack - array implementation
void Push( ElementType X, Stack S )
{
    if( IsFull( S ) )
        error( "Full stack" );
    else
        S->Array[ ++S->TopOfStack ] = X;
}
Routine to return the top of a stack - array implementation
ElementType Top( Stack S )
{
    if( !IsEmpty( S ) )
        return S->Array[ S->TopOfStack ];
    error( "Empty stack" );
    return 0;
}
Routine to pop from a stack - array implementation
void Pop( Stack S )
{
    if( IsEmpty( S ) )
        error( "Empty stack" );
    else
        S->TopOfStack--;
}
Routine to give the top element and pop a stack - array implementation
ElementType TopAndPop( Stack S )
{
    if( IsEmpty( S ) )
        error( "Empty stack" );
    else
        return S->Array[ S->TopOfStack-- ];
}

Stack Applications
1. Reversing a string
2. Towers of Hanoi problem
3. Balancing Symbols
4. Conversion of Infix to postfix expression
5. Conversion of Infix to prefix expression
6. Evaluation of Postfix expression
7. Used in Function calls

Balancing Symbols
Compilers check your programs for syntax errors, but frequently a lack of one
symbol (such as a missing brace or comment starter) will cause the compiler to
spill out a hundred lines of diagnostics without identifying the real error.
Thus, every right brace, bracket, and parenthesis must correspond to its
left counterpart.
The sequence [()] is legal, but [(]) is wrong. It is easy to check these things. For
simplicity, we will just check for balancing of parentheses, brackets, and braces and
ignore any other character that appears.

The simple algorithm uses a stack and is as follows (a sketch in C follows this list):


 Make an empty stack.
 Read characters until end of file.
 If the character is an open anything, push it onto the stack.
 If it is a close anything, then
 if the stack is empty, report an error;
 otherwise, pop the stack.
 If the symbol popped is not the corresponding opening symbol, then
report an error.
 At end of file, if the stack is not empty, report an error.
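A minimal sketch in C of this check, using a plain character array as the stack and handling only (), [] and {}; the name check_balance and the fixed stack size are illustrative.

/* Sketch only: returns 1 if the symbols in string s are balanced, 0 otherwise */
int check_balance( const char *s )
{
    char stack[ 1000 ];          /* assumed large enough for the input */
    int top = -1;                /* empty stack */
    for( ; *s != '\0'; s++ )
    {
        if( *s == '(' || *s == '[' || *s == '{' )
            stack[ ++top ] = *s;             /* push any opening symbol */
        else if( *s == ')' || *s == ']' || *s == '}' )
        {
            if( top == -1 )
                return 0;                    /* closing symbol on an empty stack */
            if( ( *s == ')' && stack[top] != '(' ) ||
                ( *s == ']' && stack[top] != '[' ) ||
                ( *s == '}' && stack[top] != '{' ) )
                return 0;                    /* mismatched pair */
            top--;                           /* pop the matching opener */
        }
    }
    return top == -1;            /* stack must be empty at end of input */
}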

Expression:
Expression is defined as a collection of operands and operators. The operators
can be arithmetic, logical or Boolean operators.
Rules for expressions
 No two operands should be consecutive
 No two operators should be consecutive
Types of expression:
Based on the position of the operator, an expression is classified into three types.
1. Infix Expression / Standard notation
2. Prefix Expression / Polish notation
3. Postfix Expression / Reverse Polish notation
Infix Expression:
In an expression, if the operator is placed in between the operands, then it is
called an Infix Expression.
Eg: A + B
Prefix Expression:
In an expression, if the operator is placed before the operands, then it is
called a Prefix Expression.
Eg: +AB
Postfix Expression:
In an expression, if the operator is placed after the operands, then it is called a
Postfix Expression.
Eg: AB+

Conversion of Infix to Postfix Expressions


A stack is used to convert an expression in standard form (otherwise known as
infix) into postfix. We will concentrate on a small version of the general problem by
allowing only the operators +, *, and (, ), and insisting on the usual precedence rules.
Suppose we want to convert the infix expression

a + b * c + ( d * e + f ) * g
A correct answer is a b c * + d e * f + g * +.
Algorithm:
1. We start with an initially empty stack.
2. When an operand is read, it is immediately placed onto the output.
3. Operators are not immediately placed onto the output, so they must be saved
somewhere. The correct thing to do is to place operators that have been seen,
but not placed on the output, onto the stack. We will also stack left parentheses
when they are encountered.
4. If we see a right parenthesis, then we pop the stack, writing symbols until we
encounter a (corresponding) left parenthesis, which is popped but not output.
5. If we see any other symbol ('+', '*', '('), then we pop entries from the stack until
we find an entry of lower priority. One exception is that we never remove a '('
from the stack except when processing a ')'. For the purposes of this operation,
'+' has lowest priority and '(' highest. When the popping is done, we push the
operator onto the stack.
6. Finally, if we read the end of input, we pop the stack until it is empty, writing
symbols onto the output.

To see how this algorithm performs, we will convert the infix expression
a + b * c + ( d * e + f ) * g
into its postfix form.
First, the symbol a is read, so it is passed through to the output. Then '+' is read and
pushed onto the stack. Next b is read and passed through to the output. Then the stack will
be as follows.

Next a '*' is read. The top entry on the operator stack has lower precedence than '*', so
nothing is output and '*' is put on the stack. Next, c is read and output.

The next symbol is a '+'. Checking the stack, we find that we will pop the '*' and place it on
the output; we then pop the other '+', which is not of lower but equal priority, and
then push the new '+'.

The next symbol read is a '(', which, being of highest precedence, is placed on the
stack. Then d is read and output.

We continue by reading a '*'. Since open parentheses do not get removed except when
a closed parenthesis is being processed, there is no output. Next, e is read and output.

The next symbol read is a '+'. We pop and output '*' and then push '+'. Then we read
and output f.

Now we read a ')', so the stack is emptied back to the '('. We output a '+'.

We read a '*' next; it is pushed onto the stack. Then g is read and output.



As before, this conversion requires only O(n) time and works in one pass
through the input. We can add subtraction and division to this repertoire by assigning
subtraction and addition equal priority and multiplication and division equal priority.
A subtle point is that the expression a - b - c will be converted to a b - c - and not a b c - -.
Our algorithm does the right thing, because these operators associate from left to right.
This is not necessarily the case in general, since exponentiation associates right to left:
2^2^3 = 2^8 = 256, not (2^2)^3 = 4^3 = 64.
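A minimal sketch in C of this conversion for single-character operands, the operators +, -, *, / and parentheses; the names infix_to_postfix and prec are illustrative, and out is assumed to be large enough to hold the result.

/* Sketch only: '(' is given the lowest stack priority so it is never popped by an operator */
static int prec( char op )
{
    if( op == '+' || op == '-' ) return 1;
    if( op == '*' || op == '/' ) return 2;
    return 0;
}

void infix_to_postfix( const char *in, char *out )
{
    char stack[ 100 ];
    int top = -1, k = 0;
    for( ; *in != '\0'; in++ )
    {
        char c = *in;
        if( c == ' ' )
            continue;
        else if( c == '(' )
            stack[ ++top ] = c;                       /* always stack a '(' */
        else if( c == ')' )
        {
            while( top >= 0 && stack[top] != '(' )
                out[ k++ ] = stack[ top-- ];          /* pop until the matching '(' */
            top--;                                    /* discard the '(' itself */
        }
        else if( c == '+' || c == '-' || c == '*' || c == '/' )
        {
            while( top >= 0 && prec( stack[top] ) >= prec( c ) )
                out[ k++ ] = stack[ top-- ];          /* pop operators of equal or higher priority */
            stack[ ++top ] = c;
        }
        else
            out[ k++ ] = c;                           /* operands go straight to the output */
    }
    while( top >= 0 )
        out[ k++ ] = stack[ top-- ];                  /* empty the stack at end of input */
    out[ k ] = '\0';
}

For the example above, infix_to_postfix("a+b*c+(d*e+f)*g", out) leaves "abc*+de*f+g*+" in out.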

Evaluation of a Postfix Expression


Algorithm:
When a number is seen, it is pushed onto the stack.
When an operator is seen, the operator is applied to the two numbers (symbols)
that are popped from the stack, and the result is pushed onto the stack.
For example, the postfix expression 6 5 2 3 + 8 * + 3 + * is evaluated as follows:
The first four symbols are placed on the stack. The resulting stack is
Top of Stack 3
             2
             5
             6

Next a '+' is read, so 3 and 2 are popped from the stack and their sum, 5, is pushed.
Top of Stack 5
             5
             6

Next 8 is pushed.
Top of Stack 8
             5
             5
             6
Next a '*' is seen, so 8 and 5 are popped and 8 * 5 = 40 is pushed.
Top of Stack 40
             5
             6
Next a '+' is seen, so 40 and 5 are popped and 40 + 5 = 45 is pushed.
Top of Stack 45
             6
Now, 3 is pushed.
Top of Stack 3
             45
             6
Next '+' pops 3 and 45 and pushes 45 + 3 = 48.
Top of Stack 48
             6
Finally, a '*' is seen and 48 and 6 are popped; the result 6 * 48 = 288 is pushed.
Top of Stack 288

The time to evaluate a postfix expression is O(n), because processing each


element in the input consists of stack operations and thus takes constant time. The
algorithm to do so is very simple.
Advantage of postfix expression:
When an expression is given in postfix notation, there is no need to know
any precedence rules;
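A minimal sketch in C of this stack-based evaluation, restricted to single-digit operands and the + and * operators used in the example above; the name eval_postfix is illustrative.

/* Sketch only: evaluate a postfix expression of single-digit numbers, '+' and '*' */
int eval_postfix( const char *s )
{
    int stack[ 100 ];
    int top = -1;
    for( ; *s != '\0'; s++ )
    {
        if( *s >= '0' && *s <= '9' )
            stack[ ++top ] = *s - '0';          /* push the number */
        else if( *s == '+' || *s == '*' )
        {
            int b = stack[ top-- ];             /* pop the two operands */
            int a = stack[ top-- ];
            stack[ ++top ] = ( *s == '+' ) ? a + b : a * b;   /* push the result */
        }
    }
    return stack[ top ];                        /* final value left on the stack */
}

For instance, eval_postfix("6523+8*+3+*") returns 288, matching the trace above.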

Function Calls
 When a call is made to a new function, all the variables local to the calling
routine need to be saved by the system. Otherwise the new function will
overwrite the calling routine's variables.
 The current location in the routine must be saved so that the new function
knows where to go after it is done.
 The reason that this problem is similar to balancing symbols is that a function
call and function return are essentially the same as an open parenthesis and
closed parenthesis, so the same ideas should work.
 When there is a function call, all the important information that needs to be
saved, such as register values (corresponding to variable names) and the return
address is saved "on a piece of paper" in an abstract way and put at the top of a
pile. Then the control is transferred to the new function, which is free to replace the
registers with its values.
 If it makes other function calls, it follows the same procedure. When the
function wants to return, it looks at the "paper" at the top of the pile and restores
all the registers. It then makes the return jump.
 The information saved is called either an activation record or stack frame.
 There is always the possibility that you will run out of stack space by having
too many simultaneously active functions. Running out of stack space is always a fatal
error.
 In normal events, you should not run out of stack space; doing so is usually an
indication of runaway recursion. On the other hand, some perfectly legal and
seemingly innocuous programs can cause you to run out of stack space.
A bad use of recursion: printing a linked list
/* Not using a header */
void print_list( LIST L )
{
    if( L != NULL )
    {
        print_element( L->element );
        print_list( L->next );
    }
}

 Activation records are typically large because of all the information they
contain, so this program is likely to run out of stack space. This program is an
example of an extremely bad use of recursion known as tail recursion. Tail
recursion refers to a recursive call at the last line.
 Tail recursion can be mechanically eliminated by changing the recursive call
to a goto preceded by one assignment per function argument.
 This simulates the recursive call because nothing needs to be saved -- after the
recursive call finishes, there is really no need to know the saved values.
Because of this, we can just go to the top of the function with the values that
would have been used in a recursive call.
The program below is the improved version. Removal of tail recursion is so simple
that some compilers do it automatically.
Printing a list without recursion
void print_list( LIST L )     /* No header */
{
top:
    if( L != NULL )
    {
        print_element( L->element );
        L = L->next;
        goto top;
    }
}
Recursion can always be completely removed, but doing so can be quite tedious. The
non-recursive programs are generally faster than recursive programs, but the speed
advantage rarely justifies the lack of clarity that results from removing the recursion.

The Queue ADT


A queue is also a list, in which insertion is done at one end and deletion is
performed at the other end. Insertion is at the rear end of the queue and deletion is
at the front of the queue. It is also called FIFO (First In First Out), which means the
element which is inserted first will be removed first from the queue.
Queue Model
The basic operations on a queue are
1. enqueue, which inserts an element at the end of the list (called the rear)
2. dequeue, which deletes (and returns) the element at the start of the
list (known as the front).
Abstract model of a queue

Array Implementation of Queues


 Like stacks, both the linked list and array implementations give fast O(1)
running times for every operation. The linked list implementation is
straightforward and left as an exercise. We will now discuss an array
implementation of queues.
 For each queue data structure, we keep an array, QUEUE[], and the positions
q_front and q_rear, which represent the ends of the queue. We also keep track
of the number of elements that are actually in the queue, q_size.
The following figure shows a queue in some intermediate state.

 By the way, the cells that are blank have undefined values in them. In
particular, the first two cells have elements that used to be in the queue.

 To enqueue an element x, we increment q_size and q_rear, then set
QUEUE[q_rear] = x.
 To dequeue an element, we set the return value to QUEUE[q_front], decrement
q_size, and then increment q_front. After 10 enqueues, the queue appears to be
full, since q_rear is now 10, and the next enqueue would be in a nonexistent
position.
 However, there might only be a few elements in the queue, because several
elements may have already been dequeued.
 The simple solution is that whenever q_front or q_rear gets to the end of the
array, it is wrapped around to the beginning. This is known as a circular array
implementation.
There are two warnings about the circular array implementation of queues.
 First, it is important to check the queue for emptiness, because a dequeue when
the queue is empty will return an undefined value.
 Secondly, some programmers use different ways of representing the front and
rear of a queue. For instance, some do not use an entry to keep track of the size,
because they rely on the base case that when the queue is empty, q_rear =
q_front - 1.
If the size is not part of the structure, then if the array size is A_SIZE, the queue is full
when there are A_SIZE - 1 elements.
In applications where you are sure that the number of enqueues is not larger
than the size of the queue, the wraparound is not necessary.
The routines queue_create and queue_dispose also need to be
provided. We also provide routines to test whether a queue is empty and to make an
empty queue.
Notice that q_front and q_rear are initialized by the make_empty routine below. The final
operation we will write is the enqueue routine.

Type declarations for queue - array implementation


struct QueueRecord
{
    int Capacity;
    int Front;
    int Rear;
    int Size;            /* Current number of elements in Q */
    ElementType *Array;
};
typedef struct QueueRecord *Queue;
Routine to test whether a queue is empty - array implementation
int IsEmpty( Queue Q )
{
    return( Q->Size == 0 );
}
Routine to make an empty queue - array implementation
void MakeEmpty( Queue Q )
{
    Q->Size = 0;
    Q->Front = -1;
    Q->Rear = -1;
}
Routines to enqueue - array implementation
static int succ( int value, Queue Q )
{
    if( ++value == Q->Capacity )
        value = 0;
    return value;
}
void Enqueue( ElementType x, Queue Q )
{
    if( IsFull( Q ) )
        error( "Full queue" );
    else
    {
        Q->Size++;
        Q->Rear = succ( Q->Rear, Q );
        Q->Array[ Q->Rear ] = x;
    }
}
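A Dequeue routine is not given here; a minimal sketch that stays consistent with the MakeEmpty and Enqueue routines above (Front is advanced with the same succ helper before the element is returned) might be:

/* Sketch only: remove and return the element at the front of the queue */
ElementType Dequeue( Queue Q )
{
    if( IsEmpty( Q ) )
    {
        error( "Empty queue" );
        return 0;
    }
    Q->Size--;
    Q->Front = succ( Q->Front, Q );   /* move Front onto the element being removed */
    return Q->Array[ Q->Front ];
}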
Applications of Queues
The applications are,
1. When jobs are submitted to a printer, they are arranged in order of arrival. Thus,
jobs sent to a line printer are placed on a queue.
2. Lines at ticket counters are queues, because service is first-come first-served.
3. Another example concerns computer networks. There are many network setups
of personal computers in which the disk is attached to one machine, known as
the file server.
4. Users on other machines are given access to files on a first-come first-served
basis, so the data structure is a queue.
Circular Queue:
In a circular queue, the insertion of a new element is performed at the very first
location of the queue if the last location of the queue is full; in this way the first
location comes after the last location.
Advantages:
It overcomes the problem of unutilized space in a linear queue when it is implemented
as an array.

To perform the insertion of an element into the queue, the position of the
element is calculated as rear = (rear + 1) % queue_size, and then Q[rear] = value is set.
Similarly, an element is deleted from the queue using front = (front + 1) %
queue_size.
Enqueue:
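A sketch of the enqueue operation, written in the same style as the CQ_dequeue routine below; the global CQueue array, front, rear and maxsize variables are taken from that routine, and <stdio.h> is assumed for printf.

/* Sketch only: insert value at the rear of the circular queue */
void CQ_enqueue( int value )
{
    if( ( rear + 1 ) % maxsize == front )
        printf( "Queue is full" );
    else
    {
        if( front == -1 )
            front = 0;                      /* first element into an empty queue */
        rear = ( rear + 1 ) % maxsize;      /* advance rear with wraparound */
        CQueue[ rear ] = value;
    }
}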

Dequeue:
void CQ_dequeue( )
{
    if( front == -1 && rear == -1 )
        printf( "Queue is empty" );
    else
    {
        temp = CQueue[ front ];
        if( front == rear )
            front = rear = -1;
        else
            front = ( front + 1 ) % maxsize;
    }
}

Priority Queue:
In a priority queue, an element with high priority is served before an element
with lower priority.
If two elements have the same priority, they are served according to their order
in the queue.
There are two types of priority queues.

UNIT III TREES 9


Tree ADT – Tree Traversals – Binary Tree ADT – Expression trees – Binary Search Tree
ADT – AVL Trees – Priority Queue (Heaps) – Binary Heap.

TREES
A tree is a non-linear data structure in which data are stored in a hierarchical manner. It is also
defined as a collection of nodes. The collection can be empty. Otherwise, a tree consists of a
distinguished node r, called the root, and zero or more (sub)trees T1, T2, . . . , Tk, each of
whose roots are connected by a directed edge to r.

The root of each subtree is said to be a child of r, and r is the parent of each subtree
root. A tree is a collection of n nodes, one of which is the root, and n - 1 edges. That there are n - 1
edges follows from the fact that each edge connects some node to its parent and every node
except the root has one parent.
Generic tree

A tree

Terms in Tree
In the tree in the above figure, the root is A.
 Node F has A as a parent and K, L, and M as children.
 Each node may have an arbitrary number of children, possibly zero.

 Nodes with no children are known as leaves;

 the leaves in the tree above are B, C, H, I, P, Q, K, L, M, and N.
 Nodes with the same parent are siblings; thus K, L, and M are all siblings.
Grandparent and grandchild relations can be defined in a similar manner.
 A path from node n1 to nk is defined as a sequence of nodes n1, n2, . . . , nk such that
ni is the parent of ni+1 for 1 ≤ i < k.
 The length of this path is the number of edges on the path, namely k - 1.
 There is a path of length zero from every node to itself.
 For any node ni, the depth of ni is the length of the unique path from the
root to ni. Thus, the root is at depth 0.
 The height of ni is the length of the longest path from ni to a leaf. Thus all leaves are at height 0.
 The height of a tree is equal to the height of the root.
Example: For the above tree,
E is at depth 1 and height 2;
F is at depth 1 and height 1; the height of the tree is 3.
Note:
 The depth of a tree is equal to the depth of the deepest leaf; this is always
equal to the height of the tree.
 If there is a path from n1 to n2, then n1 is an ancestor of n2 and n2 is a
descendant of n1. If n1 ≠ n2, then n1 is a proper ancestor of n2 and n2 is a
proper descendant of n1.
 In a tree there is exactly one path from the root to each node.

Types of Trees


Based on the number of children for each node in the tree, trees are classified into two types:
1. Binary tree
2. General tree
Binary tree
A tree in which each and every node has at most two children (zero, one or two)
is called a binary tree.

Eg:

General Tree

A tree in which a node can have any number of children is called a general tree.


Eg:

Implementation of Trees
A tree can be implemented by two methods:
1. Array implementation
2. Linked list implementation
Apart from these two methods, it can also be represented by the First Child /
Next Sibling representation.
One way to implement a tree would be to have in each node, besides its data, a pointer
to each child of the node. However, since the number of children per node can vary so greatly
and is not known in advance, it might be infeasible to make the children direct links in the
data structure, because there would be too much wasted space. The solution is simple: keep
the children of each node in a linked list of tree nodes.
Node declarations for trees
typedef struct tree_node *tree_ptr;
struct tree_node
{
    element_type element;
    tree_ptr first_child;
    tree_ptr next_sibling;
};

First child/next sibling representation of the tree shown in the figure below

Arrows that point downward are first_child pointers. Arrows that go left to right are
next_sibling pointers. Null pointers are not drawn, because there are too many. In the above
tree, node E has both a pointer to a sibling (F) and a pointer to a child (I), while some nodes
have neither.
Tree Traversals
Visiting each and every node in a tree exactly once is called a tree
traversal. Here the left subtree and the right subtree are traversed recursively.
Types of Tree Traversal:
1. Inorder Traversal
2. Preorder Traversal
3. Postorder Traversal
Inorder traversal:
Rules:
 Traverse the left subtree recursively
 Process the node
 Traverse the right subtree recursively

Eg
Inorder traversal: a + b * c + d * e + f * g
Preorder traversal:
Rules:
 Process the node
 Traverse the left subtree recursively
 Traverse the right subtree recursively
Preorder traversal: + + a * b c * + * d e f g

Postorder traversal


Rules:
 Traverse the left subtree recursively
 Traverse the right subtree recursively
 Process the node
Postorder traversal: a b c * + d e * f + g * +
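A minimal sketch in C of the three traversals, assuming the binary tree node declaration (element, left, right) given later in this unit, an integer element field, and <stdio.h> for printf; visit() is an illustrative placeholder for "process the node".

/* Placeholder for "process the node": here it simply prints the element */
void visit( tree_ptr T )
{
    printf( "%d ", T->element );
}

void inorder( tree_ptr T )          /* left, node, right */
{
    if( T != NULL )
    {
        inorder( T->left );
        visit( T );
        inorder( T->right );
    }
}

void preorder( tree_ptr T )         /* node, left, right */
{
    if( T != NULL )
    {
        visit( T );
        preorder( T->left );
        preorder( T->right );
    }
}

void postorder( tree_ptr T )        /* left, right, node */
{
    if( T != NULL )
    {
        postorder( T->left );
        postorder( T->right );
        visit( T );
    }
}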

Tree Traversals with an Application


There are many applications for trees. The two most important applications are:
1. Listing a directory in a hierarchical file system
2. Calculating the size of a directory

1. Listing a directory in a hierarchical file system


One of the popular uses is the directory structure in many common operating
systems, including UNIX, VAX/VMS, and DOS.
Typical directories in the UNIX file system (UNIX directory)

 The root of this directory is /usr. (The asterisk next to the name indicates that /usr
is itself a directory.)
 /usr has three children, mark, alex, and bill, which are themselves directories. Thus,
/usr contains three directories and no regular files.
 The filename /usr/mark/book/ch1.r is obtained by following the leftmost child three
times. Each / after the first indicates an edge; the result is the full pathname.
 Two files in different directories can share the same name, because they must have
different paths from the root and thus have different pathnames.

 A directory in the UNIX file system is just a file with a list of all its children, so the
directories are structured almost exactly in accordance with the type declaration.
 Each directory in the UNIX file system also has one entry that points to itself and
another entry that points to the parent of the directory. Thus, technically, the UNIX file
system is not a tree, but is treelike.
Routine to list a directory in a hierarchical file system
void list_directory( Directory_or_file D )
{
    list_dir( D, 0 );
}
void list_dir( Directory_or_file D, unsigned int depth )
{
    if( D is a legitimate entry )
    {
        print_name( depth, D );
        if( D is a directory )
            for each child, c, of D
                list_dir( c, depth + 1 );
    }
}
The logic of the algorithm is as follows.
 The argument to list_dir is some sort of pointer into the tree. As long as the pointer is
valid, the name implied by the pointer is printed out with the appropriate number of
tabs.
 If the entry is a directory, then we process all children recursively, one by one. These
children are one level deeper, and thus need to be indented an extra space.
This traversal strategy is known as a preorder traversal. In a preorder traversal, work at a
node is performed before (pre) its children are processed. If there are n file names to be
output, then the running time is O(n).
The (preorder) directory listing
/usr
    mark
        book
            chr1.c
            chr2.c
            chr3.c
        course
            cop3530
                fall88
                    syl.r
                spr89
                    syl.r
                sum89
                    syl.r
        junk.c
    alex
        junk.c
    bill
        work
        course
            cop3212
                fall88
                    grades
                    prog1.r
                    prog2.r
                fall89
                    prog1.r
                    prog2.r
                    grades

2. Calculating the size of a directory


In the UNIX directory structure above, the numbers in parentheses represent the
number of disk blocks taken up by each file; since the directories are themselves files, they
have sizes too. Suppose we would like to calculate the total number of blocks used by all the
files in the tree. Here the work at a node is performed after its children are evaluated, so it
follows a postorder traversal.
The most natural way to do this would be to find the number of blocks contained in the
subdirectories /usr/mark (30), /usr/alex (9), and /usr/bill (32). The total number of blocks is
then the total in the subdirectories (71) plus the one block used by /usr, for a total of 72.
Routine to calculate the size of a directory
unsigned int size_directory( Directory_or_file D )
{
    unsigned int total_size;
    total_size = 0;
    if( D is a legitimate entry )
    {
        total_size = file_size( D );
        if( D is a directory )
            for each child, c, of D
                total_size += size_directory( c );
    }
    return( total_size );
}
Size of the UNIX Directory
ch1.r 3
ch2.r 2
ch3.r 4
book 10
syl.r 1
fall88 2
syl.r 5
spr89 6
syl.r 2
sum89 3
cop3530 12
course 13
junk.c 6
mark 30
junk.c 8
alex 9
work 1
Grades 3
prog1.r 4
prog2.r 1

fall88 9
prog2.r 2
prog1.r 7
grades 9
fall89 19
cop3212 29
course 30
bill 32
/usr 72

If D is not a directory, then size_directory merely returns the number of blocks


used by D. Otherwise, the number of blocks used by D is added to the number of
blocks (recursively) found in all of the children.

Binary Trees
A binary tree is a tree in which no node can have more than two children.

The figure shows that a binary tree consists of a root and two subtrees, Tl and
Tr, both of which could possibly be empty.
Worst-case binary tree

Implementation
A binary tree node has at most two children, so we can keep direct pointers to them. The
declaration of tree nodes is similar in structure to that for doubly linked lists, in that a node is
a structure consisting of the key information plus two pointers (left and right) to other nodes.

Binary tree node declarations

typedef struct tree_node *tree_ptr;
struct tree_node
{
    element_type element;
    tree_ptr left;
    tree_ptr right;
};
typedef tree_ptr TREE;

Expression Trees
When an expression is represented in a binary tree, it is called an expression tree.
The leaves of an expression tree are operands, such as constants or variable names, and the
other nodes contain operators. It is possible for nodes to have more than two children. It is
also possible for a node to have only one child, as is the case with the unary minus operator.
We can evaluate an expression tree, T, by applying the operator at the root to the values
obtained by recursively evaluating the left and right subtrees.
In our example, the left subtree evaluates to a + (b * c) and the right subtree evaluates
to ((d * e) + f) * g. The entire tree therefore represents (a + (b * c)) + (((d * e) + f) * g).
We can produce an (overly parenthesized) infix expression by recursively
producing a parenthesized left expression, then printing out the operator at the
root, and finally recursively producing a parenthesized right expression. This
general strategy (left, node, right) is known as an inorder traversal; it gives the infix expression.
An alternate traversal strategy is to recursively print out the left subtree, the
right subtree, and then the operator. If we apply this strategy to our tree above, the output is
a b c * + d e * f + g * +, which is called the postfix expression. This traversal strategy is
generally known as a postorder traversal.
A third traversal strategy is to print out the operator first and then recursively print out
the left and right subtrees. The resulting expression, + + a * b c * + * d e f g, is the less useful
prefix notation, and the traversal strategy is a preorder traversal.
Expression tree for (a + b * c) + ((d * e + f) * g)

Constructing an Expression Tree


Algorithm to convert a postfix expression into an expression tree
1. Read the postfix expression one symbol at a time.
2. If the symbol is an operand, then
a. we create a one-node tree and push a pointer to it onto a stack.
3. If the symbol is an operator,
a. we pop pointers to two trees T1 and T2 from the stack (T1 is popped first) and
form a new tree whose root is the operator and whose left and right children
point to T2 and T1 respectively.
4. A pointer to this new tree is then pushed onto the stack.
Suppose the input is
a b + c d e + * *

The first two symbols are operands, so we create one-node trees and push pointers to
them onto a stack.

Next, a '+' is read, so two pointers to trees are popped, a new tree is formed, and a pointer to it is
pushed onto the stack.

Finally, the last symbol is read, two trees are merged, and a pointer to the final
tree is left on the stack.
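A minimal sketch in C of this construction for a postfix string of single-character operands and the operators '+' and '*' (as in the example above); it assumes the tree_ptr / struct tree_node declaration with element, left and right fields given earlier, an element type wide enough to hold a char, and <stdlib.h> for malloc. The function name build_expression_tree is illustrative.

/* Sketch only: build an expression tree from a postfix string using a stack of subtrees */
tree_ptr build_expression_tree( const char *postfix )
{
    tree_ptr stack[ 100 ];              /* stack of subtree pointers */
    int top = -1;
    for( ; *postfix != '\0'; postfix++ )
    {
        tree_ptr t;
        if( *postfix == ' ' )
            continue;                   /* skip blanks between symbols */
        t = (tree_ptr) malloc( sizeof( struct tree_node ) );
        t->element = *postfix;
        t->left = t->right = NULL;
        if( *postfix == '+' || *postfix == '*' )
        {
            t->right = stack[ top-- ];  /* first pop becomes the right child */
            t->left = stack[ top-- ];   /* second pop becomes the left child */
        }
        stack[ ++top ] = t;             /* push the one-node or merged tree */
    }
    return stack[ top ];                /* root of the completed expression tree */
}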

The Search Tree ADT - Binary Search Tree


The property that makes a binary tree into a binary search tree is that for every
node, X, in the tree, the values of all the keys in the left subtree are smaller than the key
value in X, and the values of all the keys in the right subtree are larger than the key
value in X.
Notice that this implies that all the elements in the tree can be ordered in some
consistent manner.

In the above figure, the tree on the left is a binary search tree, but the tree on the right is
not. The tree on the right has a node with key 7 in the left subtree of a node with key 6. The
average depth of a binary search tree is O(log n).
Binary search tree declarations
typedef struct tree_node *tree_ptr;
struct tree_node
{
    element_type element;
    tree_ptr left;
    tree_ptr right;
};
typedef tree_ptr SEARCH_TREE;
Make Empty:
This operation is mainly for initialization. Some programmers prefer to initialize the
first element as a one-node tree, but our implementation follows the recursive definition of
trees more closely.
Find
This operation generally requires returning a pointer to the node in tree T that has key
x, or NULL if there is no such node. The structure of the tree makes this simple. If T is NULL, then
we can just return NULL. Otherwise, if the key stored at T is x, we can return T. Otherwise, we
make a recursive call on a subtree of T, either left or right, depending on the relationship of x
to the key stored in T.
Routine to make an empty tree
SEARCH_TREE make_empty( SEARCH_TREE T )
{
    if( T != NULL )
    {
        make_empty( T->left );
        make_empty( T->right );
        free( T );
    }
    return NULL;
}
Routine for the Find operation
tree_ptr find( element_type x, SEARCH_TREE T )
{
    if( T == NULL )
        return NULL;
    if( x < T->element )
        return( find( x, T->left ) );
    else
        if( x > T->element )
            return( find( x, T->right ) );
        else
            return T;
}
Find Min & Find Max:
To perform a find_min, start at the root and go left as long as there is a left child. The
stopping point is the smallest element.

The find_max routine is the same, except that branching is to the right child.

Recursive implementation of find_min for binary search trees

tree_ptr find_min( SEARCH_TREE T )
{
    if( T == NULL )
        return NULL;
    else
        if( T->left == NULL )
            return( T );
        else
            return( find_min( T->left ) );
}
Recursive implementation of find_max for binary search trees
tree_ptr find_max( SEARCH_TREE T )
{
    if( T == NULL )
        return NULL;
    else
        if( T->right == NULL )
            return( T );
        else
            return( find_max( T->right ) );
}
Nonrecursive implementation of find_min for binary search trees
tree_ptr find_min( SEARCH_TREE T )
{
    if( T != NULL )
        while( T->left != NULL )
            T = T->left;
    return T;
}
Nonrecursive implementation of find_max for binary search trees
tree_ptr find_max( SEARCH_TREE T )
{
    if( T != NULL )
        while( T->right != NULL )
            T = T->right;
    return T;
}

Insert
Insert x at the last spot on the path traversed.

To insert 5, we traverse the tree as though a find were occurring. At the node with key
4, we need to go right, but there is no subtree, so 5 is not in the tree, and this is the correct
spot.
Insertion routine
Since T points to the root of the tree, and the root changes on the first insertion, insert
is written as a function that returns a pointer to the root of the new tree.

SEARCH_TREE insert( element_type x, SEARCH_TREE T )


{
    if( T == NULL )
    {
        T = (SEARCH_TREE) malloc( sizeof( struct tree_node ) );
        if( T == NULL )
            fatal_error( "Out of space!!!" );
        else
        {
            T->element = x;
            T->left = T->right = NULL;
        }
    }
    else
        if( x < T->element )
            T->left = insert( x, T->left );
        else
            if( x > T->element )
                T->right = insert( x, T->right );
    /* else x is in the tree already. We'll do nothing */
    return T;
}

Delete
if a node with two children. The general strategy is to replace the key of this nodewith
the smallest key of the right subtree and recursively delete that node. Because the smallest
node in the right subtree cannot have a left child, the second
deleteis aneasy one.

The node to be deleted is the left child of the root; the key value is 2. It is replaced
with the smallest key in its right subtree (3), and then that node is deleted as before.
Deletion of a node(4) with one child,before and after

Deletion of a node(2) with two children,before and after

If the number of deletions is expected to be small, then a popular strategy to use is lazy
deletion: when an element is to be deleted, it is left in the tree and merely marked as being
deleted.
Deletion routine for binary search trees


SEARCH_TREE delete( element_type x, SEARCH_TREE T )
{
    position tmp_cell;

    if( T == NULL )
        error( "Element not found" );
    else if( x < T->element )                /* Go left */
        T->left = delete( x, T->left );
    else if( x > T->element )                /* Go right */
        T->right = delete( x, T->right );
    else if( T->left && T->right )           /* Found element; two children */
    {
        tmp_cell = find_min( T->right );
        T->element = tmp_cell->element;
        T->right = delete( T->element, T->right );
    }
    else                                     /* One or zero children */
    {
        tmp_cell = T;
        if( T->left == NULL )                /* Only a right child (or none) */
            T = T->right;
        else if( T->right == NULL )          /* Only a left child */
            T = T->left;
        free( tmp_cell );
    }
    return T;
}

Average-Case Analysis of BST
 All of the operations of the previous section, except makeempty, should take O(log n)
   time, because in constant time we descend a level in the tree, thus operating on a tree
   that is now roughly half as large.
 The running time of all the operations, except makeempty, is O(d), where d is the
   depth of the node containing the accessed key.
 The average depth over all nodes in a tree is O(log n).
 The sum of the depths of all nodes in a tree is known as the internal path length.
AVLTrees
One way of obtaining an efficient search tree is to allow the tree to be arbitrarily deep, but
after every operation apply a restructuring rule that tends to make future operations efficient.
These types of data structures are generally classified as self-adjusting. An AVL tree instead
maintains an explicit balance condition.
An AVL (Adelson-Velskii and Landis) tree is a binary search tree with a balance
condition. The balance condition must be easy to maintain, and it ensures that the depth of
the tree is O(log n). The simplest idea is to require that the left and right subtrees of every
node have the same height, but this condition is too rigid to be useful.
An AVL tree is identical to a binary search tree, except that for every node in the tree,
the height of the left and right subtrees can differ by at most 1. (The height of an empty tree
is defined to be -1.)

The above figure shows a bad binary tree. Requiring balance at the root is not enough.

In the figure, the tree on the left is an AVL tree, but the tree on the right is not.

Thus, all the tree operations can be performed in O(log n) time, except possibly insertion.
When we do an insertion, we need to update all the balancing information for the
nodes on the path back to the root, but the reason that insertion is difficult is that inserting a
node could violate the AVL tree property.
Inserting a node into the AVL tree could destroy the balance condition.
Let us call the node that must be rebalanced α. A violation due to insertion might occur in
four cases:

1. An insertion into the left subtree of the left child of α
2. An insertion into the right subtree of the left child of α
3. An insertion into the left subtree of the right child of α
4. An insertion into the right subtree of the right child of α
A violation of the AVL property due to insertion can be fixed by doing a modification on the
node α. This modification process is called a rotation.
Types of rotation
1. Single Rotation
2. Double Rotation
Single Rotation (case 1) – Single rotate with left

The two trees in the above Figure contain the same elements and are both binary
search trees.
First of all, in both trees k1 < k2. Second, all elements in the subtree X are smaller
than k1 in both trees. Third, all elements in subtree Z are larger than k2. Finally, all elements
in subtree Y are in between k1 and k2. The conversion of one of the above trees to the other
is known as a rotation.
In an AVL tree, if an insertion causes some node in an AVL tree to lose the balance
property: Do a rotation at that node.
The basic algorithm is to start at the node inserted and travel up the tree, updating the
balance information at every node on the path.

In the above figure, after the insertion of the new key into the original AVL tree on the left,
node 8 becomes unbalanced. Thus, we do a single rotation between 7 and 8, obtaining the
tree on the right.
Routine:
static position single_rotate_with_left( position k2 )
{
    position k1;

    k1 = k2->left;
    k2->left = k1->right;
    k1->right = k2;

    k2->height = max( height( k2->left ), height( k2->right ) ) + 1;
    k1->height = max( height( k1->left ), k2->height ) + 1;

    return k1;   /* new root of the rotated subtree */
}

Single Rotation (case 4) – Single rotate with right


(Refer diagram from Class note)
Suppose we start with an initially empty AVL tree and insert the keys 1 through 7 in
sequential order. The first problem occurs when it is time to insert key 3, because the AVL
property is violated at the root. We perform a single rotation between the root and its right
child to fix the problem. The tree is shown in the following figure, before and after the
rotation.

A dashed line indicates the two nodes that are the subject of the rotation. Next, we insert the
key 4, which causes no problems, but the insertion of5 creates a violation at node 3, which is
fixed by a single rotation.
Next, we insert 6. This causes a balance problem for the root, since its left subtree is
of height 0, and its right subtree would be of height 2. Therefore, we perform a single
rotation at the root between 2 and 4.
The rotation is performed by making 2 a child of 4 and making 4's original left subtree
the new right subtree of 2. Every key in this subtree must lie between 2 and 4, so this
transformation makes sense. The next key we insert is 7, which causes another rotation.
Routine:

static position single_rotate_with_right( position k1 )
{
    position k2;

    k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;

    k1->height = max( height( k1->left ), height( k1->right ) ) + 1;
    k2->height = max( height( k2->right ), k1->height ) + 1;

    return k2;   /* new root of the rotated subtree */
}
Double Rotation
(Right-left)double rotation

(Left-right)double rotation

In the above diagram, suppose we insert keys 8 through 15 in reverse order. Inserting 15 is
easy, since it does not destroy the balance property, but inserting 14 causes a height
imbalance at node 7.
As the diagram shows, the single rotation has not fixed the height imbalance. The problem is
that the height imbalance was caused by a node inserted into the tree containing the middle
elements (tree Y in Fig. (Right-left) double rotation) at the same time as the other trees had
identical heights. This process is called as double rotation, which is similar to a single
rotation but involves four subtrees instead of three.
84 CS3301-DATASTRUCTURES
85 CS3301-DATASTRUCTURES

In our example, the double rotation is a right-left double rotation and involves 7, 15,
and 14. Here, k3 is the node with key 7, k1 is the node with key 15, and
k2 is the nodewith key 14.
Next, we insert 13, which requires a double rotation. Here the double rotation is again a
right-left double rotation that will involve 6, 14, and 7 and will restore the tree. In this case,
k3 is the node with key 6, k1 is the node with key 14, and k2 is the node with key 7. Subtree
A is the tree rooted at the node with key 5, subtree B is the empty subtree that was originally
the left child of the node with key 7, subtree C is the tree rooted at the node with key 13, and
finally, subtree D is the tree rooted at the node with key 15.
If 12 is now inserted, there is an imbalance at the root. Since 12 is not between
4 and 7, we know that the single rotation will work. Insertion of 11 will require a single
rotation:
To insert 10, a single rotation needs to be performed, and the same is true for the
subsequent insertion of 9. We insert 8 without a rotation, creating the almost perfectly
balanced tree.

Routine for double rotation with left (Case 2)

static position double_rotate_with_left( position k3 )
{
    /* rotate between k1 and k2 */
    k3->left = single_rotate_with_right( k3->left );
    /* rotate between k3 and k2 */
    return single_rotate_with_left( k3 );
}

Routine for double rotation with right (Case 3)

static position double_rotate_with_right( position k1 )
{
    /* rotate between k3 and k2 */
    k1->right = single_rotate_with_left( k1->right );
    /* rotate between k1 and k2 */
    return single_rotate_with_right( k1 );
}
Node declaration for AVL trees:

typedef struct avl_node *position;
typedef struct avl_node *avl_tree;

struct avl_node
{
    element_type element;
    avl_tree left;
    avl_tree right;
    int height;
};

typedef avl_tree SEARCH_TREE;

Routine for finding the height of an AVL node

int height( avl_tree p )
{
    if( p == NULL )
        return -1;
    else
        return p->height;
}

Routine for insertion of a new element into an AVL tree
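The routine itself is not reproduced in these notes. The following is a minimal sketch,
assuming the avl_tree declaration, the height routine, a max function, and the four rotation
routines given above; it inserts as in a binary search tree and rebalances on the way back up.

avl_tree insert( element_type x, avl_tree T )
{
    if( T == NULL )
    {
        /* Create a one-node tree */
        T = (avl_tree) malloc( sizeof( struct avl_node ) );
        if( T == NULL )
            fatal_error( "Out of space!!!" );
        T->element = x;
        T->left = T->right = NULL;
        T->height = 0;
    }
    else if( x < T->element )
    {
        T->left = insert( x, T->left );
        if( height( T->left ) - height( T->right ) == 2 )
        {
            if( x < T->left->element )
                T = single_rotate_with_left( T );    /* Case 1 */
            else
                T = double_rotate_with_left( T );    /* Case 2 */
        }
    }
    else if( x > T->element )
    {
        T->right = insert( x, T->right );
        if( height( T->right ) - height( T->left ) == 2 )
        {
            if( x > T->right->element )
                T = single_rotate_with_right( T );   /* Case 4 */
            else
                T = double_rotate_with_right( T );   /* Case 3 */
        }
    }
    /* else x is already in the tree; do nothing */

    T->height = max( height( T->left ), height( T->right ) ) + 1;
    return T;
}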


PRIORITY QUEUES (HEAPS)


A queue is said to be a priority queue if the elements are dequeued based on the
priority of the elements.
A priority queue is used in:
 Jobs sent to a line printer are generally placed on a queue. For instance, one job might
   be particularly important, so that it might be desirable to allow that job to be run as
   soon as the printer is available.
 In a multiuser environment, the operating system scheduler must decide which of
several processes to run. Generally a process is only allowed to run for a fixed period
of time. One algorithm uses a queue. Jobs are initially placed at the end of the queue.
The scheduler will repeatedly take the first job on the queue, run it until either it
finishes or its time limit is up, and place it at the end of the queue. This strategy is
generally not appropriate, because very short jobs will seem to take a long time
because of the wait involved to run. Generally, it is important that short jobs finish as
fast as possible. This is called as Shortest Job First (SJF). This particular application
seems to require a special kind of queue, known as a priority queue.
Basic model of a priority queue
A priority queue is a data structure that allows at least the following two operations:
1. Insert, the equivalent of enqueue
2. Deletemin, which removes the minimum element in the heap, the equivalent of the
   queue's dequeue operation.
Implementations of Priority Queue
1. Array Implementation
2. Linked List Implementation
3. Binary Search Tree Implementation
4. Binary Heap Implementation
Array Implementation
Drawbacks:
1. There will be more wastage of memory, since the maximum size of the array must be
   defined in advance.
2. Insertion is done at the end of the array, which takes O(N) time.
3. Delete_min will also take O(N) time.
Linked List Implementation

It overcomes the first two problems of the array implementation, but the delete_min
operation still takes O(N) time, similar to the array implementation.
Binary Search Tree Implementation
Another way of implementing priority queues would be to use a binary search tree.
This gives an O(log n) average running time for both operations.
Binary Heap Implementation
Another way of implementing priority queues would be to use a binary heap. This
gives an O(log n) worst-case running time for both operations (and constant average
time for insertion).
Binary Heap
Like binary search trees, heaps have two properties, namely, a structure property and a
heap order property. As with AVL trees, an operation on a heap can destroy one of the
properties, so a heap operation must not terminate until all heap properties are in order.
1. Structure Property
2. Heap Order Property
Structure Property
A heap is a binary tree that is completely filled, with the possible exception of the
bottom level, which is filled from left to right. Such a tree is known as a complete binary
tree.
A complete Binary Tree

A complete binary tree of height h has between 2^h and 2^(h+1) - 1 nodes. This
implies that the height of a complete binary tree is ⌊log n⌋, which is clearly O(log n).

Array implementation of complete binary tree


Note:
For any element in array position i, the left child is in position 2i, the right
child is in the cell after the left child (2i + 1), and the parent is in position
⌊i/2⌋.
The only problem with this implementation is that an estimate of the maximum heap
size is required in advance.
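As a small illustration of the index arithmetic described in the note above (the macro names
are only for illustration), the parent and child positions of an element can be computed as
follows when the heap is stored in an array starting at index 1:

/* Index arithmetic for a heap stored in elements[1..size] (index 0 holds a sentinel). */
#define PARENT(i)      ((i) / 2)
#define LEFT_CHILD(i)  (2 * (i))
#define RIGHT_CHILD(i) (2 * (i) + 1)

/* Example: for the element at position 5,
 *   PARENT(5)      == 2
 *   LEFT_CHILD(5)  == 10
 *   RIGHT_CHILD(5) == 11
 */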
Types of Binary Heap
Min Heap
A binary heap is said to be a min heap if, for any node X in the heap, the key
value of X is smaller than the keys of all of its descendants.

Max Heap
A binary heap is said to be a max heap if, for any node X in the heap, the key
value of X is larger than the keys of all of its descendants.

Since it should be easy to find the minimum quickly, it makes sense that the smallest element
should be at the root. If we consider that any subtree should also be a heap, then any node
should be smaller than all of its descendants.
Applying this logic, we arrive at the heap order property. In a heap, for every node X,
the key in the parent of X is smaller than (or equal to) the key in X.
Similarly we can declare a (max) heap, which enables us to efficiently find and
remove the maximum element, by changing the heap order property. Thus, a priority queue
can be used to find either a minimum or a maximum.
By the heap order property, the minimum element can always be found at the root.
Declaration for priority queue

struct heap_struct
{
    int capacity;
    int size;
    element_type *elements;
};

typedef struct heap_struct *priorityQ;

Create routine of priority Queue


priorityQ create( int max_elements )
{
    priorityQ H;

    if( max_elements < MIN_PQ_SIZE )
        error( "Priority queue size is too small" );

    H = (priorityQ) malloc( sizeof( struct heap_struct ) );
    if( H == NULL )
        fatal_error( "Out of space!!!" );

    /* Allocate the array plus one extra slot for the sentinel */
    H->elements = (element_type *) malloc( ( max_elements + 1 ) * sizeof( element_type ) );
    if( H->elements == NULL )
        fatal_error( "Out of space!!!" );

    H->capacity = max_elements;
    H->size = 0;
    H->elements[0] = MIN_DATA;   /* sentinel */

    return H;
}

Basic Heap Operations


It is easy to perform the two required operations. All the work involves ensuring that
the heap order property is maintained.
1. Insert
2. Deletemin
Insert
To insert an element x into the heap, we create a hole in the next available location,
since otherwise the tree will not be complete.
If x can be placed in the hole without violating heap order, then we do so and are
done. Otherwise we slide the element that is in the hole's parent node into the hole, thus
bubbling the hole up toward the root. We continue this process until x can be placed in the
hole.

Figure shows that to insert 14, we create a hole in the next available heap location.
Inserting 14 in the hole would violate the heap order property, so 31 is slid down into the
hole.

This strategy is continued until the correct location for 14 is found. This general
strategy is known as a percolate up; the new element is percolated up the heap until the
correct location is found.
We could have implemented the percolation in the insert routine by performing
repeated swaps until the correct order was established, but a swap requires three assignment
statements. If an element is percolated up d levels, the number of assignments performed by
the swaps would be 3d. Our method uses d + 1 assignments.

Routine to insert into a binary heap

/* H->elements[0] is a sentinel */
void insert( element_type x, priorityQ H )
{
    int i;

    if( is_full( H ) )
        error( "Priority queue is full" );
    else
    {
        i = ++H->size;
        while( H->elements[i/2] > x )      /* percolate the hole up */
        {
            H->elements[i] = H->elements[i/2];
            i /= 2;
        }
        H->elements[i] = x;
    }
}
If the element to be inserted is the new minimum, it will be pushed all the way to the
top. The time to do the insertion could be as much as O(log n), if the element to be inserted
is the new minimum and is percolated all the way to the root. On average, the percolation
terminates early.
Deletemin
Deletemins are handled in a similar manner as insertions. Finding the minimum is easy;
the hard part is removing it.
When the minimum is removed, a hole is created at the root. Since the heap now
becomes one smaller, it follows that the last element x in the heap must move somewhere in
the heap. If x can be placed in the hole, then we are done. This is unlikely, so we slide the
smaller of the hole's children into the hole, thus pushing the hole down one level. We repeat
this step until x can be placed in the hole. This general strategy is known as a percolate
down.

In the figure, after 13 is removed, we must now try to place 31 in the heap. 31 cannot be
placed in the hole, because this would violate heap order. Thus, we place the smaller child
(14) in the hole, sliding the hole down one level. We repeat this again, placing 19 into the
hole and creating a new hole one level deeper. We then place 26 in the hole and create a new
hole on the bottom level. Finally, we are able to place 31 in the hole.

Routine to perform deletemin in a binary heap


element_type delete_min( priorityQ H )
{
    int i, child;
    element_type min_element, last_element;

    if( is_empty( H ) )
    {
        error( "Priority queue is empty" );
        return H->elements[0];
    }

    min_element = H->elements[1];
    last_element = H->elements[H->size--];

    for( i = 1; i * 2 <= H->size; i = child )
    {
        /* find smaller child */
        child = i * 2;
        if( ( child != H->size ) && ( H->elements[child+1] < H->elements[child] ) )
            child++;

        /* percolate the hole down one level */
        if( last_element > H->elements[child] )
            H->elements[i] = H->elements[child];
        else
            break;
    }
    H->elements[i] = last_element;
    return min_element;
}
The average running time of deletemin is O(log n).
Other Heap Operations
The other heap operations are
1. Decrease key
2. Increase key
3. Delete
4. Build heap
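The build_heap operation takes an initial collection of n items and establishes the heap order
property. A minimal sketch is given below, assuming the priorityQ declaration above; the
percolate_down helper repeats the same percolation step used in delete_min, and build_heap
applies it to every non-leaf node from the bottom up, which takes O(n) time overall.

/* Sketch of build_heap for a min heap stored in H->elements[1..H->size]. */
static void percolate_down( priorityQ H, int i )
{
    int child;
    element_type tmp = H->elements[i];

    for( ; i * 2 <= H->size; i = child )
    {
        child = i * 2;
        if( child != H->size && H->elements[child+1] < H->elements[child] )
            child++;                          /* pick the smaller child */
        if( H->elements[child] < tmp )
            H->elements[i] = H->elements[child];
        else
            break;
    }
    H->elements[i] = tmp;
}

void build_heap( priorityQ H )
{
    int i;

    /* Percolate down every non-leaf node, from the last one up to the root */
    for( i = H->size / 2; i > 0; i-- )
        percolate_down( H, i );
}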
UNITIV MULTIWAY SEARCH TREES AND GRAPHS 9


B-Tree – B+ Tree – Graph Definition – Representation of Graphs – Types of Graph -
Breadth-first traversal – Depth-first traversal –– Bi-connectivity – Euler circuits –
Topological Sort – Dijkstra's algorithm – Minimum Spanning Tree – Prim's algorithm–
Kruskal's algorithm

B-Trees
AVL trees and splay trees are binary; there is a popular search tree that is not binary.
This tree is known as a B-tree.
A B-tree of order m is a tree with the following structural properties:
a. The root is either a leaf or has between 2 and m children.
b. All nonleaf nodes (except the root) have between ⌈m/2⌉ and m children.
c. All leaves are at the same depth.
All data is stored at the leaves. Contained in each interior node are pointers
p1, p2, . . . , pm to the children, and values k1, k2, . . . , km - 1, representing the smallest key
found in the subtrees p2, p3, . . . , pm respectively. Some of these pointers might be NULL,
and the corresponding ki would then be undefined.
For every node, all the keys in subtree p1 are smaller than the keys in subtree p2, and so on.
The leaves contain all the actual data, which is either the keys themselves or pointers to
records containing the keys.
The number of keys in a leaf is also between ⌈m/2⌉ and m.
An example of a B-tree of order4

A B-tree of order 4 is more popularly known as a 2-3-4 tree, and a B-tree of order 3 is
known as a 2-3 tree.
Our starting point is the 2-3 tree that follows.

A dash line as a second piece of information in an interior node indicates that the
node has only two children. Leaves are drawn in boxes, which contain the keys. The keys in
the leaves are ordered.
To perform a find, we start at the root and branch in one of (at most) three directions,
depending on the relation of the key we are looking for to the two values stored at the node.
When we get to a leaf node, we have found the correct place to put x. Thus, to insert a
node with key 18, we can just add it to a leaf without causing any violations of the 2-3 tree
properties. The result is shown in the following figure.

If we now try to insert 1 into the tree, we find that the node where it belongs is
already full. Placing our new key into this node would give it a fourth element which is not
allowed. This can be solved by making two nodes of two keys each and adjusting the
information in the parent.
To insert 19 into the current tree, we split the full leaf into two nodes of two keys each,
obtaining the following tree.

Again split this node into two nodes with two children. Now this node might be one of three
children itself, and thus splitting it would create a problem for its parent but we can keep on
splitting nodes on the way up to the root until we either get to the root or find a node with
only two children.

If we now insert an element with key 28, we create a leaf with four children, which is
split into two leaves of two children.
99 CS3301-DATASTRUCTURES

This creates an internal node with four children, which is then split into two nodes of two
children each. Likewise, to insert 70 into the tree above, we could move 58 to the leaf
containing 41 and 52, place 70 with 59 and 61, and adjust the entries in the internal nodes.
Deletionin B-Tree
 If this key was one of only two keys in a node, then its removal leaves only one key.
   We can fix this by combining this node with a sibling. If the sibling has three keys, we
   can steal one and have both nodes with two keys.
 If the sibling has only two keys, we combine the two nodes into a single node with
   three keys. The parent of this node now loses a child, so we might have to percolate
   this strategy all the way to the top.
We repeat this until we find a parent with fewer than m children. If we split the root, we
create a new root with two children.
The depth of a B-tree is at most ⌈log_⌈m/2⌉ n⌉.
The worst-case running time for each of the insert and delete operations is thus
O(m log_m n) = O((m / log m) log n), but a find takes only O(log n).

Definitions
Graph
A graph G = (V, E) consists of a set of vertices, V, and a set of edges, E. Each edge
is a pair (v, w), where v, w ∈ V. Edges are sometimes referred to as arcs.

(Figure: a graph with vertices A, B, C, D and E; the connecting lines are the edges/arcs.)
Types of graph
1. Directed Graph
If the pair is ordered, then the graph is directed. In a graph, if all the edges are
directionally oriented, then the graph is called a directed graph. Directed graphs are
sometimes referred to as digraphs.
Vertex w is adjacent to v if and only if (v, w) ∈ E.
(Figure: a directed graph on the vertices A, B, C, D and E.)
100 CS3301-DATASTRUCTURES

2. Undirected Graph
In a graph, if the edges are not directionally oriented, then the graph is called an
undirected graph. In an undirected graph with edge (v, w), and hence (w, v), w is adjacent to
v and v is adjacent to w.
(Figure: an undirected graph on the vertices 1, 2, 3, 4 and 5.)
3. Mixed Graph
In a graph, if some edges are directionally oriented and others are not, then it is
called a mixed graph.
(Figure: a mixed graph on the vertices S, T, U and V.)

Path
A path in a graph is a sequence of vertices w1, w2, w3, ..., wn such that (wi, wi+1) ∈ E
for 1 ≤ i < n.

Path length
The length of a path is the number of edges on the path, which is equal to n - 1, where
n is the number of vertices on the path.
Loop
A path from a vertex to itself; if this path contains no edges, then the path length is 0.
If the graph contains an edge (v,v) from a vertex to itself, then the path v, v is sometimes
referred to as a loop.
Simple Path
A simple path is a path such that all vertices are distinct, except that the first and last
could be the same.

Example: A -> C -> D -> E
Cycle
In a graph, if a path starts and ends at the same vertex, then it is known as a cycle.
Example: A -> C -> D -> E -> A
Cyclic Graph
A directed graph is said to be cyclic graph, if it has cyclic path.

Acyclic Graph
A directed graph is acyclic if it has no cycles. A directed acyclic graph is also referred
to as a DAG.
Connected Graph
An undirected graph is connected if there is a path from every vertex to every other
vertex.

Strongly connected Graph

A directed graph is called strongly connected if there is a path from every vertex to every
other vertex.
Weakly connected Graph

If a directed graph is not strongly connected, but the underlying graph (without
direction to the arcs) is connected, then the graph is said to be weakly connected.

Complete graph
A complete graph is a graph in which there is an edge between every pair of vertices.

Weighted Graph
In a graph, if some positive nonzero values (weights) are assigned to each and every
edge, then it is known as a weighted graph, also called a network.
An example of a real-life situation that can be modeled by a graph is the airport
system. Each airport is a vertex, and two vertices are connected by an edge if there is a
nonstop flight from the airports that are represented by the vertices. The edge could have a
weight, representing the time, distance, or cost of the flight.

Indegree and Outdegree


Indegree : The number of edges entering (coming towards) a vertex is called its indegree.
Outdegree : The number of edges exiting (going out of) a vertex is called its outdegree.
Degree : The number of edges incident on a vertex is called the degree of the vertex.
Degree = Indegree + Outdegree
Source/Start Vertex : A vertex whose indegree is zero is called a source vertex.
Sink/Destination Vertex : A vertex whose outdegree is zero is called a sink vertex.
Representation of Graphs
1. Adjacency matrix/Incidence Matrix
2. Adjacency Linked List/Incidence Linked List

Adjacency matrix
We will consider directed graphs.(Fig.1)

Now we can number the vertices, starting at 1. The graph shown in above figure represents 7
vertices and 12 edges.
One simple way to represent a graph is to use a two-dimensional array. This is known
as an adjacency matrix representation.
For each edge (u, v), we set a[u][v]= 1; otherwise the entry in the array is 0. If the edge has a
weight associated with it, then we can set a[u][v] equal to the weight and use either a very
large or a very small weight as a sentinel to indicate nonexistent edges.
Advantage : it is extremely simple, and the space requirement is Θ(|V|^2).

For a directed graph
A[u][v] = { 1, if there is an edge from u to v
            0, otherwise }
For an undirected graph
A[u][v] = { 1, if there is an edge between u and v
            0, otherwise }
For a weighted graph
A[u][v] = { weight, if there is an edge from u to v
            ∞, if there is no edge between u and v }
Adjacency lists
Adjacency lists are the standard way to represent graphs. Undirected graphs can be
similarly represented; each edge (u, v) appears in two lists, so the space usage essentially
doubles. A common requirement in graph algorithms is to find all vertices adjacent to some
given vertex v, and this can be done, in time proportional to the number of such vertices
found, by a simple scan down the appropriate adjacency list.
An adjacency list representation of a graph (see fig. 5.1 above)
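A minimal sketch of an adjacency list representation in C is given below; the structure names
and the fixed vertex count are assumptions for illustration only.

#include <stdlib.h>

#define NUM_VERTEX 7                      /* illustrative size */

struct edge_node
{
    int vertex;                           /* the adjacent vertex */
    struct edge_node *next;
};

struct edge_node *adj[NUM_VERTEX + 1];    /* one list head per vertex, vertices 1..NUM_VERTEX */

/* Insert a directed edge (u, v) at the front of u's list.
 * For an undirected graph, the edge would also be added to v's list. */
void add_edge( int u, int v )
{
    struct edge_node *p = malloc( sizeof( struct edge_node ) );
    p->vertex = v;
    p->next = adj[u];
    adj[u] = p;
}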

Topological Sort
A topological sort is an ordering of vertices in a directed acyclic graph, such that if
there is a path from vi to vj, then vj appears after vi in the ordering.
It is clear that a topological ordering is not possible if the graph has a cycle, since for
two vertices v and w on the cycle, v precedes w and w precedes v.
Directed acyclic graph

In the above graph, v1, v2, v5, v4, v3, v7, v6 and v1, v2, v5, v4, v7, v3, v6 are both
topological orderings.
A simple algorithm to find a topological ordering


First, find any vertex with no incoming edges (Source vertex). We can then print this
vertex, and remove it, along with its edges, from the graph.
To formalize this, we define the indegree of a vertex v as the number of edges (u,v).
We compute the indegrees of all vertices in the graph. Assuming that the indegree array is
initialized and that the graph is read into an adjacency list,
                 Indegree Before Dequeue #
Vertex      1    2    3    4    5    6    7
v1          0    0    0    0    0    0    0
v2          1    0    0    0    0    0    0
v3          2    1    1    1    0    0    0
v4          3    2    1    0    0    0    0
v5          1    1    0    0    0    0    0
v6          3    3    3    3    2    1    0
v7          2    2    2    1    0    0    0

Enqueue     v1   v2   v5   v4   v3   v7   v6
Dequeue     v1   v2   v5   v4   v3   v7   v6

Simple Topological Ordering Routine


void topsort( graph G )
{
    unsigned int counter;
    vertex v, w;

    for( counter = 0; counter < NUM_VERTEX; counter++ )
    {
        v = find_new_vertex_of_indegree_zero();
        if( v == NOT_A_VERTEX )
        {
            error( "Graph has a cycle" );
            break;
        }
        top_num[v] = counter;
        for each w adjacent to v
            indegree[w]--;
    }
}

Routine to perform Topological Sort


void topsort( graph G )
{
    QUEUE Q;
    unsigned int counter;
    vertex v, w;

    Q = create_queue( NUM_VERTEX );
    makeempty( Q );
    counter = 0;

    for each vertex v
        if( indegree[v] == 0 )
            enqueue( v, Q );

    while( !isempty( Q ) )
    {
        v = dequeue( Q );
        top_num[v] = ++counter;            /* assign next number */
        for each w adjacent to v
            if( --indegree[w] == 0 )
                enqueue( w, Q );
    }
    if( counter != NUM_VERTEX )
        error( "Graph has a cycle" );
    dispose_queue( Q );                    /* free the memory */
}
Graph Traversal:
Visiting of each and every vertex in the graph only once is called as Graph traversal.
There are two types of Graph traversal.
1. Depth First Traversal/Search(DFS)
2. Breadth First Traversal/Search(BFS)

Depth First Traversal/Search(DFS)


Depth-first search is a generalization of preorder traversal. Starting at some vertex, v,
we process v and then recursively traverse all vertices adjacent to v. If this process is
performed on a tree, then all tree vertices are systematically visited in a total of O(|E|) time,
since |E| = Θ(|V|).
We need to be careful to avoid cycles. To do this, when we visit a vertex v, we mark it
visited, since now we have been there, and recursively call depth-first search on all adjacent
vertices that are not already marked.
The two important key points of depth first search
1. If path exists from one node to another node walk across the edge – exploring
the edge
2. If path does not exist from one specific node to any other nodes, return to the
previous node where we have been before – backtracking
Procedure for DFS
Starting at some vertex V, we process it and then recursively traverse all the vertices adjacent
to V. This process continues until all the vertices are processed. If some vertex is not reached
recursively, then it will be processed by using backtracking. If vertex W is visited from V,
then the vertices are connected by means of tree edges. If an edge is not included in the tree,
it is represented by a back edge. At the end of this process, a tree called the DFS tree is
constructed.
Routine to perform a depth-first search

void dfs( vertex v )
{
    visited[v] = TRUE;
    for each w adjacent to v
        if( !visited[w] )
            dfs( w );
}
The (global) boolean array visited[ ] is initialized to FALSE. By recursively calling


the procedures only on nodes that have not been visited, we guarantee that we do not loop
indefinitely.
* An efficient way of implementing this is to begin the depth-first search at v1. If we need to
restart the depth-first search, we examine the sequence vk, vk + 1, . . . for an unmarked
vertex,where vk - 1 is the vertex where the last depth-first search was started.

An undirected graph

Steps to construct depth-first spanning tree


a. We start at vertex A. Then we mark A as visited and call dfs(B) recursively. dfs(B)
marks B as visited and calls dfs(C) recursively.
b. dfs(C) marks C as visited and calls dfs(D) recursively.
c. dfs(D) sees both A and B, but both these are marked, so no recursive calls are made.
dfs(D) also sees that C is adjacent but marked, so no recursive call is made there, and
dfs(D) returns back to dfs(C).
d. dfs(C) sees B adjacent, ignores it, finds a previously unseen vertex E adjacent, and
thus calls dfs(E).
e. dfs(E) marks E, ignores A and C, and returns to dfs(C).
f. dfs(C) returns to dfs(B). dfs(B) ignores both A and D and returns.
g. dfs(A) ignores both D and E and returns.
Depth-first search of the graph
--------> back edge (dashed)
          tree edge (solid)
The root of the tree is A, the first vertex visited. Each edge (v, w) in the graph is present in
the tree. If, when we process (v, w), we find that w is unmarked, or if, when we process (w,
v), we find that v is unmarked, we indicate this with a tree edge.
If when we process (v, w), we find that w is already marked, and when processing (w, v), we
find that v is already marked, we draw a dashed line, which we will call a back edge, to
indicate that this "edge" is not really part of the tree.
Breadth First Traversal (BFS)
Here, starting from some vertex v, all of its adjacent vertices are processed. After all the
adjacent vertices are processed, one of them is selected and the process continues. If a vertex
is not visited, then backtracking is applied to visit the unvisited vertex.
Routine:

void bfs( vertex v )
{
    visited[v] = true;
    for each w adjacent to v
        if( !visited[w] )
            visited[w] = true;
}

Example: BFS of the above graph starts at A, then visits its neighbours B, D and E, and
finally C.
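The routine above only marks the neighbours of a single vertex. A full breadth-first traversal
keeps a queue of vertices whose neighbours have not yet been explored; a minimal sketch is
given below, assuming the visited[] array used above and the queue operations from Unit II.

void bfs_full( vertex s )
{
    QUEUE Q;
    vertex v, w;

    Q = create_queue( NUM_VERTEX );
    visited[s] = TRUE;
    enqueue( s, Q );

    while( !isempty( Q ) )
    {
        v = dequeue( Q );
        /* process v here */
        for each w adjacent to v
            if( !visited[w] )
            {
                visited[w] = TRUE;      /* mark before enqueueing to avoid duplicates */
                enqueue( w, Q );
            }
    }
    dispose_queue( Q );
}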
Difference between DFS & BFS
S.No  DFS                                            BFS
1     Backtracking is possible from a dead end.      Backtracking is not possible.
2     Vertices from which exploration is             The vertices to be explored are
      incomplete are processed in a LIFO order.      organized as a FIFO queue.
3     Search is done in one particular direction     The vertices in the same level are
      at a time.                                     maintained in parallel (left to right,
                                                     alphabetical ordering).
(Figures: two example traversals. For the first graph, the order of traversal is A B C D E F;
for the second, the order of traversal is A B C D E F G H.)

Bi-connectivity/Biconnected Graph:
A connected undirected graph is biconnected if there are no vertices whose removal
disconnects the rest of the graph.

Articulation points
If a graph is not biconnected, the vertices whose removal would disconnect the graph
are known as articulation points.
The above graph is not biconnected: C and D are articulation points.
Depth-first search provides a linear-time algorithm to find all articulation points in a
connected graph.
 First, starting at any vertex, we perform a depth-first search and number the
nodes as they are visited.
 For each vertex v, we call this preorder number num (v). Then, for every
vertex v in the depth-first search spanning tree, we compute the lowest-
numbered vertex, which we call low(v), that is reachable from v by taking zero
or more tree edges and then possibly one back edge (in that order).
By the definition of low, low(v) is the minimum of
1. num(v)
2. the lowest num(w) among all back edges (v, w)
3. the lowest low(w) among all tree edges (v, w)
The first condition is the option of taking no edges, the second way is to choose no
tree edges and a back edge, and the third way is to choose some tree edges and possibly a
back edge.
The depth-first search tree in the above Figure shows the preorder number first, and then the
lowest-numbered vertex reachable under the rule described above.
The lowest-numbered vertex reachable by A, B, and C is vertex 1 (A), because they can all
take tree edges to D and then one back edge back to A and find low value for all other
vertices.
Depth-first tree that results if depth-first search starts at C

To find articulation points,


 The root is an articulation point if and only if it has more than one child, because if it
has two children, removing the root disconnects nodes in different subtrees, and if it
has only one child, removing the root merely disconnects the root.
 Any other vertex v is an articulation point if and only if v has some child w such that
low(w) >= num(v). Notice that this condition is always satisfied at the root; hence the need
for the special root test above.
We examine the articulation points that the algorithm determines, namely C and D. D has a
child E, and low (E)>= num (D), since both are 4. Thus, there is only one way for E to get to
any node above D, and that is by going through D.
Similarly, C is an articulation point, because low(G) >= num(C).

Routine to assign num to vertices


void assign_num( vertex v )
{
    vertex w;

    num[v] = counter++;
    visited[v] = TRUE;
    for each w adjacent to v
        if( !visited[w] )
        {
            parent[w] = v;
            assign_num( w );
        }
}

Routine to compute low and to test for articulation points

void assign_low( vertex v )
{
    vertex w;

    low[v] = num[v];                               /* Rule 1 */
    for each w adjacent to v
    {
        if( num[w] > num[v] )                      /* forward (tree) edge */
        {
            assign_low( w );
            if( low[w] >= num[v] )
                printf( "%v is an articulation point\n", v );
            low[v] = min( low[v], low[w] );        /* Rule 3 */
        }
        else
        if( parent[v] != w )                       /* back edge */
            low[v] = min( low[v], num[w] );        /* Rule 2 */
    }
}
Testing for articulation points in one depth-first search (test for the root is omitted)

void findart( vertex v )
{
    vertex w;

    visited[v] = TRUE;
    low[v] = num[v] = counter++;                   /* Rule 1 */
    for each w adjacent to v
    {
        if( !visited[w] )                          /* forward (tree) edge */
        {
            parent[w] = v;
            findart( w );
            if( low[w] >= num[v] )
                printf( "%v is an articulation point\n", v );
            low[v] = min( low[v], low[w] );        /* Rule 3 */
        }
        else
        if( parent[v] != w )                       /* back edge */
            low[v] = min( low[v], num[w] );        /* Rule 2 */
    }
}

Euler Circuits
We must find a path in the graph that visits every edge exactly once. If we are to solve
the "extra challenge," then we must find a cycle that visits every edge exactly once. This
graph problem was solved in 1736 by Euler and marked the beginning of graph theory. The
problem is thus commonly referred to as an Euler path or Euler tour or Euler circuit
problem, depending on the specific problem statement.
Consider the three figures as shown below. A popular puzzle is to reconstruct these
figures using a pen, drawing each line exactly once. The pen may not be lifted from the
paper while the drawing is being performed. As an extra challenge, make the pen finish at
the same point at which it started.
Three drawings

1. The first figure can be drawn only if the starting point is the lower left- or right-hand
corner, and it is not possible to finish at the starting point.
2. The second figure is easily drawn with the finishing point the same as the starting point.
3. The third figure cannot be drawn at all within the parameters of the puzzle.
We can convert this problem to a graph theory problem by assigning a vertex to each
intersection. Then the edges can be assigned in the natural manner, as in figure.

The first observation that can be made is that an Euler circuit, which must end on its starting
vertex, is possible only if the graph is connected and each vertex has an even degree (number
of edges). This is because, on the Euler circuit, a vertex is entered and then left.
If exactly two vertices have odd degree, an Euler tour, which must visit every edge but need
not return to its starting vertex, is still possible if we start at one of the odd-degree vertices
and finish at the other.
If more than two vertices have odd degree, then an Euler tour is not possible.
That is, any connected graph, all of whose vertices have even degree, must have an Euler
circuit.
As an example, consider the graph in the figure below.

The main problem is that we might visit a portion of the graph and return to the starting point
prematurely. If all the edges coming out of the start vertex have been used up, then part of the
graph is untraversed.
The easiest way to fix this is to find the first vertex on this path that has an untraversed edge,
and perform another depth-first search. This will give another circuit, which can be spliced
into the original. This is continued until all edges have been traversed.
Suppose we start at vertex 5, and traverse the circuit 5, 4, 10, 5. Then we are stuck, and most
of the graph is still untraversed. The situation is shown in the figure.
We then continue from vertex 4, which still has unexplored edges. A depth-first search might
come up with the path 4, 1, 3, 7, 4, 11, 10, 7, 9, 3, 4. If we splice this path into the previous
path of 5, 4, 10, 5, then we get a new path of 5, 4, 1, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5. The
graph that remains after this is shown in the figure.

The next vertex on the path that has untraversed edges is vertex 3. A possible circuit would
then be 3, 2, 8, 9, 6, 3. When spliced in, this gives the path 5, 4, 1, 3, 2, 8, 9, 6, 3, 7, 4, 11,
10, 7, 9, 3, 4, 10, 5.
The graph that remains is in the Figure.

On this path, the next vertex with an untraversed edge is 9, and the algorithm finds the circuit
9, 12, 10, 9. When this is added to the current path, a circuit of 5, 4, 1, 3, 2, 8, 9, 12, 10, 9, 6,
3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5 is obtained. As all the edges are traversed, the algorithm
terminates with an Euler circuit.
The Euler path for the above graph is 5, 4, 1, 3, 2, 8, 9, 12, 10, 9, 6, 3, 7, 4, 11,
10, 7, 9, 3, 4, 10, 5.
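The degree condition described above is easy to check in code. The following sketch,
assuming the adjacency matrix representation described earlier in this unit (vertices
numbered 1..n), tests whether every vertex of a connected undirected graph has even degree,
which is the condition for an Euler circuit to exist.

/* Returns 1 if every vertex has even degree, 0 otherwise.
 * Assumes an undirected graph stored in the adjacency matrix a[1..n][1..n]. */
int has_euler_circuit( int a[][NUM_VERTEX + 1], int n )
{
    int v, w, degree;

    for( v = 1; v <= n; v++ )
    {
        degree = 0;
        for( w = 1; w <= n; w++ )
            degree += a[v][w];
        if( degree % 2 != 0 )
            return 0;      /* an odd-degree vertex rules out an Euler circuit */
    }
    return 1;              /* a connected graph with all even degrees has an Euler circuit */
}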

Cut vertex and edges


A cut vertex is a vertex that, when removed (with its incident edges) from a graph, creates
more components than previously in the graph.

A cut edge is an edge that, when removed (the vertices stay in place) from a graph, creates
more components than previously in the graph.

Find the cut vertices and cut edges for the following graphs

Answers
31) The cut vertex is c. There are no cut edges.
32) The cut vertices are c and d. The cut edge is (c, d).
33) The cut vertices are b, c, e and i. The cut edges are: (a,b), (b,c), (c,d), (c,e), (e,i), (i,h).
Applications of graph:
Minimum Spanning Tree
Definition:
A minimum spanning tree of an undirected graph G is a tree formed from graph edges
that connects all the vertices of G at lowest total cost. A minimum spanning tree exists
if and only if G is connected.
The number of edges in the minimum spanning tree is |V| - 1. The minimum spanning
tree is a tree because it is acyclic; it is spanning because it covers every vertex.
Application:
 House wiring with a minimum length of cable,reduces cost of the
wiring.
A graph G and its minimum spanning tree

There are two algorithms to find the minimum spanning tree


1. Prim's Algorithm
2. Kruskal'sAlgorithm
Kruskal's Algorithm
A second greedy strategy is continually to select the edges in order of smallest weight
and accept an edge if it does not cause a cycle.
Formally, Kruskal's algorithm maintains a forest. A forest is a collection of trees.
Procedure
 Initially, there are |V| single-node trees.
 Adding an edge merges two trees into one.
 When the algorithm terminates, there is only one tree, and this is the minimum
   spanning tree.
 The algorithm terminates when enough edges are accepted.
At any point in the process, two vertices belong to the same set if and only if they are
connected in the current spanning forest. Thus, each vertex is initially in its own set.
 If u and v are in the same set, the edge is rejected, because since they are already
   connected, adding (u, v) would form a cycle.
 Otherwise, the edge is accepted, and a union is performed on the two sets containing
   u and v.
Action of Kruskal's algorithm on G


Edge Weight Action

(v1,v4) 1 Accepted
(v6,v7) 1 Accepted
(v1,v2) 2 Accepted
(v3,v4) 2 Accepted
(v2,v4) 3 Rejected
(v1,v3) 4 Rejected
(v4,v7) 4 Accepted
(v3,v6) 5 Rejected
(v5,v7) 6 Accepted

Kruskal's algorithm after each stage

Routine for Kruskal's algorithm

void Graph::kruskal( )
{
    int edgesAccepted = 0;
    DisjSet ds( NUM_VERTEX );
    PriorityQueue<Edge> pq( getEdges( ) );
    Edge e;
    Vertex u, v;

    while( edgesAccepted < NUM_VERTEX - 1 )
    {
        pq.deleteMin( e );                  // e = (u, v)
        SetType uset = ds.find( u );
        SetType vset = ds.find( v );
        if( uset != vset )
        {
            // accept the edge
            edgesAccepted++;
            ds.setUnion( uset, vset );
        }
    }
}
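The routine above relies on disjoint-set find and union operations that are not defined in these
notes. A minimal array-based sketch in C is given below (the names are assumptions for
illustration): find returns the root of the set containing a vertex, and set_union links one root
under the other.

#define NUM_VERTEX 7                 /* illustrative size */

int parent[NUM_VERTEX + 1];          /* initialized to 0; parent[i] == 0 means i is a root */

/* Return the root of the tree containing x (the name of x's set). */
int find( int x )
{
    while( parent[x] > 0 )
        x = parent[x];
    return x;
}

/* Merge the sets whose roots are root1 and root2. */
void set_union( int root1, int root2 )
{
    if( root1 != root2 )
        parent[root2] = root1;
}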

Dijkstra's Algorithm
If the graph is weighted, the problem becomes harder, but we can still use the ideas
from the unweighted case.

Each vertex is marked as either known or unknown. A tentative distance dv is kept for each
vertex; dv is the shortest path length from s to v using only known vertices as intermediates.
The general method to solve the single-source shortest-path problem is known as Dijkstra's
algorithm.
Dijkstra's algorithm proceeds in stages, just like the unweighted shortest-path
algorithm. At each stage, Dijkstra's algorithm selects a vertex v, which has the smallest dv
among all the unknown vertices, and declares that the shortest path from s to v is known.
In the above graph, assuming that the start node, s, is v1. The first vertex selected is v1, with
path length 0. This vertex is marked known. Now that v1 is known.
Initial configuration table
v Known dv pv

v1 0 0 0
v2 0 ∞ 0
v3 0 ∞ 0
v4 0 ∞ 0
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
The vertices adjacent to v1 are v2 and v4. Both these vertices get their entries adjusted, as
indicated below.
After v1 is declared known
v Known dv pv

v1 1 0 0
v2 0 2 v1
v3 0 ∞ 0
v4 0 1 v1
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Next, v4 is selected and marked known. Vertices v3, v5, v6, and v7 are adjacent.
After v4 is declared known
v Known dv pv

v1 1 0 0
v2 0 2 v1
v3 0 3 v4
v4 1 1 v1
v5 0 3 v4
v6 0 9 v4
v7 0 5 v4
Next, v2 is selected. v4 is adjacent but already known, so no work is performed on it. v5 is


adjacent but not adjusted, because the cost of going through v2 is 2 + 10 = 12 and a path of
length 3 is already known.
After v2 is declared known
v Known dv pv

v1 1 0 0
v2 1 2 v1
v3 0 3 v4
v4 1 1 v1
v5 0 3 v4
v6 0 9 v4
v7 0 5 v4
The next vertex selected is v5 at cost 3. v7 is the only adjacent vertex, but it is not
adjusted,because 3 + 6 > 5. Then v3 is selected, and the distance for v6 is adjusted down to 3
+ 5 = 8.
After v5 and v3 are declared known
v Known dv pv

v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 0 8 v3
v7 0 5 v4
Next, v7 is selected; v6 gets updated down to 5 + 1 = 6. The resulting table is
After v7 is declared known
v Known dv pv

v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 0 6 v7
v7 1 5 v4
Finally, v6 is selected. The final table is shown below.

After v6 is declared known and the algorithm terminates
v Known dv pv

v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 1 6 v7
v7 1 5 v4
Vertex structure for Dijkstra's algorithm

struct Vertex
{
    List     adj;       /* adjacency list */
    bool     known;
    DistType dist;
    Vertex   path;
};
#define NOT_A_VERTEX 0

Routine for Dijkstra's algorithm

void Graph::dijkstra( Vertex s )
{
    for each vertex v
    {
        v.dist = INFINITY;
        v.known = false;
    }
    s.dist = 0;

    for( ; ; )
    {
        v = smallest unknown distance vertex;
        if( v == NOT_A_VERTEX )
            break;
        v.known = true;
        for each w adjacent to v
            if( !w.known )
                if( v.dist + c(v,w) < w.dist )
                {
                    /* update w */
                    decrease( w.dist to v.dist + c(v,w) );
                    w.path = v;
                }
    }
}
Routine to print the actual shortest path
void Graph::printPath( Vertex v )
{
    if( v.path != NOT_A_VERTEX )
    {
        printPath( v.path );
        cout << " to ";
    }
    cout << v;
}
UNITV SEARCHING,SORTING AND HASHING TECHNIQUES 9


Searching – Linear Search – Binary Search. Sorting – Bubble sort – Selection sort –
Insertion sort – Shell sort –. Merge Sort – Hashing – Hash Functions – Separate
Chaining – Open Addressing – Rehashing – Extendible Hashing.

Routine for Hash Function

INDEX hash( char *key, int table_size )
{
    int hash_val = 0;

    /* add up the characters of the key */
    while( *key != '\0' )
        hash_val += *key++;

    return hash_val % table_size;
}
Collision:
A collision occurs when the hash value of a record being inserted hashes to an address that
already contains a different record, i.e., when two key values hash to the same position.
Example
Values 89 and 39 hash to the same address 9, if the table size is 10.
Collision Resolution Methods:

1. Open Hashing – Each bucket in the hash table is the head of a linked list. Colliding
   elements are stored outside the table. E.g. separate chaining.
2. Closed Hashing – Colliding elements are stored at another slot in the table. This ensures
   that all the elements are stored directly inside the hash table. E.g. open addressing
   (linear probing, quadratic probing and double hashing), rehashing and extendible hashing.

1. Separate Chaining / Open Hashing

The first strategy, commonly known as either open hashing or separate chaining, is to keep a
list of all elements that hash to the same value. For convenience, our lists have headers.
hash(x) = x mod 10. (The table size is 10.)
To perform an insert, we traverse down the appropriate list to check whether the element is
already in place. If the element turns out to be new, it is inserted either at the front of the list
or at the end of the list. Here, new elements are inserted at the front of the list.

struct list_node
{
    element_type element;
    position next;
};

struct hash_tbl
{
    int table_size;
    LIST *the_lists;
};
Initialization routine for open hash table

HASH_TABLE initialize_table( int table_size )
{
    HASH_TABLE H;
    int i;

    if( table_size < MIN_TABLE_SIZE )
    {
        error( "Table size too small" );
        return NULL;
    }

    /* Allocate the table structure */
    H = (HASH_TABLE) malloc( sizeof( struct hash_tbl ) );
    if( H == NULL )
        fatal_error( "Out of space!!!" );

    H->table_size = next_prime( table_size );

    /* Allocate the array of lists */
    H->the_lists = malloc( sizeof( LIST ) * H->table_size );
    if( H->the_lists == NULL )
        fatal_error( "Out of space!!!" );

    /* Allocate the list headers */
    for( i = 0; i < H->table_size; i++ )
    {
        H->the_lists[i] = malloc( sizeof( struct list_node ) );
        if( H->the_lists[i] == NULL )
            fatal_error( "Out of space!!!" );
        else
            H->the_lists[i]->next = NULL;
    }
    return H;
}

Routine for Find operation

position find( element_type key, HASH_TABLE H )
{
    position p;
    LIST L;

    L = H->the_lists[ hash( key, H->table_size ) ];
    p = L->next;
    while( ( p != NULL ) && ( p->element != key ) )
        p = p->next;
    return p;
}
Routine for Insert operation

void insert( element_type key, HASH_TABLE H )
{
    position pos, new_cell;
    LIST L;

    pos = find( key, H );
    if( pos == NULL )
    {
        new_cell = (position) malloc( sizeof( struct list_node ) );
        if( new_cell == NULL )
            fatal_error( "Out of space!!!" );
        else
        {
            L = H->the_lists[ hash( key, H->table_size ) ];
            new_cell->next = L->next;
            new_cell->element = key;
            L->next = new_cell;
        }
    }
}

Closed Hashing(Open Addressing)


Separate chaining has the disadvantage of requiring pointers. This tends to slow the algorithm
down a bit because of the time required to allocate new cells, and also essentially requires the
implementation of a second data structure.
Closed hashing, also known as open addressing, is an alternative to resolving collisions with
linked lists.
In a closed hashing system, if a collision occurs, alternate cells are tried until an empty cell is
found. More formally, cells h0(x), h1(x), h2(x), . . . are tried in succession, where hi(x) =
(hash(x) + F(i)) mod table_size, with F(0) = 0.
The function F is the collision resolution strategy. Because all the data goes inside the table, a
bigger table is needed for closed hashing than for open hashing. Generally, the load factor λ
should be below 0.5 for closed hashing.

Three common collision resolution strategies are


1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Linear Probing
In linear probing, F is a linear function of i, typically F(i) = i. This amounts to trying cells
sequentially (with wraparound) in search of an empty cell.
The figure below shows the result of inserting the keys {89, 18, 49, 58, 69} into a closed
table using the same hash function as before and this collision resolution strategy. The first
collision occurs when 49 is inserted; it is put in the next available spot, namely 0, which is
open. 58 collides with 18, 89, and then 49 before an empty cell is found three away.
{89, 18, 49, 58, 69}
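A minimal sketch of insertion with linear probing is given below. The cell array, the info
field marking empty and legitimate cells, and the hash function are assumptions that follow
the closed hash table used by the rehashing routine later in this unit.

/* Sketch of insertion into a closed hash table using linear probing (F(i) = i). */
void insert_linear_probing( element_type key, HASH_TABLE H )
{
    unsigned int i, pos, cell;

    pos = hash( key, H->table_size );
    for( i = 0; i < H->table_size; i++ )
    {
        cell = ( pos + i ) % H->table_size;        /* probe sequence h0, h1, h2, ... */
        if( H->the_cells[cell].info != legitimate )
        {
            H->the_cells[cell].element = key;      /* empty (or deleted) cell found */
            H->the_cells[cell].info = legitimate;
            return;
        }
        if( H->the_cells[cell].element == key )
            return;                                /* key is already present */
    }
    error( "Hash table is full" );
}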

Quadratic Probing
Quadratic probing is a collision resolution method that eliminates the primary clustering
problem of linear probing. Quadratic probing is what you would expect: the collision
function is quadratic. The popular choice is F(i) = i^2.

When 49 collides with 89, the next position attempted is one cell away. This cell is empty, so
49 is placed there. Next, 58 collides at position 8. Then the cell one away is tried, but another
collision occurs. A vacant cell is found at the next cell tried, which is 2^2 = 4 away. 58 is
thus placed in cell 2.
{89, 18, 49, 58, 69}

Double Hashing
The last collision resolution method we will examine is double hashing. For double hashing,
one popular choice is F(i) = i * h2(x). This formula says that we apply a second hash function
to x and probe at a distance h2(x), 2h2(x), ..., and so on. A function such as h2(x) = R - (x mod R),
with R a prime smaller than the table size, will work well.
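As a small worked illustration of this second hash function (the concrete numbers are only an
example, assuming a table of size 10 and R = 7):

/* Second hash function for double hashing: h2(x) = R - (x mod R).
 * R is assumed to be a prime smaller than the table size (here 7 for a table of size 10). */
#define R 7

unsigned int hash2( unsigned int x )
{
    return R - ( x % R );
}

/* Example: hash2(49) == 7 - (49 % 7) == 7, so the probe sequence starting at hash(49)
 * examines cells at distances 7, 14, 21, ... (taken mod the table size). */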
Rehashing
If the table gets too full, the running time for the operations will start taking too long, and
inserts might fail for closed hashing with quadratic resolution. This can happen if there are
too many deletions intermixed with insertions.
A solution, then, is to build another table that is about twice as big and scan down the entire
original hash table, computing the new hash value for each element and inserting it in the
new table.
As an example, suppose the elements 13, 15, 24, and 6 are inserted into a closed hash table of
size 7. The hash function is h(x) = x mod 7. Suppose linear probing is used to resolve
collisions.
Rehashing routine

HASH_TABLE rehash( HASH_TABLE H )
{
    unsigned int i, old_size;
    cell *old_cells;

    old_cells = H->the_cells;
    old_size = H->table_size;

    /* Get a new, empty table */
    H = initialize_table( 2 * old_size );

    /* Scan through old table, reinserting into new */
    for( i = 0; i < old_size; i++ )
        if( old_cells[i].info == legitimate )
            insert( old_cells[i].element, H );

    free( old_cells );
    return H;
}
Extendible Hashing
If the amount of data is too large to fit in main memory, the main consideration then is the
number of disk accesses required to retrieve data. As before, we assume that at any point we
have n records to store; the value of n changes over time. Furthermore, at most m records fit
in one disk block. We will use m = 4 in this section. To be more formal, D will represent the
number of bits used by the root, which is sometimes known as the directory. The number of
entries in the directory is thus 2^D. dL is the number of leading bits that all the elements of
some leaf have in common. dL will depend on the particular leaf, and dL <= D.

Suppose that we want to insert the key 100100. This would go into the third leaf, but as the
third leaf is already full, there is no room. We thus split this leaf into two leaves, which are
now determined by the first three bits. This requires increasing the directory size to 3.
If the key 000000 is now inserted, then the first leaf is split, generating two leaves with dL = 3.
Since D = 3, the only change required in the directory is the updating of the 000 and 001
pointers.
Different Methods of Hashing function
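The methods themselves are not reproduced in these notes. As a brief hedged summary,
commonly discussed hash function methods include the division method (key mod table
size), the mid-square method (square the key and take the middle digits or bits), and the
folding method (split the key into parts and add them). A minimal sketch of the division
method in C:

/* Division method: a minimal sketch; key and table_size are assumed to be positive. */
unsigned int hash_division( unsigned int key, unsigned int table_size )
{
    return key % table_size;     /* table_size is usually chosen to be a prime */
}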

