DSA Unit-5
In a directed graph with an edge from 5 to 7:
• 5 is adjacent to 7
• 7 is adjacent from 5
Pros:
• Simple to implement
• Easy and fast to tell if a pair (i, j) is an edge: simply check if A[i][j] is 1 or 0 (see the sketch below)
Cons:
• No matter how few edges the graph has, the matrix takes O(n²) memory
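A minimal C++ sketch of this representation (the struct and function names are illustrative, not from the slides):

#include <vector>

// Adjacency matrix for a graph with n vertices: O(n^2) memory,
// but O(1) edge lookup.
struct AdjMatrix {
    int n;
    std::vector<std::vector<int>> A;            // A[i][j] == 1 iff edge (i, j) exists

    explicit AdjMatrix(int n) : n(n), A(n, std::vector<int>(n, 0)) {}

    void addEdge(int i, int j) { A[i][j] = 1; }        // directed edge i -> j
    bool isEdge(int i, int j) const { return A[i][j] == 1; }
};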
Adjacency List Representation
• In this representation we store the graph as a linked structure: we keep a list of all the vertices, and for each vertex we keep a linked list of its adjacent vertices.
[Figure: example graph with vertices 0–3 and its adjacency list representation]
Adjacency List-Pros and Cons
Pros:
• Saves on space (memory): the representation takes about as many memory words as there are nodes and edges.
Cons:
• It can take up to O(n) time to determine whether a pair of nodes (i, j) is an edge: one would have to search the linked list L[i], which takes time proportional to the length of L[i] (see the sketch below).
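A minimal C++ sketch of an adjacency list, using a vector of vectors in place of hand-rolled linked lists (names are illustrative):

#include <vector>
#include <algorithm>

// Adjacency list: memory proportional to V + E, but checking an edge
// (i, j) requires scanning L[i].
struct AdjList {
    std::vector<std::vector<int>> L;            // L[i] holds the neighbours of i

    explicit AdjList(int n) : L(n) {}

    void addEdge(int i, int j) { L[i].push_back(j); }  // directed edge i -> j

    bool isEdge(int i, int j) const {                  // O(degree(i)) scan
        return std::find(L[i].begin(), L[i].end(), j) != L[i].end();
    }
};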
Graph specification based on adjacency matrix representation
const int NULL_EDGE = 0;

template<class VertexType>
class GraphType {
public:
    GraphType(int);
    ~GraphType();
    void MakeEmpty();
    bool IsEmpty() const;
    bool IsFull() const;
    void AddVertex(VertexType);
    void AddEdge(VertexType, VertexType, int);
    int WeightIs(VertexType, VertexType);
    void GetToVertices(VertexType, QueType<VertexType>&);
    void ClearMarks();
    void MarkVertex(VertexType);
    bool IsMarked(VertexType) const;
private:
    int numVertices;
    int maxVertices;
    VertexType* vertices;
    int** edges;
    bool* marks;
};

template<class VertexType>
GraphType<VertexType>::GraphType(int maxV)
{
    numVertices = 0;
    maxVertices = maxV;
    vertices = new VertexType[maxV];
    edges = new int*[maxV];                // one row of edge weights per vertex
    for (int i = 0; i < maxV; i++)
        edges[i] = new int[maxV];
    marks = new bool[maxV];
}

template<class VertexType>
void GraphType<VertexType>::AddVertex(VertexType vertex)
{
    vertices[numVertices] = vertex;
    for (int index = 0; index < numVertices; index++) {
        edges[numVertices][index] = NULL_EDGE;
        edges[index][numVertices] = NULL_EDGE;
    }
    numVertices++;
}
• Then vertex B is dequeued and its adjacent vertices C and D are taken from the adjacency matrix for enqueuing. Since vertex C is already in the queue, vertex D alone is enqueued.
Topo sort example
• function topologicalSort():
• ordering := { }.
• Repeat until graph is empty:
• Find a vertex v with in-degree of 0 (no incoming edges).
• (If there is no such vertex, the graph cannot be sorted; stop.)
• Delete v and all of its outgoing edges from the graph.
• ordering += v.
[Figure: example DAG with vertices A–F, redrawn after each deletion]
Running the algorithm on the example produces the ordering step by step:
• ordering = { B }
• ordering = { B, C }
• ordering = { B, C, A }
• ordering = { B, C, A, D }
• ordering = { B, C, A, D, F }
• ordering = { B, C, A, D, F, E }
Revised algorithm
• We don't want to literally delete vertices and edges from the graph while trying to topologically sort it, so let's revise the algorithm:
• map := {each vertex → its in-degree}.
• queue := {all vertices with in-degree = 0}.
• ordering := { }.
• Repeat until queue is empty:
• Dequeue the first vertex v from the queue.
• ordering += v.
• Decrease the in-degree of all v's neighbors by 1 in the map.
• queue += {any neighbors whose in-degree is now 0}.
• If all vertices are processed, success.
Otherwise, there is a cycle.
Topo sort example 2
[Figure: second example DAG with vertices A–F]
• function topologicalSort():
• map := {each vertex → its in-degree}.
• queue := {all vertices with in-degree = 0}.
• ordering := { }.
• Repeat until queue is empty:
• Dequeue the first vertex v from the queue.
• ordering += v.
• Decrease the in-degree of all v's neighbors by 1 in the map.
• queue += {any neighbors whose in-degree is now 0}.
• function topologicalSort():
• map := {each vertex → its in-degree}.                          // O(V)
• queue := {all vertices with in-degree = 0}.
• ordering := { }.
• Repeat until queue is empty:                                   // O(V)
• Dequeue the first vertex v from the queue.                     // O(1)
• ordering += v.                                                 // O(1)
• Decrease the in-degree of all v's neighbors by 1 in the map.   // O(E) for all passes
• queue += {any neighbors whose in-degree is now 0}.
Overall running time: O(V + E). (A C++ sketch of this queue-based algorithm is given below.)
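A self-contained C++ sketch of this queue-based algorithm (the adjacency-list representation and function signature are assumptions for illustration):

#include <vector>
#include <queue>

// Kahn's algorithm: returns a topological ordering, or an empty vector
// if the graph contains a cycle.  adj[u] lists the vertices u points to.
std::vector<int> topologicalSort(const std::vector<std::vector<int>>& adj) {
    int n = adj.size();
    std::vector<int> indegree(n, 0);
    for (int u = 0; u < n; ++u)                 // map: each vertex -> its in-degree
        for (int v : adj[u]) ++indegree[v];

    std::queue<int> q;                          // all vertices with in-degree 0
    for (int v = 0; v < n; ++v)
        if (indegree[v] == 0) q.push(v);

    std::vector<int> ordering;
    while (!q.empty()) {
        int v = q.front(); q.pop();
        ordering.push_back(v);                  // ordering += v
        for (int w : adj[v])                    // "delete" v's outgoing edges
            if (--indegree[w] == 0) q.push(w);
    }
    if ((int)ordering.size() != n) return {};   // cycle: not all vertices processed
    return ordering;
}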
2. Most Efficient Time Complexity of Topological Sorting is? (V – number of vertices, E – number of
edges)
a) O(V + E) b) O(V) c) O(E) d) O(V*E)
3. In most of the cases, topological sort starts from a node which has ___
a) Maximum Degree b) Minimum Degree c) Any degree d) Zero Degree
Applications of MST
1. Cluster analysis
2. Handwriting recognition
3. Image segmentation
(A minimum spanning tree connects all the vertices so that the total edge weight is minimized.)
Growing a MST – Generic Approach
• Grow a set A of edges (initially empty).
• Incrementally add edges to A such that they would belong to a MST.
– Is it safe for A initially? (Yes: the empty set is a subset of some MST.)
• Later on: a cut S of the graph is used to identify which edges are safe to add.
[Figure: example weighted graph with vertices a–i used to illustrate the generic approach]
iv. After selecting the vertex v, the update rule is applied for each unknown w adjacent to v. The rule is dw = min(dw, Cv,w); that is, if more than one edge can connect w to the tree, dw is updated with the minimum cost.
Prim’s Algorithm Example
1. v1 is selected as the initial node of the spanning tree, and the initial configuration of the table is constructed.
v Known dv pv
V1 0 0 0
V2 0 ∞ 0
V3 0 ∞ 0
V4 0 ∞ 0
V5 0 ∞ 0
V6 0 ∞ 0
V7 0 ∞ 0
Prim's Algorithm Example
2. v1 is declared as a known vertex. Then its adjacent vertices v2, v3, v4 are updated.
T[v2].dist = min(T[v2].dist, Cv1,v2) = min(∞, 2) = 2
T[v3].dist = min(T[v3].dist, Cv1,v3) = min(∞, 4) = 4
T[v4].dist = min(T[v4].dist, Cv1,v4) = min(∞, 1) = 1

v Known dv pv
V1 1 0 0
V2 0 2 V1
V3 0 4 V1
V4 0 1 V1
V5 0 ∞ 0
V6 0 ∞ 0
V7 0 ∞ 0
Prim's Algorithm Example
3. Among the adjacent vertices V2, V3, V4, the distance V1 -> V4 is the smallest. So V4 is selected and declared as a known vertex, and the distances of its adjacent vertices are updated.
• V1 is not examined because it is a known vertex.
• No change in V2, because it has dv = 2 and the edge cost V4 -> V2 = 3.
T[v3].dist = min(T[v3].dist, Cv4,v3) = min(4, 2) = 2
T[v5].dist = min(T[v5].dist, Cv4,v5) = min(∞, 7) = 7
T[v6].dist = min(T[v6].dist, Cv4,v6) = min(∞, 8) = 8
T[v7].dist = min(T[v7].dist, Cv4,v7) = min(∞, 4) = 4

v Known dv pv
V1 1 0 0
V2 0 2 V1
V3 0 2 V4
V4 1 1 V1
V5 0 7 V4
V6 0 8 V4
V7 0 4 V4
Prim's Algorithm Example
4. We can select either v2 or v3, whose dv = 2 is the smallest among the unknown vertices (smaller than v5, v6 and v7).
• v2 is declared as a known vertex.
• Its adjacent vertices are v1, v4 and v5. v1 and v4 are known vertices, so there is no change in their dv values.
T[v5].dist = min(T[v5].dist, Cv2,v5) = min(7, 10) = 7

v Known dv pv
V1 1 0 0
V2 1 2 V1
V3 0 2 V4
V4 1 1 V1
V5 0 7 V4
V6 0 8 V4
V7 0 4 V4
Prim's Algorithm Example
5. Among the unknown vertices, v3's dv value (2) is the lowest, so v3 is selected and declared known. v3's adjacent vertices are v1, v4 and v6; no changes for v1 and v4, and T[v6].dist = min(T[v6].dist, Cv3,v6) = min(8, 5) = 5 with pv = v3.
6. Next v7 (dv = 4) is declared known, updating T[v5].dist = min(7, 6) = 6 and T[v6].dist = min(5, 1) = 1, both with pv = v7.

v Known dv pv
V3 1 2 V4
V4 1 1 V1
V5 0 6 V7
V6 0 1 V7
V7 1 4 V4
Prim's Algorithm Example
7. Then v6 (dv = 1) is declared known, with no further changes.
8. Finally v5 is declared as a known vertex. Its adjacent vertices are v2, v4 and v7; no change in their values, since all of them are already known.

v Known dv pv
V1 1 0 0
V2 1 2 V1
V3 1 2 V4
V4 1 1 V1
V5 1 6 V7
V6 1 1 V7
V7 1 4 V4

The minimum cost of the spanning tree is 16 (2 + 2 + 1 + 6 + 1 + 4).

Algorithm Analysis
void Prims(Table T)
{
    Vertex v, w;
    for (i = 1; i <= NumVertex; i++) {
        T[i].known = False;
        T[i].dist = Infinity;
        T[i].path = 0;
    }
    T[start].dist = 0;                  /* let the start vertex have distance 0 */
    for ( ; ; ) {
        v = smallest-distance unknown vertex;
        if (v == NotAVertex)
            break;
        T[v].known = True;
        for each w adjacent to v
            if (!T[w].known) {
                T[w].dist = Min(T[w].dist, Cv,w);
                T[w].path = v;
            }
    }
}
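The pseudocode above follows the table-based textbook version; a runnable C++ sketch that uses a priority queue instead of scanning the table for the smallest unknown vertex (an implementation choice, not the slide's code) could look like this:

#include <vector>
#include <queue>
#include <limits>
#include <utility>
#include <functional>

// Prim's algorithm on an undirected weighted graph given as an adjacency list.
// adj[u] holds (neighbour, weight) pairs.  Returns the total MST cost
// (assumes the graph is connected).
int primMST(const std::vector<std::vector<std::pair<int,int>>>& adj, int start = 0) {
    const int INF = std::numeric_limits<int>::max();
    int n = adj.size();
    std::vector<int> dist(n, INF);      // cheapest edge connecting v to the tree
    std::vector<bool> known(n, false);
    using PII = std::pair<int,int>;     // (dist, vertex)
    std::priority_queue<PII, std::vector<PII>, std::greater<PII>> pq;

    dist[start] = 0;
    pq.push({0, start});
    int total = 0;
    while (!pq.empty()) {
        auto [d, v] = pq.top(); pq.pop();
        if (known[v]) continue;         // stale queue entry
        known[v] = true;
        total += d;
        for (auto [w, c] : adj[v])
            if (!known[w] && c < dist[w]) {     // update rule: dw = min(dw, Cv,w)
                dist[w] = c;
                pq.push({c, w});
            }
    }
    return total;
}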
Exercise
Convert the graph given below to a minimum spanning tree using Prim's algorithm.
Review Questions
Kruskal's Algorithm – Example
[Figure: weighted graph with 9 vertices (1–9) and 11 edges]
The graph contains 9 vertices and 11 edges, so the minimum spanning tree formed will have (9 – 1) = 8 edges.
Sort the edges of the graph by weight:

Weight Source Destination
1 8 5
2 3 5
3 2 3
4 1 2
4 3 4
5 4 6
7 7 8
8 1 7
9 5 6
10 8 9
12 9 6
Now pick the edges one by one from the sorted list, adding an edge only if it does not form a cycle with the edges already selected:
• Weight 1 (8–5): added
• Weight 2 (3–5): added
• Weight 3 (2–3): added
• Weight 4 (1–2): added
• Weight 4 (3–4): added
• Weight 5 (4–6): added
• Weight 7 (7–8): added
• Weight 8 (1–7): a cycle is formed, hence it is discarded
• Weight 9 (5–6): a cycle is formed, hence it is discarded
• Weight 10 (8–9): added
• Weight 12 (9–6): a cycle is formed, hence it is discarded
This gives the required 8 edges; the total weight of the minimum spanning tree is 1 + 2 + 3 + 4 + 4 + 5 + 7 + 10 = 36.
[Figure: the resulting minimum spanning tree containing the 8 selected edges]
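A C++ sketch of the procedure followed above, i.e. Kruskal's algorithm: sort the edges and add each one unless it forms a cycle. The union-find (DSU) helper used for cycle detection is an implementation choice not shown on the slides:

#include <vector>
#include <algorithm>
#include <numeric>
#include <tuple>

struct DSU {                                    // disjoint-set union for cycle detection
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {                  // returns false if a and b are already connected
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// edges: (weight, source, destination); vertices are numbered 0..n-1.
int kruskalMST(int n, std::vector<std::tuple<int,int,int>> edges) {
    std::sort(edges.begin(), edges.end());      // sort the edges of the graph by weight
    DSU dsu(n);
    int total = 0;
    for (auto [w, u, v] : edges)
        if (dsu.unite(u, v))                    // edge accepted: no cycle formed
            total += w;
    return total;                               // total weight of the minimum spanning tree
}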
DIJKSTRA'S ALGORITHM - Example
1. V1 is taken as the source.

v Known dv pv
V1 1 0 0
V2 0 ∞ 0
V3 0 ∞ 0
V4 0 ∞ 0
V5 0 ∞ 0
V6 0 ∞ 0
V7 0 ∞ 0
DIJKSTRA'S ALGORITHM - Example
2. Now V1 is a known vertex (marked as 1). Its adjacent vertices are v2 and v4; their dv and pv values are updated.

v Known dv pv
V1 1 0 0
V2 0 2 V1
V3 0 ∞ 0
V4 0 1 V1
V5 0 ∞ 0
V6 0 ∞ 0
V7 0 ∞ 0
3. Next, v4 (the smallest dv among the unknown vertices) is declared known, and its adjacent vertices v3, v5, v6 and v7 are updated through v4.
• 4. Select the unknown vertex with the shortest distance from the source v1: v2 is the smallest one, so v2 is marked as a known vertex. Its adjacent vertices are v4 and v5. The distances from v1 to v4 and v5 through v2 are larger than their previous dv values, so there is no change in dv and pv.

v Known dv pv
V1 1 0 0
V2 1 2 V1
V3 0 3 V4
V4 1 1 V1
V5 0 3 V4
V6 0 9 V4
V7 0 5 V4
DIJKSTRA'S ALGORITHM - Example
• 5. Select the unknown vertex with the smallest distance from the source: v3 and v5 (both 3) are the smallest. The adjacent vertices of v3 are v1 and v6; v1 is the source, so there is no change in its dv and pv.
• T[v6].dist = Min(T[v6].dist, T[v3].dist + Cv3,v6) = Min(9, 3 + 5) = 8
• v6's dv and pv values are updated. The adjacent vertex of v5 is v7; no change in its dv and pv values.
DIJKSTRA'S ALGORITHM - Example
• 6. The next smallest unknown vertex is v7. Its adjacent vertex is v6:
T[v6].dist = Min(T[v6].dist, T[v7].dist + Cv7,v6) = Min(8, 5 + 1) = 6
• v6's dv and pv values are updated.
v Known dv pv
V1 1 0 0
V2 1 2 V1
V3 1 3 V4
V4 1 1 V1
V5 1 3 V4
V6 0 6 V7
V7 1 5 V4
DIJKSTRA'S ALGORITHM - Example
• 7. The last vertex, v6, is declared as known. There are no adjacent vertices for v6, so there is no update to the table.
• The shortest distances from the source v1 to all vertices are:
v1 -> v2 = 2, v1 -> v3 = 3, v1 -> v4 = 1, v1 -> v5 = 3, v1 -> v6 = 6, v1 -> v7 = 5
Algorithm Analysis
void Dijkstra(Table T)
{
    Vertex v, w;
    for ( ; ; )
    {
        v = smallest unknown distance vertex;
        if (v == NotAVertex)
            break;
        T[v].known = True;
        for each w adjacent to v
            if (!T[w].known)
                if (T[v].Dist + Cvw < T[w].Dist)
                {   /* update w */
                    Decrease(T[w].Dist to T[v].Dist + Cvw);
                    T[w].path = v;
                }
    }
}
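For comparison, a runnable C++ sketch using a priority queue (an assumption; the textbook code above instead scans the table for the smallest unknown vertex):

#include <vector>
#include <queue>
#include <limits>
#include <utility>
#include <functional>

// Dijkstra's algorithm: single-source shortest paths on a graph with
// non-negative edge weights.  adj[u] holds (neighbour, weight) pairs.
std::vector<long long> dijkstra(const std::vector<std::vector<std::pair<int,int>>>& adj,
                                int source) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    using State = std::pair<long long,int>;     // (distance, vertex)
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;

    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, v] = pq.top(); pq.pop();
        if (d > dist[v]) continue;              // stale entry: v already settled
        for (auto [w, c] : adj[v])
            if (dist[v] + c < dist[w]) {        // relax edge (v, w)
                dist[w] = dist[v] + c;
                pq.push({dist[w], w});
            }
    }
    return dist;
}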
Exercise
Using Dijkstra’s Algorithm, find the shortest distance from source
vertex ‘S’ to remaining vertices in the following graph
Review Questions
Applications of graphs
• In Facebook, users are considered to be the vertices, and if they are friends then there is an edge running between them. Facebook's friend-suggestion algorithm uses graph theory. Facebook is an example of an undirected graph.
• In the World Wide Web, web pages are considered to be the vertices. There is an edge from a page u to another page v if there is a link to page v on page u. This is an example of a directed graph. It was the basic idea behind Google's PageRank algorithm.
Capacities
[Figure: flow network with source s, sink t and labeled edge capacities]
Flow Decomposition
Any valid flow can be decomposed into flow paths and circulations:
– s → a → b → t: 11
– s → c → a → b → t: 1
– s → c → d → b → t: 7
– s → c → d → t: 4
[Figure: example flow network with labeled edge capacities]
Start flow at 0
“While there’s room for more flow, push more flow across the
network!”
While there’s some path from s to t, none of whose edges are saturated
Push more flow along the path until some edge is saturated
[Figure: example flow network with vertices A–F; edge labels show flow / capacity]
Example 1
[Figure: flow network with vertices A–F; edge labels show flow / capacity and residual capacity]
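The idea above, push more flow along an s–t path until some edge is saturated, is the augmenting-path (Ford–Fulkerson) method; a compact C++ sketch of its BFS-based variant (Edmonds–Karp) is given below, with all names being illustrative assumptions rather than code from the slides:

#include <vector>
#include <queue>
#include <climits>
#include <algorithm>

// Edmonds-Karp max flow: repeatedly find an augmenting s-t path by BFS in the
// residual graph and push as much flow as that path allows.
// cap is an n x n capacity matrix (0 where there is no edge); it is copied and
// updated in place as the residual graph.
int maxFlow(std::vector<std::vector<int>> cap, int s, int t) {
    int n = cap.size(), flow = 0;
    while (true) {
        std::vector<int> parent(n, -1);
        parent[s] = s;
        std::queue<int> q;
        q.push(s);
        while (!q.empty() && parent[t] == -1) {   // BFS for a path with residual capacity
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (parent[v] == -1 && cap[u][v] > 0) {
                    parent[v] = u;
                    q.push(v);
                }
        }
        if (parent[t] == -1) break;               // no augmenting path: flow is maximum
        int bottleneck = INT_MAX;
        for (int v = t; v != s; v = parent[v])    // smallest residual capacity on the path
            bottleneck = std::min(bottleneck, cap[parent[v]][v]);
        for (int v = t; v != s; v = parent[v]) {  // push flow, updating residual edges
            cap[parent[v]][v] -= bottleneck;
            cap[v][parent[v]] += bottleneck;
        }
        flow += bottleneck;
    }
    return flow;
}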
Hash Functions
The total possible number of hash functions for n items assigned to m positions in a table (n < m) is m^n.
The number of perfect hash functions (those that map the n items to n distinct positions) is m!/(m – n)!.
With 50 elements and a 100-position array, we would have a total of 100^50 hash functions and about 10^94 perfect hash functions (about 1 in a million).
Most of the perfect hash functions are impractical and cannot be expressed in a simple formula.
Hash Functions (Contd...)
Division
Hash functions must guarantee that the value they produce is a valid index into the table.
A fairly easy way to ensure this is to use modular division and divide the key by the size of the table, so h(K) = K mod TSize.
This works best if the table size is a prime number, but if not, we can use h(K) = (K mod p) mod TSize for a prime p > TSize.
However, nonprimes work well as the divisor provided they do not have any prime factors less than 20.
The division method is frequently used when little is known about the keys.
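A minimal C++ sketch of the division method (the table size and prime p are illustrative values):

#include <cstddef>

const std::size_t TSize = 100;                  // table size used here for illustration

// Division method: h(K) = K mod TSize always yields a valid index 0 .. TSize-1
// (works best when TSize is prime).
std::size_t hashDivision(std::size_t key) {
    return key % TSize;
}

// Variant for a non-prime TSize: h(K) = (K mod p) mod TSize with a prime p > TSize.
std::size_t hashDivisionPrime(std::size_t key) {
    const std::size_t p = 101;                  // a prime larger than TSize
    return (key % p) % TSize;
}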
Hash Function Contd...
Folding
In folding, the keys are divided into parts which are then combined (or “folded”)
together and often transformed into the address.
Two types of folding are used, shift folding and boundary folding
In shift folding, the parts are placed underneath each other and then processed (for
example, by adding).
Using a Social Security number, say 123-45-6789, we can divide it into three parts -
123, 456, and 789 – and add them to get 1368.
This can then be divided modulo TSize to get the address
With boundary folding, the key is visualized as being written on a piece of paper
and folded on the boundaries between the parts.
Hash Functions Contd...
Folding (continued)
The result is that alternating parts of the key are reversed, so the Social Security
number part would be 123, 654, 789, totaling 1566.
As can be seen, in both versions, the key is divided into even length parts of
some fixed size, plus any leftover digits.
Then these are added together and the result is divided modulo the table size
Consequently this is very fast and efficient, especially if bit strings are used
instead of numbers.
With character strings, one approach is to exclusive-or the individual characters together and use the result.
In this way, h("abcd") = "a" ⊕ "b" ⊕ "c" ⊕ "d"
Hash Function Contd...
Folding (continued)
However, this is limited, because it will only generate values between 0 and
127.
A better approach is to use chunks of characters, where each chunk has as
many characters as bytes in an integer.
On the IBM PC, integers are often 2 bytes long, so h("abcd") = "ab" ⊕ "cd", which would then be divided modulo TSize.
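A C++ sketch of shift folding and boundary folding on a numeric key, using 3-digit groups as in the Social Security number example (group size and table size are illustrative choices):

#include <cstddef>

// Shift folding: split the key into 3-digit groups and add them,
// e.g. 123456789 -> 123 + 456 + 789 = 1368, then take the sum modulo TSize.
std::size_t shiftFold(unsigned long long key, std::size_t tsize) {
    std::size_t sum = 0;
    while (key > 0) {
        sum += key % 1000;                      // take the low 3 digits
        key /= 1000;
    }
    return sum % tsize;
}

// Boundary folding: reverse every other group before adding,
// e.g. 123456789 -> 123 + 654 + 789 = 1566, then take the sum modulo TSize.
std::size_t boundaryFold(unsigned long long key, std::size_t tsize) {
    std::size_t sum = 0;
    bool reverse = false;
    while (key > 0) {
        unsigned long long part = key % 1000;
        if (reverse) {                          // reverse the 3 digits of this group
            unsigned long long r = 0;
            for (int i = 0; i < 3; ++i) { r = r * 10 + part % 10; part /= 10; }
            part = r;
        }
        sum += part;
        key /= 1000;
        reverse = !reverse;
    }
    return sum % tsize;
}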
Hash Function Contd...
Mid-Square Function
In the mid-square approach, the numeric value of the key is squared and the
middle part is extracted to serve as the address.
If the key is non-numeric, some type of preprocessing needs to be done to create a
numeric value, such as folding.
Since the entire key participates in generating the address, there is a better chance
of generating different addresses for different keys.
So if the key is 3121, 3121² = 9,740,641, and if the table has 1,000 locations, h(3121) = 406, which is the middle part of 3121².
In application, powers of two are more efficient for the table size, and the middle of the bit string of the square of the key is used.
Assuming a table size of 1024, 3121² is represented by the bit string 1001010 0101000010 1100001, and the middle 10 bits (shown in italics in the original), 0101000010 = 322, give the address.
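A C++ sketch of the mid-square method for this specific example (key 3121, table size 1024 = 2^10); the 7-bit shift is chosen because 3121² fits in 24 bits, so it would differ for other key widths:

#include <cstddef>
#include <cstdint>

// Mid-square hashing for a table of size 1024 (2^10): square the key and
// take the middle 10 bits of the result as the address.
std::size_t midSquare(std::uint64_t key) {
    std::uint64_t sq = key * key;               // 3121 * 3121 = 9,740,641
    // 3121^2 occupies 24 bits, so dropping the low 7 bits and masking 10 bits
    // keeps the middle of the bit string: this yields 322 for key 3121.
    return (sq >> 7) & 0x3FF;                   // 0x3FF = ten 1-bits
}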
Hash Functions Contd...
Extraction
In the extraction approach, the address is derived by using a portion of the key.
Using the SSN 123-45-6789, we could use the first four digits, 1234, the last four
6789, or the first two combined with the last two 1289.
Other combinations are also possible, but each time only a portion of the key is
used.
With careful choice of digits, this may be sufficient for address generation.
For example, some universities give international students ID numbers beginning
with 999; ISBNs start with digits representing the publisher
So these could be excluded from the address generation if the nature
of the data is appropriately limited.
Hash Functions Contd...
Radix Transformation
In radix transformation, the key is transformed into a number in a different base (radix), and the result is then divided modulo the table size.
[Figure: a hash function h(K) maps the key space (e.g., integers, strings) to positions 0 … TableSize – 1 of the hash table]
A good hash function should:
• be simple/fast to compute,
• avoid collisions,
• have keys distributed evenly among cells.
Collision Resolution
Collision: when two keys map to the same location in the hash table.
Two ways to resolve collisions:
Separate Chaining
Open Addressing (linear probing, quadratic probing, double hashing)
Open Addressing:
Problem: -
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash
table-
50, 700, 76, 85, 92, 73 and 101
Use linear probing technique for collision resolution.
Solution-
The given sequence of keys will be inserted in the hash table as-
Contd...
Step-01:
● Draw an empty hash table.
● For the given hash function, the possible range of hash values is [0, 6].
● So, draw an empty hash table consisting of 7 buckets as-
Contd...
Step-02: 50 mod 7 = 1 → bucket 1; 700 mod 7 = 0 → bucket 0; 76 mod 7 = 6 → bucket 6 (no collisions).
Step-03: 85 mod 7 = 1 collides with 50; linear probing places 85 in bucket 2. Similarly, 92 (92 mod 7 = 1) is placed in bucket 3, 73 (73 mod 7 = 3) in bucket 4, and 101 (101 mod 7 = 3) in bucket 5.
Step-08: Final hash table – bucket 0: 700, 1: 50, 2: 85, 3: 92, 4: 73, 5: 101, 6: 76.
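A C++ sketch of the linear-probing insertion used in this example (table of 7 buckets, h(k) = k mod 7; the -1 sentinel for an empty bucket is an assumption):

#include <vector>

const int TABLE_SIZE = 7;
const int EMPTY = -1;                           // sentinel for an empty bucket

// Insert key using h(k) = k mod 7 with linear probing.
// Returns the index used, or -1 if the table is full.
int insertLinearProbing(std::vector<int>& table, int key) {
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; ++i) {
        int idx = (home + i) % TABLE_SIZE;      // probe sequence: home, home+1, ...
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;                                  // table full
}

// Usage: inserting 50, 700, 76, 85, 92, 73, 101 into an empty table of size 7
// yields buckets [700, 50, 85, 92, 73, 101, 76].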
The book has examples of Cichelli's method and FHCD for minimal perfect hash functions for a small number of strings.
Hash Functions for Extendible Files
1. The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10
using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant
hash table?
2. Consider the table below, which shows the cost of allocating 5 jobs to 5 machines.

          Machine
Job       A    B    C    D    E
 1        22   30   26   16   2
 2        27   29   28   20   32
 3        33   25   21   29   23
 4        24   24   30   19   26
 5        30   33   32   37   31

Which jobs should be allocated to which machines so as to minimise the total cost?
Collision
If two or more keys hash to the same index, the corresponding records cannot be stored in the same location. This condition is known as a collision.
Characteristics of a Good Hashing Function:
• It should be simple to compute.
• The number of collisions should be low when placing records in the hash table. A hash function with no collisions is called a perfect hash function.
• The hash function should produce keys that are distributed uniformly in the hash table.
• The hash function should depend upon every bit of the key. Thus a hash function that simply extracts a portion of the key is not suitable.
Review Questions
What is the best definition of a collision in a hash table?
a) Two entries are identical except for their keys
b) Two entries with different data have the exact same key
c) Two entries with different keys have the same exact hash value
d) Two entries with the exact same key have different hash values
Advantages
1. More elements can be inserted, since each position of the array holds a linked list.
Disadvantages
1. It requires extra pointers, which occupy more memory space.
2. Search takes time, since it takes time to evaluate the hash function and also to traverse the list.
Review Questions
Quadratic Probing – Example (hash table of size 7, home position h0(k) = (k % 7) % 7, probes at ±i^2)
h0(23) = (23 % 7) % 7 = 2
h0(13) = (13 % 7) % 7 = 6
h0(21) = (21 % 7) % 7 = 0
h0(14) = (14 % 7) % 7 = 0 collision
h1(14) = (0 + 1^2) % 7 = 1
h0(7) = (7 % 7) % 7 = 0 collision
h1(7) = (0 + 1^2) % 7 = 1 collision
h-1(7) = (0 – 1^2) % 7 = -1; NORMALIZE: (-1 + 7) % 7 = 6 collision
h2(7) = (0 + 2^2) % 7 = 4
h0(8) = (8 % 7) % 7 = 1 collision
h1(8) = (1 + 1^2) % 7 = 2 collision
h-1(8) = (1 – 1^2) % 7 = 0 collision
h2(8) = (1 + 2^2) % 7 = 5
h0(15) = (15 % 7) % 7 = 1 collision
h1(15) = (1 + 1^2) % 7 = 2 collision
h-1(15) = (1 – 1^2) % 7 = 0 collision
h2(15) = (1 + 2^2) % 7 = 5 collision
h-2(15) = (1 – 2^2) % 7 = -3; NORMALIZE: (-3 + 7) % 7 = 4 collision
h3(15) = (1 + 3^2) % 7 = 3
Quadratic probing is better than linear probing because it eliminates primary clustering.
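A minimal C++ sketch of quadratic-probing insertion (using only the +i^2 probes; the ±i^2 alternation above is a variant of the same idea, and all names are illustrative):

#include <vector>

// Quadratic probing: try home, home + 1^2, home + 2^2, ... (mod table size).
// Note: even when the table is not full, this may fail to find a free slot,
// since at most about half of a prime-sized table is reachable from one home position.
int insertQuadraticProbing(std::vector<int>& table, int key, int empty = -1) {
    int m = table.size();
    int home = key % m;
    for (int i = 0; i < m; ++i) {
        int idx = (home + i * i) % m;
        if (table[idx] == empty) {
            table[idx] = key;
            return idx;                         // index where the key was placed
        }
    }
    return -1;                                  // no free slot found along the probe sequence
}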
2. Quadratic Probing
Limitation:
• at most half of the table can be used as alternative locations to
resolve collisions.
• This means that once the table is more than half full, it's difficult to
find an empty spot. This new problem is known as secondary
clustering because elements that hash to the same hash key will
always probe the same alternative cells.
3. Double Hashing
• Double hashing uses the idea of applying a second hash function to the key when a collision occurs. The result of the second hash function is the number of positions from the point of collision at which to insert.
There are a couple of requirements for the second function:
• It must never evaluate to 0, and it must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod TableSize
• A popular second hash function is:
Hash2(key) = R – (key % R), where R is a prime number that is smaller than the size of the table.
3.Double Hashing -Example
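A small illustrative walk-through of these formulas, with the table size, R and keys chosen for illustration (not necessarily the original slide's values). Take TableSize = 7, R = 5, Hash(X) = X mod 7 and Hash2(X) = 5 – (X mod 5):
• Insert 15: Hash(15) = 1; slot 1 is empty, so 15 goes to slot 1.
• Insert 22: Hash(22) = 1, collision. Hash2(22) = 5 – 2 = 3, so probe (1 + 1·3) mod 7 = 4; slot 4 is empty, so 22 goes to slot 4.
• Insert 8: Hash(8) = 1, collision. Hash2(8) = 5 – 3 = 2, so probe (1 + 1·2) mod 7 = 3; slot 3 is empty, so 8 goes to slot 3.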
Review Questions
Consider a hash table of size seven, with starting index zero, and a hash function
(3x + 4)mod7. Assuming the hash table is initially empty, which of the following is
the contents of the table when the sequence 1, 3, 8, 10 is inserted into the table
using closed hashing? Note that ‘_’ denotes an empty location in the table.
a) 8, _, _, _, _, _, 10
b) 1, 8, 10, _, _, _, 3
c) 1, _, _, _, _, _,3
d) 1, 10, 8, _, _, _, 3
Extendible Hashing
• Extendible Hashing is a mechanism for altering the size of the hash
table to accommodate new entries when buckets overflow.
• A common strategy in internal hashing is to double the hash table and rehash each entry.
• However, this technique is slow, because writing all pages to disk is
too expensive. Therefore, instead of doubling the whole hash table, we
use a directory of pointers to buckets, and double the number of
buckets by doubling the directory, splitting just the bucket that
overflows.
• Since the directory is much smaller than the file, doubling it is much
cheaper. Only one page of keys and pointers is split.
Extendible Hashing
Extendible hashing is a type of hash system which treats a hash as a
bit string and uses a trie for bucket lookup. Because of the
hierarchical nature of the system, re-hashing is an incremental operation
(done one bucket at a time, as needed).
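A tiny C++ sketch of just the directory-lookup step described above (the hash is treated as a bit string and its leading globalDepth bits index the directory; bucket splitting and directory doubling are omitted, and all names are illustrative assumptions):

#include <cstddef>
#include <cstdint>
#include <vector>

struct Bucket;                                  // holds keys; split when it overflows

// Directory of 2^globalDepth pointers to buckets.  Several directory entries
// may point to the same bucket until that bucket is split.
struct ExtendibleDirectory {
    int globalDepth = 0;
    std::vector<Bucket*> dir;                   // size is 1 << globalDepth

    // Index of the directory entry for a 32-bit hash value: its leading
    // globalDepth bits (some implementations use the low-order bits instead).
    std::size_t entryFor(std::uint32_t hash) const {
        return globalDepth == 0 ? 0 : (hash >> (32 - globalDepth));
    }
};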
Extendible hashing - Example
[Figures: worked example showing bucket splits and directory doubling as keys are inserted]
Review Questions