HASHING
Module 5
Hashing
• Hashing is a technique for performing insertion, deletion, and search operations in constant average time by using a hash table data structure.
• It is used to index and retrieve items in a database.
Types of Hashing
• Static Hashing .
• Dynamic Hashing .
➢Static Hashing:
• The hash function maps search-key values to a fixed set of locations.
➢Dynamic Hashing:
• The hash table can grow to handle more items; the associated hash function must change as the table grows.
Hash Functions and Hash Tables
• Hashing has 2 major components
– Hash function h
– Hash table data structure of size N
• A hash function h maps keys (an identifying element of a record) to a hash value, which refers to a specific location in the hash table
• Example:
h(k) = k mod N
is a hash function for integer keys
• The integer h(k) is called the hash value of key k
Hash Functions and Hash Tables
• A hash table data structure is an array or array-type ADT of some fixed size, containing the keys.
• An array in which records are not stored consecutively; their place of storage is calculated using the key and a hash function.
[Diagram: a key is passed through the hash function to produce an array index.]
• Hashed key: the result of applying a hash function to a key
• Keys and entries are scattered throughout the array
• Combines the main advantages of both arrays and trees
• Hashing depends mainly on two factors:
(a) Hash Function (b) Collision Resolution
• Table size is also a (minor) factor in hashing; valid indices run from 0 to tablesize − 1.
Table Size
• Hash table size
– Should be appropriate for the hash function used
– Too big will waste memory; too small will
increase collisions and may eventually force
rehashing (copying into a larger table)
Example
• We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
• The actual data is not stored in the hash table; each entry pinpoints the location of the actual data or set of data
• Our hash table uses an array of size N = 10,000 and the hash function
h(x) = last four digits of x

  index | key
  ------+-------------
  0     |
  1     | 025-612-0001
  2     | 981-101-0002
  3     |
  4     | 451-229-0004
  …     |
  9997  |
  9998  | 200-751-9998
  9999  |
Hash Function
• The mapping of keys into table positions is called the hash function
• A hash function,
– Ideally, it should distribute keys and entries evenly
throughout the table
– It should be easy and quick to compute.
– It should minimize collisions, where the position
given by the hash function is already occupied
– It should be applicable to all objects
Types of Hash Function
• Different types of hash functions are used for
the mapping of keys into tables.
(a) Division Method
(b) Mid-square Method
(c) Folding Method
1. Division Method
• Choose a number m larger than the number n of keys
in k.
• The number m is usually chosen to be a prime no.
• The hash function H is defined as,
H(k) = k(mod m)
• Denotes the remainder, when k is divided by m
• 2nd formula is used when range is from 1 to m.
11
• Example:
Elements: 3205, 7148, 2345
Table addresses: 0 – 99
m = 97 (prime)
H(3205) = 4, H(7148) = 67, H(2345) = 17
2. Folding Method
• The key k is partitioned into a number of parts k1, k2, …, kn.
• These parts are added together, ignoring the last carry.
• One can also reverse one part before adding (right- or left-justified; mostly the rightmost part is reversed).
H(k) = k1 + k2 + ………. + kn
• Example:
H(3205) = 32 + 05 = 37, or with the second part reversed: 32 + 50 = 82
H(7148) = 71 + 48 = 119, or 71 + 84 = 155
H(2345) = 23 + 45 = 68, or 23 + 54 = 77
3. Mid-Square Method
• The key k is squared. Then the hash function H is defined as
H(k) = l
• where l is obtained by deleting digits from both ends of k².
• The same digit positions must be used for all keys.
• Example:
k:    3205      7148      2345
k²:   10272025  51093904  5499025
H(k): 72        93        99
• The 4th and 5th digits from the right have been selected.
Converting keys to integers
• If the key k is a string, the hash value can be obtained by converting the string into an integer.
• This can be done by adding the ASCII values of all characters in the string.
• Example: h("ABCD") = 'A' + 'B' + 'C' + 'D' = 266
C function:
Hashing (Collision)
[Diagram: keys k1 … k5 from the universe U of keys are mapped by h into slots 0 … m−1 of the table; a collision occurs because h(k2) = h(k5).]
Collision Resolution Techniques
• If two keys map to the same hash table index, then we have a collision.
• As the number of elements in the table increases, the
likelihood of a collision increases - so make the table
as large as practical
• Collisions may still happen, so we need a collision
resolution strategy
• Two approaches are used to resolve
collisions.
(a) Separate chaining: chain together
several keys/entries in each position.
(b) Open addressing/Linear Probing: store
the key/entry in a different position.
1. Linear Probing/open addressing
• Locations are checked from the hash location h(k) to the end of the table, and the element is placed in the first empty slot
• If the bottom of the table is reached, checking “wraps
around” to the start of the table. Modulus is used for
this purpose
• Thus, if linear probing is used, these routines must
continue down the table until a match or empty location
is found
• Linear probing is guaranteed to find a slot for the insertion if there is still an empty slot in the table.
• A prime table size alone is probably not enough; the size should be at least 30% larger than the maximum number of elements ever to be stored in the table.
• If the load factor exceeds 50% – 70%, then the time to search or to add a record will increase.
• The probe sequence is H(k) = h, h+1, h+2, h+3, …, h+i (all taken mod the table size).
• However, linear probing also tends to promote clustering within the table.
2. Separate Chaining
• The idea behind separate chaining is to implement each position of the array as a linked list, called a chain.
• Separate chaining is one of the most popular and commonly used techniques for handling collisions.
• Hence, in separate chaining, if two different elements have the same hash value, then we store both elements in the same linked list, one after the other.
Collision Resolution by Chaining
[Diagram: keys k1 … k8 from the universe U hash into slots 0 … m−1; colliding keys share a linked list at their slot, e.g., k4 and k1 in one chain; k5, k2, and k6 in another; k7 and k3 in a third. Source slide: Comp 122, Fall 2003.]
Applications of Hashing
• Compilers use hash tables to keep track of declared
variables
• A hash table can be used for on-line spelling checkers: if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time
• Game playing programs use hash tables to store seen
positions, thereby saving computation time if the
position is encountered again
• Hash functions can be used to quickly check for
inequality — if two elements hash to different values
they must be different
Dynamic Hashing
• The dynamic hashing method is used to
overcome the problems of static hashing like
bucket overflow.
• In this method, data buckets grow or shrink as the number of records increases or decreases. This method is also known as extendible hashing.
• This method makes hashing dynamic, i.e., it
allows insertion or deletion without resulting
in poor performance.
Dynamic Hashing using Directories
• Consider the following grouping of keys into
buckets, depending on the prefix of their
hash address:
• The last two bits of the hash addresses of 2 and 4 are 00, so they go into bucket B0.
• The last two bits of 5 and 6 are 01, so they go into bucket B1.
• The last two bits of 1 and 3 are 10, so they go into bucket B2. The last two bits of 7 are 11, so it goes into B3.
Dynamic Hashing
[Figure not reproduced.]
• Insert key 9 with hash address 10001 into the above structure:
[Resulting structure not reproduced.]
Priority Queues and Leftist Trees
Priority Queue
• A priority queue is a special type of queue
in which each element is associated with a
priority value.
• Elements are processed on the basis of their priority; that is, higher-priority elements are served first.
• If elements have the same priority, they are processed on a first-come, first-served basis.
Characteristics of Priority queue
• A priority queue is an extension of a queue
that contains the following characteristics:
• Every element in a priority queue has a
priority value associated with it
• The element with the higher priority will be
moved to the top and removed first
• If two elements in a priority queue have the
same priority value, they’ll be arranged using
the FIFO principle
Priority queue example
[Example figure not reproduced.]
Priority Queue Types
1. Single-ended priority queue
2. Double-ended priority queue
3. Meldable priority queue
➢ Single-ended priority queue:
• Returns an element with min (or max) priority.
• Inserts an element with arbitrary priority.
• Deletes an element with min (or max) priority.
1. Single-ended priority queue
• Min-priority queue: in a min-priority queue, a lower priority number is given a higher priority.
➢Insert an element with arbitrary priority.
➢Delete the item with the least priority number.
• Max-priority queue: in a max-priority queue, a higher priority number is given a higher priority.
➢Insert an element with arbitrary priority.
➢Delete the item with the max priority number.
Min and max priority queues
[Example figure not reproduced.]
2. Double ended Priority queue
• A double-ended priority queue supports the operations of both a max-priority queue and a min-priority queue.
• The following operations are expected from a double-ended priority queue: insert, getMin, getMax, deleteMin, and deleteMax.
3. Meldable Priority Queues
• A meldable priority queue is an extension of the single-ended priority queue that supports melding (merging) two priority queues into one.
• Two data structures used for implementing meldable priority queues are leftist trees and binomial heaps.
Leftist Trees
[Introductory slides not reproduced.]
Types of Leftist Trees
• Height-Biased Leftist trees.
• Weight-Biased Leftist trees.
Height Biased Leftist tree
• The s value of a node is the length of the shortest path from that node down to an external (null) node.
• A binary tree is a height-biased leftist tree (HBLT) if and only if, at every internal node, the s value of the left child is greater than or equal to the s value of the right child.
Weight Biased Leftist Tree
[Definition slides not reproduced. For reference: in a weight-biased leftist tree, the weight w(x) of a node is the number of internal nodes in the subtree rooted at x, and at every internal node the weight of the left child is greater than or equal to the weight of the right child.]
Merging of leftist trees
[Worked merge example not reproduced.]
Introduction to Efficient BST
Optimal Binary Search Tree
Introduction to Efficient BST
• As we know, in a binary search tree the nodes in the left subtree have smaller values than the root node, and the nodes in the right subtree have greater values than the root node.
• We know the key value of each node in the tree, and we also know the frequency of each node, i.e., how often a search for that node occurs.
Introduction to Efficient BST
• The frequency and key value determine the overall cost of searching a node. The cost of searching is a very important factor in various applications.
• The overall cost of searching a node should be as low as possible.
• A binary search tree from which we can retrieve the data very fast is an efficient binary search tree.
Types of efficient BST
➢Optimal BST(in syllabus)
➢AVL trees
➢Red-black trees
➢Splay trees
Optimal BST (Definition)
• Given a BST with a fixed number of elements and the probabilities of each key, the average expected cost of accessing the elements in the tree can be computed.
• An optimal BST or OBST is a binary
search tree which has minimal or least
expected cost.
Optimal BST
• Let's understand through an example: if the keys are 10, 20, 30, how many binary search trees can be made from the given number of keys?
• For example, 10, 20, 30 are the keys; several binary search trees can be made from these keys. [Tree figures not reproduced.]
Optimal BST
• The formula for calculating the number of trees on n keys is the Catalan number:
Number of BSTs = (2n)! / ((n + 1)! · n!)
• For n = 3 keys this gives 6!/(4! · 3!) = 5 trees.
Optimal BST
• The cost required for searching an element depends on the number of comparisons made to find it.
• In a skewed tree of the three keys (height 3), up to 3 comparisons are needed; the average number of comparisons is (1 + 2 + 3)/3 = 2.
Optimal BST
• In the balanced tree (height 2), at most 2 comparisons are needed; the average is (1 + 2 + 2)/3 ≈ 1.67.
Optimal BST
• In the remaining skewed trees, up to 3 comparisons are again needed; the average number of comparisons is again (1 + 2 + 3)/3 = 2.
Optimal BST
• In the third case (the balanced tree), the number of comparisons is less because the height of the tree is less, so it is an efficient binary search tree.
• To find the optimal binary search tree, we also take into account the frequency of searching each key.
Problem:
[Problem statement not reproduced.]
Solution:
[Worked solution not reproduced.]