3 June 2022 C-DAC 1
Hashing
Hashing is the solution that can be used in almost all such situations and
performs extremely well compared to above data structures like Array, Linked
List, Balanced BST in practice. With hashing O(1) search time on average
(under reasonable assumptions) and O(n) in worst case.
Hashing is an improvement over Direct Access Table. The idea is to use hash
function that converts a given phone number or any other key to a smaller
number and uses the small number as index in a table called hash table.
3 June 2022 C-DAC 2
Hash Function
A function that converts a given big phone number to a small practical integer
value. The mapped integer value is used as an index in hash table. In simple
terms, a hash function maps a big number or string to a small integer that can be
used as index in hash table.
A good hash function should have following properties
1) Efficiently computable.
2) Should uniformly distribute the keys (Each table position equally likely
for each key)
For example for phone numbers a bad hash function is to take first three digits.
A better function is consider last three digits. Please note that this may not be
the best hash function. There may be better ways.
3 June 2022 C-DAC 3
Hash Table
Hash Table:
An array that stores pointers to records corresponding to a given phone number.
An entry in hash table is NIL if no existing phone number has hash function
value equal to the index for the entry.
3 June 2022 C-DAC 4
Types of Hash functions
Division Method.
Mid Square Method.
Folding Method.
Multiplication Method.
3 June 2022 C-DAC 5
Division Method
This is the most simple and easiest method to generate a hash value. The hash
function divides the value k by M and then uses the remainder obtained.
Formula: h(K) = k mod M
Here, k is the key value, and M is the size of the hash table.
It is best suited that M is a prime number as that can make sure the keys are
more uniformly distributed. The hash function is dependent upon the remainder
of a division.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
3 June 2022 C-DAC 6
Division Method
Pros:
This method is quite good for any value of M.
The division method is very fast since it requires only a single division
operation.
Cons:
This method leads to poor performance since consecutive keys map to
consecutive hash values in the hash table.
Sometimes extra care should be taken to chose value of M.
3 June 2022 C-DAC 7
Mid Square Method
It involves two steps to compute the hash value-
1) Square the value of the key k i.e. k2
2) Extract the middle r digits as the hash value.
Formula: h(K) = h(k x k)
Here, k is the key value. The value of k is based on the size of the table.
Example:
Suppose the hash table has 100 memory locations. So r = 2 because two digits
are required to map the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
The hash value obtained is 60
3 June 2022 C-DAC 8
Mid Square Method - Example
Suppose a 4-digit seed is taken. seed = 4765
Hence, square of seed is = 4765 * 4765 = 22705225
Now, from this 8-digit number, any four digits are extracted (Say, the middle
four).
So, the new seed value becomes seed = 7052
Now, square of this new seed is = 7052 * 7052 = 49730704
Again, the same set of 4-digits is extracted.
So, the new seed value becomes seed = 7307
This process is repeated as many times as a key is required.
3 June 2022 C-DAC 9
Mid Square Method - Example
3 June 2022 C-DAC 10
Mid Square Method
Pros:
The performance of this method is good as most or all digits of the key value
contribute to the result. This is because all digits in the key contribute to
generating the middle digits of the squared result.
The result is not dominated by the distribution of the top digit or bottom digit of
the original key value.
Cons:
The size of the key is one of the limitations of this method, as the key is of big
size then its square will double the number of digits.
Another disadvantage is that there will be collisions but we can try to reduce
collisions.
3 June 2022 C-DAC 11
Digit Folding Method
This method involves two steps:
1. Divide the key-value k into a number of parts i.e. k1, k2, k3,….,kn, where
each part has the same number of digits except for the last part that can have
lesser digits than the other parts.
2. Add the individual parts. Hash value is obtained by ignoring last carry if any.
Formula:
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s
Here,s is obtained by adding the parts of the key k
Example:
k = 12345; k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5 = 51 ie h(K) = 51
3 June 2022 C-DAC 12
Collision Resolution
3 June 2022 C-DAC 13
Collision Resolution
1) Separate chaining:-
This technique creates a linked list to the slot for which collision occurs.
The new key is then inserted in the linked list.
These linked lists to the slots appear like chains.
That is why, this technique is called as separate chaining.
Complexity is O(log n).The data is stored in each of these linked lists.
Data type for storage :- LinkedList[ ] Table; Table = new LinkedList(N), where
N is the table size
3 June 2022 C-DAC 14
Collision Resolution
2) Open chaining (Probing)
(i) Linear probing:- : When collision occurs, scan down the array one cell at a
time looking for an empty cell.
(ii) Quadratic probing:- when an incoming data's hash value indicates it
should be stored in an already-occupied slot or bucket. Spread out the search for
an empty slot – Increment by i^2 instead of i.
3 June 2022 C-DAC 15
Double Hashing
1.Take the key to store on the hash-table.
2.Apply the first hash function h_1h1(key) over key to get the location to store the
key.
3.If the location is empty, place the key on that location.
4.If the location is already filled, apply the second hash function h_2h2(key) in
combination with the first hash function h_1h1(key) to get the new location for the
key.
3 June 2022 C-DAC 16
Double Hashing
3 June 2022 C-DAC 17
Insert Operation
Whenever an element is to be inserted, compute the hash code of the key passed
and locate the index using that hash code as an index in the array.
Use linear probing for empty location, if an element is found at the computed
hash code.
3 June 2022 C-DAC 18
Delete Operation
Whenever an element is to be deleted, compute the hash code of the key passed
and locate the index using that hash code as an index in the array.
Use linear probing to get the element ahead if an element is not found at the
computed hash code.
When found, store a dummy item there to keep the performance of the hash
table intact.
3 June 2022 C-DAC 19