Hashing_Unit4.pptx — Data Structures and Algorithms
Unit 3. Hashing
Prof. Vedashree Gokhale
SKNCOE, Dept of Computer Engineering
TABLES: Hashing
• Hash functions trade the speed of direct access against space efficiency. For example, a hash function can take numbers in the domain of SSNs (such as 546208102 and 541253562) and map them into the range 0 to 10,000 (such as 3482 and 1201).
[Figure: Hash Function Map — the function f(x) takes SSNs and returns indexes in a range we can use for a practical array.]
Where is hashing helpful?
• Anywhere from schools to department stores to manufacturers, hashing can be used to make it simple and easy to insert, delete, or search for a particular record.
How does it compare to binary search?
• Hashing makes it easy to add and delete elements from the collection being searched.
• This is an advantage over binary search, since binary search must keep the entire list sorted as elements are added or deleted.
How does hashing work?
• Example: suppose a tractor company sells all kinds of tractors with various stock numbers, prices, and other details. They want us to store information about each tractor in an inventory so that they can later retrieve information about any particular tractor simply by entering its stock number.
• Suppose the information about each tractor is an object of the following form, with the stock number stored in the key field:

    struct Tractor
    {
        int key;         // The stock number
        double cost;     // The price, in dollars
        int horsepower;  // Size of engine
    };
• Suppose we have 50 different stock numbers. If the stock numbers have values ranging from 0 to 49, we could store the records in an array of 50 components, placing stock number j in location data[j].
• If the stock numbers range from 0 to 4999, we could use an array with 5000 components, but that seems wasteful, since only a small fraction of the array would be used.
• It is wasteful to use an array with 5000 components to store and search among only 50 elements.
• If we are clever, we can store the records in a relatively small array and still retrieve particular stock numbers much faster than by serial search.
• Suppose the stock numbers are: 0, 100, 200, 300, …, 4800, 4900.
• In this case we can store the records in an array called data with only 50 components. The record with stock number j can be stored at this location:

    data[j / 100]

• The record for stock number 4900 is stored in array component data[49]. This general technique is called HASHING.
Key & hash function
• In our example the key was the stock number, which was stored in a member variable called key.
• A hash function maps key values to array indexes. Suppose we name our hash function hash.
• If a record has the key value j, then we try to store the record at location data[hash(j)], where hash(j) is the expression j / 100.
Basic terminology in hashing
• Hash table: a data structure used to store and retrieve data quickly. Every entry in the hash table is placed using the hash function.
• Hash function: the function that converts a key to an array position.
• Bucket: the hash function maps several dictionary entries into the hash table; each position of the hash table is called a bucket.
• Collision: occurs when the hash function returns the same address for more than one record.
• Probe: the calculation of an address and the test for success.
• Synonyms: the set of keys that get mapped to the same location.
• Overflow: when the hash table is full and a new record must be inserted, the hash table is said to overflow.
• Load factor (load density):
    α = n / (s·b)
    where b = number of buckets, s = size of a bucket, n = number of keys to be mapped.
Hash Functions
• Division method
• Example: 54, 88, 102, and 75 are to be placed in a hash table of size 10.
• Thus 88 % 10 = 8, so 88 is placed at position 8 in the hash table.
The Division Method
• Idea: map a key k into one of the m slots by taking the remainder of k divided by m:
    h(k) = k mod m
• Advantage: fast, requires only one operation.
• Disadvantage: certain values of m are bad, e.g., powers of 2 and non-prime numbers.
The Multiplication Method
• Idea:
    – Multiply key k by a constant A, where 0 < A < 1
    – Extract the fractional part of kA
    – Multiply the fractional part by m
    – Take the floor of the result:
    h(k) = ⌊m (kA mod 1)⌋, where kA mod 1 = kA − ⌊kA⌋ is the fractional part of kA
• Disadvantage: slower than the division method.
• Advantage: the value of m is not critical; typically m = 2^p.
Other hash functions:
• Extraction
• Mid-square
• Folding
    – Fold shift
    – Fold boundary
Collision resolution strategies
Separate Chaining
• The hash table is implemented as an array of linked lists.
• Inserting an item r that hashes to index i is simply insertion into the linked list at position i.
• Synonyms are chained in the same linked list.
Separate Chaining (cont’d)
• Retrieval of an item r with hash address i is simply retrieval from the linked list at position i.
• Deletion of an item r with hash address i is simply deleting r from the linked list at position i.
• Example: load the keys 23, 13, 21, 14, 7, 8, and 15, in this order, into a hash table of size 7 using separate chaining with the hash function h(key) = key % 7:
h(23) = 23 % 7 = 2
h(13) = 13 % 7 = 6
h(21) = 21 % 7 = 0
h(14) = 14 % 7 = 0 collision
h(7) = 7 % 7 = 0 collision
h(8) = 8 % 7 = 1
h(15) = 15 % 7 = 1 collision
Linear probing (linear open addressing)
• In open addressing, all elements are stored directly in the hash table itself, so collisions must be resolved within the table.
• Linear probing resolves a collision by placing the data into the next open slot in the table.
Linear Probing – Get and Insert
• divisor = b (number of buckets) = 17
• home bucket = key % 17
• Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.
Resulting table (empty buckets omitted):
    bucket:  0   1   2   6   7   8  11  12  13  14  15  16
    key:    34   0  45   6  23   7  28  12  29  11  30  33
Linear Probing – Delete
• Delete(0): key 0 is stored in bucket 1 (its home bucket 0 was taken by 34). Vacating bucket 1 leaves a hole inside a cluster.
• Search the cluster for a pair (if any) to fill the vacated bucket: 45 (home bucket 11) had probed past bucket 1 on insertion, so it moves from bucket 2 into bucket 1, leaving bucket 2 empty.
Linear Probing – Delete(34)
• Search the cluster for a pair (if any) to fill the vacated bucket: 34 occupied bucket 0, so 0 (home bucket 0) moves from bucket 1 into bucket 0, and 45 (home bucket 11) then moves from bucket 2 into the newly vacated bucket 1.
Linear Probing – Delete(29)
• Search the cluster for a pair (if any) to fill the vacated bucket: 29 occupied bucket 13 (its home bucket 12 was taken by 12). Filling the hole cascades through the cluster: 11 (home 11) moves from bucket 14 to 13, 30 (home 13) moves from bucket 15 to 14, and 45 (home 11) moves from bucket 2 to 15.
Performance of Linear Probing
• Worst-case find/insert/erase time is Θ(n), where n is the number of pairs in the table.
• This happens when all pairs are in the same cluster.
Expected Performance
• α = loading density = (number of pairs)/b; in the example above, α = 12/17.
• Sn = expected number of buckets examined in a successful search when n is large ≈ ½(1 + 1/(1 − α)).
• Un = expected number of buckets examined in an unsuccessful search when n is large ≈ ½(1 + 1/(1 − α)²).
• The time to put and remove is governed by Un.
Problems with Linear Probing
• Identifiers tend to cluster together.
• Adjacent clusters tend to coalesce.
• This increases the search time.
Quadratic Probing
• Linear probing searches buckets (H(x)+i) % b.
• Quadratic probing uses a quadratic function of i as the increment.
• Examine buckets H(x), (H(x)+i²) % b, (H(x)−i²) % b, for 1 ≤ i ≤ (b−1)/2.
• b is a prime number of the form 4j+3, where j is an integer.
Random Probing
• Random probing resolves collisions using random numbers:
    H(x) := (H′(x) + S[i]) % b
• S is a table of size b−1.
• S is a random permutation of the integers [1, b−1].
Rehashing
• Rehashing: if a collision occurs, try H1, H2, …, Hm in sequence, where each Hi is a hash function.
• Double hashing is one of the best methods for dealing with collisions: if the slot is full, a second hash function is calculated and combined with the first:
    H(k, i) = (H1(k) + i·H2(k)) % m
Summary: Hash Table Design
• Given the performance requirements, determine the maximum permissible loading density. Hash functions must usually be custom-designed for the kind of keys used to access the hash table.
• Example: we want a successful search to make no more than 10 comparisons (expected). Since Sn ≈ ½(1 + 1/(1 − α)), this requires α ≤ 18/19.

Editor's Notes
• (On Expected Performance) A put that increases the number of pairs in the table involves an unsuccessful search followed by the addition of an element. An unsuccessful remove is essentially an unsuccessful search. A successful remove must also go to the end of the cluster, and so is like an unsuccessful search.