Hashing_Unit4.pptx — Data Structures and Algorithms
Unit 3. Hashing
Prof. Vedashree Gokhale
SKNCOE, Dept of Computer Engineering
TABLES: Hashing
• Hash functions trade the speed of direct access against space efficiency. For example, a hash function can take numbers in the domain of SSNs (such as 546208102 and 541253562) and map them into the range 0 to 10,000 (such as 3482 and 1201).
[Figure: Hash Function Map — the function f(x) takes SSNs and returns indexes in a range we can use for a practical array.]
Where is hashing helpful?
• Anywhere from schools to department stores to manufacturers, hashing can be used to make it simple and easy to insert, delete, or search for a particular record.
How does it compare to binary search?
• Hashing makes it easy to add and delete elements from the collection being searched.
• This is an advantage over binary search, since binary search must keep the entire list sorted as elements are added or deleted.
How does hashing work?
• Example: suppose a tractor company sells all kinds of tractors with various stock numbers, prices, and other details. They want us to store information about each tractor in an inventory so that they can later retrieve information about any particular tractor simply by entering its stock number.
• Suppose the information about each tractor is an object of the following form, with the stock number stored in the key field:

    struct Tractor
    {
        int key;         // The stock number
        double cost;     // The price, in dollars
        int horsepower;  // Size of engine
    };
• Suppose we have 50 different stock numbers. If the stock numbers have values ranging from 0 to 49, we could store the records in an array of 50 components, placing stock number j in location data[j].
• If the stock numbers range from 0 to 4999, we could use an array with 5000 components, but that seems wasteful, since only a small fraction of the array would be used.
• It is wasteful to use an array with 5000 components to store and search among only 50 elements.
• If we are clever, we can store the records in a relatively small array and still retrieve particular stock numbers much faster than by serial search.
• Suppose the stock numbers are: 0, 100, 200, 300, …, 4800, 4900.
• In this case we can store the records in an array called data with only 50 components. The record with stock number j can be stored at this location:

    data[j / 100]

• The record for stock number 4900 is stored in array component data[49]. This general technique is called HASHING.
Key & hash function
• In our example the key was the stock number, which was stored in a member variable called key.
• A hash function maps key values to array indexes. Suppose we name our hash function hash.
• If a record has the key value j, then we try to store the record at location data[hash(j)], where hash(j) is the expression j / 100.
Basic terminology in hashing
• Hash table: a data structure used to store and retrieve data quickly. Every entry in the hash table is placed using the hash function.
• Hash function: the function that converts a key to an array position.
• Bucket: the hash function maps several dictionary entries into the hash table; each position of the hash table is called a bucket.
• Collision: occurs when the hash function returns the same address for more than one record.
• Probe: the calculation of an address and the test for success.
• Synonyms: the set of keys that get mapped to the same location.
• Overflow: when the hash table is full and a new record must be inserted, the hash table is said to overflow.
• Load factor (load density):
    α = n / (s·b)
    where b = number of buckets, s = size of a bucket, n = number of keys to be mapped.
Hash Functions
• Division method
• Example: 54, 88, 102, and 75 are to be placed in a hash table of size 10.
• Thus 88 % 10 = 8, so 88 is placed at position 8 in the hash table.
The Division Method
• Idea: map a key k into one of the m slots by taking the remainder of k divided by m:
    h(k) = k mod m
• Advantage: fast, requires only one operation.
• Disadvantage: certain values of m are bad, e.g., powers of 2 and non-prime numbers.
The Multiplication Method
• Idea:
    – Multiply key k by a constant A, where 0 < A < 1
    – Extract the fractional part of kA
    – Multiply the fractional part by m
    – Take the floor of the result:
    h(k) = ⌊m (kA mod 1)⌋, where kA mod 1 = kA − ⌊kA⌋ is the fractional part of kA
• Disadvantage: slower than the division method.
• Advantage: the value of m is not critical; typically m = 2^p.
Other hash functions:
• Extraction
• Mid-square
• Folding
    – Fold shift
    – Fold boundary
Collision resolution strategies
Separate Chaining
• The hash table is implemented as an array of linked lists.
• Inserting an item r that hashes to index i is simply insertion into the linked list at position i.
• Synonyms are chained in the same linked list.
Separate Chaining (cont’d)
• Retrieval of an item r with hash address i is simply retrieval from the linked list at position i.
• Deletion of an item r with hash address i is simply deleting r from the linked list at position i.
• Example: load the keys 23, 13, 21, 14, 7, 8, and 15, in this order, into a hash table of size 7 using separate chaining with the hash function h(key) = key % 7:
h(23) = 23 % 7 = 2
h(13) = 13 % 7 = 6
h(21) = 21 % 7 = 0
h(14) = 14 % 7 = 0 collision
h(7) = 7 % 7 = 0 collision
h(8) = 8 % 7 = 1
h(15) = 15 % 7 = 1 collision
Linear probing (linear open addressing)
• In open addressing, all elements are stored directly in the hash table itself, so collisions must be resolved within the table.
• Linear probing resolves a collision by placing the data into the next open slot in the table.
Linear Probing – Get and Insert
• divisor = b (number of buckets) = 17
• home bucket = key % 17
• Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.
Resulting table (empty buckets omitted):
    bucket:  0   1   2   6   7   8  11  12  13  14  15  16
    key:    34   0  45   6  23   7  28  12  29  11  30  33
Linear Probing – Delete
• Delete(0): key 0 is stored in bucket 1 (its home bucket 0 was taken by 34). Vacating bucket 1 leaves a hole inside a cluster.
• Search the cluster for a pair (if any) to fill the vacated bucket: 45 (home bucket 11) had probed past bucket 1 on insertion, so it moves from bucket 2 into bucket 1, leaving bucket 2 empty.
Linear Probing – Delete(34)
• Search the cluster for a pair (if any) to fill the vacated bucket: 34 occupied bucket 0, so 0 (home bucket 0) moves from bucket 1 into bucket 0, and 45 (home bucket 11) then moves from bucket 2 into the newly vacated bucket 1.
Linear Probing – Delete(29)
• Search the cluster for a pair (if any) to fill the vacated bucket: 29 occupied bucket 13 (its home bucket 12 was taken by 12). Filling the hole cascades through the cluster: 11 (home 11) moves from bucket 14 to 13, 30 (home 13) moves from bucket 15 to 14, and 45 (home 11) moves from bucket 2 to 15.
Performance of Linear Probing
• Worst-case find/insert/erase time is Θ(n), where n is the number of pairs in the table.
• This happens when all pairs are in the same cluster.
Expected Performance
• α = loading density = (number of pairs)/b; in the example above, α = 12/17.
• Sn = expected number of buckets examined in a successful search when n is large ≈ ½(1 + 1/(1 − α)).
• Un = expected number of buckets examined in an unsuccessful search when n is large ≈ ½(1 + 1/(1 − α)²).
• The time to put and remove is governed by Un.
Problems with Linear Probing
• Identifiers tend to cluster together.
• Adjacent clusters tend to coalesce.
• This increases the search time.
Quadratic Probing
• Linear probing searches buckets (H(x)+i) % b.
• Quadratic probing uses a quadratic function of i as the increment.
• Examine buckets H(x), (H(x)+i²) % b, (H(x)−i²) % b, for 1 ≤ i ≤ (b−1)/2.
• b is a prime number of the form 4j+3, where j is an integer.
Random Probing
• Random probing resolves collisions using random numbers:
    H(x) := (H′(x) + S[i]) % b
• S is a table of size b−1.
• S is a random permutation of the integers [1, b−1].
Rehashing
• Rehashing: if a collision occurs, try H1, H2, …, Hm in sequence, where each Hi is a hash function.
• Double hashing is one of the best methods for dealing with collisions: if the slot is full, a second hash function is calculated and combined with the first:
    H(k, i) = (H1(k) + i·H2(k)) % m
Summary: Hash Table Design
• Given the performance requirements, determine the maximum permissible loading density. Hash functions must usually be custom-designed for the kind of keys used to access the hash table.
• Example: we want a successful search to make no more than 10 comparisons (expected). Since Sn ≈ ½(1 + 1/(1 − α)), this requires α ≤ 18/19.

Editor's Notes
• (On Expected Performance) A put that increases the number of pairs in the table involves an unsuccessful search followed by the addition of an element. An unsuccessful remove is essentially an unsuccessful search. A successful remove must also go to the end of the cluster, and so is like an unsuccessful search.