Hashing in Data Structures
R. Pavithra
I-Msc(it)
• Hashing is another important and widely used technique for implementing dictionaries
• Constant time per operation (on average)
• Worst-case time per operation is proportional to the size of the set (just like the array and chain implementations)
• Uses a hash function to map keys to positions in a hash table
Ideally
• If element e has key k and h is the hash function, then e is stored in position h(k) of the table
• To search for e, compute h(k) to locate the position. If there is no element there, the dictionary does not contain e.
Dictionary: Student Records
• Keys are ID numbers (951000 - 952000), no more than 100 students
• Hash function: h(k) = k - 951000 maps each ID to a distinct table position in 0-1000
• array table[1001]
[Figure: hash table drawn as an array of buckets numbered 0, 1, 2, 3, ..., 1000]
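The student-record table above can be sketched in Java roughly as follows. This is only an illustration: the class name `StudentTable`, the methods `put` and `get`, the use of a `String` name as the record, and the sample name are all assumptions, since the slides do not say what a record contains.

```java
// Sketch of the student-record dictionary: one array slot per possible ID.
// Class name, method names, and the String record type are illustrative assumptions.
class StudentTable {
    static final int BASE = 951000;                   // smallest ID
    private final String[] table = new String[1001];  // positions 0..1000

    // h(k) = k - 951000 maps each ID to its own position.
    static int h(int k) {
        return k - BASE;
    }

    void put(int id, String name) {
        table[h(id)] = name;
    }

    // Returns null when nothing is stored at h(id),
    // i.e. the dictionary does not contain that ID.
    String get(int id) {
        return table[h(id)];
    }

    public static void main(String[] args) {
        StudentTable t = new StudentTable();
        t.put(951007, "Asha");              // made-up sample record
        System.out.println(t.get(951007));  // Asha
        System.out.println(t.get(951008));  // null
    }
}
```

Because every ID has its own slot, no collisions can occur here; the price is 1001 slots for at most 100 students.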
• If the key range is too large, use a hash table with fewer buckets and a hash function that maps multiple keys to the same bucket:
h(k1) = h(k2): k1 and k2 collide at that slot
• Popular hash function: hashing by division
h(k) = k % D, where D is the number of buckets in the hash table
• Example: hash table with 11 buckets
h(k) = k % 11
80 → 3 (80 % 11 = 3), 40 → 7, 65 → 10
58 → 3: collision!
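The division example can be checked with a few lines of Java. The class name `DivisionHash` is an assumption made for this sketch, and the code assumes non-negative keys (Java's `%` can return a negative result for a negative key).

```java
// Hashing by division: h(k) = k % D, where D is the number of buckets.
// Assumes k >= 0 and D > 0.
class DivisionHash {
    static int h(int k, int d) {
        return k % d;
    }

    public static void main(String[] args) {
        int[] keys = {80, 40, 65, 58};
        for (int k : keys) {
            System.out.println(k + " -> " + h(k, 11));
        }
        // 80 and 58 both map to bucket 3, which is the collision from the example.
    }
}
```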
• Two classes of collision resolution:
  ◦ (1) Open hashing, a.k.a. separate chaining
  ◦ (2) Closed hashing, a.k.a. open addressing
• The difference is whether colliding records are stored outside the table (open hashing) or in another slot of the table (closed hashing)
• Associated with closed hashing is a rehash strategy:
"If we try to place x in bucket h(x) and find it occupied, find alternative locations h1(x), h2(x), etc. Try each in order; if none is empty, the table is full."
• h(x) is called the home bucket
• The simplest rehash strategy is linear hashing (more commonly called linear probing):
hi(x) = (h(x) + i) % D
• In general, the collision resolution strategy is to generate a sequence of hash table slots (the probe sequence) that can hold the record, and to test each slot until an empty one is found (probing)
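A minimal Java sketch of insertion under this rehash strategy is shown below. The class name `LinearProbingTable` and its methods are assumptions, keys are assumed to be non-negative integers with h(x) = x % D, and duplicate keys are not checked for.

```java
// Closed hashing with the probe sequence hi(x) = (h(x) + i) % D.
// Names are illustrative; keys are assumed non-negative and distinct.
class LinearProbingTable {
    private final Integer[] slots;  // null marks an empty bucket
    private final int d;            // number of buckets

    LinearProbingTable(int d) {
        this.d = d;
        this.slots = new Integer[d];
    }

    // Home bucket h(x).
    int home(int x) {
        return x % d;
    }

    // Tries h(x), h1(x), h2(x), ... in order.
    // Returns the bucket used, or -1 if every bucket is occupied.
    int insert(int x) {
        for (int i = 0; i < d; i++) {
            int pos = (home(x) + i) % d;
            if (slots[pos] == null) {
                slots[pos] = x;
                return pos;
            }
        }
        return -1;  // no empty bucket: the table is full
    }
}
```

With D = 11, inserting 80, 40, 65, and 58 in that order uses buckets 3, 7, 10, and 4: the key 58 finds its home bucket 3 occupied and moves on to h1(58) = 4.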
[Figure: hash table with 8 buckets, numbered 0 to 7, already holding a, b, and c]
Where do we insert d? Bucket 3 is already filled.
Probe sequence using linear hashing:
h1(d) = (h(d)+1) % 8 = 4 % 8 = 4
h2(d) = (h(d)+2) % 8 = 5 % 8 = 5*
h3(d) = (h(d)+3) % 8 = 6 % 8 = 6
etc.: 7, 0, 1, 2
The sequence wraps around to the beginning of the table!
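The full probe sequence for d can be generated mechanically. In this sketch the class name `ProbeSequence` and the method `probes` are assumptions; the home bucket 3 and table size 8 come from the example above.

```java
import java.util.Arrays;

// Lists h1, h2, ..., h(D-1) for a given home bucket: hi = (home + i) % D.
class ProbeSequence {
    static int[] probes(int home, int d) {
        int[] seq = new int[d - 1];
        for (int i = 1; i < d; i++) {
            seq[i - 1] = (home + i) % d;
        }
        return seq;
    }

    public static void main(String[] args) {
        // Home bucket 3 in a table of 8 buckets.
        System.out.println(Arrays.toString(probes(3, 8)));
        // prints [4, 5, 6, 7, 0, 1, 2]
    }
}
```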
• Test for membership: findItem
• Examine h(k), h1(k), h2(k), …, until we find k, reach an empty bucket, or return to the home bucket
• If no deletions are possible, this strategy works.
• With deletions, reaching an empty bucket is no longer conclusive: k may be stored further along the probe sequence, and the empty bucket may have been occupied when k was inserted.
• Need a special placeholder, "deleted", to distinguish a bucket that was never used from one that once held a value.
• May need to reorganize the table after many deletions.
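One possible way to realize the "deleted" placeholder is sketched below. The class name `TombstoneTable`, the marker object `DELETED`, and the method names are assumptions; the sketch also assumes non-negative integer keys, h(k) = k % D, and that a key is never inserted while it is already present.

```java
// Closed hashing with a "deleted" placeholder (all names are illustrative).
class TombstoneTable {
    private static final Object DELETED = new Object();  // once held a value
    private final Object[] slots;                         // null = never used
    private final int d;

    TombstoneTable(int d) {
        this.d = d;
        this.slots = new Object[d];
    }

    private int home(int k) {
        return k % d;
    }

    // Places k in the first never-used or deleted bucket of its probe sequence.
    boolean insert(int k) {
        for (int i = 0; i < d; i++) {
            int pos = (home(k) + i) % d;
            if (slots[pos] == null || slots[pos] == DELETED) {
                slots[pos] = k;
                return true;
            }
        }
        return false;  // table is full
    }

    // Returns the bucket holding k, or -1 if k is absent.
    int find(int k) {
        for (int i = 0; i < d; i++) {
            int pos = (home(k) + i) % d;
            if (slots[pos] == null) return -1;      // never used: k cannot be further on
            if (slots[pos].equals(k)) return pos;   // DELETED never equals a key
        }
        return -1;  // probed every bucket and came back to the home bucket
    }

    boolean remove(int k) {
        int pos = find(k);
        if (pos < 0) return false;
        slots[pos] = DELETED;  // leave a placeholder so later probes keep going
        return true;
    }
}
```

The search stops only at a never-used bucket, so removing 80 from bucket 3 does not hide a key such as 58 that was pushed on to bucket 4.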
• Consider: h(x) = x % 16
  ◦ poor distribution, not very random
  ◦ depends solely on the least significant four bits of the key
• Better: the mid-square method
  ◦ if keys are integers in the range 0, 1, …, K, pick an integer C such that D*C^2 is about equal to K^2, then
h(x) = (x^2 / C) % D
extracts the middle r bits of x^2, where 2^r = D (a base-D digit)
  ◦ better, because most or all of the bits of the key contribute to the result
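As a rough illustration of the mid-square method, the sketch below uses assumed parameters that are not from the slides: 16-bit keys (K = 65535), D = 256 buckets, and C = 4096, so that D*C^2 = 2^32 is about K^2. The class name `MidSquare` is also an assumption.

```java
// Mid-square hashing: h(x) = (x^2 / C) % D, using integer division.
// With C = 4096 and D = 256 this picks bits 12..19 of x^2,
// the middle 8 bits of a 32-bit square.
class MidSquare {
    static int h(long x, long c, int d) {
        return (int) ((x * x / c) % d);  // long arithmetic avoids overflow of x^2
    }

    public static void main(String[] args) {
        // 1000^2 = 1000000; 1000000 / 4096 = 244; 244 % 256 = 244
        System.out.println(h(1000, 4096, 256));   // 244
        System.out.println(h(65535, 4096, 256));  // 224
    }
}
```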
• Folding method:
    // Sums the character codes of x and reduces the sum modulo D (D > 0).
    int h(String x, int D) {
        int sum = 0;
        for (int i = 0; i < x.length(); i++) {
            sum += (int) x.charAt(i);
        }
        return sum % D;
    }
  ◦ sums the ASCII values of the letters in the string
  ◦ ASCII value for "A" = 65; the sum will be in the range 650-900 for 10 upper-case letters; good when D is around 100, for example
  ◦ the order of the characters in the string has no effect
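The last point means that any two strings made of the same letters collide. A small demonstration, assuming a wrapper class named `FoldingDemo`, the sample words "STOP" and "POTS", and D = 101 (none of which appear in the slides):

```java
// Folding hash from the slide, wrapped in a class so it can be run.
class FoldingDemo {
    static int h(String x, int d) {
        int sum = 0;
        for (int i = 0; i < x.length(); i++) {
            sum += x.charAt(i);  // character code of each letter
        }
        return sum % d;
    }

    public static void main(String[] args) {
        // S=83, T=84, O=79, P=80: the sum is 326 in either order, and 326 % 101 = 23.
        System.out.println(h("STOP", 101));  // 23
        System.out.println(h("POTS", 101));  // 23
    }
}
```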
• Open hashing (separate chaining): each bucket in the hash table is the head of a linked list.
• All elements that hash to a particular bucket are placed on that bucket's linked list.
• Records within a bucket can be ordered in several ways: by order of insertion, by key value, or by frequency of access.
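A minimal sketch of such a chained table in Java follows. The class name `ChainedTable` and its methods are assumptions; keys are assumed to be non-negative integers with h(k) = k % D, and records are kept in order of insertion, one of the orderings listed above.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Open hashing: each bucket holds a linked list of the keys that hash to it.
class ChainedTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();
    private final int d;  // number of buckets

    ChainedTable(int d) {
        this.d = d;
        for (int i = 0; i < d; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Appends k to the list of its bucket (order of insertion).
    void insert(int k) {
        buckets.get(k % d).addLast(k);
    }

    // Searches only the list of bucket h(k).
    boolean contains(int k) {
        return buckets.get(k % d).contains(k);
    }

    int chainLength(int bucket) {
        return buckets.get(bucket).size();
    }
}
```

With D = 11, inserting 80 and 58 puts both on the list of bucket 3, so a collision simply lengthens that chain instead of moving a record to another slot.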
• Worst-case performance is O(n) for both open and closed hashing
• Exercise: count the number of operations needed to hash the keys
  23 6 8 10 23 5 12 4 9 19
  with D = 9 and h(x) = x % D
• Draw the 11-entry hash table that results from hashing the keys 12, 44, 13, 88, 23, 94, 11, 39, 20 using the function h(i) = (2i+5) mod 11, closed hashing, and linear probing.
• Write pseudo-code for listing all identifiers in a hash table in lexicographic order, using open hashing and the hash function h(x) = first character of x. What is the running time?
Thank You