Hashing in Data Structure and analysis of Algorithms | PPT
HASHING
Hashing
• Mathematical concept
– To map any number to a value within a given interval
– To cut down (discard) part of a number
– Used in discrete maths, e.g. graph theory, set theory
– Used in searching techniques
– Used in encryption methods
Hash Functions and Hash Tables
• Hashing has 2 major components
– Hash function h
– Hash table data structure of size N
• A hash function h maps keys (an identifying element of a record) to a hash value, or hash key, which refers to a specific location in the hash table
• Example:
h(x) = x mod N
is a hash function for integer keys
• The integer h(x) is called the hash value of key x
Hash Functions and Hash Tables
• A hash table data structure is an array, or array-type ADT, of some fixed size, containing the keys.
• It is an array in which records are not stored consecutively; their place of storage is calculated using the key and a hash function
Key → hash function → array index
• Hashed key: the result of applying a hash function to a key
• Keys and entries are scattered throughout the array
• Combines the main advantages of both arrays (direct access by index) and trees (efficient search and insertion)
• The topic of hashing depends mainly on two parts:
(a) Hash Function (b) Collision Resolution
• Table size is also a (minor) factor in hashing; table indices range from 0 to tablesize − 1.
Table Size
• Hash table size
– Should be appropriate for the hash function used
– Too big will waste memory; too small will
increase collisions and may eventually force
rehashing (copying into a larger table)
Example
• We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
• The actual data is not stored in the hash table
• The table pinpoints the location of the actual data or set of data
• Our hash table uses an array of size N = 10,000 and the hash function
h(x) = last four digits of x
Figure: table slots 0 to 9999; the occupied slots shown are
1 → 025-612-0001
2 → 981-101-0002
4 → 451-229-0004
9998 → 200-751-9998
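A minimal Python sketch of the SSN example, assuming each key is given as a string such as "451-229-0004" and that each slot simply stores the key (in a real table it would hold a reference to the record). Taking the number mod 10,000 is one way to extract its last four digits; collisions are not handled here.

```python
N = 10_000  # table size from the example

def h(ssn: str) -> int:
    """Hash an SSN string to its last four digits, i.e. a slot in 0..N-1."""
    return int(ssn.replace("-", "")) % N

table = [None] * N  # one slot per possible last-four-digits value
for ssn in ["451-229-0004", "981-101-0002", "200-751-9998", "025-612-0001"]:
    table[h(ssn)] = ssn  # these four keys happen not to collide

occupied = [(i, table[i]) for i in range(N) if table[i] is not None]
print(occupied)
# [(1, '025-612-0001'), (2, '981-101-0002'), (4, '451-229-0004'), (9998, '200-751-9998')]
```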
Hash Function
• The function that maps keys into the table is called the hash function
• A good hash function:
– Ideally distributes keys and entries evenly throughout the table
– Is easy and quick to compute
– Minimizes collisions, where the position given by the hash function is already occupied
– Is applicable to all objects
• Different types of hash functions are used for mapping keys into tables:
(a) Division Method
(b) Folding Method
(c) Mid-square Method
1. Division Method
• Choose a number m larger than the number n of keys in the key set K.
• The number m is usually chosen to be a prime number.
• The hash function H is defined as
H(k) = k mod m or H(k) = (k mod m) + 1
• Here k mod m denotes the remainder when k is divided by m
• The 2nd formula is used when the addresses range from 1 to m instead of 0 to m − 1.
• Example:
Elements are: 3205, 7148, 2345
Table addresses: 0 – 99 (100 slots, which is not prime)
m = 97 (the largest prime below 100)
H(3205) = 4, H(7148) = 67, H(2345) = 17
• For the 2nd formula, add 1 to each remainder: 5, 68, 18.
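The division method is a one-line computation; this Python sketch (the function name and the `one_based` flag are illustrative) reproduces the example with m = 97 for both formulas:

```python
def division_hash(k: int, m: int = 97, one_based: bool = False) -> int:
    """Division method: H(k) = k mod m, or (k mod m) + 1 for addresses 1..m."""
    r = k % m
    return r + 1 if one_based else r

keys = [3205, 7148, 2345]
print([division_hash(k) for k in keys])                  # [4, 67, 17]
print([division_hash(k, one_based=True) for k in keys])  # [5, 68, 18]
```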
2. Folding Method
• The key k is partitioned into a number of parts k1, k2, …, kn
• These parts are then added together, ignoring the final carry.
• Optionally, the second part (more generally, the even-numbered parts k2, k4, …) can be digit-reversed before adding; the parts may be right- or left-justified, mostly right
H(k) = k1 + k2 + … + kn
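• Example (2-digit parts; the second form reverses the second part):
H(3205) = 32 + 05 = 37, or reversed: 32 + 50 = 82
H(7148) = 71 + 48 = 119 → 19 (carry ignored), or reversed: 71 + 84 = 155 → 55
H(2345) = 23 + 45 = 68, or reversed: 23 + 54 = 77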
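A Python sketch of the folding method, assuming 2-digit parts taken from the left, keys without leading zeros, and that "ignoring the final carry" means keeping only the low-order two digits of the sum. The `reverse_even` flag (a hypothetical name) reverses the 2nd, 4th, … parts:

```python
def fold_hash(k: int, part_digits: int = 2, reverse_even: bool = False) -> int:
    """Folding method: split the digits of k into parts, add them, drop the carry."""
    digits = str(k)
    parts = [digits[i:i + part_digits] for i in range(0, len(digits), part_digits)]
    if reverse_even:
        # reverse the 2nd, 4th, ... parts (list index 1, 3, ...) before adding
        parts = [p[::-1] if i % 2 == 1 else p for i, p in enumerate(parts)]
    total = sum(int(p) for p in parts)
    # keep only the low-order part_digits digits, which discards the carry
    return total % 10 ** part_digits

for k in [3205, 7148, 2345]:
    print(k, fold_hash(k), fold_hash(k, reverse_even=True))
# 3205 37 82
# 7148 19 55
# 2345 68 77
```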
3. Mid-Square Method
• The key k is squared. Then the hash function H is defined as
H(k) = l
• where l is obtained by deleting digits from both ends of k².
• The same digit positions must be used for all the keys.
• Example:
k: 3205, 7148, 2345
k²: 10272025, 51093904, 5499025
H(k): 72, 93, 99
• The 4th and 5th digits, counting from the right, have been selected.
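A Python sketch of the mid-square method; the digit positions `low` and `high` (counted from the right, 1 = rightmost) are assumed parameters, defaulting to the 4th and 5th digits used in the example:

```python
def mid_square_hash(k: int, low: int = 4, high: int = 5) -> int:
    """Mid-square method: square k, keep digit positions low..high (1 = rightmost)."""
    square = k * k
    # drop the (low - 1) rightmost digits, then keep (high - low + 1) digits
    return (square // 10 ** (low - 1)) % 10 ** (high - low + 1)

print([mid_square_hash(k) for k in [3205, 7148, 2345]])  # [72, 93, 99]
```

Because the same positions are used for every key, the result always lies in 0..99 here.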
Collision Resolution Strategies
• If two keys map to the same hash table index, we have a collision.
• As the number of elements in the table increases, the likelihood of a collision increases, so make the table as large as practical
• Collisions may still happen, so we need a collision resolution strategy
• Two approaches are used to resolve collisions.
(a) Separate chaining: chain together several keys/entries
in each position.
(b) Open addressing: store the key/entry in a different
position.
• Probing: If the table position given by the hashed
key is already occupied, increase the position by
some amount, until an empty position is found
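Separate chaining is not developed further in these slides, so here is a minimal Python sketch of it. The class name, the table size of 7 and the use of k mod size as the hash are illustrative assumptions:

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list (chain) of keys hashing to it."""

    def __init__(self, size: int = 7) -> None:
        self.size = size
        self.slots: list[list[int]] = [[] for _ in range(size)]

    def _h(self, key: int) -> int:
        return key % self.size  # division-method hash

    def insert(self, key: int) -> None:
        chain = self.slots[self._h(key)]
        if key not in chain:  # keep each key at most once
            chain.append(key)

    def contains(self, key: int) -> bool:
        return key in self.slots[self._h(key)]

t = ChainedHashTable(7)
for k in [10, 17, 24, 5]:
    t.insert(k)
print(t.slots[3])  # [10, 17, 24]: three colliding keys share one chain
```

Colliding keys never move to another position; they just lengthen the chain at their own slot.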
Open Addressing
• Types of open addressing are
1. Linear Probing
2. Quadratic Probing
3. Double Hashing.
1. Linear Probing
• Locations are checked from the hash location h(k) to the end of the table, and the element is placed in the first empty slot
• If the bottom of the table is reached, checking “wraps around” to the start of the table; the modulus operator is used for this purpose
• Thus, with linear probing, the search and insert routines must continue down the table until a match or an empty location is found
• Linear probing is guaranteed to find a slot for an insertion if there is still an empty slot in the table.
• A prime table size alone does not make the size appropriate; the size should be at least 30% larger than the maximum number of elements ever to be stored in the table.
• If the load factor (number of stored elements ÷ table size) is greater than 50%–70%, the time to search for or add a record increases.
Probe sequence: H(k) = h, h+1, h+2, h+3, …, h+i (mod table size)
• However, linear probing also tends to promote clustering (long runs of occupied slots) within the table.
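A minimal Python sketch of linear probing, assuming integer keys, h(k) = k mod m with m the table size, `None` for empty slots, and no deletions (deleting would require "deleted" markers, which this sketch omits):

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using h(k) = k mod m and linear probing; return the slot used."""
    m = len(table)
    h = key % m
    for i in range(m):           # probe h, h+1, h+2, ... at most m times
        slot = (h + i) % m       # the modulus makes the search wrap around
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("table is full")

def linear_probe_search(table: list, key: int) -> int:
    """Return the slot holding key, or -1 if an empty slot is reached first."""
    m = len(table)
    h = key % m
    for i in range(m):
        slot = (h + i) % m
        if table[slot] is None:  # an empty slot ends the search: key is absent
            return -1
        if table[slot] == key:
            return slot
    return -1

table = [None] * 10
for k in [12, 22, 32, 19, 29]:
    linear_probe_insert(table, k)
print(table)  # [29, None, 12, 22, 32, None, None, None, None, 19]
```

Keys 12, 22 and 32 form a cluster in slots 2 to 4, and 29 wraps around from slot 9 to slot 0.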
2. Quadratic Probing
• Quadratic probing is a solution to the (primary) clustering problem
– Linear probing adds 1, 2, 3, etc. to the original hashed key
– Quadratic probing adds 1², 2², 3², etc. to the original hashed key
• However, whereas linear probing guarantees that all empty positions will be examined if necessary, quadratic probing does not
• If the table size is prime, quadratic probing will try approximately half of the table slots.
• More generally, with quadratic probing, insertion may be impossible if the table is more than half full!
H(k) = h, h+1, h+4, h+9, …, h+i² (mod table size)
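A Python sketch of quadratic probing, assuming integer keys, h(k) = k mod m and `None` for empty slots. With the prime size m = 7 it also shows the limitation above: for home slot 3 the sequence h + i² only ever reaches slots 3, 4, 0 and 5.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key at the first free slot among h, h+1, h+4, h+9, ... (mod m)."""
    m = len(table)
    h = key % m
    for i in range(m):
        slot = (h + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    # the probe sequence can miss free slots, so this may fail before the table is full
    raise OverflowError(f"no free slot found for key {key}")

table = [None] * 7                      # prime table size
for k in [10, 17, 24, 31]:              # all four keys hash to slot 3
    quadratic_probe_insert(table, k)
print(table)  # [24, None, None, 10, 17, 31, None]
```

Inserting 38 (also h = 3) now raises `OverflowError` although slots 1, 2 and 6 are still free, because the table is just over half full.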
3. Double Hashing
• A 2nd hash function H’ is used to resolve the collision.
• Let H(k) = h and H’(k) = h’, where h’ is not a multiple of m (in particular, h’ ≠ m)
• Then we search the locations with addresses
h, h+h’, h+2h’, h+3h’, … (mod m)
• If m is prime, this sequence accesses all the locations.
Double Hashing
• Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
(h + j·d(k)) mod N
for j = 0, 1, …, N − 1
• The secondary hash function d(k) cannot have zero values
• The table size N must be a prime to allow probing of all the cells
• Common choice of compression map for the secondary hash function:
d2(k) = q − (k mod q)
where
– q < N
– q is a prime
• The possible values for d2(k) are
1, 2, …, q
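A small Python sketch of the probe series, using the common form d2(k) = q − (k mod q), which always lies in 1..q, with the assumed values N = 13 and q = 7. Because N is prime and 1 ≤ d2(k) ≤ q < N, each series visits every cell exactly once:

```python
def d2(k: int, q: int = 7) -> int:
    """Secondary hash d2(k) = q - (k mod q); its values are 1..q, never 0."""
    return q - (k % q)

def probe_series(k: int, n: int = 13, q: int = 7) -> list[int]:
    """Cells (h + j*d2(k)) mod n for j = 0..n-1, with h(k) = k mod n."""
    h = k % n
    return [(h + j * d2(k, q)) % n for j in range(n)]

print(probe_series(44))  # [5, 10, 2, 7, 12, 4, 9, 1, 6, 11, 3, 8, 0]

# Every series is a permutation of the 13 cells, so no cell is skipped.
print(all(sorted(probe_series(k)) == list(range(13)) for k in range(1000)))  # True
```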
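Example of Double Hashing
• Consider a hash table storing integer keys that handles collisions with double hashing
– N = 13
– h(k) = k mod 13
– d(k) = k mod 7 (none of the keys below is a multiple of 7, so d(k) is never zero here)
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
k = 18: h(k) = 5, d(k) = 4, probes: 5
k = 41: h(k) = 2, d(k) = 6, probes: 2
k = 22: h(k) = 9, d(k) = 1, probes: 9
k = 44: h(k) = 5, d(k) = 2, probes: 5, 7
k = 59: h(k) = 7, d(k) = 3, probes: 7, 10
k = 32: h(k) = 6, d(k) = 4, probes: 6
k = 31: h(k) = 5, d(k) = 3, probes: 5, 8
k = 73: h(k) = 8, d(k) = 3, probes: 8, 11
Final table (slots 0 to 12): slot 2: 41, slot 5: 18, slot 6: 32, slot 7: 44, slot 8: 31, slot 9: 22, slot 10: 59, slot 11: 73; slots 0, 1, 3, 4 and 12 are empty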
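A Python sketch that replays this example, assuming `None` marks an empty cell and using the example's own secondary hash d(k) = k mod 7. It returns the cells probed for each key so the probe column can be checked:

```python
def double_hash_insert(table: list, key: int, q: int = 7) -> list[int]:
    """Insert key by double hashing; return the list of cells probed."""
    n = len(table)
    h = key % n  # primary hash h(k) = k mod N
    d = key % q  # secondary hash d(k) = k mod 7, as in this example
    if d == 0:
        raise ValueError(f"d({key}) is zero; this secondary hash cannot be used")
    probes = []
    for j in range(n):
        slot = (h + j * d) % n
        probes.append(slot)
        if table[slot] is None:
            table[slot] = key
            return probes
    raise OverflowError("table is full")

table = [None] * 13
for k in [18, 41, 22, 44, 59, 32, 31, 73]:
    print(k, double_hash_insert(table, k))
print(table)
# [None, None, 41, None, None, 18, 32, 44, 31, 22, 59, 73, None]
```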
Applications of Hashing
• Compilers use hash tables to keep track of declared variables
• A hash table can be used for on-line spelling checkers: if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in (expected) constant time
• Game-playing programs use hash tables to store seen positions, thereby saving computation time if the position is encountered again
• Hash functions can be used to quickly check for inequality: if two elements hash to different values, they must be different
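As a rough illustration of the spell-checking application, Python's built-in `set`, which CPython implements as a hash table, gives expected constant-time membership tests. The tiny word list below is a hypothetical stand-in for a real dictionary:

```python
# Hypothetical miniature dictionary; a real checker would load a full word list.
dictionary = {"hash", "table", "key", "collision", "probe"}

def misspelled(words: list[str]) -> list[str]:
    """Return the words not found in the dictionary (detection only, no correction)."""
    return [w for w in words if w.lower() not in dictionary]

print(misspelled(["Hash", "tabel", "key", "colision"]))  # ['tabel', 'colision']
```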
