Analysis of Algorithms
Hash Tables
Andres Mendez-Vazquez
October 3, 2014
1 / 81
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
2 / 81
First: About Basic Data Structures
Remark
It is quite interesting to notice that many data structures actually share
similar operations!!!
Yes
If you think of them as abstract data types (ADTs)
3 / 81
Examples
Search(S,k)
Example: Search in a BST
(Figure: a BST containing the keys 8, 3, 1, 6, 4, 7, 10, 14, 13; the search for k = 7 follows the path 8 → 3 → 6 → 7.)
4 / 81
Examples
Insert(S,x)
Example: Insert in a linked list
(Figure: a singly linked list of nodes A–E headed by firstNode and terminated by NULL; a new node K is inserted.)
5 / 81
And Again
Delete(S,x)
Example: Delete in a BST
(Figure: deleting key 6 from a BST containing 8, 3, 1, 6, 4, 7, 10, 14, 13; after the deletion, 4 takes the place of 6.)
6 / 81
Basic data structures and operations.
Therefore
These are basic structures; it is up to you to read about them.
See Chapter 10 of Cormen’s book.
7 / 81
Hash tables: Concepts
Definition
A hash table or hash map T is a data structure, most commonly an
array, that uses a hash function to efficiently map certain identifiers of
keys (e.g. person names) to associated values.
Advantages
They have the advantage of an expected complexity of O(1 + α) per
operation.
Still, be aware of α.
However, if you have a large universe of keys U:
Then, it is impractical to store a table of size |U|.
Thus, you can use a hash function h : U → {0, 1, ..., m − 1}
9 / 81
When you have a small universe of keys, U
Remarks
It is not necessary to map the key values.
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(T, k)
return T[k]
2 Direct-Address-Insert(T, x)
T [x.key] = x
3 Direct-Address-Delete(T, x)
T [x.key] = NIL
10 / 81
When you have a large universe of keys, U
Then
Then, it is impractical to store a table of the size of |U|.
You can use a special function for mapping
h : U→{0, 1, ..., m − 1} (1)
Problem
With a large enough universe U, two keys can hash to the same value
This is called a collision.
11 / 81
Collisions
This is a problem
We might try to avoid this by using a suitable hash function h.
Idea
Make h appear “random” enough to avoid collisions altogether
(highly improbable) or at least to minimize their probability.
You still have the problem of collisions
Possible Solutions to the problem:
1 Chaining
2 Open Addressing
12 / 81
Hash tables: Chaining
A Possible Solution
Insert the elements that hash to the same slot into a linked list.
(Figure: keys from the universe U are hashed into table slots; colliding keys are chained into linked lists.)
13 / 81
Analysis of hashing with Chaining: Assumptions
Assumptions
We have a load factor α = n/m, where m is the size of the hash table
T, and n is the number of elements to store.
Simple uniform hashing property:
This means that any of the m slots is equally likely to be selected.
This means that if n = n_0 + n_1 + ... + n_{m−1}, we have that E[n_j] = α.
To simplify the analysis, you need to consider two cases
Unsuccessful search
Successful search
15 / 81
Why?
After all
You are always looking for keys when
Searching
Inserting
Deleting
It is clear that we have two possibilities
Finding the key or not finding the key
16 / 81
For this, we have the following theorems
Theorem 11.1
In a hash table in which collisions are resolved by chaining, an unsuccessful
search takes average-case time Θ (1 + α), under the assumption of simple
uniform hashing.
Theorem 11.2
In a hash table in which collisions are resolved by chaining, a successful
search takes average-case time Θ (1 + α) under the assumption of simple
uniform hashing.
17 / 81
Analysis of hashing: Constant time.
Finally
These two theorems tell us that if n = O(m), then
α = n/m = O(m)/m = O(1).
That is, search time is constant.
18 / 81
Analysis of hashing: Which hash function?
Consider that:
Good hash functions should maintain the property of simple uniform
hashing!
The keys have the same probability 1/m to be hashed to any bucket!!!
A uniform hash function minimizes the likelihood of an overflow when
keys are selected at random.
Then:
What should we use?
If we know that the keys are uniformly distributed in the interval
0 ≤ k < 1, then h(k) = ⌊km⌋.
19 / 81
What if...
Question:
What about something with keys in a normal distribution?
20 / 81
Possible hash functions when the keys are natural numbers
The division method
h(k) = k mod m.
Good choices for m are primes not too close to a power of 2.
The multiplication method
h(k) = ⌊m(kA mod 1)⌋ with 0 < A < 1.
The value of m is not critical.
Easy to implement in a computer.
21 / 81
When they are not, we need to interpret the keys as
natural numbers
Keys interpreted as natural numbers
Given a string “pt”, we can say p = 112 and t = 116 (ASCII codes).
ASCII has 128 possible symbols.
Then (112 × 128^1) + (116 × 128^0) = 14452
Nevertheless
This is highly dependent on the origins of the keys!!!
22 / 81
Hashing methods: The division method
Hash function
h(k) = k mod m
Problems with some selections
m = 2^p: h(k) is just the p lowest-order bits of k.
m = 2^p − 1: when k is a character string interpreted in radix 2^p,
permuting the characters of k does not change its value.
It is better to select
Prime numbers not too close to an exact power of two.
For example, given n = 2000 elements:
We can use m = 701 because it is a prime near 2000/3 but not near any
power of two.
24 / 81
Hashing methods: The multiplication method
The multiplication method for creating hash functions has two steps
1 Multiply the key k by a constant A in the range 0 < A < 1 and
extract the fractional part of kA.
2 Then, you multiply the value by m and take the floor:
h(k) = ⌊m (kA mod 1)⌋.
The mod extracts that fractional part:
kA mod 1 = kA − ⌊kA⌋, 0 < A < 1.
Advantages:
m is not critical; normally m = 2^p.
26 / 81
Implementing in a computer
First
First, imagine that a machine word has w bits and that k fits in a single
word.
Second
Then, select an s in the range 0 < s < 2^w and set A = s/2^w.
Third
Now, we multiply k by the number s = A · 2^w.
27 / 81
Example
Fourth
The result is k · s = r1 · 2^w + r0, a 2w-bit value, where the p most
significant bits of r0 give the desired hash value.
Graphically: extract the p most significant bits of r0.
28 / 81
However
Sooner or later
We may pick a hash function that does not give us the desired uniform
randomization property
Thus
We are required to analyze the possible clustering of the data by the hash
function
30 / 81
Measuring Clustering through a metric C
Definition
If bucket i contains n_i elements, then
C = (m/(n − 1)) · ( (1/n) Σ_{i=1}^{m} n_i^2 − 1 )    (2)
Properties
1 If C = 1, then you have uniform hashing.
2 If C > 1, it means that the performance of the hash table is slowed
down by clustering by approximately a factor of C.
3 If C < 1, the spread of the elements is more even than uniform!!! Not
going to happen!!!
31 / 81
However
Unfortunately
Hash tables do not provide a way to measure clustering
Thus, table designers
They should provide some clustering estimation as part of the interface.
Thus
The reason the clustering measure works is because it is based on an
estimate of the variance of the distribution of bucket sizes.
32 / 81
Thus
First
If clustering is occurring, some buckets will have more elements than they
should, and some will have fewer.
Second
There will be a wider range of bucket sizes than one would expect from
a random hash function.
33 / 81
Analysis of C: First, keys are uniformly distributed
Consider the following random variable
Consider bucket i containing n_i elements, with X_ij = I{element j lands in
bucket i}
Then, given
n_i = Σ_{j=1}^{n} X_ij    (3)
We have that
E[X_ij] = 1/m,   E[X_ij^2] = 1/m    (4)
34 / 81
Next
We look at the dispersion of X_ij
Var[X_ij] = E[X_ij^2] − (E[X_ij])^2 = 1/m − 1/m^2    (5)
What about the expected number of elements at each bucket
E[n_i] = E[ Σ_{j=1}^{n} X_ij ] = n/m = α    (6)
35 / 81
Then, we have
Because of the independence of {X_ij}, the scattering of n_i is
Var[n_i] = Var[ Σ_{j=1}^{n} X_ij ] = Σ_{j=1}^{n} Var[X_ij] = n · Var[X_ij]
36 / 81
Then
What about the range of the possible number of elements at each
bucket?
Var[n_i] = n/m − n/m^2 = α − α/m
But, we have that
E[n_i^2] = E[ Σ_{j=1}^{n} X_ij^2 + Σ_{j=1}^{n} Σ_{k=1, k≠j}^{n} X_ij X_ik ]    (7)
Or
E[n_i^2] = n/m + Σ_{j=1}^{n} Σ_{k=1, k≠j}^{n} 1/m^2    (8)
37 / 81
Thus
We re-express this in terms of expected values of n_i
E[n_i^2] = n/m + n(n − 1)/m^2    (9)
Then
E[n_i^2] − E[n_i]^2 = n/m + n(n − 1)/m^2 − n^2/m^2 = n/m − n/m^2 = α − α/m
38 / 81
Then
Finally, we have that
E[n_i^2] = α(1 − 1/m) + α^2    (10)
39 / 81
Then, we have that
Now we build an estimator of the mean of n_i^2, which is part of C:
(1/n) Σ_{i=1}^{m} n_i^2    (11)
Thus
E[ (1/n) Σ_{i=1}^{m} n_i^2 ] = (1/n) Σ_{i=1}^{m} E[n_i^2]
= (m/n) ( α(1 − 1/m) + α^2 )
= (1/α) ( α(1 − 1/m) + α^2 )
= 1 − 1/m + α
40 / 81
Finally
We can plug this back into C using the expected value:
E[C] = (m/(n − 1)) ( E[ (1/n) Σ_{i=1}^{m} n_i^2 ] − 1 )
= (m/(n − 1)) (1 − 1/m + α − 1)
= (m/(n − 1)) (n/m − 1/m)
= (m/(n − 1)) ((n − 1)/m)
= 1
41 / 81
Explanation
Using a hash function that enforces a uniform distribution over the buckets
We get that C = 1, i.e., the best distribution of keys
42 / 81
Now, we have a really horrible hash function ≡ It hits only
one of every b buckets
Thus
E[X_ij] = E[X_ij^2] = b/m    (12)
Thus, we have
E[n_i] = αb    (13)
Then, we have
E[ (1/n) Σ_{i=1}^{m} n_i^2 ] = (1/n) Σ_{i=1}^{m} E[n_i^2] = αb − b/m + 1
43 / 81
Finally
We can plug this back into C using the expected value:
E[C] = (m/(n − 1)) ( E[ (1/n) Σ_{i=1}^{m} n_i^2 ] − 1 )
= (m/(n − 1)) (αb − b/m + 1 − 1)
= (m/(n − 1)) (nb/m − b/m)
= (m/(n − 1)) (b(n − 1)/m)
= b
44 / 81
Explanation
Using a hash function that hits only one of every b buckets
We get that C = b > 1, a really bad distribution of the keys!!!
Thus, you only need the following statistic to evaluate a hash function
(1/n) Σ_{i=1}^{m} n_i^2    (14)
45 / 81
A Possible Solution: Universal Hashing
Issues
In practice, keys are not randomly distributed.
Any fixed hash function might yield Θ(n) retrieval time.
Goal
To find hash functions that produce uniform random table indexes
irrespective of the keys.
Idea
To select a hash function at random from a designed class of functions at
the beginning of the execution.
47 / 81
Hashing methods: Universal hashing
Example
(Figure: at the beginning of the execution, a hash function is chosen at random from a set of hash functions and used to index the hash table.)
48 / 81
Definition of universal hash functions
Definition
Let H = {h : U → {0, 1, ..., m − 1}} be a family of hash functions. H is
called a universal family if
∀x, y ∈ U, x ≠ y : Pr_{h∈H} (h(x) = h(y)) ≤ 1/m    (15)
Main result
With universal hashing, the chance of collision between distinct keys k and
l is no more than the 1/m chance of collision if the locations h(k) and h(l)
were chosen randomly and independently from the set {0, 1, ..., m − 1}.
49 / 81
Hashing methods: Universal hashing
Theorem 11.3
Suppose that a hash function h is chosen randomly from a universal
collection of hash functions and has been used to hash n keys into a table
T of size m, using chaining to resolve collisions. If key k is not in the
table, then the expected length E[n_{h(k)}] of the list that key k hashes to
is at most the load factor α = n/m. If key k is in the table, then the
expected length E[n_{h(k)}] of the list containing key k is at most 1 + α.
Corollary 11.4
Using universal hashing and collision resolution by chaining in an initially
empty table with m slots, it takes expected time Θ(n) to handle any
sequence of n INSERT, SEARCH, and DELETE operations containing
O(m) INSERT operations.
50 / 81
Example of Universal Hash
Proceed as follows:
Choose a prime number p large enough so that every possible key k is in
the range [0, ..., p − 1], and let
Z_p = {0, 1, ..., p − 1} and Z_p* = {1, ..., p − 1}
Define the following hash function:
h_{a,b}(k) = ((ak + b) mod p) mod m, with a ∈ Z_p* and b ∈ Z_p
The family of all such hash functions is:
H_{p,m} = {h_{a,b} : a ∈ Z_p* and b ∈ Z_p}
Important
a and b are chosen randomly at the beginning of execution.
The class H_{p,m} of hash functions is universal.
51 / 81
Example: Universal hash functions
Example
p = 977, m = 50, a and b random numbers
h_{a,b}(k) = ((ak + b) mod p) mod m
52 / 81
Example of key distribution
Example, mean = 488.5 and dispersion = 5
53 / 81
Example with 10 keys
Universal Hashing Vs Division Method
54 / 81
Example with 50 keys
Universal Hashing Vs Division Method
55 / 81
Example with 100 keys
Universal Hashing Vs Division Method
56 / 81
Example with 200 keys
Universal Hashing Vs Division Method
57 / 81
Another Example: Matrix Method
Then
Let us say keys are u bits long.
Say the table size M is a power of 2.
Then an index is b bits long, with M = 2^b.
The h function
Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx,
where after the inner product we apply mod 2.
Example (b = 3, u = 4)
h = [1 0 0 0; 0 1 1 1; 1 1 1 0], x = (1, 0, 1, 0)^T
h(x) = hx mod 2 = (1, 1, 0)^T
58 / 81
Proof of being a Universal Family
First, assume that you have two different keys l ≠ m
Without loss of generality assume the following:
1 l_i ≠ m_i, say l_i = 0 and m_i = 1
2 l_j = m_j for all j ≠ i
Thus
Column i does not contribute to the final answer h(l), because of the
zero!!!
Now
Imagine that we fix all the other columns in h; thus there is only one
answer for h(l), and h(m) = h(l) + (column i) mod 2.
59 / 81
Now
For the ith column
There are 2^b possible columns when changing the ones and zeros.
Thus, given the randomness of the zeros and ones
The probability of getting the zero column
(0, 0, ..., 0)^T    (16)
is equal to
1/2^b    (17)
and this zero column is exactly the case in which h(l) = h(m).
60 / 81
Then
We get the probability
P(h(l) = h(m)) ≤ 1/2^b = 1/M    (18)
61 / 81
Implementation of the column*vector mod 2
Code
int product(int row, int vector) {
    int i = row & vector;   /* bitwise AND = componentwise product */
    /* population count of i via bit tricks */
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    return i & 0x00000001;  /* parity of the count = inner product mod 2 */
}
62 / 81
Advantages of universal hashing
Advantages
Universal hashing provides good results on average, independently of
the keys to be stored.
Guarantees that no input will always elicit the worst-case behavior.
Poor performance occurs only when the random choice returns an
inefficient hash function; this has a small probability.
63 / 81
Open addressing
Definition
All the elements occupy the hash table itself.
What is it?
We systematically examine table slots until either we find the desired
element or we have ascertained that the element is not in the table.
Advantages
The advantage of open addressing is that it avoids pointers altogether.
64 / 81
Insert in Open addressing
Extended hash function to probe
Instead of probing in the fixed order 0, 1, 2, ..., m − 1, which gives Θ (n)
search time,
we extend the hash function to
h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}
This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1),
a permutation of 0, 1, 2, ..., m − 1.
65 / 81
Hashing methods in Open Addressing
HASH-INSERT(T, k)
1 i = 0
2 repeat
3 j = h (k, i)
4 if T [j] == NIL
5 T [j] = k
6 return j
7 else i = i + 1
8 until i == m
9 error “Hash Table Overflow”
66 / 81
Hashing methods in Open Addressing
HASH-SEARCH(T,k)
1 i = 0
2 repeat
3 j = h (k, i)
4 if T [j] == k
5 return j
6 i = i + 1
7 until T [j] == NIL or i == m
8 return NIL
67 / 81
Linear probing: Definition and properties
Hash function
Given an ordinary hash function h : U → {0, 1, ..., m − 1}, for
i = 0, 1, ..., m − 1 we get the extended hash function
h(k, i) = (h (k) + i) mod m, (19)
Sequence of probes
Given key k, we first probe T[h (k)], then T[h (k) + 1], and so on up to
T[m − 1]. Then we wrap around through T[0] up to T[h (k) − 1].
Distinct probes
Because the initial probe determines the entire probe sequence, there are
only m distinct probe sequences.
69 / 81
Linear probing: Definition and properties
Disadvantages
Linear probing suffers from primary clustering:
Long runs of occupied slots build up, increasing the average search
time.
Clusters arise because an empty slot preceded by i full slots gets filled
next with probability (i + 1)/m.
Long runs of occupied slots therefore tend to get longer, and the average
search time increases.
70 / 81
Example
Keys uniformly distributed, hashed with the division method
(figure omitted: resulting table occupancy).
71 / 81
Example
Gaussian-distributed keys, hashed with the division method
(figure omitted: resulting table occupancy).
72 / 81
Quadratic probing: Definition and properties
Hash function
Given an auxiliary hash function h : U → {0, 1, ..., m − 1}, for
i = 0, 1, ..., m − 1 we get the extended hash function
h(k, i) = (h (k) + c1·i + c2·i²) mod m, (20)
where c1, c2 are auxiliary constants.
Sequence of probes
Given key k, we first probe T[h (k)]; later positions probed are offset
by amounts that depend in a quadratic manner on the probe number i.
The initial probe determines the entire sequence, and so only m
distinct probe sequences are used.
74 / 81
Quadratic probing: Definition and properties
Advantages
This method works much better than linear probing; but, to make full use
of the hash table, the values of c1, c2, and m are constrained.
Disadvantages
If two keys have the same initial probe position, then their probe sequences
are the same, since h(k1, 0) = h(k2, 0) implies h(k1, i) = h(k2, i). This
property leads to a milder form of clustering, called secondary clustering.
75 / 81
Double hashing: Definition and properties
Hash function
Double hashing uses a hash function of the form
h(k, i) = (h1(k) + i·h2(k)) mod m, (21)
where i = 0, 1, ..., m − 1 and h1, h2 are auxiliary hash functions (normally
drawn from a universal family).
Sequence of probes
Given key k, we first probe T[h1(k)]; successive probe positions are
offset from previous positions by the amount h2(k), mod m.
Thus, unlike linear or quadratic probing, the probe sequence here
depends in two ways upon the key k: the initial probe position, the
offset, or both may vary.
77 / 81
Double hashing: Definition and properties
Advantages
When m is prime or a power of 2, double hashing improves over linear
or quadratic probing in that Θ(m²) probe sequences are used, rather
than Θ(m), since each possible (h1(k), h2(k)) pair yields a distinct
probe sequence.
The performance of double hashing appears to be very close to the
performance of the “ideal” scheme of uniform hashing.
78 / 81
Analysis of Open Addressing
Theorem 11.6
Given an open-address hash table with load factor α = n/m < 1, the
expected number of probes in an unsuccessful search is at most 1/(1 − α),
assuming uniform hashing.
Corollary
Inserting an element into an open-address hash table with load factor α
requires at most 1/(1 − α) probes on average, assuming uniform hashing.
Theorem 11.8
Given an open-address hash table with load factor α < 1, the expected
number of probes in a successful search is at most (1/α) ln (1/(1 − α)),
assuming uniform hashing and assuming that each key in the table is
equally likely to be searched for.
80 / 81
Exercises
From Cormen’s book, chapter 11
11.1-2
11.2-1
11.2-2
11.2-3
11.3-1
11.3-3
81 / 81

08 Hash Tables

  • 1.
    Analysis of Algorithms HashTables Andres Mendez-Vazquez October 3, 2014 1 / 81
  • 2.
    Outline 1 Basic datastructures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Excercises 2 / 81
  • 3.
    First: About BasicData Structures Remark It is quite interesting to notice that many data structures actually share similar operations!!! Yes If you think them as ADT 3 / 81
  • 4.
    First: About BasicData Structures Remark It is quite interesting to notice that many data structures actually share similar operations!!! Yes If you think them as ADT 3 / 81
  • 5.
    Examples Search(S,k) Example: Search ina BST 8 3 1 6 4 7 10 14 13 k=7 4 / 81
  • 6.
    Examples Insert(S,x) Example: Insert ina linked list CA EDB firstNode NULL K 5 / 81
  • 7.
    And Again Delete(S,x) Example: Deletein a BST 8 3 1 4 7 10 14 13 8 3 1 6 4 7 10 14 13 Delete = 6 6 / 81
  • 8.
    Basic data structuresand operations. Therefore This are basic structures, it is up to you to read about them. Chapter 10 Cormen’s book 7 / 81
  • 9.
    Outline 1 Basic datastructures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Excercises 8 / 81
  • 10.
    Hash tables: Concepts Definition Ahash table or hash map T is a data structure, most commonly an array, that uses a hash function to efficiently map certain identifiers of keys (e.g. person names) to associated values. Advantages They have the advantage of having a expected complexity of operations of O(1 + α) Still, be aware of α However, If you have a large number of keys, U Then, it is impractical to store a table of the size of |U|. Thus, you can use a hash function h : U→{0, 1, ..., m − 1} 9 / 81
  • 11.
    Hash tables: Concepts Definition Ahash table or hash map T is a data structure, most commonly an array, that uses a hash function to efficiently map certain identifiers of keys (e.g. person names) to associated values. Advantages They have the advantage of having a expected complexity of operations of O(1 + α) Still, be aware of α However, If you have a large number of keys, U Then, it is impractical to store a table of the size of |U|. Thus, you can use a hash function h : U→{0, 1, ..., m − 1} 9 / 81
  • 12.
    Hash tables: Concepts Definition Ahash table or hash map T is a data structure, most commonly an array, that uses a hash function to efficiently map certain identifiers of keys (e.g. person names) to associated values. Advantages They have the advantage of having a expected complexity of operations of O(1 + α) Still, be aware of α However, If you have a large number of keys, U Then, it is impractical to store a table of the size of |U|. Thus, you can use a hash function h : U→{0, 1, ..., m − 1} 9 / 81
  • 13.
    When you havea small universe of keys, U Remarks It is not necessary to map the key values. Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(T, k) return T[k] 2 Direct-Address-Search(T, x) T [x.key] = x 3 Direct-Address-Delete(T, x) T [x.key] = NIL 10 / 81
  • 14.
    When you havea small universe of keys, U Remarks It is not necessary to map the key values. Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(T, k) return T[k] 2 Direct-Address-Search(T, x) T [x.key] = x 3 Direct-Address-Delete(T, x) T [x.key] = NIL 10 / 81
  • 15.
    When you havea large universe of keys, U Then Then, it is impractical to store a table of the size of |U|. You can use a especial function for mapping h : U→{0, 1, ..., m − 1} (1) Problem With a large enough universe U, two keys can hash to the same value This is called a collision. 11 / 81
  • 16.
    When you havea large universe of keys, U Then Then, it is impractical to store a table of the size of |U|. You can use a especial function for mapping h : U→{0, 1, ..., m − 1} (1) Problem With a large enough universe U, two keys can hash to the same value This is called a collision. 11 / 81
  • 17.
    When you havea large universe of keys, U Then Then, it is impractical to store a table of the size of |U|. You can use a especial function for mapping h : U→{0, 1, ..., m − 1} (1) Problem With a large enough universe U, two keys can hash to the same value This is called a collision. 11 / 81
  • 18.
    Collisions This is aproblem We might try to avoid this by using a suitable hash function h. Idea Make appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible Solutions to the problem: 1 Chaining 2 Open Addressing 12 / 81
  • 19.
    Collisions This is aproblem We might try to avoid this by using a suitable hash function h. Idea Make appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible Solutions to the problem: 1 Chaining 2 Open Addressing 12 / 81
  • 20.
    Collisions This is aproblem We might try to avoid this by using a suitable hash function h. Idea Make appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible Solutions to the problem: 1 Chaining 2 Open Addressing 12 / 81
  • 21.
    Hash tables: Chaining APossible Solution Insert the elements that hash to the same slot into a linked list. U (Universe of Keys) 13 / 81
  • 22.
    Outline 1 Basic datastructures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Excercises 14 / 81
  • 23.
    Analysis of hashingwith Chaining: Assumptions Assumptions We have a load factor α = n m , where m is the size of the hash table T, and n is the number of elements to store. Simple uniform hashing property: This means that any of the m slots can be selected. This means that if n = n0 + n1 + ... + nm−1, we have that E(nj ) = α. To simplify the analysis, you need to consider two cases Unsuccessful search Successful search 15 / 81
  • 24.
    Analysis of hashingwith Chaining: Assumptions Assumptions We have a load factor α = n m , where m is the size of the hash table T, and n is the number of elements to store. Simple uniform hashing property: This means that any of the m slots can be selected. This means that if n = n0 + n1 + ... + nm−1, we have that E(nj ) = α. To simplify the analysis, you need to consider two cases Unsuccessful search Successful search 15 / 81
  • 25.
    Why? After all You arealways looking for keys when Searching Inserting Deleting It is clear that we have two possibilities Finding the key or not finding the key 16 / 81
  • 26.
    Why? After all You arealways looking for keys when Searching Inserting Deleting It is clear that we have two possibilities Finding the key or not finding the key 16 / 81
  • 27.
    For this, wehave the following theorems Theorem 11.1 In a hash table in which collisions are resolved by chaining, an unsuccessful search takes average-case time Θ (1 + α), under the assumption of simple uniform hashing. Theorem 11.2 In a hash table in which collisions are resolved by chaining, a successful search takes average-case time Θ (1 + α) under the assumption of simple uniform hashing. 17 / 81
  • 28.
    For this, wehave the following theorems Theorem 11.1 In a hash table in which collisions are resolved by chaining, an unsuccessful search takes average-case time Θ (1 + α), under the assumption of simple uniform hashing. Theorem 11.2 In a hash table in which collisions are resolved by chaining, a successful search takes average-case time Θ (1 + α) under the assumption of simple uniform hashing. 17 / 81
  • 29.
    Analysis of hashing:Constant time. Finally These two theorems tell us that if n = O(m) α = n m = O(m) m = O(1) Or search time is constant. 18 / 81
  • 30.
    Analysis of hashing:Which hash function? Consider that: Good hash functions should maintain the property of simple uniform hashing! The keys have the same probability 1/m to be hashed to any bucket!!! A uniform hash function minimizes the likelihood of an overflow when keys are selected at random. Then: What should we use? If we know how the keys are distributed uniformly at the following interval 0 ≤ k < 1 then h(k) = km . 19 / 81
  • 31.
    Analysis of hashing:Which hash function? Consider that: Good hash functions should maintain the property of simple uniform hashing! The keys have the same probability 1/m to be hashed to any bucket!!! A uniform hash function minimizes the likelihood of an overflow when keys are selected at random. Then: What should we use? If we know how the keys are distributed uniformly at the following interval 0 ≤ k < 1 then h(k) = km . 19 / 81
  • 32.
    What if... Question: What aboutsomething with keys in a normal distribution? 20 / 81
  • 33.
    Possible hash functionswhen the keys are natural numbers The division method h(k) = k mod m. Good choices for m are primes not too close to a power of 2. The multiplication method h(k) = m(kA mod 1) with 0 < A < 1. The value of m is not critical. Easy to implement in a computer. 21 / 81
  • 34.
    Possible hash functionswhen the keys are natural numbers The division method h(k) = k mod m. Good choices for m are primes not too close to a power of 2. The multiplication method h(k) = m(kA mod 1) with 0 < A < 1. The value of m is not critical. Easy to implement in a computer. 21 / 81
  • 35.
    When they arenot, we need to interpreting the keys as natural numbers Keys interpreted as natural numbers Given a string “pt”, we can say p = 112 and t=116 (ASCII numbers) ASCII has 128 possible symbols. Then (128 × 112) + 1280 × 116 = 14452 Nevertheless This is highly dependent on the origins of the keys!!! 22 / 81
  • 36.
    Outline 1 Basic datastructures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Excercises 23 / 81
  • 37.
    Hashing methods: Thedivision method Hash function h(k) = k mod m Problems with some selections m = 2p, h(k) is only the p lowest-order bits. m = 2p − 1, when k is interpreted as a character string interpreted in radix 2p, permuting characters in k does not change the value. It is better to select Prime numbers not too close to an exact power of two. For example, given n = 2000 elements. We can use m = 701 because it is near to 2000/3 but not near a power of two. 24 / 81
  • 38.
    Hashing methods: Thedivision method Hash function h(k) = k mod m Problems with some selections m = 2p, h(k) is only the p lowest-order bits. m = 2p − 1, when k is interpreted as a character string interpreted in radix 2p, permuting characters in k does not change the value. It is better to select Prime numbers not too close to an exact power of two. For example, given n = 2000 elements. We can use m = 701 because it is near to 2000/3 but not near a power of two. 24 / 81
  • 39.
    Hashing methods: Thedivision method Hash function h(k) = k mod m Problems with some selections m = 2p, h(k) is only the p lowest-order bits. m = 2p − 1, when k is interpreted as a character string interpreted in radix 2p, permuting characters in k does not change the value. It is better to select Prime numbers not too close to an exact power of two. For example, given n = 2000 elements. We can use m = 701 because it is near to 2000/3 but not near a power of two. 24 / 81
  • 40.
    Outline 1 Basic datastructures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Excercises 25 / 81
  • 41.
    Hashing methods: Themultiplication method The multiplication method for creating hash functions has two steps 1 Multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. 2 Then, you multiply the value by m an take the floor, h(k) = m (kA mod 1) . The mod allows to extract that fractional part!!! kA mod 1 = kA − kA , 0 < A < 1. Advantages: m is not critical, normally m = 2p. 26 / 81
  • 42.
    Hashing methods: Themultiplication method The multiplication method for creating hash functions has two steps 1 Multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. 2 Then, you multiply the value by m an take the floor, h(k) = m (kA mod 1) . The mod allows to extract that fractional part!!! kA mod 1 = kA − kA , 0 < A < 1. Advantages: m is not critical, normally m = 2p. 26 / 81
  • 43.
    Hashing methods: Themultiplication method The multiplication method for creating hash functions has two steps 1 Multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. 2 Then, you multiply the value by m an take the floor, h(k) = m (kA mod 1) . The mod allows to extract that fractional part!!! kA mod 1 = kA − kA , 0 < A < 1. Advantages: m is not critical, normally m = 2p. 26 / 81
  • 44.
    Implementing in acomputer First First, imagine that the word in a machine has w bits size and k fits on those bits. Second Then, select an s in the range 0 < s < 2w and assume A = s 2w . Third Now, we multiply k by the number s = A2w . 27 / 81
  • 45.
    Implementing in acomputer First First, imagine that the word in a machine has w bits size and k fits on those bits. Second Then, select an s in the range 0 < s < 2w and assume A = s 2w . Third Now, we multiply k by the number s = A2w . 27 / 81
  • 46.
    Implementing in acomputer First First, imagine that the word in a machine has w bits size and k fits on those bits. Second Then, select an s in the range 0 < s < 2w and assume A = s 2w . Third Now, we multiply k by the number s = A2w . 27 / 81
  • 47.
    Example Fourth The result ofthat is r12w + r0, a 2w-bit value word, where the first p-most significative bits of r0 are the desired hash value. Graphically 28 / 81
  • 48.
    Example Fourth The result ofthat is r12w + r0, a 2w-bit value word, where the first p-most significative bits of r0 are the desired hash value. Graphically extract p bits 28 / 81
  • 49.
    Outline 1 Basic datastructures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Excercises 29 / 81
  • 50.
    However Sooner or Latter Wecan pick up a hash function that does not give us the desired uniform randomized property Thus We are required to analyze the possible clustering of the data by the hash function 30 / 81
  • 51.
    However Sooner or Latter Wecan pick up a hash function that does not give us the desired uniform randomized property Thus We are required to analyze the possible clustering of the data by the hash function 30 / 81
  • 52.
    Measuring Clustering througha metric C Definition If bucket i contains ni elements, then C = m n − 1 m i=1 n2 i n − 1 (2) Properties 1 If C = 1, then you have uniform hashing. 2 If C > 1, it means that the performance of the hash table is slowed down by clustering by approximately a factor of C. 3 If C < 1, the spread of the elements is more even than uniform!!! Not going to happen!!! 31 / 81
  • 53.
    Measuring Clustering througha metric C Definition If bucket i contains ni elements, then C = m n − 1 m i=1 n2 i n − 1 (2) Properties 1 If C = 1, then you have uniform hashing. 2 If C > 1, it means that the performance of the hash table is slowed down by clustering by approximately a factor of C. 3 If C < 1, the spread of the elements is more even than uniform!!! Not going to happen!!! 31 / 81
  • 54.
    However Unfortunately Hash table donot give a way to measure clustering Thus, table designers They should provide some clustering estimation as part of the interface. Thus The reason the clustering measure works is because it is based on an estimate of the variance of the distribution of bucket sizes. 32 / 81
  • 55.
    However Unfortunately Hash table donot give a way to measure clustering Thus, table designers They should provide some clustering estimation as part of the interface. Thus The reason the clustering measure works is because it is based on an estimate of the variance of the distribution of bucket sizes. 32 / 81
  • 56.
    However Unfortunately Hash table donot give a way to measure clustering Thus, table designers They should provide some clustering estimation as part of the interface. Thus The reason the clustering measure works is because it is based on an estimate of the variance of the distribution of bucket sizes. 32 / 81
  • 57.
    Thus First If clustering isoccurring, some buckets will have more elements than they should, and some will have fewer. Second There will be a wider range of bucket sizes than one would expect from a random hash function. 33 / 81
  • 58.
    Thus First If clustering isoccurring, some buckets will have more elements than they should, and some will have fewer. Second There will be a wider range of bucket sizes than one would expect from a random hash function. 33 / 81
  • 59.
    Analysis of C:First, keys are uniformly distributed Consider the following random variable Consider bucket i containing ni elements, with Xij= I{element j lands in bucket i} Then, given ni = n j=1 Xij (3) We have that E [Xij] = 1 m , E X2 ij = 1 m (4) 34 / 81
  • 60.
    Analysis of C:First, keys are uniformly distributed Consider the following random variable Consider bucket i containing ni elements, with Xij= I{element j lands in bucket i} Then, given ni = n j=1 Xij (3) We have that E [Xij] = 1 m , E X2 ij = 1 m (4) 34 / 81
  • 61.
    Analysis of C:First, keys are uniformly distributed Consider the following random variable Consider bucket i containing ni elements, with Xij= I{element j lands in bucket i} Then, given ni = n j=1 Xij (3) We have that E [Xij] = 1 m , E X2 ij = 1 m (4) 34 / 81
  • 62.
    Next We look atthe dispersion of Xij Var [Xij] = E X2 ij − (E [Xij])2 = 1 m − 1 m2 (5) What about the expected number of elements at each bucket E [ni ] = E   n j=1 Xij   = n m = α (6) 35 / 81
  • 63.
    Next We look atthe dispersion of Xij Var [Xij] = E X2 ij − (E [Xij])2 = 1 m − 1 m2 (5) What about the expected number of elements at each bucket E [ni ] = E   n j=1 Xij   = n m = α (6) 35 / 81
  • 64.
    Then, we have Becauseindependence of {Xij}, the scattering of ni Var [ni ] = Var   n j=1 Xij   = n j=1 Var [Xij] = nVar [Xij] 36 / 81
  • 65.
    Then What about therange of the possible number of elements at each bucket? Var [ni ] = n m − n m2 = α − α m But, we have that E n2 i = E   n j=1 X2 ij + n j=1 n k=1,k=j XijXik   (7) Or E n2 i = n m + n j=1 n k=1,k=j 1 m2 (8) 37 / 81
  • 66.
    Then What about therange of the possible number of elements at each bucket? Var [ni ] = n m − n m2 = α − α m But, we have that E n2 i = E   n j=1 X2 ij + n j=1 n k=1,k=j XijXik   (7) Or E n2 i = n m + n j=1 n k=1,k=j 1 m2 (8) 37 / 81
  • 67.
    Then What about therange of the possible number of elements at each bucket? Var [ni ] = n m − n m2 = α − α m But, we have that E n2 i = E   n j=1 X2 ij + n j=1 n k=1,k=j XijXik   (7) Or E n2 i = n m + n j=1 n k=1,k=j 1 m2 (8) 37 / 81
  • 68.
    Thus We re-express therange on term of expected values of ni E n2 i = n m + n (n − 1) m2 (9) Then E n2 i − E [ni ]2 = n m + n (n − 1) m2 − n2 m2 = n m − n m2 = α − α m 38 / 81
  • 69.
    Thus We re-express therange on term of expected values of ni E n2 i = n m + n (n − 1) m2 (9) Then E n2 i − E [ni ]2 = n m + n (n − 1) m2 − n2 m2 = n m − n m2 = α − α m 38 / 81
  • 70.
    Then Finally, we havethat E n2 i = α 1 − 1 m + α2 (10) 39 / 81
  • 71.
    Then, we havethat Now we build an estimator of the mean of n2 i which is part of C 1 n m i=1 n2 i (11) Thus E 1 n m i=1 n2 i = 1 n m i=1 E n2 i = m n α 1 − 1 m + α2 = 1 α α 1 − 1 m + α2 = 1 − 1 m + α 40 / 81
  • 72.
    Then, we havethat Now we build an estimator of the mean of n2 i which is part of C 1 n m i=1 n2 i (11) Thus E 1 n m i=1 n2 i = 1 n m i=1 E n2 i = m n α 1 − 1 m + α2 = 1 α α 1 − 1 m + α2 = 1 − 1 m + α 40 / 81
  • 73.
    Finally We can plugback on C using the expected value E [C] = m n − 1 E m i=1 n2 i n − 1 = m n − 1 1 − 1 m + α − 1 = m n − 1 n m − 1 m = m n − 1 n − 1 m = 1 41 / 81
  • 74.
    Explanation Using a hashtable that enforce a uniform distribution in the buckets We get that C = 1 or the best distribution of keys 42 / 81
  • 75.
    Now, we havea really horrible hash function ≡ It hits only one of every b buckets Thus E [Xij] = E X2 ij = b m (12) Thus, we have E [ni ] = αb (13) Then, we have E 1 n m i=1 n2 i = 1 n m i=1 E n2 i = αb − b m + 1 43 / 81
  • 76.
    Now, we havea really horrible hash function ≡ It hits only one of every b buckets Thus E [Xij] = E X2 ij = b m (12) Thus, we have E [ni ] = αb (13) Then, we have E 1 n m i=1 n2 i = 1 n m i=1 E n2 i = αb − b m + 1 43 / 81
  • 77.
    Now, we havea really horrible hash function ≡ It hits only one of every b buckets Thus E [Xij] = E X2 ij = b m (12) Thus, we have E [ni ] = αb (13) Then, we have E 1 n m i=1 n2 i = 1 n m i=1 E n2 i = αb − b m + 1 43 / 81
  • 78.
    Finally We can plugback on C using the expected value E [C] = m n − 1 E m i=1 n2 i n − 1 = m n − 1 αb − b m + 1 − 1 = m n − 1 nb m − b m = m n − 1 b (n − 1) m = b 44 / 81
  • 79.
    Explanation Using a hashtable that enforce a uniform distribution in the buckets We get that C = b > 1 or a really bad distribution of the keys!!! Thus, you only need the following to evaluate a hash function 1 n m i=1 n2 i (14) 45 / 81
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
46 / 81
A Possible Solution: Universal Hashing
Issues
In practice, keys are not randomly distributed.
Any fixed hash function might yield Θ(n) retrieval time.
Goal
To find hash functions that produce uniform random table indexes irrespective of the keys.
Idea
To select a hash function at random, from a carefully designed class of functions, at the beginning of the execution.
47 / 81
Hashing methods: Universal hashing
Example
[Figure: a set of hash functions, from which one is chosen at random at the beginning of the execution to map keys into the hash table]
48 / 81
Definition of universal hash functions
Definition
Let H = {h : U → {0, 1, ..., m − 1}} be a family of hash functions. H is called a universal family if
∀x, y ∈ U, x ≠ y : Pr_{h∈H} (h(x) = h(y)) ≤ 1/m (15)
Main result
With universal hashing, the chance of collision between distinct keys k and l is no more than the 1/m chance of collision we would get if the locations h(k) and h(l) were chosen randomly and independently from the set {0, 1, ..., m − 1}.
49 / 81
Hashing methods: Universal hashing
Theorem 11.3
Suppose that a hash function h is chosen randomly from a universal collection of hash functions and has been used to hash n keys into a table T of size m, using chaining to resolve collisions. If key k is not in the table, then the expected length E[n_h(k)] of the list that key k hashes to is at most the load factor α = n/m. If key k is in the table, then the expected length E[n_h(k)] of the list containing key k is at most 1 + α.
Corollary 11.4
Using universal hashing and collision resolution by chaining in an initially empty table with m slots, it takes expected time Θ(n) to handle any sequence of n INSERT, SEARCH, and DELETE operations containing O(m) INSERT operations.
50 / 81
Example of a Universal Hash Family
Proceed as follows:
Choose a prime number p large enough so that every possible key k is in the range [0, ..., p − 1].
Let Zp = {0, 1, ..., p − 1} and Z*p = {1, ..., p − 1}.
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z*p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z*p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal.
51 / 81
Example: Universal hash functions
Example
p = 977, m = 50, a and b random numbers
ha,b(k) = ((ak + b) mod p) mod m
52 / 81
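This example family can be implemented directly; a minimal C sketch (the function name is mine, and in practice a and b would be drawn at random once, at startup):

```c
#include <stdint.h>

/* h_{a,b}(k) = ((a*k + b) mod p) mod m with p = 977 and m = 50,
   matching the slide's example. a ranges over {1,...,p-1} and b over
   {0,...,p-1}; both are fixed randomly before any key is hashed. */
enum { P = 977, TABLE_M = 50 };

unsigned hash_ab(unsigned a, unsigned b, unsigned k) {
    /* 64-bit intermediate avoids overflow in a*k + b */
    return (unsigned)((((uint64_t)a * k + b) % P) % TABLE_M);
}
```

For instance, with a = 3 and b = 5, key 100 maps through (3·100 + 5) mod 977 = 305, then 305 mod 50 = 5.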
Example of key distribution
Example: mean = 488.5 and dispersion = 5
[Figure: histogram of the resulting key distribution]
53 / 81
Example with 10 keys
Universal Hashing vs. Division Method
[Figure]
54 / 81
Example with 50 keys
Universal Hashing vs. Division Method
[Figure]
55 / 81
Example with 100 keys
Universal Hashing vs. Division Method
[Figure]
56 / 81
Example with 200 keys
Universal Hashing vs. Division Method
[Figure]
57 / 81
Another Example: Matrix Method
Then
Let us say keys are u bits long.
Say the table size M is a power of 2, so an index is b bits long with M = 2^b.
The h function
Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx, where after the inner product we apply mod 2.
Example (b = 3, u = 4)
    ⎡1 0 0 0⎤
h = ⎢0 1 1 1⎥ ,  x = (1, 0, 1, 0)ᵀ ,  h(x) = hx mod 2 = (1, 1, 0)ᵀ
    ⎣1 1 1 0⎦
58 / 81
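The matrix example can be checked in code. The sketch below (names are mine) packs each matrix row into the low u bits of an unsigned int; output bit i of h(x) is the parity of row_i AND x, which is exactly the inner product mod 2:

```c
/* Matrix-method hash: h(x) = Hx mod 2, with each row of the random
   b-by-u 0/1 matrix packed into the low u bits of an unsigned int. */
unsigned matrix_hash(const unsigned *rows, unsigned b, unsigned x) {
    unsigned h = 0;
    for (unsigned i = 0; i < b; i++) {
        unsigned v = rows[i] & x;      /* keep positions where both have a 1 */
        v ^= v >> 16; v ^= v >> 8;     /* XOR-fold down to the parity bit */
        v ^= v >> 4;  v ^= v >> 2;  v ^= v >> 1;
        h = (h << 1) | (v & 1u);       /* append bit i of h(x), row 0 first */
    }
    return h;
}
```

With the slide's matrix (rows 1000, 0111, 1110) and x = 1010, the bits come out as (1, 1, 0), i.e. table index 6.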
Proof of being a Universal Family
First, assume that you have two different keys l ≠ m
Without loss of generality, assume the following:
1 lᵢ ≠ mᵢ; say lᵢ = 0 and mᵢ = 1
2 lⱼ = mⱼ ∀j ≠ i
Thus
Column i does not contribute to the final answer h(l), because of the zero!!!
Now
Imagine that we fix all the other columns in h; then there is only one answer for h(l).
59 / 81
Now
For the ith column
There are 2^b possible columns when changing the ones and zeros.
Thus, given the randomness of the zeros and ones
The probability that we get the zero column
(0, 0, ..., 0)ᵀ (16)
is equal to
1/2^b (17)
and this is the case in which h(l) = h(m).
60 / 81
Then
We get the probability
P(h(l) = h(m)) ≤ 1/2^b (18)
61 / 81
Implementation of the column·vector mod 2
Code

int product(int row, int vector) {
    // Keep only the positions where both row and vector have a 1
    int i = row & vector;
    // Parallel popcount of the surviving bits
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    // The inner product mod 2 is the parity of that count
    return i & 0x00000001;
}

62 / 81
Advantages of universal hashing
Advantages
Universal hashing provides good results on average, independently of the keys to be stored.
It guarantees that no input will always elicit worst-case behavior.
Poor performance occurs only when the random choice returns an inefficient hash function; this has small probability.
63 / 81
Open addressing
Definition
All the elements occupy the hash table itself.
What is it?
We systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table.
Advantages
The advantage of open addressing is that it avoids pointers altogether.
64 / 81
Insert in Open addressing
Extended hash function to probe
Instead of probing in the fixed order 0, 1, 2, ..., m − 1, with Θ(n) search time,
extend the hash function to
h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}
This gives the probe sequence
⟨h(k, 0), h(k, 1), ..., h(k, m − 1)⟩
a permutation of 0, 1, 2, ..., m − 1.
65 / 81
Hashing methods in Open Addressing
HASH-INSERT(T, k)
1  i = 0
2  repeat
3      j = h(k, i)
4      if T[j] == NIL
5          T[j] = k
6          return j
7      else i = i + 1
8  until i == m
9  error “Hash Table Overflow”
66 / 81
Hashing methods in Open Addressing
HASH-SEARCH(T, k)
1  i = 0
2  repeat
3      j = h(k, i)
4      if T[j] == k
5          return j
6      i = i + 1
7  until T[j] == NIL or i == m
8  return NIL
67 / 81
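A C sketch of the two procedures above. The choices here are my simplifying assumptions, not part of the slides: linear probing with h(k) = k mod m as the probe sequence, NIL modeled as 0 (so keys must be nonzero), and a tiny table of size 7:

```c
#include <stddef.h>

#define M 7
#define NIL 0UL

/* Probe sequence h(k, i) = (h(k) + i) mod m, with h(k) = k mod m. */
static size_t probe(unsigned long k, size_t i) {
    return (k + i) % M;
}

/* HASH-INSERT: scan the probe sequence for an empty slot. */
int hash_insert(unsigned long T[M], unsigned long k) {
    for (size_t i = 0; i < M; i++) {
        size_t j = probe(k, i);
        if (T[j] == NIL) { T[j] = k; return (int)j; }
    }
    return -1;                     /* hash table overflow */
}

/* HASH-SEARCH: stop at the key, or at NIL (k cannot be further along). */
int hash_search(const unsigned long T[M], unsigned long k) {
    for (size_t i = 0; i < M; i++) {
        size_t j = probe(k, i);
        if (T[j] == k) return (int)j;
        if (T[j] == NIL) break;
    }
    return -1;                     /* not found */
}
```

For example, inserting 10 lands in slot 3 (10 mod 7), and inserting 17 collides there and is pushed to slot 4.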
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
68 / 81
Linear probing: Definition and properties
Hash function
Given an ordinary hash function h : U → {0, 1, ..., m − 1}, for i = 0, 1, ..., m − 1 we get the extended hash function
h(k, i) = (h(k) + i) mod m (19)
Sequence of probes
Given key k, we first probe T[h(k)], then T[h(k) + 1], and so on up to T[m − 1]. Then, we wrap around from T[0] to T[h(k) − 1].
Distinct probes
Because the initial probe determines the entire probe sequence, there are only m distinct probe sequences.
69 / 81
Linear probing: Definition and properties
Disadvantages
Linear probing suffers from primary clustering:
Long runs of occupied slots build up, increasing the average search time.
Clusters arise because an empty slot preceded by i full slots gets filled next with probability (i + 1)/m.
Long runs of occupied slots tend to get longer, and the average search time increases.
70 / 81
Example
Example using uniformly distributed keys
It was generated using the division method
[Figure: resulting table occupancy]
71 / 81
Example
Example using Gaussian keys
It was generated using the division method
[Figure: resulting table occupancy]
72 / 81
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
73 / 81
Quadratic probing: Definition and properties
Hash function
Given an auxiliary hash function h : U → {0, 1, ..., m − 1}, for i = 0, 1, ..., m − 1 we get the extended hash function
h(k, i) = (h(k) + c₁i + c₂i²) mod m (20)
where c₁, c₂ are auxiliary constants (c₂ ≠ 0)
Sequence of probes
Given key k, we first probe T[h(k)]; later positions probed are offset by amounts that depend in a quadratic manner on the probe number i.
The initial probe determines the entire sequence, and so only m distinct probe sequences are used.
74 / 81
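The quadratic probe offset is a one-liner; in the sketch below the function name is mine, and the constants passed in are illustrative only (by themselves they do not guarantee that the sequence covers the whole table):

```c
#include <stddef.h>

/* Quadratic probing: h(k, i) = (h(k) + c1*i + c2*i^2) mod m,
   with h(k) already computed. c1, c2, and m must be chosen
   together for the sequence to be a full permutation. */
size_t quadratic_probe(size_t hk, size_t i, size_t c1, size_t c2, size_t m) {
    return (hk + c1 * i + c2 * i * i) % m;
}
```

For example, with h(k) = 3, c1 = 1, c2 = 3, m = 11, probe 2 lands at (3 + 2 + 12) mod 11 = 6.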
Quadratic probing: Definition and properties
Advantages
This method works much better than linear probing, but to make full use of the hash table, the values of c₁, c₂, and m are constrained.
Disadvantages
If two keys have the same initial probe position, then their probe sequences are the same, since h(k₁, 0) = h(k₂, 0) implies h(k₁, i) = h(k₂, i).
This property leads to a milder form of clustering, called secondary clustering.
75 / 81
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
76 / 81
Double hashing: Definition and properties
Hash function
Double hashing uses a hash function of the form
h(k, i) = (h₁(k) + i·h₂(k)) mod m (21)
where i = 0, 1, ..., m − 1, and h₁, h₂ are auxiliary hash functions (normally drawn from a universal family).
Sequence of probes
Given key k, we first probe T[h₁(k)]; successive probe positions are offset from previous positions by the amount h₂(k), mod m.
Thus, unlike the case of linear or quadratic probing, the probe sequence here depends in two ways upon the key k: the initial probe position, the offset, or both may vary.
77 / 81
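A small C sketch of the double-hashing probe; h1 and h2 here are simple placeholder choices of my own (with m = 13 prime, h2(k) ∈ {1, ..., m − 1} is never zero, so every offset is relatively prime to m and the sequence visits all m slots):

```c
#include <stddef.h>

#define DM 13                                   /* table size, prime */

static size_t h1(unsigned long k) { return k % DM; }
static size_t h2(unsigned long k) { return 1 + k % (DM - 1); }

/* Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m */
size_t double_probe(unsigned long k, size_t i) {
    return (h1(k) + i * h2(k)) % DM;
}
```

For key 27: h1(27) = 1 and h2(27) = 4, so the probe sequence starts 1, 5, 9, ...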
Double hashing: Definition and properties
Advantages
When m is prime or a power of 2, double hashing improves over linear or quadratic probing in that Θ(m²) probe sequences are used, rather than Θ(m), since each possible (h₁(k), h₂(k)) pair yields a distinct probe sequence.
The performance of double hashing appears to be very close to the performance of the “ideal” scheme of uniform hashing.
78 / 81
Analysis of Open Addressing
Theorem 11.6
Given an open-address hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1 − α), assuming uniform hashing.
Corollary
Inserting an element into an open-address hash table with load factor α requires at most 1/(1 − α) probes on average, assuming uniform hashing.
Theorem 11.8
Given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most (1/α) ln(1/(1 − α)), assuming uniform hashing and assuming that each key in the table is equally likely to be searched for.
80 / 81
Exercises
From Cormen’s book, chapter 11
11.1-2
11.2-1
11.2-2
11.2-3
11.3-1
11.3-3
81 / 81