
2 Data Structure, Hashing, and Heap

• Hashing: many applications require a dynamic set that supports only the
dictionary operations INSERT, SEARCH, and DELETE.
• The best strategy for searching so far is O(log n), for a sorted input.
• Assume each element has a key drawn from the universe U = {0, 1, ..., m − 1}.
We can then use an array of containers, or bins, each indexed by a possible
value of the key: a direct-address table.
• Ex. student IDs at a university.

• procedure Direct-Address-Search(T, k)
      return T[k]
  end procedure
  procedure Direct-Address-Insert(T, x)
      T[x.key] = x
  end procedure
  procedure Direct-Address-Delete(T, x)
      T[x.key] = Nil
  end procedure
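The three procedures above can be sketched in Python (a minimal sketch; the `Item` record and the table size are illustrative, not part of the notes):

```python
class Item:
    """Illustrative record: an element with an integer key."""
    def __init__(self, key, data):
        self.key = key
        self.data = data

class DirectAddressTable:
    """Direct-address table: one slot per possible key in {0, ..., m-1}."""
    def __init__(self, m):
        self.slots = [None] * m

    def search(self, k):
        return self.slots[k]      # O(1): index directly by key

    def insert(self, x):
        self.slots[x.key] = x     # O(1)

    def delete(self, x):
        self.slots[x.key] = None  # O(1)

T = DirectAddressTable(10)
T.insert(Item(3, "alice"))
```

Each operation is a single array access, which is where the O(1) runtime below comes from.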

• Runtime: O(1) per operation.
• Downsides:
– If the universe U is large, a table T of size |U| may be impractical
or impossible to store.
– Ex: student IDs at a university — the table needs a slot for every
possible ID, not just the enrolled students.
• When the set K of keys stored in a dictionary is much smaller than the
universe U of all possible keys, a hash table requires much less storage
than a direct-address table.

• We use a hash function h(k) : U → {0, 1, ..., m − 1}, where m is the size
of the hash table and m << |U|.
• Collision: because the hash function maps the large universe into a small
table, two keys may hash to the same slot.

• Chained hashing: we place all the elements that hash to the same slot into
the same linked list.

• procedure Chained-Hash-Search(T, k)
      search for an element with key k in list T[h(k)]
  end procedure
  procedure Chained-Hash-Insert(T, x)
      insert x at the head of list T[h(x.key)]
  end procedure
  procedure Chained-Hash-Delete(T, x)
      delete x from the list T[h(x.key)]
  end procedure
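A chained hash table along these lines can be sketched in Python (the division-method hash and the key/value chain layout are one concrete choice, not prescribed by the notes):

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of colliding pairs."""
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, k):
        return k % self.m  # division-method hash: h(k) = k mod m

    def insert(self, key, value):
        # Insert at the head of the chain: O(1) worst case.
        self.slots[self._h(key)].insert(0, (key, value))

    def search(self, key):
        # Scan the chain: cost proportional to the chain's length.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        self.slots[self._h(key)] = [(k, v) for (k, v) in chain if k != key]

t = ChainedHashTable(4)
t.insert(1, "a")
t.insert(5, "b")  # 5 mod 4 == 1: collides with key 1, lands in the same chain
```

With m = 4, keys 1 and 5 hash to the same slot, so the second insert demonstrates chaining rather than overwriting.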

• Worst-case runtime for insertion is O(1), since we insert at the head of the list.

• Worst-case runtime for searching depends on the length of the list.

• We can delete an element in O(1) if the lists are doubly linked; with singly
linked lists, deletion must first search the chain, so deletion and search
have the same asymptotic running time.


• Given a hash table T with m slots that stores n elements, we define the
load factor α for T as n/m, that is, the average number of elements stored
in a chain.
• Worst case for searching is Θ(n): all n keys hash to the same slot.

• Simple uniform hashing: any given element is equally likely to hash
into any of the m slots, independently of where any other element has
hashed to.
• Theorem: In a hash table in which collisions are resolved by chaining, an
unsuccessful search takes average-case time Θ(1+α) under the assumption
of simple uniform hashing.
• Theorem: In a hash table in which collisions are resolved by chaining, a
successful search takes average-case time Θ(1 + α) under the assumption
of simple uniform hashing.

• If the number of hash-table slots is at least proportional to the number of
elements in the table, we have n = O(m) and, consequently, α = n/m =
O(m)/m = O(1). Thus, searching takes constant time on average.
• Ex. Find common elements of two arrays A and B, each of length n:

– for i ← 1 to n do
      search for A[i] in B
  end for

– Linear search: O(n) per lookup → O(n²) total.
– Binary search (B sorted): O(log n) per lookup → O(n log n) total.
– Hashing: O(1) expected per lookup → O(n) expected total.

• for i ← 1 to n do
      add B[i] to T
  end for
  for i ← 1 to n do
      search for A[i] in T
  end for
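Both loops can be condensed in Python using the built-in `set` as the hash table (names are illustrative):

```python
def common_elements(A, B):
    """Elements of A that also appear in B, in expected O(n) time."""
    table = set(B)                       # first loop: hash every B[i]
    return [a for a in A if a in table]  # second loop: O(1) expected lookups

print(common_elements([1, 2, 3, 4], [3, 4, 5]))  # [3, 4]
```

This is exactly the O(n²) → O(n) improvement claimed in the comparison above: each membership test replaces a linear scan of B.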

• Prefix-sum example: given an array A, is there a contiguous subarray of A
whose elements sum to 0?

– {1, 4, −2, −2, 5, −4, 3} → yes: 4 + (−2) + (−2) = 0.

– {1, 5, 3, 1, 6, 2, 5} → no: all elements are positive.

• P[0] = 0
  Add P[0] to hash table T.      ▷ catches a subarray starting at index 1
  for i ← 1 to n do
      P[i] = P[i − 1] + A[i]
      search P[i] in T
      if found then return True
      else add P[i] to hash table T
      end if
  end for
  return False
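The prefix-sum algorithm can be sketched in Python. Seeding the table with 0 also catches a subarray that starts at the first element; the key invariant is that two equal prefix sums bracket a zero-sum subarray:

```python
def has_zero_sum_subarray(A):
    """True iff some contiguous subarray of A sums to 0.

    If P[i] == P[j] for i < j, the elements A[i+1..j] sum to zero.
    """
    seen = {0}   # prefix sum of the empty prefix
    prefix = 0
    for a in A:
        prefix += a
        if prefix in seen:   # expected O(1) hash lookup
            return True
        seen.add(prefix)
    return False

print(has_zero_sum_subarray([1, 4, -2, -2, 5, -4, 3]))  # True
print(has_zero_sum_subarray([1, 5, 3, 1, 6, 2, 5]))     # False
```

On the first example the prefix sums are 1, 5, 3, 1, ...; the repeat of 1 flags the zero-sum subarray 4, −2, −2.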

• Hash function: a good hash function satisfies (approximately) the
assumption of simple uniform hashing: each key is equally likely to hash to
any of the m slots, independently of where any other key has hashed to.
• Division method: map a key k into one of m slots by taking the remainder
of k divided by m: h(k) = k mod m.
• Ex.
– m = 12
– k = 100
– h(100) = 100 mod 12 = 4
• For the division method, we avoid certain values of m, for example:
– m = 2^p, since h(k) is then just the p lowest-order bits of k.
– m = 2^p − 1 when k is a character string interpreted in radix 2^p, since
permuting the characters of k does not change the hash value.
• A prime not too close to an exact power of 2 is often a good choice for m.
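The division method, and the pitfall of choosing m = 2^p, can be illustrated in Python (the specific keys are chosen for the example):

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m

# The worked example from the notes: m = 12, k = 100.
print(division_hash(100, 12))  # 4

# Why m = 2^p is a poor choice: only the p lowest-order bits of k
# survive, so keys that differ only in higher bits all collide.
m = 16  # 2^4
print({division_hash(k, m) for k in (3, 19, 35, 51)})  # {3}: all four collide
```

All four keys end in the same 4 bits, so with m = 16 they land in the same slot regardless of their higher-order bits.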
• Multiplication method: first, multiply the key k by a constant A
in the range 0 < A < 1 and extract the fractional part of kA. Then
multiply this value by m and take the floor of the result: h(k) =
⌊m(kA mod 1)⌋.

• Typically, m = 2^p for some integer p.
• Ex.
– k = 123
– m = 100
– A = 0.618033
– h(123) = ⌊100(123 · 0.618033 mod 1)⌋
– = ⌊100(76.018059 mod 1)⌋
– = ⌊100(0.018059)⌋ = ⌊1.8059⌋ = 1
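The multiplication method can be sketched in Python (A = 0.618033 follows the worked example above; it approximates (√5 − 1)/2):

```python
import math

def multiplication_hash(k, m, A=0.618033):
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    frac = (k * A) % 1.0        # fractional part of k*A
    return math.floor(m * frac)

print(multiplication_hash(123, 100))  # 1, matching the worked example
```

Note the floating-point arithmetic here is a simplification; implementations often use fixed-point arithmetic with m = 2^p so the floor reduces to a bit shift.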

• Java hashCode() method for String: Java's hashCode method returns
the hash code of a string. The hash code of the empty string is zero.
• The formula behind the hash code is

s[0] ∗ 31^(n−1) + s[1] ∗ 31^(n−2) + ... + s[n−1]

where s[i] is the i-th character of the string and n is the length of the
string, evaluated with 32-bit integer overflow.
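The formula can be reproduced in Python, reducing to a signed 32-bit integer as Java's int arithmetic does (a sketch; the collision pair is a well-known example, not from the notes):

```python
def java_string_hashcode(s):
    """Python sketch of Java's String.hashCode():
    s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with 32-bit overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF      # keep 32 bits, like Java int
    return h - 0x100000000 if h >= 0x80000000 else h  # reinterpret as signed

print(java_string_hashcode(""))    # 0: the empty string hashes to zero
print(java_string_hashcode("Aa"))  # 2112
print(java_string_hashcode("BB"))  # 2112: distinct strings can collide
```

"Aa" and "BB" colliding (31·65 + 97 = 31·66 + 66 = 2112) shows that even a widely used hash function only approximates simple uniform hashing.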

