0% found this document useful (0 votes)

24 views28 pages

06 - APS - Hash Table

The document provides an overview of hash tables, detailing their structure, advantages, and methods for resolving collisions. It discusses hash functions, including division and multiplication methods, and techniques for collision resolution such as chaining and open addressing. Additionally, it covers probing strategies like linear, quadratic, and double hashing, along with their respective implementations and complexities.

Uploaded by

kgztjmqsss

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views28 pages

06 - APS - Hash Table

Uploaded by

kgztjmqsss

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 28

Algorithms & data structures

Hash table
Damjan Strnad
2

Hash table
●
a hash table is a data structure that stores key-data
pairs (k,r), where k is the key and r is the associated
data of a table element
●
hash table allows direct access to data through key
values, therefore a natural implementation of a hash
table is by using an array:
– the element index in the array is calculated from the key
3

Hash table
●
if the array is large enough and the keys are integers,
each key maps to a unique array location and we can use
direct addressing (array location k belongs to key k):
– the set of active keys K is a subset of a set of all keys U
and determines the
U
locations with valid T
0 key data
0 6 NIL
pointers to stored 2 1
8 1
elements; other 2
K 1 NIL
locations have value 3 3
3
NIL 7 4 4
4
– such table is not yet 5 NIL
5

a hash table NIL

7
7
8
NIL
4

Hash table
●
when the number of possible keys is bigger than the array
size, we calculate the element address from the key value
using a hash function h : U  {0,...,m-1}, which maps from a
set of keys U into the slots of a hash table T:
– h(k) is the address of element with key k in a hash table

U T
0

h(k1)
k1
K k2
k3 h(k2)=h(k3)
k4 k5 h(k5)

h(k4)

m-1
5

Hash table
●
the advantage of a hash table to a table with direct
addressing is smaller memory consumption (O(|K|) instead
O(|U|)
●
average access time is still O(1), but not for the worst case
●
the disadvantage of a hash table is that two keys can map
to the same slot, which is called a collision
●
the number of collisions can be reduced with a good hash
function that maps keys uniformly across the table
addresses
●
because collisions can occur for |U|>m in any case, we
must use one of techniques for collision resolving
6

Hash functions
●
a good hash function maps each of the keys with equal
probability into one of m slots in a hash table
●
we will assume that keys are natural numbers; when they
are not, we have to transform them into natural numbers:
– example: let the key be a string CLRS. The ASCII values for
individual letters are: C=67, L=76, R=82, S=83. There are
128 values for a 7-bit ASCII, therefore the string CLRS can
be uniquely transformed into a natural number as:
(67 · 1283) + (76 · 1282) + (82 · 1281) + (83 · 1280) = 141 764 947
●
two methods for construction of good hash functions:
– division method
– multiplication method
7

Division method
●
uses the following equation for a hash function:
h(k) = k mod m
●
example: hash table size is m=12, the key is k=100
– h(100) = 100 mod 12 = 4
– values 4,16,28,... (k=4+12i, i=0,1,2,...) map into the
same slot
●
using powers of 2 for m is not always good:
– operation k mod m, where m=2p, returns bottom p bits
of the key; if those are not uniformly distributed among
all possible keys (e.g., postal codes), it will cause poor
dispersion of h(k) and consequently many collisions
●
in practice a good value for m is a prime number that is
not very close to the power of 2
8

Multiplication method
●
uses the following equation for a hash function:
h(k) = ⌊m(kA mod 1)⌋
●
A is a constant from range (0,1), a recommended value is:
A=( √ 5−1)/2≈0,618034
●
(kA mod 1) means we only keep the fractional part of the
product
●
the method advantage is that the value of m is not critical,
the disadvantage is it is slow compared to division method
●
example: m=8, A=13/32, k=21: h(k )=⌊8⋅(21⋅13 mod 1)⌋=
32
=⌊8⋅(8,53125 mod 1)⌋=
=⌊8⋅0,53125⌋=
=⌊ 4,25⌋=
=4
9

Resolving collisions by chaining

●
in chaining, elements that hash to the same slot are
stored in a linked list at that slot
●
each slot contains a pointer to the head of the linked list; if
the slot is empty the pointer is NIL
U T
NIL

k1 k8 NIL
k1
K k8 NIL

k3 k2
k7 k2 k3 NIL
k6 k7
k4 k5 NIL

k6 NIL
NIL

NIL

k4 k5 NIL
10

Resolving collisions by chaining

●
example: a phone book
keys indices key-value pairs
11

Chained hash table operations

● insertion: INSERT-INTO-CHAINED-SLOT(T,x)
– inserts element x into the head of list T[h(key[x])]
– time complexity is O(1) if we assume that x is not yet in T;
if we need to verify that assumption, we must execute the
search
● searching: FIND-IN-CHAINED-SLOT(T,k)
– finds the element with key k in the list T[h(k)]
– worst case time complexity is proportional to the length of
linked list in slot h(k)
● removal: REMOVE-FROM-CHAINED-SLOT(T,x)
– removes element x from list T[h(key[x])]
– if the lists are doubly linked, the time complexity is O(1); if
they are singly linked, we must find the predecessor of x,
for which time complexity equals to that of searching
12

Chained hashing analysis

●
a load factor α for a hash table storing n elements in m
slots is defined as:
n
α=
m
●
α is the average number of elements stored in a chain
(list)
●
the worst case occurs when all elements are in the same
slot; in this case the time complexity of search is O(n), to
which we must add the time of computing the hash
function value
●
average time complexity depends on how well the hash
function h distributes the keys among m slots
13

Chained hashing analysis

●
suppose that each key is equally likely mapped to any of
m slots; this assumption is called simple uniform
hashing
●
theorem: With chained hashing both successful and
unsuccessful search has average time complexity Θ(1+α)
if we assume simple uniform hashing
●
if the hash table does not contain the element with key k,
the search is unsuccessful, otherwise it is successful
●
in expression Θ(1+α) the constant 1 is the time for
computing the hash function and α is the average search
time for a list of length α=n/m
14

Open addressing
●
open addressing is the second technique for resolving
collisions
●
all elements are stored in the same hash table; each slot
contains either a key or a NIL
●
searching is done by systematic inspection of slots until
the sought element is found or it is determined that the
element is not in the table
●
the advantage of open addressing is that we avoid
pointers; the saved memory can be used to enlarge the
hash table
●
hash table size m must be greater than the expected
number of elements n, therefore open addressing is only
used when the latter is known
15

Open addressing – insertion

●
insertion is performed by probing the hash table until we
find a free slot
●
the hash function is extended by using an additional
argument, which is the probing trial counter:
h : U × {0, 1, ... ,m-1}  {0, ... ,m-1}
●
for each key k a probe sequence 〈h(k,0), h(k,1), ... ,
h(k,m-1)〉 is determined:
– h(k,0) is the first possible slot for key k, h(k,1) is the second
possible slot for key k, etc.
●
the probe sequence is a permutation of 〈0, 1, ... , m-1〉,
which means that it contains all m slots of a hash table
16

Open addressing – insertion

● procedure INSERT-INTO-HASH-TABLE(T,k) returns a
slot index into which key k is written or an overflow error if
there is no free slot
INSERT-INTO-HASH-TABLE(T,k)
i ← 0
repeat
j ← h(k,i)
if T[j] = NIL or T[j] = DELETED then % slot j is free
T[j] ← k % insert k into slot j
return j % return slot number
else % slot is occupied, try next one
i ← i + 1
until i = m
error „hash table overflow“
17

Open addressing – search

●
the procedure for finding key k searches the same
sequence of slots as the insertion procedure
●
search is unsuccessful if a free slot is encountered or if all
slots are full but none contains key k
●
worst case time complexity is O(n), where n is the number
of elements stored in the hash table
SEARCH-IN-HASH-TABLE(T,k)
i ← 0
repeat
j ← h(k,i)
if T[j] = k then % key k found in slot j
return j % return slot number
i ← i + 1
until T[j] = NIL or i = m
return NIL % key not found
18

Open addressing – removal

●
when removing an element from a hash table with open
addressing, we need to write a special value at the location of
the removed element; the special value signals that the table
location was not empty all of the time
●
this allows us to continue subsequent searches to the first truly
empty slot, stepping over once occupied locations (if we
stored NIL to the removed element location, all elements
behind it in the probe sequence would become inaccessible)
●
when inserting DELETE-IN-HASH-TABLE(T,k)
i ← 0
new elements repeat
the locations j ← h(k,i)
if T[j] = k then % key found
containing the T[j] ← DELETED % mark deleted
special value return
can be i ← i + 1
until T[j] = NIL or i = m
overwritten error „key not found“
19

Probe sequences
●
uniform hashing is a generalization of simple uniform
hashing in which every of m! possible permutations of
〈0, 1, ... , m-1〉 is selected as a probe sequence with equal
probability
●
uniform hashing is difficult to achieve in practice, so we
approximate it using methods that guarantee at least that
the probe sequence is a permutation of 〈0, 1, ... , m-1〉:
– linear probing
– quadratic probing
– double hashing
20

Linear probing
●
uses the following hash function:
h(k,i) = (h‘(k) + i) mod m; i=0, 1, ..., m-1
●
h‘(k) is an auxiliary hash function which determines the
first inspected slot T[h‘(k)]
●
the probe sequence is 〈T[h‘(k)], T[h‘(k)+1], ... , T[m-1],
T[0], T[1], ..., T[h‘(k)-1]〉
●
implementation of linear probing is simple, but the
problem is emergence of long clusters of occupied slots in
the hash table (i.e., primary clustering), which increases
the average search time
21

Quadratic probing
●
uses the following hash function:
h(k,i) = (h‘(k) + c1i + c2i2) mod m; i=0, 1, ..., m-1
● c1 and c2 are non-zero constants, which must be chosen
so that all hash table slots are addressed:
– for m=2n a good choice is c1=c2=0.5
– for arbitrary m and c1=c2=0.5 instead of „mod m“ we use
the first higher power of 2 and skip values h(k,i)≥m
●
the first probed slot is T[h‘(k)]; the index of next probed
slots changes according to a quadratic function of i
●
in practice quadratic probing performs better than linear
probing
22

Double hashing
●
uses the following hash function:
h(k,i) = (h1(k) + i·h2(k)) mod m; i=0, 1, ..., m-1
● h1 are h2 are auxiliary hash functions; the values of h2 should
be non-zero 0 and coprime to m
– for m=2n this is achieved if h2 only returns odd values
● the first probed slot is T[h1(k)]; addresses of subsequently
probed slots depend on h2(k)
●
example:
– h1(k)=k mod m, h2(k)=1 + (k mod (m-1)), k=123456, m=701
– h1(k) = 123456 mod 701 = 80, h2(k) = 1 + (123456 mod 700) =
257
– h(k,i) = [80 + i · 257] mod 701
– the probe sequence is 〈80, 337, 594, 150, 407, ...〉
23

Double hashing – example

●
insert key k=14 into the hash table of size m=13 below using
open addressing with double hashing; the hashing functions are
h1(k) = k mod 13 and h2(k) = 1 + (k mod 11):
0 – h1(14) = 14 mod 13 = 1
1 79
2
– h2(14) = 1 + (14 mod 11) = 1 + 3 = 4
3 68 – h(14,i) = [h1(14) + ih2(14)] mod 13 = [1 + 4i] mod 13
4
5 96
– the probe sequence for i = 0, 1, ..., m-1 is
6 〈1,5,9,0,4,8,12,3,7,11,2,6,10〉
7 – we try to store the key into slot 1, which is occupied
8 – next we try to store the key into slot 5, which is also
9 occupied
10 49 – in the third trial we successfully store the key into
11
slot 9 which is still free*
12
24

Double hashing – example

●
insert key k=14 into the hash table of size m=13 below using
open addressing with double hashing; the hashing functions are
h1(k) = k mod 13 and h2(k) = 1 + (k mod 11):
0 – h1(14) = 14 mod 13 = 1
1 79
2
– h2(14) = 1 + (14 mod 11) = 1 + 3 = 4
3 68 – h(14,i) = [h1(14) + ih2(14)] mod 13 = [1 + 4i] mod 13
4
5 96
– the probe sequence for i = 0, 1, ..., m-1 is
6 〈1,5,9,0,4,8,12,3,7,11,2,6,10〉
7 – we try to store the key into slot 1, which is occupied
8 – next we try to store the key into slot 5, which is also
9 14 occupied
10 49 – in the third trial we successfully store the key into
11
slot 9 which is still free
12
25

Double hashing – example

●
we have removed key 96 from a hash table and marked the
removed element location using a special symbol ×
●
we want to verify if key 14 is stored in the hash table:
0 – the probe sequence will be identical to the one
1 79 used at insertion, i.e. 〈1,5,9,0,4,8,12,3,7,11,2,6,10〉
2 – the first inspected slot is 1, where the key is not
3 68
found
4
5 ×
– the next inspected slot is 5, where the special
6 symbol is encountered, which means we must
7 continue the search
8 – the next inspected slot is 9, where the key is finally
9 14 found*
10 49
11
12
26

Double hashing – example

Open addressing analysis

●
the average number of trials for an unsuccessful search is
at most 1/(1−α) if we assume uniform hashing and a load
factor α < 1
●
element insertion requires at most 1/(1−α) trials if uniform
hashing is assumed
●
if a load factor α < 1, uniform hashing and equal
probability of finding every key in a hash table are
assumed, then the average number of trials for a
successful search is at most:
1 1
⋅ln
α 1−α
28

Disadvantages and applications

●
disadvantages of a hash table:
– it cannot be a dynamic data structure owing to a fixed hash
function
– it cannot be adapted to a changed element distribution
– it does not allow efficient implementation of operations like
minimum or maximum element extraction, predecessor or
successor search, sorting, ...
●
applications:
– dictionaries and associative arrays
– symbol tables in compilers
– memory pages in operating systems

Hashing Techniques Done
No ratings yet
Hashing Techniques Done
53 pages
Hashing
No ratings yet
Hashing
33 pages
05 Hashing
No ratings yet
05 Hashing
47 pages
Lect Hashing
No ratings yet
Lect Hashing
36 pages
6 - Hashing
No ratings yet
6 - Hashing
52 pages
Lecture 13 - Hash Tables
No ratings yet
Lecture 13 - Hash Tables
51 pages
Dsa 4
No ratings yet
Dsa 4
55 pages
Lecture 4 Hash Table Stu
No ratings yet
Lecture 4 Hash Table Stu
13 pages
Hashing RPK
No ratings yet
Hashing RPK
61 pages
Hashing
No ratings yet
Hashing
56 pages
Hashing Techniques Explained
No ratings yet
Hashing Techniques Explained
32 pages
Analysis of Algorithms CS 477/677: Hashing Instructor: George Bebis
No ratings yet
Analysis of Algorithms CS 477/677: Hashing Instructor: George Bebis
53 pages
Analysis of Algorithms CS 477/677: Hashing Instructor: George Bebis
No ratings yet
Analysis of Algorithms CS 477/677: Hashing Instructor: George Bebis
53 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
27 pages
Hashing
No ratings yet
Hashing
20 pages
CH 4
No ratings yet
CH 4
58 pages
Hashing Techniques Explained
No ratings yet
Hashing Techniques Explained
56 pages
3 Hashing
No ratings yet
3 Hashing
20 pages
Hashing PPT For Student
No ratings yet
Hashing PPT For Student
53 pages
SORTING PROGRAMS - Counting + Bucket + Heap
No ratings yet
SORTING PROGRAMS - Counting + Bucket + Heap
27 pages
4
No ratings yet
4
29 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
26 pages
AST20105 Data Structure and Algorithms: Chapter 9 - Hash Table
No ratings yet
AST20105 Data Structure and Algorithms: Chapter 9 - Hash Table
39 pages
Hash Tables: Concepts & Implementations
No ratings yet
Hash Tables: Concepts & Implementations
53 pages
Lecture 8 Hashing
No ratings yet
Lecture 8 Hashing
47 pages
Hashing Techniques Explained
No ratings yet
Hashing Techniques Explained
23 pages
Hashing Techniques & Functions
No ratings yet
Hashing Techniques & Functions
30 pages
Hash Tables: A Guide for CS Students
No ratings yet
Hash Tables: A Guide for CS Students
48 pages
Hashing
No ratings yet
Hashing
16 pages
Collision
No ratings yet
Collision
24 pages
Hashing Techniques Explained
No ratings yet
Hashing Techniques Explained
47 pages
Hash Table PDF
No ratings yet
Hash Table PDF
25 pages
Hashing
No ratings yet
Hashing
44 pages
Hash Table: Didih Rizki Chandranegara
No ratings yet
Hash Table: Didih Rizki Chandranegara
33 pages
Hashing
No ratings yet
Hashing
10 pages
Hashing in Data Structure
No ratings yet
Hashing in Data Structure
43 pages
Unit 1 Hashing
No ratings yet
Unit 1 Hashing
61 pages
What Is Hashing
No ratings yet
What Is Hashing
11 pages
Unit29 Hashing2
No ratings yet
Unit29 Hashing2
20 pages
CSC508 Hashing
No ratings yet
CSC508 Hashing
35 pages
Maps and Hashing - Final
No ratings yet
Maps and Hashing - Final
51 pages
Hashing Techniques Explained
No ratings yet
Hashing Techniques Explained
20 pages
11 Hashing
No ratings yet
11 Hashing
60 pages
Hashing Algorithms
No ratings yet
Hashing Algorithms
22 pages
Ds 5 Update
No ratings yet
Ds 5 Update
26 pages
Hash Tables: A Programmer's Guide
No ratings yet
Hash Tables: A Programmer's Guide
26 pages
Done DS GTU Study Material Presentations Unit-4 13032021035653AM
No ratings yet
Done DS GTU Study Material Presentations Unit-4 13032021035653AM
24 pages
Hashing and Graphs
No ratings yet
Hashing and Graphs
28 pages
Module 5
No ratings yet
Module 5
33 pages
Unit2 Hashing DSA
No ratings yet
Unit2 Hashing DSA
55 pages
Lec12 Hash Tables 09092024 090609pm
No ratings yet
Lec12 Hash Tables 09092024 090609pm
48 pages
Lecture 12
No ratings yet
Lecture 12
19 pages
Topic 1: Hashing - Introduction: Hashing Is A Method of Storing and Retrieving Data From A Database Efficiently
No ratings yet
Topic 1: Hashing - Introduction: Hashing Is A Method of Storing and Retrieving Data From A Database Efficiently
31 pages
Hashing
No ratings yet
Hashing
37 pages
Hashing: Amar Jukuntla
No ratings yet
Hashing: Amar Jukuntla
22 pages
DSA2 Chapter 5 Hashing
No ratings yet
DSA2 Chapter 5 Hashing
44 pages
Dsa Week 15 C
No ratings yet
Dsa Week 15 C
5 pages
DSA - Unit 1
No ratings yet
DSA - Unit 1
43 pages
Učni Načrt Predmeta / Course Syllabus
No ratings yet
Učni Načrt Predmeta / Course Syllabus
3 pages
03 - APS - Tree
No ratings yet
03 - APS - Tree
47 pages
08 - APS - Decrease and Conquer
No ratings yet
08 - APS - Decrease and Conquer
10 pages
01 - APS - Algorithm Analysis
No ratings yet
01 - APS - Algorithm Analysis
34 pages
09 - APS - Greedy Method
No ratings yet
09 - APS - Greedy Method
69 pages
Ders Notları
No ratings yet
Ders Notları
24 pages
Object Oriented Thinking
No ratings yet
Object Oriented Thinking
77 pages
Databases I - 04 SQL-DQL, Other
No ratings yet
Databases I - 04 SQL-DQL, Other
57 pages
Databases I - 02 Logical Modeling, Docker
No ratings yet
Databases I - 02 Logical Modeling, Docker
14 pages
Exceptions 2
No ratings yet
Exceptions 2
829 pages
Logic Design Questions
No ratings yet
Logic Design Questions
11 pages
Inheritance and Polymorphism
No ratings yet
Inheritance and Polymorphism
70 pages
Exceptions
No ratings yet
Exceptions
75 pages
Job Analysis of An Accountant
0% (1)
Job Analysis of An Accountant
4 pages
Module 03. Project Planning and Scheduling - PM
100% (1)
Module 03. Project Planning and Scheduling - PM
43 pages
Lec 1
No ratings yet
Lec 1
24 pages
The Role of Big Data Analytics in Detecting and Preventing Financial Fraud
No ratings yet
The Role of Big Data Analytics in Detecting and Preventing Financial Fraud
8 pages
Find BTEs in SAP Transactions
No ratings yet
Find BTEs in SAP Transactions
33 pages
Pruebas de Acceso A Enseñanzas Universitarias Oficiales de Grado
No ratings yet
Pruebas de Acceso A Enseñanzas Universitarias Oficiales de Grado
1 page
C-TPAT Compliance Document List
No ratings yet
C-TPAT Compliance Document List
52 pages
WI1 3E Relay
100% (1)
WI1 3E Relay
30 pages
Networks - WS2
No ratings yet
Networks - WS2
2 pages
Tennis Court Srinakarin 78x54.9 Redesigned
No ratings yet
Tennis Court Srinakarin 78x54.9 Redesigned
2 pages
Email Engine 700 PDF
No ratings yet
Email Engine 700 PDF
328 pages
DM Unit-1 Notes
No ratings yet
DM Unit-1 Notes
47 pages
M527-M506 (Funcionamiento)
No ratings yet
M527-M506 (Funcionamiento)
370 pages
Breakingupisnthardtodo
No ratings yet
Breakingupisnthardtodo
6 pages
Course - Handout - EC101 - Basic Electrical and Electronic Engineering
No ratings yet
Course - Handout - EC101 - Basic Electrical and Electronic Engineering
3 pages
AMA3724
No ratings yet
AMA3724
3 pages
Ready-Mix Concrete Project Brief
100% (1)
Ready-Mix Concrete Project Brief
28 pages
Juzunej
100% (1)
Juzunej
3 pages
Big Data Documentation - Big Data Documentation
No ratings yet
Big Data Documentation - Big Data Documentation
2 pages
REG615 5.0 FP1 CN Modbus Point List Manual
No ratings yet
REG615 5.0 FP1 CN Modbus Point List Manual
116 pages
Product Data Sheet: Tesys LRD Thermal Overload Relays - 30... 40 A - Class 10A
No ratings yet
Product Data Sheet: Tesys LRD Thermal Overload Relays - 30... 40 A - Class 10A
2 pages
Vehicle Security System
100% (1)
Vehicle Security System
16 pages
Acceptance Criteria - Sample
No ratings yet
Acceptance Criteria - Sample
10 pages
Dbms Imp Questions Unit Wise
No ratings yet
Dbms Imp Questions Unit Wise
2 pages
SAP SD Interview Questions
No ratings yet
SAP SD Interview Questions
9 pages
Maths Roadmap For Machine Learning
No ratings yet
Maths Roadmap For Machine Learning
21 pages
John Strikwerda - Finite Difference Schemes and Partial Differential Equations
100% (2)
John Strikwerda - Finite Difference Schemes and Partial Differential Equations
437 pages
Thermofluids Laboratory Report
No ratings yet
Thermofluids Laboratory Report
28 pages
Unit 5 - Final
No ratings yet
Unit 5 - Final
12 pages
1st Chapter English - ICTFI BPE 100 - F
No ratings yet
1st Chapter English - ICTFI BPE 100 - F
37 pages

06 - APS - Hash Table

Uploaded by

06 - APS - Hash Table

Uploaded by

Algorithms & data structures

a hash table NIL

Resolving collisions by chaining

Resolving collisions by chaining

Chained hash table operations

Chained hashing analysis

Chained hashing analysis

Open addressing – insertion

Open addressing – insertion

Open addressing – search

Open addressing – removal

Double hashing – example

Double hashing – example

Double hashing – example

Double hashing – example

Open addressing analysis

Disadvantages and applications

You might also like