Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Unit IV - Hashing (09 Hours)
Faculty Orientation Program
Data Structures
Dr. Chandrakant Kokane
HoD-IT
Nutan Maharsahtra Institute of Engineering and
Technology,Pune
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Syllabus
Hash Table : Concepts-hash table, hash function, basic operations (1 Hr)
Bucket, collision, probe, synonym, overflow (1 Hr)
Open hashing, closed hashing, perfect hash function (2 Hr)
Load density, full table, load factor, rehashing, properties of good hash
function (1 Hr)
Collision resolution strategies- open addressing and chaining (1 Hr)
Hash table overflow- open addressing and chaining (2 Hr)
Extendible hashing, closed addressing and separate chaining (1 Hr)
Case study : Dictionary Application using Hash Tables
Description: Implement a dictionary where words and meanings are stored
and retrieved using hashing with collision resolution
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Hash Table : Concepts-hash table, hash
function, basic operations
What is a Hash Table?
• A hash table (or hash map) is a data structure that
provides efficient insertion, deletion, and search
operations, typically in average-case constant time O(1).
• It stores key-value pairs.
• Keys are transformed into an index of an array (called
buckets or slots) using a hash function.
• Handles collisions when two keys map to the same index.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Hash Table : Concepts-hash table, hash
function, basic operations
Hash Function
A hash function converts keys into an integer index within the bounds of the table size.
• Properties of a good hash function:
Fast to compute.
Uniformly distributes keys to minimize collisions.
Deterministic: same key always maps to the same index.
• Common hash functions:
• Division method:
• hash(key)=keymod table_size\text{hash}(key) = key \mod table\
_sizehash(key)=keymodtable_size
• Multiplication method:
Uses fractional parts of key times a constant (Knuth’s method).
• Universal hashing:
Uses randomization
Data Structures to reduce
Unit-IV worst-case scenarios.
Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Hash Table : Concepts-hash table, hash
function, basic operations
• Collision Handling
• When two keys hash to the same index, collisions
occur. Common methods to handle collisions:
• Chaining:
Each bucket holds a linked list (or another dynamic
structure) of all elements hashed to it.
• Open addressing:
Finds another empty slot via probing (linear probing,
quadratic probing, double hashing).
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Hash Table : Concepts-hash table, hash
function, basic operations
• Basic Operations
• Insert(key, value)
Compute index using the hash function.
If collision occurs, handle using chaining or probing.
Insert the key-value pair at the calculated index.
• Search(key)
Compute index using the hash function.
Check if the key is present in the bucket.
Return the value if found; else indicate key is not present.
• Delete(key)
Compute index using the hash function.
Locate the key in the bucket or probe sequence.
Remove the key-value pair.
Structures
Data In open addressing, markHashing
Unit-IV slot as deleted for proper future probing.
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Hash Table : Concepts-hash table, hash
function, basic operations
Summary Table
Operation Average Time Complexity Description
Insert key-value pair into
Insert O(1)
the table
Find value associated with
Search O(1)
a key
Remove key-value pair
Delete O(1)
from the table
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Bucket, collision, probe, synonym, overflow
• 1. Bucket
• Definition:
A bucket is a slot or container in a hash table that stores
one or more key-value pairs.
In chaining, a bucket holds a list or chain of entries. In
open addressing, a bucket usually holds a single entry.
• Synonyms/Related terms:
– Slot
– Cell
– Container
– Index position
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Bucket, collision, probe, synonym, overflow
• 2. Collision
• Definition:
A collision happens when two different keys produce
the same hash index and thus map to the same
bucket.
• Synonyms/Related terms:
– Conflict
– Hash clash
– Hash conflict
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Bucket, collision, probe, synonym, overflow
• 3. Probe
• Definition:
A probe refers to the process of searching for an empty
bucket or the target key in the event of a collision in
open addressing. It involves checking subsequent
buckets according to a probing sequence (linear,
quadratic, etc.).
• Synonyms/Related terms:
– Search sequence
– Probe sequence
– Collision resolution step
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Bucket, collision, probe, synonym, overflow
• 4. Synonym
• In hashing:
A synonym is a key that hashes to the same
index (bucket) as another key. Essentially, it's a
different key that causes a collision.
• Synonyms/Related terms:
– Colliding key
– Conflicting key
– Hash synonym
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Bucket, collision, probe, synonym, overflow
• 5. Overflow
• Definition:
Overflow occurs when a bucket exceeds its storage
capacity, especially in fixed-size buckets or hash tables
without dynamic resizing.
• Synonyms/Related terms:
– Bucket overflow
– Collision overflow
– Overflow chain (in chaining when bucket chains grow)
– Overflow area (in some implementations that use a
secondary storage for overflow)
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Open hashing, closed hashing, perfect hash
function
• 1. Open Hashing (Chaining)
• Definition:
In open hashing, each bucket in the hash table contains a linked list (or chain) of
entries that hash to the same index. When collisions occur, new elements are simply
added to the chain.
• How it works:
– The table is an array of buckets.
– Each bucket stores a list of key-value pairs.
– Collisions are resolved by chaining keys in the same bucket.
• Advantages:
– Easy to implement.
– Table can store more items than the number of buckets.
– Deletion is straightforward.
• Disadvantages:
– Extra memory overhead for pointers.
– Performance depends on linked list length.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Open hashing, closed hashing, perfect hash function
• 2. Closed Hashing (Open Addressing)
• Definition:
In closed hashing, all elements are stored directly in the hash table array itself.
When collisions happen, the algorithm searches for another free slot within the
array using a probing technique.
• Probing Methods:
– Linear probing: Check next slot sequentially.
– Quadratic probing: Check slots based on quadratic function.
– Double hashing: Use a secondary hash function for steps.
• Advantages:
– No extra pointers needed (less memory).
– Cache-friendly due to contiguous storage.
• Disadvantages:
– Clustering can degrade performance.
– Table must have empty slots; can't exceed capacity easily.
– Deletion is trickier (needs special markers).
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Open hashing, closed hashing, perfect hash function
• Perfect Hash Function
• Definition:
A perfect hash function maps a set of keys to a hash table with no
collisions. That means every key hashes to a unique index.
• Types:
– Static perfect hashing: For fixed key sets known in advance.
– Minimal perfect hashing: A perfect hash function that uses exactly n slots for
n keys, with no wasted space.
• Uses:
– Compiler symbol tables.
– Databases and static sets where fast lookup with no collisions is critical.
• Challenges:
– Designing perfect hash functions dynamically for changing key sets is complex.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Open hashing, closed hashing, perfect hash function
Term Collision Handling Storage Use Case
Open Hashing Linked lists per Extra memory for When data grows
(Chaining) bucket chains dynamically
Memory
Closed Hashing
Probing sequence All in array constrained
(Open Addressing)
environments
Perfect Hash Minimal or exact Static key sets,
No collisions (ideal)
Function size performance critical
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Load density, full table, load factor, rehashing,
properties of good hash function
• 1. Load Density
• Definition:
The number of elements stored in the hash table per bucket
or per slot.
In chaining, it's the average length of the linked lists (chains)
in each bucket.
• Example:
If 30 keys are stored in a table of size 10 using chaining, the
load density is 3 (on average, 3 keys per bucket).
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Load density, full table, load factor, rehashing,
properties of good hash function
• 2. Full Table
• Definition:
A hash table is said to be full when there are
no empty slots available for insertion.
This term mainly applies to open addressing
hash tables since chaining can theoretically
hold unlimited keys (by extending the chains).
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Load density, full table, load factor, rehashing,
properties of good hash function
(n) to the number of buckets (m) in the hash table:𝛼=𝑛/𝑚
• Load Factor (α) Definition: Ratio of the number of keys stored
• Interpretation: For chaining, α indicates the average chain
length.
• For open addressing, α should be less than 1 to avoid a full
table.
• Impact:Higher load factor means more collisions and slower
operations.Keeping α below a threshold helps maintain
performance.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Load density, full table, load factor, rehashing,
properties of good hash function
• Rehashing
• Definition:
The process of creating a new larger hash table and
reinserting all existing keys into it, typically triggered
when the load factor exceeds a threshold.
• Purpose:
– To reduce collisions and maintain efficient operations.
– Usually doubles the table size and recalculates hash
indices.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Load density, full table, load factor, rehashing,
properties of good hash function
• Properties of a Good Hash Function
• Uniform Distribution:
Spreads keys evenly across the table to minimize collisions.
• Deterministic:
The same key always produces the same hash value.
• Fast Computation:
Computes the hash value quickly to keep insert/search efficient.
• Minimizes Collisions:
Reduces the chance that different keys produce the same hash
value.
• Uses All Key Bits:
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Collision Resolution Strategies
1. Open Addressing
• Concept:
All entries are stored directly in the hash table array. When a collision occurs, the algorithm searches
(probes) for another empty slot in the table according to a defined sequence.
• How it works:
– Compute the initial hash index.
– If occupied, try next slot(s) based on a probing method until an empty slot is found.
• Common Probing Techniques:
– Linear Probing: Check next slot (index + 1, index + 2, ...)
– Quadratic Probing: Check slots with quadratic offset (index + 1², index + 2², ...)
– Double Hashing: Use a second hash function to determine step size.
• Advantages:
– Simple and requires no extra memory for pointers.
– Cache-friendly (elements stored contiguously).
• Disadvantages:
– Clustering (groups of filled slots) can degrade performance.
– Table can become full, requiring rehashing.
Data – Deletion is tricky
Structures because Hashing
Unit-IV removing elements can break probe sequences.
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Collision Resolution Strategies
• 2. Chaining (Open Hashing)
• Concept:
Each slot (bucket) of the hash table holds a linked list (or another dynamic data structure)
of all elements hashing to that slot.
• How it works:
– Compute the hash index.
– Insert the new key-value pair into the linked list at that bucket.
– Search and delete operations involve traversing the list.
• Advantages:
– Simple to implement.
– Table can handle more elements than the number of slots.
– Deletion is straightforward.
– Less affected by load factor than open addressing.
• Disadvantages:
– Extra memory overhead for pointers in linked lists.
– Potentially slower due to linked list traversal.
– Cache-unfriendly due to pointer-based structures.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Collision Resolution Strategies
Aspect Open Addressing Chaining
Storage All elements stored in array Buckets hold linked lists
Memory Overhead Low (no extra pointers) Higher (pointers for chains)
Performance (Low Load) Very fast Fast
Performance (High Load) Degrades due to clustering Handles collisions better
More complex (needs
Deletion Easy
special markers)
Must be larger than Can exceed number of
Table Size
number of keys buckets
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Hash Table Overflow: Open Addressing vs Chaining
• 1. Overflow in Open Addressing
• What is it?
In open addressing, overflow occurs when the hash table becomes full—that is,
there are no empty slots available to insert new keys.
• Why does it happen?
Because all keys must be stored inside the fixed-size array, once all slots are
occupied, no further insertions are possible without resizing.
• Consequences:
– Insertions fail or require rehashing.
– Performance degrades as the load factor approaches 1.
– Searching takes longer due to clustering.
• Handling overflow:
– Rehashing: Create a larger table (typically double the size) and reinsert all keys.
– Maintain the load factor (e.g., below 0.7) by resizing proactively.
– Use better probing methods to reduce clustering.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Hash Table Overflow: Open Addressing vs Chaining
• 2. Overflow in Chaining
• What is it?
In chaining, overflow means that a bucket’s linked list grows too long because
many keys hash to the same index.
• Why does it happen?
Due to poor hash function distribution or high load factor (more keys than
buckets).
• Consequences:
– Longer chains lead to slower search, insert, and delete operations (degrades from O(1)
average to O(n) worst case).
– Increased memory usage due to longer lists.
• Handling overflow:
– Use a better hash function to distribute keys more evenly.
– Increase the number of buckets (resize the table and rehash).
– Use dynamic data structures like balanced trees in buckets instead of linked lists to speed
up access. Unit-IV Hashing
Data Structures
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Hash Table Overflow: Open Addressing vs Chaining
Rehash, better hash
How to handle Rehash to a larger table functions, or use balanced
trees in buckets
Table completely full (no Buckets have long chains
What causes overflow
empty slots) (many collisions)
Cannot insert new Performance degrades due
Effect
elements unless resized to long chains
Overflow Type Open Addressing Chaining
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Extendible Hashing, Closed Addressing, and Separate
Chaining
• 1. Extendible Hashing
• What is it?
Extendible hashing is a dynamic hashing technique that grows the hash table
directory as needed to handle more keys without excessive collisions.
• Key Features:
– Uses a directory indexed by some bits of the hash value.
– Each bucket has a local depth indicating how many hash bits it uses.
– When a bucket overflows, it splits and may cause the directory to double in size
(increasing global depth).
– Efficient for dynamic datasets where the number of keys grows over time.
• Advantages:
– Avoids large-scale rehashing on every insertion.
– Provides good average-case performance with controlled memory use.
• Use case:
– Database indexing and file systems that need scalable hash structures.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Extendible Hashing, Closed Addressing, and
Separate Chaining
• 2. Closed Addressing (Open Addressing)
• What is it?
Closed addressing, commonly called open addressing, stores all entries
directly inside the hash table array. When collisions occur, the algorithm
finds another empty slot via probing.
• How it works:
– Uses probing sequences like linear probing, quadratic probing, or double hashing to
resolve collisions.
– All elements are stored in the table itself (no separate linked lists).
• Pros:
– Space-efficient (no extra pointers).
– Cache-friendly due to contiguous storage.
• Cons:
– Performance suffers if the table is too full (load factor near 1).
– Deletion requires
Data Structures Unit-IVspecial care to avoid breaking probe chains.
Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Extendible Hashing, Closed Addressing, and
Separate Chaining
• 3. Separate Chaining (Open Hashing)
• What is it?
Separate chaining stores elements that hash to the same bucket in a linked
list (or other dynamic data structure) associated with that bucket.
• How it works:
– Each bucket points to a chain of all key-value pairs that hash to it.
– Insertions add new entries to the chain; searches and deletions scan the chain.
• Pros:
– Simple and flexible.
– Can handle loads greater than the number of buckets (chains grow as needed).
– Deletions are straightforward.
• Cons:
– Extra memory overhead for pointers.
– Potentially slower access due to list traversal.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Quick Comparison Table
Closed Addressing Separate Chaining
Feature Extendible Hashing
(Open Addressing) (Open Hashing)
Storage Directory + buckets Array only Array + linked lists
Bucket splitting & Probing to find
Collision Handling Chains per bucket
directory doubling empty slot
Yes, directory Requires rehashing Resize buckets array
Dynamic resizing
doubles (resize array) + chains
Moderate
Higher (pointers in
Memory Overhead (directory + Low (no pointers)
chains)
buckets)
Deletion Tricky (due to probe Simple (remove
Moderate
Complexity sequences) from chain)
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Case study
Dictionary Application using Hash Tables
Description: Implement a dictionary where words and
meanings are stored and retrieved using hashing with
collision resolution
•hash function to convert each word to an index.
•Collisions are handled by chaining — each bucket is a linked list of (key, value) pairs.
•Insert adds new words or updates existing meanings.
•Search finds the meaning of a word.
•Delete removes a word from the dictionary.
•Display prints the entire dictionary content.
Data Structures Unit-IV Hashing
Nutan Maharashtra Vidya Prasarak
NUTAN MAHARASHTRA INSTITUTE
Mandal's
OF ENGINEERING & TECHNOLOGY,
PUNE
Thank You
Data Structures Unit-IV Hashing