Hashing in DBMS

Hashing in Database Management Systems (DBMS) is a technique for efficient data retrieval and storage by transforming keys into fixed-size hash codes used for indexing in hash tables. It includes concepts such as hash functions, hash tables, and collision handling methods like chaining and open addressing. Types of hashing include static, dynamic, open addressing, and bucket hashing, each with its advantages and disadvantages regarding efficiency and complexity.
Hashing in DBMS (Database Management Systems) is a technique used for efficient data
retrieval and storage. It involves transforming a key (often a piece of data) into a fixed-size
value, called a hash code. This hash code is used to index data in a hash table, which allows for
quick searching, insertion, and deletion operations. Hashing is commonly used for indexing,
especially when dealing with large datasets.
• Hash Function: A function that takes an input (key) and produces a fixed-size value,
typically an integer, known as the hash value or hash code.
•​ Hash Table: A data structure that stores data in an array format, where the position of
each data item is determined by the hash code generated by the hash function.
•​ Collision: When two different keys produce the same hash value, this is called a
collision. Handling collisions is an important part of the hashing process.
How Hashing Works:
•​ Key: A piece of data, like a student ID, a name, or any other attribute you want to search
for in the database.
•​ Hash Function: The key is passed through a hash function, which computes a hash code
for that key.
•​ Hash Table: The hash code is used as an index to insert the data into a hash table (or a
similar structure like a hash map). This allows quick access to the data.
•​ Search Operation: To search for a key, the system computes the hash code for the key,
directly accessing the corresponding position in the hash table.
Example:
Let's say we are creating a hash table to store student records, and each student has a unique
student ID.
•​ Step 1: Define a Hash Function Suppose the hash function is a simple one:
hash(key) = key mod (table size)
where key is the student ID and table size is the size of the hash table.
•​ Step 2: Hash Table Creation Suppose we have a hash table of size 10, and the student
IDs are as follows:
•​ Student ID: 123, 456, 789, 234, 567
•​ Step 3: Compute Hash Values For each student ID, we apply the hash function to
compute the hash value:
•​ Hash(123) = 123 % 10 = 3
•​ Hash(456) = 456 % 10 = 6
•​ Hash(789) = 789 % 10 = 9
•​ Hash(234) = 234 % 10 = 4
•​ Hash(567) = 567 % 10 = 7
•​ Step 4: Insert into the Hash Table The data will be stored in the hash table at the
corresponding index:
•​ Index 0: (empty)
•​ Index 1: (empty)
•​ Index 2: (empty)
•​ Index 3: Student ID 123
•​ Index 4: Student ID 234
•​ Index 5: (empty)
•​ Index 6: Student ID 456
•​ Index 7: Student ID 567
•​ Index 8: (empty)
•​ Index 9: Student ID 789
•​ Step 5: Searching To find a student with a given ID, say 456, we calculate the hash
value:
•​ Hash(456) = 456 % 10 = 6
•​ We then look at index 6 in the hash table and find the student record.
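The five steps above can be sketched in Python. This is a minimal illustration, assuming integer student IDs and no collisions (which holds for the five example IDs); the names TABLE_SIZE, hash_key, insert and search are chosen here for illustration and do not come from any DBMS.

```python
TABLE_SIZE = 10  # table size used in the example above

def hash_key(student_id: int) -> int:
    """Modulo hash: map a student ID to an index in 0..TABLE_SIZE-1."""
    return student_id % TABLE_SIZE

# One slot per index; None marks an empty slot.
table = [None] * TABLE_SIZE

def insert(student_id: int) -> None:
    # No collision handling here: the five example IDs map to distinct slots.
    table[hash_key(student_id)] = student_id

def search(student_id: int) -> bool:
    # Recompute the index and compare the key stored there.
    return table[hash_key(student_id)] == student_id

for sid in (123, 456, 789, 234, 567):
    insert(sid)

print(table)        # [None, None, None, 123, 234, None, 456, 567, None, 789]
print(search(456))  # True
```

A search never scans the table: it recomputes the index and inspects one slot, which is where the O(1) average lookup cost comes from.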
Handling Collisions:
If two student IDs were to hash to the same index (a collision), there are various methods to
handle this:
•​ Chaining: Store multiple values at the same index using a linked list.
•​ Open Addressing: Search for the next available slot in the hash table.
Example of Collision:
Suppose the following IDs hash to the same value:
•​ Hash(123) = 3
•​ Hash(113) = 3
With chaining, both records would be stored at index 3:
Index 3: (113 -> 123)
This allows both student IDs to be stored at the same location without overwriting each other.
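The chaining example can be sketched as follows. This is an illustration only: a Python list stands in for the linked list, new keys are prepended so the chain matches the order shown above, and the names chained_table, chain_insert and chain_search are hypothetical.

```python
TABLE_SIZE = 10

# Each slot holds a Python list that plays the role of the chain.
chained_table = [[] for _ in range(TABLE_SIZE)]

def chain_insert(key: int) -> None:
    chain = chained_table[key % TABLE_SIZE]
    if key not in chain:      # ignore duplicate keys
        chain.insert(0, key)  # prepend, as a simple linked list would

def chain_search(key: int) -> bool:
    # Only the chain at the key's index is scanned.
    return key in chained_table[key % TABLE_SIZE]

chain_insert(123)
chain_insert(113)
print(chained_table[3])  # [113, 123]
```

Lookups stay fast as long as chains stay short; a chain that grows long turns the search at that index into a linear scan.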
Advantages of Hashing:
•​ Efficient Search: Hashing provides fast data retrieval (O(1) time complexity on
average).
•​ Efficient Insertion and Deletion: Adding or removing data can also be done quickly.
Disadvantages:
•​ Collision Handling: Managing collisions can become complex and may degrade
performance.
• Memory Usage: A hash table can waste memory when it is sized much larger than the
number of records it holds, since empty slots still occupy space.
1. Static Hashing
In static hashing, a fixed-size hash table is used to store the data, and the size of the table remains
unchanged. The hash function is used to map a key to a particular index within this fixed-size
table.
•​ Example: Consider a hash table of size 10, and the hash function is Hash(Key) = Key %
10. If we have keys (e.g., 21, 32, 43, 54), the data will be stored based on the modulus of
the key:
•​ Hash(21) = 21 % 10 = 1 → Stored at index 1
•​ Hash(32) = 32 % 10 = 2 → Stored at index 2
•​ Hash(43) = 43 % 10 = 3 → Stored at index 3
•​ Hash(54) = 54 % 10 = 4 → Stored at index 4
Since the table size is fixed, inserting more records than the table can hold leads to a problem
called overflow.
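A small sketch of static hashing and overflow, under stated assumptions: the bucket capacity of 2 is not given in the text and is picked here only to make overflow visible, and static_insert is a hypothetical helper.

```python
NUM_BUCKETS = 10     # fixed for the lifetime of the table
BUCKET_CAPACITY = 2  # assumption: records per bucket (not given in the text)

buckets = [[] for _ in range(NUM_BUCKETS)]

def static_insert(key: int) -> bool:
    """Insert key; return False if its bucket is already full (overflow)."""
    bucket = buckets[key % NUM_BUCKETS]
    if len(bucket) >= BUCKET_CAPACITY:
        return False  # overflow: the table cannot grow to absorb the record
    bucket.append(key)
    return True

for key in (21, 32, 43, 54):
    static_insert(key)

print(buckets[1:5])       # [[21], [32], [43], [54]]
print(static_insert(31))  # True  (bucket 1 now holds 21 and 31)
print(static_insert(41))  # False (bucket 1 is full)
```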
2. Dynamic Hashing
Dynamic hashing addresses the issue of static hashing where the size of the hash table is fixed
and may lead to overflow. In dynamic hashing, the hash table grows or shrinks dynamically
based on the number of records. This helps in reducing collisions and provides flexibility in
dealing with the overflow situation.
Types of Dynamic Hashing:
•​ Extendible Hashing
•​ Linear Hashing
Extendible Hashing
Extendible hashing uses a directory of pointers to hash buckets and grows the directory size
dynamically as needed. It allows for splitting of buckets and doubling of the directory size to
accommodate additional records.
• Example: Let's assume the directory has a global depth of 1, which means it has only 2
entries (for hash values 0 and 1), each pointing to a bucket.
If a new record (say 5) is inserted into the table, it's hashed as Hash(5) = 5 % 2 = 1.
If bucket 1 is already full, it overflows. Because the bucket's local depth equals the global
depth, the directory doubles to 4 entries (global depth 2), and bucket 1 is split into two
buckets, with its records redistributed using one more bit of the hash value (for example,
5 % 4 = 1). Bucket 0 is not split; two directory entries simply point to it.
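The directory-doubling and bucket-splitting behaviour can be sketched as below. This is a simplified in-memory illustration, not a definitive implementation: it assumes distinct non-negative integer keys, uses the low-order bits of the key as the hash, and assumes a bucket capacity of 2. The class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, bucket_capacity: int = 2):
        self.capacity = bucket_capacity          # assumption: records per bucket
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]  # entries for bit patterns 0 and 1

    def _index(self, key: int) -> int:
        # The low-order global_depth bits of the key select the directory entry.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key: int) -> None:
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return                               # keys are assumed distinct
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Overflow: double the directory only if the bucket is at global depth.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)                         # retry; may split again

    def _split(self, bucket: Bucket) -> None:
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Redistribute the keys by the newly significant bit.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (sibling if k & high_bit else bucket).keys.append(k)
        # Repoint directory entries whose new bit is set to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling

    def search(self, key: int) -> bool:
        return key in self.directory[self._index(key)].keys

table = ExtendibleHash(bucket_capacity=2)
for key in (1, 3, 5):
    table.insert(key)
print(table.global_depth, len(table.directory))  # 2 4
```

Keys 1 and 3 fill bucket 1; inserting 5 overflows it, so the directory doubles to 4 entries and only bucket 1 is split. Entries 0 and 2 still share the untouched bucket 0.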
Linear Hashing
Linear hashing grows the hash table one bucket at a time. When the load on the table crosses a
chosen threshold, one new bucket is added and the bucket at the split pointer is split: only its
records are rehashed, using a hash function with twice the range, and divided between the old
bucket and the new one. The split pointer then advances to the next bucket.
• Example: Suppose each bucket holds 4 records and the table passes its load threshold. The
system adds one new bucket, splits the bucket at the split pointer, and moves the pointer
forward. A bucket that fills up before its turn to split keeps the extra records in an
overflow chain until it is split.
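A compact sketch of linear hashing follows. Several details are assumptions made for illustration: the table starts with 2 buckets, a split is triggered when the average number of keys per bucket exceeds max_load, keys are integers, and Python lists stand in for buckets plus their overflow chains. The class and attribute names are hypothetical.

```python
class LinearHash:
    def __init__(self, initial_buckets: int = 2, max_load: float = 2.0):
        self.n0 = initial_buckets  # bucket count at the start of round 0
        self.level = 0             # current round
        self.split_ptr = 0         # next bucket to split
        self.max_load = max_load   # assumption: average keys per bucket that triggers a split
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _address(self, key: int) -> int:
        size = self.n0 * (2 ** self.level)
        idx = key % size
        if idx < self.split_ptr:   # that bucket was already split this round
            idx = key % (2 * size)
        return idx

    def insert(self, key: int) -> None:
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split()

    def _split(self) -> None:
        size = self.n0 * (2 ** self.level)
        old = self.buckets[self.split_ptr]
        self.buckets.append([])    # the new bucket gets index split_ptr + size
        # Only the records of the split bucket are rehashed.
        self.buckets[self.split_ptr] = [k for k in old if k % (2 * size) == self.split_ptr]
        self.buckets[-1] = [k for k in old if k % (2 * size) != self.split_ptr]
        self.split_ptr += 1
        if self.split_ptr == size:  # every bucket of this round has been split
            self.level += 1
            self.split_ptr = 0

    def search(self, key: int) -> bool:
        return key in self.buckets[self._address(key)]

lh = LinearHash(initial_buckets=2, max_load=2.0)
for key in range(7):
    lh.insert(key)
print(lh.buckets)  # [[0, 4], [1, 5], [2, 6], [3]]
```

The table grew from 2 to 4 buckets in two separate splits, each moving the records of a single bucket, rather than rehashing everything at once.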
3. Open Addressing Hashing
In open addressing, all data is stored directly in the hash table itself. When a collision occurs
(i.e., two keys hash to the same index), the system tries to find another open slot within the table
based on a probe sequence. Open addressing works best when the table is not close to full,
because probe sequences get longer as the load factor rises.
Types of Open Addressing:
•​ Linear Probing
•​ Quadratic Probing
•​ Double Hashing
Linear Probing
In linear probing, when a collision occurs, the system checks the following indexes one at a
time (index + 1, index + 2, etc., wrapping around to index 0 at the end of the table) until an
empty slot is found.
•​ Example: If we have a hash table of size 5 and a hash function Hash(Key) = Key % 5:
•​ Hash(12) = 12 % 5 = 2 → Insert at index 2.
• Hash(17) = 17 % 5 = 2, but index 2 is already occupied (by 12). So, the system
checks index 3.
• Index 3 is free, so 17 is inserted at index 3.
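The linear probing example can be sketched like this. It is a minimal illustration that assumes integer keys and no deletions (deleting from an open-addressing table needs extra care, such as tombstone markers); the names slots, lp_insert and lp_search are hypothetical.

```python
TABLE_SIZE = 5
EMPTY = None

slots = [EMPTY] * TABLE_SIZE

def lp_insert(key: int) -> int:
    """Insert key with linear probing; return the index used."""
    start = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        idx = (start + step) % TABLE_SIZE  # wrap around at the end of the table
        if slots[idx] is EMPTY or slots[idx] == key:
            slots[idx] = key
            return idx
    raise OverflowError("hash table is full")

def lp_search(key: int) -> bool:
    start = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        idx = (start + step) % TABLE_SIZE
        if slots[idx] is EMPTY:
            return False  # an empty slot ends the probe sequence
        if slots[idx] == key:
            return True
    return False

print(lp_insert(12))  # 2
print(lp_insert(17))  # 3
```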
Quadratic Probing
Quadratic probing works similarly to linear probing, but instead of checking the next slot, it
checks slots at quadratically increasing offsets (index + 1^2, index + 2^2, index + 3^2, etc.),
each taken modulo the table size so that the result stays inside the table.
• Example: Using the same hash table as before with Hash(Key) = Key % 5:
• Hash(12) = 12 % 5 = 2 → Insert at index 2.
• Hash(17) = 17 % 5 = 2, but index 2 is occupied, so the system checks index
(2 + 1^2) % 5 = 3. If index 3 were also occupied, it would check (2 + 2^2) % 5 = 1.
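The probe sequence for this example can be computed directly. The sketch below assumes integer keys and the size-5 table from the example; quadratic_probe_sequence and qp_insert are hypothetical helper names.

```python
TABLE_SIZE = 5

def quadratic_probe_sequence(key: int, table_size: int = TABLE_SIZE) -> list:
    """Indices tried for key: (home + i*i) % table_size for i = 0, 1, 2, ..."""
    home = key % table_size
    return [(home + i * i) % table_size for i in range(table_size)]

def qp_insert(slots: list, key: int) -> int:
    for idx in quadratic_probe_sequence(key, len(slots)):
        if slots[idx] is None:
            slots[idx] = key
            return idx
    # Quadratic probing can miss free slots, so failure does not imply a full table.
    raise OverflowError("no free slot found along the probe sequence")

slots = [None] * TABLE_SIZE
print(quadratic_probe_sequence(17))  # [2, 3, 1, 1, 3]
print(qp_insert(slots, 12))          # 2
print(qp_insert(slots, 17))          # 3
```

The sequence for key 17 only ever visits indexes 2, 3 and 1, which shows that quadratic probing does not always reach every slot of the table.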

Double Hashing
Double hashing uses two hash functions. The first gives the initial index; if a collision occurs,
the second gives the step size used to move from one probed index to the next.
• Example: Let's assume two hash functions:
• Hash1(Key) = Key % 5
• Hash2(Key) = 1 + (Key % 4)
If Hash1(17) = 2 and index 2 is occupied, double hashing calculates a step size:
• Hash2(17) = 1 + (17 % 4) = 1 + 1 = 2
• The system will then try index (2 + 2) % 5 = 4. If index 4 were also occupied, it
would try (2 + 2 × 2) % 5 = 1.
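The double hashing example can be sketched as below, using the two hash functions from the text. The helper names hash1, hash2 and dh_insert are chosen for illustration, and the sketch assumes integer keys and no deletions.

```python
TABLE_SIZE = 5

def hash1(key: int) -> int:
    return key % TABLE_SIZE

def hash2(key: int) -> int:
    # Step size in 1..4; it is never 0, so the probe always moves.
    return 1 + (key % 4)

def dh_insert(slots: list, key: int) -> int:
    home, step = hash1(key), hash2(key)
    for i in range(TABLE_SIZE):
        idx = (home + i * step) % TABLE_SIZE
        if slots[idx] is None:
            slots[idx] = key
            return idx
    raise OverflowError("hash table is full")

slots = [None] * TABLE_SIZE
print(dh_insert(slots, 12))  # 2
print(dh_insert(slots, 17))  # 4
```

Because the table size 5 is prime and the step is between 1 and 4, the probe sequence visits every slot before repeating, so an insert fails only when the table is really full.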

4. Bucket Hashing
In bucket hashing, each index of the hash table refers to a bucket that can hold several records,
so keys that hash to the same value are stored together. This is similar to chaining, except that
a bucket usually has a fixed capacity (in a DBMS, often one disk block).
• Example: Suppose we have a hash table with the hash function Hash(Key) = Key % 5.
If keys 12 and 17 both hash to index 2:
• At index 2, we store both keys in a bucket.
The bucket allows us to store multiple items at the same index, so a collision does not
require any extra handling until the bucket itself is full.
Summary of Hashing Types:
•​ Static Hashing: A fixed-size hash table; prone to overflow issues.
•​ Dynamic Hashing: The hash table grows/shrinks dynamically; extendible hashing and
linear hashing are common types.
•​ Open Addressing: The hash table stores elements directly in the table; uses linear
probing, quadratic probing, or double hashing to handle collisions.
•​ Bucket Hashing: Stores multiple records in a bucket to handle collisions, reducing the
impact of a high number of collisions.
Advantages and Disadvantages:
•​ Advantages:
•​ Efficient data retrieval and insertion.
•​ Reduces the search space for finding data.
•​ Disadvantages:
•​ Collisions: Can still be problematic depending on the method used.
•​ Complexity: Some methods (like dynamic hashing or double hashing) can be
complex to implement.
