hashing in data strutures advanced in languae java

Hashing
• Hashing is a technique of converting an element or a value into a fixed size key,
that is used to uniquely identify that element from the group of similar
elements.
• In other words, hashing is a process of mapping an element or a value with a
unique key.
• In hashing, a hash function is used, which takes the value(element) as an input
and generates the fixed size key also known as hash code as an output. The
generated hash code is thus used as the index position at which the value
associated with the key is stored.

Examples of Hashing in Data Structure
The following are real-life examples of hashing in the data
structure –
• In schools, the teacher assigns a unique roll number to each student.
Later, the teacher uses that roll number to retrieve information about
that student.
• A library has an infinite number of books. The librarian assigns a
unique number to each book. This unique number helps in identifying
the position of the books on the bookshelf.

Need for Hash data structure
• Every day, the data on the internet is increasing multifold and it is always a struggle
to store this data efficiently.
• In day-to-day programming, this amount of data might not be that big, but still, it
needs to be stored, accessed, and processed easily and efficiently.
• A very common data structure that is used for such a purpose is the Array data
structure.
• Now the question arises if Array was already there, what was the need for a new data
structure!
• The answer to this is in the word “efficiency“.

• Though storing in Array takes O(1) time, searching in it takes at
least O(log n) time. This time appears to be small, but for a large
data set, it can cause a lot of problems and this, in turn, makes
the Array data structure inefficient.
• So now we are looking for a data structure that can store the
data and search in it in constant time, i.e. in O(1) time.
• This is how Hashing data structure came into play. With the
introduction of the Hash data structure, it is now possible to
easily store data in constant time and retrieve them in constant
time as well.

Components of Hashing
1.Key: A Key can be anything string or integer which is fed as input in
the hash function the technique that determines an index or location
for storage of an item in a data structure.
2.Hash Function: The hash function receives the input key and
returns the index of an element in an array called a hash table. The
index is known as the hash index.
3.Hash Table: Hash table is a data structure that maps keys to values
using a special function called a hash function. Hash stores the data
in an associative manner in an array where each data value has its
own unique index.
•

Hash Table
• Hash table is a data structure which uses hashing to store the data in the form
of key value pairs, such that the basic operations i.e. the insertion, deletion,
and searching can be performed on the data in O(1) time.
• Consider a hash table as an array, and whenever we need to insert, delete or
search an element in it, first the hash code corresponding to that element is
computed and then treating the hash code as the index of the array, the
required operation is performed on the given element.
• A simple hashing approach would be to use the modulo(%) operator as a hash
function and generate the key for a given value(assuming its numerical).
hash= value % hashTableSize

How does Hashing work?
• Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like
to store it in a table.
• Our main objective here is to search or update the values stored in
the table quickly in O(1) time and we are not concerned about the
ordering of strings in the table. So the given set of strings can act as
a key and the string itself will act as the value of the string but how to
store the value corresponding to the key?
Step 1: We know that hash functions (which is some
mathematical formula) are used to calculate the hash value
which acts as the index of the data structure where the value
will be stored.

Step 2: So, let’s assign
• “a” = 1,
• “b”=2, .. etc, to all alphabetical characters.
Step 3: Therefore, the numerical value by summation of all characters of the string:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash
function that is used here is the sum of the characters in key mod Table size. We
can compute the location of the string in the array by taking the sum(string) mod 7.
Step 5: So we will then store
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.

• The above technique enables us to calculate the location of a
given string by using a simple hash function and rapidly find the
value that is stored in that location. Therefore the idea of
hashing seems like a great way to store (key, value) pairs of the
data in a table.

int hashTable[10];
int hashTableSize = 10;
int hashFunction(int value)
{
return (value);
}
void Insert(int value)
{
//Compute the index using the hash function
int index= hashFunction(value) % hashTablesize;
//insert the value in the hashtable at the computed index
hashTable[index]=value;
}
void search(int value)
{
//Compute the index uing the Hash function
int index=hashFunction(value) % hashTableSize;
//Check the value in the hashTable present the computed index
if (hashTable[index]==value)
printf(“found”);
else printf("not found");
}

What is a Hash function?
• The hash function creates a mapping between key and value, this is done through the use of
mathematical formulas known as hash functions. The result of the hash function is referred to
as a hash value or hash. The hash value is a representation of the original string of characters
but usually smaller than the original.
Types of Hash functions:
1.Division Method.
2.Mid Square Method.
3.Folding Method.
4.Multiplication Method
A good hash function should have the following
properties:
1.Efficiently computable.
2. Should uniformly distribute the keys (Each table position is
equally likely for each.
3.Should minimize collisions.
4.Should have a low load factor(number of items in the table
divided by the size of the table).

Problem with Hashing
• If we consider the above example, the hash function we used is
the sum of the letters, but if we examined the hash function
closely then the problem can be easily visualized that for
different strings same hash value is begin generated by the hash
function.
• For example: {“ab”, “ba”} both have the same hash value, and
string {“cd”,”be”} also generate the same hash value, etc. This is
known as collision and it creates problem in searching,
insertion, deletion, and updating of value.

What is Collision?
• The hashing process generates a small number for a big key, so
there is a possibility that two keys could produce the same
value. The situation where the newly inserted key maps to an
already occupied, and it must be handled using some collision
handling technology.

How to handle Collisions?
• There are mainly two methods to handle collision:
1.Separate Chaining:
2.Open Addressing:

Separate Chaining
• The idea is to make each cell of the hash table point to a linked
list of records that have the same hash function value. Chaining
is simple but requires additional memory outside the table.
• Example: We have given a hash function and we have to insert
some elements in the hash table using a separate chaining
method for collision resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.

step by step approach to how to solve
the above problem:
Step 1: First draw the empty hash table which will have a possible
range of hash values from 0 to 4 according to the hash function
provided.

Step 2: Now insert all the keys in the hash table one by one. The first key to
be inserted is 12 which is mapped to bucket number 2 which is calculated by
using the hash function 12%5=2.
Step 3: Now the next key is 22. It will map to bucket
number 2 because 22%5=2. But bucket 2 is already
occupied by key 12.

Step 4: The next key is 15. It will map to slot number 0 because
15%5=0.
Step 5: Now the next key is 25. Its bucket number will be
25%5=0. But bucket 0 is already occupied by key 25. So
separate chaining method will again handle the collision by
creating a linked list to bucket 0.

Complexity Analysis
• Insert –O(1)
• Search
• > Best :O(1)
• > Avg: O(N/M)
• >Worst: O(N)
• Delete- depends on
Time complexity of
searching
M= Number of slots in hash table
N= Number of keys to be inserted in hash table
Load factor α = N/M

Open Addressing
• Open Addressing, which is also known as closed hashing is a
technique of collision resolution in hash tables. The main idea of
open addressing is to keep all the data in the same table to
achieve it, we search for alternative slots in the hash table until
it is found.
• The three Major collision resolution strategies
Linear Probing
Quadratic Probing
Double hashing
• When using open addressing, a collision is resolved by probing
(searching) alternative cells in the hash table until our target cell
(empty cell while insertion, and cell with value x while

Linear Probing
• In linear probing, collisions are resolved by searching the hash
table consecutively (with wraparound) until an empty cell is
found.
• The definition of collision function f is quite simple in linear
probing. As suggested by the name it is a linear function of i or
simply f(i)=i
Algorithm of linear probing
• Insert(x) -
• Find the hash value, k of x from the hash function hash(x).
• Iterate consecutively in the table starting from the k, till you find a cell
that is currently not occupied.
• Place x in that cell.

• Search(x) -
• Find the hash value k of x from the hash function hash(x).
• Iterate consecutively in the table starting from the ,k, till you find a cell
that contains x or which is never been occupied.
• If we found x, then the search is successful and unsuccessful in the
other case.
• Delete(x) -
• Repeat the steps of Search(x).
• If element x does not exist in the table then we can't delete it.
• If x exists in the cell (say k), put ∞ in cell k to denote it has been
occupied some time in the past, but now it is empty.

Pseudocode for Linear Probing
class Hashing:
size, table[]
Hash(x):
return x%size
Insert(x):
k=Hash(x)
while(table[k] is not empty):
k=(k+1)%size
table[k]=x
Search(x):
k=Hash(x)
while(table[k] != x):
if(table[k] has never been occupied):
return false
k=(k+1)%size
return table[k]==x
Delete(x):
k=Hash(x)
while(table[k]!=x):
if(table[k] has never been occupied):
return
k=(k+1)%size
if(table[k]==x):
table[k] = -Infinity

Example of linear probing -
• Table Size = 7 Hash Function - hash(key)=key%7
• Insert - 16,40,27,9,75
Step 1 - Make an empty hash table of size 7.
Step 2 - Inserting 16,40, 27
• ℎ(16)=16%7=2
• ℎ(40)=40%7=5
• ℎ(27)=27%7=6

• Step 3 - Inserting 9 and 75.
• hash(9)=9%7=2 But at index 2 already 16 is placed and hence collision occurs
so as per linear probing we will search for consecutive cells till we find an
empty cell.
So we will probe for hash(9)+1 i.e. cell 3, since the next cell i.e. 3 is not
occupied we place 9 in cell 33.
• hash(75)=75%7=5 Again collision happens because 40 is already placed in
cell 5. So will search for the consecutive cells, so we search for cell 6 which is
also occupied then we will search for cell (hash(75)+2)%7 i.e. 0 which is
empty so we will place 75 there.

Search - 75,21
• Step 4 - Search 75 and 21
• hash(75)=75%7=5 But at index 5,75 is not present so we search for
consecutive cells until we found an empty cell or a cell with a value
of 75. So we search in cell 6 but it does not contain 75, so we search
for 75 in cell 0 and we stop our search here as we have found 75.
• h(21)=21%7=0 We will search for 21 in cell 0 but it contains 75 so we
will search in the next cell hash(21)+1, i.e. 1 since it is found empty it is
clear that 21 do not exist in our table.

Problem With Linear Probing
• Even though linear probing is intuitive and easy to implement
but it suffers from a problem known as Primary Clustering. It
occurs because the table is large enough therefore time to get
an empty cell or to search for a key k is quite large.
• worst case time complexity
of searching, insertion and deletion operations to be O(n),
where n is the size of the table.

Quadratic Probing
• The working of quadratic probing involves taking the initial hash
value and probing in the hash table by adding successive
values of an arbitrary quadratic polynomial. As suggested by its
name, quadratic probing uses a quadratic collision function f.
One of the most common and reasonable choices for f(i)=i^2

• For inserting x we search for the
cells hash(x)+0,hash(x)+1^2,hash(x)+2^2,... until we find an
empty cell to insert x.
• For searching x we again search for the
cells hash(x)+0,hash(x)+1^2,hash(x)+2^2,... until we find a cell
with value x. If we find an empty cell that has never been
occupied it means x is not present in the hash table.
• For deletion, we repeat the search process if a cell is found with
value x we replace the value x with a predefined unique value to
denote that this cell has contained some value in past.

Example of Quadratic Probing
• Table Size = 7 Insert = 15, 23, and 85. Search &
Delete = 85 Hash Function -Hash(x)=x%7 Collision
Resolution Strategy - f(i)=i^2
Step 1 - Create a table of size 7.
Step 2 - Insert 15 and 23
•hash(15)=15%7=1 Since the cell at index 1 is not
occupied we can easily insert 15 at cell 1.
•hash(23)=23%7=2 Again cell 2 is not occupied so
place 23 in cell 2. After performing this step our
hash table will look like

• Step 3 - Inserting 85
• hash(85)=85%7=1 In our hash table cell 1 is already occupied
so we will search for cell 1+1^2, i.e. cell 2. Again it is found
occupied so we will search for cell 1+2^2 , i.e. cell 5. It is not
occupied so we will place 85 in cell 5. After performing all
these 3 insertions in our hash table it will look like -

Double Hashing
• Double hashing offers us one of the best techniques for hashing
with open addressing. The permutation formed by double
hashing is like a random permutation therefore it almost
eliminates the chances of cluster forming as it uses a secondary
hash function as an offset to deal with collision condition.
• The hash function that is used in double hashing is of the form -
h(k,i)=hash1(k)+i×hash2(k), i=0,1,2,3,...

Example of double hashing
• Table
size = 7 hash1(k) = k%7 and hash2(k)=1+k%6 Insert = 37,25,1
2,40 and 75 Search & Delete = 75
Step 1 - Create a table of size 77.
Step 2 - Insert 37,25 and 12
•hash1(37)=37%7=2
•hash1(25)=25%7=4
•hash1(12)=12%7=5
There is no collision at any point during
inserting these three values so we will simply
place these elements at their respective
positions. After which, our hash table will look

Step 3 - Insert 40and 74
• hash1(40)=40%7=5, hash2(40)=1+40%6=5 But at the cell
at index 5 is already occupied, so we will check for next
cell i.e. hash1(40)+1×hash2(40) which will evaluate
to (5+5)%7=3. Cell at index 3 is not occupied so we will place
40 in cell 3.
• hash1(74)=74%7=4, hash2(74)=1+74%6=3 But at the cell
at index 4 is already occupied, so we will check for next
cell i.e. hash1(74)+1×hash2(74) which will evaluate
to (4+3)%7=0. Cell at index 0 is not occupied so we will place
74 in cell 0.

Load Factor
• Total number of elements per bucket= N/M
• The load factor of the hash table can be defined as the number
of items the hash table contains divided by the size of the hash
table. Load factor is the decisive parameter that is used when we
want to rehash the previous hash function or want to add more
elements to the existing hash table.
• It helps us in determining the efficiency of the hash function i.e.
it tells whether the hash function which we are using is
distributing the keys uniformly or not in the hash table.

Rehashing?
• Rehashing means hashing again. Basically, when the load factor
increases to more than its predefined value (the default value of
the load factor is 0.75), the complexity increases. So to
overcome this, the size of the array is increased (doubled) and
all the values are hashed again and stored in the new double-
sized array to maintain a low load factor and low complexity.

Hashing Data Structure in JAVA
HashMap
• HashMap class implements the Map interface which allows us to store key and value pair, where keys should be
unique.
• If you try to insert the duplicate key, it will replace the element of the corresponding key
• Java HashMap contains values based on the key.
• Java HashMap contains only unique keys.
• Java HashMap may have one null key and multiple null values.
• Java HashMap is non synchronized.
• Java HashMap maintains no order.
• The initial default capacity of Java HashMap class is 16 with a load factor of 0.75.

Make a HashMap
HashMap<keyType, valueType> myMap =
new HashMap<keyType, valueType>();
Put and get values into a map
myMap.put(key, value);
myMap.get(key) // returns the corresponding value
Some useful other methods
int size = myMap.size();//returns the size of pairs added into map
myMap.containsKey(key); // returns true or false if key is in map
myMap.keySet(); //returns the keys in the Map in collection set
myMap.values();//returns the values in the Map in collection set
myMap.remove(key); // remove the key value pairs
myMAp. isEmpty();//returns true if this map contains no key-value mappings.
Iterate using a foreach loop
for(keyType key : myMap.keySet())
{ // not ordered
myMap.get(key); // do something with the key/value pair
}

Methods inherited from class java.util.AbstractMap
METHOD DESCRIPTION
equals() Compares the specified object with this map for equality.
hashCode() Returns the hash code value for this map.
toString() Returns a string representation of this map.

HashSet in Java
• A few important features of HashSet are mentioned below:
• Implements Set Interface.
• The underlying data structure for HashSet is Hashtable.
• As it implements the Set Interface, duplicate values are not
allowed.
• Objects that you insert in HashSet are not guaranteed to be
inserted in the same order. Objects are inserted based on their
hash code.
• NULL elements are allowed in HashSet.
. HashSet does not store duplicate items

Internal Working of a HashSet
• HashSet uses HashMap for storing its object internally. You
must be wondering that to enter a value in HashMap we need a
key-value pair, but in HashSet, we are passing only one value.
• Storage in HashMap: Actually the value we insert in HashSet
acts as a key to the map Object and for its value, java uses a
constant variable. So in the key-value pair, all the values will be
the same.

• Make a HashSet
HashSet<E> hs = new HashSet<E>();
build an empty HashSet object in which the default initial capacity
is 16 and the default load factor is 0.75.
• Put values into a hashset
hs.add(key);// Used to add the specified element if it is not
present, if it is present then return false.
• Some useful other methods
hs.remove(key);//It is used to remove the specified element
from this set if it is present. Key should be of object type
hs.clear();//It is used to remove all of the elements from the
set.
hs.contains(key);// used to return true if this
set contains the specified element.

• Iterating through set using Iterator () method
Iterator<E> i = Hash_Set.iterator();
while (i.hasNext())
{ // Iterating over elements
// using next() method
System.out.println(i.next());
}

hashing in data strutures advanced in languae java

More Related Content

Similar to hashing in data strutures advanced in languae java

More from ishasharma835109

Recently uploaded

hashing in data strutures advanced in languae java