Consistent
Hashing
with bounded load
01
Introduction to Hashing
Content
These are the main topics that our 02
paper covers Consistent Hashing
03
MARCH 17, 2025
Consistent Hashing with
bounded load
01 Hashing is a technique in which we map one piece
of data to another kind of data using a function
Introduction to Hashing that is often referred to as a hash function.
Generally, hashing is used to map arbitrary size
data to well defined data format
In general, there are two types of hash functions:
Cryptographic hash function
Non-Cryptographic hash function
Cryptographic hash functions are expensive in
comparison to that non-cryptographic hash
functions.
One of the main problems with hashing is
collisions i.e. mapping of two different records to
the same hash value.
One of the solutions to the problem of collision is
chaining.
In the case of distributed computing, one of the most
easiest way of using hash function is modulo N
hashing in which we determine the server we are
looking for by simply performing modulo operation
on hash value obtained with number of servers.
Modulo N The major problem arising due to implementation of
such technique is that of rehashing.
hashing
In case of change of N, all the records shall be
relocated resulting in an highly inefficient system.
Other problem arising due to such technique is that
of unevenness in the system. We cannot guarantee
even distribution of records among the set of
servers.
02
In consistent hashing, we map each hash value to
a node present on a ring-like structure determined Consistent Hashing
by the output range of the hash function used.
Also, the servers are mapped to the ring and for
retrieval of data, we simply traverse in a clockwise
direction in the ring until we find a server node.
The problem of rehashing is solved in consistent
hashing because we eliminate the direct
dependence on the number of servers for the
allocation of data among the servers.
In case of removal of server, the load of that
server is simply taken up the server next in the
clockwise direction.
This can cause overloading of a particular server
and again the problem of unevenness persists.
To solve the problem of unevenness among the set
03 of servers, we should implement a capacity
Consistent Hashing with constraint on each server which should be equal to
bounded load the average load calculated by dividing the number
of data nodes by the number of server nodes.
Also, we will introduce a multiplicative constant to
the capacity defined at the time of configuration.
In this algorithm, we will use the idea of linear
probing in case of overloading of the server
Also, we will maintain the number of nodes that a
server has forwarded to the next server.
Using average load capacity as an upper limit on
the number of nodes belonging to a server and the
idea of forwarding nodes in case of overloading, we
can establish evenness in distributed system
Hash Ring Consider the case where we have 3 servers and 7
records and we want to distribute them evenly
among the set of severs using the insertion idea of
consistent hashing.
The average load capacity of each server with a
multiplicative factor of 1.1 is 3 nodes per server.
Now in case of node 1, the original server it should
belong to is server S3 but S3 has reached its
average load capacity and thus node 1 is forwards
to server S2.
Thus using the idea of linear forwarding we have
achieved even distribution of records among the
set of servers.
The main advantage of using
consistent hashing with bounded load
Pros and Cons
is the imposition of a hard limit on the
number of nodes belonging to a server
Also, the dependence on the
concreteness of the hash function is
also eliminated.
One of the major problems, in the case
of consistent hashing with bounded
load, is the computation overhead
introduced.
Thank You