Lecture 6 disjoint set

EQUIVALENCE RELATIONS
A relation is denoted by ψ. Let a and b be two
elements of a set S. We say that a ψ b if a
and b are related (i.e. if they belong to the same
set).
An equivalence relation is one that satisfies the
following three properties.
i. Reflexive: a ψ a, for all a ∈ S
ii. Symmetric: a ψ b if and only if b ψ a
iii. Transitive: a ψ b and b ψ c => a ψ c
An example of an equivalence relation is electrical
connectivity: two points are related if they are
joined by a conductor.
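The three properties can be checked mechanically when a relation is
given explicitly as a set of ordered pairs. The following is a minimal
sketch; the function name is_equivalence and the sample relations are
illustrative assumptions, not part of the lecture.

```python
from itertools import product

def is_equivalence(S, related):
    """Return True if `related` (a set of ordered pairs) is an
    equivalence relation on the set S."""
    reflexive = all((a, a) in related for a in S)
    symmetric = all((b, a) in related for (a, b) in related)
    transitive = all(
        (a, d) in related
        for (a, b), (c, d) in product(related, repeat=2)
        if b == c
    )
    return reflexive and symmetric and transitive

# Hypothetical data: points 1 and 2 are wired together, point 3 is isolated.
S = {1, 2, 3}
connected = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}

# "less than or equal" is reflexive and transitive but not symmetric.
less_or_equal = {(a, b) for a in S for b in S if a <= b}
```

Here is_equivalence(S, connected) holds, while less_or_equal fails the
symmetry test.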

THE DISJOINT SET ADT
A disjoint set data structure keeps track of a collection of
non-overlapping subsets.
It is a data structure that helps us solve the dynamic
equivalence problem.

DYNAMIC EQUIVALENCE PROBLEM
The dynamic equivalence problem expresses the problem
of deciding, given two elements a and b, if a ψ b.
The problem is difficult because the relation is usually not
defined explicitly, but implicitly.
The equivalence class of an element a ∈ S is the subset of
S that contains all the elements that are related to a.
To decide if a ψ b, we need only to check whether a and b
are in the same equivalence class.
We are given an input of n sets, each containing one
element. Initially no two distinct elements are related,
so all the sets are disjoint: Si ∩ Sj = ∅ for i ≠ j.

OPERATIONS ON DISJOINT SETS
i. MAKE-SET: Creates a new set whose only member is the
given element.
ii. FIND-SET: Returns the name of the set (equivalence class)
containing a given element
iii. UNION: If we want to add the relation a ψ b, then we first
see if a and b are already related. This is done by performing
finds on both a and b and checking whether they are in the
same equivalence class. If they are not, then we apply union.
This operation merges the two equivalence classes containing
a and b into a new equivalence class.
This is known as the UNION/FIND algorithm. This algorithm is
dynamic because, as the algorithm proceeds, the sets can
change via the union operation.

REPRESENTATIONS OF DISJOINT SET ADT
ARRAY REPRESENTATION
This representation assigns one position for each
element.
Each position stores the element and an index to the
representative.
To make the Find-Set operation fast we store the name
of each equivalence class in the array. Thus the find
takes constant time, O(1).
Assume element a belongs to set i and element b
belongs to set j.
When we perform Union(a,b), all j’s have to be changed
to i’s. Each union operation unfortunately takes Θ(n)
time, since the whole array must be scanned.
So for n-1 unions the time taken is Θ(n²).
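The array representation can be sketched as follows. This is only an
illustration: the class name QuickFind and the attribute name are
assumptions, and elements are taken to be the integers 0..n-1.

```python
class QuickFind:
    """Array representation: name[k] is the name of the set containing k."""

    def __init__(self, n):
        # Make-set for every element: each element names its own set.
        self.name = list(range(n))

    def find(self, a):
        # Constant time: a single array lookup.
        return self.name[a]

    def union(self, a, b):
        i, j = self.find(a), self.find(b)
        if i == j:
            return  # already in the same equivalence class
        # Theta(n): every entry must be inspected to rename j's to i's.
        for k in range(len(self.name)):
            if self.name[k] == j:
                self.name[k] = i
```

Find is a single lookup, but every Union scans the whole array, which
is where the Θ(n²) total for n-1 unions comes from.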

LINKED LIST REPRESENTATION
All the elements of an equivalence class are maintained in a linked
list. The first object in each linked list is its set's representative.
Each object in the linked list contains a set member, a pointer to
the object containing the next member of the set, and a pointer
back to the representative.
Each list maintains a head pointer, to the representative, and a tail
pointer, to the last object in the list. Within each linked list, the
objects may appear in any order.
Consider a sequence of n Make-set operations followed by n - 1
Union operations, so that m = 2n - 1.
We spend Θ(n) time performing the n Make-set operations.
Because the ith Union operation updates i objects, the total number
of objects updated by all n - 1 Union operations is
$\sum_{i=1}^{n-1} i = \Theta(n^2)$.

In the worst case, this implementation of the Union
procedure takes an average of Θ(n) time per call because
we may be appending a longer list onto a shorter list, and
the pointer to the representative for each member of the
longer list must be updated.
Weighted-union heuristic: store the length of each list and
always append the shorter list onto the longer one.
With this simple weighted-union heuristic, a single Union
operation can still take Ω(n) time if both sets have Ω(n)
members.
However, a sequence of m Make-set, Union and Find-Set
operations, n of which are Make-set operations, takes
O(m + n log n) time.
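A minimal sketch of the linked-list representation with the weighted
union follows. The field names (rep, tail, size) and the choice to keep
tail and size on the representative node are assumptions made for the
illustration. Union returns the number of representative pointers it
had to update, so the cost can be observed directly.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.rep = self    # pointer back to the representative (list head)
        self.tail = self   # meaningful only on the representative
        self.size = 1      # meaningful only on the representative

def make_set(value):
    return Node(value)

def find_set(node):
    return node.rep        # O(1): follow the back pointer

def union(a, b):
    """Append the shorter list onto the longer one.
    Returns how many representative pointers were updated."""
    ra, rb = find_set(a), find_set(b)
    if ra is rb:
        return 0
    if ra.size < rb.size:
        ra, rb = rb, ra    # ra is now the longer (or equal) list
    ra.tail.next = rb      # splice rb's list after ra's tail
    ra.tail = rb.tail
    ra.size += rb.size
    updated = 0
    node = rb
    while node is not None:
        node.rep = ra      # only members of the shorter list are touched
        updated += 1
        node = node.next
    return updated
```

Growing one set by repeatedly merging in single elements updates only
one pointer per Union here, instead of i pointers at the ith Union.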

Figure: Linked list representations of the sets Si and Sj, and of the list after Union(Si, Sj)

TREE REPRESENTATION
A tree data structure can be used to represent a disjoint set ADT.
Each set is represented by a tree. All the elements in a tree have
the same root, and hence the root is used to name the set.
The trees do not have to be binary, since we only need a parent
pointer.
void Make-set( DISJ_SET S )
{
    int i;
    for( i = N; i > 0; i-- )
        p[i] = 0;    /* 0 marks a root: each element starts as its own tree */
}
Tree representation of disjoint set ADT after Make-set operation

Initially, after the Make-set operation, each set contains one
element. Make-set takes O(1) time per element, so initializing
all N elements takes O(N) time.
set_type Find-set( element_type x, DISJ_SET S )
{
    if( p[x] <= 0 )    /* x is a root */
        return x;
    else
        return( Find-set( p[x], S ) );    /* follow the parent pointer */
}
The Find-set operation takes time proportional to the depth
of the node. This is inefficient for an unbalanced tree, where
the depth can be as large as N - 1.
void Union( DISJ_SET S, set_type root1, set_type root2 )
{
    p[root2] = root1;    /* make root2's tree a subtree of root1's */
}
Given the two roots, the Union operation takes constant time, O(1).
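The three routines above can be tried out with a short Python sketch.
It assumes the same conventions as the slides (elements numbered 1..N,
a parent array p, and 0 marking a root); the function names are chosen
for the illustration.

```python
def make_sets(n):
    # p[0] is unused; p[i] == 0 means element i is a root.
    return [0] * (n + 1)

def find_set(p, x):
    # Walk up the parent pointers until a root is reached.
    while p[x] > 0:
        x = p[x]
    return x

def union(p, root1, root2):
    # Arbitrary union: the second tree becomes a subtree of the first.
    # Both arguments are assumed to be roots of different trees.
    p[root2] = root1

p = make_sets(8)
union(p, 5, 6)
union(p, 7, 8)
union(p, 5, 7)
```

After these three unions, elements 5, 6, 7 and 8 form one tree rooted
at 5, and Find-set on 8 has to follow two parent pointers.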

Tree representation of disjoint set ADT after union (5,6)

DISJOINT SET FORESTS
In a faster implementation of disjoint sets, we represent
sets by rooted trees, with each node containing one
member and each tree representing one set.
In a disjoint-set forest, each member points only to its
parent.
The root of each tree contains the representative and is its
own parent.
The straightforward algorithms that use this representation
are no faster than those for the linked-list representation.
For this reason, we use two heuristics to achieve the
asymptotically fastest disjoint-set data structure known.
The two heuristics are:
i. Smart Union Algorithm
ii. Path Compression

SMART UNION ALGORITHM
The unions in the basic tree data structure representation
were performed arbitrarily, by making the second tree a
subtree of the first.
A basic improvement is to make the smaller tree a subtree
of the larger. We call this approach union-by-size.
If unions are done by size, the depth of any node is never
more than log n.
Note that a node is initially at depth 0. When its depth
increases as a result of a union, it is placed in a tree that is
at least twice as large as before. Thus, its depth can be
increased at most log n times. This implies that the
running time for a find operation is O(log n), and a
sequence of m operations takes O(m log n) time.

We need to keep track of the size of each tree. Let each
root store the size of its tree, initially 1 for a
one-element set. In the array representation, the root's
entry can hold the negative of the size, so that it is not
mistaken for a parent index.
When a union is performed, compare the sizes, attach the
smaller tree to the larger, and set the new size to the sum
of the old sizes. Thus, union-by-size is not at all difficult
to implement and requires no extra space.
It is also fast, on average.
It has been shown that a sequence of m operations
requires O(m) average time if union-by-size is used. This
is because when random unions are performed, small sets
are merged with large sets throughout the algorithm.
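Union-by-size might be sketched as below. The sketch assumes one
common convention that is not spelled out on the slides: the root's
array entry holds the negative of the tree size, so no second array is
needed. The helper depth is added only to observe the log n bound.

```python
def make_sets(n):
    # p[0] is unused; a root stores the negative of its tree size.
    return [0] + [-1] * n

def find_set(p, x):
    while p[x] > 0:
        x = p[x]
    return x

def union_by_size(p, root1, root2):
    """Attach the smaller tree to the larger; return the new root."""
    if root1 == root2:
        return root1
    if p[root1] > p[root2]:          # sizes are negative: root1 is smaller
        root1, root2 = root2, root1
    p[root1] += p[root2]             # new size is the sum of the old sizes
    p[root2] = root1
    return root1

def depth(p, x):
    d = 0
    while p[x] > 0:
        x = p[x]
        d += 1
    return d

# Merge 16 singletons pairwise, round by round. Equal-sized trees are
# always joined, which is the case that makes the trees deepest.
p = make_sets(16)
step = 1
while step < 16:
    for i in range(1, 17, 2 * step):
        union_by_size(p, find_set(p, i), find_set(p, i + step))
    step *= 2
```

Even in this case no node ends up deeper than log2(16) = 4.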

An alternative implementation, which also
guarantees that all the trees will have depth at most
O(log n), is union-by-rank (union-by-height). We keep
track of the height, instead of the size, of each tree
and perform unions by making the shallow tree a
subtree of the deeper tree.
This is an easy algorithm, since the height of a
tree increases only when two equally deep trees
are joined (and then the height goes up by one).
Thus, union-by-rank is a trivial modification of
union-by-size.

Result of arbitrary union
Result of union-by-size/ union-by-rank

Algorithm of union-by-rank: assume x and y are two
nodes.
Union(x, y)
    Link(Find-set(x), Find-set(y))
Link(x, y)
    if rank[x] > rank[y]
        then p[y] = x
        else p[x] = y
            if rank[x] == rank[y]
                then rank[y] = rank[y] + 1
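The pseudocode above translates almost line for line into Python. In
this sketch p and rank are dictionaries, each root is its own parent,
and Find-set is the plain version without path compression. The check
that the two roots differ is an added guard, not part of the
pseudocode; without it, a Union of two elements already in the same set
would raise the root's rank needlessly.

```python
def make_set(p, rank, x):
    p[x] = x        # a root is its own parent
    rank[x] = 0

def find_set(p, x):
    while p[x] != x:
        x = p[x]
    return x

def link(p, rank, x, y):
    # x and y are assumed to be two different roots.
    if rank[x] > rank[y]:
        p[y] = x
    else:
        p[x] = y
        if rank[x] == rank[y]:
            rank[y] += 1   # rank grows only when equal ranks are joined

def union(p, rank, x, y):
    rx, ry = find_set(p, x), find_set(p, y)
    if rx != ry:           # added guard (assumption, see lead-in)
        link(p, rank, rx, ry)

p, rank = {}, {}
for x in "abcde":
    make_set(p, rank, x)
union(p, rank, "a", "b")   # equal ranks: b becomes root with rank 1
union(p, rank, "c", "d")   # equal ranks: d becomes root with rank 1
union(p, rank, "a", "c")   # equal ranks: d becomes root with rank 2
union(p, rank, "e", "a")   # rank 0 under rank 2: no rank change
```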

PATH COMPRESSION
Path compression is also quite simple and very effective.
We use it during Find-set operations to make each node
on the find path point directly to the root.
Path compression does not change any ranks.
Path compression during the operation Find-set. (a) A tree representing a set prior to executing
Find-set(a). (b) The same set after executing Find-set(a). Each node on the find path now points
directly to the root.

The Find-set procedure is a two-pass method: it makes
one pass up the find path to find the root, and it makes a
second pass back down the find path to update each
node so that it points directly to the root.
Find-set function using path compression:
Find-set(x)
    if x ≠ p[x]
        then p[x] = Find-set(p[x])
    return p[x]
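A small Python sketch shows the effect of the compression on a chain
a -> b -> c -> d. The recursive function mirrors the pseudocode above;
the iterative variant is an added alternative that makes the two passes
explicit and avoids deep recursion on long paths. The sample data is
hypothetical.

```python
def find_set(p, x):
    # Recursive form: the second pass happens as the recursion unwinds.
    if x != p[x]:
        p[x] = find_set(p, p[x])
    return p[x]

def find_set_iterative(p, x):
    # First pass: walk up to the root.
    root = x
    while p[root] != root:
        root = p[root]
    # Second pass: point every node on the find path at the root.
    while p[x] != root:
        nxt = p[x]
        p[x] = root
        x = nxt
    return root

# A chain a -> b -> c -> d, where d is the root (its own parent).
p = {"a": "b", "b": "c", "c": "d", "d": "d"}
root = find_set(p, "a")
```

After the call, a, b and c all point directly to d, so later finds on
any of them take a single step.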

DISJOINT SET FORESTS
Either union-by-rank or path compression alone improves the running
time of the operations on disjoint-set forests, and the improvement
is even greater when the two heuristics are used together.
We do not prove it, but if there are n Make-set operations,
at most n - 1 Union operations, and f Find-set operations, the
path-compression heuristic alone gives a worst-case running
time of $\Theta(n + f \cdot (1 + \log_{2 + f/n} n))$.
When we use union by rank and path compression together, the
worst-case running time is O(m α(m,n)), where m is the total
number of operations and α(m,n) is a very slowly growing
function, the inverse of the Ackermann function.
In any conceivable application of a disjoint-set data structure,
α(m,n) ≤ 4.
Thus, for practical purposes the running time can be viewed as
linear in m.

ACKERMANN FUNCTION
The Ackermann function A(m, n) is defined for non-negative
integers m and n. Its value grows rapidly, even for small inputs.
For example, A(4,2) is an integer of 19,729 decimal
digits.
Since the function f(n) = A(n, n) grows very rapidly, its
inverse function, f⁻¹, grows very slowly.
This inverse Ackermann function is usually denoted
by α. α(m,n) is less than 5 for any practical input size n.
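The slides do not reproduce the definition. Assuming the usual
two-argument (Ackermann-Péter) form is meant, which agrees with the
19,729-digit figure quoted above, the function can be sketched as
follows. Only very small arguments can be evaluated this way. The digit
count for A(4,2) uses the known closed form A(4,2) = 2^65536 - 3 rather
than the recursion.

```python
import math

def ackermann(m, n):
    # Assumed definition (two-argument Ackermann-Peter function):
    #   A(0, n) = n + 1
    #   A(m, 0) = A(m - 1, 1)                 for m > 0
    #   A(m, n) = A(m - 1, A(m, n - 1))       for m > 0, n > 0
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# Number of decimal digits of 2**65536, computed from its logarithm.
# Subtracting 3 does not change the digit count, because 2**65536 is
# not within 3 of a power of ten.
digits_a42 = int(65536 * math.log10(2)) + 1
```

Already A(3,3) is 61, while A(4,2) is far beyond anything the
recursion could reach in practice.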
