Lecture 6 disjoint set
Disjoint Set ADT
EQUIVALENCE RELATIONS
A relation is denoted by ψ. Let a and b be two
elements of a set S. We say that a ψ b if a
and b are related (i.e. if they belong to the same
set).
An equivalence relation is one that satisfies the
following three properties.
i. Reflexive: a ψ a, for all a ∈ S
ii. Symmetric: a ψ b if and only if b ψ a
iii. Transitive: a ψ b and b ψ c => a ψ c
An example of an equivalence relation is electrical
connectivity.
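The three properties can be checked mechanically for a small finite relation. The following is a minimal sketch, assuming the relation is given as a boolean matrix `r` with `r[a][b]` meaning a ψ b; the name `is_equivalence` and the size `N` are illustrative and not part of the lecture.

```c
#include <assert.h>
#include <stdbool.h>

#define N 4   /* hypothetical number of elements, labelled 0..N-1 */

/* Returns true if r (r[a][b] means "a psi b") is reflexive,
   symmetric and transitive, i.e. an equivalence relation. */
bool is_equivalence(bool r[N][N])
{
    for (int a = 0; a < N; a++) {
        if (!r[a][a])
            return false;                      /* not reflexive */
        for (int b = 0; b < N; b++) {
            if (r[a][b] != r[b][a])
                return false;                  /* not symmetric */
            for (int c = 0; c < N; c++)
                if (r[a][b] && r[b][c] && !r[a][c])
                    return false;              /* not transitive */
        }
    }
    return true;
}
```

For instance, a matrix describing two separate pairs of wired-together terminals passes the check, while a matrix in which 0 ψ 1 and 1 ψ 2 hold but 0 ψ 2 does not is rejected.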
THE DISJOINT SET ADT
A disjoint set data structure keeps track of a collection of
non-overlapping subsets.
It is a data structure that helps us solve the dynamic
equivalence problem.
DYNAMIC EQUIVALENCE PROBLEM
The dynamic equivalence problem expresses the problem
of deciding, given two elements a and b, if a ψ b.
The problem is that the relation is usually not
explicitly, but rather implicitly, defined.
The equivalence class of an element a ∈ S is the subset of
S that contains all the elements that are related to a.
To decide if a ψ b, we need only to check whether a and b
are in the same equivalence class.
We are given an input of n sets, each containing one
element. Initially no two elements are related. This
means that all the sets are pairwise disjoint, which can
be written as Si ∩ Sj = ∅ for i ≠ j.
OPERATIONS ON DISJOINT SETS
i. MAKE-SET: Creates a new set whose only member is the
given element.
ii. FIND-SET: Returns the name of the set (equivalence class)
containing a given element
iii. UNION: If we want to add the relation a ψ b, then we first
see if a and b are already related. This is done by performing
finds on both a and b and checking whether they are in the
same equivalence class. If they are not, then we apply union.
This operation merges the two equivalence classes containing
a and b into a new equivalence class.
This is known as the UNION/FIND algorithm. This algorithm is
dynamic because, as the algorithm proceeds, the sets can
change via the union operation.
REPRESENTATIONS OF DISJOINT SET ADT
ARRAY REPRESENTATION
This representation assigns one position for each
element.
Each position stores the element and an index to the
representative.
To make the Find-Set operation fast we store the name
of each equivalence class in the array. Thus the find
takes constant time, O(1).
Assume element a belongs to set i and element b
belongs to set j.
When we perform Union(a,b), every j has to be changed
to i, which requires scanning the whole array. Each union
operation therefore takes Θ(n) time.
So for n − 1 unions the time taken is Θ(n²).
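The array representation described above can be sketched as follows. This is an illustrative version only: the names `set_name`, `qf_make_sets`, `qf_find` and `qf_union`, the fixed size `N`, and the 0-based numbering are assumptions, not taken from the slides.

```c
#include <assert.h>

#define N 8                  /* hypothetical number of elements, 0..N-1 */

static int set_name[N];      /* set_name[x] = name of the class containing x */

/* Put every element in its own one-element class. */
void qf_make_sets(void)
{
    for (int x = 0; x < N; x++)
        set_name[x] = x;
}

/* Constant-time find: the class name is stored directly. */
int qf_find(int x)
{
    assert(x >= 0 && x < N);
    return set_name[x];
}

/* Theta(n) union: every entry named j is relabelled i. */
void qf_union(int a, int b)
{
    int i = qf_find(a), j = qf_find(b);
    if (i == j)
        return;              /* a and b are already related */
    for (int x = 0; x < N; x++)
        if (set_name[x] == j)
            set_name[x] = i;
}
```

The full scan in `qf_union` is what makes n − 1 unions cost Θ(n²) in total, even though each find is a single array lookup.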
LINKED LIST REPRESENTATION
• All the elements of an equivalence class are maintained in a linked
list. The first object in each linked list is its set's representative.
• Each object in the linked list contains a set member, a pointer to
the object containing the next member of the set, and a pointer
back to the representative.
• Each list maintains a head pointer, to the representative, and a tail
pointer, to the last object in the list. Within each linked list, the
objects may appear in any order.
• Consider a sequence of n Make-set operations followed by n − 1
Union operations, so that m = 2n − 1.
• We spend Θ(n) time performing the n Make-set operations.
Because the ith Union operation updates i objects, the total number
of objects updated by all n − 1 Union operations is
1 + 2 + ... + (n − 1) = n(n − 1)/2 = Θ(n²).
In the worst case, this simple implementation of the Union
procedure takes an average of Θ(n) time per call, because
we may be appending a longer list onto a shorter list, and
the pointer to the representative must be updated for each
member of the longer list.
Suppose instead that each list also stores its length and that
we always append the shorter list onto the longer one. With
this simple weighted-union heuristic, a single Union
operation can still take Ω(n) time if both sets have Ω(n)
members.
However, a sequence of m Make-set, Union and Find-set
operations, n of which are Make-set operations, takes
O(m + n log n) time.
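A minimal sketch of the linked-list representation with the weighted-union heuristic follows. It assumes that the tail pointer and the list length are kept in the representative (first) node; the struct layout and the names `ll_make_set`, `ll_find_set` and `ll_union` are illustrative choices, and the nodes are never freed here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int member;
    struct Node *next;   /* next member of the same set */
    struct Node *rep;    /* representative: first node of the list */
    struct Node *tail;   /* last node; meaningful only in the representative */
    int length;          /* list length; meaningful only in the representative */
} Node;

/* Creates a one-element list; returns NULL if allocation fails. */
Node *ll_make_set(int member)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->member = member;
    n->next = NULL;
    n->rep = n;
    n->tail = n;
    n->length = 1;
    return n;
}

/* O(1): every node points straight at its representative. */
Node *ll_find_set(Node *x)
{
    return x->rep;
}

/* Appends the shorter list onto the longer one, so only the
   shorter list's representative pointers are rewritten. */
Node *ll_union(Node *x, Node *y)
{
    assert(x != NULL && y != NULL);
    Node *a = ll_find_set(x), *b = ll_find_set(y);
    if (a == b)
        return a;
    if (a->length < b->length) {      /* make a the longer list */
        Node *t = a; a = b; b = t;
    }
    for (Node *n = b; n != NULL; n = n->next)
        n->rep = a;
    a->tail->next = b;
    a->tail = b->tail;
    a->length += b->length;
    return a;
}
```

Each time a node's representative pointer is rewritten, the list it belongs to at least doubles in length, which is where the n log n term in the bound comes from.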
Figure: linked-list representations of the sets Si and Sj
Figure: linked-list representation after Union(Si, Sj)
TREE REPRESENTATION
• A tree data structure can be used to represent a disjoint set ADT.
• Each set is represented by a tree. The elements in the tree have
the same root and hence the root is used to name the set.
• The trees do not have to be binary since we only need a parent
pointer.
void make_set( DISJ_SET p )   /* Make-set for elements 1..N */
{
    int i;
    for( i = N; i > 0; i-- )
        p[i] = 0;             /* 0 marks i as the root of its own tree */
}
Tree representation of disjoint set ADT after Make-set operation
Initially, after the Make-set operation, each set contains one
element. Initializing all N elements takes O(N) time, that is,
O(1) per element.
set_type find_set( element_type x, DISJ_SET p )   /* Find-set */
{
    if( p[x] <= 0 )
        return x;             /* x is a root, so it names the set */
    else
        return find_set( p[x], p );
}
The Find-set operation takes time proportional to the depth
of the node in its tree. This is inefficient for an unbalanced
tree, where the depth can reach n − 1.
void set_union( DISJ_SET p, set_type root1, set_type root2 )   /* Union */
{
    p[root2] = root1;         /* root1 and root2 must be distinct roots */
}
Given the two roots, the union operation takes constant time, O(1).
Tree representation of disjoint set ADT after union (5,6)
Tree representation of disjoint set ADT after union (7,8)
Tree representation of disjoint set ADT after union (5,7)
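The three unions named in the captions above can be traced with a small parent-array sketch. It assumes eight elements numbered 1..8 and the convention that a parent entry of 0 marks a root; the typedefs and the iterative `find_set` are illustrative choices rather than the slides' exact code.

```c
#include <assert.h>

#define N 8                        /* elements are numbered 1..N */

typedef int DISJ_SET[N + 1];       /* index 0 is unused */
typedef int set_type;
typedef int element_type;

/* Every element becomes the root of its own one-node tree. */
void make_set(DISJ_SET p)
{
    for (int i = N; i > 0; i--)
        p[i] = 0;
}

/* Follows parent pointers up to the root (iterative form). */
set_type find_set(element_type x, DISJ_SET p)
{
    assert(x >= 1 && x <= N);
    while (p[x] > 0)
        x = p[x];
    return x;
}

/* Arbitrary union: the second root becomes a child of the first. */
void set_union(DISJ_SET p, set_type root1, set_type root2)
{
    assert(root1 != root2 && p[root1] <= 0 && p[root2] <= 0);
    p[root2] = root1;
}
```

After `set_union(p, 5, 6)`, `set_union(p, 7, 8)` and `set_union(p, 5, 7)`, element 8 points to 7, and both 7 and 6 point to 5, so a find on 8 walks two links to reach the root 5.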
DISJOINT SET FORESTS
In a faster implementation of disjoint sets, we represent
sets by rooted trees, with each node containing one
member and each tree representing one set.
In a disjoint-set forest, each member points only to its
parent.
The root of each tree contains the representative and is its
own parent.
On their own, the straightforward algorithms that use this
representation are no faster than those that use the linked-list
representation. For this reason, we use two heuristics to obtain
the asymptotically fastest known disjoint-set data structure.
The two heuristics are:
i. Smart Union Algorithm
ii. Path Compression
SMART UNION ALGORITHM
The unions in the basic tree data structure representation
were performed arbitrarily, by making the second tree a
subtree of the first.
A basic improvement is to make the smaller tree a subtree
of the larger. We call this approach union-by-size.
If unions are done by size, the depth of any node is never
more than log n.
Note that a node is initially at depth 0. When its depth
increases as a result of a union, it is placed in a tree that is
at least twice as large as before. Thus, its depth can be
increased at most log n times. This implies that the
running time for a find operation is O(log n), and a
sequence of m operations takes O(m log n).
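The doubling argument can be written out as a short derivation. Here d denotes the depth of a node x after any sequence of unions by size; the symbol d is introduced only for this illustration.

```latex
% Each increase in the depth of x at least doubles the size of its tree,
% and a tree never holds more than n nodes:
\[
  2^{d} \;\le\; \bigl|\text{tree containing } x\bigr| \;\le\; n
  \quad\Longrightarrow\quad
  d \;\le\; \log_2 n .
\]
```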
We need to keep track of the size of each tree. Let each
root store the size of its tree (initially 0 or 1,
depending on the convention chosen).
When a union is performed, compare the sizes and set the
new size to the sum of the old sizes. Thus, union-by-size is
not at all difficult to implement, and it requires no extra
space if the (negated) size is stored in the root's own array
entry, which is otherwise unused.
It is also fast, on average.
It has been shown that a sequence of m operations
requires O(m) average time if union-by-size is used. This
is because when random unions are performed, small sets
are merged with large sets throughout the algorithm.
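Union-by-size might be sketched as below. The sketch assumes one common convention: a root's array entry holds the negative of its tree's size, so every entry starts at −1 and no separate size array is needed. The names `s`, `ubs_init`, `ubs_find` and `ubs_union` are illustrative.

```c
#include <assert.h>

#define N 8                 /* elements are numbered 1..N; index 0 unused */

/* s[x] > 0 : parent of x
   s[x] < 0 : x is a root and its tree has -s[x] nodes */
static int s[N + 1];

void ubs_init(void)
{
    for (int i = 1; i <= N; i++)
        s[i] = -1;          /* N one-node trees */
}

int ubs_find(int x)
{
    assert(x >= 1 && x <= N);
    while (s[x] > 0)
        x = s[x];
    return x;
}

/* Makes the smaller tree a subtree of the larger one. */
void ubs_union(int a, int b)
{
    int r1 = ubs_find(a), r2 = ubs_find(b);
    if (r1 == r2)
        return;             /* already in the same set */
    if (s[r2] < s[r1]) {    /* more negative means larger: keep r1 as the larger */
        int t = r1; r1 = r2; r2 = t;
    }
    s[r1] += s[r2];         /* new size is the sum of the old sizes */
    s[r2] = r1;
}
```

Unlike the basic Union, this version takes arbitrary elements and performs the two finds itself, which keeps the caller from accidentally linking non-roots.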
An alternative implementation, which also
guarantees that all the trees will have depth at most
O(log n), is union-by-height, also known as
union-by-rank (the rank of a root is an upper bound
on the height of its tree). We keep track of the
height, instead of the size, of each tree and
perform unions by making the shallower tree a
subtree of the deeper tree.
This is an easy algorithm, since the height of a
tree increases only when two equally deep trees
are joined (and then the height goes up by one).
Thus, union-by-height is a trivial modification of
union-by-size.
Result of arbitrary union
Result of union-by-size/ union-by-rank
Algorithm of union-by-rank (assume x and y are two nodes):
Union(x, y)
    Link(Find-set(x), Find-set(y))
Link(x, y)
    if rank[x] > rank[y]
        then p[y] = x
        else p[x] = y
            if rank[x] == rank[y]
                then rank[y] = rank[y] + 1
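The Union and Link pseudocode translates into C roughly as follows. The sketch assumes 0-based elements, a root that is its own parent, and ranks starting at 0; the `ur_` prefixes, the array names, and the extra guard for x == y are illustrative additions.

```c
#include <assert.h>

#define N 8                       /* hypothetical number of elements, 0..N-1 */

static int p[N];                  /* p[x] == x exactly when x is a root */
static int rank[N];               /* upper bound on the height of x's subtree */

void ur_make_set(int x)
{
    assert(x >= 0 && x < N);
    p[x] = x;
    rank[x] = 0;
}

int ur_find_set(int x)            /* plain find, no path compression */
{
    while (x != p[x])
        x = p[x];
    return x;
}

/* x and y must be roots. */
void ur_link(int x, int y)
{
    if (x == y)
        return;                   /* guard added here; not in the pseudocode */
    if (rank[x] > rank[y]) {
        p[y] = x;
    } else {
        p[x] = y;
        if (rank[x] == rank[y])
            rank[y] = rank[y] + 1;   /* equal ranks: the new root grows by one */
    }
}

void ur_union(int x, int y)
{
    ur_link(ur_find_set(x), ur_find_set(y));
}
```

Only the tie case changes a rank, which matches the observation that the height grows only when two equally deep trees are joined.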
PATH COMPRESSION
Path compression is also quite simple and very effective.
We use it during Find-set operations to make each node
on the find path point directly to the root.
Path compression does not change any ranks.
Path compression during the operation Find-set. (a) A tree representing a set prior to executing
Find-set(a). (b) The same set after executing Find-set(a). Each node on the find path now points
directly to the root.
• The Find-set procedure is a two-pass method: it makes
one pass up the find path to find the root, and it makes a
second pass back down the find path to update each
node so that it points directly to the root.
Find-set function using path compression:
Find-set(x)
    if x ≠ p[x]
        then p[x] = Find-set(p[x])
    return p[x]
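Both passes can be made explicit in C. The sketch below gives the recursive form of the pseudocode and an equivalent iterative two-pass form; it assumes a root is its own parent, and the names `parent`, `pc_find_set` and `pc_find_set_iter` are illustrative.

```c
#include <assert.h>

#define N 6                     /* hypothetical number of elements, 0..N-1 */

static int parent[N];           /* parent[x] == x exactly when x is a root */

/* Recursive form: the recursion climbs to the root, and the
   assignments on the way back re-point each node at the root. */
int pc_find_set(int x)
{
    assert(x >= 0 && x < N);
    if (x != parent[x])
        parent[x] = pc_find_set(parent[x]);
    return parent[x];
}

/* Iterative form with the two passes written out. */
int pc_find_set_iter(int x)
{
    assert(x >= 0 && x < N);
    int root = x;
    while (root != parent[root])    /* pass 1: locate the root */
        root = parent[root];
    while (x != root) {             /* pass 2: compress the path */
        int next = parent[x];
        parent[x] = root;
        x = next;
    }
    return root;
}
```

The iterative form avoids deep recursion on a long path, which can matter when a tree has degenerated into a chain before the first find.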
DISJOINT SET FORESTS
Either union-by-rank or path compression alone improves the
running time of the operations on disjoint-set forests, and the
improvement is even greater when the two heuristics are used together.
We do not prove it, but if there are n Make-set operations,
at most n − 1 Union operations and f Find-set operations, the
path-compression heuristic alone gives a worst-case running
time of Θ(n + f · (1 + log_{2 + f/n} n)).
When we use union by rank and path compression together, the
worst-case running time is O(m α(m, n)), where α(m, n) is a
very slowly growing function derived from the inverse of the
Ackermann function.
In any application of a disjoint-set data structure, α(m, n) ≤ 4.
Thus, for practical purposes, the running time can be viewed as
linear in m, the number of operations.
ACKERMANN FUNCTION
The Ackermann function A(m, n) is defined for non-negative
integers m and n. Its value grows rapidly, even for small inputs.
For example, A(4, 2) is an integer of 19,729 decimal
digits.
Since the function f(n) = A(n, n) grows very rapidly, its
inverse function, f⁻¹, grows very slowly.
This inverse Ackermann function f⁻¹ is usually denoted
by α. α(m, n) is less than 5 for any practical input size n.
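The slide does not reproduce the definition itself. Assuming the common two-argument form, A(0, n) = n + 1, A(m, 0) = A(m − 1, 1), and A(m, n) = A(m − 1, A(m, n − 1)) otherwise, a direct C transcription looks like this; the function name `ackermann` is illustrative, and the code is only usable for very small arguments (roughly m ≤ 3).

```c
#include <assert.h>

/* Two-argument Ackermann function (assumed definition, see above).
   The recursion depth and the result explode quickly, so call this
   only with tiny arguments. */
unsigned long ackermann(unsigned long m, unsigned long n)
{
    if (m == 0)
        return n + 1;
    if (n == 0)
        return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m, n - 1));
}
```

Under this definition A(2, 3) = 9 and A(3, 3) = 61, while A(4, 2) = 2^65536 − 3 is already far beyond any machine integer, which is consistent with the 19,729-digit figure quoted above.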
