Definitions and Motivation
The Algorithm
Time Complexity
References
Kruskal’s Algorithm
Melanie Ferreri
February 8, 2021
Melanie Ferreri Kruskal’s Algorithm 1 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Preliminary Definitions
Definition
A graph G = (V, E) consists of a finite set V of vertices, and a set E
of two-element subsets of V , which represent edges between
vertices. We sometimes denote the vertex set of G by V (G) and the
edge set by E(G). A weighted graph is a graph whose edges have
each been assigned a numerical weight. The weight of a graph is the
sum of the weights of its edges.
Example (A weighted graph G)
b 4
3 a
2
c 5
1
1 e
d
Melanie Ferreri Kruskal’s Algorithm 2 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Preliminary Definitions
Definition
A tree is an acyclic connected graph (there is a path between any
pair of vertices, and there are no cycles).
A spanning tree of a graph G is a spanning subgraph H (the vertex
set of H is the same as V (G)) such that H is a tree.
A minimum spanning tree (MST) of G is a spanning tree whose
weight is minimum among all possible spanning trees of G.
Melanie Ferreri Kruskal’s Algorithm 3 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Examples of spanning trees
Our G from earlier was:
b 4
3 a
2
c 5
1
1 e
d
A spanning tree and minimum spanning tree of G are as follows:
b 4 b
3 a 3 a
2
c 5 c
1
1 e 1 e
d d
Melanie Ferreri Kruskal’s Algorithm 4 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Some Facts
MST’s do not have to be unique.
Every connected graph G has a spanning tree:
If G is already a tree, then it is its own spanning tree. If G has
cycles, then we can remove edges from cycles in a way that
keeps G connected, until there are no more cycles, and we end
up with a spanning tree.
Melanie Ferreri Kruskal’s Algorithm 5 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Motivation for Kruskal’s algorithm
Suppose we have some set of villages, and we want to connect them
by roads in the least expensive way possible.
Some roads would be more expensive to construct than others,
based on the land they would go through, the labor required, etc.
We can view villages as vertices, and roads as edges, weighted by
the costs of building them.
The cheapest way to ensure that we can get from any village A to any
village B is by building roads in the shape of a minimum spanning
tree.
Melanie Ferreri Kruskal’s Algorithm 6 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Kruskal’s Algorithm
Start with a connected weighted graph G.
Sort the edges of G by weight in increasing order.
Let T be the graph whose vertex set is V (G) and whose edge
set is intialized to empty.
Let e1 be any edge of minimum weight in E(G), and add it to
E(T ). Then, let e2 be any remaining edge of minimum weight,
and add that to E(T ).
Let e3 be a remaining edge of minimum weight which does not
form a cycle with the already chosen edges. Add e3 to E(T ).
Keep adding minimum-weight edges that don’t form a cycle, until
we obtain a spanning tree T .
Melanie Ferreri Kruskal’s Algorithm 7 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Kruskal’s Algorithm Works
Theorem
Kruskal’s algorithm produces a minimum spanning tree in a
connected weighted graph.
Proof.
Let G be a connected weighted graph with n vertices. Let T be a
spanning tree obtained by Kruskal’s algorithm, where
E(T ) = {e1 , e2 , . . . , en−1 } ordered by selection in constructing the
tree. Let w(ei ) be the weight of ei , so w(e1 ) ≤ w(e2 ) ≤ · · · ≤ w(en−1 ),
n−1
X
and w(T ) = w(ei ).
i=1
We want to show that T is a MST. Suppose to the contrary that T is
not minimum, and let H be a MST which has the most possible edges
in common with T among all MST’s.
Melanie Ferreri Kruskal’s Algorithm 8 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Kruskal’s Algorithm Works
Proof (continued).
Since H 6= T , there is some edge on T that is not on H. Let ei be the
first such edge in E(T ). So for all j < i, ej is in both H and T .
Define G0 = H + ei . Since H is a tree and we are adding an edge,
this produces a cycle, so G0 has a cycle C. T has no cycle, so there
must be some edge e0 on C that is not on T . Then let T0 = G0 − e0 ,
so T0 is another spanning tree of G. Also,
w(T0 ) = w(H) + w(ei ) − w(e0 ). Since H is a MST, w(H) ≤ w(T0 );
thus w(H) ≤ w(H) + w(ei ) − w(e0 ), so w(e0 ) ≤ w(ei ). By Kruskal’s
algorithm, w(e0 ) = w(ei ) if i = 1. If i > 1, then ei is an edge that can
be added to {e1 , . . . , ei−1 } without producing a cycle, but also e0 can
be added without producing a cycle. So by Kruskal’s algorithm,
w(ei ) ≤ w(e0 ) since edges are chosen in order of weight. Thus for
i > 1, w(ei ) = w(e0 ) as well. So w(T0 ) = w(H), so T0 is a MST, and
T0 has more edges in common with T than H does, which is a
contradiction.
Melanie Ferreri Kruskal’s Algorithm 9 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Example
We will use Kruskal’s algorithm to find a minimum spanning tree of
the following graph:
a
4 7
b c
2 1
d
7 3
2
e
6 1
5
f g
Melanie Ferreri Kruskal’s Algorithm 10 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Example (continued)
We start out with the graph T having no edges, then we add the first
two edges of minimum weight.
a a
b c b 1 c
d d
e e 1
f g → f g
Melanie Ferreri Kruskal’s Algorithm 11 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Example (continued)
Then, we add the next-lowest weighted edges, making sure to only
include them if they do not produce a cycle.
a a
b 1 c b 2 1 c
d d
2 2
e 1 e 1
f g → f g
Melanie Ferreri Kruskal’s Algorithm 12 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Example (continued)
The next-lowest weighted edge is between c and g with weight 3, but
if we add it, it would make a cycle, so we go with the edge of weight 4
instead.
4 a 4 a
b 2 1 c b 2 1 c
d d
2 2
e 1 e 1
5
f g → f g
Now T is connected, and we have a minimum spanning tree of weight
15.
Melanie Ferreri Kruskal’s Algorithm 13 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
What contributes to the time complexity of Kruskal’s
algorithm?
The major contributors to the complexity of Kruskal’s algorithm are:
Sorting the edges by weight
Checking whether a potential new edge forms a cycle with the
previously selected edges
Suppose |V (G)| = n and |E(G)| = m.
We can sort the edges with MergeSort, which takes O(m log m) time.
For checking for cycles, we can use a Union Find data structure to
manage the edges of the MST.
Melanie Ferreri Kruskal’s Algorithm 14 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Union Find
A Union Find or "disjoint sets" data structure manages a list of objects
that is partitioned into connected subsets.
In the case of creating a spanning tree, we want each new edge to
connect two separate connected components of the in-progress MST
T . (If this is not the case, then we would be adding an edge to an
already connected component, thereby creating a cycle.)
So we can set up the vertices of the graph in a Union Find structure
so that we know which pieces of the graph are connected to one
another, and avoid adding edges between vertices which are already
connected.
Melanie Ferreri Kruskal’s Algorithm 15 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
How Union Find works
We start out by making an array Arr[] of size n. Arr[i] will keep
track of the parent of vertex i. We will use this array to build trees
which represent the connected components of the graph.
Union Find has two operations: union, and find. When we call
Union(a,b), we set vertex b as the parent of vertex a, so Arr[a] =
b. The Find operation tells us whether two vertices are connected,
by checking if they are in the same tree. This is done by following up
the parents of each vertex until we reach a vertex which is its own
parent - this is the root of the tree. If two vertices are in the same tree,
we will get the same root, and know they are connected.
Melanie Ferreri Kruskal’s Algorithm 16 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Example of Union Find
We will number the vertices by alphabetical order (so a = 0, b = 1,
etc). Our array starts out with A[i] = i for all i. Then when we add an
edge to T , we make one of the vertices on that edge a parent of the
other. At the third step of our last example, we had this:
a
b 1 c
d
2
e 1
f g
So Arr[] looked something like [ 0 1 2 2 3 5 4 ], where c is
the parent of d, d is the parent of e, and e is the parent of g, while a, b,
c and f are their own parents.
Melanie Ferreri Kruskal’s Algorithm 17 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Example of Union Find (continued)
b 1 c
d
2
e 1
f g
Then when we wanted to add the next-smallest edge of weight 3
between c and g, we called Find(2,6) and it returned true since c
and g are both in the tree rooted at c, so they are connected. Thus we
didn’t add that edge and moved ahead in our sorted list.
Melanie Ferreri Kruskal’s Algorithm 18 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Overall Complexity
There is a variation on union find which builds the trees in a balanced
way, by comparing the sizes of two subsets before unioning them to
determine how to join them in the data structure. Without a balanced
tree, the union and find operations can take O(n) time, but with a
balanced tree, this is reduced to O(log n).
At each step of Kruskal’s algorithm, we have to use the union and find
operations in order to check if our potential new edge will create a
cycle. We will have to iterate through at most all m edges of G, which
contributes m ∗ O(log n) = O(m log n) time. We also know sorting the
edges by weight takes O(m log m) time, so in total we have time
complexity O(m log m + m log n). There can be at most n2 edges of
G; thus log(m) is at most 2 log(n), so O(log m) = O(log n) in this case.
Thus, the total time complexity of Kruskal’s algorithm is O(m log m).
Melanie Ferreri Kruskal’s Algorithm 19 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
Applications and Conclusion
In conclusion, Kruskal’s algorithm runs in O(m log m) or O(m log n)
time (they are on the same order of magnitude for simple graphs).
This algorithm has applications in network design and other problems
like cluster analysis. It can also be used to approximate a solution to
the traveling salesperson problem (finding a shortest path which visits
each vertex at least once).
Thank you for listening!
Melanie Ferreri Kruskal’s Algorithm 20 / 21
Definitions and Motivation
The Algorithm
Time Complexity
References
References
Applications of minimum spanning tree problem. https://www.geeksforgeeks.
org/applications-of-minimum-spanning-tree/, a.
Kruskal’s minimum spanning tree algorithm | greedy algo-2.
https://www.geeksforgeeks.org/
kruskals-minimum-spanning-tree-algorithm-greedy-algo-2/, b.
Disjoint set (or union-find) | set 1 (detect cycle in an undirected graph).
https://www.geeksforgeeks.org/union-find/, c.
Omar Khaled Abdelaziz Abdelnabi. Minimum spanning tree.
https://www.hackerearth.com/practice/algorithms/graphs/
minimum-spanning-tree/tutorial/.
G. Chartrand and P. Zhang. A First Course in Graph Theory. Dover Books on
Mathematics. Dover Publications, 2013. ISBN 9780486297309. URL
https://books.google.com/books?id=zA_CAgAAQBAJ.
Prateek Garg. Disjoint set union (union find). https://www.hackerearth.com/
practice/notes/disjoint-set-union-union-find/.
Melanie Ferreri Kruskal’s Algorithm 21 / 21