Module 07
TREES
Motivation: A tree is a data structure that helps students store data in memory and access it as
needed. Trees are also used in operating system concepts, i.e., in the core system implementation.
Syllabus:
Huffman algorithm: 1 Hr
Expression trees: 1 Hr
Definitions:-
Tree:-
A tree represents a hierarchy of information.
Trees
91
Data Structure and Files
Parent node: The immediate node joined by a single line segment when traversing up the tree. B is the
parent of D and E.
Children: All immediate nodes joined by a line segment when traversing one level down the tree. D
and E are children of B.
Sibling: Children of the same parent. C is a sibling of B.
Ancestors: All nodes along the path to the root. H, C, and A are ancestors of I.
External node: A node without children, also called a leaf. D, E, F, G, and I are external nodes or leaves.
Internal node: A node with one or more children. A, B, C, and H are internal nodes.
Depth (level): The number of line segments from the root to a node (equivalently, the number of
ancestors). The depth of B is 1. The depth of I is 3.
Leaf nodes: Nodes that do not have any children, also referred to as terminal nodes. (D, E, F, G, and I
are leaf nodes.)
Height: The height of a node is the length of the longest downward path to a leaf from that node. The
height of the root is the height of the tree.
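These two notions can be sketched in Java; the class and method names below are illustrative only (the text defines no such class):

```java
// Minimal node class, assumed for illustration.
class TNode {
    TNode left, right;
    TNode(TNode l, TNode r) { left = l; right = r; }

    // Height of a node: length (in edges) of the longest downward path to a leaf.
    static int height(TNode n) {
        if (n == null) return -1;                 // empty subtree
        return 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        // A chain a -> b -> d, so d is a leaf at depth 2.
        TNode d = new TNode(null, null);
        TNode b = new TNode(d, null);
        TNode a = new TNode(b, null);
        System.out.println(height(a));            // prints 2: height of the tree
        System.out.println(height(d));            // prints 0: a leaf has height 0
    }
}
```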
Trees
92
Data Structure and Files
Subtree: A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T. (This is
different from the formal definition of subtree used in graph theory.) The subtree corresponding to the
root node is the entire tree.
Proper Subtree: The subtree corresponding to any other node is called a proper subtree (in analogy to
the term proper subset).
Directed Edge: A directed edge refers to the link from the parent to the child.
In-Degree: In-degree of a node is the number of edges arriving at that node.
Out-Degree: Out-degree of a node is the number of edges leaving that node.
Learning:-
Students will learn how trees are used in real-life applications and in operating system file structures.
Theory:
Binary Trees
A binary tree is an ordered tree in which every node has at most two children; in a proper (full) binary
tree, every internal node has exactly two children. Recursively, a non-empty binary tree consists of
an internal node (the root of the tree) and two binary trees, the left subtree and
the right subtree, either of which may be empty.
A binary tree can represent an arithmetic expression, with operators at the internal nodes and operands at the leaves.
(number of nodes at level i) ≤ 2^i
The number of nodes n in a perfect binary tree can be found using the formula n = 2^(h+1) − 1, where
h is the height of the tree.
The number of nodes n in a complete binary tree is at minimum n = 2^h and at maximum n = 2^(h+1) − 1,
where h is the height of the tree.
The number of nodes n in a perfect binary tree can also be found using the formula n = 2L − 1,
where L is the number of leaf nodes in the tree.
The number of leaf nodes in a perfect binary tree can be found using the formula 2^h, where h is the
height of the tree.
The number of leaf nodes in a complete binary tree of n nodes is ⌈n/2⌉.
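These formulas can be checked numerically. The short Java sketch below (helper names are assumptions, not from the text) computes node and leaf counts for perfect trees of small height and shows that n = 2L − 1 ties them together:

```java
class PerfectTreeCheck {
    // Number of nodes in a perfect binary tree of height h: 2^(h+1) - 1.
    static int nodes(int h) { return (1 << (h + 1)) - 1; }

    // Number of leaves in a perfect binary tree of height h: 2^h.
    static int leaves(int h) { return 1 << h; }

    public static void main(String[] args) {
        for (int h = 0; h <= 4; h++) {
            int n = nodes(h), L = leaves(h);
            // n = 2L - 1 relates the two counts.
            System.out.println("h=" + h + "  n=" + n + "  leaves=" + L
                               + "  2L-1=" + (2 * L - 1));
        }
    }
}
```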
For any non-empty binary tree with n0 leaf nodes and n2 nodes of degree 2, n0 = n2 + 1.
In general, if nk denotes the number of nodes with exactly k children and B is the maximum number
of children per node, then
n = n0 + n1 + n2 + n3 + ... + n(B−1) + nB
Define:
h = height
n = number of nodes
ni = number of internal nodes
ne = number of external nodes
In a proper binary tree every internal node has exactly two children, so the tree has 2ni edges; since
any tree with n nodes has n − 1 edges, n = 2ni + 1. Using n = ni + ne, it immediately follows that
ni = ne − 1, or
ne = ni + 1 (1)
n = 2ni + 1 (2)
void expandExternal(Position p)
Before: Position p points to an external node storing the string "CJ".
After: Position p points to the same node storing the same object; now, however, the formerly external
node has two external children.
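A minimal Java sketch of this operation, assuming a simple linked node class (the names here are illustrative, not the text's actual Position implementation):

```java
class BNode {
    String element;
    BNode left, right;
    BNode(String e) { element = e; }

    boolean isExternal() { return left == null && right == null; }

    // Turn an external node into an internal node with two empty
    // external children; the stored element is kept unchanged.
    static void expandExternal(BNode p) {
        if (p == null || !p.isExternal()) return;   // only external nodes expand
        p.left  = new BNode(null);
        p.right = new BNode(null);
    }

    public static void main(String[] args) {
        BNode p = new BNode("CJ");
        expandExternal(p);
        System.out.println(p.isExternal());   // prints false: now internal
        System.out.println(p.element);        // prints CJ: same object kept
    }
}
```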
The binary search tree creation follows a very simple principle: to add a new element, compare it with
the current element in the tree. If its value is less than the current element, move to the left of that
element; otherwise move to its right. If there is no subtree on the left, make the new element the left
child of the current element; otherwise compare it with the existing left child and follow the same rule.
Exactly the same is done when the new element is greater than the current element, but with the
right child.
Algorithm:-
Step 1: If the value of the new element < the current element, go to Step 2; otherwise go to Step 3.
Step 2: If the current element does not have a left subtree, make the new element the left child of
the current element; else make the existing left child the current element and go to Step 1.
Step 3: If the current element does not have a right subtree, make the new element the right child of
the current element; else make the existing right child the current element and go to Step 1.
The number of NULL links in a complete binary tree of n nodes is n + 1.
There are three standard depth-first traversals: pre-order, in-order, and post-order.
In each case, the algorithms for traversal are recursive: they call themselves.
Pre-order traversal
1. Visit the root node
2. Traverse the left subtree
3. Traverse the right subtree
DBACFEG
Algorithm
preorder(node)
print node.value
if node.left ≠ null then preorder(node.left)
if node.right ≠ null then preorder(node.right)
In-order traversal
1. Traverse the left subtree
2. Visit the root node
3. Traverse the right subtree
ABCDEFG
Algorithm
inorder(node)
if node.left ≠ null then inorder(node.left)
print node.value
if node.right ≠ null then inorder(node.right)
Post-order traversal
1. Traverse the left subtree
2. Traverse the right subtree
3. Visit the root node
ACBEGFD
Algorithm
postorder(node)
if node.left ≠ null then postorder(node.left)
if node.right ≠ null then postorder(node.right)
print node.value
The preorder, inorder, and postorder traversals are special cases of the Euler tree traversal.
“Walk around” the tree and visit each node three times:
on the left
from below
on the right
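The Euler tour can be written as one recursion that touches each node three times; printing at the "on the left" visit yields the preorder, "from below" the inorder, and "on the right" the postorder. A Java sketch for the example tree (class names are assumptions):

```java
class EulerTour {
    static class Node {
        char val; Node left, right;
        Node(char v, Node l, Node r) { val = v; left = l; right = r; }
    }

    // Walk around the tree, touching each node on the left,
    // from below, and on the right.
    static void tour(Node n, StringBuilder pre, StringBuilder in, StringBuilder post) {
        if (n == null) return;
        pre.append(n.val);                 // on the left  -> preorder
        tour(n.left, pre, in, post);
        in.append(n.val);                  // from below   -> inorder
        tour(n.right, pre, in, post);
        post.append(n.val);                // on the right -> postorder
    }

    public static void main(String[] args) {
        // The example tree: D at the root, B(A, C) on the left, F(E, G) on the right.
        Node root = new Node('D',
            new Node('B', new Node('A', null, null), new Node('C', null, null)),
            new Node('F', new Node('E', null, null), new Node('G', null, null)));
        StringBuilder pre = new StringBuilder(), in = new StringBuilder(),
                      post = new StringBuilder();
        tour(root, pre, in, post);
        System.out.println(pre);   // prints DBACFEG
        System.out.println(in);    // prints ABCDEFG
        System.out.println(post);  // prints ACBEGFD
    }
}
```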
The following Java program implements a binary search tree with insertion, search, deletion, and the three depth-first traversals:
import java.io.*;
import java.util.*;
class NodeTree
{
int data;
NodeTree leftchild;
NodeTree rightchild;
NodeTree root;
public NodeTree()
{
leftchild=null;
rightchild=null;
}
public void insert(int id)
{
NodeTree n= new NodeTree();
n.data=id;
if(root==null)
root=n;
else
{
NodeTree current=root;
NodeTree parent;
while(true)
{
parent = current;
if(id<current.data)
{
current=current.leftchild;
if(current==null)
{
parent.leftchild=n;
return;
}
}
else
{
current=current.rightchild;
if(current==null)
{
parent.rightchild=n;
return;
}
}
}
}
}
public NodeTree find(int key)
{
NodeTree current = root;
while(current != null && current.data != key)
{
if(key<current.data)
current=current.leftchild;
else
current=current.rightchild;
}
return current; // null when the key is not present
}
// delete(): locate the node, then handle the three cases
// (no children, one child, two children).
public boolean delete(int key)
{
NodeTree current = root;
NodeTree parent = root;
boolean isleftchild = true;
while(current != null && current.data != key)
{
parent = current;
if(key < current.data)
{
isleftchild = true;
current = current.leftchild;
}
else
{
isleftchild = false;
current = current.rightchild;
}
}
if(current == null)
return false; // key not found
// Case 1: no children.
if(current.leftchild == null && current.rightchild == null)
{
if(current == root)
root = null;
else if(isleftchild)
parent.leftchild = null;
else
parent.rightchild = null;
}
// Case 2: one child (left only).
else if(current.rightchild == null)
{
if(current == root)
root = current.leftchild;
else if(isleftchild)
parent.leftchild = current.leftchild;
else
parent.rightchild = current.leftchild;
}
// Case 2: one child (right only).
else if(current.leftchild == null)
{
if(current == root)
root = current.rightchild;
else if(isleftchild)
parent.leftchild = current.rightchild;
else
parent.rightchild = current.rightchild;
}
// Case 3: two children -- replace with the in-order successor.
else
{
NodeTree successor = getsuccessor(current);
if(current==root)
root=successor;
else if(isleftchild)
parent.leftchild=successor;
else
parent.rightchild=successor;
successor.leftchild=current.leftchild;
}
return true;
}
public NodeTree getsuccessor(NodeTree deleNode)
{
NodeTree successorparent = deleNode;
NodeTree successor =deleNode;
NodeTree current=deleNode.rightchild;
while(current!=null)
{
successorparent=successor;
successor=current;
current=current.leftchild;
}
if(successor != deleNode.rightchild)
{
successorparent.leftchild=successor.rightchild;
successor.rightchild=deleNode.rightchild;
}
return successor;
}
public void traverse(int ttype)
{
switch(ttype)
{
case 1: System.out.println("\n Preorder traversal:");
preorder(root);
break;
case 2: System.out.println("\n Inorder traversal:");
inorder(root);
break;
case 3: System.out.println("\n Postorder traversal:");
postorder(root);
break;
}
}
void preorder(NodeTree localroot)
{
if(localroot!=null)
{
System.out.println("\n"+localroot.data);
preorder(localroot.leftchild);
preorder(localroot.rightchild);
}
}
void inorder(NodeTree localroot)
{
if(localroot!=null)
{
inorder(localroot.leftchild);
System.out.println(localroot.data);
inorder(localroot.rightchild);
}
}
void postorder(NodeTree localroot)
{
if(localroot!=null)
{
postorder(localroot.leftchild);
postorder(localroot.rightchild);
System.out.println(localroot.data);
}
}
public static void main(String[] arg)
{
NodeTree t = new NodeTree();
t.insert(50);
t.insert(25);
t.insert(75);
t.insert(10);
t.insert(30);
t.insert(60);
t.insert(80);
int n;
System.out.println("Enter a traversal method:\n");
System.out.println("1 Preorder\n 2 Inorder\n 3 Postorder\n");
Scanner sc= new Scanner(System.in);
n = sc.nextInt();
t.traverse(n);
}
}
A threaded binary tree makes it possible to traverse the values in the binary tree via a linear traversal that
is more rapid than a recursive in-order traversal.
It is also possible to discover the parent of a node from a threaded binary tree, without explicit use of
parent pointers or a stack, albeit slowly. This can be useful where stack space is limited, or where a stack of
parent pointers is unavailable (for finding the parent pointer via DFS).
This is possible, because if a node (k) has a right child (m) then m's left pointer must be either a child, or a
thread back to k. In the case of a left child, that left child must also have a left child or a thread back to k,
and so we can follow m's left children until we find a thread, pointing back to k. The situation is similar for
when m is the left child of k
A threaded tree, with the special threading links shown by dashed arrows.
The node structure for a threaded binary tree varies a bit; it looks like this:
struct NODE
{
    struct NODE *leftchild;
    int node_value;
    struct NODE *rightchild;
    struct NODE *thread;
};
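In Java, the same idea is commonly expressed with a flag that marks whether the right link is a real child or a thread to the in-order successor (a variant of the struct above; all names here are assumptions). The sketch below performs the stack-free in-order traversal described in the text:

```java
class ThreadedTree {
    static class Node {
        int val;
        Node left, right;
        boolean rightIsThread;    // true: right points to the in-order successor
        Node(int v) { val = v; }
    }

    // Non-recursive in-order traversal using threads: no stack needed.
    static void inorder(Node root, StringBuilder out) {
        Node cur = root;
        while (cur != null && cur.left != null) cur = cur.left;   // leftmost node
        while (cur != null) {
            out.append(cur.val).append(' ');
            if (cur.rightIsThread) {
                cur = cur.right;                  // follow thread to successor
            } else {
                cur = cur.right;                  // real right child:
                while (cur != null && cur.left != null)
                    cur = cur.left;               // descend to its leftmost node
            }
        }
    }

    public static void main(String[] args) {
        // Hand-threaded tree: 2 at the root, children 1 and 3.
        Node n2 = new Node(2), n1 = new Node(1), n3 = new Node(3);
        n2.left = n1; n2.right = n3;
        n1.right = n2; n1.rightIsThread = true;   // thread 1 -> 2
        StringBuilder out = new StringBuilder();
        inorder(n2, out);
        System.out.println(out.toString().trim());   // prints 1 2 3
    }
}
```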
Advantages:
• The traversal operation is faster than in the unthreaded version, because a threaded binary tree
permits a non-recursive implementation, which can run faster and does not require the bother of
stack management.
• The second advantage is more subtle: with a threaded binary tree we can efficiently determine the
predecessor and successor of any node, starting from that node. In an ordinary binary tree a stack is
required to provide upward-pointing information, whereas in a threaded binary tree the same can be
done through the threads, without incurring the overhead of a stack.
• Any node can be reached from any other node. Threads usually point upward whereas links point
downward, so in a threaded tree one can move in either direction, and the nodes are in fact
circularly linked. This is not possible in the unthreaded counterpart, where we can move only
downward, starting from the root.
Disadvantages:
• Insertions into and deletions from a threaded tree are more time-consuming operations, since both
the links and the threads must be manipulated.
• Slower tree creation, since threads need to be maintained. This can partly be alleviated by constructing
the tree as a non-threaded tree and then threading it with a special libavl function.
• In theory, threaded trees need two extra bits per node to indicate whether each child pointer points to
an ordinary node or the node's successor/predecessor node. In libavl, however, these bits are stored in
a byte that is used for structure alignment padding in non-threaded binary trees, so no extra storage is
used.
Huffman Algorithm
Huffman's algorithm is a method for building an extended binary tree with a minimum weighted path
length from a set of given weights. Initially construct a forest of singleton trees, one associated with each
weight. If there are at least two trees, choose the two trees with the least weight associated with their
roots and replace them with a new tree, constructed by creating a root node whose weight is the sum of
the weights of the roots of the two trees removed, and setting the two trees just removed as this new
node's children. This process is repeated until the forest consists of one tree.
Pseudocode
Huffman (W, n)
Input: A list W of n weights.
Output: An extended binary tree T with weights taken from W that gives the minimum
weighted path length.
Procedure:
Create list F from singleton trees formed from the elements of W
WHILE (F has more than one element) DO
    Find T1, T2 in F that have minimum values associated with their roots
    Construct new tree T by creating a new node and setting T1 and T2 as its children
    Let the sum of the values associated with the roots of T1 and T2 be associated with the root of T
    Add T to F
OD
Huffman := tree stored in F
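The pseudocode maps naturally onto Java's PriorityQueue, which plays the role of the forest F. A minimal sketch (class names are assumptions):

```java
import java.util.PriorityQueue;

class Huffman {
    static class Tree {
        int weight;
        Tree left, right;          // both null for a singleton (leaf) tree
        Tree(int w) { weight = w; }
        Tree(Tree a, Tree b) { weight = a.weight + b.weight; left = a; right = b; }
    }

    // Build the Huffman tree for the given weights.
    static Tree build(int[] weights) {
        PriorityQueue<Tree> f = new PriorityQueue<>((a, b) -> a.weight - b.weight);
        for (int w : weights) f.add(new Tree(w));     // forest of singleton trees
        while (f.size() > 1) {
            Tree t1 = f.poll(), t2 = f.poll();        // the two minimum-weight trees
            f.add(new Tree(t1, t2));                  // merged tree goes back into F
        }
        return f.poll();                              // the single remaining tree
    }

    public static void main(String[] args) {
        Tree root = build(new int[]{45, 13, 12, 16, 9, 5});
        System.out.println(root.weight);   // prints 100: the sum of all weights
    }
}
```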
Huffman's scheme uses a table of frequency of occurrence for each symbol (or character) in the input. This
table may be derived from the input itself or from data which is representative of the input. For instance,
the frequency of occurrence of letters in normal English might be derived from processing a large number
of text documents and then used for encoding all text documents. We then need to assign a variable-length
bit string to each character that unambiguously represents that character. This means that no
character's encoding may be a prefix of another character's encoding. If the characters are arranged
as the leaves of a binary tree, each encoding is simply the path from the root to the character's leaf,
and this prefix property holds automatically.
For example:
String    Encoding
TEA       10 00 010
SEA       011 00 010
TEN       10 00 110
[Figure: encoding tree for ETASNO, which assigns E = 00, T = 10, A = 010, S = 011, N = 110]
Notes:
1. As desired, the highest frequency letters - E and T - have two digit encodings, whereas all the others
have three digit encodings.
2. Encoding would be done with a lookup table.
A divide-and-conquer approach might have us asking which characters should appear in the left and right
subtrees and trying to build the tree from the top down. As with the optimal binary search tree, this will
lead to an exponential time algorithm.
A greedy approach instead places our n characters in n sub-trees and starts by combining the two least-
weight nodes into a tree which is assigned the sum of the two leaf node weights as the weight for
its root node.
1. Begin with each character in its own single-node tree, arranged in order of increasing frequency.
2. Combine the two lowest frequencies, F and E, to form a sub-tree of weight 14.
3. Move it into its correct place.
4. Again combine the two lowest frequencies, C and B, to form a sub-tree of weight 25.
5. Move it into its correct place.
6. Now the sub-tree with weight, 14, and D are combined to make a tree of weight, 30.
7. Move it to its correct place.
8. Now the two lowest weights are held by the "25" and "30" sub-trees, so combine them to make
one of weight, 55.
9. Move it after the A.
10. Finally, combine the A and the "55" sub-tree to produce the final tree.
The encoding table is:
A 0
C 100
B 101
F 1100
E 1101
D 111
The time complexity of the Huffman algorithm is O(n log n). Using a heap to store the weight of
each tree, each iteration requires O(log n) time to determine the cheapest weight and insert the
new weight. There are O(n) iterations, one for each item.
"How do we decode a Huffman-encoded bit string? With these variable-length strings, surely it's not
possible to break an encoded string of bits back into characters!" In fact it is possible: because no
codeword is a prefix of another, we read bits from the start, walking down the encoding tree until a
leaf is reached; that character is emitted and decoding restarts at the root.
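A small Java sketch of this decoding, using the E/T/A/S/N codes from the example above (the helper names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class HuffmanDecode {
    // Prefix-free codes taken from the ETASNO example in the text.
    static Map<String, Character> exampleCode() {
        Map<String, Character> code = new HashMap<>();
        code.put("00", 'E'); code.put("10", 'T'); code.put("010", 'A');
        code.put("011", 'S'); code.put("110", 'N');
        return code;
    }

    // Accumulate bits until they match a codeword; because the code is
    // prefix-free, the first match is the only possible one.
    static String decode(String bits, Map<String, Character> code) {
        StringBuilder out = new StringBuilder(), cur = new StringBuilder();
        for (char b : bits.toCharArray()) {
            cur.append(b);
            Character c = code.get(cur.toString());
            if (c != null) { out.append(c); cur.setLength(0); }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("1000010", exampleCode()));   // prints TEA
    }
}
```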
Analysis:
Each iteration of Huffman's algorithm reduces the size of the problem by 1, and so there are exactly
n − 1 iterations. The i-th iteration consists of locating the two minimum values in a list of length
n − i + 1. This is a linear operation, and so this naive version of Huffman's algorithm has a time
complexity of O(n²).
However, it would be faster to sort the weights initially, and then maintain two lists. The first list consists of
weights that have not yet been combined, and the second list consists of trees that have been formed by
combining weights. This initial ordering is obtained at a cost of O(n log n). Obtaining the minimum two trees
at each step then consists of two comparisons (compare the heads of the two lists, and then compare the
larger to the item after the smaller). The ordering of the second list can be maintained cheaply by using a
binary search to insert new elements. Since at step i there are i − 1 elements in the second list, O(log i)
comparisons are needed for insertion. Over the entire duration of the algorithm the cost of keeping this list
sorted is O(n log n). Therefore the overall time complexity of Huffman's algorithm is O(n log n).
In terms of space complexity, the algorithm constructs an extended binary tree with exactly n leaves,
and such a tree has exactly 2n − 1 nodes. Thus Huffman's algorithm requires linear space.
Multiway Trees:
B-Tree:
B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and
deletions in logarithmic amortized time. The B-tree is a generalization of a binary search tree in that more
than two paths diverge from a single node. (Comer, p. 123) Unlike self-balancing binary search trees, the B-
tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and
file systems.
Definition:
A B-tree of order m (the maximum number of children for each node) is a tree which satisfies the following
properties:
• Every node has at most m children.
• Every internal node, except the root, has at least ⌈m/2⌉ children.
• The root has at least two children, unless it is a leaf.
• An internal node with k children contains k − 1 keys.
• All leaf nodes appear on the same level.
Each internal node's elements act as separation values which divide its subtrees. For example, if an internal
node has three child nodes (or subtrees) then it must have two separation values or elements a1 and a2. All
values in the leftmost subtree will be less than a1 , all values in the middle subtree will be between a1 and
a2, and all values in the rightmost subtree will be greater than a2.
Internal nodes in a B-tree – nodes which are not leaf nodes – are usually represented as an ordered set of
elements and child pointers. Every internal node contains a maximum of U children and – other than the
root – a minimum of L children. For all internal nodes other than the root, the number of elements is one
less than the number of child pointers; the number of elements is between L−1 and U−1. The number U
must be either 2L or 2L−1; thus each internal node is at least half full. This relationship between U and L
implies that two half-full nodes can be joined to make a legal node, and one full node can be split into two
legal nodes (if there is room to push one element up into the parent). These properties make it possible to
delete and insert new values into a B-tree and adjust the tree to preserve the B-tree properties.
Leaf nodes have the same restriction on the number of elements, but have no children, and no child
pointers.
The root node still has the upper limit on the number of children, but has no lower limit. For example, when
there are fewer than L−1 elements in the entire tree, the root will be the only node in the tree, and it will
have no children at all.
A B-tree of depth n+1 can hold about U times as many items as a B-tree of depth n, but the cost of search,
insert, and delete operations grows with the depth of the tree. As with any balanced tree, the cost grows
much more slowly than the number of elements.
Some balanced trees store values only at the leaf nodes, and so have different kinds of nodes for leaf nodes
and internal nodes. B-trees keep values in every node in the tree, and may use the same structure for all
nodes. However, since leaf nodes never have children, a specialized structure for leaf nodes in B-trees will
improve performance.
B+ TREES:
B+ tree (B plus Tree) is a type of tree which represents sorted data in a way that allows for efficient
insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel
index, with maximum and minimum bounds on the number of keys in each index segment (usually called a
"block" or "node"). In a B+ tree, in contrast to a B-tree, all records are stored at the leaf level of the tree;
only keys are stored in interior nodes.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context,
in particular file systems. This is primarily because, unlike binary search trees, B+ trees have very high fanout
(typically on the order of 100 or more), which reduces the number of I/O operations required to find an
element in the tree.
A simple B+ tree example linking the keys 1-7 to data values d1-d7. Note the linked list (red) allowing rapid
in-order traversal.
The algorithm to perform a search for a record r follows pointers to the correct child of each node until a
leaf is reached. Then, the leaf is scanned until the correct record is found (or until failure).
function search(record r)
    u := root
    while (u is not a leaf) do
        choose the correct pointer in the node
        move to the first node following the pointer
        u := current node
    scan u for r
Insertion
Insert the new record into the correct leaf bucket, keeping the bucket sorted. If the bucket overflows:
    allocate a new leaf and move half the bucket's elements to the new bucket
    insert the new leaf's smallest key and address into the parent
    if the parent overflows, split it in the same way and repeat up the tree
    if the root splits, create a new root which has one key and two pointers
Characteristics
Removing a (previously located) record requires O(log_b n) operations in the worst case.
Performing a range query with k elements occurring within the range requires O(log_b n + k)
operations in the worst case.
AVL TREE:
An AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of
any node differ by at most one; therefore, it is also said to be height-balanced. Lookup, insertion, and
deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the
tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more
tree rotations.
The balance factor of a node is the height of its left subtree minus the height of its right subtree
(sometimes opposite) and a node with balance factor 1, 0, or −1 is considered balanced. A node with any
other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is
either stored directly at each node or computed from the heights of the subtrees.
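Computing the balance factor from subtree heights can be sketched briefly (the node class here is an assumption):

```java
class AvlCheck {
    static class Node {
        Node left, right;
        Node(Node l, Node r) { left = l; right = r; }
    }

    // Height in edges; an empty subtree has height -1.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Balance factor: height(left subtree) - height(right subtree).
    static int balance(Node n) { return height(n.left) - height(n.right); }

    public static void main(String[] args) {
        // A left chain of two nodes under the root, nothing on the right.
        Node root = new Node(new Node(new Node(null, null), null), null);
        System.out.println(balance(root));   // prints 2: unbalanced, needs rotation
    }
}
```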
AVL trees are often compared with red-black trees because they support the same set of operations and
because red-black trees also take O(log n) time for the basic operations. AVL trees perform better than red-
black trees for lookup-intensive applications.
INSERTION:
After inserting a node, it is necessary to check each of the node's ancestors for consistency with the rules of
AVL. For each node checked, if the balance factor remains −1, 0, or +1 then no rotations are necessary.
However, if the balance factor becomes ±2 then the subtree rooted at this node is unbalanced. If insertions
are performed serially, after each insertion, at most two tree rotations are needed to restore the entire
tree to the rules of AVL.
There are four cases which need to be considered, of which two are symmetric to the other two. Let P be
the root of the unbalanced subtree. Let R be the right child of P. Let L be the left child of P.
Right-Right case and Right-Left case: If the balance factor of P is −2, then the right subtree outweighs the
left subtree of the given node, and the balance factor of the right child (R) must be checked. If the balance
factor of R is ≤ 0, a left rotation is needed with P as the root. If the balance factor of R is +1, a double left
rotation (with respect to P) is needed. The first rotation is a right rotation with R as the root. The second is
a left rotation with P as the root.
Left-Left case and Left-Right case: If the balance factor of P is +2, then the left subtree outweighs the right
subtree of the given node, and the balance factor of the left child (L) must be checked. If the balance factor
of L is ≥ 0, a right rotation is needed with P as the root. If the balance factor of L is −1, a double right
rotation (with respect to P) is needed. The first rotation is a left rotation with L as the root. The second is a
right rotation with P as the root.
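The two single rotations used in these cases amount to pointer rewiring; a Java sketch (names are assumptions, and a real AVL implementation would also update stored heights or balance factors):

```java
class Rotations {
    static class Node {
        int val; Node left, right;
        Node(int v, Node l, Node r) { val = v; left = l; right = r; }
    }

    // Left rotation with p as the root: p's right child r moves up.
    static Node rotateLeft(Node p) {
        Node r = p.right;
        p.right = r.left;    // r's left subtree becomes p's right subtree
        r.left = p;          // p becomes r's left child
        return r;            // r is the new root of this subtree
    }

    // Mirror image: right rotation with p as the root.
    static Node rotateRight(Node p) {
        Node l = p.left;
        p.left = l.right;
        l.right = p;
        return l;
    }

    public static void main(String[] args) {
        // Right-Right case: the chain 1 -> 2 -> 3 leaning right.
        Node p = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        Node root = rotateLeft(p);
        System.out.println(root.val);         // prints 2
        System.out.println(root.left.val);    // prints 1
        System.out.println(root.right.val);   // prints 3
    }
}
```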
Pictorial descriptions of the rotation cases show how a rotation rebalances the tree; after rotating, one
retraces the path toward the root, updating the balance factors of the nodes.
Deletion
If the node is a leaf, remove it. If the node is not a leaf, replace it with either the largest in its left subtree
(inorder predecessor) or the smallest in its right subtree (inorder successor), and remove that node. The
node that was found as a replacement has at most one subtree. After deletion, retrace the path back up
the tree (parent of the replacement) to the root, adjusting the balance factors as needed.
As with all binary search trees, a node's in-order successor is the left-most node of its right subtree, and a
node's in-order predecessor is the right-most node of its left subtree. In either case, this node will have zero
or one children. Delete it according to one of the two simpler cases above.
In addition to the balancing described above for insertions, if the balance factor of P is +2 and that of
its left child is 0, a right rotation must be performed on P. The mirror of this case is also necessary.
The retracing can stop if the balance factor becomes −1 or +1 indicating that the height of that subtree has
remained unchanged. If the balance factor becomes 0 then the height of the subtree has decreased by one
and the retracing needs to continue. If the balance factor becomes −2 or +2 then the subtree is unbalanced
and needs to be rotated to fix it. If the rotation leaves the subtree's balance factor at 0 then the retracing
towards the root must continue since the height of this subtree has decreased by one. This is in contrast to
an insertion where a rotation resulting in a balance factor of 0 indicated that the subtree's height has
remained unchanged.
The time required is O(log n) for lookup, plus a maximum of O(log n) rotations on the way back to the root,
so the operation can be completed in O(log n) time.
Objective Questions:-
A. The number of nodes on the longest path from the root to a leaf.
a) Height of tree b) Depth of tree c) length of tree
a) Binary Tree b) Full Binary Tree c) Complete Binary Tree
Subjective Questions:-
ADT tree
Binary Tree
References:-
Comer, D., "The Ubiquitous B-Tree", ACM Computing Surveys, 11(2), June 1979.