FILES ORGANIZATION
The tree structures
Outlines
• Introduction
• M-ary search trees
• B-Trees
• Conclusion
Introduction
• Index table methods are limited to certain types of files (small files or
static files). Methods based on tree structures are better suited for
large and/or dynamic files. To better utilize block space, m-ary trees
are employed.
M-ary Search Trees
• An m-ary search tree of order n is a tree where each node can have at
most n children (Children[1..n]) and n-1 values (Val[1..n-1]). The
values within a node are ordered in ascending order.
M-ary Search Trees
• The children are organized based on the node values, following the rules:
• i) Child[1] points to a subtree containing values < Val[1]
• ii) Child[i] points to a subtree containing values > Val[i-1] and < Val[i], for
i=2..n-1
• iii) Child[n] points to a subtree containing values > Val[n-1]
• The degree of a node (the number of children) is the number of stored
values plus one.
• The figure depicts an m-ary search tree of order 5 with the root node a.
• Node b is the first child of a, and f is the fifth child of a. An internal node
may have some children set to nil while others are not. For instance, all
children of b except the third one are set to nil. Similarly, the first, third,
and fourth children of node d are set to nil, while its second child (pointing
to h) and fifth child (pointing to i) are not nil.
Note
• The general structure of a block is as follows:
• Type BlockType = Structure
• Val: array[1..N-1] of SomeType; // records or (keys-address)
• Children: array[1..N] of integer; // block numbers
• Degree: integer; // number of elements in Children
• End
• Among the characteristics of a file viewed as a tree, there is the root.
It is the block number containing the root of the tree. For example, if
the tree in the previous figure were a file, the content of block d
would be as follows:
Val 42 50 55 60 5 Val 10 2
Child -1 h -1 -1 i Degree Child -1 -1 Degree
1 2 3 4 5 1 2 3 4 5
Search
• The search for a value C begins in the root node P and continues
along a branch:
• 1- If C exists in P, then stop successfully.
• 2- If C does not exist in P, let k be the position in P where C should be
inserted (to keep the values ordered).
• P := Children[k]
• If P is not Nil, then go to 1; otherwise, stop with failure.
• For example, the search for 48 in the tree from the previous figure proceeds as
follows:
• Start the search in node a. The value 48 does not exist and is between 40 and 67.
So, the next node to visit is Children[3] (node d).
• Continue the search in node d, and the next node to explore is Children[2]
because 48 is between 42 and 50.
• Continue the search in node h. The value 48 exists (Val[2]), so stop successfully.
• If we were searching for the value 15, we would have first visited node a, then
node b (Children[1] of a, because 15 < 24). There, we would stop with failure
because 15 is between 12 and 20, and Children[4] of b is nil.
Insert
• To insert a new value V into an m-ary search tree, we first search for the value. The
search also returns the index k where V should be inserted if there is space in node P.
• If P is not full,
➢insert V into P, shifting elements to keep the array of values ordered. Else (P is
full):
➢allocate a new node Q containing a single value (Val[1] = V) and two children set
to nil (Children[1] = nil and Children[2] = nil).
➢make the Children[k] of P point to the new node Q (which was, before the
insertion, necessarily nil). The index k is the one returned by the search.
• For example, if we insert the value 64 into the tree from the previous
figure, we proceed as follows:
1. Search for 64 → failure (the last visited node is i, and the position
where 64 should be inserted is k=2).
2. Since the node i is not full, we can insert 64 at position 2 by shifting
to the right the values > 64.
• If we insert the value 62, we will allocate a new node (j) that contains
62, and it will be pointed to by the Children[1] of node i (where i is the
last visited node in the search for 62, and the position where 62 should
be inserted in i, if there were space, is k = 1).
B-Trees
• These are m-ary search trees that always remain balanced and are
therefore very useful for managing large and dynamic files. For
simplicity, an odd order is chosen (N = 2d+1). In a B-Tree of order N:
• All nodes except the root are filled at least 50% (i.e., d values).
• The root can contain a minimum of 1 value.
• All branches have the same length (completely balanced tree).
• Search in a B-Tree is similar to searching in an m-ary search tree. The
difference lies in the insertion and deletion algorithms.
Insertion Algorithm
• To insert a value V into a B-Tree with root R, proceed as follows:
1. Insert V and its right child (initially set to -1), i.e., if V is inserted at position j in a node, then the
right child RC should also be inserted at position j+1 in the array of children.
2. Search for V to verify that it does not exist and find the last visited leaf node (P).
3. If P is not full, insert (V, RC) into P by internal shifting and stop.
4. If P is already full, it will "split" into two nodes (P and Q, a new allocated node):
a) Form the "ordered sequence" of values in P along with the value V.
b) Assign the first d values with the first d+1 children to node P.
c) Assign the last d values with the last d+1 children to the new node Q.
d) Insert the middle value (the one at position d+1 in the ordered sequence) into the parent
node of P. The right child of this value will be node Q.
• Go to 2.
• For example, the insertion of 76 into the B-Tree from the previous
figure proceeds as follows:
1. Search for 76 → found = FALSE, the last visited node = e (it's a
leaf), and the position where 76 should be inserted is 2.
2. Since the node e is not full, insert 76 with its right child -1 at
positions 2 in the Val array and 3 in the children array of node e.
• If we now insert the value 57, the last visited node in the search will be d. The position of 57 in d should then
be 4 (and its RC at position 5).
• Since node d is already full, there will be a "split" (meaning, allocation of a new node, let's say f):
a) Form the "ordered sequence":
• Val: 42, 50, 55, 57, 60
• Fils: -1, -1, -1, -1, -1, -1
b) Assign the first d values (42, 50) and the first d+1 children (-1, -1, -1) to node d.
• Assign the last d values (57, 60) with the last d+1 children (-1, -1, -1) to the new node f.
• Insert the middle value (55) with its right child (f) into the parent node (a).
• Since node a is not full, the insertion of (55, f) can be done by internal shifting.
• If we continue the insertion of 7, node b will "split" (resulting in the allocation of a new node,
let's call it g). The ordered sequence of values in b plus the value 7 is formed:
• Val: 2, 5, 7, 12, 20
• Fils: -1, -1, -1, -1, -1, -1
• The values (2 and 7) with the first 3 children remain in b. The values (12 and 20) with the last 3
children will be assigned to the new node g. The middle value (7) with its right child node g is
inserted into the parent node (a).
• As node a is also full, it will "split" in turn. A new node h is then allocated, and the ordered
sequence is formed:
• Val: 7, 24, 40, 55, 70
• Fils: b, g, c, d, f, e
• The first d values (7 and 24) with the first 3 children (b, g, and c) remain in a. The last d values
(55 and 70) with the last 3 children (d, f, and e) are assigned to the new node h. The middle
value (40) with its right child node h is inserted into the parent of node a.
• As the node a was the root of the tree (thus having no parent), a new root (i)
is then allocated containing only the value 40. The left child (Children[1]) is
the old root (a), and the right child (Children[2]) is the node h. The
following figure shows the resulting tree:
• When the root of a B-Tree splits (due to an insertion), the depth of the tree
increases by one level.
Conclusion
In conclusion, tree methods
• Have advantages such as: • Disadvantages include:
• The size problem is no longer an • Intended for static files.
issue. • Imbalance of the B-Tree, with no
• Ordered file. solution except periodic
reorganization.