OPTIMAL BINARY SEARCH TREES
An Optimal Binary Search Tree (OBST), also known as a Weighted
Binary Search Tree, is a binary search tree that minimizes the
expected search cost. In a binary search tree, the search cost is the
number of comparisons required to search for a given key.
In an OBST, each node is assigned a weight that represents the
probability of the key being searched for. The sum of all the weights
in the tree is 1.0. The expected search cost of a node is the sum of the
product of its depth and weight, and the expected search cost of its
children.
To construct an OBST, we start with a sorted list of keys and their
probabilities. We then build a table that contains the expected search
cost for all possible sub-trees of the original list. We can use dynamic
programming to fill in this table efficiently. Finally, we use this table
to construct the OBST.
The time complexity of constructing an OBST is O(n^3), where n is
the number of keys. However, with some optimizations, we can
reduce the time complexity to O(n^2). Once the OBST is constructed,
the time complexity of searching for a key is O(log n), the same as for
a regular binary search tree.
The OBST is a useful data structure in applications where the keys
have different probabilities of being searched for. It can be used to
improve the efficiency of searching and retrieval operations in
databases, compilers, and other computer programs.
Given a sorted array key [0.. n-1] of search keys and an array freq[0..
n-1] of frequency counts, where freq[i] is the number of searches
for keys[i]. Construct a binary search tree of all keys such that the
total cost of all the searches is as small as possible.
Let us first define the cost of a BST. The cost of a BST node is the
level of that node multiplied by its frequency. The level of the root is
1.
Examples:
Input: keys[] = {10, 12}, freq[] = {34, 50}
There can be following two possible BSTs
10 12
\ /
12 10
I II
Frequency of searches of 10 and 12 are 34 and 50 respectively.
The cost of tree I is 34*1 + 50*2 = 134
The cost of tree II is 50*1 + 34*2 = 118
1) Optimal Substructure:
The optimal cost for freq[i..j] can be recursively calculated
using the following formula.
We need to calculate optCost(0, n-1) to find the result.
The idea of above formula is simple, we one by one try all nodes
as root (r varies from i to j in second term). When we
make rth node as root, we recursively calculate optimal cost
from i to r-1 and r+1 to j.
We add sum of frequencies from i to j (see first term in the
above formula)
2)