Unit-3 Searching
Searching
Symbol Table
A symbol table is a data structure used by the compiler to store information about the
occurrence of various entities such as objects, classes, variable names, interfaces, function
names, etc. It is used by both the analysis and the synthesis phases.
- It is used to store the names of all entities in a structured form at one place.
- It is used to verify whether a variable has been declared.
- It is used to determine the scope of a name.
- It is used to implement type checking by verifying that assignments and expressions in the
  source code are semantically correct.
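As an illustration only, the sketch below implements a symbol table as a hash table (a Python `dict`) and supports the uses listed above: storing names, checking whether a name is declared, and looking up its entry. The class names and the entry fields `data_type` and `attribute` are hypothetical choices for this example, not a prescribed entry format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    """One symbol-table entry; the fields are illustrative assumptions."""
    name: str
    data_type: str
    attribute: str = ""


class SymbolTable:
    def __init__(self) -> None:
        # A dict is a hash table, so insert and lookup are O(1) on average.
        self._entries: Dict[str, Entry] = {}

    def insert(self, name: str, data_type: str, attribute: str = "") -> None:
        """Record a declaration; declaring the same name twice is an error."""
        if name in self._entries:
            raise KeyError(f"'{name}' is already declared")
        self._entries[name] = Entry(name, data_type, attribute)

    def lookup(self, name: str) -> Optional[Entry]:
        """Return the entry for name, or None if it was never declared."""
        return self._entries.get(name)

    def is_declared(self, name: str) -> bool:
        return name in self._entries


table = SymbolTable()
table.insert("total", "int", "static")   # hypothetical declaration: static int total;
print(table.is_declared("total"))        # True
print(table.lookup("total").data_type)   # int
print(table.is_declared("rate"))         # False
```

A type checker built on this sketch would compare `lookup(name).data_type` with the type an expression requires.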
A symbol table can be implemented either as a linear list or as a hash table. It maintains an
entry for each name using the following format:
For example, suppose the symbol table has to store information about the following variable
declaration:
Scope Management
A compiler maintains two types of symbol tables: a global symbol table, which can be
accessed by all the procedures, and scope symbol tables, which are created for each scope in
the program.
To determine the scope of a name, symbol tables are arranged in a hierarchical structure, as
shown in the example below:
The above program can be represented in a hierarchical structure of symbol tables:
The global symbol table contains the names of one global variable (int value) and two
procedures, which should be available to all the child nodes shown above. The names declared
in the pro_one symbol table (and all its child tables) are not available to pro_two and its
child tables.
This hierarchy of symbol tables is maintained by the semantic analyzer. Whenever a name needs
to be searched for in a symbol table, it is searched using the following algorithm:
- First, the symbol is searched for in the current scope, i.e., the current symbol table.
- If the name is found, the search is complete; otherwise, it is searched for in the parent
  symbol table, until
- either the name is found or the global symbol table has been searched for the name.
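The lookup algorithm above can be sketched with one table per scope, each holding a link to its parent. This is a minimal sketch, assuming names map only to a type string; the class `Scope`, its methods, and the local variable `one_1` are hypothetical, while `value`, `pro_one`, and `pro_two` follow the example in the text.

```python
from typing import Dict, Optional, Tuple


class Scope:
    """One symbol table per scope, linked to the table of its enclosing scope."""

    def __init__(self, name: str, parent: "Optional[Scope]" = None) -> None:
        self.name = name
        self.parent = parent
        self.symbols: Dict[str, str] = {}   # name -> type

    def declare(self, name: str, type_name: str) -> None:
        self.symbols[name] = type_name

    def resolve(self, name: str) -> Optional[Tuple[str, str]]:
        """Search the current table first, then each parent table, ending with
        the global table; return (scope name, type) or None if undeclared."""
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.symbols:
                return scope.name, scope.symbols[name]
            scope = scope.parent
        return None


global_table = Scope("global")
global_table.declare("value", "int")
global_table.declare("pro_one", "procedure")
global_table.declare("pro_two", "procedure")

pro_one = Scope("pro_one", parent=global_table)
pro_one.declare("one_1", "int")            # hypothetical local variable
pro_two = Scope("pro_two", parent=global_table)

print(pro_one.resolve("one_1"))   # ('pro_one', 'int')
print(pro_one.resolve("value"))   # ('global', 'int')
print(pro_two.resolve("one_1"))   # None: names in pro_one are not visible in pro_two
```

Because `resolve` only walks upward through parent links, sibling scopes such as `pro_one` and `pro_two` can never see each other's names.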
Advantages of Symbol Table
   1. Increased efficiency: The efficiency of a program can be increased by using symbol
      tables, which give quick and simple access to crucial data such as variable and
      function names, data types, and memory locations.
   2. Better coding structure: Symbol tables can be used to organize and simplify code,
      making it simpler to comprehend and to discover and correct problems.
   3. Easier debugging: Symbol tables can facilitate easy access to and examination of a
      program’s state during execution, making it simpler to identify and correct mistakes.
   4. Improved code reuse: By offering a standardized method of storing and accessing
      information, symbol tables can be utilized to increase the reuse of code across
      multiple projects.
   5. Faster code execution: By offering quick access to information like memory
      addresses, symbol tables can be utilized to optimize code execution by lowering the
      number of memory accesses required during execution.
   6. Better portability: Symbol tables can be used to increase the portability of code by
      offering a standardized method of storing and retrieving data, which can make it
      simpler to migrate code between different systems or programming languages.
Disadvantages of Symbol Table
   1. Increased memory consumption: Systems with low memory resources may suffer
      from the high memory requirements of symbol tables.
   2. Increased processing time: The creation and processing of symbol tables can take a
      long time, which can be problematic in systems with constrained processing power.
   3. Complexity: Developers who are not familiar with compiler design may find symbol
      tables difficult to construct and maintain.
   4. Limited scalability: Because of their limited scalability, symbol tables may not be
      appropriate for large-scale projects or applications that require the management of
      enormous amounts of data.
   5. Upkeep: Maintaining and updating symbol tables on a regular basis can be time- and
      resource-consuming.
   6. Limited functionality: Symbol tables may not offer all the features a developer needs,
      so additional tools or libraries may be needed to round out their capabilities.
Applications of Symbol Table
   1. Resolution of variable and function names: Symbol tables are used to resolve the
      names of variables and functions and to identify their data types and memory
      locations.
   2. Resolution of scope issues: Symbol tables are used to resolve naming conflicts and to
      determine the scope of variables and functions.
   3. Code optimization: Symbol tables, which offer quick access to information such as
      memory locations, are used to optimize code execution.
   4. Code generation: By giving details like memory locations and data types, symbol
      tables are used to generate machine code from source code.
   5. Error checking and code debugging: By supplying details about the status of a
      program during execution, symbol tables are used to check for faults and debug code.
   6. Code organization and documentation: By supplying details about a program’s
      structure, symbol tables can be used to organize code and make it simpler to
      understand.
What is a tree?
A tree is a kind of data structure that is used to represent data in hierarchical form. It can
be defined as a collection of objects or entities, called nodes, that are linked together to
simulate a hierarchy. A tree is a non-linear data structure, as the data in a tree is not stored
linearly or sequentially.
Binary Search Tree
A binary search tree (BST) follows a specific order to arrange the elements. In a binary search
tree, the value of every node in the left subtree must be smaller than the value of the parent
node, and the value of every node in the right subtree must be greater than the value of the
parent node. This rule is applied recursively to the left and right subtrees of the root.
Similarly, we can see that the left child of the root node is greater than its own left child
and smaller than its own right child, so it also satisfies the property of a binary search tree.
Therefore, we can say that the tree in the above image is a binary search tree.
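Checking a node only against its parent is not enough to confirm the BST property, because a key deep in the left subtree must also be smaller than every ancestor it hangs to the left of. A minimal sketch, assuming distinct integer keys, passes down the range allowed by the ancestors; `Node` and `is_bst` are hypothetical names.

```python
from typing import Optional


class Node:
    def __init__(self, key: int, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key = key
        self.left = left
        self.right = right


def is_bst(node: Optional[Node], low: float = float("-inf"),
           high: float = float("inf")) -> bool:
    """Every key must lie strictly between the bounds set by its ancestors,
    so the whole left subtree is smaller than a node and the whole right
    subtree is larger (keys are assumed to be distinct)."""
    if node is None:
        return True
    if not low < node.key < high:
        return False
    return (is_bst(node.left, low, node.key)
            and is_bst(node.right, node.key, high))


valid = Node(45, Node(15, Node(10), Node(20)), Node(79, Node(55), Node(90)))
# 50 is a correct right child of 15 locally, but it sits in 45's left subtree
invalid = Node(45, Node(15, Node(10), Node(50)), Node(79))
print(is_bst(valid))    # True
print(is_bst(invalid))  # False
```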
- Searching for an element in a binary search tree is easy, as we always have a hint about
  which subtree contains the desired element.
- Compared to arrays and linked lists, insertion and deletion operations are faster in a
  BST.
Insertion Operation
Whenever an element is to be inserted, first locate its proper location. Start searching from
the root node; if the data is less than the key value, search for an empty location in the left
subtree and insert the data there. Otherwise, search for an empty location in the right subtree
and insert the data there.
Algorithm
1. START
2. If the tree is empty, insert the first element as the root node of the tree.
3. If an element is less than the root value, it is added into the left subtree (by repeating
these steps there).
4. If an element is greater than the root value, it is added into the right subtree (by
repeating these steps there).
5. The final leaf nodes of the tree point to NULL values as their child nodes.
6. END
Now, let's see the creation of binary search tree using an example.
Suppose the data elements are - 45, 15, 79, 90, 10, 55, 12, 20, 50
- First, we have to insert 45 into the tree as the root of the tree.
- Then, read the next element; if it is smaller than the root node, insert it as the root of
  the left subtree (repeating the comparison at each node until an empty position is
  found), and move to the next element.
- Otherwise, if the element is larger than the root node, insert it as the root of the
  right subtree in the same way.
Now, let's see the process of creating the binary search tree using the given data elements.
The process of creating the BST is shown below –
Step 1 - Insert 45.
45 is the first element, so it becomes the root of the tree.
Step 2 - Insert 15.
As 15 is smaller than 45, insert it as the root node of the left subtree.
Step 3 - Insert 79.
As 79 is greater than 45, insert it as the root node of the right subtree.
Step 4 - Insert 90.
90 is greater than 45 and 79, so it will be inserted as the right subtree of 79.
Step 5 - Insert 10.
10 is smaller than 45 and 15, so it will be inserted as the left subtree of 15.
Step 6 - Insert 55.
55 is larger than 45 and smaller than 79, so it will be inserted as the left subtree of 79.
Step 7 - Insert 12.
12 is smaller than 45 and 15 but greater than 10, so it will be inserted as the right subtree of
10.
Step 8 - Insert 20.
20 is smaller than 45 but greater than 15, so it will be inserted as the right subtree of 15.
Step 9 - Insert 50.
50 is greater than 45 but smaller than 79 and 55. So, it will be inserted as the left subtree of
55.
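To check the walkthrough, the sketch below inserts the same nine elements with a recursive insertion and inspects where a few of them end up. It is a minimal sketch with hypothetical names (`Node`, `insert`); duplicates are simply ignored.

```python
from typing import Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    """Recursive BST insertion: returns the root of the updated subtree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node


root = None
for key in (45, 15, 79, 90, 10, 55, 12, 20, 50):
    root = insert(root, key)

print(root.right.right.key)       # 90, the right child of 79 (step 4)
print(root.left.left.right.key)   # 12, the right child of 10 (step 7)
print(root.left.right.key)        # 20, the right child of 15 (step 8)
print(root.right.left.left.key)   # 50, the left child of 55 (step 9)
```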
Search Operation
Searching means finding or locating a specific element or node in a data structure. In a binary
search tree, searching for a node is easy because elements in a BST are stored in a specific
order. The steps of searching for a node in a binary search tree are as follows –
- First, compare the element to be searched with the root element of the tree.
- If the root matches the target element, return the node's location.
- If it does not match, check whether the item is less than the root element; if it is
  smaller than the root element, move to the left subtree.
- If it is larger than the root element, move to the right subtree.
- Repeat the above procedure recursively until a match is found.
- If the element is not present in the tree, return NULL.
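Now, let's understand searching in a binary search tree using an example. We take the binary
search tree formed above. Suppose we have to find node 20 in the tree below.
Step 1: Compare 20 with the root, 45. Since 20 is smaller than 45, move to the left subtree.
Step 2: Compare 20 with 15. Since 20 is greater than 15, move to the right subtree.
Step 3: The right child of 15 is 20, which matches the element, so return its location.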
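Algorithm to search for an item (key) in a given Binary Search Tree:
1. START
2. Check whether the tree is empty.
3. If the tree is empty, the search is unsuccessful.
4. Otherwise, compare the key with the value in the root; if they match, return the root.
5. If the key does not match the value in the root, search its subtrees:
6. If the key is less than the root value, search the left subtree.
7. If the key is greater than the root value, search the right subtree.
8. If the key is not found in the tree, return an unsuccessful search (NULL).
9. END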
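A minimal iterative sketch of the search procedure, written so that it also records the keys compared along the way; for node 20 in the tree built above, the recorded path reproduces the three steps of the example. The names `Node`, `insert`, and `search` are hypothetical, and returning `None` stands in for NULL.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node


def search(root: Optional[Node], key: int) -> Tuple[Optional[Node], List[int]]:
    """Return the matching node (or None for an unsuccessful search)
    together with the keys compared along the way."""
    path: List[int] = []
    current = root
    while current is not None:          # an empty tree skips the loop entirely
        path.append(current.key)
        if key == current.key:          # match: return the node's location
            return current, path
        # smaller keys can only be on the left, larger keys only on the right
        current = current.left if key < current.key else current.right
    return None, path                   # key not present: NULL


root = None
for key in (45, 15, 79, 90, 10, 55, 12, 20, 50):
    root = insert(root, key)

node, path = search(root, 20)
print(node.key, path)   # 20 [45, 15, 20]
node, path = search(root, 60)
print(node, path)       # None [45, 79, 55]
```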
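Deletion Operation
In a binary search tree, we must delete a node in such a way that the property of the BST is
not violated. To delete a node from a BST, three situations can occur:
- The node to be deleted is a leaf node: simply remove it (replace it with NULL).
- The node to be deleted has only one child: replace the node with its child.
- The node to be deleted has two children.
Deleting a node with two children is not so simple. Here we have to delete the node in such a
way that the resulting tree still follows the properties of a BST.
The trick is to find the inorder successor of the node (the smallest node in its right subtree).
Copy the contents of the inorder successor to the node, and then delete the inorder successor.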
   https://www.guru99.com/binary-search-tree-data-structure.html
Algorithm of Deletion operation in BST
1. If the root is null, the tree is empty, and there is nothing to delete.
2. Start from the root and traverse the tree to find the node with the given key to be deleted.
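a. If the key is less than the value of the current node, the node to be deleted is in the left
subtree:
- Set the left child of the current node to the result of the recursive deletion.
b. If the key is greater than the value of the current node, the node to be deleted is in the
right subtree:
- Set the right child of the current node to the result of the recursive deletion.
c. If the key is equal to the value of the current node (node to be deleted is found):
- If the node has no left child, replace it with its right child (which may be NULL).
- Otherwise, if the node has no right child, replace it with its left child.
- Otherwise, copy the inorder successor's value into the node and delete the inorder
successor from the right subtree.
3. Return the (possibly updated) current node.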
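A minimal recursive sketch of the deletion algorithm, covering the leaf, one-child, and two-children cases; the function names are hypothetical. It deletes 12 (a leaf), then 55 (one child), then the root 45 (two children, replaced by its inorder successor 50) from the tree built earlier.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node


def inorder(node: Optional[Node]) -> List[int]:
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def delete(node: Optional[Node], key: int) -> Optional[Node]:
    """Delete key from the subtree rooted at node and return the new subtree root."""
    if node is None:                       # empty tree or key not present
        return None
    if key < node.key:                     # key can only be in the left subtree
        node.left = delete(node.left, key)
    elif key > node.key:                   # key can only be in the right subtree
        node.right = delete(node.right, key)
    else:                                  # node to be deleted is found
        if node.left is None:              # leaf, or only a right child
            return node.right
        if node.right is None:             # only a left child
            return node.left
        successor = node.right             # two children: the inorder successor is
        while successor.left is not None:  # the leftmost node of the right subtree
            successor = successor.left
        node.key = successor.key           # copy it, then delete it from the right subtree
        node.right = delete(node.right, successor.key)
    return node


root = None
for key in (45, 15, 79, 90, 10, 55, 12, 20, 50):
    root = insert(root, key)
for key in (12, 55, 45):
    root = delete(root, key)
print(root.key, inorder(root))   # 50 [10, 15, 20, 50, 79, 90]
```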
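Time Complexity
- Search, insertion, and deletion take O(h) time, where h is the height of the tree: O(log n)
  on average (when the tree is reasonably balanced) and O(n) in the worst case (a skewed
  tree), where 'n' is the number of nodes in the given tree.
Space Complexity
- O(n), since the tree stores n nodes.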
Properties of Binary Search Tree
- The left subtree of a node contains only nodes with data values lower than the node's data.
- The right subtree of a node contains only nodes with data values higher than the node's
  data.
- In a BST, the left and right subtrees must also be binary search trees.
- Each node in a binary search tree can have at most two children.
Applications of Binary Search Tree
- BSTs are widely used in compilers and interpreters to implement symbol tables,
  where symbols (e.g., variables, functions) are efficiently stored for quick retrieval.
- In databases, BSTs are employed for indexing data, facilitating rapid search and
  retrieval operations.
- File systems often use BSTs to organize directory structures, allowing for efficient
  navigation and retrieval of file paths.
- BSTs are used in applications like text editors and search engines to provide quick
  auto-completion and search suggestion features.
- BSTs can be adapted to implement priority queues, allowing for efficient retrieval of
  elements with the highest or lowest priority.
- Huffman coding, a compression algorithm, uses a binary tree (the Huffman tree, which is
  not ordered like a BST) to represent variable-length codes for characters.
Advantages of Binary Search Tree
- A balanced BST is fast for insertion and deletion, with a time complexity of O(log n).
- A balanced BST also supports fast searching, with a time complexity of O(log n) for most
  operations.
- A BST is memory-efficient: each node stores only its element and two child pointers, and
  no space is reserved for empty slots.
- We can also do range queries, i.e., find all keys between N and M (N <= M).
- BST code is simple compared to other data structures.
- A BST keeps its elements ordered as they are inserted, so an inorder traversal always
  visits them in sorted order.
- A BST can easily be modified to store additional data or to support other operations.
  This makes it flexible.
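Two of the advantages above, sorted output and range queries, follow directly from the BST ordering. In this minimal sketch (hypothetical names), `inorder` visits left subtree, node, right subtree, and `range_query` collects keys with N <= key <= M while skipping subtrees that cannot contain any such key.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node


def inorder(node: Optional[Node], out: List[int]) -> List[int]:
    """Append keys in sorted order: left subtree, node, right subtree."""
    if node is not None:
        inorder(node.left, out)
        out.append(node.key)
        inorder(node.right, out)
    return out


def range_query(node: Optional[Node], low: int, high: int, out: List[int]) -> List[int]:
    """Append every key with low <= key <= high, in sorted order."""
    if node is None:
        return out
    if low < node.key:                 # smaller keys may still be in range
        range_query(node.left, low, high, out)
    if low <= node.key <= high:
        out.append(node.key)
    if node.key < high:                # larger keys may still be in range
        range_query(node.right, low, high, out)
    return out


root = None
for key in (45, 15, 79, 90, 10, 55, 12, 20, 50):
    root = insert(root, key)
print(inorder(root, []))              # [10, 12, 15, 20, 45, 50, 55, 79, 90]
print(range_query(root, 15, 55, []))  # [15, 20, 45, 50, 55]
```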
Disadvantages of Binary Search Tree
- The main disadvantage is that we should always implement a balanced binary search
  tree. Otherwise, the cost of operations may not be logarithmic and may degenerate into
  that of a linear search on an array.
- They are not well-suited for data that needs to be accessed randomly, since search,
  insertion, and delete operations take O(log n) time, which is good for large data sets but
  not as fast as some other data structures, such as arrays (O(1) access by index) or hash
  tables (O(1) average lookup).
- A BST can become imbalanced or degenerate, which increases the complexity of operations.
- Some operations that other ordered data structures support, such as constant-time access
  to the k-th element of a sorted array, are not supported directly.
- They are not guaranteed to be balanced, which means that in the worst case the height of
  the tree can be O(n) and the time complexity of operations can degrade to O(n).
Balanced Binary Tree
A binary tree is said to be height-balanced if, for every node:
- The heights of its left and right subtrees do not differ by more than 1.
- Its left subtree is also balanced.
- Its right subtree is also balanced.
The height of a tree is the number of edges on the longest path between the root node and a
leaf node.
The above tree is a binary search tree. A binary search tree is a tree in which each node
on the left side has a lower value than its parent node, and each node on the right side has
a higher value than its parent node.
In the above tree, n1 is the root node, and n4, n6, and n7 are the leaf nodes. The n7 node is
the farthest node from the root node: n4 and n6 are each two edges away from the root, while
there are three edges between the root node and n7. Since n7 is the farthest from the root
node, the height of the above tree is 3.
Now we will see whether the above tree is balanced or not. The left subtree contains the
nodes n2, n4, n5, and n7, while the right subtree contains the nodes n3 and n6. The left
subtree has two leaf nodes, i.e., n4 and n7. There is only one edge between the nodes n2 and
n4 and two edges between the nodes n2 and n7; therefore, node n7 is the farthest from the
root of the left subtree, and the height of the left subtree is 2. The right subtree contains
only one leaf node, i.e., n6, which is one edge away from n3; therefore, the height of the
right subtree is 1. The difference between the heights of the left subtree and the right
subtree is 1, so the condition holds at the root node. This calculation of the height
difference should be performed for every node, i.e., n2, n3, n4, n5, n6, and n7. When we
process each node, we find that the height difference is never more than 1, so we can say
that the above tree is a balanced binary tree.
In the above tree, n6, n4, and n3 are the leaf nodes, where n6 is the farthest node from
the root node. Three edges exist between the root node and n6; therefore, the height of the
above tree is 3.
When we consider n1 as the root node, the left subtree contains the nodes n2, n4, n5, and
n6, while the right subtree contains only the node n3. In the left subtree, n2 is the root
node, and n4 and n6 are leaf nodes. Of n4 and n6, n6 is the farthest from n2, at a distance
of two edges; therefore, the height of the left subtree is 2. The right subtree (n3) has no
left or right child; therefore, its height is 0. Since the height of the left subtree is 2 and
that of the right subtree is 0, the difference between the heights of the left subtree and
the right subtree is 2.
According to the definition, the difference between the heights of the left subtree and the
right subtree must not be greater than 1. In this case, the difference is 2, which is greater
than 1; therefore, the above binary tree is an unbalanced binary search
tree.
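The two height calculations above can be checked mechanically. This minimal sketch measures height in edges (a single node has height 0, an empty tree -1) and applies the balanced-tree definition at every node. Because the figures are missing, the trees are reconstructed from the text alone: n7 hangs below n5 in the first tree and n6 below n5 in the second, and the left/right placement of those single children is an assumption. The class and function names are hypothetical.

```python
from typing import Optional


class Node:
    def __init__(self, label: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.label = label
        self.left = left
        self.right = right


def height(node: Optional[Node]) -> int:
    """Height in edges: a single node has height 0 and an empty tree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node: Optional[Node]) -> bool:
    """At every node the subtree heights differ by at most 1.
    (Recomputing heights makes this O(n^2), which is fine for illustration.)"""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)


first = Node("n1",
             Node("n2", Node("n4"), Node("n5", Node("n7"))),
             Node("n3", Node("n6")))
second = Node("n1",
              Node("n2", Node("n4"), Node("n5", Node("n6"))),
              Node("n3"))

print(height(first), is_balanced(first))    # 3 True
print(height(second), is_balanced(second))  # 3 False
```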
Let's understand the need for a balanced binary tree through an example.
Tree Diagram 1
The above tree is a binary search tree because all the left subtree nodes are smaller than
their parent node and all the right subtree nodes are greater than their parent node.
Suppose we want to find the value 79 in the above tree. First, we compare 79 with the value
of node n1, i.e., 35; since 79 is not equal to 35 and is greater than 35, we move to the node
n3, i.e., 48. Since 79 is not equal to 48 and is greater than 48, we move to the right child
of 48. The right child of node 48 is 79, which is equal to the value to be searched. The
number of hops required to search for the element 79 is 2, and the maximum number of hops
required to search for any element is 2. The average case for searching an element is
O(log n).
Tree Diagram 2
The above tree is also a binary search tree because all the left subtree nodes are smaller
than their parent node and all the right subtree nodes are greater than their parent node.
Suppose we want to find the value 79 in the above tree. First, we compare 79 with the node
n4, i.e., 13. Since 79 is greater than 13, we move to the right child of node 13, i.e., n2 (21).
The value of the node n2 is 21, which is smaller than 79, so we again move to the right of
node 21. The right child of node 21 is 29. Since 79 is greater than 29, we move to the right
child of node 29, which is 35. Since 35 is smaller than 79, we move to the right child of node
35, i.e., 48. Since 79 is greater than 48, we move to the right child of node 48, which is 79,
the value to be searched. In this case, the number of hops required to search for the element
is 5, and the worst case is O(n).
As the number of nodes increases, searching in tree diagram 1 becomes far more efficient than
searching in tree diagram 2. Suppose both trees contain 100,000 nodes and that one comparison
takes 1 µs. In the worst case, searching for an element in a tree shaped like tree diagram 2
takes about 100,000 µs, whereas searching in a balanced tree like tree diagram 1 takes about
log2(100,000) ≈ 16.6 µs. We can observe the enormous difference in time between the two trees.
Therefore, we conclude that a balanced binary tree provides much faster searching than a linear
(skewed) tree.
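The estimate above can be reproduced by counting comparisons. This minimal sketch builds two BSTs over the same 100,000 keys: a balanced one that picks the middle key as the root at every level, and a skewed one in which every node has only a right child, which is the shape produced by inserting sorted keys into a plain BST. Assuming, as the text does, 1 µs per comparison, the counts read directly as microseconds. All names are hypothetical.

```python
import math
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build_balanced(keys: List[int], lo: int, hi: int) -> Optional[Node]:
    """Balanced BST: the middle key becomes the root, recursively."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid - 1)
    node.right = build_balanced(keys, mid + 1, hi)
    return node


def build_skewed(keys: List[int]) -> Node:
    """Every node has only a right child, like a BST fed with sorted keys."""
    root = tail = Node(keys[0])
    for key in keys[1:]:
        tail.right = Node(key)
        tail = tail.right
    return root


def comparisons(root: Optional[Node], key: int) -> int:
    """Number of nodes visited by a BST search for key."""
    count, node = 0, root
    while node is not None:
        count += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return count


n = 100_000
keys = list(range(n))
balanced = build_balanced(keys, 0, n - 1)
skewed = build_skewed(keys)
worst_balanced = max(comparisons(balanced, k) for k in keys)
worst_skewed = comparisons(skewed, n - 1)
print(worst_balanced)          # 17
print(worst_skewed)            # 100000
print(round(math.log2(n), 1))  # 16.6
```

The balanced tree needs at most floor(log2 n) + 1 = 17 comparisons, in line with the roughly 16.6 µs estimate, while the skewed tree needs 100,000 comparisons for its last key.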
AVL trees are binary search trees in which the difference between the heights of the left and
right subtrees of every node is either -1, 0, or +1.
The difference between the heights of the left subtree and the right subtree of a node is
known as the balance factor of the node.
A tree is said to be balanced if the balance factor of each node is between -1 and 1;
otherwise, the tree is unbalanced and needs to be balanced.
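Written as a formula, with the height of a subtree measured as defined earlier:

```latex
\mathrm{BF}(k) = \mathrm{height}\big(\mathrm{left}(k)\big) - \mathrm{height}\big(\mathrm{right}(k)\big),
\qquad \text{AVL condition: } \mathrm{BF}(k) \in \{-1,\, 0,\, +1\} \text{ for every node } k.
```

For example, at the root n1 of the unbalanced tree discussed earlier, the left subtree has height 2 and the right subtree has height 0, so BF(n1) = 2 - 0 = 2, which violates the condition.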
An AVL tree is a self-balancing binary search tree. Such trees help to maintain logarithmic
search time. The AVL tree is named after its inventors, Adelson-Velsky and Landis.
- If the balance factor of a node is 1, its left subtree is one level higher than its right
  subtree.
- If the balance factor of a node is 0, its left and right subtrees have equal height.
- If the balance factor of a node is -1, its left subtree is one level lower than its right
  subtree.
An AVL tree is given in the following figure. We can see that the balance factor associated
with each node is between -1 and +1; therefore, it is an example of an AVL tree.
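Complexity
- Search, insertion, and deletion take O(log n) time in both the average and the worst case,
  because the height of an AVL tree is always O(log n).
- Space complexity is O(n) for n nodes.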
There are four cases of rotation in the balancing algorithm of AVL trees: LL, RR, LR, and
RL.
LL Rotation
When the BST becomes unbalanced because a node is inserted into the left subtree of the left
subtree of C, we perform an LL rotation. LL rotation is a clockwise rotation, which is applied
on the edge below the node having balance factor 2.
In the above example, node C has balance factor 2 because a node A is inserted in the left
subtree of C's left subtree. We perform the LL rotation on the edge below C.
Example:
RR Rotation
When the BST becomes unbalanced because a node is inserted into the right subtree of the
right subtree of A, we perform an RR rotation. RR rotation is an anticlockwise rotation, which
is applied on the edge below the node having balance factor -2.
In the above example, node A has balance factor -2 because a node C is inserted in the right
subtree of A's right subtree. We perform the RR rotation on the edge below A.
Example
LR Rotation
Double rotations are a bit tougher than the single rotations explained above. LR rotation =
RR rotation + LL rotation, i.e., first an RR rotation is performed on the subtree and then an
LL rotation is performed on the full tree. By full tree we mean the first node on the path
from the inserted node whose balance factor is other than -1, 0, or 1.
- A node B has been inserted into the right subtree of A, the left subtree of C, because of
  which C has become an unbalanced node with balance factor 2. This is the LR rotation
  case: the inserted node is in the right subtree of the left subtree of C.
- First, an RR (anticlockwise) rotation is performed on the subtree rooted at A. Node A
  becomes the left subtree of B, but node C is still unbalanced (balance factor 2), now
  because of the left subtree of its left subtree.
- Now we perform an LL (clockwise) rotation on the full tree, i.e., on node C. Node C now
  becomes the right subtree of node B, and A is the left subtree of B.
- The balance factor of each node is now either -1, 0, or 1, i.e., the BST is balanced.
Example:
RL Rotation
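- A node B has been inserted into the left subtree of C, the right subtree of A, because of
  which A has become an unbalanced node with balance factor -2. This is the RL rotation
  case: the inserted node is in the left subtree of the right subtree of A.
- As RL rotation = LL rotation + RR rotation, an LL (clockwise) rotation is first performed
  on the subtree rooted at C. Node C becomes the right subtree of B.
- After performing the LL rotation, node A is still unbalanced, i.e., it has balance factor -2,
  now because of the right subtree of its right subtree.
- Now we perform an RR (anticlockwise) rotation on the full tree, i.e., on node A. Node B
  becomes the root of the subtree, with A as its left subtree and C as its right subtree.
- The balance factor of each node is now either -1, 0, or 1, i.e., the BST is balanced.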
Example:
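The four rotation cases can be sketched as pointer updates. In this minimal sketch, `rotate_right` is the clockwise LL rotation, `rotate_left` is the anticlockwise RR rotation, and the LR and RL cases are compositions of the two, exactly as described above. Heights are not tracked here (only the pointer changes are shown), and all names are hypothetical. Each of the four unbalanced three-node trees from the text ends up with B at the root, A on the left, and C on the right.

```python
from typing import Optional


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key = key
        self.left = left
        self.right = right


def rotate_right(z: Node) -> Node:
    """LL rotation (clockwise): the left child y becomes the subtree root."""
    y = z.left
    z.left = y.right       # y's right subtree moves under z
    y.right = z
    return y


def rotate_left(z: Node) -> Node:
    """RR rotation (anticlockwise): the right child y becomes the subtree root."""
    y = z.right
    z.right = y.left       # y's left subtree moves under z
    y.left = z
    return y


def rotate_lr(z: Node) -> Node:
    """LR = RR rotation on the left child, then LL rotation on z."""
    z.left = rotate_left(z.left)
    return rotate_right(z)


def rotate_rl(z: Node) -> Node:
    """RL = LL rotation on the right child, then RR rotation on z."""
    z.right = rotate_right(z.right)
    return rotate_left(z)


def shape(node: Optional[Node]):
    """Nested (key, left, right) tuples, handy for comparing trees."""
    return None if node is None else (node.key, shape(node.left), shape(node.right))


ll = rotate_right(Node("C", left=Node("B", left=Node("A"))))
rr = rotate_left(Node("A", right=Node("B", right=Node("C"))))
lr = rotate_lr(Node("C", left=Node("A", right=Node("B"))))
rl = rotate_rl(Node("A", right=Node("C", left=Node("B"))))
for tree in (ll, rr, lr, rl):
    print(shape(tree))   # ('B', ('A', None, None), ('C', None, None)) each time
```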
Algorithm
The following steps are involved in performing the insertion operation of an AVL Tree –
Step 1 − Create a node.
Step 2 − Check if the tree is empty.
Step 3 − If the tree is empty, the new node created becomes the root node of the AVL Tree.
Step 4 − If the tree is not empty, perform the Binary Search Tree insertion operation and
         check the balance factor of the nodes on the path from the new node to the root.
Step 5 − If the balance factor of a node exceeds ±1, apply suitable rotations on that
         node and resume the insertion from Step 4.
Let’s consider an example where we wish to create an AVL Tree by inserting the
elements 10, 20, 30, 40, and 50.
The following demonstrates how the given elements are inserted one by one in the
AVL Tree:
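The insertion steps can be sketched recursively in Python, assuming distinct keys; the helper names (`h`, `update`, `rotate_left`, `rotate_right`) are hypothetical. Heights here count nodes (a leaf has height 1) rather than edges, which shifts every height by one but leaves balance factors unchanged. On 10, 20, 30, 40, 50 the sketch performs an RR rotation when 30 is inserted and another when 50 is inserted, ending with 20 at the root.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1  # counted in nodes here, so a leaf has height 1


def h(node: Optional[Node]) -> int:
    return node.height if node else 0


def update(node: Node) -> None:
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(z: Node) -> Node:   # clockwise (LL case)
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y


def rotate_left(z: Node) -> Node:    # anticlockwise (RR case)
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:                  # Steps 1-3: new node (the root if the tree was empty)
        return Node(key)
    if key < node.key:                # Step 4: ordinary BST insertion ...
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                   # duplicates ignored (an assumption)
    update(node)                      # ... then recheck the balance factor on the way up
    bf = h(node.left) - h(node.right)
    if bf > 1:                        # Step 5: left-heavy
        if key > node.left.key:       # LR: first rotate the left child anticlockwise
            node.left = rotate_left(node.left)
        return rotate_right(node)     # LL (or second half of LR)
    if bf < -1:                       # right-heavy
        if key < node.right.key:      # RL: first rotate the right child clockwise
            node.right = rotate_right(node.right)
        return rotate_left(node)      # RR (or second half of RL)
    return node


def preorder(node: Optional[Node]) -> List[int]:
    return [] if node is None else [node.key] + preorder(node.left) + preorder(node.right)


root = None
for key in (10, 20, 30, 40, 50):
    root = insert(root, key)
print(preorder(root))   # [20, 10, 40, 30, 50]
```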
Construct AVL Tree for the following sequence of numbers-
50 , 20 , 60 , 10 , 8 , 15 , 32 , 46 , 11 , 48
Insert 50
Insert 20
Insert 60
Insert 10
Insert 8
- Find the first imbalanced node on the path from the newly inserted node (node 8) to
  the root node.
- The first imbalanced node is node 20.
- Now, count three nodes from node 20 in the direction of the leaf node.
- Then, use AVL tree rotation to balance the tree.
Insert 15
- Find the first imbalanced node on the path from the newly inserted node (node 15) to
  the root node.
- The first imbalanced node is node 50.
- Now, count three nodes from node 50 in the direction of the leaf node.
- Then, use AVL tree rotation to balance the tree.
Insert 32
Insert 46
Insert 11
Insert 48
This is the final balanced AVL tree after inserting all the given elements.
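As a cross-check of the construction, the sketch below inserts the same ten numbers with a recursive AVL insertion and logs each rotation together with the first imbalanced node it was applied to. The rotations it reports for 8 and 15 match the imbalanced nodes named in the steps above (node 20 and node 50); inserting 48 triggers one more rotation, at node 32. It is a minimal sketch assuming distinct keys; the names, including the module-level `rotations` log, are hypothetical.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1  # counted in nodes: a leaf has height 1


def h(node: Optional[Node]) -> int:
    return node.height if node else 0


def update(node: Node) -> None:
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(z: Node) -> Node:
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    update(y)
    return y


def rotate_left(z: Node) -> Node:
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    update(y)
    return y


rotations: List[Tuple[str, int]] = []  # (case, key of the first imbalanced node)


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node
    update(node)  # nodes are rechecked bottom-up, so the first imbalanced one is fixed
    bf = h(node.left) - h(node.right)
    if bf > 1:                       # left-heavy: LL or LR
        if key > node.left.key:
            rotations.append(("LR", node.key))
            node.left = rotate_left(node.left)
        else:
            rotations.append(("LL", node.key))
        return rotate_right(node)
    if bf < -1:                      # right-heavy: RR or RL
        if key < node.right.key:
            rotations.append(("RL", node.key))
            node.right = rotate_right(node.right)
        else:
            rotations.append(("RR", node.key))
        return rotate_left(node)
    return node


def preorder(node: Optional[Node]) -> List[int]:
    return [] if node is None else [node.key] + preorder(node.left) + preorder(node.right)


root = None
for key in (50, 20, 60, 10, 8, 15, 32, 46, 11, 48):
    root = insert(root, key)
print(rotations)       # [('LL', 20), ('LR', 50), ('RR', 32)]
print(preorder(root))  # [20, 10, 8, 15, 11, 50, 46, 32, 48, 60]
```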
Applications of AVL Tree:
- It is used to index huge records in a database and to search them efficiently.
- AVL trees are used for in-memory collections such as sets and dictionaries.
- Database applications where insertions and deletions are less common but frequent
  data lookups are necessary.
- Software that needs optimized search.
- It is applied in corporate areas and storyline games.
Disadvantages of AVL Tree
- It is difficult to implement.
- It has high constant factors for some of the operations.
- It is less used compared to Red-Black trees.
- Due to its rather strict balance, an AVL tree has complicated insertion and removal
  operations, as more rotations are performed.
- It takes more processing for balancing.
Hash Tables
A hash table is a data structure that stores data in an associative manner. In a hash table,
data is stored in an array format, where each data value has its own unique index value.
Access to data becomes very fast if we know the index of the desired data.
Thus, it becomes a data structure in which insertion and search operations are very fast on
average, irrespective of the size of the data. A hash table uses an array as a storage medium
and uses a hashing technique to generate the index at which an element is to be inserted or
from which it is to be located.
Hashing
Hashing is a searching technique that takes constant time on average: the average time
complexity of hashing is O(1).
In the hashing technique, a hash table and a hash function are used. Using the hash function,
we can calculate the address at which the value can be stored.
The main idea behind hashing is to create (key, value) pairs. If the key is given, the
algorithm computes the index at which the value would be stored. It can be written as:
Index = hash(key)
When we pass a key to the hash function, it gives the index. For example:
Hash(john) = 3;
Hash Function
A hash function is a function that converts a given numeric or alphanumeric key into a small
practical integer value. The mapped integer value is used as an index in the hash table. In
simple terms, a hash function maps a large number or string to a small integer that can
be used as the index in the hash table.
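A minimal sketch of one possible hash function for string keys: add up the character codes and reduce the sum modulo the table size, so the result is always a valid index. This particular function is a hypothetical choice for illustration (it gives 1 for "john", not the 3 of the example above, whose hash function is not specified). Python's built-in `hash()` is not used because it is randomized per process for strings. The anagrams "ab" and "ba" show a weakness of this simple scheme: different keys can map to the same index.

```python
def hash_index(key: str, table_size: int) -> int:
    """Hypothetical hash function: add up the character codes of the key and
    reduce the sum modulo the table size, giving an index in 0..table_size-1."""
    return sum(ord(ch) for ch in key) % table_size


print(hash_index("john", 5))                     # 1  (106 + 111 + 104 + 110 = 431; 431 % 5 = 1)
print(hash_index("ab", 5), hash_index("ba", 5))  # 0 0  (same characters, same index)
```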
The pair is of the form (key, value), where for a given key one can find a value using some
kind of “function” that maps keys to values. The index for a given key is calculated using a
function called a hash function.
In this illustration:
- The hash table has five slots or buckets, labeled with index numbers from 0 to 4.
- Initially, all buckets are empty.
This represents a basic hash table with three key-value pairs. Each key is hashed to determine
the index in the array, and the corresponding value is stored in the respective bucket.
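The five-bucket table described above can be sketched as follows. The sketch resolves collisions by separate chaining (each bucket holds a list of (key, value) pairs), which is one common technique chosen here for illustration; the hash function (sum of character codes modulo the number of buckets), the class name, and the sample data are all hypothetical.

```python
from typing import Any, List, Tuple


class HashTable:
    """Fixed-size hash table with five buckets; collisions are resolved by
    separate chaining, i.e. each bucket is a list of (key, value) pairs."""

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self.buckets: List[List[Tuple[str, Any]]] = [[] for _ in range(size)]

    def _index(self, key: str) -> int:
        # Hypothetical hash: sum of character codes modulo the number of buckets.
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key: str, value: Any) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:               # keys are unique: replace the old value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key: str) -> Any:
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def remove(self, key: str) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(key)


table = HashTable()
for name, age in (("john", 25), ("alice", 31), ("bob", 40)):   # hypothetical data
    table.put(name, age)
print([len(b) for b in table.buckets])     # [1, 1, 1, 0, 0]
table.put("obb", 52)                       # same characters as "bob", so it collides
print(table.get("bob"), table.get("obb"))  # 40 52
```

With three pairs in three different buckets, each lookup inspects one entry; the colliding key shows how a chain grows, which is why many collisions push lookups toward O(n).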
Applications of Hash Tables
- Hash tables are frequently used for indexing and searching massive volumes of data.
  A search engine might use a hash table to store the web pages that it has indexed.
- Data is often cached in memory via hash tables, enabling rapid access to frequently
  used information.
- Hash functions are frequently used in cryptography to create digital signatures,
  validate data, and guarantee data integrity.
- Hash tables can be used for implementing database indexes, enabling fast access to
  data based on key values.
Advantages of Hash Tables
- Fast Data Retrieval: Hash tables provide constant average-time complexity for
  insertion, deletion, and lookup operations. This results in fast data retrieval, especially
  when the size of the data set is large.
- Scalability: The performance of hash tables remains relatively constant even as the
  size of the data set increases. This scalability makes hash tables suitable for handling
  large amounts of data.
- Versatility: Hash tables are versatile and can be used in various applications,
  including implementing dictionaries, maps, sets, caches, and more.
- Constant Average Time Complexity: On average, the time complexity of hash table
  operations (insertion, deletion, and lookup) is O(1), making them highly efficient.
- Support for Dynamic Data Sets: Hash tables can handle dynamic data sets well,
  adapting to changes in size without a significant degradation in performance.
- Widely Supported in Programming Languages: Many programming languages
  provide built-in support for hash tables or similar structures, making them easy to use
  and implement.
- Associative Storage: Hash tables use an associative storage model, allowing data to
  be stored in key-value pairs. This enables logical and efficient organization of data.
Disadvantages of Hash Tables
- Memory Usage: Hash tables may use more memory than other data structures, as they
  allocate space for all the slots of the underlying array, even if many of them remain empty.
- Not Cache-Friendly: The memory access patterns in hash tables may not be
  cache-friendly, leading to potentially slower performance, especially for large datasets.
- Collisions: One of the primary challenges with hash tables is the possibility of
  collisions, where two different keys produce the same hash code. Collisions can
  degrade the performance of hash table operations.
- Key Uniqueness Requirement: Hash tables require that keys be unique within the
  table. If there's a need to store multiple entries with the same key, a different data
  structure might be more appropriate.
For lookup, insertion, and deletion operations, hash tables have an average-case time
complexity of O(1). Yet, these operations may, in the worst case, require O(n) time, where n
is the number of elements in the table.
***