Tries
Introduction
Suppose we want to implement a word dictionary using a C++ program and
perform the following functions:
● Addition of the word(s)
● Searching a word
● Removal of the word(s)
To do the same, hashmaps can be thought of as an efficient data structure as the
average case time complexity of insertion, deletion, and retrieval is O(1) for
integer, character, float, and decimal values.
Let us discuss the time complexity of the same in the case of strings.
Suppose we want to insert string abc in our hashmap. To do so, first, we would
need to calculate the hashcode for it, which would require the traversal of the
whole string abc. Hence, the time taken will be the length of the entire string
irrespective of the time taken to perform any of the above operations. Thus, we can
conclude that the insertion of a string will be O(string_length).
To search a word in the hashmap, we again have to calculate the hashcode of the
string to be searched, and for that also, it would require O(string_length) time.
Similarly, for removing a word from the hashmap, we would have to calculate the
hashcode for that string. It would again require O(string_length) time.
For the same purpose, we can use another data structure known as tries.
1
Below is the visual representation of the trie:
Here, the node at the top named as the start is the root node of our n-ary tree.
Suppose we want to insert the word ARE in the trie. We will begin from the root,
search for the first word A and then insert R as its child and similarly insert the
letter E. You can see it as the left-most path in the above trie.
Now, we want to insert the word AS in the trie. We will follow the same procedure
as above. First-of-all, we find the letter A as the child of the root node, and then we
will search for S. If S was already present as the child of A, then we will do nothing
as the given word is already present otherwise, we will insert S. You can see this in
the above trie. Similarly, we added other words in the trie.
This approach also takes O(word_length) time for insertion.
2
Moving on to searching a word in the trie, for that also, we would have to traverse
the complete length of the string, which would take O(word_length) time.
Let us take an example. Suppose we want to search for the word DOT in the above
trie, we will first-of-all search for the letter D as the direct child of the start node and
then search for O and T and, then we will return true as the word is present in the
trie.
Consider another example AR, which we want to search for in the given trie.
Following the above approach, we would return true as the given word is present in
it. However, ideally, we should return false as the actual word was ARE and not AR.
To overcome this, it can be observed that in the above trie, some of the letters are
marked with a bolder circle, and others are not. This boldness represents the
termination of a word starting from the root node. Hence, while searching for a
word, we will always be careful about the last letter that should be bolded to
represent the termination.
Note: While insertion in a trie, the last letter of the word should be bolded as a
mark of termination of a word.
In the above trie, the following are all possibilities that can occur:
● ARE, AS
● DO, DOT
● NEW, NEWS, NOT
● ZEN
Here, to remove the word, we will simply unbold the terminating letter as it will
make that word invalid. For example: If you want to remove the string DO from the
above trie by preserving the occurrence of string DOT, then we will reach O and
then unbold it. This way the word DO is removed but at the same time, another
3
word DOT which was on the same path as that of word DO was still preserved in the
trie structure.
For removal of a word from trie, the time complexity is still O(word_length) as we
are traversing the complete length of the word to reach the last letter to unbold it.
When to use tries over hashmaps?
It can be observed that by using tries, we can improve the space complexity.
For example, We have 1000 words starting from character A that we want to store.
Now, if you try to hold these words using hashmap, then for each word, we have to
store character A differently. But in case of tries, we only need to store the
character A once. In this way, we can save a lot of space, and hence space
optimization leads us to prefer tries over hashmaps in such scenarios.
In the beginning, we thought of implementing a dictionary. Let's recall a feature of a
dictionary, namely Auto-Search. While browsing the dictionary, we start by typing a
character. All the words beginning from that character appear in the search list. But
this functionality can't be achieved using hashmaps as in the hashmap, the data
stored is independent of each other, whereas, in the case of tries, the data is stored
in the form of a tree-like structure. Hence, here also, tries to prove to be efficient
over hashmaps.
4
TrieNode class implementation
Follow the below-mentioned code (with comments)...
typedef struct Node {
Node *next[26]; // Array of pointer or reference to next node
int cnt; // It will contain how many times a node gets visited
bool isEnding; // It will be TRUE if the word terminates at
this character
Node(){ // Constructor for values initialization
cnt = 0;
isEnding = false;
for(int i=0; i<26; i++) next[i] = NULL;
}
}Node;
5
Insert function
To insert a word in a trie, we will use recursion. Suppose we have the following trie:
Now we want to insert the word BET in our trie.
Recursion says that we need to work on a smaller problem, and the rest will handle
itself. So, we will do it on the root node.
Note: Practically, the functionality of bolding the terminal character is achieved
using the boolean isEnding variable for that particular node. If this value is true
means that the node is the terminal value of the string, otherwise not.
Approach: We will first search for letter B and check if it is present as the children
of the root node or not and then call recursion on it. If B is present as a child of the
root node as in our case it is, then we will simply recurse over it by shifting the
6
length of the string by 1. In case, character B was not the direct child of the root
node, then we have to create one and then call recursion on it. After the recursive
call, we will see that be is now a root node, and the character E is now the word we
are searching for. We will follow the same procedure as done in searching for
character B against the root node and then move forward to the next character of
the string, i.e., T, which happens to be the last character of our string. Now we will
check character T as the child of character E. In our case, it is not a child of
character E, so we’ll create it. As T is the last character of the string, so we will mark
its isEnding value to True.
Following will be the three steps of recursion:
● Small Calculation: We will check if the root node has the first character of
the string as one of its children or not. If not, then we will create one.
● Recursive call: We will tell the recursion to insert the remaining string in the
subtree of our trie.
● Base Case: If we traveled the whole string then we need to mark the
isEnding for the last character as True.
Follow the code below, along with the comments…
typedef struct Node {
Node *next[26];
int cnt;
bool isEnding;
Node(){
cnt = 0;
isEnding = false;
for(int i=0; i<26; i++) next[i] = NULL;
}
}Node;
7
void insert(Node *curr, string s, int index) {
curr->cnt += 1;
// Base case
if(index == s.length()) {
curr->isEnding = true;
return;
}
int nextIndex = (int)(s[index] - 'a');// As for 'a' refers to index
0, 'b' refers to index 2 and so on, so to reach the correct index we will
do so
if(curr->next[nextIndex] == NULL) {// If the current character of
string is not present as the child node of the previous node then create
the new node
curr->next[nextIndex] = new Node();
}
// Recursive call
insert(curr->next[nextIndex], s, index+1);
}
// For user
Node *root = new Node();
void insertWord(string word) {
insert(root, "word", 0);
}
Search in tries
Objective: Create a search function that will get a string as its argument and
returns the total count of string that will contain the given string as its prefix.
Approach:
We will be using the same process as that used during insertion. We will call
recursion over the root node after searching for the first character as its child. If the
first character is not present as one of the children of the root node, then we will
8
simply return 0; otherwise, we will send the remaining string to the recursion. When
we reach the end of the string, then check for the last character’s cnt value; and we
will return the value from there.
typedef struct Node {
Node *next[26];
int cnt;
bool isEnding;
Node(){
cnt = 0;
isEnding = false;
for(int i=0; i<26; i++) next[i] = NULL;
}
}Node;
int search(Node *curr, string s, int index) {
if(curr == NULL) {
return 0;
}
if(index == s.length()) {
return curr->cnt;
}
int nextIndex = (int)(s[index] - 'a');
return search(curr->next[nextIndex], s, index+1);
}
// For user
Node *root = new Node();
insert(root, "abc", 0);
insert(root, "aba", 0);
insert(root, "ab", 0);
insert(root, "aca", 0);
cout<<search(root, "ab", 0)<<"\n";
9
Deletion in Tries
Objective: To delete the given word from our trie.
Approach: First-of-all we need to search for the word in the trie, and if found, then
we simply need to mark the isEnding value of the last character of the word to
false as that will simply denote that the word does not exist in our trie.
Check out the code below: (Nearly same as that of insertion, just minor changes
which are explained alongside)
typedef struct Node {
Node *next[26];
int cnt;
bool isEnding;
Node(){
cnt = 0;
isEnding = false;
for(int i=0; i<26; i++) next[i] = NULL;
}
}Node;
Node* remove(Node *curr, string s, int index) {
curr->cnt -= 1; // As we are deleting one of the strings
//if there is no longer any string that will basically have
this prefix
if(curr->cnt==0)return NULL;
if(index == s.length()) {
curr->isEnding = false;
return curr;
}
int nextIndex = (int)(s[index] - 'a');
10
curr->next[nextIndex] = remove(curr->next[nextIndex], s,
index+1);
return curr;
}
// For user
Node *root = new Node();
insert(root, "abc", 0);
insert(root, "aba", 0);
insert(root, "ab", 0);
insert(root, "aca", 0);
cout<<search(root, "ab", 0)<<"\n";
root = remove(root, "abc", 0);
cout<<search(root, "ab", 0) <<"\n"; // returns the count
Time Complexity of Various Functions
For a given string of length L, the time complexities of various operations in a Trie are as follows:
Operations Time Complexity
insert(word) O(L)
search(word) O(L)
delete(word) O(L)
Applications of Tries
● Tries are used as data structures to implement functionalities like dictionaries, lookup
tables, matching algorithms, etc.
● They are also used for many practical applications such as auto-complete in editors and
mobile applications.
● They are used in phone book search applications where efficient searching of a
large number of records is required.
11
Search Engine
Problem Statement: We are given N strings along with their weights. We get Q
queries and in each query, we get a prefix and we have to find the string with the
best weight that contains that prefix.
Explanation: We need the node in the trie to store the optimal weight among the
prefixes that are formed by its children. Suppose we are given strings “abc”, “bcd”,
“abe” with weights 5, 6, 7 respectively. Let’s take a look at the image given below to
see how the trie is formed for this example:
Now we get the query along with the prefix “ab” so “ab” occurs in “abc” and “abe”.
So as shown in the diagram at nodes a and b in the LHS, we store the optimal
weight as the answer which in this case is 7 from the string “abe”.
12
Algorithm:
● Create a trie data structure to store the given input word.
● Store an extra parameter ‘maxSubtree’ which represents the maximum
weight string with that prefix.
● Run a loop from ‘i’ = 0 to the length of ‘arr1’ - 1, and insert the word at ‘arr1[i]’
into the trie alongside updating the ‘maxSubtree’ value.
● Create a function say ‘query’ to find the maximum best weight that contains
that prefix.
● Run a loop from ‘i’ = 0 to the i<’q’ and store the maximum weight of the given
prefix i.e, ans=query(root, txt, 0).
● Return ‘’ans”.
Code:
#include<bits/stdc++.h>
using namespace std;
typedef struct Node {
Node *next[26];
int maxSubtree;
Node() {
maxSubtree = 0;
for(int i=0; i<26; i++) next[i] = NULL;
}
}Node;
void insert(Node *curr, pair<string, int> &databaseEntry, int index) {
if(index == databaseEntry.first.length()) {
curr->maxSubtree = databaseEntry.second;
return;
}
// Updating the value for current node
curr->maxSubtree = max(curr->maxSubtree, databaseEntry.second);
int nextIndex = (int)(databaseEntry.first[index] - 'a');
13
if(curr->next[nextIndex] == NULL) {
curr->next[nextIndex] = new Node();
}
insert(curr->next[nextIndex], databaseEntry, index+1);
}
int query(Node *curr, string &txt, int index) {
if(curr == NULL) return -1;
if(index == txt.length()) {
return curr->maxSubtree;
}
int nextIndex = (int)(txt[index] - 'a');
return query(curr->next[nextIndex], txt, index+1);
}
int main() {
int n, q;
cin>>n>>q;
pair<string, int> database[n];
for(int i=0; i<n; i++) {
cin>>database[i].first>>database[i].second;
}
Node *root = new Node();
for(int i=0; i<n; i++){
insert(root, database[i], 0);
}
while(q--) {
string txt;
cin>>txt;
cout<<query(root, txt, 0)<<"\n";
}
return 0;
}
14
Maximum XOR
Problem Statement: You are given two arrays of non-negative integers say ‘arr1’
and ‘arr2’. Your task is to find the maximum value of ( ‘A’ xor ‘B’ ) where ‘A’ and ‘B’
are any elements from ‘arr1’ and ‘arr2’ respectively and ‘xor’ represents the bitwise
xor operation.
Approach: We can maximize the bitwise ‘xor’ value for any two integers by taking
their bits at ‘i- th’ position as 1 and 0 or 0 and 1. For an integer ‘X’ in binary
representation, we start from its most significant bit and try to find a corresponding
number ‘Y’ from available numbers such that bits at the current position of ‘X’ and
‘Y’ are different, in this way we could maximize the bitwise xor value for ‘X’. So for a
32-bit integer, we just need to iterate through all its bits and check for a
corresponding integer from another array such that their xor value is maximum.
To efficiently check for the corresponding integer value in the first array we can
maintain a binary trie data structure for each element in the first array. For each
element in the first array, we convert it into binary representation and insert the bit
string into the trie. The most significant bit will be the root.
So the idea is to iterate through each element in the second array and convert it
into its binary representation and iterate through its all bits starting from the most
significant bit. If the current bit is ‘1’ then we move to ‘0’ child in trie if available and
vice versa. At last, we will find a corresponding integer from the second array such
that its xor value with the current element of the first array is maximum. At last, we
will return the maximum of all such values.
15
Algorithm:
● Let the given array be ‘arr1’ and ‘arr2’.
● Declare a variable say “ans’’. And initialize it with 0.
● Create a trie data structure to store the binary representation of 32-bit
integers.
● Run a loop from ‘i’ = 0 to the length of ‘arr1’ - 1 to traverse the elements of
‘arr1’ and insert the binary representation of ‘arr1[i]’ into the trie.
● Create a function say ‘Search’ to find the maximum xor value for a given
integer from trie.
● Run a loop from ‘i’ = 0 to the length of ‘arr2’ - 1 to traverse the elements in
‘arr2’.
● If ‘ans’ < ‘search(root, arr2[i], M)’, then update ‘ans’ , i.e. do ‘ans’ = ‘search(root,
arr2[i], M)’.
● Return ‘ans’.
Code:
#include<bits/stdc++.h>
using namespace std;
typedef struct Node {
Node *next[2];
Node() {
for(int i=0; i<2; i++) next[i] = NULL;
}
}Node;
void insert(Node *curr, int x, int i) {
if(i == -1) return;
if((x & (1<<i)) > 0) {
if(curr->next[1] == NULL) {
16
curr->next[1] = new Node();
}
insert(curr->next[1], x, i-1);
} else {
if(curr->next[0] == NULL) {
curr->next[0] = new Node();
}
insert(curr->next[0], x, i-1);
}
int search(Node *curr, int x, int i) {
if(i == -1) {
return 0;
}
if((x & (1<<i)) > 0) {
if(curr->next[0] != NULL) {
return ((1<<i) + search(curr->next[0], x, i-1));
} else{
return search(curr->next[1], x, i-1);
}
} else {
if(curr->next[1] != NULL) {
return ((1<<i) + search(curr->next[1], x, i-1));
} else {
return search(curr->next[0], x, i-1);
}
}
}
int main() {
int n, m;
cin>>n;
17
int arr1[n];
for(int i=0; i<n; i++) cin>>arr1[i];
cin>>m;
int arr2[m];
for(int i=0; i<m; i++) cin>>arr2[i];
Node *root = new Node();
int M = 30;
for(int i=0; i<n; i++) {
insert(root, arr1[i], M);
}
int ans = 0;
for(int i=0; i<m; i++) {
ans = max(ans, search(root, arr2[i], M));
}
cout<<ans<<"\n";
return 0;
}
Time Complexity
O( 32 * (N + M)), where ‘N’ and ‘M’ are the sizes of the given array.
For every element in the first array, we are inserting it into the trie that takes O(K)
time, where ‘K’ is the size of the string. And for each element in the second array,
we are doing search operations that also take O(K) time as the depth of trie will be
at max ‘K’ i.e. the length of the string. Here ‘K’ will be 32 as every integer in binary
representation will have 32 bits. Hence the overtime complexity is O( 32 * (N + M) )
Space Complexity
O( 32 * N ), where ‘N’ is the size of the given first array.
As we are implementing trie data structure to store keep track of binary
representation of each number in an array that takes O ( N * K) space where ‘N’ is
the number of strings and ‘K’ is the length of string here ‘K’ will be 32 as every
integer in binary representation will have 32 bits. Hence the overall space
complexity is O( 32 * N ).
18