ADS Unit 3
ADS Unit 3
23CS6PEADS – Unit 3
Prof. SNEHA S BAGALKOT
Assistant Professor, Dept. of CSE, BMSCE
UNIT – 3: Advanced Trees
▪Trie–Insert, Search, Delete,
▪Segment Tree–Construction, Range Minimum Query,
▪Suffix Array and Suffix Tree–Construction, Substring Check,
▪Binary Indexed Tree or Fenwick Tree - Construction,
▪K Dimensional Tree–Construction.
Tries
Applications
• Auto Complete
• Spell Checkers
• Longest Prefix Matching
• Browser History
Trying on tries
Trying on tries
31 31
9 22 9 22
4 5 14 8 4 5 14 8
3 1 4 1 5 9 2 6 3 1 4 1 5 9 2 6
Add/Modify
• Add a certain value to bottom node.
• Propagate up the tree.
• Original segment tree-> 31
• Add 5 to value 5. 9 22
• New value is 10. 4 5 14 8
• Add 5 to every node 3 1 4 1 5 9 2 6
along the path from
the bottom node to 36
the root. 9 27
• Changed nodes are 4 5 19 8
highlighted. 3 1 4 1 10 9 2 6
Add/Modify Code
• Note that SIZE is the size of the segment tree (We
usually assume it to be a power of 2. You can always
pad it with 0s to make it a power of 2).
• Also, the tree is stored with the root at node 1, its
children at node 2 and 3, and so on…
void add(int pos, int val) { //pos is 0-indexed
int node = SIZE+pos;
while(node > 0) {
segTree[node] += val;
node /= 2;
}
}
Range Sum
• Sums up the values in a range.
• Select O(lgn) nodes to cover the range.
• Original segment tree.
31
9 22
4 5 14 8
3 1 4 1 5 9 2 6
• Range sum(1,4)
• The 3 nodes selected 31
cover the range. 9 22
• Sum of range is sum 4 5 14 8
of highlighted nodes.
3 1 4 1 5 9 2 6
Range Sum Code
• Call recursive helper function.
int sum(int start, int end) {
return sum(1, start, end, 0, SIZE-1);
}
int sum(int node, int start, int end, int left, int right) {
// Case 1: Intervals disjoint
if(end<left || right<start) return 0;
//Case 2: Current interval inside query interval
if(start<=left && right<=end) return segTree[node];
//Case 3: Overlap, recurse on children
return sum(node*2, start, end, left, (left+right)/2) +
sum(node*2+1, start, end, (left+right)/2+1, right);
}
Other operations
• For example, find minimum in range.
• void add(int pos, int val);
• int range_min(int start, int end);
• Add/modify code similar.
• Basically, a node’s value needs to be computed in
O(1) time from its children.
1
1 2
1 1 5 2
3 1 4 1 5 9 2 6
Lazy Propagation
• What if you need to modify a range?
• You need to BE LAZY!
• Store lazy value for
each node.
• For updates, find the
O(lgn) nodes that cover
a range, and set their lazy
value.
• Propagate lazy value to
children when recursing
on child nodes.
Lazy Propagation
• Let lazy[node] = value to be added to this interval.
• True value of node = segTree[node] + lazy[node] at
all times (need to be able to determine true value of
node from segTree and lazy in O(1) time).
• Example below: Add 5 to range [0,5].
• Highlighted nodes are in range and modified.
• Can modify lazy and segTree or only lazy.
31 0
9 22 5 0
4 5 14 8 0 0 5 0
3 1 4 1 5 9 2 6 0 0 0 0 0 0 0 0
Potw
• Farmer John has a large field of grass.
• Each of the N plots has a certain amount of grass.
• Bessie and the other cows choose an interval in which they
want to graze.
• If Bessie and the cows graze
on an interval, the amounts
of grass in all plots in that
interval are reduced by a
certain amount.
• The cows can’t graze on an
interval if it would cause the
amount of grass to turn
negative, so Farmer John
must stop them.
• How many groups of cows
does Farmer John need to stop?
Potw
• Input format: • Sample input:
o Line 1: N C (N is the number of plots of o 43
grass and C is the number of groups of o 7
cows)
o 3
o Line 2…N+1: G_i: The amount of grass in the
i-th plot o 2
o Line N+2…N+C+1: A B C (If the cows graze, o 4
they decrease the amount of grass of each o 142
plot in [A, B] by C) o 231
• Output format: o 121
o Line 1: A, the amount of groups of cows • Sample output:
FJ has to turn away
o 1
• Constraints:
o 1<=N,C<=1,000: 10 points
o 1<=N,C<=100,000: 25 points
Segment Trees
• Idea: create a tree on top of a base array that keeps information
about ranges of the array
• Example: where is the minimum in a range of the array?
• Example: where is the maximum?
• Example: what is the sum over some range?
• Segment trees are a way of computing this
• Work for dynamically changing data
• Static data has a dynamic programming solution that is easier (we will see
later)
2 3
0-7 8-15
4 5 6 7
0-3 4-7 8-11 12-15
8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Building and updating the tree
• At base level, each “leaf” will have the result for a single element of
the original array.
• When the node covers [k,k] it is the result for a[k] in the original array, a.
• Can build the tree bottom-up
• Compute children, then merge these for parent
• If an array element gets updated
• Update all nodes on the path to that node – can traverse tree appropriately,
and update bottom-up.
Answering range queries with the tree
• For query [a,b], checking node i: [L, R]
• Assumption: [a,b] is within [L,R], i.e. a >= L, b <= R
• If a==L and b==R, just return value
• If [a,b] overlaps one child, recursively call on that child
• if b <= (L+R)/2, return query [a,b] on left child node 2*i
• if a >= (L+R)/2+1, return query [a,b] on right child node 2*i+1
• If [a,b] overlaps both children, process result from call on both:
• query [a,(L+R)/2] on left child node 2*i
• query [(L+R)/2+1,b] on right child node 2*i+1
• return the result of both (e.g. sum, or min, or max, or lcm, etc.)
Query [6,12] 1
0-15
[6,12] overlaps both sides
2 3
0-7 8-15
4 5 6 7
0-3 4-7 8-11 12-15
8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Query [6,12] 1
0-15
[6,12] overlaps both sides
4 5 6 7
0-3 4-7 8-11 12-15
8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Query [6,12] 1
0-15
[6,12] overlaps both sides
8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Query [6,12] 1
0-15
[6,12] overlaps both sides
8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15
[6,7] is exactly [12,12]
this, so stop on left
only
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Query [6,12] 1
0-15
[6,12] overlaps both sides
8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15
[6,7] is exactly [12,12]
this, so stop on left
only
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
[12,12]
is this,
so stop
Range [6,12]
[6,12] overlaps.
Check elements
one by one.
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Assume n=16
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Suffix trees and suffix arrays
Trie
• A tree representing a set of strings.
c
{ a
aeef b
ad e
bbfe d b
bbfg
e
c } f
f c
e g
Trie (Cont)
• Assume no string is a prefix of another
f c
e g
Compressed Trie
• Compress unary nodes, label edges by strings
c c
a a
b
e
d b bbf
d
e eef
f
f c c
e g e g
Suffix tree
Given a string s a suffix tree of s is a
compressed trie of all suffixes of s
{ $
$ a b
b
b$ $
ab$ a
a $ b
bab$ b $
abab$ $
}
Trivial algorithm to build a Suffix tree
a
Put the largest suffix in b
a
b
$
a b
b a
Put the suffix bab$ in a b
b $
$
a b
b a
a b
b $
$
We will also label each leaf with the starting point of the corres.
suffix.
$
a b 5
b
$
a
a $ b 4
b $
$ 3
2
1
Analysis
Naively, takes O(n2) time to build.
If we did not get stuck traversing the pattern then the pattern
occurs in the text.
Each leaf in the subtree below the node we reach corresponds
to an occurrence.
Let s = cbaaba
The maximal palindrome with center between i-1 and i is the LCP
of the suffix at position i of s and the suffix at position m-i+1 of sr
Maximal palindromes algorithm
Prepare a generalized suffix tree for
s = cbaaba$ and sr = abaabc#
a #
b c $
7 7
a $
b
b a c
6 b
c c # a 6
a a # $ a
# $
4
b a
a 5 5 b
3 3 $
b a c a
c 4 $ # $
# 2 1
2
1
Analysis
O(n) time to identify all palindromes
Drawbacks
• Suffix trees consume a lot of space
Let s = abab
Sort the suffixes lexicographically:
ab, abab, b, bab
The suffix array gives the indices of the
suffixes in sorted order
3 1 4 2
How do we build it ?
• Build a suffix tree
• Traverse the tree in DFS, lexicographically
picking edges outgoing from each node
and fill the suffix array.
• O(n) time
How do we search for a pattern ?
• If P occurs in T then all its occurrences are
consecutive in the suffix array.
Maintain l = LCP(P,L) l
L
Maintain r = LCP(P,R)
R r
How do we accelerate the search ?
l
L
If l > r then
• x
Cycle through the dimensions as you walk down the tree.
x 30,40
(35,45)
(30,40)
x 10,12 50,30
(5,25)
(50,30) y 35,45
(10,12)
Insert Code
insert(Point x, KDNode t, int cd) {
if t == null
t = new KDNode(x)
else if (x == t.data)
// error! duplicate
else if (x[cd] < t.data[cd])
t.left = insert(x, t.left, (cd+1) % DIM)
else
t.right = insert(x, t.right, (cd+1) % DIM)
return t
}
FindMin in kd-trees
• FindMin(d): find the point with the smallest value in the dth
dimension.
• Recursively traverse the tree
• If cutdim(current_node) = d, then the minimum can’t be in the
right subtree, so recurse on just the left subtree
- if no left subtree, then current node is the min for tree
x
(35,90
(60,80)
51,75
)
(51,75)
y 25,40 70,70
(70,7
0)
(50, x 10,30 55,1
50)
35,90 60,80
(25,40)
y 1,10 50,50
(10,30)
( 10)
1
,
(55,1)
FindMin
FindMin(y-dimension):
x
(35,90
(60,80)
51,75
)
(51,75)
y 25,40 70,70
(70,7
0)
(50, x 10,30 35,90 5 5 , 60,80
50) 5 5
(25,40) 1
,1
y 50,50
(10,30) 1 ,1
1 ,
( 10) 0
1 1 0
,
(55,1)
FindMin
FindMin(y-dimension): space searched
x
(35,90
(60,80)
51,75
)
(51,75)
y 25,40 70,70
(70,7
0)
(50, x 10,30 55,1
50)
35,90 60,80
(25,40)
y 1,10 50,50
(10,30)
( 10)
1
,
(55,1)
FindMin Code
Point findmin(Node T, int dim, int cd):
// empty tree
if T == NULL: return NULL
In BST, we’d
findmin(A.right). cd A
Here, we have to
findmin(A.right, cd)
Everything in Q has Q P
cd-coord < B, and cd B
everything in P has cd-
coord ≥ B
Delete in kd-trees --- No Right Subtree
x (x,y)
return t
Nearest Neighbor Searching in kd-trees
• Nearest Neighbor Queries are very common: given a point Q find the point P in the data
set that is closest to Q.
• Doesn’t work: find cell that would contain Q and return the point it contains.
Reason: the nearest point to P in space may be far from P in the tree:
E.g. NN(52,52):
(35,9
(60,80) 51,75
0)
(51,75)
(10,30)
(,10)
1
(55,1)
kd-Trees Nearest Neighbor
• Idea: traverse the whole tree, BUT make two
modifications to prune to search space:
// if this
better than
point isthe best: dist =
distance(Q, T.data)
if dist < best_dist
: = dist
best_dist
// best
visit=subtrees
T.data is most promising order:
if Q[cd] < T.data[cd]:
NN(Q, T.left, next_cd, BB.trimLeft(cd, t.data))
NN(Q, T.right, next_cd, BB.trimRight(cd,
else:
NN(Q, t.data)) next_cd, BB.trimRight(cd, t.data))
T.right,
NN(Q, T.left, next_cd, BB.trimLeft(cd, t.data))