KEMBAR78
ADS Unit 3 | PDF | String (Computer Science) | Algorithms And Data Structures
0% found this document useful (0 votes)
11 views99 pages

ADS Unit 3

Uploaded by

c6y6fmsnxp
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views99 pages

ADS Unit 3

Uploaded by

c6y6fmsnxp
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 99

Advanced Data Structures

23CS6PEADS – Unit 3
Prof. SNEHA S BAGALKOT
Assistant Professor, Dept. of CSE, BMSCE
UNIT – 3: Advanced Trees
▪Trie–Insert, Search, Delete,
▪Segment Tree–Construction, Range Minimum Query,
▪Suffix Array and Suffix Tree–Construction, Substring Check,
▪Binary Indexed Tree or Fenwick Tree - Construction,
▪K Dimensional Tree–Construction.
Tries
Applications
• Auto Complete
• Spell Checkers
• Longest Prefix Matching
• Browser History
Trying on tries
Trying on tries

The size of a trie is directly


correlated to the size of all
the possible values that the
trie could represent
Trying on tries
• Single node in a trie contains just two
things:
• A value, which might be null
• An array of references to child
nodes, all of which also might be
null
• Each node in a trie, including the root
node itself, has only these two aspects
to it.
• When a trie representing the English
language is created, it consists of a
single root node, whose value is usually
set to an empty string: "".
Trying on tries
• That root node will also have an array that
contains 26 references, all of which will point to
null at first.

• As the trie grows, those pointers start to get filled


up with references to other nodes nodes, which
we’ll see an example of pretty soon.
• For example, our root node will hold an array of
indexes 0 through 25, since there are 26 possible
slots for the 26 letters of the alphabet.

• Since the alphabet is in order, we know that the


reference to the node that will contain the letter A
will live at index 0.
Example
CODE
CODE : Function that returns a new Trie node
[for reference]
Giving trie traversal a try

• Looking at our trie, we can see that we have


an empty root node, as is typical for a trie
structure.

• We also have six different words that we’re


representing in this trie: Peter, piper, picked,
peck, pickled, and peppers.

• Notice how there are six different “branches”


to this trie, one for each word that’s being
represented.
Giving trie traversal a try
• We can also see that some words are
sharing parent nodes.

• For example, all of the branches for


the words Peter, peck, and peppers
share the nodes for p and for e.
Similarly, the path to the word picked
and pickled share the nodes p, i, c,
and k.
Giving trie traversal a try
• So, what if we wanted to add the word
pecked to this list of words represented by
this trie? We’d need to do two things in
order to make this happen:

• First, we’d need to check that the word


pecked doesn’t already exist in this trie.

• Next, if we’ve traversed down the branch


where this word ought to live and the
words doesn’t exist yet, we’d insert a value
into the node’s reference where the word
should go. In this case, we’d insert e and d
at the correct references.
SEARCH [for reference]
Giving trie traversal a try
Trie
INSERT
[for
reference]
Deletion : Returns 1 if a given Trie node
has any children [for reference]
Deletion : Recursive function to delete a string from a Trie
[for reference]
Segment Trees
Extremely versatile
Basic Segment Tree
• void add(int pos, int val);
• int range_sum(int start, int end);
• Three ways below of visualizing it.
• Commonly stored in an array to save memory.
• Node i: Parent = i/2, left child = i *2, right child=
31i*2+1
9 22 4 5 14 8 3 1 4 1 5 9 2 6

31 31
9 22 9 22
4 5 14 8 4 5 14 8
3 1 4 1 5 9 2 6 3 1 4 1 5 9 2 6
Add/Modify
• Add a certain value to bottom node.
• Propagate up the tree.
• Original segment tree-> 31
• Add 5 to value 5. 9 22
• New value is 10. 4 5 14 8
• Add 5 to every node 3 1 4 1 5 9 2 6
along the path from
the bottom node to 36
the root. 9 27
• Changed nodes are 4 5 19 8
highlighted. 3 1 4 1 10 9 2 6
Add/Modify Code
• Note that SIZE is the size of the segment tree (We
usually assume it to be a power of 2. You can always
pad it with 0s to make it a power of 2).
• Also, the tree is stored with the root at node 1, its
children at node 2 and 3, and so on…
void add(int pos, int val) { //pos is 0-indexed
int node = SIZE+pos;
while(node > 0) {
segTree[node] += val;
node /= 2;
}
}
Range Sum
• Sums up the values in a range.
• Select O(lgn) nodes to cover the range.
• Original segment tree.
31
9 22

4 5 14 8

3 1 4 1 5 9 2 6
• Range sum(1,4)
• The 3 nodes selected 31
cover the range. 9 22
• Sum of range is sum 4 5 14 8
of highlighted nodes.
3 1 4 1 5 9 2 6
Range Sum Code
• Call recursive helper function.
int sum(int start, int end) {
return sum(1, start, end, 0, SIZE-1);
}

int sum(int node, int start, int end, int left, int right) {
// Case 1: Intervals disjoint
if(end<left || right<start) return 0;
//Case 2: Current interval inside query interval
if(start<=left && right<=end) return segTree[node];
//Case 3: Overlap, recurse on children
return sum(node*2, start, end, left, (left+right)/2) +
sum(node*2+1, start, end, (left+right)/2+1, right);
}
Other operations
• For example, find minimum in range.
• void add(int pos, int val);
• int range_min(int start, int end);
• Add/modify code similar.
• Basically, a node’s value needs to be computed in
O(1) time from its children.

1
1 2
1 1 5 2
3 1 4 1 5 9 2 6
Lazy Propagation
• What if you need to modify a range?
• You need to BE LAZY!
• Store lazy value for
each node.
• For updates, find the
O(lgn) nodes that cover
a range, and set their lazy
value.
• Propagate lazy value to
children when recursing
on child nodes.
Lazy Propagation
• Let lazy[node] = value to be added to this interval.
• True value of node = segTree[node] + lazy[node] at
all times (need to be able to determine true value of
node from segTree and lazy in O(1) time).
• Example below: Add 5 to range [0,5].
• Highlighted nodes are in range and modified.
• Can modify lazy and segTree or only lazy.
31 0

9 22 5 0
4 5 14 8 0 0 5 0

3 1 4 1 5 9 2 6 0 0 0 0 0 0 0 0
Potw
• Farmer John has a large field of grass.
• Each of the N plots has a certain amount of grass.
• Bessie and the other cows choose an interval in which they
want to graze.
• If Bessie and the cows graze
on an interval, the amounts
of grass in all plots in that
interval are reduced by a
certain amount.
• The cows can’t graze on an
interval if it would cause the
amount of grass to turn
negative, so Farmer John
must stop them.
• How many groups of cows
does Farmer John need to stop?
Potw
• Input format: • Sample input:
o Line 1: N C (N is the number of plots of o 43
grass and C is the number of groups of o 7
cows)
o 3
o Line 2…N+1: G_i: The amount of grass in the
i-th plot o 2
o Line N+2…N+C+1: A B C (If the cows graze, o 4
they decrease the amount of grass of each o 142
plot in [A, B] by C) o 231
• Output format: o 121
o Line 1: A, the amount of groups of cows • Sample output:
FJ has to turn away
o 1
• Constraints:
o 1<=N,C<=1,000: 10 points
o 1<=N,C<=100,000: 25 points
Segment Trees
• Idea: create a tree on top of a base array that keeps information
about ranges of the array
• Example: where is the minimum in a range of the array?
• Example: where is the maximum?
• Example: what is the sum over some range?
• Segment trees are a way of computing this
• Work for dynamically changing data
• Static data has a dynamic programming solution that is easier (we will see
later)

• See section 2.4.3 of book


Storing the Segment Tree
• Similar to a heap
• Use an array of 4 times the base array size (could be smaller, but easy
upper bound)
• Root is index 1
• Covers [0,n-1] in original array
• Example: the index of the minimum value from the array, or max, or sum, etc.
• Left child of node i (covering [L,R]) is 2*i, right node is 2*i+1
• Left child covers range [L, (L+R)/2]
• Right child covers range [(L+R)/2+1, R]
• Parent node is always node i/2
1

Assume n=16 0-15

2 3
0-7 8-15

4 5 6 7
0-3 4-7 8-11 12-15

8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Building and updating the tree
• At base level, each “leaf” will have the result for a single element of
the original array.
• When the node covers [k,k] it is the result for a[k] in the original array, a.
• Can build the tree bottom-up
• Compute children, then merge these for parent
• If an array element gets updated
• Update all nodes on the path to that node – can traverse tree appropriately,
and update bottom-up.
Answering range queries with the tree
• For query [a,b], checking node i: [L, R]
• Assumption: [a,b] is within [L,R], i.e. a >= L, b <= R
• If a==L and b==R, just return value
• If [a,b] overlaps one child, recursively call on that child
• if b <= (L+R)/2, return query [a,b] on left child node 2*i
• if a >= (L+R)/2+1, return query [a,b] on right child node 2*i+1
• If [a,b] overlaps both children, process result from call on both:
• query [a,(L+R)/2] on left child node 2*i
• query [(L+R)/2+1,b] on right child node 2*i+1
• return the result of both (e.g. sum, or min, or max, or lcm, etc.)
Query [6,12] 1
0-15
[6,12] overlaps both sides

2 3
0-7 8-15

4 5 6 7
0-3 4-7 8-11 12-15

8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Query [6,12] 1
0-15
[6,12] overlaps both sides

2 [6,7] overlaps right side only 3 [8,12] overlaps both sides


0-7 8-15

4 5 6 7
0-3 4-7 8-11 12-15

8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Query [6,12] 1
0-15
[6,12] overlaps both sides

2 [6,7] overlaps right side only 3 [8,12] overlaps both sides


0-7 8-15
[12,12]
4 [6,7] overlaps right 6 [8,11] is exactly this, 7 on left
5
0-3 side only 8-11 so stop 12-15 only
4-7

8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Query [6,12] 1
0-15
[6,12] overlaps both sides

2 [6,7] overlaps right side only 3 [8,12] overlaps both sides


0-7 8-15
[12,12]
4 6 [8,11] is exactly this, 7 on left
5
0-3 8-11 so stop 12-15 only
4-7

8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15
[6,7] is exactly [12,12]
this, so stop on left
only
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Query [6,12] 1
0-15
[6,12] overlaps both sides

2 [6,7] overlaps right side only 3 [8,12] overlaps both sides


0-7 8-15
[12,12]
4 6 [8,11] is exactly this, 7 on left
5
0-3 8-11 so stop 12-15 only
4-7

8 9 10 11 12 13 14 15
0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15
[6,7] is exactly [12,12]
this, so stop on left
only
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)

[12,12]
is this,
so stop
Range [6,12]

[6,12] overlaps.
Check elements
one by one.

0-3 4-7 8-11 12-15

(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Assume n=16

0-3 4-7 8-11 12-15

(0) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
Suffix trees and suffix arrays
Trie
• A tree representing a set of strings.

c
{ a
aeef b
ad e
bbfe d b
bbfg
e
c } f

f c
e g
Trie (Cont)
• Assume no string is a prefix of another

Each edge is labeled by a letter, c


a
no two edges outgoing from the same b
node are labeled the same.
e
Each string corresponds to a leaf. d b
e
f

f c
e g
Compressed Trie
• Compress unary nodes, label edges by strings

c c
a a
b
e
d b bbf
d
e eef
f
f c c
e g e g
Suffix tree
Given a string s a suffix tree of s is a
compressed trie of all suffixes of s

To make these suffixes prefix-free we add a


special character, say $, at the end of s
Suffix tree (Example)
Let s=abab, a suffix tree of s is a compressed
trie of all suffixes of s=abab$

{ $
$ a b
b
b$ $
ab$ a
a $ b
bab$ b $
abab$ $
}
Trivial algorithm to build a Suffix tree

a
Put the largest suffix in b
a
b
$

a b
b a
Put the suffix bab$ in a b
b $
$
a b
b a
a b
b $
$

Put the suffix ab$ in


a b
b a
b
$
a $
b
$
a b
b a
b
$
a $
b
$

Put the suffix b$ in


a b
b
$
a
a $ b
b $
$
a b
b
$
a
a $ b
b $
$

Put the suffix $ in $


a b
b
$
a
a $ b
b $
$
$
a b
b
$
a
a $ b
b $
$

We will also label each leaf with the starting point of the corres.
suffix.
$
a b 5
b
$
a
a $ b 4
b $
$ 3
2
1
Analysis
Naively, takes O(n2) time to build.

More sophisticated algorithms in O(n) time


What can we do with it ?
Exact string matching:
Given a Text T, |T| = n, preprocess it such
that when a pattern P, |P|=m, arrives
you can quickly decide when it occurs in
T.

W e may also want to find all occurrences


of P in T
Exact string matching
In preprocessing we just build a suffix tree in O(n) time
$
a b 5
b
$
a
a $ b 4
b $
$ 3
2
1

Given a pattern P = ab we traverse the tree according to the


pattern.
$
a b 5
b
$
a
a $ b 4
b $
$ 3
2
1

If we did not get stuck traversing the pattern then the pattern
occurs in the text.
Each leaf in the subtree below the node we reach corresponds
to an occurrence.

By traversing this subtree we get all k occurrences in O(n+k)


time
Generalized suffix tree
Given a set of strings S a generalized suffix
tree of S is a compressed trie of all suffixes of
s∈S
To make these suffixes prefix-free we add a
special char, say $, at the end of s

To associate each suffix with a unique string


in S add a different special char to each s
Generalized suffix tree (Example)
Let s1=abab and s2=aab
here is a generalized suffix tree for s1 and s2
#
{ $
a b
$ # 5 4
b$ b# #
ab$ ab# b a a $ 3
bab$ aab# b b 4
# $
abab$
a $ # 1 2
} b
$ 3 2
1
So what can we do with it ?
Matching a pattern against a database of
strings
Longest common substring (of two strings)
Every node with a leaf
descendant from string s1 and a #
leaf descendant from string s2 $
a b 5 4
represents a maximal common
substring and vice versa. #
b a a $ 3
b b 4
Find such node with # $
largest “string depth” a $ # 1 2
b
$ 3 2
1
Lowest common ancestors
A lot more can be gained from the suffix tree
if we preprocess it so that we can answer
LCA queries on it
Why?
The LCA of two leaves represents the
longest common prefix (LCP) of these 2
suffixes
#
a $
b 5 4
#
b a a $ 3
b b 4
# $
a $ # 1 2
b
$ 3 2
1
Finding maximal palindromes
• A palindrome: caabaac, cbaabc
• Want to find all maximal palindromes in a
string s

Let s = cbaaba

The maximal palindrome with center between i-1 and i is the LCP
of the suffix at position i of s and the suffix at position m-i+1 of sr
Maximal palindromes algorithm
Prepare a generalized suffix tree for
s = cbaaba$ and sr = abaabc#

For every i find the LCA of suffix i of s and


suffix m-i+1 of sr
Let s = cbaaba$ then sr = abaabc#

a #
b c $
7 7
a $
b

b a c
6 b
c c # a 6
a a # $ a
# $
4
b a
a 5 5 b
3 3 $
b a c a
c 4 $ # $
# 2 1
2
1
Analysis
O(n) time to identify all palindromes
Drawbacks
• Suffix trees consume a lot of space

• It is O(n) but the constant is quite big

• Notice that if we indeed want to traverse


an edge in O(1) time then we need an
array of ptrs. of size |Σ| in each node
Suffix array
• We lose some of the functionality but we
save space.

Let s = abab
Sort the suffixes lexicographically:
ab, abab, b, bab
The suffix array gives the indices of the
suffixes in sorted order
3 1 4 2
How do we build it ?
• Build a suffix tree
• Traverse the tree in DFS, lexicographically
picking edges outgoing from each node
and fill the suffix array.

• O(n) time
How do we search for a pattern ?
• If P occurs in T then all its occurrences are
consecutive in the suffix array.

• Do a binary search on the suffix array

• Takes O(mlogn) time


Example
Let S = mississippi
L 11 i
8 ippi
Let P = issa 5 issippi
2 ississippi
1 mississippi
M 10 pi
9 ppi
7 sippi
4 sisippi
6 ssippi
R 3 ssissippi
How do we accelerate the search ?

Maintain l = LCP(P,L) l
L
Maintain r = LCP(P,R)

If l = r then start comparing M to


P at l + 1

R r
How do we accelerate the search ?

l
L

If l > r then

Suppose we know LCP(L,M)


M
If LCP(L,M) < l we go left
If LCP(L,M) > l we go right
If LCP(L,M) = l we start
comparing at l + 1
R r
Analysis of the acceleration
If we do more than a single comparison in an
iteration then max(l, r ) grows by 1 for each
comparison O(logn + m) time
kd-Trees
kd-Trees
• Invented in 1970s by Jon Bentley
• Name originally meant “3d-trees, 4d-trees, etc”
where k was the # of dimensions

• Now, people say “kd-tree of dimension d”

• Idea: Each level of the tree compares against 1


dimension.
• Let’s us have only two children at each node
(instead of 2d)
kd-trees
• Each level has a “cutting dimension”

• x
Cycle through the dimensions as you walk down the tree.

• Each node contains a point P = (x,y) y

• To find (x’,y’) you only y


compare coordinate from the
cutting dimension x
e.g. if cutting dimension is x,
- then you ask: is x’ < x?
kd-tree example
insert: (30,40), (5,25), (10,12), (70,70), (50,30), (35,45)

x 30,40

(70,70) y 5,25 70,70

(35,45)

(30,40)
x 10,12 50,30

(5,25)
(50,30) y 35,45
(10,12)
Insert Code
insert(Point x, KDNode t, int cd) {
if t == null
t = new KDNode(x)
else if (x == t.data)
// error! duplicate
else if (x[cd] < t.data[cd])
t.left = insert(x, t.left, (cd+1) % DIM)
else
t.right = insert(x, t.right, (cd+1) % DIM)
return t
}
FindMin in kd-trees
• FindMin(d): find the point with the smallest value in the dth
dimension.
• Recursively traverse the tree
• If cutdim(current_node) = d, then the minimum can’t be in the
right subtree, so recurse on just the left subtree
- if no left subtree, then current node is the min for tree

rooted at this node.

• If cutdim(current_node) ≠ d, then minimum could be in either


subtree, so recurse on both subtrees.
- (unlike in 1-d structures, often have to explore several
FindMin
FindMin(x-dimension):

x
(35,90
(60,80)
51,75
)
(51,75)
y 25,40 70,70
(70,7
0)
(50, x 10,30 55,1
50)
35,90 60,80
(25,40)

y 1,10 50,50
(10,30)
( 10)
1
,
(55,1)
FindMin
FindMin(y-dimension):

x
(35,90
(60,80)
51,75
)
(51,75)
y 25,40 70,70
(70,7
0)
(50, x 10,30 35,90 5 5 , 60,80
50) 5 5
(25,40) 1
,1
y 50,50
(10,30) 1 ,1
1 ,
( 10) 0
1 1 0
,
(55,1)
FindMin
FindMin(y-dimension): space searched

x
(35,90
(60,80)
51,75
)
(51,75)
y 25,40 70,70
(70,7
0)
(50, x 10,30 55,1
50)
35,90 60,80
(25,40)

y 1,10 50,50
(10,30)
( 10)
1
,
(55,1)
FindMin Code
Point findmin(Node T, int dim, int cd):
// empty tree
if T == NULL: return NULL

// T splits on the dimension we’re searching


// => only visit left subtree
if cd == dim:
if t.left NULL: returnt.data
== else findmin(T.left, dim, (cd+1)%DIM)
return
// T splits on a different dimension
// => have to search both subtrees
else:
return minimum(
findmin(T.left, dim, (cd+1)%DIM),
findmin(T.right, dim, (cd+1)%DIM)
T.data
)
Delete in kd-trees
Want to delete node A.
Assume cutting
dimension of A is cd

In BST, we’d
findmin(A.right). cd A

Here, we have to
findmin(A.right, cd)

Everything in Q has Q P
cd-coord < B, and cd B
everything in P has cd-
coord ≥ B
Delete in kd-trees --- No Right Subtree

• What is right subtree is


empty?
• Possible idea: Find the max
in the left subtree?

Why might this not


- work? x (x,y)

Suppose I findmax(T.left)
and get point (a,b):

It’s possible that T.left


contains another point Q
with x = a. (a,b)
(a,c) cd
Now, our equal
coordinate invariant is
violated!
No right subtree --- Solution
• Swap the subtrees of node to be
deleted
• B = findmin(T.left)
• Replace deleted node by B

x (x,y)

Now, if there is another


point with x=a, it appears
in the right subtree, where Q
it should cd
(a,b)
(a,c)
Point delete(Point x, Node T, int cd):
if T == NULL: error point not found!
next_cd = (cd+1)%DIM

// This is the point to delete:


if x = T.data:
// use min(cd) from right subtree:
if != NULL:
t.right
t.data = findmin(T.right, cd, next_cd)
t.right = delete(t.data, t.right, next_cd)
// swap subtrees and use min(cd) from new right:
else ifT.left != NULL:
t.data = findmin(T.left, cd, next_cd)
t.right = delete(t.data, t.left, next_cd)
else
t = null // we’re a leaf: just remove

// this is not the point, so search for it:


else ifx[cd] < t.data[cd]:
t.left = delete(x, t.left, next_cd)
else
t.right = delete(x, t.right, next_cd)

return t
Nearest Neighbor Searching in kd-trees
• Nearest Neighbor Queries are very common: given a point Q find the point P in the data
set that is closest to Q.

• Doesn’t work: find cell that would contain Q and return the point it contains.
Reason: the nearest point to P in space may be far from P in the tree:
E.g. NN(52,52):

(35,9
(60,80) 51,75
0)
(51,75)

(70,7 25,40 70,70


0)
(50
10,30 55,1
,50 35,90 60,80
)
(25,40
) 1,10 50,50

(10,30)
(,10)
1
(55,1)
kd-Trees Nearest Neighbor
• Idea: traverse the whole tree, BUT make two
modifications to prune to search space:

1. Keep variable of closest point C found so far.


Prune subtrees once their bounding boxes say that
they can’t contain any point closer than C

2. Search the subtrees in order that maximizes the


chance for pruning
Nearest Neighbor: Ideas,
continued
Query
Point Q d
If d > dist(C, Q), then no
point in BB(T) can be
Bounding box closer to Q than C.
of subtree T Hence, no reason to search
rooted at T subtree rooted at T.

Update the best point so far, if T is better: if


dist(C, Q) > dist(T.data, Q), C := T.data

Recurse, but start with the subtree “closer” to Q:


First search the subtree that would contain Q if we were
inserting Q below T.
Nearest Neighbor, best, best_dist are global var (can

Code also pass into function calls)

def NN(Point Q, kdTree T, int cd, Rect BB):

// if this bounding box is too far, do nothing


if T == NULL ordistance(Q, BB) > best_dist: return

// if this
better than
point isthe best: dist =
distance(Q, T.data)
if dist < best_dist
: = dist
best_dist
// best
visit=subtrees
T.data is most promising order:
if Q[cd] < T.data[cd]:
NN(Q, T.left, next_cd, BB.trimLeft(cd, t.data))
NN(Q, T.right, next_cd, BB.trimRight(cd,
else:
NN(Q, t.data)) next_cd, BB.trimRight(cd, t.data))
T.right,
NN(Q, T.left, next_cd, BB.trimLeft(cd, t.data))

Following Dave Mount’s Notes (page 77)


Nearest Neighbor Facts

• Might have to search close to the whole tree in the


worst case. [O(n)]

• In practice, runtime is closer to:


- O(2d + log n)
log n to find cells “near” the query point
- 2d to search around cells in that neighborhood
-
• Three important concepts that reoccur in range /
nearest neighbor searching:
storing partial results: keep best so far, and update
- pruning: reduce search space by eliminating irrelevant trees.
- traversal order: visit the most promising subtree first.
THANK
YOU

You might also like