R-Trees and Geospatial Data Structures
CS 6213 – Advanced Data Structures – Lecture 7
• Instructor
  Prof. Amrinder Arora
  amrinder@gwu.edu
  Please copy TA on emails
  Please feel free to call as well
• TA
  Iswarya Parupudi
  iswarya2291@gwmail.gwu.edu
L7 - R-Trees, CS 6213 - Advanced Data Structures - Arora
CS 6213
• Basics: Record / Struct / Arrays / LLs; Stacks / Queues; Graphs / Trees / BSTs; Heaps and PQs
• Advanced: Trie, B-Tree; Splay Trees; R-Tree; Union Find
• Applications: Databases; Spatial; String; In Memory
• Antonin Guttman, U. C. Berkeley
• K. A. Mohamed
• Spatial Data
• R-Tree Structure
• Operations
  • Searching
  • Insertion
  • Deletion
• Variants
• Applications
Given a city map, 'index' all university buildings in an efficient structure for quick topological search.
"Index" buildings in an efficient structure for quick search.
Spatial object: Contour (outline) of the area around the building(s).
Minimum bounding region (MBR) of the object.
MBR of the city neighbourhoods.
MBR of the city defining the overall search region.
Mostly involves 2D regions.
• Need to support 2D range queries.
• Multiple return values desired: answer a query region by reporting all spatial objects that are fully contained in, or overlap, the query region (Spatial-Access Method, SAM).
In general:
• Spatial data objects often cover areas in multidimensional spaces.
• Spatial data objects are not well represented by point locations.
• An 'index' based on an object's spatial location is desirable.
Problem Summary: To retrieve data items quickly and efficiently according to their spatial locations.
• A B-Tree is an ordered, dynamic, multi-way structure of order m (i.e. each node has at most m children).
• The keys and the subtrees are arranged in the fashion of a search tree.
• Each node may contain a large number of keys, and the number of subtrees in each node, then, may also be large.
• The B-Tree is designed (among other objectives):
  • to branch out in this large number of directions, and
  • to contain a lot of keys in each node so that the height of the tree is relatively short.
[Figure: example B-Tree with letter keys.]
• A height-balanced tree, similar to a B-Tree.
• Index records in the leaf nodes contain pointers to the actual spatial objects (entries) they represent.
• Each entry has a unique identifier that points to one spatial object, and its MBR; i.e., entry = (MBR, pointer).
• Spatial searching requires visiting only a small number of nodes.
• The index is completely dynamic: inserts and deletes can be intermixed with searches. (No periodic reorganization is required.)
• Let M be the maximum number of entries that will fit in one node.
• Let m ≤ M/2 be a parameter specifying the minimum number of entries in one node.
Then an R-Tree must satisfy the following properties:
1. Every leaf node contains between m and M index records, unless it is the root.
2. For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR that spatially contains the n-dimensional data object represented by the tuple-identifier.
3. Every non-leaf node has between m and M children, unless it is the root.
4. For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the regions in the child node.
5. The root has at least two children unless it is a leaf.
6. All leaves appear on the same level.
• An entry E in a leaf node is defined as:
  E = (I, tuple-identifier)
• Where I refers to the smallest bounding n-dimensional region (MBR) that encompasses the spatial data pointed to by its tuple-identifier.
• I is a series of closed intervals, one for each dimension of the bounding region.
• Example. In 2D, I = (I_x, I_y), where I_x = [x_a, x_b], and I_y = [y_a, y_b].
[Not limited to 2D: higher dimensions are certainly possible.]
• In general I = (I_0, I_1, …, I_{n-1}) for n dimensions, where I_k = [k_a, k_b].
• If either k_a or k_b (or both) is equal to ∞, this means that the spatial object extends outward indefinitely along that dimension.
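The interval representation above maps naturally onto a small value type. The following is a minimal Python sketch and not code from the lecture: the class name `MBR` and its methods `overlaps`, `union` and `area` are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # closed interval [k_a, k_b]


@dataclass(frozen=True)
class MBR:
    """Hypothetical n-dimensional bounding region: one closed interval per dimension."""
    intervals: Tuple[Interval, ...]

    def overlaps(self, other: "MBR") -> bool:
        # Two regions overlap only if their intervals overlap in every dimension.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for (a_lo, a_hi), (b_lo, b_hi)
                   in zip(self.intervals, other.intervals))

    def union(self, other: "MBR") -> "MBR":
        # Smallest region that encloses both operands.
        return MBR(tuple((min(a_lo, b_lo), max(a_hi, b_hi))
                         for (a_lo, a_hi), (b_lo, b_hi)
                         in zip(self.intervals, other.intervals)))

    def area(self) -> float:
        # Product of interval lengths (area in 2D, volume in 3D, and so on).
        result = 1.0
        for lo, hi in self.intervals:
            result *= hi - lo
        return result


a = MBR(((0, 2), (0, 2)))
b = MBR(((1, 3), (1, 3)))
half_plane = MBR(((0, math.inf), (0, 1)))  # extends indefinitely along x
```

An infinite endpoint needs no special handling here, because the comparisons in `overlaps` and `union` behave as expected with `math.inf`.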
• An entry E in a non-leaf node is defined as: E = (I, child-pointer)
• Where the child-pointer points to the child of this node, and I is the MBR that encompasses all the regions in the entries of that child node.
[Figure: a non-leaf node with entries I(A), I(B), …, I(M); entry B points to a child node with entries I(a), I(b), I(c), I(d), whose regions a, b, c, d lie inside B.]
[Figure: R-Tree example built up over several slides: data rectangles a through l in the leaves, grouped under bounding rectangles m, n, o and p.]
Typical query: Find and report all university building sites that are within 5km of the city centre.
Approach:
i. Build the R-Tree using rectangular regions a, b, … i.
ii. Formulate the query range Q.
iii. Query the R-Tree and report all regions overlapping Q.
Let Q be the query region.
Let T be the root of the R-Tree.
Search all entry-records whose regions overlap Q.
Search sub-trees:
• If T is not a leaf, then apply Search on every child-node entry E whose I overlaps Q.
Search leaf nodes:
• If T is a leaf, then check each entry E in the leaf and return E if E.I overlaps Q.
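The two search cases above can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the lecture's code: rectangles are assumed to be 2D tuples `(xmin, ymin, xmax, ymax)`, and the `Node` class, `overlaps` and `search` are hypothetical names.

```python
from dataclasses import dataclass, field


def overlaps(a, b):
    """Rectangles are (xmin, ymin, xmax, ymax); touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    # Each entry is (rect, payload): payload is a tuple-identifier in a leaf
    # and a child Node in a non-leaf node.
    entries: list = field(default_factory=list)


def search(node, q):
    """Return the identifiers of all leaf entries whose rectangle overlaps q."""
    found = []
    for rect, payload in node.entries:
        if not overlaps(rect, q):
            continue  # prune: nothing below this entry can overlap q
        if node.is_leaf:
            found.append(payload)
        else:
            found.extend(search(payload, q))  # several subtrees may qualify
    return found


leaf1 = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf2 = Node(True, [((10, 10, 11, 11), "c")])
root = Node(False, [((0, 0, 3, 3), leaf1), ((10, 10, 11, 11), leaf2)])
print(search(root, (0.5, 0.5, 2.5, 2.5)))  # ['a', 'b']
```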
[Figure: search example on an R-Tree whose root r6 holds r2, r5 and r8; data rectangles a to i sit in the leaves. R-Tree settings: M and m are blank in the source.]
• The search algorithm descends the tree from the root in a manner similar to a B-Tree.
• More than one subtree under a visited node may need to be searched.
• Cannot guarantee good worst-case performance.
• Countered by the algorithms during insertion, deletion, and update that maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space.
• So that only data near the search area needs to be examined.
• Emphasis is on the optimal placement of spatial objects with respect to the spatial location of other objects in the structure.
• A Node-Overflow happens when a new Entry is added to a fully packed node, causing the resulting number of entries in the node to exceed the upper bound M.
• The 'overflow' node must be split, and all its current entries, as well as the new one, consolidated for a locally optimal arrangement.
• A Node-Underflow happens when one or more Entries are removed from a node, causing the remaining number of entries in that node to fall below the lower bound m.
• The underflow node must be condensed, and its entries dispersed for a globally optimal arrangement.
• New index entry-records are added to the leaves.
• Nodes that overflow are split, and splits propagate up the tree.
• A split-propagation may cause the tree to grow in height.
The main Insert routine
• Let E = (I, tuple-identifier) be the new entry to be inserted.
• Let T be the root of the R-Tree.
• [Ins_1] Locate a leaf L starting from T to insert E.
• [Ins_2] Add E to L. If L is already full (overflow), split L into L and L'.
• [Ins_3] Propagate MBR changes (enlarged or reduced) upwards.
• [Ins_4] Grow tree taller if node split propagation causes T to split.
• Similar to insertion into a B+-tree, but the new entry may go into any leaf; a leaf splits if its capacity is exceeded.
• Which leaf to insert into? (Choose Leaf)
• How to split a node? (Node Split)
[Ins_1] Locate a leaf L starting from T to insert E = (I, tuple-identifier).
• Notion (i): Select the path that would require the least enlargement to include E.I.
• Notion (ii): Resolve ties by choosing the child-node with the smallest MBR.
• Invoke: L = ChooseLeaf (E, T).
[Figure: node rN with entries A, B and C, and the new region E.I to be placed.]
Algorithm: ChooseLeaf (E, N)
Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.
Output: The leaf L where E should be inserted.
• If N is a leaf Then Return N
• Let FS be the set of current entries in the node N
• Let F = (I, child-pointer) ∈ FS, so that F.I satisfies the insertion Notions (i) and (ii)
• Return ChooseLeaf (E, F.child-pointer)
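ChooseLeaf can be sketched as a loop that repeatedly picks the entry needing the least enlargement, breaking ties by smallest area. The sketch below is illustrative only: 2D rectangles as `(xmin, ymin, xmax, ymax)` tuples, the minimal `Node` class and the helper names are assumptions.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r, new):
    """Extra area r would need in order to also cover new."""
    return area(union(r, new)) - area(r)


class Node:
    def __init__(self, is_leaf, entries=None):
        self.is_leaf = is_leaf
        # Non-leaf entries are (rect, child Node) pairs.
        self.entries = entries if entries is not None else []


def choose_leaf(node, rect):
    while not node.is_leaf:
        # Notion (i): least enlargement; Notion (ii): smallest MBR on ties.
        _, node = min(node.entries,
                      key=lambda e: (enlargement(e[0], rect), area(e[0])))
    return node


leaf_a, leaf_b = Node(True), Node(True)
root = Node(False, [((0, 0, 4, 4), leaf_a), ((10, 10, 12, 12), leaf_b)])
assert choose_leaf(root, (1, 1, 2, 2)) is leaf_a      # fits without enlargement
assert choose_leaf(root, (11, 11, 13, 13)) is leaf_b  # needs far less enlargement
```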
[Ins_2] Add E to L.
• Notion (i): If L has room for another entry, install E.
• Notion (ii): Otherwise split L to obtain L and L', which between them will contain all previous entries in L and the new E (consolidated for local optima).
[Ins_3] Propagate MBR changes upwards by invoking AdjustTree (L, L').
• Notion (i): Ascend from leaf L to the root T while adjusting the covering rectangles (MBRs).
• Notion (ii): If L' exists, propagate node splits as necessary; i.e. attempt to install a new entry in the parent of L to point to L'.
Example. Found L = @Y to insert new E = e. R-Tree settings: M = 3, m = 1.
[Figure: root @G holds K; node @K holds X, Y, Z; leaf @Y holds a, b, c.]
Algorithm: AdjustTree (N, N')
Inputs: (i) A node N that has had its contents modified, (ii) The resultant split node N', if not NULL, that accompanies N.
Outputs: (i) N as above, (ii) N' as above.
  If N is the root Then Return {(i) N, (ii) N'}
  Let PN be the parent node of N.
  Let EN = (I_N, child-pointer_N) in PN, where child-pointer_N points to N.
  Adjust I_N so that it tightly encloses all entry regions in N.
  If N' is Not NULL Then
    Create a new Entry EN' = (I_N', child-pointer_N'), where child-pointer_N' points to N'
    If number of entries in PN < M Then
      Install EN' in PN
      Return AdjustTree (PN, NULL)
    Else
      Set {PN, PN'} = SplitNode (PN, EN')
      Return AdjustTree (PN, PN')
    End If
  Else
    Return AdjustTree (PN, NULL)
  End If
[Ins_4] Grow Tree taller.
• Notion: If the recursive node split propagation causes the root to split, then create a new root whose children are the two resulting nodes.
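Steps [Ins_1] to [Ins_4] can be combined into one compact recursive sketch. Two things here are assumptions rather than the lecture's method: the recursion replaces the explicit upward AdjustTree pass, and `split_node` is a deliberately naive placeholder (sort by x-centre, cut in half), not one of Guttman's split heuristics. `M = 3`, the `Node` class and all helper names are hypothetical.

```python
M = 3  # assumed maximum number of entries per node


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class Node:
    def __init__(self, is_leaf, entries=None):
        self.is_leaf = is_leaf
        # Each entry is [rect, payload]; payload is a tuple-identifier in a
        # leaf and a child Node otherwise. rect = (xmin, ymin, xmax, ymax).
        self.entries = entries if entries is not None else []

    def mbr(self):
        rect = self.entries[0][0]
        for other, _ in self.entries[1:]:
            rect = union(rect, other)
        return rect


def split_node(node):
    # Placeholder split (an assumption): order by x-centre and cut in half.
    node.entries.sort(key=lambda e: e[0][0] + e[0][2])
    half = len(node.entries) // 2
    sibling = Node(node.is_leaf, node.entries[half:])
    node.entries = node.entries[:half]
    return sibling


def _insert(node, rect, tuple_id):
    """Insert below node; return the new sibling if node had to split, else None."""
    if node.is_leaf:
        node.entries.append([rect, tuple_id])                    # [Ins_2]
    else:
        # [Ins_1] least enlargement, ties broken by smallest area.
        entry = min(node.entries,
                    key=lambda e: (area(union(e[0], rect)) - area(e[0]), area(e[0])))
        sibling = _insert(entry[1], rect, tuple_id)
        entry[0] = entry[1].mbr()                                 # [Ins_3]
        if sibling is not None:
            node.entries.append([sibling.mbr(), sibling])
    return split_node(node) if len(node.entries) > M else None


def insert(root, rect, tuple_id):
    """Insert one entry and return the (possibly new) root."""
    sibling = _insert(root, rect, tuple_id)
    if sibling is None:
        return root
    # [Ins_4] the root itself split: grow the tree by one level.
    return Node(False, [[root.mbr(), root], [sibling.mbr(), sibling]])


root = Node(True)
for i in range(20):
    root = insert(root, (i, i, i + 1, i + 1), i)
```

Because a split always creates the sibling at the same level as the node it came from, and only a root split adds a level, all leaves stay at the same depth.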
[Figure: root @T holds A, B, C; node @C holds E, F and its split sibling @C' holds G, H.]
• The height of an R-Tree containing n entry-records is at most ⌈log_m n⌉ − 1, because the branching factor of each node is at least m.
• The maximum number of nodes is: ⌈n/m⌉ + ⌈n/m²⌉ + … + 1.
• Worst-case space utilisation for all nodes except the root is: m/M.
• Nodes will tend to have more than m entries, and this will: decrease tree height and improve space utilisation.
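As a worked example of these bounds, the sketch below evaluates them for concrete values. It assumes the standard bounds from Guttman's analysis (height at most ⌈log_m n⌉ − 1, node count at most ⌈n/m⌉ + ⌈n/m²⌉ + … + 1); the function names are illustrative.

```python
def ceil_log(n, m):
    """Smallest h with m**h >= n, computed with integers to avoid float error."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= m
        h += 1
    return h


def max_height(n, m):
    # Height bound for n entry-records when every node has at least m entries.
    return ceil_log(n, m) - 1


def max_nodes(n, m):
    # Sum of ceil(n/m) + ceil(n/m^2) + ... + 1, built level by level.
    total, level = 0, n
    while level > 1:
        level = -(-level // m)  # ceiling division
        total += level
    return total


print(max_height(1000, 10), max_nodes(1000, 10))  # 2 111
```

With n = 1000 and m = 10, the sparsest legal tree has 100 leaves, 10 nodes above them and one root, which matches the 111 nodes and height 2 printed above.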
• Current index entry-records are removed from the leaves.
• Nodes that underflow are condensed, and their contents redistributed appropriately throughout the tree.
• A condense propagation may cause the tree to shorten in height.
The main Delete routine
• Let E = (I, tuple-identifier) be a current entry to be removed.
• Let T be the root of the R-Tree.
• [Del_1] Find the leaf L starting from T that contains E.
• [Del_2] Remove E from L, and condense 'underflow' nodes.
• [Del_3] Propagate MBR changes upwards.
• [Del_4] Shorten tree if T contains only 1 entry after condense propagation.
[Del_1] Find the leaf L starting from T that contains E.
Algorithm: FindLeaf (E, N)
Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.
Output: The leaf L containing E.
  If N is a leaf Then
    If N contains E Then Return N
    Else Return NULL
  Else
    Let FS be the set of current entries in N.
    For each F = (I, child-pointer) ∈ FS where F.I overlaps E.I Do
      Set L = FindLeaf (E, F.child-pointer)
      If L is not NULL Then Return L
    Next F
    Return NULL
  End If
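FindLeaf translates almost line for line into Python. The sketch below is illustrative: it assumes 2D rectangles as `(xmin, ymin, xmax, ymax)` tuples and entries stored as `(rect, payload)` pairs, and the names `Node`, `overlaps` and `find_leaf` are hypothetical.

```python
from dataclasses import dataclass, field


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (rect, payload) pairs


def find_leaf(node, rect, tuple_id):
    """Return the leaf holding entry (rect, tuple_id), or None if it is absent."""
    if node.is_leaf:
        return node if (rect, tuple_id) in node.entries else None
    for child_rect, child in node.entries:
        # Sibling MBRs may overlap, so every overlapping subtree is a candidate.
        if overlaps(child_rect, rect):
            leaf = find_leaf(child, rect, tuple_id)
            if leaf is not None:
                return leaf
    return None


leaf1 = Node(True, [((0, 0, 5, 5), "p")])
leaf2 = Node(True, [((4, 4, 4.5, 4.5), "x"), ((6, 6, 8, 8), "y")])
root = Node(False, [((0, 0, 5, 5), leaf1), ((4, 4, 8, 8), leaf2)])
assert find_leaf(root, (4, 4, 4.5, 4.5), "x") is leaf2  # leaf1 is tried first and fails
```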
[Del_2] Remove E from L, and condense 'underflow' nodes.
[Del_3] Propagate MBR changes upwards.
• Notion (i): Ascend from leaf L to root T while adjusting the covering rectangles (MBRs).
• Notion (ii): If, after removing the entry E from L, the number of entries in L becomes fewer than m, then the node L has to be eliminated and its remaining contents relocated.
• Propagate these notions upwards by invoking CondenseTree (N, QS), where N is an R-Tree node whose entries have been modified, and QS is the set of eliminated nodes.
• Start the propagation by setting N = L, and QS = ∅.
• Re-insert the entries from the eliminated nodes in QS back into the tree.
• Entries from eliminated leaf nodes are re-inserted as new entries using the Insert routine discussed earlier.
• Entries from higher-level nodes must be placed higher in the tree so that leaves of their dependent subtrees will be on the same level as the leaves of the main tree.
• Example: Delete the index entry-record b. R-Tree settings: M = 4, m = 2.
• Spatial constraint: a.I will form the smallest MBR with r4.
[Figure: root r7 = {r2, r6}; r2 = {r0, r1}; r6 = {r3, r4, r5}; leaves r0 = {a, b}, r1 = {c, d, e}, r3 = {f, g, h}, r4 = {i, j}, r5 = {k, l, m, n}.]
Algorithm: CondenseTree (N, QS)
Inputs: (i) A node N whose entries have been modified, (ii) A set of eliminated nodes QS.
  If N is NOT the root Then
    Let PN be the parent node of N.
    Let EN = (I_N, child-pointer_N) in PN.
    If N.entries < m Then
      Delete EN from PN
      Add N to QS
    Else
      Adjust I_N so that it tightly encloses all entry regions in N.
    End If
    CondenseTree (PN, QS)
  Else If N is the root AND QS is NOT ∅ Then
    For each Q ∈ QS Do
      For each E ∈ Q Do
        If Q is a leaf Then Insert (E)
        Else Insert (E) as a node entry at the same node level as Q
        End If
      Next E
    Next Q
  End If
Why 're-insert' orphaned entries?
• Alternatively, like the delete routine in a B-Tree (Rosenberg & Snyder, 1981), an 'underflow' node can be merged with whichever adjacent sibling will have its area increased the least, or its entries can be redistributed among sibling nodes.
• Both methods can cause the nodes to split.
• Eventually all changes need to be propagated upwards, anyway.
Re-insertion accomplishes the same thing, and:
• It is simpler to implement (and at comparable efficiency).
• It incrementally refines the spatial structure of the tree.
• It prevents the gradual deterioration that could occur if each entry were located permanently under the same parent node.
• A high value of m, nearer to M, is useful when the underlying database represented by the R-Tree is mostly used for search inquiries with very few updates.
  • The height of the tree will be kept to a minimum.
  • High search performance is maintained.
  • However, the risk of overflow and underflow is also high.
• A small value of m is good when frequent updates and modifications of the underlying database are required.
  • The nodes are less dense.
  • Maintenance is less costly.
• Avoids multiple paths during searching.
• Objects may be stored in multiple nodes.
• MBRs of nodes at the same tree level do not overlap.
• On insertion/deletion the tree may change downward or upward in order to maintain the structure.
R-Tree Variants
http://perso.enst.fr/~saglio/bdas/EPFL0525/sld041.htm
• Similar to other R-Trees except that the Hilbert value of each rectangle's centroid is calculated.
• That key is used to guide the insertion.
• On an overflow, entries are divided evenly between two nodes.
• Experiments have shown that this scheme significantly improves performance and decreases insertion complexity.
• The Hilbert R-tree achieves up to 28% saving in the number of pages touched compared to the R*-tree.
• The Hilbert value of an object is found by interleaving the bits of its x and y coordinates, and then chopping the binary string into 2-bit strings.
• Then, for every 2-bit string, if the value is 0, we replace every 1 in the original string with a 3, and vice versa.
• If the value of the 2-bit string is 3, we replace all 2's and 0's in a similar fashion.
• After this is done, you put all the 2-bit strings back together and compute the decimal value of the binary string.
• This is the Hilbert value of the object.
• http://www-users.cs.umn.edu/research/shashi-group/CS8715/exercise_ans.doc
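For experimentation, a Hilbert value can also be computed with the widely published iterative "rotate and flip" formulation. The sketch below uses that formulation instead of the bit-string recipe above, so its numbering may differ from the slide's by the orientation of the curve. The function names are illustrative, and the grid side `n` is assumed to be a power of two with integer cell coordinates.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n-by-n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


# Order rectangle centroids (already snapped to grid cells) by Hilbert value.
centroids = [(3, 0), (0, 0), (1, 1), (0, 2)]
print(sorted(centroids, key=lambda c: hilbert_index(4, *c)))
# [(0, 0), (1, 1), (0, 2), (3, 0)]
```

Sorting by this key keeps cells that are close on the curve close in the ordering, which is the property a Hilbert R-tree relies on when grouping entries.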
• Proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990.
• Same algorithm as the regular R-tree for query and delete operations.
• When inserting, the R*-tree uses a combined strategy.
  • For leaf nodes, overlap is minimized.
  • For inner nodes, enlargement and area are minimized.
• When splitting, the R*-tree uses a topological split that chooses a split axis based on perimeter, then minimizes overlap.
• In addition to an improved split strategy, the R*-tree also tries to avoid splits by reinserting objects and subtrees into the tree, inspired by the concept of balancing a B-tree.
• MBR: Minimum Bounding Rectangle
• R-Trees are an extremely compelling data structure for spatial data.
• Largely based on the B-Tree (can be considered a generalization of the B-Tree)
• Can support more than two dimensions
• Support the same basic operations (deletion, searching, insertion, update, etc.)
• Many variants of R-Trees are available

R-Trees and Geospatial Data Structures

  • 1.
    CS 6213 –Advanced Data Structures – Lecture 7
  • 2.
     Instructor Prof. AmrinderArora amrinder@gwu.edu Please copy TA on emails Please feel free to call as well  TA Iswarya Parupudi iswarya2291@gwmail.gwu.edu L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 2
  • 3.
    CS 6213 Basics Record /Struct / Arrays / LLs Stacks / Queues Graphs / Trees / BSTs Heaps and PQs Advanced Trie, B-Tree Splay Trees R-Tree Union Find Applications Databases Spatial String In Memory L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 3
  • 4.
     Antonin Guttman,U. C. Berkeley  K. A. Mohamed L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 4
  • 5.
    5  Spatial Data R-Tree Structure  Operations  Searching  Insertion  Deletion  Variants  Applications L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 6.
    Given a city map,„index‟ all university buildings in an efficient structure for quick topological search. 6L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 7.
    7 “Index” buildings in an efficient structurefor quick search Spatial object: Contour (outline) of the area around the building(s). Minimum bounding region (MBR) of the object. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 8.
    8 MBR of thecity neighbourhoods. MBR of the city defining the overall search region. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 9.
    Mostly involves 2Dregions.  Need to support 2D range queries.  Multiple return values desired: Answering a query region by reporting all spatial objects that are fully-contained-in or overlapping the query region (Spatial-Access Method – SAM). In general:  Spatial data objects often cover areas in multidimensional spaces.  Spatial data objects are not well-represented by point-location.  An „index‟ based on an object‟s spatial location is desirable. 9L7 - R-TreesCS 6213 - Advanced Data Structures - Arora Problem Summary: To retrieve data items quickly and efficiently according to their spatial locations.
  • 10.
     A B-Treeis an ordered, dynamic, multi-way structure of order m (i.e. each node has at most m children).  The keys and the subtrees are arranged in the fashion of a search tree.  Each node may contain a large number of keys, and the number of subtrees in each node, then, may also be large.  The B-Tree is designed (among other objectives):  to branch out this large number of directions, and  to contain a lot of keys in each node so that the height of the tree is relatively short. 10 M P T X B D F G K L N O Q S V W Y ZI E H L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 11.
     A height-balancedtree, similar to a B-Tree.  Index records in the leaf nodes contain pointers to the actual spatial-objects (entries) they represent.  Each entry has a unique identifier that points to one spatial object, and its MBR; i.e., entry = (MBR, pointer).  Spatial searching requires visiting only a small number of nodes.  The index is completely dynamic: inserts and deletes can be intermixed with searches. (No periodic reorganization is required.) 11L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 12.
     Let Mbe the maximum number of entries that will fit in one node.  Let m ≤ M/2 be a parameter specifying the minimum number of entries in one node. Then an R-Tree must satisfy the following properties: 1. Every leaf node contains between m and M index records, unless it is the root. 2. For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR that spatially contains the n-dimensional data object represented by the tuple-identifier. 3. Every non-leaf node has between m and M children, unless it is the root. 4. For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the regions in the child node. 5. The root has two children unless it is a leaf. 6. All leaves appear on the same level. 12L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 13.
     An entryE in a leaf node is defined as: E = (I, tuple-identifier)  Where I refers to the smallest binding n-dimensional region (MBR) that encompasses the spatial data pointed to by its tuple- identifier.  I is a series of closed-intervals that make up each dimension of the binding region.  Example. In 2D, I = (Ix, Iy), where Ix = [xa, xb], and Iy = [ya, yb]. 13L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 14.
    [Not limited to2D – higher dimensions are certainly possible.]  In general I = (I0, I1, …, In-1) for n-dimensions, and that Ik = [ka, kb].  If either ka or kb (or both) are equal to , this means that the spatial object extends outward indefinitely along that dimension. 14L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 15.
     An entryE in a non-leaf node is defined as: E = (I, child-pointer)  Where the child-pointer points to the child of this node, and I is the MBR that encompasses all the regions in the child-node‟s pointer‟s entries. 15 I(A) I(B) … I(M) I(a) I(b) I(c) I(d) B a b c d L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 16.
    L7 - R-TreesCS6213 - Advanced Data Structures - Arora 16
  • 17.
    a b cd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 17
  • 18.
    a b c d m a b cde f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 18
  • 19.
    a b c d m e f n a bcd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 19
  • 20.
    a b c d m e f n h g i o p ab cd e f g h i j k l m n o p L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 20
  • 21.
    21 Typical query: Find andreport all university building sites that are within 5km of the city centre. Approach: i.Build the R-Tree using rectangular regions a, b, … i. ii.Formulate the query range Q. iii.Query the R- Tree and report all regions overlapping Q. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 22.
    Let Q bethe query region. Let T be the root of the R-Tree. Search all entry-records whose regions overlaps Q. Search sub-trees:  If T is not leaf, then apply Search on ever child-node entry E whose I overlaps Q. Search leaf nodes:  If T is leaf, then check each entry E in the leaf and return E if E.I overlaps Q. 22L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 23.
    23 r2 e r5 r8 r3 r4r1r7r0 ic gf hba d @ r6 @ r2 @ r5 @ r8 @ r0 @ r1 @ r7 @ r3 @ r4 R-Tree settings: M = m = L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 24.
    24  The searchalgorithm descends the tree from the root in a manner similar to a B-Tree.  More than one subtree under a node visited may need to be searched.  Cannot guarantee good worst-case performance.  Countered by the algorithms during insertion, deletion, and update that maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space.  So that only data near the search area need to be examined.  Emphasis is on the optimal placement of spatial objects with respect to the spatial location of other objects in the structure. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 25.
     A Node-Overflowhappens when a new Entry is added to a fully packed node, causing the resulting number of entries in the node to exceed the upper-bound M.  The „overflow‟ node must be split, and all its current entries, as well as the new one, consolidated for local optimum arrangement.  A Node-Underflow happens when one or more Entries are removed from a node, causing the remaining number of entries in that node to fall below the lower-bound m.  The underflow node must be condensed, and its entries dispersed for global optimum arrangement. 25L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 26.
    26  New indexentry-records are added to the leaves.  Nodes that overflow are split, and splits propagate up the tree.  A split-propagation may cause the tree to grow in height. The main Insert routine  Let E = (I, tuple-identifier) be the new entry to be inserted.  Let T be the root of the R-Tree.  [Ins_1] Locate a leaf L starting from T to insert E.  [Ins_2] Add E to L. If L is already full (overflow), split L into L and L‟.  [Ins_3] Propagate MBR changes (enlarged or reduced) upwards.  [Ins_4] Grow tree taller if node split propagation causes T to split. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 27.
     Similar toinsertion into B+-tree but may insert into any leaf; leaf splits in case capacity exceeded.  Which leaf to insert into? (Choose Leaf)  How to split a node? (Node Split) L7 - R-TreesCS 6213 - Advanced Data Structures - Arora 27
  • 28.
    m n o p L7 -R-TreesCS 6213 - Advanced Data Structures - Arora 28
  • 29.
    29 [Ins_1] Locate aleaf L starting from T to insert E = (I, tuple-identifier).  Notion (i): Select the path that would require the least enlargement to include E.I.  Notion (ii): Resolve ties by choosing the child-node with the smallest MBR.  Invoke: L = ChooseLeaf (E, T). A B C @rN A C B E.I rN L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 30.
    30 Algorithm: ChooseLeaf (E,N) Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N. Output: The leaf L where E should be inserted.  If N is leaf Then Return N  Let FS be the set of current entries in the node N  Let F = (I, child-pointer) FS, so that F.I satisfies the Insertion- Notions  Return ChooseLeaf (E, F.child-pointer) L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 31.
    31 [Ins_2] Add Eto L.  Notion (i): If L has room for another entry, install E.  Notion (ii): Otherwise split L to obtain L and L‟, which between them, will contain all previous entries in L and the new E (consolidated for local optima). [Ins_3] Propagate MBR changes upwards by invoking AdjustTree (L, L‟).  Notion (i): Ascend from leaf L to the root T while adjusting the covering rectangles MBR.  Notion (ii): If L‟ exists, propagate node splits as necessary; i.e. attempt to install a new entry in the parent of L to point to L‟. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 32.
    32 Example. Found L= @Y to insert new E = e. R-Tree settings: M = 3, m = 1. K @G a b c @Y X Y Z @K L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 33.
    33 Algorithm: AdjustTree (N,N’) Inputs: (i) A node N that has had its contents modified, (ii) The resultant split node N‟, if not NULL, that accompanies N. Outputs: (i) N as above, (ii) N‟ as above.  If N is the root Then Return {(i) N, (ii) N‟}  Let PN be the parent node of N.  Let EN = (I_N, child-pointer_N) in PN, where child-pointer_N points to N.  Adjust I_N so that it tightly encloses all entry regions in N. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 34.
    34  If N‟is Not NULL Then  If number of entries in PN < M-1 Then  Create a new Entry EN‟ = (I_N’, child-pointer_N’)  Install EN‟ in PN  Return AdjustTree (PN, NULL)  Else  Set {PN, PN‟} = SplitNode (PN, EN‟)  Return AdjustTree (PN, PN‟)  End If  Else  Return AdjustTree (PN, NULL)  End If L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 35.
    [Ins_4] Grow Treetaller.  Notion: If the recursive node split propagation causes the root to split, then create a new root whose children are the two resulting nodes. 35 A B C @T (root) E F @C G H @C’ L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 36.
    36  The heightof the R-Tree containing n entry-records is at most logm n – 1, because the branching factor of each node is at least m.  The maximum number of nodes is:  Worst case space utilisation for all nodes except the root is:  Nodes will tend to have more than m entries, and this will: L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 37.
    37  Current indexentry-records are removed from the leaves.  Nodes that underflow are condensed, and its contents redistributed appropriately throughout the tree.  A condense propagation may cause the tree to shorten in height. The main Delete routine  Let E = (I, tuple-identifier) be a current entry to be removed.  Let T be the root of the R-Tree.  [Del_1] Find the leaf L starting from T that contains E.  [Ins_2] Remove E from L, and condense „underflow‟ nodes.  [Ins_3] Propagate MBR changes upwards.  [Ins_4] Shorten tree if T contains only 1 entry after condense propagation. L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 38.
     [Del_1] Findthe leaf L starting from T that contains E.  Algorithm: FindLeaf (E, N)  Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.  Output: The leaf L containing E.  If N is leaf Then  If N contains E Then Return N  Else Return NULL  Else  Let FS be the set of current entries in N.  For each F = (I, child-pointer) FS where F.I overlaps E.I Do  Set L = FindLeaf (E, F.child-pointer)  If L is not NULL Then Return L  Next F  Return NULL  End If 38L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 39.
    [Del_2] Remove Efrom L, and condense „underflow‟ nodes. [Del_3] Propagate MBR changes upwards.  Notion (i): Ascend from leaf L to root T while adjusting covering rectangles MBR.  Notion (ii): If after removing the entry E in L and the number of entries in L becomes fewer than m, then the node L has to be eliminated and its remaining contents relocated. 39L7 - R-TreesCS 6213 - Advanced Data Structures - Arora
  • 40.
    - Propagate these notions upwards by invoking CondenseTree (N, QS), where N is an R-Tree node whose entries have been modified, and QS is the set of eliminated nodes.
    - Start the propagation by setting N = L and QS = ∅.
    - Re-insert the entries from the eliminated nodes in QS back into the tree.
      - Entries from eliminated leaf nodes are re-inserted as new entries using the Insert routine discussed earlier.
      - Entries from higher-level nodes must be placed higher in the tree so that the leaves of their dependent subtrees will be on the same level as the leaves of the main tree.
  • 41.
    - Example: Delete the index entry-record b. R-Tree settings: M = 4, m = 2.
    - Spatial constraint: a.I will form the smallest MBR with r4.
    [Figure: example R-Tree with nodes r0 to r7 and index entry-records a to n]
  • 42.
    Algorithm: CondenseTree (N, QS)
    Inputs: (i) A node N whose entries have been modified, (ii) A set of eliminated nodes QS.
      If N is NOT the root Then
        Let PN be the parent node of N.
        Let EN = (I_N, child-pointer_N) be the entry for N in PN.
        If N.entries < m Then
          Delete EN from PN
          Add N to QS
        Else
          Adjust I_N so that it tightly encloses all entry regions in N.
        End If
        CondenseTree (PN, QS)
  • 43.
      Else If N is root AND QS is NOT ∅ Then
        For each Q ∈ QS Do
          For each E ∈ Q Do
            If Q is leaf Then Insert (E)
            Else Insert (E) as a node entry at the same node level as Q
            End If
          Next E
        Next Q
      End If
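A minimal Python sketch of the upward pass of CondenseTree is given below. It is iterative rather than recursive, and instead of calling Insert it returns the orphaned entries together with the level they came from (0 = leaf), so that a caller can re-insert them at the matching level. The `Node` class with a `parent` pointer, the `[rect, payload]` entry layout and the `mbr` helper are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def mbr(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing every rectangle in rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass(eq=False)  # nodes are compared by identity, not by content
class Node:
    is_leaf: bool
    # Each entry is a mutable pair [rect, payload]; the payload is a child
    # Node in an internal node and a tuple identifier in a leaf.
    entries: list = field(default_factory=list)
    parent: Optional["Node"] = None


def condense_tree(leaf: Node, m: int) -> list:
    """Walk from leaf to the root after a removal.

    Underfull nodes are unlinked from their parent and their entries are
    collected as (level, entry) pairs; all other nodes get a tightened MBR.
    """
    orphans = []
    n, level = leaf, 0
    while n.parent is not None:
        pn = n.parent
        en = next(e for e in pn.entries if e[1] is n)  # entry EN for N in PN
        if len(n.entries) < m:
            pn.entries = [e for e in pn.entries if e is not en]
            orphans.extend((level, e) for e in n.entries)
        else:
            en[0] = mbr([e[0] for e in n.entries])
        n, level = pn, level + 1
    return orphans
```

The caller would then re-insert each returned pair: level 0 entries through the normal Insert routine, higher-level entries as node entries at the same level, and finally replace the root by its only child if it is left with a single entry.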
  • 44.
    Why ‘re-insert’ orphaned entries?
    - Alternatively, as in the B-Tree delete routine (Rosenberg & Snyder, 1981), an ‘underflow’ node can be merged with whichever adjacent sibling will have its area increased the least, or its entries can be redistributed among sibling nodes.
    - Both methods can cause nodes to split.
    - Eventually all changes need to be propagated upwards anyway.
    Re-insertion accomplishes the same thing, and:
    - It is simpler to implement (at comparable efficiency).
    - It incrementally refines the spatial structure of the tree.
    - It prevents the gradual deterioration that would occur if each entry were located permanently under the same parent node.
  • 45.
    - A high value of m, nearer to M, is useful when the underlying database represented by the R-Tree is mostly used for search inquiries with very few updates.
      - The height of the tree will be kept to a minimum.
      - High search performance is maintained.
      - However, the risk of overflow and underflow is also high.
    - A small value of m is good when frequent updates and modifications of the underlying database are required.
      - The nodes are less dense.
      - Maintenance is less costly.
  • 46.
    R-Tree Variants
    - Avoids multiple paths during searching.
    - Objects may be stored in multiple nodes.
    - MBRs of nodes at the same tree level do not overlap.
    - On insertion/deletion the tree may change downward or upward in order to maintain the structure.
  • 47.
    R-Tree Variants
    http://perso.enst.fr/~saglio/bdas/EPFL0525/sld041.htm
  • 48.
    R-Tree Variants
    - The Hilbert R-tree is similar to other R-Trees, except that the Hilbert value of each rectangle's centroid is calculated.
    - That key is used to guide the insertion.
    - On an overflow, the entries are divided evenly between two nodes.
    - Experiments have shown that this scheme significantly improves performance and decreases insertion complexity.
    - The Hilbert R-tree achieves up to 28% savings in the number of pages touched compared to the R*-tree.
  • 49.
    R-Tree Variants
    - The Hilbert value of an object is found by interleaving the bits of its x and y coordinates, and then chopping the binary string into 2-bit strings.
    - Then, for every 2-bit string, if the value is 0, we replace every 1 in the original string with a 3, and vice versa.
    - If the value of the 2-bit string is 3, we replace all 2's and 0's in a similar fashion.
    - After this is done, put all the 2-bit strings back together and compute the decimal value of the binary string.
    - This is the Hilbert value of the object.
    - http://www-users.cs.umn.edu/research/shashi-group/CS8715/exercise_ans.doc
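For comparison, the sketch below computes a Hilbert value with the widely used iterative rotate-and-flip conversion from grid coordinates to curve distance. This is a swapped-in technique, not the interleave-and-substitute procedure described on the slide, and the two may number cells differently. The function names, the integer grid of side n (a power of two) and the integer centroid rounding are assumptions.

```python
def hilbert_value(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve filling an n x n grid."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("x and y must lie in [0, n)")
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        if ry == 0:                    # rotate/flip so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def centroid_hilbert(rect, n: int) -> int:
    """Hilbert value of a rectangle's centroid, rounded down to a grid cell."""
    cx = (rect[0] + rect[2]) // 2
    cy = (rect[1] + rect[3]) // 2
    return hilbert_value(n, cx, cy)


if __name__ == "__main__":
    rects = [(2, 0, 3, 1), (0, 0, 1, 1), (0, 2, 1, 3)]
    # Sorting by this key is what lets a Hilbert R-tree order its entries.
    print(sorted(rects, key=lambda r: centroid_hilbert(r, 4)))
```

Cells that are consecutive along the curve are always neighbours in the grid, which is why sorting rectangles by this key tends to keep nearby objects in the same node.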
  • 50.
    R-Tree Variants
    - The R*-tree was proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990.
    - Same algorithm as the regular R-tree for query and delete operations.
    - When inserting, the R*-tree uses a combined strategy.
      - For leaf nodes, overlap is minimized.
      - For inner nodes, enlargement and area are minimized.
    - When splitting, the R*-tree uses a topological split that chooses a split axis based on perimeter, then minimizes overlap.
    - In addition to an improved split strategy, the R*-tree also tries to avoid splits by reinserting objects and subtrees into the tree, inspired by the concept of balancing a B-tree.
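The split rule (choose the axis by perimeter, then minimize overlap) can be sketched as below. This is a simplified illustration under stated assumptions: rectangles are `(xmin, ymin, xmax, ymax)` tuples, each axis is sorted once by lower then upper bound (the published algorithm considers both sort orders separately), and all helper names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def margin(r: Rect) -> float:
    """Perimeter of a rectangle."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def rstar_split(rects: List[Rect], m: int) -> Tuple[List[Rect], List[Rect]]:
    """Split an overflowing node's rectangles into two groups of >= m entries."""
    if len(rects) < 2 * m:
        raise ValueError("need at least 2*m rectangles to split")
    best_axis, best_margin, candidates = None, float("inf"), {}
    for axis in (0, 1):  # 0 = x, 1 = y
        ordered = sorted(rects, key=lambda r: (r[axis], r[axis + 2]))
        splits = [(ordered[:k], ordered[k:])
                  for k in range(m, len(ordered) - m + 1)]
        # Step 1: the axis whose candidate splits have the smallest total
        # perimeter becomes the split axis.
        total = sum(margin(mbr(a)) + margin(mbr(b)) for a, b in splits)
        candidates[axis] = splits
        if total < best_margin:
            best_axis, best_margin = axis, total
    # Step 2: along that axis, take the split with the least overlap,
    # breaking ties by the smaller combined area.
    return min(candidates[best_axis],
               key=lambda ab: (overlap_area(mbr(ab[0]), mbr(ab[1])),
                               area(mbr(ab[0])) + area(mbr(ab[1]))))
```

Using the perimeter to pick the axis favours squarer groups, which in turn tend to overlap less once the tree grows.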
  • 51.
    - MBR: Minimum Bounding Rectangle
    - R-Trees are an extremely compelling data structure for spatial data.
    - Largely based on the B-Tree (can be considered a generalization of the B-Tree)
    - Can support more than two dimensions
    - Support the same basic operations (deletion, searching, insertion, update, etc.)
    - Many variants of R-Trees are available