G3 - R-Tree, R+-Tree

The document discusses spatial data indexing and describes the R-tree, a multidimensional index that groups spatial objects into minimum bounding rectangles and organizes them hierarchically in a tree structure to support efficient spatial queries and updates. It explains the structure of R-trees, as well as algorithms for search, insertion, and deletion operations on R-trees that involve adjusting bounding rectangles and propagating changes up the tree.

Uploaded by

Trung Hiếu Vũ

Spatial Data Indexing

R-Tree
Group 3
Dương | Hiếu | Nam
Content
A. Introduction
Index | B-Tree | B+-Tree

B. Spatial Data Indexing


Introduction to Spatial Databases
R tree (Structure and Search/Insert/Delete action)
R+-Tree and R*-Tree
A. Introduction
Indexes
An index is a data structure used to quickly locate and
access the data in a database.

● Single-Level Ordered Indexes


● Multilevel-Indexes
Single-Level Indexes
● Primary Index is defined on an ordered data file. The data file is ordered
on a key field. The key field is generally the primary key of the relation.

● Clustering Index is defined on an ordered data file. The data file is ordered
on a non-key field.

● Secondary Index may be generated from a field which is a candidate key
and has a unique value in every record, or a non-key with duplicate values.
Multi-Level Indexes
A single-level index may become too large
to search without many disk accesses.

Multilevel indexing breaks the index into
several smaller index levels, so that the
top level fits in a single block.
Search Tree
A special type of tree that is used
to guide the search for a record,
given the value of one of the
record’s fields.

Use a search tree as a mechanism
to search for records stored in a
disk file.
B-Tree
A B-tree is a self-balancing search tree with a high fan-out (a “fat” tree).

The goals for balancing a search tree are as follows:


● To guarantee that nodes are evenly distributed, so that the depth of the
tree is minimized
● To make the search speed uniform, so that the average time to find any
random key is roughly the same
B-Tree

A node in a B-tree
B-Tree

B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.


B+-Tree
Most implementations of a dynamic multilevel index use a variation of the
B-tree data structure called a B+-tree.

● In a B-tree, every value of the search field appears once at some level in
the tree, along with a data pointer.

● In a B+-tree, data pointers are stored only at the leaf nodes of the tree,
so the structure of leaf nodes differs from the structure of internal nodes
B+-Tree

The structure of the internal nodes of a B+ tree


B+-Tree

The Pnext pointer links all the leaf nodes together, just like a linked list,
giving ordered access to the records stored on disk.
B+-Tree

An example of B+ Tree
B-Tree vs B+-Tree
B-Tree                                     | B+-Tree
Search keys can not be repeatedly stored   | Search keys can be repeatedly stored
Data is stored in leaf and internal nodes  | Data is only stored on leaf nodes
Searching slower                           | Searching faster
Internal node deletion is complicated      | Deletion is not a complex process
Leaf nodes can not be linked together      | Leaf nodes are linked together
B. Spatial Data Indexing
Introduction to Spatial Databases (1/4)
Spatial databases are optimized for storing and querying spatial data that
represents objects defined in a geometric space (k-dimensional).

Spatial data types:
● Point: has no extent, specified by its coordinates
● Line: a sequence of points
● Region: specified by its location and boundary
[Figure: a point, a line, and a region]
Introduction to Spatial Databases (2a/4)
Spatial queries:
● Range queries: Find all objects within a given spatial area
E.g. Find all bus stations within a 5-mile radius of a supermarket
E.g. Find all points in region R
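A minimal Python sketch of both kinds of range query, scanning every point without any index. The data set and the function names `in_rectangle` and `within_radius` are illustrative assumptions, not part of the slides.

```python
from math import dist

def in_rectangle(points, xmin, ymin, xmax, ymax):
    """Rectangular range query: points inside R = [xmin, xmax] x [ymin, ymax]."""
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

def within_radius(points, center, radius):
    """Circular range query: points at most `radius` away from `center`."""
    return [p for p in points if dist(p, center) <= radius]

# Hypothetical data set
points = [(1, 1), (2, 5), (4, 3), (6, 6), (9, 2)]
inside_r = in_rectangle(points, 0, 0, 5, 4)
nearby = within_radius(points, (4, 3), 3)
```

Both functions inspect every point, which is the cost a spatial index is meant to avoid.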
Introduction to Spatial Databases (2b/4)
Spatial queries:
● Nearest neighbor queries: Find the k closest objects to a given location
E.g. Find the police car that is closest to the location of a crime
E.g. Find the 4 nearest points to P
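As a baseline, a k-nearest-neighbor query can be answered by brute force. The sketch below assumes Euclidean distance; the point set and the name `k_nearest` are hypothetical.

```python
import heapq
from math import dist

def k_nearest(points, p, k):
    """Return the k points closest to p (Euclidean distance), nearest first."""
    return heapq.nsmallest(k, points, key=lambda q: dist(q, p))

# Hypothetical data set and query location P
points = [(1, 1), (2, 5), (4, 3), (6, 6), (9, 2)]
closest = k_nearest(points, (5, 4), 2)
```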
Introduction to Spatial Databases (2c/4)
Spatial queries:
● Spatial join queries: Find pairs of objects that satisfy a spatial predicate (intersection, overlap, containment, etc.)

[Figure: two examples with rectangle sets A and B]
Spatial join for intersection. Result: { (A1,B1), (A1,B2), (A2,B1) }
Spatial join for containment. Result: { (A2,B3), (A2,B4) }
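A spatial join can be sketched as a nested loop over two sets of rectangles with a pluggable predicate. The coordinates below are hypothetical, chosen only so that the intersection join gives the same pairs as the example result; rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples.

```python
def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(a, b):
    """True if rectangle a fully contains rectangle b."""
    return a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]

def spatial_join(left, right, predicate):
    """Nested-loop join: every (left, right) pair satisfying the predicate."""
    return [(ka, kb) for ka, ra in left.items()
                     for kb, rb in right.items() if predicate(ra, rb)]

# Hypothetical rectangles
A = {"A1": (0, 4, 4, 8), "A2": (3, 0, 8, 3)}
B = {"B1": (3, 2, 5, 5), "B2": (1, 7, 2, 9)}
pairs = spatial_join(A, B, intersects)
```

The nested loop compares every pair, so its cost grows with the product of the two set sizes.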
Introduction to Spatial Databases (3/4)
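Indexing problems: B+ trees are single-dimensional indexes, and spatial data
is multidimensional.
Example: With the query “age < 27 AND salary < 600”, a single-dimensional index
such as a B+ tree can narrow the search on only one of the two attributes; the
other condition must still be checked record by record.
⇒ Spatial index
[Figure: records plotted by age (x-axis, 25 to 27) and salary (y-axis, 500 to 800)]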
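The limitation can be illustrated with a sorted list standing in for a one-dimensional index on age. This is only a sketch: the records are made up, and a real B+ tree is replaced by `bisect` on a sorted list.

```python
from bisect import bisect_left

# Hypothetical (age, salary) records, kept sorted by age like a 1-D index
records = sorted([(25, 500), (25, 800), (26, 700), (26, 500), (27, 600), (27, 800)])
ages = [age for age, _ in records]

def query(max_age, max_salary):
    """Answer age < max_age AND salary < max_salary using only the age ordering."""
    end = bisect_left(ages, max_age)       # index lookup on age
    candidates = records[:end]             # everything with age < max_age
    hits = [r for r in candidates if r[1] < max_salary]  # salary checked one by one
    return candidates, hits

candidates, hits = query(27, 600)
```

Here the age index returns four candidates, but only two of them satisfy the salary condition.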
Introduction to Spatial Databases (4/4)
Spatial index: A multidimensional or spatial index, in contrast to a B+ tree,
organizes data entries with each key seen as a point or region.

● Indexing structures for point data include Grid files, hB trees, KD trees,
Point Quad trees, and SR trees.
● Indexing structures that handle regions as well as point data include Region
Quad trees, SKD trees, and R-Trees.
R-Tree
The R-tree is a height-balanced tree, which is an extension of the B+ tree for
k dimensions, where k > 1.

In 2D, spatial objects are approximated by their minimum bounding
rectangle (MBR), which is the smallest rectangle, with sides parallel to the
coordinate axes, that contains the object.

MBR = { (P1.x, P1.y), (P2.x, P2.y) }
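A short Python sketch of computing an MBR from an object's vertices. It assumes P1 is the lower-left corner and P2 the upper-right corner; the polygon is a made-up example.

```python
def mbr(points):
    """Smallest axis-parallel rectangle containing all points, as (P1, P2)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))

# Hypothetical polygon given by its vertices
polygon = [(2, 1), (5, 2), (4, 6), (1, 4)]
p1, p2 = mbr(polygon)
```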


R-Tree \ Structure (1/3)
● Clusters of data objects can be grouped into MBRs
● MBRs can be grouped recursively into larger MBRs
R-Tree \ Structure (2/3)
● Nested MBRs are organized as a tree
R-Tree \ Structure (3/3)
● Leaf entry = < k-dimensional box, pointer to underlying object >
○ The k-dimensional box is the tightest bounding box for the data object

● Non-leaf entry = < k-dimensional box, pointer to child node >
○ The box covers all boxes in the child node (in fact, in its whole subtree)
○ Notation: an entry e = < mbr, ptr >

● All leaves are at the same distance from the root

● Every node contains between m and M entries (except the root)
○ M is the maximum number of entries in a node
○ Typically, m is 50% of M
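One possible in-memory representation of these rules, as a sketch. The value `M = 4`, the flat `(xmin, ymin, xmax, ymax)` rectangle layout, and the helper `is_valid` are assumptions for illustration; the check covers entry counts and tight covering boxes but not the equal-leaf-depth rule.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax), 2D case

M = 4        # maximum entries per node (assumed value)
m = M // 2   # minimum entries per node, 50% of M

@dataclass
class Entry:
    mbr: Rect
    ptr: Any   # data object in a leaf entry, child Node in a non-leaf entry

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

def union(rects):
    """Tightest rectangle covering all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def is_valid(node, is_root=True):
    """Check entry counts and that each non-leaf box tightly covers its child."""
    if len(node.entries) > M or (not is_root and len(node.entries) < m):
        return False
    if node.is_leaf:
        return True
    return all(e.mbr == union([c.mbr for c in e.ptr.entries])
               and is_valid(e.ptr, is_root=False) for e in node.entries)

# Hypothetical two-level tree
leaf1 = Node(True, [Entry((0, 0, 1, 1), "A"), Entry((2, 2, 3, 3), "B")])
leaf2 = Node(True, [Entry((5, 5, 6, 6), "C"), Entry((7, 1, 8, 2), "D")])
root = Node(False, [Entry((0, 0, 3, 3), leaf1), Entry((5, 1, 8, 6), leaf2)])
```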
R-Tree \ Search (1/3)
Algorithm Search (Node N, Region Q)

if N is non-leaf
    foreach entry e in N    // e = <mbr, ptr>
        if e.mbr overlaps Q, search subtree identified by e.ptr
else    // N is leaf
    foreach entry e in N
        if e.mbr overlaps Q, add e.ptr to the answer list
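The Search algorithm translates almost line by line into Python. In this sketch a node is assumed to be a dict with a `leaf` flag and a list of `(mbr, ptr)` entries, and rectangles are `(xmin, ymin, xmax, ymax)` tuples; the tree itself is a made-up example.

```python
def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, q, answer=None):
    """Collect the pointers of all leaf entries whose MBR overlaps region q."""
    if answer is None:
        answer = []
    for mbr, ptr in node["entries"]:
        if overlaps(mbr, q):
            if node["leaf"]:
                answer.append(ptr)       # leaf: report the data object
            else:
                search(ptr, q, answer)   # non-leaf: descend into the subtree
    return answer

# Hypothetical two-level tree
leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "C"), ((7, 1, 8, 2), "D")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 1, 8, 6), leaf2)]}

result = search(root, (2.5, 2.5, 5.5, 5.5))
```

In this example the query region overlaps both root entries, so both subtrees are visited.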
R-Tree \ Search (2/3)
[Figure: example R-tree. Data objects A, B, C, D, E, F, G, H, M, N are grouped into MBRs G1 to G4, which are in turn grouped into G5 and G6 at the root.]
R-Tree \ Search (3/3)
Main points:

● Every parent node completely covers its children

● Nodes at the same level may overlap!

● A child MBR may be covered by more than one parent, but it is stored
under only one of them.

● A point query may follow multiple branches.


R-Tree \ Insertion (1/3)
Algorithm Insert: Insert a new entry E into an R-Tree (similar to insertion in a B-Tree)

I1. [Find position for new record] Invoke ChooseLeaf to select a leaf node L in which to place E.

I2. [Add record to leaf node] If L has room for another entry, install E.
Otherwise invoke SplitNode to obtain L and LL containing E and all the old entries
of L.

I3. [Propagate changes upward] Invoke AdjustTree on L, also passing LL if a split was
performed.

I4. [Grow tree taller] If node split propagation caused the root to split, create a new
root whose children are the two resulting nodes.
R-Tree \ Insertion (2/3)
Algorithm ChooseLeaf: Select a leaf node in which to place a new index entry E.

CL1 [Initialize] Set N to be the root node.

CL2 [Leaf check] If N is a leaf, return N.

CL3 [Choose subtree] If N is not a leaf, let F be the entry in N whose rectangle F.I needs
least enlargement to include E.I. Resolve ties by choosing the entry with the
rectangle of smallest area.

CL4 [Descend until a leaf is reached] Set N to be the child node pointed to by F.p and
repeat from CL2.
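A sketch of ChooseLeaf in Python, using the least-enlargement rule with area as the tie-breaker. Nodes are assumed to be dicts with a `leaf` flag and `(mbr, child)` entries; the helper names and the sample tree are hypothetical.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Tightest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(r, new):
    """Extra area r needs in order to include rectangle new."""
    return area(union(r, new)) - area(r)

def choose_leaf(root, rect):
    node = root                                   # CL1
    while not node["leaf"]:                       # CL2
        _, child = min(node["entries"],           # CL3: least enlargement, then area
                       key=lambda e: (enlargement(e[0], rect), area(e[0])))
        node = child                              # CL4
    return node

# Hypothetical two-level tree
leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "C"), ((7, 1, 8, 2), "D")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 1, 8, 6), leaf2)]}

target = choose_leaf(root, (4, 4, 4.5, 4.5))
```

For the new rectangle `(4, 4, 4.5, 4.5)`, the first root entry would grow by 11.25 and the second by 5, so the second subtree is chosen.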
R-Tree \ Insertion (3a/3)
Algorithm AdjustTree: Ascend from a leaf node L to the root, adjusting covering
rectangles and propagating node splits as necessary.

AT1 [Initialize] Set N=L. If L was split previously, set NN to be the resulting second node.

AT2 [Check if done] If N is the root, stop.

AT3 [Adjust covering rectangle in parent entry] Let P be the parent node of N,
and let E_N be N’s entry in P.
Adjust E_N.I so that it tightly encloses all entry rectangles in N.
R-Tree \ Insertion (3b/3)
Algorithm AdjustTree (continued)

AT4 [Propagate node split upward] If N has a partner NN resulting from an earlier split,
create a new entry E_NN with E_NN.p pointing to NN and E_NN.I enclosing all rectangles
in NN. Add E_NN to P if there is room. Otherwise, invoke SplitNode to produce P and
PP containing E_NN and all of P’s old entries.

AT5 [Move up to next level] Set N=P and set NN=PP if a split occurred.
Repeat from AT2.
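The sketch below puts steps I1 to I4 and AT2 to AT5 together in Python. It is a simplified illustration under loud assumptions: `M = 3`, entries are `[mbr, ptr]` lists, nodes keep a parent pointer, and `split_node` is a placeholder that sorts by rectangle and cuts in half rather than one of the split algorithms from the literature.

```python
M = 3  # assumed maximum number of entries per node

class Node:
    def __init__(self, leaf, entries=None, parent=None):
        self.leaf, self.entries, self.parent = leaf, entries or [], parent

def union(rects):
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def choose_leaf(node, rect):
    while not node.leaf:  # least enlargement, ties broken by smallest area
        node = min(node.entries, key=lambda e:
                   (area(union([e[0], rect])) - area(e[0]), area(e[0])))[1]
    return node

def split_node(node):
    """Placeholder split (NOT a published split algorithm): sort and halve."""
    node.entries.sort(key=lambda e: e[0])
    half = len(node.entries) // 2
    new = Node(node.leaf, node.entries[half:], node.parent)
    node.entries = node.entries[:half]
    if not new.leaf:
        for _, child in new.entries:
            child.parent = new
    return new

def adjust_tree(n, nn, root):
    while n is not root:                                   # AT2
        p = n.parent
        for e in p.entries:                                # AT3
            if e[1] is n:
                e[0] = union(c[0] for c in n.entries)
        pp = None
        if nn is not None:                                 # AT4
            p.entries.append([union(c[0] for c in nn.entries), nn])
            nn.parent = p
            if len(p.entries) > M:
                pp = split_node(p)
        n, nn = p, pp                                      # AT5
    return nn   # not None only if the root itself was split

def insert(root, rect, obj):
    leaf = choose_leaf(root, rect)                         # I1
    leaf.entries.append([rect, obj])                       # I2
    ll = split_node(leaf) if len(leaf.entries) > M else None
    rr = adjust_tree(leaf, ll, root)                       # I3
    if rr is None:
        return root
    new_root = Node(False)                                 # I4
    for child in (root, rr):
        child.parent = new_root
        new_root.entries.append([union(c[0] for c in child.entries), child])
    return new_root

# Usage: insert ten hypothetical unit squares along the diagonal
root = Node(True)
for i in range(10):
    root = insert(root, (i, i, i + 1, i + 1), f"obj{i}")
```

`insert` returns the (possibly new) root, so the caller always keeps the returned value.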
R-Tree \ Deletion (1/3)
Algorithm Delete: Remove index record E from an R-tree

D1 [Find node containing record] Invoke FindLeaf to locate the leaf L containing E.
Stop if the record was not found.

D2 [Delete record] Remove E from L.

D3 [Propagate changes] Invoke CondenseTree, passing L.

D4 [Shorten tree] If the root node has only one child after the tree has been adjusted,
make the child the new root.
R-Tree \ Deletion (2/3)
Algorithm FindLeaf: Given an R-tree whose root node is T, find the leaf node containing
the index entry E.

FL1. [Search subtrees] If T is not a leaf, check each entry F in T to determine if F.I overlaps
E.I.
For each such entry invoke FindLeaf on the tree whose root is pointed to by F.p
until E is found or all entries have been checked.

FL2. [Search leaf node for record] If T is a leaf, check each entry to see if it matches E.
If E is found return T.
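FindLeaf can be sketched as a recursive search that stops at the first match. Nodes are assumed to be dicts with a `leaf` flag and `(mbr, ptr)` entries, and an index entry is identified here by its rectangle plus object; these representation choices are assumptions.

```python
def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def find_leaf(node, rect, obj):
    """Return the leaf holding the entry (rect, obj), or None if it is absent."""
    if node["leaf"]:                                   # FL2
        for mbr, ptr in node["entries"]:
            if mbr == rect and ptr == obj:
                return node
        return None
    for mbr, child in node["entries"]:                 # FL1
        if overlaps(mbr, rect):
            found = find_leaf(child, rect, obj)
            if found is not None:
                return found
    return None

# Hypothetical two-level tree
leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")]}
leaf2 = {"leaf": True, "entries": [((5, 5, 6, 6), "C"), ((7, 1, 8, 2), "D")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((5, 1, 8, 6), leaf2)]}

holder = find_leaf(root, (7, 1, 8, 2), "D")
```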
R-Tree \ Deletion (3a/3)
Algorithm CondenseTree: Given a leaf node L from which an entry has been deleted,
eliminate the node if it has too few entries and relocate its entries. Propagate node
elimination upward as necessary. Adjust all covering rectangles on the path to the root,
making them smaller if possible.

CT1 [Initialize] Set N=L. Set Q, the set of eliminated nodes, to be empty.

CT2 [Find parent entry] If N is the root, go to CT6.
Otherwise let P be the parent of N, and let E_N be N’s entry in P.

CT3 [Eliminate under-full node] If N has fewer than m entries,
delete E_N from P and add N to set Q.
R-Tree \ Deletion (3b/3)
Algorithm CondenseTree (continued)

CT4 [Adjust covering rectangle] If N has not been eliminated, adjust E_N.I to tightly contain
all entries in N.

CT5 [Move up one level in tree] Set N=P and repeat from CT2.

CT6 [Re-insert orphaned entries] Re-insert all entries of nodes in set Q.
Entries from eliminated leaf nodes are re-inserted in tree leaves as described in
Algorithm Insert, but entries from higher-level nodes must be placed higher in the
tree, so that leaves of their dependent subtrees will be on the same level as leaves
of the main tree.
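A partial Python sketch of CondenseTree covering steps CT1 to CT5 only. It returns the set Q and leaves the re-insertion of CT6 to the caller. The value `m = 2`, the `Node` class with parent pointers, and the `[mbr, ptr]` entry layout are assumptions.

```python
m = 2  # assumed minimum number of entries per node

class Node:
    def __init__(self, leaf, entries=None, parent=None):
        self.leaf, self.entries, self.parent = leaf, entries or [], parent

def union(rects):
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def condense_tree(leaf):
    """Run CT1-CT5 and return Q, the eliminated nodes awaiting re-insertion."""
    n, q = leaf, []                                        # CT1
    while n.parent is not None:                            # CT2: stop at the root
        p = n.parent
        idx = next(i for i, e in enumerate(p.entries) if e[1] is n)
        if len(n.entries) < m:                             # CT3: under-full node
            del p.entries[idx]
            q.append(n)
        else:                                              # CT4: tighten the box
            p.entries[idx][0] = union(e[0] for e in n.entries)
        n = p                                              # CT5
    return q

# Hypothetical two-level tree
l1 = Node(True, [[(0, 0, 1, 1), "A"], [(2, 2, 3, 3), "B"]])
l2 = Node(True, [[(5, 5, 6, 6), "C"], [(7, 1, 8, 2), "D"], [(6, 3, 7, 4), "E"]])
root = Node(False, [[(0, 0, 3, 3), l1], [(5, 1, 8, 6), l2]])
l1.parent = l2.parent = root

del l2.entries[1]                # delete D; l2 still holds m entries
orphans = condense_tree(l2)      # nothing eliminated, parent box shrinks
```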
R-Tree \ Node Splitting
Problem: In order to add a new entry to
a full node containing M entries, it is
necessary to divide the collection of
M+1 entries between two nodes.

Solutions:

- Exhaustive Algorithm
- A Quadratic-cost Algorithm
- A Linear-Cost Algorithm

Figure 3.1 illustrates that the area of the covering
rectangles in the “bad split” case is much larger
than in the “good split” case.
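A sketch of the exhaustive approach: try every way of dividing the entries into two groups and keep the division with the smallest total covering area. The minimum group size `m`, the function names, and the four sample rectangles are assumptions; the number of candidate divisions grows exponentially with M, which is why cheaper heuristics are used in practice.

```python
from itertools import combinations

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def exhaustive_split(rects, m):
    """Return (total_area, group1, group2) minimizing the two covering areas."""
    n, best = len(rects), None
    for k in range(m, n - m + 1):              # each group keeps at least m entries
        for group in combinations(range(n), k):
            g1 = [rects[i] for i in group]
            g2 = [rects[i] for i in range(n) if i not in group]
            cost = area(union(g1)) + area(union(g2))
            if best is None or cost < best[0]:
                best = (cost, g1, g2)
    return best

# Hypothetical overflowing node: two rectangles on the left, two on the right
a, b, c, d = (0, 0, 1, 1), (0, 4, 1, 5), (9, 0, 10, 1), (9, 4, 10, 5)
cost, g1, g2 = exhaustive_split([a, b, c, d], m=2)
bad_cost = area(union([a, c])) + area(union([b, d]))
```

Here the good split groups the rectangles by side for a total area of 10, while pairing them across the gap costs 20.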
Variations of R-Tree
Timos K. Sellis et al. 1987. The R+-Tree: A Dynamic Index for Multi-Dimensional
Objects. In Proceedings of the 13th International Conference on Very Large Data Bases
(VLDB '87), San Francisco, CA, USA, 507-518.

Norbert Beckmann et al. 1990. The R*-tree: an efficient and robust access method
for points and rectangles. In Proceedings of the 1990 ACM SIGMOD international
conference on Management of data (SIGMOD '90). ACM, New York, NY, USA, 322-331.
R+-Tree \ Introduction (1/2)
Considering the performance of R-tree searching, the concepts of coverage and overlap
are important.

- Coverage of a level of an R-tree is defined as the total area of all the rectangles
associated with the nodes of that level.

- Overlap of a level of an R-tree is defined as the total area contained within two or
more nodes.
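Both measures can be computed for the rectangles of one level. The sketch below reads coverage as the sum of the rectangle areas, which is one possible interpretation of the definition, and computes overlap exactly by cutting the plane into grid cells along all rectangle edges. Function names and sample rectangles are hypothetical.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def coverage(rects):
    """Total area of all rectangles at a level (sum-of-areas reading)."""
    return sum(area(r) for r in rects)

def overlap(rects):
    """Total area contained within two or more rectangles."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    total = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            # Each grid cell lies entirely inside or outside every rectangle
            covering = sum(r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3]
                           for r in rects)
            if covering >= 2:
                total += (x1 - x0) * (y1 - y0)
    return total

# Hypothetical node rectangles at one level
level = [(0, 0, 4, 4), (2, 2, 6, 6), (3, 3, 5, 5)]
cov, ovl = coverage(level), overlap(level)
```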
R+-Tree \ Introduction (2/2)
Efficient R-tree searching demands
that both overlap and coverage be minimized.

- Minimal coverage reduces the amount of
dead space (i.e. empty space) covered by
the nodes.

- Minimal overlap is even more critical than
minimal coverage, because a query that falls
in an overlapping area must follow every
overlapping branch.
R-Tree vs R+-Tree
R*-Tree \ Introduction
Tbc.
R-Tree vs R*-Tree
Tbc.
References
[1] R. Elmasri, S.B. Navathe: “Fundamentals of Database Systems”, 7th Edition,
Pearson Addison-Wesley, 2016

[2] R. Ramakrishnan: “Database management systems”, 3rd Edition, McGraw-Hill, 2003

[3] Y. Manolopoulos, A. Nanopoulos: “R-Trees: Theory and Application”, Springer, 2005


[4] Antonin Guttman. 1984. R-trees: a dynamic index structure for spatial searching.
In Proceedings of the 1984 ACM SIGMOD international conference on Management of
data (SIGMOD '84). ACM, New York, NY, USA, 47-57.

[5] Timos K. Sellis et al. 1987. The R+-Tree: A Dynamic Index for Multi-Dimensional
Objects. In Proceedings of the 13th International Conference on Very Large Data Bases
(VLDB '87), San Francisco, CA, USA, 507-518.

[6] Norbert Beckmann et al. 1990. The R*-tree: an efficient and robust access method
for points and rectangles. In Proceedings of the 1990 ACM SIGMOD international
conference on Management of data (SIGMOD '90). ACM, New York, NY, USA, 322-331.
Q&A
