KEMBAR78
Rtree | PDF | Areas Of Computer Science | Applied Mathematics
0% found this document useful (0 votes)
87 views33 pages

Rtree

The document discusses the R-tree, which is a multi-dimensional index structure that extends the B-tree. The R-tree supports efficient orthogonal range queries by storing minimum bounding rectangles (MBRs) that encompass groups of data points. It discusses how the R-tree is structured, how queries are performed by recursively checking if node MBRs intersect the search region, and how data is inserted by choosing subtrees to minimize the perimeter of bounding rectangles.

Uploaded by

milagros
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
87 views33 pages

Rtree

The document discusses the R-tree, which is a multi-dimensional index structure that extends the B-tree. The R-tree supports efficient orthogonal range queries by storing minimum bounding rectangles (MBRs) that encompass groups of data points. It discusses how the R-tree is structured, how queries are performed by recursively checking if node MBRs intersect the search region, and how data is inserted by choosing subtrees to minimize the perimeter of bounding rectangles.

Uploaded by

milagros
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

The R-Tree

Yufei Tao

ITEE
University of Queensland

INFS4205/7205, Uni of Queensland The R-Tree


We will study a new structure called the R-tree, which can be thought of
as a multi-dimensional extension of the B-tree. The R-tree supports
efficiently a variety of queries (as we will find out later in the course), and
is implemented in numerous database systems. Our discussion in this
lecture will focus on orthogonal range reporting.

INFS4205/7205, Uni of Queensland The R-Tree


2D Orthogonal Range Reporting (Window Query)

Let S be a set of points in R2 . Given an axis-parallel rectangle q, a range


query returns all the points of S that are covered by q, namely, S ∩ q.
The definition can be extended to any dimensionality in a straightforward
manner.
Example
a
b
c

d
e
f
g The result is {d, e, g } for the
i h shaded rectangle q.
j
k
l

INFS4205/7205, Uni of Queensland The R-Tree


Applications

Find all restaurants in the Manhattan area.


Find all professors whose ages are in [20, 40] and their annual
salaries are in [200k, 300k].
...

INFS4205/7205, Uni of Queensland The R-Tree


R-Tree

Each leaf node has between 0.4B and B data points, where B ≥ 3
is a parameter. The only exception applies when the leaf is the root,
in which case it is allowed to have between 1 and B points. All the
leaf nodes are at the same level.
Each internal node has between 0.4B and B child nodes, except
when the node is the root, in which case it needs to have at least 2
child nodes.

In practice, for a disk-resident R-tree, the value of B depends on the


block size of the disk so that each node is stored in a block.

INFS4205/7205, Uni of Queensland The R-Tree


R-Tree

For any node u, denote by Su the set of points in the subtree of u.


Consider now u to be an internal node with child nodes v1 , ..., vf
(f ≤ B). For each vi (i ≤ f ), u stores the minimum bounding rectangle
(MBR) of Svi , denoted as MBR(vi ).

The above is an MBR on 7 points.

INFS4205/7205, Uni of Queensland The R-Tree


Example

Assume B = 3.
e2
a
b c

d e6
e
e7 u1
e5 g f e2 e3
h
e4 u2 u3
i j e4 e5 e6 e7 e8
e8
k
l e3
i l a d g b c e h f k j
u4 u5 u6 u7 u8

INFS4205/7205, Uni of Queensland The R-Tree


Answering a Range Query

Let q be the search region of a range query. Below we give the


pseudo-code of the query algorithm, which is invoked as
range-query(root, q), where root is the root of the tree.

Algorithm range-query(u, r )
1. if u is a leaf then
2. report all points stored at u that are covered by r
3. else
4. for each child v of u do
5. if MBR(v ) intersects r then
6. range-query(v , r )

INFS4205/7205, Uni of Queensland The R-Tree


Example

Nodes u1 , u2 , u3 , u5 , u6 are accessed to answer the query with the shaded


search region.
e2
a
b c

d e6
e
e7 u1
e5 g f e2 e3
h
e4 u2 u3
i j e4 e5 e6 e7 e8
e8
k
l e3
i l a d g b c e h f k j
u4 u5 u6 u7 u8

INFS4205/7205, Uni of Queensland The R-Tree


R-Tree Construction Can Be “Arbitrary”

Have you wondered why the leaf nodes are created in this way? For
example, is it absolutely necessary to group i and l into a leaf node?

a
b c

d
e
g f
h

i j
k
l

The R-tree definition has no formal constraint whatsoever on the


grouping of data into nodes (unlike B-trees), but some R-trees have
poorer performance than others; see the next slide.

INFS4205/7205, Uni of Queensland The R-Tree


R-Tree Construction Can Be “Arbitrary”

Is this a good R-tree?


e5
a e6
b c

d e8
e
e7 u1
g f e2 e3
h
u2 u3
i j e4 e5 e6 e7 e8
k
l
e4 l e a i g b c k h f d j
u4 u5 u6 u7 u8

Implication?

INFS4205/7205, Uni of Queensland The R-Tree


R-Tree Construction: A Common Principle

In general, the construction algorithm of the R-tree aims at minimizing


the perimeter sum of all the MBRs.
For example, the left tree has a smaller perimeter sum than the right one.

a e5
a e6
b c c
b
d d e8
e e
g e7
f g f
h h
i j j
i
k k
l l
e4

INFS4205/7205, Uni of Queensland The R-Tree


R-Tree Construction: A Common Principle

Why not minimize the area?


A rectangle with a smaller perimeter usually has a smaller area, but not
the vice versa. Later in the course, we will see an analysis that formally
validates this intuition.

The above two rectangles have the same area.

INFS4205/7205, Uni of Queensland The R-Tree


Insertion

Let p be the point being inserted. The pseudo-code below should is


invoked as insert(root, p), where root is the root of the tree.

Algorithm insert(u, p)
1. if u is a leaf node then
2. add p to u
3. if u overflows then
/* namely, u has B + 1 points */
4. handle-overflow(u)
5. else
6. v ← choose-subtree(u, p)
/* which subtree under u should we insert p into? */
7. insert(v , p)

INFS4205/7205, Uni of Queensland The R-Tree


Choose-Subtree

Which MBR would you insert p into?

Algorithm choose-subtree(u, p)
1. return the child whose MBR requires the
minimum increase in perimeter to cover p.
break ties by favoring the smallest MBR.

INFS4205/7205, Uni of Queensland The R-Tree


Overflow Handling

Algorithm handle-overflow(u)
1. split(u) into u and u 0
2. if u is the root then
3. create a new root with u and u 0 as its child nodes
4. else
5. w ← the parent of u
6. update MBR(u) in w
7. add u 0 as a child of w
8. if w overflows then
9. handle-overflow(w )

INFS4205/7205, Uni of Queensland The R-Tree


Splitting a Leaf

Essentially we are dealing with the following problem:

Let S be a set of B + 1 points. Divide S into two disjoint sets S1


and S2 to minimize the perimeter sum of MBR(S1 ) and MBR(S2 ),
subject to the condition that |S1 | ≥ 0.4B and |S2 | ≥ 0.4B.

Example

The left split is better:


h h
a k a k

d f d f
b b
g g
j j
e e
c i c i

S1 = {a, b, c, d, e} S1 = {a, d, e, g , j}
S2 = {f , g , h, i, j, k} S2 = {b, c, f , h, i, k}

INFS4205/7205, Uni of Queensland The R-Tree


Splitting a Leaf Node

Let m = |S|. In 2D space, the leaf-split problem can be solved in O(m5 )


time, noticing that each MBR is determined by 4 points.
This, however, is too expensive. In practice, heuristics are used to
accelerate the process, but there is no guarantee that we can find the
best split — typical “trading quality for efficiency”.
The next slide explains how.

INFS4205/7205, Uni of Queensland The R-Tree


Splitting a Leaf Node

Algorithm split(u)
1. m = the number of points in u
2. sort the points of u on x-dimension
3. for i = d0.4Be to m − d0.4Be
4. S1 ← the set of the first i points in the list
5. S2 ← the set of the other i points in the list
6. calculate the perimeter sum of MBR(S1 ) and MBR(S2 ); record it
if this is the best split so far
7. Repeat Lines 2-6 with respect to y-dimension
8. return the best split found

INFS4205/7205, Uni of Queensland The R-Tree


Example

h h h
a a a

d f d f d f
b b b
g g g
j j j
e e e
c i c i c i

There are 3 possible splits along the x-dimension. Remember that each
node must have at least 0.4B = 4 points (here B = 10).

INFS4205/7205, Uni of Queensland The R-Tree


Think:
How to implement the algorithm in O(n log n) time?
Find a counter-example where the algorithm does not give
an optimal split.
We have discussed only the 2D case. How to extend the
algorithm to dimensionality d ≥ 3?

INFS4205/7205, Uni of Queensland The R-Tree


Splitting an Internal Node

Let S be a set of B +1 rectangles. Divide S into two disjoint sets S1


and S2 to minimize the perimeter sum of MBR(S1 ) and MBR(S2 ),
subject to the condition that |S1 | ≥ 0.4B and |S2 | ≥ 0.4B.

Once again, we will settle for an algorithm that is fast but does not
always return an optimal split.

INFS4205/7205, Uni of Queensland The R-Tree


Splitting an Internal Node

Algorithm split(u)
/* u is an internal node */
1. m = the number of points in u
2. sort the rectangles in u by their left boundaries on the x-dimension
3. for i = d0.4Be to m − d0.4Be
4. S1 ← the set of the first i rectangles in the list
5. S2 ← the set of the other i rectangles in the list
6. calculate the perimeter sum of MBR(S1 ) and MBR(S2 ); record it
if this is the best split so far
7. Repeat Lines 2-6 with respect to the right boundaries on the x-dimension
8. Repeat Lines 2-7 w.r.t. the y-dimension
9. return the best split found

INFS4205/7205, Uni of Queensland The R-Tree


Example

h h h
a d f a d f a d f

j j j
b e b e b e
g g g
c i c i c i

There are 3 possible splits w.r.t. the left boundaries on the x-dimension.
Remember that each node must have at least 0.4B = 4 points (here
B = 10).

INFS4205/7205, Uni of Queensland The R-Tree


Insertion Example

Assume that we want to insert the white point m. By applying


choose-subtree twice, we reach the leaf node u6 that should
accommodate m. The node overflows after incorporating m (recall
B = 3).

e2
a
b c
m u1
d e6
e e2 e3
e5 g e7
f u2
h u3
e4 e4 e5 e6 e7 e8
i j
e8
k
l e3 g
i l a d b c e m h f k j
u4 u5 u6 u7 u8

INFS4205/7205, Uni of Queensland The R-Tree


Insertion Example

Node u6 splits, generating u9 . Adding u9 as a child of u3 causes u3 to


overflow.

e2
a
b c
e6 m u1
e9
d
e e2 e3
e5 g e7
f u2
h u3
e4 e4 e5 e6 e9 e7 e8
i j
e8
k
l e3 a g c e m h f j
i l d b k
u4 u5 u6 u9 u7 u8

INFS4205/7205, Uni of Queensland The R-Tree


Insertion Example

Node u3 splits, generating u10 . The insertion finishes after adding u10 as
a child of the root.

e2
a e10
b c
e6 m
e9
d u1
e
e5 e7 e2 e10 e3
g f
h u2 u3
e4
e4 e5 u10 e e e7 e8
i j 6 9
e8
k
l e3 g
i l a d b c e m h f k j
u4 u5 u6 u9 u7 u8

INFS4205/7205, Uni of Queensland The R-Tree


Although not required in this course, there are fast algorithms to
perform deletions from an R-tree. The instructor will discuss this
in the class if time permits.

INFS4205/7205, Uni of Queensland The R-Tree


2D Orthogonal Range Reporting (Window Query) on Rectangles

Let S be a set of axis-parallel rectangles in R2 . Given an axis-parallel


rectangle r , a range query returns all the rectangles of S that are covered
by r , namely, S ∩ r .
The definition can be extended to any dimensionality in a straightforward
manner.
Example
a
b
c

d e
h
f
g The result is {a, d, g , e, h} for the
i shaded rectangle q.
j
k
l

INFS4205/7205, Uni of Queensland The R-Tree


An R-Tree on Rectangles

Same as an R-tree on points with two changes:

Leaf nodes store data rectangles (as opposed to points).


Splitting a leaf node is performed using the internal-split algorithm
mentioned earlier.

INFS4205/7205, Uni of Queensland The R-Tree


An R-Tree on Line Segments—Motivation

Consider a geographic information system (GIS) that manages road


segments. The above figure shows an example where the blue segments
represent a road, whereas the green ones represent another. Suppose that
we want to retrieve all the road segments in the shaded window (think of
the window as the user’s browser, for instance). How to adapt the R-tree
to support this type of retrieval?

INFS4205/7205, Uni of Queensland The R-Tree


Filter Refinement

A common solution implemented in commercial systems is based on the


filter refinement framework. In the filter step, we quickly retrieve a
candidate set of segments that is guaranteed to contain the final result.
The refinement step then examines every segment in the candidate set to
remove the “false hits”.
A predominant approach do realize the filter step is to create an MBR on
every segment, and use an R-tree to find the MBRs that intersect the
query. The segments corresponding to those MBRs constitute the
candidate set.

INFS4205/7205, Uni of Queensland The R-Tree


Filter Refinement

The dashed rectangles are the MBRs on the segments. What are the
segments retrieved by the filter step? Are there any false hits?

INFS4205/7205, Uni of Queensland The R-Tree

You might also like