11 Iterators Relalg
11 Iterators Relalg
Alvin Cheung
Fall 2024
• Design question: how to package these algos such that they can be
reused by others?
• Two challenges
• Different relational operations want to use the same algo
• Tuples are not all read in at the same time
• sort(data) doesn’t work!
• init(predicate):
child.init()
pred = predicate; // local variable storing state
current = NULL; // local cursor
• next():
while (current != EOF && !pred(current))
current = child.next(); // give us another tuple
} // exit if pred is satisfied or EOF
return current; // return current tuple or EOF
• close():
child.close()
Example: Heap Scan
• Leaf of the query plan
• init(relation):
heap = open heap file for this relation; // file handle
cur_page = heap.first_page(); // first page
cur_slot = cur_page.first_slot(); // first slot on that page
• next():
if (cur_page == NULL) return EOF;
current = [cur_page, cur_slot]; // we will return this recordId
cur_slot = cur_slot.advance(); // advance the slot for subseq. calls
if (cur_slot == NULL) { // advance to next page, first slot
cur_page = cur_page.advance();
if (cur_page != NULL)
cur_slot = cur_page.first_slot();
}
return current;
• close():
heap.close() // close file
Example: Group By on Sorted input
agg_type state init merge(x) final
COUNT count 0 count ++ count
GroupBy
SUM sum 0 sum += x sum
AVG ? ? ? ?
Sorted on
group col
Sort
MIN ? ? ? ?
• close():
child.close()
Neat: only maintains one tuple of partial results in memory at any time!
A Full (Single Table) Query Plan
• A Query Plan is Single-threaded!
GroupBy
• Exercise: trace the calls that lead to flow of tuples:
• Call init() on the root GroupBy
• How does init() recurse down the chain and return? Sort
• Call next() on root
• How does next() recurse down the chain and return a tuple?
Select
• Note how the blocking operator (sort) interacts with the other,
streaming operators
Heap
• Select and GroupBy are essentially streaming operators Scan
• We don’t store each operator output on disk; tuples stream through
the plan’s call stack
• Some operators like Sort use disk internally – but not exposed outside
the operator
Summary
• We just finished our tour of the upper layer of the DBMS
• We have now seen how SQL queries can be represented using relational
algebra trees
Alvin Cheung
Fall 2024
Database
An Overview of Query Execution
SQL Query Relational Algebra
SELECT S.name
FROM Reserves R, Sailors S Query Parser !S.name("bid=100⋀rating>5(
WHERE R.sid = S.sid
AND R.bid = 100
& Optimizer
Reserves ⋈R.sid=S.sid Sailors))
AND S.rating > 5
…
a le n t to
E q u iv On-the-fly
Optimized (Physical) Query Plan:
(Logical) Query Plan:
Project Iterator
!S.name !S.name
On-the-fly
But actually will "S.rating>5 Select Iterator
"R.bid=100 ⋀ S.rating > 5
⋈R.sid=S.s
produce…
Indexed Nested
Loop Join Iterator
⋈R.sid=S.s Operator Code
id
Heap Scan
id B+-Tree
"R.bid=100 Sailors Iterator
Reserves Sailors Indexed Scan
Iterator Reserves
SQL vs Relational Algebra
SQL Query Relational Algebra
SELECT S.name
FROM Reserves R, Sailors S Query Parser !S.name("bid=100⋀rating>5(
WHERE R.sid = S.sid
AND R.bid = 100
& Optimizer
Reserves ⋈R.sid=S.sid Sailors))
AND S.rating > 5
…
a le n t to
E q u iv On-the-fly
Optimized (Physical) Query Plan:
(Logical) Query Plan:
Project Iterator
!S.name
SQL
!S.name Relational Algebra On-the-fly
But actually will "S.rating>5
Operational description of Select Iterator
A"declarative expression
R.bid=100 ⋀ S.rating >
⋈
produce…
5 of the query result a computation.
R.sid=S.s Indexed Nested
Loop Join Iterator
⋈R.sid=S.s Operator Code
id
Heap Scan
id "R.bid=100 Sailors
B+-Tree Systems execute relational algebra
Iterator
Reserves Sailors Indexed Scan query plan.
Iterator Reserves
Relational Algebra Preliminaries
• Algebra of operators on relation instances
• Just like other algebras: linear algebra or elementary algebra
• Operating on matrices or variables
• !S.name("R.bid=100 ⋀ S.rating>5(R ⋈R.sid=S.sid S))
• Closed: result is also a relation instance
• Enables rich composition!
• Just like a linear algebraic expression on matrices returns a matrix
πsname,age(S2)
Relational Instance S2
List of Attributes
sid sname rating age
sname age
28 yuppy 9 35.0
yuppy 35.0
31 lubber 8 55.5
lubber 55.5
44 guppy 5 35.0
guppy 35.0
58 rusty 10 35.0
rusty 35.0
Projection (π), cont.
• Set semantics results in fewer rows
• Real systems don’t automatically remove duplicates
• Why? (Semantics and Performance reasons)
πage(S2)
Relational Instance S2 Multiset
age
sid sname rating age Set
35.0 age
28 yuppy 9 35.0
55.5 35.0
31 lubber 8 55.5
35.0 55.5
44 guppy 5 35.0
35.0
58 rusty 10 35.0
Selection (!)
• Selects a subset of rows (horizontal slice)
• Corresponds to the WHERE clause in SQL
• Output schema same as input
• Duplicate Elimination? Not needed if input is a set.
!rating>8(S2)
Relational Instance S2
Selection Condition (Boolean Expression)
sid sname rating age
28 yuppy 9 35.0 sid sname rating age
31 lubber 8 55.5 28 yuppy 9 35.0
44 guppy 5 35.0 58 rusty 10 35.0
58 rusty 10 35.0
Composing Select and Project
S1 ∪ S2
S1 S2 S1 ∪ S2
Union (∪) VS Union ALL
•In union under set semantics, duplicate elimination is needed
•SQL Expression: UNION (get rid of duplicates) vs. UNION ALL (keep dup.)
S1 ∪ S2
Relational Instance S1 Relational Instance S2
sid sname rating age
sid sname rating age sid sname rating age
22 dustin 7 45
22 dustin 7 45.0 28 yuppy 9 35.0
28 yuppy 9 35.0
31 lubber 8 55.5 31 lubber 8 55.5
31 lubber 8 55.5
58 rusty 10 35.0 44 guppy 5 35.0
44 guppy 5 35.0
58 rusty 10 35.0
58 rusty 10 35.0
Set Difference ( − )
• Same as with union, both input relations must be compatible.
S1 − S2
S1 S2 S1 − S2
Set Difference ( − ), cont.
• Q: Do we need to eliminate duplicates like in UNION?
• Not required if inputs are sets
• SQL Expression: EXCEPT vs EXCEPT ALL
• Same as UNION/UNION ALL
• In EXCEPT duplicates are eliminated if they exist S1 − S2
sid sname rating age
Relational Instance S1 Relational Instance S2
22 dustin 7 45
sid sname rating age sid sname rating age
22 dustin 7 45.0 28 yuppy 9 35.0
31 lubber 8 55.5 31 lubber 8 55.5
S2 − S1
58 rusty 10 35.0 44 guppy 5 35.0 sid sname rating age
R:
!R(sid->sid2, bid->bid2) (R1) sid2 bid2 day
22 101 10/10/96
58 103 11/12/96
Renaming ( ! = “rho” ) contd.
• Yet another shorthand for renaming
!Temp1(R1.sid -> sid1, S1.sid -> sid2) (R1 × S1)
Output List of cols Input
Name to rename Relation
• Again, can omit output name if we don’t want to rename the output
• For this case, we can equivalently rename each relation and then do cross-product
R1 × S1 Temp1
sid bid day sid sname rating age sid1 bid day sid2 sname rating age
S1 ∩ S2
S1 S2 S1 ∩ S2
Intersection (∩)
• Same story as — with respect to duplicates
• Duplicates don’t need to be eliminated if inputs are sets
S1 ∩ S2
Relational Instance S1 Relational Instance S2
sid sname rating age sid sname rating age
22 dustin 7 45.0 28 yuppy 9 35.0 sid sname rating age
31 lubber 8 55.5 31 lubber 8 55.5 31 lubber 8 55.5
58 rusty 10 35.0 44 guppy 5 35.0 58 rusty 10 35.0
58 rusty 10 35.0
Intersection (∩), Pt 2
S1 ∩ S2
• We saw that ∩ is a compound
operator.
• S1 ∩ S2 = ?
Intersection (∩), Pt 3
S1 ∩ S2
• S1 ∩ S2 = S1 – ?
• Q: What is “?”
= S1 – ?
Intersection (∩), Pt 4
S1 ∩ S2
• S1 ∩ S2 = S1 – (S1 – S2)
= S1 – ( S1
? – S2 )
Compound Operator: Join
• Joins are compound operators (like intersection):
• Generally, !"( R × S)
• With possibly a rename in there (for natural join)
• R ⋈" S = #"(R × S)
• Apply a cross-product, then filter out tuples that
don’t match.
• If " only contains equality conditions (with an AND
between them), this is called an equi-join
Theta Join (⋈") Example
• We want to find boats that people have reserved
• R1 ⋈sid=sid S1
• Confusing… hard to interpret!
S1:
R1:
sid sname rating age sid bid day sid sname rating age
sid bid day
22 101 10/10/96 ⋈sid=sid 22 dustin 7 45.0 = 22 101 10/10/96 22 dustin 7 45.0
31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0
58 103 11/12/96
58 rusty 10 35.0
• Q: How do we fix?
Theta Join (⋈") Example
• #(sid->sid1)(R1) ⋈sid1=sid S1
S1:
R1:
sid1 bid day sid sname rating age sid1 bid day sid sname rating age
22 101 10/10/96 ⋈sid1=sid 22 dustin 7 45.0 = 22 101 10/10/96 22 dustin 7 45.0
58 103 11/12/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0
58 rusty 10 35.0
Natural Join (⋈)
• Special case of equi-join in which equalities are specified for all
matching fields and duplicate fields are projected away
• Compute R × S
• Select rows where fields appearing in both relations have equal values
• Project onto the set of all unique fields.
Other Join Variants
• We have convenient symbols for Outer joins:
• Left Outer join
• R⟕S
• Right Outer join
• R⟖S
• Full Outer join
• R⟗S
Extended Relational Algebra
• Group By / Aggregation Operator (!):
• !age, AVG(rating)(Sailors)
• With selection (HAVING clause):
• !age, AVG(rating), COUNT(*)>2(Sailors)