DATABASE SYSTEMS
Relational Model Terminology
2
A relation is a table with columns and rows.
Only applies to logical structure of the database,
not the physical structure
Attribute is a named column of a relation.
Domain is the set of allowable values for one or
more attributes.
Relational Model Terminology
3
Tuple is a row of a relation.
Degree is the number of attributes in a relation.
Cardinality is the number of tuples in a relation.
Relational Database is a collection of normalized
relations with distinct relation names.
Instances of Branch and Staff
Relations
4
Examples of Attribute Domains
5
Alternative Terminology for Relational Model
6
Properties of Relation
Relation name is distinct from all other relation names in
relational schema.
Each cell of relation contains exactly one atomic (single) value.
Each attribute has a distinct name.
Values of an attribute are all from the same domain.
Properties of Relation
Each tuple is distinct; there are no duplicate tuples.
Order of attributes has no significance.
Order of tuples has no significance, theoretically
Relational Models Concepts
Relational data models
Introduced by Ted Codd at IBM in 1970
Provides underlying mathematical foundations
simplicity
Relation?
a table of values where
◼ Tuple = row
◼ Attribute = Column
Domains
A domain D refers to a set of atomic values
to specify Domain:
◼ Data-type
◼ Format
◼ Name
data type describing the types of values in each column (attribute)
◼ Examples:
SSN: the set of valid 9-digits
Academic_dept: The set of all character strings representing valid
academic department names (e.g., CS, EE)
Employee_ages: [15..80]
USA_Phone_Number: (ddd)ddd-dddd
Where d DIGITS
Relation schema R
◼ represented by R (A1,A2,…,An)
dom(A1) denotes domain of A1
◼ i.e., all possible values related to attribute A1
degree of relation?
◼ Number of attributes in R
◼ Student (Name, SSN, Home_phone, address, Age, GPA)
◼ the degree of student = 6
◼ Dom(Name)=Names
◼ Dom(SSN) =Social_security_Number
◼ Dom(Home_phone) = Local_Phone_Number
Formal Definition of Relation
Formal definition of r(R)
◼ r(R)(dom(A1)dom(A2)...dom(An))
◼ i.e. r is a subset of all possible n-tuple
◼ E.g.,
◼ Suppose A = {1,2} with cardinality |A|=2 and B = {3,4},
|B|=2
◼ A × B = {1,2} × {3,4} = {(1,3), (1,4), (2,3), (2,4)}
◼ B × A = {3,4} × {1,2} = {(3,1), (3,2), (4,1), (4,2)}
◼ A = B = {1,2} then A × B = B × A = {1,2} × {1,2} = {(1,1),
(1,2), (2,1), (2,2)}
Relational Algebra
16
Relational Algebra Introduction
Selection
Projection
Cartesian Product
Union and
Set Difference
Relational Algebra
17
Relational algebra (RA) is a query language for the
relational model with a solid theoretical foundation.
Relational algebra is not visible at the user interface level
(not in any commercial RDBMS, at least).
However, almost every RDBMS uses RA to represent queries
internally (for query optimization and execution).
Knowledge of relational algebra will help in understanding
SQL and relational database systems in general.
Relational Algebra Introduction
18
In mathematics, an algebra is a set (the carrier), and
operations that are closed with respect to the set.
Example: (N, {∗, +}) forms an algebra.
In case of RA, the carrier is the set of all finite relations.
We will get to know the operations of RA in the sequel
(one such operation is, for example, Selection).
Relational Algebra
19
Relational algebra operations work on one or more
relations to define another relation without changing
the original relations.
Both operands and results are relations, so output from
one operation can become input to another operation.
Allows expressions to be nested, just as in arithmetic.
This property is called closure.
Selection
20
Another operation of relational
algebra is selection.
In contrast to operations like +
in N, the selection σ is
parameterized by a simple
predicate.
For example, the operation
σSID=101 selects all tuples in the
input relation which have the
value 101 in column SID
21
Selection Operator
22
A simple selection predicate ∂ has the form
<Term> ComparisonOperator <Term>.
<Term> is an expression that can be evaluated to a
data value for a given tuple:
an attribute name,
a constant value,
an expression built from attributes, constants, and data
type operations like +,−, , /.
Selection Operator
23
<ComparisonOperator> is = (equals), != (not
equals), < (less than), > (greater than), other data
type-dependent predicates (e.g., LIKE).
Examples for simple selection predicates:
LAST= ’Smith’
POINTS > 8
POINTS = MAXPT.
24
Selection Operator
25
(R) corresponds to the following SQL query:
SELECT*
FROM R
WHERE ∂
A different relational algebra operation called
projection corresponds to the SELECT clause.
Complex Selection
26
More complex selection predicates may be
performed using the Boolean connectives:
∂1 ^ ∂ 2 (“and”), ∂ 1 v ∂ 2 (“or”), ¬ ∂ (“not”).
Note: ∂ 1(R)^ ∂ 2(R) = ∂ 1(∂2(R)).
27
Projection
28
Projection
29
Projection
30
Projection
31
Mapping or aliasing A to B
32
Relational Algebra in a DBMS
Optimized Query Executable
Relational execution code
SQL Relational
algebra plan
algebra
query expression
expression
Code
parser
generator
Query optimizer
DBMS
Relational Algebra Notation
Operation My HTML Symbol
Projection PROJECT ∏
Selection SELECT ∂
Renaming RENAME p
Union UNION
Intersection INTERSECTION
Assignment <-
Cartesian product X X
Join JOIN
Left outer join LEFT OUTER JOIN
Right outer join RIGHT OUTER JOIN
Full outer join FULL OUTER JOIN
Semijoin SEMIJOIN
Relational Algebra
35
Relational algebra (RA) is a query language for the
relational model with a solid theoretical foundation.
Relational algebra is not visible at the user interface level
(not in any commercial RDBMS, at least).
However, almost any RDBMS uses RA to represent queries
internally (for query optimization and execution).
Knowledge of relational algebra will help in understanding
SQL and relational database systems in general.
Objectives
Introduction to Relational Algebra
Selection Operator
Projection Operator
Notations
Cartesian Product
Joins
Set Operations
Cartesian product
37
In general, queries need to combine information
from several tables.
In RA, such queries are formulated using ×, the
Cartesian product
Cartesian Product
38
Since attribute names must be unique within a tuple,
the Cartesian product may only be applied if R, S
do not share any attribute names. (This is no real
restriction because we have projection)
Cartesian Product
The cartesian product of two tables combines each row in one table with
each row in the other table.
Example: The table E (for EMPLOYEE) and D (for DEPARTMENT)
What PK,FK here? Can you tell relation between these two tables.
Cartesian Product
Seldom useful in practice.
Usually an error.
Can give a huge result.
Joins.
A very common and useful operation.
Equivalent to a cartesian product followed by a
select.
Inside a relational DBMS, it is usually much more
efficient to calculate a join directly, instead of
calculating a cartesian product and then throwing
away most of the lines.
Joins
The cartesian product example above combined
each employee with each department.
If we only keep those lines where the dept attribute
for the employee is equal to the dnr (the
department number) of the department,
we get a nice list of the employees, and the
department that each employee works for:
Theta Join Φ
The intermediate result generated by a Cartesian product may be
quite large in general (|R| = n, |S| = m ) |R×S| = n m).
Since the combination of Cartesian product and selection in queries
is common, a special operator join has been introduced.
Example
Theta Join - Example
Students Courses
stud# name course course# name
100 Fred PH PH Pharmacy
200 Dave CM CM Computing
300 Bob CM
Students ⋈ stud# = 200 Courses
stud# Students.name course course# Courses.name
200 Dave CM PH Pharmacy
200 Dave CM CM Computing
Inner Join (Equijoin)
A Theta join where the <condition> is the match
(=) of the primary and foreign keys.
R⋈ <R.primary_key = S.foreign_key> S
Inner Join - Example
Students Courses
stud# name course course# name
100 Fred PH PH Pharmacy
200 Dave CM CM Computing
300 Bob CM
Students ⋈ course = course# Courses
stud# Students.name course course# Courses.name
100 Fred PH PH Pharmacy
200 Dave CM CM Computing
300 Bob CM CM Computing
Natural Join
Inner join produces redundant data (in the previous example:
course and course#). To get rid of this duplication:
< stud#, Students.name, course, Courses.name >
(Students ⋈ <course = course#> Courses)
Or
R1= Students ⋈ <course = course#> Courses
R2= < stud#, Students.name, course, Courses.name > R1
The result is called the natural join of Students and Courses
Rename Attributes
We want to join these tables, but:
Several columns in the result will have the same name (nr and name).
How do we express the join condition, when there are two columns called nr? Solutions:
Rename the attributes, using the rename operator.
Keep the names, and prefix them with the table name, as is done in SQL. (This is somewhat
unorthodox.)
Aggregate Functions
Outer Joins
Inner join + rows of one table which do not satisfy the
<condition>.
Left Outer Join: R <R.primary_key = S.foreign_key> S
All rows from R are retained and unmatched rows of S are
padded with NULL
Right Outer Join: R <R.primary_key = S.foreign_key> S
All rows from S are retained and unmatched rows of R are
padded with NULL
Left Outer Join - Example
Students Courses
stud# name course course# name
100 Fred PH PH Pharmacy
200 Dave CM CM Computing
400 Peter EN CH Chemistry
Students <course = course#> Courses
stud# Students.name course course# Courses.name
100 Fred PH PH Pharmacy
200 Dave CM CM Computing
400 Peter EN NULL NULL
Right Outer Join - Example
Students Courses
stud# name course course# name
100 Fred PH PH Pharmacy
200 Dave CM CM Computing
400 Peter EN CH Chemistry
Students <course = course#> Courses
stud# Students.name course course# Courses.name
100 Fred PH PH Pharmacy
200 Dave CM CM Computing
NULL NULL NULL CH Chemistry
Combination of Unary and Join Operations
Students Courses
stud# name address course course# name
100 Fred Aberdeen PH PH Pharmacy
200 Dave Dundee CM CM Computing
300 Bob Aberdeen CM
Show the names of students (from Aberdeen) and the names of their courses
Students.name Courses.name
R1= Students ⋈ <course=course#> Courses Fred Pharmacy
R2= <address=“Aberdeen”> R1 Bob Computing
R3= <Students.name, Course.name> R2
Set Operations
Union
Intersection
Difference
Union
Assuming R & S are union compatible…
Union: R ⋃ S is the set of tuples in either R or S or
both.
Union
Takes the set of rows in each table and combines them,
eliminating duplicates
Participating relations must be Union compatible, i.e. have the
same number of columns, and the same column names, domains
and data types
R S RS
A B A B A B
a1 b1 a2 b2 a1 b1
a2 b2 a3 b3 a2 b2
a3 b3
Union
Since it is a set, there are no duplicate tuples
Union is Commutative
RUS=SUR
Intersection ()
Assuming that R & S are union compatible
Intersection: R ⋂ S is the set of tuples in both
R and S
Intersection is Commutative
R⋂S=S⋂R
Intersection ()
Takes the set of rows that are common to each relation
Participating relations must be compatible
R S RS
A B A B A B
a1 b1 a2 b2 a2 b2
a2 b2 a3 b3
Difference (-)
Takes the set of rows in the first relation but not the second
Participating relations must be compatible
R S R-S
A B A B A B
a1 b1 a2 b2 a1 b1
a2 b2 a3 b3