RELATIONAL ALGEBRA
Unary and Binary relational operations
SELECT and PROJECT
The SELECT Operation
The SELECT operation is used to choose a subset of the tuples from a relation that satisfies a selection
condition.
It restricts the tuples in a relation to only those tuples that satisfy the condition.
It can also be visualized as a horizontal partition of the relation into two sets of tuples—those tuples that
satisfy the condition and are selected, and those tuples that do not satisfy the condition and are discarded.
For example, to select the EMPLOYEE tuples whose department is 4, or those whose salary is greater
than $30,000
σDno=4(EMPLOYEE)
σSalary>30000(EMPLOYEE)
In general, the SELECT operation is denoted by
σ<selection condition>(R)
where the symbol σ (sigma) is used to denote the SELECT operator and the selection condition is a
Boolean expression (condition) specified on the attributes of relation R.
The Boolean expression specified in is made up of a number of clauses of the form :
<attribute name><comparison op><constant value>
Or
<attribute name><comparison op><attribute name>
Clauses can be connected by the standard Boolean operators and, or, and not to form a general selection
condition.
For example, to select the tuples for all employees who either work in department 4 and make over
$25,000 per year, or work in department 5 and make over $30,000:
σ(Dno=4 AND Salary>25000) OR (Dno=5
AND Salary>30000)(EMPLOYEE)
1
The Boolean conditions AND, OR, and NOT have their normal interpretation, as follows:
■ (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise,it is FALSE.
■ (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE; otherwise, it is FALSE.
■ (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
The SELECT operator is unary; that is, it is applied to a single relation. Hence, selection conditions
cannot involve more than one tuple.
The degree of the relation resulting from a SELECT operation—its number of attributes—is the same
as the degree of R.
The SELECT operation is commutative; that is,
σ (cond1)(σ(cond2)(R)) = σ(cond2)(σ(cond1)(R))
The PROJECT Operation
The PROJECT operation, selects certain columns from the table and discards the other columns.
The result of the PROJECT operation can be visualized as a vertical partition of the relation into two
relations: one has the needed columns (attributes) and contains the result of the operation, and the other
contains the discarded columns.
For example, to list each employee’s first and last name and salary, we can use the PROJECT operation as
follows:
§
πLname, Fname, Salary(EMPLOYEE)
The general form of the PROJECT operation is
π<attribute list>(R)
where (pi) is the symbol used to represent the PROJECT operation, and is the desired sublist of attributes
from the attributes of relation R.
The result of the PROJECT operation has only the attributes specified in in the same order as they appear
in the list. Hence, its degree is equal to the number of attributes in <attribute list>.
The PROJECT operation removes any duplicate tuples, so the result of the PROJECT operation is a set
of distinct tuples, and hence a valid relation. This is known as duplicate elimination.
Sequences of Operations and the RENAME Operation
The relations shown above depict operation results do not have any names.
Either we can write the operations as a single relational algebra expression by nesting the operations, or
we can apply one operation at a time and create intermediate result relations.
In the latter case, we must give names to the relations that hold the intermediate results.
For example, to retrieve the first name, last name, and salary of all employees who work in department
number 5, apply a SELECT and a PROJECT operation.
πFname, Lname, Salary(σDno=5(EMPLOYEE))
Alternatively, we can explicitly show the sequence of operations, giving a name to each intermediate
relation, and using the assignment operation, denoted by ← (left arrow), as follows:
DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT ← πFname, Lname, Salary(DEP5_EMPS)
2
It is sometimes simpler to break down a complex sequence of operations by specifying intermediate
result relations than to write a single relational algebra expression.
We can also use this technique to rename the attributes in the intermediate and result relations.
To rename the attributes in a relation, we simply list the new attribute names in parentheses, as in the
following example:
TEMP ← σDno=5(EMPLOYEE)
R(First_name, Last_name, Salary) ← πFname, Lname, Salary(TEMP)
The formal RENAME operation—which can rename either the relation name or the attribute names, or
both—as a unary operator.
The general RENAME operation when applied to a relation R of degree n is denoted by any of the
following three forms:
ρS(B1, B2, ... , Bn)(R) or ρS(R) or ρ(B1, B2, ... , Bn)(R)
where the symbol ρ (rho) is used to denote the RENAME operator, S is the new relation name, and B1,
B2, … , Bn are the new attribute names.
The first expression renames both the relation and its attributes, the second renames the relation only, and
the third renames the attributes only.
Relational Algebra Operations from Set Theory
The UNION, INTERSECTION, and MINUS Operations
■ UNION: The result of this operation, denoted by R 𝖴 S, is a relation that includes all tuples that are either
in R or in S or in both R and S. Duplicate tuples are eliminated.
■ INTERSECTION: The result of this operation, denoted by R ∩ S, is a relation that includes all tuples
that are in both R and S.
■ SET DIFFERENCE (or MINUS): The result of this operation, denoted by R – S, is a relation that
includes all tuples that are in R but not in S.
These are binary operations; that is, each is applied to two sets (of tuples).
When these operations are adapted to relational databases, the two relations on which any of these three
operations are applied must have the same type of tuples; this condition has been called union
compatibility or type compatibility.
Two relations R(A1, A2, … , An) and S(B1, B2, … , Bn) are said to be union compatible (or type
compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i ≤ n. This means that the
two relations have the same number of attributes and each corresponding pair of attributes has the
same domain.
For example, to retrieve the Social Security numbers of all employees who either work in department 5
or directly supervise an employee who works in department 5,
DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT1 ← πSsn(DEP5_EMPS)
RESULT2(Ssn) ← πSuper_ssn(DEP5_EMPS)
RESULT ← RESULT1 𝖴 RESULT2
Both UNION and INTERSECTION are commutative operations; that is,
R 𝖴 S = S 𝖴 R and R ∩ S = S ∩ R
3
Both UNION and INTERSECTION can be treated as n-ary operations applicable to any number of
relations because both are also associative operations; that is,
R 𝖴 (S 𝖴 T ) = (R 𝖴 S) 𝖴 T and (R ∩ S) ∩ T = R ∩ (S ∩ T)
The MINUS operation is not commutative; that is, in general, R − S ≠ S − R
The INTERSECTION can be expressed in terms of union and set difference as follows:
R ∩ S = ((R 𝖴 S) − (R − S)) − (S − R)
The CARTESIAN PRODUCT (CROSS PRODUCT) Operation
CARTESIAN PRODUCT operation—also known as CROSS PRODUCT or CROSS JOIN— which is
denoted by ×.
This is also a binary set operation, but the relations on which it is applied do not have to be union
compatible.
This set operation produces a new element by combining every member (tuple) from one relation (set)
with every member (tuple) from the other relation (set).
In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q with degree n + m
attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
The resulting relation Q has one tuple for each combination of tuples—one from R and one from S.
Hence, if R has m tuples and S has n tuples, then R × S will have m*n tuples.
Example, suppose that we want to retrieve a list of names of each female employee’s dependents. We
can do this as follows:
FEMALE_EMPS ← σSex=‘F’(EMPLOYEE)
EMPNAMES ← πFname, Lname, Ssn(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σSsn=Essn(EMP_DEPENDENTS)
RESULT ← πFname, Lname, Dependent_name(ACTUAL_DEPENDENTS)
Binary Relational Operations: JOIN and DIVISION
The JOIN Operation
The JOIN operation, denoted by , is used to combine related tuples from two relations into single
“longer” tuples.
This operation is very important for any relational database with more than a single relation because it
allows us to process relationships among relations.
To illustrate JOIN, suppose that we want to retrieve the name of the manager of each department, as
follows:
DEPT_MGR ← DEPARTMENT Mgr_ssn=Ssn
EMPLOYEE
RESULT ← πDname, Lname, Fname(DEPT_MGR)
The general form of a JOIN operation on two relations R(A1, A2, … , An) and S(B1, B2, … , Bm) is
R <join condition>S
The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, … , An, B1, B2, … , Bm) in that
order; Q has one tuple for each combination of tuples—one from R and one from S—whenever the
combination satisfies the join condition.
4
The main difference between CARTESIAN PRODUCT and JOIN are ,In JOIN, only combinations of
tuples satisfying the join condition appear in the result, whereas in the CARTESIAN PRODUCT all
combinations of tuples are included in the result.
The join condition is specified on attributes from the two relations R and S and is evaluated for each
combination of tuples. Each tuple combination for which the join condition evaluates to TRUE is
included in the resulting relation Q as a single combined tuple.
A general join condition is of the form
<condition>AND<condition> AND … AND<condition>
where each <condition> is of the form Ai θ Bj , Ai is an attribute of R, Bj is an attribute of S, Ai
and Bj have the same domain, and θ (theta) is one of the comparison operators {=, <,>,< , ≥, ≠}.
A JOIN operation with such a general join condition is called a THETA JOIN.
Variations of JOIN:
The EQUIJOIN and NATURAL JOIN
The most common use of JOIN involves join conditions with equality comparisons only. Such a JOIN,
where the only comparison operator used is =, is called an EQUIJOIN.
Notice that in the result of an EQUIJOIN we always have one or more pairs of attributes that haveidentical
values in every tuple.
For example, the values of the attributes Mgr_ssn and Ssn are identical in every tuple of DEPT_MGR
(the EQUIJOIN result) because the equality join condition specified on these two attributes requires the
values to be identical in every tuple in the result.
Because one of each pair of attributes with identical values is superfluous, a new operation called
NATURAL JOIN—denoted by * was created to get rid of the second (superfluous) attribute in an
EQUIJOIN condition.
The standard definition of NATURAL JOIN requires that the two join attributes (or each pair of join
attributes) have the same name in both relations. If this is not the case, a renaming operation is applied
first.
Suppose we want to combine each PROJECT tuple with the DEPARTMENT tuple that controls the
project. In the following example, first we rename the Dnumber attribute of DEPARTMENT to Dnum—
so that it has the same name as the Dnum attribute in PROJECT—and then we apply NATURAL JOIN:
PROJ_DEPT ← PROJECT * ρ(Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT)
The same query can be done in two steps by creating an intermediate table DEPT as follows:
DEPT ← ρ(Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT)
PROJ_DEPT ← PROJECT * DEPT
The attribute Dnum is called the join attribute for the NATURAL JOIN operation, because it is the only
attribute with the same name in both relations.
In general, the join condition for NATURAL JOIN is constructed by equating each pair of join attributes
that have the same name in the two relations and combining these conditions with AND.
A single JOIN operation is used to combine data from two relations so that related information can be
presented in a single table. These operations are also known as inner joins.
A more general, but nonstandard definition for NATURAL JOIN is Q ← R *(list1),(list2)S
In this case,<list1> specifies a list of i attributes from R, and <list2>specifies a list of i attributes from S.
The NATURAL JOIN or EQUIJOIN operation can also be specified among multiple tables, leading to
an n-way join. For example, consider the following three-way join:
((PROJECT Dnum=DnumberDEPARTMENT) Mgr_ssn=SsnEMPLOYEE)
5
A Complete Set of Relational Algebra Operations
It has been shown that the set of relational algebra operations {σ, π, 𝖴, ρ, –, ×} is a complete set; that is,
any of the other original relational algebra operations can be expressed as a sequence of operations from
this set.
For example, the INTERSECTION operation can be expressed by using UNION and MINUS as follows:
R ∩ S ≡ (R 𝖴 S) – ((R – S) 𝖴(S – R))
JOIN operation can be specified as a CARTESIAN PRODUCT followed by a SELECT operation:
R <condition>S ≡ σ <condition>(R × S)
A NATURAL JOIN can be specified as a CARTESIAN PRODUCT preceded by RENAME and
followed by SELECT and PROJECT operations.
The DIVISION Operation
The DIVISION operation, denoted by ÷, is useful for a special kind of query that sometimesoccurs in
database applications.
example is Retrieve the names of employees who work on all the projects that ‘John Smith’ works on.
To express this query using the DIVISION operation, proceed as follows. First, retrieve the list of project
numbers that ‘John Smith’ works on in the intermediate relation SMITH_PNOS:
SMITH ← σFname=‘John’ AND
Lname=‘Smith’(EMPLOYEE) SMITH_PNOS ←
πPno(WORKS_ON Essn=SsnSMITH)
Next, create a relation that includes a tuple whenever the employee whose Ssn is Essn works on the
project whose number is Pno in the intermediate relation SSN_PNOS:
SSN_PNOS ← πEssn, Pno(WORKS_ON)
Finally, apply the DIVISION operation to the two relations, which gives the desired employees’ Social
Security numbers:
SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS
RESULT ← πFname, Lname(SSNS * EMPLOYEE)
In general, the DIVISION operation is applied to two relations R(Z) ÷ S(X), where the attributes of R are a
subset of the attributes of S; that is, X ⊆ Z. Let Y be the set of attributes of R that are not attributes of
S.
The DIVISION operation is defined for convenience for dealing with queries that involve universal
quantification or the all condition.
6
Additional Relational Operations (aggregate, grouping, etc.)
Generalized Projection
The generalized projection operation extends the projection operation by allowing functions of attributes
to be included in the projection list.
The generalized form can be expressed
as: πF1, F2, ..., Fn (R)
where F1, F2, … , Fn are functions over the attributes in relation R and may involve arithmetic
operations and constant values.
This operation is helpful when developing reports where computed values have to be produced in the
columns of a query result.
Example: EMPLOYEE (Ssn, Salary, Deduction,
Years_service) A report may be required to show
Net Salary = Salary – Deduction, Bonus = 2000 * Years_service, and Tax = 0.25 * Salary.
Then a generalized projection combined with renaming may be used as follows:
REPORT ← ρ(Ssn, Net_salary, Bonus, Tax)(πSsn, Salary – Deduction, 2000 * Years_service, 0.25 *
7
Salary(EMPLOYEE))
Aggregate Functions and Grouping
Mathematical aggregate functions on collections of values from the database. Examples of such
functions include retrieving the average or total salary of all employees or the total number of employee
tuples.
Common functions applied to collections of numeric values include SUM, AVERAGE, MAXIMUM,
and MINIMUM.
The COUNT function is used for counting tuples or values.
AGGREGATE FUNCTION operation defined, using the symbol ℑ , to specify these types of requests as
follows:
<grouping attribute>ℑ<function list> (R)
Where<grouping attribute> is a list of attributes of the relation specified in R, and <function list>is a list
of (<function><attribute> ) pairs.
In each such pair, is one of the allowed functions—such as SUM, AVERAGE, MAXIMUM,
MINIMUM, COUNT—and is an attribute of the relation specified by R.
The resulting relation has the grouping attributes plus one attribute for each element in the function list.
For example, to retrieve each department number, the number of employees in the department, and their
average salary, while renaming the resulting attributes as indicated below, we write:
ρR(Dno, No_of_employees, Average_sal) (Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE))
If we do not want to rename the attributes then the above query we can write it
as, Dno ℑ COUNT Ssn, AVERAGE Salary(EMPLOYEE)
Note: If no grouping attributes are specified, the functions are applied to all the tuples in the relation, so
the resulting relation has a single tuple only.
Recursive Closure Operations
Recursive closure operation in relational algebra is applied to a recursive relationship between tuples of
the same type, such as the relationship between an employee and a supervisor.
This relationship is described by the foreign key Super_ssn of the EMPLOYEE relation and it relates
each employee tuple (in the role of supervisee) to another employee tuple (in the role of supervisor).
An example of a recursive operation is to retrieve all supervisees of an employee e at all levels—that is,
all employees e’ directly supervised by e, all employees e’’ directly supervised by each employee e’, all
employees e’’’ directly supervised by each employee e’’, and so on.
For example, to specify the Ssns of all employees e_ directly supervised—at level one—by the employee
e whose name is ‘James Borg’ we can apply the following operation:
BORG_SSN ← πSsn(σFname=‘James’ AND
Lname=‘Borg’(EMPLOYEE)) SUPERVISION(Ssn1, Ssn2) ←
πSsn,Super_ssn(EMPLOYEE) RESULT1(Ssn) ← πSsn1(SUPERVISION
Ssn2=SsnBORG_SSN)
To retrieve all employees supervised by Borg at level 2—that is, all employees e’’ supervised by some
employee e’ who is directly supervised by Borg—we can apply another JOIN to the result of the first
query, as follows:
RESULT2(Ssn) ← πSsn1(SUPERVISION Ssn2=SsnRESULT1)
To get both sets of employees supervised at levels 1 and 2 by ‘James Borg’, we can apply the UNION
operation to the two results, as follows:
8
RESULT ← RESULT2 U RESULT1
OUTER JOIN Operations
For a NATURAL JOIN operation R * S, only tuples from R that have matching tuples in S—andvice
versa—appear in the result. Hence, tuples without a matching (or related) tuple are eliminated from the
JOIN result.
Tuples with NULL values in the join attributes are also eliminated. This type of join, where tuples with
no match are eliminated, is known as an inner join.
This amounts to the loss of information if the user wants the result of the JOIN to include all thetuples in
one or more of the component relations.
A set of operations, called outer joins, were developed for the case where the user wants to keep all the
tuples in R, or all those in S, or all those in both relations in the result of the JOIN, regardless of whether
or not they have matching tuples in the other relation.
This satisfies the need of queries in which tuples from two tables are to be combined by matching
corresponding rows, but without losing any tuples for lack of matching values.
For example, suppose that we want a list of all employee names as well as the name of the departments
they manage if they happen to manage a department; if they do not manage one, we can indicate it with
a
NULL value. We can apply an operation LEFT OUTER JOIN, denoted by , to retrieve the result
as follows:
TEMP ← (EMPLOYEE Ssn=Mgr_ssnDEPARTMENT)
RESULT ← πFname, Minit, Lname, Dname(TEMP)
The LEFT OUTER JOIN operation keeps every tuple in the first, or left, relation R in R S; if no
matching tuple is found in S, then the attributes of S in the join result are filled with NULL values.
A similar operation, RIGHT OUTER JOIN, denoted by , keeps every tuple in the second, or right,
relation S in the result of R S.
A third operation, FULL OUTER JOIN, denoted by , keeps all tuples in both the left and the right
relations when no matching tuples are found, filling them with NULL values as needed.
The OUTER UNION Operation
The OUTER UNION operation was developed to take the union of tuples from two relations that have
some common attributes, but are not union (type) compatible.
This operation will take the UNION of tuples in two relations R(X, Y) and S(X, Z) that are partially
compatible, meaning that only some of their attributes, say X, are union compatible.
The attributes that are union compatible are represented only once in the result, and those attributes that
are not union compatible from either relation are also kept in the result relation T(X,Y, Z). It is therefore
the same as a FULL OUTER JOIN on the common attributes.
Two tuples t1 in R and t2 in S are said to match if t1[X]=t2[X]. These will be combined (unioned) into a
single tuple in t. Tuples in either relation that have no matching tuple in the other relation are padded
with NULL values.
For example, an OUTER UNION can be applied to two relations whose schemas are STUDENT(Name,
Ssn, Department, Advisor) and INSTRUCTOR(Name, Ssn, Department, Rank).
Tuples from the two relations are matched based on having the same combination of values of the shared
9
attributes—Name, Ssn, Department.
The resulting relation, STUDENT_OR_INSTRUCTOR, will have the following attributes:
STUDENT_OR_INSTRUCTOR(Name, Ssn, Department, Advisor, Rank)
All the tuples from both relations are included in the result, but tuples with the same (Name, Ssn,
Department) combination will appear only once in the result.
Tuples appearing only in STUDENT will have a NULL for the Rank attribute, whereas tuples appearing
only in INSTRUCTOR will have a NULL for the Advisor attribute. A tuple that exists in both relations,
which represent a student who is also an instructor, will have values for all its attributes.
Examples of Queries in Relational Algebra
10
11
Mapping Conceptual Design into a Logical Design
Relational Database Design using ER-to-Relational mapping.
Step 1: For each regular (strong) entity type E in the ER schema, create a relation R that includes all the
simple attributes of E.
Step 2: For each weak entity type W in the ER schema with owner entity type E, create a relation R, and include
all simple attributes (or simple components of composite attributes) of W as attributes. In addition, include as
foreign key attributes of R the primary key attribute(s) of the relation(s) that correspond to the owner entity
type(s).
12
Step 3: For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that
correspond to the entity types participating in R. Choose one of the relations, say S, and include the primary key
of T as a foreign key in S. Include all the simple attributes of R as attributes of S.
Step 4: For each regular binary 1:N relationship type R identify the relation (N) relation S. the primary key of
T as a foreign key of S. Simple attributes of R map to attributes of S.
13
Step 5: For each binary M:N relationship type R, create a relation S. Include the primary keys of participant
relations as foreign keys in S. Their combination will be the primary key for S. Simple attributes of R become
attributes of S.
14
Step 6: For each multi-valued attribute A, create a new relation R. This relation will include an attribute
corresponding to A, plus the primary key K of the parent relation (entity type or relationship type) as a foreign
key in R. The primary key of R is the combination of A and K.
Step 7: For each n-ary relationship type R, where n>2, create a new relation S to represent R. Include the
primary keys of the relations participating in R as foreign keys in S. Simple attributes of R map to attributes of S.
The primary key of S is a combination of all the foreign keys that reference the participants that have cardinality
constraint > 1. For a recursive relationship, we will need a new relation.
15