Chapter – 6
Query Languages
Relational algebra, Relational Calculus and SQL:
1. Relational algebra:
The query language ‘Relational Algebra’ defines a set of operations on
relations.
Relational Algebra
RELATIONAL ALGEBRA is a widely used procedural query language. It collects
instances of relations as input and gives occurrences of relations as output. It uses
various operations to perform this action. SQL Relational algebra query operations
are performed recursively on a relation. The output of these operations is a new
relation, which might be formed from one or more input relations.
Unary Relational Operations
SELECT (symbol: σ)
PROJECT (symbol: π)
RENAME (symbol: ρ)
Relational Algebra Operations From Set Theory
UNION (υ)
INTERSECTION ( ),
DIFFERENCE (-)
CARTESIAN PRODUCT ( x )
Binary Relational Operations
JOIN
DIVISION
Let’s study them in detail with solutions:
SELECT (σ)
The SELECT operation is used for selecting a subset of the tuples according to a
given selection condition. Sigma(σ)Symbol denotes it. It is used as an expression to
choose tuples which meet the selection condition. Select operator selects tuples
that satisfy a given predicate.
σp(r)
σ is the predicate
r stands for relation which is the name of the table
p is prepositional logic
Example 1
σ topic = "Database" (Tutorials)
Output – Selects tuples from Tutorials where topic = ‘Database’.
Example 2
σ topic = "Database" and author = "guru99"( Tutorials)
Output – Selects tuples from Tutorials where the topic is ‘Database’ and ‘author’ is
guru99.
Example 3
σ sales > 50000 (Customers)
Output – Selects tuples from Customers where sales is greater than 50000
Projection(π)
The projection eliminates all attributes of the input relation but those mentioned in
the projection list. The projection method defines a relation that contains a vertical
subset of Relation.
This helps to extract the values of specified attributes to eliminates duplicate
values. (pi) symbol is used to choose attributes from a relation. This operator helps
you to keep specific columns from a relation and discards the other columns.
Example of Projection:
Consider the following table
CustomerID CustomerName Status
1 Google Active
2 Amazon Active
3 Apple Inactive
4 Alibaba Active
Here, the projection of CustomerName and status will give
Π CustomerName, Status (Customers)
CustomerName Status
Google Active
Amazon Active
Apple Inactive
Alibaba Active
Rename (ρ)
Rename is a unary operation used for renaming attributes of a relation.
ρ (a/b)R will rename the attribute ‘b’ of relation by ‘a’.
Union operation (υ)
UNION is symbolized by ∪ symbol. It includes all tuples that are in tables A or in B.
It also eliminates duplicate tuples. So, set A UNION set B would be expressed as:
The result <- A ∪ B
For a union operation to be valid, the following conditions must hold –
R and S must be the same number of attributes.
Attribute domains need to be compatible.
Duplicate tuples should be automatically removed.
Example
Consider the following tables.
Table A Table B
column 1 column 2 column 1 column 2
1 1 1 1
1 2 1 3
A ∪ B gives
Table A ∪ B
column 1 column 2
1 1
1 2
1 3
Set Difference (-)
– Symbol denotes it. The result of A – B, is a relation which includes all tuples that
are in A but not in B.
The attribute name of A has to match with the attribute name in B.
The two-operand relations A and B should be either compatible or Union
compatible.
It should be defined relation consisting of the tuples that are in relation A,
but not in B.
Example
A-B
Table A – B
column 1 column 2
1 2
Intersection
An intersection is defined by the symbol ∩
A∩B
Defines a relation consisting of a set of all tuple that are in both A and B. However,
A and B must be union-compatible.
Visual
Definition of Intersection
Example:
A∩B
Table A ∩ B
column 1 column 2
1 1
Join Operations
Join operation is essentially a cartesian product followed by a selection criterion.
Join operation denoted by ⋈.
JOIN operation also allows joining variously related tuples from different relations.
Types of JOIN:
Various forms of join operation are:
Inner Joins:
Outer join:
Left Outer Join
Right Outer Join
Full Outer Join
Inner Join:
In an inner join, only those tuples that satisfy the matching criteria are included,
while the rest are excluded. Let’s study various types of Inner Joins:
INNER JOIN (⋈)
Natural join can only be performed if there is a common attribute (column) between
the relations. The name and type of the attribute must be same.
Example
Consider the following two tables
C
Num Square
2 4
3 9
D
Num Cube
2 8
3 27
C⋈D
C⋈D
Num Square Cube
2 4 8
3 9 27
OUTER JOIN
In an outer join, along with tuples that satisfy the matching criteria, we also
include some or all tuples that do not match the criteria.
Left Outer Join(A B)
In the left outer join, operation allows keeping all tuple in the left relation. However,
if there is no matching tuple is found in right relation, then the attributes of right
relation in the join result are filled with null values.
Consider the following 2 Tables
A
Num Square
2 4
3 9
4 16
B
Num Cube
2 8
3 18
5 75
A B
A⋈B
Num Square Cube
2 4 8
3 9 18
4 16 –
Right Outer Join: ( A B)
In the right outer join, operation allows keeping all tuple in the right relation.
However, if there is no matching tuple is found in the left relation, then the
attributes of the left relation in the join result are filled with null values.
A B
A⋈B
Num Cube Square
2 8 4
3 18 9
5 75 –
Full Outer Join: ( A B)
In a full outer join, all tuples from both relations are included in the result,
irrespective of the matching condition.
A B
A⋈B
Num Cube Square
2 4 8
3 9 18
4 16 –
5 – 75
2. Relational Calculus:
It is a non-procedural query language.
It tells what to do but never explains how to do.
It is based on predicate calculus.
It was introduced by computer scientist and mathematician
Dr.Edger Codd.
It is a declarative language that uses predicate logic or first
order logic to determine the results from the database.
Types of Relational Calculus:
It is of two types
a. Tuple Relational calculus.
b. Domain relational calculus.
Tuple relational calculus:
It uses a tuple variable (t) that goes to each row of the table and
checks if the predicate is true/false for the given row.
Depending on the given predicate condition, it returns the row/ part
of the row.
Syntax: { t|p(t)}
t- tuple variable
p(t) – predicate logic/condition.
Eg: Customer
Custid Custname Pincode
1 ABC 121
2 DEF 152
3 GHI 121
4 XYZ 134
Fetch all data of customer whose pincode is 121.
Ordinary query: select * from customer where pincode=121;
Workflow of tuple relational calculus:
Tuple variable will flow to each and every row in the table and check
the pincode is 121 or not. It will display customers where pincode is 121.
{ t|customer(t) ∧ t.pincode=121}.
Suppose if i need only custname then it is represented as
{ t.custname|customer(t) ∧ t.pincode=121}.
Domain relational calculus:
It uses the domain variable to get the column values required from the
database based on the predicate expression/condition.
Syntax:
{<a1,a2,a3….an> | p(<a1,a2,a3…..,an>)}
Where <a1,a2,a3….an> are attribute and
P(<a1,a2,a3,…an>) is predicate on attribute.
If I want to display custid and custname from the above example. It is
represented as
{<custid,custname>|customer ∧ pincode=121}
3. SQL
SQL stands for Structured / Standard Query Language.
It is used for storing, manipulating and retrieving data
stored in a relational database.
It allows the user to access data in the relational database
management system.
It allows the user to define data in a database and
manipulate that data.
It allows the user to create and drop database and tables.
It allows user to create views, tables and stored procedure
etc.
It allows users to set permissions on tables, procedures,
views etc.
History of SQL:
1970 – Dr.Edger Codd of IBM who is father of relational databases
developed the concept of relational database.
1974 – Structured Query Language SQL appeared.
1978 – IBM worked to develop codd ideas and released a product
named system/R.
1986 – IBM developed the first prototype of relational database and
standard by ANSI. The first relational database was released by
relational software which later came to be known as oracle.
1987 – SQL became part of International Standard Organization (ISO).
How Query works / processing:
SQL Server takes four steps to process a query: parsing, translating,
optimizing, and execution. They are shown in Figure 1. The first four steps
are all performed by the Relational Engine. The output of the third step is
the optimized plan that is scheduled, and during which calls are made to
the Storage Engine to retrieve the data that becomes the results of the query
you are executing.
Parsing
During the parsing stage SQL Server performs basic checks on the source
code (your T-SQL batch). This parsing looks for invalid SQL syntax, such as
incorrect use of reserved words, column and table names, and so on.
If parsing completes without errors, it generates a parse tree, which is
passed onto the next stage of query processing, binding. The parse tree is an
internal representation of the query. If parsing detects any errors, the
process stops and the errors are returned.
Syntax for Parse tree:
Query
Select (Select list) from [from list] where [condition]
Eg:
Select studid, studname from student1 where age>20;
Query
Select [select list] from [from list] where [condition]
Attribute1 Attribute2 Relation_name attribute > value
Studid studname student1 age 20
Translation:
It is used to translate the given SQL query to low level language query
using relational algebra operations.
Basic operation:
σ- Selection which is used to select a tuple.
Π- Projection which is used to select an attributes.
Eg:
Select empname from employee where salary>10000;
Π empname (σ salary>10000(employee))
Optimization:
It is used to optimize the query find low cost query. It maintains
statistical data as size of the table, query size, record size, running time etc
with help of data dictionary.
SQL is a very high level language:
o The users specify what to search for- not how the search is
actually done
o The algorithms are chosen automatically by the DBMS.
For a given SQL query there may be many possible execution plans.
Amongst all equivalent plans choose the one with lowest cost.
Cost is estimated using statistical information from the database
catalog.
Execution Plan:
By this execution plan multiple low level language query is generated.
Evaluation:
It is used to determine low cost query.