KEMBAR78
GATE CS/IT Database Prep Guide | PDF | Acid | Databases
0% found this document useful (0 votes)
420 views574 pages

GATE CS/IT Database Prep Guide

Uploaded by

shubhraj3263
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
420 views574 pages

GATE CS/IT Database Prep Guide

Uploaded by

shubhraj3263
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 574

http://www.knowledgegate.

in/gate
Database
• Core subjects for CS/IT Students

• In GATE 7-8 Marks out of 100 Marks, and 5-6 questions on an average

• Needs less time, very scoring

• Mostly numerical questions

• Indirect Application in Industry, good future

http://www.knowledgegate.in/gate
Syllabus

• Transactions and concurrency control


• ER-model
• Relational model
• Integrity constraints
• Normal forms
• File organization, indexing (e.g., B and B+ trees)
http://www.knowledgegate.in/gate
• Relational algebra, tuple calculus, SQL
Video chapters
• Chapter-1 (Bscics Transactions and concurrency control)
• Chapter-2 (ER-model)
• Chapter-3 (Relational model, Functional Dependencies, Keys)
• Chapter-4 (Normalization)
• Chapter-5 (File organization, indexing (e.g., B and B+ trees)
• Chapter-6 (Relational algebra, tuple calculus, SQL)
http://www.knowledgegate.in/gate
Chapter-1
(Basics Transactions and concurrency control)

http://www.knowledgegate.in/gate
Data
• Data are characteristics, usually numerical, that are collected through observation. In a more
technical sense, data are a set of values of qualitative or quantitative variables about one or
more persons or objects, while a datum (singular of data) is a single value of a single variable.
• Any facts and figures about an entity is called as Data.

http://www.knowledgegate.in/gate
Information
• Although the terms "data" and "information" are often used interchangeably, these terms
have distinct meanings. In some popular publications, data are sometimes said to be
transformed into information when they are viewed in context or in post-analysis.
• Processed Data is called information.

http://www.knowledgegate.in/gate
Data Base
• A database is an organized collection of data, generally stored and
accessed electronically from a computer system.

http://www.knowledgegate.in/gate
Data Base Management System
• The database management system (DBMS) is the software that interacts with end
users, applications, and the database itself to capture and analyze the data.

• The DBMS software additionally encompasses the core facilities provided to


administer the database. The sum total of the database, the DBMS and the associated
applications can be referred to as a "database system".

http://www.knowledgegate.in/gate
Problem With File System
1. Data redundancy and inconsistency: Imagine two spreadsheets that list employee
contacts, but one is updated with a new phone number while the other is not, leading to
confusion about the correct number to use.
2. Difficulty in accessing Data: Consider needing a specific report from a database, but you
have to request it from an IT specialist instead of getting it yourself directly, causing
delays.
3. Data Isolation: If customer profiles are stored in one system and their order histories in
another, merging this information for a comprehensive view can be cumbersome.
4. Integrity Problem: After a customer updates their address on one page of a website, they
find the old address on another page because the change did not carry through all
systems.
5. Atomicity Problem: A banking transaction that deducts money from one account but fails
to deposit it in another leaves the transaction incomplete and accounts unbalanced.
6. Concurrent access Anomalies: Two people booking the last ticket on a flight at the same

http://www.knowledgegate.in/gate
time might both get a confirmation, but only one can actually get the seat.
Instance and Schemas
• The collection of information stored in the database at a particular
moment is called an instance of the database.
• The overall design of the database is called the database schema.

http://www.knowledgegate.in/gate
TRANSACTION
• Why we study transaction?
• According to general computation principle (operating system) we may have partially
executed program, as the level of atomicity is instruction i.e. either an instruction is
executed completely or not
• But in DBMS view, user perform a logical work(operation) which is always atomic in nature
i.e. either operation is execute or not executed, there is no concept like partial execution.
For example, Transaction T1 which transfer 100 units from account A to B

T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100

http://www.knowledgegate.in/gate
Write(B)
• In this transaction if a failure occurs after Read(B) then the final statue of the
system will be inconsistent as 100 units are debited from account A but not
credited in account B, this will generate inconsistency.

• Here for ‘consistency’ before (A + B) == after (A + B)”

T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
Write(B)

http://www.knowledgegate.in/gate
What is transaction
• To remove this partial execution problem, we increase the level of
atomicity and bundle all the instruction of a logical operation into a
unit called transaction.
• So formally ‘A transaction is a Set of logically related instructions to
perform a logical unit of work’.

T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
http://www.knowledgegate.in/gate
Write(B)
• As here we are only concerned with DBMS so we well only two basic
operation on database

• READ (X) - Accessing the database item x from disk (where database
stored data) to memory variable also name as X.

• WRITE (X) - Writing the data item from memory variable X to disk.

http://www.knowledgegate.in/gate
Desirable properties of transaction
• Now as the smallest unit which have atomicity in DBMS view is transaction, so if
want that our data should be consistent then instead of concentrating on data
base, we must concentrate on the transaction for our data to be consistent.

T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
Write(B)

http://www.knowledgegate.in/gate
• Transactions should possess several properties, often called the ACID properties;
to provide integrity and consistency of the data in the database. The following
are the ACID properties:

http://www.knowledgegate.in/gate
• Atomicity - A transaction is an atomic unit of processing; it should
either be performed in its entirety or not performed at all.
• It is the responsibility of recovery control manager / transaction
control manager of DBMS to ensure atomicity

http://www.knowledgegate.in/gate
• Consistency - A transaction should be consistency preserving, meaning that if it
is completely executed from beginning to end without interference from other
transactions, it should take the database from one consistent state to another.
• The definition of consistency may change from one system to another. The
preservation of consistency of database is the responsibility of
programmers(users) or the DBMS modules that enforces integrity constraints.

http://www.knowledgegate.in/gate
• Isolation - A transaction should appear as though it is being executed
in isolation from other transactions, even though many transactions
are executing concurrently.
• That is, the execution of a transaction should not be interfered with by
any other transactions executing concurrently.
• The isolation property of database is the responsibility of concurrency
control manager of database.

http://www.knowledgegate.in/gate
• Durability - The changes applied to the database by a committed
transaction must persist in the database.
• These changes must not be lost because of any failure. It is the
responsibility of recovery control manager of DBMS.

http://www.knowledgegate.in/gate
Q NOT a part of the ACID properties of database transactions?

(a) Atomicity

(b) Consistency

(c) Isolation

http://www.knowledgegate.in/gate
(d) Deadlock-freedom
Transaction states

http://www.knowledgegate.in/gate
Transaction states
• ACTIVE - It is the initial state. Transaction remains in this state while it is executing operations.
• PARTIALLY COMMITTED - After the final statement of a transaction has been executed, the
state of transaction is partially committed as it is still possible that it may have to be aborted
(due to any failure) since the actual output may still be temporarily residing in main memory
and not to disk.
• FAILED - After the discovery that the transaction can no longer proceed (because of hardware
/logical errors). Such a transaction must be rolled back.
• ABORTED - A transaction is said to be in aborted state when the when the transaction has
been rolled back and the database has been restored to its state prior to the start of
execution.
• COMMITTED - A transaction enters committed state after successful completion of a
transaction and final updation in the database.

http://www.knowledgegate.in/gate
Q if the transaction is in which of the state that we can
guarantee that data base is in consistent state

a) aborted b) committed

c) both aborted & committed d) none

http://www.knowledgegate.in/gate
Why we need concurrent execution
• Concurrent execution is necessary because-
• It leads to good database performance , less weighting time.
• Overlapping I/O activity with CPU increases throughput and response time.

http://www.knowledgegate.in/gate
PROBLEMS DUE TO CONCURRENT EXECUTION OF TRANSACTION

• But interleaving of instructions between transactions may also lead to many


problems that can lead to inconsistent database.
• Sometimes it is possible that even though individual transaction are satisfying
the acid properties even though the final statues of the system will be
inconsistent.

http://www.knowledgegate.in/gate
Lost update problem / Write - Write problem-
• If there is any two write operation of different transaction on same
data value, and between them there is no read operations, then the
second write over writes the first write.

T1 T2
Read(A)
Write(A)
Write(A)
Commit
Commit

http://www.knowledgegate.in/gate
Dirty read problem/ Read -Write problem
• In this problem, the transaction reads a data item updated by another
uncommitted transaction, this transaction may in future be aborted
or failed.
• The reading transactions end with incorrect results.

T1 T2
Read(A)
Write(A)
Read(A)
Commit
Abort
http://www.knowledgegate.in/gate
Unrepeatable read problem
• When a transaction tries to read a value of a data item twice, and another
transaction updates the data item in between, then the result of the two read
operation of the first transaction will differ, this problem is called, Non-
repeatable read problem.

T1 T2
Read(A)
Read(A)
Write(A)
Read(A)
http://www.knowledgegate.in/gate
Phantom read problem

T1 T2
Read(A)
Delete(A)
Read(A)

http://www.knowledgegate.in/gate
Q The following schedule is suffering from

a) Lost update problem

b) Unrepeatable read problem

c) Both A and B

d) Neither A and B

http://www.knowledgegate.in/gate
Q Which of the following scenarios may lead to an irrecoverable error in a database
system?

(A) A transaction writes a data item after it is read by an uncommitted transaction

(B) A transaction reads a data item after it is read by an uncommitted transaction

(C) A transaction reads a data item after it is written by a committed transaction

(D) A transaction reads a data item after it is written by an uncommitted transaction

http://www.knowledgegate.in/gate
Solution is Schedule
• When two or more transaction executed together or one after another then they can be
bundled up into a higher unit of execution called schedule.

• A schedule of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions.
Operations from different transactions can be interleaved in the schedule S.

• However, schedule for a set of transaction must contain all the instruction of those
transaction, and for each transaction Ti that participates in the schedule S, the operations of Ti
in S must appear in the same order in which they occur in Ti.

http://www.knowledgegate.in/gate
• Serial schedule - A serial schedule consists of sequence of instruction belonging to
different transactions, where instructions belonging to one single transaction appear
together. Before complete execution of one transaction another transaction cannot be
started.

• For a set of n transactions, there exist n! different valid serial schedules. Every serial schedule
http://www.knowledgegate.in/gate
lead database into consistent state. Throughput of system is less.
• Non-serial schedule - A schedule in which sequence of instructions of a
transaction appear in the same order as they appear in individual transaction
but the instructions may be interleaved with the instructions of different
transactions i.e. concurrent execution of transactions takes place.

http://www.knowledgegate.in/gate
• So the number of schedules for n different transaction T1,T2,T3,---------TN where
each transaction conations n1, n2, n3, -----------,n4 respectively will be.

• {(n1 + n2+ n3 +-----------+nn)!}/ (n1! n2! n3! -----------nn!)}

• {(n1 + n2+ n3 +-----------+nn)!}/ (n1! n2! n3! -----------nn!)} - n!

http://www.knowledgegate.in/gate
• Conclusion of schedules
• We do not have any method to proof that a schedule is consistent, but from
the above discussion we understand that a serial schedule will always be
consistent, so if somehow we proof that a non-serial schedule will also have
same effects as of a serial schedule that we get a proof that, this particular
non-serial schedule will also be consistent “find those schedules that are
logically equal to serial schedules”.
• For a concurrent schedule to result in consistent state, it should be equivalent
to a serial schedule. i.e. it must be serializable.

http://www.knowledgegate.in/gate
On the basis of On the basis of
SERIALIZABILITY RECOVERABILITY

Conflict
Recoverable
serializable

View
Cascadeless
serializable

Result
Strict
Equivalent

http://www.knowledgegate.in/gate
T1 T2 T1 T2 T1 T2 T1 T2
R(A) R(B) R(A) W(A)
R(B) R(A) W(A) R(A)

T1 T2 T1 T2 T1 T2 T1 T2
R(A) W(B) R(A) W(A)
W(B) R(A) W(A) R(A)

T1 T2 T1 T2 T1 T2 T1 T2
R(A) R(A) W(A) W(A)
R(A) R(A) W(A) W(A)

http://www.knowledgegate.in/gate
SERIALIZABILITY
• Conflicting instructions - Let I and J be two consecutive instructions belonging to two different
transactions Ti and Tj in a schedule S, the possible I and J instruction can be as-
• I= READ(Q), J=READ(Q) ->Non-conflicting
• I= READ(Q), J=WRITE(Q) ->Conflicting
• I= WRITE(Q), J=READ(Q) ->Conflicting
• I= WRITE(Q), J=WRITE(Q) ->Conflicting

• So, the instructions I and J are said to be conflicting, if they are operations by different
transactions on the same data item, and at least one of these instructions is a write operation.

http://www.knowledgegate.in/gate
Conflict equivalent – if one schedule can be converted to another schedule by
swapping of non- conflicting instruction then they are called conflict equivalent
schedule.

T1 T2 T1 T2
R(A) R(B)
A=A-50 B=B+50
R(B) R(A)
B=B+50 A=A-50
R(B) R(B)
B=B+50 B=B+50
R(A) R(A)
A=A+10 A=A+10
http://www.knowledgegate.in/gate
Q Consider a schedule of transactions T1 and T2: Here, RX stands for “Read(X)” and WX
stands for “Write(X)”. Which one of the following schedules is conflict equivalent to the
above schedule?
S
T1 T2
RA

a) b) c) d) RB
WB
S S S S RC
T1 T2 T1 T2 T1 T2 T1 T2 RD
RB RA RA RB WD
WB RC RC WB WC
RD WD WD RD WB
RA WB RB WC Commit
RC RB WB RA Commi
WD WB RD RC t
WB RD WB WD
WC WC WC WB
Commit Commit Commit Commit
Commit Commit Commit Commit
http://www.knowledgegate.in/gate
CONFLICT SERIALIZABLE
• The schedules which are conflict equivalent to a serial schedule are called
conflict serializable schedule.
• If a schedule S can be transformed into a schedule S’ by a series of swaps of non-
conflicting instructions, we say that S and S’ are conflict equivalent.
• A schedule S is conflict serializable, if it is conflict equivalent to a serial schedule.

http://www.knowledgegate.in/gate
Procedure for determining conflict serializability of a schedule
• It can be determined using PRECEDENCE GRAPH method:
• A precedence graph consists of a pair G (V, E)
• V= set of vertices consisting of all the transactions participating in the schedule.
• E= set of edges consists of all edges Ti à Tj, for which one of the following
conditions holds:
• Ti executes write(Q) before Tj executes read(Q)
• Ti executes read(Q) before Tj executes write(Q)
• Ti executes write(Q) before Tj executes write(Q)

http://www.knowledgegate.in/gate
• If an edge Ti à Tj exists in the precedence graph, then in any serial schedule S’ equivalent
to S, Ti must appear before Tj.

• If the precedence graph for S has no cycle, then schedule S is conflict serializable, else it
is not. This cycle detection can be done by cycle detection algorithms, one of them based
on depth first search takes O(n2) time.

• The serialializibility order of transactions of equivalent serial schedule can be determined


using topological order in a precedence graph.

http://www.knowledgegate.in/gate
Q Let Ri(z) and Wi(z) denote read and write operations on a data element z by a
transaction Ti, respectively. Consider the schedule S with four transactions.
S: R4 (x)R2 (x) R3(x) R1(y) W1(y) W2(x) W3(y) R4(y) Which one of the following serial
schedules is conflict equivalent to S?

(a) 𝑇1 → 𝑇3 → 𝑇4 → 𝑇2

(b) 𝑇1 → 𝑇4 → 𝑇3 → 𝑇2
(c) T4 → 𝑇1→ 𝑇3→ 𝑇2
(d) 𝑇3 → 𝑇1→ 𝑇4→ 𝑇2

http://www.knowledgegate.in/gate
Q Let ri(z) and wi(z) denote read and write operations respectively on a data item z by a
transaction Ti. Consider the following two schedules.
• S1:r1(x)r1(y)r2(x)r2(y)w2(y)w1(x) S1 S2
• S2:r1(x)r2(x)r2(y)w2(y)r1(y)w1(x) T1 T2 T1 T2
Which one of the following options is correct? R(X) R(X)
R(Y) R(X)
a) S1 is conflict serializable, and S2 is not conflict serializable R(X) R(Y)
b) S1 is not conflict serializable, and S2 is conflict serializable R(Y) W(Y)
W(Y) R(Y)
c) Both S1 and S2 are conflict serializable
W(X) W(X)
d) Neither S1 nor S2 is conflict serializable

http://www.knowledgegate.in/gate
Q Consider the following schedule for transactions T1, T2 and T3: Which one of the schedules
below is the correct serialization of the above?

(A) T1->>T3->>T2

(B) T2->>T1->>T3

(C) T2->>T3->>T1

(D) T3->>T1->>T2
http://www.knowledgegate.in/gate
VIEW SERIALIZABLE

• If a schedule is not conflict serializable, still it can be consistent, so let us study a weaker form
of serializability called View serializability, and even if a schedule is view serializable still it can
be consistent.
• If a schedule is conflict serializable then it will also be view serializable, so we must check view
serializability only if a schedule is not conflict serializable.

http://www.knowledgegate.in/gate
• if a schedule is not conflict serializable and it does not contain any blind write then it can
never be view serializable, but if not conflict serializable and have blind write then may or may
not be view serializable.

• If a schedule is not conflict serializable and if there exist a blind write.


• First tabulate all serial schedules possible. Then check one by one whether given schedule
is view equivalent to any of the serial schedule.
• If yes then schedule is view serializable otherwise not.

http://www.knowledgegate.in/gate
Check Conflict
Serializability

Not Conflict Yes Conflict


Serializable Serializable

Blind Write No Blind Yes View


Present Write Serializable

Check For
Not View
View
http://www.knowledgegate.in/gate
Serializability
Serializable
• Two schedules S and S’ are view equivalent, if they satisfy following conditions –
• For each data item Q, if the transaction Ti reads the initial value of Q in schedule S , then
then the transaction Ti must, in schedule S’ ,also read the initial value of Q.
• If a transaction Ti in schedule S reads any data item Q, which is updated by transaction Tj,
then a transaction Ti must in schedule S’ also read data item Q updated by transaction Tj in
schedule S’.
• For each data item Q, the transaction (if any) that performs the final write(Q) operation in
schedule S, then the same transaction must also perform final write(Q) in schedule S’.

http://www.knowledgegate.in/gate
• Complexity wise finding the schedule is view serializable or not is a NP- complete problem.
View Serializable
• A schedule S is view serializable, if it is view equivalent to a serial schedule.

http://www.knowledgegate.in/gate
View Serializable

A schedule S is view serializable, if it is view equivalent to a serial schedule.

http://www.knowledgegate.in/gate
• BLIND WRITES- In the above example, transaction T4 and T6 perform write
operation on data item Q without accessing (reading the data item), such
updation without knowing/accessing previous value of data item, are called
Blind updation or BLIND WRITE.

http://www.knowledgegate.in/gate
Q A Schedule that is not conflict serializable and contains at least one blind write
then the schedule is
a) Always view serializable

b) Always non-serializable

c) May be View serializable

d) None of the above

http://www.knowledgegate.in/gate
Q Which of the following statements (s) is/are TRUE?
S1: All view serializable schedules are also conflict serializable.
S2: All conflict serializable schedules are also view serializable.
S3: If a schedule is not conflict serializable then it is not view serializable
S4: If a schedule is not view serializable then it is not conflict serializable.

a) S1 and S2 only

b) S2 and S3 only

c) S2 and S4 only

d) S1 and S3 only

http://www.knowledgegate.in/gate
Q Consider the following schedule ‘S’ with three transactions.
S: R1(B); R3(C); R1 (A); W2 (A); W1(A), W2 (B); W3 (A); W1 (B); W3 (B), W3 (C)
Which of the following is TRUE with respect to the above schedule?

a) It is conflict serializable with sequence [T1, T2 T3]

b) It is conflict serializable with sequence [T2, T1 T3]

c) It is view serializable but not conflict serializable

d) It is neither conflict serializable nor view serializable

http://www.knowledgegate.in/gate
NON- RECOVERABLE SCHEDULE
• A schedule in which for each pair of transaction Ti and Tj , such that if Tj reads a
data item previously written by Ti, then the commit or abort operation of Ti
appears before Tj. Such a schedule is called Non- Recoverable schedule.

S
T1 T2
R(X)
W(X)
R(X)
C
C
http://www.knowledgegate.in/gate
RECOVERABLE SCHEDULE
• A schedule in which for each pair of transaction Ti and Tj , such that if Tj reads a
data item previously written by Ti, then the commit or abort of Ti must appear
before Tj. Such a schedule is called Recoverable schedule.

S
T1 T2
R(X)
W(X)
R(X)
C
C

http://www.knowledgegate.in/gate
Q Consider the following schedule S of transactions T1, T2, T3, T4
Which one of the following statements is CORRECT?
(A) S is conflict-serializable but not recoverable
(B) S is not conflict-serializable but is recoverable
(C) S is both conflict-serializable and recoverable
(D) S is neither conflict-serializable nor is it recoverable

http://www.knowledgegate.in/gate
Q Let S be the following schedule of operations of three transactions T1, T2 and T3 in a relational
database system: R2(Y),R1(X),R3(Z),R1(Y)W1(X),R2(Z),W2(Y),R3(X),W3(Z), Consider the
statements P and Q below:
P: S is conflict-serializable.
Q: If T3 commits before T1 finishes, then S is recoverable.
Which one of the following choices is correct?
S

(a) Both P and Q are true T1 T2 T3


R(Y)
(b) P is true and Q is false R(X)
R(Z)
(c) P is false and Q is true W(X) W(Z)
R(z)
(d) Both P and Q are false
W(Y)
R(X)
W(z)
http://www.knowledgegate.in/gate
CASCADING ROLLBACK
• It is a phenomenon, in which a single transaction failure leads to a series of transaction
rollbacks, is called cascading rollback. Even if the schedule is recoverable the, the commit of
transaction may lead lot of transaction to rollback.
• Cascading rollback is undesirable, since it leads to undoing of a significant amount of work.
Uncommitted reads are not allowed in cascade less schedule.
S
T1 T2 T3
R(X)
W(X)
R(X)
W(X)
R(X)
C
C
http://www.knowledgegate.in/gate
C
CASCADELESS SCHEDULE
• To avoid cascading rollback, cascade less schedule are used.
• A schedule in which for each pair of transactions Ti and Tj, such that if Tj reads a data item
previously written by Ti then the commit or abort of Ti must appear before read operation of
Tj. Such a schedule is called cascade less schedule.

S
T1 T2 T3
R(X)
W(X)
C
R(X)
W(X)
C
R(X)
http://www.knowledgegate.in/gate C
Check Dirty Read

Dirty Read Present Dirty Read not Present

Not Cascadleless
Both Recoverable and
Check order of Cascadless
commit

If order of commit is
same in which dirty

http://www.knowledgegate.in/gate
read is done,
recoverable otherwise
not
Q Consider the following partial Schedule S involving two transactions T1 and T2. Only the read and the write operations
have been shown. The read operation on data item P is denoted by read(P) and the write operation on data item P is
denoted by write(P).
Suppose that the transaction T1 fails immediately after time instance 9. Which one of the following statements is correct?
(A) T2 must be aborted and then both T1 and T2 must be re-started to ensure transaction atomicity
(B) Schedule S is non-recoverable and cannot ensure transaction atomicity
(C) Only T2 must be aborted and then re-started to ensure transaction atomicity
(D) Schedule S is recoverable and can ensure atomicity and nothing else needs to be done

http://www.knowledgegate.in/gate
Strict Schedule
• A schedule in which for each pair of transactions Ti and Tj, such that if Tj reads a data item previously written by Ti
then the commit or abort of Ti must appear before read and write operation of Tj.

S1 S2 S3
T1 T2 T1 T2 T1 T2
R(a) R(a) R(a)
W(a) W(a) R(b)
W(a) C W(a)
C W(a) W(b)
R(a) R(a) C
C C R(a)
C

http://www.knowledgegate.in/gate
http://www.knowledgegate.in/gate
Q Consider the following database schedule with two transactions, T1 and T2. S = r2(X); r1(X);
r2(Y); w1(X); r1(Y); w2(X); a1; a2. where ri(Z) denotes a read operation by transaction Ti on a
variable Z, wi(Z) denotes a write operation by Ti on a variable Z and ai denotes an abort by
transaction Ti. Which one of the following statements about the above schedule is TRUE?
(a) S is non-recoverable (b) S is recoverable, but has a cascading abort
(c) S does not have a cascading abort (d) S is strict

S
T1 T2
R(x)
R(X)
R(Y)
W(X)
R(Y)
W(X)
a1
http://www.knowledgegate.in/gate
a 2
CONCURRENCY CONTROL

• Now we understood that if there is a schedule how to check whether it will work
correctly or not i.e. weather it will maintain the consistency of the data base or not.
(conflict serializability, view serializability, recoverability and cascade less)
• Now we will understand those protocol which guarantee how to design those
schedules which ensure conflict serializability or other properties. There are different
approach or idea to ensure conflict serializability which is the most important property.
• So first we must understand what is the possibility of conflict between two instruction
and if somehow, we manage than the generated schedule will always be conflict
serializable

http://www.knowledgegate.in/gate
Goals of a Protocol: - We desire the following properties from schedule generating
protocols
• Concurrency should be as high as possible, as this is our ultimate goal
because of which we are making all the effort.
• The time taken by a transaction should also be less.
• Desirable Properties satisfied by the protocol
• Easy to understand and implement

http://www.knowledgegate.in/gate
TIME STAMP ORDERING PROTOCOL
• Basic idea of time stamping is to decide the order between the transaction
before they enter in the system using a stamp (time stamp), in case of any
conflict during the execution order can be decided using the time stamp.
• Let’s understand how this protocol works, here we have two idea of
timestamping, one for the transaction, and other for the data item.

http://www.knowledgegate.in/gate
• Time stamp for transaction,
• With each transaction ti, in the system, we associate a unique fixed timestamp, denoted by
TS(ti).
• This timestamp is assigned by database system to a transaction at time transaction enters
into the system.
• If a transaction has been assigned a timestamp TS(ti) and a new transaction tj , enters into
the system with a timestamp TS(tj), then always TS(ti) <TS(tj).

http://www.knowledgegate.in/gate
• Two things are to be noted
1. First time stamp of a transaction remain fixed throughout the execution
2. Second it is unique means no two transaction can have the same timestamp.

• The reason why we called time stamp not stamp, because for stamping we use
the value of the system clock as stamp, advantage is,
• Tt will always be unique as time never repeats
• There is no requirement of refreshing and starting with fresh value.

• The time stamp of the transaction also determines the serializability order.
• Thus if TS(ti) <TS(tj), then the system must ensure that the produced schedule is
equivalent to a serial schedule in which transaction ti appears before transaction
tj. http://www.knowledgegate.in/gate
• Time stamp with data item, in order to assure such scheme, the protocol maintains for each
data item Q two timestamp values:

1. W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q)


successfully.
2. R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q)
successfully.
• These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.

http://www.knowledgegate.in/gate
• Suppose a transaction Ti request a read(Q)
1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was
already overwritten. Hence, the read operation is rejected, and Ti is rolled
back.
2. If TS(Ti)≥ W-timestamp(Q), then the read operation is executed, and R-
timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).

http://www.knowledgegate.in/gate
• Suppose that transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and
the system assumed that that value would never be produced. Hence, the write operation is
rejected, and Ti is rolled back.

2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this
write operation is rejected, and Ti is rolled back.

3. If TS(Ti) ≥ R-timestamp(Q), then the write operation is executed, and W-timestamp(Q) is set to
max(W-timestamp(Q), TS(Ti)).

4. If TS(Ti) ≥ W-timestamp(Q), then the write operation is executed, and W-timestamp(Q) is set to
max(W-timestamp(Q), TS(Ti)).

If a transaction Ti is rolled back by the concurrency control scheme as a result of either a read or
http://www.knowledgegate.in/gate
write operation, the system assigns it’s a new timestamp and restarts it.
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom

Time Stamp Ordering

Thomas Write Rule


Basic 2PL
Conservative 2PL
Rigorous 2PL
Strict 2PL

http://www.knowledgegate.in/gate
THOMAS WRITE RULE
• Thomas write is an improvement in time stamping protocol, which makes some modification
and may generate those protocols that are even view serializable, because it allows greater
potential concurrency.

• It is a Modified version of the timestamp-ordering protocol in which Blind write operations


may be ignored under certain circumstances.

• The protocol rules for read operations remain unchanged. while for write operation, there is
slightly change in Thomas write rule than timestamp ordering protocol.

http://www.knowledgegate.in/gate
When Ti attempts to write data item Q,

• if TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete value of


{Q}. Rather than rolling back Ti as the timestamp ordering protocol would have
done, this {write} operation can be ignored.

• This modification is valid as the any transaction with TS(Ti ) < W-timestamp(Q),
the value written by this transaction will never be read by any other transaction
performing Read(Q) ignoring such obsolete write operation is considerable.
• Thomas' Write Rule allows greater potential concurrency. Allows some view-
serializable schedules that are not conflict serializable.

http://www.knowledgegate.in/gate
http://www.knowledgegate.in/gate
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom

Time Stamp Ordering YES YES NO NO YES


Thomas Write Rule
Basic 2PL
Conservative 2PL
Rigorous 2PL
Strict 2PL

http://www.knowledgegate.in/gate
Lock Based Protocols
• To ensure isolation is to require that data items be accessed in a mutually
exclusive manner i.e. while one transaction is accessing a data item, no other
transaction can modify that data item. Locking is the most fundamental
approach to ensure this.
• Lock based protocols ensure this requirement. Idea is first obtain a lock on the
desired data item then if lock is granted then perform the operation and then
unlock it.

http://www.knowledgegate.in/gate
• In general, we support two modes of lock because, to provide better concurrency.

• Shared mode
• If transaction Ti has obtained a shared-mode lock (denoted by S) on any data item Q, then
Ti can read, but cannot write Q, any other transaction can also acquire a shared mode lock
on the same data item(this is the reason we called this shared mode).

• Exclusive mode
• If transaction Ti has obtained an exclusive-mode lock (denoted by X) on any data item Q,
then Ti can both read and write Q, any other transaction cannot acquire either a shared or
exclusive mode lock on the same data item. (this is the reason we called this exclusive
mode)

http://www.knowledgegate.in/gate
Lock –Compatibility Matrix

• Conclusion shared is compatible only with shared while exclusive is not compatible either with
shared or exclusive.
• To access a data item, transaction Ti must first lock that item, if the data item is already locked
by another transaction in an incompatible mode, or some other transaction is already waiting
in non-compatible mode, then concurrency control manager will not grant the lock until all
incompatible locks held by other transactions have been released. The lock is then granted.

http://www.knowledgegate.in/gate
• Lock based protocol do not ensure serializability as granting and releasing of lock do not follow
any order and any transaction any time may go for lock and unlock. Here in the example
below we can see, that even this transaction in using locking but neither it is conflict
serializable nor independent from deadlock.
T1 T2
LOCK-X(A)
READ(A)
WRITE(A)
UNLOCK(A)
LOCK-S(B)
READ(B)
UNLOCK(B)
LOCK-X(B)
READ(B)
WRITE(B)
UNLOCK(B)
LOCK-S(A)
READ(A)
UNLOCK(A)
http://www.knowledgegate.in/gate
Two phase locking protocol(2PL)
• The protocol ensures that each transaction issue lock and unlock requests in two phases, note
that each transaction will be 2 phased not schedule.
• Growing phase- A transaction may obtain locks, but not release any locks.
• Shrinking phase- A transaction may release locks, but may not obtain any new locks.

• Initially a transaction is in growing phase and acquires lock as needed and in between can
perform operation reach to lock point and once a transaction releases a lock, it can issue no
more lock requests i.e. it enters the shrinking phase.

http://www.knowledgegate.in/gate
T1 T2
LOCK-X(A)
READ(A)
WRITE(A)
LOCK-S(B)
READ(B)
LOCK-X(B)
READ(B)
WRITE(B)
LOCK-S(A)
READ(A)
UNLOCK(B)
UNLOCK(A)
UNLOCK(B)
UNLOCK(A)

http://www.knowledgegate.in/gate
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom

Time Stamp Ordering YES YES NO NO YES


Thomas Write Rule NO YES NO NO YES
Basic 2PL
Conservative 2PL
Rigorous 2PL
Strict 2PL

http://www.knowledgegate.in/gate
Q Which of the following concurrency protocol ensures both conflict serializability
and freedom from deadlock?
(1) 2 - phase Locking (2) Time stamp - ordering

(a) Both (1) and (2) (b) (1) only

(c) (2) only (d) Neither (1) nor (2)

http://www.knowledgegate.in/gate
Conservative 2PL
• The idea is there is no growing phase transaction start directly from lock point,
i.e. transaction must first acquire all the required locks then only it can start
execution. If all the locks are not available then transaction must release the
acquired locks and must wait.
• Shrinking phase will work as usual, and transaction can unlock any data item
anytime.
• we must have a knowledge in future to understand what is data required so
that we can use it

http://www.knowledgegate.in/gate
Q In conservative two phase locking protocol, a transaction
a) Should release exclusive locks only after the commit operation
b) Should release all the locks only at beginning of the transaction
c) should acquire all the locks at beginning of the transaction
d) Should acquire all the exclusive locks at beginning transaction

http://www.knowledgegate.in/gate
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom

Time Stamp Ordering YES YES NO NO YES


Thomas Write Rule NO YES NO NO YES
Basic 2PL YES YES NO NO NO
Conservative 2PL
Rigorous 2PL
Strict 2PL

http://www.knowledgegate.in/gate
RIGOROUS 2PL
• Requires that all locks be held until the transaction commits.
• This protocol requires that locking be two phase and also all the locks taken be
held by transaction until that transaction commit.
• Hence there is no shrinking phase in the system.

http://www.knowledgegate.in/gate
Q In a Rigorous 2 phase protocol
a) All shared locks held by the transaction are related after the
transaction is committed
b) All exclusive locks held by the transaction are released after the
transaction is committed
c) All locks held by the transaction are released after the transaction is
committed
d) All locks held by the transaction are released before the transaction is
committed

http://www.knowledgegate.in/gate
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom

Time Stamp Ordering YES YES NO NO YES


Thomas Write Rule NO YES NO NO YES
Basic 2PL YES YES NO NO NO
Conservative 2PL YES YES NO NO YES
Rigorous 2PL
Strict 2PL

http://www.knowledgegate.in/gate
STRICT 2PL
• that all exclusive-mode locks taken by a transaction be held until that transaction commits.
This requirement ensures that any data written by an uncommitted transaction are locked in
exclusive mode until the transaction commits, preventing any other transaction from reading
the data.

• This protocol requires that locking be two phase and also that exclusive –mode locks taken by
transaction be held until that transaction commits.

• So it is simplified form of rigorous 2pl

http://www.knowledgegate.in/gate
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom

Time Stamp Ordering YES YES NO NO YES


Thomas Write Rule NO YES NO NO YES
Basic 2PL YES YES NO NO NO
Conservative 2PL YES YES NO NO YES
Rigorous 2PL YES YES YES YES NO
Strict 2PL

http://www.knowledgegate.in/gate
Q Consider the following two statements about database transaction schedules:
I. Strict two-phase locking protocol generates conflict serializable schedules that are also
recoverable.
II. Timestamp-ordering concurrency control protocol with Thomas Write Rule can
generate view serializable schedules that are not conflict serializable.
Which of the above statements is/are TRUE?

(a) Both I and II

(b) Neither I nor II

(c) II only

(d) I only

http://www.knowledgegate.in/gate
Q Which of the following statement is/are correct
a) Every conflict serializable schedule allowed under 2PL protocol is
allowed by basic time stamping protocol.
b) Every schedule allowed under basic time stamping protocol is allowed
by Thomas-write rule
c) Every schedule allowed under Thomas-write rule is allowed by basic
time stamping protocol
d) none

http://www.knowledgegate.in/gate
Video chapters
• Chapter-1 (Transactions and concurrency control)
• Chapter-2 (ER-model)
• Chapter-3 (Relational model, Functional Dependencies, Keys)
• Chapter-4 (Normalization)
• Chapter-5 (File organization, indexing (e.g., B and B+ trees)
• Chapter-6 (Relational algebra, tuple calculus, SQL)
http://www.knowledgegate.in/gate
E-R DIAGRAM/MODEL
• Introduced in 1976 by Dr Peter Chen, a non-technical design
method works on conceptual level based on the perception
of the real world.

• E-R data model was developed to facilitate database design


by allowing specification of an enterprise schema that
represents the overall logical structure of a database.

http://www.knowledgegate.in/gate
http://www.knowledgegate.in/gate
Conclusion
• It consists of collections of basic objects, called entities and of relationships among these
entities and attributes which defines their properties.

• E-R model is very useful in mapping the meanings and interactions of real-world enterprises
onto a conceptual schema.

• It is free from ambiguities and provides a standard and logical way of visualizing the data.

• As basically it is a diagrammatical representation easy to understand even by a non-technical


user.

http://www.knowledgegate.in/gate
ENTITY

http://www.knowledgegate.in/gate
ENTITY
• An entity is a thing or an object in the real world that is distinguishable from other object
based on the values of the attributes it possesses.
• An entity may be concrete, such as a person or a book, or it may be abstract, such as a course,
a course offering, or a flight reservation.

http://www.knowledgegate.in/gate
• Types of Entity
• Tangible - Entities which physically exist in real world. E.g. - Car, Pen, locker
• Intangible - Entities which exist logically. E.g. – Account, video.

http://www.knowledgegate.in/gate
• ENTIY SET- Collection of same type of entities that share the same properties or
attributes. In an ER diagram an entity set is represented by a rectangle. In a
relational model it is represented by a separate table.

• In ER diagram we cannot represent an entity, as entity is an instant not schema,


and ER diagram is designed to understand schema.
• In a relational model entity is represented by a row or a tuple or a record in a
table.
http://www.knowledgegate.in/gate
ATTRIBUTES
• Attributes are the units defines and describe properties and characteristics of
entities. Attributes are the descriptive properties possessed by each member of
an entity set. for each attribute there is a set of permitted values called domain.

http://www.knowledgegate.in/gate
• In an ER diagram attributes are represented by ellipse or oval connected to rectangle.

• While in a relational model they are represented by independent column. e.g. Instructor (ID,
name, salary, dept_name)

http://www.knowledgegate.in/gate
Types of Attributes
• Single valued- Attributes having single value at any instance of time for an entity. E.g. –
Aadhar no, dob.

• Multivalued - Attributes which can have more than one value for an entity at same time. E.g.
- Phone no, email, address.
• A multivalued attribute is represented by a double ellipse in an ER diagram and by an
independent table in a relational model.
• Separate table for each multivalued attribute, by taking mva and pk of main table as fk in
new table

http://www.knowledgegate.in/gate
Customer
Customer ID First Name Surname Telephone Number
123 Pooja Singh 555-861-2025, 192-122-1111
(555) 403-1659 Ext. 53; 182-929-
456 San Zhang
2929
789 John Doe 555-808-9633

http://www.knowledgegate.in/gate
Customer
Customer ID First Name Surname Telephone Number
123 Pooja Singh 555-861-2025, 192-122-1111
(555) 403-1659 Ext. 53; 182-929-
456 San Zhang
2929
789 John Doe 555-808-9633

जग
ु ाड़ technology

Customer
Telephone Telephone
Customer ID First Name Surname
Number1 Number2
123 Pooja Singh 555-861-2025 192-122-1111
(555) 403-1659 Ext.
456 San Zhang 182-929-2929
53
789 John Doe 555-808-9633

http://www.knowledgegate.in/gate
Customer
Customer ID First Name Surname Telephone Number
123 Pooja Singh 555-861-2025, 192-122-1111
(555) 403-1659 Ext. 53; 182-929-
456 San Zhang 2929
789 John Doe 555-808-9633

Customer Name Customer Phone Number


Customer ID Telephone Number
Customer ID First Name Surname
123 555-861-2025
123 Pooja Singh 123 192-122-1111
456 San Zhang 456 (555) 403-1659 Ext. 53
789 John Doe 456 182-929-2929
789 555-808-9633

http://www.knowledgegate.in/gate
• Simple - Attributes which cannot be divided further into sub parts. E.g. Age

• Composite - Attributes which can be further divided into sub parts, as simple
attributes. A composite attribute is represented by an ellipse connected to an
ellipse and in a relational model by a separate column.

http://www.knowledgegate.in/gate
• Stored - Main attributes whose value is permanently stored in database. E.g.
date_of_birth

• Derived -The value of these types of attributes can be derived from values of other
Attributes. E.g. - Age attribute can be derived from date_of_birth and Date attribute.

http://www.knowledgegate.in/gate
Descriptive attribute - Attribute of relationship is called descriptive attribute.

• An attribute takes a null value when an entity does not have a value for it. The null
value may indicate “not applicable”— that is, that the value does not exist for the
entity.

http://www.knowledgegate.in/gate
• Null can also designate that an attribute value is unknown. An unknown value
may be either missing (the value does exist, but we do not have that
information) or not known (we do not know whether or not the value actually
exists).

http://www.knowledgegate.in/gate
Relationship / Association

http://www.knowledgegate.in/gate
Relationship / Association

• Is an association between two or more entities of same or different entity set.


• In ER diagram we cannot represent individual relationship as it is an instance or
data.

http://www.knowledgegate.in/gate
• In an ER diagram it is represented by a diamond, while in relational model
sometimes through foreign key and other time by a separate table.

• Note: - normally people use word relationship for relationship type so don’t get confused.

http://www.knowledgegate.in/gate
• Every relationship type has three components.

• Name- Every relation must have a unique name.

• Degree-

• Structural constraints (cardinalities ratios, participation)

http://www.knowledgegate.in/gate
Degree of a relationship/Relationship Set
• Means number of entities set(relations/tables) associated(participate) in the relationship set.

• Most of the relationship sets in a data base system are binary.

• Occasionally however relationship sets involve more than two entity sets.

• Logically, we can associate any number of entity set in a relationship called N-ary Relationship.

http://www.knowledgegate.in/gate
• Unary Relationship - One single entity set participate in a Relationship, means two entities of
the same entity set are related to each other.

• These are also called as self -referential Relationship set.

• E.g.- A member in a team maybe supervisor of another member in team.

http://www.knowledgegate.in/gate
• Binary Relationship - Two entity
sets participate in a Relationship. It
is most common Relationship.

• Ternary Relationship - When


three entities participate in a
Relationship. E.g. The University
might need to record which
teachers taught which subjects in
which courses.

• Quaternary Relationship - When


four entities participate in a
Relationship.

http://www.knowledgegate.in/gate
• N-ary relationship – where n number of entity set are associated

• But the most common relationships in ER models are Binary.

http://www.knowledgegate.in/gate
Structural constraints (Cardinalities Ratios, Participation)
• An E-R enterprise schema may define certain constraints to which the contents
of a database must conform.
• Express the number of entities to which another entity can be associated via a
relationship set. Four possible categories are-
• One to One (1:1) Relationship.
• One to Many (1: M) Relationship.
• Many to One (M: 1) Relationship.
• Many to Many (M: N) Relationship.

http://www.knowledgegate.in/gate
One to One (1:1) Relationship - An entity in A is associated with at most one entity in B, and an
entity in B is associated with at most one entity in A.

E.g.- The directed line from relationship set advisor to both entities set indicates that ‘an instructor
may advise at most one student, and a student may have at most one advisor’.

http://www.knowledgegate.in/gate
One to Many (1: M) Relationship - An entity in A is associated with any number (zero or more) of
entities in B. An entity in B, however, can be associated with at most one entity in A.

E.g.- This indicates that an instructor may advise many students, but a student may have at most
one advisor.

http://www.knowledgegate.in/gate
Many to One (M: 1) Relationship - An entity in A is associated with at most one entity in
B. An entity in B, however, can be associated with any number (zero or more) of entities
in A.

E.g.- This indicates that student may have many instructors but an instructor can advise at
most one student.

http://www.knowledgegate.in/gate
Many to Many(M:N) Relationship - An entity in A is associated with any number (zero or
more) of entities in B, and an entity in B is associated with any number (zero or more) of
entities in A.

E.g.- This indicates a student may have many advisors and an instructor may advise many
students.

http://www.knowledgegate.in/gate
Participation Constraints
• Participation constraint specifies whether the existence of an entity depends on its
being related to another entity via the relationship type.
• These constraints specify the minimum and maximum number of relationship
instances that each entity must/can participate in.
• Max cardinality – it defines the maximum no of times an entity occurrence
participating in a relationship.
• Min cardinality - it defines the minimum no of times an entity occurrence
participating in a relationship.

http://www.knowledgegate.in/gate
• PARTICIPATION CONSTRAINTS- it defines participations of entities of an entity
type in a relationship.
• PARTIAL PARTICIPATION (min cardinality zero) - In Partial participation only
some entities of entity set participate in Relationship set, that is there exists at
least one entity which do not participate in a relation.
• TOTAL PARTICIPATION (min cardinality at least one) - In total participation every
entity of an entity set participates in at least one relationship in Relationship set.

http://www.knowledgegate.in/gate
• Conversion of 1-1 relationship(binary)
• No separate table is required, take pk of one side as pk on other side,
priority must be given to the side having total participation.

• Conversion of 1-n or n-1 relationship (binary)


• No separate table is required, modify n side by taking pk of 1 side a foreign
key on n side.

• Conversion of n-n relationship (binary)


• Separate table is required take pk of both table and declare their
combination as a pk of new table.

http://www.knowledgegate.in/gate
Q In an Entity-Relationship (ER) model, suppose R is a many-to-one relationship from
entity set E1 to entity set E2. Assume that E1 and E2 participate totally in R and that the
cardinality of E1 is greater than the cardinality of E2. Which one of the following is true
about R?
(a) Every entity in E1 is associated with exactly one entity in E2
(b) Some entity in E1 is associated with more than one entity in E2

(c) Every entity in E2 is associated with exactly one entity in E1


(d) Every entity in E2 is associated with at most one entity in E1

http://www.knowledgegate.in/gate
Q Given the basic ER and relational models, which of the following is INCORRECT?
(A) An attribute of an entity can have more than one value

(B) An attribute of an entity can be composite


(C) In a row of a relational table, an attribute can have more than one value

(D) In a row of a relational table, an attribute can have exactly one value or a NULL value

http://www.knowledgegate.in/gate
Q Consider the entities ‘hotel room’, and ‘person’ with a many to many relationship ‘lodging’ as
shown below:

If we wish to store information about the rent payment to be made by person (s) occupying
different hotel rooms, then this information should appear as an attribute of

(A) Person

(B) Hotel Room

(C) Lodging

(D) None of these

http://www.knowledgegate.in/gate
STRONG AND WEAK ENTITY SET

• An entity set is called strong entity set, if it has a primary key, all the tuples in the set are
distinguishable by that key.
• An entity set that does not process sufficient attributes to form a primary key is called a weak
entity set.
• It contains discriminator attributes (partial key) which contain partial information about the
entity set, but it is not sufficient enough to identify each tuple uniquely. Represented by
double rectangle.

http://www.knowledgegate.in/gate
• For a weak entity set to be meaningful and converted into strong entity set, it must be
associated with another strong entity set called the identifying or owner entity set i.e. weak
entity set is said to be existence dependent on the identity set.
• The identifying entity set is said to own weak entity set that it identifies.
• A weak entity set may participate as owner in an identifying relationship with another weak
entity set.

http://www.knowledgegate.in/gate
• The relationship associating the weak entity set with the identifying entity set is called the
identifying relationship (double diamonds).

• The identifying relationship is many to one from the weak entity set to identifying entity set,
and the participation of the weak entity set in relationship is always total.
• The primary key of weak entity set will be the union of primary key and discriminator
attributes.

http://www.knowledgegate.in/gate
REASONS TO HAVE WEAK ENTITY SET
• Weak entities reflect the logical structure of an entity being dependent on
another.
• Weak entity can be deleted automatically when their strong entity is deleted.
• Without weak entity set it will lead to duplication and consequent possible
inconsistencies.

http://www.knowledgegate.in/gate
Conversion From ER Diagram To Relational Model
• Entity Set
• Convert every strong entity set into a separate table.
• Convert every weak entity set into a separate table, by making it
dependent into one strong entity set (identifying or owner entity set).

http://www.knowledgegate.in/gate
Conversion From ER Diagram Tom Relational Model

• Relationship(Unary)
• No separate table is required, add a new column as fk which refer
the pk of the same table.

http://www.knowledgegate.in/gate
Conversion From ER Diagram Tom Relational Model

• Relationship(Binary)
• 1:1 No separate table is required, take pk of one side and put it as fk on other
side, priority must be given to the side having total participation.

http://www.knowledgegate.in/gate
Conversion From ER Diagram Tom Relational Model

• Relationship(Binary)
• 1:n or n:1 No separate table is required, modify n side by taking pk of 1 side a
foreign key on n side

http://www.knowledgegate.in/gate
Roll_no name Age Edu_id name Subject
1 A 19 1 A OS
2 B 18 2 B DBMS
3 C 20 3 C TOC
4 D 20 4 D CN

http://www.knowledgegate.in/gate
Conversion From ER Diagram Tom Relational Model

• Relationship(Binary)
• (n-n) Separate table is required take pk of both table and declare their
combination as a pk of new table

http://www.knowledgegate.in/gate
Roll no name Age Edu_id name Subject
1 A 19 1 A OS
2 B 18 2 B DBMS
3 C 20 3 C TOC
4 D 20 4 D CN

Roll no name Age Edu_id Edu_id name Subject Roll no Roll no Edu_id
1 A 19 1 1 A OS 1 1 1
1 A 19 3 2 B DBMS 2 1 3
1 A 19 4 2 B DBMS 3 1 4
2 B 18 2 3 C TOC 1 2 2
2 B 18 3 3 C TOC 2 2 3
3 C 20 2 3 C TOC 4 3 2
4 D 20 3 4 D CN 1 4 3
4 D 20 4 4 D CN 4 4 4

http://www.knowledgegate.in/gate
Roll no name Age Edu_id name Subject
1 A 19 1 A OS
2 B 18 2 B DBMS
3 C 20 3 C TOC
4 D 20 4 D CN

Roll no Edu_id
1 1
1 3
1 4
2 2
2 3
3 2
4 3
4 4

http://www.knowledgegate.in/gate
Conversion From ER Diagram Tom Relational Model

• Relationship(3 or More)
• Take the pk of all participating entity sets as fk and declare their
combinations as pk in the new table.

http://www.knowledgegate.in/gate
Conversion From ER Diagram Tom Relational Model

• Multivalued Attributes
• A separate table must be taken for all multivalued attributes, where we take
pk of the main table as fk and declare combination of fk and multivalued
attribute are pk in the new table.

http://www.knowledgegate.in/gate
Conversion From ER Diagram Tom Relational Model

• Composite Attributes
• A separate column must be taken for all simple attributes of the composite
attribute.

http://www.knowledgegate.in/gate
Q The minimum number of tables required to convert the following ER diagram to
relation model is _____________

http://www.knowledgegate.in/gate
Q The minimum number of tables required to convert the following ER diagram to
Relational model is ____________

http://www.knowledgegate.in/gate
Q An ER model of a database consists of entity types A and B. These are connected by a
relationship R which does not have its own attribute. Under which one of the following
conditions, can the relational table for R be merged with that of A?
(a) Relationship R is one-to-many and the participation of A in R is total

(b) Relationship R is one-to-many and the participation of A in R is partial

(c) Relationship R is many-to-one and the participation of A in R is total

(d) Relationship R is many-to-one and the participation of A in R is partial

http://www.knowledgegate.in/gate
Q Consider an Entity-Relationship (ER) model in which entity sets E1 and E2 are connected
by an m: n relationship R12, E1 and E3 are connected by a 1: n (1 on the side of E1 and n on
the side of E3) relationship R13. E1 has two single-valued attributes a11 and a12 of which a11
is the key attribute. E2 has two single-valued attributes a21 and a22 of which a21 is the key
attribute. E3 has two single valued attributes a31 and a32 of which a31 is the key attribute.
The relationships do not have any attributes. If a relational model is derived from the
above ER model, then the minimum number of relations that would be generated is
___________.
a) 2

b) 3

c) 4

d) 5

http://www.knowledgegate.in/gate
Q Let E1 and E2 be two entities in an E/R diagram with simple single-valued
attributes. R1 and R2 are two relationships between E1 and E2, where R1 is one-to-
many and R2 is many-to-many. R1 and R2 do not have any attributes of their own.
What is the minimum number of tables required to represent this situation in the
relational model?
a) 2

b) 3

c) 4

d) 5
http://www.knowledgegate.in/gate
http://www.knowledgegate.in/gate
Q Consider the following entity relationship diagram (ERD), where two entities E1 and E2 have a
relation R of cardinality 1 : m. The attributes of E1 are A11, A12 and A13 where A11 is the key
attribute. The attributes of E2 are A21, A22 and A23 where A21 is the key attribute and A23 is a multi-
valued attribute. Relation R does not have any attribute. A relational database containing
minimum number of tables from the above ERD. The number of tables in the database is
(A) 2

(B) 3

(C) 5

(D) 4

http://www.knowledgegate.in/gate
• ADVANTAGES OF E-R DIGRAM
• Constructs used in the ER diagram can easily be transformed into relational
tables.
• It is simple and easy to understand with minimum training.

• DISADVANTAGE OF E-R DIGRAM


• Loss of information content.
• Limited constraints representation.
• It is overly complex for small projects.

http://www.knowledgegate.in/gate
Video chapters
• Chapter-1 (Transactions and concurrency control)
• Chapter-2 (ER-model)
• Chapter-3 (Relational model, Functional Dependencies, Keys)
• Chapter-4 (Normalization)
• Chapter-5 (File organization, indexing (e.g., B and B+ trees)
• Chapter-6 (Relational algebra, tuple calculus, SQL)
http://www.knowledgegate.in/gate
RELATIONAL DATABASE MANAGEMENT SYSTEM

• A Relational Database Management System (RDBMS) is a


software system that uses a relational model to create,
manage, and query data in databases through tables linked by
defined relationships. Most modern commercial and open-
source database applications are relational in nature.
• Based on the relational model specified by Edgar F. Codd. The
father of modern relational database design in 1970.

http://www.knowledgegate.in/gate
BASICS OF RDBMS
• Domain (set of permissible value in particular column) is a set of atomic values. By atomic we
mean that each value in the domain is indivisible as far as the formal relational model is
concerned. A common method of specifying a domain is to specify a data type from which the
data values forming the domain are drawn.

• E.g. Names: The set of character strings that represent names of persons.

Domain/ NAME ID CITY COUNTRY HOBBY


Field/ NISHA 1 AGRA INDIA PLAYING

Column/ NIKITA 2 DELHI INDIA DANCING

Arity/ AJAY 3 AGRA INDIA CHESS

Degree ARPIT 4 PATNA INDIA READING

http://www.knowledgegate.in/gate
• Table (Relation) - A Relation is a set of tuples/rows/entities/records.

• Tuple - Each row of a Relation/Table is called Tuple.

• Arity/Degree - No. of columns/attributes of a Relation. E.g. - Arity is 5 in Table Student.

• Cardinality - No of rows/tuples/record of a Relational instance. E.g. - Cardinality is 4 in table


Student. NAME ID CITY COUNTRY HOBBY
NISHA 1 AGRA INDIA PLAYING
Rows/Tuples/Record/ NIKITA 2 DELHI INDIA DANCING
Cardinality AJAY 3 AGRA INDIA CHESS
ARPIT 4 PATNA INDIA READING

http://www.knowledgegate.in/gate
Properties of Relational tables
1. Cells contains atomic values

2. Values in a column are of the same kind

3. Each row is unique

4. No two tables can have the same name in a relational schema.

5. Each column has a unique name

6. The sequence of rows is insignificant

7. The sequence of columns is insignificant.

http://www.knowledgegate.in/gate
Problems in relational database

• Update Anomalies- Anomalies that cause redundant work to be done during


insertion into and Modification of a relation and that may cause accidental loss
of information during a deletion from a relation
• Insertion Anomalies

• Modification Anomalies

• Deletion Anomalies

http://www.knowledgegate.in/gate
• Insertion anomalies: An independent piece of information cannot be recorded
into a relation unless an irrelevant information must be inserted together at the
same time.

Roll no name Age Br_code Br_name Br_hod_name


1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr

http://www.knowledgegate.in/gate
• Modification anomalies: The update of a piece of information must occur at
multiple locations.
Roll no name Age Br_code Br_name Br_hod_name
1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr

http://www.knowledgegate.in/gate
• Deletion Anomalies: The deletion of a piece of information unintentionally
removes other information.

Roll no name Age Br_code Br_name Br_hod_name


1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr

http://www.knowledgegate.in/gate
Roll no name Age Br_code Br_name Br_hod_name
1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr
PK FK PK

Roll no name Age Br_code Br_code Br_name Br_hod_name


1 A 19 101 101 Cs Abc
2 B 18 101 102 ec Pqr
3 C 20 101
4 D 20 102

http://www.knowledgegate.in/gate
Conclusion
• Like every paragraph must have a single idea similarly every table must have a single idea and
if a table contains more than one idea then that table must be decomposed until each table
contains only one idea.
• We need some tools to approach this decomposition or normalization on large database
which contains a number of table, and that tool is functional dependencies.

http://www.knowledgegate.in/gate
Purpose of Normalization
• With out normalization data base system may be inaccurate, slow and inefficient and they
might not produce the data we expect. Normalization may be simply defined as refinement
process.
• Which includes creating tables and establishing relationships between those tables according
to rules designed both to protect data and make the database more flexible by eliminating two
factors;
• Redundancy
• Inconsistent Dependency

http://www.knowledgegate.in/gate
http://www.knowledgegate.in/gate
Br_code Br_hod_name

Br_code Br_hod_name

http://www.knowledgegate.in/gate
Functional Dependency कोई बताता नह)ं, इसक- feel आ जाती है

http://www.knowledgegate.in/gate
FUNCTIONAL DEPENDENCY
• A formal tool for analysis of relational schemas.
X Y Z
• In a Relation R, if ‘α’ ⊑ R AND ‘β’ ⊑ R, then attribute or a 1 4 2
Set of attribute ‘α’ Functionally derives an attribute or 1 4 3
set of attributes ‘β’, iff each ‘α’ value is associated with
precisely one ‘β’ value.
2 6 3
3 2 2
• For all pairs of tuples t1 and t2 in R such that
• If T1[α] = T2[α]
• Then, T1[β] = T2[β]

http://www.knowledgegate.in/gate
Q Which of the following functional dependencies are satisfied by the
instance?
(A) XY à Z and Z à Y
X Y Z
(B) YZ à X and Y à Z 1 4 2
1 5 3
(C) YZ à X and X à Z
1 6 3
3 2 2
(D) XZ à Y and Y à X

http://www.knowledgegate.in/gate
• Shortcut Steps to find weather a FD from α à β can be concluded on a given
instance or not

1. Find Weather all values of α are different or not, if yes then FD valid

2. Find Weather all values of β are same or not, if yes then FD valid

3. जब कुछ भी काम ना करे , Try to find two same values of α on which we get
different values of β

http://www.knowledgegate.in/gate
Q Consider the following relation instance, which of the following
dependency doesn’t hold

A) A à b A B C

1 2 3
B) BC à A
4 2 3

C) B à C 5 3 3

D) AC à B
http://www.knowledgegate.in/gate
Q Which of the following dependency doesn’t hold good?

A) A à BC A B C D E
a 2 3 4 5
B) DE à C 2 a 3 4 5
a 2 3 6 5

C) C à DE a 2 3 6 6

D) BC à http://www.knowledgegate.in/gate
A
• Trivial Functional dependency - If β is a subset of α, then the functional
dependency α à β will always hold.
X Y Z
िजसका होना न होना बराबर हो 1 4 2
1 4 3
2 6 3
3 2 2

http://www.knowledgegate.in/gate
1. α - Determinant (Determines β value).

2. β - Dependent (Dependent on α).

3. If k à R, the K is a super key of R

4. Note: A functional dependency is a property of the relation schema R, not of a particular


legal relation state/instance r of R.

X Y Z
1 4 2
1 4 3
2 6 3
3 2 2

http://www.knowledgegate.in/gate
Q Consider the relation X(P, Q, R, S, T, U) with the following set of functional dependencies
F={
{P, R} → {S,T},
{P, S, U} → {Q, R}
}
Which of the following is the trivial functional dependency in F+ is closure of F?

(a) {P, R} → {S, T}

(b) {P, R} → {R, T}

(c) {P, S} → {S}

(d) {P, S, U} → {Q}

http://www.knowledgegate.in/gate
Q Which of the following dependencies are satisfied by the relation
instance?
A B C
AàB 1 1 4
1 2 4
BàC
2 1 3
BàA 2 2 3

CàB 2 4 3

CàA

AàC http://www.knowledgegate.in/gate
ATTRIBUTES CLOSURE/CLOSURE ON ATTRIBUTE SET/ CLOSURE SET OF ATTRIBUTES

• Attribute closure of an attribute set can be defined as set of attributes which can
be functionally determined from F.
• DENOTED BY F+

• Set of all attributes Functionally determined by X either directly from FD’S or


logically derived.

http://www.knowledgegate.in/gate
DIRECT METHOD

E.g. In a Relation R (A, B, C, D), with set of functional dependencies as-


{
A→B
B→C
AB → D
}
A+ =

http://www.knowledgegate.in/gate
Q R(ABCDEFG)

AàB

BC à DE

AEG à G

(AC)+ =?

http://www.knowledgegate.in/gate
Q R(ABCDE)
A à BC

CD à E

BàD

EàA

(B)+ =
http://www.knowledgegate.in/gate
Q R(ABCDEF)

AB à C

BC à AD

DàE

CF à B

(AB)+ =
http://www.knowledgegate.in/gate
Q R(ABCDEFGH)
A à BC

CD à E

EàC

D à AEH

ABH à BD

DH à BC

(BCD)+ =
http://www.knowledgegate.in/gate
Q Suppose the following functional dependencies hold on a relation U with attributes P,Q,R,S,
and T:

P → QR
RS → T

Which of the following functional dependencies can be inferred from the above functional
dependencies?

(A) PS → T

(B) R → T

(C) P → R

(D) PS → Q

http://www.knowledgegate.in/gate
ARMSTRONG’S AXIOMS

1. An axiom or postulate is a statement that is taken to be true, to serve as


a premise or starting point for further reasoning and arguments.

2. Armstrong's axioms are a set of axioms (or, more precisely, inference


rules) used to infer all the functional dependencies on a relational
database.

3. They were developed by William W. Armstrong in his 1974 paper.

4. The axioms are sound in generating only functional dependencies in


the closure of a set of functional dependencies (denoted as F+) when
applied to that set (denoted as F).

http://www.knowledgegate.in/gate
Armstrong Axioms

• Reflexivity: If Y is a subset of X, then X → Y

• Augmentation: If X → Y, then XZ → YZ

• Transitivity: If X → Y and Y → Z, then X → Z

http://www.knowledgegate.in/gate
From these rules, we can derive these secondary rules-

• Union: If X → Y and X → Z, then X → YZ

• Decomposition: If X → YZ, then X → Y and X → Z

• Pseudo transitivity: If X → Y and WY → Z, then WX → Z

• Composition: If X → Y and Z → W, then XZ → YW

http://www.knowledgegate.in/gate
Why Armstrong axioms refers to the Sound and Complete

• By sound, we mean that given a set of functional dependencies F specified on a relation


schema R, any dependency that we can infer from F by using the primary rules of Armstrong
axioms holds in every relation state r of R that satisfies the dependencies in F.

• By complete, we mean that using primary rules of Armstrong axioms repeatedly to infer
dependencies until no more dependencies can be inferred results in the complete set of all
possible dependencies that can be inferred from F.

http://www.knowledgegate.in/gate
APPLICATION OF ATTRIBUTE CLOSURE

Equivalence of Two FD sets-

Two FD sets F1 and F2 are equivalent if –

F1+ = F2+

Or

F1 ⊑ F2+ and F2 ⊑ F1+

Race एक जगह से श,
ु होने ज,र0 है
http://www.knowledgegate.in/gate
Q Consider the following set of fd R(ACDEH)

F G
AàC A à CD

AC à D
E à AH
E à AD

EàH

http://www.knowledgegate.in/gate
Q Consider the following set of fd R(ABCDE)

F G
B à CD B à CDE

AD à E A à BC

BàA AD à E

http://www.knowledgegate.in/gate
Q consider the following relation R(PQRS)

F G
PàQ P à QR

QàR
RàS
RàS

http://www.knowledgegate.in/gate
To find the MINIMAL COVER /CANONICAL COVER/IRREDUCIBLE SET

• Minimal cover- It means to eliminate any kind of redundancy from a FD set.

• A canonical cover of a set of functional dependencies, F is a simplified set of functional dependencies that has
the same closure as the original set F.

• There may be any following type of redundancy in the set of functional dependencies: -
• Complete production may be Redundant.

• One or more than one attributes may be redundant on right hand side of a production.

• One or more than one attributes may be redundant on Left hand side of a production.

http://www.knowledgegate.in/gate
Gate aspirant के हाथ& से minimize होने के बाद

http://www.knowledgegate.in/gate
Procedure to find MINIMAL COVER
• Use decomposition rule wherever applicable so that RHS of a production/FD contains only
single attribute.

• Remove extraneous attribute on LHS of a production by finding the closure for every possible
subset, if in any case the closure is same it means remaining attributes are redundant.

• For every production find the closure value of LHS of production keeping the production in a
set, and next time ignoring the production to be in a set. If both closure set matches, it means
the production is redundant.

http://www.knowledgegate.in/gate
Q R(ABCD)

AàB

CàB

D à ABC

AC à D

http://www.knowledgegate.in/gate
Q The following functional dependencies hold true for the relational schema {V, W, X, Y, Z}:
V→W
VW → X
Y → VX
Y→Z
Which of the following is irreducible equivalent for this set of functional dependencies?

(a) (b) (c) (d)


V→W V→W V→W V→W
V→X W→X V→X W→X
Y→V Y→V Y→V Y→V
Y→Z Y→Z Y→X Y→X
Y→Z Y→Z

http://www.knowledgegate.in/gate
Q R(VWXYZ)
P Q
VàW VàW

VW à X VàX

Y à VX YàV

YàZ YàZ

http://www.knowledgegate.in/gate
Key

http://www.knowledgegate.in/gate
Super key
• Set of attributes using which we can identify each tuple uniquely is called Super key, i.e. the
set of attributes used to differentiate each tuple of a relation.

• Let X be a set of attributes in a Relation R , if X+(Closure of X) determines all attributes of R


then X is said to be Super key of R .

• There should be at least one Super key with Not Null constraint.

http://www.knowledgegate.in/gate
Q Consider a relation R(A, B, C, D, E) with the following three functional
dependencies.
𝐴𝐵 → 𝐶; 𝐵𝐶 → 𝐷; 𝐶 → 𝐸;
The number of super keys in the relation R is___________.

A B C D E

http://www.knowledgegate.in/gate
Q The maximum number of super keys for the relation schema
R(E, F, G, H) with E as the key is
a) 5
E F G H
b) 6

c) 7

d) 8

http://www.knowledgegate.in/gate
Candidate key
1. Minimum set of attributes that differentiates the tuple of a Relation. No proper
subset as super key Also called as MINIMAL SUPER KEY.
2. There should be at least one candidate key with Not Null constraint.
3. Prime attribute - Attributes that are member of candidate Keys are called Prime
attributes.

http://www.knowledgegate.in/gate
Primary key
1. One of the candidate keys is selected by database administrator as a Primary
Key.

2. Primary Key attribute are not allowed to have Null values.

3. At most one Primary Key per table is allowed in RDMS.

4. Candidate key which are not chosen as primary key is alternate key.

http://www.knowledgegate.in/gate
Q Which of the following is NOT a superkey in a relational schema with
attributes V, W, X, Y, Z and primary key VY?
(a) VXYZ

(b) VWXZ

(c) VWXY

(d) VWXYZ

http://www.knowledgegate.in/gate
Roll no name Age Br_code Br_name Br_hod_name
1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr
PK FK PK

Roll no name Age Br_code Br_code Br_name Br_hod_name


1 A 19 101 101 Cs Abc
2 B 18 101 102 ec Pqr
3 C 20 101
4 D 20 102

http://www.knowledgegate.in/gate
PK FK
Roll no CR
1 1
2 2
3 1
4 2
5 1
6 2
7 1
8 2
9 1
10 2
11 1
12 2
13 1
14 2
15 null
http://www.knowledgegate.in/gate
Foreign Keys
• A foreign key is a column or group of columns in a relational database table that refers the
primary key of the same table or some other table to represent relationship.
• The concept of referential integrity is derived from foreign key theory.

http://www.knowledgegate.in/gate
Q Consider the following statements S1 and S2 about the relational data model:
● S1: A relation scheme can have at most one foreign key.
● S2: A foreign key in a relation scheme R cannot be used to refer to tuples of
R.
Which one of the following choices is correct?

(a) Both S1 and S2 are true

(b) S1 is true and S2 is false

(c) S1 is false and S2 is true

(d) Both S1 and S2 are false

http://www.knowledgegate.in/gate
Q The following table has two attributes A and C where A is the primary key and C is the
foreign key referencing A with on-delete cascade.
The set of all tuples that must be additionally deleted to preserve referential integrity
when the tuple (2,4) is deleted is:
a) (3,4) and (6,4)

b) (5,2) and (7,2)

c) (5,2),(7,2) and (9,5)

d) (3,4),(4,3) and (6,4)

http://www.knowledgegate.in/gate
Q Consider the following tables T1 and T2. In table T1, P is the primary
key and Q is the foreign key referencing R in table T2 with on-delete
cascade and on-update cascade. In table T2, R is the primary key and
S is the foreign key referencing P in table T1 with on-delete set NULL
and on-update cascade. In order to delete record 〈3,8〉 from table T1,
the number of additional records that need to be deleted from table
T1 is _________.

http://www.knowledgegate.in/gate
• Composite key – Composite key is a key composed of more than one
column sometimes it is also known as concatenated key.

• Secondary key – Secondary key is a key used to speed up the search


and retrieval contrary to primary key, a secondary key does not
necessary contain unique values.

http://www.knowledgegate.in/gate
Q R(ABCD)

AàB A B C D
BàC

CàA

http://www.knowledgegate.in/gate
Q R(ABCD)

AB à CD A B C D

DàA

http://www.knowledgegate.in/gate
Q R(ABC)

AB à C A B C
CàA

http://www.knowledgegate.in/gate
Q R(ABCDEFGHIJ)

AB à C A B C D E F G H I J
A à DE

BàF

F à GH

D à IJ

http://www.knowledgegate.in/gate
Q R(ABCDEFGHIJ)

AB à C
A B C D E F G H I J
AD à GH

BD à EF

AàI

HàJ

http://www.knowledgegate.in/gate
Q R(ABCDEFGH)

A à BC A B C D E F G H

ABE à CDGH

C à GD

DàG

EàF
http://www.knowledgegate.in/gate
Q R(ABCDE)

AàB A B C D E

BC à E

DE à A

http://www.knowledgegate.in/gate
Q R(ABCDE)

AàB A B C D E

BC à E

DE à A

http://www.knowledgegate.in/gate
Q R(ABCDE)

AB à CD
A B C D E

DàA

BC à DE

http://www.knowledgegate.in/gate
Q R(ABCDEF)

AB à C A B C D E F

DC à AE

EàF

http://www.knowledgegate.in/gate
Q R(ABCDEF)

AB à C A B C D E F

CàD

D à BE

EàF

FàA
http://www.knowledgegate.in/gate
Q R(WXYZ)

ZàW W X Y Z
Y à XZ

XW à Y

http://www.knowledgegate.in/gate
Q R(VWXYZ)

ZàY V W X Y Z

YàZ

X à YV

VW à X

http://www.knowledgegate.in/gate
Q R(ABCDE)

A à BC A B C D E

CD à E

BàD

EàA

http://www.knowledgegate.in/gate
Q Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of functional
dependencies {{E, F} à {G}, {F} à {I, J}, {E, H} à {K, L}, K à {M}, L à {N} on R.
What is the key for R?

A) {E, F} E F G H I J K L M N
B) {E, F, H}

C) {E, F, H, K, L}

D) {E}

http://www.knowledgegate.in/gate
Q Given the STUDENTS relation as shown below.

For (StudentName, StudentAge) to be the key for this instance, the value X should
not be equal to______________

http://www.knowledgegate.in/gate
Q Relation R has eight attributes ABCDEFGH. Fields of R contain only atomic values. F = {CH à G,
A à BC, B à CFH, E à A, F à EG} is a set of functional dependencies (FDs) so that F+ is exactly
the set of FDs that hold for R. How many candidate keys does the relation R have?
(A) 3

(B) 4
A B C D E F G H

(C) 5

(D) 6

http://www.knowledgegate.in/gate
Q Consider a relation scheme R = (A, B, C, D, E, H) on which the following
functional dependencies hold: {A à B, BC à D, E à C, D à A}. What are the
candidate keys of R?

(A) AE, BE

A B C D E H
(B) AE, BE, DE

(C) AEH, BEH, BCH

(D) AEH, BEH, DEH

http://www.knowledgegate.in/gate
Video chapters
• Chapter-1 (Transactions and concurrency control)
• Chapter-2 (ER-model)
• Chapter-3 (Relational model, Functional Dependencies, Keys)
• Chapter-4 (Normalization)
• Chapter-5 (File organization, indexing (e.g., B and B+ trees)
• Chapter-6 (Relational algebra, tuple calculus, SQL)
http://www.knowledgegate.in/gate
• Normalization of data (Decomposition of Relation) can be considered a process of analyzing the given
relation schema to achieve the desirable properties of minimizing redundancy using Decomposition.
• The tool we use for normalization is functional dependencies and candidate keys.
• Functional dependency can be used only to normalize up to BCNF.
• A series of normal form tests that can be carried out on individual relation schemas so that the
relational database can be normalized to any desired degree.
• 1NF>>>2NF>>3NF>>BCNF

http://www.knowledgegate.in/gate
FIRST NORMAL FORM
• A Relation table is said to be in first normal form iff each attribute in each cell have single
value(atomic). Means a Relation should not contain any multivalued or composite attributes.

• Other implications of first normal form


• Every row should be unique, that is no two rows should have the same values of all the
attributes.
• There must be a primary key.
• Every column should have a unique name
• Order of row and column is irrelevant

http://www.knowledgegate.in/gate
Prime attribute: - A attribute is said to be prime if it is part of any of the candidate key

Non-Prime attribute: - A attribute is said to be non-prime if it is not part of any of the candidate
key

Eg R(ABCD)
ABàCD
Here candidate key is AB so, A and B are prime attribute, C and D are non-prime attributes.

http://www.knowledgegate.in/gate
PARTIAL DEPENDENCY- When a non – prime attribute is dependent only on a part (Proper
subset) of candidate key then it is called partial dependency. (PRIME > NON-PRIME)

TOTAL DEPENDENCY- When a non – prime attribute is dependent on the entire candidate key
then it is called total dependency.

e.g. R(ABCD) ABàD, AàC

http://www.knowledgegate.in/gate
SECOND NORMAL FORM
• Relation R is in 2NF if,
• R should be in 1 NF.
• R should not contain any Partial dependency. (that is every non-prime
attribute should be fully dependent upon candidate key)

http://www.knowledgegate.in/gate
Q R(A, B, C) BàC
A B C A B B C
a 1 X A 1 1 X
b 2 Y B 2 2 Y
a 3 Z A 3 3 z
C 3 Z C 3
D 3 Z D 3
E 3 Z E 3

http://www.knowledgegate.in/gate
TRANSITIVE DEPENDENCY – A functional dependency from non-Prime attribute to non-Prime
attribute is called transitive

E.g.- R(A, B, C, D) with A as a candidate key


AàB
BàC [ transitive dependency]
CàD [transitive dependency]

http://www.knowledgegate.in/gate
THIRD NORMAL FORM

• Let R be the relational schema, it is said to be in 3 NF


• R should be in 2NF
• It must not contain any transitive dependency

http://www.knowledgegate.in/gate
THIRD NORMAL FORM DIRECT DEFINATION

• A relational schema R is said to be 3 NF if every functional dependency


in R from α-->β, either α is super key or β is the prime attribute

http://www.knowledgegate.in/gate
A B C A B B C
A 1 P A 1 1 P
B 2 Q B 2 2 Q
C 2 Q C 2 3 R
D 2 Q D 2 4 S
E 3 R E 3
F 3 R F 3
G 4 S G 4

http://www.knowledgegate.in/gate
R(A, B, C) A B C A B C B
AB à C A C B A B B C
B B C B B C B
CàB B A D B A D A
A A E A A E A
C C B C C
D C B D C
E C B e C
F C B f c

http://www.knowledgegate.in/gate
BCNF (BOYCE CODD NORMAL FORM)
A relational schema R is said to be BCNF if every functional dependency in R from
α-->β

α must be a super key

E.g.- R (A, B, C, D)
{
AB->C [No violation of 2NF,3NF, BCNF]
C->D [violation of BCNF, C not a candidate/super key]
D->A [violation of BCNF, D not a candidate/super key]
} Candidate key= {AB}, {DB}, {CB}

http://www.knowledgegate.in/gate
Some important note points on Normalization:
• If a relation R does not contain any non- trivial dependency, then R Is in BCNF.
• A Relation with two attributes is always in BCNF.
• A relation schema R consist of only simple candidate key then, R is always in 2NF but may or
may not be in 3NF or BCNF.
• A Relation schema R consist of only prime attributes then R is always in 3NF, but may or may
not be in BCNF.
• A relation schema R in 3NF and with only simple candidate keys, then R surely in BCNF.

http://www.knowledgegate.in/gate
Q R(ABC) (AB, BC)

AB à C A B C

CàA

http://www.knowledgegate.in/gate
Q R(ABCD)(AB)

AB à C A B C D

BàD

http://www.knowledgegate.in/gate
Q R(ABCDE)(CE)

CE à D A B C D E

DàB

CàA

http://www.knowledgegate.in/gate
Q R(ABCDE)(ACD, BCD, CDE)

AàB A B C D E

BC à E

DE à A

http://www.knowledgegate.in/gate
Q R(ABCD)(AB, AD, BC, CD)

AB à CD A B C D

CàA

DàB

http://www.knowledgegate.in/gate
Q R(ABCDE)(AB)

AB à C A B C D E

BàD

DàE

http://www.knowledgegate.in/gate
Q R(ABCDEFGH)(AE)

A à BC A B C D E F G H

ABE à CDGH

C à GD

DàG

EàF

http://www.knowledgegate.in/gate
Q R(WXYZ)(Y, XW, XZ)

ZàW W X Y Z

Y à XZ

XW à Y

http://www.knowledgegate.in/gate
Q R(ABCDEF)(ABC, ACD)

ABC à D A B C D E F

ABD à E

CD à F

CDF à B

http://www.knowledgegate.in/gate
Q R(ABCDE)(AB)
A B C D E
AB à C

BàD

DàE

http://www.knowledgegate.in/gate
Q R(ABCDE)(A, E, BC, CD)

A à BC A B C D E

CD à E

BàD

EàA

http://www.knowledgegate.in/gate
Q (ABCDE)(ACD, BCD, CDE)

AàB A B C D E

BC à E

DE à A

http://www.knowledgegate.in/gate
Q R(ABCDE)(ABD)
A B C D E

BD à E

AàC

http://www.knowledgegate.in/gate
Q R(ABCDEF) (A, BC, DEF)

A à BCDEF
A B C D E F
BC à ADEF

DEF à ABC

http://www.knowledgegate.in/gate
Q R(ABCDE)(ac)
A B C D E
AàB

BàE

CàD

http://www.knowledgegate.in/gate
Q R(ABCDE)(AB, BC, BD)

AB à CD A B C D E

DàA

BC à DE

http://www.knowledgegate.in/gate
Q R(ABCD)(AB, BD)

AB à CD A B C D

DàA

http://www.knowledgegate.in/gate
Q R(ABCDEF)(BF)

AB à C A B C D E F

CàD

B à AE

http://www.knowledgegate.in/gate
Q R(ABCDEFGHIJ)(AB)

AB à C A B C D E F G H I J

A à DE

BàF

F à GH

D à IJ
http://www.knowledgegate.in/gate
Q R(ABCDE)(BC, CD)

BC à ADE A B C D E

DàB

http://www.knowledgegate.in/gate
Q R(ABCD)(AD, BD, CD)

AàB A B C D

BàC

CàA

http://www.knowledgegate.in/gate
Q R(ABCDEF)(ABD, BCD)

AB à C A B C D E F

DC à AE

EàF

http://www.knowledgegate.in/gate
Q R(VWXYZ)(VW, XW)

ZàY V W X Y Z

YàZ

X à YV

VW à X
http://www.knowledgegate.in/gate
Q R(ABCDE)(AC)

AàB A B C D E

BàE

CàD

http://www.knowledgegate.in/gate
Q R(ABCDEF)(C, D, AB, BE, BF)

AB à C
A B C D E F
CàD

D>BE

E>F

F>A

http://www.knowledgegate.in/gate
Q Consider the following four relational schemas. For each schema, all non-trivial functional dependencies are listed. The underlined attributes
are the respective primary keys.

Schema I: Registration (rollno, courses)


Field ‘courses’ is a set-valued attribute containing the set of
courses a student has registered for.
Non-trivial functional dependency
rollno → courses

Schema II: Registration (rollno, coursid, email)


Non-trivial functional dependencies:
rollno, courseid → email
email → rollno

Schema III: Registration (rollno, courseid, marks, grade)


Non-trivial functional dependencies:
rollno, courseid, → marks, grade
marks → grade

Schema IV: Registration (rollno, courseid, credit)


Non-trivial functional dependencies:
rollno, courseid → credit
courseid → credit
Which one of the relational schemas above is in 3NF but not in BCNF?
(a) Schema 1 (b) Schema 2

http://www.knowledgegate.in/gate
(c) Schema 3 (d) Schema 4
Q Which one of the following statements if FALSE?
a) Any relation with two attributes is in BCNF

b) A relation in which every key has only one attribute is in 2NF

c) A prime attribute can be transitively dependent on a key in a 3NF relation

d) A prime attribute can be transitively dependent on a key in a BCNF relation

http://www.knowledgegate.in/gate
Multivalued Dependency
• Multivalued dependencies Denoted by, A àà B, Means, for every value of A, there may exist
more than one value of B.

• A trivial multivalued dependency XààY is one where either Y is a subset of X,


or X and Y together form the whole set of attributes of the relation.

• If there is functional dependency from A à B, then there will also a multivalued functional
dependency from A->->B.

http://www.knowledgegate.in/gate
E.g. let the constraint specified by MVD in relation Student as
S_name à à Club_name
S_name à à P_no

S_Name Club_name S_Name P_no


Kamesh Dance Kamesh 123
Kamesh Guitar Kamesh 789

S_Name Club_name P_no


Kamesh Dance 123
Kamesh Guitar 123
Kamesh Dance 789
Kamesh Guitar 789

NOTE: The above Student schema is in BCNF as no functional dependency holds on EMP, but still redundancy due to MVD.

http://www.knowledgegate.in/gate
• Each row indicates that a given restaurant can deliver a Restaurant Delivery Permutations
Restaurant Variety Delivery Area
given variety. The table has no non-key attributes Chatora Sweets Samosa Hatibagan Market
because its only key is {Restaurant, Variety, Delivery Chatora Sweets Samosa Chandni Chowk
Area}. Therefore, it meets all normal forms up to BCNF. Chatora Sweets Samosa Koramangala
Chatora Sweets Dosa Hatibagan Market
• If we assume, however, that Variety offered by a Chatora Sweets Dosa Chandni Chowk
restaurant are not affected by delivery area (i.e. a Chatora Sweets Dosa Koramangala
Moolchand Ladoo Koramangala
restaurant offers all Variety it makes to all areas it Moolchand Dosa Koramangala
supplies), then it does not meet 4NF. The problem is that Thaggu Samosa Hatibagan Market
the table features two non-trivial multivalued Thaggu Samosa Chandni Chowk
Thaggu Ladoo Hatibagan Market
dependencies on the {Restaurant} attribute (which is not
Thaggu Ladoo Chandni Chowk
a super key). The dependencies are:
o {Restaurant} àà {Variety}

o {Restaurant} àà {Delivery Area}

http://www.knowledgegate.in/gate
• These non-trivial multivalued dependencies on a non- Restaurant Delivery Permutations
Restaurant Variety Delivery Area
superkey reflect the fact that the varieties a restaurant
Chatora Sweets Samosa Hatibagan Market
offers are independent from the areas to which the Chatora Sweets Samosa Chandni Chowk
restaurant delivers. This state of affairs leads Chatora Sweets Samosa Koramangala
to redundancy in the table: Chatora Sweets Dosa Hatibagan Market
Chatora Sweets Dosa Chandni Chowk
• for example, we are told three times that Chatora Chatora Sweets Dosa Koramangala
Sweets offers Dosa, and if Chatora Sweets starts Moolchand Ladoo Koramangala
producing Momo then we will need to add multiple Moolchand Dosa Koramangala
rows, one for each of Chatora Sweets's delivery areas. Thaggu Samosa Hatibagan Market
Thaggu Samosa Chandni Chowk
Thaggu Ladoo Hatibagan Market
Thaggu Ladoo Chandni Chowk

http://www.knowledgegate.in/gate
• If we have two or more multivalued independent attributes in the same relation schema, we
get into a problem of having to repeat every value of one of the attributes with every value of
the other attribute to keep the relation state consistent and to maintain the independence
among the attributes involved. This constraint is specified by a multivalued dependency.
Delivery Areas By Restaurant Varieties By Restaurant
Restaurant Delivery Area Restaurant Pizza Variety
Chatora Sweets Hatibagan Market Chatora Sweets Samosa
Chatora Sweets Chandni Chowk Chatora Sweets Dosa
Chatora Sweets Koramangala Moolchand Ladoo
Moolchand Koramangala Moolchand Dosa
Thaggu Hatibagan Market Thaggu Samosa
Thaggu Chandni Chowk Thaggu Ladoo

http://www.knowledgegate.in/gate
• A relation is in 4NF iff
• It is in BCNF
• There must not exist any non-trivial multivalued dependency.

• Each MVD is decomposed in separate table, where it becomes trivial MVD.

http://www.knowledgegate.in/gate
• A 1992 paper by Margaret S. Wu notes that the teaching of database normalization typically
stops short of 4NF, perhaps because of a belief that tables violating 4NF (but meeting all lower
normal forms) are rarely encountered in business applications.

• This belief may not be accurate, however. Wu reports that in a study of forty organizational
databases, over 20% contained one or more tables that violated 4NF while meeting all lower
normal forms.

http://www.knowledgegate.in/gate
Lossy/Lossless-Dependency Preserving Decomposition

• Because of a normalization a table is Decomposed into two or more tables, but


during this decomposition we must ensure satisfaction of some properties out of
which the most important is lossless join property / decomposition.

http://www.knowledgegate.in/gate
• if we decompose a table r into two tables r1 and r2 because of normalization then at some later
stage if we want to join(combine) (natural join) these tables r1 and r2, then we must get back
the original table r, without any extra or less tuple. But some information may be lost during
retrieval of original relation or table. For e.g.
r
A B C
1 a p
2 b q
3 a r

r1 r2
A B B C

http://www.knowledgegate.in/gate
r
A B C
1 a p
2 b q
3 a r
r1 r2
A B B C
1 a a p
2 b b q
3 a A B C a r

http://www.knowledgegate.in/gate
• Decomposition is lossy if R1 ⋈ R2 ⊃ R

• Decomposition is lossy if R ⊃ R1 ⋈ R2

http://www.knowledgegate.in/gate
• Decomposition is lossless if R1 ⋈ R2 = R "The decomposition of relation R into R1 and R2
is lossless when the join of R1 and R2 yield the same relation as in R." which guarantees
that the spurious (extra or less) tuple generation problem does not occur with respect
to the relation schemas created after decomposition.

• This property is extremely critical and must be achieved at any cost.

• lossless Decomposition / NonAdditive Join Decomposition

http://www.knowledgegate.in/gate
A B C D E
A 122 1 W A
E 236 4 X B
A 199 1 Y C
B 213 2 Z D

http://www.knowledgegate.in/gate
How to check for lossless join decomposition using FD set, following conditions must hold:

• Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of R must be


either in R1 or in R2. Att(R1) U Att(R2) = Att(R)

• Intersection of Attributes of R1 and R2 must not be NULL. Att(R1) ∩ Att(R2) ≠ Φ

• Common attribute must be a key for at least one relation (R1 or R2)
• Att(R1) ∩ Att(R2) à (R1) or Att(R1) ∩ Att(R2) à (R2)

http://www.knowledgegate.in/gate
Dependency Preserving Decomposition

• Let relation R be decomposed into Relations R1, R2, R3…………. RN with their respective
functional Dependencies set as F1, F2, F3…………. FN, then the Decomposition is Dependency
Preserving iff

• {F1 ∪ F2 ∪ F3 ∪ F4………. ∪ FN }+ = F+

http://www.knowledgegate.in/gate
• Dependency preservation property, although desirable, is sometimes sacrificed.
Q R (A, B, C)

AàB, BàC, CàA

R1(A, B) AND R2(B, C)

http://www.knowledgegate.in/gate
Q R(ABCDEG)(NF) R1(ABC) R2(ABDE) R3(EG)

AB à C

AC à B

AD à E

BàD

BC à A

EàG

http://www.knowledgegate.in/gate
Q Consider the relation R (ABCD) with the FD set F = {A à BC, B à CD, Cà AD}
which is decomposed into set of tables D = {(AB), (BC), CD}. Which of the following
is true about the decomposition D?
a) It is lossless and dependency preserving

b) It is lossy but dependency preserving

c) It is lossless but dependencies are nor preserved

d) It is neither lossless nor dependency preserving

http://www.knowledgegate.in/gate
Q Let the set of functional dependencies F = {QR → S, R → P, S → Q} hold on a relation schema X
= (PQRS). X is not in BCNF. Suppose X is decomposed into two schemas Y and Z, where Y = (PR)
and Z = (QRS).
Consider the two statements given below.

I. Both Y and Z are in BCNF

II. Decomposition of X into Y and Z is dependency preserving and lossless

Which of the above statements is/are correct? (GATE- 2019) (1 Marks)


(a) I only

(b) Neither I nor II

(c) II only

(d) Both I and II


http://www.knowledgegate.in/gate
Q Consider a schema R(A,B,C,D) and functional dependencies A->B and C->D
Then the decomposition of R into R1(AB) and R2(CD) is (GATE-2001) (2 Marks)

(A) dependency preserving and lossless join

(B) lossless join but not dependency preserving

(C) dependency preserving but not lossless join

(D) not dependency preserving and not lossless join

http://www.knowledgegate.in/gate
Q Consider the relation R(P,Q,S,T,X,Y,Z,W) with the following functional dependencies.

Consider the decomposition of the relation R into the constituent relations according to the
following two decomposition schemes. Which one of the following options is correct? (GATE
2021) (2 MARKS)
a) D1 is a lossless decomposition, but D2 is a lossy decomposition
b) D1 is a lossy decomposition, but D2 is a lossless decomposition
c) Both D1 and D2 are lossless decompositions
d) Both D1 and D2 are lossy decompositions

http://www.knowledgegate.in/gate
5 NF/Project-Join Normal Form

• A Relational table R is said to be in 5th normal form if


• it is in 4 NF
• it cannot be further non-loss decomposed

http://www.knowledgegate.in/gate
Video chapters
• Chapter-1 (Transactions and concurrency control)
• Chapter-2 (ER-model)
• Chapter-3 (Relational model, Functional Dependencies, Keys)
• Chapter-4 (Normalization)
• Chapter-5 (File organization, indexing (e.g., B and B+ trees)
• Chapter-6 (Relational algebra, tuple calculus, SQL)
http://www.knowledgegate.in/gate
Indexing
• Theoretically relational database is derived from set theory, and in a set the order of elements
in a set is irrelevant, so does in relations(tables). But in practice implementation we have to
specify the order.

• A number of properties, like search, insertion and deletion will depend on the order in which
elements are stored in the tables.

• There are only two ways in which elements can be stored in a table ordered (Sorted) or
unordered (Unsorted).

http://www.knowledgegate.in/gate
File organization/ organization of records in a file
Ordered file organization

• All the records in the file are ordered on some search key field.
• Here binary search is possible. (give example of book page searching)
• Maintenance (insertion & deletion) is costly, as it requires reorganization of entire file.
• Notes that we will get binary search only if we are using that key for searching on which
indexing is done, otherwise it will behave as unsorted file

http://www.knowledgegate.in/gate
Unordered file organization
• All the records are inserted usually in the end of the file so not ordered according to any field,
Because of this only linear search is possible, searching is slow.

• Maintenance (insertion & deletion) is easy, as it does not require re organization of entire file.

http://www.knowledgegate.in/gate
• Reason for indexing - For a large file when it contains a large number of records which will
eventually acquire large number of blocks, then its access will become slow.
• Additional auxiliary access structure is called indexes, a data technique to efficiently retrieve
records from the database files based on some attributes on which the indexing has been
done. Indexing in database systems is similar to what we see in books.

http://www.knowledgegate.in/gate
• Index typically provides secondary access path, which provide alternative way to access the
records without affecting the physical placement of records in the main file.

• The size of index file is way smaller than that of the main file, as index file record contain only
two columns key (attribute in which searching is done) and block pointer (base address of the
block of main file which contains the record holding the key), while main file contains all the
columns.

http://www.knowledgegate.in/gate
• Index can be created on any field of relation (primary key, non-key)

• One index file is designed according to an attribute, means more than one index file can be
designed for a main file.

http://www.knowledgegate.in/gate
• Index file is always ordered, irrespective of weather main file is ordered or unordered. So that
we can take the advantage of binary search.

• Indexing gives the advantage of faster time, but space taken by index file will be an overhead.

http://www.knowledgegate.in/gate
• Number of access required to search the correct block of main file is log2(number of blocks in
index file) + 1

http://www.knowledgegate.in/gate
Indexing can be classified on number of criteria’s one of them could be
• Dense Index
• Sparse Index

http://www.knowledgegate.in/gate
DENSE INDEX

• In dense index, there is an entry in the index file for every search key value in the main file.
This makes searching faster but requires more space to store index records itself.

• Note that it is not for every record, it is for every search key value. Sometime number of
records in the main file > number of search keys in the main file, for example if search key is
repeated.

http://www.knowledgegate.in/gate
SPARSE INDEX

• If an index entry is created only for some records of the main file, then it is called sparse index.
• No. of index entries in the index file < No. of records in the main file.
• Note: - dense and sparse are not complementary to each other, sometimes it is possible that a
record is both dense and sparse.

http://www.knowledgegate.in/gate
Basic term used in Indexing
• BLOCKING FACTOR = No. of Records per block= ⌊block size/record size⌋

• No of blocks required by file = ⌈no of records / blocking factor⌉

http://www.knowledgegate.in/gate
• If file is unordered then no of block assesses required to reach correct block which contain
the desired record is O(n), where n is the number of blocks.

• if file is ordered then no of block assesses required to reach correct block which contain the
desired record is O(log2n), where n is the number of blocks.

http://www.knowledgegate.in/gate
TYPES OF INDEXING

Index

Single
Multilevel
level

Primary Clustering Secondary Simple


B tree B+ tree
Index Index index multilevel

• Single level index means we create index file for the main
file, and then stop the process.

• Multiple level index means, we further index the index file


and keep repeating the process until we get one block.
http://www.knowledgegate.in/gate
PRIMARY INDEXING
• Main file is always sorted according to primary key.

• Indexing is done on Primary Key, therefore called as primary indexing

• Index file have two columns, first primary key and second anchor pointer (base
address of block)

http://www.knowledgegate.in/gate
• It is an example of Sparse Indexing.

• Here first record (anchor record) of every block gets an entry in the index file

• No. of entries in the index file = No of blocks acquired by the main file.

http://www.knowledgegate.in/gate
Q Suppose we have ordered file with records stored r = 30,000 on a disk with Block
Size B = 1024 B. File records are of fixed size and are unspanned with record length
R = 100 B. Suppose that ordering key field of file is 9 B long and a block pointer is 6
B long, Implement primary indexing?

http://www.knowledgegate.in/gate
CLUSTERED INDEXING
• Main file will be ordered on some non-key attributes
• No of entries in the index file = no of unique values of the attribute on which
indexing is done.
• It is the example of Sparse as well as dense indexing

http://www.knowledgegate.in/gate
Q An index is clustered, if (GATE-2013) (1 Marks)
(a) it is on a set of fields that form a candidate key

(b) it is on a set of fields that include the primary key

(c) the data records of the file are organized in the same order as the data entries of the
index

(d) the data records of the file are organized not in the same order as the data entries of
the index

http://www.knowledgegate.in/gate
Q A clustering index is defined on the fields which are of type (GATE-2008) (1
Marks)
a) non-key and ordering

b) non-key and non-ordering

c) key and ordering

d) key and non-ordering

http://www.knowledgegate.in/gate
SECONDARY INDEXING
• Most common scenarios, suppose that we already have a primary indexing on primary key,
but there is frequent query on some other attributes, so we may decide to have one more
index file with some other attribute.

• Main file is ordered according to the attribute on which indexing is done(unordered).

http://www.knowledgegate.in/gate
• Secondary indexing can be done on key or non-key attribute.

• No of entries in the index file is same as the number of entries in the main file.

• It is an example of dense indexing.

http://www.knowledgegate.in/gate
Q Suppose we have ordered file with records stored r = 30,000 on a disk with Block Size B
= 1024 B. File records are of fixed size and are unspanned with record length R = 100 B.
Suppose that ordering key field of file is 9 B long and a block pointer is 6 B long,
Implement Secondary indexing?

http://www.knowledgegate.in/gate
Q A data file consisting of 1,50,000 student-records is stored on a hard disk with block size of 4096 bytes. The data
file is sorted on the primary key RollNo. The size of a record pointer for this disk is 7 bytes. Each student-record has a
candidate key attribute called ANum of size 12 bytes. Suppose an index file with records consisting of two fields,
ANum value and the record pointer the corresponding student record, is built and stored on the same disk. Assume
that the records of data file and index file are not split across disk blocks. The number of blocks in the index file is
________ .(GATE 2021) (1 MARKS)

http://www.knowledgegate.in/gate
MULTILEVEL INDEXING
• Multi-level Index helps in breaking down the index into several smaller indices in order to
make the outermost level so small that it can be saved in a single disk block-0, which can easily
be accommodated anywhere in the main memory.

http://www.knowledgegate.in/gate
Q Consider a file of 16384 records. Each record is 32 bytes long and its key field is of size 6 bytes.
The file is ordered on a non-key field, and the file organization is unpanned. The file is stored in a
file system with block size 1024 bytes, and the size of a block pointer is 10 bytes. If the secondary
index is built on the key field of the file, and a multi-level index scheme is used to store the
secondary index, the number of first-level and second-level blocks in the multi-level index are
respectively (GATE-2008) (1 Marks)
a) 8 and 0

b) 128 and 6

c) 256 and 4

d) 512 and 5

http://www.knowledgegate.in/gate
Reason to have B tree and B+ tree
• After studying indexing in detail now we understand that an index file is always sorted in
nature and will be searched frequently, and sometimes index files can be so large that even
we want to index the index file (Multilevel index), therefore we must search best data
structure to meet our requirements.
• There are number of options in data structure like array, stack, link list, graph, table etc. but
we want a data structure which support frequent insertion deletion, and modify it self
accordingly but at the same time also provide speed search and give us the advantage of
having a sorted data.

http://www.knowledgegate.in/gate
• If we look at the data structures option then tree seem to be the most appropriate but every
kind of tree in the available option have some problems either simple tree or binary search
tree or AVL tree, so we end up on designing new data structure called B-tree which are kind of
specially designed for sorted stored index files in databases.

• In general, with multilevel indexing, we require dynamic structure, b and b+ tree is generalized
implementation of multilevel indexing, which are dynamic in nature, that is increasing and
decreasing number of records. In the first level index file can be easily supported by other
level index.

• B tree and B+ tree also provides efficient search time, as the height of the structure is very less
and they are also perfectly balanced.

http://www.knowledgegate.in/gate
B tree
• A B-tree of order m if non-empty is an m-way search tree in which.
• The root has at least zero child nodes and at most m child nodes.
• The internal nodes except the root have at least celling(m/2) child nodes and at most m
child nodes.
• The number of keys in each internal node is one less than the number of child nodes and
these keys partition the subtrees of the nodes in a manner similar to that of m-way search
tree.
• All leaf nodes are on the same level(perfectly balanced).
Root Internal except Root Leaf
Rules MAX MIN
Rules MAX MIN Rules MAX MIN
CHILD m 0
CHILD m ⌈m/2⌉ CHILD 0 0
DATA m-1 1
DATA m-1 ⌈m/2⌉ - 1 DATA m-1 ⌈m/2⌉ - 1

http://www.knowledgegate.in/gate
Insertion in B-TREE

• A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero). Once the root node is full with
m – 1 search key values and we attempt to insert another entry in the tree, the root node splits into two nodes at
level 1.

• Only the middle value is kept in the root node, and the rest of the values are split evenly between the other two
nodes. When a non-roof node is full and a new entry is inserted into it, that node is split into two nodes at the
same level, and the middle entry is moved to the parent node along with two pointers to the new split nodes.

• If the parent node is full, it is also split. Splitting can propagate all the way to the root node, creating a new level if
the root is split.

http://www.knowledgegate.in/gate
Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 3, 4
insert them into an empty b-tree of order = 3.

http://www.knowledgegate.in/gate
Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 4, 20, 18, 19,
17, 16, 15, 25, 23, 24 insert them into an empty b-tree of order = 5.

http://www.knowledgegate.in/gate
Analysis
• B- TREE- In computer science, a B-tree is a self-balancing tree data structure that maintains
sorted data and allows searches, sequential access, insertions, and deletions in logarithmic
time.
• A search tree of order p is a tree such that each node contains at most p -1 search values and
p pointers in the order <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>, where q <= p. Each Pi is a pointer to a
child node (or a NULL pointer), and each Ki is a search value from some ordered set of values.
All search values are assumed to be unique.

http://www.knowledgegate.in/gate
• Two constraints must hold at all times on the search tree:
• Within each node, K1 < K2 < ... < Kq-1.
• For all values X in the subtree pointed at by Pi, we have Ki-1 < X < Ki for 1 < i <
q; X < Ki for i = 1; and Ki-1 < X for i = q.

http://www.knowledgegate.in/gate
• We can use a search tree as a mechanism to search for records stored in a disk file. The values
in the tree can be the values of one of the fields of the file, called the search field (which is the
same as the index field if a multilevel index guides the search).

• Each key value in the tree is associated with a pointer to the record in the data file having that
value.

• To guarantee that nodes are evenly distributed, so that the depth of the tree is minimized for
the given set of keys and that the tree does not get skewed with some nodes being at very
deep levels.

http://www.knowledgegate.in/gate
• To make the search speed uniform, so that the average time to find any random key is roughly the
same

• While minimizing the number of levels in the tree is one goal, another implicit goal is to make sure that
the index tree does not need too much restructuring as records are inserted into and deleted from the
main file.

• Thus, we want the nodes to be as full as possible and do not want any nodes to be empty if there are
too many deletions. Record deletion may leave some nodes in the tree nearly empty, thus wasting
storage space and increasing the number of levels.

• The B-tree addresses both of these problems by specifying additional constraints on the search tree.

http://www.knowledgegate.in/gate
• The B-tree has additional constraints that ensure that the tree is always balanced and that the
space wasted by deletion, if any, never becomes excessive.

• The algorithms for insertion and deletion, though, become more complex in order to maintain
these constraints. Nonetheless, most insertions and deletions are simple processes; they
become complicated only under special circumstances—namely, whenever we attempt an
insertion into a node that is already full or a deletion from a node that makes it less than half
full.

• More formally, a B-tree of order p, when used as an access structure on a key field to search
for records in a data file, can be defined as follows:

http://www.knowledgegate.in/gate
• Each internal node in the B-tree is of the form
• <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq–1, Prq–1>, Pq> where q <= p. Each Pi is a tree pointer—a pointer to
another node in the B- tree. Each Pri is a data pointer—a pointer to the record whose search key
field value is equal to Ki (or to the data file block containing that record).

• Within each node, K1 < K2 < ... < Kq-1.

• For all search key field values X in the subtree pointed at by Pi (the ith sub- tree), we have: Ki–1 < X <
Ki for 1 < i < q; X < Ki for i = 1; and Ki–1 < X for i = q.

http://www.knowledgegate.in/gate
• Each node has at most p tree pointers.

• Each node, except the root and leaf nodes, has at least ⎡(p/2)⎤ tree pointers. The root node has at least
two tree pointers unless it is the only node in the tree.

• A node with q tree pointers, q <= p, has q – 1 search key field values (and hence has q – 1 data
pointers).

• All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that
all of their tree pointers Pi are NULL.

http://www.knowledgegate.in/gate
Conclusion
• Very less internal fragmentation, memory utilization is very good.
• Less number of nodes(blocks) are used and height is also optimized, so access will be very fast.
• Difficulty of traversing the key sequentially. Means B-TREE do not hold good for range-based
queries of database.

http://www.knowledgegate.in/gate
B+ Tree
Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 3, 4 insert them into an
empty b+ tree of order = 3.

http://www.knowledgegate.in/gate
B+ Tree
Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 4, 20, 18, 19, 17, 16, 15,
25, 23, 24, 22 insert them into an empty b+ tree of order = 5

http://www.knowledgegate.in/gate
Insertion in B+ Tree
• Start from root node and proceed towards leaf using the logic of binary search tree. Value is
inserted in the leaf.

• If overflow condition occurs pick the median and push it into the parent node. Also copy the
median or key inserted in parent node to the left or right child node.

• Repeat this procedure until tree is maintained.

http://www.knowledgegate.in/gate
Analysis
• Most implementations of a dynamic multilevel index use a variation of the B-tree data structure called a B+-tree.
In a B-tree, every value of the search field appears once at some level in the tree, along with a data pointer.

• In a B+-tree, data pointers are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs
from the structure of internal nodes.

• The leaf nodes have an entry for every value of the search field, along with a data pointer to the record (or to the
block that contains this record) if the search field is a key field.

http://www.knowledgegate.in/gate
• The leaf nodes of the B+-tree are usually linked to provide ordered access on the search field to the
records.

• Each internal node is of the form <P1, K1, P2, K2, ..., P q – 1, K q –1, Pq>

• Within each internal node, K1 < K2 < ... < Kq-1.

• For all search field values X in the subtree pointed at by Pi, we have Ki-1 < X <= Ki for 1 < i < q; X <= Ki for
i = 1; and Ki-1 < X for i = q.

http://www.knowledgegate.in/gate
• Each internal node has at most p tree pointers.

• Each internal node, except the root, has at least ⎡(p/2)⎤ tree pointers. The root node has at least two tree pointers if it is an
internal node.

• An internal node with q pointers, q <= p, has q - 1 search field values.

• The structure of the leaf nodes of a B+-tree of order p is as follows:

• Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq–1, Prq–1>, Pnext> where q <= p, each Pri is a data pointer, and Pnext
points to the next leaf node of the B+-tree.

http://www.knowledgegate.in/gate
• Within each leaf node, K1 <= K2 ... , K q-1, q <= p.
• Each Pri is a data pointer that points to the record whose search field value is Ki or to a file block containing the
record (or to a block of record pointers that point to records whose search field value is Ki if the search field is not
a key).
• Each leaf node has at least ⎡(p/2)⎤ values.
• All leaf nodes are at the same level.

http://www.knowledgegate.in/gate
http://www.knowledgegate.in/gate
a) 2 b) 3 c) 4 d) 5
(GATE-2009) (2 Marks)

http://www.knowledgegate.in/gate
Q In a B+ tree, if the search-key value is 8 bytes long, the block size is 512 bytes
and the block pointer is 2 bytes, then the maximum order of the B+ tree is. (GATE-
2017) (2 Marks)

http://www.knowledgegate.in/gate
Q Consider B+ tree in which the search key is 12 bytes long, block size is 1024
bytes, record pointer is 10 bytes long and block pointer is 8 bytes long. The
maximum number of keys that can be accommodated in each non-leaf node of the
tree is (Gate-2015) (2 Marks)

http://www.knowledgegate.in/gate
Q The order of a leaf node in a B+- tree is the maximum number of (value, data record pointer)
pairs it can hold. Given that the block size is 1K bytes, data record pointer is 7 bytes long, the
value field is 9 bytes long and a block pointer is 6 bytes long, what is the order of the leaf node?
(GATE-2007) (1 Marks)

a) 63

b) 64

c) 67

d) 68

http://www.knowledgegate.in/gate
Q Which one of the following statements is NOT correct about the B+ tree data structure used for
creating an index of a relational database table? (GATE-2019) (1 Marks)

(a) Each leaf node has a pointer to the next leaf node

(b) Non-leaf nodes have pointers to data records

(c) B+ Tree is a height-balanced tree

(d) Key values in each node are kept in sorted order

http://www.knowledgegate.in/gate
Q B+ Trees are considered BALANCED because (GATE-2016) (1 Marks)

a) the lengths of the paths from the root to all leaf nodes are all equal

b) the lengths of the paths from the root to all leaf nodes differ from each other by at
most 1

c) the number of children of any two non-leaf sibling nodes differ by at most 1

d) the number of records in any two leaf nodes differ by at most 1

http://www.knowledgegate.in/gate
Q (GATE-2015) (1 Marks)

http://www.knowledgegate.in/gate
B tree
• A B-tree of order m if non-empty is an m-way search tree in which.
• The root has at least zero child nodes and at most m child nodes.
• The internal nodes except the root have at least celling(m/2) child nodes and at most m
child nodes.
• The number of keys in each internal node is one less than the number of child nodes and
these keys partition the subtrees of the nodes in a manner similar to that of m-way search
tree.
• All leaf nodes are on the same level(perfectly balanced).

Root Internal except Root Leaf


Rules MAX MIN Rules MAX MIN Rules MAX MIN
CHILD m 0
CHILD m ⌈m/2⌉ CHILD 0 0
DATA m-1 1
DATA m-1 ⌈m/2⌉ - 1 DATA m-1 ⌈m/2⌉ - 1

http://www.knowledgegate.in/gate
Q Consider the Following B-tree of order m=6, delete the following
nodes H, T, R, E, A, C, S in sequence?

http://www.knowledgegate.in/gate
Deletion in B-TREE

• If the deletion is from the leaf node and leaf node is satisfying the minimal condition even after the
deletion, then delete the value directly.

• If deletion from leaf node renders leaf node in minimal condition, then first search the extra key in left
sibling and then in the right sibling. Largest value from left sibling or smallest value from right sibling is
pushed into the root node and corresponding value can be fetched from parent node to leaf node.

• If the deletion is to be from internal node, then first we check for the extra key in the left and then in
the right child. If we find one, we fetch the value in the required node. And delete the key.

http://www.knowledgegate.in/gate
• If deletion of a value causes a node to be less than half full, it is combined with its neighboring
nodes, and this can also propagate all the way to the root. Hence, deletion can reduce the
number of tree levels.

• It has been shown by analysis and simulation that, after numerous random insertions and
deletions on a B-tree, the nodes are approximately 69 percent full when the number of values
in the tree stabilizes. This is also true of B+ trees.

• If this happens, node splitting and combining will occur only rarely, so insertion and deletion
become quite efficient. If the number of values grows, the tree will expand without a
problem—although splitting of nodes may occur, so some insertions will take more time.

http://www.knowledgegate.in/gate
Q Consider the following elements 5, 8, 1, 7, 3, 12, 9, 6 insert them into
an empty b+ tree of order = 3. and then delete following nodes in
sequence 9, 8, 12?

http://www.knowledgegate.in/gate
Q Consider a B+-tree in which the maximum number of keys in a node is 5. What is
the minimum number of keys in any non-root node? (GATE-2010) (1 Marks)

http://www.knowledgegate.in/gate
Chapter-6
(Relational algebra, tuple calculus, SQL)

http://www.knowledgegate.in/gate
Query Language
• After designing a data base, that is ER diagram followed by conversion in relational model
followed by normalization and indexing, now next task is how to store, retrieve and modify
data in the data base.

• Thought here we will be concentrating more on the retrieval part. Query languages are used
for this purpose.

ER diagram relational model Normalization Indexing

http://www.knowledgegate.in/gate
• Query languages, data query languages or database query
languages (DQLs) are computer languages using which user request some
information from the database. A well known example is the Structured
Query Language (SQL).

http://www.knowledgegate.in/gate
http://www.knowledgegate.in/gate
http://www.knowledgegate.in/gate
• Procedural Query Language
• Here users instruct the system to performs a sequence of operations on the data base in
order to compute the desired result. Means user provides both what data to be retrieved
and how data to be retrieved. e.g. Relational Algebra.

• Non-Procedural Query Language


• In nonprocedural language, the user describes the desired information without giving a
specific procedure for obtaining that information. What data to be retrieved e.g.
Relational Calculus. Tuple relational calculus, Domain relational calculus are declarative
query languages based on mathematical logic

http://www.knowledgegate.in/gate
• Relational Algebra (Procedural) and Relational Calculus (non-procedural) are mathematical
system/ query languages which are used for query on relational model.

• RA and RC are not executed in any computer they provide the fundamental mathematics on
which SQL is based.
• SQL (structured query language) works on RDBMS, and it includes elements of both procedural
or non-procedural query language.

Relational model RDBMS


RA, RC SQL
Algo Code
Conceptual Reality
Theoretical Practical
Chess Battle Field
http://www.knowledgegate.in/gate
RELATIONAL ALGEBRA

• RA like any other mathematical system provides a number of operators and use
relations (tables) as operands and produce a new relation as their result.

3+4

Operand Operator Operand

A⊕B

http://www.knowledgegate.in/gate
• Every operator in the RA accepts (one or two) relation/table as input arguments

*( )
and returns always a single relation instance as the result without a name.

*
http://www.knowledgegate.in/gate
• It also does not consider duplicity by default as it is based on set theory. Same
query is written in RA and SQL the result may be different as SQL considers
duplication.
• As it is pure mathematics no use of English keywords. Operators are represented
using symbols.

http://www.knowledgegate.in/gate
BASIC / FUNDAMENTAL OPERATORS

• The fundamental operations in the relational algebra are select, project, union,
set difference, Cartesian product, and Rename.

Name Symbol
Select
(σ)
Project
(∏)
Union
(∪)
Set difference
(−)
Cross product
(Χ)
Rename
(ρ)
http://www.knowledgegate.in/gate
DERIVED OPERATORS

• There are several other operations namely: set intersection, natural


join, and assignment.

Name Symbol DERIVED FROM


Join (⋈) (Χ)
(−)
Intersection (∩) A∩B=A-(A-B)

Division (÷) (X,-, ∏)


Assignment (=)
http://www.knowledgegate.in/gate
• The select, project, and rename operations are called unary operations, because
they operate on one relation.

• Union, Cartesian product and set difference operate on pairs of relations and
are, therefore, called binary operations.

http://www.knowledgegate.in/gate
The Project Operation (Vertical Selection)
• Main idea behind project operator is to select desired columns.

• The project operation is a unary operation that returns its


argument relation, with certain attributes left out.

• Projection is denoted by the uppercase Greek letter pi (Π).

• Πcolumn_name (table_name)

http://www.knowledgegate.in/gate
• We list those attributes that we wish to appear in the result as a subscript to Π,
argument relation follows in parentheses.

• Minimum number of columns selected can be 1, Maximum selected Columns


can be n - 1.

http://www.knowledgegate.in/gate
• Some points to remember
• Eliminates duplicate rows in a result relation by default.

• ∏A1, A2, An (r), A1, A2 ,…., An refers to the set of attributes to be projected.

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find the name of all customer without
duplication having bank account?

Q Write a RELATIONAL ALGEBRA query to find all the details of bank branches?

http://www.knowledgegate.in/gate
Q Suppose R1(A, B) and R2(C, D) are two relation schemas. Let r1 and r2 be the
corresponding relation instances. B is a foreign key that refers to C in R2. If data
in r1 and r2 satisfy referential integrity constraints, which of the following is ALWAYS
TRUE? (Gate-2012) (2 Marks)
a) ∏B(r1) − ∏C(r2) = ∅ b) ∏C(r2) − ∏B(r1) = ∅

c) ∏B(r1) = ∏C(r2) d) ∏B(r1) − ∏C(r2) ≠ ∅

Roll no(A) Br_code(B) Br_code(C) Br_name(D)


1 101 101 CS
2 101 102 EC
3 101 103 ME
http://www.knowledgegate.in/gate
4 102
The Select Operation (Horizontal Selection)
• The select operation selects tuples that satisfy a given predicate/Condition p.

• Lowercase Greek letter sigma (σ) is used to denote selection.

• It is a unary operator.

• Eliminates only tuples/rows.

• σ condition (table_name)

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find all account_no where balance is less the 1000?

Q Write a RELATIONAL ALGEBRA query to find branch name which is situated in Delhi and having assets
less than 1,00,000?

Q Write a RELATIONAL ALGEBRA query to find branch name and account_no which has balance greater
than equal to 1,000 but less than equal to 10,000?

http://www.knowledgegate.in/gate
• Predicate appears as a subscript to σ, the argument relation is in parentheses after the σ.

• Commutative in Nature, σp2 ^ p1(r)) = σp1 ^ p2(r)) = σp1( σp2(r)) = σp2( σp1(r))

http://www.knowledgegate.in/gate
• Some points to remember
• We allow comparisons using =, ≠, <, >, ≤ and ≥ in the selection predicate.

• Using the connectives and (∧), or (∨), and not (¬), we can combine several predicates into a
larger predicate.

• Minimum number of tuples selected can be 0, Maximum selected tuples can be all.

• Degree (Result relation) = degree (parent relation), where degree refers to no. of
attributes.

• 0 <= cardinality (result relation) <= cardinality (parent relation), where cardinality refers to
no. of tuples.

http://www.knowledgegate.in/gate
Q What is the optimized version of the relation algebra expression
πA1(πA2(σF1(σF2(r)))), where A1, A2 are sets of attributes in r with A1⊂A2 and F1,
F2 are Boolean expressions based on the attributes in r? (Gate-2014) (2 Marks)
a) πA1(σ(F1 ∧ F2)(r)) b) πA1(σ(F1 ∨ F2)(r))

c) πA2(σ(F1 ∧ F2)(r)) d) πA2(σ(F1 ∨ F2)(r))

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find all the customer name
who have a loan or an account or both?

http://www.knowledgegate.in/gate
The Union Operation
• It is a binary operation, denoted, as in set theory, by ∪.

• Written as, Expression1 ∪ Expression2, r ∪ s = {t | t ∈ r or t ∈ s}

• For a union operation r ∪ s to be valid, we require that two conditions hold:

• The relations r and s must be of the same arity. That is, they must have the same number
of attributes, the domains of the ith attribute of r and the ith attribute of s must be the
same, for all i.

• Mainly used to fetch data from different relations.

http://www.knowledgegate.in/gate
The Cartesian-Product Operation
• The Cartesian-product operation, denoted by a cross (×), allows us to combine
information from any two relations.
• It is a binary operator; we write the Cartesian product of relations R1 and R2 as R1 × R2.
• Cartesian-product operation associates every tuple of R1 with every tuple of R2.
• R1 Χ R2 = {rs | r ∈ R1 and s ∈ R2}, contains one tuple <r, s> (concatenation of tuples r
and s) for each pair of tuples r ϵ R1, s ϵ R2.
R1 X R2
A R1.B R2.B C
R1 R2
A B B C
1 P Q X
2 Q R Y
3 R S Z
http://www.knowledgegate.in/gate
• R1 Χ R2 returns a relational instance whose schema contains all the fields of R1 (in
order as they appear in R1) and all fields of R2 (in order as they appear in R2).

• If R1 has m tuples and R2 has n tuples the result will be having = m*n tuples.

• Same attribute name may appear in both R1 and R2, we need to devise a naming
schema to distinguish between these attributes.

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along
with account balance, who have an account in the bank?

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along with loan amount, who have a
loan in the bank?

Q Write a RELATIONAL ALGEBRA query to find all loan_no along with amount and branch_name, which is situated in
Delhi?

Q Write a RELATIONAL ALGEBRA query to find the name of the customer who have an account in the branch
situated in Delhi and balance greater than 1000?

http://www.knowledgegate.in/gate
Rename Operation
• The results of relational algebra are also relations but
without any name.

• The rename operation allows us to rename the output relation.


It is denoted with small Greek letter rho ρ. Where the result
of expression E is saved with name of x.

• ρx(A1, A2, A3, A4,…..AN)(E)

• ρLearner(Student)

• ρLearner(Stu_ID, User_Name, Age)(Student(Roll_No, Name, Age))


http://www.knowledgegate.in/gate
• Deposit = (Branch-name, Account- number, Customer-name, Balance)

• ρPlus_Cust(Name)(π customer-name (σ balance > 10000 (Deposit))

http://www.knowledgegate.in/gate
Important Point

• ρLearner(Student)

• This Query do not change the name of the table in the original data
base, but create a new copy of the table student with name Learner

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find the account_no along with balance
with 8% interest as total amount, with table name as balance sheet?

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find the loan_no with
maximum loan amount?

http://www.knowledgegate.in/gate
Additional/Derived Relational-Algebra Operations
• If we restrict ourselves to just the fundamental operations, certain common
queries are lengthy to express. Therefore, we use additional operations.

• These additional operations do not add any power to the algebra.

• They are used to simplify the queries.

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find all the customer name
who have both a loan and an account?

http://www.knowledgegate.in/gate
Set-Intersection Operation
• We will be using ∩ symbol to denote set intersection.
• r ∩ s = r − (r − s)
• Set intersection is not a fundamental operation and does not add any
power to the relational algebra.
• r ∩ s = {t | t ∈ r and t ∈ s}
• 0 <= Ӏ R∩S Ӏ <= min (ӀRӀ, ӀSӀ)

http://www.knowledgegate.in/gate
Q Let R1 (A, B, C) and R2 (D, E) be two relation schema, where the primary keys are
shown underlined, and let C be a foreign key in R1 referring to R2. Suppose there is
no violation of the above referential integrity constraint in the corresponding
relation instances r1 and r2. Which one of the following relational algebra
expressions would necessarily produce an empty relation? (Gate-2004) (1 Marks)
a) ΠD(r2)−ΠC(r1) b) ΠC(r1)−ΠD(r2)

c) ΠD(r1⋈C≠Dr2) d) ΠC(r1⋈C=Dr2)

http://www.knowledgegate.in/gate
Join Operation
Join

Inner Join Outer Join

Left Outer Right Outer Full/Complete


Theta Join Equi Join Natural Join
Join Join Outer Join

http://www.knowledgegate.in/gate
Theta join
• The theta join/ Conditional join operation is a variant of the Cartesian product
operation that allows us to combine a selection and a Cartesian product into a
single operation.

• The theta join operation is defined as follows:

http://www.knowledgegate.in/gate
• The general form of a Theta Join operation on two relations R(A1, A2, ..., An) and
S(B1, B2, ..., Bm) is
R⨝θS
• The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, ..., An, B1, B2,
... , Bm) in that order; Q has one tuple for each combination of tuples—one from
R and one from S—whenever the combination satisfies the join condition.

http://www.knowledgegate.in/gate
• A general join condition is of the form
• <Condition>AND <Condition> AND...AND <Condition>
• where each <Condition> is of the form Ai θ Bj , Ai is an attribute of R, Bj is an
attribute of S, Ai and Bj have the same domain, and θ (theta) is one of the
comparison operators {=, <, ≤, ≠, >, ≥}.

• A JOIN operation with such a general join condition is called a THETA JOIN.
Tuples whose join attributes are NULL or for which the join condition is FALSE
do not appear in the result.

http://www.knowledgegate.in/gate
Equi Join
• The most common use of JOIN involves join conditions with equality
comparisons only. Such a JOIN, where the only comparison operator used is =, is
called an EQUIJOIN.

http://www.knowledgegate.in/gate
Q Consider the following relations A, B and C:

How many tuples does the result of the following relational algebra expression contain? Assume
that the schema of A∪B is the same as that of A. (Gate-2007) (2 Marks)
(A∪B) ⋈ A.Id > 40 ∨ C.Id < 15 C
a) 7 b) 4 c) 5 d) 9

http://www.knowledgegate.in/gate
The Natural-Join Operation
• The natural join is a binary operation that allows us to combine
certain selections and a Cartesian product into one operation.

• The natural join of r and s, denoted by r ⋈ s

• The standard definition of NATURAL JOIN requires that the two join attributes
(or each pair of join attributes) have the same name in both relations. If this is
not the case, a renaming operation is applied first.

http://www.knowledgegate.in/gate
• The natural join of r and s is a relation on schema R ∪ S
formally defined as follows:

R1
A B
R2 R1 ⋈ R2
B C
1 P Q X
2 Q R Y
3 R S Z

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along with account
balance, who have an account in the bank?

Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along with loan
amount, who have a loan in the bank?

http://www.knowledgegate.in/gate
Q Write a RELATIONAL ALGEBRA query to find all loan_no along with amount and branch_name,
which is situated in Delhi?

Q Write a RELATIONAL ALGEBRA query to find the name of the customer who have an account in
the branch situated in Delhi and balance greater than 1000?

http://www.knowledgegate.in/gate
Notes
• The natural join is associative in nature.

• The natural join is a Lossy operator


R1 R2 R1 ⋈ R2
A B B C
A B C
1 P Q X
2 Q R Y
3 R http://www.knowledgegate.in/gate
S Z
Q The following functional dependencies hold for relations R(A, B, C) and S(B, D, E).
B→A
A→C
The relation R contains 200 tuples and the relation S contains 100 tuples. What is the maximum
number of tuples possible in the natural join R ⋈ S? (Gate-2010) (2 Marks)
a) 100

b) 200

c) 300

d) 2000

http://www.knowledgegate.in/gate
Q Consider the join of a relation R with a relation S. If R has m tuples and S has n
tuples, then the maximum and minimum sizes of the join respectively are: (Gate-
1999) (1 Marks)
(A) m+n and 0

(B) mn and 0

(C) m+n and m-n

(D) mn and m+n

http://www.knowledgegate.in/gate
Outer join Operations
• The outer-join operation is an extension of the join operation to deal with
missing information.

• The outer join operation works in a manner similar to the natural join operation,
but preserves those tuples that would be lost in a join by creating tuples in the
result containing null values.

• We can use the outer-join operation to avoid this loss of information.

http://www.knowledgegate.in/gate
• There are actually three forms of the operation:

• left outer join, denoted ⟕

• right outer join, denoted ⟖

• full outer join, denoted ⟗

http://www.knowledgegate.in/gate
Left Outer Join
• The left outer join ( )takes all tuples in the left relation that did not match with any tuple in
the right relation, pads the tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join.

R1 R2 R1 ⟕ R2
A B B C A B C
1 P Q X
2 Q R Y
3 R S Z

http://www.knowledgegate.in/gate
Q Consider the relations r(A, B) and s(B, C), where s.B is a primary key and r.B is a foreign key
referencing s.B. Consider the query
Q: r ⋈ (σ B<5 (s))
Let LOJ denote the natural left outer-join operation. Assume that r and s contain no null values.
Which of the following is NOT equivalent to Q? (Gate-2018) (2 Marks)
a) σ B<5(r ⋈ s) b) σ B<5(r LOJ s)

c) r LOJ (σ B<5 (s)) d) σ B<5(r) LOJ s

http://www.knowledgegate.in/gate
Right Outer Join
• The left outer join ( )takes all tuples in the left relation that did not match with any tuple in
the right relation, pads the tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join.

R1 R2 R1 ⟖ R2
A B B C A B C
1 P Q X
2 Q R Y
3 R S Z

http://www.knowledgegate.in/gate
Full Outer Join
• The full outer join(⟗ ) does both the left and right outer join operations, padding tuples from
the left relation that did not match any from the right relation, as well as tuples from the right
relation that did not match any from the left relation, and adding them to the result of the
join.

R1 ⟗ R2
R1 R2
A B B C A B C
1 P Q X
2 Q R Y
3 R S Z
http://www.knowledgegate.in/gate
Q Consider two relations R1(A, B) with the tuples (1, 5), (3, 7) and R2(A, C) = (1, 7),
(4, 9). Assume that R(A, B, C) is the full natural outer join of R1 and R2. Consider the
following tuples of the form (A, B, C)
a = (1, 5, null),
b = (1, null, 7),
c = (3, null, 9),
d = (4, 7, null),
e = (1, 5, 7),
f = (3, 7, null),
g = (4, null, 9).
Which one of the following statements is correct? (Gate-2015) (1 Marks)
(A) R contains a, b, e, f, g but not c, d
(B) R contains a, b, c, d, e, f, g
(C) R contains e, f, g but not a, b
(D) R contains e but not f, g
http://www.knowledgegate.in/gate
DIVISION
• In general, the DIVISION operation is applied to two relations R(Z) ÷ S(X), where the attributes of R are a subset
of the attributes of S; that is, X ⊆ Z.
• Z={A,B}
• X={A}
• Y = Z-X = {B}
• This means that, for a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in
combination with every tuple in S.
• Note that in the formulation of the DIVISION operation, the tuples in the denominator relation S restrict the
numerator relation R by selecting those tuples in the result that match all values present in the denominator.

http://www.knowledgegate.in/gate
• The semantics of the division is defined as follows:
• R ÷ S = { t[a1,...,an] : t ∈ R ∧ ∀s ∈ S ( (t[a1,...,an] ∪ s) ∈ R) }
• where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the restriction of t to this set. It is
usually required that the attribute names in the header of S are a subset of those of R because otherwise the
result of the operation will always be empty.

http://www.knowledgegate.in/gate
πStudent(R) − {πStudent[(πStudent(R) × S) − πStudent,Task(R)]}

http://www.knowledgegate.in/gate
πStudent(R) − {πStudent[(πStudent(R) × S) − πStudent,Task(R)]}

T := πStudent(R) × S
U := T − πStudent,Task(R)
V := πStudent(U)
W := πStudent(R) − V

http://www.knowledgegate.in/gate
Q Consider the given table R and S find the number of elements retrieved by the query

http://www.knowledgegate.in/gate
Q Consider a database that has the relation schema CR (StudentName, CourseName). An
instance of the schema CR is as given below.
The following query is made on the database.
T1 ← π CourseName (σ StudentName=SA(CR))
T2 ← CR÷T1

The number of rows in T2 is ______________ . (Gate-2017) (1 Marks)


http://www.knowledgegate.in/gate
Q A relation r(A,B) in a relational database has 1200 tuples. The attribute A has integer
values ranging from 6 to 20, and the attribute B has integer values ranging from 1 to 20.
Assume that the attributes A and B are independently distributed. The estimated number
of tuples in the output of σ(A>10)∨(B=18)(r) is ____________. (GATE 2021)
(a) 820

(b) 1200

(c) 960

(d) 1000

http://www.knowledgegate.in/gate
Q The following relation records the age of 500 employees of a company, where empNo ( indicating the employee
number) is the key:

Consider the following relational algebra expression:

What does the above expression generate? (GATE 2021) (2 MARKS)

(A) Employee numbers of only those employees whose age is the maximum

(B) Employee numbers of only those employees whose age is more than the age of exactly one other employee

(C) Employee numbers of all employees whose age is not the minimum

(D) Employee numbers of all employees whose age is the minimum

http://www.knowledgegate.in/gate
Q Consider the relational schema given below, where eId of the relation dependent is a foreign
key referring to empId of the relation employee. Assume that every employee has at least one
associated dependent in the dependent relation. (Gate-2014) (2 Marks)
employee (empId, empName, empAge)
dependent(depId, eId, depName, depAge)
Consider the following relational algebra query:
ΠempId(employee) − ΠempId(employee⋈(empId=eID)∧(empAge≤depAge)dependent)

The above query evaluates to the set of empIds of employees whose age is greater than that of
(A) some dependent.
(B) all dependents.
(C) some of his/her dependents
(D) all of his/her dependents.

http://www.knowledgegate.in/gate
Q Information about a collection of students is given by the relation studInfo(studId, name, sex).
The relation enroll (studId, courseId) gives which student has enrolled for (or taken) that
course(s). Assume that every course is taken by at least one male and at least one female
student. What does the following relational algebra expression represent? (Gate-2007) (2 Marks)
πcourceId((πstudId(σsex="female“ (studInfo)) × π courseId(enroll)) − enroll)

(A) Courses in which all the female students are enrolled.


(B) Courses in which a proper subset of female students are enrolled.
(C) Courses in which only male students are enrolled.

http://www.knowledgegate.in/gate
(D) None of the above
Q Consider the relation Student (name, sex, marks), where the primary key is shown underlined,
pertaining to students in a class that has at least one boy and one girl. What does the following
relational algebra expression produce? (Note: ρ is the rename operator). (Gate-2004) (2 Marks)
πname{σsex=female(Student)} − πname(Student ⨝ (sex=female ∧ x=male ∧ marks≤m) ρn,x,m(Student))

a) names of girl students with the highest marks


b) names of girl students with more marks than some boy student
c) names of girl students with marks not less than some boy student
d) names of girl students with more marks than all the boy students
http://www.knowledgegate.in/gate
Introduction to SQL
• There are a number of database query languages in use, either commercially or
experimentally, around 50 are used popularly. We will study the most widely used query
language ‘SQL’.

• Structured Query Language is a domain-specific language (not general purpose) used in


programming and design for managing data held in a relational database management
system (RDBMS).

• Although we refer to the SQL language as a “query language,” it can do much more than just
query a database. It can define the structure of the data base, modify data in the database,
specify security constraints and number of other tasks.

• Originally based upon relational algebra(procedural) and tuple relational calculus (Non-
procedural) mathematical model.

http://www.knowledgegate.in/gate
Overview of the SQL Query Language
1. IBM developed the original version of SQL, originally called Sequel (Structured English Query
Language), as part of the System R project in the early 1970s.

2. The Sequel language has evolved since then, and its name has changed to SQL (Structured Query
Language) (some other company has trademark on the word sequel). SQL has clearly established itself
as the standard relational database language.

3. In 1986, the American National Standards Institute (ANSI) and the International Organization for
Standardization (ISO) published an SQL standard, called SQL-86.

4. The next version of the standard was SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008,
SQL:2011, SQL: 2016 and most recently SQL:2023.

http://www.knowledgegate.in/gate
Parts of SQL
• Data-definition language (DDL). The SQL DDL provides commands for defining relation schemas,
deleting relations, and modifying relation schemas.

• Integrity. The SQL DDL includes commands for specifying integrity constraints that the data stored
in the database must satisfy. Updates that violate integrity constraints are disallowed. e.g. not null
which means Primary key value cannot be null.

• View definition. The SQL DDL includes commands for defining views. e.g. sorting result in
ascending or descending order with order by clause.

• Authorization. The SQL DDL includes commands for specifying access rights to relations and views
like read only, read/write. E.g. grant etc.

http://www.knowledgegate.in/gate
• Data-manipulation language (DML). The SQL DML provides the ability to query information
from the database and to insert tuples into, delete tuples from, and modify tuples in the
database. e.g. insert, delete command etc.

• Transaction control. SQL includes commands for specifying the beginning and ending of
transactions. E.g. commit, rollback, savepoint etc.

• Embedded SQL and dynamic SQL. Embedded and dynamic SQL define how SQL statements
can be embedded within general-purpose programming languages, such as C, C++, and Java.

http://www.knowledgegate.in/gate
Q Which of the following is/are correct? (GATE-1999) (1 Marks)
(a) An SQL query automatically eliminates duplicates

(b) An SQL query will not work if there are no indexes on the relations

(c) SQL permits attribute names to be repeated in the same relation

(d) None of the above

http://www.knowledgegate.in/gate
Basic Structure of SQL Queries

• For any SQL as query, input and output both are relations.

• Number of relations inputs to a query will be at least one, but output will always be a single
relation without any name unless specified, but columns will have names from input tables.

http://www.knowledgegate.in/gate
• The basic structure of an SQL query consists of three clauses: select, from, and where.

• The query takes it’s input the relations listed in the from clause, operates on them as
specified in the where and select clauses, and then produces a relation as the result without
any name unless specified.

• A typical SQL query has the form.

Select A1, A2,..., An (Column name)


from r1, r2,... , rm (Relation/table name)
Where P; (Condition)

http://www.knowledgegate.in/gate
Select A1, A2,..., An (Column name)
from r1, r2,... , rm (Relation/table name)
Where P; (Condition)

1. It is to be noted only select and from are mandatory clauses, and if not required then it is not essential to write
where. If the where clause is omitted, the predicate P is true.

2. SQL in general is not case sensitive i.e. it doesn’t matter whether we write query in upper or lower case.

3. In the formal, mathematical definition of the relational model, a relation is a set. Thus, duplicate tuples would
never appear in relations.

4. In practice, duplicate elimination is time-consuming. Therefore, SQL allows duplicates in relations as well as in
the results of SQL expressions. In those cases where we want to force the elimination of duplicates, we insert
the keyword distinct after select, will discuss in detail later.

5. SQL allows us to use the keyword all to specify explicitly that duplicates are not removed, Since duplicate
retention is the default, we shall not use all in our examples
http://www.knowledgegate.in/gate
Select Clause
• The function of Select clause in SQL is more or less same as that of ‘∏’ projection in the
relational algebra. It is used to pick the column required in result of the query out of all the
columns in relation/table. (Vertical filtering)

• Select A1, A2,..., An (Column name)

• The order in which columns are represented in the select clause, will be same the column will
appear in the relation.

• It is to be noted that in Relational Algebra ‘∏’ projection is not mandatory and should be used
only when we require less than all the column available in the table, but is SQL even if we
need all the column we must use Select, the argument is it makes SQL query more readable.

http://www.knowledgegate.in/gate
• But to make things easy we can use ‘*’ to specify that we need all
columns
Select *
• The select clause may also contain arithmetic expressions involving
the operators +, -, / and * operating on constants or attributes of
tuples. however, that it does not result in any change to the
relation/table.

http://www.knowledgegate.in/gate
Q Write a SQL query to find all the details of bank branches?

http://www.knowledgegate.in/gate
Q Write a SQL query to find each loan number along with loan amount?

http://www.knowledgegate.in/gate
Q Write a SQL query to find the name of all customer without duplication having bank account?

http://www.knowledgegate.in/gate
Q Write a SQL query to find all account_no and balance with 6% yearly interest added to it?

http://www.knowledgegate.in/gate
Q Select operation in SQL is equivalent to (Gate-2015) (1 Marks)

(A) the selection operation in relational algebra

(B) the selection operation in relational algebra, except that select in SQL retains
duplicates

(C) the projection operation in relational algebra, except that select in SQL retains
duplicates

(D) the projection operation in relational algebra

http://www.knowledgegate.in/gate
Select Clause with where clause
1. Where clause in SQL is same as ‘σ’ sigma of relational algebra where we specify the
conditions/Predicate (horizontal filtering)

2. Where clause can have expressions involving the comparison operators <, <, >, >=, <= and <>. SQL
allows us to use the comparison operators to compare strings and arithmetic expressions.

3. SQL allows the use of the logical connectives and, or, and not in the where clause.

4. SQL includes a between comparison operator to simplify where clauses that specify that a value be
less than or equal to some value and greater than or equal to some other value.

5. Similarly, we can use the not between comparison operator.

http://www.knowledgegate.in/gate
Q Write a SQL query to find all account_no where balance is less the 1000?

http://www.knowledgegate.in/gate
Q Write a SQL query to find branch name which is situated in Delhi and having
assets less than 1,00,000?

http://www.knowledgegate.in/gate
Q Write a SQL query to find branch name and account_no which has balance
greater than equal to 1,000 but less than equal to 10,000?

http://www.knowledgegate.in/gate
Q Consider a database table T containing two columns X and Y each of type integer. After the creation of the table,
one record (X=1, Y=1) is inserted in the table. Let MX and My denote the respective maximum values of X and Y
among all records in the table at any point in time. Using MX and MY, new records are inserted in the table 128
times with X and Y values being MX+1, 2*MY+1 respectively. It may be noted that each time after the insertion,
values of MX and MY change. What will be the output of the following SQL query after the steps mentioned above
are carried out? (Gate-2011) (2 Marks)
SELECT Y FROM T WHERE X=7;
(A) 127 (B) 255 (C) 129 (D) 257

http://www.knowledgegate.in/gate
Set Operation

1. The SQL operations union, intersect, and except/minus operate on relations and corresponds to
the mathematical set-theory operations ∪, ∩ and – respectively.

2. The union operation automatically eliminates duplicates, unlike the select clause, If we want to
retain all duplicates, we must write union all in place of union.

3. The intersect operation automatically eliminates duplicates. If we want to retain all duplicates,
we must write intersect all in place of intersect.

4. If we want to retain duplicates, we must write except all in place of except.

http://www.knowledgegate.in/gate
Q Write a SQL query to find all the customer name
a) who have a loan or an account or both ?
b) who have both a loan and an account?
c) who have a loan but do not have an account?

http://www.knowledgegate.in/gate
Queries on Multiple Relations
• So far, our example queries were on a single relation. Queries often need to access
information from multiple relations.
• The from clause by itself defines a Cartesian product of the relations listed in the clause.
• Cartesian product of two relations, which concatenates each tuple of the first relation with
every tuple of the second

http://www.knowledgegate.in/gate
• Since the same attribute name may appear in both r1 and r 2, we prefix the name of the
relation from which the attribute originally came, before the attribute name.

• For those attributes that appear in only one of the two schemas, we shall usually drop the
relation-name prefix. This simplification does not lead to any ambiguity.

• Cartesian Product is commutative in nature

R1 * R2
R1 R2
A B B C
1 P Q X
2 Q R Y
3 R S Z

http://www.knowledgegate.in/gate
Q Write a SQL query to find the name of all the customers with account
balance, who have an account in the bank?

http://www.knowledgegate.in/gate
Q Write a SQL query to find the name of all the customers, who have a loan in the
bank of less than 1000 or loan from north_delhi branch?

http://www.knowledgegate.in/gate
Q Write a SQL query to find the name of the customer who have an account in the
branch situated in Delhi?

http://www.knowledgegate.in/gate
Q Consider the set of relations shown below and the SQL query that follows (Gate-2003) (2 Marks)
Students: (Roll_number, Name, Date_of_birth)

Courses: (Course number, Course_name, Instructor)

Grades: (Roll_number, Course_number, Grade)

select distinct Name


from Students, Courses, Grades
where Students. Roll_number = Grades.Roll_number
and Courses.Instructor = Korth
and Courses.Course_number = Grades.Course_number
and Grades.grade = A

Which of the following sets is computed by the above query?


(A) Names of students who have got an A grade in all courses taught by Korth

(B) Names of students who have got an A grade in all courses

(C) Names of students who have got an A grade in at least one of the
courses taught by Korth
http://www.knowledgegate.in/gate
(D) None of the above
Natural Join
1. To make the life of an SQL programmer easier for this common case, SQL supports an
operation called the natural join.

2. The natural join operation like cartesian product operates on two relations and produces a
relation as the result.

3. Natural join considers only those pairs of tuples with the same value on those attributes that
appear in the schemas of both relations.

http://www.knowledgegate.in/gate
1. Notice that we do not repeat those attributes that appear in the schemas of both relations;
rather they appear only once.

2. Notice also the order in which the attributes are listed: first the attributes common to the
schemas of both relations, second those attributes unique to the schema of the first relation,
and finally, those attributes unique to the schema of the second relation.

3. commutative in nature

http://www.knowledgegate.in/gate
A from clause in an SQL query can have multiple relations combined using natural join, as shown here:

select A1, A2,..., An


from r1 natural join r2 natural join .. . natural join rm
where P;

http://www.knowledgegate.in/gate
R1 R2 R1 * R2
A B B C A R1.B R2.B C
1 P Q X 1 P Q X
2 Q R Y 1 P R Y
3 R S Z 1 P S Z
2 Q Q X
2 Q R Y
2 Q S Z
3 R Q X
R1 ⋈ R2 3
3
R
R
R
S
Y
Z
A B C

http://www.knowledgegate.in/gate
Q Write a SQL query to find the name of all the customers along with account
balance, who have an account in the bank?

http://www.knowledgegate.in/gate
Q Write a SQL query to find the name of the customer who have an account in the branch
situated in Delhi?

http://www.knowledgegate.in/gate
Q Database table by name Loan_Records is given below. (Gate - 2011) (2 Marks)
Borrower Bank_Manager Loan_Amount
Ramesh Sunderajan 10000.00
Suresh Ramgopal 5000.00
Mahesh Sunderajan 7000.00

What is the output of the following SQL query?

SELECT Count(*)
FROM ( (SELECT Borrower, Bank_Manager FROM Loan_Records) AS S
NATURAL JOIN
(SELECT Bank_Manager, Loan_Amount FROM Loan_Records) AS T );
(A) 3
(B) 9
(C) 5
(D) 6
http://www.knowledgegate.in/gate
Q Consider the following relation schema pertaining to a students database:
Student (rollno, name, address)
Enroll (rollno, courseno, coursename)
where the primary keys are shown underlined. The number of tuples in the Student and Enroll
tables are 120 and 8 respectively. What are the maximum and minimum number of tuples that
can be present in (Student * Enroll), where ‘*’ denotes natural join? (Gate-2004) (2 Marks)
(A) 8, 8

(B) 120, 8

(C) 960, 8

(D) 960, 120

http://www.knowledgegate.in/gate
Q Consider the following SQL query (GATE-2003) (1 Marks)
select distinct a1, a2,........., an
from r1, r2,........, rm
where P
For an arbitrary predicate P, this query is equivalent to
which of the following relational algebra expressions ?

http://www.knowledgegate.in/gate
Join /inner join
1. Join or inner join is a operation which works same as cartesian product but when it is used
with “using” keyword it provides additional functionality.

2. Join with using – Provides ability to explicitly chose columns which must be used by join for
comparison and removal of redundant tuples, i.e. if there are more than one column which
are common between two table but we do not want each of them to be considered, then we
can use join with using.

http://www.knowledgegate.in/gate
1. So join is a form of the natural join construct that allows you to specify exactly which columns
should be equated.
• R1(A, B, C)
• R2(B, C, D)

2. Consider the operation R1 join R2 using ( B). The operation is similar to R1 natural join R2,
except that a pair of tuples t1 from R1 and t2 from R2 match if t1. B = t2. B; even if R1 and R2
both have an attribute named C, it is not required that t1. C = t2. C.

http://www.knowledgegate.in/gate
Outer Join
1. The join operations we studied earlier that do not preserve nonmatched tuples are called inner join operations,
to distinguish them from the outer-join operations.

2. The problem with natural join or join is only those values that appears in both relations will manage to reach
final table, but if some value is explicitly in table one or in second table then that information will be lost, and
that will be loss of information.

3. The outer join operation works in a manner similar to the join operations we have already studied, but preserve
those tuples that would be lost in a join, by creating tuples in the result containing null values.

http://www.knowledgegate.in/gate
• There are in fact three forms of outer join:
1. The left outer join (left join) preserves tuples only in the relation named before (to the
left of) the left outer join operation.
2. The right outer join (right join) preserves tuples only in the relation named after (to the
right of) the right outer join operation.
3. The full outer join preserves tuples in both relations.

http://www.knowledgegate.in/gate
R1 R2 R1 * R2 R1 ⋈ R2
A B B C A R1.B R2.B C A B C
1 P Q X 1 P Q X 2 Q X
2 Q R Y 1 P R Y 3 R Y
3 R S Z 1 P S Z
2 Q Q X
2 Q R Y R1 ⟗ R2
2 Q S Z A B C
3 R Q X
3 R R Y
3 R S Z

R1 ⟕ R2 R1 ⟖R2
A B C A B C

http://www.knowledgegate.in/gate
Q Consider the following two tables and four queries in SQL. (GATE - 2018) (1 Marks)
Book (isbn, bname) Query 1: SELECT B.isbn, S.copies Query 2: SELECT B.isbn, S.copies
FROM Book B INNER JOIN Stock S FROM Book B LEFT OUTER JOIN Stock S
Stock (isbn, copies) ON B.isbn = S.isbn; ON B.isbn = S.isbn;

Query 3: SELECT B.isbn, S.copies Query 4: SELECT B.isbn, S.copies


FROM Book B RIGHT OUTER JOIN Stock S FROM Book B FULL OUTER JOIN Stock S
ON B.isbn = S.isbn; ON B.isbn = S.isbn;

Which one of the queries above is certain to have an output that is a superset of the outputs of
the other three queries?
(a) Query 1
(b) Query 2
(c) Query 3
(d) Query 4

http://www.knowledgegate.in/gate
Alias Operation/rename
1. SQL aliases are used to give a table, or a column in a table, a temporary name. Just create a
new copy but do not change anything in the data base.

2. Aliases are often used to make column names more readable.

3. It uses the as clause, taking the form: old-name as new-name. The as clause can appear in
both the select and from clauses.

4. An alias only exists for the duration of the query.

http://www.knowledgegate.in/gate
Q Write a SQL query to find the account_no along and balance with 8% interest, as
Account, total_balance?

http://www.knowledgegate.in/gate
Q Write a SQL query to find the loan_no with maximum loan amount?

http://www.knowledgegate.in/gate
Aggregate Functions
• Aggregate functions are functions that take a collection (a set or multiset) of values as input
and return a single value. SQL offers five built-in aggregate functions:
• Average: avg
• Minimum: min
• Maximum: max
• Total: sum
• Count: count

http://www.knowledgegate.in/gate
1. The input to sum and avg must be a collection of numbers, but the other operators can
operate on collections of nonnumeric data types, such as strings, as well.

2. We use the aggregate function count frequently to count the number of tuples in a relation.
The notation for this function in SQL is count (*).

3. Count is the only aggregate function which can work with null, all other aggregate functions
simply ignore null.

http://www.knowledgegate.in/gate
Q Write a SQL query to find the number of accounts in the bank?

http://www.knowledgegate.in/gate
Q Write a SQL query to find the average balance of every account in the banks
from south_delhi branch?

http://www.knowledgegate.in/gate
Q Consider a table along with two query?

Select avg (balance)


from account
Account_no balance Branch_name
Abc123 100 N_delhi
Pqr123 500 S_mumbai
Wyz123 null S_delhi

Select sum(balance)/count(balance)
from account

http://www.knowledgegate.in/gate
Ordering the Display of Tuples
• SQL offers the user some control over the order in which tuples in a relation are displayed.
The order by clause causes the tuples in the result of a query to appear in sorted order.

http://www.knowledgegate.in/gate
Q Write a SQL query to find all the branch_name which are situated in Delhi in alphabetic order?

http://www.knowledgegate.in/gate
String Operations
1. SQL specifies strings by enclosing them in single quotes, for example, ’Computer’.

2. The SQL standard specifies that the equality operation on strings is case sensitive; as a result the expression
‘Computer’ = ’computer’ evaluates to false.

3. However, some database systems, such as MySQL and SQL Server, do not distinguish uppercase from lowercase
when matching strings; as a result, would evaluate to true on these databases.

4. This default behavior can, however, be changed, either at the database level or at the level of specific
attributes.

http://www.knowledgegate.in/gate
1. SQL also permits a variety of functions on character strings, such as concatenating, extracting
substrings, finding the length of strings, converting strings to uppercase and lowercase,
removing spaces at the end of the string and so on.

2. There are variations on the exact set of string functions supported by different database
systems.

3. A single quote character that is part of a string can be specified by using two single quote
characters; for example, the string It’s right can be specified by It”s right.

http://www.knowledgegate.in/gate
• Pattern matching can be performed on strings, using the operator like. We describe patterns
by using two special characters:
• Percent (%): The % character matches any substring.
• Underscore (_): The _ character matches any character.

http://www.knowledgegate.in/gate
• ’%Comp%’ matches any string containing “Comp” as a substring, for example, ’Intro to
Computer Science’, and ’Computational Biology’.

• ’_ _ _’ matches any string of exactly three characters.

• ’_ _ _%’ matches any string of at least three characters.

http://www.knowledgegate.in/gate
Q Write a SQL query to find all the branch name who have exactly 5 character in their name ?

http://www.knowledgegate.in/gate
Q Write a SQL query to find all the customer name who have ‘kumar’ in their name ?

http://www.knowledgegate.in/gate
1. For patterns to include the special pattern characters (that is, % and_), SQL allows the
specification of an escape character.

2. The escape character is used immediately before a special pattern character to indicate that
the special pattern character is to be treated like a normal character.

3. We define the escape character for a like comparison using the escape keyword. To illustrate,
consider the following patterns, which use a backslash (\) as the escape character:

4. like ’ab\%cd%’ escape ’\’ matches all strings beginning with “ab%cd”.

5. like ’ab\\cd%’ escape ’\’ matches all strings beginning with “ab\cd”.

http://www.knowledgegate.in/gate
1. SQL allows us to search for mismatches instead of matches by using the not like comparison
operator. Some databases provide variants of the like operation which do not distinguish
lower and upper case.

2. SQL:1999 also offers a similar to operation, which provides more powerful pattern matching
than the like operation; the syntax for specifying patterns is similar to that used in Unix
regular expressions.

http://www.knowledgegate.in/gate
Group by clause
1. There are circumstances where we would like to work on a set of tuples in a relation rather
than working on the whole table as one unit.

2. The attribute or attributes given in the group by clause are used to form groups. Tuples with
the same value on all attributes in the group by clause are placed in one group.

http://www.knowledgegate.in/gate
Q Write a SQL query to find the average account balance of each branch?

http://www.knowledgegate.in/gate
Q Write a SQL query to find the branch name of Gwalior city with average balance more than
1500?

http://www.knowledgegate.in/gate
Q A relational database contains two tables Student and Performance as shown below: (GATE - 2019) (2 Marks)

The primary key of the Student table is Roll_no. For the Performance table, the columns Roll_no. and
Subject_code together from the primary key. Consider the SQL query given below:

SELECT S.Student_name, sum (P.Marks)


FROM Student S, Performance P
WHERE P.Marks > 84
GROUP BY S.Student_name;
The number of rows returned by the above SQL query is _____.

http://www.knowledgegate.in/gate
Q Consider a database that has the relation schema EMP
(EmpId, EmpName, and DeptName). An instance of the
schema EMP and a SQL query on it are given below.
(GATE-2018) (2 Marks)

The output of executing the SQL query is ___________

http://www.knowledgegate.in/gate
Q The employee information in a company is stored in the relation
Employee (name, sex, salary, deptName)
Consider the following SQL query

select deptName
from Employee
where sex = 'M'
group by deptName
having avg (salary) > (select avg (salary) from Employee)

It returns the names of the department in which (Gate-2004) (2 Marks) (NET-JUNE-2013)


(A) the average salary is more than the average salary in the company
(B) the average salary of male employees is more than the average salary of all male employees in the company
(C) the average salary of male employees is more than the average salary of employees in the same department
(D) the average salary of male employees is more than the average salary in the company

http://www.knowledgegate.in/gate
• The SQL WITH clause was introduced by Oracle in the Oracle 9i release 2 database. The SQL
WITH clause allows you to give a sub-query block a name (a process also called sub-query
refactoring), which can be referenced in several places within the main SQL query.

• The clause is used for defining a temporary relation such that the output of this temporary
relation is available and is used by the query that is associated with the WITH clause.

• WITH clause is not supported by all database system.

http://www.knowledgegate.in/gate
Q Consider the following database table named water_schemes :
The number of tuples returned by the following SQL query is
_________. GATE(2015 SET 1)

with total(name, capacity) as


select district_name, sum(capacity)
from water_schemes
group by district_name
with total_avg(capacity) as
select avg(capacity)
from total
select name
from total, total_avg
where total.capacity ≥ total_avg.capacity

http://www.knowledgegate.in/gate
1. When an SQL query uses grouping, it is important to ensure that the only attributes that appear
in the select statement without being aggregated are those that are present in the group by
clause.

2. In other words, any attribute that is not present in the group by clause must appear only inside
an aggregate function if it appears in the select clause, otherwise the query is treated as
erroneous.

3. The Having Clause - At times, it is useful to state a condition that applies to groups rather than to
tuples.

4. To express such a query, we use the having clause of SQL. SQL applies predicates in the having
clause after groups have been formed.

5. Any attribute that is present in the having clause without being aggregated must appear in the
group by clause, otherwise the query is treated as erroneous.

http://www.knowledgegate.in/gate
• The meaning of a query containing aggregation, group by, or having clauses is defined by the
following sequence of operations:
1. As was the case for queries without aggregation, the from clause is first evaluated to get a
relation.
2. If a where clause is present, the predicate in the where clause is applied on the result relation of
the from clause.
3. Tuples satisfying the where predicate are then placed into groups by the group by clause if it is
present. If the group by clause is absent, the entire set of tuples satisfying the where predicate is
treated as being in one group.
4. The having clause, if it is present, is applied to each group; the groups that do not satisfy the
having clause predicate are removed.
5. The select clause uses the remaining groups to generate tuples of the result of the query,
applying the aggregate functions to get a single result tuple for each group.

http://www.knowledgegate.in/gate
Q Which of the following statements are TRUE about an SQL query? (Gate-2012) (2 Marks)
P : An SQL query can contain a HAVING clause even if it does not have a GROUP BY clause
Q : An SQL query can contain a HAVING clause only if it has a GROUP BY clause
R : All attributes used in the GROUP BY clause must appear in the SELECT clause
S : Not all attributes used in the GROUP BY clause need to appear in the SELECT clause
(A) P and R (B) P and S (C) Q and R (D) Q and S

http://www.knowledgegate.in/gate
Q Write a SQL query to find all the customer name who have a loan and account?

http://www.knowledgegate.in/gate
Q Write a SQL query to find all the customer name who have a loan but do not have an account?

http://www.knowledgegate.in/gate
Q Write a SQL query to find the largest loan amount in the bank?

http://www.knowledgegate.in/gate
Q Write a SQL query to find loan amount which is not smallest?

http://www.knowledgegate.in/gate
Q Write a SQL query to find account no, which has balance greater than every
account of north Delhi branch?

http://www.knowledgegate.in/gate
Q Consider the following relations A, B, C. How many tuples does the result of the following
relational algebra expression contain? Assume that the schema of A U B is the same as that of A.
(Gate-2012) (2 Marks)

SELECT A.id
FROM A
WHERE A.age > ALL (SELECT B.age
FROM B
WHERE B. name = "arun")
(A) 4 (B) 3
(C) 0 (D) 1

http://www.knowledgegate.in/gate
Q The relation book (title, price) contains the titles and prices of different books. Assuming that no two
books have the same price, what does the following SQL query list? (Gate - 2005) (2 Marks) [Asked in TCS
NQT 2020 ]

select title
from book as B
where (select count(*)
from book as T
where T.price > B.price) < 5

(A) Titles of the four most expensive books

(B) Title of the fifth most inexpensive book

(C) Title of the fifth most expensive book

(D) Titles of the five most expensive books

http://www.knowledgegate.in/gate
Q Consider the following relation (GATE-2015) (2 Marks)
Cinema (theater, address, capacity)
Which of the following options will be needed at the end of the SQL query

SELECT P1.address FROM Cinema P1

Such that it always finds the addresses of theaters with maximum capacity?

(A) WHERE P1. Capacity> = All (select P2. Capacity from Cinema P2)

(B) WHERE P1. Capacity> = Any (select P2. Capacity from Cinema P2)

(C) WHERE P1. Capacity> All (select max(P2. Capacity) from Cinema P2)

(D) WHERE P1. Capacity> Any (select max (P2. Capacity) from Cinema P2)

http://www.knowledgegate.in/gate
Q Given the following schema (GATE-2014) (2 Marks)
employees(emp-id, first-name, last-name, hire-date, dept-id, salary)
departments(dept-id, dept-name, manager-id, location-id)
You want to display the last names and hire dates of all latest hires in their respective departments in the location ID 1700. You
issue the following query:
SELECT last-name, hire-date
FROM employees
WHERE (dept-id, hire-date) IN (SELECT dept-id, MAX(hire-date)
FROM employees JOIN departments USING(dept-id)
WHERE location-id = 1700
GROUP BY dept-id);
What is the outcome?
(A) It executes but does not give the correct result.
(B) It executes and gives the correct result.
(C) It generates an error because of pairwise comparison.
(D) It generates an error because the GROUP BY clause cannot be used with table joins in a subquery

http://www.knowledgegate.in/gate
Test for Empty Relations

Select branch_name
from branch as a
Where assets <= 1,00,000 and
exists(Select *
from Branch as b
where branch_city = ‘Gwalior’
and a.branch_name = b.branch_name)

• SQL includes a feature for testing whether a subquery has any tuples in its result. The exists construct returns the value true if
the argument subquery is nonempty.
• The above query also illustrates a feature of SQL where a correlation name from an outer query, can be used in a subquery in
the where clause. A subquery that uses a correlation name from an outer query is called a correlated subquery.

http://www.knowledgegate.in/gate
Q Consider the following relational schema (Gate-2014) (2 Marks)
employee(empId, empName, empDept)
customer(custId, custName, salesRepId, rating)
salesRepId is a foreign key referring to empId of the employee relation. Assume that each employee makes a sale to
at least one customer. What does the following query return?
SELECT empName
FROM employee E
WHERE NOT EXISTS (SELECT custId
FROM customer C
WHERE C.salesRepId = E.EmpId AND C.rating <> ’GOOD’);
(a) Names of all the employees with at least one of their customers having a ‘GOOD’ rating.
(B) Names of all the employees with at most one of their customers having a ‘GOOD’ rating.
(C) Names of all the employees with none of their customers having a ‘GOOD’ rating.
(D) Names of all the employees with all their customers having a ‘GOOD’ rating.

http://www.knowledgegate.in/gate
Q A relational schema for a train reservation database is given below.
Passenger (pid, pname, age)
Reservation (pid, class, tid)
What pids are returned by the following SQL query for the above instance of the tables? (Gate-2010) (2 Marks)
SLECT pid
FROM Reservation
WHERE class ‘AC’ AND EXISTS (SELECT *
FROM Passenger
WHERE age > 65 AND Passenger. pid = Reservation.pid)
(A) 1, 0 (B) 1, 2 (C) 1, 3 (D) 1, 5

Reservation
Passenger
pid class tid
pid pname age
0 AC 8200
0 Sachin 65
1 AC 8201
1 Rahul 66
2 SC 8201
2 Sourav 67
5 AC 8203
3 Anil 69
1 SC 8204
3 AC 8202

http://www.knowledgegate.in/gate
Q Consider the following Employee table (Gate-2015) (2 Marks)
How many rows are there in the result of following query?
SELECT E.ID
FROM Employee E
WHERE EXISTS (SELECT E2.salary
FROM Employee E2
WHERE E2.DeptName = 'CS’ AND E.salary > E2.salary)
(A) 0 (B) 4 (C) 5 (D) 6

ID salary DeptName
1 10000 EC
2 40000 EC
3 30000 CS
4 40000 ME
5 50000 ME
6 60000 ME
7 70000 CS

http://www.knowledgegate.in/gate
null
http://www.knowledgegate.in/gate
Q In SQL, relations can contain null values, and comparisons with null values are
treated as unknown. Suppose all comparisons with a null value are treated as false.
Which of the following pairs is not equivalent? (Gate-2000) (2 Marks)
(A) x = 5 AND not(not(x = 5))

(B) x = 5 AND x> 4 and x < 6, where x is an integer

(C) x < 5 AND not (x = 5)

(D) None of the above

http://www.knowledgegate.in/gate
Relational Calculus
• Relational calculus is non-procedural query language, where we have to define what to get and not how to get it

Relational
Calculus

Tuple Domain
relational relational
calculus calculus

http://www.knowledgegate.in/gate
Tuple Relational Calculus

• TRC is based on specifying a number of tuple variables. Each tuple variable usually range over a particular
database relation, meaning that the variable may takes as its value from any individual tuple of a relation.

• A simple tuple relational calculus query is of the form.

{t| Condition(t)}

• Where t is a tuple variable and condition(t) is a conditional expression involving. The result of such a query is the
set of all tuple t that satisfy condition (t).

http://www.knowledgegate.in/gate
• We use t[A] or t.A to denote the value of tuple t on attribute A.

• we use t ∈ r or r(t) to denote that tuple t is in relation r.

http://www.knowledgegate.in/gate
Student(Roll No, Name, Branch)
Q Find the details of all computer science students?

SQL: select * from student where branch = CSE


RA: {σ branch = CSE (Student)}
TRC:
DRC:

http://www.knowledgegate.in/gate
Student(Roll No, Name, Branch)
Q Find the details of all computer science students?
SQL: select * from student where branch = CSE
RA: {σ branch = CSE (Student)}
TRC: {t | Student(t) ∩ t. branch = CSE}
DRC:

http://www.knowledgegate.in/gate
Student(Roll No, Name, Branch)
Q Find the Roll No of all computer science students?
SQL: select Roll No from student where branch = CSE
RA: {∏sname (σ branch = CSE (Student))}
TRC:
DRC:

http://www.knowledgegate.in/gate
Student(Roll No, Name, Branch)
Q Find the Roll No of all computer science students?
SQL: select Roll No from student where branch = CSE
RA: {∏roll no (σ branch = CSE (Student))}
TRC: {t. Roll No | Student(t) ∩ t. branch = CSE}
DRC:

http://www.knowledgegate.in/gate
Formal Definition
• A tuple-relational-calculus expression is of the form: {t | P(t)} where P is a formula.

• Several tuple variables may appear in a formula.

• A tuple variable is said to be a free variable unless it is quantified by a ∃ or ∀.

• A tuple variable is said to be a bounded variable if it is quantified by ∃ or ∀.

http://www.knowledgegate.in/gate
• P(t) may have various conditions logically combined with OR (∨), AND (∧), NOT(¬).

• It also uses quantifiers:


• ∃ t ∈ r (Q(t)) = ”there exists” a tuple in t in relation r such that predicate Q(t) is true.

• ∀ t ∈ r (Q(t)) = Q(t) is true “for all” tuples in relation r.

http://www.knowledgegate.in/gate
• A tuple-relational-calculus formula is built up out of atoms. An atom has one of the following
forms:

• s ∈ r, where s is a tuple variable and r is a relation.

• s[x] op u[y], where s and u are tuple variables, x is an attribute on which s is defined, y is
an attribute on which u is defined, and op is a comparison operator (<, ≤, =, =, >, ≥); we
require that attributes x and y have domains whose members can be compared by .

• s[x] op c, where s is a tuple variable, x is an attribute on which s is defined, op is a


comparison operator, and c is a constant in the domain of attribute x.

http://www.knowledgegate.in/gate
• We build up formulae from atoms by using the following rules:

o An atom is a formula.

o If P1 is a formula, then so are ¬P1 and (P1).

o If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.

o If P1(s) is a formula containing a free tuple variable s, and r is a relation,


o then ∃ s ∈ r (P1(s)) and ∀ s ∈ r (P1(s))

http://www.knowledgegate.in/gate
In the tuple relational calculus, equivalences include the following three rules:
1. P1 ∧ P2 is equivalent to ¬ (¬ (P1) ∨ ¬ (P2)).
2. ∀ t ∈ r (P1(t)) is equivalent to ¬ ∃ t ∈ r (¬P1(t)).
3. P1 ⇒ P2 is equivalent to ¬(P1) ∨ P2.

http://www.knowledgegate.in/gate
{t| t ∈ loan ∧ t[amount] > 1200] }

http://www.knowledgegate.in/gate
Q Find all the details of loan for amount over 1200?
{t| t ∈ loan ∧ t[amount] > 1200] }

http://www.knowledgegate.in/gate
{t| ∃s ∈ loan (t[loan number] = s[loan number]) ∧ s[amount] > 1200] }

http://www.knowledgegate.in/gate
Q Find the loan number for each loan of amount over 1200?
{t| ∃s ∈ loan (t[loan number] = s[loan number]) ∧ s[amount] > 1200] }

http://www.knowledgegate.in/gate
{t| ∃s ∈ borrower (t[customer name] = s[customer name])
∧ ∃u ∈ loan (u[customer name] = s[customer name])
∧ u[branch name] = ‘Noida’ }

http://www.knowledgegate.in/gate
Q Find the name of all the customers who have a loan from Noida branch?
{t| ∃s ∈ borrower (t[customer name] = s[customer name])
∧ ∃u ∈ loan (u[customer name] = s[customer name])
∧ u[branch name] = ‘Noida’ }

http://www.knowledgegate.in/gate
{t| ∃s ∈ borrower (t[customer name] = s[customer name])
V ∃u ∈ depositor (t[customer name] = u[customer name]) }

http://www.knowledgegate.in/gate
Q Find the name of all the customers who have a loan or account or both at the bank?
{t| ∃s ∈ borrower (t[customer name] = s[customer name])
V ∃u ∈ depositor (t[customer name] = u[customer name]) }

http://www.knowledgegate.in/gate
{t| ∃s ∈ borrower (t[customer name] = s[customer name])
∧ ∃u ∈ depositor (t[customer name] = u[customer name]) }

http://www.knowledgegate.in/gate
Q Find the name of all the customers who have a loan and account both at the bank?
{t| ∃s ∈ borrower (t[customer name] = s[customer name])
∧ ∃u ∈ depositor (t[customer name] = u[customer name]) }

http://www.knowledgegate.in/gate
{t| ∃u ∈ depositor borrower (t[customer name] = u[customer name])
∧ ¬∃s ∈ borrower (t[customer name] = s[customer name]) }

http://www.knowledgegate.in/gate
Q Find the name of all the customers who have a loan from the bank and do not
have a account?
{t| ∃u ∈ depositor borrower (t[customer name] = u[customer name])
∧ ¬∃s ∈ borrower (t[customer name] = s[customer name]) }

http://www.knowledgegate.in/gate
Expressive Power of Languages
Important Points to Remember

• The tuple relational calculus restricted to safe expressions is equivalent in expressive power to the basic relational algebra
(with the operators ∪, −, ×, and , but without the extended relational operations such as generalized projection and
aggregation (G)).

• For every relational-algebra expression using only the basic operations, there is an equivalent expression in the tuple relational
calculus, and for every tuple-relational-calculus expression, there is an equivalent relational algebra expression.

• Tuple relational calculus does not have any equivalent of the aggregate operation, but it can be extended to support
aggregation. Extending the tuple relational calculus to handle arithmetic expressions is straightforward.

http://www.knowledgegate.in/gate
Safety of Expressions
• A tuple-relational-calculus expression may generate an infinite relation.
Example: {t |¬ (t ∈ instructor )}
There are infinitely many tuples that are not in instructor. Most of these tuples contain values
that do not even appear in the database. We do not want to allow such expressions.
• To help us define a restriction of the tuple relational calculus the concept of the domain of a
tuple relational formula, P is introduced.
• The domain of P, denoted dom(P), is the set of all values referenced by P.
• They include values mentioned in P itself, as well as values that appear in a tuple of a relation
mentioned in P.

http://www.knowledgegate.in/gate
Example: dom (t ∈ instructor ∧ t[salary] > 80000) is the set containing 80000 as
well as the set of all values appearing in any attribute of any tuple in the instructor
relation.

• An expression {t | P(t)} is safe if all values that appear in the result are values
from dom(P).
• The expression {t |¬ (t ∈ instructor)} is not safe.
• Safe expressions are guaranteed to have finite results.

http://www.knowledgegate.in/gate
Q Which of the rational calculus expression is not safe? (Gate-2001) (2 Marks)
a) {t ∣ ∃u ∈ R1(t[A] = u[A]) ∧ ¬∃s ∈ R2 (t[A]=s[A])}
b) {t ∣ ∀u ∈R1(u[A]="x" ⇒ ∃s ∈ R2(t[A] = s[A] ∧ s[A]=u[A]))}
c) {t∣¬(t ∈ R1)}
d){t ∣ ∃u ∈ R1(t[A]=u[A]) ∧ ∃s ∈ R2(t[A]=s[A])}

http://www.knowledgegate.in/gate
Domain Relational Calculus
• Domain calculus differs from the tuple calculus in the type of variables used in formulas: rather than having
variables range over tuples.

• The variable range over single values from domains of attributes. To form a relation of degree n for a query
result, we must have n of these domain variables. One for each attribute.

• An expression of the domain calculus is of the form


• (x1, x2, …, xn | COND(x1, x2, …, xn, xn+1, xn+2, …, x n+m ))
• Where x1, x2, …, xn are domain variables that range over domains and COND is a condition of the domain
relational calculus.

http://www.knowledgegate.in/gate
Student(Roll No, Name, Branch)
Q Find the details of all computer science students?
SQL: select * from student where branch = CSE
RA: {σ branch = CSE (Student)}
TRC: {t | Student(t) ∩ t. branch = CSE}
DRC:

http://www.knowledgegate.in/gate
Student(Roll No, Name, Branch)
Q Find the details of all computer science students?
SQL: select * from student where branch = CSE
RA: {σ branch = CSE (Student)}
TRC: {t | Student(t) ∩ t. branch = CSE}
DRC: {(Roll No, Name, Branch) | Student (Roll no, Name, Branch) ∩ branch = CSE}

http://www.knowledgegate.in/gate
Student(Roll No, Name, Branch)
Q Find the Roll No of all computer science students?
SQL: select Roll No from student where branch = CSE
RA: {∏sname (σ branch = CSE (Student))}
TRC: {t. Roll No | Student(t) ∩ t. branch = CSE}
DRC:

http://www.knowledgegate.in/gate
Student(Roll No, Name, Branch)
Q Find the Roll No of all computer science students?
SQL: select Roll No from student where branch = CSE
RA: {∏sname (σ branch = CSE (Student))}
TRC: {t. Roll No | Student(t) ∩ t. branch = CSE}
DRC: {(Roll No) | Student (Roll no, Name, Branch) (Name, Branch) ∩ branch = CSE}

http://www.knowledgegate.in/gate
Instructor(instructorID, name, dept name, and salary)

{< i, n, d,s > | < i, n, d,s > ∈ instructor ∧ s > 80000}

http://www.knowledgegate.in/gate
Instructor(instructorID, name, dept name, and salary)
Example: Find all details of instructors whose salary is greater than $80,000

{< i, n, d,s > | < i, n, d,s > ∈ instructor ∧ s > 80000}

http://www.knowledgegate.in/gate
The Domain Relational Calculus
• Uses domain variables that take on values from an attributes domain, rather than values for
an entire tuple.

• Closely related to the tuple relational calculus.

• Domain relational calculus serves as the theoretical basis of the widely used QBE language

http://www.knowledgegate.in/gate
Formal Definition
• An expression in the domain relational calculus is of the form
{< x1, x2,..., xn > | P(x1, x2,..., xn)} where x1, x2,..., xn represent domain variables. P
represents a formula composed of atoms.

http://www.knowledgegate.in/gate
• An atom in the domain relational calculus has one of the following forms:
o < x1, x2,..., xn > ∈ r, where r is a relation on n attributes and x1, x2,..., xn are
domain variables or domain constants.

o x op y, where x and y are domain variables and op is a comparison operator


(<, >, ≥). We require that attributes x and y have domains that can be
compared by .

o x op c, where x is a domain variable, op is a comparison operator, and c is a


constant in the domain of the attribute for which x is a domain variable.

http://www.knowledgegate.in/gate
We build up formulae from atoms by using the following rules:
• An atom is a formula.
• If P1 is a formula, then so are ¬P1 and (P1).
• If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
• If P1(x) is a formula in x, where x is a free domain variable, then ∃ x (P1(x)) and ∀ x (P1(x))
are also formulae.
As a notational shorthand, we write ∃ a, b, c (P(a, b, c)) for ∃ a (∃ b (∃ c (P(a, b, c)))).

http://www.knowledgegate.in/gate
{ < l, b, a > | < l, b, a > ∈ loan ∧ a > 1200] }

http://www.knowledgegate.in/gate
Q Find the branch name, loan number and amount for loan of amount over 1200?
{ < l, b, a > | < l, b, a > ∈ loan ∧ a > 1200] }

http://www.knowledgegate.in/gate
{ < l > | ∃b, a < l, b, a > ∈ loan ∧ a > 1200] }

http://www.knowledgegate.in/gate
Q Find the loan number for each loan of amount over 1200?
{ < l > | ∃b, a < l, b, a > ∈ loan ∧ a > 1200] }

http://www.knowledgegate.in/gate
{ < c, a > | ∃l (< c, l > ∈ borrower ∧ ∃b (< l, b, a > ∈ loan ∧ b = ‘Noida’)) }

http://www.knowledgegate.in/gate
Q Find the name of all the customers who have a loan from Noida branch with loan amount?
{ < c, a > | ∃l (< c, l > ∈ borrower ∧ ∃b (< l, b, a > ∈ loan ∧ b = ‘Noida’ ))}

http://www.knowledgegate.in/gate
{ < c > | ∃l (< c, l > ∈ borrower ∧ ∃b, a (< l, b, a > ∈ loan ∧ b = ‘Noida’))
V ∃a (< c, a > ∈ depositor ∧ ∃bn, bal (< a, bn, bal > ∈ account ∧ bn = ‘Noida’))}

http://www.knowledgegate.in/gate
Q Find the name of all the customers who have a loan or account or both at the bank?
{ < c > | ∃l (< c, l > ∈ borrower ∧ ∃b, a (< l, b, a > ∈ loan ∧ b = ‘Noida’))
V ∃a (< c, a > ∈ depositor ∧ ∃b, a (< a, bn, bal > ∈ account ∧ bn = ‘Noida’))}

http://www.knowledgegate.in/gate
Safety of Expressions
• Similar to TRC we can have an unsafe expression that may generate infinite relations.
• An expression such as
{< i, n, d,s > | ¬ (< i, n, d,s > ∈ instructor )}
is unsafe, because it allows values in the result that are not in the domain of the expression.

http://www.knowledgegate.in/gate
• We say that an expression {< x1, x2,..., xn > | P (x1, x2,..., xn)} is safe if all of the following hold:

1. All values that appear in tuples of the expression are values from dom(P).

2. For every “there exists” sub formula of the form ∃ x (P1(x)), the sub formula is true if and only
if there is a value x in dom(P1) such that P1(x) is true.

3. For every “for all” sub formula of the form ∀x (P1(x)), the sub formula is true if and only if
P1(x) is true for all values x from dom(P1).

http://www.knowledgegate.in/gate
Expressive Power of Languages
• When the domain relational calculus is restricted to safe expressions, it is equivalent in expressive power to the tuple
relational calculus restricted to safe expressions.

• All three of the following are equivalent:


o The basic relational algebra (without the extended relational-algebra operations)
o The tuple relational calculus restricted to safe expressions
o The domain relational calculus restricted to safe expressions

• Domain relational calculus also does not have any equivalent of the aggregate operation, but it can be extended to support
aggregation, and extending it to handle arithmetic expressions is straightforward.

http://www.knowledgegate.in/gate
Important difference between Tuple Relational Calculus and Domain Relational Calculus.
• In the tuple calculus, when we write ∃ s for some tuple variable s, we bind it immediately to a
relation by writing ∃ s ∈ r.
• However, when we write ∃ n in the domain calculus, n refers not to a tuple, but rather to a
domain value. Thus, the domain of variable n is unconstrained until the sub formula < i, n, d,s
> ∈ instructor constrains n to instructor names that appear in the instructor relation.

http://www.knowledgegate.in/gate
Q Consider the following relational schema. (Gate-2013) (2 Marks)
Students (rollno: integer, sname: string)
Courses (courseno: integer, cname: string)
Registration (rollno: integer, courseno: integer, percent: real)
Which of the following queries are equivalent to this query in English?
“Find the distinct names of all students who score more than 90% in the course numbered 107”

i) SELECT DISTINCT S. sname FROM Students as S, Registration as R WHERE


R. rollno = S. rollno AND R. courseno = 107 AND R. percent > 90
ii) ∏sname (σcourseno = 107 ∧ percent > 90 (Registration ⋈ Students))
iii) {T∣∃S ∈ Students, ∃R ∈ Registration(S.rollno=R.rollno
∧R.courseno=107∧R.percent>90∧T.sname=S.sname)}
iv) {⟨SN⟩∣∃SR∃RP(⟨SR,SN⟩∈Students∧⟨SR,107,RP⟩∈Registration∧RP>90)}
a) I, II, III and IV b) I, II and III only
c) I, II and IV only d) II, III and IV only

http://www.knowledgegate.in/gate
GATE (2008)

(A) I only (B) II only


(C) III only (D) III and IV only

http://www.knowledgegate.in/gate
Q Consider the relation employee (name, sex, supervisor Name) with name as the key.
supervisor Name gives the name of the supervisor of the employee under consideration. What
does the following Tuple Relational Calculus query produce? (Gate-2007) (2 Marks)

(A) Names of employees with a male supervisor.


(B) Names of employees with no immediate male subordinates.
(C) Names of employees with no immediate female subordinates.
(D) Names of employees with a female supervisor.
http://www.knowledgegate.in/gate
Q With regard to the expressive power of the formal relational query languages,
which of the following statements is true? (CS-2002)
(A) Relational algebra is more powerful than relational calculus
(B) Relational algebra has the same power as relational calculus
(C) Relational algebra has the same power as safe relational calculus
(D) None of the above

http://www.knowledgegate.in/gate

You might also like