Module-2
RDBMS
Relational DBMS
Relational model can represent as a table with columns and rows. Each
row is known as a tuple. Each table of the column has a name or attribute.
Terminologies:
• Tuple: Each row of a relation is known as tuple. e.g.; STUDENT relation
  given below has 4 tuples.
• Attribute: It contains the name of a column in a particular table. Each
  attribute Ai must have a domain, dom (Ai)
• Domain: The possible values an attribute can take in a relation is called
  its domain. For Example, domain of STUD_AGE can be from 18 to 40.
• Relational instance: In the relational database system, the relational
  instance is represented by a finite set of tuples. Relation instances do
  not have duplicate tuples.
• Relational schema: A relational schema contains the name of the
  relation and name of all columns or attributes.
• Relational key: In the relational key, each row has one or more
  attributes. It can identify the row in the relation uniquely.
EF Codd’s Rules
• Rule 0: Foundation rule
   • RDBMS should be able to manage the stored data in its entirety
     through its relational capabilities.
• Rule 1: Information Rule
   • The data stored in a database, may it be user data or metadata, must
     be a value of some table cell.
• Rule 2: Guaranteed Access Rule
   • Every single data element (value) is guaranteed to be accessible
     logically with a combination of table-name, primary-key (row value),
     and attribute-name (column value).
• Rule 3: Systematic Treatment of NULL Values
   • The NULL values in a database must be given a systematic and
     uniform treatment. This is a very important rule because a NULL can
     be interpreted as one the following − data is missing, data is not
     known, or data is not applicable.
• Rule 4: Active Online Catalog
   • The structure description of the entire database must be stored in an
      online catalog, known as data dictionary, which can be accessed by
      authorized users.
• Rule 5: Comprehensive Data Sub-Language Rule
   • A database can only be accessed using a language having linear syntax
      that supports data definition, data manipulation, and transaction
      management operations.
• Rule 6: View Updating Rule
   • All the views of a database, which can theoretically be updated, must
      also be updatable by the system.
• Rule 7: High-Level Insert, Update, and Delete Rule
   • A database must support high-level insertion, updation, and deletion.
• Rule 8: Physical Data Independence
   • The data stored in a database must be independent of the applications
      that access the database. Any change in the physical structure of a
      database must not have any impact on how the data is being accessed
      by external applications.
• Rule 9: Logical Data Independence
   • The logical data in a database must be independent of its user’s
     view (application). Any change in logical data must not affect the
     applications using it.
• Rule 10: Integrity Independence
   • All its integrity constraints can be independently modified without
     the need of any change in the application. This rule makes a
     database independent of the front-end application and its interface.
• Rule 11: Distribution Independence
   • The end-user must not be able to see that the data is distributed
     over various locations.
• Rule 12: Non-Subversion Rule
   • If a system has an interface that provides access to low-level
     records, then the interface must not be able to subvert the system
     and bypass security and integrity constraints.
Constraints
Every relation has some conditions that must hold for it to be a
valid relation. These conditions are called Relational Integrity
Constraints. There are three main integrity constraints −
        • Key constraints
        • Domain constraints
        • Referential integrity constraints
Key Constraints or Entity Constraints
• The minimal subset of attributes is called key for a relation
  which can identify a tuple uniquely.
• Key constraints force that −
    • in a relation with a key attribute, no two tuples can have
      identical values for key attributes.
    • a key attribute can not have NULL values.
Domain Constraints
  • Attributes have specific values in real-world scenario.
  • For example, age can only be a positive integer. The same
    constraints have been tried to employ on the attributes of a
    relation.
  • Every attribute is bound to have a specific range of values.
  • For example, age cannot be less than zero and telephone
    numbers cannot contain a digit outside 0-9.
Referential integrity Constraints
   • Referential integrity constraints work on the concept of
     Foreign Keys. A foreign key is a key attribute of a relation
     that can be referred in other relation.
   • Referential integrity constraint states that if a relation refers
     to a key attribute of a different or same relation, then that
     key element must exist.
REFERENCE TABLE
Conversion from ER Diagram to Relational Schema
Relational Algebra
• Relational algebra is a procedural query language, which takes
  instances of relations as input and yields instances of relations as
  output.
• It uses operators to perform queries. An operator can be
  either unary or binary.
• The fundamental operations of relational algebra are as follows
     • Select (σ)
     • Project (∏)
     • Union (∪)
     • Intersection(∩)
     • Set different (−)
     • Cartesian product (Χ)
     • Rename (ρ)
     • Join
Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
• Notation − σp(R)
• Where σ stands for selection predicate and R stands for
  relation. p is prepositional logic formula which may use
  connectors like and, or, and not. These terms may use relational
  operators like − =, ≠, ≥, < , >, ≤.
• For example −        σ              (Books)
                        subject = "database"
   • Output − Selects tuples from books where subject is
     'database'.    σsubject = "database" and price = "450"(Books)
   • Output − Selects tuples from books where subject is
     'database' and 'price' is 450.
Project Operation (∏)
It projects column(s) that satisfy a given predicate.
                       Notation − ∏A1, A2, An (R)
Where A1, A2 , An are attribute names of relation R.
Duplicate rows are automatically eliminated, as relation is a set.
For example −
                       ∏subject, author (Books)
Output:-Selects and projects columns named as subject and
author from the relation Books.
Sequences of Operations and the
RENAME Operation
• For example, to retrieve the first name, last name,
  and salary of all employees who work in
  department number 5.
Union Operation (∪)
• It includes all tuples that are in tables A or in B. It also
  eliminates duplicate tuples. So, set A UNION set B would be
  expressed as:
• Notation<- A ∪ B
• For a union operation to be valid, the following conditions
  must hold -
    • R and S must be the same number of attributes.
    • Attribute domains need to be compatible.
    • Duplicate tuples should be automatically removed.
• Example         ∏      (Books) ∪ ∏        (Articles)
                   author           author
• Output − Projects the names of the authors who have either
  written a book or an article or both.
Intersection(∩)
• An intersection is defined by the symbol ∩
• Notation:- A ∩ B
• Defines a relation consisting of a set of all tuple that are in
  both A and B. However, A and B must be union-compatible.
• Example:- Find all the authors who written books and
  articles
               ∏ author (Books) ∩ ∏ author (Articles)
• Output − Projects the names of the authors who have
  written book and article.
Set Difference (−)
• The result of set difference operation is tuples, which are
  present in one relation but are not in the second relation.
• Notation :- R − S
• Finds all the tuples that are present in R but not in S.
               ∏ author (Books) − ∏ author (Articles)
• Output − Provides the name of authors who have
  written books but not articles.
Cartesian Product (Χ)
• Combines information of two different relations into one.
• Notation :− A Χ B.
• If A has m tuples and and B has n tuples, cross
  product of A and B will have m X n tuples.
Rename Operation (ρ)
• The results of relational algebra are also relations but
  without any name. The rename operation allows us to
  rename the output relation. 'rename' operation is denoted
  with small Greek letter rho ρ.
• Notation − ρ (X,E)
• Where the result of expression E is saved with name of X.
• Example:-
                     ρ(STUDENT1, STUDENT)
• Output:- STUDENT table is renamed as STUDENT1
Join Operations
• Join is a combination of a Cartesian product followed by a
  selection process.
• A Join operation pairs two tuples from different relations, if
  and only if a given join condition is satisfied.
• Types of Join Operation
  • Conditional Join(⋈c)
  •   Equijoin(⋈)
  •   Natural Join(⋈)
  •   Left Outer Join(⟕)
  •   Right Outer Join(⟖)
  •   Full Outer Join(⟗)
                                                                   20
Conditional Join(⋈c)
• Conditional Join is used when you want to join two or more
  relations based on some conditions.
• Example: Retrieve the name of the manager of each
  department.
In terms of basic operators (cross product and selection) :
            σ (MGRSSN=SSN)(DEPARTMENT×EMPLOYEE)
  Equijoin(⋈)
• Equijoin is a special case of conditional join where only the
  equality condition holds between a pair of attributes.
• As values of two attributes will be equal in the result of
  equijoin, only one attribute will appear in the result.
• Example: Retrieve the name of the manager of each
  department.
       DEPARTMENT⋈ MGRSSN=SSN EMPLOYEE
Natural Join(⋈)
• It is a special case of equijoin in which the equality condition
  holds on all attributes which have the same name in
  relations R and S (relations on which join operation is
  applied).
• While applying natural join on two relations, there is no
  need to write equality conditions explicitly.
• Example: Select students whose ROLL_NO is equal to
  ROLL_NO of STUDENT_SPORTS as:
                       STUDENT⋈STUDENT_SPORTS
Analysis of Inner join
• Theta Join, Equijoin, and Natural Join are called inner joins.
• An inner join includes only those tuples with matching
  attributes and the rest are discarded in the resulting relation.
                                                                     26
Outer Joins
• we need to use outer joins to include all the tuples from the
  participating relations in the resulting relation.
• There are three kinds of outer joins −
• left outer join,
• right outer join,
• full outer join.
                                                                  27
Left Outer Join(R                S)
• All the tuples from the Left relation, R, are included in the
  resulting relation.
• If there are tuples in R without any matching tuple in the
  Right relation S, then the S-attributes of the resulting relation
  are made NULL.
                                                                      28
Right Outer Join: ( R Right Outer Join S )
• All the tuples from the Right relation, S, are included in the
  resulting relation.
• If there are tuples in S without any matching tuple in R, then
  the R-attributes of resulting relation are made NULL.
           Course (Left)                  HOD (Right)
     A           B                 C            D
     100         Database          100          Alex
     101         Mechanics         102          Maya
     102         Electronics       104          Mira
                                                                   29
Full Outer Join: ( R Full Outer Join S)
• All the tuples from both participating relations are included in
  the resulting relation.
• If there are no matching tuples for both relations, their
  respective unmatched attributes are made NULL.
         Course (Left)                      HOD (Right)
        A                B             C                  D
       100           Database         100             Alex
       101          Mechanics         102             Maya
       102          Electronics       104             Mira
                                                                     30
DIVISION Operation
• The DIVISION operation is applied to two relations R(Z) ÷ S(X), where
  the attributes of R are a subset of the attributes of S; that is, X ⊆ Z
• Syntax:-
                          Result(Attribute) ← R ÷ S
• Example :- Retrieve the names of employees who work on all the
  projects that ‘John Smith’ works on.
      R1 ← σFname=‘John’ AND Lname=‘Smith’(EMPLOYEE)
      R2 ← πPno(WORKS_ON Essn=Ssn R1)
      R3 ← πEssn, Pno(WORKS_ON)
      R4(Ssn) ← R3 ÷ R2
      RESULT ← πFname, Lname(R4 * EMPLOYEE)
R3   R2
     R4
Generalized Projection
• The generalized projection operation extends the projection
  operation by allowing functions of attributes to be included in the
  projection list.
                    Syntax:       ∏ F1,F2,….,Fn (R)
• where F1, F2, ..., Fn are functions over the attributes in relation R and
  may involve arithmetic operations and constant values
• This operation is helpful when developing reports where computed
  values have to be produced in the columns of a query result
Example
• As an example, consider the relation
• EMPLOYEE (Ssn, Salary, Deduction, Years_service)
• A report may be required to show
• Net Salary = Salary – Deduction,
• Bonus = 2000 * Years_service, and
• Tax = 0.25 * Salary.
• Then a generalized projection combined with renaming may be used
  as follows:
REPORT ← ρ(Ssn, Net_salary, Bonus, Tax) (∏ssn   , Salary – Deduction, 2000 * Years_service,
0.25 * Salary
                (EMPLOYEE)).
Aggregate Functions and Grouping
• The aggregate functions are used to perform calculations on a set of
  tuples (rows) from a relation (table) and return a single value.
• These functions are crucial for summarizing data, such as calculating
  totals, averages, minimum and maximum values, etc.
• Aggregate functions in relational algebra include:
   •   SUM: Adds up all the values in a specified attribute.
   •   COUNT: Counts the number of tuples in a relation.
   •   AVG: Calculates the average of values in a specified attribute.
   •   MIN: Returns the minimum value of a specified attribute.
   •   MAX: Returns the maximum value of a specified attribute.
• Syntax: <group attributes>ℑ <function list> (R)
• where <group attributes> are the list of attributes of the relation
  specified in R,
• and <function list> is a list of (<function> <attribute> ) pairs.
• Example: Retrieve each department number, the number of
  employees in the department, and their average salary, while
  renaming the resulting attributes.
ρR(Dno, No_of_employees, Average_sal) (Dno ℑ COUNT (Ssn), AVERAGE (Salary) (EMPLOYEE))
• Retrieve the name and address of all employees who work
  for the 'Research' department.
• For every project located in 'Stafford', list the project
  number, the controlling department number, and the
  department manager's last name, address, and birth date.
• Retrieve the names of employees who have no dependents.
• Retrieve the names of all employees in department 5 who
  work more than 10 hours per week on the 'ProductX'
  project.
• List the names of all employees who have a dependent with
  the same first name as themselves.
• List the last names of all department managers who have no
  dependents.
Relational Calculus
 Relational calculus is a non-procedural query language.
 In the non-procedural query language, the user is concerned
  with the details of how to obtain the end results.
 The relational calculus tells what to do but never explains
  how to do.
 Types of Relational calculus
                                                                42
Comparison Chart
BASIS FOR
COMPARISON   RELATIONAL ALGEBRA               RELATIONAL CALCULUS
Basic        Relational Algebra is a          Relational Calculus is
             Procedural language.             Declarative language.
States       Relational Algebra states how to Relational Calculus states
             obtain the result.               what result we have to
                                              obtain.
Order        Relational Algebra describes the Relational Calculus does not
             order in which operations have specify the order of
             to be performed.                 operations.
Domain       Relational Algebra is not domain Relation Calculus can be
             dependent.                       domain dependent.
Related      It is close to a programming     It is close to the natural
             language.                        language.
                                                                             43
Tuple Relational Calculus
The Tuple Relational Calculus list the tuples selected from a relation,
based on a certain condition provided. It is formally denoted as:
                                            { t | P(t) }
Where t is the set of tuples from which the condition P is true.
where t = resulting tuples,
P(t) = known as Predicate and these are the conditions that are used to
         fetch t
Thus, it generates a set of all tuples t, such that Predicate P(t) is true for
t.
P(t) may have various conditions logically combined with OR (∨), AND
(∧), NOT(¬).
It also uses quantifiers:
∃ t ∈ r (Q(t)) = ”there exists” a tuple in t in relation r such that
predicate                    Q(t) is true.
∀ t ∈ r (Q(t)) = Q(t) is true “for all” tuples in relation r.
                                                                                 44
Build TRC expression case
 Ex- let t is a tuple variable assigns to EMP relation as follows:
                 t → EMP( eno, ename, age, sal)
Q1. list the Details of the employees who are getting a salary
above 10,000.
        { t| EMP(t) ∧ t.sal > 10,000 } // list all attributes
Q1. list the ename and age of the employees who are getting
salary above 10,000.
           { t.ename, t.age | EMP(t) ∧ t.sal > 10,000 }
                                                                      45
Some more
Q1. list the age of the employee whose eno is 102.
                  {t.age | EMP(t) ∧ t.eno = 102 }
Q2. list the ename, age, and sal of the employees who are getting
salary below 10,000 and above 20,000.
{ t.ename, t.age, t.sal | EMP(t) ∧ t.sal < 10,000 ∨ t.sal > 20,000}
For practice:
1. list the employee name and age of the employees whose age is
    below 35 or above 50 and sal is above 30000.
2. List the eno and sal of the employees who are getting salary
    between 20,000 to 50000.
                                                                      46
Domain Relational Calculus
• The Domain Relational Calculus list the attributes to be selected
  from a relation, based on certain condition.
• The formal definition of Domain Relational Calculus is as follows:
               {<X1, X2, X3, . . . Xn> | P(X1, X2, X3, . . . Xn)}
• Where X1, X2, X3, . . . Xn are the attributes and P is the certain
  condition.
• Domain relational calculus uses the same operators as tuple
  calculus. It uses logical connectives ∧ (and), ∨ (or), and ┓ (not).
• It uses Existential (∃) and Universal Quantifiers (∀) to bind the
  variable.
                                                                        47
Build DRC expression case
EMP(eno, ename, age, sal)
For individual domain one variable need to be
assigned.
P = eno < variable p ranges over the domain eno>
q =ename
r = age
s = sal
Q1. Build the DRC expression to find the age of Mr. X
[r|(∃p)(∃q)(∃r)(∃s) { EMP(pqrs) ∧ q= ‘Mr. X’} ]
                                                        48
More …
Q2. List the ename of the employees who are getting salary 10000 and
above.
[q|(∃p)(∃q)(∃r)(∃s) { EMP(pqrs) ∧ s ≥ 10000} ]
Q3. List the employee name and age of the employees who age is
below 35 or above 50 and sal is above 30000.
[q, r|(∃p)(∃q)(∃r)(∃s) { EMP(pqrs) ∧ r < 35 ∨ r > 50 ∧ s ≥ 30000 } ]
For Practice:
1. List the employee id and ename of the employees who age is 35
    or above 50 and sal is below 30000.
2. List the eno and sal of the employees who are getting salary
    between 20,000 to 50000.
                                                                       49
Queries
• Queries-1: Find the tuples of loans where amount is
  greater than or equal to 10000.
• Queries-2: Find the loan number for each loan of an
  amount greater or equal to 10000.
• Queries-3: Find the names of all customers who have a
  loan and an account at the bank.
• Queries-4: Find the names of all customers having a loan
  at the “ABC” branch.
Tuple Calculus
• {t| t ∈ loan ∧ t.amount >=10000}
• {t| ∃ s ∈ loan(t. loan number = s.loan number ∧ s.amount
  >=10000)}
• {t | ∃ s ∈ borrower( t. customer-name = s. customer-name)
  ∧ ∃ u ∈ depositor( t. customer-name = u. customer-name)}
• {t | ∃ s ∈ borrower(t. customer-name = s. customer-name ∧
  ∃ u ∈ loan(u. branch-name = “ABC” ∧ u. loan-number =
  s.loan-number))}
• Domain Calculus
• { p, q, r | (∃p)(∃q)(∃r) {loan(pqr) ∧ r >=10000}}
• { p | (∃p)(∃q)(∃r) {loan(pqr) ∧ r >=10000 }}
• {x | (∃x)(∃y) (∃p)(∃q) { depositor(xy) ∧ borrower(pq) ∧ x=p}}
Queries
1. Retrieve the names of all employees in department 5 who work more than 10 hours
    per week on the 'ProductX' project.
2. List the names of all employees who have a dependent with the same first name as
    themselves.
3. Find the names of all employees who are directly supervised by 'Franklin Wong'.
4. For each project, list the project name and the total hours per week (by all
    employees) spent on that project.
5. Retrieve the names of all employees who work on every project.
6. Retrieve the names of all employees who do not work on any project.
7. For each department, retrieve the department name and the average salary of all
    employees working in that department.
8. Retrieve the average salary of all female employees.
9. Find the names and addresses of all employees who work on at least one project
    located in Houston but whose department has no location in Houston.
10. List the last names of all department managers who have no dependents.
Normalization
Problems Without Normalization
• Since, databases contains a lot of data in the form of tables, it
  is very difficult or almost impossible to manage data if any of
  the anomaly occur i.e. either Insertion, Deletion or Updation.
  Hence, removal of redundant data is very necessary.
• Insertion, Updation and Deletion Anomalies are very frequent
  if database is not normalized.
• Inconsistency – redundancy---anomalies--extra memory space
                                                                      55
Example of Anomalies
Sid   Sname   cid   Cname   Fid   Fname   Salary
1     Ram     C1    Dbms    F1    John    50,000
2     Ravi    C2    Java    F2    Bob     70,000
3     Nitin   C1    Dbms    F1    John    50,000
4     Amit    C1    Dbms    F1    John    50,000
Normalizations on Relational Database
 Normalization is a database design technique which
  organizes tables in a manner that reduces redundancy and
  dependency of data.
 It divides larger tables to smaller tables and links them using
  relationships.
 The inventor of the relational model Edgar Codd proposed
  the theory of normalization with the introduction of First
  Normal Form, and he continued to extend theory with
  Second and Third Normal Form. Later he joined with
  Raymond F. Boyce to develop the theory of Boyce-Codd
  Normal Form(BCNF).
                                                                    57
Evolution of Normalization theories
 Theory of Data Normalization in SQL is still being developed
  further.
 For example, there are discussions even on 6th Normal
  Form.
 However, in most practical applications, normalization
  achieves its best in 3rd Normal Form.
 Normalization is carried out in practice so that the resulting
  designs are of high quality and meet the desirable
  properties
                                                                   58
First Normal Form(1NF)
• The First Normal Form(1NF) works on the concept of “Atomicity”
  in values of every individual tuple of tables present in the
  database.
• It means, a relation is said to be in "1NF" if, every attribute in a
  relation is has “Single Valued” tuple.
Functional Dependency (FD)
 FD is a set of constraints between two attributes in a
  relation.
 A relationship which only exists when an attribute can
  determine other attribute functionally.
• Functional Dependency in DBMS is denoted using an
  arrow between two or more attributes such as FD : A
  →B
• Here, A & B are the attributes present in any relation.
• “A → B” means, “B” is functionally dependent upon “A”
  or “A” functionally determines “B”. Functional
  dependency acts as a constraint between set of
  attributes present in any database.
                                                            60
Example-1 : Consider a table student_details containing details of
some students.
Example : student_details Table
We can conclude from Roll_No attribute in the table, we are able
to determine the Name of student uniquely and same is the case
with marks too. Hence, we can say that Name and Marks are
functionally dependent on Roll_No but the vice versa is not true.
FD1 : Roll_No → Name
FD2 : Roll_No → Marks
                                                                     61
Armstrong’s Axioms
• Axioms in database management systems was introduced
  by William W. Armstrong in late 90’s and these axioms play a
  vital role while implementing the concept of functional
  dependency in DBMS for database normalization.
   • Reflexive : It means, if set “B” is a subset of “A”, then A → B.
   • Augmentation : It means, if A → B, then AC → BC.
   • Transitive : It means, if A → B and B → C, then A → C.
   • Decomposition : It means, if A → BC, then A → B and A →
     C.
   • Union : It means, if A → B and A → C, then A → BC.
   • Pseudo-Transitivity : It means, if A → B and DB → C,
                                     then DA → C.
Closure Of Functional Dependency
• The Closure Of Functional Dependency means the
  complete set of all possible attributes that can be
  functionally derived from given functional dependency
  using the inference rules known as Armstrong’s Rules.
• If “F” is a functional dependency then closure of
  functional dependency can be denoted using “{F}+”.
• There are three steps to calculate closure of functional
  dependency. These are:
   • Step-1 : Add the attributes which are present on Left
     Hand Side in the original functional dependency.
   • Step-2 : Now, add the attributes present on the Right
     Hand Side of the functional dependency.
   • Step-3 : With the help of attributes present on Right
     Hand Side, check the other attributes that can be
     derived from the other given functional dependencies.
   • Step-4: Repeat this process until all the possible
     attributes which can be derived are added in the closure.
Example
Consider a relation R(A,B,C,D,E) having below mentioned
functional dependencies.
FD1 : A → BC
FD2 : C → B
FD3 : D → E
FD4 : E → D
Now, we need to calculate the closure of attributes of the
relation R. The closures will be:
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {B, C}
{D}+ = {D, E}
{E}+ = {E,D}
Identifying Candidate Key
•“A Candidate Key of a relation is an attribute or set of attributes
that can determine the whole relation or contains all the
attributes in its closure."
Example-1 : Consider the relation R(A,B,C) with given functional
dependencies :
FD1 : A → B
FD2 : B → C
Now, calculating the closure of the attributes as :
{A}+ = {A, B, C}
{B}+ = {B, C}
{C}+ = {C}
Clearly, “A” is the candidate key as, its closure contains all the
attributes present in the relation “R”.
Example-2 : Consider another relation R(A, B, C, D, E) having the
Functional dependencies :
FD1 : A → BC
FD2 : C → B
FD3 : D → E
FD4 : E → D
Now, calculating the closure of the attributes as :
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {C, B}
{D}+ = {E, D}
{E}+ = {E, D}
In this case, a single attribute is unable to determine all the attribute on
its own like in previous example. Here, we need to combine two or
more attributes to determine the candidate keys.
{A, D}+ = {A, B, C, D, E}
{A, E}+ = {A, B, C, D, E}
• Example -2: Consider a relation R ( A , B , C , D , E , F , G ) with the
  functional dependencies-
                                  • A → BC
                                 • BC → DE
                                   • D→F
                                  • CF → G
• Example -3: Consider the given functional dependencies-
                                 • AB → CD
                                  • AF → D
                                  • DE → F
                                   • C→G
                                   • F→E
                                   • G→A
Key Definitions
• Prime Attributes : Attributes which are indispensable part of
  candidate keys. For example : “A, D, E” attributes are prime
  attributes in above example-2.
• Non-Prime Attributes : Attributes other than prime attributes
  which does not take part in formation of candidate keys. For
  example “B, C”.
• Extraneous Attributes : Attributes which does not make any
  effect on removal from candidate key.
• For example : Consider the relation R(A, B, C, D) with FDs
• FD1 : A → BC            Here, Candidate key can be “AD” only.
  Hence,
• FD2 : B → C                     Prime Attributes : A, D.
• FD3 : D → C                      Non-Prime Attributes : B, C
• Extraneous Attributes : B, C(As if we add any of the attribute to the
  candidate key, it will remain unaffected).
Second Normal Form(2NF)
• Any relation to be in 2NF must follow the below two rules:
    • The relation/table must be in 1NF.
    • There should not be any partial dependency.
    • Whenever any non-prime attribute is dependent upon a part of
      candidate key of a relation, it is known as partial dependency.
• For example : Consider a relation R(X,Y,Z,T) with following functional
  dependencies :
    • FD1 : XY → T
    • FD2 : Y → Z
Step 1: Find the closure and determine the candidate key.
    • {X}+ = {X}
    • {Y}+ = {Y,Z}
    • {XY}+ = {X,Y,Z,T}              Here, Candidate key will be “XY”.
    • {T}+ = {T}
Step-2: Check whether the non-prime attributes in the FD’s
are fully dependent upon candidate key or not.
• XY → T (Attribute “T” fully depends upon “XY” as “XY” is a
  candidate key)
• Y → Z (Attribute “Z” is partially dependent upon the candidate
  key as, “Y” is a part of “XY”)
Step-3: To remove partial dependency, decompose the whole
relation R(X,Y,Z,T) into R1(X,Y,T) and R2(Y,Z).
• Therefore, R1 : XY → T (“XY” will be the candidate key)
• and R2 : Y → Z (“Y” will be the candidate key)
• Hence in relation R2, “Z” will be fully dependent upon “Y” as it is
  the candidate key for relation R2.
Third Normal Form(3NF)
• Any relation to be in Third Normal Form(3NF) must follow the
  below mentioned two rules :
   • The relation/table should be in Second Normal Form.
   • The relation/table must not have any transitive
     dependency.
   • If a “Non-Prime” attribute is functionally dependent upon another
     “Non-Prime” attribute, that functional dependency will also be
     termed as transitive.
Example:- Consider the relation R(X,Y,Z,T) with the following FDs
   • FD1 : XY → Z
   • FD2 : Z → T
Step-1 : Check whether given relation is in 2NF or not.
Here, “XY” will be the candidate key for the above mentioned FD’s
and no partial dependency exists in this relation too.
Step-2 : Remove transitive dependencies from the relation
                  and try to decompose it into a separate
relations.
• XY → Z (“Z” is fully dependent on candidate key “XY”)
• Z → T (Z & T, both are non-prime attributes and thus forms a
  transitive dependency)
Step-3: Splitting the whole relation into two separated
                  relations R1(X,Y,Z) and R2(Z,T)
   • R1 : XY → Z
  (Here, “XY” is the candidate key and there is no transitive
  dependency in this whole relation)
   • R2 : Z → T
  (Here, “Z” is the candidate key and there is no transitive dependency
  in this whole relation)
• Hence, the relation can be termed to be in 3NF, as both the
  relation R1 and R2 does not have any transitive dependency
  as well as both as are in 2NF too.
BCNF (Boyce Codd. Normal Form)
• BCNF is known as “Boyce Codd Normal Form” and is a
  successor to “Third Normal Form”.
• Any relation to be in BCNF must follow below mentioned
  two rules.
    • The relation/table needs to be in 3NF.
    • For a functional dependency P → Q, “P” should be a
      super key.
• BCNF deals with the cases where “Non-Prime attribute
  derives a prime attribute”.
Example:- Consider a relation R(X,Y,Z) having following
functional dependencies :
   • FD1 : XY → Z
   • FD2 : Z → Y
Step-1 : Check whether given relation is in 3NF or not.
   • Attribute “Z” is fully dependent upon the candidate key “XY” and there
     is not transitive dependency too. Also, “X & Y” are prime attributes
     whereas “Z & T” are non-prime attributes.
Step-2 : Check for FD’s where a non-prime attribute determines
          a prime attribute.
   • In this example “Z → Y” is that FD.
   • FD1 : XY → Z (“XY” is the candidate key, so no problem in this FD)
   • FD2 : Z → Y (Here, a non-prime attribute is deriving a prime attribute)
Step-3: Decompose relation R(X,Y,Z,T) into R1(X,Y,Z) an
         R2(Z, Y) to convert the relation in BCNF.
   • R1 : XY → Z
   • R2 : Z → Y
   • “XY” and “Z” are super keys of their respective relations i.e. “R1”
     and “R2” as each of them can uniquely determine the attributes
     present in the relations.
Fourth normal form (4NF)
• A relation will be in 4NF if it is in Boyce Codd normal form
  and has no multi-valued dependency.
• For a dependency X → Y, if for a single value of X, multiple
  values of Y exists, then the relation will be a multi-valued
  dependency.
• Also, a table should have at-least 3 columns for it to have a
  multi-valued dependency.
• And, for a relation R(X,Y,Z), if there is a multi-valued
  dependency between, X and Y, then Y and Z should be
  independent of each other.
Steps to solve Normalization
problems
• Step -1: Find candidate Key
• Step-2: List Prime and Non-Prime attributes
• Step-3: Check different NFs
Challenging task to practice
Q.          R = {A,B,C,D,E,F}
FD= {AB→C, AD→B, C→B, F→AD,F→E}
1) Is R in 2 NF? If no, decompose R into 2NF.
2) Check R in 3NF or not, if not Decompose into 3NF.
3) Further check R in BCNF or not, if not
                                                       79
Challenging task to practice
Q.          R = {A,B,C,D,E,F,G,H,I,J,K}
FD= {AB→CK, AD→BG, C→BH, F→ADI,F→EJ}
1) Is R in 2 NF? If no, decompose R into 2NF.
2) Check R in 3NF or not, if not Decompose into 3NF.
3) Further check R in BCNF or not, if not
                                                       80