Dbms Chapter 3&4

The document discusses various SQL concepts including: 1. SQL queries, joins, and set operations like UNION, INTERSECT, and EXCEPT that allow combining result sets from multiple tables. 2. Equi joins match rows based on equality between columns, while non-equi joins use comparison operators. 3. Inner joins return only matched data, while outer joins also return non-matched or null data.

Uploaded by

Fafani

Chapter 3: SQL (Structured Query Language)

Syllabus:

1. Overview
2. The form of a basic SQL query: UNION, INTERSECT and EXCEPT
3. Join operations: equi join and non-equi join
4. Nested queries: correlated and uncorrelated
5. Aggregate functions
6. Null values
7. Views
8. Triggers

1. Overview :

What is SQL?

➢ SQL is a language for operating on databases; it covers creating and deleting
databases, fetching rows, and modifying and deleting rows.
➢ SQL stands for Structured Query Language, a computer language for storing,
manipulating and retrieving data in a relational database. SQL was developed in the
1970s by IBM computer scientists and became a standard of the American National
Standards Institute (ANSI) in 1986, and of the International Organization for
Standardization (ISO) in 1987.
➢ SQL is the standard language for communicating with relational database systems.
Relational Database Management Systems (RDBMS) such as MySQL, MS Access,
Oracle, Sybase, Informix, PostgreSQL and SQL Server all use SQL as their standard
database language.

Why SQL?

SQL is widely popular because it offers the following advantages −

➢ Allows users to access data in relational database management systems.
➢ Allows users to describe the data.
➢ Allows users to define the data in a database and manipulate that data.
➢ Allows SQL to be embedded within other languages using SQL modules, libraries and
pre-compilers.
➢ Allows users to create and drop databases and tables.
➢ Allows users to create views, stored procedures and functions in a database.
➢ Allows users to set permissions on tables, procedures and views.

A Brief History of SQL

➢ 1970 − Dr. Edgar F. "Ted" Codd of IBM, known as the father of relational
databases, described a relational model for databases.
➢ 1974 − Structured Query Language (SQL) appeared.
➢ 1978 − IBM worked to develop Codd's ideas and released a product named
System/R.
➢ 1986 − SQL was standardized by ANSI. By then IBM had developed the first
prototype of a relational database, and the first commercial relational database
had been released by Relational Software, which later came to be known as Oracle.
➢ 1987 − SQL was adopted as a standard by the International Organization for
Standardization (ISO).

How SQL Works

When you execute an SQL command on any RDBMS, the system determines the best way to
carry out your request, and the SQL engine figures out how to interpret the task.

Several components are involved in this process:

• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.

A classic query engine handles all the non-SQL queries, but an SQL query engine will not handle
logical files.

[Figure: SQL architecture]

SQL basic commands: refer to Chapter 1, Database Languages.

2. The form of a basic SQL query: UNION, INTERSECT and EXCEPT

SQL Set Operators:

In SQL, set operators combine the results of two or more SELECT statements into
a single result set. The commonly used set operators are UNION, INTERSECT, and EXCEPT
(called MINUS in some databases, such as Oracle). These operators work on whole sets of
rows rather than on individual rows.

The set operators are:

1. UNION, UNION ALL
2. INTERSECT
3. EXCEPT

Basic Rules on Set Operations:

➢ The result sets of all the queries must have the same number of columns.
➢ The data type of each column must be compatible with the data type of the
corresponding column in the other result sets.
➢ To sort the result, write a single ORDER BY clause after the last query; it applies
to the combined result.

Example:

Create two tables, Speakers and Authors, each with a name column:

[Tables: Speakers and Authors]

UNION

➢ UNION combines the results of two queries into a single result set containing
the rows returned by either query.
➢ Both queries must have the same number of columns and compatible data types.
➢ All duplicate records are removed automatically unless UNION ALL is used.
➢ It can be useful in applications where tables are not perfectly normalized,
for example, a data warehouse application.

Syntax:

SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

Example-1:

You want to invite all the Speakers and Authors to the annual conference. How will
you prepare the invitation list?

Query:

select name from Speakers
union
select name from Authors
order by name;

Output: one row for each distinct name, sorted alphabetically.

The default sort order is ascending, and the ORDER BY clause is written once, after the
last query, rather than in both queries.

UNION ALL

UNION ALL does not remove duplicate records. It can be faster than UNION because it
skips the duplicate-elimination step.

Example-2

You want to give a prize to all the Speakers and Authors at the annual conference.
How will you prepare the prize list?

Query:

select name, 'Speaker' as Role from Speakers
union all
select name, 'Author' as Role from Authors
order by name;

Output: one row for each speaker and one row for each author; a person who is both
appears twice.
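
The Speakers and Authors tables are not reproduced in these notes, so the sketch below uses made-up rows (the names Alice, Bob, Carol and Dave are assumptions) and Python's built-in sqlite3 module to run the UNION and UNION ALL queries from Examples 1 and 2. Sorting by role as well as name is added here only to make the second result deterministic.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the original notes).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Speakers (name TEXT);
    CREATE TABLE Authors  (name TEXT);
    INSERT INTO Speakers VALUES ('Alice'), ('Bob'), ('Carol');
    INSERT INTO Authors  VALUES ('Bob'), ('Carol'), ('Dave');
""")

# UNION removes duplicates: Bob and Carol appear only once.
invitation_list = [row[0] for row in conn.execute("""
    SELECT name FROM Speakers
    UNION
    SELECT name FROM Authors
    ORDER BY name
""")]

# UNION ALL keeps every row, so people with both roles appear twice.
prize_list = conn.execute("""
    SELECT name, 'Speaker' AS role FROM Speakers
    UNION ALL
    SELECT name, 'Author' AS role FROM Authors
    ORDER BY name, role
""").fetchall()

print(invitation_list)
for name, role in prize_list:
    print(name, role)
```

With these sample rows the invitation list has four names, while the prize list has six rows because Bob and Carol each hold two roles.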

INTERSECT

INTERSECT takes the results of two queries and returns only those rows that are
common to both result sets. It removes duplicate records from the final result set.

Syntax:

SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;

Example-3

You want the list of people who are both Speakers and Authors. How will you
prepare such a list?

Query:

select name from Speakers
intersect
select name from Authors
order by name;

Output: the names that appear in both tables.

EXCEPT

EXCEPT takes the distinct rows of the first query and returns only those rows that
do not appear in the second result set.

Syntax:

SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;

Example-4

You want the list of people who are Speakers but not Authors. How will
you prepare such a list?

Query:

select name from Speakers
except
select name from Authors
order by name;

Output: the names that appear in Speakers but not in Authors.

Example-5

You want the list of people who are Authors but not Speakers. How will
you prepare such a list?

Query:

select name from Authors
except
select name from Speakers
order by name;

Output: the names that appear in Authors but not in Speakers.

In summary:

➢ UNION combines the rows from both result sets and removes duplicates.
➢ UNION ALL combines two or more result sets into a single set, including all
duplicate rows.
➢ INTERSECT returns the rows that are common to both result sets.
➢ EXCEPT returns the rows that are in the first result set but not in the second
result set.
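
As a rough check of INTERSECT and EXCEPT, the following sketch runs Examples 3 to 5 with Python's sqlite3 module. The sample rows are hypothetical (the original tables are not reproduced here), and the helper `names` is introduced only for this illustration.

```python
import sqlite3

# Hypothetical sample data: Bob and Carol are both speakers and authors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Speakers (name TEXT);
    CREATE TABLE Authors  (name TEXT);
    INSERT INTO Speakers VALUES ('Alice'), ('Bob'), ('Carol');
    INSERT INTO Authors  VALUES ('Bob'), ('Carol'), ('Dave');
""")

def names(query):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(query)]

# Rows common to both result sets.
both = names("""
    SELECT name FROM Speakers
    INTERSECT
    SELECT name FROM Authors
    ORDER BY name
""")

# EXCEPT is not symmetric: the order of the two queries matters.
only_speakers = names("""
    SELECT name FROM Speakers
    EXCEPT
    SELECT name FROM Authors
    ORDER BY name
""")
only_authors = names("""
    SELECT name FROM Authors
    EXCEPT
    SELECT name FROM Speakers
    ORDER BY name
""")

print(both, only_speakers, only_authors)
```

Swapping the two SELECT statements changes the result of EXCEPT but not of INTERSECT or UNION.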

3. Join operations

A SQL join combines rows from two or more tables based on a common
field between them.

➢ SQL joins are mostly used when a user needs to extract data from multiple
tables (which have one-to-many or many-to-many relationships with each other) at
one time.
➢ Large databases are prone to data redundancy, which leads to insertion, deletion,
and update anomalies. Normalization reduces this redundancy by splitting data across
several tables, and SQL joins recombine the related rows when they are queried.
➢ SQL joins are mainly of two types: INNER JOINS and OUTER JOINS.
The join operation is represented by the symbol ⋈.

Joins can also be classified as equi-joins or non-equi-joins, based on the
conditions used to match rows between tables.

JOINS
    INNER JOIN
        THETA JOIN (Non-Equi Join)
        NATURAL JOIN (Equi-Join)
    OUTER JOIN
        LEFT JOIN
        RIGHT JOIN
        FULL OUTER JOIN

Equi-Join: An equi-join is a join whose matching condition is based on equality
between columns from different tables. It matches rows where the specified columns have the
same values.

➢ An equi join returns only the rows whose values in the common column match in all
the tables being compared. It does not display NULL or unmatched data.
➢ The equality can be written with the = operator in the WHERE clause. The same
result is returned when we use the JOIN keyword with the ON clause, along with the
column names and their respective tables.

Example:

SELECT *
FROM TableName1, TableName2
WHERE TableName1.ColumnName = TableName2.ColumnName;

OR

SELECT *
FROM TableName1
JOIN TableName2
ON TableName1.ColumnName = TableName2.ColumnName;

➢ An equi join is any JOIN operation that uses only the equals sign. If a query has
more than one join condition, and one condition uses an equals sign while another
does not, the query is considered a non-equi join in SQL.

Non-equi join:

➢ A non-equi join, also known as a range join or a theta join, is a join
whose condition uses operators other than equality, such as greater
than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), or
not equal to (!= or <>).
➢ A non-equi join is also a type of INNER JOIN that retrieves data from
multiple tables. Rows are matched on a range of values
rather than a direct equality. Non-equi joins are less common and are mostly used to
solve specific data analysis problems.

Non-equi joins are typically used for:

➢ Retrieving data that falls within a range of values.
➢ Checking for duplicate data between tables.
➢ Calculating totals.

General form:

SELECT *
FROM TableName1, TableName2
WHERE TableName1.ColumnName [> | < | >= | <= | != | BETWEEN] TableName2.ColumnName;

Equi-Join Example:

Suppose we have two tables, state and city, which contain the names of the states and
the names of the cities, respectively. In this example, we map the cities to the states in
which they are located. Note that City_ID in the city table holds the ID of the state
the city belongs to.

Table state:

State_ID  State_Name
1         Uttar Pradesh
2         Uttarakhand
3         Madhya Pradesh

Table city:

City_ID  City_Name
1        Lucknow
1        Gorakhpur
1        Noida
2        Dehradun
2        Rishikesh
3        Gwalior

Now, if we execute an equi-join query using the equality operator in the WHERE clause:

SELECT *
FROM state, city
WHERE state.State_ID = city.City_ID;

Output:

State_ID  State_Name      City_ID  City_Name
1         Uttar Pradesh   1        Lucknow
1         Uttar Pradesh   1        Gorakhpur
1         Uttar Pradesh   1        Noida
2         Uttarakhand     2        Dehradun
2         Uttarakhand     2        Rishikesh
3         Madhya Pradesh  3        Gwalior
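
The equi-join example can be tried out with the following sketch, which loads the state and city rows into an in-memory SQLite database through Python's sqlite3 module (the choice of SQLite is an assumption; the notes do not name a DBMS for this example). It runs both the WHERE form and the JOIN ... ON form and checks that they return the same rows. The ORDER BY clauses are added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (State_ID INTEGER, State_Name TEXT);
    CREATE TABLE city  (City_ID INTEGER, City_Name TEXT);
    INSERT INTO state VALUES
        (1, 'Uttar Pradesh'), (2, 'Uttarakhand'), (3, 'Madhya Pradesh');
    INSERT INTO city VALUES
        (1, 'Lucknow'), (1, 'Gorakhpur'), (1, 'Noida'),
        (2, 'Dehradun'), (2, 'Rishikesh'), (3, 'Gwalior');
""")

# Equi-join written with a comma-separated FROM list and a WHERE clause.
where_rows = conn.execute("""
    SELECT * FROM state, city
    WHERE state.State_ID = city.City_ID
    ORDER BY state.State_ID, city.City_Name
""").fetchall()

# The same equi-join written with JOIN ... ON.
join_rows = conn.execute("""
    SELECT * FROM state
    JOIN city ON state.State_ID = city.City_ID
    ORDER BY state.State_ID, city.City_Name
""").fetchall()

for row in join_rows:
    print(row)
```

Both forms produce six rows, one per city, each paired with its state.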


Non-Equi Join Example:

➢ We take two tables, test1 and test2.

Table test1:

S_NO  Name
20    Amith
30    Ankush
10    Akash
50    Jatin

Table test2:

S_NO  Name
80    Tarun
60    Mitali
10    Akash
50    Jatin
5     Aman

Now, if we execute a non-equi join query using any operator other than the equality
operator, such as > (greater than), with the WHERE clause:

SELECT *
FROM test1, test2
WHERE test1.S_NO > test2.S_NO;

Output:

S_NO  Name    S_NO  Name
20    Amith   10    Akash
30    Ankush  10    Akash
50    Jatin   10    Akash
20    Amith   5     Aman
30    Ankush  5     Aman
10    Akash   5     Aman
50    Jatin   5     Aman

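
The non-equi join can be checked with the sketch below, again using Python's sqlite3 module as an assumed stand-in for the DBMS. It assumes that test1 holds the rows 20, 30, 10 and 50 and that test2 holds 80, 60, 10, 50 and 5, which is how the flattened tables read. With those rows, the condition test1.S_NO > test2.S_NO also pairs 10 Akash with 5 Aman, giving seven rows in total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (S_NO INTEGER, Name TEXT);
    CREATE TABLE test2 (S_NO INTEGER, Name TEXT);
    INSERT INTO test1 VALUES
        (20, 'Amith'), (30, 'Ankush'), (10, 'Akash'), (50, 'Jatin');
    INSERT INTO test2 VALUES
        (80, 'Tarun'), (60, 'Mitali'), (10, 'Akash'), (50, 'Jatin'), (5, 'Aman');
""")

# Non-equi join: every test1 row is paired with each test2 row
# that has a strictly smaller S_NO.
pairs = conn.execute("""
    SELECT test1.S_NO, test1.Name, test2.S_NO, test2.Name
    FROM test1, test2
    WHERE test1.S_NO > test2.S_NO
    ORDER BY test2.S_NO DESC, test1.S_NO
""").fetchall()

for row in pairs:
    print(row)
```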

INNER JOIN in SQL

An inner join returns only the rows that have matching values in both tables being joined. It
combines rows from two tables where the join condition is satisfied. The syntax for an inner
join is as follows:

SELECT *
FROM table1
INNER JOIN table2
ON table1.column = table2.column;

NATURAL JOIN

➢ A natural join is a type of inner join that implicitly joins on all columns that have
the same name (and compatible data types) in both of the tables to be joined.

SELECT *
FROM table1
NATURAL JOIN table2;

OUTER JOINS in SQL

SQL outer joins return both matched and unmatched rows, depending on the type of outer
join. Outer joins are sub-divided into the following types:

➢ Left Outer Join
➢ Right Outer Join
➢ Full Outer Join

Left Join (or Left Outer Join):

➢ A left join returns all the rows from the left table and the matching rows from the
right table. If there are no matching rows in the right table, NULL values are
included for the columns of the right table. The syntax for a left join is as follows:

SELECT *
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;

Right Join (or Right Outer Join):

A right join returns all the rows from the right table and the matching rows from the left table.
If there are no matching rows in the left table, NULL values are included for the columns of
the left table. The syntax for a right join is as follows:

SELECT *
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;

Full Join (or Full Outer Join):

A full join returns all the rows from both the left and right tables. Where a row in one table
has no match in the other, NULL values are included for the columns of the other table.

SELECT *
FROM table1
FULL JOIN table2
ON table1.column = table2.column;

Theta join:

In SQL, a theta join, also known as a non-equi join or a range join, is a type of join operation
where the joining condition involves comparison operators other than equality (=).

The syntax for a theta join typically involves using the JOIN keyword followed by the tables
being joined and the join condition with the desired comparison operator(s). Here's an example:

SELECT *
FROM table1
JOIN table2
ON table1.column1 < table2.column2;

Examples:

Table Customers:

Id  Cust_Name  Address
1   Ram        A
2   Raj        B
3   Rani       C

Table Orders:

Or_Id  amount  Cust_Id
601    1000    1
602    2000    4
603    3000    2

1. INNER JOIN

SELECT Orders.Or_Id, Customers.Cust_Name
FROM Orders
INNER JOIN Customers
ON Orders.Cust_Id = Customers.Id;

Output:

Or_Id  Cust_Name
601    Ram
603    Raj

2. LEFT JOIN

SELECT Orders.Or_Id, Customers.Cust_Name
FROM Orders
LEFT JOIN Customers
ON Orders.Cust_Id = Customers.Id;

Output:

Or_Id  Cust_Name
601    Ram
602    NULL
603    Raj

3. RIGHT JOIN

SELECT Orders.Or_Id, Customers.Cust_Name
FROM Orders
RIGHT JOIN Customers
ON Orders.Cust_Id = Customers.Id;

Output:

Or_Id  Cust_Name
601    Ram
603    Raj
NULL   Rani

4. FULL JOIN

SELECT Orders.Or_Id, Customers.Cust_Name
FROM Orders
FULL JOIN Customers
ON Orders.Cust_Id = Customers.Id;

Output:

Or_Id  Cust_Name
601    Ram
602    NULL
603    Raj
NULL   Rani
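
The four join results can be reproduced with the sketch below, which uses Python's sqlite3 module. Because older SQLite versions lack RIGHT JOIN and FULL JOIN, the sketch emulates them: the right join is written as a LEFT JOIN with the tables swapped, and the full join as the UNION of the two left joins. That emulation is a choice made for this illustration, not part of the original notes, and the UNION would also collapse genuinely duplicated rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Cust_Name TEXT, Address TEXT);
    CREATE TABLE Orders (Or_Id INTEGER PRIMARY KEY, amount INTEGER, Cust_Id INTEGER);
    INSERT INTO Customers VALUES (1, 'Ram', 'A'), (2, 'Raj', 'B'), (3, 'Rani', 'C');
    INSERT INTO Orders VALUES (601, 1000, 1), (602, 2000, 4), (603, 3000, 2);
""")

# Inner join: only orders whose customer exists.
inner_rows = conn.execute("""
    SELECT Orders.Or_Id, Customers.Cust_Name
    FROM Orders INNER JOIN Customers ON Orders.Cust_Id = Customers.Id
    ORDER BY Orders.Or_Id
""").fetchall()

# Left join: every order, with NULL (None in Python) for a missing customer.
left_rows = conn.execute("""
    SELECT Orders.Or_Id, Customers.Cust_Name
    FROM Orders LEFT JOIN Customers ON Orders.Cust_Id = Customers.Id
    ORDER BY Orders.Or_Id
""").fetchall()

# Right join emulated by swapping the tables of a left join.
right_rows = conn.execute("""
    SELECT Orders.Or_Id, Customers.Cust_Name
    FROM Customers LEFT JOIN Orders ON Orders.Cust_Id = Customers.Id
    ORDER BY Customers.Id
""").fetchall()

# Full join emulated as the union of both left joins.
full_rows = conn.execute("""
    SELECT Orders.Or_Id, Customers.Cust_Name
    FROM Orders LEFT JOIN Customers ON Orders.Cust_Id = Customers.Id
    UNION
    SELECT Orders.Or_Id, Customers.Cust_Name
    FROM Customers LEFT JOIN Orders ON Orders.Cust_Id = Customers.Id
""").fetchall()

print(inner_rows, left_rows, right_rows, sorted(full_rows, key=str), sep="\n")
```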

4. Nested Queries:

➢ A nested query in SQL contains a query inside another query. The outer query
uses the result of the inner query. For instance, a nested query can have
two SELECT statements, one in the inner query and the other in the outer query.

Types of Nested Queries in SQL:

Nested queries in SQL can be classified into two types:

➢ Uncorrelated Nested Queries (Independent Nested Queries)
➢ Correlated Nested Queries

Independent Nested Queries:

In independent nested queries, the execution order is from the innermost query to the outer
query. The outer query is not executed until its inner query completes, and the outer
query then uses the result of the inner query. Operators such as IN, NOT IN, ALL, and ANY are
used to write independent nested queries.

➢ If a subquery does not use any references from the outer query, it is called an
independent subquery.
➢ The IN operator checks if a column value in the outer query's result is present in
the inner query's result. The final result will have rows that satisfy the IN condition.
➢ The NOT IN operator checks if a column value in the outer query's result is not
present in the inner query's result. The final result will have rows that satisfy
the NOT IN condition.
➢ The ALL operator compares a value from the outer query with all the
values of the inner query's result and returns the row only if the comparison holds
for every value.
➢ The ANY operator compares a value from the outer query with all the inner
query's result values and returns the row if the comparison holds for at least one value.

Correlated Nested Queries

In correlated nested queries, the inner query uses values from the outer query, so
the inner query is evaluated for every row processed by the outer query. Correlated nested
queries can run slowly because the inner query is executed once for every row of the outer
query's result.

➢ The outer query must supply a row before the inner query can be evaluated for it.
➢ The inner query is executed separately for each row of the outer query.
➢ There are different set comparison operators such as EXISTS, IN and UNIQUE.
SQL also supports op ANY and op ALL, where op is one of the comparison
operators <, <=, =, <>, >=, >. SOME is also a set comparison
operator; it is equivalent to ANY.
➢ A correlated subquery is evaluated once for each row processed by the parent
statement. The parent statement can be a SELECT, UPDATE,
or DELETE statement.

Syntax:

SELECT column1, column2, ....
FROM table1 t1
WHERE column1 operator
    (SELECT column1
     FROM table2
     WHERE expr1 = t1.expr2);

Examples:

We will use the employees and awards tables below to understand independent and correlated
nested queries. The queries use Oracle SQL syntax.

➢ Let's create the employees and awards tables:

Creating the employees table:

CREATE TABLE employees (
    id NUMBER PRIMARY KEY,
    name VARCHAR2(100) NOT NULL,
    salary NUMBER NOT NULL,
    role VARCHAR2(100) NOT NULL
);

Creating the awards table:

CREATE TABLE awards (
    id NUMBER PRIMARY KEY,
    employee_id NUMBER NOT NULL,
    award_date DATE NOT NULL
);

Let's add data to the tables created above:

INSERT INTO employees VALUES (1, 'Augustine Hammond', 10000, 'Developer');
INSERT INTO employees VALUES (2, 'Perice Mundford', 10000, 'Manager');
INSERT INTO employees VALUES (3, 'Cassy Delafoy', 30000, 'Developer');
INSERT INTO employees VALUES (4, 'Garwood Saffen', 40000, 'Manager');
INSERT INTO employees VALUES (5, 'Faydra Beaves', 50000, 'Developer');

INSERT INTO awards VALUES (1, 1, TO_DATE('2022-04-01', 'YYYY-MM-DD'));
INSERT INTO awards VALUES (2, 3, TO_DATE('2022-05-01', 'YYYY-MM-DD'));

Example 1: IN

➢ Select all employees who won an award.

SELECT id, name FROM employees
WHERE id IN (SELECT employee_id FROM awards);

Output:

id  name
1   Augustine Hammond
3   Cassy Delafoy

Example 2: NOT IN

➢ Select all employees who never won an award.

SELECT id, name FROM employees
WHERE id NOT IN (SELECT employee_id FROM awards);

Output:

id  name
2   Perice Mundford
4   Garwood Saffen
5   Faydra Beaves
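
A minimal sketch of Examples 1 and 2, run through Python's sqlite3 module rather than Oracle. SQLite has no NUMBER, VARCHAR2 or TO_DATE, so the sketch swaps in INTEGER and TEXT columns and stores the award dates as ISO strings; those substitutions are assumptions of this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            salary INTEGER NOT NULL, role TEXT NOT NULL);
    CREATE TABLE awards (id INTEGER PRIMARY KEY, employee_id INTEGER NOT NULL,
                         award_date TEXT NOT NULL);
    INSERT INTO employees VALUES
        (1, 'Augustine Hammond', 10000, 'Developer'),
        (2, 'Perice Mundford',   10000, 'Manager'),
        (3, 'Cassy Delafoy',     30000, 'Developer'),
        (4, 'Garwood Saffen',    40000, 'Manager'),
        (5, 'Faydra Beaves',     50000, 'Developer');
    INSERT INTO awards VALUES (1, 1, '2022-04-01'), (2, 3, '2022-05-01');
""")

# The inner query does not refer to the outer query, so it is independent:
# it can be evaluated once and its result reused for every outer row.
winners = conn.execute("""
    SELECT id, name FROM employees
    WHERE id IN (SELECT employee_id FROM awards)
    ORDER BY id
""").fetchall()

non_winners = conn.execute("""
    SELECT id, name FROM employees
    WHERE id NOT IN (SELECT employee_id FROM awards)
    ORDER BY id
""").fetchall()

print(winners)
print(non_winners)
```

Every employee lands in exactly one of the two lists here because awards.employee_id is declared NOT NULL; a NULL in the subquery result would make NOT IN return no rows.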

Example 3: ALL

➢ Select all Developers who earn more than all the Managers.

SELECT * FROM employees
WHERE role = 'Developer'
AND salary > ALL (
    SELECT salary FROM employees WHERE role = 'Manager'
);

Output:

id  name           salary  role
5   Faydra Beaves  50000   Developer

Example 4: ANY

➢ Select all Developers who earn more than any Manager.

SELECT * FROM employees
WHERE role = 'Developer'
AND salary > ANY (
    SELECT salary FROM employees WHERE role = 'Manager'
);

Output:

id  name           salary  role
3   Cassy Delafoy  30000   Developer
5   Faydra Beaves  50000   Developer
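
SQLite does not implement the ALL and ANY operators, so the sketch below (Python's sqlite3 module, an assumed stand-in for Oracle) rewrites Examples 3 and 4 with MAX and MIN instead: "greater than all" becomes "greater than the maximum", and "greater than any" becomes "greater than the minimum". The rewrite is equivalent only when the subquery returns at least one row and no NULLs, which holds for this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            salary INTEGER NOT NULL, role TEXT NOT NULL);
    INSERT INTO employees VALUES
        (1, 'Augustine Hammond', 10000, 'Developer'),
        (2, 'Perice Mundford',   10000, 'Manager'),
        (3, 'Cassy Delafoy',     30000, 'Developer'),
        (4, 'Garwood Saffen',    40000, 'Manager'),
        (5, 'Faydra Beaves',     50000, 'Developer');
""")

# salary > ALL (manager salaries)  is rewritten as  salary > MAX(manager salaries)
more_than_all = [row[0] for row in conn.execute("""
    SELECT id FROM employees
    WHERE role = 'Developer'
      AND salary > (SELECT MAX(salary) FROM employees WHERE role = 'Manager')
    ORDER BY id
""")]

# salary > ANY (manager salaries)  is rewritten as  salary > MIN(manager salaries)
more_than_any = [row[0] for row in conn.execute("""
    SELECT id FROM employees
    WHERE role = 'Developer'
      AND salary > (SELECT MIN(salary) FROM employees WHERE role = 'Manager')
    ORDER BY id
""")]

print(more_than_all, more_than_any)
```

The manager salaries are 10000 and 40000, so only the developer earning 50000 beats all of them, while the developers earning 30000 and 50000 beat at least one.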


Correlated Nested Queries

➢ Select all employees whose salary is above the average salary of employees in their
role.

SELECT * FROM employees emp1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees emp2
    WHERE emp1.role = emp2.role
);

Output:

id  name            salary  role
4   Garwood Saffen  40000   Manager
5   Faydra Beaves   50000   Developer
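
The correlated query can be tried with the following sketch, using Python's sqlite3 module as an assumed stand-in for Oracle. The second query is added here only to show the per-role averages that the inner query computes for each outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            salary INTEGER NOT NULL, role TEXT NOT NULL);
    INSERT INTO employees VALUES
        (1, 'Augustine Hammond', 10000, 'Developer'),
        (2, 'Perice Mundford',   10000, 'Manager'),
        (3, 'Cassy Delafoy',     30000, 'Developer'),
        (4, 'Garwood Saffen',    40000, 'Manager'),
        (5, 'Faydra Beaves',     50000, 'Developer');
""")

# The inner query refers to emp1.role, so it depends on the current outer row.
above_average = conn.execute("""
    SELECT id, name, salary, role FROM employees emp1
    WHERE salary > (
        SELECT AVG(salary) FROM employees emp2
        WHERE emp1.role = emp2.role
    )
    ORDER BY id
""").fetchall()

# Per-role averages, for comparison with the rows selected above.
role_averages = dict(conn.execute(
    "SELECT role, AVG(salary) FROM employees GROUP BY role"))

print(role_averages)
print(above_average)
```

The developer average is 30000 and the manager average is 25000, so Cassy Delafoy (exactly 30000) is not selected, because the comparison is strict.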


5. Aggregate Functions in SQL

An aggregate function in SQL performs a calculation on multiple values and returns a single
value. SQL provides many aggregate functions, including AVG, COUNT, SUM, MIN and MAX.
Aggregate functions ignore NULL values when they perform the calculation; the exception is
COUNT(*), which counts rows regardless of their contents.

1. COUNT() Function

The COUNT() aggregate function returns the number of rows in a database table that
match the criteria defined in the SQL query.

Syntax:

COUNT(*) OR COUNT(COLUMN_NAME)

Example:

The table named EMP_DATA holds data on 10 employees working in the
same organization in different departments. Its columns include Department and Salary.

1. Suppose you want to know the total number of employees working in the organization.
You can do so with the query below.

SELECT COUNT(*) FROM EMP_DATA;

COUNT(*) returns the total number of rows, and EMP_DATA consists of 10 rows, so the
COUNT(*) function returns 10.

Output: 10

Note: Except for COUNT(*), all other SQL aggregate functions ignore NULL values.

2. Suppose you need to count the number of people who are getting a salary. The query
given below can help you achieve this.

SELECT COUNT(Salary) FROM EMP_DATA;

Output: 9

Here, the Salary column is passed as a parameter to the COUNT() function, and hence this
query returns the number of non-NULL values in the column Salary, i.e. 9.

3. Suppose you need to count the number of distinct departments present in the
organization. The following query can help you achieve this.

SELECT COUNT(DISTINCT Department) FROM EMP_DATA;

Output: 3

The above query returns the number of distinct non-NULL values in the column
Department, i.e. 3 (Marketing, Production, R&D). The DISTINCT keyword makes sure that
each repeated value is counted only once.

4. What if you want to count the people whose salaries are at least a
given amount (say 70,000)? Check out the example below.

SELECT COUNT(Salary) FROM EMP_DATA WHERE Salary >= 70000;

Output: 5

The query returns the number of rows where the salary of the employee is greater than or equal
to 70,000, i.e. 5.
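
The EMP_DATA table itself is not reproduced in these notes, so the sketch below rebuilds a plausible version of it in SQLite through Python's sqlite3 module. The nine salaries are the ones listed in the SUM() walkthrough; the Emp_Id column and the assignment of salaries to departments are assumptions chosen to agree with the stated outputs. The helper `scalar` is introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP_DATA (Emp_Id INTEGER PRIMARY KEY, Department TEXT, Salary INTEGER)")
# Hypothetical rows: ten employees, one of them with a NULL salary.
conn.executemany("INSERT INTO EMP_DATA VALUES (?, ?, ?)", [
    (1, "Marketing", 80000), (2, "Marketing", 80000), (3, "Marketing", None),
    (4, "Production", 76000), (5, "Production", 60000),
    (6, "Production", 64000), (7, "Production", 60000),
    (8, "R&D", 84000), (9, "R&D", 76000), (10, "R&D", 66000),
])

def scalar(query):
    """Return the single value produced by an aggregate query."""
    return conn.execute(query).fetchone()[0]

total_rows = scalar("SELECT COUNT(*) FROM EMP_DATA")             # counts every row
with_salary = scalar("SELECT COUNT(Salary) FROM EMP_DATA")       # skips NULL salaries
departments = scalar("SELECT COUNT(DISTINCT Department) FROM EMP_DATA")
well_paid = scalar("SELECT COUNT(Salary) FROM EMP_DATA WHERE Salary >= 70000")

print(total_rows, with_salary, departments, well_paid)
```

Note that the WHERE clause comes after FROM; the filter is applied to the rows before COUNT() sees them.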


2. SUM() Function

The SUM() function takes the name of the column as an argument and returns the sum of all
the non-NULL values in that column. It is meant for numeric fields (i.e. columns that contain
only numeric values). How SUM() treats non-numeric values depends on the DBMS: some
reject them with an error, others attempt an implicit conversion. In standard SQL, SUM()
returns NULL when there are no non-NULL values to add.

Syntax:

The function name is SUM() and the name of the column to be considered is passed as an
argument to the function.

SUM(COLUMN_NAME)

Example:

1. Suppose you need to build a budget for the organization and you need to know the total
amount needed to pay salaries to all the employees. To calculate the sum of all the
values present in the column Salary, use the query below.

SELECT SUM(Salary) FROM EMP_DATA;

Output: 646000

The query returns the sum of all non-NULL values in the column Salary,
i.e. 80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 = 646000.

2. What if you need to consider only distinct salaries? The following query will help you
achieve that.

SELECT SUM(DISTINCT Salary) FROM EMP_DATA;

Output: 430000

The DISTINCT keyword makes sure that each repeated value is considered only once. The query
returns the sum of all distinct non-NULL values in the column Salary, i.e. 80000 + 76000 +
84000 + 64000 + 60000 + 66000 = 430000.

3. Suppose you need to know the total salary for one department (say Marketing).
The query given below can help you achieve this.

SELECT SUM(Salary) FROM EMP_DATA WHERE Department = 'Marketing';

Output: 160000

The query returns the sum of salaries of employees who are working in the Marketing
Department, i.e. 80000 + 80000 = 160000.

Note: There are 3 rows with Marketing as the Department value, but the third salary is
NULL. Thus, the sum considers only the first two Marketing entries.

3. AVG() Function

The AVG() aggregate function takes the name of the column as an argument and returns the
average of all the non-NULL values in that column. It is meant for numeric fields (i.e.
columns that contain only numeric values).

Note: As with SUM(), the treatment of non-numeric values depends on the DBMS, and AVG()
returns NULL when there are no non-NULL values to average.

Syntax:

The function name is AVG() and the name of the column to be considered is passed as an
argument to the function.

AVG(COLUMN_NAME)

Example:

1. To obtain the average salary of an employee of the organization, the following query
can be used.

SELECT AVG(Salary) FROM EMP_DATA;

Output: 71777.77777

Here, the column name Salary is passed as an argument and thus the values present in the
column Salary are considered. The query returns the average of all non-NULL values present in
the Salary column of the table.

Average = (80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000) / 9
= 646000 / 9 = 71777.77777

2. If you need to consider only distinct salaries, the following query will help you out.

SELECT AVG(DISTINCT Salary) FROM EMP_DATA;

Output: 71666.66666

The query returns the average of all non-NULL distinct values present in the Salary column of
the table.

Average = (80000 + 76000 + 84000 + 64000 + 60000 + 66000) / 6 = 430000 / 6 = 71666.66666


4. MIN() Function

The MIN() function takes the name of the column as an argument and returns the minimum
value present in the column. MIN() returns NULL when no row is selected.

Syntax:

The function name is MIN() and the name of the column to be considered is passed as an
argument to the function.

MIN(COLUMN_NAME)

Example:

1. Suppose you want to find the minimum salary paid by the
organization. The MIN() function can be used here with the column name as an
argument.

SELECT MIN(Salary) FROM EMP_DATA;

Output: 60000

The query returns the minimum of all the values present in the mentioned column, i.e.
60000.

2. Suppose you need to know the minimum salary of an employee belonging to the
Production department. The following query will help you achieve that.

SELECT MIN(Salary) FROM EMP_DATA WHERE Department = 'Production';

Output: 60000

The query returns the minimum of the values in the mentioned column among the rows that
have Production as the Department value, i.e. 60000.

5. MAX() Function

The MAX() function takes the name of the column as an argument and returns the maximum
value present in the column. MAX() returns NULL when no row is selected.

Syntax:

The function name is MAX() and the name of the column to be considered is passed as an
argument to the function.

MAX(COLUMN_NAME)

Example:

1. Suppose you want to find the maximum salary paid by the
organization. The MAX() function can be used here with the column name as an
argument.

SELECT MAX(Salary) FROM EMP_DATA;

Output: 84000

The query returns the maximum of all the values present in the mentioned column, i.e.
84000.

2. Suppose you need to know the maximum salary of an employee belonging to the R&D
department. The following query will help you achieve that.

SELECT MAX(Salary) FROM EMP_DATA WHERE Department = 'R&D';

Output: 84000

The query returns the maximum of the values in the mentioned column among the rows that
have R&D as the Department value, i.e. 84000.
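
The SUM(), AVG(), MIN() and MAX() results above can be checked with the sketch below, which uses Python's sqlite3 module. The EMP_DATA rows are a hypothetical reconstruction: the salaries come from the walkthrough, while the Emp_Id column and the split of salaries across departments are assumptions chosen to match the stated outputs. The GROUP BY query and the 'Sales' department are additions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP_DATA (Emp_Id INTEGER PRIMARY KEY, Department TEXT, Salary INTEGER)")
# Hypothetical rows: ten employees, one of them with a NULL salary.
conn.executemany("INSERT INTO EMP_DATA VALUES (?, ?, ?)", [
    (1, "Marketing", 80000), (2, "Marketing", 80000), (3, "Marketing", None),
    (4, "Production", 76000), (5, "Production", 60000),
    (6, "Production", 64000), (7, "Production", 60000),
    (8, "R&D", 84000), (9, "R&D", 76000), (10, "R&D", 66000),
])

# All five aggregates skip the NULL salary.
total, distinct_total, average, lowest, highest = conn.execute("""
    SELECT SUM(Salary), SUM(DISTINCT Salary), AVG(Salary), MIN(Salary), MAX(Salary)
    FROM EMP_DATA
""").fetchone()

# GROUP BY yields one total per department instead of one query per department.
per_department = conn.execute("""
    SELECT Department, SUM(Salary) FROM EMP_DATA
    GROUP BY Department ORDER BY Department
""").fetchall()

# No row matches 'Sales', so SUM() has nothing to add and returns NULL (None).
empty_sum = conn.execute(
    "SELECT SUM(Salary) FROM EMP_DATA WHERE Department = 'Sales'").fetchone()[0]

print(total, distinct_total, round(average, 2), lowest, highest)
print(per_department, empty_sum)
```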

6. Null values
In SQL, NULL represents the absence of a value. It indicates that a data
point does not have a value or that the value is unknown or undefined. Here are
some important points to understand about handling NULL values in SQL:

➢ NULL is not the same as an empty string or zero. It is a distinct marker that signifies
the absence of a value.
➢ NULL values can be used in columns of any data type, including numeric, string,
date, and other data types.
➢ When performing comparisons involving NULL values, the result is always
unknown (neither true nor false). Therefore, you cannot use standard equality
operators like = or <> to compare NULL values.
➢ To check for NULL values in SQL, you use the IS NULL or IS NOT NULL
operators. For example:
SELECT * FROM table_name WHERE column_name IS NULL;

SELECT * FROM table_name WHERE column_name IS NOT NULL;

➢ When performing calculations involving NULL values, the result is usually NULL.
However, some database systems have specific behaviors when NULL values are
involved in calculations, so it's important to consult the documentation for your
specific database management system (DBMS) to understand its behavior.
➢ When inserting or updating data in a table, you can explicitly set a column to NULL
if you want to represent the absence of a value. For example:

INSERT INTO table_name (column1, column2) VALUES (value1, NULL);


UPDATE table_name SET column1 = NULL WHERE condition;

➢ NULL values also arise in joins and filtering conditions. For example, a LEFT JOIN
fills the right table's columns with NULL for left rows that have no match.
➢ It's important to handle NULL values appropriately in your SQL queries to ensure
accurate and reliable data processing. Be aware of any specific behavior and
handling of NULL values in your chosen database system, as it can vary between
different database management systems.
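
The points above can be seen in a small sketch using Python's sqlite3 module. The table `t` and its three rows are made up for this illustration; NULL appears as None on the Python side. Other database systems may differ in details, as the notes caution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
# Row 2 has no value; row 3 has the value zero, which is not NULL.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, None), (3, 0)])

def ids(where_clause):
    """Return the ids of the rows that satisfy the given WHERE condition."""
    query = f"SELECT id FROM t WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(query)]

equals_null = ids("val = NULL")       # comparison with NULL is unknown, never true
is_null = ids("val IS NULL")          # the correct test for a missing value
is_not_null = ids("val IS NOT NULL")
not_ten = ids("val <> 10")            # the NULL row is excluded here as well

# Arithmetic involving NULL yields NULL.
null_plus_one = conn.execute("SELECT val + 1 FROM t WHERE id = 2").fetchone()[0]

print(equals_null, is_null, is_not_null, not_ten, null_plus_one)
```

The row with a NULL value matches neither val = 10 nor val <> 10, which is why IS NULL is needed to find it.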
7. Triggers
➢ In SQL, a trigger is a database object that is associated with a table and
automatically executes a set of actions in response to certain database events, such
as INSERT, UPDATE, or DELETE operations on the table. Triggers are useful for
enforcing data integrity rules, auditing changes, maintaining derived data, or
implementing complex business logic within the database.
➢ Syntax: The basic syntax to create a trigger is as follows (the exact form varies
between database systems):
CREATE TRIGGER trigger_name
{BEFORE | AFTER}
{INSERT | UPDATE | DELETE}
ON table_name
[FOR EACH ROW]
[WHEN (condition)]
BEGIN
-- Trigger actions here (trigger body)
END;

Types of Triggers

The following are the different types of triggers present in SQL.

DML Triggers

➢ These triggers fire in response to data manipulation language (DML) statements
like INSERT, UPDATE, or DELETE.

After Triggers

➢ These triggers execute after the database has processed a specified event (such as
an INSERT, UPDATE, or DELETE statement). AFTER triggers are commonly used
to perform additional processing or auditing tasks after a data modification has
occurred.

INSTEAD OF Triggers

➢ These triggers are used for views and fire instead of the DML statement (INSERT,
UPDATE, DELETE) on the view.

DDL Triggers

➢ These triggers fire in response to data definition language (DDL) statements like
CREATE, ALTER, or DROP.

LOGON Triggers

➢ These triggers fire when a user logs into the database.

LOGOFF Triggers

➢ These triggers fire when a user logs out of the database.

SERVERERROR Triggers

➢ These triggers fire when a server error occurs.

In summary, a trigger can be defined to fire:

✓ When any DDL operation is done, e.g., CREATE, ALTER, DROP.
✓ For a DML operation, e.g., INSERT, UPDATE, DELETE.
✓ For a database operation like LOGON, LOGOFF, STARTUP, SHUTDOWN or
SERVERERROR.

EXAMPLE:

Let’s take an example. Assume a student table with the columns id, first_name, last_name,
and full_name.

Query 1:

CREATE TABLE student(id integer PRIMARY KEY, first_name varchar(50), last_name varchar(50), full_name varchar(50));

First, let’s create a SQL trigger that fills in full_name for each newly inserted row:

Query 2:

create trigger student_name
after INSERT
on student
for each row
BEGIN
UPDATE student set full_name = NEW.first_name || ' ' || NEW.last_name WHERE id = NEW.id;
END;

Let’s insert the students.

Query 3:

/* Create a few records in this table */

INSERT INTO student(id, first_name, last_name) VALUES(1,'Alvaro', 'Morte');
INSERT INTO student(id, first_name, last_name) VALUES(2,'Ursula', 'Corbero');
INSERT INTO student(id, first_name, last_name) VALUES(3,'Itziar', 'Ituno');
INSERT INTO student(id, first_name, last_name) VALUES(4,'Pedro', 'Alonso');
INSERT INTO student(id, first_name, last_name) VALUES(5,'Alba', 'Flores');

Query 4:

/* Display all the records from the table */

SELECT * FROM student;

OUTPUT

id  first_name  last_name  full_name
1   Alvaro      Morte      Alvaro Morte
2   Ursula      Corbero    Ursula Corbero
3   Itziar      Ituno      Itziar Ituno
4   Pedro       Alonso     Pedro Alonso
5   Alba        Flores     Alba Flores
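
The trigger example can be run end to end with the sketch below, which uses Python's sqlite3 module; SQLite accepts this BEGIN ... END trigger form, while other systems such as Oracle use a different trigger syntax. The sketch is an illustration under that assumption, and it restricts the UPDATE to the newly inserted row through NEW.id so that existing rows are not rewritten on every insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        first_name VARCHAR(50),
        last_name VARCHAR(50),
        full_name VARCHAR(50)
    )
""")

# Row-level AFTER INSERT trigger: NEW refers to the row that was just inserted.
conn.execute("""
    CREATE TRIGGER student_name
    AFTER INSERT ON student
    FOR EACH ROW
    BEGIN
        UPDATE student
        SET full_name = NEW.first_name || ' ' || NEW.last_name
        WHERE id = NEW.id;
    END
""")

# full_name is never supplied by the INSERT; the trigger fills it in.
conn.executemany(
    "INSERT INTO student (id, first_name, last_name) VALUES (?, ?, ?)",
    [(1, "Alvaro", "Morte"), (2, "Ursula", "Corbero"), (3, "Itziar", "Ituno"),
     (4, "Pedro", "Alonso"), (5, "Alba", "Flores")],
)

students = conn.execute("SELECT id, full_name FROM student ORDER BY id").fetchall()
for student_id, full_name in students:
    print(student_id, full_name)
```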


8. Views in SQL

Refer to Chapter 2.

Chapter 4 : DEPENDENCIES AND NORMAL FORMS

Syllabus:

1. Importance of a good schema design


2. Problems encountered with bad schema design
3. Motivation for normal forms
4. Functional dependencies - Armstrong’s axioms for FDs - closure of a set of FDs
5. Minimal covers
6. Definitions of 1NF,2NF,3NF and BCNF
7. Decompositions and desirable properties

1. IMPORTANCE OF A GOOD SCHEMA DESIGN:

Schema: A database schema is a blueprint that represents the tables and relations of a data set.
Good database schema design is essential to making your data tractable so that you can make
sense of it and build the dashboards, reports, and data models that you need.

A good schema design is crucial for the efficient and effective management of data in a database
system. It plays a fundamental role in determining how data is organized, stored, and retrieved,
and impacts the overall performance, scalability, and maintainability of the system. Here are
some key reasons why a good schema design is important:

Data organization: A well-designed schema helps in structuring data in a logical and organized
manner. It defines the tables, relationships, and constraints that govern the data model, ensuring
data integrity and consistency. This organization facilitates easy navigation and understanding
of the data, making it more manageable and accessible.

Query performance: The schema design significantly impacts the performance of database
queries. By properly structuring tables, defining appropriate indexes, and optimizing data
types, a good schema design can enhance query execution speed and minimize resource
consumption. Efficient query performance leads to faster response times and improved overall
system performance.

Data integrity and consistency: A good schema design enforces data integrity and ensures
consistency. By defining appropriate constraints, such as primary keys, foreign keys, unique
constraints, and check constraints, it prevents the insertion of invalid or inconsistent data. This
helps maintain data quality and reliability throughout the system.

Scalability: A well-designed schema allows for easy scalability as the volume and complexity
of data grow. By considering future requirements and potential expansion, a good schema
design can accommodate evolving needs without significant rework or performance
degradation. This scalability is crucial for applications and systems that need to handle
increasing data loads over time.

Maintainability and extensibility: A good schema design simplifies the maintenance and
evolution of the database system. It provides a solid foundation for making changes and
additions to the schema without causing disruptions or data inconsistencies. A well-designed
schema also allows for seamless integration with new features or modules, making the system
more extensible and adaptable to future enhancements.

Data analysis and reporting: A well-designed schema facilitates effective data analysis and
reporting. By structuring data in a way that aligns with the analytical needs of the system, a
good schema design enables efficient querying, aggregation, and summarization of data. This,
in turn, supports decision-making processes and enables the extraction of meaningful insights
from the data.

In summary, a good schema design is essential for data organization, query performance, data
integrity, scalability, maintainability, and data analysis. It is a foundational element in the
design and implementation of a robust and efficient database system.

2. PROBLEMS ENCOUNTERED WITH BAD SCHEMA DESIGN:

Bad schema designs can lead to several problems that can hinder the efficient management and
utilization of data in a database system. Here are some common problems encountered with
bad schema designs:

➢ Poor query performance: A bad schema design can result in slow and inefficient query
performance. This can be due to a lack of proper indexing, inappropriate data types, or
inefficient table relationships. Slow queries can negatively impact the overall system
performance and user experience.
➢ Data redundancy and inconsistency: Inadequate schema designs can lead to data
redundancy and inconsistency. Redundant data takes up unnecessary storage space and
can cause data integrity issues when updates or modifications are made. Inconsistent
data, such as conflicting values or duplicate records, can lead to inaccurate results and
unreliable information.

➢ Difficulty in data maintenance: Bad schema designs can make data maintenance
challenging and error-prone. Without proper constraints and relationships, it becomes
harder to enforce data integrity and ensure consistent updates. This can lead to data
corruption, data loss, or difficulties in updating and modifying data in a controlled and
reliable manner.

➢ Lack of scalability: A poorly designed schema may lack scalability, making it difficult
to accommodate future growth and evolving data requirements. This can result in
performance degradation and the need for extensive schema modifications when the
system needs to handle increased data volumes or changes in data structure.

➢ Limited flexibility and extensibility: Bad schema designs can restrict the flexibility
and extensibility of the database system. It may be challenging to add new features or
modify existing ones without significant schema changes. This can lead to increased
development time, complexity, and potential disruptions to the system.

➢ Data analysis and reporting challenges: Inefficient schema designs can make data
analysis and reporting difficult. Poorly organized data, lack of appropriate relationships,
or inconsistent naming conventions can hinder the extraction of meaningful insights
from the data. This can limit the effectiveness of decision-making processes and hinder
the overall value derived from the data.

➢ Increased development and maintenance costs: Bad schema designs can result in
higher development and maintenance costs. Fixing or modifying a poorly designed
schema requires significant effort and resources. It may involve rewriting queries,
restructuring tables, or migrating data, which can be time-consuming and error-prone.

In summary, bad schema designs can lead to poor query performance, data redundancy and
inconsistency, difficulties in data maintenance, limited scalability and flexibility, challenges in
data analysis and reporting, and increased development and maintenance costs. It is crucial to
invest time and effort in designing a well-thought-out schema to avoid these problems and
ensure the efficient management of data in a database system.

3. FUNCTIONAL DEPENDENCY

➢ Functional dependency is a relationship that exists between two sets of attributes of a
relational table where one set of attributes can determine the value of the other set of
attributes.
➢ It typically exists between the primary key and the non-key attributes within a table.

For any relation R, attribute Y is functionally dependent on attribute X (usually the primary
key) if any two tuples of R that agree on X also agree on Y. This is denoted by X -> Y, where
X is called the determinant and Y is called the dependent.

Example: Assume we have an employee table with attributes: Emp_Id, Emp_Name,
Emp_Address.

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table
because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
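
The definition can be checked mechanically on a table instance. The sketch below is written in Python and uses made-up employee rows (the names and addresses are hypothetical, not taken from these notes). It reports whether any two rows agree on the determinant but disagree on the dependent. Note that sample data can only refute a functional dependency; whether it truly holds is a design decision about all possible rows.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it later.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the employee table.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Mumbai"},
]

id_to_name = fd_holds(employees, ["Emp_Id"], ["Emp_Name"])
name_to_address = fd_holds(employees, ["Emp_Name"], ["Emp_Address"])
print(id_to_name)       # True: every Emp_Id maps to one Emp_Name
print(name_to_address)  # False: "Asha" appears with two different addresses
```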

Types of Functional Dependencies in DBMS:


1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency

1. Trivial Functional Dependency in DBMS

➢ In Trivial functional dependency, a dependent is always a subset of the determinant.
In other words, a functional dependency is called trivial if the attributes on the right
side are a subset of the attributes on the left side of the functional dependency.
➢ X → Y is called a trivial functional dependency if Y is a subset of X.
➢ For example, consider an Employee table with the attributes Employee_Id, Name and Age.
➢ Here, { Employee_Id, Name } → { Name } is a Trivial functional dependency, since
the dependent Name is a subset of the determinant { Employee_Id, Name }.
➢ { Employee_Id } → { Employee_Id }, { Name } → { Name } and { Age } → { Age }
are also Trivial.

2. Non-Trivial Functional Dependency in DBMS

➢ It is the opposite of Trivial functional dependency. Formally speaking, in Non-Trivial
functional dependency, the dependent is not a subset of the determinant.
➢ X → Y is called a Non-trivial functional dependency if Y is not a subset of X. So, if
X is a set of attributes and Y is also a set of attributes but not a subset of X, then
X → Y is called a Non-trivial functional dependency.
➢ X → Y is called completely non-trivial when the intersection of X and Y is empty.
➢ For example, consider the Employee table above.
➢ Here, { Employee_Id } → { Name } is a non-trivial functional dependency
because Name(dependent) is not a subset of Employee_Id(determinant).
➢ Similarly, { Employee_Id, Name } → { Age } is also a non-trivial functional
dependency.
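
The distinction between trivial, non-trivial and completely non-trivial dependencies only needs set comparisons. A small Python sketch, where the function name classify_fd is an illustrative choice and not something defined in these notes:

```python
def classify_fd(lhs, rhs):
    """Classify the functional dependency lhs -> rhs by comparing attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # Y is a subset of X
    if lhs & rhs:
        return "non-trivial"             # Y is not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y share no attribute


print(classify_fd({"Employee_Id", "Name"}, {"Name"}))         # trivial
print(classify_fd({"Employee_Id"}, {"Name"}))                 # completely non-trivial
print(classify_fd({"Employee_Id", "Name"}, {"Name", "Age"}))  # non-trivial
```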

3. Multivalued Functional Dependency in DBMS

➢ In Multivalued functional dependency, attributes in the dependent set are not
dependent on each other.
➢ For example, given X → { Y, Z }, if there exists no functional dependency between Y
and Z, then it is called a Multivalued functional dependency.
➢ For example, consider the Employee table above.
➢ Here, { Employee_Id } → { Name, Age } is a Multivalued functional dependency,
since the dependent attributes Name and Age are not functionally dependent on each other
(i.e. neither Name → Age nor Age → Name holds).
➢ Note: this usage is different from the multivalued dependency X →→ Y on which the
fourth normal form (4NF) is based.

4. Transitive Functional Dependency in DBMS

➢ Consider two functional dependencies A → B and B → C; then according to
the transitivity axiom A → C must also hold. This is called a transitive functional
dependency.
➢ In other words, the dependent is indirectly dependent on the determinant in a Transitive
functional dependency.
➢ For example, consider an Employee table that also has the attributes Department and
Street Number.

Here, { Employee_Id → Department } and { Department → Street Number } hold
true. Hence, according to the axiom of transitivity, { Employee_Id → Street Number } is a
valid functional dependency.

4. ARMSTRONG’S AXIOMS IN FUNCTIONAL DEPENDENCY / PROPERTIES OF
FUNCTIONAL DEPENDENCY IN DBMS:

William Armstrong in 1974 suggested a few rules related to functional dependency. They are
called RAT rules (Reflexivity, Augmentation, Transitivity).

➢ Reflexivity: If A is a set of attributes and B is a subset of A, then the functional
dependency A → B holds true.
For example, { Employee_Id, Name } → Name is valid.
➢ Augmentation: If a functional dependency A → B holds true, then appending the same
set of attributes to both sides of the dependency doesn't affect the dependency. It
remains true.
For example, if X → Y holds true, then ZX → ZY also holds true.
For example, if { Employee_Id, Name } → { Name } holds true, then { Employee_Id,
Name, Age } → { Name, Age } also holds true.
➢ Transitivity: If two functional dependencies X → Y and Y → Z hold true, then X →
Z also holds true by the rule of Transitivity.
For example, if { Employee_Id } → { Name } holds true and { Name } → {
Department } holds true, then { Employee_Id } → { Department } also holds true.

Secondary Rules

These rules can be derived from the above axioms.

1. Union – The union rule says, if X determines Y and X determines Z, then X must also
determine Y and Z. If X → Y and X → Z then X → YZ.

2. Decomposition – The decomposition rule is also known as the project rule. It is the reverse
of the union rule. It says, if X determines Y and Z, then X determines Y and X determines Z
separately. If X → YZ then X → Y and X → Z.

3. Pseudo Transitivity – In the pseudo transitivity rule, if X determines Y and YZ determines W,
then XZ determines W. If X → Y and YZ → W then XZ → W.
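
The three secondary rules can be sanity-checked with an attribute closure: X → Y follows from a set of dependencies exactly when Y lies inside the closure of X. The Python sketch below assumes single-letter attribute names so that a string such as "YZ" stands for the set {Y, Z}; the helper names closure and implies are illustrative choices. The comments give the derivation of each rule from the three axioms.

```python
def closure(attrs, fds):
    """Attributes derivable from `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from `fds`."""
    return set(rhs) <= closure(lhs, fds)


# Union: X->Y gives X->XY (augment with X); X->Z gives XY->YZ (augment with Y);
# transitivity then gives X->YZ.
union_ok = implies([("X", "Y"), ("X", "Z")], "X", "YZ")

# Decomposition: YZ->Y holds by reflexivity; X->YZ and YZ->Y give X->Y by transitivity.
decomposition_ok = implies([("X", "YZ")], "X", "Y") and implies([("X", "YZ")], "X", "Z")

# Pseudo transitivity: X->Y gives XZ->YZ (augment with Z); with YZ->W, transitivity gives XZ->W.
pseudo_ok = implies([("X", "Y"), ("YZ", "W")], "XZ", "W")

# Z is really needed on the left: X alone does not determine W.
x_alone = implies([("X", "Y"), ("YZ", "W")], "X", "W")

print(union_ok, decomposition_ok, pseudo_ok, x_alone)  # True True True False
```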

ADVANTAGES OF FUNCTIONAL DEPENDENCY IN DBMS

Let's discuss some of the advantages of functional dependency:

➢ It is used to maintain the quality of data in the database.
➢ It expresses the facts about the database design.
➢ It helps in clearly defining the meanings and constraints of databases.
➢ It helps to identify bad designs.
➢ It helps to remove data redundancy, so that the same values are not
repeated at multiple locations in the same database table.
➢ The process of Normalization starts with identifying the candidate keys in the
relation. Without functional dependency, it's impossible to find candidate keys
and normalize the database.

5. CLOSURE OF FUNCTIONAL DEPENDENCY

➢ The closure of a set of attributes X under a given set of functional dependencies F is
the complete set of all attributes that can be functionally derived from X using the
inference rules known as Armstrong’s Rules. It is denoted by “{X}+”.
➢ The closure of the set F itself, denoted by “F+”, is the set of all functional
dependencies that can be inferred from F.

There are three steps to calculate closure of functional dependency. These are:

➢ Step-1 : Add the attributes which are present on Left Hand Side in the original
functional dependency.
➢ Step-2 : Now, add the attributes present on the Right Hand Side of the functional
dependency.
➢ Step-3 : With the help of attributes present on Right Hand Side, check the other
attributes that can be derived from the other given functional dependencies. Repeat this
process until all the possible attributes which can be derived are added in the closure.

Example-1 : Consider the table student_details having (Roll_No, Name, Marks, Location)
as the attributes and having two functional dependencies.

➢ FD1 : Roll_No -> Name
➢ FD2 : Name -> Marks, Location
➢ Now, we will calculate the closure of all the attributes present in the relation using the
three steps mentioned above.
➢ Step-1 : Add attributes present on the LHS of the first functional dependency to the
closure.

{Roll_No}+ = {Roll_No}

➢ Step-2 : Add attributes present on the RHS of the first functional dependency to
the closure.

{Roll_No}+ = {Roll_No, Name}

➢ Step-3 : Add the other attributes which can be derived using the attributes now present
in the closure. Roll_No cannot directly determine any further attribute, but the Name
attribute can determine Marks and Location using the 2nd functional dependency.

➢ Therefore, the complete closure of Roll_No will be :

{Roll_No}+ = {Roll_No, Name, Marks, Location}

Similarly, we can calculate the closure for other attributes too, e.g. “Name”.

Step-1 : Add attributes present on the LHS of the functional dependency to the closure.

{Name}+ = {Name}

Step-2 : Add the attributes present on the RHS of the functional dependency to the closure.

{Name}+ = {Name, Marks, Location}

➢ Step-3 : Since, we don’t have any functional dependency where “Marks or Location”
attribute is functionally determining any other attribute , we cannot add more attributes
to the closure. Hence complete closure of Name would be :

{Name}+ = {Name, Marks, Location}

➢ NOTE : We don’t have any Functional dependency where marks and location can
functionally determine any attribute. Hence, for those attributes we can only add the
attributes themselves in their closures. Therefore,

➢ {Marks}+ = {Marks}
➢ {Location}+ = { Location}

Example :

In a relation R(ABCD) ,given functional dependencies {A->B , B->C , C->D} find closure
of each attribute.

{A}+ = {ABCD}

{B}+ = {BCD}

{C}+ = {CD}

{D}+ = {D}

Here attribute A has all the attributes of the relation in its closure, so it is a candidate key
of the relation.

Example :

In a relation R(ABCD) ,given functional dependencies {A->B , B->C , C->D, D->A} find
closure of each attribute.

{A}+ = {ABCD}

{B}+ = {BCDA}

{C}+ = {CDAB}

{D}+ = {DABC}
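
The three closure steps amount to a loop that keeps adding right-hand sides until nothing changes. The following Python sketch reproduces both R(ABCD) examples; it assumes single-letter attribute names written as strings, and the helper names closure and show are illustrative choices.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)                      # Step-1: start with the attributes themselves
    changed = True
    while changed:                           # Step-3: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # Step-2: if the whole left side is in the closure, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def show(attrs):
    return "".join(sorted(attrs))


# First example: A->B, B->C, C->D.
f1 = [("A", "B"), ("B", "C"), ("C", "D")]
closures_1 = {a: show(closure(a, f1)) for a in "ABCD"}
print(closures_1)  # {'A': 'ABCD', 'B': 'BCD', 'C': 'CD', 'D': 'D'}

# Second example: the extra dependency D->A closes the cycle.
f2 = f1 + [("D", "A")]
closures_2 = {a: show(closure(a, f2)) for a in "ABCD"}
print(closures_2)  # every attribute has closure 'ABCD'

# A single attribute whose closure is the whole relation is a candidate key.
keys_1 = [a for a in "ABCD" if closure(a, f1) == set("ABCD")]
keys_2 = [a for a in "ABCD" if closure(a, f2) == set("ABCD")]
print(keys_1, keys_2)  # ['A'] ['A', 'B', 'C', 'D']
```

In the second example every attribute determines all the others, so each of A, B, C and D is a candidate key on its own.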

6. MINIMAL COVERS:

➢ A minimal cover is a simplified and reduced version of the given set of functional
dependencies.
➢ Since it is a reduced version, it is also called an irreducible set.
➢ It is also called a canonical cover.

Characteristics :

➢ Canonical cover is free from all the extraneous functional dependencies.
➢ The closure of canonical cover is same as that of the given set of functional
dependencies.
➢ Canonical cover is not unique; there may be more than one for a given set of
functional dependencies.
➢ We cannot replace any dependency X->A in F with a dependency Y->A, where Y is a
proper subset of X, and still have a set of dependencies that is equivalent to F.
➢ We cannot remove any dependency from F and still have a set of dependencies that
is equivalent to F.

A canonical cover is also called a minimal cover, i.e. a minimum set of FDs. A set of FDs
is called a canonical cover of F if it is equivalent to F and each FD in it is a:

➢ Simple FD.
➢ Left reduced FD.
➢ Non-redundant FD.

Simple FD − X->Y is a simple FD if Y is a single attribute.

Left reduced FD − X->Y is a left reduced FD if there are no extraneous attributes in X.
{extraneous attribute: given XA->Y, A is an extraneous attribute if X->Y already holds}

Non-redundant FD − X->Y is a non-redundant FD if it cannot be derived from F - {X->Y}.

Need :

• Working with the set containing extraneous functional dependencies increases the
computation time.
• Therefore, the given set is reduced by eliminating the useless functional
dependencies.
• This reduces the computation time and working with the irreducible set becomes
easier.

Steps To Find Canonical Cover-

Step-01:
Write the given set of functional dependencies in such a way that each functional dependency
contains exactly one attribute on its right side.

Example-
The functional dependency X → YZ will be written as-
X → Y
X → Z

Step-02:

➢ Consider each functional dependency one by one from the set obtained in Step-01.
➢ Determine whether it is essential or non-essential.

To determine whether a functional dependency is essential or not, compute the closure of its
left side-
• Once by considering that the particular functional dependency is present in the set
• Once by considering that the particular functional dependency is not present in the set

Then following two cases are possible-


Case-01: Results Come Out to be Same-
• If the results come out to be the same, it means that the presence or absence of that
functional dependency does not create any difference.
• Thus, it is non-essential.
• Eliminate that functional dependency from the set.
NOTE-
• Eliminate the non-essential functional dependency from the set as soon as it is
discovered.
• Do not consider it while checking the essentiality of other functional dependencies.
Case-02: Results Come Out to be Different-
If results come out to be different,
• It means that the presence or absence of that functional dependency creates a
difference.
• Thus, it is essential.
• Do not eliminate that functional dependency from the set.
• Mark that functional dependency as essential.
Step-03:

• Consider the newly obtained set of functional dependencies after performing Step-02.
• Check if there is any functional dependency that contains more than one attribute on
its left side.
Then following two cases are possible-
Case-01: No-
• There exists no functional dependency containing more than one attribute on its
left side.
• In this case, the set obtained in Step-02 is the canonical cover.

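Case-02: Yes-
• There exists at least one functional dependency containing more than one attribute on
its left side.
• In this case, consider all such functional dependencies one by one.
• Check if their left side can be reduced.
• If a left side is reduced, repeat Step-02 on the resulting set, because the reduction may
make another functional dependency non-essential.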

PRACTICE PROBLEM BASED ON FINDING CANONICAL COVER-
Problem-

The following functional dependencies hold true for the relational scheme R ( W , X , Y , Z )

X→W
WZ → XY
Y → WXZ
Write the irreducible equivalent for this set of functional dependencies.
Solution-
Step-01:

Write all the functional dependencies such that each contains exactly one attribute on its right
side-
X→W
WZ → X
WZ → Y
Y→W
Y→X
Y→Z
Step-02:
Check the essentiality of each functional dependency one by one.
For X → W:

• Considering X → W, (X)+ = { X , W }
• Ignoring X → W, (X)+ = { X }
Now,
• Clearly, the two results are different.
• Thus, we conclude that X → W is essential and can not be eliminated.

For WZ → X:

• Considering WZ → X, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → X, (WZ)+ = { W , X , Y , Z }
Now,
• Clearly, the two results are same.
• Thus, we conclude that WZ → X is non-essential and can be eliminated.

Eliminating WZ → X, our set of functional dependencies reduces to-
X → W
WZ → Y
Y → W
Y → X
Y → Z
Now, we will consider this reduced set in further checks.

For WZ → Y:

• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → Y, (WZ)+ = { W , Z }
Now,
• Clearly, the two results are different.
• Thus, we conclude that WZ → Y is essential and can not be eliminated.

For Y → W:

• Considering Y → W, (Y)+ = { W , X , Y , Z }
• Ignoring Y → W, (Y)+ = { W , X , Y , Z }
Now,
• Clearly, the two results are same.
• Thus, we conclude that Y → W is non-essential and can be eliminated.

Eliminating Y → W, our set of functional dependencies reduces to-
X → W
WZ → Y
Y → X
Y → Z

For Y → X:

• Considering Y → X, (Y)+ = { W , X , Y , Z }
• Ignoring Y → X, (Y)+ = { Y , Z }
Now,
• Clearly, the two results are different.
• Thus, we conclude that Y → X is essential and can not be eliminated.

For Y → Z:

• Considering Y → Z, (Y)+ = { W , X , Y , Z }
• Ignoring Y → Z, (Y)+ = { W , X , Y }
Now,
• Clearly, the two results are different.
• Thus, we conclude that Y → Z is essential and can not be eliminated.

From here, our essential functional dependencies are-
X→W
WZ → Y
Y→X
Y→Z

Step-03:

• Consider the functional dependencies having more than one attribute on their left side.
• Check if their left side can be reduced.
In our set,
• Only WZ → Y contains more than one attribute on its left side.
• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
Now,
• Consider all the possible proper subsets of WZ.
• Check if the closure of any subset matches the closure of WZ.
(W)+ = { W }
(Z)+ = { Z }
Clearly,
• None of the subsets has the same closure as the entire left side.
• Thus, we conclude that we can not write WZ → Y as W → Y or Z → Y.
• Thus, the set of functional dependencies obtained in Step-02 is the canonical cover.

Finally, the canonical cover is-
X → W
WZ → Y
Y → X
Y → Z

Example 1:

Consider an example to find canonical cover of F.

The given functional dependencies are as follows –

A -> BC, B -> C ,A -> B ,AB -> C

Example 2:
Minimize {A->C, AC->D, E->H, E->AD}
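
The procedure can be automated. The Python sketch below is one possible implementation, with two caveats stated openly: it assumes single-letter attribute names, and it left-reduces the dependencies before removing the non-essential ones (the common textbook order), whereas the steps above do it the other way round. For the three sets of dependencies in this section both orders lead to the same cover. The function names are illustrative choices.

```python
def closure(attrs, fds):
    """Attributes derivable from `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: exactly one attribute on each right side; skip trivial and duplicate FDs.
    split = []
    for lhs, rhs in fds:
        for attr in rhs:
            fd = (frozenset(lhs), attr)
            if attr not in fd[0] and fd not in split:
                split.append(fd)

    # Step 2: left-reduce. An attribute b is extraneous if the rest of the
    # left side still determines the right side.
    reduced = []
    for lhs, attr in split:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and attr in closure(lhs - {b}, split):
                lhs.remove(b)
        fd = (frozenset(lhs), attr)
        if fd not in reduced:
            reduced.append(fd)

    # Step 3: drop every FD that the remaining ones already imply (non-essential).
    cover = list(reduced)
    for fd in reduced:
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return [("".join(sorted(lhs)), attr) for lhs, attr in cover]


practice = minimal_cover([("X", "W"), ("WZ", "XY"), ("Y", "WXZ")])
example_1 = minimal_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
example_2 = minimal_cover([("A", "C"), ("AC", "D"), ("E", "H"), ("E", "AD")])

print(practice)   # [('X', 'W'), ('WZ', 'Y'), ('Y', 'X'), ('Y', 'Z')]
print(example_1)  # [('A', 'B'), ('B', 'C')]
print(example_2)  # [('A', 'C'), ('A', 'D'), ('E', 'H'), ('E', 'A')]
```

Because a canonical cover is not unique, a different processing order may return a different but equivalent set.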

7. NORMALIZATION

Normalization is the process of organizing the data and the attributes of a database. It is
performed to reduce the data redundancy in a database and to ensure that data is stored
logically.

➢ It helps to divide large database tables into smaller tables and define relationships
between them. This removes redundant data and makes it easier to add, manipulate or
delete table fields.
➢ Data redundancy in DBMS means having the same data at multiple places.
➢ It is necessary to remove data redundancy because it causes anomalies in a database
which make it very hard for a database administrator to maintain it.
➢ A normal form is a set of criteria against which each relation is evaluated; normalizing
to a given normal form removes the undesirable dependencies (such as partial and
transitive functional dependencies) from the relation.

THE MOTIVATION FOR NORMAL FORMS:

The motivation for normal forms in database design is to eliminate data redundancy and
anomalies, ensure data integrity, and promote efficient data management.

Normal forms provide guidelines and principles for structuring the database schema to achieve
these objectives. Here are some key motivations for normal forms:

➢ Eliminate data redundancy: Redundant data occurs when the same information is
repeated across multiple records or tables. This redundancy wastes storage space and
can lead to inconsistencies when updating or modifying data. Normal forms help
identify and eliminate redundant data by organizing data into separate tables based on
their functional dependencies.
➢ Prevent update anomalies: Update anomalies occur when modifying data results in
inconsistencies or unintended changes. For example, if the same data is stored in
multiple places and not all instances are updated correctly, inconsistencies can arise.
Normal forms help prevent these anomalies by ensuring that data is stored in a way that
allows for easy and controlled updates without introducing inconsistencies.
➢ Maintain data integrity: Data integrity refers to the accuracy, validity, and consistency
of data. Normal forms help enforce data integrity by defining appropriate constraints,
such as primary keys, foreign keys, and entity relationships. These constraints ensure
that data is correctly and consistently represented, preventing the insertion of invalid or
inconsistent data.
➢ Simplify data management and maintenance: Normal forms provide guidelines for
organizing data in a logical and structured manner. By following these guidelines,
database management and maintenance tasks become more manageable and less prone
to errors. Normalized schemas are typically easier to understand, navigate, and modify,
reducing the complexity and effort required for data management activities.
➢ Support efficient query processing: Normal forms can contribute to improved query
performance and efficiency. By reducing data redundancy and organizing data based on
functional dependencies, normalized schemas allow for more efficient retrieval and
manipulation of data. Well-designed indexes and relationships based on normal forms
can speed up query execution and improve overall system performance.
➢ Facilitate data integration and interoperability: Normalized schemas provide a
standardized and consistent way of representing data. This promotes data integration
and interoperability across different systems and applications. By adhering to normal
forms, databases can easily exchange and share data without conflicts or
inconsistencies, enabling seamless integration and collaboration.
➢ Adapt to evolving data requirements: Normal forms provide a foundation for a flexible
and extensible database design. By organizing data based on functional dependencies and
avoiding data anomalies, normalized schemas can adapt to changing data requirements
without significant schema modifications. This scalability and flexibility are crucial for
accommodating future data growth and evolving business needs.

In summary, the motivations for normal forms in database design are to eliminate data
redundancy and anomalies, ensure data integrity, simplify data management, support
efficient query processing, facilitate data integration, and adapt to evolving data
requirements. By following normal forms, database designers can create well-structured
and efficient database schemas that promote reliable and effective data management.

Why Do We Need Normalization?

As we have discussed above, normalization is used to reduce data redundancy. It provides a
method to remove the following anomalies from the database and bring it to a more
consistent state. A database anomaly is a flaw in the database that occurs because of poor
planning and redundancy.

Insertion anomalies: This occurs when we are not able to insert data into a database because
some attributes may be missing at the time of insertion.

Updation anomalies: This occurs when the same data item is repeated in several rows that
are not linked to each other, so that updating one copy without the others leaves the data
inconsistent.

Deletion anomalies: This occurs when deleting one part of the data also deletes other
necessary information from the database.

8. NORMAL FORMS:

The process of normalization helps us divide a larger table in the database into various smaller
tables and then link them using relationships. Normal forms are basically useful for reducing
the overall redundancy (repeating data) from the tables present in a database, so as to ensure
logical storage.

There are four normal forms that are usually used in relational databases: 1NF, 2NF, 3NF
and BCNF.

FIRST NORMAL FORM (1NF)

➢ A relation is in 1NF if every attribute is a single-valued attribute, i.e., it does not contain
any multi-valued or composite attribute and every attribute is atomic.
➢ If there is a composite or multi-valued attribute, it violates 1NF. To solve this, we
can create a new row for each of the values of the multi-valued attribute to convert the
table into 1NF.

Let’s take an example of a relational table <CourseDetail> that contains the details of the
course.

➢ Here, the Course Content is a multi-valued attribute. So, this relation is not in 1NF.
➢ To convert this table into 1NF, we re-arrange the relation (table) so that each Course
Content value gets its own row.
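
The conversion to 1NF can be sketched in a few lines of Python. The course rows below are made-up sample data (the original table is not reproduced in these notes), and the column names Course_Code, Course_Name and Course_Content are assumptions chosen for illustration.

```python
def to_1nf(rows, multi_attr):
    """Create one output row per value of the multi-valued attribute `multi_attr`."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)          # copy the single-valued attributes
            new_row[multi_attr] = value  # keep exactly one atomic value per row
            flat.append(new_row)
    return flat


# Hypothetical <CourseDetail> rows: Course_Content holds several values per course.
course_detail = [
    {"Course_Code": "C1", "Course_Name": "Programming", "Course_Content": ["Java", "C++"]},
    {"Course_Code": "C2", "Course_Name": "Web", "Course_Content": ["HTML", "PHP", "ASP"]},
]

flat = to_1nf(course_detail, "Course_Content")
for row in flat:
    print(row)
```

After the conversion, Course_Code alone no longer identifies a row; the pair (Course_Code, Course_Content) does.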

SECOND NORMAL FORM (2NF)

➢ The normalization of 1NF relations to 2NF involves the elimination of partial
dependencies.
➢ A partial dependency in DBMS exists when a non-prime attribute, i.e., an
attribute that is not part of any candidate key, is functionally dependent on only a part
(a proper subset) of a candidate key instead of the whole key.

For a relational table to be in second normal form, it must satisfy the following rules:

➢ The table must be in first normal form.
➢ It must not contain any partial dependency, i.e., all non-prime attributes are fully
functionally dependent on the primary key.

If a partial dependency exists, we can divide the table to remove the partially dependent
attributes and move them to some other table where they fit in well.

Example : Student_Project relation

We see here in the Student_Project relation that the prime (key) attributes are Stu_ID and
Proj_ID. According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be
dependent upon both and not on any of the prime attributes individually.

But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in Second
Normal Form.

To remove it, we break the relation in two, so that Stu_Name is stored with Stu_ID and
Proj_Name is stored with Proj_ID. Then there exists no partial dependency.
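
Partial dependencies can be found by computing the closure of every proper subset of the key. The Python sketch below does this for the Student_Project relation; it assumes the relation has a single candidate key, so that every attribute outside the key is non-prime, and the function names are illustrative choices.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes derivable from `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """List (part_of_key, attribute) pairs where a non-key attribute depends on part of the key."""
    key = set(key)
    non_key = set(attributes) - key
    found = []
    for size in range(1, len(key)):                  # proper, non-empty subsets only
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_key):
                found.append((part, attr))
    return found


attributes = ["Stu_ID", "Proj_ID", "Stu_Name", "Proj_Name"]
fds = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]

violations = partial_dependencies({"Stu_ID", "Proj_ID"}, attributes, fds)
print(violations)  # [(('Proj_ID',), 'Proj_Name'), (('Stu_ID',), 'Stu_Name')]

# After the split, a relation keyed by Stu_ID alone has no partial dependency.
student_only = partial_dependencies({"Stu_ID"}, ["Stu_ID", "Stu_Name"], fds)
print(student_only)  # []
```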

THIRD NORMAL FORM (3NF)

The normalization of 2NF relations to 3NF involves the elimination of transitive
dependencies in DBMS.

A functional dependency X -> Z is said to be transitive if the following three conditions
hold:

• X -> Y
• Y does not -> X
• Y -> Z

For a relational table to be in third normal form, it must satisfy the following rules:

1. The table must be in the second normal form.
2. No non-prime attribute is transitively dependent on the primary key.
3. For each non-trivial functional dependency X -> Z, at least one of the following conditions holds:
• X is a super key of the table.
• Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table to remove the transitively dependent
attributes and place them to a new table along with a copy of the determinant.

Example :

We find that in the Student_detail relation, Stu_ID is the key and the only prime
attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Zip is not
a superkey and City is not a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists a transitive dependency.

To bring this relation into third normal form, we break the relation into two relations as
follows –

The 2NF and 3NF impose some extra conditions on dependencies on candidate keys and
remove redundancy caused by that. However, there may still exist some dependencies that
cause redundancy in the database. These redundancies are removed by a more strict normal
form known as BCNF.

BOYCE-CODD NORMAL FORM (BCNF)

➢ Boyce-Codd Normal Form (BCNF) is an advanced version of 3NF as it contains
additional constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:

1. The table must be in the third normal form.
2. For every non-trivial functional dependency X -> Y, X is a superkey of the table.
That means X cannot be a non-prime attribute if Y is a prime attribute.

Let us take an example of the following <EmployeeProjectLead> table to understand how to
normalize the table to the BCNF:

The above table satisfies all the normal forms up to 3NF, but it violates the rules of BCNF. The
candidate key of the table is {Employee Code, Project ID}. For the non-trivial functional
dependency Project Leader -> Project ID, Project ID is a prime attribute but Project Leader is a
non-prime attribute and therefore not a superkey. This is not allowed in BCNF.
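
The violation can be confirmed with an attribute-closure check. The sketch below assumes the table has only the three attributes named in the text, and that the candidate key determines Project Leader; the helper names `closure` and `bcnf_violations` are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != relation
    ]


# Assumed schema: just the three attributes mentioned in the text.
relation = {"Employee Code", "Project ID", "Project Leader"}
fds = [
    # Implied by {Employee Code, Project ID} being the candidate key.
    ({"Employee Code", "Project ID"}, {"Project Leader"}),
    # The dependency that the text identifies as the problem.
    ({"Project Leader"}, {"Project ID"}),
]

violations = bcnf_violations(relation, fds)
print(violations)
```

Only Project Leader -> Project ID is reported: the closure of {Project Leader} does not include Employee Code, so Project Leader is not a superkey.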

To convert the given table into BCNF, we decompose it into two tables:

Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into
<EmployeeProject> and <ProjectLead> tables.

9. DECOMPOSITIONS AND DESIRABLE PROPERTIES

➢ A relation in BCNF is free of redundancy and a relation schema in 3NF comes close. If
a relation schema is not in one of these normal forms, the FDs that cause a violation
can give us insight into the potential problems.
➢ A decomposition of a relation schema R consists of replacing the relation schema by
two (or more) relation schemas that each contain a subset of the attributes of R and
together include all attributes in R.
➢ When a relation in the relational model is not in an appropriate normal form, it must be
decomposed. In a database, breaking a table down into multiple tables is termed
decomposition.

The properties of a relational decomposition are listed below :

Attribute Preservation:

➢ Using functional dependencies, the algorithms decompose the universal relation schema
R into a set of relation schemas D = { R1, R2, ..., Rn }, which becomes the relational
database schema; D is called the decomposition of R.
➢ Each attribute in R must appear in at least one relation schema Ri in the decomposition,
i.e., no attribute is lost. This is called the Attribute Preservation condition of
decomposition.
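
Attribute preservation amounts to a set-union check. A minimal sketch, assuming an illustrative schema R(A, B, C, D); the function name `preserves_attributes` is hypothetical.

```python
def preserves_attributes(relation, decomposition):
    """True when the union of all Ri equals the attribute set of R."""
    covered = set().union(*decomposition)
    return covered == set(relation)


R = {"A", "B", "C", "D"}
kept_all = preserves_attributes(R, [{"A", "B", "C"}, {"A", "D"}])
lost_one = preserves_attributes(R, [{"A", "B"}, {"A", "D"}])  # attribute C is missing
print(kept_all, lost_one)
```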

Dependency Preservation:

➢ A decomposition D is dependency preserving if each functional dependency X->Y
specified in F either appears directly in one of the relation schemas Ri in D or can be
inferred from the dependencies that appear in some Ri.
➢ If a relation R is decomposed into relations R1 and R2, then every dependency of R must
either be a part of R1 or R2 or be derivable from the combination of the functional
dependencies of R1 and R2.

For example, suppose there is a relation R (A, B, C, D) with functional dependency set
{A->BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is
dependency preserving because the FD A->BC is a part of relation R1(ABC).
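
The same check can be carried out programmatically. The sketch below uses the usual textbook procedure of growing the closure of X using only what each Ri can enforce on its own; the function names are hypothetical, and the second, non-preserving example (R(A, B, C) split into AB and AC) is an added assumption for contrast.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def closure_under_projections(attrs, fds, decomposition):
    """Closure of attrs using only dependencies that hold inside some Ri."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # What Ri alone lets us derive from the attributes known so far.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """True when every X -> Y in fds is enforceable on the decomposition."""
    return all(
        rhs <= closure_under_projections(lhs, fds, decomposition)
        for lhs, rhs in fds
    )


# Example from the text: R(A, B, C, D), F = {A -> BC}, R1(ABC), R2(AD).
text_example = preserves_dependencies(
    [({"A"}, {"B", "C"})], [{"A", "B", "C"}, {"A", "D"}]
)

# Assumed counter-example: A -> B, B -> C split into AB and AC loses B -> C.
counter_example = preserves_dependencies(
    [({"A"}, {"B"}), ({"B"}, {"C"})], [{"A", "B"}, {"A", "C"}]
)
print(text_example, counter_example)
```

In the counter-example, B and C never appear together in one relation and B -> C cannot be derived from the projected dependencies, so checking it would require a join.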

Lossless Join:

➢ Lossless join property is a feature of decomposition supported by normalization. It is
the ability to ensure that any instance of the original relation can be recovered from the
corresponding instances in the smaller relations.

Notation: R is a relation, F is a set of functional dependencies on R, and X and Y are the two
schemas in a decomposition of R.

A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition for R if
the natural join of R1, R2, …, Rn produces exactly the relation R.
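
The definition can be tried out on actual tuples. The following sketch uses a made-up instance of R(A, B, C) and hypothetical helpers `project` and `natural_join`; it joins two different decompositions and compares each result with the original relation.

```python
def project(rows, attrs):
    """Project dict rows onto attrs; duplicates collapse because a set is returned."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations stored as sets of (attribute, value) tuples."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            common = ld.keys() & rd.keys()
            # Rows combine only when they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in common):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


# Made-up instance: A is unique, but both rows share the same B value.
rows = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "x", "C": "q"},
]
original = project(rows, ["A", "B", "C"])

# Splitting on A: the join gives back exactly the original rows.
via_a = natural_join(project(rows, ["A", "B"]), project(rows, ["A", "C"]))

# Splitting on B: the join also produces rows that were never in R.
via_b = natural_join(project(rows, ["A", "B"]), project(rows, ["B", "C"]))

print(via_a == original)
print(len(via_b) - len(original))
```

The second decomposition is lossy: joining on B pairs every A value with every C value, which adds two spurious rows to the two original ones.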

➢ A decomposition is said to be lossless if the natural join of all the decomposed relations
gives back the original relation.

Decomposition of R into X and Y is lossless if at least one of the following holds:

➢ X intersection Y -> X, that is: all attributes common to both X and Y functionally
determine ALL the attributes in X.
➢ X intersection Y -> Y, that is: all attributes common to both X and Y functionally
determine ALL the attributes in Y.
➢ Equivalently, if X intersection Y forms a super key of either X or Y, the decomposition
of R is a lossless decomposition.
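
This condition can be tested using functional dependencies alone, without looking at any data. A minimal sketch, assuming the schema R(A, B, C, D) with F = {A -> BC}; the function name `is_lossless` and the second, lossy split are illustrative assumptions.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(x, y, fds):
    """True when the common attributes of X and Y determine all of X or all of Y."""
    determined = closure(x & y, fds)
    return x <= determined or y <= determined


F = [({"A"}, {"B", "C"})]

# Common attribute A determines all of ABC, so this split is lossless.
good_split = is_lossless({"A", "B", "C"}, {"A", "D"}, F)

# No common attribute at all, so nothing ties the two parts together.
bad_split = is_lossless({"A", "B"}, {"C", "D"}, F)
print(good_split, bad_split)
```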

Lack of Data Redundancy

➢ Data redundancy is also known as repetition of information.
➢ A proper decomposition should not suffer from any data redundancy.
➢ A careless decomposition may cause problems with the data.
➢ The lack of data redundancy property may be achieved through the normalization process.
