UNIT III
NORMALIZATION OF DATABASE TABLES
Database normalization is a systematic process for organizing data in a relational
database to reduce data redundancy and improve data integrity. It involves
breaking down large tables into smaller, well-structured tables and defining
relationships between them. This process helps to eliminate undesirable
characteristics such as insertion, update, and deletion anomalies, and enhances the
overall efficiency and consistency of the database.
The process of normalization adheres to a series of "normal forms," each building
upon the previous one with stricter rules for data organization. The most
commonly applied normal forms are:
First Normal Form (1NF):
Eliminates repeating groups within tables, ensures each column contains atomic
values, and requires a primary key for each table.
Second Normal Form (2NF):
Requires the table to be in 1NF and all non-key attributes to be fully functionally
dependent on the primary key, eliminating partial dependencies.
Third Normal Form (3NF):
Requires the table to be in 2NF and eliminates transitive dependencies, meaning
non-key attributes should not depend on other non-key attributes.
Boyce-Codd Normal Form (BCNF):
A stricter version of 3NF, requiring that every determinant (attribute or set of
attributes that determines another attribute) be a candidate key.
By applying these normal forms, database designers can create a more robust and
manageable database system that minimizes data duplication, ensures data
consistency, and facilitates efficient data retrieval and manipulation.
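To make the decomposition concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (orders_flat, customers, orders) and the sample rows are hypothetical, invented purely for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Unnormalized: customer details repeat on every order row (redundancy).
cur.execute("""CREATE TABLE orders_flat (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    customer_name TEXT,      -- depends only on customer_id, not on order_id
    customer_city TEXT,      -- same problem: a transitive dependency
    product TEXT)""")
cur.executemany("INSERT INTO orders_flat VALUES (?,?,?,?,?)",
    [(1, 10, "Asha", "Pune", "Pen"),
     (2, 10, "Asha", "Pune", "Book"),
     (3, 11, "Ravi", "Delhi", "Pen")])

# Normalized: each customer fact is stored once and referenced by key.
cur.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    customer_city TEXT)""")
cur.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    product TEXT)""")
cur.execute("""INSERT INTO customers
               SELECT DISTINCT customer_id, customer_name, customer_city
               FROM orders_flat""")
cur.execute("""INSERT INTO orders
               SELECT order_id, customer_id, product FROM orders_flat""")

# The original rows can still be reconstructed with a join (lossless).
rows = cur.execute("""SELECT o.order_id, c.customer_name, o.product
                      FROM orders o JOIN customers c
                      ON o.customer_id = c.customer_id
                      ORDER BY o.order_id""").fetchall()
print(rows)
```

After the split, "Asha" and "Pune" are stored once in customers instead of on every order row, yet the join recovers all the original information.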
Introduction to Database Normalization
Normalization is an important process in database design that helps improve the
database's efficiency, consistency, and accuracy. It makes it easier to manage and
maintain the data and ensures that the database is adaptable to changing business
needs.
Database normalization is the process of organizing the attributes of the
database to reduce or eliminate data redundancy (having the same data but at
different places).
Data redundancy unnecessarily increases the size of the database as the same
data is repeated in many places. Inconsistency problems also arise during insert,
delete, and update operations.
In the relational model, there exist standard methods to quantify how efficient a database is. These methods are called normal forms, and there are algorithms to convert a given database into normal forms.
Normalization generally involves splitting a table into multiple smaller tables, which must be joined whenever a query requires data from the split tables.
Why do we need Normalization?
The primary objective of normalizing relations is to eliminate the anomalies described below. Failure to reduce anomalies results in data redundancy, which may threaten data integrity and cause additional issues as the database grows.
Normalization consists of a set of procedures that assist you in developing an
effective database structure.
Insertion Anomalies: Insertion anomalies occur when certain data cannot be inserted into the database without also supplying other, unrelated data. For example, if a single table stores both student and course details, a new course cannot be recorded until at least one student enrolls in it.
Deletion anomalies: Deletion anomalies occur when deleting a record from a
database and can result in the unintentional loss of data. For example, if a
database contains information about customers and orders, deleting a customer
record may also delete all the orders associated with that customer.
Updation anomalies: Updation anomalies occur when modifying data in a
database and can result in inconsistencies or errors. For example, if a database
contains information about employees and their salaries, updating an
employee’s salary in one record but not in all related records could lead to
incorrect calculations and reporting.
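The deletion anomaly described above can be demonstrated in a few lines. The single-table design (customer_orders) and its data are hypothetical, sketched with Python's built-in sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# One wide table mixing customer facts with order facts (a bad design).
cur.execute("""CREATE TABLE customer_orders (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    customer_phone TEXT,
    product TEXT)""")
cur.execute("INSERT INTO customer_orders VALUES (1, 'Ravi', '555-0101', 'Pen')")

# Deletion anomaly: removing Ravi's only order also erases his phone number,
# because the customer fact has no row of its own.
cur.execute("DELETE FROM customer_orders WHERE order_id = 1")
remaining = cur.execute(
    "SELECT COUNT(*) FROM customer_orders WHERE customer_name = 'Ravi'"
).fetchone()[0]
print(remaining)  # 0 -- the customer's contact data is gone with the order
```

In a normalized design, the customer row would survive in its own table even after all of that customer's orders are deleted.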
Before Normalization: The table is prone to redundancy and anomalies (insertion,
update, and deletion).
After Normalization: The data is divided into logical tables to ensure consistency, avoid redundancy, and remove anomalies, making the database efficient and reliable.
Features of Database Normalization
Elimination of Data Redundancy: One of the main features of normalization
is to eliminate the data redundancy that can occur in a database. Data
redundancy refers to the repetition of data in different parts of the database.
Normalization helps in reducing or eliminating this redundancy, which can
improve the efficiency and consistency of the database.
Ensuring Data Consistency: Normalization helps in ensuring that the data in
the database is consistent and accurate. By eliminating redundancy,
normalization helps in preventing inconsistencies and contradictions that can
arise due to different versions of the same data.
Simplification of Data Management: Normalization simplifies the process of
managing data in a database. By breaking down a complex data structure into
simpler tables, normalization makes it easier to manage the data, update it, and
retrieve it.
Improved Database Design: Normalization helps in improving the overall
design of the database. By organizing the data in a structured and systematic
way, normalization makes it easier to design and maintain the database. It also
makes the database more flexible and adaptable to changing business needs.
Avoiding Update Anomalies: Normalization helps in avoiding update
anomalies, which can occur when updating a single record in a table affects
multiple records in other tables. Normalization ensures that each table contains
only one type of data and that the relationships between the tables are clearly
defined, which helps in avoiding such anomalies.
Standardization: Normalization helps in standardizing the data in the database.
By organizing the data into tables and defining relationships between them,
normalization helps in ensuring that the data is stored in a consistent and
uniform manner.
Normal Forms in DBMS
First Normal Form (1NF):
A relation is in first normal form if every attribute in that relation is a single-valued attribute.
Second Normal Form (2NF):
A relation is in second normal form if it is in First Normal Form and every non-primary-key attribute is fully functionally dependent on the primary key.
Third Normal Form (3NF):
A relation is in third normal form if it is in second normal form and there is no transitive dependency for non-prime attributes. Equivalently, a relation is in 3NF if, in every non-trivial functional dependency X -> Y, at least one of the following conditions holds:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).
Boyce-Codd Normal Form (BCNF):
For BCNF, the relation should satisfy the below conditions:
The relation should be in the 3rd Normal Form.
X should be a super-key for every functional dependency (FD) X -> Y in the given relation.
Fourth Normal Form (4NF):
A relation R is in 4NF if and only if the following conditions are satisfied:
It should be in Boyce-Codd Normal Form (BCNF).
The table should not have any multi-valued dependency.
Fifth Normal Form (5NF):
A relation R is in 5NF if and only if it satisfies the following conditions:
R should already be in 4NF.
It cannot be further losslessly decomposed (no non-trivial join dependency).
Advantages of Normalization
Normalization eliminates data redundancy and ensures that each piece of data is
stored in only one place, reducing the risk of data inconsistency and making it
easier to maintain data accuracy.
By breaking down data into smaller, more specific tables, normalization helps
ensure that each table stores only relevant data, which improves the overall data
integrity of the database.
Normalization simplifies the process of updating data, as it only needs to be
changed in one place rather than in multiple places throughout the database.
Normalization enables users to query the database using a variety of different
criteria, as the data is organized into smaller, more specific tables that can be
joined together as needed.
Normalization can help ensure that data is consistent across different
applications that use the same database, making it easier to integrate different
applications and ensuring that all users have access to accurate and consistent
data.
Disadvantages of Normalization
Normalization can result in increased performance overhead due to the need for
additional join operations and the potential for slower query execution times.
Normalization can result in the loss of data context, as data may be split across
multiple tables and require additional joins to retrieve.
Proper implementation of normalization requires expert knowledge of database
design and the normalization process.
Normalization can increase the complexity of a database design, especially if
the data model is not well understood or if the normalization process is not
carried out correctly.
Higher level normal forms in DBMS include the Fifth Normal Form (5NF) and
even higher forms like Sixth Normal Form (6NF), which progressively address
more complex dependencies like join dependencies and temporal data, ensuring
greater data integrity by breaking down tables to their smallest possible
components. The process of normalization is progressive, meaning a table must
satisfy the conditions of lower normal forms before it can meet the requirements of
a higher one.
Common Higher Normal Forms
Boyce-Codd Normal Form (BCNF):
A stricter version of Third Normal Form (3NF), BCNF ensures that for every
functional dependency (X → Y), the left side (X) is a superkey.
Fourth Normal Form (4NF):
Deals with multivalued dependencies, where one attribute determines a set of independent values for another attribute.
Fifth Normal Form (5NF):
Also known as Project-Join Normal Form (PJNF), it addresses join dependencies, which occur when a table can be losslessly decomposed into smaller tables and then reconstituted.
Sixth Normal Form (6NF):
An even more advanced form focused on independently handling temporal data
(time-varying data) and ensuring that no further decomposition of a table is
possible.
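The effect of a 4NF decomposition can be sketched with a hypothetical example: an employee's skills and spoken languages are independent multivalued facts, so storing them in one table forces a cross product of rows. The table names and data below are invented for illustration, using Python's built-in sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Violates 4NF: skill and language are independent multivalued facts about
# emp, so every skill must be paired with every language (row explosion).
cur.execute("CREATE TABLE emp_flat (emp TEXT, skill TEXT, language TEXT)")
cur.executemany("INSERT INTO emp_flat VALUES (?,?,?)",
    [("Asha", "SQL", "Hindi"), ("Asha", "SQL", "English"),
     ("Asha", "Java", "Hindi"), ("Asha", "Java", "English")])

# 4NF decomposition: one table per independent multivalued dependency.
cur.execute("CREATE TABLE emp_skill (emp TEXT, skill TEXT)")
cur.execute("CREATE TABLE emp_language (emp TEXT, language TEXT)")
cur.execute("INSERT INTO emp_skill SELECT DISTINCT emp, skill FROM emp_flat")
cur.execute("INSERT INTO emp_language SELECT DISTINCT emp, language FROM emp_flat")

# 2 + 2 rows now replace the 4-row cross product, with nothing lost:
# joining the two tables back on emp reproduces the original rows.
print(cur.execute("SELECT COUNT(*) FROM emp_skill").fetchone()[0])     # 2
print(cur.execute("SELECT COUNT(*) FROM emp_language").fetchone()[0])  # 2
```

With more skills and languages the savings grow: n skills and m languages need n + m rows instead of n x m.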
Why Use Higher Normal Forms?
Reduce Redundancy: Each higher normal form eliminates specific types of data
redundancy that can lead to anomalies.
Improve Data Integrity: By minimizing data duplication, the risk of inconsistent
data is reduced.
Handle Complex Relationships: Higher forms address more intricate
relationships and dependencies within the data, leading to a more robust database
design.
Trade-offs
While higher normal forms provide benefits like improved integrity, they can also
lead to more complex database schemas and may negatively impact query
performance due to the increased number of joins required. It's crucial to balance
the desire for high normalization with the practical needs of the specific
application.
DDL Commands & Syntax
In this article, we will give an overview of DDL commands and explain the commands CREATE, ALTER, TRUNCATE, and DROP. We will cover each command's syntax with the help of an example for better understanding. Let's discuss them one by one.
Overview :
Data Definition Language (DDL) is a subset of SQL and a part of a DBMS (Database Management System). DDL consists of commands like CREATE, ALTER, TRUNCATE, and DROP. These commands are used to create or modify the tables in SQL.
DDL Commands :
In this section, we will cover the following DDL commands.
1. Create
2. Alter
3. Truncate
4. Drop
5. Rename
Let's discuss them one by one.
Command-1 :
CREATE :
This command is used to create a new table in SQL. The user has to give
information like table name, column names, and their datatypes.
Syntax -
CREATE TABLE table_name
(
column_1 datatype,
column_2 datatype,
column_3 datatype,
....
);
Example -
We need to create a table for storing student information of a particular college. The CREATE syntax would be as below.
CREATE TABLE Student_info
(
College_Id number(2),
College_name varchar(30),
Branch varchar(10)
);
Command-2 :
ALTER :
This command is used to add, delete or change columns in the existing table. The
user needs to know the existing table name and can do add, delete or modify tasks
easily.
Syntax -
Syntax to add a column to an existing table.
ALTER TABLE table_name
ADD column_name datatype;
Example -
In our Student_info table, we want to add a new column for CGPA. The syntax would be as follows.
ALTER TABLE Student_info
ADD CGPA number;
Command-3 :
TRUNCATE :
This command is used to remove all rows from the table, but the structure of the
table still exists.
Syntax -
Syntax to remove all rows from an existing table.
TRUNCATE TABLE table_name;
Example -
The College Authority wants to remove the details of all students for new batches
but wants to keep the table structure. The command they can use is as follows.
TRUNCATE TABLE Student_info;
Command-4 :
DROP :
This command is used to remove an existing table along with its structure from the
Database.
Syntax -
Syntax to drop an existing table.
DROP TABLE table_name;
Example -
If the College Authority wants to change their Database by deleting the
Student_info Table.
DROP TABLE Student_info;
Command-5 :
RENAME :
This command is used to change the name of an existing table, with or without data in it. A table can be renamed at any point in time.
Syntax -
RENAME TABLE <Table_Name> TO <New_Table_Name>;
Example:
If you want to change the name of the table from Employee to Emp, we can use the RENAME command as follows.
RENAME TABLE Employee TO Emp;
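The DDL commands above can be exercised end to end with Python's built-in sqlite3 module. Two dialect differences are assumed here: SQLite has no TRUNCATE (an unfiltered DELETE plays the same role), and it spells the rename operation ALTER TABLE ... RENAME TO rather than RENAME TABLE:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# CREATE: SQLite types INTEGER/TEXT/REAL stand in for number/varchar here.
cur.execute("""CREATE TABLE Student_info (
    College_Id INTEGER,
    College_name TEXT,
    Branch TEXT)""")

# ALTER: add the CGPA column to the existing table.
cur.execute("ALTER TABLE Student_info ADD COLUMN CGPA REAL")

# TRUNCATE equivalent: SQLite has no TRUNCATE, so an unfiltered DELETE
# empties the table while keeping its structure.
cur.execute("INSERT INTO Student_info VALUES (1, 'ABC College', 'CSE', 8.5)")
cur.execute("DELETE FROM Student_info")
count = cur.execute("SELECT COUNT(*) FROM Student_info").fetchone()[0]
print(count)  # 0 -- rows are gone, the table still exists

# RENAME: SQLite's spelling (MySQL instead uses RENAME TABLE old TO new).
cur.execute("ALTER TABLE Student_info RENAME TO Student_details")

# DROP: remove the table and its structure entirely.
cur.execute("DROP TABLE Student_details")
```

The college name 'ABC College' and the CGPA value are placeholder sample data.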
DML stands for Data Manipulation Language, a subset of SQL used for
managing data within database objects. DML commands are distinct from
DDL (Data Definition Language) commands, which are used to define
database structures.
The primary DML commands are:
SELECT: Used to retrieve data from one or more database tables. It allows
for filtering, ordering, and aggregating data.
Code
SELECT column1, column2 FROM table_name WHERE condition;
INSERT: Used to add new rows (records) into a table.
Code
INSERT INTO table_name (column1, column2) VALUES (value1, value2);
UPDATE: Used to modify existing data within a table.
Code
UPDATE table_name SET column1 = new_value1, column2 = new_value2 WHERE
condition;
DELETE: Used to remove rows from a table based on a specified condition.
Code
DELETE FROM table_name WHERE condition;
MERGE: (Less common than the others) Used to perform INSERT, UPDATE,
or DELETE operations on a target table based on the results of a join with a
source table.
Code
MERGE INTO target_table AS T
USING source_table AS S
ON T.id = S.id
WHEN MATCHED THEN UPDATE SET T.column = S.column
WHEN NOT MATCHED THEN INSERT (column) VALUES (value);
DML commands directly affect the data stored in the database and typically run inside a transaction, meaning their changes can be rolled back before they are committed if necessary.
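The transactional behaviour mentioned above can be demonstrated with Python's built-in sqlite3 module, which by default wraps DML statements in a transaction until commit() is called. The employees table and its rows are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# INSERT: add new rows, then make them permanent with commit().
cur.executemany("INSERT INTO employees (name, salary) VALUES (?, ?)",
                [("Asha", 50000), ("Ravi", 45000)])
con.commit()

# UPDATE, then roll back: the change is undone because it was never committed.
cur.execute("UPDATE employees SET salary = 99999 WHERE name = 'Asha'")
con.rollback()
salary = cur.execute(
    "SELECT salary FROM employees WHERE name = 'Asha'").fetchone()[0]
print(salary)  # 50000.0 -- the uncommitted UPDATE was rolled back

# DELETE with a condition, committed this time, so it is permanent.
cur.execute("DELETE FROM employees WHERE name = 'Ravi'")
con.commit()
remaining = cur.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(remaining)  # 1 -- only Asha's row is left
```

Note that auto-commit behaviour varies between database drivers and clients; here the rollback works only because sqlite3 had an open transaction.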
In a Database Management System (DBMS), SELECT queries are used to
retrieve data from one or more tables. They are the most fundamental and
frequently used type of query in SQL (Structured Query Language).
The basic syntax of a SELECT query involves specifying the columns you want
to retrieve and the table from which you want to retrieve them.
Here are the key components and common uses of SELECT queries:
Basic Retrieval:
o To retrieve all columns from a table:
Code
SELECT * FROM table_name;
o To retrieve specific columns from a table:
Code
SELECT column1, column2 FROM table_name;
Filtering Data:
o The WHERE clause is used to filter rows based on a specified condition:
Code
SELECT * FROM table_name WHERE condition;
Example:
Code
SELECT name, age FROM employees WHERE age > 30;
Ordering Results:
o The ORDER BY clause sorts the result set based on one or more columns:
Code
SELECT * FROM table_name ORDER BY column_name ASC/DESC;
Limiting Results:
o The LIMIT clause (or TOP in some SQL dialects) restricts the number of rows
returned:
Code
SELECT * FROM table_name LIMIT 10;
Aggregating Data:
o Aggregate functions like COUNT(), SUM(), AVG(), MIN(), MAX() are used to perform
calculations on a set of rows:
Code
SELECT COUNT(*) FROM table_name;
Grouping Data:
o The GROUP BY clause groups rows that have the same values in specified
columns, often used with aggregate functions:
Code
SELECT department, AVG(salary) FROM employees GROUP BY department;
Joining Tables:
o JOIN clauses combine rows from two or more tables based on a related
column between them:
Code
SELECT orders.order_id, customers.customer_name
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;
SELECT queries are essential for extracting, analyzing, and presenting data
stored within a database.
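The clauses above (WHERE, ORDER BY, GROUP BY with an aggregate, and LIMIT) can be tried together in one runnable sketch. The employees table and sample data are hypothetical, using Python's built-in sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE employees
               (name TEXT, age INTEGER, department TEXT, salary REAL)""")
cur.executemany("INSERT INTO employees VALUES (?,?,?,?)",
    [("Asha", 34, "Sales", 52000),
     ("Ravi", 28, "Sales", 48000),
     ("Meena", 41, "HR", 55000)])

# WHERE + ORDER BY: employees over 30, oldest first.
over_30 = cur.execute("""SELECT name, age FROM employees
                         WHERE age > 30 ORDER BY age DESC""").fetchall()
print(over_30)  # [('Meena', 41), ('Asha', 34)]

# GROUP BY + aggregate: average salary per department.
by_dept = cur.execute("""SELECT department, AVG(salary) FROM employees
                         GROUP BY department ORDER BY department""").fetchall()
print(by_dept)  # [('HR', 55000.0), ('Sales', 50000.0)]

# LIMIT: cap the number of rows returned (here, the single top earner).
top_one = cur.execute("""SELECT name FROM employees
                         ORDER BY salary DESC LIMIT 1""").fetchall()
print(top_one)  # [('Meena',)]
```

Each query returns a plain list of row tuples, so the clauses can be combined and inspected independently.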
Beyond basic SELECT statements, various additional SELECT queries and clauses
are used in DBMS to retrieve and manipulate data more effectively:
Filtering Data:
o WHERE clause: Filters rows based on specified conditions.
Code
SELECT column1, column2 FROM table_name WHERE condition;
o HAVING clause: Filters groups of rows created by GROUP BY.
Code
SELECT column1, COUNT(column2) FROM table_name GROUP BY column1 HAVING
COUNT(column2) > 5;
Ordering and Limiting Results:
o ORDER BY clause: Sorts the result set in ascending or descending order.
Code
SELECT column1, column2 FROM table_name ORDER BY column1 DESC;
o LIMIT / TOP clause: Restricts the number of rows returned. (Syntax varies by DBMS, e.g., LIMIT in MySQL/PostgreSQL, TOP in SQL Server).
Code
SELECT column1, column2 FROM table_name LIMIT 10;
Combining Data from Multiple Tables:
o JOIN clauses (INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN): Combine rows from two or more tables based on related columns.
Code
SELECT t1.column1, t2.column2 FROM table1 t1 INNER JOIN table2 t2 ON
t1.id = t2.id;
o UNION / UNION ALL: Combines the result sets of two or more SELECT queries. UNION removes duplicates, UNION ALL includes all rows.
Code
SELECT column1 FROM table1 UNION SELECT column1 FROM table2;
Aggregating Data:
o Aggregate Functions (e.g., COUNT(), SUM(), AVG(), MIN(), MAX()): Perform
calculations on a set of rows and return a single value.
Code
SELECT COUNT(column1) FROM table_name;
o GROUP BY clause: Groups rows that have the same values in specified columns into summary rows.
Code
SELECT column1, COUNT(column2) FROM table_name GROUP BY column1;
Advanced Techniques:
o Subqueries (Nested Queries): A query embedded within another SQL query,
used to retrieve data that will be used by the outer query.
Code
SELECT column1 FROM table_name WHERE column2 IN (SELECT column3 FROM
another_table WHERE condition);
o CASE Statement: Allows for conditional logic within a SELECT statement, returning different values based on specified conditions.
Code
SELECT column1, CASE WHEN column2 > 10 THEN 'High' ELSE 'Low' END AS
status FROM table_name;
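The subquery and CASE techniques above can be combined in a short runnable sketch. The products table and its rows are hypothetical, using Python's built-in sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE products (name TEXT, price REAL)")
cur.executemany("INSERT INTO products VALUES (?,?)",
    [("Pen", 5), ("Book", 12), ("Bag", 40)])

# CASE: label each row based on a condition on its price.
labels = cur.execute("""SELECT name,
                               CASE WHEN price > 10 THEN 'High'
                                    ELSE 'Low' END AS status
                        FROM products ORDER BY name""").fetchall()
print(labels)  # [('Bag', 'High'), ('Book', 'High'), ('Pen', 'Low')]

# Subquery: the inner SELECT computes the overall average price (19.0),
# which the outer query then uses as its filter condition.
above_avg = cur.execute("""SELECT name FROM products
                           WHERE price > (SELECT AVG(price) FROM products)
                           ORDER BY name""").fetchall()
print(above_avg)  # [('Bag',)]
```

The subquery runs first and feeds a single scalar value into the outer WHERE clause, which is the pattern described above for nested queries.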