UNIT -3
Database Table and Normalization:
Database Tables:
A database table is the basic building block of a relational database. It's a structured
collection of data organized into rows and columns.
Rows (Records/Tuples): Each row in a table represents a single, complete set of related
data. For example, in a Customers table, each row would represent a unique customer.
Columns (Fields/Attributes): Each column in a table represents a specific piece of
information or attribute about the data. For example, in a Customers table, columns might
include CustomerID, FirstName, LastName, Email, and PhoneNumber.
Schema: The structure of a table, including the names of its columns, their data types
(e.g., VARCHAR, INT, DATE), and any constraints (e.g., PRIMARY KEY, NOT NULL), is
called its schema.
Primary Key: A column or a set of columns that uniquely identifies each row in a table.
It ensures data integrity and is essential for establishing relationships between tables.
Foreign Key: A column or a set of columns in one table that refers to the primary key in
another table. Foreign keys establish relationships between tables, enforcing referential
integrity.
EmployeeID FirstName LastNameDepartmentI DepartmentID Salary
(PK) D (FK) (FK)
101 Anu Smith 10 60000
102 Yash Bob 20 75000
103 Sri Red 10 68700
Normalization
Normalization is a systematic process of organizing the columns and tables of a
relational database to minimize data redundancy and improve data integrity. The goal is to
decompose larger tables into smaller, less redundant ones without losing information, and to
ensure that data dependencies are logical.
The Need for Normalization :
Eliminating Data Redundancy
Without normalization, the same data may be stored in multiple places, leading to
unnecessary duplication. Normalization ensures that each piece of data exists only once,
reducing storage space and preventing inconsistencies.
Ensuring Data Integrity and Consistency
Data stored redundantly can lead to inconsistencies when updates, deletions, or insertions are
made in some places but missed in others. Normalization maintains consistency by keeping
data centralized and logically organized.
Improving Data Organization and Structure
Complex tables with mixed information can be confusing and difficult to manage.
Normalization breaks these tables into smaller, related ones, making the database easier to
understand, maintain, and update.
Facilitating Efficient Data Updates
With a normalized structure, updates only need to be performed in one place, reducing the
risk of errors and making operations faster and more reliable.
Preventing Anomalies
Insertion Anomaly: Difficulty adding data due to missing other data.
Update Anomaly: Inconsistent data after updates.
Deletion Anomaly: Unintended loss of data when deleting records.
Normalization minimizes these anomalies.
Enhancing Database Flexibility and Scalability
A well-normalized database is easier to modify as requirements grow or change. New tables
or relationships can be added without major restructuring.
The Normalization Process:
Database Normalization is any systematic process of organizing a database schema
such that no data redundancy occurs and there is least or no anomaly while performing any
update operation on data
A large table into smaller pieces such that data redundancy should be eliminated. The
normalizing procedure depends on the functional dependencies among the attributes inside
a table and uses several normal forms to guide the design process.
First Normal Form (1NF)
A table is in First Normal Form (1NF) if: All the values in each column are atomic
(indivisible) There are no repeating groups or arrays in any row. Each record is unique and
identifiable.
Why 1NF is Needed:
To avoid storing multiple values in a single field
To create a simple, organized table structure
To reduce data redundancy and improve query efficiency
Example of Table Before Applying 1NF (Unnormalized Table):
StudentID StudentName PhoneNumbers
1 John 9876512345
2 jane 457895
Issue:The PhoneNumbers column has multiple values in a single cell, violating 1NF
Table After Applying 1NF:
StudentID StudentName PhoneNumber
1 John Smith 98765
1 John Smith 12345
2 Jane Doe 45678
Now, each field contains a single, indivisible value,The table satisfies the rules of First
Normal Form (1NF)
Benefits of 1NF:
Removes repeating groups and multi-valued fields. Creates a more structured and
organized database table, Makes data easier to search, update, and maintain
Second Normal Form (2NF) in DBMS
A table is in Second Normal Form (2NF) if it is already in First Normal Form (1NF)
and all non-key attributes are fully dependent on the entire primary key. There should be no
partial dependency, meaning no attribute depends only on a part of a composite primary key
Why 2NF is Important
2NF removes partial dependencies to reduce redundancy and prevent anomalies. It
ensures that each non-key attribute relates only to the whole primary key and not to a part of
it
Example: Before 2NF (Partial Dependency Exists)
OrderID ProductID ProductName Quantity
101 P01 Laptop 2
101 P02 Mouse 3
102 P01 Laptop 1
In this table, the composite primary key is (OrderID, ProductID). However,
ProductName depends only on ProductID, not on the full primary key. This is a partial
dependency and violates 2NF
After Applying 2NF:We break the table into two separate tables to remove partial
dependency
OrderDetails Table
OrderID ProductID Quantity
101 P01 2
101 P02 3
102 P01 1
Products Table
ProductID ProductName
P01 Laptop
P02 Mouse
Now, all non-key attributes in the OrderDetails table depend on the entire primary key
(OrderID, ProductID) and ProductName depends only on ProductID in a separate table. The
database is now in Second Normal Form
Third Normal Form (3NF) in DBMS
A table is in Third Normal Form (3NF) if it is already in Second Normal Form (2NF) and
all the non-key attributes are not only fully dependent on the primary key but also directly
dependent on the primary key. There should be no transitive dependency
Transitive Dependency Explanation
A transitive dependency occurs when a non-key attribute depends on another non-key
attribute instead of depending directly on the primary key
Why 3NF is Important
3NF removes transitive dependencies to further reduce redundancy and improve data
integrity. It ensures that every non-key attribute is directly dependent on the primary key only
Example:Before 3NF (Transitive Dependency Exists):Student Table
StudentID StudentName DepartmentID DepartmentName
1 John Smith D01 Computer Science
2 Jane Doe D02 Mechanical Engineering
In this table, StudentID is the primary key
DepartmentName depends on DepartmentID, which is a non-key attribute. This is a transitive
dependency and violates 3NF
After Applying 3NF:Student Table
StudentID StudentName DepartmentID
1 John Smith D01
2 Jane Doe D02
Department Table:
DepartmentID DepartmentName
D01 Computer Science
D02 Mechanical Engineering
Now, all non-key attributes in the Student table depend directly on the primary key StudentID
and DepartmentName is stored separately in the Department table depending directly on
DepartmentID. The database is now in Third Normal Form
Boyce-Codd Normal Form (BCNF) in DBMS
A table is in Boyce-Codd Normal Form (BCNF) if it is already in Third Normal Form
(3NF) and for every functional dependency X → Y, X is a super key. In simple terms, the
left-hand side of every functional dependency must be a super key
Why BCNF is Important
BCNF eliminates certain types of anomalies that 3NF does not always address,
especially when there are overlapping candidate keys. It ensures better data integrity and
removes redundancy
Example:Before BCNF (Anomaly Exists)Consider the following table
StudentID Course Instructor
1 Math Mr. Smith
1 Science Mrs. Taylor
2 Math Mr. Smith
In this example:Candidate Keys are (StudentID, Course) and (Course, Instructor)
Functional Dependencies:StudentID, Course → Instructor,Course → Instructor
Here, Course is not a super key, but Course determines Instructor, violating BCNF
After Applying BCNF:We decompose the table into two tables
CourseInstructor Table
Course Instructor
Math Mr. Smith
Science Mrs. Taylor
StudentCourse Table
StudentID Course
1 Math
1 Science
2 Math
Now, all functional dependencies satisfy the BCNF condition, meaning the left-hand side
of each dependency is a super key. The database is now in Boyce-Codd Normal Form
Fourth Normal Form (4NF) in DBMS
A table is in Fourth Normal Form (4NF) if it is already in Boyce-Codd Normal Form
(BCNF) and there are no multi-valued dependencies other than a dependency on a candidate
key
Multi-valued Dependency Explanation
A multi-valued dependency occurs when one attribute in a table depends on the
primary key, and another independent attribute also depends on the same primary key, but the
two attributes are independent of each other
Why 4NF is Important
4NF removes multi-valued dependencies to eliminate data redundancy and prevent
insertion, deletion, and update anomalies
Example:Before 4NF (Multi-valued Dependency Exists)
StudentID Hobby Language
1 Football English
1 Football French
1 Music English
1 Music French
In this table, StudentID determines both Hobby and Language independently. There is a
multi-valued dependency, violating 4NF
After Applying 4NF:We decompose the table into two separate tables
StudentHobby Table
StudentID Hobby
1 Football
1 Music
StudentLanguage Table
StudentID Language
1 English
1 French
Now, there are no multi-valued dependencies, and the database satisfies Fourth Normal Form
Higher Level Normal Forms in DBMS
After Fourth Normal Form (4NF), there are more advanced levels of normalization
designed to handle specific types of data anomalies and complex dependencies. These
include Fifth Normal Form (5NF) and Sixth Normal Form (6NF).
Fifth Normal Form (5NF) or Project-Join Normal Form (PJNF)
A table is in Fifth Normal Form (5NF) if it is already in Fourth Normal Form (4NF) and it
cannot be further decomposed into smaller tables without losing data, except when using join
operation
5NF deals with join dependencies, ensuring that information can be reconstructed from
smaller tables through natural joins without introducing redundancy or anomalies
Example:Consider a table that holds information about suppliers, parts, and projects
Supplier Part Project
S1 P1 PR1
S1 P2 PR1
S1 P1 PR2
If there are independent relationships among Supplier, Part, and Project, the table can be
decomposed into three smaller tables:
Supplier-Part Table
Supplier Part
S1 P1
S1 P2
Supplier-Project Table
Supplier Project
S1 PR1
S1 PR2
Part-Project Table
Part Project
P1 PR1
P2 PR1
P1 PR2
These tables can be joined to reconstruct the original table without redundancy,
satisfying Fifth Normal Form
Sixth Normal Form (6NF)
A table is in Sixth Normal Form (6NF) if it is already in Fifth Normal Form and contains
no non-trivial join dependencies, including time-dependent or temporal data
6NF is used in specialized environments where temporal databases or data with
frequently changing attributes are managed. It is mostly applied in advanced data warehouses
or systems requiring extreme normalization
Introduction of SQL: SQL stands for Structured Query Language. It is a standard
programming language specifically designed for managing, accessing, and manipulating
databases. SQL allows users to communicate with relational database management systems
(RDBMS) to perform tasks such as inserting, updating, retrieving, and deleting data.
Data Manipulation Language (DML) in SQL
Data Manipulation Language (DML) commands are used to manage and manipulate the
data stored within database tables. These commands allow users to insert new data, update
existing data, retrieve data, and delete unwanted data from the tables.
Common DML Commands
INSERT:Used to add new records into a table:
INSERT INTO Students (StudentID,StudentName,Age)VALUES (1'John Smith', 20);
UPDATE:Used to modify existing records in a table
Example:UPDATE Students SET Age = 21 WHERE StudentID = 1;
DELETE:Used to remove records from a table
Example:DELETE FROM Students WHERE StudentID = 1;
SELECT:Used to retrieve records from one or more tables
Example:SELECT * FROM Students;
Characteristics of DML
DML commands interact with the actual data in the tables
Changes made by INSERT, UPDATE, or DELETE can be controlled using transaction
commands like COMMIT or ROLLBACK
SELECT is used only for querying and does not modify data
DML is essential for handling the data stored in a database during daily operations
such as adding, editing, deleting, or retrieving information
SELECT Queries :
The SELECT statement is one of the most important SQL commands used to
retrieve data from database tables. It allows users to query data, apply filters, sort results, and
perform calculations.
Basic Syntax of SELECT
SELECT column1, column2, ...FROM table_name;
To select all columns from a table
SELECT * FROM table_name;
Important Clauses and Options in SELECT
1. WHERE Clause:Used to filter records based on conditions
SELECT * FROM Students WHERE Age > 18;
2. DISTINCT Keyword:Removes duplicate records from the result
SELECT DISTINCT DepartmentID FROM Students;
3. ORDER BY Clause:Sorts the result set in ascending (ASC) or descending (DESC) order
SELECT * FROM Students ORDER BY StudentName ASC;
4. Aliases (AS Keyword):Renames columns or tables temporarily for better readability
SELECT StudentName AS Name, Age FROM Students;
5. Aggregate Functions:Used to perform calculations on data
COUNT() → Counts the number of rows
SUM() → Adds values
AVG() → Calculates average
MAX() → Finds the maximum value
MIN() → Finds the minimum value
Example:SELECT COUNT(*) FROM Students;
Examples of SELECT Queries
Select all records:SELECT * FROM Employees;
Select specific columns:SELECT EmployeeID, EmployeeName FROM Employees;
Select records with condition:SELECT * FROM Employees WHERE Department = 'HR';
Sort results:SELECT * FROM Employees ORDER BY Salary DESC;
Use DISTINCT:SELECT DISTINCT Department FROM Employees;
Summary of SELECT Queries
Used to retrieve data from one or more tables
Can filter data using WHERE
Can sort data using ORDER BY
Removes duplicates using DISTINCT
Can perform calculations with aggregate functions
SELECT queries are essential for viewing, analyzing, and managing data in a database
system efficiently.
Additional DDL Commands in SQL
In addition to the common DDL commands like CREATE, ALTER, DROP, and
TRUNCATE, there are other DDL commands that help manage the structure and
organization of database objects more effectively.
1. RENAME Command:Used to rename an existing database object such as a table,
column, or view.
Example (Rename Table):ALTER TABLE Employees RENAME TO Staff;
Example (Rename Column):ALTER TABLE Employees RENAME COLUMN
EmployeeName TO FullName;
2. COMMENT Command:Adds comments or descriptions to database objects for
better documentation.
Example:COMMENT ON TABLE Employees IS 'Stores employee details';
COMMENT ON COLUMN Employees.Salary IS 'Monthly salary of the employee';
3. CREATE INDEX:Creates an index on one or more columns to improve query
performance.
Example:CREATE INDEX idx_employee_name ON Employees (EmployeeName);
4. DROP INDEX:Removes an existing index from the table.
Example:DROP INDEX idx_employee_name;
5. CREATE VIEW:Creates a virtual table (view) based on the result of a query. Views
simplify complex queries and provide data security.
Example: CREATE VIEW ActiveEmployees AS SELECT EmployeeID, mployeeName,
Department FROM Employees WHERE Status = 'Active';
6. DROP VIEW:Deletes an existing view from the database.
Example:DROP VIEW ActiveEmployees;
Summary of Additional DDL Commands
Command Purpose
RENAME Renames tables, columns, or views
COMMENT Adds descriptions to tables or columns
CREATE INDEX Improves query performance with indexes
DROP INDEX Removes indexes from tables
CREATE VIEW Creates a virtual table from a SELECT query
DROP VIEW Deletes an existing view
Additional SELECT Query Keywords in SQL
Apart from the basic SELECT, FROM, and WHERE clauses, SQL provides several
additional keywords to enhance queries and perform more advanced operations.
1. GROUP BY: Groups rows that have the same values in specified columns. Often
used with aggregate functions like COUNT(), SUM(), AVG(), etc.
Example:SELECT Department,COUNT(*)FROM Employees GROUP BY Department;
2. HAVING:Filters groups created by GROUP BY. Works like WHERE but for
grouped data.
Example:SELECT Department, COUNT(*) FROM Employees GROUP BY Department
HAVING COUNT(*) > 5;
3. BETWEEN:Filters records within a specific range (inclusive).
Example:SELECT * FROM Employees WHERE Salary BETWEEN 30000 AND 50000;
4. IN:Matches a value against a set of values.
Example:SELECT * FROM Employees WHERE Department IN ('HR', 'Finance', 'IT');
5. LIKE:Searches for a specific pattern in a column.
Example:SELECT * FROM Employees WHERE EmployeeName LIKE 'J%'; -- Names
starting with J
6. IS NULL / IS NOT NULL:Checks for NULL (missing or unknown) values.
Example:SELECT * FROM Employees WHERE ManagerID IS NULL;
7. LIMIT (In MySQL, PostgreSQL):Restricts the number of rows returned by a query.
Example:SELECT * FROM Employees LIMIT 5;
In SQL Server, use TOP:SELECT TOP 5 * FROM Employees;
8. UNION and UNION ALL:Combines results from multiple SELECT statements.
UNION removes duplicates; UNION ALL includes duplicates.
Example:SELECT City FROM Customers UNION SELECT City FROM Suppliers;
9. EXISTS:Checks whether a subquery returns any rows.
Example:SELECT * FROM Employees e WHERE EXISTS (SELECT 1 FROM
Departments d WHERE e.DepartmentID = d.DepartmentID);