International Islamic UniversityH-10, Islamabad, Pakistan
Database Managements Systems
Week 10
Database Design and
Normalization
Engr. Rashid Farid Chishti
http://youtube.com/rfchishti
http://sites.google.com/site/chisht
i
2.
Understand theimportance of database design in relational databases.
Learn about functional dependencies and their role in normalization.
Understand the concepts of First Normal Form (1NF) and Second Normal Form
(2NF).
Apply normalization techniques to reduce data redundancy and improve
database design.
Learning Objectives
3.
Definition: Databasedesign is the process of defining the structure,
relationships, and integrity constraints for a database.
Importance:
Ensures efficient data storage and retrieval.
Reduces redundancy and avoids data anomalies.
Improves data integrity and consistency.
Introduction to Database Design
4.
Definition: Normalizationis the process of organizing data in a database to
minimize redundancy and dependency by dividing large tables into smaller,
related tables.
Goal: To design the database in such a way that it eliminates undesirable
characteristics like insertion, update, and deletion anomalies.
Normalization Levels:
Several forms, each building upon the previous one:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
What is Normalization?
5.
Definition: Arelation (table) is in 1NF if:
All attributes (columns) contain atomic (indivisible) values.
Each column contains values of a single type.
Each column has a unique name.
The order of rows and columns does not matter.
Objective:
Eliminate repeating groups and multi-valued attributes
First Normal Form (1NF)
Table: Student (Now in 1NF)
ID Name Courses
1 Alice Math
1 Alice Science
2 Bob English
3 David Math
3 David English
Table: Student (Not in 1NF)
ID Name Courses
1 Alice Math, Science
2 Bob English
3 David Math, English
Table: Employee (Not in 1NF)
ID Name Phone
1 Alice 0301-1234567
0302-9876543
2 Bob 0303-5678901
3 David 0304-2468135
0305-1357924
Table: Employee (Now in 1NF)
ID Name Phone
1 Alice 0301-1234567
1 Alice 0302-9876543
2 Bob 0303-5678901
3 David 0304-2468135
3 David 0305-1357924
6.
Problems:
The"Authors" column
has multiple values
(John, Alice).
The “Category" column has
multiple values (AI, Machine Learning).
Example 2: First Normal Form (1NF)
BOOK_ID Book_Title Authors Book_Category
1 Database Systems John, Alice Technology
2 Python Basics Bob, Carol Programming
3 AI Foundations David, Eve AI, Machine Learning
Table: Book (Not in 1NF)
ID Book_Title Authors Book_Category
1 Database Systems John Technology
1 Database Systems Alice Technology
2 Python Basics Bob Programming
2 Python Basics Carol Programming
3 AI Foundations David AI
3 AI Foundations David Machine Learning
3 AI Foundations Eve AI
3 AI Foundations Eve Machine Learning
Table: Book (Now in 1NF)
7.
Definition:
A relationis in Second Normal Form (2NF) if:
It is in 1NF.
It has no partial dependencies (i.e., no non-key attribute
is dependent on only part of a composite primary key).
Here non-key attributes Department and Instructor depends
only on Course, not on the full primary key (Student_ID, Course).
(Course → Instructor, Department)
Second Normal Form (2NF)
Table: Student_Course (Not in 2NF)
Student_ID Course Instructor Department
1 Math Dr. A Science
1 Science Dr. B Science
Table: Student_Course
Student_ID Course
1 Math
1 Science
Table: Course_Details
Course Instructor Department
Math Dr. A Science
Science Dr. B Science
8.
Here, Emp_IDis the primary key.
Here, Deptment Location depends on Department, not on Emp_ID.
Example 2: Second Normal Form (2NF)
Table: Employee (Not in 2NF)
Emp_ID Name Department Dept_Location
1 John IT Lahore
2 Alice HR Karachi
3 Bob IT Lahore
Table: Department
Department Location
IT Lahore
HR Karachi
Table: Employee
ID Name Department
1 John IT
2 Alice HR
3 Bob IT
9.
Here, BookID is the primary key.
The non-prime (non key) attribute Author does not depend Book ID.
Also Book Category also does not depend in Book ID.
Only Book Title depends on Book ID.
Example 3: Second Normal Form (2NF)
ID Book_Title Authors Book_Category
1 Database Systems John Technology
1 Database Systems Alice Technology
2 Python Basics Bob Programming
2 Python Basics Carol Programming
3 AI Foundations David AI
3 AI Foundations David Machine Learning
3 AI Foundations Eve AI
3 AI Foundations Eve Machine Learning
Table: Book (in 1NF)
ID Book_Title
1 Database Systems
2 Python Basics
3 AI Foundations
3 AI Foundations
Table: Book (in 2NF)
Book_ID Authors
1 John
1 Alice
2 Bob
2 Carol
3 David
3 Eve
Table: Author
BOOK_ID Category
1 Technology
2 Programming
3 AI
3 Machine Learning
Table: Category
10.
Partial Dependencies:
Course_Name depends on Course_ID
Instructor_Name depends on Instructor_ID
Dept depends on Instructor_ID
Solution:
Split into separate tables.
Example 4: Second Normal Form (2NF)
Table: Enrolment (Not in 2NF)
Student_ID Student_Name Course_ID Course_Name Instructor_ID Instructor_Name Dept
S1 Alice C101 DBMS I1 Prof. Smith CS
S2 Bob C102 AI I2 Prof. Lee AI
S3 Charlie C101 DBMS I1 Prof. Smith CS
S4 Dave C103 Networks I3 Prof. Green Networks
11.
Now itis in 2NF
Example 4: Second Normal Form (2NF)
Table: Student
ID Name
S1 Alice
S2 Bob
S3 Charlie
S4 Dave
Table: Course
ID Name
C101 DBMS
C102 AI
C101 DBMS
C103 Networks
Student_ID Course_ID Instructor_ID
S1 C101 I1
S2 C102 I2
S3 C101 I1
S4 C103 I3
ID Name Dept
I1 Prof. Smith CS
I2 Prof. Lee AI
I3 Prof. Green Networks
Table: Instructor
Table: Student_Course_Instructor
12.
Definition: Afunctional dependency (FD) is a relationship between two
attributes (or sets of attributes) in a table.
Notation: A → B means that attribute A uniquely determines attribute B.
Example:
If student_id → student_name, then for each student_id, there is a unique
student_name.
Importance: Understanding functional dependencies is crucial for
normalization, as they help identify relationships between data attributes.
Functional Dependencies
13.
Definition of3NF: A table is in Third Normal Form (3NF) if:
It is in Second Normal Form (2NF) (i.e., no partial dependencies).
No transitive dependencies exist (i.e.,).
Transitive Dependency: Occurs when a no non-key attribute depends on
another non-key attribute.
Example:
A → B (Where A is a primary key)
B → C (Where B is Non key attribute)
This means A indirectly determines C, which is a transitive dependency.
Thus A → C is a transitive dependency.
Objective: Remove transitive dependencies to ensure that non-key
attributes depend only on the key-attribute (primary key).
Third Normal Form (3NF)
14.
Primary Key(PK): Student_ID
Department_Name depends on Department_ID, not directly on Student_ID.
Transitive Dependency:
Student_ID → Department_ID (Direct Dependency)
Department_ID → Department_Name (Indirect Dependency)
Therefore, Student_ID → Department_Name (Transitive Dependency)
To remove transitive dependency, we create a separate Department Table
Example 1: Third Normal Form (3NF)
Student_ID (PK) Student_Name Department_ID Department_Name
101 Alice D01 Computer Science
102 Bob D02 Electrical Engineering
103 Charlie D01 Computer Science
15.
Now, thereis no transitive dependency because Department_Name depends only on
Department_ID.
Example 1: Third Normal Form (3NF)
ID (PK) Name Deptment_ID (FK)
101 Alice D01
102 Bob D02
103 Charlie D01
ID (PK) Name
D01 Computer Science
D02 Electrical Engineering
Table: Student Table: Department
16.
Definition: Arelation is in Boyce-Codd Normal Form (BCNF) if:
It is already in 3NF.
For every functional dependency (X → Y) in the relation, the X on left-hand
side (determinant) is a superkey (A set of attributes that uniquely identifies
each record in a table).
Problem with 3NF: A relation can still have a non-superkey determinant.
Therefore, to check for BCNF, we simply identify all the determinants and
verify that they are superkeys.
If they are not, we break up the relation by projection until we have a set of
relations all in BCNF.
Boyce-Codd Normal Form (BCNF)
17.
Consider thefollowing relation.
Functional Dependencies:
(Student_ID, Course_ID) → Instructor
Since an instructor teaches a specific course
Course_ID → Instructor
Each course is taught by only one instructor.
Here, Course_ID is not a superkey, but it determines Instructor, violating BCNF.
Converting to BCNF: To achieve BCNF, we will decompose the relation into:
Course Table (Ensuring that each course has only one instructor)
Student-Course Table
Mapping students to courses
Example 1: Boyce-Codd Normal Form (BCNF)
Student_ID Course_ID Instructor
1 DBMS Dr. A
2 OS Dr. B
3 DBMS Dr. A
4 OS Dr. B
Course Instructor
DBMS Dr. A
OS Dr. B
Student_ID Course_ID
1 DBMS
2 OS
3 DBMS
4 OS
Table: Course_Instructor
Table: Student_Course
18.
Functional Dependencies
(Supplier_ID, Part_ID) → Supplier_City, Part_Category
Each supplier supplies a specific part and has a fixed city.
Supplier_ID → Supplier_City
A supplier is based in only one city.
Part_ID → Part_Category
A part belongs to only one category.
Issue
Supplier_ID → Supplier_City
violates BCNF because Supplier_ID is not a superkey.
Part_ID → Part_Category
violates BCNF because Part_ID is not a superkey.
Example 2: Boyce-Codd Normal Form (BCNF)
Supplier_ID Part_ID Supplier_City Part_Category
1 101 Karachi Electronics
2 102 Lahore Machanical
1 103 Karachi Electrical
Table: Supplier_Parts
19.
To achieveBCNF, we will decompose the relation into 3 tables:
Now, all tables are in BCNF because all functional dependencies have superkeys on the left.
Example 2: Boyce-Codd Normal Form (BCNF)
Part_ID Part_Category
101 Electronics
102 Machanical
103 Electrical
Table: Parts
Supplier_ID Supplier_City
1 Karachi
2 Lahore
Table: Supplier
Supplier_ID Part_ID
1 101
2 102
1 103
Table: Supplier_Parts
20.
3NF: Arelation is in 3NF if it has no transitive dependencies, but there could
still be functional dependencies where the determinant is not a superkey.
BCNF: A relation is in BCNF if every determinant is a superkey.
Key Difference: BCNF is a stricter form of 3NF. Every relation in BCNF is in 3NF,
but not every relation in 3NF is in BCNF.
Difference Between 3NF and BCNF
21.
Recap of Normalization
NormalizationMain Rule Problem Solved
1NF
No duplicate columns, atomic
values
Removes multi-valued attributes
2NF
1NF + No partial dependencies Eliminates dependency on part of a
composite key
3NF
2NF + No transitive dependencies Eliminates non-key dependencies
BCNF
3NF + Every determinant is a
superkey
Stronger than 3NF, removes
remaining anomalies
22.
Benefits ofNormalization:
Reduces Data Redundancy: Eliminates duplicate data by dividing data into
smaller tables.
Improves Data Integrity: Ensures consistency by minimizing the chances of
anomalies.
Easier Updates: Easier to update data without causing inconsistencies.
Drawbacks:
More complex queries (due to more joins).
Potential performance impact in some cases (especially with excessive
normalization).
Why Normalize ?
23.
Denormalization isthe process of intentionally introducing redundancy into a
database to improve read performance by reducing the need for complex
joins. It involves combining tables or storing redundant data to optimize query
execution.
Why Denormalization?
Faster Read Queries: Reduces the number of joins needed in SELECT
queries.
Improves Performance: Useful for OLAP (Online Analytical Processing)
systems and reporting.
Reduces Computational Overhead: Joins and aggregations can be expensive
in large datasets.
Denormalization
24.
Common DenormalizationTechniques:
Precomputed Aggregations: Storing calculated values (e.g., total sales per
month) instead of computing them on the fly.
Duplicating Data: Storing frequently accessed columns in multiple tables to
reduce joins.
Adding Derived Columns: Keeping derived data like full names (FirstName +
LastName) instead of computing it at runtime.
Using Composite Tables: Creating summary tables for frequently queried
reports.
Denormalization
25.
Drawbacks ofDenormalization:
Increased Storage: Data redundancy leads to more storage consumption.
Data Inconsistency: Updates and deletions must be carefully managed to
avoid inconsistencies.
More Complex Write Operations: Insert, update, and delete operations may
require changes in multiple places.
When to Use Denormalization?
When database reads are significantly more frequent than writes.
In data warehousing and reporting systems where query performance is
critical.
When optimizing NoSQL databases, where joins are costly.
Denormalization
26.
Advantages:
Improvedquery performance (fewer joins).
Faster retrieval of complex data relationships.
Disadvantages:
Increased storage requirements due to redundancy.
More complex update operations, leading to potential inconsistencies.
When to Use:
Denormalization is typically used in reporting systems or when querying
performance becomes critical.
Pros and Cons of Denormalization
27.
Normalized Schema(3rd Normal Form - 3NF)
Customers (CustomerID, Name, Email, Address)
Orders (OrderID, CustomerID, OrderDate)
OrderDetails (OrderDetailID, OrderID, ProductID, Quantity, Price)
Products (ProductID, ProductName, Category, Price)
👉 Every time we need to display order details, we must join Orders, OrderDetails,
Products, and Customers, which can slow down performance.
To optimize query performance, we can denormalize by adding redundant data to avoid
excessive joins. We introduce a single table called OrdersSummary.
Faster Read Query:
Denormalization Example
OrderID CustomerID CustomerName ProductID ProductName OrderDate Quantity Price TotalPrice
101 1 John Doe 501 Laptop 2024-03-10 1 1000 1000
101 1 John Doe 502 Mouse 2024-03-10 2 20 40
SELECT * FROM OrdersSummary WHERE CustomerID
= 1;