Introduction to Relational
Databases
Chapter 5- Normalisation
Slides adopted from:
Database Design for mere Mortals
Michael L. Hernandez
, 4th edition 1
Outline of Today’s Lecture
• Entity integrity: selecting primary keys
• Database tables and normalization
• The need for normalization
• Functional Dependency
• The normalization process
2
Entity Integrity: Selecting Primary Keys
• Primary keys: single attribute or a combination of attributes
• Uniquely identifies each entity instance
• Guarantees entity integrity
• Works with foreign keys to implement relationships
3
Natural Keys and Primary Keys
• Natural key or natural identifier: real-world identifier used to uniquely identify real-world
objects
• Familiar to end users and forms part of their day-to-day business vocabulary
• Used as the primary key of the entity being modeled
4
Primary Key Guidelines
• Desirable primary key characteristics
• Non intelligent
• No change over time
• Preferably single-attribute
• Preferably numeric
• Security-compliant
5
When to Use Composite Primary Keys
• Identifiers of composite entities
• Each primary key combination is allowed once in M:N relationship
• Identifiers of weak entities
• Strong identifying relationship with the parent entity
• Represents a real-world object that is existence-dependent on another real-world object
• Represented in the data model as two separate entities in a strong identifying relationship
6
The M:N Relationship Between Student and Class
(Coronel & Morris, 2015, p.179) 7
Dependencies
• Determination
• State in which knowing the value of one attribute makes it possible to determine the
value of another
• Establishes the role of a key
• Based on the relationships among the attributes
• Functional dependence: value of one or more attributes determines the value of one or
more other attributes
• Determinant: attribute whose value determines another
• Dependent: attribute whose value is determined by the other attribute
• Full functional dependence: entire collection of attributes in the determinant is necessary
for the relationship
8
Types of Keys
• Keys used in the relational model
• Composite key: key composed of more than one attribute
• Key attribute: attribute that is a part of a key
• Superkey: can uniquely identify any row in the table
• Candidate key: minimal superkey
• Entity integrity: condition in which each row in the table has its own unique identity
• All of the values in the primary key must be unique
• No key attribute in the primary key can contain a null
9
Types of Keys Continued …
• Keys used in the relational model …
• Null: absence of any data value
• Unknown attribute value, known but missing attribute value, or inapplicable condition
• Referential integrity: every reference to an entity instance by another entity instance is valid
• Foreign key: primary key of one table that has been placed into another table to create a
common attribute
• Secondary key: key used strictly for data retrieval purposes
10
Database Tables and Normalization
• Normalization: evaluating and correcting table structures to minimize data redundancies
• Reduces data anomalies
• Assigns attributes to tables based on determination
• Normal forms
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
11
Database Tables and Normalization Continued
• Structural point of view of normal forms
• Higher normal forms are better than lower normal forms
• Properly designed 3NF structures meet the requirement of fourth normal form (4NF)
• Denormalization: produces a lower normal form
• Results in increased performance and greater data redundancy
12
The Need for Normalization
• Used while designing a new database structure
• Analyzes the relationship among the attributes within each entity
• Determines if the structure can be improved through normalization
• Improves the existing data structure and creates an appropriate database design
13
The Normalization Process
• Objective is to ensure that each table conforms to the concept of well-formed relations
• Each table represents a single subject
• Each row/column intersection contains only one value and not a group of values
• No data item will be unnecessarily stored in more than one table
• All nonprime attributes in a table are dependent on the primary key
• Each table has no insertion, update, or deletion anomalies
14
The Normalization Process
• Ensures that all tables are in at least 3NF
• Higher forms are not likely to be encountered in business environment
• Works one relation at a time
• Identifies the dependencies of a relation (table)
• Progressively breaks the relation up into a new set of relations
15
The Normalization Process
16
Functional Dependency
• A functional dependency occurs when the value of one (set of) attribute(s) determines the
value of a
second (set of) attribute(s):
StudentID → StudentName
StudentID → (DormName, DormRoom, Fee)
• The attribute on the left side of the functional dependency is called the determinant.
• Functional dependencies may be based on equations:
ExtendedPrice = Quantity X UnitPrice
(Quantity, UnitPrice) → ExtendedPrice
17
Composite Determinants
• Composite determinant = a determinant of a functional dependency that consists
of more than one attribute
(StudentName, ClassName) → (Grade)
18
The Normalization Process
• Partial dependency: functional dependence in which the determinant is only part of the primary key
• Assumption: one candidate key
• Straight forward
• Easy to identify
• Transitive dependency: attribute is dependent on another attribute that is not part of the primary
key
• More difficult to identify among a set of data
• Occur only when a functional dependence exists among nonprime attributes
19
Conversion to First Normal Form (1NF)
• Repeating group: group of multiple entries of same type can exist for any single key
attribute occurrence
• Reduces data redundancies
• Three step procedure
• Eliminate the repeating groups
• Identify the primary key
• Identify all dependencies
• Dependency diagram: depicts all dependencies found within given table structure
• Overview of all relationships among table’s attributes
• Less likely that an important dependency will be overlooked
20
Conversion to First Normal Form (1NF)
• 1NF describes tabular format in which:
• All key attributes are defined
• There are no repeating groups in the table
• All attributes are dependent on the primary key
• All relational tables satisfy 1NF requirements
• Some tables contain partial dependencies
• Update, insertion, or deletion
21
Employee Project
Conversion to Emp_Id
123456
Name
Smith John
Pnumber
1
Hours
32.5
First Normal 333444 Wang
2
2
7.5
10
Form- Examples Franklin
3
10
10
10
20 10
1 8
22
Conversion to First Normal Form- Result
Emp_Id Name
113456 SMITH JOHN
333444 Wang Franklin
Emp_Id Pnumber Hours
123456 1 32.5
123456 2 7.5
333444 1 8
333444 2 10
333444 3 10
23
Conversion to Second Normal Form (2NF)
• Conversion to 2NF occurs only when the 1NF has a composite primary key
• If the 1NF has a single-attribute primary key, then the table is automatically in 2NF
• The 1NF-to-2NF conversion is simple
• Make new tables to eliminate partial dependencies
• Reassign corresponding dependent attributes
• Table is in 2NF when it:
• Is in 1NF
• Includes no partial dependencies
24
Dependency Diagram
(Coronel & Morris, 2015, p.208)
25
Second Normal
Form Second
Normal Form (2NF)
Conversion Results
(Coronel & Morris, 2015, p.211)
26
Conversion to Third Normal Form (3NF)
• The data anomalies created by the database organization shown in on previous
slide are easily eliminated
• Make new tables to eliminate transitive dependencies
• Reassign corresponding dependent attributes
• Table is in 3NF when it:
• Is in 2NF
• Contains no transitive dependencies
27
Third Normal
Form (3NF)
Conversion
Results
(Coronel & Morris, 2015, p.212)
28
The Boyce-Codd Normal Form
• Every determinant in the table should be a candidate key
• Candidate key: same characteristics as primary key but not chosen to be
the primary key
• Equivalent to 3NF when the table contains only one candidate key
• Violated only when the table contains more than one candidate key
• Considered to be a special case of 3NF
29
A Table That is in 3NF but Not
In BCNF
(Coronel & Morris, 2015, p.220) 30
Conversion
to BCNF
• (Coronel & Morris, 2015, p.221)
31
Fourth Normal Form (4NF)
• Rules
• All attributes must be dependent on the primary key, but they must be independent of
each other
• No row may contain two or more multivalued facts about an entity
• Table is in 4NF when it:
• Is in 3NF
• Has no multivalued dependencies
32
Tables With
Multivalued
Dependencies
• (Coronel & Morris, 2015, p.222) 33
Normalization and Database Design
• Normalization should be part of the design process
• Proposed entities must meet required the normal form before table structures are created
• Principles and normalization procedures to be understood to redesign and modify databases
• ERD is created through an iterative process
• Normalization focuses on the characteristics of specific entities
34
Denormalization
• Design goals
• Creation of normalized relations
• Processing requirements and speed
• Number of database tables expands
• Tables are decomposed to conform to normalization requirements
35
Denormalization Continued …
• Joining a larger number of tables
• Takes additional input/output (I/O) operations and processing logic
• Reduces system speed
• Defects in unnormalized tables
• Data updates are less efficient because tables are larger
• Indexing is more cumbersome
• No simple strategies for creating virtual tables known as views
36
Any Questions
37