Normalization in Databases
What is Normalization?
Unnormalized data exists in flat files Normalization is the process of moving data into related tables This is usually done by running action queries (Make Table and Append queries).unless youre starting from scratch then do it right the first time!
Why Normalize Tables?
Save typing of repetitive data Increase flexibility to query, sort, summarize, and group data (Simpler to manipulate data!) Avoid frequent restructuring of tables and other objects to accommodate new data Reduce disk space
A Typical Spreadsheet File
Emp No Employee Name Time Card No Time Card Date Dept No Dept Name 10 10 10 10 Thomas Arquette Thomas Arquette Thomas Arquette Thomas Arquette 106 106 106 115 11/02/2002 11/02/2002 11/02/2002 11/09/2002 20 20 20 20 Marketing Marketing Marketing Marketing
99
500 500 700 700 700
Janice Smitty
Alan Cook Alan Cook Ernest Gold Ernest Gold Ernest Gold 107 107 108 116 116 11/02/2002 11/02/2002 11/02/2002 11/09/2002 11/09/2002
10
50 50 50 50 50
Accounting
Shipping Shipping Shipping Shipping Shipping
Employee, Department, and Time Card Data in Three Tables
Table: Employees
EmpNo 10 500 700 99 EmpFirstName Thomas Alan Ernest Janice EmpLastName Arquette Cook Gold Smitty DeptNo 20 50 50 10
Table: Departments
DeptNo DeptName 10 20 50 Accounting Marketing Shipping
Table: Time Card Data
TimeCardNo EmpNo 106 10 TimeCardDate 11/02/2002
107
108 115
Primary Key
500
700 10 700
11/02/2002
11/02/2002 11/09/2002 11/09/2002
116
Another Example of Normalizing
Non-Normalized Table
Normalized Table
Types of Normalization
First Normal Form each field contains the smallest meaningful value the table does not contain repeating groups of fields or repeating data within the same field
Create a separate field/table for each set of related data. Identify each set of related data with a primary key
Tables Violating First Normal Form
PART (Primary Key) WAREHOUSE P0010 P0020 Warehouse A, Warehouse B, Warehouse C Warehouse B, Warehouse D
Really Bad Set-up!
Better, but still flawed!
PART
(Primary Key)
WAREHOUSE A WAREHOUSE B WAREHOUSE C
P0010 P0020
Yes No
No Yes
Yes Yes
Table Conforming to First Normal Form
PART WAREHOUSE (Primary Key) (Primary Key) QUANTITY
P0010 Warehouse A 400
P0010 P0010 P0020 P0020
Warehouse B Warehouse C Warehouse B Warehouse D
543 329 200 278
Second Normal Form usually used in tables with a multiplefield primary key (composite key) each non-key field relates to the entire primary key any field that does not relate to the primary key is placed in a separate table MAIN POINT
eliminate redundant data in a table Create separate tables for sets of values that apply to multiple records
Table Violating Second Normal Form
PART (Primary Key) P0010 P0010 P0010 P0020 P0020 WAREHOUSE (Primary Key) Warehouse A Warehouse B Warehouse C Warehouse B Warehouse D WAREHOUSE QUANTITY ADDRESS 400 543 329 200 278 1608 New Field Road 4141 Greenway Drive 171 Pine Lane 4141 Greenway Drive 800 Massey Street
Tables Conforming to Second Normal Form
PART_STOCK TABLE PART (Primary Key) P0010 P0010 P0010 P0020 P0020 WAREHOUSE (Primary Key) Warehouse A Warehouse B Warehouse C Warehouse B Warehouse D
1
QUANTITY 400 543 329 200 278
WAREHOUSE TABLE
WAREHOUSE_ADDRESS
WAREHOUSE (Primary Key)
Warehouse A
Warehouse B Warehouse C Warehouse D
1608 New Field Road
4141 Greenway Drive 171 Pine Lane 800 Massey Street
Third Normal Form usually used in tables with a singlefield primary key records do not depend on anything other than a table's primary key each non-key field is a fact about the key
Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.
Table Violating Third Normal Form
EMPLOYEE_DEPARTMENT TABLE EMPNO (Primary Key) 000290 000320 000310 FIRSTNAME John Ramlal Maude LASTNAME Parker Mehta Setright WORKDEPT E11 E21 E11 DEPTNAME Operations Software Support Operations
Tables Conforming to Third Normal Form
EMPLOYEE TABLE EMPNO (Primary Key) 000290 000320 000310 DEPARTMENT TABLE FIRSTNAME John Ramlal Maude LASTNAME Parker Mehta Setright WORKDEPT E11 E21 E11
DEPTNO (Primary Key) E11 E21
DEPTNAME Operations Software Support
Example 1
Un-normalized Table:
Student# Advisor# 1022 4123 10 12
Advisor
Adv-Room Class1 101-07 101-07
Class2 143-01 159-02
Class3 159-02 214-01
Susan Jones 412 Anne Smith 216
Table in First Normal Form No Repeating Fields Data in Smallest Parts
Student# Advisor# 1022 1022 1022 4123 4123 4123 10 10 10 12 12 12 AdvisorFName AdvisorLName Susan Susan Susan Anne Anne Anne Jones Jones Jones Smith Smith Smith AdvRoom 412 412 412 216 216 216 Class# 101-07 143-01 159-02 101-07 159-02 214-01
Tables in Second Normal Form Redundant Data Eliminated
Table: Students
Student# Advisor# AdvFirstName AdvLastName 1022 4123 10 12 Susan Anne Jones Smith AdvRoom 412 216
Table: Registration
Student# Class# 1022 1022 101-07 143-01
1022
4123 4123 4123
159-02
201-01 211-02 214-01
Tables in Third Normal Form
Data Not Dependent On Key is Eliminated
Table: Advisors
Advisor# AdvFirstName AdvLastName 10 12 Susan Anne Jones Smith AdvRoom 412 216
Table: Registration
Student# Class# 1022 1022 101-07 143-01
Table: Students
Student# 1022 4123 Advisor# 10 12 StudentFName Jane Mark StudentLName Mayo
1022
4123 4123 4123 Baker
159-02
201-01 211-02 214-01
Relationships for Example 1
Registration Student# Class# Students Student# Advisor# Advisors Advisor# AdvFirstName AdvLastName Adv-Room
Example 2
Un-normalized Table:
EmpID EN1-26 EN1-33 EN1-36 Name Sean Breen Amy Guya Liz Roslyn Dept Code TW TW AC Dept Name Technical Writing Technical Writing Accounting Proj 1 30-T3 30-T3 35-TC Time Proj 1 25% 50% 90% Proj 2 30-TC 30-TC Time Proj 2 40% 35% Proj 3 31-T3 31-T3 Time Proj 3 30% 60%
Table in First Normal Form
EmpID EN1-26 EN1-26 EN1-26 EN1-33 EN1-33 EN1-33 EN1-36 Project Time on Number Project 30-T3 30-TC 31-T3 30-T3 30-TC 31-T3 35-TC 25% 40% 30% 50% 35% 60% 90% Last Name Breen Breen Breen Guya Guya Guya Roslyn First Name Sean Sean Sean Amy Amy Amy Liz Dept Code TW TW TW TW TW TW AC Dept Name Technical Writing Technical Writing Technical Writing Technical Writing Technical Writing Technical Writing Accounting
Tables in Second Normal Form
Table: Employees and Projects
EmpID Project Number Time on Project
25% 40%
Table: Employees
EmpID
EN1-26 EN1-33 EN1-36
Last Name
Breen Guya Roslyn
First Name
Sean Amy Liz
Dept Code
TW TW AC
Dept Name
Technical Writing Technical Writing Accounting
EN1-26 30-T3 EN1-26 30-T3
EN1-26 31-T3
EN1-33 30-T3 EN1-33 30-TC EN1-33 31-T3 EN1-36 35-TC
30%
50% 35% 60% 90%
Tables in Third Normal Form
Table: Employees_and_Projects
EmpID Project Number Time on Project
25% 40%
Table: Employees
EmpID Last Name First Name
Sean
Dept Code
TW
EN1-26 30-T3 EN1-26 30-T3
EN1-26 Breen
EN1-33 Guya EN1-26 31-T3
EN1-33 30-T3 EN1-33 30-TC EN1-33 31-T3 EN1-36 35-TC
Amy
Liz
TW
AC
30%
50% 35% 60% 90% EN1-36 Roslyn
Table: Departments
Dept Code Dept Name TW AC Technical Writing Accounting
Relationships for Example 2
Employees_and_Projects EmpID ProjectNumber Employees EmpID FirstName LastName DeptCode Departments DeptCode DeptName
TimeonProject
Example 3
Un-normalized Table:
EmpID 285 Name Carl Carlson Manager Smithers Dept Sector Spouse/Children
Engineering 6G
365
458
Lenny
Homer Simpson
Smithers
Marketing
8G
7G Marge, Bart, Lisa, Maggie
Mr. Burns Safety
Table in First Normal Form
Fields contain smallest meaningful values
EmpID
285 365 458
FName
Carl Lenny Homer
LName
Carlson
Manager
Smithers Smithers
Dept
Eng.
Sector Spouse Child1 Child2 Child3
6G
Marketing 8G 7G Marge Bart Lisa Maggie
Simpson Mr. Burns Safety
Table in First Normal Form
No more repeated fields
EmpID FName LName Manager Department Sector Dependent
285 365 458 458 458 458
Carl Lenny Homer Homer Homer Homer
Carlson
Smithers Smithers
Engineering Marketing
6G 8G 7G 7G 7G 7G Marge Bart Lisa Maggie
Simpson Mr. Burns Safety Simpson Mr. Burns Safety Simpson Mr. Burns Safety Simpson Mr. Burns Safety
Second/Third Normal Form Remove Repeated Data From Table Step 1
EmpID FName LName Manager Department Sector
285 365 458
Carl Lenny
Carlson
Smithers Smithers
Engineering 6G Marketing 8G 7G
Homer Simpson Mr. Burns Safety EmpID Dependent 458 458 458 458 Marge Bart Lisa Maggie
Tables in Second Normal Form
Removed Repeated Data From Table Step 2
EmpID FName LName ManagerID Dept Sector
285 365 458
Carl Lenny
Carlson
2 2
Engineering 6G Marketing Safety 8G 7G
Homer Simpson 1
EmpID Dependent 458 458 458 458 Marge Bart Lisa Maggie ManagerID Manager 1 2 Mr. Burns Smithers
Tables in Third Normal Form
Employees Table EmpID FName LName DeptCode Manager Table
ManagerID Manager 1 Mr. Burns
285 365 458
Carl Lenny
Carlson
EN MK
Smithers
Homer Simpson SF
Dependents Table EmpID 458 458 458 458 Dependent Marge Bart Lisa Maggie
Department Table DeptCode Department Sector ManagerID
EN MK SF
Engineering 6G Marketing Safety 8G 7G
2 2 1
Relationships for Example 3
Example 4
Table Violating 1st Normal Form
Rep ID Representative TS-89 RK-56 Mary Mayhem Client 1 Italiana Time 1 Client 2 67 hrs Linkers Time 2 Client 3 2 hrs Time 3
Gilroy Gladstone US Corp. 14 hrs
Taggarts 26 hrs
Kilroy Inc. 9 hrs
Table in 1st Normal Form
Rep ID Rep First Name Rep Last Name TS-89 TS-89 TS-89 RK-56 RK-56 Gilroy Gilroy Gilroy Mary Mary Gladstone Gladstone Gladstone Mayhem Mayhem Client ID* 978 665 782 221 982 Client US Corp Taggarts Italiana Linkers Time With Client 14 hrs 26 hrs 67 hrs 2 hrs
Kilroy Inc. 9 hrs
Tables in 2nd and 3rd Normal Form
Rep ID* First Name Rep ID* Client ID* Time With Client TS-89 TS-89 TS-89 RK-56 RK-56 RK-56 978 665 782 221 982 665 14 hrs 26 hrs 9 hrs 67 hrs 2 hrs 4 hrs TS-89 RK-56 Gilroy Mary Client ID* Last Name Gladstone Mayhem Client Name
978
665 782 221
US Corp
Taggarts Kilroy Inc. Italiana
982
This example comes from a tutorial from http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=95 and http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=104 Please check them out, as they are very well done.
Linkers
Example 5
SupplierID Status S1 Table in 1st Normal Form S1 S2 S2 S3 S4 20 20 10 10 10 20 City London London Paris Paris Paris London PartID P1 P2 P1 P2 P2 P2 Quantity 300 200 300 400 200 200
S4
20
London
P4
300
Although this table is in 1NF it contains redundant data. For example, information about the supplier's location and the location's status have to be repeated for every part supplied. Redundancy causes what are called update anomalies. Update anomalies are problems that arise when information is inserted, deleted, or updated. For example, the following anomalies could occur in this table: INSERT. The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added until they supplied a part. DELETE. If a row is deleted, then not only is the information about quantity and part lost but also information about the supplier. UPDATE. If supplier s1 moved from London to New York, then two rows would have to be updated with this new information.
Tables in 2NF
Suppliers SupplierID S1 S2 S3 Status 20 10 10 City London Paris Paris Parts
SupplierID PartID S1 S1 S2 S2 S3 P1 P2 P1 P2 P2
Quantity 300 200 300 400 200
S4
S5
20
30
London
Athens
S4
S4
P4
P5
300
400
Tables in 2NF but not in 3NF still contain modification anomalies. In the example of Suppliers, they are: INSERT. The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until there is a supplier in the city. DELETE. Deleting any row in SUPPLIER destroys the status information about the city as well as the association between supplier and city.
Tables in 3NF
Advantages of Third Normal Form The advantage of having relational tables in 3NF is that it eliminates redundant data which in turn saves space and reduces manipulation anomalies. For example, the improvements to our sample database are: INSERT. Facts about the status of a city, Rome has a status of 50, can be added even though there is not supplier in that city. Likewise, facts about new suppliers can be added even though they have not yet supplied parts. DELETE. Information about parts supplied can be deleted without destroying information about a supplier or a city. UPDATE. Changing the location of a supplier or the status of a city requires modifying only one row.
Additional Notes About Example 3
Going to extremes can create too many tables which in turn can make it difficult to manage your data. The key to developing an efficient database is to determine your needs. A postal carrier may need an Address field broken down into smaller fields for sorting and grouping purposes, but do you? Another good example is Example 3 - leaving the Dept Code field in our completed table design. If you also wanted to track information such as pay rate, health insurance, etc., then a new table that contains company related data for the employee would be necessary. If all you need is to track the department an employee belongs to then leaving it in the Employees table is fine.
In Summary
If you type a data value more than once then consider placing the field in another table. Consider your sorting and grouping needs. If you need to sort or group on a portion of a field, then the field is not broken down into its smallest meaningful value. If you have multiple groups of fields, such as several telephone numbers, then consider eliminating those fields and turning them into records in another table. Think verticallynot horizontally!
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.