The importance of
data normalization
C R E AT I N G P O S T G R E S Q L D ATA B A S E S
Darryl Reeves
Industry Assistant Professor, New York
University
Example 1: redundant data
Data redundancy can be problematic
CREATE TABLE loan (
borrower_id INTEGER REFERENCES borrower(id),
bank_name VARCHAR(50) DEFAULT NULL,
...
);
CREATE TABLE bank (
id SERIAL PRIMARY KEY,
name VARCHAR(50) DEFAULT NULL,
...
);
CREATING POSTGRESQL DATABASES
Example 1: redundant data
CREATE TABLE loan (
borrower_id INTEGER REFERENCES borrower(id),
bank_name VARCHAR(50) DEFAULT NULL,
...
);
CREATE TABLE bank (
id SERIAL PRIMARY KEY,
name VARCHAR(50) DEFAULT NULL,
...
);
Problem 1: Different banks/same name
Problem 2: Name changes
CREATING POSTGRESQL DATABASES
Example 1: redundant data
CREATE TABLE loan (
borrower_id INTEGER REFERENCES borrower(id),
bank_id INTEGER REFERENCES bank(id),
...
);
Banks share name with distinct ids
Updates to bank names will only affect bank table
CREATING POSTGRESQL DATABASES
Example 2: consolidating records
applicant borrower
id name id name
1 Jane Simmmons 1 Jack Smith
2 Rick Demps 2 Sara Williams
3 Pam Jones 3 Jennifer Valdez
CREATING POSTGRESQL DATABASES
Example 2: consolidating records
applicant borrower
id name id name
1 Jane Simmmons 1 Jack Smith
2 Rick Demps 2 Sara Williams
3 Pam Jones 3 Jennifer Valdez
4 Pam Jones
CREATING POSTGRESQL DATABASES
Example 2: consolidating records
applicant borrower
id name id name
1 Jane Simmmons 1 Jack Smith
2 Rick Demps 2 Sara Williams
3 Pam Jones 3 Jennifer Valdez
CREATING POSTGRESQL DATABASES
Example 2: consolidating records
applicant borrower
id name id name
1 Jane Simmmons 1 Jack Smith
2 Rick Demps 2 Sara Williams
3 Jennifer Valdez
4 Pam Jones
CREATING POSTGRESQL DATABASES
Example 2: consolidating records
CREATE TABLE borrower (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL
);
CREATING POSTGRESQL DATABASES
Example 2: consolidating records
CREATE TABLE borrower (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
approved BOOLEAN DEFAULT NULL
);
approved is NULL => applicant
approved is true => borrower
approved is false => denied application
CREATING POSTGRESQL DATABASES
Why normalize data?
Reduces data duplication
Increases data consistency
Improves data organization
CREATING POSTGRESQL DATABASES
Let's practice!
C R E AT I N G P O S T G R E S Q L D ATA B A S E S
1st Normal Form
C R E AT I N G P O S T G R E S Q L D ATA B A S E S
Darryl Reeves
Industry Assistant Professor, New York
University
Example: maintaining student records
CREATE TABLE student (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
courses VARCHAR(50) NOT NULL,
home_room SMALLINT NOT NULL
);
Update errors
Insertion errors
Deletion errors
CREATING POSTGRESQL DATABASES
Example: duplicated data after update
id name courses home_room
122 Susan Roth Algebra I, Physics, Spanish II 101
413 Robert Cruz History, Geometry, Biology 204
613 Thomas Wright English III, Chemistry, Algebra II 102
CREATING POSTGRESQL DATABASES
Example: duplicated data after update
id name courses home_room
122 Susan Roth Algebra I, Chemistry, Spanish II 101
413 Robert Cruz History, Geometry, Biology 204
613 Thomas Wright English III, Chemistry, Algebra II 102
CREATING POSTGRESQL DATABASES
Example: duplicated data after update
id name courses home_room
122 Susan Roth Algebra I, Chemistry, Spanish II, Chemistry 101
413 Robert Cruz History, Geometry, Biology 204
613 Thomas Wright English III, Chemistry, Algebra II 102
CREATING POSTGRESQL DATABASES
Example: insertions with column restrictions
CREATE TABLE student (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
courses VARCHAR(50) NOT NULL,
home_room SMALLINT NOT NULL
);
id name courses home_room
122 Susan Roth Algebra I, Physics, Spanish II 101
413 Robert Cruz History, Geometry, Biology 204
613 Thomas Wright English III, Chemistry, Algebra II 102
CREATING POSTGRESQL DATABASES
Example: insertions with column restrictions
CREATE TABLE student (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
courses VARCHAR(50) NOT NULL,
home_room SMALLINT NOT NULL
);
id name courses home_room
122 Susan Roth Algebra I, Physics, Spanish II 101
413 Robert Cruz History, Geometry, Biology, French Literature 204
613 Thomas Wright English III, Chemistry, Algebra II 102
CREATING POSTGRESQL DATABASES
Example: data integrity impacted by deleting records
id name courses home_room
122 Susan Roth Algebra I, Physics, Spanish II 101
413 Robert Cruz History, Geometry, Biology 204
613 Thomas Wright English III, Chemistry, Algebra II 102
CREATING POSTGRESQL DATABASES
Example: data integrity impacted by deleting records
id name courses home_room
122 Susan Roth Algebra I, Physics, Spanish II 101
413 Robert Cruz History, Geometry, Biology 204
613 Thomas Wright ??? 102
CREATING POSTGRESQL DATABASES
Satisfying 1st Normal Form (1NF)
1NF Requirement:
Table values must be atomic
CREATING POSTGRESQL DATABASES
Example: student table satisfying 1NF
CREATE TABLE student (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
courses VARCHAR(50) NOT NULL,
home_room SMALLINT NOT NULL
);
CREATING POSTGRESQL DATABASES
Example: student table satisfying 1NF
CREATE TABLE student (
id INTEGER,
name VARCHAR(50) NOT NULL,
courses VARCHAR(50) NOT NULL,
home_room SMALLINT NOT NULL
);
CREATING POSTGRESQL DATABASES
Example: student table satisfying 1NF
CREATE TABLE student (
id INTEGER,
name VARCHAR(50) NOT NULL,
course VARCHAR(50) NOT NULL,
home_room SMALLINT NOT NULL
);
CREATING POSTGRESQL DATABASES
Example: student table satisfying 1NF
id name course home_room
122 Susan Roth Algebra I 101
122 Susan Roth Physics 101
122 Susan Roth Spanish II 101
413 Robert Cruz History 204
413 Robert Cruz Geometry 204
413 Robert Cruz Biology 204
CREATING POSTGRESQL DATABASES
Example: student table satisfying 1NF
CREATE TABLE student (
id INTEGER,
name VARCHAR(50) NOT NULL,
course VARCHAR(50) NOT NULL,
home_room SMALLINT NOT NULL
);
CREATING POSTGRESQL DATABASES
Example: student table satisfying 1NF
CREATE TABLE student (
student_id INTEGER,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
course VARCHAR(50) NOT NULL,
home_room SMALLINT NOT NULL
);
CREATING POSTGRESQL DATABASES
Example: student table satisfying 1NF
id first_name last_name course home_room
122 Susan Roth Algebra I 101
122 Susan Roth Physics 101
122 Susan Roth Spanish II 101
413 Robert Cruz History 204
413 Robert Cruz Geometry 204
413 Robert Cruz Biology 204
CREATING POSTGRESQL DATABASES
Let's practice!
C R E AT I N G P O S T G R E S Q L D ATA B A S E S
2nd Normal Form
C R E AT I N G P O S T G R E S Q L D ATA B A S E S
Darryl Reeves
Industry Assistant Professor, New York
University
Example: school textbooks
CREATE TABLE textbook (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
publisher_name VARCHAR(100) NOT NULL,
publisher_site VARCHAR(50),
quantity SMALLINT NOT NULL DEFAULT 0
);
CREATING POSTGRESQL DATABASES
Example: school textbooks
id title publisher_name publisher_site quantity
23 Introductory Algebra: 1st Edition ABC Publishing www.abc.com 32
74 Calculus Foundations ABC Publishing www.abc.com 27
112 Statistical Concepts Martin House www.mh.com 22
CREATING POSTGRESQL DATABASES
Example: inconsistency from updating url
id title publisher_name publisher_site quantity
23 Introductory Algebra: 1st Edition ABC Publishing www.abc.com 32
74 Calculus Foundations ABC Publishing www.abc.com 27
112 Statistical Concepts Martin House www.mh.com 22
CREATING POSTGRESQL DATABASES
Example: inconsistency from updating url
id title publisher_name publisher_site quantity
23 Introductory Algebra: 1st Edition ABC Publishing www.newabc.com 32
74 Calculus Foundations ABC Publishing www.abc.com 27
112 Statistical Concepts Martin House www.mh.com 22
CREATING POSTGRESQL DATABASES
Example: adding publisher without textbook
id title publisher_name publisher_site quantity
23 Introductory Algebra: 1st Edition ABC Publishing www.abc.com 32
74 Calculus Foundations ABC Publishing www.abc.com 27
112 Statistical Concepts Martin House www.mh.com 22
CREATING POSTGRESQL DATABASES
Example: adding publisher without textbook
id title publisher_name publisher_site quantity
23 Introductory Algebra: 1st Edition ABC Publishing www.abc.com 32
74 Calculus Foundations ABC Publishing www.abc.com 27
112 Statistical Concepts Martin House www.mh.com 22
?? ?? New Horizons www.nhorizon.com ??
CREATING POSTGRESQL DATABASES
Example: removing a textbook
id title publisher_name publisher_site quantity
23 Introductory Algebra: 1st Edition ABC Publishing www.abc.com 32
74 Calculus Foundations ABC Publishing www.abc.com 27
112 Statistical Concepts Martin House www.mh.com 22
CREATING POSTGRESQL DATABASES
Example: removing a textbook
id title publisher_name publisher_site quantity
23 Introductory Algebra: 1st Edition ABC Publishing www.abc.com 32
74 Calculus Foundations ABC Publishing www.abc.com 27
Publisher requires separate table
Data anomalies from insertions and deletions
CREATING POSTGRESQL DATABASES
Satisfying 2nd Normal Form (2NF)
1NF is satisfied
All non-key columns are dependent on the table's PRIMARY KEY
CREATING POSTGRESQL DATABASES
Example: textbooks and publishers in 2NF
CREATE TABLE textbook (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
publisher_name VARCHAR(100) NOT NULL,
publisher_site VARCHAR(50),
quantity SMALLINT NOT NULL DEFAULT 0
);
CREATING POSTGRESQL DATABASES
Example: textbooks and publishers in 2NF
CREATE TABLE textbook (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
quantity SMALLINT NOT NULL DEFAULT 0,
);
CREATE TABLE publisher (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
site VARCHAR(50)
);
CREATING POSTGRESQL DATABASES
Example: textbooks and publishers in 2NF
CREATE TABLE textbook (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
quantity SMALLINT NOT NULL DEFAULT 0,
publisher_id INTEGER REFERENCES publisher(id)
);
CREATE TABLE publisher (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
site VARCHAR(50)
);
CREATING POSTGRESQL DATABASES
Let's practice!
C R E AT I N G P O S T G R E S Q L D ATA B A S E S
3rd Normal Form
C R E AT I N G P O S T G R E S Q L D ATA B A S E S
Darryl Reeves
Industry Assistant Professor, New York
University
Defining 3rd Normal Form
Requirements
2NF is satisfied
No "transitive dependencies" exist
i.e., All non-key columns are only dependent on the PRIMARY KEY
CREATING POSTGRESQL DATABASES
Transitive dependencies
Involve 3 columns in table
Columns X, Y, Z
column X -> column Y
column Y -> column Z
column X -> column Z
CREATING POSTGRESQL DATABASES
Example: course room assignments
id name teacher num course name -> teacher
157 Algebra Maggie Winters 244 teacher -> room number
162 Physics Maggie Winters 244 course name -> room number
321 Spanish I Jeremy Smith 309
497 History I Sarah Williams 313
613 Spanish II Jeremy Smith 309
CREATING POSTGRESQL DATABASES
Example: course room assignments
id name teacher num course name -> teacher
157 Algebra Maggie Winters 244 teacher -> room number
162 Physics Maggie Winters 244 course name -> room number
321 Spanish I Jeremy Smith 309 (transitive dependency)
497 History I Sarah Williams 313
613 Spanish II Jeremy Smith 309
CREATING POSTGRESQL DATABASES
Example: course room assignments
id name teacher num 1. Updating room number
157 Algebra Maggie Winters 244
162 Physics Maggie Winters 244
321 Spanish I Jeremy Smith 309
497 History I Sarah Williams 313
613 Spanish II Jeremy Smith 309
CREATING POSTGRESQL DATABASES
Example: course room assignments
id name teacher num 1. Updating room number
157 Algebra Maggie Winters 244 2. Adding new teachers
162 Physics Maggie Winters 244
321 Spanish I Jeremy Smith 309
497 History I Sarah Williams 313
613 Spanish II Jeremy Smith 309
CREATING POSTGRESQL DATABASES
Example: course room assignments
id name teacher num 1. Updating room number
157 Algebra Maggie Winters 244 2. Adding new teachers
162 Physics Maggie Winters 244 3. Deleting all courses for a teacher
321 Spanish I Jeremy Smith 309
497 History I Sarah Williams 313
613 Spanish II Jeremy Smith 309
CREATING POSTGRESQL DATABASES
Example: course room assignments
How do we change the structure of our data in order to alleviate these potential problems?
CREATING POSTGRESQL DATABASES
Example: course room assignments
teacher table
id name room_num
1 Maggie Winters 244
2 Jeremy Smith 309
3 Sarah Williams 313
CREATING POSTGRESQL DATABASES
Example: course room assignments
teacher table course_assignment table
id name room_num id name teacher_id
1 Maggie Winters 244 157 Algebra 1
2 Jeremy Smith 309 162 Physics 1
3 Sarah Williams 313 321 Spanish I 2
497 History I 3
613 Spanish II 2
CREATING POSTGRESQL DATABASES
Let's practice!
C R E AT I N G P O S T G R E S Q L D ATA B A S E S