DS 5110 – Lecture 2
Database Design
Roi Yehoshua
Agenda
Database design principles
The entity-relationship (ER) model
Reducing ER diagrams to relational schemas
Functional dependencies
Normal forms
2 Roi Yehoshua, 2024
Database Design
The goal in database design is to find a “good” collection of relation schemas
This involves two types of decisions
Business decision: What data should we store in the database?
Computer science decision: How to group these data to form the various tables?
Database design should be done with care
Changes to the database schema may affect large parts of the application code
Methods used in the design process:
Entity-relationship (ER) model
Normalization
3 Roi Yehoshua, 2024
Database Design Phases
Initial phase: Characterize the data needs of the
prospective database users
Conceptual design: Translate user requirements into a
conceptual schema of the database
Logical design: Map the conceptual design into a set of
relational schemas
Physical design: Map the relational schemas into a set
of tables in the selected database system
4 Roi Yehoshua, 2024
Good Database Design Principles
Avoid duplicate information (redundancy)
e.g., storing the course title in each section of the course
Wastes space and can lead to inconsistencies
Store information in its smallest logical parts
e.g., don’t use a single field for a full name
Makes it difficult to retrieve individual facts later
Define a primary key for each table
If there is no column that might make a good primary key, use an auto-increment column
Ensure the integrity of the data by applying integrity constraints
e.g., define foreign keys between related tables
Apply normalization rules
5 Roi Yehoshua, 2024
Entity-Relationship (ER) Model
Used to map entities and interactions in the real world into a conceptual schema
Consists of three main components:
Entity sets
Attributes
Relationship sets
Has an associated graphical representation called an ER diagram (ERD)
6 Roi Yehoshua, 2024
Entity Sets
An entity is an object in the real world
An entity set is a set of entities of the same type that share the same properties
e.g., the set of all instructors at a given university
Each entity has a set of attributes
e.g., instructor = (ID, name, salary)
A subset of the attributes form a primary key that uniquely identifies each entity
Two alternative notations for entity sets:
7 Roi Yehoshua, 2024
Attribute Types
Complex attributes
Composed of simpler attributes, e.g., address
Multivalued attributes
Can have multiple values, e.g., hobbies, phone_number
Derived attributes
Can be computed from other attributes
e.g., age given date_of_birth
8 Roi Yehoshua, 2024
Relationship Sets
A relationship set is an association between several entity sets
Formally, a relationship set is a mathematical relation among n 2 entity sets
If E1, …, En are entity sets, then a relationship set R is a subset of
{(e1 , e2 ..., en ) | e1 E1 , e2 E2 ,..., en En }
A relationships set is represented in an ERD by a diamond
9 Roi Yehoshua, 2024
Relationship Sets with Attributes
A relationship may also have attributes
An attribute of a relationship set is represented by an undivided rectangle, which is
linked with a dashed line to the diamond representing that relationship set
10 Roi Yehoshua, 2024
Roles
The function that an entity plays in a relationship is called that entity’s role
Roles are explicitly specified when the meaning of the relationship needs clarification
e.g., in recursive relationship sets
We indicate roles in ERDs by labeling the lines that connect diamonds to rectangles
11 Roi Yehoshua, 2024
Non-Binary Relationship Sets
Most relationship sets in a database system are binary
However, sometimes relationship sets involve more than two entity sets
Example: students work on research projects under the guidance of an instructor
proj_guide is a ternary relationship between instructor, student, and project
12 Roi Yehoshua, 2024
Relationship Cardinalities
There are four types of relationships with regard to the number of entities that
participate in the relationship from each entity set
One-to-one One-to-many Many-to-one Many-to-many
13 Roi Yehoshua, 2024
One-To-Many Relationship
Exists when each row in one relation has many related rows in the second relation
In ERD, an arrow (→) indicates “one” and an undirected line (—) indicates “many”
For example, a one-to-many relationship between an instructor and a department:
An instructor belongs to one department
A department may have many instructors
instructor department
ID inst_dept dept_name
name building
salary budget
14 Roi Yehoshua, 2024
One-To-One Relationship
Exists when each row in one relation has only one related row in a second relation
For example, a one-to-one relationship between a department and a manager
A department can have only one manager
A manager can manage only one department
manager department
ID dept_mgr dept_name
name building
salary budget
15 Roi Yehoshua, 2024
Many-To-Many Relationship
For example, many-to-many relationship between an instructor and a section:
An instructor may teach multiple courses
A courses may be taught by one or more instructors
instructor course
ID teaches course_id
name title
salary credits
16 Roi Yehoshua, 2024
Total and Partial Participation
Total participation (indicated by double line): every entity participates in at least one
relationship in the relationship set
For example, a university may require every student to have at least one advisor:
Partial participation: some entities may not participate in any relationship in the
relationship set
For example, participation of instructor in advisor is partial
i.e., an instructor need not advise any students
17 Roi Yehoshua, 2024
Expressing Cardinality Constraints
A line may have an associated minimum and maximum cardinality
A minimum value of 1 indicates total participation
A maximum value of * indicates no limit
Example:
Instructor can advise 0 or more students
A student must have exactly one advisor
18 Roi Yehoshua, 2024
Redundant Attributes
Entities shouldn’t have attributes replicating information present in the relationship
For example, a student entity shouldn’t have a dept_name attribute
This information is already included in the relationship set stud_dept
19 Roi Yehoshua, 2024
Weak Entity Sets
A weak entity set is an entity set that doesn’t have a primary key of its own
It is dependent on another entity set to provide it with extra attributes to uniquely
identify its entities
In ERD, a weak entity set is depicted via a double rectangle
The attributes that are part of its primary key are underlined with a dashed line
These are attribute are sometimes called the discriminator
Its associated identifying relationship is depicted by a double diamond
20 Roi Yehoshua, 2024
Specializations
An entity set can be divided into lower-level entity sets based on their characteristics
For example, a person entity can be specialized into employee and student
The lower-level entity sets inherit all the attributes and relationship participation of
the higher-level entity set to which they are linked
Depicted in ERD by a triangle arrow
21 Roi Yehoshua, 2024
Aggregation
ER models cannot express relationships among relationships
For example, consider the ternary relationship proj_guide
Suppose that each instructor guiding a student on a project is required to file a
monthly evaluation report
• Relationship sets eval_for and proj_guide
represent overlapping information
• Every instructor, student, project combination
in eval_for must also be in proj_guide
• However, some proj_guide relationships may
not correspond to any eval_for relationships
22 Roi Yehoshua, 2024
Aggregation
We can eliminate this redundancy by using aggregation
We treat the relationship proj_guide as a higher-level entity set
We then create a relationship eval_for between proj_guide and evaluation
23 Roi Yehoshua, 2024
Summary of Symbols used in ERD
24 Roi Yehoshua, 2024
Example: ERD for a University Organization
25 Roi Yehoshua, 2024
ERD Tools
There are many data modeling tools available online that support ER diagrams
I recommend using Visual Paradigm which provides a free community edition
Download from https://www.visual-paradigm.com/download/community.jsp
26 Roi Yehoshua, 2024
Creating an ERD
You can build an ERD from scratch or start from a template
27 Roi Yehoshua, 2024
Defining an Entity Set
You can add an entity by dragging it from the left sidebar
Right-click on the entity to add columns
28 Roi Yehoshua, 2024
Creating an Entity Set
To create a primary key right click on a column and choose Include in Primary Key
29 Roi Yehoshua, 2024
Defining a Relationship
To define a relationship click on one of the entities and drag out the Resource
Catalog icon at the top right to the second entity
Release the mouse button and select the type of relationship
30 Roi Yehoshua, 2024
Exporting the ERD
You can export the diagram to a file like PDF or image
You can even generate a file with the DDL commands to create the database
Right click anywhere in the editor and choose Utilities > Generate SQL…
31 Roi Yehoshua, 2024
Class Exercise
Design a database for a library management system
Each book has a unique identification number (ISBN), title, one or more authors, genre (e.g.,
“Mystery”, “Biography”, “Fantasy”) and publication year
Each book has one or more copies in the library
Each library member has a reader id, name and address
Library members are able to check-out any copy of any book
For each check-out, the system stores the check-out date and the return date
Members can also reserve books that are not currently available
For each book reservation, the system stores the reservation date
32 Roi Yehoshua, 2024
Reducing ERD to Relational Schemas
There are a set of rules you can use to convert an ER design into relational schemas
First, for each entity set we define a relational schema with the same attributes
The primary key of the entity set serves as the primary key of the resulting schema
Examples
instructor (ID, name, salary)
student (ID, name, tot_cred)
course (course_id, title, credits)
classroom (building, room_number, capacity)
34 Roi Yehoshua, 2024
One-To-Many Relationships
Represented by adding an extra attribute to the “many” side, which is linked to the
primary key of the “one” side
For example, the relationship between instructor and department is represented by
adding an attribute dept_name to the schema of instructor:
instructor department instructor (ID, name, salary, dept_name)
ID inst_dept dept_name
name building
salary budget
dept_name will be a foreign key referencing the primary key of department
35 Roi Yehoshua, 2024
One-To-One Relationships
For one-to-one relationship sets, either side can be chosen to act as the “many” side
i.e., an extra attribute can be added to either of the tables corresponding to the two entities
manager department manager (ID, name, salary, dept_name)
ID dept_mgr dept_name
name building OR
salary budget department (dept_name, building,
salary, manager_ID)
36 Roi Yehoshua, 2024
Many-To-Many Relationships
Assume we have a many-to-many relationship between two entity sets A and B
Let {a1, … , am} be the attributes of the primary key of A
Let {b1, … , bn} be the attributes of the primary key of B
We represent the relationship set between A and B by a new schema R
R = {a1 , a2 ,..., am } {b1 , b2 ,..., bn }
R may contain additional attributes that are not part of the primary key
The primary key of R consists of union of the attributes a1, … , am, b1, … , bn
We also define foreign key constraints on the attributes {a1, … , am} and {b1, … , bn}
instructor course
ID teaches course_id teaches (ID, course_ID)
name title
salary credits
37 Roi Yehoshua, 2024
Weak Entity Sets
A weak entity set is represented by a schema that includes the attributes of the
primary key of its identifying entity set
The primary key of the relation is a combination of the primary key of the strong
entity and the discriminator of the weak entity
We also define a foreign key constraint on the attributes of the identifying entity
section (course_id, sec_id, semester, year)
38 Roi Yehoshua, 2024
Representing Specializations
Method (1)
Create a schema for the higher-level entity set
For each lower-level entity set, create a schema that person (ID, name, street, city)
includes the primary key of the higher-level entity set student (ID, tot_cred)
Drawback: getting information about the lower-level employee (ID, salary)
entity requires accessing two relations
Method (2)
Create schemas only for the lower-level entity sets
The schema of the lower-level entity set includes the student (ID, name, street, city, tot_cred)
inherited attributes from the higher-level entity set employee (ID, name, street, city, salary)
Drawback: redundant definition of attributes
39 Roi Yehoshua, 2024
The University Relational Schema
40 Roi Yehoshua, 2024
Class Exercise
Map the ERD of book reservation system into a set of relational schemas
41 Roi Yehoshua, 2024
Normalization
Structuring a relational database in accordance with a series of normal forms
Goals:
Reduce data redundancy
Improve data integrity
First proposed by Edgar F. Codd as part of his relational model
There are six normal forms numbered from 1NF to 6NF (see next slide)
The normalization process is progressive
A higher level of normalization cannot be achieved unless the previous levels are satisfied
A database is often described as "normalized" if it meets third normal form
Normal forms beyond 4NF are mainly of academic interest
The problems they exist to solve rarely appear in practice
43 Roi Yehoshua, 2024
Normal Forms
Properties of the normal forms (from Wikipedia)
44 Roi Yehoshua, 2024
Normal Forms
Goal: Decide whether a particular relation R is in “good” form
If R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that
Each relation is in good form
The decomposition is lossless
i.e., there is no loss of information by replacing R with the set {R1, R2, ..., Rn}
To decide if R is in a good form we use the concept of functional dependencies
45 Roi Yehoshua, 2024
Normal Forms
Suppose we combine instructor and department into in_dep
Issues in this relation
Repetition of information
Need to use null values (if we add a new department with no instructors)
To avoid these issues, we need to decompose in_dep into two schemas
46 Roi Yehoshua, 2024
Functional Dependencies
Let R be relational schema and R, R, sets of attributes
The functional dependency → holds on R, whenever any two tuples t1 and t2 of
r(R) agree on the attributes , they also agree on the attributes
t1[] = t2[] t1[ ] = t2[ ]
Example: Consider R(A, B) with the following instance r
A B
1 4
2 5
3 5
On this instance, A → B holds but B → A does not hold
47 Roi Yehoshua, 2024
Keys and Functional Dependencies
K is a superkey for relation schema R if and only if K → R
K is a candidate key for R if and only if
K→R
For no K, → R
Functional dependencies allow us to express constraints that superkeys cannot
Consider the schema:
in_dep (ID, name, salary, dept_name, building, budget)
We expect these functional dependencies to hold:
dept_name → building, budget
ID → dept_name
but would not expect the following to hold:
dept_name → salary
48 Roi Yehoshua, 2024
Trivial Function Dependencies
A functional dependency is trivial if it is satisfied by all instances of a relation
Example:
name → name
ID, name → ID
In general, → is trivial if
49 Roi Yehoshua, 2024
Closure of a Set of Functional Dependencies
Given a set F of functional dependencies
The closure of F is the set of all functional dependencies logically implied by F
e.g., if A → B and B → C, then we can infer that A → C
We denote the closure of F by F+
We can compute F+ by repeatedly applying Armstrong’s Axioms:
Reflexive rule: if , then →
Augmentation rule: if → , then →
Transitivity rule: if → and → , then →
These rules are
Sound - generate only functional dependencies that actually hold
Complete - generate all functional dependencies that hold
50 Roi Yehoshua, 2024
Example of F+
R = (A, B, C, G, H, I)
F = {A → B,
A→C
CG → H
CG → I
B → H}
Some members of F+
A→H
by transitivity from A → B and B → H
AG → I
by augmenting A → C with G to get AG → CG, and then transitivity with CG → I
CG → HI
by augmenting CG → I to infer CG → CGI,
and augmenting CG → H to infer CGI → HI, and then transitivity
51 Roi Yehoshua, 2024
Closure of Attribute Sets
Given a set of attributes , define the closure of under F (denoted by +) is the set
of attributes that are functionally determined by under F
Algorithm to compute +:
result := {}
while (changes to result) do
for each → in F do
begin
if result then result := result
end
52 Roi Yehoshua, 2024
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B,
A→C
CG → H
CG → I
B → H}
(AG)+
result = AG
result = ABCG (A → B and A → C)
result = ABCGHI (CG → H and CG → I)
AG is a candidate key, since (AG)+ = R
and no subset of AG is a candidate key
53 Roi Yehoshua, 2024
Class Exercise
Given the schema R = (A, B, C, D, E, H, I) with the functional dependencies:
F = {A→B, C→D, CD→E, BD→AH, H→D, AC→H}
Find all the candidate keys of R
54 Roi Yehoshua, 2024
Lossless Decompositions
We can use functional dependencies to know when a decomposition is lossless
A lossless-join decomposition is a decomposition of a relation R into relations
{R1, R2, ..., Rn} such that a natural join of R1, R2, ..., Rn yields back the original relation
For the case of R = (R1, R2) a decomposition is lossless if at least one of the following
dependencies is in F+
R1 R2 → R1
R1 R2 → R2
55 Roi Yehoshua, 2024
Example
R = (A, B, C)
F = {A → B, B → C}
R1 = (A, B), R2 = (B, C)
Lossless decomposition:
R1 R2 = {B} and B → BC
R1 = (A, B), R2 = (A, C)
Lossless decomposition:
R1 R2 = {A} and A → AB
56 Roi Yehoshua, 2024
Dependency Preservation
At least one decomposed table must satisfy every functional dependency
If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2
In other words, the closure of F with respect to the attributes in R1 and R2 should be
the same as the closure of F with respect to the attributes in R
57 Roi Yehoshua, 2024
Dependency Preservation
For example, consider the schema:
dept_advisor(student_ID, instructor_ID, department_name)
With function dependencies:
instructor_ID → dept_name
student_ID, dept_name → instructor_ID
In this design, the department name is repeated once for each time an instructor
participates in a dept_advisor relationship
To fix this, we need to decompose dept_advisor
Any decomposition will not include all the attributes in
student_ID, dept_name → instructor_ID
Thus, the decomposition is NOT dependency preserving
58 Roi Yehoshua, 2024
Class Exercise
Given the schema R = (A, B, C, D, E, H, I) with the functional dependencies:
F = {A→B, C→D, CD→E, BD→AH, H→D, AC→H}
Is the decomposition of R into (A, B, C, D, I) and (B, C, E, H) join-lossless?
Is it dependency preserving?
59 Roi Yehoshua, 2024
First Normal Form
A relational schema R is in 1NF if the domains of all attributes of R are atomic
i.e., cannot be divided into smaller units
Examples of non-atomic attributes: sets of values, composite attributes
Example for a table not in 1NF
Non-atomic values complicate storage and encourage redundant storage of data
60 Roi Yehoshua, 2024
Second Normal Form
A functional dependency on part of any candidate key is a violation of 2NF
For example, the Book table below has a composite candidate key of {Title, Format}
All of the attributes that are not part of the candidate key depend only on Title
However, Price also depends on Format
Thus, the table doesn’t satisfy 2NF
61 Roi Yehoshua, 2024
Boyce-Codd Normal Form
A schema R is in BCNF if for all
→ in F+
at least one of the following holds:
→ is trivial (i.e., )
is a superkey of R
Example for schema that is not in BCNF:
in_dep (ID, name, salary, dept_name, building, budget)
Because:
dept_name → building, budget
but dept_name is not a superkey
Need to decompose in_dept into instructor and department
62 Roi Yehoshua, 2024
BCNF Decomposition
Let R be a schema that is not in BCNF
Let → be the functional dependency that causes a violation of BCNF
We decompose R into:
U
R-
In our example of in_dep
= dept_name
= building, budget
and in_dep is replaced by
U = (dept_name, building, budget)
R - = (ID, name, salary, dept_name)
63 Roi Yehoshua, 2024
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
if (there is a schema Ri in result that is not in BCNF)
then begin
let → be a nontrivial functional dependency that holds on Ri such that:
→ Ri is not in F+
and =
result := (result – Ri ) (Ri – ) (, )
end
else done := true;
64 Roi Yehoshua, 2024
Example of BCNF Decomposition
class (course_id, title, dept_name, credits, sec_id, semester, year, building,
room_number, capacity, time_slot_id)
Functional dependencies:
course_id → title, dept_name, credits
building, room_number → capacity
course_id, sec_id, semester, year → building, room_number, time_slot_id
Candidate key: {course_id, sec_id, semester, year}
BCNF Decomposition:
course_id → title, dept_name, credits holds
but course_id is not a superkey
We replace class by:
course (course_id, title, dept_name, credits)
class-1 (course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id)
65 Roi Yehoshua, 2024
Example of BCNF Decomposition
course is in BCNF
How do we know this?
building, room_number → capacity holds on class-1
but {building, room_number} is not a superkey for class-1
We replace class-1 by:
classroom (building, room_number, capacity)
section (course_id, sec_id, semester, year, building, room_number, time_slot_id)
classroom and section are in BCNF
66 Roi Yehoshua, 2024
Dependency Preservation
It is not always possible to achieve both BCNF and dependency preservation
Consider a schema:
dept_advisor(student_ID, advisor_ID, dept_name)
With function dependencies:
student_ID, dept_name → advisor_ID
advisor_ID → dept_name
dept_advisor is not in BCNF
advisor_ID is not a superkey
Any decomposition of dept_advisor will not include all the attributes in
student_ID, dept_name → advisor_ID
Thus, the decomposition is NOT dependency preserving
67 Roi Yehoshua, 2024
Class Exercise
Given the schema R = (A, B, C, D, E, H, I) with the functional dependencies:
F = {A→B, C→D, CD→E, BD→AH, H→D, AC→H}
Find a BCNF decomposition of R
Is it dependency preserving?
68 Roi Yehoshua, 2024
3NF
A schema R is in third normal form (3NF) if for all
→ in F+
at least one of the following holds:
→ is trivial (i.e., )
is a superkey of R
Each attribute A in – is contained in a candidate key of R
Each attribute may be in a different candidate key
If a relation is in BCNF it is in 3NF
since in BCNF one of the first two conditions above must hold
Third condition is a minimal relaxation of BCNF to ensure dependency preservation
There is always a lossless-join, dependency-preserving decomposition into 3NF
69 Roi Yehoshua, 2024
3NF Example
Consider the schema:
dept_advisor(student_ID, advisor_ID, dept_name)
With function dependencies:
student_ID, dept_name → advisor_ID
advisor_ID → dept_name
Two candidate keys = {student_ID, dept_name}, {student_ID, advisor_ID}
We have seen before that dept_advisor is not in BCNF
dept_advisor, however, is in 3NF
{student_ID, dept_name} is a superkey
advisor_ID → dept_name and advisor_ID is NOT a superkey, but:
{dept_name} – {advisor_ID} = {dept_name} and
dept_name is contained in a candidate key
70 Roi Yehoshua, 2024
Redundancy in 3NF
Consider an instance of the advisor schema:
student_ID advisor_ID dept_name
S1 A1 D1
S2 A1 D1
S3 A1 D1
NULL A2 D2
What is wrong with the table?
Repetition of information
Need to use null values
to represent the relationship (advisor, dept_name) where there is no corresponding value for student
71 Roi Yehoshua, 2024
Comparison of BCNF and 3NF
Advantages to 3NF over BCNF:
Simpler to understand and apply
Less restrictive, allowing for more flexibility in database design
Sufficient for most applications
Always possible to obtain a 3NF design without losing information or dependency
preservation
Disadvantages to 3NF
Potential for redundancy
Less robust: doesn’t guarantee that every determinant is a candidate key
We may have to use null values to represent some of the possible meaningful relationships
72 Roi Yehoshua, 2024
ER Model and Normalization
When an ER diagram is carefully designed, identifying all entities correctly, the tables
generated from the ER diagram should not need further normalization
However, in a real (imperfect) design, there can be functional dependencies from
non-key attributes of an entity to other attributes of the entity
Example: an employee entity with
Attributes: dept_name and building,
Functional dependency: dept_name → building
Good design would have made department an entity
73 Roi Yehoshua, 2024