Normalization
Stacy Kovar
Normalization is a method for organizing data elements in a database into tables.
Normalization Avoids
    Duplication of Data – The same data is listed in multiple lines of the database
    Insert Anomaly – A record about an entity cannot be inserted into the table without first
      inserting information about another entity – Cannot enter a customer without a sales
      order
    Delete Anomaly – A record cannot be deleted without deleting a record about a related
      entity. Cannot delete a sales order without deleting all of the customer’s information.
    Update Anomaly – Cannot update information without changing information in many
      places. To update customer information, it must be updated for each sales order the
      customer has placed
Normalization is a three-stage process – After the first stage, the data is said to be in first normal
form, after the second, it is in second normal form, after the third, it is in third normal form
Before Normalization
1. Begin with a list of all of the fields that must appear in the database. Think of this as one big
   table.
2. Do not include computed fields
3. One place to begin getting this information is from a printed document used by the system.
4. Additional attributes besides those for the entities described on the document can be added to
   the database.
Before Normalization – Example
See Sales Order from below:
                                        Sales Order
                                     Fiction Company
                                       202 N. Main
                                    Mahattan, KS 66502
 CustomerNumber:         1001                          Sales Order Number:            405
 Customer Name:          ABC Company                   Sales Order Date:         2/1/2000
 Customer Address:       100 Points                    Clerk Number:                  210
                         Manhattan, KS 66502           Clerk Name:       Martin Lawrence
 Item Ordered    Description                               Quantity    Unit Price   Total
 800             widgit small                                    40         60.00    2,400.00
 801             tingimajigger                                   20         20.00      400.00
 805             thingibob                                       10        100.00    1,000.00
                 Order Total                                                        3,800.00
Fields in the original data table will be as follows:
        SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName,
        ItemNo, Description, Qty, UnitPrice
Think of this as the baseline – one large table
Normalization: First Normal Form
      Separate Repeating Groups into New Tables.
      Repeating Groups Fields that may be repeated several times for one document/entity
      Create a new table containing the repeating data
      The primary key of the new table (repeating group) is always a composite key; Usually
       document number and a field uniquely describing the repeating line, like an item number.
First Normal Form Example
The new table is as follows:
      SalesOrderNo, ItemNo, Description, Qty, UnitPrice
The repeating fields will be removed from the original data table, leaving the following.
       SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName
These two tables are a database in first normal form
What if we did not Normalize the Database to First Normal Form?
Repetition of Data – SO Header data repeated for every line in sales order.
Normalization: Second Normal Form
      Remove Partial Dependencies.
      Functional Dependency The value of one attribute in a table is determined entirely by
       the value of another.
      Partial Dependency A type of functional dependency where an attribute is functionally
       dependent on only part of the primary key (primary key must be a composite key).
      Create separate table with the functionally dependent data and the part of the key on
       which it depends. Tables created at this step will usually contain descriptions of
       resources.
Second Normal Form Example
The new table will contain the following fields:
      ItemNo, Description
All of these fields except the primary key will be removed from the original table. The primary
key will be left in the original table to allow linking of data:
        SalesOrderNo, ItemNo, Qty, UnitPrice
Never treat price as dependent on item. Price may be different for different sales orders
(discounts, special customers, etc.)
Along with the unchanged table below, these tables make up a database in second normal form:
       SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName
What if we did not Normalize the Database to Second Normal Form?
      Repetition of Data – Description would appear every time we had an order for the item
      Delete Anomalies – All information about inventory items is stored in the
       SalesOrderDetail table. Delete a sales order, delete the item.
      Insert Anomalies – To insert an inventory item, must insert sales order.
      Update Anomalies – To change the description, must change it on every SO.
Normalization: Third Normal Form
      Remove transitive dependencies.
      Transitive Dependency A type of functional dependency where an attribute is
       functionally dependent on an attribute other than the primary key. Thus its value is only
       indirectly determined by the primary key.
      Create a separate table containing the attribute and the fields that are functionally
       dependent on it. Tables created at this step will usually contain descriptions of either
       resources or agents. Keep a copy of the key attribute in the original file.
Third Normal Form Example
The new tables would be:
      CustomerNo, CustomerName, CustomerAdd
       ClerkNo, ClerkName
All of these fields except the primary key will be removed from the original table. The primary
key will be left in the original table to allow linking of data as follows:
        SalesOrderNo, Date, CustomerNo, ClerkNo
Together with the unchanged tables below, these tables make up the database in third normal
form.
      ItemNo, Description
      SalesOrderNo, ItemNo, Qty, UnitPrice
What if we did not Normalize the Database to Third Normal Form?
      Repetition of Data – Detail for Cust/Clerk would appear on every SO
      Delete Anomalies – Delete a sales order, delete the customer/clerk
      Insert Anomalies – To insert a customer/clerk, must insert sales order.
      Update Anomalies – To change the name/address, etc, must change it on every SO.
Completed Tables in Third Normal Form
Customers: CustomerNo, CustomerName, CustomerAdd
Clerks: ClerkNo, ClerkName
Inventory Items: ItemNo, Description
Sales Orders: SalesOrderNo, Date, CustomerNo, ClerkNo
SalesOrderDetail: SalesOrderNo, ItemNo, Qty, UnitPrice
Source: http://www.dbnormalization.com/examples
Q1. Choose a key and write the dependencies for the following Grades:
relation:
GRADES(Student_ID, Course#, Semester#, Grade)
Answer:
Key is :
Student_ID, Course#, Semester#,
Dependency is:
Student_ID, Course#, Semester#  ->  Grade
Q2. Choose a key and write the dependencies for the LINE_ITEMS relation:
LINE_ITEMS (PO_Number, ItemNum, PartNum, Description, Price, Qty)
Answer:
Key can be: PO_Number, ItemNum
Dependencies are:
PO_Number, ItemNum -> PartNum, Description, Price, Qty
PartNum -> Description, Price
Q3. What normal form is the above LINE_ITEMS relation in?
Answer:
First off, LINE_ITEMS could not be in BCNF because:
not all determinants are keys.
next: it could not be in 3NF because there is a transitive dependency:
PO_Number, ItemNum -> PartNum
and
PartNum -> Description
Therefore, it must be in 2NF, we can check this is true because:
the key of PO_Number, ItemNum determines all of the non-key attributes however, PO_Number by
itself and ItemNum by itself can not determine any other attributes.
Q4: What normal form is the following relation in?
STORE_ITEM( SKU, PromotionID, Vendor, Style, Price )
SKU, PromotionID -> Vendor, Style, Price
SKU -> Vendor, Style 
Answer:
 STORE_ITEM is in 1NF (non-key attribute (vendor) is dependent on only part of the key.
Q5: Normalize the above (Q4) relation into the next higher normal form.
Answer:
STORE_ITEM (SKU, PromotionID, Price)
VENDOR ITEM (SKU, Vendor, Style)
Q6: Choose a key and write the dependencies for the following SOFTWARE relation (assume all of the
vendor’s products have the same warranty).
SOFTWARE (SoftwareVendor, Product, Release, SystemReq, Price, Warranty)
SoftwareVendor, Product, Release -> SystemReq, Price, Warranty
Answer:
key is: SoftwareVendor, Product, Release
SoftwareVendor, Product, Release -> SystemReq, Price, Warranty
SoftwareVendor -> Warranty
.:. SOFTWARE is in 1NF
Question 7: Normalize the above Software relation into 4NF.
Answer:
SOFTWARE (SoftwareVendor, Product, Release, SystemReq, Price)
WARRANTY (SoftwareVendor, Warranty)
Question 8: What normal form is the following relation in?
only H,I can act as the key.
STUFF (H, I, J, K, L, M, N, O)
H, I -> J, K, L
J -> M
K -> N
L -> O
Answer:
2NF (Transitive dependencies exist)
Question 9: What normal form the following relation in?
STUFF2 (D, O, N, T, C, R, Y)
D, O -> N, T, C, R, Y
C, R -> D
D -> N
Answer:
1NF (Partial Key Dependency exist)
Invoice Relation
Is this relation in 1NF? 2NF? 3NF?
Convert the relation to 3NF.
Inv# date custID Name Part# Desc Price #Used                     Ext    Tax     Tax    Total
                                                                Price   rate
 14      12/63     42       Lee      A38   Nut    0.32     10   3.20    0.10    1.22   13.42
 14      12/63     42       Lee      A40   Saw    4.50      2   9.00    0.10    1.22   13.42
 15      1/64      44       Pat      A38   Nut    0.32     20   6.40    0.10    064    7.04
Table not in 1NF because
- it contains derived values
EXT PRICE(=Price X # used)
3.2 = 0.32 X 10
-  Tax (=sum of Ext price of same Inv# X Tax rate)
1.22 = (3.2 + 9.00) X 0.10
-   Total (=sum of Ext price + Tax)
13.42 = (3.20 + 9.00) + 1.22
To get 1NF,   identify PK and remove derived attributes
Inv# date          custID    Name Part# Desc             Price #Used Tax rate
  14     12/63      42        Lee      A38       Nut     0.32   10       0.10
  14     12/63      42        Lee      A40       Saw     4.50    2       0.10
  15      1/64      44         Pat     A38       Nut      32    20       0.10
To get 2NF
- Remove partial dependencies
- Partial FDs with key attributes.
- Inv#  -> Date, CustID, Name, Tax Rate
- Part# -> Desc, Price
Remove Partial FDs
|–K1-||———————–D1———————————||—K2—||——-D2———|
Inv# date custID Name Tax rate Part# Desc Price #Used
    14    12/63   42      Lee        0.10   A38   Nut   0.32   10
    14    12/63   42      Lee        0.10   A40   Saw   4.50   2
    15    1/64    44      Pat        0.10   A38   Nut   32     20
=
Inv# date custID Name            Tax
                                 rate
    14    12/63   42      Lee    0.10
    14    12/63   42      Lee    0.10
    15    1/64    44      Pat    0.10 
Inv# Part# #Used
    14     A38     10
    14    A40      2
    15     A38    20 
Part# Desc        Price
    A38    Nut    0.32
    A40    Saw    4.50
    A38    Nut     32
Remove transitive FD
Inv#(PK) -> CustID  -> Name
    Inv#      date     custID    Name      Tax rate
      14      12/63      42       Lee       0.10
      15      1/64       44       Pat       0.10
=
    Inv#      date     custID   Tax rate
      14      12/63      42      0.10
      15      1/64       44      0.10
+
    custID    Name
      42       Lee
     44        Pat
All relations in 3NF
    Inv#      Part#    #Used
      14      A38        10
      14      A40         2
      15      A38        20 
      Part#            Desc         Price
       A38             Nut          0.32
       A40             Saw          4.50 
    Inv#      date     custID   Tax rate
      14      12/63      42      0.10
      15      1/64       44      0.10 
    custID    Name
      42       Lee
      42       Pat
In this page you will see a basic database normalization example, transforming BCNF table into a 4NF
one(s).
The next table is in the BCNF form, convert it to the 4th normal form.
                   Employee                          Skill                        Language
                    Jones                          electrical                      French
                    Jones                          electrical                      German
                    Jones                         mechanical                       French
                    Jones                         mechanical                       German
                    Smith                         plumbing                         Spanish
The above table does not comply with the 4th normal form, because it has repetitions like this:
                         Jones                    X                      A
                         Jones                    Y                      B
So this data may be already in the table, which means that it’s repeated.
                         Jones                    B                      X
                         Jones                    A                      Y
To transform this into the 4th normal form (4NF) we must separate the original table into two
tables like this:
                                                                  mployeeE             killS
                                                                     Jones           electrical
                                                                     Jones         mechanical
                                                                     Smith          plumbing
                                              And
                                                                  mployeeE          anguageL
                                                                     Jones            French
                                                                     Jones           German
                                                                     Smith            Spanish
To normalize databases, there are certain rules to keep in mind. These pages will illustrate the
basics of normalization in a simplified way, followed by some examples.
Database normalization Rule 1: Eliminate Repeating Groups. Make a separate table for each set of
related attributes, and give each table a primary key.
Unnormalized Data Items for Puppies
    puppy number
    puppy name
    kennel code
    kennel name
    kennel location
    trick ID
    trick name
    trick where learned
    skill level
In the original list of data, each puppy description is followed by a list of tricks the puppy has
learned. Some might know 10 tricks, some might not know any. To answer the question “Can
Fifi roll over?” we need first to find Fifi’s puppy record, then scan the list of tricks associated
with the record.This is awkward, inefficient, and extremely untidy.
Moving the tricks into a separate tablehelps considerably. Seperating the repeating groupsof
tricks from the puppy information results in first normal form. The puppy number in the trick
table matches the primarykey in the puppy table, providing a foreign key for relating the two
tables with a join operation. Now we can answer our question with a direct retrieval look to see if
Fifi’s puppy number and the trick ID for “roll over” appear together in the trick table.
First Normal Form:
Puppy Table
puppy number                     — primary key
puppy name
kennel name
kennel location
Trick Table
puppy number
trick ID
trick name
trick where learned
skill level