NORMALIZATION OF DATABASE
TABLES
                                                   Lesson 5
   Rueda Street, Calbayog City, Samar, Philippines | +63 (055) 533 9857 | main@nwssu.edu.ph | www.nwssu.edu.ph
Database Tables and Normalization
The table is the basic building block of database design. Consequently, the table’s structure is of great
 interest.
It is possible to create poor table structures even in a good database design. How do you recognize a poor
 table structure, and how do you produce a good table? The answer to both questions involves
 normalization.
Normalization is a process for evaluating and correcting table structures to minimize data redundancies,
 thereby reducing the likelihood of data anomalies. The normalization process involves assigning attributes
 to tables based on the concept of determination discussed in Lesson 3, The Relational Database Model.
Normalization works through a series of stages called normal forms. The first three stages are described as
 first normal form (1NF), second normal form (2NF), and third normal form (3NF). From a structural point of
 view, 2NF is better than 1NF, and 3NF is better than 2NF. For most purposes in business database design,
 3NF is as high as you need to go in the normalization process. However, you will discover that properly
 designed 3NF structures also meet the requirements of fourth normal form (4NF)
         .
Although normalization is a very important ingredient in database design, you should not assume that the
 highest level of normalization is always the most desirable. Generally, the higher the normal form, the more
 relational join operations you need to produce a specified output. Also, more resources are required by the
 database system to respond to end-user queries.
A successful design must also consider end-user demand for fast performance. Therefore, you will
 occasionally need to denormalize some portions of a database design to meet performance requirements.
 Denormalization produces a lower normal form; that is, a 3NF will be converted to a 2NF through
 denormalization. However, the price you pay for increased performance through denormalization is greater
 data redundancy.
        .
The Need for Normalization
Normalization is typically used in conjunction with the entity relationship modeling that you learned in the
 previous lessons.
Database designers commonly use normalization in two situations. When designing a new database
 structure based on the business requirements of the end users, the database designer will construct a data
 model using a technique such as Crow’s Foot notation ERDs.
After the initial design is complete, the designer can use normalization to analyze the relationships among
 the attributes within each entity and determine if the structure can be improved through normalization.
 By analyzing relationships among the attributes or fields in the data structure, the database designer can
 use the normalization process to improve the existing data structure and create an appropriate database
 design. Whether you are designing a new database structure or modifying an existing one, the
 normalization process is the same.
         .
To get a better idea of the normalization process, consider the simplified database activities of a
 construction company that manages several building projects. Each project has its own project number,
 name, assigned employees, and so on. Each employee has an employee number, name, and job
 classification, such as engineer or computer technician.
         The company charges its clients by billing the hours spent on each contract. The hourly billing rate
  is dependent on the employee’s position. For example, one hour of computer technician time is billed at a
  different rate than one hour of engineer time. Periodically, a report is generated that contains the
  information displayed in Table 6.1.
The easiest short-term way to generate the required report might seem to be a table whose contents
 correspond to the reporting requirements. (See Figure 6.1.)
        .
Unfortunately, the structure of the dataset in Figure 6.1 does not conform to the requirements discussed in
 Lesson 3. Consider the following deficiencies:
   1.   The project number (PROJ_NUM) is apparently intended to be a primary key (PK) or at least a part of
        a PK, but it contains nulls. Given the preceding discussion, you know that PROJ_NUM + EMP_NUM
        will define each row.
   2.   The table entries invite data inconsistencies. For example, the JOB_CLASS value “Elect. Engineer”
        might be entered as “Elect.Eng.” in some cases, “El. Eng.” in others, and “EE” in still others.
   3.   The table displays data redundancies that yield the following anomalies:
        a.   Update anomalies. Modifying the JOB_CLASS for employee number 105 requires many potential
             alterations, one for each EMP_NUM = 105.
        b.   Insertion anomalies. Just to complete a row definition, a new employee must be assigned to a project. If
             the employee is not yet assigned, a phantom project must be created to complete the employee data entry.
        c.   Deletion anomalies. Suppose that only one employee is associated with a given project. If that employee
             leaves the company and the employee data is deleted, the project information will also be deleted. To
             prevent the loss of the project information, a fictitious employee must be created.
The Normalization Process
The objective of normalization is to ensure that each table conforms to the concept of well-formed
 relations—in other words, tables that have the following characteristics:
    • Each table represents a single subject. For example, a COURSE table will contain only data that directly
      pertain to courses. Similarly, a STUDENT table will contain only student data.
    • No data item will be unnecessarily stored in more than one table (in short, tables have minimum
      controlled redundancy). The reason for this requirement is to ensure that the data is updated in only
      one place.
    • All nonprime attributes in a table are dependent on the primary key—the entire primary key and
      nothing but the primary key. The reason for this requirement is to ensure that the data is uniquely
      identifiable by a primary key value.
    • Each table is void of insertion, update, or deletion anomalies, which ensures the integrity and
      consistency of the data.
         .
To accomplish the objective, the normalization process takes you through the steps that lead to
 successively higher normal forms. The most common normal forms and their basic characteristic are listed
 in Table 6.2:
The concept of keys is central to the discussion of normalization.
From the data modeler’s point of view, the objective of normalization is to ensure that all tables are at least
 in third normal form (3NF).
Functional Dependence. Before outlining the normalization process, it is a good idea to review the
 concepts of determination and functional dependence that were covered in detail in Chapter 3. Table 6.3
 summarizes the main concepts.
         .
Two types of functional dependencies that are of special interest in normalization are partial dependencies
 and transitive dependencies. A partial dependency exists when there is a functional dependence in which
 the determinant is only part of the primary key (remember the assumption that there is only one candidate
 key).
   • For example, if (A, B) → (C, D), B → C, and (A, B) is the primary key, then the functional dependence B
      → C is a partial dependency because only part of the primary key (B) is needed to determine the value
      of C. Partial dependencies tend to be straightforward and easy to identify.
A transitive dependency exists when there are functional dependencies such that X → Y, Y → Z, and X is
 the primary key. In that case, the dependency X → Z is a transitive dependency because X determines the
 value of Z via Y. Unlike partial dependencies, transitive dependencies are more difficult to identify among a
 set of data. Fortunately, there is an effective way to identify transitive dependencies: they occur only when
 a functional dependence exists among nonprime attributes. In the previous example, the actual transitive
 dependency is X → Z.
However, the dependency Y → Z signals that a transitive dependency exists. Hence, throughout the
 discussion of the normalization process, the existence of a functional dependence among nonprime
 attributes will be considered a sign of a transitive dependency. To address the problems related to
 transitive dependencies, changes to the table structure are made based on the functional dependence that
 signals the transitive dependency’s existence. Therefore, to simplify the description of normalization, from
 this point forward the signaling dependency will be called the transitive dependency.
Conversion To First Normal Form
Eliminate repeating group. A repeating group derives its name from the fact that a group of multiple
 entries of the same type can exist for any single key attribute occurrence.
In Figure 6.1, note that each single project number (PROJ_NUM) occurrence can reference a
 group of related data entries. For example, the Evergreen project (PROJ_NUM = 15) shows
 five entries at this point—and those entries are related because they each share the
 PROJ_NUM = 15 characteristic. Each time a new record is entered for the Evergreen project,
 the number of entries in the group grows by one.         .
.
     .
Conversion To Second Normal Form
Steps for conversion to the Second Normal Form.
    Step 1: Make New Tables to Eliminate Partial Dependencies
    Step 2: Reassign Corresponding Dependent Attributes
               PROJECT (PROJ_NUM, PROJ_NAME)
               EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
               ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
       .
Conversion To Third Normal Form
Steps for conversion to the Third Normal Form.
   Step 1: Make New Tables to Eliminate Transitive Dependencies
   Step 2: Reassign Corresponding Dependent Attributes
        EMP_NUM → EMP_NAME, JOB_CLASS
       .
.
     .
In other words, after the 3NF conversion has been completed, your database will contain four
 tables:.
               PROJECT (PROJ_NUM, PROJ_NAME)
               EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
               JOB (JOB_CLASS, CHG_HOUR)
               ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
       .
End of Lesson 5.