KEMBAR78
Unit Two | PDF | Information Science | Computer Data
0% found this document useful (0 votes)
36 views11 pages

Unit Two

This document discusses database normalization. It provides an overview of normalization, including the goals and benefits. It also describes common anomalies that can occur without normalization, and outlines the different normal forms up to 5NF. Examples are provided to illustrate concepts like insertion, update, and deletion anomalies.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views11 pages

Unit Two

This document discusses database normalization. It provides an overview of normalization, including the goals and benefits. It also describes common anomalies that can occur without normalization, and outlines the different normal forms up to 5NF. Examples are provided to illustrate concepts like insertion, update, and deletion anomalies.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Unit Two: Normalization

This unit is developed to provide you the necessary information regarding the following
content coverage and topics

 Identification of suitable business data


 Rules of normalization
 Normalize business data and document results
 Compare normalization results with ER diagram
 Reconcile differences between data

2.1. Identify Suitable Business Data


When developing normalization for a database, it is important to identify suitable business data that accurately
represents the entities and relationships within the system. Here are some steps to help identify suitable business data
for normalization:
1. Understand the Business Processes: Gain a deep understanding of the business processes and operations that the
database will support. This involves analyzing the workflow, data flows, and interactions between different
entities within the organization. Identify the key entities, their attributes, and the relationships between them.
2. Conduct Stakeholder Interviews: Engage with stakeholders, such as business owners, managers, and end-users,
to gather their input on the data requirements. Conduct interviews to understand their needs, pain points, and
how they interact with the data. This will help identify the critical data elements and relationships that need to
be captured in the database.
3. Review Existing Documentation: Examine any existing documentation, such as business requirements
documents, process flowcharts, or data dictionaries. This can provide valuable insights into the data elements
that are already identified or documented. Reviewing existing documentation will help ensure that no important
data elements are overlooked during the normalization process.
4. Analyze Sample Data: Analyze sample data sets to identify the different types of data that need to be captured.
This can be done by reviewing existing spreadsheets, forms, reports, or any other sources of data. Identify the
unique values, common patterns, and potential data dependencies within the sample data. This analysis will help
determine the appropriate entities, attributes, and relationships that should be included in the normalized
database.
5. Identify Key Business Rules: Understand the business rules that govern the data and its relationships. These rules
define the logic, constraints, and dependencies that shape the data. Identify any unique constraints, mandatory
fields, or conditional relationships that need to be captured in the database design.
6. Consider Future Growth and Scalability: Anticipate future growth and scalability requirements of the database.
Identify data elements that are likely to change or expand over time. This includes considering potential new
features, additional data sources, or changes in business processes. By planning for future growth, you can ensure
that the normalized database can accommodate evolving business needs without significant redesign or
restructuring.
7. Collaborate with Database Experts: Seek guidance from database experts or experienced database administrators.
They can provide valuable insights into best practices for normalization and help identify suitable business data
based on their expertise and knowledge.
Remember, the goal of normalization is to eliminate redundancy and improve data integrity. By carefully identifying
suitable business data, you can create a well-structured and efficient database that accurately represents the business
processes and supports the organization's goals.
2.2. Understand Rules of Normalization
2.2.1 Normalization
 It is a technique for producing a set of relations with desirable properties, given the data requirements of an
enterprise.Database designed based on ER model may have some amount of inconsistency, ambiguity and
redundancy. To resolve these issues, some amount of refinement is required. This refinement process is called
as Normalization.
 The process of normalization is a formal method that identifies relations based on their primary key.Primarily it
is a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary
redundancy of data.
 It is the process of decomposing relations with anomalies to produce smaller, well-structured relations and helps
eliminate data anomalies/problems
2.2.2 Benefits of database normalization
 Reduced usage of storage space by intelligently categorizing data.
 It enables better, faster, stronger searches as it entails fewer entities to scan in comparison with the earlier
searches based on mixed entities.
 Improves data integrity: it splits all the data into individual entities yet building strong linkages with the related
data.
 More efficient database structure.
 Better understanding of data.
 More flexible database structure.
 Easier to maintain database structure.
 Few (if any) costly surprises down the road.
 Validates your common sense and intuition.
 Avoids redundant fields.
 Ensures that distinct tables exist when necessary
2.2.3 Anomalies in Normalization
Anomalies in normalization refer to inconsistencies or issues that can occur in a database when it is not properly
normalized. These anomalies can affect data integrity, accuracy, and the ability to perform efficient data operations.
There are three main types of anomalies that can occur:
1. Insertion Anomaly: An insertion anomaly happens when it is not possible to insert a new record into a table
without including additional, unrelated data. This occurs when a table has attributes that are functionally
dependent on only a part of the primary key.
2. Update Anomaly: An update anomaly occurs when modifying data in a table leads to inconsistencies or
redundant updates in other parts of the table. This happens when a table has redundant data or dependencies
between non-key attributes.
3. Deletion Anomaly: A deletion anomaly occurs when removing data from a table unintentionally removes other
related data that should have been preserved. This happenswhen a table has dependencies between attributes, and
removing data leads to the loss of other necessary data.
Normalization helps to eliminate these anomalies by organizing data into well-structured tables, ensuring data
dependencies are properly defined, and reducing redundancy. By achieving higher levels of normalization, such as
3NF or BCNF, the likelihood of anomalies occurring is minimized, and data integrity is improved
Example
We’ll be using a student database as an example in this article, which records student, class, and teacher information.

Insert Anomaly

For example, if we wanted to add a new student but did not know their course name this will be how

Update Anomaly
For example, let’s say the class Biology 1 was changed to “Intro to Biology”. We would have to query all of the
columns that could have this Class field and rename each one that was found.
Delete Anomaly
For example, let’s say Susan Johnson quits and her record needs to be deleted from the system. We could delete her
row:

Normalization stages
 1NF - First normal form
 2NF - Second normal form
 3NF - Third normal form
 3.5NF - Boyce Codd Normal Form (BCNF)
 4NF - Fourth normal form
 5NF - Fifth normal form

The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are
referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as First normal
form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF
along with the occasional 4NF. Fifth normal form is very rarely seen.

I. First normal form (1NF):Eliminating Repeating Groups


 Is the "basic" level of normalization, corresponds to the definition of any database
 It contains two-dimensional tables with rows and columns.
 Each column corresponds to a sub-object or an attribute of the object represented by the entire table.
 Each row represents a unique instance of that sub-object or attribute and must be different in some way from
any other row (no duplicate rows are possible).
 All entries in any column must be of the same kind. For example, in the column labeled "Customer," only
customer names or numbers are permitted.
Requirements:
 There should be a primary key, and
 No column should contain more than one value (no multi-valued attributes)
In order to perform first normalizationrule, we have to consider the following concepts
 Eliminate duplicative columns from the same table.
 Create separate tables for each group of related data and identify each row with a unique column or set of columns
(theprimary key).
 Eliminate composite attributes
Example 1: Develop 1NF of the following table
Student Details Course Details Result details
1001 Ram 11/09/1986 M4 Basic Math’s 7 11/11/2004 89 A
1002 Shyam 12/08/1987 M4 Basic Math’s 7 11/11/2004 78 B
1001 Ram 23/06/1987 H6 4 11/11/2004 87 A
1003 Sita 16/07/1985 C3 Basic Chemistry 11 11/11/2004 90 A
1004 Gita 24/09/1988 B3 8 11/11/2004 78 B
1002 Shyam 23/06/1988 P3 Basic Physics 13 11/11/2004 67 C
1005 Sunita 14/09/1987 P3 Basic Physics 13 11/11/2004 78 B
1003 Sita 23/10/1987 B4 5 11/11/2004 67 C
1005 Sunita 13/03/1990 H6 4 11/11/2004 56 D
1004 Gita 21/08/1987 M4 Basic Math’s 7 11/11/2004 78 B

The above table Student Details, Course Details and Result Details can be further divided.
 Student Details attribute is divided into Student#(Student Number), Student Name and date of birth.
 Course Details is divided into Course#, Course Name and duration.
 Results attribute is divided into Date ofexam, Marks and Grade.
II. Second normal form (2NF): Eliminating Redundant Data
 Second normal form (2NF) requires that all non-key columns are fully dependent on the entire primary key.
If the table has only a single-column primary key, this requirement is easily met.
 At this level of normalization, each column in a table that is not a determiner of the contents of another
column must itself be a function of the other columns in the table. For example, in a table with three columns
containing customer ID, product sold, and price of the product when sold, the price would be a function of
the customer ID (entitled to a discount) and the specific product.
In order to perform first normalizationrule, we have to consider the following concepts
 Meet all the requirements of the first normal form and remove subsets of data that apply to multiple rows of
a table and place them in separate tables.
 Create relationships between these new tables and their predecessors through the use of foreign keys.
Example 2
Let us re-visit 1NF table structure.
 Student# is key attribute for Student,
 Course# is key attribute for Course
 Student# and Course# together form the composite key attributes for result relationship.
 Other attributes are non - key attributes.
To make this table 2NF compliant, we have to remove all the partial dependencies.
 Student Name and Date ofBirth depends only on student#.
 CourseName, Pre-Requisite and DurationInDays depends only on Course#
 Date ofExam depends only on Course#.
To remove this partial dependency, we need to split Student_Course_Result table into four separate tables,
STUDENT, COURSE, RESULT and EXAM_DATE tables as shown in the following:
STUDENT TABLE
Student # Student Name DateofBirth
1001 Ram Some value
1002 Shyam Some value
1003 Sita Some value
1004 Geeta Some value
1005 Sunita Some value
COURSE TABLE
Course# CourseName Duration of days
C3 Bio Chemistry 3
B3 Botany 8
P3 Nuclear Physics 1
M4 Applied Mathematics 4
H6 American History 5
B4 Zoology 9

RESULT TABLE
Student# Course# Marks Grade

1001 M4 89 A
1002 M4 78 B
1001 H6 87 A
1003 C3 90 A
1004 B3 78 B
1002 P3 67 C
1005 P3 78 B
1003 B4 67 C
1005 H6 56 D
1004 M4 78 B

EXAM DATE Table


Course# DateOfExam

M4 Some value
H6 Some value
C3 Some value
B3 Some value
P3 Some value
B4 Some value

 In STUDENT table, the key attribute is Student# and all other non-key attributes, Student name and Date
ofBirth are fully functionally dependent on the key attribute.
 In COURSE table, the key attribute is Course# and all the non-key attributes, Course name, Duration in days
are fully functional dependent on the key attribute.
 In RESULT table, the key attributes are #StudentCourse# together and all other non-key attributes, Marks and
Grade are fully functional dependent on the key attributes.
 In EXAM DATE table, the key attribute is Course# and the non key attribute Date ofExam is fully
functionally dependent on the key attribute.

At first look it appears like all our anomalies are taken away! Now we are storing Student 1003 and M4 record
only once. We can insert prospective students and courses at our will. We will update only once if we need to
change any data in STUDENT, COURSE tables. We can get rid of any course or student details by deleting just
one row.

Let us analyze the RESULT Table


We already concluded that:
 All attributes are atomic in nature
 No partial dependency exists between the key attributes and non-key attributes
 RESULT table is in 2NF
Assume, at present, as per the university evaluation policy,
 Students who score more than or equal to 80 marks are awarded with “A” grade
 Students who score more than or equal to 70 marks up till 79 are awarded with “B” grade
 Students who score more than or equal to 60 marks up till 69 are awarded with “C” grade
 Students who score more than or equal to 50 marks up till 59 are awarded with “D” grade
The University management which is committed to improve the quality of education wants to change the existing
grading system to a new grading system. In the present RESULT table structure,
 We don’t have an option to introduce new grades like A+, B- and E
 We need to do multiple updates on the existing record to bring them to new grading definition
 We will not be able to take away “D” grade if we want to.
 2NF does not take care of all the anomalies and inconsistencies.

III. Third normal form (3NF): Eliminating Columns Not Dependent on Keys
Requires that there are no transitive dependencies, where one column depends on another column which depends on
the primary key.At the 2NF, modifications are still possible because a change to one row in a table may affect data
that refers to this information from another table. For example, using the customer table just cited, removing a row
describing a customer purchase (because of a return perhaps) will also remove the fact that the product has a certain
price. In the third normal form, these tables would be divided into two tables so that product pricing would be tracked
separately.
In order to perform first normalizationrule, we have to consider the following concepts
 Meet all the requirements of the second normal form.
 Remove columns that are not dependent upon the primary key.
 No transitive dependency exists between non-key attributes and key attributes.
Example 3:In the above RESULT table Student# and Course# are the key attributes. All other attributes, except
grade are non-partially, non-transitively dependent on key attributes. The grade attribute is dependent on “Marks “,
and in turn “Marks” is dependent on #Student#Course. To bring the table in 3NF, we need to take off this transitive
dependency.

Student# Course# Marks

1001 M4 89
1002 M4 78
1001 H6 87
1003 C3 90
1004 B3 78
1002 P3 67
1005 P3 78
1003 B4 67
1005 H6 56
1004 M4 78

UpperBound LowerBound Grade

100 95 A+
94 90 A
89 85 B+
84 80 B
79 75 B-
74 70 C
69 65 C-
After normalizing tables to 3NF, we got rid of all the anomalies and inconsistencies. Now we can add new grade
systems, update the existing one and delete the unwanted ones. Hence the Third Normal form is the most optimal
normal form and 99% of the databases which require efficiency in
 INSERT, UPDATE and DELETE Operations are designed in this normal form
IV. Boyce-Codd Normal Form (BCNF or 3.5NF)
 The Boyce-Codd Normal form, also referred to as the "third and half (3.5) normal form", adds one more
requirement:
 Meet all the requirements of the third normal form.
 Every determinant must be a candidate key.
 Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. A row is in Boyce Codd normal form if
and only if every determinant is a candidate key.
 Most entities in 3NF are already in BCNF.
V. Fourth Normal Form (4NF)
 Fourth normal form (4NF) has one additional requirement:
 Meet all the requirements of the third normal form.
 A relation is in 4NF if it has no multi-valued dependencies.
 An entity is in Fourth Normal Form (4NF) if and only if it is in 3NF and has no multiple sets of multi-valued
dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship
within an entity if the one-to-many attributes are independent of each other.
VI. Fifth Normal Form (5NF)
 5NF specifies that every join dependency for the entity must be a consequence of its candidate keys.

Summary of normal forms


 A row is in first normal form if and only if all underlying domains contain atomic values only. 1NF eliminates
repeating groups by putting each into a separate table and connecting them with a one-to-many relationship.
 A row is in second normal form if and only if it is in first normal form and every non-key attribute is fully
dependent on the key. 2NF eliminates functional dependencies on a partial key by putting the fields in a
separate table from those that are dependent on the whole key. A row is in third normal form if and only if it
is in second normal form and every non-key attribute is non-transitively dependent on the primary key. 3NF
eliminates functional dependencies on non-key fields by putting them in a separate table.non-key fields are
dependent on the key.
 Functional Dependency
Functional dependency is a relationship between two sets of attributes in a database table. It describes the dependency
of one attribute (or a set of attributes) on another attribute (or a set of attributes). In other words, if changing the
value of one attribute determines the value of another attribute(s), then a functional dependency exists.
Example: In a table called "Employees," if the attribute "EmployeeID" determines the attribute "EmployeeName,"
it can be represented as: EmployeeID -> EmployeeName. This means that for every value of EmployeeID, there is
a unique value of EmployeeName associated with it.

 Full Functional Dependency


Full functional dependency occurs when an attribute is functionally dependent on the entire primary key of a table,
and not just a part of it. In other words, all non-key attributes depend on the entire primary key and not on any subset
of it.
Example: In a table called "Students," if the primary key is composed of "StudentID" and "CourseID," and the
attribute "Grade" depends on both of these attributes, it can be represented as: StudentID, CourseID -> Grade. This
means that for every combination of StudentID and CourseID, there is a unique value of Grade associated with it.

 Partial Dependency
Partial dependency occurs when an attribute is functionally dependent on only a part of the primary key, rather than
the entire primary key. It means that a non-key attribute depends on only a subset of the primary key, and not on the
entire primary key.
Example: In a table called "Orders," if the primary key is "OrderID," and the attributes "CustomerName" and
"CustomerAddress" depend on only the attribute "OrderID," it indicates a partial dependency. This can be
represented as: Ordered ->CustomerName, CustomerAddress. To remove the partial dependency, the table can be
split into two separate tables: "Orders" and "Customers," where the customer details are stored separately.
 Transitive Dependency
Transitive dependency occurs when an attribute depends on another non-key attribute, rather than directly depending
on the primary key. It means that the dependency is indirectly established through another attribute.
Example: In a table called "Employees," if the primary key is "EmployeeID," and the attributes "Department" and
"DepartmentLocation" depend on each other, it indicates a transitive dependency. This can be represented as:
EmployeeID -> Department -> DepartmentLocation. To remove the transitive dependency, the table can be split into
two separate tables: "Employees" and "Departments," where the department details are stored separately.

Identifying and eliminating partial and transitive dependencies are crucial in achieving higher levels of normalization
(such as 3NF or BCNF) to ensure data integrity, reduce redundancy, and avoid anomalies in a database.

2.3. Normalize business data and document results


Undertaking normalization of business data involves the process of organizing and structuring the data in a database
to eliminate redundancy, improve data integrity, and reduce anomalies. The goal is to ensure that each piece of data
is stored in the most efficient and logical manner.
The steps involved in normalization typically include:

1. Analyzing the data: The first step is to analyze the existing data in the database. This involves identifying the
various entities, attributes, and relationships between them.
2. Applying normalization rules: Next, the data is normalized by applying normalization rules, specifically the rules
outlined in normal forms, such as First Normal Form (1NF), Second Normal Form (2NF), and so on. Each
normalization form has specific criteria that need to be met.
3. Breaking down tables: In order to meet the criteria for normalization forms, it may be necessary to break down
existing tables into multiple tables, with each table focusing on a specific entity or relationship.
4. Resolving dependencies: During the normalization process, dependencies between attributes are identified and
resolved. This includes identifying and eliminating partial dependencies and transitive dependencies.
5. Documenting the results: Once the normalization process is complete, it is important to document the results.
This documentation includes the structure of the normalized tables, the relationships between them, and any
changes made to the original data model.
By undertaking normalization of business data and documenting the results, organizations can ensure that their
databases are efficiently structured, leading to improved data quality, easier data maintenance, and more effective
data operations.

2.4. Compare normalization results with ERdiagram


When comparing normalization results with an entity relationship diagram (ERD), it is important to understand the
relationship between the two and how they complement each other in the process of designing a database.
An entity relationship diagram is a visual representation of the entities (tables), attributes, and relationships between
them in a database. It helps in understanding the structure and organization of the data, as well as the dependencies
between different entities. An ERD typically includes entity boxes, attribute labels, and lines representing the
relationships between entities.
Normalization, on the other hand, is a set of rules and guidelines used to eliminate redundancy, improve data
integrity, and reduce anomalies in a database. It involves breaking down tables, resolving dependencies, and
organizing data into well-structured tables.
When comparing normalization results with an ERD, the main focus is on ensuring that the normalization process
aligns with the relationships and dependencies depicted in the ERD.
Suppose we have an ER diagram representing a library database. The diagram includes entities such as "Books,"
"Authors," and "Publishers," with relationships like "Author writes Book" and "Publisher publishes Book."
Now, let's say we apply normalization to this database. During the normalization process, we break down the initial
table into smaller, more atomic tables to eliminate redundancy and improve data integrity.
For instance, we might have initially had a single table with columns like "Book ID," "Book Title," "Author Name,"
and "Publisher Name." After normalization, we would have separate tables for "Books," "Authors," and "Publishers,"
each with their own unique identifiers and relevant attributes.
When comparing the normalization results with the ER diagram, we would examine if the relationships depicted in
the ER diagram are accurately represented in the normalized tables. We would ensure that the foreign keys in the
normalized tables correctly establish the relationships between entities.
In our example, we would check if the "Author ID" in the "Books" table references the corresponding "Author ID"
in the "Authors" table, and if the "Publisher ID" in the "Books" table references the appropriate "Publisher ID" in
the "Publishers" table. This comparison ensures that the normalization process has preserved the intended
relationships between entities.

Additionally, we would verify if the attributes in the normalized tables align with the attributes specified in the ER
diagram. We would ensure that no redundant data exists and that the data is properly organized according to
normalization rules.
By comparing the normalization results with the ER diagram, we can validate the accuracy and consistency of the
database design, ensuring that the normalized tables effectively capture the structure and relationships depicted in
the ER diagram.
2.5. Reconcile differences between data
Reconciling differences between data in normalization refers to the process of resolving conflicts or inconsistencies
that may arise during the normalization process. These conflicts can occur when attempting to organize and structure
data into normalized tables, especially when there are dependencies or relationships between attributes and entities.
Here are some key points to understand about reconciling differences between data in normalization
1. Identify Inconsistencies: The first step is to identify any inconsistencies or conflicts within the data. This may
involve analyzing the relationships, dependencies, and functional dependencies between attributes and entities.
2. Analyze Dependencies: Evaluate the dependencies between attributes and entities to determine if they are
accurately represented. This includes identifying partial dependencies (where an attribute depends on only a part
of the primary key) or transitive dependencies (where an attribute depends on another non-key attribute).
3. Normalize Data: Apply normalization rules, such as First Normal Form (1NF), Second Normal Form (2NF), and
so on, to organize the data into well-structured tables. This may involve breaking down tables, creating separate
tables for related entities, and defining appropriate primary and foreign keys.
4. Resolve Conflicts: Address any conflicts or inconsistencies that arise during the normalization process. This may
involve making decisions on how to handle partial dependencies or transitive dependencies. One approach is to
split tables and create additional tables to ensure that data is properly organized and dependencies are accurately
represented.
5. Ensure Data Integrity: As you reconcile differences, it is crucial to maintain data integrity. This means that the
data in the normalized tables should accurately represent the relationships and dependencies between entities. It
also involves ensuring that there are no duplicate or redundant data.
6. Validate Results: Validate the results of the normalization process to ensure that the reconciled data aligns with
the intended structure and relationships. This can be done by comparing the normalized tables with the original
data, verifying that the relationships and dependencies are accurately represented.
Reconciling differences between data in normalization is an essential step in achieving a well-structured and efficient
database design. It helps eliminate redundancy, reduce anomalies, and improve data integrity.

You might also like