INTRODUCTION
A database is an organized collection of structured information, or data, typically stored
electronically in a computer system. A database is usually controlled by a database management
system (DBMS). Together, the data and the DBMS, along with the applications that are
associated with them, are referred to as a database system, often shortened to just database.
Data within the most common types of databases in operation today is typically modeled in rows
and columns in a series of tables to make processing and data querying efficient. The data can
then be easily accessed, managed, modified, updated, controlled, and organized. Most databases
use structured query language (SQL) for writing and querying data.
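As a minimal sketch of the "rows and columns in tables" model and of SQL being used both to write and to query data, the snippet below uses Python's built-in sqlite3 module. The table name donor and its columns are assumptions chosen for illustration; they are not a prescribed schema.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table is a set of named columns; each inserted tuple becomes one row.
# The table and column names are illustrative assumptions.
conn.execute(
    "CREATE TABLE donor (d_id INTEGER PRIMARY KEY, d_name TEXT, city TEXT, d_bloodgrp TEXT)"
)
conn.executemany(
    "INSERT INTO donor VALUES (?, ?, ?, ?)",
    [
        (101, "Jay", "Kolhapur", "O+"),
        (102, "Lilly", "Pune", "AB+"),
        (103, "Om", "Sangali", "A-"),
    ],
)

# Querying: select two columns from the rows that match a condition.
rows = conn.execute(
    "SELECT d_name, d_bloodgrp FROM donor WHERE city = ?", ("Pune",)
).fetchall()
print(rows)  # [('Lilly', 'AB+')]
```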
NORMALIZATION:
Normalization is a fundamental concept in databases, statistics, and computer science, where it
means organizing and structuring data to meet specific standards. Its primary objective is to
enhance the accuracy, efficiency, and reliability of data-driven processes by minimizing
redundancy, undesirable dependencies, and anomalies within datasets.
In the context of databases, normalization was introduced by Edgar F. Codd in the 1970s as a set
of principles to guide the design of relational databases. The central idea is to organize data in
a way that eliminates redundancy, so that information is stored without unnecessary duplication.
This is achieved by systematically breaking down complex data structures into simpler, more
manageable forms.
The normalization process is typically described through a series of normal forms, each building
on the previous one. The First Normal Form (1NF) requires that each table cell contain only an
atomic value, which eliminates repeating groups. The Second Normal Form (2NF) removes partial
dependencies by ensuring that every non-prime attribute is fully functionally dependent on the
primary key. The Third Normal Form (3NF) refines the structure further by eliminating transitive
dependencies.
Normalization in databases is crucial for maintaining data integrity. Anomalies, such as
insertion, update, and deletion anomalies, can compromise the consistency and reliability of the
data. Insertion anomalies occur when certain attributes cannot be added to the database without
the presence of other unrelated attributes. Update anomalies arise when modifications to data
result in inconsistencies, and deletion anomalies occur when removing data leads to
unintentional loss of related information. By adhering to normalization principles, these
anomalies are mitigated, ensuring the overall consistency of the database.
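The update and deletion anomalies described above can be reproduced in a few lines. This is only a sketch: the donation_flat table, its columns, and its rows are hypothetical, and it deliberately stores donor details on every donation row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical un-normalized table: donor name and city are repeated on every donation row.
conn.execute(
    "CREATE TABLE donation_flat ("
    "donation_id INTEGER PRIMARY KEY, d_id INTEGER, d_name TEXT, city TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO donation_flat VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, "Jay", "Kolhapur", 1),
        (2, 101, "Jay", "Kolhapur", 2),
        (3, 102, "Lilly", "Pune", 1),
    ],
)

# Update anomaly: the city is changed on one row only, so donor 101 now has two cities.
conn.execute("UPDATE donation_flat SET city = 'Pune' WHERE donation_id = 1")
cities_for_101 = conn.execute(
    "SELECT DISTINCT city FROM donation_flat WHERE d_id = 101 ORDER BY city"
).fetchall()
print(cities_for_101)  # [('Kolhapur',), ('Pune',)]

# Deletion anomaly: removing Lilly's only donation also removes every record of Lilly.
conn.execute("DELETE FROM donation_flat WHERE donation_id = 3")
lilly_rows = conn.execute(
    "SELECT COUNT(*) FROM donation_flat WHERE d_id = 102"
).fetchone()[0]
print(lilly_rows)  # 0
```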
In computer science, normalization extends beyond relational databases to other kinds of data
stores, including NoSQL databases, where its principles can be applied to keep stored data
consistent. The specifics vary with the type of database, but the core principles remain the same.
In conclusion, normalization is a foundational concept with broad applications in databases,
statistics, and computer science. Whether it involves organizing relational databases to eliminate
redundancy and anomalies or scaling data for statistical analyses, normalization is a fundamental
principle that enhances the efficiency, accuracy, and reliability of data-driven processes. A clear
understanding of normalization and its application is essential for professionals working with
data in diverse domains.
DATABASE NORMALIZATION:
Database normalization is the process of organizing data into tables in such a way that the results
of using the database are always unambiguous and as intended. Such normalization is intrinsic to
relational database theory. It may have the effect of duplicating data within the database and
often results in the creation of additional tables.
The concept of database normalization is generally traced back to E.F. Codd, an IBM researcher
who, in 1970, published a paper describing the relational database model. What Codd described
as "a normal form for database relations" was an essential element of the relational technique.
Such data normalization found a ready audience in the 1970s and 1980s -- a time when disk
drives were quite expensive and a highly efficient means for data storage was very necessary.
Since that time, other techniques, including denormalization, have also found favor.
NEED FOR NORMALIZATION:
1. Minimizing Data Redundancy:
Normalization reduces the duplication of data by organizing it efficiently. When data is
duplicated across multiple records, it can lead to inconsistencies and make the database
more challenging to maintain.
2. Avoiding Update Anomalies:
Update anomalies occur when changes to data in one part of the database are not reflected in
other related parts. Normalizing the database minimizes or eliminates these anomalies, so that
updates are applied consistently.
3. Enhancing Data Integrity:
Data integrity refers to the accuracy and consistency of data. Normalization improves data
integrity by eliminating redundancies and ensuring that each piece of information is stored in
one place, which reduces the risk of conflicting data.
4. Simplifying Database Maintenance:
With a normalized database, maintenance tasks such as adding, deleting, or modifying
records become more straightforward. Changes can be made in one place without affecting other
parts of the database.
5. Improving Query Performance:
Normalized databases can be more efficient for updates and for queries that touch a single
table, because smaller, well-organized tables allow faster retrieval of specific information.
Queries that must join many tables can be slower, which is one reason denormalization is
sometimes used.
6. Supporting Relationships:
Normalization enables the creation of relationships between tables, which is fundamental
for maintaining referential integrity. This supports complex queries and ensures that data is
logically connected.
7. Adapting to Evolving Requirements:
Normalized databases are more adaptable to changes in requirements. When the structure
of the data needs to be modified or expanded, a normalized database allows for easier
modifications without affecting the entire system.
8. Facilitating Indexing:
The narrow, well-keyed tables of a normalized database are straightforward to index, and
indexes are crucial for speeding up data retrieval operations.
9. Reducing Storage Space:
Since normalization eliminates redundant data, it contributes to more efficient use of
storage space. This can be important in situations where storage is a limiting factor.
10. Simplifying Database Design:
Normalization provides guidelines for systematic database design. It helps database
designers create well-structured and maintainable databases by breaking down data into
manageable components.
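Several of the points above (a single place to update, relationships between tables, referential integrity) can be sketched with two related tables. The donor and donation tables and their contents are assumptions made for this example; note also that SQLite enforces foreign keys only after the PRAGMA shown in the code has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical normalized design: donor facts live in one table, donations in another.
conn.execute("CREATE TABLE donor (d_id INTEGER PRIMARY KEY, d_name TEXT, city TEXT)")
conn.execute(
    "CREATE TABLE donation ("
    "donation_id INTEGER PRIMARY KEY, "
    "d_id INTEGER NOT NULL REFERENCES donor(d_id), "
    "units INTEGER)"
)
conn.execute("INSERT INTO donor VALUES (101, 'Jay', 'Kolhapur')")
conn.executemany("INSERT INTO donation VALUES (?, ?, ?)", [(1, 101, 1), (2, 101, 2)])

# The city is stored once, so one UPDATE is seen by every joined donation row.
conn.execute("UPDATE donor SET city = 'Pune' WHERE d_id = 101")
joined = conn.execute(
    "SELECT n.donation_id, d.city FROM donation n "
    "JOIN donor d ON d.d_id = n.d_id ORDER BY n.donation_id"
).fetchall()
print(joined)  # [(1, 'Pune'), (2, 'Pune')]

# Referential integrity: a donation that points at a non-existent donor is rejected.
try:
    conn.execute("INSERT INTO donation VALUES (3, 999, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```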
TYPES OF NORMALIZATION:
1) First Normal Form:
First normal form (1NF) is a property of a relation in a relational database. A relation is in first
normal form if and only if no attribute domain has relations as elements.[1] More informally,
no table column can have tables as values. Database normalization is the process of
representing a database in terms of relations in standard normal forms, where first normal form
is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which
means that, using only the "traditional relational database features" (excluding extensions even
if they were later standardized), most relational databases will be in first normal form by
necessity. Database systems which do not require first normal form are often called NoSQL systems.
Newer SQL standards such as SQL:1999 have started to allow so-called non-atomic types, which
include composite types.
To achieve 1NF, each column in a table must have a distinct name, and the order of
the columns should not affect data retrieval. This ensures that the columns are
identifiable and can be referenced independently, which keeps the database structure
simple and consistent. One of the primary goals of 1NF is to eliminate the use of
complex data types, such as arrays or nested tables, within individual cells.
Each cell should contain atomic data, meaning that it should represent a single, indivisible
value. This restriction avoids complex parsing and manipulation of data, making the
table easier to query and maintain.
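The atomic-value rule can be illustrated with a small sketch in plain Python. The phones column and the helper to_1nf are hypothetical and exist only for this example: a cell that holds a list is expanded into one row per value.

```python
# Hypothetical un-normalized rows: the 'phones' cell holds a list, not a single value.
unnormalized = [
    {"d_id": 101, "d_name": "Jay", "phones": ["111", "222"]},
    {"d_id": 102, "d_name": "Lilly", "phones": ["333"]},
]


def to_1nf(rows, list_column, atomic_column):
    """Return one row per list element so that every cell holds a single value."""
    flat = []
    for row in rows:
        # Copy every column except the multi-valued one.
        base = {k: v for k, v in row.items() if k != list_column}
        for value in row[list_column]:
            flat.append({**base, atomic_column: value})
    return flat


donor_phone = to_1nf(unnormalized, "phones", "phone")
for row in donor_phone:
    print(row)
# {'d_id': 101, 'd_name': 'Jay', 'phone': '111'}
# {'d_id': 101, 'd_name': 'Jay', 'phone': '222'}
# {'d_id': 102, 'd_name': 'Lilly', 'phone': '333'}
```

In the flattened result, d_id alone no longer identifies a row; the key becomes the pair (d_id, phone).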
                              Table 1.1: Donor Details
  D_id    D_name    Gender    Age    D_regdate    City        D_bloodgrp
  101     Jay       Male      30     11-2-2023    Kolhapur    O+
  102     Lilly     Female    35     13-2-2023    Pune        AB+
  103     Om        Male      25     14-2-2023    Sangali     A-
Table 1.1 does not have atomic values in the ‘details’ column. Hence, it is called an
un-normalized table. Inserting, updating, and deleting data would be a problem in such a table,
so it has to be normalized. For Table 1.1 to be in first normal form, every cell in each row must
hold an atomic value. Hence, let us reconstruct the data in the table. A ‘Sr.No’ column is
included in the table to uniquely identify each row.
 Sr.No       D_id           D_name    Gender       Age         D_regdate    City        D_bloodgrp
2) Second Normal Form:
Second Normal Form (2NF) is based on the concept of full functional dependency. It applies to
relations with composite keys, that is, relations whose primary key is composed of two or more
attributes. A relation in 1NF with a single-attribute primary key is automatically in at least 2NF.
A relation that is not in 2NF may suffer from update anomalies. To be in second normal form, a
relation must be in first normal form and must not contain any partial dependency, i.e., no
non-prime attribute (an attribute that is not part of any candidate key) may depend on a proper
subset of any candidate key of the table. In other words, a relation is in Second Normal Form (2NF)
if it is in First Normal Form and every non-primary-key attribute is fully functionally dependent
on the primary key.
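A partial dependency can be spotted, and then removed, with a short sketch. The attendance relation, its composite key (d_id, camp_id), and the helper fd_holds are all hypothetical. The helper checks a functional dependency only against the sample rows given: a sample can refute a dependency but cannot prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical relation whose primary key is the pair (d_id, camp_id).
attendance = [
    {"d_id": 101, "camp_id": 1, "d_name": "Jay", "units": 1},
    {"d_id": 101, "camp_id": 2, "d_name": "Jay", "units": 2},
    {"d_id": 102, "camp_id": 1, "d_name": "Lilly", "units": 3},
]

# d_name is determined by d_id alone, a proper subset of the key: a partial dependency.
name_depends_on_part = fd_holds(attendance, ["d_id"], ["d_name"])
# units is not determined by d_id alone; it needs the whole key.
units_depends_on_part = fd_holds(attendance, ["d_id"], ["units"])
print(name_depends_on_part, units_depends_on_part)  # True False

# 2NF decomposition: move d_name to a relation keyed by d_id only.
donor = {row["d_id"]: row["d_name"] for row in attendance}
donation = [{k: row[k] for k in ("d_id", "camp_id", "units")} for row in attendance]
print(donor)  # {101: 'Jay', 102: 'Lilly'}
```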
3) Third Normal Form:
An entity is in the third normal form if it is in the second normal form and none of its
attributes is transitively dependent on the primary key. Transitive dependence means that
descriptor key attributes depend not only on the whole primary key, but also on other descriptor
key attributes that, in turn, depend on the primary key. In SQL terms, the third normal form means
that no column within a table is dependent on a descriptor column that, in turn, depends on the
primary key.
To convert to third normal form, remove attributes that depend on other descriptor key attributes
and place them in a table of their own.
Third normal form (3NF) is a crucial concept in database normalization, a process
designed to organize relational tables efficiently. It builds upon the foundation laid
by the first and second normal forms (1NF and 2NF) and addresses a specific type of data
redundancy: transitive dependency. A table is in 3NF if it is in 2NF and no transitive
dependencies exist; that is, no non-prime attribute is dependent on another non-prime
attribute.
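The removal of a transitive dependency can be sketched as follows. The donor_flat table, the state column, and the donor "Asha" are assumptions introduced for this example. Here d_id determines city and city determines state, so state depends on the key only through city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical 2NF table with a transitive dependency: d_id -> city -> state.
conn.execute(
    "CREATE TABLE donor_flat (d_id INTEGER PRIMARY KEY, d_name TEXT, city TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO donor_flat VALUES (?, ?, ?, ?)",
    [
        (101, "Jay", "Kolhapur", "Maharashtra"),
        (102, "Lilly", "Pune", "Maharashtra"),
        (104, "Asha", "Pune", "Maharashtra"),
    ],
)

# 3NF decomposition: the city -> state fact moves to its own table, stored once per city.
conn.execute("CREATE TABLE city (city TEXT PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO city SELECT DISTINCT city, state FROM donor_flat")
conn.execute(
    "CREATE TABLE donor (d_id INTEGER PRIMARY KEY, d_name TEXT, city TEXT REFERENCES city(city))"
)
conn.execute("INSERT INTO donor SELECT d_id, d_name, city FROM donor_flat")

# Joining the two tables gives back exactly the original rows, so no information is lost.
rebuilt = conn.execute(
    "SELECT d.d_id, d.d_name, d.city, c.state FROM donor d "
    "JOIN city c ON c.city = d.city ORDER BY d.d_id"
).fetchall()
original = conn.execute("SELECT * FROM donor_flat ORDER BY d_id").fetchall()
print(rebuilt == original)  # True
```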
In conclusion, Third Normal Form (3NF) is a critical step in the normalization
process, aiming to eliminate transitive dependencies and enhance the efficiency and
integrity of a relational database. It involves careful analysis, decomposition, and
restructuring of tables to ensure that data is stored in a logically organized and
non-redundant manner. While achieving 3NF brings significant benefits, it is essential to
strike a balance between normalization and practical considerations to maintain optimal
database performance.