L7 : Fundamental Database Concepts
By Francis Ganya - (+265) 882370345 / ganyaf@gmail.com
A database is a structured collection of data that is organized in a way to
efficiently store and retrieve information.
One of the most important attributes for high quality data is accuracy. To
ensure accuracy database management systems utilize a number of
mechanisms to ensure data integrity.
Database integrity refers to the accuracy, consistency, and reliability of the
data stored in a database. It ensures that the data remains valid and
trustworthy throughout its lifecycle, from creation to deletion. Maintaining
database integrity is essential to ensure that the data reflects the real-world
entities and relationships it is supposed to represent accurately.
There are several aspects to database integrity and in this lesson, we are
going to discuss a few of them.
Normalization
In a relational database, data in one table may refer to data in another table.
Data must be efficiently organized to reduce redundancy and maintain data
integrity. The technique used to achieve this is referred to as normalization. It
involves breaking down a large database table into multiple smaller tables with
well-defined relationships to ensure data consistency, eliminate data
anomalies, and simplify data retrieval and manipulation.
Anomalies may be insertion anomalies, update anomalies and deletion
anomalies.
Insertion anomalies occur when there are difficulties inserting new data into
the database due to missing required fields or incomplete information.
Update anomalies arise when modifying data in the database leads to
inconsistencies and unintentional changes in related data.
Deletion anomalies occur when removing data from the database affects other
data unintentionally, leading to partial loss of information.
The process of normalization is typically divided into different levels, known as
normal forms. The higher the normal form, the more normalized the database
is.
Some commonly available forms are:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Database Integrity and Keys
Entity Integrity
Entity integrity ensures that each row or record in a database table represents
a unique entity. It is usually enforced by defining a primary key for each table,
which uniquely identifies each record. The primary key constraint prevents
duplicate or null values in the primary key column, ensuring that every record
is uniquely identifiable.
In a relational database there is always a need to uniquely identify records. To
achieve this we use the keys. keys are attributes or columns that play a
fundamental role in organizing and identifying the data stored within the
database tables. They help establish relationships between different tables
and ensure the integrity and efficiency of the data.
There are several types of keys:
1. Primary Key :
A primary key is a unique identifier for each record in a table.It ensures that
each row in the table is uniquely identifiable.The primary key is used to enforce
entity integrity, preventing duplicate or null values in the primary key column(s).
In most databases, this is implemented as an auto-incrementing integers to
guarantee uniqueness and simplify data entry.
2. Foreign Key:
A foreign key is an attribute in one table that refers to the primary key in
another table. It establishes a relationship between the two tables, allowing
them to be linked based on common values. Foreign keys are used to enforce
referential integrity, ensuring that related data remains consistent across
tables.They help maintain data integrity and consistency when performing data
manipulations and updates involving multiple tables.
3. Unique Key:
A unique key is similar to a primary key, as it enforces uniqueness on a column
or a set of columns.Unlike the primary key, a table can have multiple unique
keys.Unique keys are used to identify records uniquely but are not typically
used as the main identifier for the table.
4. Candidate Key:
A unique key is similar to a primary key, as it enforces uniqueness on a column
or a set of columns. Unlike the primary key, a table can have multiple unique
keys. Unique keys are used to identify records uniquely but are not typically
used as the main identifier for the table.
5. Composite Key
A composite key is a primary key that consists of two or more columns.
It is used when a single column is not sufficient to uniquely identify a record,
but the combination of multiple columns can do so
6. Super Key:
A super key is a set of one or more attributes that can uniquely identify a
record.It may contain more attributes than the minimum required for
uniqueness, making it a superset of a candidate key.
7. Surrogate Key:
A surrogate key is an artificially generated key, often an auto-incrementing
integer, that serves as the primary key. It is used when there is no natural
primary key or when using long, complex, or frequently changing attributes as
the primary key is not practical
Referential Integrity
In a relational database, different tables are often related to each other through
common attributes or columns. For example, consider two tables: "Customers"
and "Orders." The "Orders" table might have a column named "CustomerID,"
which relates each order to a specific customer in the "Customers" table.
Referential integrity ensures that relationships between different tables are
maintained correctly. It is enforced through foreign key constraints, which
establish relationships between tables. A foreign key in one table refers to the
primary key in another table, ensuring that related data remains consistent and
accurate.
Referential integrity ensures that relationships between tables remain
consistent and valid throughout the database's lifecycle. It prevents actions
that could compromise data consistency, such as deleting a record from one
table without updating or removing related records in other tables.
Referential integrity is enforced through constraints defined in the database
schema. The most common constraint used for this purpose is the "FOREIGN
KEY" constraint, which ensures that values in the foreign key column of a table
must match values in the primary key column of the referenced table.
When specifying a relationship you may define actions on related records
when the refered to record is deleted, or updated. Common options include:
CASCADE: If a record in the referenced table is deleted or updated, the
corresponding records in the referencing table are automatically deleted or
updated.
RESTRICT: Prevents actions that would violate referential integrity, such
as deleting a record with related records in other tables.
SET NULL: Sets the foreign key values in the referencing table to NULL if
the referenced record is deleted.
SET DEFAULT: Sets the foreign key values to a default value if the
referenced record is deleted.
Domain Integrity
Domain integrity defines the valid range of values that can be stored in each
column of a table. It is maintained by defining appropriate data types and
constraints for each column. For example, a date column should only allow
valid date values, and a numeric column should only allow numbers within a
specified range.
In additional to data type and range constraints, Domain integrity may include
mandatory constraints, format constraints and unique constraints.
Table Relationships & MS Access
In Microsoft Access, relationships refer to the connections established
between different tables within a database. These relationships define how the
data in one table is related to the data in another table. By defining
relationships, you can ensure data integrity, prevent anomalies, and enable
efficient querying and reporting in the database.
Access provides a user-friendly interface for creating and managing
relationships using a visual designer
Types of Relationships
One-to-One (1:1):
One record in Table A relates to one record in Table B, and vice versa.
One-to-Many (1:N):
One record in Table A can relate to multiple records in Table B, but one record
in Table B can relate to only one record in Table A.
Many-to-One (N:1):
Multiple records in Table A can relate to one record in Table B, but each record
in Table B can relate to only one record in Table A.
Many-to-Many (N:N):
Multiple records in Table A can relate to multiple records in Table B, and vice
versa. Achieved through a junction table that contains the primary keys of both
tables as foreign keys.