LECTURE WEEK SIX
Data Integrity and Constraints
- Importance of data integrity
- Types of constraints: Domain, Entity, Referential
- Conceptual implementation of constraints
Data Integrity
Data integrity refers to the consistency, accuracy, and reliability of the data stored in a
database or data warehouse. Data that is complete, correctly structured, and accurate in all of its
attributes is said to have integrity.
Data integrity has several aspects. Physical integrity involves storing and collecting data safely
and authentically, while logical integrity involves checking whether the information is relevant
and accurate in a specific context. Overall, all the rules required to maintain the quality of data
fall under data integrity. Without integrity and accuracy, collected data is of little use to a
company, so it is essential to protect data and ensure its accuracy in order to improve the
performance and stability of the systems that rely on it. Corrupted data can also damage a
business.
Data is easy to alter because it is not static. The information you receive comes from various
sources, and many things can change it after it has been created. It can also be transferred to
other devices, altered, and updated by whoever needs to, whenever required.
Ensuring data integrity is not a one-step process. It has to be checked and confirmed at every
stage, from the design of the data model to the final output.
Risks Involved in Data Integrity
The risks to data integrity include:
● Human error − This occurs when people enter information incorrectly,
duplicate or delete data, do not follow the prescribed rules, or make
mistakes while writing.
● Transfer errors − These occur when data is not transferred correctly
between locations, for example when a piece of data is present in the
destination table but not in the source table.
● Bugs and viruses − Spyware, malware, and viruses are software that can
invade a system and alter, delete, or steal its data.
● Compromised hardware − Sudden computer or server crashes and other faults
that hamper a system's functions are signs that the hardware is
compromised. Compromised hardware may render data incorrectly, limit or
even eliminate access to data, or make data hard to use.
How Can the Risks to Data Integrity Be Minimized?
These risks can be minimized using the following measures:
● Limiting access to the data and preventing unauthorized parties from changing it.
● Verifying data both when it is sent and when it is received.
● Backing up data every day.
● Keeping records (audit logs) of whenever information is altered.
● Conducting regular internal audits to maintain accuracy.
● Using error-detection software when transferring data.
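Checksums are one common form of error detection used when data is transferred: the sender computes a checksum before sending, and the receiver recomputes it and compares the two values. The sketch below is a minimal illustration, assuming SHA-256 from Python's standard `hashlib` module as the checksum; the function names and sample data are hypothetical.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def verify_transfer(received: bytes, expected_checksum: str) -> bool:
    """Recompute the checksum on the receiving side and compare it with the sender's."""
    return checksum(received) == expected_checksum


# Sender side (hypothetical data): compute the checksum before the data leaves the source.
original = b"CustomerID,Amount\n42,199.99\n"
sent_checksum = checksum(original)

# Receiver side: an intact copy passes, a copy altered in transit fails.
print(verify_transfer(original, sent_checksum))                           # True
print(verify_transfer(b"CustomerID,Amount\n42,199.98\n", sent_checksum))  # False
```

A plain checksum like this detects accidental corruption in transit; detecting deliberate tampering additionally requires protecting the checksum itself, for example with a keyed hash (HMAC).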
Importance of Data Integrity
Accuracy and Reliability of Information: Data integrity ensures that information remains
accurate and reliable throughout its lifecycle. Inaccurate or corrupted data can lead to flawed
analysis, poor decision-making, and operational inefficiencies. For instance, in healthcare,
incorrect patient records can result in inappropriate treatment plans, jeopardizing patient safety.
Maintaining accuracy allows organizations to make sound, evidence-based decisions.
Compliance with Legal and Regulatory Standards: Many industries are governed by strict
regulations, such as GDPR in Europe, HIPAA in healthcare, and SOX in finance, which require
organizations to safeguard data integrity. Compliance ensures that sensitive information is
protected from corruption, unauthorized access, or tampering. Violating these regulations can
result in severe financial penalties, legal consequences, and reputational harm.
Trust and Credibility: Maintaining data integrity fosters trust and confidence among
stakeholders, customers, and end-users. When data is consistent and reliable, it reinforces an
organization's credibility. For example, businesses that handle financial data or personal
information must ensure its integrity to gain and retain customer trust, which is crucial for long-
term success.
Operational Efficiency: Organizations rely on accurate data for smooth operations and efficient
workflows. When data integrity is compromised, it can lead to delays, rework, and system
failures. For instance, in supply chain management, inconsistent data about inventory levels can
disrupt production schedules and lead to revenue loss. Data integrity ensures that operations run
seamlessly and predictably.
Prevention of Fraud and Data Corruption: Data integrity safeguards systems against
fraudulent activities and data corruption, whether accidental or malicious. Techniques like
encryption, access controls, and checksums help detect and prevent unauthorized modifications.
In financial systems, maintaining the integrity of transaction records is critical to ensuring
transparency and preventing fraud, thereby protecting the organization's reputation and assets.
Constraints in DBMS
In DBMS (Database Management Systems), constraints are rules or restrictions imposed on
database tables to maintain the integrity, correctness, and consistency of the data. Constraints
can be used to enforce relationships between tables, verify that data is unique, and stop
erroneous data from being inserted. A database needs constraints to be reliable and of high
quality.
What Are Constraints in a DBMS?
In a DBMS, constraints are restrictions placed on data or on the operations performed on it. They
ensure that only a particular type of data can be entered into the database, or that only
particular kinds of operations can be performed on the data it holds.
Constraints therefore help guarantee data accuracy in a database management system (DBMS).
In particular, they can guarantee the following.
Data Accuracy − Constraints ensure that only valid data is entered into a
database. For example, a constraint may stop a user from entering a negative
value into a field that only accepts positive numbers.
Data Consistency − Constraints can uphold the consistency of data in a
database. For example, they can ensure that every foreign key value in one
table matches an existing primary key value in another table.
Data integrity − Constraints help ensure the accuracy and completeness of the
data in a database. For example, a constraint can stop a user from putting a
null value into a field that requires a value.
Types of Constraints in DBMS
● Domain Constraints
● Key Constraints
● Entity Integrity Constraints
● Referential Integrity Constraints
● Tuple Uniqueness Constraints
Domain Constraints
In a database table, domain constraints are rules that specify the
acceptable values for a particular attribute or field. These restrictions
guarantee data consistency and help prevent inaccurate or inconsistent data
from entering the database. The following are some examples of domain
restrictions in a DBMS −
● Data type constraints − These limitations define the kinds of data that can
be kept in a column. A column created as VARCHAR can take string
values, but a column specified as INTEGER can only accept integer
values.
● Length Constraints − These limitations define the largest amount of data
that may be put in a column. For instance, a column with the definition
VARCHAR(10) may only take strings that are up to 10 characters long.
● Range constraints − Range restrictions specify the allowed range of values
for a column. A column declared as DECIMAL(5,2), for example, may only hold
decimal values with at most 5 digits in total, 2 of them after the decimal
point (that is, from -999.99 to 999.99).
● Nullability constraints − Constraints on a column's capacity to accept
NULL values are known as nullability constraints. For instance, a
column that has the NOT NULL definition cannot take NULL values.
● Unique constraints − Constraints that require the presence of unique
values in a column or group of columns are known as unique
constraints. For instance, duplicate values are not allowed in a column
with the UNIQUE definition.
● Check constraints − These constraints specify a condition that must hold
for any data placed into the column. For instance, a column with the
definition CHECK (age > 0) can only accept ages that are greater than zero.
● Default constraints − Default constraints automatically assign a value to
a column when no value is provided. For example, a column with a DEFAULT
value of 0 will have 0 as its value if no other value is specified.
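The domain constraints above can be tried out with Python's standard `sqlite3` module and an in-memory SQLite database. This is a minimal sketch; the `students` table, its columns, and the `try_insert` helper are hypothetical. SQLite does not enforce the length in VARCHAR(10), so the sketch emulates the length constraint with a CHECK.

```python
import sqlite3

# In-memory database; the "students" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        email      TEXT NOT NULL UNIQUE,                        -- nullability + unique
        nickname   VARCHAR(10) CHECK (length(nickname) <= 10),  -- length, emulated with CHECK
        age        INTEGER CHECK (age > 0),                     -- check constraint
        credits    INTEGER NOT NULL DEFAULT 0                   -- default constraint
    )
""")


def try_insert(email, nickname, age):
    """Insert one student; return True if accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO students (email, nickname, age) VALUES (?, ?, ?)",
                (email, nickname, age),
            )
        return True
    except sqlite3.IntegrityError:  # raised for NOT NULL, UNIQUE and CHECK violations
        return False


print(try_insert("ada@example.com", "ada", 20))    # True
print(try_insert("ada@example.com", "ada2", 21))   # False: duplicate email (UNIQUE)
print(try_insert(None, "bob", 22))                 # False: missing email (NOT NULL)
print(try_insert("cy@example.com", "cy", -5))      # False: age must be > 0 (CHECK)
print(try_insert("di@example.com", "x" * 16, 30))  # False: nickname too long (CHECK)

# credits was never supplied, so the DEFAULT value 0 was stored.
print(conn.execute("SELECT credits FROM students").fetchone()[0])  # 0
```

Many other DBMSs, such as PostgreSQL, enforce VARCHAR(n) lengths directly, so the extra CHECK would not be needed there.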
Key Constraints
Key constraints are rules that a DBMS uses to ensure data accuracy and consistency in a
database. They define how each row in a table is uniquely identified and how the values in one or
more of a table's columns relate to values in other tables, making sure that the data remains
correct.
A DBMS supports several kinds of key constraints, including −
● Primary Key Constraint − A primary key uniquely identifies each record in a table. The
constraint guarantees that every row has a distinct, non-null value (or a distinct combination of
values, in the case of a composite key) that serves as its identifier.
● Foreign Key Constraint − A foreign key is a column or set of columns that references the
primary key of another table. The constraint ensures that every non-null value in the foreign key
column(s) of one table matches an existing value in the primary key column(s) of the other table.
● Unique Constraint − A unique constraint ensures that no two rows in a table have the same
value in a column, or the same combination of values in a set of columns.
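A similar sketch, again assuming Python's `sqlite3` module, shows a unique constraint and a composite primary key, where the pair of values rather than either value alone must be distinct. The `courses` and `enrollments` tables and the `accepted` helper are hypothetical.

```python
import sqlite3

# Hypothetical tables: each course code must be unique, and an enrollment is
# identified by the (student_id, course_code) pair, i.e. a composite primary key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (
        course_id   INTEGER PRIMARY KEY,
        course_code TEXT NOT NULL UNIQUE           -- unique constraint
    );
    -- NOT NULL is spelled out because SQLite, unlike the SQL standard,
    -- otherwise allows NULL in these PRIMARY KEY columns.
    CREATE TABLE enrollments (
        student_id  INTEGER NOT NULL,
        course_code TEXT    NOT NULL,
        PRIMARY KEY (student_id, course_code)      -- composite primary key
    );
""")


def accepted(sql, params):
    """Run one INSERT; return True if it succeeded, False if a key constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


add_course = "INSERT INTO courses (course_code) VALUES (?)"
enroll = "INSERT INTO enrollments VALUES (?, ?)"

print(accepted(add_course, ("DB101",)))   # True
print(accepted(add_course, ("DB101",)))   # False: course_code must be unique
print(accepted(enroll, (1, "DB101")))     # True
print(accepted(enroll, (2, "DB101")))     # True: same course, different student
print(accepted(enroll, (1, "DB101")))     # False: this (student_id, course_code) pair exists
print(accepted(enroll, (None, "DB101")))  # False: key columns cannot be NULL
```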
Entity Integrity Constraints
A database management system uses entity integrity constraints (EICs) to enforce rules that
guarantee a table's primary key is unique and not null. EICs maintain the consistency and
integrity of the data in a database by preventing duplicate or incomplete entries.
In a relational database, each row in a table is uniquely identified by one or more columns known
as the primary key. EICs guarantee that every row's primary key value is distinct and not null.
Take, for instance, an "Employees" table with the columns "EmployeeID" and "Name", where
EmployeeID is the primary key. An EIC on this table ensures that every row has a non-null
EmployeeID and that no two rows share the same EmployeeID.
If you try to insert an entry with a duplicate or null EmployeeID, the database management
system will reject the insertion and produce an error. This guarantees that the information in the
table is correct and consistent.
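The "Employees" example can be reproduced with Python's `sqlite3` module as a minimal sketch. The text-valued EmployeeID codes (such as "E001") and the `insert_employee` helper are assumptions made for illustration. A text key is used because, in SQLite specifically, inserting NULL into an INTEGER PRIMARY KEY column does not raise an error; SQLite assigns a new integer value instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is declared explicitly because SQLite, unlike the SQL standard,
# would otherwise accept NULL in a non-integer PRIMARY KEY column.
conn.execute("CREATE TABLE Employees (EmployeeID TEXT NOT NULL PRIMARY KEY, Name TEXT)")


def insert_employee(employee_id, name):
    """Return None if the row was inserted, otherwise the DBMS error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Employees VALUES (?, ?)", (employee_id, name))
        return None
    except sqlite3.IntegrityError as err:
        return str(err)


print(insert_employee("E001", "Amina"))  # None: accepted
print(insert_employee("E001", "Bola"))   # an error message: duplicate primary key
print(insert_employee(None, "Chen"))     # an error message: NULL primary key
```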
EICs are a crucial component of database architecture and help guarantee the accuracy and
dependability of the data contained in a database.
Referential Integrity Constraints
A database management system applies referential integrity constraints (RICs) to preserve the
consistency and integrity of the relationships between tables. By preventing references to
entries that do not exist, and by controlling the removal of records that are still referenced by
records in other tables, RICs keep the data in a database consistent.
In relational databases, relationships between tables are created with foreign keys. A foreign
key is a column or set of columns in one table that references the primary key of another table.
RICs make sure that these relationships are valid and that no referential errors occur.
Consider the "Orders" and "Customers" tables as an illustration. The foreign key column
"CustomerID" in the "Orders" table corresponds to the primary key column of the "Customers"
table. A RIC on this relationship requires that each value in the "CustomerID" column of the
"Orders" table exist in the "Customers" table's primary key column.
If an attempt were made to insert a record into the "Orders" table with a non-existent
"CustomerID" value, the database management system would reject the insertion and notify the
user of an error.
Similarly, if a record in the "Customers" table that still had linked entries in the "Orders"
table were removed, the database management system would, depending on how the constraint is
defined, either prohibit the deletion or cascade it to the linked entries in order to preserve
referential integrity.
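The "Orders" and "Customers" scenario, including both deletion behaviors, can be sketched with Python's `sqlite3` module. The `Invoices` table is a hypothetical addition used only to show the cascade rule next to the default rule, and the sample rows and the `run` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- No ON DELETE clause, so the default NO ACTION rule applies:
    -- deleting a customer who still has orders is refused.
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers (CustomerID)
    );

    -- Hypothetical second child table that uses the cascade rule instead.
    CREATE TABLE Invoices (
        InvoiceID  INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers (CustomerID) ON DELETE CASCADE
    );

    INSERT INTO Customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO Orders    VALUES (100, 1);
    INSERT INTO Invoices  VALUES (500, 2);
""")


def run(sql):
    """Execute one statement; return 'accepted' or 'rejected'."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:  # raised as "FOREIGN KEY constraint failed"
        return "rejected"


print(run("INSERT INTO Orders VALUES (101, 99)"))         # rejected: customer 99 does not exist
print(run("DELETE FROM Customers WHERE CustomerID = 1"))  # rejected: order 100 refers to customer 1
print(run("DELETE FROM Customers WHERE CustomerID = 2"))  # accepted: invoice 500 is deleted too
print(conn.execute("SELECT COUNT(*) FROM Invoices").fetchone()[0])  # 0
```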
In general, RICs are a crucial component of database architecture and help guarantee that the
information contained in a database remains correct and consistent over time.
Tuple Uniqueness Constraints
A database management system uses tuple uniqueness constraints (TUCs) to make sure that every
entry, or tuple, in a table is distinct. TUCs impose uniqueness on the whole row or tuple, in
contrast to Entity Integrity Constraints (EICs), which enforce uniqueness only on particular
columns or groups of columns.
TUCs, then, make sure that no two rows in a table have the same values in every column. This is
helpful when duplicate entries must be avoided even though the values in individual columns may
repeat.
Consider the "Sales" table, for instance, which has the columns "TransactionID," "Date,"
"CustomerID," and "Amount." Even if individual column values could be duplicated, a TUC on
this table would make sure that no two rows have the same values in all four columns.
The database management system would reject the insertion and generate an error if an attempt
were made to insert a row whose values in all four columns match those of an existing row. This
guarantees that every row in the table is unique.
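In SQL, a tuple uniqueness constraint can be expressed as a table-level UNIQUE constraint over all columns. The sketch below assumes Python's `sqlite3` module and reuses the column names of the "Sales" example; the sample rows and the `add_sale` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A UNIQUE constraint over every column makes each whole row (tuple) unique,
# while any single column may still repeat. The columns are also NOT NULL
# because SQL treats NULLs as distinct under UNIQUE, so rows containing NULLs
# could otherwise look identical without violating the constraint.
conn.execute("""
    CREATE TABLE Sales (
        TransactionID INTEGER NOT NULL,
        Date          TEXT    NOT NULL,
        CustomerID    INTEGER NOT NULL,
        Amount        NUMERIC NOT NULL,
        UNIQUE (TransactionID, Date, CustomerID, Amount)
    )
""")


def add_sale(row):
    """Return True if the row was stored, False if an identical row already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(add_sale((1, "2024-03-01", 7, 50.0)))  # True
print(add_sale((1, "2024-03-01", 7, 75.0)))  # True: differs in Amount only
print(add_sale((1, "2024-03-01", 7, 50.0)))  # False: every column matches an existing row
```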
Overall, TUCs can be a helpful tool for ensuring data correctness and consistency, especially
when it is vital to avoid duplicate entries.
Conceptual implementation of constraints: Use the link below to access detailed material, with
practical examples, on implementing constraints.
https://www.tutorialspoint.com/sql/sql-constraints.htm