DATABASE 1 Part1
What is Data?
Data is a collection of distinct small units of information. It can take a variety of forms,
such as text, numbers, media, or bytes, and it can be stored on paper or in electronic memory.
The word 'data' originates from the Latin word 'datum', which means 'a single piece of
information'. 'Data' is the plural of 'datum'.
In computing, data is information that has been translated into a form that is efficient to move
and process. Data is interchangeable.
What is Database?
A database is an organized collection of data, so that it can be easily accessed and managed.
You can organize data into tables, rows, columns, and index it to make it easier to find relevant
information.
Database designers build a database so that a single set of software programs provides access to
the data for all users.
The main purpose of a database is to handle a large amount of information by storing,
retrieving, and managing data.
Many dynamic websites on the World Wide Web today are backed by databases. For example, a
website that checks the availability of rooms in a hotel is a dynamic website that uses a
database.
There are many databases available like MySQL, Sybase, Oracle, MongoDB, Informix,
PostgreSQL, SQL Server, etc.
Modern databases are managed by the database management system (DBMS).
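The hotel example can be sketched in a few lines. This is only an illustration, assuming Python's
built-in sqlite3 module as the DBMS; the table name rooms, its columns, the index name, and the
sample rows are all hypothetical.

```python
import sqlite3

# In-memory database; table, column, and index names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rooms (room_no INTEGER PRIMARY KEY, room_type TEXT, is_booked INTEGER)"
)
# An index on room_type helps the DBMS find rows that match a given type.
conn.execute("CREATE INDEX idx_rooms_type ON rooms (room_type)")
conn.executemany(
    "INSERT INTO rooms VALUES (?, ?, ?)",
    [(101, "single", 0), (102, "double", 1), (103, "double", 0)],
)
conn.commit()


def available_rooms(room_type):
    """Return the numbers of unbooked rooms of the given type."""
    rows = conn.execute(
        "SELECT room_no FROM rooms "
        "WHERE room_type = ? AND is_booked = 0 ORDER BY room_no",
        (room_type,),
    ).fetchall()
    return [row[0] for row in rows]


free_doubles = available_rooms("double")
```

A website would call a function like available_rooms each time a visitor searches, which is what
makes the page dynamic.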
The following terms are important concepts in database management systems (DBMS) and data
manipulation. Here's a breakdown of each:
1. Atomicity:
Imagine a transaction as a series of steps that modify data in a database. Atomicity guarantees
that either all the steps in a transaction are completed successfully, or none of them are. It's like
a single unit of work. If any step fails, the entire transaction is rolled back, ensuring the database
remains in a consistent state.
2. Consistency:
This refers to maintaining data integrity within the database. A transaction must take the
database from one valid state (adhering to predefined rules) to another valid state. This involves
enforcing constraints like primary and foreign keys, data types, and other rules that ensure data
accuracy and reliability.
3. Integrity:
Data integrity refers to the accuracy and completeness of data in a database. It encompasses
various aspects like:
• Validity: Data adheres to the defined data types and formats.
• Accuracy: Data reflects the real world it represents.
• Consistency: Data adheres to the defined rules and constraints within the database.
4. Durability:
This ensures that once a transaction is committed (marked as successful), the changes are
permanently stored and won't be lost even in case of system crashes, power failures, or other
disruptions. The database guarantees that the committed data persists.
5. Concurrency:
Concurrency deals with how a DBMS handles multiple transactions happening simultaneously. It
ensures that these concurrent transactions don't interfere with each other and lead to data
inconsistencies. Mechanisms like locking and optimistic concurrency control are used to achieve
this.
6. Query Processing:
This refers to the process of interpreting and executing user queries on a database. The DBMS
analyzes the query, retrieves the relevant data from storage, and presents the results to the user.
Query processing involves various steps like parsing, optimization, and execution to efficiently
retrieve the desired data.
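To make the query processing step more concrete, the sketch below asks a DBMS how it plans to
execute two queries. It assumes SQLite (through Python's sqlite3 module) and its EXPLAIN QUERY
PLAN statement; the customers table and the index name are hypothetical, and the exact wording of
the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")


def query_plan(sql):
    """Return the plan descriptions SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each row.
    return [row[-1] for row in rows]


# Filtering on an indexed column lets the optimizer search the index.
plan_indexed = query_plan("SELECT name FROM customers WHERE city = 'Pune'")
# Filtering on a column without an index falls back to scanning the table.
plan_scan = query_plan("SELECT name FROM customers WHERE name = 'Asha'")
```

Comparing the two plans shows the optimization step at work: the same kind of query is executed
differently depending on which indexes are available.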
DBMS (Database Management System)
A database management system is software used to store, retrieve, and manage the data in a
database. Oracle and MySQL, for example, are popular DBMS tools.
o DBMS provides an interface to perform various operations such as creation, deletion, and
modification of data.
o DBMS allows users to create their own databases as per their requirements.
o DBMS accepts requests from applications and provides the requested data through the
operating system.
o DBMS contains a group of programs that act according to the user's instructions.
o It provides security to the database.
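The creation, modification, and deletion operations listed above might look like this in practice.
This is a minimal sketch that assumes SQLite through Python's sqlite3 module; the students table
and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creation: define a table and add two records.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students (id, name) VALUES (1, 'Asha')")
conn.execute("INSERT INTO students (id, name) VALUES (2, 'Ravi')")

# Modification: change an existing record.
conn.execute("UPDATE students SET name = 'Ravi Kumar' WHERE id = 2")

# Deletion: remove a record.
conn.execute("DELETE FROM students WHERE id = 1")
conn.commit()

# Retrieval: read back what is left.
remaining = conn.execute("SELECT id, name FROM students").fetchall()
```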
Advantages of DBMS
Controls redundancy
It stores all the data in one central database, so it can control data redundancy.
Data sharing
Authorized users can share the data among multiple users.
Backup
It provides a backup and recovery subsystem. This subsystem automatically backs up data to
protect against system failure and restores the data if required.
Multiple user interfaces
It provides different types of user interfaces, such as graphical user interfaces (GUI) and
application interfaces.
Disadvantages of DBMS
Size
It occupies large disk space and large memory to run efficiently.
Cost
A DBMS requires a high-speed processor and large memory to run, so it is costly.
Complexity
DBMS creates additional complexity and requirements.
RDBMS (Relational Database Management System)
RDBMS stands for 'Relational Database Management System.' It stores data in tables that
contain rows and columns.
RDBMS is based on the relational model, which was introduced by E. F. Codd.
A relational database contains the following components:
o Table
o Record/ Tuple
o Field/Column name /Attribute
o Instance
o Schema
o Keys
These terms are all fundamental concepts in relational databases and data organization:
1. Table:
A table is a core structure in a relational database. It resembles a spreadsheet with rows and
columns and holds data about a specific subject. Each table represents a particular entity, like
customers, products, or orders, and stores related information about that entity.
2. Record / Tuple:
A record, also called a tuple, represents a single row in a table. It contains a set of values for each
column of the table. Think of it as a single entry for a specific instance within the broader
category represented by the table.
3. Field / Column name / Attribute:
These terms refer to the same concept: a vertical section in a table that holds specific data about
a particular aspect of the entity represented by the table. For example, a "Customers" table might
have columns for "Customer ID," "Name," "Email," and "Phone Number." Each column has a
distinct name that defines the type of data it holds.
4. Instance:
An instance is the set of records a table holds at a particular moment. It is the actual data, as
opposed to the structure that describes it, and it changes whenever records are inserted, updated,
or deleted. For instance, a customer record with the ID "1001," name "John Doe," email
"[email address removed]," and phone number "123-456-7890" would be part of the current
instance of a "Customers" table.
5. Schema:
The schema defines the overall structure of a table. It specifies the names and data types (e.g.,
text, number, date) of each column, as well as any constraints or rules governing the data within
those columns. Think of it as the blueprint that defines how data is organized within a table.
6. Keys:
Keys are columns or sets of columns within a table that identify records or link tables. They
play a crucial role in data integrity and retrieval. There are different types of keys:
• Primary Key: A table can have only one primary key, which is a column (or a
combination of columns) that uniquely identifies each record in the table. No two records
can have the same value for the primary key.
• Candidate Key: Any column or set of columns that can uniquely identify records in a
table is considered a candidate key. A table can have multiple candidate keys, but only
one is designated as the primary key.
• Alternate Key: A candidate key that is not chosen as the primary key is called an
alternate key.
• Foreign Key: A foreign key is a column (or set of columns) in one table that references
the primary key of another table. It establishes a link between related tables, which supports
data integrity and efficient retrieval of related information across tables.
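The schema and key concepts above can be tried out with a small sketch. It assumes SQLite through
Python's sqlite3 module; the table layouts, the email values, and the try_insert helper are
hypothetical and only serve the illustration. Note that SQLite enforces foreign keys only after
PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# The CREATE TABLE statements are the schema: column names, types, and constraints.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- primary key
        email       TEXT NOT NULL UNIQUE,  -- candidate key not chosen as primary: alternate key
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com', 'Customer A')")
conn.execute("INSERT INTO orders VALUES (1, 1)")


def try_insert(sql):
    """Run an INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Same primary key value as an existing record.
duplicate_pk = try_insert("INSERT INTO customers VALUES (1, 'b@example.com', 'Customer B')")
# Same alternate key value (email) as an existing record.
duplicate_email = try_insert("INSERT INTO customers VALUES (2, 'a@example.com', 'Customer B')")
# Foreign key pointing at a customer that does not exist.
orphan_order = try_insert("INSERT INTO orders VALUES (2, 9999)")
# A record that respects every key.
valid_customer = try_insert("INSERT INTO customers VALUES (2, 'b@example.com', 'Customer B')")
```

The first three inserts are rejected by the DBMS itself, which is how keys protect data integrity
without any checks in application code.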
Data redundancy means that the same data is stored in more than one place. Sometimes, it is done
on purpose for recovery or backup of data, faster access to data, or easier updating of data.
However, redundant data costs extra money, demands higher storage capacity, and requires extra
effort to keep all the copies up to date.
Sometimes, unintentional duplication of data prevents the database from working properly or makes
it harder for the end user to access data. Redundant data occupies space in the database
unnecessarily by storing identical copies, and the resulting space constraints are one of the
major problems.
Let us understand redundancy in DBMS properly with the help of an example.
In the above example, there is a "Student" table that contains data such as "Student_id", "Name",
"Course", "Session", "Fee", and "Department". As you can see, some data is repeated in the table,
which causes redundancy.
Redundant data can cause insertion, deletion, and update anomalies. We will understand these
anomalies with the help of the following student table:
1. Insertion Anomaly:
Insertion anomaly arises when you are trying to insert some data into the database, but you are not
able to insert it.
Example: If you want to add the details of the student in the above table, then you must know the
details of the department; otherwise, you will not be able to add the details because student details
are dependent on department details.
2. Deletion Anomaly:
Deletion anomaly arises when you delete some data from the database, but some unrelated data is
also deleted; that is, there will be a loss of data due to deletion anomaly.
Example: If we want to delete the details of the student with student_id 2, we will also lose the
unrelated data, i.e., department_id 102, from the above table.
3. Update Anomaly:
An update anomaly arises when you update some data in the database, but the data is partially
updated, which causes data inconsistency.
Example: If we want to update the details of dept_head from Jaspreet Kaur to Ankit Goyal for
Dept_id 104, then we have to update it everywhere else; otherwise, the data will get partially
updated, which causes data inconsistency.
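The update anomaly can be reproduced with a short sketch. It assumes SQLite through Python's
sqlite3 module. The rows are hypothetical stand-ins for the student table, and the table names
student_flat, student, and department are invented for the illustration; only Dept_id 104 and the
two department head names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Redundant design: the department head is repeated on every student row.
conn.execute(
    "CREATE TABLE student_flat "
    "(student_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, dept_head TEXT)"
)
conn.executemany(
    "INSERT INTO student_flat VALUES (?, ?, ?, ?)",
    [(1, "Student A", 104, "Jaspreet Kaur"), (2, "Student B", 104, "Jaspreet Kaur")],
)
# A partial update changes only one of the two rows for department 104.
conn.execute("UPDATE student_flat SET dept_head = 'Ankit Goyal' WHERE student_id = 1")
heads_flat = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT dept_head FROM student_flat WHERE dept_id = 104")
)

# Design without the redundancy: the head is stored once, in a department table.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_head TEXT)")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES department (dept_id))"
)
conn.execute("INSERT INTO department VALUES (104, 'Jaspreet Kaur')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)", [(1, "Student A", 104), (2, "Student B", 104)]
)
# One update is enough, because there is only one copy to change.
conn.execute("UPDATE department SET dept_head = 'Ankit Goyal' WHERE dept_id = 104")
heads_split = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT d.dept_head FROM student s "
        "JOIN department d ON d.dept_id = s.dept_id WHERE s.dept_id = 104"
    )
)
```

In the redundant table, department 104 ends up with two different heads. After the data is split
into two tables, every student row sees the same, updated head.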
Consistency in DBMS
Introduction
Consistency in database systems refers to the requirement that any given database transaction
change affected data only in allowed ways. Data written to a database must be valid according to
all defined rules, including constraints, cascades, triggers, or any combination of these, for the
database to be consistent.
Consistency also implies that any change made to a single object in one table must be mirrored
in all other tables in which that object appears. For example, in a driver's license database, if
a driver's home address changes, the change must be shown in all tables where the previous
address appeared. Data inconsistency occurs when one table has the old address and the others
have the updated address.
Consistency does not ensure transactional correctness in all ways that an application programmer
may expect (that is the responsibility of application-level code). Instead, consistency ensures that
programming errors do not violate established database constraints.
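A small sketch can show a constraint stopping a programming error. It assumes SQLite through
Python's sqlite3 module; the drivers table, the rule that an address may not be blank, and the
change_address helper are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE drivers (
        license_no TEXT PRIMARY KEY,
        address    TEXT NOT NULL CHECK (length(trim(address)) > 0)
    )
""")
conn.execute("INSERT INTO drivers VALUES ('D-1', '12 Hill Road')")
conn.commit()


def change_address(license_no, new_address):
    """Try to update an address; return True if the database accepted the change."""
    try:
        with conn:  # commits on success, rolls back if the statement fails
            conn.execute(
                "UPDATE drivers SET address = ? WHERE license_no = ?",
                (new_address, license_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid_change = change_address("D-1", "7 Lake View")
blank_change = change_address("D-1", "   ")   # violates the CHECK rule
null_change = change_address("D-1", None)     # violates NOT NULL
stored = conn.execute("SELECT address FROM drivers WHERE license_no = 'D-1'").fetchone()[0]
```

The two invalid updates are refused, so the stored address stays at the last valid value. Whether
"7 Lake View" is the driver's real address is something the database cannot check; that remains the
application's job.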
Importance of Consistency in DBMS
Consistent data is what keeps a database running like clockwork. Established rules and values keep
inconsistent data out of primary databases and replicas, allowing their processes to run smoothly.
The benefits of consistency include:
o Accuracy
o Increased database space
o Faster and more efficient data retrieval
Database consistency governs every piece of data that enters the database. So, while the database
changes when new data is added, it does so consistently and in accordance with the validation
rules that were specified at the start. In today's environment, billion-dollar decisions are made
every day all around the world based on the apparent consistency of a database.
As real-time information becomes the norm for modern digital organizations, it is vital that
validation methods are put in place to keep datasets free of erroneous data, as such data adds
delay and makes real-time experiences less real.
Strong Consistency vs Weak Consistency
o Strong consistency implies that all data in a primary, its replicas, and all relevant nodes
conforms to the validation rules and is the same at all times. With strong database
consistency, no matter which client accesses the data, they will always see the most
recently updated data that adheres to the database's standards.
o Weak consistency is reminiscent of the untamed Wild West. There is no guarantee that
the data in your primary, replica, or node will always be the same. One client in India may
access the data and view information that matches the validation rules but is not
the most recently updated data, causing consistency difficulties. They may be acting on
information that is no longer relevant, even though it once was.
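The difference can be pictured with a toy model. The class below is not how any real database
replicates data; it is a deliberately simplified sketch in plain Python, and the names
ReplicatedStore, read_strong, read_weak, and sync are invented for the illustration.

```python
class ReplicatedStore:
    """Toy model: one primary and one replica that lags until sync() is called."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        # Writes always go to the primary first.
        self.primary[key] = value

    def sync(self):
        # Replication step: copy the primary's current state to the replica.
        self.replica = dict(self.primary)

    def read_strong(self, key):
        # Strong read: served by the primary, so it reflects the latest write.
        return self.primary.get(key)

    def read_weak(self, key):
        # Weak read: served by the replica, which may be stale.
        return self.replica.get(key)


store = ReplicatedStore()
store.write("price", 100)
store.sync()
store.write("price", 120)                 # not yet replicated
strong_value = store.read_strong("price")
stale_value = store.read_weak("price")    # still the older value
store.sync()
fresh_value = store.read_weak("price")    # replica has caught up
```

Until the replica catches up, the weak read returns a value that was valid once but is no longer
the latest, which is the situation described for the client above.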
What's the Distinction between ACID and BASE Database Consistency?
Relational databases that offer strong consistency generally provide 'ACID guarantees'. ACID
is an abbreviation that stands for the basic characteristics of a highly consistent database.
o Atomicity: If any component of the transaction fails, the complete transaction is reverted.
o Consistency: With each transaction, the database's structural integrity is maintained.
o Isolation: Each transaction executes independently of the others.
o Durability: All committed transaction outcomes are saved permanently.
ACID compliance is a complicated and much-discussed topic. In essence, it denotes the
straightforward promise that a READ will deliver the outcome of the most recent successful
WRITE. While this may appear to be a straightforward assurance, it is quite difficult to
implement in a globally distributed database structure with numerous clusters, each having many
nodes.
As a result, ACID-compliant databases can be expensive and difficult to scale across many
nodes.
ACID Properties in DBMS
A DBMS must keep data intact whenever changes are made to it. If the integrity of the data is
compromised, the data as a whole can become corrupted. Therefore, to maintain the integrity of
the data, a database management system relies on four properties, known as the ACID
properties. They apply to transactions, each of which is a group of tasks performed as a unit.
In this section, we will learn what the ACID properties stand for and what each property is used
for, with the help of some examples.
ACID Properties
ACID stands for:
1) Atomicity
Atomicity means that a transaction is treated as a single, indivisible unit. Either every
operation in it is executed completely, or none is executed at all. A transaction must not stop
midway or execute partially.
Example: Remo has $30 in account A and wishes to send $10 to Sheero's account B, which already
holds $100. After the transfer, account B should hold $110. Two operations take place: $10 is
debited from Remo's account A, and the same amount is credited to Sheero's account B. Now suppose
the debit executes successfully but the credit fails. Remo's account A then holds $20, while
Sheero's account B still holds $100, as before.
In the above diagram, it can be seen that although $10 has been debited from account A, the amount
in account B is still $100. So, it is not an atomic transaction.
The below image shows that both debit and credit operations are done successfully. Thus the
transaction is atomic.
A loss of atomicity is a serious problem in banking systems, which is why atomicity is a main
focus there.
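The transfer between Remo and Sheero can be sketched as code. This is a minimal illustration that
assumes SQLite through Python's sqlite3 module; the accounts table, the transfer function, and its
fail_before_credit switch (which simulates the failed credit) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()


def transfer(src, dst, amount, fail_before_credit=False):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_before_credit:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances():
    """Return the current balance of every account as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer("A", "B", 10, fail_before_credit=True)
after_failure = balances()    # the debit was rolled back
succeeded = transfer("A", "B", 10)
after_success = balances()    # debit and credit both applied
```

Because the debit and the credit share one transaction, the failed attempt leaves account A at $30
rather than $20. Only the complete transfer changes either balance.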
2) Consistency
Consistency means that the correctness of the data is always preserved. In a DBMS, the integrity
of the data must be maintained: every change must leave the database in a valid state, so that
the database is consistent both before and after a transaction. The data should always be
correct.
Example:
In the above figure, there are three accounts, A, B, and C, and A transfers money first to B and
then to C. Each transfer consists of two operations, a debit and a credit. First, A sends $50 to
B, and the balance of A read before this transaction is $300. After the transaction T completes
successfully, the balance of B becomes $150. Next, A sends $20 to C, and this time the balance of
A that is read is $250, which is correct because $50 has already been debited and sent to B. The
transfer from A to C also completes successfully. Every transaction succeeds and every value is
read correctly, so the data is consistent. If the balance of A were read as $300 for both
transfers, the data would be inconsistent, because the second read would not reflect the first
debit.
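The sequence of reads and transfers in this example can be replayed in code. The sketch assumes
SQLite through Python's sqlite3 module. The starting balance of account C is not given in the
example, so the value 0 used here is an assumption, as are the helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
# A starts with $300 and B with $100, as in the example; C's $0 is an assumption.
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 300), ("B", 100), ("C", 0)])
conn.commit()


def balance(name):
    """Read the committed balance of one account."""
    return conn.execute("SELECT balance FROM accounts WHERE name = ?", (name,)).fetchone()[0]


def total():
    """Sum of all balances; a transfer between accounts must never change it."""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


def transfer(src, dst, amount):
    """Debit src and credit dst inside a single transaction."""
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


total_before = total()
a_before_first = balance("A")     # read before the transfer to B
transfer("A", "B", 50)
b_after_first = balance("B")
a_before_second = balance("A")    # read before the transfer to C
transfer("A", "C", 20)
total_after = total()
```

The second read of A already reflects the first debit, and the total across all accounts is the
same before and after, which is one simple way to check that the data stayed consistent.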
3) Isolation
The term 'isolation' means separation. In a DBMS, isolation is the property that transactions
running concurrently do not affect one another. The result should be the same as if one
transaction began only after the other had completed. When two or more transactions occur
simultaneously, consistency must be maintained. Changes made by a transaction are not visible to
other transactions until those changes are committed.
Example: If two transactions run concurrently on two different accounts, neither should affect
the values used by the other. As you can see in the below diagram, account A is making
transactions T1 and T2 to accounts B and C, but both execute independently without affecting
each other. This is known as isolation.
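The rule that uncommitted changes stay invisible can be observed with two connections to the same
database. This is a sketch under stated assumptions: it uses SQLite through Python's sqlite3
module with its default settings, a temporary file named bank.db, and a hypothetical accounts
table. Other DBMSs offer several isolation levels and may behave differently.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")
    writer = sqlite3.connect(path)   # connection that changes the data
    reader = sqlite3.connect(path)   # separate connection that only reads

    writer.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    writer.execute("INSERT INTO accounts VALUES ('A', 300)")
    writer.commit()

    def read_balance(conn):
        """Read account A's balance as seen by the given connection."""
        return conn.execute("SELECT balance FROM accounts WHERE name = 'A'").fetchall()[0][0]

    # The writer changes the balance inside a transaction but has not committed yet.
    writer.execute("UPDATE accounts SET balance = 250 WHERE name = 'A'")
    seen_before_commit = read_balance(reader)

    # After the commit, the change becomes visible to the other connection.
    writer.commit()
    seen_after_commit = read_balance(reader)

    writer.close()
    reader.close()
```

The reading connection keeps seeing the old balance until the writing connection commits, and only
then sees the new one.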
What is Integrity?
Upholding integrity means that measures are taken to ensure that data is kept accurate and up to
date. The integrity of your data impacts how trustworthy and conscientious your organisation is.
One of the data protection principles underpinning the Data Protection Act 2018 is that data
should be ‘kept accurate and up to date’. Users must make sure that they
comply with their legal duties and fulfil this requirement. It can be useful to assign individuals
specific roles and responsibilities regarding data integrity. This way, employees cannot shirk the
responsibility and expect someone else to pick up the slack.