DATABASE 1 Part1

Database course

Uploaded by

simo

DATABASE 1

What is Data?
Data is a collection of small, distinct units of information. It can take many forms, such as
text, numbers, media, or bytes, and it can be stored on paper or in electronic memory.
The word 'data' comes from the word 'datum', which means 'a single piece of information'; 'data'
is the plural of 'datum'.
In computing, data is information that has been translated into a form that is efficient to move
and process. Data in such a form can be exchanged between systems.
What is Database?
A database is an organized collection of data that can be easily accessed and managed.
You can organize data into tables, rows, columns, and index it to make it easier to find relevant
information.
A database is set up so that a single set of software programs provides access to the data for
all users.
The main purpose of a database is to handle large amounts of information by storing,
retrieving, and managing data.
Many dynamic websites on the World Wide Web are backed by databases. For example, a site that
checks the availability of rooms in a hotel is a dynamic website that uses a database.
There are many database products available, such as MySQL, Sybase, Oracle, MongoDB, Informix,
PostgreSQL, and SQL Server.
Modern databases are managed by a database management system (DBMS).
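As a minimal sketch of the ideas above (tables, rows, columns, and an index), the following uses
Python's built-in sqlite3 module with the hotel-room example. The table name rooms, its columns,
and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")

# A table with three columns; every inserted tuple becomes one row.
conn.execute(
    "CREATE TABLE rooms (room_no INTEGER PRIMARY KEY, room_type TEXT, available INTEGER)"
)
conn.executemany(
    "INSERT INTO rooms (room_no, room_type, available) VALUES (?, ?, ?)",
    [(101, "single", 1), (102, "double", 0), (103, "double", 1)],
)

# An index on room_type lets the DBMS find matching rows without scanning all rows.
conn.execute("CREATE INDEX idx_rooms_type ON rooms (room_type)")

available_doubles = conn.execute(
    "SELECT room_no FROM rooms WHERE room_type = ? AND available = 1", ("double",)
).fetchall()
print(available_doubles)  # [(103,)]
```

A website that checks room availability would run a query of this kind each time a page is requested.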
The following terms are important concepts in database management systems (DBMS) and data
manipulation:
1. Atomicity:
Imagine a transaction as a series of steps that modify data in a database. Atomicity guarantees
that either all the steps in a transaction are completed successfully, or none of them are. It's like
a single unit of work. If any step fails, the entire transaction is rolled back, ensuring the database
remains in a consistent state.
2. Consistency:
This refers to maintaining data integrity within the database. A transaction must take the
database from one valid state (adhering to predefined rules) to another valid state. This involves
enforcing constraints like primary and foreign keys, data types, and other rules that ensure data
accuracy and reliability.
3. Integrity:
Data integrity refers to the accuracy and completeness of data in a database. It encompasses
various aspects like:
• Validity: Data adheres to the defined data types and formats.
• Accuracy: Data reflects the real world it represents.
• Consistency: Data adheres to the defined rules and constraints within the database.
4. Durability:
This ensures that once a transaction is committed (marked as successful), the changes are
permanently stored and won't be lost even in case of system crashes, power failures, or other
disruptions. The database guarantees that the committed data persists.
5. Concurrency:
Concurrency deals with how a DBMS handles multiple transactions happening simultaneously. It
ensures that these concurrent transactions don't interfere with each other and lead to data
inconsistencies. Mechanisms like locking and optimistic concurrency control are used to achieve
this.
6. Query Processing:
This refers to the process of interpreting and executing user queries on a database. The DBMS
analyzes the query, retrieves the relevant data from storage, and presents the results to the user.
Query processing involves various steps like parsing, optimization, and execution to efficiently
retrieve the desired data.
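The query processing steps described above can be observed in a small way with SQLite, whose
EXPLAIN QUERY PLAN statement reports how the engine intends to execute a query. This is only a
sketch: the customers table and the index name idx_customers_name are assumptions, and the exact
wording of the plan differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")

# Ask the engine for its execution plan instead of running the query itself.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE name = ?", ("John Doe",)
).fetchall()

# The last column of each plan row is a human-readable description of one step.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(plan_text)
```

The printed plan should mention the index, which shows that the optimizer chose an index lookup
rather than a full table scan.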
DBMS (Database Management System)
A database management system is software used to store, retrieve, and manage the data in a
database. Oracle and MySQL are popular examples of DBMS tools.
o A DBMS provides an interface for operations such as creating, deleting, and modifying data.
o A DBMS allows users to create their own databases according to their requirements.
o A DBMS accepts requests from applications and delivers the requested data, working through
the operating system.
o A DBMS consists of a group of programs that act on the user's instructions.
o It provides security for the database.
Advantages of DBMS
Controls redundancy
It stores all the data in a single database file, so it can control data redundancy.
Data sharing
Authorized users can share the data with multiple other users.
Backup
It provides a backup and recovery subsystem. This subsystem automatically backs up data to guard
against system failure and restores the data when required.
Multiple user interfaces
It provides different types of user interfaces, such as graphical user interfaces (GUIs) and
application interfaces.
Disadvantages of DBMS
Size
It occupies a large amount of disk space and needs a large amount of memory to run efficiently.
Cost
A DBMS requires a high-speed processor and a large amount of memory to run, so it is costly.
Complexity
A DBMS adds complexity and extra requirements.
RDBMS (Relational Database Management System)
RDBMS stands for 'Relational Database Management System.' It stores data in tables made up of
rows and columns.
An RDBMS is based on the relational model, which was introduced by E. F. Codd.
A relational database contains the following components:
o Table
o Record/ Tuple
o Field/Column name /Attribute
o Instance
o Schema
o Keys

These terms are all fundamental concepts in relational databases and data organization:

1. Table:

A table is a core structure in a relational database. It resembles a spreadsheet with rows and
columns and holds data about a specific subject. Each table represents a particular entity, like
customers, products, or orders, and stores related information about that entity.

2. Record / Tuple:

A record, also called a tuple, represents a single row in a table. It contains a set of values for each
column of the table. Think of it as a single entry for a specific instance within the broader
category represented by the table.

3. Field / Column Name / Attribute:

These terms refer to the same concept - a vertical section in a table that holds specific data about
a particular aspect of the entity represented by the table. For example, a "Customers" table might
have columns for "Customer ID," "Name," "Email," and "Phone Number." Each column has a
distinct name that defines the type of data it holds.

4. Instance:

An instance is the data actually stored in a table (or in the whole database) at a particular
moment, that is, the set of records it currently contains. For instance, a customer record with
the ID "1001," name "John Doe," email "[email address removed]," and phone number "123-456-7890"
would be one record in the current instance of a "Customers" table.

5. Schema:

The schema defines the overall structure of a table. It specifies the names and data types (e.g.,
text, number, date) of each column, as well as any constraints or rules governing the data within
those columns. Think of it as the blueprint that defines how data is organized within a table.

6. Keys:

Keys are columns, or sets of columns, that identify records in a table or link tables together.
They play a crucial role in data integrity and retrieval. There are different types of keys:

• Primary Key: A table can have only one primary key, which is a column (or a
combination of columns) that uniquely identifies each record in the table. No two records
can have the same value for the primary key.
• Candidate Key: Any column or set of columns that can uniquely identify records in a
table is considered a candidate key. A table can have multiple candidate keys, but only
one is designated as the primary key.
• Alternate Key: Any candidate key that is not chosen as the primary key is called an
alternate key.
• Foreign Key: A foreign key is a column (or set of columns) in one table that references
the primary key of another table. It establishes a link between related tables, which
supports data integrity and efficient retrieval of related information across tables.
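The key types above can be sketched with Python's sqlite3 module. The customers and orders
tables, their columns, and the example.com addresses are made up for illustration; note that
SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- primary key
        email       TEXT NOT NULL UNIQUE,  -- candidate key not chosen as primary: alternate key
        name        TEXT
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO customers VALUES (1001, 'john@example.com', 'John Doe')")
conn.execute("INSERT INTO orders VALUES (1, 1001)")


def rejected(sql):
    """Return True if the DBMS refuses the statement because a key rule is violated."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_pk = rejected("INSERT INTO customers VALUES (1001, 'other@example.com', 'Jane')")
duplicate_email = rejected("INSERT INTO customers VALUES (1002, 'john@example.com', 'Jane')")
dangling_fk = rejected("INSERT INTO orders VALUES (2, 9999)")
print(duplicate_pk, duplicate_email, dangling_fk)  # True True True
```

Each rejected statement corresponds to one rule: a repeated primary key, a repeated alternate
key, and a foreign key that points at a customer that does not exist.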

What is Data redundancy in the database management system?


In a DBMS, data redundancy occurs when the same data is stored in more than one place, whether in
different tables or repeated within a single table.

Sometimes this is done on purpose, for backup and recovery or for faster access to data. Even
then, redundant data costs extra money, demands more storage capacity, and requires extra effort
to keep every copy up to date.

Unintentional duplication of data can stop the database from working properly or make it harder
for the end user to access data. Identical copies also occupy space in the database
unnecessarily, and the resulting space constraints are one of the major problems.
Let us understand redundancy in DBMS properly with the help of an example.

Student_id   Name     Course    Session   Fee      Department
101          Devi     B. Tech   2022      90,000   CS
102          Sona     B. Tech   2022      90,000   CS
103          Varun    B. Tech   2022      90,000   CS
104          Satish   B. Tech   2022      90,000   CS
105          Amisha   B. Tech   2022      90,000   CS

In the above example, there is a "Student" table that contains data such as "Student_id", "Name",
"Course", "Session", "Fee", and "Department". As you can see, some data is repeated in the table,
which causes redundancy.
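The repetition in the "Student" table can be measured with a few lines of Python. This is only an
illustrative sketch: the rows are copied from the table, and the variable names are assumptions.

```python
# Rows of the "Student" table: (Student_id, Name, Course, Session, Fee, Department)
students = [
    (101, "Devi", "B. Tech", 2022, 90000, "CS"),
    (102, "Sona", "B. Tech", 2022, 90000, "CS"),
    (103, "Varun", "B. Tech", 2022, 90000, "CS"),
    (104, "Satish", "B. Tech", 2022, 90000, "CS"),
    (105, "Amisha", "B. Tech", 2022, 90000, "CS"),
]

# Keep only the Course, Session, Fee, and Department columns and drop duplicates.
distinct_course_details = {row[2:] for row in students}

print(len(students), len(distinct_course_details))  # 5 1
```

Five rows store the same course details, although a single copy would carry the same information.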

Problems that are caused due to redundancy in the database


Redundancy in a DBMS gives rise to anomalies: problems that occur when inserting, deleting, or
updating data in the database.

We will understand these anomalies with the help of the following student table:

student_id   student_name   student_age   dept_id   dept_name                dept_head
1            Shiva          19            104       Information Technology   Jaspreet Kaur
2            Khushi         18            102       Electronics              Avni Singh
3            Harsh          19            104       Information Technology   Jaspreet Kaur

1. Insertion Anomaly:
An insertion anomaly arises when you try to insert some data into the database but cannot,
because other required data is missing.
Example: If you want to add a new student to the table above, you must also know the details of
the student's department. Without them you cannot add the student, because student details are
stored together with, and depend on, department details.

2. Deletion Anomaly:
A deletion anomaly arises when deleting some data from the database also deletes unrelated data,
so information is lost unintentionally.

Example: If we delete the student with student_id 2, we also lose the only record of the
department with dept_id 102 (its name and its head).

3. Updating Anomaly:
An update anomaly arises when you update some data in the database but only some of the copies
are updated, which causes data inconsistency.

Example: If we want to change the dept_head for dept_id 104 from Jaspreet Kaur to Ankit Goyal, we
have to update every row that contains it; otherwise, the data is only partially updated, which
causes data inconsistency.
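One common way to avoid these anomalies, not spelled out in the text above, is to store each
department once and let students refer to it by dept_id. The sketch below assumes two
hypothetical tables, department and student, built from the rows of the student table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT, dept_head TEXT)"
)
conn.execute("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT,
        student_age  INTEGER,
        dept_id      INTEGER REFERENCES department (dept_id)
    )
""")
conn.executemany(
    "INSERT INTO department VALUES (?, ?, ?)",
    [(104, "Information Technology", "Jaspreet Kaur"), (102, "Electronics", "Avni Singh")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Shiva", 19, 104), (2, "Khushi", 18, 102), (3, "Harsh", 19, 104)],
)

# Update: the department head is stored once, so exactly one row changes.
cur = conn.execute("UPDATE department SET dept_head = 'Ankit Goyal' WHERE dept_id = 104")
rows_updated = cur.rowcount

# Deletion: removing student 2 no longer removes department 102.
conn.execute("DELETE FROM student WHERE student_id = 2")
dept_102 = conn.execute("SELECT dept_name FROM department WHERE dept_id = 102").fetchone()

# A join rebuilds the combined view whenever it is needed.
heads = conn.execute("""
    SELECT s.student_name, d.dept_head
    FROM student AS s JOIN department AS d ON d.dept_id = s.dept_id
    ORDER BY s.student_id
""").fetchall()
print(rows_updated, dept_102, heads)
```

With this layout a department can also be inserted before it has any students, which addresses
the insertion anomaly as well.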

Consistency in DBMS
Introduction
Consistency in database systems refers to the requirement that any given database transaction
changes affected data only in allowed ways. For the database to be consistent, data written to it
must be valid according to all defined rules, including constraints, cascades, triggers, or any
combination of these.
Consistency also implies that any change made to a single object in one table must be mirrored
in all other tables in which that object appears. For example, if a driver's home address changes
in a driver's license database, the change must be shown in all tables where the previous address
appeared. Data inconsistency occurs when one table has the old address and the others have the
updated address.
Consistency does not ensure transactional correctness in every way that an application programmer
may expect (that is the responsibility of application-level code). Instead, consistency ensures
that programming errors do not violate established database constraints.
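As a small illustration of a constraint stopping a programming error, the sketch below declares
a rule that a balance may never be negative and then attempts an update that would break it. The
accounts table and the rule itself are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the declared consistency rule
)
conn.execute("INSERT INTO accounts VALUES ('A', 30)")
conn.commit()

try:
    # A buggy withdrawal that would leave the account at -20.
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
    conn.commit()
    violated = False
except sqlite3.IntegrityError:
    conn.rollback()  # the database stays in its previous valid state
    violated = True

balance_after = conn.execute("SELECT balance FROM accounts WHERE name = 'A'").fetchone()[0]
print(violated, balance_after)  # True 30
```

The database rejects the write, so the stored data still satisfies the rule even though the
application code was wrong.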
Importance of Consistency in DBMS
Consistent data is what keeps a database running like clockwork. Established rules and values
keep inconsistent data out of primary databases and replicas, allowing their processes to run
smoothly.
Consistent data brings:
o Accuracy
o Increased available database space
o Faster and more efficient data retrieval
Database consistency governs every piece of data that enters. So, while the database changes when
new data is added, it does so consistently and in accordance with the validation rules that were
specified at the start. Today, billion-dollar decisions are made every day around the world on
the assumption that a database is consistent.
As real-time information becomes the norm for modern digital organizations, it is vital to put
validation methods in place that keep datasets free of erroneous data, because bad data adds
delay and makes real-time experiences less real.
Strong Consistency vs Weak Consistency
o Strong consistency implies that all data in the primary, the replicas, and all relevant nodes
conforms to the validation rules and is the same at all times. With strong database
consistency, no matter which client accesses the data, it will always see the most
recently updated data that adheres to the database's rules.
o Weak consistency is reminiscent of the untamed Wild West. There is no guarantee that
the data in your primary, replica or node will always be the same. One client in India may
access the data and see information that matches the validation rules but is not
the most recently updated data, causing consistency problems. That client may be acting on
information that was once correct but is no longer relevant.
What's the Distinction between ACID and BASE Database Consistency?
Relational databases that provide strong consistency provide 'ACID guarantees' in general. ACID
is an abbreviation that stands for the basic characteristics of a highly consistent database.
o Atomicity: If any part of a transaction fails, the entire transaction is rolled back.
o Consistency: Every transaction preserves the database's structural integrity.
o Isolation: Each transaction runs independently of the others.
o Durability: The results of every committed transaction are stored permanently.
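Of the four guarantees, durability is the easiest to observe directly. The sketch below writes to
a temporary SQLite file, commits one row, leaves a second row uncommitted, and then reopens the
file. Closing the connection stands in for a crash only loosely, and the file name demo.db and
the notes table are assumptions.

```python
import os
import sqlite3
import tempfile


def committed_rows_after_reopen():
    """Return the rows that survive closing and reopening the database file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE notes (body TEXT)")
        conn.execute("INSERT INTO notes VALUES ('saved')")
        conn.commit()  # from here on, 'saved' must survive

        conn.execute("INSERT INTO notes VALUES ('never committed')")
        conn.close()   # closing without commit discards the open transaction

        conn = sqlite3.connect(path)
        rows = conn.execute("SELECT body FROM notes").fetchall()
        conn.close()
        return rows


print(committed_rows_after_reopen())  # [('saved',)]
```

Only the committed row is found after reopening, which is the behaviour durability (together with
atomicity) promises.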
ACID compliance is a complicated and much-discussed topic. In practice, a strongly consistent
ACID database is expected to keep a simple promise: a READ returns the outcome of the most recent
successful WRITE. While this may appear to be a straightforward assurance, it is quite difficult
to implement in a globally distributed database with numerous clusters, each having many
nodes.
As a result, ACID-compliant databases can be expensive and difficult to scale across such
deployments.
ACID Properties in DBMS
A DBMS must keep data intact whenever changes are made to it, because if the integrity of the
data is compromised, the data as a whole becomes unreliable and corrupted.
To maintain the integrity of the data, database management systems define four properties, known
as the ACID properties. The ACID properties apply to transactions, each of which groups several
tasks into one unit of work.
In this section, we will learn what the ACID properties stand for and what each property is used
for, with the help of some examples.
ACID Properties
ACID stands for Atomicity, Consistency, Isolation, and Durability:

1) Atomicity
Atomicity means that a transaction is treated as an indivisible unit. Any operation performed on
the data must either be executed completely or not be executed at all; it must not stop partway
or execute partially.
Example: Remo has account A with $30 and wishes to send $10 to Sheero's account B, which already
holds $100. Once the $10 is transferred, account B should hold $110. Two operations take place:
the $10 is debited from Remo's account A, and the same amount is credited to Sheero's account B.
Suppose the debit executes successfully but the credit fails. Account A then holds $20, while
account B still holds $100, as before.

In this case, $10 has been debited from account A, yet the amount in account B is still $100.
So, it is not an atomic transaction.
When both the debit and the credit operations complete successfully, the transaction is atomic.

A transfer that loses atomicity is a serious problem for a bank, which is why atomicity is a main
focus in banking systems.
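The transfer between accounts A and B can be sketched with Python's sqlite3 module. The accounts
table, the transfer function, and the fail_before_credit flag used to simulate the failed credit
are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()


def transfer(amount, fail_before_credit=False):
    """Move `amount` from A to B as one transaction; undo everything on failure."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_before_credit:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()
        return True
    except RuntimeError:
        conn.rollback()  # the debit is undone, so no money disappears
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(10, fail_before_credit=True)
after_failure = balances()   # {'A': 30, 'B': 100}
succeeded = transfer(10)
after_success = balances()   # {'A': 20, 'B': 110}
print(failed, after_failure, succeeded, after_success)
```

After the simulated failure both balances are unchanged; only the fully completed transfer moves
the $10.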
2) Consistency
Consistency means that the data must always remain valid. In a DBMS, the integrity of the data
must be maintained: a transaction may change the database, but the database must be in a correct,
consistent state both before and after the transaction.
Example:

In this example, there are three accounts, A, B, and C. Account A holds $300 and makes two
transfers, one after the other: first to B, then to C. Each transfer consists of a debit and a
credit. Before the first transfer T, the amount in account A is read as $300. A sends $50 to B,
and after the transfer succeeds, the available amount in B becomes $150. A then sends $20 to
account C, and the amount in account A read at that time is $250 (which is correct, as the debit
of $50 for B has already completed). The transfer from account A to C also completes
successfully. Every transfer succeeds and every value is read correctly, so the data is
consistent. If the value of account A were read as $300 for both transfers, the data would be
inconsistent, because the first debit would not be reflected in the second read.
3) Isolation
The term 'isolation' means separation. In a DBMS, isolation is the property that operations
running concurrently do not affect one another. In effect, the result should be the same as if
one operation began only after the other had completed. If two operations are performed on two
different data items, neither may affect the value of the other. When two or more transactions
occur simultaneously, consistency must still be maintained: changes made by one transaction are
not visible to other transactions until those changes are committed.
Example: If two operations run concurrently on two different accounts, the value of neither
account should be affected by the other operation. The values should remain consistent. Suppose
account A makes transactions T1 and T2 to accounts B and C: both execute independently without
affecting each other. This is known as isolation.
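The rule that uncommitted changes stay invisible can be tried out with two SQLite connections to
the same file. This is a sketch under SQLite's default settings; the file name bank.db and the
accounts table are assumptions.

```python
import os
import sqlite3
import tempfile


def isolation_demo():
    """Return what a second connection sees before and after the writer commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)

        writer.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
        writer.execute("INSERT INTO accounts VALUES ('A', 300)")
        writer.commit()

        # The writer changes the balance but has not committed yet.
        writer.execute("UPDATE accounts SET balance = 250 WHERE name = 'A'")
        seen_before = reader.execute("SELECT balance FROM accounts").fetchall()[0][0]

        writer.commit()
        seen_after = reader.execute("SELECT balance FROM accounts").fetchall()[0][0]

        writer.close()
        reader.close()
        return seen_before, seen_after


print(isolation_demo())  # (300, 250)
```

The reader keeps seeing $300 until the writer commits, and sees $250 afterwards.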

DBMS Concurrency Control


Concurrency control is the management procedure required for controlling the concurrent
execution of operations on a database.
Before looking at concurrency control, we need to understand concurrent execution.
Concurrent Execution in DBMS
o In a multi-user system, multiple users can access and use the same database at one time,
which is known as concurrent execution. It means that different users run operations on the
same database simultaneously.
o Database transactions often require several users to work on the database at once, each
performing different operations; in that case, the operations execute concurrently.
o Simultaneous execution should be performed in an interleaved manner such that no operation
affects the other executing operations, thus maintaining the consistency of the database.
Achieving this raises several challenging problems that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These operations must be
managed during concurrent execution, because if they are interleaved without control, the data
may become inconsistent. The following problems occur with the concurrent execution of
operations:
Problem 1: Lost Update Problems (W - W Conflict)
This problem occurs when two different database transactions perform read/write operations on
the same database item in an interleaved manner (i.e., concurrently), so that one transaction's
write overwrites the other's, leaving the item with an incorrect value and the database
inconsistent.
For example:
Consider two transactions, TX and TY, performed on the same account A, where the balance of
account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
o At time t2, transaction TX deducts $50 from its copy of account A, giving $250 (deducted but
not yet written).
o At time t3, transaction TY reads the value of account A, which is still $300 because TX has
not written its update yet.
o At time t4, transaction TY adds $100 to its copy of account A, giving $400 (added but not yet
written).
o At time t6, transaction TX writes the value of account A, which becomes $250, as TY has not
written its update yet.
o Similarly, at time t7, transaction TY writes the value it computed at time t4, i.e., $400.
This means the value written by TX is lost, i.e., $250 is lost.
Hence the data becomes incorrect, and the database becomes inconsistent.
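The schedule above can be replayed step by step in plain Python. No database is involved; the
variable names are assumptions, and each assignment stands for one step of TX or TY.

```python
account_a = 300                  # value stored in the database

tx_local = account_a             # t1: TX reads 300
tx_local = tx_local - 50         # t2: TX computes 250 in its own workspace
ty_local = account_a             # t3: TY reads 300, because TX has not written yet
ty_local = ty_local + 100        # t4: TY computes 400 in its own workspace
account_a = tx_local             # t6: TX writes 250
account_a = ty_local             # t7: TY writes 400 and overwrites TX's update

serial_result = 300 - 50 + 100   # what running TX and TY one after the other would give

print(account_a, serial_result)  # 400 350
```

The final balance is $400 instead of the $350 that a one-after-the-other execution would produce,
so the $50 deduction made by TX has been lost.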
Dirty Read Problems (W-R Conflict)
The dirty read problem occurs when one transaction updates a database item and then fails, but
before the update is rolled back, another transaction reads the updated item. This creates a
write-read conflict between the two transactions.
For example:
Consider two transactions, TX and TY, performing read/write operations on account A, where the
available balance in account A is $300:
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, which becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A, which is read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the value changes
back to $300 (as initially).
o Transaction TY, however, has already read $350, a value that was never committed. This is a
dirty read, and the situation is therefore known as the Dirty Read Problem.
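The same schedule, replayed in plain Python as an illustration only (variable names are
assumptions, and the rollback is simulated by restoring the saved value):

```python
account_a = 300           # committed value in the database
before_tx = account_a     # remembered so that TX can be rolled back

tx_local = account_a      # t1: TX reads 300
tx_local = tx_local + 50  # t2: TX computes 350
account_a = tx_local      # t3: TX writes 350, but has not committed
ty_seen = account_a       # t4: TY reads the uncommitted value 350
account_a = before_tx     # t5: TX rolls back, and the value returns to 300

print(ty_seen, account_a)  # 350 300
```

TY continues to work with $350 although the database never committed that value.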
Unrepeatable Read Problem (R-W Conflict)
Also known as the Inconsistent Retrievals Problem, this occurs when a transaction reads two
different values for the same database item.
For example:
Consider two transactions, TX and TY, performing read/write operations on account A, which has
an available balance of $300:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the available
balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that will
be read as $400.
o It means that within the same transaction TX, two different values of account A are read,
i.e., $300 initially and $400 after the update made by transaction TY. This is an
unrepeatable read and is therefore known as the Unrepeatable Read Problem.
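Replaying this schedule in plain Python (an illustrative sketch; the variable names are
assumptions) shows the two reads of TX disagreeing:

```python
account_a = 300                # committed value in the database

tx_first_read = account_a      # t1: TX reads 300
ty_local = account_a           # t2: TY reads 300
ty_local = ty_local + 100      # t3: TY computes 400
account_a = ty_local           # t4: TY writes 400
tx_second_read = account_a     # t5: TX reads again and now gets 400

print(tx_first_read, tx_second_read)  # 300 400
```

Within a single transaction, TX has seen two different values for the same item.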
Thus, to maintain consistency in the database and avoid the problems that arise in concurrent
execution, management is needed, and that is where concurrency control comes into play.
Concurrency Control
Concurrency control is the mechanism for controlling and managing the concurrent execution of
database operations so that inconsistencies in the database are avoided. It is implemented
through concurrency control protocols.
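As a minimal sketch of the locking idea, the following uses Python threads in place of database
transactions. The dictionary account, the lock, and the apply_change function are assumptions; a
real DBMS applies locks to database items rather than to Python objects.

```python
import threading

account = {"A": 300}
account_lock = threading.Lock()


def apply_change(delta):
    """Read, modify, and write the balance while holding the lock."""
    with account_lock:  # only one 'transaction' at a time may touch the balance
        current = account["A"]
        account["A"] = current + delta


tx = threading.Thread(target=apply_change, args=(-50,))
ty = threading.Thread(target=apply_change, args=(100,))
tx.start()
ty.start()
tx.join()
ty.join()

final_balance = account["A"]
print(final_balance)  # 350
```

Because each read-modify-write happens under the lock, neither update can overwrite the other,
and the result is $350 whichever thread runs first.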
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of concurrently executing database transactions. These protocols
are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
We will understand and discuss each protocol one by one in our next sections.
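In the spirit of the validation-based approach (often called optimistic concurrency control), a
transaction can work without locks and check at write time whether the data changed in the
meantime. The sketch below is one simple way to do this with a version column; the accounts
table, the version column, and both helper functions are assumptions, not a description of any
particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 300, 0)")
conn.commit()


def read_account(account_id):
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def write_if_unchanged(account_id, new_balance, seen_version):
    """Write only if nobody has changed the row since it was read."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1


balance_x, version_x = read_account(1)  # TX reads 300, version 0
balance_y, version_y = read_account(1)  # TY reads 300, version 0

first_ok = write_if_unchanged(1, balance_x - 50, version_x)    # accepted
second_ok = write_if_unchanged(1, balance_y + 100, version_y)  # rejected: stale version
final_balance = read_account(1)[0]
print(first_ok, second_ok, final_balance)  # True False 250
```

The rejected transaction would normally read the row again and retry, which prevents the lost
update described earlier.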
What is Confidentiality?
The principle of confidentiality involves restricting data access strictly to authorised personnel.
Users have a responsibility to ensure they maintain secure access control systems, including both
logical (e.g. PC passwords) and physical restrictions (e.g. ID cards). For this reason, it is
important that all employees receive thorough training in information security awareness and
best practices. It is important to limit data sharing and state availability restrictions so
confidentiality is not inadvertently breached.
The importance of physical restrictions should not be underestimated. Remember, unwarranted
access to your building can facilitate unauthorised data access. Door codes help to ensure your
building remains secure. They should not be written down, and staff should be vigilant in
ensuring no one is watching or recording them as they input codes. Similarly, many organisations
insist that their employees wear ID badges, which makes it easier to identify non-employees within
your workplace. ID badges should be worn at all times within the workplace but never outside of
work. Wearing them outside of work enables criminals to quote your details (e.g. name, position
and organisation) in an attempt to gain access to your building. Areas containing particularly
sensitive information can be protected by extra access restrictions e.g. an additional door code.
Passwords are another basic, yet vital, means of protecting your information. A strong password
is at least 8 characters long and contains upper and lower case letters, numbers and special
symbols. Passwords should never be shared (even with your colleagues or IT providers) and
should be changed immediately if discovered. Changing your password regularly gives hackers
less time to guess it and stops them from using your account if they have already obtained your
password. You should change your password at least once every 90 days.
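The password rules above can be written as a small check. This is a sketch only: the function
name is_strong_password is an assumption, and "special symbols" is taken to mean the characters
in Python's string.punctuation.

```python
import string


def is_strong_password(password):
    """Apply the rules from the text: length, both letter cases, a digit, a special symbol."""
    return (
        len(password) >= 8
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


print(is_strong_password("Tr1cky!pass"))  # True
print(is_strong_password("password"))     # False
```

A check like this covers only the composition rules; it cannot tell whether a password has been
shared or discovered.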

What is Integrity?
Upholding integrity means that measures are taken to ensure that data is kept accurate and up to
date. The integrity of your data affects how trustworthy and conscientious your organisation
appears. One of the Data Protection Principles (which are the foundations of the Data Protection
Act 2018) is that data should be ‘kept accurate and up to date’. Users must make sure that they
comply with their legal duties and fulfil this requirement. It can be useful to assign individuals
specific roles and responsibilities regarding data integrity. This way employees cannot shirk the
responsibility and expect someone else to pick up the slack.
