Unit 6 DBMS Concurrency Control and Normalization


DBMS Concurrency Control: Two Phase, Timestamp, Lock-Based Protocol
What is Concurrency Control?
Concurrency control is the procedure in a DBMS for managing simultaneous operations without letting them conflict
with one another. Concurrent access is quite easy if all users are only reading data, because there is no way they
can interfere with one another. Any practical database, however, has a mix of READ and WRITE
operations, and hence concurrency is a challenge.

Concurrency control is used to address such conflicts, which mostly occur in a multi-user system. It helps
you to make sure that database transactions are performed concurrently without violating the data integrity
of the respective databases.

Therefore, concurrency control is an essential element for the proper functioning of a system where two
or more database transactions that require access to the same data are executed simultaneously.

Potential problems of Concurrency


Here are some issues that you are likely to face when transactions run concurrently without proper control:

• Lost Updates occur when multiple transactions select the same row and update the row based on the
value selected, so that one update overwrites another
• Uncommitted dependency issues occur when a second transaction selects a row that has been updated
by another transaction but not yet committed (dirty read)
• Non-Repeatable Read occurs when a transaction accesses the same row several
times and reads different data each time
• Incorrect Summary issue occurs when one transaction computes a summary over the values of all the
instances of a repeated data item while a second transaction updates a few instances of that specific data
item. In that situation, the resulting summary does not reflect a correct result.
Why use a Concurrency method?
Reasons for using a concurrency control method in a DBMS:

• To apply isolation through mutual exclusion between conflicting transactions
• To resolve read-write and write-write conflict issues
• To preserve database consistency by constantly constraining how conflicting operations may execute
• To control the interaction among concurrent transactions; this control is
achieved using concurrency-control schemes
• To help ensure serializability

Example
Assume that two people go to electronic kiosks at the same time to buy a movie ticket for the same
movie and the same show time.

However, there is only one seat left for that show in that particular theatre. Without concurrency
control, it is possible that both moviegoers will end up purchasing a ticket. A concurrency control
method does not allow this to happen. Both moviegoers can still access the information written in the movie
seating database, but concurrency control provides a ticket only to the buyer who completes the
transaction process first.
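
The kiosk scenario can be sketched in a few lines of Python. This is only an illustration under assumptions: the Show class, the buy_ticket method and the buyer names are hypothetical, and a single in-process lock stands in for whatever mechanism a real DBMS would use. The point it shows is that the "is a seat left?" check and the update must happen as one protected step.

```python
import threading

class Show:
    """A show with a fixed number of seats left (hypothetical model)."""

    def __init__(self, seats_left):
        self.seats_left = seats_left
        self._lock = threading.Lock()  # serializes the check-then-update step

    def buy_ticket(self, buyer):
        # The check and the decrement run under one lock, so two buyers
        # can never both see the same last seat as available.
        with self._lock:
            if self.seats_left > 0:
                self.seats_left -= 1
                return True
            return False

def run_kiosks(show, buyers):
    """Let every buyer try to purchase at the same time; return who succeeded."""
    results = {}

    def kiosk(name):
        results[name] = show.buy_ticket(name)

    threads = [threading.Thread(target=kiosk, args=(b,)) for b in buyers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

show = Show(seats_left=1)
results = run_kiosks(show, ["moviegoer_1", "moviegoer_2"])
print(results)  # exactly one of the two buyers gets the ticket
```

Which of the two buyers wins depends on thread scheduling; what the lock guarantees is that only one of them does.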

Concurrency Control Protocols


Different concurrency control protocols offer different trade-offs between the amount of concurrency they allow
and the amount of overhead that they impose.

• Lock-Based Protocols
• Two-Phase Locking Protocols
• Timestamp-Based Protocols
• Validation-Based Protocols
Lock-based Protocols
A lock is a data variable that is associated with a data item. The lock signifies which operations can be
performed on the data item. Locks help synchronize access to database items by concurrent
transactions.

All lock requests are made to the concurrency-control manager. Transactions proceed only once the lock
request is granted.

Binary Locks: A binary lock on a data item can be in either the locked or the unlocked state.

Shared/exclusive: This type of locking mechanism separates locks based on their use. If a lock is
acquired on a data item to perform a write operation, it is called an exclusive lock; if it is acquired only
to read the data item, it is called a shared lock.

1. Shared Lock (S):

A shared lock is also called a read-only lock. With a shared lock, the data item can be shared between
transactions, because a transaction holding it never has permission to update the data item.

For example, consider a case where two transactions are reading the account balance of a person. The
database will let them both read by placing a shared lock. However, if another transaction wants to update that
account's balance, the shared lock prevents it until the reading process is over.

2. Exclusive Lock (X):

With an exclusive lock, a data item can be read as well as written. The lock is exclusive and cannot be held
concurrently with any other lock on the same data item. An X-lock is requested using the lock-x instruction.
A transaction may unlock the data item after finishing the 'write' operation.

For example, suppose a transaction needs to update the account balance of a person. You can allow this
transaction by placing an X lock on the balance. Then, when a second transaction wants to read or write it,
the exclusive lock prevents the operation.
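
The rules for S and X locks can be summarized as a compatibility table: a requested lock is granted only if it is compatible with every lock already held by other transactions. The sketch below assumes a toy in-memory lock table; the LockManager class and its method names are hypothetical and are not taken from any particular DBMS.

```python
# Compatibility of a lock already HELD by another transaction with a
# REQUESTED lock mode: only shared with shared is compatible.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockManager:
    """Toy lock table (hypothetical): item -> {transaction: mode}."""

    def __init__(self):
        self.table = {}

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        holders = self.table.setdefault(item, {})
        for other, held in holders.items():
            if other != txn and not COMPATIBLE[(held, mode)]:
                return False
        # Never weaken a lock the transaction already holds.
        if holders.get(txn) != "X":
            holders[txn] = mode
        return True

    def release(self, txn, item):
        self.table.get(item, {}).pop(txn, None)

lm = LockManager()
print(lm.request("T1", "balance", "S"))  # two readers can share the item
print(lm.request("T2", "balance", "S"))
print(lm.request("T3", "balance", "X"))  # a writer must wait for the readers
```

A real lock manager would queue the refused request instead of returning False; the return value here only marks the point at which the transaction would wait.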
3. Simplistic Lock Protocol

This type of lock-based protocol requires a transaction to obtain a lock on every object before beginning
an operation on it. A transaction may unlock the data item after finishing the 'write' operation.

4. Pre-claiming Locking

The pre-claiming lock protocol evaluates a transaction's operations and creates a list of the data items
needed to begin execution. The transaction requests locks on all of these items in advance and executes only
if all the locks are granted. All locks are released once all of its operations are over.

Starvation

Starvation is the situation when a transaction needs to wait for an indefinite period to acquire a lock.

The following are reasons for starvation:

• The waiting scheme for locked items is not properly managed
• A resource leak occurs
• The same transaction is repeatedly selected as a victim

Deadlock

Deadlock refers to a specific situation where two or more processes are each waiting for another to release a
resource, or where more than two processes are waiting for resources in a circular chain.
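
One common way to picture the circular chain is a wait-for graph, in which an edge from T1 to T2 means "T1 is waiting for a lock held by T2"; a deadlock then corresponds to a cycle. The notes do not name this structure, so treat the sketch below as an assumption-laden illustration: find_deadlock and the dictionary layout are hypothetical.

```python
def find_deadlock(wait_for):
    """Return the transactions forming a cycle in the wait-for graph, or None.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:              # u is already on the path: a cycle
                return path[path.index(u):]
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        color[t] = BLACK
        path.pop()
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None

print(find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))
print(find_deadlock({"T1": ["T2"], "T2": []}))
```

Once a cycle is found, one transaction in it is chosen as the victim and rolled back; always choosing the same victim is one of the causes of starvation listed earlier.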

Two Phase Locking (2PL) Protocol


The Two-Phase Locking protocol is also known as the 2PL protocol. It is also called P2L. In this type of
locking protocol, a transaction must not acquire a new lock after it has released any of its locks.

This locking protocol divides the execution of a transaction into three parts.

• In the first part, when the transaction begins to execute, it requests permission for the locks it needs.
• In the second part, the transaction obtains all the locks. When the transaction releases its first
lock, the third part starts.
• In the third part, the transaction cannot demand any new locks. Instead, it only releases the
acquired locks.

The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in two steps:

• Growing Phase: In this phase transaction may obtain locks but may not release any locks.
• Shrinking Phase: In this phase, a transaction may release locks but not obtain any new lock
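
The growing and shrinking phases can be checked mechanically: once a transaction has issued its first unlock, any later lock request breaks the protocol. The following is a minimal sketch, assuming each operation is recorded as a hypothetical ("lock" or "unlock", item) pair for a single transaction.

```python
def follows_2pl(operations):
    """Check one transaction's lock/unlock sequence against two-phase locking.

    operations is a list of ("lock" | "unlock", item) pairs in issue order.
    """
    shrinking = False
    for action, _item in operations:
        if action == "unlock":
            shrinking = True          # the first unlock ends the growing phase
        elif action == "lock" and shrinking:
            return False              # acquiring after releasing violates 2PL
    return True

ok_schedule = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
bad_schedule = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
print(follows_2pl(ok_schedule), follows_2pl(bad_schedule))
```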

The 2PL protocol does guarantee serializability. However, it does not ensure that deadlocks do not
happen.

In the diagram for this section, local and global deadlock detectors search for
deadlocks and resolve them by returning the affected transactions to their initial states.

Strict Two-Phase Locking Method


Strict Two-Phase Locking is almost the same as 2PL. The only difference is that Strict-2PL does not
release a lock as soon as it has finished using it. It holds all the locks until the commit point and releases
them all in one go when the transaction is over.

Centralized 2PL
In Centralized 2PL, a single site is responsible for the lock management process. There is only one lock manager
for the entire DBMS.

Primary copy 2PL


In the primary copy 2PL mechanism, lock managers are distributed to different sites, and each
lock manager is responsible for managing the locks for a particular set of data items. When the primary copy
has been updated, the change is propagated to the slaves.

Distributed 2PL
In this kind of two-phase locking mechanism, lock managers are distributed to all sites. Each is
responsible for managing locks for the data at its site. If no data is replicated, it is equivalent to primary copy
2PL. The communication costs of Distributed 2PL are considerably higher than those of primary copy 2PL.

Timestamp-based Protocols
The timestamp-based algorithm uses timestamps to serialize the execution of concurrent transactions. The
protocol ensures that all conflicting read and write operations are executed in timestamp order. It
uses the system time or a logical counter as the timestamp of a transaction.

The older transaction is always given priority in this method. This is one of the most commonly used
concurrency protocols.

Lock-based protocols manage the order between conflicting transactions at the time they
execute. Timestamp-based protocols start managing conflicts as soon as an operation is created.

Example:
Suppose there are three transactions T1, T2, and T3.
T1 entered the system at time 0010
T2 entered the system at time 0020
T3 entered the system at time 0030
Priority will be given to transaction T1, then transaction T2, and lastly transaction T3.
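
To see what "executed in timestamp order" means in practice, here is a minimal sketch of the basic timestamp-ordering checks as they are usually presented in textbooks. The notes do not spell these rules out, so the read and write timestamps kept per item, the Abort exception and the function names are assumptions made for illustration.

```python
class Abort(Exception):
    """Raised when an operation arrives too late; its transaction must restart."""

class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp that has read the item
        self.write_ts = 0   # largest timestamp that has written the item

def read(item, ts):
    if ts < item.write_ts:                 # a younger transaction already wrote it
        raise Abort(f"read by ts={ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    if ts < item.read_ts or ts < item.write_ts:   # a younger one already used it
        raise Abort(f"write by ts={ts} rejected")
    item.value = value
    item.write_ts = ts

x = Item(100)
read(x, 10)          # T1 (timestamp 10) reads X
write(x, 20, 150)    # T2 (timestamp 20) writes X
print(read(x, 30))   # T3 (timestamp 30) reads the value written by T2
try:
    write(x, 10, 0)  # T1 now tries to write, but younger transactions used X
except Abort as reason:
    print("T1 must restart:", reason)
```

No transaction ever waits in this scheme; a late operation is simply rejected and its transaction restarted with a new timestamp.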

Advantages:

• Schedules are serializable, just as with 2PL protocols
• Transactions never wait, which eliminates the possibility of deadlocks

Disadvantages:

• Starvation is possible if the same transaction is continually aborted and restarted

Characteristics of Good Concurrency Protocol


An ideal concurrency control mechanism for a DBMS has the following objectives:

• It must be resilient to site and communication failures.
• It should allow the parallel execution of transactions to achieve maximum concurrency.
• Its storage mechanisms and computational methods should be modest, to minimize overhead.
• It must enforce some constraints on the structure of the atomic actions of transactions.

Summary
• Concurrency control is the procedure in a DBMS for managing simultaneous operations without
letting them conflict with one another.
• Lost Updates, Dirty Read, Non-Repeatable Read, and Incorrect Summary are problems faced
due to a lack of concurrency control.
• Lock-Based, Two-Phase, Timestamp-Based and Validation-Based are types of concurrency handling
protocols.
• A lock can be Shared (S) or Exclusive (X).
• The Two-Phase Locking protocol, also known as the 2PL protocol, requires that a transaction not acquire a
lock after it releases one of its locks. It has two phases: growing and shrinking.
• The timestamp-based algorithm uses a timestamp to serialize the execution of concurrent
transactions. The protocol uses the system time or a logical counter as the timestamp.

Transaction
o A transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions performed by a single user to access the
contents of the database.

Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account. This small transaction contains
several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Operations of Transaction:
Following are the main operations of transaction:

Read(X): The read operation reads the value of X from the database and stores it in a buffer in main memory.
Write(X): The write operation writes the value back to the database from the buffer.

Let's take an example of a debit transaction on an account, which consists of the following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before the transaction starts is 4000.

o The first operation reads X's value from the database and stores it in a buffer.
o The second operation decreases the value of X by 500, so the buffer will contain 3500.
o The third operation writes the buffer's value to the database, so X's final value will be 3500.

But because of a hardware, software or power failure, a transaction may fail before finishing
all the operations in the set.

For example: if the debit transaction above fails after executing operation 2, X's value will remain
4000 in the database, which is not acceptable to the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
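
The debit example, together with commit and rollback, can be sketched as follows. This is a simplified model under assumptions: the database is a plain dictionary, and the DebitTransaction class and its methods are hypothetical names. It shows that the work is done on a buffer and reaches the database only at commit.

```python
class DebitTransaction:
    """Works on a private buffer; the database changes only on commit."""

    def __init__(self, database, key):
        self.database = database
        self.key = key
        self.buffer = None

    def read(self):                  # R(X): copy the value into the buffer
        self.buffer = self.database[self.key]

    def debit(self, amount):         # X = X - amount, in the buffer only
        self.buffer -= amount

    def commit(self):                # W(X): save the work permanently
        self.database[self.key] = self.buffer

    def rollback(self):              # undo the work: discard the buffer
        self.buffer = None

db = {"X": 4000}

t1 = DebitTransaction(db, "X")
t1.read()
t1.debit(500)
t1.rollback()                        # failure after operation 2
print(db["X"])                       # the database still holds 4000

t2 = DebitTransaction(db, "X")
t2.read()
t2.debit(500)
t2.commit()
print(db["X"])                       # now 3500
```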

Transaction property
A transaction has four properties. They are used to maintain consistency in a database before and after the transaction.

Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that either all operations of the transaction take place or none do; if they cannot all take place, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit and either runs to
completion or is not executed at all.

Atomicity involves the following two operations:

Abort: If a transaction aborts, none of the changes made are visible.

Commit: If a transaction commits, all the changes made are visible.

Example: Let's assume that the following transaction T consists of T1 and T2. A holds Rs 600 and B holds Rs 300.
Transfer Rs 100 from account A to account B.

T1              T2

Read(A)         Read(B)
A := A - 100    B := B + 100
Write(A)        Write(B)

After completion of the transaction, A holds Rs 500 and B holds Rs 400.

If the transaction T fails after the completion of T1 but before the completion of T2, then the amount will be
deducted from A but not added to B. This leaves the database in an inconsistent state. To ensure the correctness of the database
state, the transaction must be executed in its entirety.

Consistency
o The integrity constraints are maintained so that the database is consistent before and after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a new stable state.
o The consistency property states that every transaction sees a consistent database instance.
o A transaction transforms the database from one consistent state to another consistent state.

For example: The total amount must be the same before and after the transaction.

1. Total before T occurs = 600 + 300 = 900
2. Total after T occurs = 500 + 400 = 900

Therefore, the database is consistent. In the case when T1 is completed but T2 fails, inconsistency will occur.
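
The transfer from A to B can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the account table, the transfer function and the fail_midway flag are hypothetical and exist only to simulate a failure between T1 and T2. It shows that rolling back the half-finished transaction keeps the total at 900.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure between T1 and T2")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except RuntimeError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 600), ("B", 300)])
conn.commit()

transfer(conn, "A", "B", 100, fail_midway=True)
print(balances(conn))   # unchanged: the debit from A was rolled back
transfer(conn, "A", "B", 100)
print(balances(conn))   # A has 500, B has 400, and the total is still 900
```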

Isolation
o It states that the data being used during the execution of one transaction cannot be used by a second transaction
until the first one is completed.
o In isolation, if transaction T1 is being executed and is using the data item X, then that data item cannot be accessed by
any other transaction T2 until T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.

Durability
o The durability property indicates the permanence of the database's consistent state. It states that the
changes made by a committed transaction are permanent.
o They cannot be lost through the erroneous operation of a faulty transaction or through a system failure. When a transaction is
completed, the database reaches a state known as the consistent state. That consistent state cannot be lost, even
in the event of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability property.
States of Transaction
In a database, the transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the transaction is being executed.
o For example: insertion, deletion or updating of a record is done here, but the records are not yet saved to the
database.

Partially committed
o In the partially committed state, a transaction executes its final operation, but the data is still not saved to the
database.
o In the total mark calculation example, the final step that displays the total marks is executed in this state.

Committed
A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all the effects are now
permanently saved on the database system.

Failed state
o If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the
transaction will fail to execute.

Aborted
o If any of the checks fail and the transaction has reached the failed state, the database recovery system makes sure
that the database is in its previous consistent state. If it is not, the recovery system aborts or rolls back the transaction to bring the
database into a consistent state.
o If a transaction fails midway, all the operations it has already executed are rolled back so that the database
returns to its consistent state before the transaction is executed again.
o After aborting the transaction, the database recovery module will select one of the two operations:
1. Re-start the transaction
2. Kill the transaction
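
The states described above form a small state machine. The sketch below encodes the transitions as they are commonly drawn in textbooks; the step from partially committed to failed is an assumption that the notes do not state explicitly, and the State enum and advance function are hypothetical names.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"

# Which states may follow which. A restart after an abort is modelled as a
# brand-new transaction that begins again in ACTIVE.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),
    State.ABORTED: set(),
}

def advance(current, target):
    """Return target if the transition is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target

state = State.ACTIVE
state = advance(state, State.PARTIALLY_COMMITTED)
state = advance(state, State.COMMITTED)
print(state.value)
```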

DBMS - Data Recovery


Crash Recovery
A DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and
robustness of a DBMS depend on its complex architecture and on its underlying hardware and system software. If it fails or
crashes amid transactions, the system is expected to follow some algorithm or technique to recover
lost data.

Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as follows −

Transaction failure

A transaction has to abort when it fails to execute or when it reaches a point from which it cannot go any further. This is
called transaction failure, where only a few transactions or processes are affected.
Reasons for a transaction failure could be −
• Logical errors − Where a transaction cannot complete because it has some code error or an internal error condition.
• System errors − Where the database system itself terminates an active transaction because the DBMS is not able to execute it, or
it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an
active transaction.

System Crash

There are problems − external to the system − that may cause the system to stop abruptly and
crash. For example, an interruption in the power supply may cause the underlying hardware or software to fail.
Other examples include operating system errors.

Disk Failure

In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives failed
frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash or any other failure that
destroys all or a part of the disk storage.

Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into two categories −
• Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed
very close to the CPU; normally they are embedded onto the chipset itself. Main memory and cache memory are
examples of volatile storage. They are fast but can store only a small amount of information.
• Non-volatile storage − These memories are made to survive system crashes. They are huge in data storage capacity but slower
to access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.

Recovery and Atomicity


When a system crashes, it may have several transactions being executed and various files opened for them to modify
data items. Transactions are made of various operations, which are atomic in nature. But according to the ACID
properties of a DBMS, the atomicity of a transaction as a whole must be maintained, that is, either all of its operations are
executed or none.
When a DBMS recovers from a crash, it should do the following −
• It should check the states of all the transactions that were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or needs to be rolled back.
• No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques that can help a DBMS in recovering as well as maintaining the atomicity of a
transaction −
• Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the database.
• Maintaining shadow paging, where the changes are made in volatile memory and the actual database is updated later.
Log-based Recovery
A log is a sequence of records that keeps track of the actions performed by a transaction. It is important that the
log records are written before the actual modification and stored on stable storage media, which is failsafe.
Log-based recovery works as follows −
• The log file is kept on stable storage media.
• When a transaction enters the system and starts execution, it writes a log record about it −

<Tn, Start>

• When the transaction modifies an item X, it writes a log record as follows −

<Tn, X, V1, V2>

This reads: Tn has changed the value of X from V1 to V2.

• When the transaction finishes, it logs −

<Tn, commit>

The database can be modified using two approaches −

• Deferred database modification − All log records are written to stable storage, and the database is updated only when the transaction
commits.
• Immediate database modification − Each log record is followed by the actual database modification. That is, the database is modified
immediately after every operation.
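
The three kinds of log record and the deferred approach can be put together in a short sketch. It assumes a Python list standing in for the log on stable storage and a dictionary standing in for the database; the LoggedDatabase class and its methods are hypothetical.

```python
class LoggedDatabase:
    """Deferred modification: log first, apply to the data only at commit."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = []        # stands in for the log on stable storage
        self.pending = {}    # transaction -> {item: new value}

    def start(self, txn):
        self.log.append((txn, "start"))                      # <Tn, Start>
        self.pending[txn] = {}

    def write(self, txn, item, new_value):
        old_value = self.pending[txn].get(item, self.data.get(item))
        self.log.append((txn, item, old_value, new_value))   # <Tn, X, V1, V2>
        self.pending[txn][item] = new_value                  # database untouched

    def commit(self, txn):
        self.log.append((txn, "commit"))                     # <Tn, commit>
        self.data.update(self.pending.pop(txn))              # apply the writes now

db = LoggedDatabase({"X": 4000})
db.start("T1")
db.write("T1", "X", 3500)
print(db.data["X"])   # still the old value: nothing is applied before commit
db.commit("T1")
print(db.data["X"])   # the new value after commit
print(db.log)
```

With immediate modification, the write method would also update self.data right after appending the log record, and recovery would use the old value V1 in the record to undo it if the transaction never committed.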

Recovery with Concurrent Transactions


When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it would
be hard for the recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern
DBMSs use the concept of 'checkpoints'.

Checkpoint

Keeping and maintaining logs in real time and in a real environment may fill up all the storage space available in the
system. As time passes, the log file may grow too big to be handled at all. A checkpoint is a mechanism where all the
previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before
which the DBMS was in a consistent state and all the transactions were committed.

Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner −

• The recovery system reads the logs backwards from the end to the last checkpoint.
• It maintains two lists, an undo-list and a redo-list.
• If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.
• If the recovery system sees a log with <Tn, Start> but no commit or abort log, it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. For all the transactions in the redo-list,
their previous logs are removed, and the transactions are then redone before their logs are saved.
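
The backward scan that builds the two lists can be sketched as below. The tuple layout of the log records and the classify function are assumptions made for illustration; the rules for filling the lists follow the bullets above.

```python
def classify(log):
    """Scan the log backwards to the last checkpoint; return (undo, redo)."""
    undo, redo, ended = [], [], set()
    for record in reversed(log):
        if record == ("checkpoint",):
            break                      # everything before this point is safe
        if len(record) != 2:
            continue                   # an update record <Tn, X, V1, V2>
        txn, kind = record
        if kind == "commit":
            redo.append(txn)
            ended.add(txn)
        elif kind == "abort":
            ended.add(txn)
        elif kind == "start" and txn not in ended:
            undo.append(txn)           # started, but never committed or aborted
    return undo, redo

log = [
    ("T1", "start"),
    ("T1", "A", 600, 500),
    ("checkpoint",),
    ("T1", "commit"),
    ("T2", "start"),
    ("T2", "B", 300, 400),
    ("T3", "start"),
    ("T3", "commit"),
]
undo_list, redo_list = classify(log)
print("undo:", undo_list)
print("redo:", redo_list)
```

Here T2 started after the checkpoint and never committed, so it goes to the undo-list, while T1 and T3 have commit records after the checkpoint and go to the redo-list.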
Database Security
Database security has many different layers, but the key aspects are:
Authentication
User authentication makes sure that the person accessing the database is who they claim to be. Authentication can be
done at the operating system level or at the database level itself. Many authentication systems, such as retina scanners
or other biometrics, are used to make sure unauthorized people cannot access the database.
Authorization
Authorization is a privilege provided by the Database Administrator. Users of the database can only view the contents they
are authorized to view. The rest of the database is out of bounds to them.
The different permissions available for authorization are:

1. Primary Permission - This is granted to users publicly and directly.
2. Secondary Permission - This is granted to groups and automatically awarded to a user if they are a member of the group.
3. Public Permission - This is publicly granted to all the users.
4. Context sensitive permission - This is related to sensitive content and only granted to a select group of users.

The categories of authorization that can be given to users are:

1. System Administrator - This is the highest administrative authorization for a user. Users with this authorization can also execute
some database administrator commands such as restore or upgrade a database.
2. System Control - This is the highest control authorization for a user. This allows maintenance operations on the database but not
direct access to data.
3. System Maintenance - This is the lower level of system control authority. It also allows users to maintain the database but within a
database manager instance.
4. System Monitor - Using this authority, the user can monitor the database and take snapshots of it.

Database Integrity
Data integrity in the database is the correctness, consistency and completeness of data. Data integrity is enforced using
the following three integrity constraints:

1. Entity Integrity - This is related to the concept of primary keys. All tables should have their own primary keys which should
uniquely identify a row and not be NULL.
2. Referential Integrity - This is related to the concept of foreign keys. A foreign key is an attribute of one relation that refers to a key of another
relation.
3. Domain Integrity - This means that there should be a defined domain for all the columns in a database.
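
The three constraints can be observed with Python's built-in sqlite3 module. This is a small sketch under assumptions: the department and employee tables and the rejected helper are hypothetical, a CHECK clause stands in for the domain of the salary column, and SQLite enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only if enabled

# Entity integrity: every table has a primary key.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY NOT NULL)")
# Referential integrity: dept_id must refer to an existing department.
# Domain integrity: salary must be a non-negative integer.
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id),
        salary  INTEGER NOT NULL CHECK (salary >= 0)
    )""")
conn.execute("INSERT INTO department VALUES (1)")
conn.execute("INSERT INTO employee VALUES (10, 1, 5000)")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(rejected("INSERT INTO employee VALUES (10, 1, 6000)"))   # duplicate primary key
print(rejected("INSERT INTO employee VALUES (11, 99, 6000)"))  # unknown department
print(rejected("INSERT INTO employee VALUES (12, 1, -5)"))     # value outside the domain
```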
• A DBMS always has a separate security subsystem, which is responsible for protecting the database against
accidental or intentional loss, destruction or misuse.
• Security Levels:
o Database level:- The DBMS should ensure that authorization restrictions are placed on users.
o Operating system level:- The operating system should not allow unauthorized users to enter the system.
o Network level:- The database is at some remote place and is accessed by users through the network, so
network security is required.
• Security Mechanisms:
o Access Control (Authorization)
▪ Identifies valid users who may have access to the valid data in the database, and
may restrict the operations that each user may perform.
▪ For example, the movie database might designate two roles: "users" (query the data only) and
"designers" (add new data). A user must be assigned to a role to have the access privileges given to that
role.
▪ Each application is associated with a specified role. Each role has a list of authorized users who
may execute, design or administer the application.
o Authenticate the User:
▪ Identifies valid users who may have access to the data in the database.
▪ Restricts each user's view of the data in the database.
▪ This may be done with the help of the concept of views in relational databases.
o Cryptographic control/Data Encryption:
▪ Encodes data in a cryptic (coded) form so that even if the data is captured by an unintended user, that user
cannot decode it.
▪ Used for sensitive data, usually when it is transmitted over communication links, but it may also be used to
prevent someone from bypassing the system to gain access to the data.
o Inference control:
▪ Ensures that confidential information cannot be retrieved even by deduction.
▪ Prevents disclosure of data through statistical summaries of confidential data.
o Flow control or Physical Protection:
▪ Prevents the copying of information by unauthorized persons.
▪ Computer systems must be physically secured against any unauthorized entry.
o Virus control:
▪ At the user level, authorization should be enforced to avoid intruder attacks through humans.
▪ There should be a mechanism for providing protection against data viruses.
o User defined control:
▪ Defines additional constraints or limitations on the use of the database.
▪ These allow developers or programmers to incorporate their own security procedures in addition to
the above security mechanisms.

Authorization

• Authorization is finding out whether the person, once identified, is permitted to have the resource.
• Authorization determines what you can do, and it is handled through the DBMS unless external security procedures
are available.
• The database management system allows the DBA to give different access rights to users as per their requirements.
• Basic authorization: we can use any one or a combination of the following basic forms of authorization:
o Resource authorization:- Authorization to access any system resource, e.g. sharing of a database, printer etc.
o Alteration authorization:- Authorization to add attributes to or delete attributes from relations.
o Drop authorization:- Authorization to drop a relation.
• Granting of privileges:
o A system privilege is the right to perform a particular action, or to perform an action on any schema objects of
a particular type.
o An authorized user may pass on this authorization to other users. This process is called granting of
privileges.
o Syntax:

GRANT <privilege list>
ON <relation name or view name>
TO <user/role list>

o Example:
The following grant statement grants users U1, U2 and U3 the select privilege on the Emp_Salary relation:

GRANT select
ON Emp_Salary
TO U1, U2, U3;

• Revoking of privileges:
o We can withdraw the privileges given to a particular user with the help of the revoke statement.
o Syntax:

REVOKE <privilege list>
ON <relation name or view name>
FROM <user/role list> [restrict/cascade]

o Example:
The revocation of privileges from a user or role may cause other users or roles to lose those privileges as well. This
behavior is called cascading of the revoke.

REVOKE select
ON Emp_Salary
FROM U1, U2, U3;
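
Cascading can be illustrated with a toy model of who granted a privilege to whom. This is only a sketch under simplifying assumptions: it tracks a single privilege on a single relation, each user is assumed to receive it from just one grantor, and the Privileges class and its methods are hypothetical rather than part of any DBMS.

```python
class Privileges:
    """Toy grant table for one privilege on one relation (hypothetical)."""

    def __init__(self, owner):
        self.owner = owner
        self.granted_by = {}   # grantee -> the user who granted the privilege

    def holders(self):
        return {self.owner} | set(self.granted_by)

    def grant(self, grantor, grantee):
        if grantor not in self.holders():
            raise PermissionError(f"{grantor} does not hold the privilege")
        self.granted_by.setdefault(grantee, grantor)

    def revoke(self, user):
        """Cascade: also revoke from everyone whose grant came through user."""
        self.granted_by.pop(user, None)
        for grantee, grantor in list(self.granted_by.items()):
            if grantor == user:
                self.revoke(grantee)

select_on_emp_salary = Privileges(owner="DBA")
select_on_emp_salary.grant("DBA", "U1")
select_on_emp_salary.grant("U1", "U2")   # U1 passes the privilege on
select_on_emp_salary.grant("U2", "U3")
select_on_emp_salary.grant("DBA", "U4")
select_on_emp_salary.revoke("U1")        # U2 and U3 lose it as well
print(sorted(select_on_emp_salary.holders()))
```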

• Some other types of privileges:
o Reference privileges:
SQL permits a user to declare foreign keys while creating relations. Example: allow user U1 to create a
relation that references the key 'Eid' of the Emp_Salary relation.

GRANT REFERENCES (Eid)
ON Emp_Salary
TO U1;

o Execute privileges:
This privilege authorizes a user to execute a function or procedure. Thus, only a user who has the execute
privilege on a function Create_Acc() can call that function.

GRANT EXECUTE
ON Create_Acc
TO U1;
DATABASE RECOVERY IN DBMS AND ITS TECHNIQUES
A database system, like any computer system, can fail. Data stored in the database should nevertheless be available
whenever it is needed. Database recovery means recovering the data when it gets deleted, hacked or damaged accidentally.
Atomicity is a must: whether or not a transaction completes, its effects should either be reflected in the database permanently
or not affect the database at all. Database recovery and database recovery techniques are therefore a must in a DBMS. The
database recovery techniques in DBMS are given below.

Crash recovery:
A DBMS is an extremely complicated system with many transactions being executed each second. The durability and robustness of
the software depend on its complicated design and on its underlying hardware and system software. If it fails or crashes amid
transactions, the system is expected to follow some algorithm or technique to recover lost data.
Classification of failure:
To see where the problem has occurred, failures are generalized into the following classes:
▪ Transaction failure
▪ System crash
▪ Disk failure

1. Transaction failure: A transaction has to abort when it fails to execute or when it reaches a point from which it cannot proceed
any further. This is called transaction failure, and it affects only a few transactions or processes.
The reasons for transaction failure are:
▪ Logical errors: The transaction cannot complete because of a code error or an internal error condition.
▪ System errors: The database system itself terminates an active transaction because the DBMS is not able to execute it, or has to
stop it because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an
active transaction.
2. System crash: Problems external to the system may cause it to stop abruptly and crash. For instance, an interruption in the power
supply may cause the underlying hardware or software to fail. Examples include operating system errors.
3. Disk failure: In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives failed
frequently. Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure
that destroys all or part of disk storage.
Storage structure:
Storage is classified as follows:

1. Volatile storage: As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very
close to the CPU; usually they are embedded on the chipset itself. Main memory and cache memory are examples of volatile
storage. They are fast but can store only a small amount of data.
2. Non-volatile storage: These memories are made to survive system crashes. They are huge in data storage capacity but slower to
access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed-up) RAM.
Recovery and Atomicity:
When a system crashes, it may have several transactions being executed and various files opened for them to modify data items.
Transactions are made of various operations, which are atomic in nature. But according to the ACID properties of a database, the
atomicity of a transaction as a whole must be maintained, that is, either all of its operations are executed or none.
When a database management system recovers from a crash, it should maintain the following:
▪ It should check the states of all the transactions that were being executed.
▪ A transaction may be in the middle of some operation; the database management system must ensure the atomicity of the
transaction in this case.
▪ It should check whether the transaction can be completed now or needs to be rolled back.
▪ No transaction should be allowed to leave the database in an inconsistent state.
Two types of techniques can help a database management system recover while maintaining the atomicity of a transaction:
▪ Maintaining a log of each transaction, and writing it to stable storage before actually modifying the database.
▪ Maintaining shadow paging, where the changes are made in volatile memory and the actual database is updated later.
Log-based recovery (or Manual Recovery):
A log is a sequence of records that keeps track of the actions performed by a transaction. It is important that the log records are
written before the actual modification and stored on stable storage media, which is failsafe. Log-based recovery works as follows:
▪ The log file is kept on stable storage media.
▪ When a transaction enters the system and starts execution, it writes a log record about it.
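The idea of writing the log before modifying the data can be illustrated with a minimal sketch. The record layout ("start", "update", "commit"), the function name recover and the sample values are all assumptions made for illustration; they are not taken from any particular DBMS. Each update record stores the old and the new value, so that after a crash committed work can be redone and uncommitted work undone.

```python
# Minimal sketch of log-based recovery (hypothetical record format).
# An update record is ("update", txn, item, old_value, new_value).

def recover(log, database):
    """Return a copy of `database` brought back to a consistent state."""
    db = dict(database)
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo pass: replay the updates of committed transactions in log order.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, item, _old, new = rec
            db[item] = new

    # Undo pass: roll back uncommitted updates in reverse log order.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txn, item, old, _new = rec
            db[item] = old
    return db


log = [
    ("start", "T1"),
    ("update", "T1", "A", 100, 50),
    ("start", "T2"),
    ("update", "T2", "B", 200, 250),
    ("commit", "T1"),
    # The crash happens here: T2 never committed.
]

# Assumed state on disk at crash time: T2's write reached the disk, T1's did not.
disk = {"A": 100, "B": 250}
recovered = recover(log, disk)
print(recovered)
```

After recovery, T1's committed update to A is present and T2's uncommitted update to B has been rolled back, which is the atomicity guarantee described above.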
Recovery with concurrent transactions (Automated Recovery):
When more than one transaction is being executed in parallel, their log records are interleaved. At recovery time it would be hard for the
recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern DBMSs use the concept of
‘checkpoints’.
Automated recovery is of three types:
▪ Deferred Update Recovery
▪ Immediate Update Recovery
▪ Shadow Paging
Data is a valuable resource that must be strictly controlled and managed, like any other economic resource. Part or all of
an organization's commercial data may have strategic importance and must therefore be kept secure and confidential. In
this chapter, you will learn about the scope of database security and about the range of computer-based controls that are
offered as countermeasures to security threats.

What is Database Security?


Database security is the set of mechanisms that protect the database against intentional or accidental threats.
Security concerns are relevant not only to the data held in an organization's database: a breach of security may
harm other parts of the system, which may ultimately affect the database. Consequently, database security
encompasses hardware, software, people, and data. Implementing security effectively requires appropriate
controls, which are defined in specific mission objectives for the system. This need for security, often neglected
or overlooked in the past, is now increasingly recognized by organizations.

We consider database security in relation to the following situations:

• Theft and fraud.
• Loss of confidentiality or secrecy.
• Loss of data privacy.
• Loss of data integrity.
• Loss of availability of data.

These situations broadly represent the areas in which the organization should focus on reducing risk, that is, the
chance of incurring loss or damage to data within a database. These areas are often closely related, so that an
activity that leads to a loss in one area may also lead to a loss in another, since all of the data within an
organization is interconnected.
What is a Threat?
A threat is any situation or event, whether intentional or accidental, that may adversely affect the database and
consequently the organization. A threat may be caused by a situation or event involving a person, action, or
circumstance that is likely to bring harm to an organization and its database.

The extent of the loss an organization suffers as a result of a threat depends on several factors, such as
the existence of countermeasures and contingency plans. For example, if a hardware failure corrupts secondary
storage, all processing activity must cease until the problem is resolved.

Computer-Based Controls
The countermeasures to threats on computer systems range from physical controls to administrative
procedures. Despite the range of computer-based controls available, it is worth noting that the security of a
DBMS is generally only as good as that of the operating system, owing to their close association.

The main computer-based security controls are listed below:

• Access authorization.
• Access controls.
• Views.
• Backup and recovery of data.
• Data integrity.
• Encryption of data.
• RAID technology.

What are Access Controls?

The usual way of providing access controls for a database system is based on the granting and revoking of privileges
within the database. A privilege allows a user to create or access some database object or to run certain DBMS
utilities. Privileges are granted to users so that they can accomplish the tasks required for their jobs.

The database provides the following types of access control:

• Discretionary Access Control (DAC)
• Mandatory Access Control (MAC)

Backup and Recovery


Every database management system should offer backup facilities to help with the recovery of a database after a failure.
It is advisable to make backup copies of the database and log file at regular intervals and to keep the copies in a
secure location. In the event of a failure that renders the database unusable, the backup copy and the
details captured in the log file are used to restore the database to the latest possible consistent state.

The related DBMS functions are:

• Transaction support.
• Concurrency control.

Although each function can be discussed separately, they are mutually dependent. Many DBMSs allow users to carry
out simultaneous operations on the database. If these operations are not controlled, the accesses may interfere with
one another and the database can become inconsistent. To overcome this problem, the DBMS implements a
concurrency control protocol that prevents database accesses from interfering with one another. In this
chapter, you will learn about concurrency control and transaction support for a centralized DBMS that consists of a
single database.

What is Data Recovery?


Data recovery is the process of restoring the database to a correct state after a failure that occurs during a transaction or
after a process has ended. Database recovery was introduced earlier as a service that every DBMS should provide to
ensure that the database is reliable and remains in a consistent state in the presence of failures. In this context,
reliability refers to both the resilience of the DBMS to various kinds of failure and its ability to recover from them.
In this chapter, you will get a brief overview of how this service can be provided. To better understand the problems
that arise in providing a reliable system, you will first learn about the need for recovery and the types of failure that
usually occur in a database environment.

What is the Need for Recovery of data?


Data is usually stored on four types of media with increasing reliability: main memory, magnetic disk,
magnetic tape, and optical disk. Many different forms of failure can affect database processing, and each
has to be dealt with differently. Some failures affect main memory only, while others also involve
non-volatile (secondary) storage. Among the sources of failure are:

• System crashes due to hardware or software errors, resulting in loss of main memory.
• Media failures, such as head crashes or unreadable media, resulting in the loss of portions of secondary storage.
• Application software errors, such as logical errors in programs that access the database, which cause one or
more transactions to abort or fail.
• Natural physical disasters such as fires, floods, earthquakes, or power failures.
• Careless or unintentional destruction of data or directories by operators or users.
• Intentional corruption or destruction of data (for example, using malicious software or files), hardware, or software facilities.

Whatever the cause of the failure, there are two principal effects to consider:

• Loss of main memory, including the database buffers.
• Loss of the disk copy of the database.

Recovery Facilities
Every DBMS should offer the following facilities to assist with recovery:

• A backup mechanism, which makes backup copies of the database at regular intervals.
• Logging facilities, which keep track of the current state of transactions and of any changes made to the database.
• A checkpoint facility, which allows updates to the database that are in progress to be made permanent.
• A recovery manager, which allows the system to restore the database to a consistent state after a failure
occurs.

Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate
undesirable characteristics such as insertion, update and deletion anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o Normal forms are used to reduce redundancy in database tables.

Types of Normal Forms

The normal forms are summarized below:

Normal Form  Description
1NF          A relation is in 1NF if every attribute contains only atomic values.
2NF          A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF          A relation is in 3NF if it is in 2NF and no transitive dependency exists.
4NF          A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF          A relation is in 5NF if it is in 4NF, contains no join dependency, and joining is lossless.

First Normal Form (1NF)


o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values; it must hold a single value.
o First normal form disallows multi-valued attributes, composite attributes, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The EMPLOYEE table converted into 1NF is shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP
14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab
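The conversion above can be sketched in a few lines of Python. This is only an illustration using the rows of the EMPLOYEE example; the function name to_1nf and the assumption that phone numbers are stored as a comma-separated string are hypothetical.

```python
# Rows of the unnormalized EMPLOYEE table: EMP_PHONE holds a
# comma-separated list of numbers, which violates 1NF.
employee = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]


def to_1nf(rows):
    """Emit one row per phone number so that every value is atomic."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result


employee_1nf = to_1nf(employee)
for row in employee_1nf:
    print(row)
```

Note that EMP_ID alone no longer identifies a row in the result; the key becomes the combination of EMP_ID and EMP_PHONE.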

Second Normal Form (2NF)


o For a relation to be in 2NF, it must be in 1NF.
o In second normal form, all non-key attributes are fully functionally dependent on the primary key.

Example: Suppose a school stores data about teachers and the subjects they teach. In a school, a teacher can teach
more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute TEACHER_AGE depends on
TEACHER_ID alone, which is a proper subset of the candidate key. That is why the table violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
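As a rough sketch, the decomposition above amounts to two projections of the TEACHER table, and a natural join on TEACHER_ID should give back the original rows. The variable names below are chosen for illustration only.

```python
# Rows of the TEACHER table: (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Project onto (TEACHER_ID, TEACHER_AGE) and (TEACHER_ID, SUBJECT).
# Using a set removes the duplicate (id, age) pairs.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})
teacher_subject = sorted({(tid, subj) for tid, subj, _ in teacher})

# Natural join on TEACHER_ID to confirm that no information was lost.
age_of = dict(teacher_detail)
rejoined = sorted((tid, subj, age_of[tid]) for tid, subj in teacher_subject)

print(teacher_detail)
print(rejoined == sorted(teacher))
```

Each teacher's age is now stored once, so changing it requires updating a single row.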

Third Normal Form (3NF)


o A relation is in 3NF if it is in 2NF and contains no transitive dependency of a non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If a relation is in 2NF and there is no transitive dependency for non-prime attributes, then the relation is in third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for every non-trivial functional
dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super keys in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes
(EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID), which violates the rule of third normal form.

That is why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as the
primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP


222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal
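One way to sanity-check a suspected dependency such as EMP_ZIP → {EMP_STATE, EMP_CITY} is to test it against the sample rows. The helper fd_holds below is a hypothetical name. A check like this can only refute a dependency: sample data cannot prove that an FD holds in general, because FDs are statements about the meaning of the schema.

```python
# Rows of the original EMPLOYEE_DETAIL table (ZIP codes kept as strings
# so that leading zeros are preserved).
rows = [
    {"EMP_ID": 222, "EMP_NAME": "Harry", "EMP_ZIP": "201010", "EMP_STATE": "UP", "EMP_CITY": "Noida"},
    {"EMP_ID": 333, "EMP_NAME": "Stephan", "EMP_ZIP": "02228", "EMP_STATE": "US", "EMP_CITY": "Boston"},
    {"EMP_ID": 444, "EMP_NAME": "Lan", "EMP_ZIP": "60007", "EMP_STATE": "US", "EMP_CITY": "Chicago"},
    {"EMP_ID": 555, "EMP_NAME": "Katharine", "EMP_ZIP": "06389", "EMP_STATE": "UK", "EMP_CITY": "Norwich"},
    {"EMP_ID": 666, "EMP_NAME": "John", "EMP_ZIP": "462007", "EMP_STATE": "MP", "EMP_CITY": "Bhopal"},
]


def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# The sample data is consistent with EMP_ZIP -> EMP_STATE, EMP_CITY ...
zip_determines_place = fd_holds(rows, ["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"])
# ... but refutes EMP_STATE -> EMP_CITY (US maps to Boston and Chicago).
state_determines_city = fd_holds(rows, ["EMP_STATE"], ["EMP_CITY"])

print(zip_determines_place, state_determines_city)
```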

Boyce Codd normal form (BCNF)


o BCNF is an advanced version of 3NF; it is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
o In other words, the table must be in 3NF and the left-hand side of every non-trivial FD must be a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key, yet each appears on the left-hand
side of a functional dependency.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID  EMP_COUNTRY
264     India
364     UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283


Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now the tables are in BCNF because the left-hand side of each functional dependency is a key of its table.
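The BCNF condition can be checked mechanically from the functional dependencies. The sketch below is an illustration under stated assumptions: the function names closure and bcnf_violations are hypothetical, FDs are represented as pairs of Python sets, and each decomposed table is checked only against the FDs that apply to it.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a super key of `schema`."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:  # trivial FD, never a violation
            continue
        if not schema <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


fd1 = ({"EMP_ID"}, {"EMP_COUNTRY"})
fd2 = ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})

# Original EMPLOYEE table: both FDs violate BCNF.
employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
before = bcnf_violations(employee, [fd1, fd2])

# Decomposed tables, each with the FDs that apply to it.
after = (
    bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [fd1])
    + bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd2])
    + bcnf_violations({"EMP_ID", "EMP_DEPT"}, [])
)

print(len(before), len(after))
```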

Fourth normal form (4NF)


o A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
o For a dependency A →→ B, if a single value of A is associated with multiple values of B, independently of the other
attributes, then the dependency is a multi-valued dependency.

Example
STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities: there is no relationship
between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there are multi-valued dependencies on STU_ID, which lead to unnecessary repetition of
data.

To bring the above table into 4NF, we decompose it into two tables:

STUDENT_COURSE
STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey
Fifth normal form (5NF)
o A relation is in 5NF if it is in 4NF and contains no join dependency that is not implied by its candidate keys; any decomposition must be lossless.
o 5NF is satisfied when the tables are broken into as many tables as possible in order to avoid redundancy.
o 5NF is also known as project-join normal form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes in Semester 1, but he does not take the Math class in
Semester 2. In this case, the combination of all three fields is required to identify a valid row.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will teach it, so we would
leave LECTURER and SUBJECT as NULL. But all three columns together act as the primary key, so we cannot leave the
other two columns blank.

To bring the above table into 5NF, we decompose it into three relations P1, P2 and P3:
P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
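The need for all three projections can be checked with a small sketch. The variable names are illustrative only. Joining just P1 and P2 on SUBJECT produces combinations that were never in the original table; filtering through P3 removes them again.

```python
# Original relation: (SUBJECT, LECTURER, SEMESTER).
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3.
p1 = {(sem, subj) for subj, _, sem in original}   # (SEMESTER, SUBJECT)
p2 = {(subj, lect) for subj, lect, _ in original}  # (SUBJECT, LECTURER)
p3 = {(sem, lect) for _, lect, sem in original}   # (SEMESTER, LECTURER)

# Join P1 and P2 on SUBJECT only.
two_way = {
    (subj, lect, sem)
    for sem, subj in p1
    for subj2, lect in p2
    if subj == subj2
}

# Joining with P3 keeps only rows whose (SEMESTER, LECTURER) pair exists.
three_way = {row for row in two_way if (row[2], row[1]) in p3}

spurious = two_way - original
print(sorted(spurious))
print(three_way == original)
```

In this data the two-way join adds rows such as John teaching Math in Semester 2, which the example explicitly rules out, while the three-way join reproduces the original relation.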

Relational Decomposition
o When a relation in the relational model is not in an appropriate normal form, decomposition of the relation is
required.
o Decomposition breaks a table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems such as loss of information.
o Decomposition is used to eliminate problems of bad design such as anomalies, inconsistencies, and redundancy.

Types of Decomposition

Lossless Decomposition
o If no information is lost from the relation that is decomposed, the decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed relations yields the same relation that was decomposed.
o A decomposition is lossless if the natural join of all the decomposed relations gives the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai
33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:

Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is a lossless-join decomposition.
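The join above can be reproduced with a short sketch. For contrast, it also tries a second decomposition that shares only EMP_CITY between the two tables; that alternative is a hypothetical addition for illustration and is not part of the example above. Because two employees live in Mumbai, joining on the city produces spurious rows.

```python
# (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY, DEPT_ID, DEPT_NAME)
employee_department = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]

employee = [(e, n, a, c) for e, n, a, c, _, _ in employee_department]
department = [(d, e, dn) for e, _, _, _, d, dn in employee_department]

# Natural join on the common column EMP_ID.
joined = [
    (e, n, a, c, d, dn)
    for e, n, a, c in employee
    for d, e2, dn in department
    if e == e2
]

# Hypothetical alternative: share EMP_CITY instead of EMP_ID.
dept_by_city = [(c, d, dn) for _, _, _, c, d, dn in employee_department]
lossy = [
    (e, n, a, c, d, dn)
    for e, n, a, c in employee
    for c2, d, dn in dept_by_city
    if c == c2
]

print(sorted(joined) == sorted(employee_department))
print(len(lossy))
```

The first join returns exactly the five original rows. The second returns more rows than the original, which is what "loss of information" means here: the extra rows make it impossible to tell which combinations were real.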

Dependency Preserving
o Dependency preservation is an important property of a decomposition.
o In a dependency-preserving decomposition, every dependency must be satisfied by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be part of R1 or R2 or
be derivable from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with functional dependency set {A → BC}. R is
decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A → BC is part of relation R1(ABC).

Multivalued Dependency
o A multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third
attribute.
o A multivalued dependency involves at least two attributes that depend on a third attribute, which is why it always
requires at least three attributes.

Example: Suppose a bike manufacturer produces each model in two colors (white and black) every
year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2011 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

Here the columns COLOR and MANUF_YEAR depend on BIKE_MODEL and are independent of each other.

In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. These
dependencies are written as follows:

1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".
DBMS - Normalization

Functional Dependency
A functional dependency (FD) is a constraint between two sets of attributes in a relation. A functional dependency
A1, A2, ..., An → B1, B2, ..., Bn says that if two tuples have the same values for attributes A1, A2, ..., An, then those
two tuples must also have the same values for attributes B1, B2, ..., Bn.
A functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally determines Y. The
left-hand side attributes determine the values of the attributes on the right-hand side.

Armstrong's Axioms
If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional dependencies
logically implied by F. Armstrong's Axioms are a set of rules that, when applied repeatedly, generate the closure of a
set of functional dependencies.
• Reflexive rule − If alpha is a set of attributes and beta is a subset of alpha, then alpha → beta holds.
• Augmentation rule − If a → b holds and y is an attribute set, then ay → by also holds. That is, adding the same attributes to
both sides of a dependency does not change the basic dependency.
• Transitivity rule − Same as the transitive rule in algebra: if a → b holds and b → c holds, then a → c also holds. Here a → b is
read as "a functionally determines b".
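In practice, F+ is rarely enumerated in full. A common shortcut is to compute the closure of a set of attributes, that is, everything those attributes determine under F, by applying the rules repeatedly until nothing new is added. The sketch below is a minimal illustration: the function name closure and the representation of FDs as pairs of sets are assumptions, and the sample dependencies reuse the attribute names Stu_ID, Stu_Name, Zip and City from the normal form examples in this section.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)          # reflexivity: attrs determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, transitivity lets
            # us add the right side as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"Stu_ID"}, {"Stu_Name", "Zip"}),
    ({"Zip"}, {"City"}),
]

stu_closure = closure({"Stu_ID"}, fds)
zip_closure = closure({"Zip"}, fds)

print(sorted(stu_closure))
print(sorted(zip_closure))
```

Since the closure of {Stu_ID} contains every attribute, Stu_ID is a super key of a relation over these four attributes, while Zip is not.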

Trivial Functional Dependency


• Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always
hold.
• Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
• Completely non-trivial − If an FD X → Y holds, where X ∩ Y = ∅, it is said to be a completely non-trivial FD.
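The three cases can be told apart with simple set operations. The function name classify_fd below is hypothetical and the attribute names are placeholders.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs by how the two attribute sets overlap."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # Y is a subset of X
    if lhs & rhs:
        return "non-trivial"             # Y not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y share no attribute


kinds = [
    classify_fd({"A", "B"}, {"A"}),
    classify_fd({"A", "B"}, {"B", "C"}),
    classify_fd({"A"}, {"B"}),
]
print(kinds)
```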

Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator.
Managing a database with anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not linked to each other properly, then it could lead to strange situations.
For example, when we try to update one data item having its copies scattered over several places, a few instances get updated
properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
• Deletion anomalies − We tried to delete a record, but parts of it were left undeleted because, unknown to us, the data was also
saved somewhere else.
• Insert anomalies − We tried to insert data in a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent state.

First Normal Form


First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a relation
must have atomic domains. The values in an atomic domain are indivisible units.

We re-arrange the relation (table) as below, to convert it to First Normal Form.


Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form


Before we learn about the second normal form, we need to understand the following −
• Prime attribute − An attribute that is part of a candidate key is known as a prime attribute.
• Non-prime attribute − An attribute that is not part of any candidate key is said to be a non-prime attribute.
In second normal form, every non-prime attribute must be fully functionally dependent on the candidate key.
That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also holds.

In the Student_Project relation, the prime attributes are Stu_ID and Proj_ID. According to the rule, the non-key
attributes Stu_Name and Proj_Name must depend on both together and not on either prime attribute
individually. But Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID
alone. This is called partial dependency, which is not allowed in Second Normal Form.
We break the relation in two so that no partial dependency remains.

Third Normal Form


For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must hold −

• No non-prime attribute is transitively dependent on a candidate key.
• For any non-trivial functional dependency X → A, either −
o X is a superkey, or
o A is a prime attribute.

In the Student_detail relation, Stu_ID is the key and the only prime attribute. City can be identified by Stu_ID as well
as by Zip. Zip is not a superkey, nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so a transitive
dependency exists.
To bring this relation into third normal form, we break it into two relations, Student_Detail and ZipCodes.
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is a stricter extension of Third Normal Form. BCNF states that −

• For any non-trivial functional dependency X → A, X must be a super-key.

After the decomposition, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation
ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
which confirms that both relations are in BCNF.

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database


BY CHAITANYA SINGH | FILED UNDER: DBMS

Normalization is the process of organizing the data in a database to avoid data redundancy and insertion,
update and deletion anomalies. Let’s discuss anomalies first; then we will discuss normal forms with
examples.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are – Insertion,
update and deletion anomaly. Let’s take an example to understand this.

Example: Suppose a manufacturing company stores the employee details in a table named employee that has
four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for
storing employee’s address and emp_dept for storing the department details in which the employee works. At
some point of time the table looks like this:

emp_id emp_name emp_address emp_dept

101 Rick Delhi D001

101 Rick Delhi D002

123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick, as he belongs to two departments of
the company. If we want to update Rick's address, we have to update it in both rows or the data
will become inconsistent. If the correct address is updated in one row but not in the other, then
according to the database Rick has two different addresses, which is incorrect.

Insert anomaly: Suppose a new employee joins the company who is under training and not currently assigned
to any department. We would not be able to insert the data into the table if the emp_dept field does not allow
nulls.

Delete anomaly: Suppose the company closes department D890 at some point. Deleting the rows
that have emp_dept as D890 would also delete the information of employee Maggie, since she is assigned
only to this department.

To overcome these anomalies we need to normalize the data. The next section discusses
normalization.

Normalization
Here are the most commonly used normal forms:

• First normal form(1NF)


• Second normal form(2NF)
• Third normal form(3NF)
• Boyce & Codd normal form (BCNF)

First normal form (1NF)


As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold
only atomic values.

Example: Suppose a company wants to store the names and contact details of its employees. It creates a table
that looks like this:
emp_id  emp_name  emp_address  emp_mobile
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212, 9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123, 8123450987

Two employees (Jon & Lester) have two mobile numbers each, so the company stored them in the same field, as
you can see in the table above.

This table is not in 1NF: the rule says “each attribute of a table must have atomic (single) values”, and the
emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF we should have the data like this:
emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987
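
The conversion above can be sketched in a few lines of Python. This is only an illustration: representing the multi-valued emp_mobile field as a Python list, and the helper name to_1nf, are assumptions made for the example and not part of any DBMS.

```python
# Un-normalized rows: emp_mobile holds a list of numbers (a multi-valued field).
# The list representation is an assumption made for this sketch.
employees = [
    (101, "Herschel", "New Delhi", ["8912312390"]),
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (103, "Ron", "Chennai", ["7778881212"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]


def to_1nf(rows):
    """Emit one row per (employee, mobile number) so that every field is atomic."""
    return [
        (emp_id, name, address, mobile)
        for emp_id, name, address, mobiles in rows
        for mobile in mobiles
    ]


emp_1nf = to_1nf(employees)
for row in emp_1nf:
    print(row)
```

Note that emp_id no longer identifies a row on its own in the result; a row is identified by the pair (emp_id, emp_mobile).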

Second normal form (2NF)


A table is said to be in 2NF if both of the following conditions hold:
• Table is in 1NF (First normal form)
• No non-prime attribute is dependent on a proper subset of any candidate key of the table.

An attribute that is not part of any candidate key is known as a non-prime attribute.

Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can
teach more than one subject, the table can have multiple rows for the same teacher. They create a table that
looks like this:

teacher_id subject teacher_age

111 Maths 38

111 Physics 38

222 Biology 38

333 Physics 40

333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime
attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This
violates the rule for 2NF, which says “no non-prime attribute is dependent on a proper subset of any candidate
key of the table”.

To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:

teacher_id teacher_age

111 38

222 38

333 40

teacher_subject table:
teacher_id subject

111 Maths

111 Physics

222 Biology

333 Physics

333 Chemistry

Now the tables comply with Second normal form (2NF).
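
As a rough check of this decomposition, the Python sketch below projects the original table onto the two new tables and joins them back on teacher_id. The helper names project and natural_join are hypothetical and written only for this example. If the join gives back exactly the original rows, no information was lost by the split.

```python
# Original (1NF) table from the example, one dict per row.
teacher = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Biology", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 40},
    {"teacher_id": 333, "subject": "Chemistry", "teacher_age": 40},
]


def project(rows, attrs):
    """Keep only the given attributes and drop duplicate rows (order preserved)."""
    result = []
    for row in rows:
        item = {a: row[a] for a in attrs}
        if item not in result:
            result.append(item)
    return result


def natural_join(left, right, key):
    """Join two row lists on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


teacher_details = project(teacher, ["teacher_id", "teacher_age"])
teacher_subject = project(teacher, ["teacher_id", "subject"])
rejoined = natural_join(teacher_subject, teacher_details, "teacher_id")

print(len(teacher_details), "rows in teacher_details")
print(len(teacher_subject), "rows in teacher_subject")
print("join restores the original rows:",
      len(rejoined) == len(teacher) and all(r in teacher for r in rejoined))
```

Each teacher's age is now stored once in teacher_details instead of once per subject.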

Third Normal form (3NF)


A table design is said to be in 3NF if both of the following conditions hold:

• Table must be in 2NF
• Transitive functional dependency of a non-prime attribute on any super key should be removed.

An attribute that is not part of any candidate key is known as a non-prime attribute.

In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for each functional
dependency X -> Y at least one of the following conditions holds:

• X is a super key of the table
• Y is a prime attribute of the table

An attribute that is a part of one of the candidate keys is known as a prime attribute.

Example: Suppose a company wants to store the complete address of each employee. They create a table
named employee_details that looks like this:

emp_id emp_name emp_zip emp_state emp_city emp_district

1001 John 282005 UP Agra Dayal Bagh

1002 Ajeet 222008 TN Chennai M-City

1006 Lora 282007 TN Chennai Urrapakkam


1101 Lilly 292008 UK Pauri Bhagwan

1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip} and so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any candidate key.

Here, emp_state, emp_city and emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That
makes the non-prime attributes (emp_state, emp_city and emp_district) transitively dependent on the super key
(emp_id). This violates the rule of 3NF.

To make this table comply with 3NF we have to break the table into two tables to remove the transitive
dependency:

employee table:

emp_id emp_name emp_zip

1001 John 282005


1002 Ajeet 222008

1006 Lora 282007

1101 Lilly 292008

1201 Steve 222999

employee_zip table:

emp_zip emp_state emp_city emp_district

282005 UP Agra Dayal Bagh

222008 TN Chennai M-City


282007 TN Chennai Urrapakkam

292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan
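
The two tables can be read as two lookups that are chained: emp_id gives emp_zip, and emp_zip gives the rest of the address. The Python sketch below models each table as a dictionary keyed by its candidate key. The dictionary layout and the function name full_address are assumptions made for illustration.

```python
# employee table: emp_id -> (emp_name, emp_zip)
employee = {
    1001: ("John", "282005"),
    1002: ("Ajeet", "222008"),
    1006: ("Lora", "282007"),
    1101: ("Lilly", "292008"),
    1201: ("Steve", "222999"),
}

# employee_zip table: emp_zip -> (emp_state, emp_city, emp_district)
employee_zip = {
    "282005": ("UP", "Agra", "Dayal Bagh"),
    "222008": ("TN", "Chennai", "M-City"),
    "282007": ("TN", "Chennai", "Urrapakkam"),
    "292008": ("UK", "Pauri", "Bhagwan"),
    "222999": ("MP", "Gwalior", "Ratan"),
}


def full_address(emp_id):
    """Rebuild one employee_details row by following emp_id -> emp_zip -> address."""
    emp_name, emp_zip = employee[emp_id]
    emp_state, emp_city, emp_district = employee_zip[emp_zip]
    return {
        "emp_id": emp_id,
        "emp_name": emp_name,
        "emp_zip": emp_zip,
        "emp_state": emp_state,
        "emp_city": emp_city,
        "emp_district": emp_district,
    }


print(full_address(1006))
```

Because the address details live only in employee_zip, correcting the city or district for a zip code means changing one entry, no matter how many employees share that zip code.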

Boyce Codd normal form (BCNF)


It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A
table complies with BCNF if it is in 3NF and, for every non-trivial functional dependency X -> Y, X is a super
key of the table.

Example: Suppose there is a company wherein employees work in more than one department. They store the
data like this:

emp_id emp_nationality emp_dept dept_type dept_no_of_emp

1001 Austrian Production and planning D001 200


1001 Austrian stores D001 250

1002 American design and technical support D134 100

1002 American Purchasing department D134 600

Functional dependencies in the table above:


emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF because neither emp_id nor emp_dept alone is a super key, yet each one appears on the
left side of a functional dependency.

To make the table comply with BCNF we can break the table into three tables like this:
emp_nationality table:

emp_id emp_nationality

1001 Austrian
1002 American

emp_dept table:

emp_dept dept_type dept_no_of_emp

Production and planning D001 200

stores D001 250

design and technical support D134 100

Purchasing department D134 600

emp_dept_mapping table:

emp_id emp_dept
1001 Production and planning

1001 stores

1002 design and technical support

1002 Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

The tables are now in BCNF because in both functional dependencies the left side is a key of the table that
contains it.
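
The BCNF condition can also be checked mechanically from the functional dependencies. The Python sketch below computes the closure of a set of attributes and reports every dependency whose left side is not a super key. The function names closure and bcnf_violations are hypothetical. As a simplification, a dependency is applied to a table only when all of its attributes belong to that table, which is enough for this example but is not a full projection of dependencies.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List non-trivial dependencies inside schema whose left side is not a super key."""
    bad = []
    for lhs, rhs in fds:
        applies = lhs <= schema and rhs <= schema and not rhs <= lhs
        if applies and not closure(lhs, fds) >= schema:
            bad.append((lhs, rhs))
    return bad


# Functional dependencies from the example, as (left side, right side) pairs.
fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]

original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
decomposed = [
    {"emp_id", "emp_nationality"},                 # emp_nationality table
    {"emp_dept", "dept_type", "dept_no_of_emp"},   # emp_dept table
    {"emp_id", "emp_dept"},                        # emp_dept_mapping table
]

print("violations in original table:", len(bcnf_violations(original, fds)))
for table in decomposed:
    print(sorted(table), "violations:", len(bcnf_violations(table, fds)))
```

The closure of {emp_id, emp_dept} covers every attribute of the original table, which agrees with the candidate key stated above.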

Functional dependency in DBMS


BY CHAITANYA SINGH | FILED UNDER: DBMS

The attributes of a table are said to be dependent on each other when an attribute of the table uniquely
identifies another attribute of the same table.
For example: Suppose we have a student table with attributes Stu_Id, Stu_Name and Stu_Age. Here the Stu_Id
attribute uniquely identifies the Stu_Name attribute of the student table, because if we know the student id we
can tell the student name associated with it. This is known as functional dependency and can be written as
Stu_Id -> Stu_Name; in words, Stu_Name is functionally dependent on Stu_Id.

Formally:
If column A of a table uniquely identifies column B of the same table, then it can be represented as A -> B
(attribute B is functionally dependent on attribute A).
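
A minimal Python sketch of this definition follows. It tests whether A -> B holds in a given set of rows by checking that each value of A is paired with only one value of B. The function name fd_holds and the three student rows are made up for illustration. A check on sample rows can disprove a dependency, but it cannot prove that the dependency holds for all data the table may ever contain.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to a single rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it later.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the student table.
students = [
    {"Stu_Id": 1, "Stu_Name": "Asha", "Stu_Age": 20},
    {"Stu_Id": 2, "Stu_Name": "Ravi", "Stu_Age": 21},
    {"Stu_Id": 3, "Stu_Name": "Asha", "Stu_Age": 22},
]

print(fd_holds(students, ["Stu_Id"], ["Stu_Name"]))   # True
print(fd_holds(students, ["Stu_Name"], ["Stu_Age"]))  # False: two students named Asha differ in age
```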

Types of Functional Dependencies


• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued dependency
• Transitive dependency
