KEMBAR78
Database Management System | PDF | Relational Database | Sql
0% found this document useful (0 votes)
13 views122 pages

Database Management System

RDBMS (Relational Database Management System) is a type of database management system that organizes data into tables, each identified by a unique primary key. It allows for easy data access and manipulation through structured queries, supporting various key types such as primary, foreign, and candidate keys to maintain data integrity. Key operations in RDBMS include selection, projection, and union, which facilitate data retrieval and management.

Uploaded by

vivekrajanand349
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views122 pages

Database Management System

RDBMS (Relational Database Management System) is a type of database management system that organizes data into tables, each identified by a unique primary key. It allows for easy data access and manipulation through structured queries, supporting various key types such as primary, foreign, and candidate keys to maintain data integrity. Key operations in RDBMS include selection, projection, and union, which facilitate data retrieval and management.

Uploaded by

vivekrajanand349
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 122

UNIT-II

RDBMS (Relational Database Management System)


RDBMS stands for Relational Database Management System.

All modern database management systems like SQL, MS SQL Server, IBM DB2, ORACLE, My-SQL, and
Microsoft Access are based on RDBMS.

It is called Relational Database Management System (RDBMS) because it is based on the relational model
introduced by E.F. Codd.

How it works
Data is represented in terms of tuples (rows) in RDBMS.

A relational database is the most commonly used database. It contains several tables, and each table has its
primary key.

Due to a collection of an organized set of tables, data can be accessed easily in RDBMS.

Table/Relation
Everything in a relational database is stored in the form of relations. The RDBMS database uses tables to store
data. A table is a collection of related data entries and contains rows and columns to store data. Each table
represents some real-world objects such as person, place, or event about which information is collected. The
organized collection of data into a relational table is known as the logical view of the database.

Properties of a Relation:

o Each relation has a unique name by which it is identified in the database.


o Relation does not contain duplicate tuples.
o The tuples of a relation have no specific order.
o All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one value.

A table is the simplest example of data stored in RDBMS.

Example
ID Name AGE COURSE

1 Ajeet 24 B.Tech

2 aryan 20 C.A

3 Mahesh 21 BCA

4 Ratan 22 MCA

5 Vimal 26 BSC
row or record
A row of a table is also called a record or tuple. It contains the specific information of each entry in the table. It
is a horizontal entity in the table. For example, The above table contains 5 records.

Properties of a row:

o No two tuples are identical to each other in all their entries.


o All tuples of the relation have the same format and the same number of entries.
o The order of the tuple is irrelevant. They are identified by their content, not by their position.

Let's see one record/row in the table.

ID Name AGE COURSE

1 Ajeet 24 B.Tech

Column/attribute
A column is a vertical entity in the table which contains all information associated with a specific field in a
table. For example, "name" is a column in the above table which contains all information about a student's
name.

Properties of an Attribute:

o Every attribute of a relation must have a name.


o Null values are permitted for the attributes.
o Default values can be specified for an attribute automatically inserted if no other value is specified for
an attribute.
o Attributes that uniquely identify each tuple of a relation are the primary key.

Name

Ajeet

Aryan

Mahesh

Ratan

Vimal

Data item/Cells
The smallest unit of data in the table is the individual data item. It is stored at the intersection of tuples and
attributes.
Properties of data items:

o Data items are atomic.


o The data items for an attribute should be drawn from the same domain.

In the below example, the data item in the student table consists of Ajeet, 24 and Btech, etc.

ID Name AGE COURSE

1 Ajeet 24 B.Tech

Degree:

The total number of attributes that comprise a relation is known as the degree of the table.

For example, the student table has 4 attributes, and its degree is 4.

ID Name AGE COURSE

1 Ajeet 24 B.Tech

2 aryan 20 C.A

3 Mahesh 21 BCA

4 Ratan 22 MCA

5 Vimal 26 BSC

Cardinality:

The total number of tuples at any one time in a relation is known as the table's cardinality. The relation whose
cardinality is 0 is called an empty table.

For example, the student table has 5 rows, and its cardinality is 5.

ID Name AGE COURSE

1 Ajeet 24 B.Tech
2 aryan 20 C.A

3 Mahesh 21 BCA

4 Ratan 22 MCA

5 Vimal 26 BSC

Domain:

The domain refers to the possible values each attribute can contain. It can be specified using standard data types
such as integers, floating numbers, etc. For example, An attribute entitled Marital_Status may be limited to
married or unmarried values.

NULL Values

The NULL value of the table specifies that the field has been left blank during record creation. It is different
from the value filled with zero or a field that contains space.

Data Integrity

There are the following categories of data integrity exist with each RDBMS:

Entity integrity: It specifies that there should be no duplicate rows in a table.

Keys in DBMS
Here are some reasons for using sql key in the DBMS system.

• Keys help you to identify any row of data in a table. In a real-world application, a table
could contain thousands of records. Moreover, the records could be duplicated. Keys in
RDBMS ensure that you can uniquely identify a table record despite these challenges.
• Allows you to establish a relationship between and identify the relation between tables
• Help you to enforce identity and integrity in the relationship.

Types of Keys in DBMS (Database Management System)


There are mainly Eight different types of Keys in DBMS and each key has it’s different
functionality:

1. Super Key
2. Primary Key
3. Candidate Key
4. Alternate Key
5. Foreign Key
6. Compound Key
7. Composite Key
8. Surrogate Key
Let’s look at each of the keys in DBMS with example:

• Super Key – A super key is a group of single or multiple keys which identifies rows in a
table.
• Primary Key – is a column or group of columns in a table that uniquely identify every
row in that table.
• Candidate Key – is a set of attributes that uniquely identify tuples in a table. Candidate
Key is a super key with no repeated attributes.
• Alternate Key – is a column or group of columns in a table that uniquely identify every
row in that table.
• Foreign Key – is a column that creates a relationship between two tables. The purpose of
Foreign keys is to maintain data integrity and allow navigation between two different
instances of an entity.
• Compound Key – has two or more attributes that allow you to uniquely recognize a
specific record. It is possible that each column may not be unique by itself within the
database.
• Composite Key – is a combination of two or more columns that uniquely identify rows in
a table. The combination of columns guarantees uniqueness, though individual
uniqueness is not guaranteed.
• Surrogate Key – An artificial key which aims to uniquely identify each record is called a
surrogate key. These kind of key are unique because they are created when you don’t
have any natural primary key.

Super key
A superkey is a group of single or multiple keys which identifies rows in a table. A Super key
may have additional attributes that are not needed for unique identification.

Example:

EmpSSN EmpNum Empname

9812345098 AB05 Shown

9876512345 AB06 Roslyn

199937890 AB07 James

In the above-given example, EmpSSN and EmpNum name are superkeys.

Primary Key
PRIMARY KEY in DBMS is a column or group of columns in a table that uniquely identify
every row in that table. The Primary Key can’t be a duplicate meaning the same value can’t
appear more than once in the table. A table cannot have more than one primary key.
Rules for defining Primary key:
• Two rows can’t have the same primary key value
• It must for every row to have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column can never be modified or updated if any foreign key
refers to that primary key.

Example:

In the following example, <code>StudID</code> is a Primary Key.

StudID Roll No First Name LastName Email

1 11 Tom Price abc@gmail.com

2 12 Nick Wright xyz@gmail.com

3 13 Dana Natan mno@yahoo.com

Alternate key
ALTERNATE KEYS is a column or group of columns in a table that uniquely identify every
row in that table. A table can have multiple choices for a primary key but only one can be set as
the primary key. All the keys which are not primary key are called an Alternate Key.
Example:

In this table, StudID, Roll No, Email are qualified to become a primary key. But since StudID is
the primary key, Roll No, Email becomes the alternative key.

StudID Roll No First Name LastName Email

1 11 Tom Price abc@gmail.com

2 12 Nick Wright xyz@gmail.com

3 13 Dana Natan mno@yahoo.com

Candidate Key
CANDIDATE KEY in SQL is a set of attributes that uniquely identify tuples in a table.
Candidate Key is a super key with no repeated attributes. The Primary key should be selected
from the candidate keys. Every table must have at least a single candidate key. A table can have
multiple candidate keys but only a single primary key.
Properties of Candidate key:

• It must contain unique values


• Candidate key in SQL may have multiple attributes
• Must not contain null values
• It should contain minimum fields to ensure uniqueness
• Uniquely identify each record in a table

Candidate key Example: In the given table Stud ID, Roll No, and email are candidate keys which
help us to uniquely identify the student record in the table.
StudID Roll No First Name LastName Email

1 11 Tom Price abc@gmail.com

2 12 Nick Wright xyz@gmail.com

3 13 Dana Natan mno@yahoo.com

Candidate Key in DBMS

Foreign key
FOREIGN KEY is a column that creates a relationship between two tables. The purpose of
Foreign keys is to maintain data integrity and allow navigation between two different instances of
an entity. It acts as a cross-reference between two tables as it references the primary key of
another table.
Example:

DeptCode DeptName

001 Science

002 English

005 Computer

Teacher ID Fname Lname

B002 David Warner

B017 Sara Joseph

B009 Mike Brunton

In this key in dbms example, we have two table, teach and department in a school. However,
there is no way to see which search work in which department.
In this table, adding the foreign key in Deptcode to the Teacher name, we can create a
relationship between the two tables.

Teacher ID DeptCode Fname Lname

B002 002 David Warner

B017 002 Sara Joseph

B009 001 Mike Brunton

This concept is also known as Referential Integrity

COMPOUND KEY
It has two or more attributes that allow you to uniquely recognize a specific record. It is possible
that each column may not be unique by itself within the database. However, when combined with
the other column or columns the combination of composite keys become unique. The purpose of
the compound key in database is to uniquely identify each record in the table.
Example:

OrderNo PorductID Product Name Quantity

B005 JAP102459 Mouse 5

B005 DKT321573 USB 10

B005 OMG446789 LCD Monitor 20

B004 DKT321573 USB 15

B002 OMG446789 Laser Printer 3

In this example, OrderNo and ProductID can’t be a primary key as it does not uniquely identify a
record. However, a compound key of Order ID and Product ID could be used as it uniquely
identified each record.

COMPOSITE KEY is a combination of two or more columns that uniquely identify rows in a
table. The combination of columns guarantees uniqueness, though individually uniqueness is not
guaranteed. Hence, they are combined to uniquely identify records in a table.
The difference between compound and the composite key is that any part of the compound key
can be a foreign key, but the composite key may or maybe not a part of the foreign key.

Referential Integrity constraint


A referential integrity constraint is also known as foreign key constraint. A foreign key is a key whose values
are derived from the Primary key of another table.

The table from which the values are derived is known as Master or Referenced Table and the Table in which
values are inserted accordingly is known as Child or Referencing Table, In other words, we can say that the
table containing the foreign key is called the child table, and the table containing the Primary key/candidate
key is called the referenced or parent table. When we talk about the database relational model, the candidate
key can be defined as a set of attribute which can have zero or more attributes.
The syntax of the Master Table or Referenced table is:

CREATE TABLE Student (Roll int PRIMARY KEY, Name varchar(25) , Course varchar(10) );

Here column Roll is acting as Primary Key, which will help in deriving the value of foreign key in the child
table.

61M
1.2K
Exception Handling in Java - Javatpoint

The syntax of Child Table or Referencing table is:

1. CREATE TABLE Subject (Roll int references Student, SubCode int, SubName varchar(10) );

In the above table, column Roll is acting as Foreign Key, whose values are derived using the Roll value of
Primary key from Master table

Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of the
query. It uses operators to perform queries.

Types of Relational operation

1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

1. Notation: σ p(r)

Where:

σ is used for selection prediction


r is used for relation
p is used as a propositional logic formula which may use connectors like: AND OR and NOT. These relational
can use as relational operators like =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300

Input:

1. σ BRANCH_NAME="perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300


2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result. Rest of the
attributes are eliminated from the table.
o It is denoted by ∏.

1. Notation: ∏ A1, A2, An (r)

Where

A1, A2, A3 is used as an attribute name of relation r.

Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:

1. ∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY

Jones Harrison

Smith Rye

Hays Harrison
Curry Rye

Johnson Brooklyn

Brooks Brooklyn

3. Union Operation:
o Suppose there are two tuples R and S. The union operation contains all the tuples that are either in R or
S or both in R & S.
o It eliminates the duplicate tuples. It is denoted by ∪.

1. Notation: R ∪ S

A union operation must hold the following condition:

o R and S must have the attribute of the same number.


o Duplicate tuples are eliminated automatically.

Example:

DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION
CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:

1. ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry
Williams

Mayes

4. Set Intersection:
o Suppose there are two tuples R and S. The set intersection operation contains all tuples that are in both
R & S.
o It is denoted by intersection ∩.

1. Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

1. ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Smith

Jones

5. Set Difference:
o Suppose there are two tuples R and S. The set intersection operation contains all tuples that are in R but
not in S.
o It is denoted by intersection minus (-).

1. Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

1. ∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Jackson

Hayes

Willians

Curry

6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the other table. It is
also known as a cross product.
o It is denoted by X.

1. Notation: E X D

Example:

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input:

1. EMPLOYEE X DEPARTMENT
Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal

7. Rename Operation:

The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to STUDENT1.

1. ρ(STUDENT1, STUDENT)

SQL
o SQL stands for Structured Query Language. It is used for storing and managing data in relational
database management system (RDMS).
o It is a standard language for Relational Database System. It enables a user to create, read, update and
delete relational databases and tables.
o All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server use SQL as their standard
database language.
o SQL allows users to query the database in a number of ways, using English-like statements.
Rules:

SQL follows the following rules:

o Structure query language is not case sensitive. Generally, keywords of SQL are written in uppercase.
o Statements of SQL are dependent on text lines. We can use a single SQL statement on one or multiple
text line.
o Using the SQL statements, you can perform most of the actions in a database.
o SQL depends on tuple relational calculus and relational algebra.

SQL process:
o When an SQL command is executing for any RDBMS, then the system figure out the best way to carry
out the request and the SQL engine determines that how to interpret the task.
o In the process, various components are included. These components can be optimization Engine, Query
engine, Query dispatcher, classic, etc.
o All the non-SQL queries are handled by the classic query engine, but SQL query engine won't handle
logical files.

Characteristics of SQL
o SQL is easy to learn.
o SQL is used to access data from relational database management systems.
o SQL can execute queries against the database.
o SQL is used to describe the data.
o SQL is used to define the data in the database and manipulate it when needed.
o SQL is used to create and drop the database and table.
o SQL is used to create a view, stored procedure, function in a database.
o SQL allows users to set permissions on tables, procedures, and views.
Advantages of SQL
There are the following advantages of SQL:

High speed

Using the SQL queries, the user can quickly and efficiently retrieve a large amount of records from a database.

No coding needed

In the standard SQL, it is very easy to manage the database system. It doesn't require a substantial amount of
code to manage the database system.

Well defined standards

Long established are used by the SQL databases that are being used by ISO and ANSI.

Portability

SQL can be used in laptop, PCs, server and even some mobile phones.

Interactive language

SQL is a domain language used to communicate with the database. It is also used to receive answers to the
complex questions in seconds.

Multiple data view

Using the SQL language, the users can make different views of the database structure.

SQL Datatype
o SQL Datatype is used to define the values that a column can contain.
o Every column is required to have a name and data type in the database table.

Datatype of SQL:
1. Binary Datatypes

There are Three types of binary Datatypes which are given below:

Data Type Description

binary It has a maximum length of 8000 bytes. It contains fixed-length binary data.

varbinary It has a maximum length of 8000 bytes. It contains variable-length binary data.

image It has a maximum length of 2,147,483,647 bytes. It contains variable-length binary data.

2. Approximate Numeric Datatype :

The subtypes are given below:

Data type From To Description

float -1.79E + 308 1.79E + 308 It is used to specify a floating-point value e.g. 6.2, 2.9 etc.

real -3.40e + 38 3.40E + 38 It specifies a single precision floating point number

3. Exact Numeric Datatype

The subtypes are given below:

Data type Description

int It is used to specify an integer value.

smallint It is used to specify small integer value.

bit It has the number of bits to store.

decimal It specifies a numeric value that can have a decimal number.

numeric It is used to specify a numeric value.


4. Character String Datatype

The subtypes are given below:

Play Video
x

Data Description
type

char It has a maximum length of 8000 characters. It contains Fixed-length non-unicode characters.

varchar It has a maximum length of 8000 characters. It contains variable-length non-unicode characters.

text It has a maximum length of 2,147,483,647 characters. It contains variable-length non-unicode


characters.

5. Date and time Datatypes

The subtypes are given below:

Datatype Description

date It is used to store the year, month, and days value.

time It is used to store the hour, minute, and second values.

timestamp It stores the year, month, day, hour, minute, and the second value.

SQL Commands
o SQL commands are instructions. It is used to communicate with the database. It is also used to perform
specific tasks, functions, and queries of data.
o SQL can perform various tasks like create a table, add data to tables, drop the table, modify the table,
set permission for users.

Types of SQL Commands


There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
1. Data Definition Language (DDL)
o DDL changes the structure of the table like creating a table, deleting a table, altering a table, etc.
o All the command of DDL are auto-committed that means it permanently save all the changes in the
database.

Here are some commands that come under DDL:

o CREATE
o ALTER
o DROP
o TRUNCATE

a. CREATE It is used to create a new table in the database.

Syntax

CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);

Example:

1. CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);

b. DROP: It is used to delete both the structure and record stored in the table.

Syntax
1. DROP TABLE table_name;

Example

1. DROP TABLE EMPLOYEE;

c. ALTER: It is used to alter the structure of the database. This change could be either to modify the
characteristics of an existing attribute or probably to add a new attribute.

Syntax:

To add a new column in the table

1. ALTER TABLE table_name ADD column_name COLUMN-definition;

To modify existing column in the table:

1. ALTER TABLE table_name MODIFY(column_definitions....);

EXAMPLE

1. ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));


2. ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));

d. TRUNCATE: It is used to delete all the rows from the table and free the space containing the table.

Syntax:

1. TRUNCATE TABLE table_name;

Example:

1. TRUNCATE TABLE EMPLOYEE;

2. Data Manipulation Language


o DML commands are used to modify the database. It is responsible for all form of changes in the
database.
o The command of DML is not auto-committed that means it can't permanently save all the changes in
the database. They can be rollback.

Here are some commands that come under DML:

o INSERT
o UPDATE
o DELETE

a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the row of a table.

Syntax:
1. INSERT INTO TABLE_NAME (col1, col2, col3,.... col N) VALUES (value1, value2, value3, .... valueN);

Or

1. INSERT INTO TABLE_NAME VALUES (value1, value2, value3, .... valueN);

For example:

1. INSERT INTO javatpoint (Author, Subject) VALUES ("Sonoo", "DBMS");

b. UPDATE: This command is used to update or modify the value of a column in the table.

Syntax:

1. UPDATE table_name SET [column_name1= value1,...column_nameN = valueN] [WHERE CONDITION]

For example:

1. UPDATE students SET User_Name = 'Sonoo' WHERE Student_Id = '3'

c. DELETE: It is used to remove one or more row from a table.

Syntax:

1. DELETE FROM table_name [WHERE condition];

For example:

1. DELETE FROM javatpoint WHERE Author="Sonoo";

3. Data Control Language

DCL commands are used to grant and take back authority from any database user.

Here are some commands that come under DCL:

o Grant
o Revoke

a. Grant: It is used to give user access privileges to a database.

Example

1. GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;

b. Revoke: It is used to take back permissions from the user.

Example

1. REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;


4. Transaction Control Language

TCL commands can only use with DML commands like INSERT, DELETE and UPDATE only.

These operations are automatically committed in the database that's why they cannot be used while creating
tables or dropping them.

Here are some commands that come under TCL:

o COMMIT
o ROLLBACK
o SAVEPOINT

a. Commit: Commit command is used to save all the transactions to the database.

Syntax:

1. COMMIT;

Example:

1. DELETE FROM CUSTOMERS WHERE AGE = 25;


2. COMMIT;

b. Rollback: Rollback command is used to undo transactions that have not already been saved to the database.

Syntax:

1. ROLLBACK;

Example:

1. DELETE FROM CUSTOMERS WHERE AGE = 25;


2. ROLLBACK;

c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling back the entire
transaction.

Syntax:

1. SAVEPOINT SAVEPOINT_NAME;

5. Data Query Language

DQL is used to fetch the data from the database.

It uses only one command:

o SELECT
a. SELECT: This is the same as the projection operation of relational algebra. It is used to select the attribute
based on the condition described by WHERE clause.

Syntax:

1. SELECT expressions FROM TABLES WHERE conditions;

For example:

1. SELECT emp_name FROM employee WHERE age > 20;

SQL Operator
There are various types of SQL operator:

SQL Arithmetic Operators


Let's assume 'variable a' and 'variable b'. Here, 'a' contains 20 and 'b' contains 10.

Operator Description Example

+ It adds the value of both operands. a+b will give 30

- It is used to subtract the right-hand operand from the left-hand operand. a-b will give 10

* It is used to multiply the value of both operands. a*b will give


200

/ It is used to divide the left-hand operand by the right-hand operand. a/b will give 2

% It is used to divide the left-hand operand by the right-hand operand and returns a%b will give 0
reminder.

SQL Comparison Operators:


Let's assume 'variable a' and 'variable b'. Here, 'a' contains 20 and 'b' contains 10.

8M

914

HTML Tutorial

Operator Description Example

= It checks if two operands values are equal or not, if the values are queal then (a=b) is not
condition becomes true. true

!= It checks if two operands values are equal or not, if values are not equal, then (a!=b) is true
condition becomes true.

<> It checks if two operands values are equal or not, if values are not equal then (a<>b) is true
condition becomes true.

> It checks if the left operand value is greater than right operand value, if yes then (a>b) is not
condition becomes true. true

< It checks if the left operand value is less than right operand value, if yes then (a<b) is true
condition becomes true.

>= It checks if the left operand value is greater than or equal to the right operand (a>=b) is not
value, if yes then condition becomes true. true

<= It checks if the left operand value is less than or equal to the right operand value, (a<=b) is true
if yes then condition becomes true.

!< It checks if the left operand value is not less than the right operand value, if yes (a!=b) is not
then condition becomes true. true

!> It checks if the left operand value is not greater than the right operand value, if (a!>b) is true
yes then condition becomes true.

SQL Logical Operators


There is the list of logical operator used in SQL:
Operator Description

ALL It compares a value to all values in another value set.

AND It allows the existence of multiple conditions in an SQL statement.

ANY It compares the values in the list according to the condition.

BETWEEN It is used to search for values that are within a set of values.

IN It compares a value to that specified list value.

NOT It reverses the meaning of any logical operator.

OR It combines multiple conditions in SQL statements.

EXISTS It is used to search for the presence of a row in a specified table.

LIKE It compares a value to similar values using wildcard operator.

SQL Table
o SQL Table is a collection of data which is organized in terms of rows and columns. In DBMS, the table
is known as relation and row as a tuple.
o Table is a simple form of data storage. A table is also considered as a convenient representation of
relations.

Let's see an example of the EMPLOYEE table:

EMP_ID EMP_NAME CITY PHONE_NO

1 Kristen Washington 7289201223

2 Anna Franklin 9378282882

3 Jackson Bristol 9264783838

4 Kellan California 7254728346

5 Ashley Hawaii 9638482678


In the above table, "EMPLOYEE" is the table name, "EMP_ID", "EMP_NAME", "CITY", "PHONE_NO" are
the column names. The combination of data of multiple columns forms a row, e.g., 1, "Kristen", "Washington"
and 7289201223 are the data of one row.

Operation on Table
1. Create table
2. Drop table
3. Delete table
4. Rename table

SQL Create Table

SQL create table is used to create a table in the database. To define the table, you should define the name of the
table and also define its columns and column's data type.

Syntax

1. create table "table_name" ("column1" "data type", "column2" "data type", "column3" "data type", ...
"columnN" "data type");

Example

SQL> CREATE TABLE EMPLOYEE ( EMP_ID INT NOT NULL,

1. EMP_NAME VARCHAR (25) NOT NULL,


2. PHONE_NO INT NOT NULL,
3. ADDRESS CHAR (30),
4. PRIMARY KEY (ID)
5. );

If you create the table successfully, you can verify the table by looking at the message by the SQL server. Else
you can use DESC command as follows:

SQL> DESC EMPLOYEE;

Field Type Null Key Default Extra

EMP_ID int(11) NO PRI NULL

EMP_NAME varchar(25) NO NULL

PHONE_NO NO int(11) NULL

ADDRESS YES NULL char(30)

o 4 rows in set (0.35 sec)


Now you have an EMPLOYEE table in the database, and you can use the stored information related to the
employees.

Drop table

A SQL drop table is used to delete a table definition and all the data from a table. When this command is
executed, all the information available in the table is lost forever, so you have to very careful while using this
command.

Syntax

1. DROP TABLE "table_name";

Firstly, you need to verify the EMPLOYEE table using the following command:

1. SQL> DESC EMPLOYEE;


Field Type Null Key Default Extra

EMP_ID int(11) NO PRI NULL

EMP_NAME varchar(25) NO NULL

PHONE_NO NO int(11) NULL

ADDRESS YES NULL char(30)

o 4 rows in set (0.35 sec)

This table shows that EMPLOYEE table is available in the database, so we can drop it as follows:

1. SQL>DROP TABLE EMPLOYEE;

Now, we can check whether the table exists or not using the following command:

1. Query OK, 0 rows affected (0.01 sec)

As this shows that the table is dropped, so it doesn't display it.

SQL DELETE table

In SQL, DELETE statement is used to delete rows from a table. We can use WHERE condition to delete a
specific row from a table. If you want to delete all the records from the table, then you don't need to use the
WHERE clause.

Syntax

1. DELETE FROM table_name WHERE condition;


Example

Suppose, the EMPLOYEE table having the following records:

EMP_ID EMP_NAME CITY PHONE_NO SALARY

1 Kristen Chicago 9737287378 150000

2 Russell Austin 9262738271 200000

3 Denzel Boston 7353662627 100000

4 Angelina Denver 9232673822 600000

5 Robert Washington 9367238263 350000

6 Christian Los angels 7253847382 260000

The following query will DELETE an employee whose ID is 2.

1. SQL> DELETE FROM EMPLOYEE


2. WHERE EMP_ID = 3;

Now, the EMPLOYEE table would have the following records.

EMP_ID EMP_NAME CITY PHONE_NO SALARY

1 Kristen Chicago 9737287378 150000

2 Russell Austin 9262738271 200000

4 Angelina Denver 9232673822 600000

5 Robert Washington 9367238263 350000

6 Christian Los angels 7253847382 260000

If you don't specify the WHERE condition, it will remove all the rows from the table.

1. DELETE FROM EMPLOYEE;


Now, the EMPLOYEE table would not have any records.

SQL SELECT Statement


In SQL, the SELECT statement is used to query or retrieve data from a table in the database. The returns data is
stored in a table, and the result table is known as result-set.

Syntax

1. SELECT column1, column2, ...


2. FROM table_name;

Here, the expression is the field name of the table that you want to select data from.

Use the following syntax to select all the fields available in the table:

BeSELECT * FROM table_name;

Example:

EMPLOYEE

EMP_ID EMP_NAME CITY PHONE_NO SALARY

1 Kristen Chicago 9737287378 150000

2 Russell Austin 9262738271 200000

3 Angelina Denver 9232673822 600000

4 Robert Washington 9367238263 350000

5 Christian Los angels 7253847382 260000

To fetch the EMP_ID of all the employees, use the following query:

1. SELECT EMP_ID FROM EMPLOYEE;

Output

EMP_ID
1

To fetch the EMP_NAME and SALARY, use the following query:

1. SELECT EMP_NAME, SALARY FROM EMPLOYEE;


EMP_NAME SALARY

Kristen 150000

Russell 200000

Angelina 600000

Robert 350000

Christian 260000

To fetch all the fields from the EMPLOYEE table, use the following query:

1. SELECT * FROM EMPLOYEE

Output

EMP_ID EMP_NAME CITY PHONE_NO SALARY

1 Kristen Chicago 9737287378 150000

2 Russell Austin 9262738271 200000

3 Angelina Denver 9232673822 600000


4 Robert Washington 9367238263 350000

5 Christian Los angels 7253847382 260000

SQL INSERT Statement


The SQL INSERT statement is used to insert a single or multiple data in a table. In SQL, You can insert the data
in two ways:

1. Without specifying column name


2. By specifying column name

Sample Table

EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

1. Without specifying column name


If you want to specify all column values, you can specify or ignore the column values.

Syntax

1. INSERT INTO TABLE_NAME


2. VALUES (value1, value2, value 3, .... Value N);

Query

1. INSERT INTO EMPLOYEE VALUES (6, 'Marry', 'Canada', 600000, 48);

Output: After executing this query, the EMPLOYEE table will look like:
EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

2. By specifying column name


To insert partial column values, you must have to specify the column names.

Syntax

1. INSERT INTO TABLE_NAME


2. [(col1, col2, col3,.... col N)]
3. VALUES (value1, value2, value 3, .... Value N);

Query

1. INSERT INTO EMPLOYEE (EMP_ID, EMP_NAME, AGE) VALUES (7, 'Jack', 40);

Output: After executing this query, the table will look like:

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36


6 Marry Canada 600000 48

7 Jack null null 40

SQL Update Statement


The SQL UPDATE statement is used to modify the data that is already in the database. The condition in the
WHERE clause decides that which row is to be updated.

Syntax

UPDATE table_name

1. SET column1 = value1, column2 = value2, ...


2. WHERE condition;

Sample Table

EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Updating single record


Update the column EMP_NAME and set the value to 'Emma' in the row where SALARY is 500000.

Syntax

1. UPDATE table_name
2. SET column_name = value
3. WHERE condition;
Query

1. UPDATE EMPLOYEE
2. SET EMP_NAME = 'Emma'
3. WHERE SALARY = 500000;

Output: After executing this query, the EMPLOYEE table will look like:

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Emma Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Updating multiple records


If you want to update multiple columns, you should separate each field assigned with a comma. In the
EMPLOYEE table, update the column EMP_NAME to 'Kevin' and CITY to 'Boston' where EMP_ID is 5.

Syntax

1. UPDATE table_name
2. SET column_name = value1, column_name2 = value2
3. WHERE condition;

Query

1. UPDATE EMPLOYEE
2. SET EMP_NAME = 'Kevin', City = 'Boston'
3. WHERE EMP_ID = 5;

Output

EMP_ID EMP_NAME CITY SALARY AGE


1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Kevin Boston 200000 36

6 Marry Canada 600000 48

Without use of WHERE clause


If you want to update all row from a table, then you don't need to use the WHERE clause. In the EMPLOYEE
table, update the column EMP_NAME as 'Harry'.

Syntax

1. UPDATE table_name
2. SET column_name = value1;

Query

1. UPDATE EMPLOYEE
2. SET EMP_NAME = 'Harry';

Output

EMP_ID EMP_NAME CITY SALARY AGE

1 Harry Chicago 200000 30

2 Harry Austin 300000 26

3 Harry Denver 100000 42

4 Harry Washington 500000 29

5 Harry Los angels 200000 36

6 Harry Canada 600000 48


SQL DELETE Statement
The SQL DELETE statement is used to delete rows from a table. Generally, DELETE statement removes one or
more records form a table.

Syntax

1. DELETE FROM table_name WHERE some_condition;

Sample Table

EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Deleting Single Record


Delete the row from the table EMPLOYEE where EMP_NAME = 'Kristen'. This will delete only the fourth row.

Query

1. DELETE FROM EMPLOYEE


2. WHERE EMP_NAME = 'Kristen';

Output: After executing this query, the EMPLOYEE table will look like:

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26


3 Christian Denver 100000 42

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Deleting Multiple Record


Delete the row from the EMPLOYEE table where AGE is 30. This will delete two rows(first and third row).

Query

1. DELETE FROM EMPLOYEE WHERE AGE= 30;

Output: After executing this query, the EMPLOYEE table will look like:

EMP_ID EMP_NAME CITY SALARY AGE

2 Robert Austin 300000 26

3 Christian Denver 100000 42

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

Delete all of the records


Delete all the row from the EMPLOYEE table. After this, no records left to display. The EMPLOYEE table will
become empty.

Syntax

1. DELETE * FROM table_name;


2. or
3. DELETE FROM table_name;

Query

1. DELETE FROM EMPLOYEE;

Output: After executing this query, the EMPLOYEE table will look like:
EMP_ID EMP_NAME CITY SALARY AGE

SQL Clauses
The following are the various SQL clauses:

1. GROUP BY
o SQL GROUP BY statement is used to arrange identical data into groups. The GROUP BY statement is
used with the SQL SELECT statement.
o The GROUP BY statement follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.
o The GROUP BY statement is used with aggregation function.

Syntax

1. SELECT column
2. FROM table_name
3. WHERE conditions
4. GROUP BY column
5. ORDER BY column

Sample table:

PRODUCT_MAST

PRODUCT COMPANY QTY RATE COST

Item1 Com1 2 10 20

Item2 Com2 3 25 75

Item3 Com1 2 30 60
Item4 Com3 5 10 50

Item5 Com2 2 20 40

Item6 Cpm1 3 25 75

Item7 Com1 5 30 150

Item8 Com1 3 10 30

Item9 Com2 2 25 50

Item10 Com3 4 30 120

Example:

1. SELECT COMPANY, COUNT(*)


2. FROM PRODUCT_MAST
3. GROUP BY COMPANY;

Output:

HAVING
o HAVING clause is used to specify a search condition for a group or an aggregate.
o Having is used in a GROUP BY clause. If you are not using GROUP BY clause then you can use
HAVING function like a WHERE clause.

Syntax:

SELECT column1, column2

1. FROM table_name
2. WHERE conditions
3. GROUP BY column1, column2
4. HAVING conditions
5. ORDER BY column1, column2;

Example:

1. SELECT COMPANY, COUNT(*)


2. FROM PRODUCT_MAST
3. GROUP BY COMPANY
4. HAVING COUNT(*)>2;
Output:

The ORDER BY clause sorts the result-set in ascending or descending order.

o It sorts the records in ascending order by default. DESC keyword is used to sort the records in
descending order.

Syntax:

1. SELECT column1, column2


2. FROM table_name
3. WHERE condition
4. ORDER BY column1, column2... ASC|DESC;

Where

ASC: It is used to sort the result set in ascending order by expression.

DESC: It sorts the result set in descending order by expression.

Example: Sorting Results in Ascending Order

Table:

CUSTOMER

CUSTOMER_ID NAME ADDRESS

12 Kathrin US

23 David Bangkok

34 Alina Dubai

45 John UK

56 Harry US

Enter the following SQL statement:

1. SELECT *
2. FROM CUSTOMER
3. ORDER BY NAME;

Output:
CUSTOMER_ID NAME ADDRESS

34 Alina Dubai

23 David Bangkok

56 Harry US

45 John UK

12 Kathrin US

Example: Sorting Results in Descending Order

Using the above CUSTOMER table

1. SELECT *
2. FROM CUSTOMER
3. ORDER BY NAME DESC;

Output:

CUSTOMER_ID NAME ADDRESS

12 Kathrin US

45 John UK

56 Harry US

23 David Bangkok

34 Alina Dubai

SQL Aggregate Functions


o SQL aggregation function is used to perform the calculations on multiple rows of a single column of a
table. It returns a single value.
o It is also used to summarize the data.
Types of SQL Aggregation Function

1. COUNT FUNCTION
o COUNT function is used to Count the number of rows in a database table. It can work on both numeric
and non-numeric data types.
o COUNT function uses the COUNT(*) that returns the count of all the rows in a specified table.
COUNT(*) considers duplicate and Null.

Syntax

1. COUNT(*)
2. or
3. COUNT( [ALL|DISTINCT] expression )

Sample table:

PRODUCT_MAST

PRODUCT COMPANY QTY RATE COST

Item1 Com1 2 10 20

Item2 Com2 3 25 75

Item3 Com1 2 30 60

Item4 Com3 5 10 50

Item5 Com2 2 20 40

Item6 Cpm1 3 25 75

Item7 Com1 5 30 150


Item8 Com1 3 10 30

Item9 Com2 2 25 50

Item10 Com3 4 30 120

Example: COUNT()

eta Is Sued for Allegedly Collecting Patient Health Data Without Consent

1. SELECT COUNT(*)
2. FROM PRODUCT_MAST;

Output:

10

Example: COUNT with WHERE

1. SELECT COUNT(*)
2. FROM PRODUCT_MAST;
3. WHERE RATE>=20;

Output:

Example: COUNT() with DISTINCT

1. SELECT COUNT(DISTINCT COMPANY)


2. FROM PRODUCT_MAST;

Output:

Example: COUNT() with GROUP BY

1. SELECT COMPANY, COUNT(*)


2. FROM PRODUCT_MAST
3. GROUP BY COMPANY;

Output:

Com1 5
Com2 3
Com3 2

Example: COUNT() with HAVING


1. SELECT COMPANY, COUNT(*)
2. FROM PRODUCT_MAST
3. GROUP BY COMPANY
4. HAVING COUNT(*)>2;

Output:

Com1 5
Com2 3

2. SUM Function

Sum function is used to calculate the sum of all selected columns. It works on numeric fields only.

Syntax

1. SUM()
2. or
3. SUM( [ALL|DISTINCT] expression )

Example: SUM()

1. SELECT SUM(COST)
2. FROM PRODUCT_MAST;

Output:

670

Example: SUM() with WHERE

1. SELECT SUM(COST)
2. FROM PRODUCT_MAST
3. WHERE QTY>3;

Output:

320

Example: SUM() with GROUP BY

1. SELECT SUM(COST)
2. FROM PRODUCT_MAST
3. WHERE QTY>3
4. GROUP BY COMPANY;

Output:

Com1 150
Com2 170
Example: SUM() with HAVING

1. SELECT COMPANY, SUM(COST)


2. FROM PRODUCT_MAST
3. GROUP BY COMPANY
4. HAVING SUM(COST)>=170;

Output:

Com1 335
Com3 170

3. AVG function

The AVG function is used to calculate the average value of the numeric type. AVG function returns the average
of all non-Null values.

Syntax

1. AVG()
2. or
3. AVG( [ALL|DISTINCT] expression )

Example:

1. SELECT AVG(COST)
2. FROM PRODUCT_MAST;

Output:

67.00

4. MAX Function

MAX function is used to find the maximum value of a certain column. This function determines the largest
value of all selected values of a column.

Syntax

1. MAX()
2. or
3. MAX( [ALL|DISTINCT] expression )

Example:

1. SELECT MAX(RATE)
2. FROM PRODUCT_MAST;
30
5. MIN Function

MIN function is used to find the minimum value of a certain column. This function determines the smallest
value of all selected values of a column.

Syntax

1. MIN()
2. or
3. MIN( [ALL|DISTINCT] expression )

Example:

1. SELECT MIN(RATE)
2. FROM PRODUCT_MAST;

Output:

10

SQL JOIN
As the name shows, JOIN means to combine something. In case of SQL, JOIN means "to combine two or more
tables".

In SQL, JOIN clause is used to combine the records from two or more tables in a database.

Types of SQL JOIN


1. INNER JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. FULL JOIN

Sample Table

EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42


4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

PROJECT

PROJECT_NO EMP_ID DEPARTMENT

101 1 Testing

102 2 Development

103 3 Designing

104 4 Development

1. INNER JOIN

In SQL, INNER JOIN selects records that have matching values in both tables as long as the condition is
satisfied. It returns the combination of all rows from both the tables where the condition satisfies.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....


2. FROM table1
3. INNER JOIN table2
4. ON table1.matching_column = table2.matching_column;

Query

1. SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT


2. FROM EMPLOYEE
3. INNER JOIN PROJECT
4. ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing
Robert Development

Christian Designing

Kristen Development

2. LEFT JOIN

The SQL left join returns all the values from left table and the matching values from the right table. If there is no
matching join value, it will return NULL.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....


2. FROM table1
3. LEFT JOIN table2
4. ON table1.matching_column = table2.matching_column;

Query

1. SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT


2. FROM EMPLOYEE
3. LEFT JOIN PROJECT
4. ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

Russell NULL

Marry NULL
3. RIGHT JOIN

In SQL, RIGHT JOIN returns all the values from the values from the rows of right table and the matched values
from the left table. If there is no matching in both tables, it will return NULL.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....


2. FROM table1
3. RIGHT JOIN table2
4. ON table1.matching_column = table2.matching_column;

Query

1. SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT


2. FROM EMPLOYEE
3. RIGHT JOIN PROJECT
4. ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

4. FULL JOIN

In SQL, FULL JOIN is the result of a combination of both left and right outer join. Join tables have all the
records from both tables. It puts NULL on the place of matches not found.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....


2. FROM table1
3. FULL JOIN table2
4. ON table1.matching_column = table2.matching_column;

Query

1. SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT


2. FROM EMPLOYEE
3. FULL JOIN PROJECT
4. ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

Russell NULL

Marry NULL

Aggregation
In aggregation, the relation between two entities is treated as a single entity. In aggregation, relationship with its
corresponding entities is aggregated into a higher level entity.

For example: Center entity offers the Course entity act as a single entity in the relationship which is in a
relationship with another entity visitor. In the real world, if a visitor visits a coaching center then he will never
enquiry about the Course only or just about the Center instead he will ask the enquiry about both.
What is Data
Data is a collection of a distinct small unit of information. It can be used in a variety of forms like text, numbers,
media, bytes, etc. it can be stored in pieces of paper or electronic memory, etc.

Word 'Data' is originated from the word 'datum' that means 'single piece of information.' It is plural of the word
datum.

In computing, Data is information that can be translated into a form for efficient movement and processing. Data
is interchangeable.

What is Database
A database is an organized collection of data, so that it can be easily accessed and managed.

17.9M
232
SQL CREATE TABLE

You can organize data into tables, rows, columns, and index it to make it easier to find relevant information.

Database handlers create a database in such a way that only one set of software program provides access of data
to all the users.

The main purpose of the database is to operate a large amount of information by storing, retrieving, and
managing data.

There are many databases available like MySQL, Sybase, Oracle, MongoDB, Informix, PostgreSQL, SQL
Server, etc.

Modern databases are managed by the database management system (DBMS).

Evolution of Databases
The database has completed more than 50 years of journey of its evolution from flat-file system to relational and
objects relational systems. It has gone through several generations.

The Evolution

File-Based

1968 was the year when File-Based database were introduced. In file-based databases, data was maintained in a
flat file. Though files have many advantages, there are several limitations.

One of the major advantages is that the file system has various access methods, e.g., sequential, indexed, and
random.

It requires extensive programming in a third-generation language such as COBOL, BASIC.

Hierarchical Data Model

1968-1980 was the era of the Hierarchical Database. Prominent hierarchical database model was IBM's first
DBMS. It was called IMS (Information Management System).

In this model, files are related in a parent/child manner.

Below diagram represents Hierarchical Data Model. Small circle represents objects.

Like file system, this model also had some limitations like complex implementation, lack structural independence,
can't easily handle a many-many relationship, etc.

Network data model

Charles Bachman developed the first DBMS at Honeywell called Integrated Data Store (IDS). It was developed
in the early 1960s, but it was standardized in 1971 by the CODASYL group (Conference on Data Systems
Languages).

In this model, files are related as owners and members, like to the common network model.

Network data model identified the following components:

o Network schema (Database organization)


o Sub-schema (views of database per user)
o Data management language (procedural)

This model also had some limitations like system complexity and difficult to design and maintain.

Relational Database

1970 - Present: It is the era of Relational Database and Database Management. In 1970, the relational model was
proposed by E.F. Codd.

Relational database model has two main terminologies called instance and schema.

The instance is a table with rows or columns

Schema specifies the structure like name of the relation, type of each column and name.
This model uses some mathematical concept like set theory and predicate logic.

The first internet database application had been created in 1995.

During the era of the relational database, many more models had introduced like object-oriented model, object-
relational model, etc.

Cloud database

Cloud database facilitates you to store, manage, and retrieve their structured, unstructured data via a cloud
platform. This data is accessible over the Internet. Cloud databases are also called a database as service (DBaaS)
because they are offered as a managed service.

Some best cloud options are:

o AWS (Amazon Web Services)


o Snowflake Computing
o Oracle Database Cloud Services
o Microsoft SQL server
o Google cloud spanner

The Object-Oriented Databases


The object-oriented databases contain data in the form of object and classes. Objects are the real-world entity, and
types are the collection of objects. An object-oriented database is a combination of relational model features with
objects oriented principles. It is an alternative implementation to that of the relational model.

Object-oriented databases hold the rules of object-oriented programming. An object-oriented database


management system is a hybrid application.

The object-oriented database model contains the following properties.

Object-oriented programming properties

o Objects
o Classes
o Inheritance
o Polymorphism
o Encapsulation

Relational database properties

o Atomicity
o Consistency
o Integrity
o Durability
o Concurrency
o Query processing

Graph databases are very useful when the database contains a complex relationship and dynamic schema.

It is mostly used in supply chain management, identifying the source of IP telephony.

DBMS (Data Base Management System)

Database management System is software which is used to store and retrieve the database. For example, Oracle,
MySQL, etc.; these are some popular DBMS tools.

o DBMS provides the interface to perform the various operations like creation, deletion, modification, etc.
o DBMS allows the user to create their databases as per their requirement.
o DBMS accepts the request from the application and provides specific data through the operating system.
o DBMS contains the group of programs which acts according to the user instruction.
o It provides security to the database.

Advantage of DBMS

Controls redundancy

It stores all the data in a single database file, so it can control data redundancy.

Data sharing

An authorized user can share the data among multiple users.

Backup

It providesBackup and recovery subsystem. This recovery system creates automatic data from system failure and
restores data if required.

Multiple user interfaces

It provides a different type of user interfaces like GUI, application interfaces.

Disadvantage of DBMS

Size

It occupies large disk space and large memory to run efficiently.

Cost
DBMS requires a high-speed data processor and larger memory to run DBMS software, so it is costly.

Complexity

DBMS creates additional complexity and requirements.

RDBMS (Relational Database Management System)

The word RDBMS is termed as 'Relational Database Management System.' It is represented as a table that
contains rows and column.

RDBMS is based on the Relational model; it was introduced by E. F. Codd.

A relational database contains the following components:

o Table
o Record/ Tuple
o Field/Column name /Attribute
o Instance
o Schema
o Keys

An RDBMS is a tabular DBMS that maintains the security, integrity, accuracy, and consistency of the data.

Three schema Architecture


o The three schema architecture is also called ANSI/SPARC architecture or three-level architecture.
o This framework is used to describe the structure of a specific database system.
o The three schema architecture is also used to separate the user applications and physical database.
o The three schema architecture contains three-levels. It breaks the database down into three different
categories.

The three-schema architecture is as follows:

In the above diagram:


o It shows the DBMS architecture.
o Mapping is used to transform the request and response between various database levels of architecture.
o Mapping is not good for small DBMS because it takes more time.
o In External / Conceptual mapping, it is necessary to transform the request from external level to
conceptual schema.
o In Conceptual / Internal mapping, DBMS transform the request from the conceptual to internal level.

Objectives of Three schema Architecture


The main objective of three level architecture is to enable multiple users to access the same data with a
personalized view while storing the underlying data only once. Thus it separates the user's view from the
physical structure of the database. This separation is desirable for the following reasons:

o Different users need different views of the same data.


o The approach in which a particular user needs to see the data may change over time.
o The users of the database should not worry about the physical implementation and internal workings of
the database such as data compression and encryption techniques, hashing, optimization of the internal
structures etc.
o All users should be able to access the same data according to their requirements.
o DBA should be able to change the conceptual structure of the database without affecting the user's
o Internal structure of the database should be unaffected by changes to physical aspects of the storage.

1. Internal Level

o The internal level has an internal schema which describes the physical storage structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It is used to define that how the data will be stored in a block.
o The physical level is used to describe complex low-level data structures in detail.

2. Conceptual Level
o The conceptual schema describes the design of a database at the conceptual level. Conceptual level is
also known as logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data are to be stored in the database and also describes what
relationship exists among those data.
o In the conceptual level, internal details such as an implementation of the data structure are hidden.
o Programmers and database administrators work at this level.

3. External Level

o At the external level, a database contains several schemas that sometimes called as subschema. The
subschema is used to describe the different view of the database.
o An external schema is also known as view schema.
o Each view schema describes the database part that a particular user group is interested and hides the
remaining database from that user group.
o The view schema describes the end user interaction with database systems.

Mapping between Views


The three levels of DBMS architecture don't exist independently of each other. There must be correspondence
between the three levels i.e. how they actually correspond with each other. DBMS is responsible for
correspondence between the three types of schema. This correspondence is called Mapping.

There are basically two types of mapping in the database architecture:

o Conceptual/ Internal Mapping


o External / Conceptual Mapping

Conceptual/ Internal Mapping

The Conceptual/ Internal Mapping lies between the conceptual level and the internal level. Its role is to define
the correspondence between the records and fields of the conceptual level and files and data structures of the
internal level.

External/ Conceptual Mapping

The external/Conceptual Mapping lies between the external level and the Conceptual level. Its role is to define
the correspondence between a particular external and the conceptual view.
Data model Schema and Instance
o The data which is stored in the database at a particular moment of time is called an instance of the
database.
o The overall design of a database is called schema.
o A database schema is the skeleton structure of the database. It represents the logical view of the entire
database.
o A schema contains schema objects like table, foreign key, primary key, views, columns, data types,
stored procedure, etc.
o A database schema can be represented by using the visual diagram. That diagram shows the database
objects and relationship with each other.
o A database schema is designed by the database designers to help programmers whose software will
interact with the database. The process of database creation is called data modeling.

A schema diagram can display only some aspects of a schema like the name of record type, data type, and
constraints. Other aspects can't be specified through the schema diagram. For example, the given figure neither
show the data type of each data item nor the relationship among various files.

In the database, actual data changes quite frequently. For example, in the given figure, the database changes
whenever we add a new grade or add a student. The data at a particular moment of time is called the instance of
the database.

Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers characteristic of being able to modify the schema at one level of the database
system without altering the schema at the next higher level.

There are two types of data independence:

1. Logical Data Independence


o Logical data independence refers characteristic of being able to change the conceptual schema without
having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we do any changes in the conceptual view of the data, then the user view of the data would not be
affected.
o Logical data independence occurs at the user interface level.

2. Physical Data Independence


o Physical data independence can be defined as the capacity to change the internal schema without
having to change the conceptual schema.
o If we do any changes in the storage size of the database system server, then the Conceptual structure of
the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal levels.
o Physical data independence occurs at the logical interface level.

Components of DBMS
Hardware, Software, Data, Database Access Language, Procedures and Users all together form the components
of a DBMS.
Let us discuss the components one by one clearly.
Hardware
The hardware is the actual computer system used for keeping and accessing the database. The conventional
DBMS hardware consists of secondary storage devices such as hard disks. Databases run on the range of
machines from micro computers to mainframes.

Software
Software is the actual DBMS between the physical database and the users of the system. All the requests from
the user for accessing the database are handled by DBMS.
Data
It is an important component of the database management system. The main task of DBMS is to process the
data. Databases are used to store the data, retrieved, and updated to and from the databases.
Users
There are a number of users who can access or retrieve the data on demand using the application and the
interfaces provided by the DBMS.
The users of the database can be classified into different groups −
 Native Users
 Online Users
 Sophisticated Users
 Specialized Users
 Application Users
 DBA- Database Administrator
The components of DBMS are given below in pictorial form −

ER model
o ER model stands for an Entity-Relationship model. It is a high-level data model. This model is used to
define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and easy to design view
of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.

For example, Suppose we design a school database. In this database, the student will be an entity with attributes
like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code,
etc and there will be a relationship between them.

Component of ER Diagram
1. Entity:

An entity may be any object, class, person or place. In the ER diagram, an entity can be represented as
rectangles.

Consider an organization as an example- manager, product, employee, department etc. can be taken as an entity.

a. Weak Entity

An entity that depends on another entity called a weak entity. The weak entity doesn't contain any key attribute
of its own. The weak entity is represented by a double rectangle.

2. Attribute

The attribute is used to describe the property of an entity. Eclipse is used to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary key. The key
attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that composed of many other attributes is known as a composite attribute. The composite attribute
is represented by an ellipse, and those ellipses are connected with an ellipse.

c. Multivalued Attribute

An attribute can have more than one value. These attributes are known as a multivalued attribute. The double
oval is used to represent multivalued attribute.

For example, a student can have more than one phone number.
d. Derived Attribute

An attribute that can be derived from other attribute is known as a derived attribute. It can be represented by a
dashed ellipse.

For example, A person's age changes over time and can be derived from another attribute like Date of birth.

3. Relationship

A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent the
relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of an entity is associated with the relationship, then it is known as one to one
relationship.

For example, A female can marry to one male, and a male can marry to one female.

b. One-to-many relationship

When only one instance of the entity on the left, and more than one instance of an entity on the right associates
with the relationship then this is known as a one-to-many relationship.

For example, Scientist can invent many inventions, but the invention is done by the only specific scientist.
c. Many-to-one relationship

When more than one instance of the entity on the left, and only one instance of an entity on the right associates
with the relationship then it is known as a many-to-one relationship.

For example, Student enrols for only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left, and more than one instance of an entity on the right
associates with the relationship then it is known as a many-to-many relationship.

For example, Employee can assign by many projects and project can have many employees.

Notation of ER diagram
Database can be represented using the notations. In ER diagram, many notations are used to express the
cardinality. These notations are as follows:
ER DIAGRAM EAMPLE
Unit-III

Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically exists between the primary key a nd non-
key attribute within a table.

1. X → Y

The left side of FD is known as a determinant, the right side of the production is known as a dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if we know the Emp_Id, we can
tell that employee name associated with it.

Functional dependency can be written as:

1. Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional dependency

1. Trivial functional dependency

o A → B has trivial functional dependency if B is a subset of A.


o The following dependencies are also trivial like: A → A, B → B

Example:

1. Consider a table with two columns Employee_Id and Employee_Name.


2. {Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
3. Employee_Id is a subset of {Employee_Id, Employee_Name}.
4. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non-trivial functional dependency

o A → B has a non-trivial functional dependency if B is not a subset of A.


o When A intersection B is NULL, then A → B is called as complete non-trivial.

Example:
1. ID → Name,
2. Name → DOB

Types of dependencies
Dependencies in DBMS is a relation between two or more attributes. It has the following types in DBMS −

 Functional Dependency
 Fully-Functional Dependency
 Transitive Dependency
 Multivalued Dependency
 Partial Dependency
Let us start with Functional Dependency −

Functional Dependency
If the information stored in a table can uniquely determine another information in the same table, then it is called Function al
Dependency. Consider it as an association between two attributes of the same relation.
If P functionally determines Q, then
P -> Q
Let us see an example −
<Employee>
EmpID EmpName EmpAge
E01 Amit 28
E02 Rohit 31
In the above table, EmpName is functionally dependent on EmpID because EmpName can take only one value for the given
value of EmpID:
EmpID -> EmpName
The same is displayed below −

Fully-functionally Dependency

An attribute is fully functional dependent on another attribute, if it is Functionally Dependent on that attribute and not on any of its
proper subset.
For example, an attribute Q is fully functional dependent on another attribute P, if it is Functionally Dependent on P and not on any
of the proper subset of P.
Let us see an example −
<ProjectCost>
ProjectID ProjectCost
001 1000
002 5000
<EmployeeProject>
EmpID ProjectID Days (spent on the project)
E099 001 320
E056 002 190
The above relations states:
EmpID, ProjectID, ProjectCost -> Days
However, it is not fully functional dependent.
Whereas the subset {EmpID, ProjectID} can easily determine the {Days} spent on the project by the employee.
This summarizes and gives our fully functional dependency −
{EmpID, ProjectID} -> (Days)

Transitive Dependency
When an indirect relationship causes functional dependency it is called Transitive Dependency.
If P -> Q and Q -> R is true, then P-> R is a transitive dependency.

Multivalued Dependency

When existence of one or more rows in a table implies one or more other rows in the same table, then the Multi-valued
dependencies occur.
If a table has attributes P, Q and R, then Q and R are multi-valued facts of P.
It is represented by double arrow −
->->
For our example:
P->->QQ->->R
In the above case, Multivalued Dependency exists only if Q and R are independent attributes.

Partial Dependency

Partial Dependency occurs when a nonprime attribute is functionally dependent on part of a candidate key.
The 2nd Normal Form (2NF) eliminates the Partial Dependency. Let us see an example −
<StudentProject>
StudentID ProjectNo StudentName ProjectName
S01 199 Katie Geo Location
S02 120 Ollie Cluster Exploration
In the above table, we have partial dependency; let us see how −
The prime key attributes are StudentID and ProjectNo.
As stated, the non-prime attributes i.e. StudentName and ProjectName should be functionally dependent on part of a candidate
key, to be Partial Dependent.
The StudentName can be determined by StudentID that makes the relation Partial Dependent.
The ProjectName can be determined by ProjectID, which that the relation Partial Dependent.

Armstrong's axioms for FD's

Inference Rule (IR):


o The Armstrong's axioms are the basic inference rule.
o Armstrong's axioms are used to conclude functional dependencies on a relational database.
o The inference rule is a type of assertion. It can apply to a set of FD(functional dependency) to derive other FD.
o Using the inference rule, we can derive additional functional dependency from the initial set.

The Functional dependency has 6 types of inference rule:

1. Reflexive Rule (IR1)


In the reflexive rule, if Y is a subset of X, then X determines Y.

1. If X ⊇ Y then X → Y

Example:

1. X = {a, b, c, d, e}
2. Y = {a, b, c}

2. Augmentation Rule (IR2)


The augmentation is also called as a partial dependency. In augmentation, if X determines Y, then XZ determines YZ for any Z.
OOPs Concepts in Jav
1. If X → Y then XZ → YZ

Example:

1. For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3)


In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.

1. If X → Y and Y → Z then X → Z

4. Union Rule (IR4)


Union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.

1. If X → Y and X → Z then X → YZ

Proof:

1.X→Y(given)
2.X→Z(given)
3.X→XY(using IR 2 on 1 by augmentation with X. Where XX = X)
4.XY→YZ(using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)

5. Decomposition Rule (IR5)


Decomposition rule is also known as project rule. It is the reverse of union rule.

This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.

1. If X → YZ then X → Y and X → Z

Proof:

1.X→YZ(given)
2.YZ→Y(usingIR1 Rule)
3. X → Y (using IR 3 on 1 and 2)

6. Pseudo transitive Rule (IR6)


In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ determines W.

1. If X → Y and YZ → W then XZ → W

Proof:

1.X→Y(given)
2.WY→Z(given)
3.WX→WY(usingIR2 on1by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)

Closure of a Set of Functional Dependencies


1. We need to consider all functional dependencies that hold. Given a set F of functional dependencies, we can prove that
certain other ones also hold. We say these ones are logically implied by F.
2. Suppose we are given a relation scheme R=(A,B,C,G,H,I), and the set of functional dependencies:
3. A B
4. A C
5. CG H
6. CG I
7. B H

Then the functional dependency is logically implied.

8. To see why, let and be tuples such that

9.

As we are given A B , it follows that we must also have

Further, since we also have B H , we must also have

Thus, whenever two tuples have the same value on A, they must also have the same value on H, and we can say
that A H.

10. The closure of a set F of functional dependencies is the set of all functional dependencies logically implied by F.
11. We denote the closure of F by .
12. To compute , we can use some rules of inference called Armstrong's Axioms:

o Reflexivity rule: if is a set of attributes and , then holds.

o Augmentation rule: if holds, and is a set of attributes, then holds.

o Transitivity rule: if holds, and holds, then holds.


13. These rules are sound because they do not generate any incorrect functional dependencies. They are also complete as they
generate all of .
14. To make life easier we can use some additional rules, derivable from Armstrong's Axioms:

o Union rule: if and , then holds.

o Decomposition rule: if holds, then and both hold.

o Pseudo transitivity rule: if holds, and holds, then holds.


15. Applying these rules to the scheme and set F mentioned above, we can derive the following:
o A H, as we saw by the transitivity rule.
o CG HI by the union rule.
o AG I by several steps:
 Note that A C holds.
 Then AG CG , by the augmentation rule.
 Now by transitivity, AG I

Minimal Cover
16. A minimal cover is a simplified and reduced version of the given set of functional dependencies.
Since it is a reduced version, it is also called as Irreducible set.
It is also called as Canonical Cover.
17. Steps to Find Minimal Cover
18. 1) Split the right-hand attributes of all FDs.
Example
A->XY => A->X, A->Y
19. 2) Remove all redundant FDs.
Example
{ A->B, B->C, A->C }
Here A->C is redundant since it can already be achieved using the Transitivity Property.
20. 3) Find the Extraneous attribute and remove it.
Example
AB->C, either A or B or none can be extraneous.
If A closure contains B then B is extraneous and it can be removed.
If B closure contains A then A is extraneous and it can be removed.
21. Example 1
Minimize {A->C, AC->D, E->H, E->AD}
22. Step 1: {A->C, AC->D, E->H, E->A, E->D}
23. Step 2: {A->C, AC->D, E->H, E->A}
Here Redundant FD : {E->D}
24. Step 3: {AC->D}
{A}+ = {A,C}
Therefore C is extraneous and is removed.
{A->D}
25. Minimal Cover = {A->C, A->D, E->H, E->A}
26. Example 2
Minimize {AB->C, D->E, AB->E, E->C}
27. Step 1: {AB->C, D->E, AB->E, E->C}
28. Step 2: {D->E, AB->E, E->C}
Here Redundant FD = {AB->C}
29. Step 3: {AB->E}
{A}+ = {A}
{B}+ = {B}
There is no extraneous attribute.
30. Therefore, Minimal cover = {D->E, AB->E, E->C}

Normalization
It is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Upd ate and
Deletion Anomalies. Normalization rules divides larger tables into smaller tables and links them using relationships. The p urpose of
Normalisation in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.

Types of Anomalies
Following are the types of anomalies that make the table inconsistency, loss of integrity, and redundant data.

1. Data redundancy occurs in a relational database when two or more rows or columns have the same value or repetitive value
leading to unnecessary utilization of the memory. M

Student Table:

StudRegistration CourseID StudName Address Course

205 6204 James Los Angeles Economics

205 6247 James Los Angeles Economics

224 6247 Trent Bolt New York Mathematics

230 6204 Ritchie Rich Egypt Computer

230 6208 Ritchie Rich Egypt Accounts

There are two students in the above table, 'James' and 'Ritchie Rich', whose records are repetitive when we enter a new Cours eID.
Hence it repeats the studRegistration, StudName and address attributes.

2. Insert Anomaly: An insert anomaly occurs in the relational database when some attributes or data items are to be inserted into
the database without existence of other attributes. For example, In the Student table, if we want to insert a new courseID, we need to
wait until the student enrolled in a course. In this way, it is difficult to insert new record in the table. Hence, it is called insertion
anomalies.

3. Update Anomalies: The anomaly occurs when duplicate data is updated only in one place and not in all instances. Hence, it
makes our data or table inconsistent state. For example, suppose there is a student 'James' who belongs to Student table. If we want
to update the course in the Student, we need to update the same in the course table; otherwise, the data can be inconsistent. And it
reflects the changes in a table with updated values where some of them will not.
4. Delete Anomalies: An anomaly occurs in a database table when some records are lost or deleted from the database table due to
the deletion of other records. For example, if we want to remove Trent Bolt from the Student table, it also removes his address,
course and other details from the Student table. Therefore, we can say that deleting some attributes can remove other attribu tes of
the database table.

So, we need to avoid these types of anomalies from the tables and maintain the integrity, accuracy of the database table. Therefore,
we use the normalization concept in the database management system.

Types of Normalization

1. First Normal Form (1NF)


2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce and Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)

First Normal Form (1NF)

o A relation will be 1NF if it contains an atomic value.


o It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attribute.
o First normal form disallows the multi-valued attribute, composite attribute, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, UP
9064738238

20 Harry 8574783832 Bihar

12 Sam 7390372389, Punjab


8589830302

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

Second Normal Form (2NF)

o In the 2NF, relational must be in 1NF.


o In the second normal form, all non-key attributes are fully functional dependent on the primary key

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more
than one subject.
TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate
key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer

Third Normal Form (3NF)

o A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.

A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial function dependency X →
Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY


222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-prime
attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule of third normal
form.

That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as
a Primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

Boyce Codd normal form (BCNF)

o BCNF is the advance version of 3NF. It is stricter than 3NF.


o A table is in BCNF if every functional dependency X → Y, X is the super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, LHS is super key.

Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP-ID, EMP-DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

264 India

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

D394 283

D394 300

D283 232

D283 549

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional dependencies is a key.

Fourth normal form (4NF)

o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
o For a dependency A → B, if for a single value of A, multiple values of B exists, then the relation will be a multi -valued
dependency.

Example

STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no relationship
between COURSE and HOBBY.

In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two
hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket
59 Hockey

Fifth normal form (5NF)

o A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. In this
case, combination of all these fields required to identify a valid data.

Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave
Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we can't leave other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMSTER LECTURER
Semester 1 Anshika

Semester 1 John

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
Transaction
A transaction is an action or series of actions that are being performed by a single user or
application program, which reads or updates the contents of the database.

A transaction can be defined as a logical unit of work on the database. This may be an
entire program, a piece of a program, or a single command (like the SQL commands such
as INSERT or UPDATE), and it may engage in any number of operations on the
database. In the database context, the execution of an application program can be thought
of as one or more transactions with non-database processing taking place in between.

Transaction property
The transaction has the four properties. These are used to maintain consistency in a
database, before and after the transaction.

Example: Suppose an employee of bank transfers Rs 800 from X's account to Y's
account. This small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Operations of Transaction:
Following are the main operations of transaction:

3.1M

Large Hadron Collider Discovers 3 New Kinds of Particles


Read(X): Read operation is used to read the value of X from the database and stores it in
a buffer in main memory.

Write(X): Write operation is used to write the value back to the database from the buffer.

Let's take an example to debit transaction from an account which consists of following
operations:

1. 1. R(X);
2. 2. X = X - 500;
3. 3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain
3500.
o The third operation will write the buffer's value to the database. So X's final value
will be 3500.

But it may be possible that because of the failure of hardware, software or power, etc. that
transaction may fail before finished all the operations in the set.

For example: If in the above transaction, the debit transaction fails after executing
operation 2 then X's value will remain 4000 in the database which is not acceptable by the
bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.

Property of Transaction(ACID Property)


1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that all operations of the transaction take place at once if not, the
transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is
treated as one unit and either run to completion or is not executed at all.

Atomicity involves the following two operations:

Abort: If a transaction aborts then all the changes made are not visible.

Commit: If a transaction commits then all the changes made are visible.

Example: Let's assume that following transaction T consisting of T1 and T2. A consists
of Rs 600 and B consists of Rs 300. Transfer Rs 100 from account A to account B.

T1 T2

Read(A) Read(B)
A:= A-100 Y:= Y+100
Write(A) Write(B)

After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.


If the transaction T fails after the completion of transaction T1 but before completion of
transaction T2, then the amount will be deducted from A but not added to B. This shows
the inconsistent database state. In order to ensure correctness of database state, the
transaction must be executed in entirety.

Consistency
o The integrity constraints are maintained so that the database is consistent before
and after the transaction.
o The execution of a transaction will leave a database in either its prior stable state
or a new stable state.
o The consistent property of database states that every transaction sees a consistent
database instance.
o The transaction is used to transform the database from one consistent state to
another consistent state.

For example: The total amount must be maintained before or after the transaction.

1. Total before T occurs = 600+300=900


2. Total after T occurs= 500+400=900

Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then
inconsistency will occur.

Isolation
o It shows that the data which is used at the time of execution of a transaction cannot
be used by the second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and using the data item X, then
that data item can't be accessed by any other transaction T2 until the transaction
T1 ends.
o The concurrency control subsystem of the DBMS enforced the isolation property.

Durability
o The durability property is used to indicate the performance of the database's
consistent state. It states that the transaction made the permanent changes.
o They cannot be lost by the erroneous operation of a faulty transaction or by the
system failure. When a transaction is completed, then the database reaches a state
known as the consistent state. That consistent state cannot be lost, even in the
event of a system's failure.
o The recovery subsystem of the DBMS has the responsibility of Durability
property.

States of Transaction
In a database, the transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the
records are still not saved to the database.

Partially committed
o In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is
executed in this state.

Committed
A transaction is said to be in a committed state if it executes all its operations
successfully. In this state, all the effects are now permanently saved on the database
system.
Failed state
o If any of the checks made by the database recovery system fails, then the
transaction is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query
to fetch the marks, then the transaction will fail to execute.

Aborted
o If any of the checks fail and the transaction has reached a failed state then the
database recovery system will make sure that the database is in its previous
consistent state. If not then it will abort or roll back the transaction to bring the
database into a consistent state.
o If the transaction fails in the middle of the transaction then before executing the
transaction, all the executed transactions are rolled back to its consistent state.
o After aborting the transaction, the database recovery module will select one of the
two operations:
1. Re-start the transaction
2. Kill the transaction

Log-Based Recovery
o The log is a sequence of records. Log of each transaction is maintained in some
stable storage so that if any failure occurs, then it can be recovered from there.
o If any operation is performed on the database, then it will be recorded in the log.
o But the process of storing the logs should be done before the actual transaction is
applied in the database.
Let's assume there is a transaction to modify the City of a student. The following logs are
written for this transaction.
o When the transaction is initiated, then it writes 'start' log.

1. <Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', then another
log is written to the file.
1. < Tn, City, 'Noida', 'Bangalore' >
o When the transaction is finished, then it writes another log to indicate the end of
the transaction.

<Tn, Commit>
There are two approaches to modify the database:
1. Deferred database modification:
o The deferred modification technique occurs if the transaction does not modify the
database until it has committed.
o In this method, all the logs are created and stored in the stable storage, and the
database is updated when a transaction commits.
2. Immediate database modification:
o The Immediate modification technique occurs if database modification occurs
while the transaction is still active.
o In this technique, the database is modified immediately after every operation. It
follows an actual database modification.
Recovery using Log records
When the system is crashed, then the system consults the log to find which transactions
need to be undone and which need to be redone.
1. If the log contains the record <Ti, Start> and <Ti, Commit> or <Ti, Commit>,
then the Transaction Ti needs to be redone.
2. If log contains record<Tn, Start> but does not contain the record either <Ti,
commit> or <Ti, abort>, then the Transaction Ti needs to be undone.

Checkpoint
o The checkpoint is a type of mechanism where all the previous logs are removed
from the system and permanently stored in the storage disk.
o The checkpoint is like a bookmark. While the execution of the transaction, such
checkpoints are marked, and the transaction is executed then using the steps of the
transaction, the log files will be created.
o When it reaches to the checkpoint, then the transaction will be updated into the
database, and till that point, the entire log file will be removed from the file. Then
the log file is updated with the new step of transaction till next checkpoint and so
on.
o The checkpoint is used to declare a point before which the DBMS was in the
consistent state, and all transactions were committed.

Recovery using Checkpoint


In the following manner, a recovery system recovers the database from this failure:
o The recovery system reads log files from the end to start. It reads log files from T4
to T1.
o Recovery system maintains two lists, a redo-list, and an undo-list.
o The transaction is put into redo state if the recovery system sees a log with <Tn,
Start> and <Tn, Commit> or just <Tn, Commit>. In the redo-list and their
previous list, all the transactions are removed and then redone before saving their
logs.
o For example: In the log file, transaction T2 and T3 will have <Tn, Start> and
<Tn, Commit>. The T1 transaction will have only <Tn, commit> in the log file.
That's why the transaction is committed after the checkpoint is crossed. Hence it
puts T1, T2 and T3 transaction into redo list.
o The transaction is put into undo state if the recovery system sees a log with <Tn,
Start> but no commit or abort log found. In the undo-list, all the transactions are
undone, and their logs are removed.
o For example: Transaction T4 will have <Tn, Start>. So T4 will be put into undo
list since this transaction is not yet complete and failed amid.

DBMS Concurrency Control


Concurrency Control is the management procedure that is required for controlling
concurrent execution of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent
execution.
Concurrent Execution in DBMS
o In a multi-user system, multiple users can access and use the same database at one
time, which is known as the concurrent execution of the database. It means that the
same database is executed simultaneously on a multi-user system by different
users.
o While working on the database transactions, there occurs the requirement of using
the database by multiple users for performing different operations, and in that
case, concurrent execution of the database is performed.
o The thing is that the simultaneous execution that is performed should be done in
an interleaved manner, and no operation should affect the other executing
operations, thus maintaining the consistency of the database. Thus, on making the
concurrent execution of the transaction operations, there occur several challenging
problems that need to be solved.

Problems with Concurrent Execution


In a database transaction, the two main operations are READ and WRITE operations.
So, there is a need to manage these two operations in the concurrent execution of the
transactions as if these operations are not performed in an interleaved manner, and the
data may become inconsistent. So, the following problems occur with the Concurrent
Execution of the operations:

Problem 1: Lost Update Problems (W - W Conflict)


The problem occurs when two different database transactions perform the read/write
operations on the same database items in an interleaved manner (i.e., concurrent
execution) that makes the values of the items incorrect hence making the database
inconsistent. Play Vid

For example:

Consider the below diagram where two transactions TX and TY, are performed on the
same account A where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be
$300 only because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400 (only added
but not updated/write).
o At time t6, transaction TX writes the value of account A that will be updated as
$250 only, as TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the values of account A, so it will write
as done at time t4 that will be $400. It means the value written by TX is lost, i.e.,
$250 is lost.

Hence data becomes incorrect, and database sets to inconsistent.

Dirty Read Problems (W-R Conflict)


The dirty read problem occurs when one transaction updates an item of the database, and
somehow the transaction fails, and before the data gets rollback, the updated database
item is accessed by another transaction. There comes the Read-Write Conflict between
both transactions.

For example:

Consider two transactions TX and TY in the below diagram performing read/write


operations on account A where the available balance in account A is $300:

o At time t1, transaction TX reads the value of account A, i.e., $300.


o At time t2, transaction TX adds $50 to account A that becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A that will be read as $350.
o Then at time t5, transaction TX rollbacks due to server problem, and the value
changes back to $300 (as initially).
o But the value for account A remains $350 for transaction TY as committed, which
is the dirty read and therefore known as the Dirty Read Problem.

Unrepeatable Read Problem (W-R Conflict)


Also known as Inconsistent Retrievals Problem that occurs when in a transaction, two
different values are read for the same database item.

For example:

Consider two transactions, TX and TY, performing the read/write operations on


account A, having an available balance = $300. The diagram is shown below:

o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and
that will be read as $400.
o It means that within the same transaction TX, it reads two different values of
account A, i.e., $ 300 initially, and after updation made by transaction TY, it reads
$400. It is an unrepeatable read and is therefore known as the Unrepeatable read
problem.

Thus, in order to maintain consistency in the database and avoid such problems that take
place in concurrent execution, management is needed, and that is where the concept of
Concurrency Control comes into role.

Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing
the concurrent execution of database operations and thus avoiding the inconsistencies in
the database. Thus, for maintaining the concurrency of the database, we have the
concurrency control protocols.

Concurrency Control Protocols


The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database transactions.
Therefore, these protocols are categorized as:

o Lock Based Concurrency Control Protocol


o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
We will understand and discuss each protocol one by one in our next sections

Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a Read-only lock. In a shared lock, the data item can only read
by the transaction.
o It can be shared between the transactions because when the transaction holds a
lock, then it can't update the data on the data item.

2. Exclusive lock:

o In the exclusive lock, the data item can be both reads as well as written by the
transaction.
o This lock is exclusive, and in this lock, multiple transactions do not modify the
same data simultaneously.
There are four types of lock protocols available:
1. Simplistic lock protocol
It is the simplest way of locking the data while transaction. Simplistic lock-based
protocols allow all the transactions to get the lock on the data before insert or delete or
update on it. It will unlock the data item after completing the transaction.

2. Pre-claiming Lock Protocol


o Pre-claiming Lock Protocols evaluate the transaction to list all the data items on
which they need locks.
o Before initiating an execution of the transaction, it requests DBMS for all the lock
on all those data items.
o If all the locks are granted then this protocol allows the transaction to begin. When
the transaction is completed then it releases all the lock.
o If all the locks are not granted then this protocol allows the transaction to rolls
back and waits until all the locks are granted.

3. Two-phase locking (2PL)


o The two-phase locking protocol divides the execution phase of the transaction into
three parts.
o In the first part, when the execution of the transaction starts, it seeks permission
for the lock it requires.
o In the second part, the transaction acquires all the locks. The third phase is started
as soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only releases
the acquired locks.
There are two phases of 2PL:

Growing phase: In the growing phase, a new lock on the data item may be acquired by
the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing lock held by the transaction may be
released, but no new locks can be acquired.

In the below example, if lock conversion is allowed then the following phase can happen:

1. Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.


2. Downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.

Example:

The following way shows how unlocking and locking work with 2-PL.

Transaction T1:

o Growing phase: from step 1-3


o Shrinking phase: from step 5-7
o Lock point: at 3

Transaction T2:

o Growing phase: from step 2-6


o Shrinking phase: from step 8-9
o Lock point: at 6

4. Strict Two-phase locking (Strict-2PL)


o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all
the locks, the transaction continues to execute normally.
o The only difference between 2PL and strict 2PL is that Strict-2PL does not release
a lock after using it.
o Strict-2PL waits until the whole transaction to commit, and then it releases all the
locks at a time.
o Strict-2PL protocol does not have shrinking phase of lock release.

It does not have cascading abort as 2PL does.

Timestamp Ordering Protocol


o The Timestamp Ordering Protocol is used to order the transactions based on their
Timestamps. The order of transaction is nothing but the ascending order of the
transaction creation.
o The priority of the older transaction is higher that's why it executes first. To
determine the timestamp of the transaction, this protocol uses system time or
logical counter.
o The lock-based protocol is used to manage the order between conflicting pairs
among transactions at the execution time. But Timestamp based protocols start
working as soon as a transaction is created.
o Let's assume there are two transactions T1 and T2. Suppose the transaction T1 has
entered the system at 007 times and transaction T2 has entered the system at 009
times. T1 has the higher priority, so it executes first as it is entered the system
first.
o The timestamp ordering protocol also maintains the timestamp of last 'read' and
'write' operation on a data.

Basic Timestamp ordering protocol works as follows:

1. Check the following condition whenever a transaction Ti issues a Read (X) operation:

o If W_TS(X) >TS(Ti) then the operation is rejected.


o If W_TS(X) <= TS(Ti) then the operation is executed.
o Timestamps of all the data items are updated.

2. Check the following condition whenever a transaction Ti issues a Write(X) operation:

o If TS(Ti) < R_TS(X) then the operation is rejected.


o If TS(Ti) < W_TS(X) then the operation is rejected and Ti is rolled back otherwise
the operation is executed.

Where,

TS(TI) denotes the timestamp of the transaction Ti.

R_TS(X) denotes the Read time-stamp of data-item X.

W_TS(X) denotes the Write time-stamp of data-item X.

Advantages and Disadvantages of TO protocol:


o TO protocol ensures serializability since the precedence graph is as follows:
o TS protocol ensures freedom from deadlock that means no transaction ever waits.
o But the schedule may not be recoverable and may not even be cascade- free.

Validation Based Protocol


Validation phase is also known as optimistic concurrency control technique. In the
validation based protocol, the transaction is executed in the following three phases:

1. Read phase: In this phase, the transaction T is read and executed. It is used to
read the value of various data items and stores them in temporary local variables.
It can perform all the write operations on temporary variables without an update to
the actual database.
2. Validation phase: In this phase, the temporary variable value will be validated
against the actual data to see if it violates the serializability.
3. Write phase: If the validation of the transaction is validated, then the temporary
results are written to the database or system otherwise the transaction is rolled
back.

Here each phase has the following different timestamps:

Start(Ti): It contains the time when Ti started its execution.

Validation (Ti): It contains the time when Ti finishes its read phase and starts its
validation phase.Play Video

Finish(Ti): It contains the time when Ti finishes its write phase.

o This protocol is used to determine the time stamp for the transaction for
serialization using the time stamp of the validation phase, as it is the actual phase
which determines if the transaction will commit or rollback.
o Hence TS(T) = validation(T).
o The serializability is determined during the validation process. It can't be decided
in advance.
o While executing the transaction, it ensures a greater degree of concurrency and
also less number of conflicts.
o Thus it contains transactions which have less number of rollbacks.

UNDO/REDO recovery algorithm


Undo/Redo is a database transaction log for handling transaction crash recovery. This
algorithm stores all the values and changes made during transactions in a separate
memory in case of failure or crash. It utilizes the stored value to restore the loss due
to failure.

This algorithm is a combination of two approaches

UNDO: It stands for undone and restores the data value items that are updated by any
transaction to their previous value.

REDO: It stands for re-done and it set the value of all the data updated by the
transaction to the new value.

Now let’s understand the algorithm using a simple example

Let a transaction T perform the following set of operations on the data items X, Y,
and Z.

Read(X)
Read(Y)
Update X=X+Y
Write(X)
Commit
Update Y=Y-100
Write(Y)
Commit

Now, if, during the transaction execution, the statement “Update X+Y” suffers from
failure, then the UNDO operation will perform, and it restores the value of “X” and
then starts the transaction again.

Suppose the statement “commit” fails during the transaction execution. In that case,
the REDO operation will be performed, which again tries to execute the statement
commit and reset the new value of X.

The UNDO/REDO recovery algorithm is a very flexible algorithm but the only
disadvantage it faces is that it requires more storage to store both old as well as newly
updated values.
Indexing
Indexing is used to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed.

o The index is a type of data structure. It is used to locate and access the data in a
database table quickly.

Index structure:
Indexes can be created using some database columns.

o The first column of the database is the search key that contains a copy of the primary
key or candidate key of the table. The values of the primary key are stored in sorted
order so that the corresponding data can be accessed easily.
o The second column of the database is the data reference. It contains a set of pointers
holding the address of the disk block where the value of the particular key can be
found.

Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are
known as ordered indices.

Example: Suppose we have an employee table with thousands of record and each of which is
10 bytes long. If their IDs start with 1, 2, 3....and so on and we have to search student with
ID-543.

o In the case of a database with no index, we have to search the disk block from starting
till it reaches 543. The DBMS will read the record after reading 543*10=5430 bytes.
o In the case of an index, we will search using indexes and the DBMS will read the
record after reading 542*2= 1084 bytes which are very less compared to the previous
case.

Primary Index

o If the index is created on the basis of the primary key of the table, then it is known as
primary indexing. These primary keys are unique to each record and contain 1:1
relation between the records.
o As primary keys are stored in sorted order, the performance of the searching operation
is quite efficient.
o The primary index can be classified into two types: Dense index and Sparse index.

Dense index

o The dense index contains an index record for every search key value in the data file. It
makes searching faster.
o In this, the number of records in the index table is same as the number of records in
the main table.
o It needs more space to store index record itself. The index records have the search key
and a pointer to the actual record on the disk.
Sparse index

o In the data file, index record appears only for a few items. Each item points to a block.
o In this, instead of pointing to each record in the main table, the index points to the
records in the main table in a gap.

Clustering Index

o A clustered index can be defined as an ordered data file. Sometimes the index is
created on non-primary key columns which may not be unique for each record.
o In this case, to identify the record faster, we will group two or more columns to get
the unique value and create index out of them. This method is called a clustering
index.
o The records which have similar characteristics are grouped, and indexes are created
for these group.

Example: suppose a company contains several employees in each department. Suppose we


use a clustering index, where all employees which belong to the same Dept_ID are
considered within a single cluster, and index pointers point to the cluster as a whole. Here
Dept_Id is a non-unique key.
The previous schema is little confusing because one disk block is shared by records which
belong to the different cluster. If we use separate disk block for separate clusters, then it is
called better technique.

Secondary Index
In the sparse indexing, as the size of the table grows, the size of mapping also grows. These
mappings are usually kept in the primary memory so that address fetch should be faster. Then
the secondary memory searches the actual data based on the address got from mapping. If the
mapping size grows then fetching the address itself becomes slower. In this case, the sparse
index will not be efficient. To overcome this problem, secondary indexing is introduced.

In secondary indexing, to reduce the size of mapping, another level of indexing is introduced.
In this method, the huge range for the columns is selected initially so that the mapping size of
the first level becomes small. Then each range is further divided into smaller ranges. The
mapping of the first level is stored in the primary memory, so that address fetch is faster. The
mapping of the second level and actual data are stored in the secondary memory (hard disk).

For example:

o If you want to find the record of roll 111 in the diagram, then it will search the highest
entry which is smaller than or equal to 111 in the first level index. It will get 100 at
this level.
o Then in the second index level, again it does max (111) <= 111 and gets 110. Now
using the address 110, it goes to the data block and starts searching each record till it
gets 111.
o This is how a search is performed in this method. Inserting, updating or deleting is
also done in the same manner.

File Organization
o The File is a collection of records. Using the primary key, we can access the records.
The type and frequency of access can be determined by the type of file organization
which was used for a given set of records.
o File organization is a logical relationship among various records. This method defines
how file records are mapped onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms
of blocks, and the blocks are placed on the storage medium.
o The first approach to map the database to the file is to use the several files and store
only one fixed length record in any given file. An alternative approach is to structure
our files so that we can contain multiple lengths for records.
o Files of fixed length records are easier to implement than the files of variable length
records.

Objective of file organization


o It contains an optimal selection of records, i.e., records can be selected as fast as
possible.
o To perform insert, delete or update transaction on the records should be quick and
easy.
o The duplicate records cannot be induced as a result of insert, update or delete.
o For the minimal cost of storage, records should be stored efficiently.

Types of file organization:


File organization contains various methods. These particular methods have pros and cons on
the basis of access or selection. In the file organization, the programmer decides the best-
suited file organization method according to his requirement.

Types of file organization are as follows:

o Sequential file organization


o Heap file organization
o Hash file organization
o B+ file organization
o Indexed sequential access method (ISAM)
o Cluster file organization
Sequential File Organization
This method is the easiest method for file organization. In this method, files are stored
sequentially. This method can be implemented in two ways:

1. Pile File Method:


o It is a quite simple method. In this method, we store the record in a sequence, i.e., one
after another. Here, the record will be inserted in the order in which they are inserted
into tables.
o In case of updating or deleting of any record, the record will be searched in the
memory blocks. When it is found, then it will be marked for deleting, and the new
record is inserted.

Insertion of the new record:


Suppose we have four records R1, R3 and so on upto R9 and R8 in a sequence. Hence,
records are nothing but a row in the table. Suppose we want to insert a new record R2 in the
sequence, then it will be placed at the end of the file. Here, records are nothing but a row in
any table.

2. Sorted File Method:


o In this method, the new record is always inserted at the file's end, and then it will sort
the sequence in ascending or descending order. Sorting of records is based on any
primary key or any other key.
o In the case of modification of any record, it will update the record and then sort the
file, and lastly, the updated record is placed in the right place.
Insertion of the new record:
Suppose there is a preexisting sorted sequence of four records R1, R3 and so on upto R6 and
R7. Suppose a new record R2 has to be inserted in the sequence, then it will be inserted at the
end of the file, and then it will sort the sequence.

Pros of sequential file organization

o It contains a fast and efficient method for the huge amount of data.
o In this method, files can be easily stored in cheaper storage mechanism like magnetic
tapes.
o It is simple in design. It requires no much effort to store the data.
o This method is used when most of the records have to be accessed like grade
calculation of a student, generating the salary slip, etc.
o This method is used for report generation or statistical calculations.

Cons of sequential file organization

o It will waste time as we cannot jump on a particular record that is required but we
have to move sequentially which takes our time.
o Sorted file method takes more time and space for sorting the records.

Heap file organization


o It is the simplest and most basic type of organization. It works with data blocks. In
heap file organization, the records are inserted at the file's end. When the records are
inserted, it doesn't require the sorting and ordering of records.
o When the data block is full, the new record is stored in some other block. This new
data block need not to be the very next data block, but it can select any data block in
the memory to store new records. The heap file is also known as an unordered file.
o In the file, every record has a unique id, and every page in a file is of the same size. It
is the DBMS responsibility to store and manage the new records.

Insertion of a new record


Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose we want to insert
a new record R2 in a heap. If the data block 3 is full then it will be inserted in any of the
database selected by the DBMS, let's say data block 1.
If we want to search, update or delete the data in heap file organization, then we need to
traverse the data from staring of the file till we get the requested record.

If the database is very large then searching, updating or deleting of record will be time-
consuming because there is no sorting or ordering of records. In the heap file organization,
we need to check all the data until we get the requested record.

Pros of Heap file organization


o It is a very good method of file organization for bulk insertion. If there is a large
number of data which needs to load into the database at a time, then this method is
best suited.
o In case of a small database, fetching and retrieving of records is faster than the
sequential record.

Cons of Heap file organization


o This method is inefficient for the large database because it takes time to search or
modify the record.
o

o This method is inefficient for large databases.

o Hash File Organization


o Hash File Organization uses the computation of hash function on some fields of the
records. The hash function's output determines the location of disk block where the
records are to be placed.

o
o When a record has to be received using the hash key columns, then the address is
generated, and the whole record is retrieved using that address. In the same way,
when a new record has to be inserted, then the address is generated using the hash key
and record is directly inserted. The same process is applied in the case of delete and
update.
o In this method, there is no effort for searching and sorting the entire file. In this
method, each record will be stored randomly in the memory.
o

B+ File Organization
o B+ tree file organization is the advanced method of an indexed sequential access
method. It uses a tree-like structure to store records in File.
o It uses the same concept of key-index where the primary key is used to sort the
records. For each primary key, the value of the index is generated and mapped with
the record.
o The B+ tree is similar to a binary search tree (BST), but it can have more than two
children. In this method, all the records are stored only at the leaf node. Intermediate
nodes act as a pointer to the leaf nodes. They do not contain any records.

The above B+ tree shows that:


o There is one root node of the tree, i.e., 25.
o There is an intermediary layer with nodes. They do not store the actual record. They
have only pointers to the leaf node.
o The nodes to the left of the root node contain the prior value of the root and nodes to
the right contain next value of the root, i.e., 15 and 30 respectively.
o There is only one leaf node which has only values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easier as all the leaf nodes are balanced.
o In this method, searching any record can be traversed through the single path and
accessed easily.

Pros of B+ tree file organization


o In this method, searching becomes very easy as all the records are stored only in the
leaf nodes and sorted the sequential linked list.
o Traversing through the tree structure is easier and faster.
o The size of the B+ tree has no restrictions, so the number of records can increase or
decrease and the B+ tree structure can also grow or shrink.
o It is a balanced tree structure, and any insert/update/delete does not affect the
performance of tree.

Cons of B+ tree file organization


o This method is inefficient for the static method.

Cluster file organization


o When the two or more records are stored in the same file, it is known as clusters.
These files will have two or more tables in the same data block, and key attributes
which are used to map these tables together are stored only once.
o This method reduces the cost of searching for various records in different files.
o The cluster file organization is used when there is a frequent need for joining the
tables with the same condition. These joins will give only a few records from both
tables. In the given example, we are retrieving the record for only particular
departments. This method can't be used to retrieve the record for the entire
department.
In this method, we can directly insert, update or delete any record. Data is sorted based on the
key with which searching is done. Cluster key is a type of key with which joining of the table
is performed.

Types of Cluster file organization:


Cluster file organization is of two types:

1. Indexed Clusters:
In indexed cluster, records are grouped based on the cluster key and stored together. The
above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster.
Here, all the records are grouped based on the cluster key- DEP_ID and all the records are
grouped.

2. Hash Clusters:
It is similar to the indexed cluster. In hash cluster, instead of storing the records based on the
cluster key, we generate the value of the hash key for the cluster key and store the records
with the same hash key value.

Pros of Cluster file organization


o The cluster file organization is used when there is a frequent request for joining the
tables with same joining condition.
o It provides the efficient result when there is a 1:M mapping between the tables.

Cons of Cluster file organization


o This method has the low performance for the very large database.
o If there is any change in joining condition, then this method cannot use. If we change
the condition of joining then traversing the file takes a lot of time.
o This method is not suitable for a table with a 1:1 condition.

Indexed sequential access method (ISAM)


ISAM method is an advanced sequential file organization. In this method, records are stored
in the file using the primary key. An index value is generated for each primary key and
mapped with the record. This index contains the address of the record in the file.

If any record has to be retrieved based on its index value, then the address of the data block is
fetched and the record is retrieved from the memory.

Pros of ISAM:
o In this method, each record has the address of its data block, searching a record in a
huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index
is based on the primary key values, we can retrieve the data for the given range of
value. In the same way, the partial value can also be easily searched, i.e., the student
name starting with 'JA' can be easily searched.

Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed to
maintain the sequence.
o When the record is deleted, then the space used by it needs to be released. Otherwise,
the performance of the database will slow down.

Hashing
In a huge database structure, it is very inefficient to search all the index values and reach the
desired data. Hashing technique is used to calculate the direct location of a data record on the
disk without using index structure.

In this technique, data is stored at the data blocks whose address is generated by using the
hashing function. The memory location where these records are stored is known as data
bucket or data blocks.

In this, a hash function can choose any of the column value to generate the address. Most of
the time, the hash function uses the primary key to generate the address of the data block. A
hash function is a simple mathematical function to any complex mathematical function. We
can even consider the primary key itself as the address of the data block. That means each
row whose address will be the same as a primary key stored in the data block.

The above diagram shows data block addresses same as primary key value. This hash
function can also be a simple mathematical function like exponential, mod, cos, sin, etc.
Suppose we have mod (5) hash function to determine the address of the data block. In this
case, it applies mod (5) hash function on the primary keys and generates 3, 3, 1, 4 and 2
respectively, and records are stored in those data block addresses.
Types of Hashing:

o Static Hashing
o Dynamic Hashing

Static Hashing
In static hashing, the resultant data bucket address will always be the same. That means if we
generate an address for EMP_ID =103 using the hash function mod (5) then it will always
result in same bucket address 3. Here, there will be no change in the bucket address.

Hence in this static hashing, the number of data buckets in memory remains constant
throughout. In this example, we will have five data buckets in the memory used to store the
data.
Operations of Static Hashing
o Searching a record
When a record needs to be searched, then the same hash function retrieves the address of the
bucket where the data is stored.
o Insert a Record
When a new record is inserted into the table, then we will generate an address for a new
record based on the hash key and record is stored in that location.
o Delete a Record
To delete a record, we will first fetch the record which is supposed to be deleted. Then we
will delete the records for that address in memory.
o Update a Record
To update a record, we will first search it using a hash function, and then the data record is
updated.
If we want to insert some new record into the file but the address of a data bucket generated
by the hash function is not empty, or data already exists in that address. This situation in the
static hashing is known as bucket overflow. This is a critical situation in this method.

To overcome this situation, there are various methods. Some commonly used methods are as
follows:

1. Open Hashing
When a hash function generates an address at which data is already stored, then the next
bucket will be allocated to it. This mechanism is called as Linear Probing.

For example: suppose R3 is a new address which needs to be inserted, the hash function
generates address as 112 for R3. But the generated address is already full. So the system
searches next available data bucket, 113 and assigns R3 to it.

2. Close Hashing
When buckets are full, then a new data bucket is allocated for the same hash result and is
linked after the previous one. This mechanism is known as Overflow chaining.
For example: Suppose R3 is a new address which needs to be inserted into the table, the
hash function generates address as 110 for it. But this bucket is full to store the new data. In
this case, a new bucket is inserted at the end of 110 buckets and is linked to it.

Dynamic Hashing
o The dynamic hashing method is used to overcome the problems of static hashing like
bucket overflow.
o In this method, data buckets grow or shrink as the records increases or decreases. This
method is also known as Extendable hashing method.
o This method makes hashing dynamic, i.e., it allows insertion or deletion without
resulting in poor performance.

How to search a key


o First, calculate the hash address of the key.
o Check how many bits are used in the directory, and these bits are called as i.
o Take the least significant i bits of the hash address. This gives an index of the
directory.
o Now using the index, go to the directory and find bucket address where the record
might be.

How to insert a new record


o Firstly, you have to follow the same procedure for retrieval, ending up in some
bucket.
o If there is still space in that bucket, then place the record in it.
o If the bucket is full, then we will split the bucket and redistribute the records.
For example:
Consider the following grouping of keys into buckets, depending on the prefix of their hash
address:

The last two bits of 2 and 4 are 00. So it will go into bucket B0. The last two bits of 5 and 6
are 01, so it will go into bucket B1. The last two bits of 1 and 3 are 10, so it will go into
bucket B2. The last two bits of 7 are 11, so it will go into B3.

Insert key 9 with hash address 10001 into the above


structure:
o Since key 9 has hash address 10001, it must go into the first bucket. But bucket B1 is
full, so it will get split.
o The splitting will separate 5, 9 from 6 since last three bits of 5, 9 are 001, so it will go
into bucket B1, and the last three bits of 6 are 101, so it will go into bucket B5.
o Keys 2 and 4 are still in B0. The record in B0 pointed by the 000 and 100 entry
because last two bits of both the entry are 00.
o Keys 1 and 3 are still in B2. The record in B2 pointed by the 010 and 110 entry
because last two bits of both the entry are 10.
o Key 7 are still in B3. The record in B3 pointed by the 111 and 011 entry because last
two bits of both the entry are 11.
Advantages of dynamic hashing
o In this method, the performance does not decrease as the data grows in the system. It
simply increases the size of memory to accommodate the data.
o In this method, memory is well utilized as it grows and shrinks with the data. There
will not be any unused memory lying.
o This method is good for the dynamic database where data grows and shrinks
frequently.

Disadvantages of dynamic hashing


o In this method, if the data size increases then the bucket size is also increased. These
addresses of data will be maintained in the bucket address table. This is because the
data address will keep changing as buckets grow and shrink. If there is a huge
increase in data, maintaining the bucket address table becomes tedious.
o In this case, the bucket overflow situation will also occur. But it might take little time
to reach this situation than static hashing.

B+ Tree
o The B+ tree is a balanced binary search tree. It follows a multi-level index format.
o In the B+ tree, leaf nodes denote actual data pointers. B+ tree ensures that all leaf
nodes remain at the same height.
o In the B+ tree, the leaf nodes are linked using a link list. Therefore, a B+ tree can
support random access as well as sequential access.

Structure of B+ Tree
o In the B+ tree, every leaf node is at equal distance from the root node. The B+ tree is
of the order n where n is fixed for every B+ tree.
o It contains an internal node and leaf node.

Internal node

o An internal node of the B+ tree can contain at least n/2 record pointers except the root
node.
o At most, an internal node of the tree contains n pointers.

Leaf node

o The leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key
values.
o At most, a leaf node contains n record pointer and n key values.
o Every leaf node of the B+ tree contains one block pointer P to point to next leaf node.

Searching a record in B+ Tree


Suppose we have to search 55 in the below B+ tree structure. First, we will fetch for the
intermediary node which will direct to the leaf node that can contain a record for 55.

So, in the intermediary node, we will find a branch between 50 and 75 nodes. Then at the
end, we will be redirected to the third leaf node. Here DBMS will perform a sequential search
to find 55.

B+ Tree Insertion
Suppose we want to insert a record 60 in the below structure. It will go to the 3rd leaf node
after 55. It is a balanced tree, and a leaf node of this tree is already full, so we cannot insert
60 there.

In this case, we have to split the leaf node, so that it can be inserted into tree without affecting
the fill factor, balance and order.

63.5M
1.2K
C++ vs Java

The 3rd leaf node has the values (50, 55, 60, 65, 70) and its current root node is 50. We will
split the leaf node of the tree in the middle so that its balance is not altered. So we can group
(50, 55) and (60, 65, 70) into 2 leaf nodes.

If these two has to be leaf nodes, the intermediate node cannot branch from 50. It should have
60 added to it, and then we can have pointers to a new leaf node.

This is how we can insert an entry when there is overflow. In a normal scenario, it is very
easy to find the node where it fits and then place it in that leaf node.

B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60
from the intermediate node as well as from the 4th leaf node too. If we remove it from the
intermediate node, then the tree will not satisfy the rule of the B+ tree. So we need to modify
it to have a balanced tree.

After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as
follows:

You might also like