KEMBAR78
UNIT 1 Notes | PDF | Databases | Data Model
0% found this document useful (0 votes)
68 views74 pages

UNIT 1 Notes

Uploaded by

kirtick.mm
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
68 views74 pages

UNIT 1 Notes

Uploaded by

kirtick.mm
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 74

R19CS202 – DATABASE MANAGEMENT SYSTEMS

UNIT – I INTRODUCTION TO DATABASES


Purpose of Database – Types and examples of Databases (RDBMS, NOSQL, In-memory Databases
& Distributed SQL databases)– Relational Database System Architecture - Views of Data– Schema
architecture – Data Independence – Schema and instance- Data Models–Benefits and Phases of
Data Model - ER Diagram - Symbols, Components, Relationships, Weak entities, Attributes,
Cardinality - Extended ER Diagram – Examples- Relational Data Model – Keys - Relational
Algebra-Normalization - 1NF, 2NF, 3NF, BCNF,4NF,5NF

Case Study: ER Diagram on Online Streaming, Movie Ticket Recommendation, Bike Tracking

1.1 Purpose of Database

The Database Management System (DBMS) is defined as a software system that


allows the user to define, create and maintain the database and provide control access to the
data.

The purpose of DBMS is to transform the following −

 Data into information


 Information into knowledge
 Knowledge to the action

File-processing system is supported by a conventional operating


system. The system stores permanent records in various files, and it needs
different application programs to extract records from, and add records to, the appropriate files.
Before database management systems (DBMSs) were introduced, organizations usually stored
information in such systems.
1.1.1 Disadvantages of file processing system:

 Difficulty in accessing data


 Data isolation
 Integrity problems
 Atomicity problems
 Concurrent-access anomalies
 Security problems

1.1.2 Advantages of DBMS

 Data Independence.
 Efficient Data Access.
 Data Integrity and security.
 Data administration.
 Concurrent access and Crash recovery.
 Reduced Application Development Time.

1.2 Types of Databases

There are a few types that are very important and popular.

 Relational Database - a relational database are organized as a set of tables with columns
and rows
 Object-Oriented Database - object-oriented database is represented in the form of
objects
 Distributed Database - consists of two or more files located in different sites. The
database may be stored on multiple computers, located in the same physical location, or
scattered over different networks
 NoSQL Database - A NoSQL, or nonrelational database, allows unstructured and semi
structured data to be stored and manipulated
 Graph Database - stores data in terms of entities and the relationships between entities.
 Cloud Database - collection of data, either structured or unstructured, that resides on a
private, public, or hybrid cloud computing platform.

1.2.1 In Memory Database

 An in-memory database (IMDB) is a computer system that stores and retrieves data
records that reside in a computer’s main memory, e.g., random-access memory (RAM).
 IMDBs can store relational (tabular) data, document data, key-value data, or even a
combination.
 The data and instructions for an in-memory database reside in the main memory, usually
RAM.
 This approach reduces the I/O requests to the disk and improves overall database speed.
 Storing data in memory enables direct access to information and dramatically reduces the
time needed to query data.

Fig - Working of In-Memory Database

Below are some examples of where in-memory stores find applications:

 IoT and edge computing. IoT sensors stream massive amounts of data. An in-memory
database can store and perform calculations using real-time data before sending it to an
on-disk database.
 E-commerce applications. Shopping carts, search results, session management, and
quick page loads are all possible with an in-memory database
 Gaming industry. The gaming industry uses in-memory databases for updating
leaderboards in real-time,
 Real-time security and fraud detection. In-memory databases help perform complex
processing and analytics in real-time.
1.2.2 Distributed Databases:

• A distributed database is a database that is not limited to one computer system. It is like
a database that consists of two or more files located on different computers.
• Distributed databases are needed when a particular data in the database needs to be
accessed by various users globally.

Fig - Distributed Databases

Features of Distributed databases:

• Location independency: Data is independently stored at multiple sites and managed by


independent Distributed database management systems (DDBMS).
• Network linking: All distributed databases in a collection are linked by a network and
communicate with each other.
• Distributed query processing: Distributed query processing is the procedure of
answering queries in a distributed environment.
• Hardware Independent: There is no physical contact between these Distributed
databases , which is accomplished often through virtualization.
• Distributed transaction management: Distributed Database In Dbms provides a
consistent distribution through commit protocols, and distributed recovery methods in
case of many transaction failures.

There are two types of distributed databases:

• Homogenous distributed database.


• Heterogeneous distributed database.
• A Homogenous distributed database is a network of identical databases stored on
multiple sites. All databases store data identically, the operating system, DDBMS, and the
data structures used – all are the same at all sites, making them easy to manage.
• A Heterogeneous distributed database uses different schemas, operating systems,
DDBMS, and different data models making it difficult to manage.
• In the case of a Heterogeneous distributed database, a particular site can be completely
unaware of other sites.

Fig - Homogeneous distributed database

Fig - Heterogeneous Distributed Systems


1.3 Database Tier Architecture

Database management systems are divided into multiple levels of abstraction for proper
functioning. database management system is not always directly accessible by the user or an
application. architectures follow a tier-based classification. an n-tier DBMS
Architecture divides the whole DBMS into related but n independent layers or levels.

 Single Tier Architecture (One-Tier Architecture)


 Two-Tier Architecture
 Three-Tier Architecture

Database applications are usually partitioned into two or three parts. In a two-tier architecture,
the application resides at the client machine, where it invokes database system functionality at
the server machine through query language statements.

Application program interface standards like ODBC and JDBC are used for interaction between
the client and the server. In contrast, in a three-tier architecture, the client machine acts as
merely a front end and does not contain any direct database calls. Instead, the client end
communicates with an application server, usually through a forms interface.

The application server in turn communicates with a database system to access data. The business
logic of the application, which says what actions to carry out under what conditions, is embedded
in the application server, instead of being distributed across multiple clients.

Three-tier applications are more appropriate for large applications, and for applications that run
on the WorldWideWeb.
1.3.1 Database System Architecture

A database system is partitioned into modules that deal with each of the responsibilities
of the overall system. The functional components of a database system can be broadly divided
into the storage manager and the query processor components. The storage manager is
important because databases typically require a large amount of storage space. The query
processor is important because it helps the database system simplify and facilitate access to
data.

It is the job of the database system to translate updates and queries written in a
nonprocedural language, at the logical level, into an efficient sequence of operations at the
physical level.

Users are differentiated by the way they expect to interact with the system

⮚ Application programmers – interact with system through DML calls.


⮚ Sophisticated users – form requests in a database query language.
⮚ Specialized users – write specialized database applications that do not fit into the
traditional data processing framework.
⮚ Naïve users – invoke one of the permanent application programs that have been written
previously E.g. people accessing database over the web, bank tellers, clerical staff
Query Processor:

The query processor components include

· DDL interpreter, which interprets DDL statements and records the definitions in the
data dictionary.
· DML compiler, which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all
give the same result. The DML compiler also performs query optimization, that is, it picks the
lowest cost evaluation plan from among the alternatives.
Query evaluation engine, which executes low-level instructions generated by the DML
compiler.

Storage Manager:

A storage manager is a program module that provides the interface between the low level
data stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file manager. The raw data
are stored on the disk using the file system, which is usually provided by a conventional operating
system.
The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data
in the database.
The storage manager components include:

· Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
· Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
· File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
· Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than
the size of main memory

Transaction Manager:
A transaction is a collection of operations that performs a single logical function in a
database application.
Each transaction is a unit of both atomicity and consistency. Thus, we require that
transactions do not violate any database-consistency constraints. That is, if the database was
consistent when a transaction started, the database must be consistent when the transaction
successfully terminates.
Transaction - manager ensures that the database remains in a consistent (correct) state
despite system failures (e.g., power failures and operating system crashes) and transaction
failures.

1.4 Views of Data


Abstraction is one of the main features of database systems. Hiding irrelevant details from
user and providing abstract view of data to users, helps in easy and efficient user-
database interaction.
The top level of that architecture is “view level”. The view level provides the “view of
data” to the users and hides the irrelevant details such as data relationship, database
schema, constraints, security etc from the user.
Levels of abstraction:

1. Physical level (internal level) - how the data is stored in the hardware

2. Logical level (conceptual level) - The next-higher level of abstraction describes what
data are stored in the database, and what relationships exist among those data.
3. View level (external level) - The highest level of abstraction describes only part of
the entire database.
Example: Let’s say we are storing customer information in a customer table. At physical
level these records can be described as blocks of storage (bytes, gigabytes, terabytes etc.) in
memory. These details are often hidden from the programmers.
At the logical level these records can be described as fields and attributes along with their data
types, their relationship among each other can be logically implemented. The programmers
generally work at this level because they are aware of such things about database systems.
At view level, user just interact with system with the help of GUI and enter the details
at the screen, they are not aware of how the data is stored and what data is stored; such details
are hidden from them.
1.4.1 Schemas

Schemas are the overall design of a database. For example: An employee table in
database exists with the following attributes:

EMP_NAME EMP_ID EMP_ADDRESS EMP_CONTACT


-------- ------ ----------- -----------

Schema is of three types:

 Physical schema
 logical schema
 view schema

Schema represents the logical view of the database. It helps you understand what data
needs to go where.Schema helps the database users to understand the relationship between
data. Have a schema that shows the relationship between three tables: Course, Student and
Section.
1.4.2 Definition of instance: The data stored in database at a particular moment of time is

called instance of database. Database schema defines the attributes in tables that belong to a
particular database. The value of these attributes at a moment of time is called the instance of
that database.

EMP_NAME
EMP_ID

EMP_ADDRESS EMP_CONTACT
------- ------ ----------- -----------
Chaitanya 101 Noida 95********
Ajeet 102 Delhi 99********

1.4.3 Data Independence

An important objective of the three-tier architecture is to provide data independence,


which means that the upper levels are unaffected by changes in the lower levels. i.e., Data
Independence is used to achieve the changes in physical level without affecting logical level and
vice versa.

There are three levels in database:

1. Physical level / Low level ( Disk storage)


2. Conceptual level ( query / procedure / logics / etc.,)
3. Logical level / View level ( User Interface)

Need of Data Independence

 To improve the quality of data


 Easy maintenance of DBMS
 To achieve database security
 Developer need not be worry about internal structure
 Easily making the changes in physical level to improve the performance
Logical Independence

Logical data independence can be defined as the immunity of the external schemas to
changes in the conceptual schema.

Some of the changes in Logical level

 Add a new attribute in an entity set


 Modify / Delete an attribute
 Merging records
 Splitting records

Physical Independence

Physical data independence can be defined as the immunity of the conceptual schema to
changes in the internal schema.

Some of the changes in Physical level

 Changing the storage devices


 Changing the file organization techniques
 Changing the data structures
 Changing the data access method
 Modifying indexes
 Migrating the Database from one drive to another

Physical data independence logical data independence

Concerned with the storage of Concerned with the structure of


the data. data definition.

Easy to retrieve Difficult to retrieve because of


dependent on logical structure

Easy to achieve, compare with Difficult to achieve, compare with


logical data independence physical data independence

Concerned with physical schema Concerned with logical schema

1.4.4 Data Models

The entire structure of a database can be described using a data model. A data model is
a collection of conceptual tools for describing data, data relationships and consistency constraints.
A data model provides a way to describe the design of a database at the physical, logical, and
view levels as it provides a clear picture of the data making it easier for developers to create a
physical database.
The data models can be classified into four different categories:
 Relational Model
 Entity-Relationship Model
 Object-Based Data Model
 Semi-structured Data Model
 Network Data Model
 Hierarchical Data Model

Relational Model

The database is represented as a collection of relations in the form of rows and


columns of a two-dimensional table. Each row is known as a tuple (a tuple contains all the
data for an individual record) while each column represents an attribute.

1.4.5 Entity-Relationship Model

An Entity-Relationship model is a high-level data model that describes the structure of the
database in a pictorial form which is known as ER-diagram or it is used to represent logical
structure of the database easily. The entity-relationship (E-R) data model uses a collection of
basic objects, called entities and relationships among these objects.
Entities

An entity is a real-life concept. It could be a person, a location, or even an idea. A school


management system, for example, has entities such as teachers, students, courses, buildings,
departments, and so on.
Attributes

An entity contains a real-world property called attribute. This is the characteristics of that
attribute. Example: The entity teacher has the property like teacher id, salary, age, etc

Relationship
Relationship tells how two attributes are related. Example: Teacher works for a

department.

In above example the entities are Teacher and Department.


The attributes are
Teacher entity - Teacher_Name, Teacher_id, Age, Salary, Mobile_Number.
Department entity - Dept_id, Dept_name.
The two entities are connected using the relationship. Here, each teacher works for a
department.

Object-Based Data Model


The object-oriented data model is a combination of object-oriented programming and
relational data model. In this data model, the data and their relationship are represented in a
single structure which is known as an object.

Semi-structured Data Model


The semi-structured model is a modified form of the relational model. The semi-structured data
model allows the data specifications at places where the individual data items of the same type
may have different sets of attributes. In this model, some entities may have missing attributes
while others may have an extra attribute. The semi-structured data model is represented using
the Extensible Markup Language (XML).
Example

<Employee 1>
<name>..............</name>
<age>.................</age>
<salary>.............</salary>
</employee 1>
ER Diagram - Relational Data Model
An Entity–relationship model (ER model) describes the structure of a database with
the help of a diagram, which is known as Entity Relationship Diagram (ER Diagram). An
ER model is a design or blueprint of a database

ER Diagrams
An ER diagram shows the relationship among entity sets. An entity set is a group of
similar entities and these entities can have attributes. In terms of DBMS, an entity is a table or
attribute of a table in database, so by showing relationship among tables and their attributes,
ER diagram shows the complete logical structure of a database.

Network Data Model


The network data model is a modified version of the hierarchical model. It is represented
by a graph. It provides extra flexibility in representing the data. It is also easier and faster to
retrieve data using this method compared to the network model. Here a child entity can have
more than one parent entity, i.e., an entity can be related with multiple entities.
Hierarchical Data Model
This is one of the older forms of data models. This data model is represented using a
tree-like structure. Each entity has only one parent. However, it can have many children. At the
top of the hierarchy, there is a single entity known as the root.

ER Diagram

An ER model is a design or blueprint of a database that can be implemented as a database.


An Entity Relationship Diagram that represents relationships among entities in a database.

What is an ER Model?

An Entity-Relationship Model represents the structure of the database with the help of a
diagram. ER Modelling is a systematic process to design a database as it would require you to
analyze all data requirements before implementing your database.

Components of ER Diagram

1. Entity
2. Attribute
3. Relationship
Symbols used in ER Diagram

Entity

An entity is an object or component of data. It is anything in the real world, such as an object,
class, person, or place. Objects that physically exist and are logically constructed in the real world
are called entities.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on the relationship
with other entity is called weak entity. The weak entity is represented by a double rectangle.

For example – a bank account cannot be uniquely identified without knowing the bank to which
the account belongs, so bank account is a weak entity.

Attribute

An attribute describes the property of an entity. An attribute is represented as Oval in an


ER diagram. There are four types of attributes:

1.Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute

There are four categories in which the attributes can be distributed :

1. Key attribute:

This attribute can uniquely identify an entity in an entity set. Key attribute is
represented by oval same as other attributes however the text of key attribute is
underlined. For example, the Aadhar number is a key attribute that can identify a person.

2. Composite attribute:

It is a combination of other attributes in an ER diagram or, in other words, the


attributes that can be divided into sub-attributes or sub-parts are known as the composite
attributes in an ER diagram. For example, the address field in a table is a combination of
the state, city, Pin, and street attributes.
3. Multivalued attribute:
It is an attribute that can hold multiple values. For example, a person can have
multiple phone numbers.

4. Derived attribute :

These attributes are derived from other attributes in a database. An example of a derived
attribute can be the age of an employee which must be derived from the Date of Birth of
that employee.

Relationship

It is used to define the relationship that exists between different entities in an ER diagram.

1. Unary Relationship: When only one entity participates in a relationship, it is called


Unary Relationship.
2. Binary Relationship: When two entities participate in a relationship, it is called Binary
Relationship.
3. N-ary Relationship: When n number of entities participate in the relationship, it is
termed an n-ary relationship.
There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

Participation Constraints

1. Partial Participation – Not all entities are involved in the relationship. Partial participation is
denoted by single lines.
For example: In the following diagram each college must have at-least one associated
Student. Total participation is represented using a double line between the entity set and
relationship set.
2. Total Participation – All entities are involved in the relationship. Total participation is
denoted by double lines.

Example: Consider an example of an IT company. There are many employees working


for the company. Let’s take the example of relationship between employee and role software
engineer. Every software engineer is an employee but not every employee is software engineer
as there are employees for other roles as well, such as housekeeping, managers, CEO etc. so we
can say that participation of employee entity set to the software engineer relationship is partial.

1. One-to-one – A relationship is denoted as ‘1:1’ when only one instance of an entity is


associated with the relationship.

For instance, consider a relationship between a person and a driver's license where each
person can have only one driver's license and each driver's license belongs to only one person.
2. One-to-many – A relationship is denoted as ‘1:N’ when one instance of an entity is associated
with multiple instances of another entity.

For instance, a mother can have multiple children, but each child has only one biological
mother.

3. Many-to-one – The relationship is denoted as ‘N:1’ when multiple instances of an entity are
associated with a single instance of another entity.

For instance, many students can enroll in a single course.

4. Many-to-many – A relationship is denoted as ‘M:N’ when multiple instances of an entity can


be associated with multiple instances of another entity.

For instance, a student can enroll in multiple courses, and a course can have multiple
students.

Steps to Create ER Diagram

Example :

(a) Construct an E-R diagram for a car-insurance company whose customers own one or more
cars each. Each car has associated with it zero to any number of recorded accidents.
(b) Suppose you are given the following requirements for a simple database for the National

Hockey League (NHL): the NHL has many teams, each team has a name, a city, a coach, a
captain, and a set of players, each player belongs to only one team, each player has a name, a
position (such as left wing or goalie), a skill level, and a set of injury records, a team captain is
also a player, a game is played between two teams (referred to as host_team and guest_team)
and has a date (such as May 11th, 1999) and a score (such as 4 to 2). Construct a clean and
concise ER diagram for the NHL database.

(c) A university registrar’s office maintains data about the following entities: courses, including
number, title, credits, syllabus, and prerequisites; course offerings, including course number,
year, semester, section number, instructor(s), timings, and classroom; students, including
student-id, name, and program; instructors, including identification number, name, department,
and title. Further, the enrollment of students in courses and grades awarded to students in each
course they are enrolled for must be appropriately modeled. Construct an E-R diagram for the
registrar’s office. Document all assumptions that you make about the mapping constraints.
1.6 Extended ER Model:

Extended ER is a high-level data model that incorporates the extensions to the original ER
model. Enhanced ER models are high level models that represent the requirements and
complexities of complex databases.

The extended Entity Relationship (ER) models are three types as given below –

 Aggregation
 Specialization
 Generalization

1.6.1 Specialization:

 The process of designing sub groupings within an entity set is called specialization.
 It is a top-down process.
 If an entity set is given with all the attributes in which the instances of the entity set are
differentiated according to the given attribute value, then that sub-classes or the sub-
entity sets can be formed from the given attribute.
 Specialization of account creates two entity sets: savings account and current account.
 In the E-R diagram specialization is represented by triangle components labeled ISA. The
ISA relationship is referred as superclass- subclass relationship
1.6.2 Generalization:

 It is the reverse process of specialization. It is a bottom-up approach.


 It converts subclasses to super classes. This process combines a number of entity sets
that share the same features into higher-level entity sets.
 If the sub-class information is given for the given entity set then, ISA relationship type
will be used to represent the connectivity between the subclass and superclass

1.6.3 Aggregation:

 It is an abstraction in which relationship sets are treated as higher level entity sets and
can participate in relationships.
 Aggregation allows us to indicate that a relationship set participates in another relationship
set.
 Aggregation is used to simplify the details of a given database where ternary relationships
will be changed into binary relationships.
 Ternary relation is only one type of relationship which is working between three entities.

1.7 Relational Algebra:

 Relational Algebra is procedural query language, which takes Relation as input


and generates relation as output. Relational algebra mainly provides theoretical
foundation for relational databases and SQL.

 Relational algebra is a procedural query language, it means that it tells what data
to be retrieved and how to be retrieved.

 Relational Algebra works on the whole table at once, so we do not have to use
loops etc to iterate over all the rows (tuples) of data one by one.

 All we have to do is specify the table name from which we need the data, and in
a single line of command, relational algebra will traverse the entire given table
to fetch data for you.
Basic/Fundamental Operations:

1. Select (σ)
2. Project (∏)
3. Union (𝖴)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)
7. Joins
8. Assignment Operator
9. Division Operator

1. Select Operation (σ) :This is used to fetch rows (tuples) from table(relation)
which satisfies a given condition.
Syntax: σp(r)

 σ is the predicate
 r stands for relation which is the name of the table
 p is prepositional logic ex: σage > 17 (Student)
This will fetch the tuples(rows) from table Student, for which age will be greater than 17.
σage > 17 and gender = 'Male' (Student)

This will return tuples(rows) from table Student with information of male students, of
age more than 17.

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500


Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300

Input:
σ BRANCH_NAME="perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300

Project Operation (∏):

Project operation is used to project only a certain set of attributes of a relation. In simple
words, If you want to see only the names of all of the students in the Student table, then
you can use Project Operation.

 It will only project or show the columns or attributes asked for, and will also
remove duplicate data from the columns.
Syntax of Project Operator (∏)
∏ column_name1, column_name2, . , column_nameN(table_name)

Example:
∏Name, Age(Student)

Above statement will show us only the Name and Age columns for all the rows of
data in Student table.

Example: CUSTOMER RELATION

NAME STREET CITY


Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:
∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn

Union Operation (𝖴):

 This operation is used to fetch data from two relations(tables) or


temporary relation(result of another operation).

 For this operation to work, the relations(tables) specified should have same number
of attributes(columns) and same attribute domain. Also the duplicate tuples are
automatically eliminated from the result.
Syntax: A 𝖴 B

∏Student(RegularClass) 𝖴 ∏Student(ExtraClass)
Example:

DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17
Input:

∏ CUSTOMER_NAME (BORROW) 𝖴 ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry

Williams

Mayes

Set Difference (-):

This operation is used to find data present in one relation and not present in the second
relation. This operation is also applicable to two relations, just like Union operation.
Syntax: A - B

where A and B are relations.


For example, if we want to find the names of students who attend the regular class
but not the extra class, then, we can use the below operation:
∏Student(RegularClass) - ∏Student(ExtraClass)

Input: ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)


CUSTOMER_NAME

Smith

Jones

Cartesian Product (X):


 This is used to combine data from two different relations(tables) into one and
fetch data from the combined relation.
 Syntax: A X B

 For example, if we want to find the information for Regular Class and Extra
Class which are conducted during morning, then, we can use the following
operation:
 σtime = 'morning' (RegularClass X ExtraClass)
 For the above query to work, both RegularClass and ExtraClass should have the
attribute time.

Notation: E X D

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B
DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input:

EMPLOYEE X DEPARTMENT

Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
Rename Operation (ρ):

 This operation is used to rename the output relation for any query operation
which returns result like Select, Project etc. Or to simply rename a
relation(table)
 Syntax: ρ(RelationNew, RelationOld)

 The rename operation is used to rename the output relation. It is denoted by rho (ρ).

 Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)

Joins

 A JOIN clause is used to combine rows from two or more tables, based on a
related column between them.

 Join in DBMS is a binary operation which allows you to combine join product
and selection in one single statement.

 The goal of creating a join condition is that it helps you to combine the data
from two or more DBMS tables.

 The tables in DBMS are associated using the primary key and foreign keys.

Syntax:

Relation1 ⋈condition Relation2

EMPLOYEE SALARY

EMPLOYEE ⋈ SALARY
Types of Joins:

1. Theta Join

 general case of JOIN operation


 denoted by symbol θ
 combines tuples from different relations provided they satisfy the theta condition
Syntax

• Notation: r s

• Let r, s be relations on schemas R and S, respectively. The result is a relation on


schema R ∪ S. The result tuples are obtained by considering each pair of tuples tr
∈ r and ts ∈ s.

• If tr and ts have the same value for each of the attributes in R ∩ S (“same name
attributes”), a tuple t is added to the result such that

– t has the same value as tr on r


– t has the same value as ts on s

• Example: Given the relations R(A, B, C, D) and

S(B, D, E)

– Join can be applied because R ∩ S /= ∅

– the result schema is (A, B, C, D, E)

– and the result of r 0 s is defined as


πr.A,r.B,r.C,r.D,s.E(σr.B=s.B∧r.D=s.D(r × s))
• Example: given the relations r and s

r s

A B C D B D E
α 1 α a 1 a α
β 2 γ a 3 a β
γ 4 β b 1 a γ
α 1 γ a 2 b δ
δ 2 β b 3 b τ
r ⋈ s

A B C D E
α 1 α a α
α 1 α a γ
α 1 γ a α
α 1 γ a γ
δ 2 β b δ

• Example: given two relations r, s

r s

A B C
1 2 3 3 1
4 5 6 6 2
7 8 9

r ⋈ B<D s

A B C D E
1 2 3 3 1
1 2 3 6 2
4 5 6 6 2
2. Equi Join:

 When Theta join uses only equality comparison operator, it is said to be equijoin
special case of conditional join where only equality condition holds between a
pair of attributes
 As values of two attributes will be equal in result of equijoin, only one attribute
will be appeared in result
 Derivation: r⋈ s = σC(r × s)
 If C involves only the comparison operator “=”, the condition join is also called
Equi-Join.

r s

A B C
4 5 6 6 8
7 8 9 10 12
r ⋈ C=SC (ρS(SC,D) (s))

A B C SC D
3. Natural Join:
4 5 6 6 8

 Special case of Equi_Join.


 Natural join requires that the two join attributes, or each pair of
corresponding join attributes, have the same name in both relations.
If this is not the case, a renaming operation is applied first.
 Natural join removes duplicate attributes.

r ⋈ s = π Attribute_List (σ Join_Condition (r × s)) where

 Attribute_List = attributes (r) ∪ attributes (s)

(duplicates are eliminated) and Join-Condition has the form:

 A1 = A1 AND … AND An = An
 where {A1 … An} = attributes(r) ∩ attributes(s)
R ⋈ ρ (B, F, G) (S) = Select R.*, F, G from R, S where B=E;

Outer Join – Left Join

• Select records from the first (left-most) table with matching right table records
• Join starting with the first (left-most) table.
• Then, any matched records from the second table (right-most) will be included
• there is no matching tuple is found in right relation, then the attributes of right
relation in the join result are filled with null values
• Syntax:
Outer Join – Right Join:

• Operation allows keeping all tuple in the right relation


• Join starting with the second (right-most) table and then any matching first
(left-most) table records
• no matching tuple is found in the left relation, then the attributes of the left
relation in the join result are filled with null values

• Syntax:

Outer Join – Full Join

• all tuples from both relations are included in the result, irrespective of the
matching condition.
• Syntax:
Semi Join:

• Semi-Join matches the rows of two relations and then show the matching
rows of the relation whose name is mentioned to the left side of ⋉ Semi Join
operator.

Division Operator:

• Division operation is denoted by ÷ sign.


• Let R (R-Schema) and S(S-Schema) be relations and any attribute of S –
Schema is also in R – Schema.
• The relation R / S is a relation on schema R-Schema – S-Schema i.e. on the
schema containing all the attributes of Schema R that are not in Schema S.
Syntax:

P=R÷S

Where,
P is result we get after applying division operator,
R and S stands for relation (name of the tables) on which division operation is
applied.

A tuple t is in r ÷ s if and only if both the conditions hold.

• T is in πR – S (r)
• For every tuple ts in S, there is a tuple tr in R satisfying both of the following:
• tr[s] = ts[s]
• tr[R-S] = t

Assignment Operator:

• We can write the operations as a single relational algebra expression by


nesting the operations, or we can apply one operation at a time and create
intermediate result relations.
• In the latter case, we must name the relations that hold the intermediate
results.
• Here, we use the assignment operator (←).

Syntax:

Relational Variable ← Expression (or) R ← E.

R is a relation.

E is the Expression whose result we wish to assign to the relation variable R.

Example:

R1 ← πname(Customer)

R2 ← πname(Employee)

R = R1 – R2

1.8 Relational Data Model

Relational database management system, data is organized into tables. Tables


are further divided into columns and rows. Columns represent the attributes of an
entity, while rows represent the entities themselves.

 Attribute: An attribute is a characteristic or quality of an entity. In the context


of the relational model, an attribute is a column in a table.
 Tables: A table is a data collection organized into rows and columns. In the
relational model for database management, tables are used to store information
about entities.
 Tuple: A tuple is a row in a table. Tuples are composed of attributes.
 Relation Schema: A relation schema is a blueprint for a table. It defines the
attributes that are contained in a table as well as the relationships between
those attributes.
 Degree: The degree is the number of attributes that it contains.
 Cardinality: Cardinality defines the relationship between two attributes in a
relation schema. There are three possible relationships: one-to-one, one-to-
many, and many-to-many.
 Column: A column is an attribute that is contained in a table. Columns are
also sometimes referred to as fields.
 Relation key: A relation key is an attribute or combination of attributes that
uniquely identifies a tuple in a table.
 Domain: The domain of an attribute is the set of values that can be stored in
that attribute. For example, if an attribute represents ages, its domain would
be any age from 0 onward.

1.9 Constraints

 Key constraints
 Domain constraints
 Referential integrity constraints

Key constraints
There must be at least one minimal subset of attributes in the relation, which can
identify a tuple uniquely. This minimal subset of attributes is called key for that relation

Key constraints force that −

 In a relation with a key attribute, no two tuples can have identical values for
key attributes.
 A key attribute cannot have NULL values.

Domain Constraints

Attributes have specific values in real-world scenario. For example, age cannot be
less than zero and telephone numbers cannot contain a digit outside 0-9.

Referential integrity Constraint

Referential Integrity ensures that for every foreign key entry in child table must have
corresponding primary key entry present in the parent table

2.0 Keys
KEYS in DBMS is an attribute or set of attributes which helps you to identify a
row(tuple) in a relation(table). Database key is also helpful for finding unique record
or row from the table.

 A Key can be a single attribute or a group of attributes, where the


combination may act as a key.

Primary key

 Primary key is the Candidate key selected by the database administrator


to uniquely identify tuples in a table.
 The Primary key should be a unique and non-null attribute(s).

There can be two ways to create a Primary key for the table. The first way is to alter
an already created to add the Primary key constraint on an attribute.
Here, I have chosen the Id as the primary key attribute.

Now if you try to add a new row with a duplicate Id value, it gives me an error message.

The second way of adding a Primary key is during the creation of the table itself. All
you have to do is add the Primary Key constraint at the end after defining all the
attributes in the table.

To define a Primary Key constraint on multiple attributes, you can list all the attributes

in the parenthesis, as shown below.

Alternate Keys or Secondary Keys

Alternate keys are those candidate keys which are not the Primary key.

There can be only one Primary key for a table. Therefore all the remaining Candidate

keys are known as Alternate or Secondary keys. They can also uniquely identify tuples

in a table, but the database administrator chose a different key as the Primary key.

If we look at the Employee table once again, since I have chosen Id as the Primary

key, the other Candidate Key (Email) becomes the Alternate key for the table.
Foreign Keys

Foreign key is an attribute which is a Primary key in its parent table, but is
included as an attribute in another host table

A Foreign key generates a relationship between the parent and host tables. For
example, in addition to the Employee table containing the employees’ personal details,
we might have another table, Department containing information related to the
department of the employee with the following schema Department(Id, Name,
Location).

The Primary key in Department table is the Id. Add this attribute to the Employee by

making it the Foreign key in the table. Foreign Key constraint may added during table

creation are by using alter command. Here Altered the table, but creating Foreign Key

during table creation is similar to that of Primary Key.

Dep_Id is now the Foreign Key in table, Employee while it is a Primary Key in the
Department table.
The Foreign key allows you to create a relationship between two tables in the database,

thereby ensuring normalization in relational databases. Each of these tables describes

data related to a particular field (employee and department here). Using the Foreign

key, we can easily retrieve data from both tables

For example, if the Marketing department shifts from Kolkata to Pune, instead of
updating it for all the relevant rows in the Employee table, we can simply update the
location in the Department table.

Composite Keys
A Composite key is a Candidate key or Primary key that consists of more than
one attribute. Sometimes it is possible that no single attribute will have the property
to identify tuples in a table uniquely. In such cases, we can use a group of attributes
to guarantee uniqueness. Combining these attributes will uniquely identify tuples in
the table.

Consider the following table:

Here, neither of the attributes contains unique values to identify the tuples. Therefore,

we can combine two or more attributes to create a key uniquely identifying the tuples.
For example, we can group Transaction_Id and Product_Id to create a key that can

uniquely identify the tuples. These are called composite keys.

Various Keys in SQL - Takeaways

 SQL keys are used to identify rows in a table uniquely

 SQL keys can either be a single column or a group of columns

 A Super key is a single key or a group of multiple keys that can uniquely

identify tuples in a table

 Super keys can contain redundant attributes that might not be important for

identifying tuples

 Super keys are a superset of Candidate keys

 Candidate keys are a subset of Super keys. They contain only those

attributes which are required to identify tuples uniquely

 All Candidate keys are Super keys. But the vice-versa is not true

 A Primary key is a Candidate key chosen to uniquely identify tuples in the

table.

 Primary key values should be unique and non-null

 There can be multiple Super keys and Candidate keys in a table, but there

can be only one Primary key in a table

 Alternate keys are those Candidate keys that were not chosen to be the

Primary key of the table

 A Composite key is a Candidate key with more than one attribute

 A Foreign key is an attribute that is a Primary key in its parent table but is

included as an attribute in the host table

 Foreign keys may accept non-unique and null values


2.1 Normalization
Normalization is a database design technique that reduces data redundancy
and eliminates anomalies like Insertion, Update and Deletion Anomalies.

It divides larger tables into smaller tables and links them using relationships

Need for Database Normalization


There are a few reasons we would want to go through this process:

 It is to eliminate redundant (repetitive) data and ensure data is stored


logically
 Make the database more efficient
 Prevent the same data from being stored in more than one place (called an
“insert anomaly”)
 Prevent updates being made to some data but not others (called an “update
anomaly”)
 Prevent data not being deleted when it is supposed to be, or from data being
lost when it is not supposed to be (called a “delete anomaly”)
 Ensure the data is accurate
 Reduce the storage space that a database takes up
 Ensure the queries on a database run as fast as possible

 1NF: A relation is in 1NF if all its attributes have an atomic value


 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully
functional dependent on the candidate key in DBMS
 3NF: A relation is in 3NF if it is in 2NF and there is no transitive dependency
 BCNF: A relation is in BCNF if it is in 3NF and for every Functional
Dependency, LHS is the super key

Data Anomalies - An anomaly is where there is an issue in the data that is not
meant to be there
Functional dependency is a relationship that exists between two sets of attributes
of a relational table where one set of attributes can determine the value of the other
set of attributes. It is denoted by X -> Y, where X is called a determinant and Y is
called dependent.

Why Do We Carry out Database Normalization?

There are two primary reasons why database normalization is used.

 First, it helps reduce the amount of storage needed to store the data.
 Second, it prevents data conflicts that may creep in because of the
existence of multiple copies of the same data

Database Normalization Example

To understand (DBMS)normalization with example tables, let's assume that


storing the details of courses and instructors in a university. Here is what a sample
database could look like:

Course Instructor
Venue Phone No.
code Name

CS201 SF02 Prof John 9567889890

Prof
CS202 SF03 9089090909
George

CS203 SF04 Prof John 9567889890

Here, the data basically stores the course code, venue, instructor name, and phone
number.

At first, this design seems to be good. However, issues start to develop once need to
modify information. For instance, suppose, if Prof. John changed his mobile number.
In such a situation, have to make edits in 2 places.

If someone just edited the phone number against CS201, but forgot to edit it for
CS203? This will lead to wrong information in the database. This problem can be easily
tackled by dividing our table into 2 simpler tables:
Table 1 (Instructor):

 Instructor ID
 Instructor Name
 Phone no.

Instructor
InstructorID Phone no
Name

9567889890,
F01 Prof John
8975643567

F02 Prof
9089090909
George

F03 Prof John 9567889890

Table 2 (Course):

 Course code
 Venue
 Instructor ID

Course
Venue InstructorID
code

CS201 SF02 F01

CS202 SF03 F02

CS203 SF04 F03

The First Normal Form – 1NF


For a table to be in the first normal form, it must meet the following criteria:

 a single cell must not hold more than one value (atomicity)

 there must be a primary key for identification

 no duplicated rows or columns

 each column must have only one value for each row in the table

 First Normal Form (1NF)

If a relation doesn't have any multivalued attributes, it is said to be in 1NF (first


normal form).
Employee
Employee_Name Employee_PhoneNo.
_code

201 John 9567889890,7868468656

202 George 9089090909

203 Rehen 9567889890

Employee_PhoneNo is a multivalued characteristic in this case. This relationship is,


therefore, not 1NF.Now create new rows for each employee's phone number in order
to transform this table into 1NF.

EmployeeDetail
Employee
Employee_Name Employee_PhoneNo.
_code

201 John 9567889890

201 John 7868468656

202 George 9089090909

203 Rehen 9789405444

The Second Normal Form – 2NF


The 1NF only eliminates repeating groups, not redundancy. The reduction of
partial dependencies is necessary for the normalization of 1NF relations to 2NF.A
table is said to be in 2NF if it meets the following criteria:

 it’s already in 1NF

 has no partial dependency. That is, all non-key attributes are fully dependent
on a primary key

 If there is a partial dependency, split the table in two and relocate the
partially dependent characteristics to another table where they will fit in better

To better understand partial dependency and how to normalize the table to the second
normal form, let's use the following "EmployeeProjectDetail" table as

EmployeeProjectDetail
Employee Project Employee Project
Code ID Name Name

201 P02 John Project123

201 P03 John Project126

202 P04 George Project125


203 P05 Rehen Project124

Employee Code and Project ID are the table's primary attributes in the table. Because
Employee Code may identify Employee Name and Project Name can be determined by
Project ID, have partial dependencies in this table. As a result, the relational table
mentioned above breaks the 2NF constraint.

The EmployeeProjectDetail table can be divided into the following three tables to
eliminate partial dependencies and normalize it into the second normal form.

EmployeeDetail
Employee Employee
Code Name

201 John

201 John

202 George

203 Rehen

EmployeeProject
Employee Project
Code ID

201 P02

201 P03

202 P04

203 P05

ProjectDetail
Project Project
ID Name

P02 Project123

P03 Project126

P04 Project125

P05 Project124

So, by splitting the EmployeeProjectDetail database into the EmployeeDetail,


ProjectDetail, and EmployeeProject tables, were able to convert it into 2NF. The
tables adhere to the same 2NF rules as 1NF, and each non-prime property completely
depends on the primary key.
The Third Normal Form – 3NF
When a table is in 2NF, it eliminates repeating groups and redundancy, but it
does not eliminate transitive partial dependency.

This means a non-prime attribute (an attribute that is not part of the candidate’s key)
is dependent on another non-prime attribute. This is what the third normal form (3NF)
eliminates.

So, for a table to be in 3NF, it must:

 be in 2NF

 have no transitive partial dependency


 Remove any fields that are independent of the key.
 Records in the table should not contain values that are not a part of the key for
that record. In general, whenever a group of fields' contents may be relevant
to more than one record in the table, think about putting those data in a
different table.
 For instance, the name and address of a candidate's institution may be listed in
an employee recruitment table. For group mailings, however, you require a
complete list of universities. There is no way to list universities that don't
currently have any candidates if university information is kept in the Candidates
table. Make a separate Universities table and use a university code key to
connect it to the Candidates table.

 EXCEPTION: Adhering to the third normal form is not always possible, despite
being theoretically desirable. Create separate tables for cities, ZIP codes, sales
reps, customer classes, and any other factor that might be repeated in many
records if you have a customer database and want to eliminate any potential
inter-field dependencies. Normalization in DBMS is worthwhile to pursue in
theory. However, numerous small tables might cause performance to suffer or
go beyond open file and RAM limits.

 Applying the third normal form to regularly changing data may be more
practical. If any dependent fields still exist, build your application to ask the
user to confirm changes to all linked fields.
Employee Employee Employee_zipcode
Employee_City
Code Name

201 John Bangalore 501602

201 John Chennai 641092

202 George Coimbatore 641201

203 Rehen Madurai 652301


 Employee Code -> Employee City transitive dependency prevents the table
mentioned above from being in 3NF because:

 Customer Code -> Customer Zipcode

 Employee City -> Employee Zipcode

 Additionally, neither Employee City nor Employee Zipcode is the primary


attribute.

 We can divide the "EmployeeDetail" table into the following two tables to
eliminate transitive dependency from it and normalize it into the third normal
form:

<EmployeeDetail>

Employee Employee Employee_zipcode


Code Name

201 John 501602

201 John 641092

202 George 641201

203 Rehen 652301

<EmployeeLocation>

Employee_City Employee_zipcode

Bangalore 501602

Chennai 641092

Coimbatore 641201

Madurai 652301

By breaking the "EmployeeDetail" data down into the "EmployeeDetail" and


"EmployeeLocation" tables, which are both in 2NF and do not have any transitive
dependencies, we were able to transform the "EmployeeDetail" table into 3NF.
The 2NF and 3NF remove any redundant dependence on candidate keys and set certain
additional restrictions on them. There may, however, still be certain dependencies that
result in database redundancy. A stricter normal form called BCNF eliminates these
redundant elements.

BCNF (Boyce-Codd Normal Form)

Boyce-Codd As it has more limitations than 3NF, Normal Form is an improved form of
3NF. A relational table must conform to the following conditions to be in Boyce-Codd
normal form:

The third normal form of the table is required. X is the table's superkey for every non-
trivial functional dependence X -> Y. As a result, if Y is a prime attribute, X cannot be
a non-prime attribute.

A group of one or more attributes known as a superkey can be used to identify a row
in a database table specifically. To better understand how to normalize the table to the
BCNF, let's use the following "EmployeeProjectLead" table as an example:

Employee Project
ProjectLeader
Code ID

201 P02 Jonson

201 P03 Swetha

202 P04 Arun

203 P05 Daniel

The preceding table satisfies all conventional forms up to 3NF. However, since its
candidate key is "Employee Code, Project ID," it defies BCNF's criteria. Project Leader
is a non-prime property for the non-trivial functional dependence Project Leader ->
Project ID, whereas Project ID is a prime attribute. In BCNF, this is not permitted.
Divide the given table into three tables and then translate them into BCNF.

<EmployeeProject>

Employee Code Project ID

201 P02

201 P03
202 P04

203 P05

<ProjectLead>

Project ID ProjectLeader

P02 Jonson

P03 Swetha

P04 Arun

P05 Daniel

As a result, divided the "EmployeeProjectLead" database into the "EmployeeProject"


and the "ProjectLead" tables before converting it to BCNF.

4NF (Fourth Normal Form)


1. It should be in the Boyce-Codd Normal Form.

2. the table should not have any Multi-valued Dependency

For example, consider a functional Dependency A → B. This dependency is

multivalued if B has more than one value.

Let us understand 4NF with an example. Consider the table given below.

student_id student_subject student_hobby

S011 Science Reading Books

Listening to
S012 English
music

S011 Maths Playing guitar

Watching
S013 Biology
movies
S014 Civics Cooking

In the above table, student_subject and student_hobby are independant of each

other, but, both of them are dependant on student_id. Hence, the above relation is

not in 4NF.

This can be removed by decomposing the tables into two tables.

Student_Subject Table

student_id student_subject

S011 Science

S012 English

S011 Maths

S013 Biology

S014 Civics

Student_Hobby Table

student_id student_hobby

S011 Reading Books

S012 Listening to music

S011 Playing guitar

S013 Watching movies

S014 Cooking
5NF (Fourth Normal Form)
1. It should be in the Fourth Normal Form.

2. It should not have any Join Dependency and there shouldn't be any losses while
perfoming the join function.

3. The relation can be considered to be in 5nf if it has been broken down to as many tables

as possible and it has passed through all the Normal Forms. This helps to diminish the data

redundancy.

Let us try to understand this with the help of an example.

teacher_subject teacher_name semester

Maths David 2nd Sem

Maths Arya 2nd Sem

English Arya 2nd Sem

English Ashwini 3rd Sem

Science Prateek 2nd Sem

According to the table above, Arya handles both maths and english classes for 2nd sem, but

not for 3rd sem. To identify a valid data in this situation, a combination of all these fields is needed.

Consider a scenario in which we add a new semester say Semester 4 but are unsure of the

subject or the students who will be enrolled in it, so we leave teacher_name and teacher_subject

as NULL. We cannot, however, leave the other two columns empty because the three columns

together function as a primary key.

Due to these issues the above relation is not in 5NF. This can be achieved by breaking the

table down to three tables as given below.


Semester_Subject Table

Semester teacher_subject

2nd Sem Maths

2nd Sem English

2nd Sem Science

3rd Sem English

Teacher_Subject Table

teacher_subject teacher_name

Maths David

Maths Arya

English Arya

English Ashwini

Science Prateek

Semester_Teacher Table

teacher_name semester

David 2nd Sem

Arya 2nd Sem

Arya 2nd Sem

Ashwini 3rd Sem

Prateek 2nd Sem


Advantages of Normalization

There are many benefits to normalizing a database. Some of the main advantages are
as follows:

1. Utilizing normalized database or data redundancy


2. Duplication might be removed.
3. Reduced null values may result from normalization.
4. results in a more compact database (since there is less data duplication or
zero).
5. Reduce/avoid problems caused by data alteration.
6. It simplifies the questions.
7. The database's structure is clearer and easier to understand.
8. The database can have existing data added to it without any negative effects.

Disadvantages of Normalization
There are various drawbacks to normalizing a database. A few disadvantages are as
follows:
1. When information is dispersed over many tables, it becomes necessary to link
them together, extending the work. Additionally, the database becomes more
intriguing to recognize.
2. Tables will include codes rather than actual data since rewritten data will be
saved as lines of numbers rather than actual data. As a result, the query table
must constantly be consulted.
3. Being designed for programs rather than ad hoc querying, the information
model proves to be exceedingly difficult to query. It is made up of SQL that has
been accumulated through time, and operating framework cordial query devices
often carry out this task. As a result, it might be difficult to demonstrate
knowledge and understanding without first comprehending the client's needs.
4. The show's pace gradually slows down compared to the typical structural type.
5. To successfully finish the standardization cycle, it is vital to have a thorough
6. understanding of the many conventional structures. A bad plan with
substantial irregularities and data inconsistencies can result from careless
use.
2.2 Armstrong Axioms
 We can use closures of attributes to determine if any FD follows from a given
set of FDs.
 Use Armstrong's axioms: complete set of inference rules from which it is
possible to derive every FD that follows from a given set.

Sound and Complete Axioms:


• Sound Axioms - any dependency inferred from F using the primary rules of
the Armstrong axioms will be valid in all applicable relation states.
• Complete Axioms - we have derived all the relevant dependencies that can
be inferred from the given set of functional dependencies.
2.3 Closure of an Attribute Set

● The set of all those attributes which can be functionally determined from an
attribute set is called as a closure of that attribute set.
● Closure of attribute set {X} is denoted as {X}+.

Steps to Find Closure of an Attribute Set

Following steps are followed to find the closure of an attribute set-

Step-01:

Add the attributes contained in the attribute set for which closure is being calculated to the
result set.

Step-02:

Recursively add the attributes to the result set which can be functionally determined from
the attributes already contained in the result set.
Example:
Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies-
A → BC BC → DE
D → F CF → G

Now, let us find the closure of some attributes and attribute sets-

Closure of attribute A-

A+ = { A }
={A,B,C } ( Using A → BC )
={A,B,C , D , E } ( Using BC → DE )
={A,B,C , D , E , F } ( Using D → F )
={A,B,C , D , E , F , G } ( Using CF → G )
Thus,

A+ = { A , B , C , D , E , F , G }

Closure of attribute D-

D+ = { D }
= { D , F } ( Using D → F )

We can not determine any other attribute using attributes D and F contained in the result set.
Thus, D+ = { D , F }
Closure of attribute set {B, C}-
{ B , C }+= { B , C }
= { B , C , D , E } ( Using BC → DE )
= { B , C , D , E , F } ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
Thus,
{ B , C }+ = { B , C , D , E , F , G }

Finding the Keys Using Closure-


Super Key-
If the closure result of an attribute set contains all the attributes of the relation, then
that attribute set is called as a super key of that relation.
Thus, we can say-

“The closure of a super key is the entire relation schema.”

In the above example,

 The closure of attribute A is the entire relation schema.


 Thus, attribute A is a super key for that relation.
Candidate Key-

If there exists no subset of an attribute set whose closure contains all the attributes of
the relation, then that attribute set is called as a candidate key of that relation.

In the above example,

 No subset of attribute A contains all the attributes of the relation.


 Thus, attribute A is also a candidate key for that relation.

Finding Candidate Keys-

We can determine the candidate keys of a given relation using the following steps-

Step-01:

 Determine all essential attributes of the given relation.


 Essential attributes are those attributes which are not present on RHS of any
functional dependency.
 Essential attributes are always a part of every candidate key.
 This is because they can not be determined by other attribute.

Example

Let R(A, B, C, D, E, F) be a relation scheme with the following functional


dependencies-

A→BC→DD→E
Here, the attributes which are not present on RHS of any functional dependency
are A, C and F. So, essential attributes are- A, C and F.

Step-02:

 The remaining attributes of the relation are non-essential attributes.


 This is because they can be determined by using essential attributes.

Now, following two cases are possible-

Case-01: If all essential attributes together can determine all remaining non-essential
attributes, then-
 The combination of essential attributes is the candidate key.
 It is the only possible candidate key.

Case-02:

 If all essential attributes together can not determine all remaining non-essential
attributes, then-
 The set of essential attributes and some non-essential attributes will be the
candidate key(s).
 In this case, multiple candidate keys are possible.
 To find the candidate keys, we check different combinations of essential and
non-essential attributes.

Let R = (A, B, C, D, E, F) be a relation scheme with the following dependencies-

C → F E → A EC → D

A→B

Which of the following is a key for R?

1.CD

2.EC

3.AE

4.AC

Also, determine the total number of candidate keys and super keys.

Step-01:

 Determine all essential attributes of the given relation.


 Essential attributes of the relation are- C and E.
 So, attributes C and E will definitely be a part of every candidate key.

Step-02:

We will check if the essential attributes together can determine all remaining non-
essential attributes.
To check, we find the closure of CE

So, we have-

{ CE }+

={C,E}

= { C , E , F } ( Using C → F )

= { A , C , E , F } ( Using E → A )

= { A , C , D , E , F } ( Using EC → D )

= { A , B , C , D , E , F } ( Using A → B )

We conclude that CE can determine all the attributes of the given relation. So, CE is
the only possible candidate key of the relation.

2.4 Decomposition in DBMS

• The process of decomposition in DBMS helps us remove redundancy,


inconsistencies and anomalies from a database when we divide the table into
numerous tables.
• The term decomposition refers to the process in which we break down a table
in a database into various elements or parts.

Decomposition is of two major types in DBMS:

• Lossless
• Lossy

Lossless Decomposition:

• A decomposition is said to be lossless when it is feasible to reconstruct the


original relation R using joins from the decomposed tables.
• Let us take ‘A’ as the Relational Schema, having an instance of ‘a’. Consider
that it is decomposed into: A1, A2, A3, . . . . An; with instance: a1, a2, a3, . .
.. an, If a1 ⋈ a2 ⋈ a3 . . . . ⋈ an, then it is known as ‘Lossless Join
Decomposition’.

Lossy Decomposition:
• Whenever we decompose a relation into multiple relational schemas, then the
loss of data/information is unavoidable whenever we try to retrieve the original
relation.

Properties of Decomposition:

Decomposition must have the following properties:

1. Decomposition Must be Lossless

2. Dependency Preservation

3. Lack of Data Redundancy

1. Decomposition must be lossless:

• Decomposition must always be lossless, which means the information must


never get lost from a decomposed relation.

2. Dependency Preservation:

• Dependency is a crucial constraint on a database, and a minimum of one


decomposed table must satisfy every dependency.
• If {P → Q} holds, then two sets happen to be dependent functionally.

3. Lack of Data Redundancy:

It is also commonly termed as a repetition of data/information. When


decomposition is careless, it may cause issues with the overall data in the
database.

Examples:

• Apply Natural Join decomposition on the below two tables:

Cust_ID Cust_Name Cust_Age Cust_Location

S001 Monica 22 Texas

S002 Rachel 33 Toronto

S003 Phoebe 44 Minnesota


Sec_ID Cust_ID Sec_Name

Sec1 S001 Accounts

Sec2 S002 Marketing

Sec3 S003 Telecom

Lossless Decomposition:

Sec_ID Cust_ID Sec_Name

Sec1 S001 Accounts

Sec2 S002 Marketing

Sec3 S003 Telecom

Lossy Decomposition:

Relational Schema = A (X, Y, Z)

Decompositions,

A1 (X, Y)

A2 (X, Z)

Example:

X Y Z

X1 Y1 Z1

X2 Y1 Z1

X1 Y2 Z2

X1 Y3 Z3

X Y

X1 Y1
X2 Y1

X1 Y2

X1 Y3

X Z

X1 Z1

X2 Z1

X1 Z2

X1 Z3

Now, if we want this decomposition to be lossy, then

A ⊂ A1 ⨝ A2

Thus, A1 ⨝ A2 will be equal to

X Y Z

X1 Y1 Z1

X1 Y1 Z2

X2 Y1 Z1

X1 Y2 Z2

X1 Y2 Z1

X1 Y3 Z3

X1 Y3 Z1
Dependency Preservation:

• Let Fi be the set of dependencies F+ that includes only attributes in Ri


– A decomposition is dependency preserving, if (F1 È F2 È … È Fn)+ = F+
– If it is not, then checking updates for violation of functional
dependencies may require computing joins, which is expensive
• A Dependency preserving decomposition of a relation R is R1, R2, R3...Rn
concerning the set of Functional Dependencies FD if,
• (FD1 ∪ FD2 ∪ ... ∪ FDn)+ = FD+ where,
• FD1, FD2, FD3…...FDn Sets of Functional dependencies of relations R1, R2, R3
...Rn.
• (FD1 U FD2 U FD3 U … U FDn)+ -> Closure of Union of all sets of functional
dependencies.
• FD+ -> Closure of set of functional dependency FD of R.

• With FD (FD1) R is decomposed or divided into R1 and with FD(FD2) into R2,
then the possibility of three cases arise,
• FD1 ∪ FD2 = FD -> Decomposition is dependency preserving.
• FD1 ∪ FD2 is a subset of FD -> Not Dependency preserving.
• FD1 ∪ FD2 is a superset of FD -> This case is not possible.

Example:

• Let suppose, a relation R (P, Q, R, S) with a set of Functional


Dependency FD = (PQ→R, R→S, S→P) is given. Into R1 (P, Q, R) and
R2(R, S), relation R (P, Q, R, S) is decomposed. Find out whether the
decomposition is dependency preserving or not.

Solution:

Decomposed relations of relation R (P, Q, R, S) are R1 (P, Q, R) and

R2 (R, S). To solve this problem, we need to first find the closure of Functional
Dependencies FD1 and FD2 of the relations R1 (P, Q, R) and R2(R, S).

1) To find the closure of FD1, we have to consider all combinations of (P, Q, R). i.e.,
we need to find out the closure of P, Q, R, PQ, QR, and RP.
closure (P) = {P} // Trivial

closure (Q) = {Q} // Trivial

closure (R) = {R, P, S} //but S can't be in closure as S is not

//present in R1 (P, Q, R).

= {R, P}

(R--> P // Removing R from right side as it is trivial attribute)

closure (PQ) = {P, Q, R, S}

= {P, Q, R}

(PQ --> R // Removing PQ from right side as these are trivial attributes)

closure (QR) = {Q, R, S, P}

= {P, Q, R}

(QR --> P // Removing QR from right side as these are trivial attributes)

Closure (PR) = {P, R, S}

(PR --> S // Removing PR from right side as these are trivial attributes)

FD1 {R --> P, PQ --> R, QR --> P}.

2) Similarly FD2 {R--> S}

In the original Relation Dependency FD= {PQ→R, R→S, S→P}.

PQ --> R is present in FD1.

R --> S is present in FD2.

S --> P is not preserved.

• From the given result, in FD1, PQ holds R (PQ --> R) and in FD2, R holds S (R
--> S). But, there is no follow up in Functional Dependency S holds P (S --> P).
• FD1 U FD2 is a subset of FD.
• So as a consequence, given decomposition is not dependency preserving.

You might also like