Unit 1 Notes DBMS | PDF | Databases | Client–Server Model

Unit 1 Notes DBMS

notes dbms

Uploaded by

Krrai

DATABASE SYSTEM

PAPER 105
Unit 1
1. What do you understand about DBMS?
What is DBMS?

Database Management System (DBMS) is software for storing and retrieving users’ data while
considering appropriate security measures. It consists of a group of programs that manipulate
the database. The DBMS accepts the request for data from an application and instructs the
operating system to provide the specific data. In large systems, a DBMS helps users and other
third-party software store and retrieve data. DBMS allows users to create their own databases
as per their requirements. A DBMS serves both the users of the database and other
application programs: it provides an interface between the data and the software application.

Example of a DBMS

Let us see a simple example of a university database. This database maintains information
concerning students, courses, and grades in a university environment. The database is
organized as five files:
• The STUDENT file stores the data of each student.
• The COURSE file stores data on each course.
• The SECTION file stores information about the sections of a particular course.
• The GRADE file stores the grades that students receive in the various sections.
• The TUTOR file contains information about each professor.

To define the database:

We need to specify the structure of the records of each file by defining the different types of
data elements to be stored in each record.
We can also use a coding scheme to represent the values of a data item.
In effect, the database will have five tables, linked to one another by foreign keys.
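As a rough sketch of the five-table layout described above, the following uses Python's built-in sqlite3 module. The notes name only the five files, so every column name (student_id, course_id, and so on), the sample rows, and the letter-grade coding scheme are assumptions made for illustration.

```python
import sqlite3

# In-memory database; all column names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tutor   (tutor_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE section (
    section_id INTEGER PRIMARY KEY,
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    tutor_id   INTEGER NOT NULL REFERENCES tutor(tutor_id)
);
CREATE TABLE grade (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    section_id INTEGER NOT NULL REFERENCES section(section_id),
    -- a simple coding scheme: grades are restricted to letter codes
    grade      TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C', 'D', 'F')),
    PRIMARY KEY (student_id, section_id)
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO course VALUES (10, 'Database Systems')")
conn.execute("INSERT INTO tutor VALUES (100, 'Dr. Rao')")
conn.execute("INSERT INTO section VALUES (1000, 10, 100)")
conn.execute("INSERT INTO grade VALUES (1, 1000, 'A')")


def try_insert_orphan_grade():
    """Try to record a grade for a student that does not exist."""
    try:
        conn.execute("INSERT INTO grade VALUES (999, 1000, 'B')")
    except sqlite3.IntegrityError:
        return False  # rejected by the foreign key on student_id
    return True
```

Calling try_insert_orphan_grade() should return False here, because the foreign key stops a grade from pointing at a missing student. That is the kind of cross-file check a set of independent flat files would not perform on its own.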

History of DBMS
Here are the important landmarks from the history of DBMS:

• Early 1960s – Charles Bachman designed the first DBMS, the Integrated Data Store (IDS)
• 1970 – E. F. Codd of IBM published the relational model of data. (IBM's Information
Management System, IMS, is a separate, hierarchical DBMS and was not introduced by Codd.)
• 1976 – Peter Chen coined and defined the Entity-relationship model, also known as the
ER model
• 1980 – Relational Model becomes a widely accepted database component
• 1985 – Object-oriented DBMS develops.
• 1990s – Incorporation of object-orientation in relational DBMS.
• 1992 – Microsoft ships MS Access, a personal DBMS that goes on to displace many other
personal DBMS products.
• 1995 – First Internet database applications
• 1997 – XML applied to database processing. Many vendors begin to integrate XML into
DBMS products.

Characteristics of DBMS
Here are the characteristics and properties of a Database Management System:

• Provides security and reduces redundancy
• Self-describing nature of a database system
• Insulation between programs and data, and data abstraction
• Support of multiple views of the data
• Sharing of data and multiuser transaction processing
• Database Management Software allows entities and the relations among them to form tables.
• It follows the ACID concept (Atomicity, Consistency, Isolation, and Durability).
• DBMS supports a multi-user environment that allows users to access and manipulate
data in parallel.

DBMS vs. Flat File


DBMS | Flat File Management System
Multi-user access | It does not support multi-user access
Designed to fulfill the needs of small and large businesses | It is limited to smaller systems
Reduces redundancy and enforces integrity | Redundancy and integrity issues
Expensive, but in the long term the Total Cost of Ownership is low | It's cheaper
Easy to implement complicated transactions | No support for complicated transactions

Users of DBMS
The following are the various categories of DBMS users:

Component Name | Task
Application Programmers | Application programmers write programs in various programming languages to interact with databases.
Database Administrators | The database administrator (DBA) is responsible for managing the entire DBMS.
End-Users | End users are the people who interact with the database management system. They perform operations on databases such as retrieving, updating, and deleting data.

Popular DBMS Software
Here is the list of some popular DBMS systems:

• MySQL
• Microsoft Access
• Oracle
• PostgreSQL
• dBASE
• FoxPro
• SQLite
• IBM DB2
• LibreOffice Base
• MariaDB
• Microsoft SQL Server

Application of DBMS
Below are the popular database system applications:

Sector | Use of DBMS
Banking | For customer information, account activities, payments, deposits, loans, etc.
Airlines | For reservations and schedule information.
Universities | For student information, course registrations, colleges, and grades.
Telecommunication | It helps to keep call records, monthly bills, maintain balances, etc.
Finance | For storing information about stock, sales, and purchases of financial instruments like stocks and bonds.
Sales | Used for storing customer, product, and sales information.
Manufacturing | It is used to manage the supply chain and to track the production of items and the status of inventories in warehouses.
HR Management | For information about employees, salaries, payroll, deductions, generation of paychecks, etc.

Types of Databases

There are various types of databases used for storing different varieties of data:

1) Centralized Database
This type of database stores data in a single, centralized database system. It lets users
access the stored data from different locations through several applications. These
applications include an authentication process so that users can access the data securely. An example of
a centralized database is a Central Library that holds the central database of every library in
a college/university.
Advantages of Centralized Database
o It reduces the risk in data management, i.e., manipulation of data will not affect
the core data.
o Data consistency is maintained as data is managed in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.
Disadvantages of Centralized Database
o The size of the centralized database is large, which increases the response time for
fetching the data.
o It is not easy to update such an extensive database system.
o If the server fails, all of the data becomes unavailable and, without backups, may be lost
entirely.

2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among different
database systems of an organization. These database systems are connected via communication
links. Such links help the end-users to access the data easily. Examples of the Distributed
database are Apache Cassandra, HBase, Ignite, etc.
We can further divide a distributed database system into:

o Homogeneous DDB: Database systems that run on the same operating
system, use the same application process, and run on the same hardware.
o Heterogeneous DDB: Database systems that run on different operating
systems, under different application procedures, and on different hardware.
Advantages of Distributed Database
o Modular development is possible in a distributed database, i.e., the system can be
expanded by including new computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.

3) Relational Database
This database is based on the relational data model, which stores data in the form of rows (tuples)
and columns (attributes) that together form a table (relation). A relational database uses SQL
for storing, manipulating, and maintaining the data. E. F. Codd proposed the relational model in
1970. Each table in the database has a key that uniquely identifies each of its rows.
Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
Properties of Relational Database
Transactions in a relational database have four commonly known properties, known as the ACID
properties, where:
• A means Atomicity: A transaction either completes in full or has no effect at all.
It follows the 'all or nothing' strategy. For example, a transaction will either be
committed or will abort.
• C means Consistency: A transaction must take the database from one valid state to
another. For example, when money is transferred between two accounts, the total balance before
and after the transaction should be the same, i.e., it should remain conserved.
• I means Isolation: Several users can access data in the database at the same
time, so concurrent transactions must be kept isolated from one another. For example, when
multiple transactions occur at the same time, the effects of one transaction should not be visible to
the other transactions until it commits.
• D means Durability: Once a transaction completes and commits its
data, the changes should remain permanent.
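The atomicity and consistency properties above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a description of any particular DBMS. The account table, the non-negative balance rule, and the transfer helper are all assumptions introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed consistency rule for this sketch: a balance may never go negative.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so the credit was undone too


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok_small = transfer(conn, 1, 2, 30)   # succeeds
ok_large = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
final = balances(conn)
```

After the failed transfer, the credit to account 2 is undone along with the rejected debit, so the combined balance of the two accounts stays the same. Isolation and durability are not exercised by this single-connection, in-memory sketch.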

4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets.
It is not a relational database as it stores data not only in tabular form but in several different
ways. It came into existence when the demand for building modern applications increased.
Thus, NoSQL presented a wide variety of database technologies in response to the demands.
We can further divide a NoSQL database into the following four types:

• Key-value storage: It is the simplest type of database storage, where every
single item is stored as a key (or attribute name) together with its value.
• Document-oriented Database: A type of database used to store data as JSON-like
documents. It helps developers store data in the same document-model format as is
used in the application code.
• Graph Databases: These are used for storing vast amounts of data in a graph-like structure.
Most commonly, social networking websites use graph databases.
• Wide-column stores: These resemble relational tables in that data is arranged in rows and
columns, but the data is stored grouped by column rather than row by row.
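To make the first three shapes concrete, here is a small sketch using only plain Python structures. It is not the API of any real NoSQL product, and the keys, field names, and the find helper are hypothetical.

```python
import json

# Key-value storage: each item is a key holding a value the store does not interpret.
kv_store = {}
kv_store["session:42"] = "user=asha;theme=dark"

# Document store: each record is a JSON-like document; documents in the same
# collection need not share the same fields.
documents = [
    {"_id": 1, "name": "Asha", "emails": ["asha@example.com"]},
    {"_id": 2, "name": "Ravi", "address": {"city": "Pune"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Graph data: nodes plus edges, held here as an adjacency mapping.
follows = {"asha": {"ravi"}, "ravi": {"asha", "meera"}, "meera": set()}

# A document survives a round trip through JSON text unchanged.
encoded = json.dumps(documents[1])
decoded = json.loads(encoded)
```

For instance, find(documents, name="Ravi") returns the second document, even though it has an address field that the first document lacks.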
Advantages of NoSQL Database
o It enables good productivity in the application development as it is not required to store
data in a structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.
5) Cloud Database
A cloud database is a type of database in which data is stored in a virtual environment and runs
on a cloud computing platform. It provides users with various cloud computing services (SaaS, PaaS,
IaaS, etc.) for accessing the database. There are numerous cloud platforms; commonly cited
options include:
o Amazon Web Services (AWS)
o Microsoft Azure
o Kamatera
o phoenixNAP
o ScienceSoft
o Google Cloud SQL, etc.

6) Object-oriented Databases
This type of database uses the object-based data model to store data in the
database system. The data is represented and stored as objects, similar to the objects
used in object-oriented programming languages.

7) Hierarchical Databases
It is the type of database that stores data in the form of parent-children relationship nodes. Here,
it organizes data in a tree-like structure.

Data is stored in the form of records that are connected via links. Each child record in the tree
has exactly one parent. On the other hand, each parent record can have multiple child
records.

8) Network Databases
It is the database that typically follows the network data model. Here, the representation of data
is in the form of nodes connected via links between them. Unlike the hierarchical database, it
allows each record to have multiple children and parent nodes to form a generalized graph
structure.

9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This database is
basically designed for a single user.
Advantage of Personal Database
o It is simple and easy to handle.
o It occupies less storage space as it is small in size.

10) Operational Database


An operational database is one in which data is created and updated in real time. It is
designed for executing and handling the daily data operations of a business. For
example, an organization uses an operational database to manage its day-to-day transactions.
11) Enterprise Database
Large organizations or enterprises use this database for managing a massive amount of data. It
helps organizations to increase and improve their efficiency. Such a database allows
simultaneous access to users.
Advantages of Enterprise Database:
o Multiple processes are supported on an Enterprise database.
o It allows executing parallel queries on the system.

2. What is the difference between database system vs file system?

Basics | File System | DBMS
Structure | The file system is a way of arranging files in a storage medium within a computer. | DBMS is software for managing the database.
Data Redundancy | Redundant data can be present in a file system. | In a DBMS, redundancy is controlled and kept to a minimum.
Backup and Recovery | It doesn't provide an inbuilt mechanism for backup and recovery of data if it is lost. | It provides built-in tools for backup and recovery of data if it is lost.
Query processing | There is no efficient query processing in the file system. | Efficient query processing is available in DBMS.
Consistency | There is less data consistency in the file system. | There is more data consistency because of the process of normalization.
Complexity | It is less complex as compared to DBMS. | It is more complex to handle as compared to the file system.
Security Constraints | File systems provide less security in comparison to DBMS. | DBMS has more security mechanisms as compared to file systems.
Cost | It is less expensive than DBMS. | It has a comparatively higher cost than a file system.
Data Independence | There is no data independence. | In DBMS data independence exists, mainly of two types: 1) Logical Data Independence. 2) Physical Data Independence.
User Access | Only one user can access data at a time. | Multiple users can access data at a time.
Meaning | The user has to write procedures for managing the data. | The users are not required to write procedures.
Sharing | Data is distributed in many files, so it is not easy to share data. | Due to its centralized nature, data sharing is easy.
Data Abstraction | It exposes the details of storage and representation of data. | It hides the internal details of the database.
Integrity Constraints | Integrity constraints are difficult to implement. | Integrity constraints are easy to implement.
Attributes | To access data in a file, the user requires attributes such as file name and file location. | No such attributes are required.
Example | File-processing programs written in languages such as COBOL or C++ | Oracle, SQL Server

3. Explain the architecture of DBMS?


A database stores a lot of critical information that must be accessed quickly and securely. Hence it is
important to select the correct architecture for efficient data management. DBMS
architecture helps users get their requests done while connecting to the database. We
choose a database architecture depending on several factors, such as the size of the database, the
number of users, and the relationships between the users. Two types of database model
are generally used: the logical model and the physical model. The several types of database
architecture are described in the next section.
Types of DBMS Architecture
There are several types of DBMS Architecture that we use according to the usage
requirements. Types of DBMS Architecture are discussed here.
• 1-Tier Architecture
• 2-Tier Architecture
• 3-Tier Architecture
1-Tier Architecture

In 1-Tier Architecture the database is directly available to the user: the user works directly
on the DBMS, that is, the client, server, and database are all present on the same
machine. For example, to learn SQL we set up an SQL server and the database on the local
system. This enables us to interact directly with the relational database and execute
operations. Industry rarely uses this architecture in production; it generally opts for 2-Tier or
3-Tier Architecture.

DBMS 1-Tier Architecture

Advantages of 1-Tier Architecture


The advantages of 1-Tier Architecture are listed below.
• Simple Architecture: 1-Tier Architecture is the simplest architecture to set up,
as only a single machine is required to maintain it.
• Cost-Effective: No additional hardware is required for implementing 1-Tier
Architecture, which makes it cost-effective.
• Easy to Implement: 1-Tier Architecture can be easily deployed, and hence it is
mostly used in small projects.

2-Tier Architecture

The 2-tier architecture is similar to a basic client-server model. The application at the client
end directly communicates with the database on the server side. APIs like ODBC and JDBC
are used for this interaction. The server side is responsible for providing query processing
and transaction management functionalities. On the client side, the user interfaces and
application programs are run. The application on the client side establishes a connection with
the server side in order to communicate with the DBMS.
An advantage of this type is that it is easier to maintain and understand, and it is compatible
with existing systems. However, this model gives poor performance when there are a large
number of users.
DBMS 2-Tier Architecture

Advantages of 2-Tier Architecture


• Easy to Access: 2-Tier Architecture gives direct access to the database, which
makes retrieval fast.
• Scalable: We can scale the database easily, by adding clients or by upgrading
hardware.
• Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture and Multi-Tier
Architecture.
• Easy Deployment: 2-Tier Architecture is easier to deploy than 3-Tier Architecture.
• Simple: 2-Tier Architecture is simple and easy to understand because it has
only two components.

3-Tier Architecture

In 3-Tier Architecture, there is another layer between the client and the server. The client
does not directly communicate with the server. Instead, it interacts with an application server
which further communicates with the database system and then the query processing and
transaction management takes place. This intermediate layer acts as a medium for the
exchange of partially processed data between the server and the client. This type of
architecture is used in the case of large web applications.
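The layering described above can be sketched in a single Python file, with ordinary function calls standing in for the network hops between tiers. In a real deployment each tier would typically run as a separate process or on a separate machine. The product table and the function names get_product and render_product_page are hypothetical.

```python
import sqlite3

# Data tier: the database itself.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.execute("INSERT INTO product VALUES (1, 'Notebook', 2.5)")
db.commit()


# Application tier: the only layer that talks to the database.
def get_product(product_id):
    """Validate the request, query the data tier, and return a plain dict."""
    if not isinstance(product_id, int) or product_id <= 0:
        raise ValueError("product_id must be a positive integer")
    row = db.execute(
        "SELECT id, name, price FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "name": row[1], "price": row[2]}


# Presentation tier: the client calls the application tier, never the database.
def render_product_page(product_id):
    """Format what the application tier returns for display to the user."""
    product = get_product(product_id)
    if product is None:
        return "Product not found"
    return f"{product['name']}: {product['price']:.2f}"
```

Because render_product_page never issues SQL itself, input validation and access rules can live in one place in the middle tier.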

DBMS 3-Tier Architecture


Advantages of 3-Tier Architecture
• Enhanced scalability: Scalability is enhanced due to the distributed deployment of
application servers. Individual connections no longer need to be made between each client and
the server.
• Data Integrity: 3-Tier Architecture maintains data integrity. Since there is a middle
layer between the client and the server, data corruption can be avoided or removed.
• Security: 3-Tier Architecture improves security. This type of model prevents direct
interaction of the client with the server, thereby reducing unauthorized access to data.
Disadvantages of 3-Tier Architecture
• More Complex: 3-Tier Architecture is more complex than 2-Tier
Architecture. The number of communication points is also doubled in 3-Tier Architecture.
• Difficult to Interact: Direct interaction between the client and the database becomes
difficult because of the middle layer.

4. What do you understand about data models? Explain different types of data models?
Data Models

A data model is the modeling of the data description, data semantics, and consistency constraints
of the data. It provides the conceptual tools for describing the design of a database at each level
of data abstraction. The following four data models are used for understanding the
structure of the database:

1) Relational Data Model: This type of model organizes the data in the form of rows and
columns within a table. Thus, a relational model uses tables for representing data and the
relationships among the data. Tables are also called relations. This model was first described by
Edgar F. Codd in 1969 and published in 1970. The relational data model is the most widely used
model and is used primarily by commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as
objects and relationships among them. These objects are known as entities, and a relationship is
an association among these entities. This model was designed by Peter Chen and published in
a 1976 paper. It is widely used in database design. A set of attributes describes each entity.
For example, student_name and student_id describe the 'student' entity. A set of entities of the
same type is known as an 'entity set', and a set of relationships of the same type is known as a
'relationship set'.

3) Object-based Data Model: An extension of the ER model with notions of functions,
encapsulation, and object identity. This model supports a rich type system that includes
structured and collection types. In the 1980s, various database systems following the
object-oriented approach were developed. Here, an object is the data together with its
properties.

4) Semistructured Data Model: This type of data model is different from the other three data
models (explained above). The semistructured data model permits data specifications
in which individual data items of the same type may have different sets of attributes. The
Extensible Markup Language, also known as XML, is widely used for representing
semistructured data. Although XML was initially designed for adding markup
information to text documents, it has gained importance because of its use in the exchange
of data.

A data model gives us an idea of how the final system will look after its complete
implementation. It defines the data elements and the relationships between the data elements.
Data models are used to show how data is stored, connected, accessed, and updated in the
database management system. Here, we use a set of symbols and text to represent the
information so that members of the organisation can communicate and understand it. Although
many data models are in use nowadays, the Relational model is the most widely
used. Apart from the Relational model, there are many other types of data models,
which are described in the sections that follow. Some of the Data Models in DBMS are:

1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
5. Object-Oriented Data Model
6. Object-Relational Data Model
7. Flat Data Model
8. Semi-Structured Data Model
9. Associative Data Model
10. Context Data Model
Hierarchical Model
Hierarchical Model was the first DBMS model. This model organises the data in the
hierarchical tree structure. The hierarchy starts from the root which has root data and then it
expands in the form of a tree adding child node to the parent node. This model easily
represents some of the real-world relationships like food recipes, sitemap of a website
etc. Example: We can represent the relationship between the shoes present on a shopping
website in the following way:

Features of a Hierarchical Model

1. One-to-many relationship: The data here is organised in a tree-like structure where
the relationship between records is one-to-many. Also, there can be only one path from
the root to any node. Example: In the above example, if we want to go to the node sneakers we
have only one path to reach it, i.e., through the men's shoes node.
2. Parent-Child Relationship: Each child node has a parent node, but a parent node can
have more than one child node. Multiple parents are not allowed.
3. Deletion Problem: If a parent node is deleted then its child nodes are automatically
deleted.
4. Pointers: Pointers are used to link the parent node with the child node and are used
to navigate between the stored data. Example: In the above example the 'shoes' node points
to two other nodes, the 'women shoes' node and the 'men's shoes' node.
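The features listed above (a single parent per node, one path from the root, and deletion that cascades to children) can be sketched with a small tree class. The Node class and helper functions are hypothetical. The four node names come from the shoes example in the text, and the exact shape of the original figure is assumed.

```python
class Node:
    """A record in a hierarchical model: at most one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # pointer to the single parent (None for the root)
        self.children = []        # pointers to the child nodes
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """The single path from the root down to this node."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


def delete(node):
    """Detach a node from its parent; its whole subtree goes with it."""
    if node.parent is not None:
        node.parent.children.remove(node)
        node.parent = None


def count(node):
    """Number of nodes reachable from `node`, including itself."""
    return 1 + sum(count(child) for child in node.children)


shoes = Node("shoes")
women = Node("women shoes", shoes)
men = Node("men's shoes", shoes)
sneakers = Node("sneakers", men)
```

Here sneakers.path() walks the only route from the root, and calling delete(men) would also make sneakers unreachable from shoes, which is the deletion problem described in point 3.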
Advantages of Hierarchical Model

• It is very simple and fast to traverse through a tree-like structure.
• Any change in the parent node is automatically reflected in the child node, so the
integrity of data is maintained.
Disadvantages of Hierarchical Model

• Complex relationships are not supported.
• It does not support more than one parent per child node, so a
complex relationship in which a child node needs two parent nodes can't be
represented using this model.
• If a parent node is deleted then its child nodes are automatically deleted.

Network Model
This model is an extension of the hierarchical model. It was the most popular model before
the relational model. This model is the same as the hierarchical model; the only difference is
that a record can have more than one parent. It replaces the hierarchical tree with a
graph. Example: In the example below we can see that the node student has two parents, i.e., CSE
Department and Library. This was not possible in the hierarchical model.

Features of a Network Model

1. Ability to Merge more Relationships: In this model there are more relationships,
so the data is more closely related. This model can manage one-to-one relationships as well
as many-to-many relationships.
2. Many paths: As there are more relationships, there can be more than one path to
the same record. This makes data access fast and simple.
3. Circular Linked List: The operations on the network model are done with the help of
a circular linked list. The current position is maintained by the program, and this
position moves through the records according to the relationships.
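A minimal sketch of the multiple-parent idea follows, using the student example from the text. Note that it stores links in Python sets rather than the circular linked lists mentioned above, purely to keep the example short. The Network class is hypothetical and assumes the links contain no cycles.

```python
from collections import defaultdict


class Network:
    """Records linked as a graph: a record may have several parents."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def link(self, parent, child):
        """Record a parent-to-child relationship in both directions."""
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def paths_to(self, target):
        """All paths from a parentless record down to `target` (assumes no cycles)."""
        if not self.parents[target]:
            return [[target]]
        return [
            path + [target]
            for parent in sorted(self.parents[target])
            for path in self.paths_to(parent)
        ]


net = Network()
net.link("CSE Department", "Student")
net.link("Library", "Student")
```

With these two links, net.paths_to("Student") yields one path through CSE Department and another through Library. A tree cannot express this, because each node in a tree has a single parent.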
Advantages of Network Model

• The data can be accessed faster as compared to the hierarchical model. This is because
the data is more related in the network model and there can be more than one path to reach a
particular node. So the data can be accessed in many ways.
• As there is a parent-child relationship, data integrity is present. Any change in the
parent record is reflected in the child record.
Disadvantages of Network Model

• As more and more relationships need to be handled, the system might get complex.
So, a user must have detailed knowledge of the model to work with it.
• Any change, such as an update, deletion, or insertion, is very complex.

Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model diagram. In this
model, we represent the real-world problem in the pictorial form to make it easy for the
stakeholders to understand. It is also very easy for the developers to understand the system
by just looking at the ER diagram. We use the ER diagram as a visual tool to represent an
ER Model. ER diagram has the following three components:

• Entities: An entity is a real-world thing. It can be a person, place, or even a
concept. Example: Teachers, Students, Course, Building, Department, etc. are some of the
entities of a School Management System.
• Attributes: An entity has real-world properties called attributes. These are the
characteristics of that entity. Example: The entity teacher has properties like teacher id,
salary, age, etc.
• Relationship: A relationship tells how two entities are related. Example: Teacher
works for a department.
Example:
In the above diagram, the entities are Teacher and Department. The attributes
of the Teacher entity are Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The
attributes of the Department entity are Dept_id, Dept_name. The two entities are
connected using the relationship. Here, each teacher works for a department.

Features of ER Model

• Graphical Representation for Better Understanding: It is very easy and simple to
understand, so it can be used by the developers to communicate with the stakeholders.
• ER Diagram: The ER diagram is used as a visual tool for representing the model.
• Database Design: This model helps the database designers to build the database and
is widely used in database design.
Advantages of ER Model

• Simple: Conceptually the ER Model is very easy to build. If we know the relationships
between the attributes and the entities we can easily build the ER Diagram for the model.
• Effective Communication Tool: This model is used widely by database designers
for communicating their ideas.
• Easy Conversion to any Model: This model maps well to the relational model and
can be easily converted to it by turning the ER model into tables. This model
can also be converted to other models, like the network model, hierarchical model, etc.
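The conversion from an ER model to tables can be sketched as follows, using the Teacher and Department entities named in the text. The SQL column types, the to_ddl helper, and the decision to represent "works for" as a Dept_id foreign key on Teacher are assumptions made for illustration. They are one reasonable mapping, not the only one.

```python
import sqlite3

# Entities and their attributes (attribute names from the text; types are assumed).
entities = {
    "Department": {"Dept_id": "INTEGER PRIMARY KEY", "Dept_name": "TEXT"},
    "Teacher": {
        "Teacher_id": "INTEGER PRIMARY KEY",
        "Teacher_Name": "TEXT",
        "Age": "INTEGER",
        "Salary": "REAL",
        "Mobile_Number": "TEXT",
    },
}

# "works for": each teacher works for one department, so the Teacher table
# receives a foreign key column pointing at Department.
relationships = [("Teacher", "Dept_id", "Department")]


def to_ddl(entities, relationships):
    """Turn each entity into one CREATE TABLE statement."""
    statements = []
    for name, attributes in entities.items():
        columns = [f"{column} {sql_type}" for column, sql_type in attributes.items()]
        for child, fk_column, parent in relationships:
            if child == name:
                columns.append(f"{fk_column} INTEGER REFERENCES {parent}({fk_column})")
        statements.append(f"CREATE TABLE {name} ({', '.join(columns)});")
    return statements


ddl = to_ddl(entities, relationships)

# Check that the generated statements are valid SQL by running them.
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(ddl))
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Each entity becomes a table, each attribute a column, and the relationship becomes a foreign key.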
Disadvantages of ER Model

• No industry standard for notation: There is no industry standard for developing an
ER model. So one developer might use notations which are not understood by other
developers.
• Hidden information: Some information might be lost or hidden in the ER model. As
it is a high-level view, some details might not be shown.

Relational Model
The Relational Model is the most widely used model. In this model, the data is maintained in the
form of a two-dimensional table. All the information is stored in the form of rows and
columns. The basic structure of a relational model is the table, so tables are also
called relations in the relational model. Example: In this example, we have an Employee
table.
Features of Relational Model

• Tuples: Each row in the table is called a tuple. A row contains all the information about
one instance of the object. In the above example, each row has all the information about one
specific individual, e.g., the first row has information about John.
• Attribute or field: Attributes are the properties that define the table or relation. The
values of an attribute should all come from the same domain. In the above example, we have
different attributes of the employee like Salary, Mobile_no, etc.
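A small sketch of tuples and attributes, using Python's built-in sqlite3 module. The text mentions only John, Salary, and Mobile_no, so the remaining columns and all the values below are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, mobile_no TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "John", 40000, "555-0101"), (2, "Priya", 45000, "555-0102")],
)

cursor = conn.execute("SELECT * FROM employee ORDER BY emp_id")
attributes = [column[0] for column in cursor.description]  # the column names
tuples = cursor.fetchall()                                 # one tuple per row
```

Each element of tuples is one row (one employee), and attributes lists the columns that every row shares.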
Advantages of Relational Model

• Simple: This model is simpler than the network and hierarchical
models.
• Scalable: This model can be easily scaled, as we can add as many rows and columns
as we want.
• Structural Independence: We can make changes to the database structure without
changing the way the data is accessed. When changes to the database structure
do not affect the capability of the DBMS to access the data, we can say that structural
independence has been achieved.
Disadvantages of Relational Model

• Hardware Overheads: To hide the complexities and make things easier for the
user, this model requires more powerful computers and data storage devices.
• Bad Design: The relational model is very easy to design and use, so users don't
need to know how the data is stored in order to access it. This ease of design can lead to
a poorly designed database, which slows down as the database grows.
These disadvantages are minor compared to the advantages of the relational model, and
they can be avoided with proper implementation and organisation.

Object-Oriented Data Model


Real-world problems are represented more closely by the object-oriented data
model. In this model, both the data and its relationships are held in a single structure known
as an object. We can store audio, video, images, etc. in the database, which is not practical
in the relational model (although audio and video can be stored in a relational database, it is
usually advised against). In this model, two or more objects are
connected through links. We use these links to relate one object to other objects. This can be
understood by the example given below.

In the above example, we have two objects Employee and Department. All the data and
relationships of each object are contained as a single unit. The attributes like Name, Job_title
of the employee and the methods which will be performed by that object are stored as a single
object. The two objects are connected through a common attribute, i.e. the Department_id, and
the communication between these two is done with the help of this common id.

Object-Relational Model
As the name suggests, it is a combination of both the relational model and the object-oriented
model. This model was built to fill the gap between the object-oriented model and the relational
model. It offers many advanced features; for example, we can build complex data types according
to our requirements using the existing data types. The problem with this model is that it
can get complex and difficult to handle. So, proper understanding of this model is required.

Flat Data Model


It is a simple model in which the database is represented as a table consisting of rows and
columns. To access any data, the computer has to read the entire table. This makes the model
slow and inefficient.

Semi-Structured Model
Semi-structured model is an evolved form of the relational model. We cannot differentiate
between data and schema in this model. Example: web-based data sources, where we can't
differentiate between the schema and the data of the website. In this model, some entities may
have missing attributes while others may have an extra attribute. This model gives flexibility
in storing the data. It also gives flexibility to the attributes. Example: the value stored in
an attribute can be either an atomic value or a collection of values.
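
As a rough illustration of the points above, the following Python sketch uses plain dictionaries as stand-ins for semi-structured records. The record contents and the helper names (attribute_names, missing_attributes) are assumptions made for this example only.

```python
# Two "student" records in a semi-structured store.
# Plain dicts stand in for documents; all values are made up for illustration.
students = [
    {"name": "Asha", "phone": "555-0101"},                 # atomic phone value
    {"name": "Ravi",
     "phone": ["555-0102", "555-0103"],                    # collection of values
     "email": "ravi@example.com"},                         # extra attribute
]

def attribute_names(records):
    """Collect every attribute that appears in at least one record."""
    names = set()
    for record in records:
        names.update(record)
    return names

def missing_attributes(record, records):
    """Attributes used elsewhere in the collection but absent from this record."""
    return attribute_names(records) - set(record)
```

Here the first record simply lacks the email attribute, and the phone attribute holds a single value in one record and a list in the other, which a fixed relational schema would not allow directly.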

Associative Data Model


Associative Data Model is a model in which the data is divided into two parts. Everything
which has an independent existence is called an entity, and the relationships among these
entities are called associations. The two parts into which the data is divided are called items and links.

• Item: Items contain the name and the identifier (some numeric value).
• Links: Links contain the identifier, source, verb and target.
Example: Let us say we have a statement "The world cup is being hosted by London from
30 May 2020". In this data two links need to be stored:

1. The world cup is being hosted by London. The source here is 'the world cup', the verb
is 'is being hosted by' and the target is 'London'.
2. ...from 30 May 2020. The source here is the previous link, the verb is 'from' and the
target is '30 May 2020'.
This is represented using the table as follows:
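
The items and links of this statement can also be sketched in Python. This is only an illustration: the numeric identifiers are invented, and the verb of the first link is assumed to be the full phrase 'is being hosted by'.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    identifier: int
    name: str

@dataclass(frozen=True)
class Link:
    identifier: int
    source: int   # identifier of an Item or of another Link
    verb: int     # identifier of the Item that names the verb
    target: int   # identifier of an Item

# Identifiers below are arbitrary values chosen for this sketch.
items = {
    1: Item(1, "the world cup"),
    2: Item(2, "is being hosted by"),
    3: Item(3, "London"),
    4: Item(4, "from"),
    5: Item(5, "30 May 2020"),
}
links = {
    101: Link(101, source=1, verb=2, target=3),
    102: Link(102, source=101, verb=4, target=5),   # the source is the previous link
}

def render(identifier):
    """Turn an item or link identifier back into text."""
    if identifier in items:
        return items[identifier].name
    link = links[identifier]
    return f"{render(link.source)} {render(link.verb)} {render(link.target)}"
```

Calling render(102) follows the second link back through the first one and rebuilds the whole statement.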

Context Data Model


Context Data Model is a collection of several models. This consists of models like network
model, relational models etc. Using this model we can do various types of tasks which are
not possible using any model alone.
5. What is the e-r diagram? Explain it with an example.
The Entity-Relationship (ER) Model is a model for identifying the entities to be represented in the
database and for representing how those entities are related. The ER data model specifies an
enterprise schema that represents the overall logical structure of a database graphically.
The Entity-Relationship Diagram shows the relationships among the entities present in the
database. ER models are used to model real-world objects like a person, a car, or a company
and the relations between these real-world objects. In short, the ER Diagram is the structural
format of the database.
Why Use ER Diagrams In DBMS?
• ER diagrams represent the E-R model of a database, which makes them
easy to convert into relations (tables).
• ER diagrams model real-world objects, which makes
them very useful.
• ER diagrams require no technical knowledge and no hardware support.
• These diagrams are very easy to understand and easy to create, even for a naive user.
• They give a standard solution for visualizing the data logically.
Symbols Used in ER Model
ER Model is used to model the logical view of the system from a data perspective which
consists of these symbols:
• Rectangles: Rectangles represent Entities in the ER Model.
• Ellipses: Ellipses represent Attributes in the ER Model.
• Diamond: Diamonds represent Relationships among Entities.
• Lines: Lines link attributes to entities and entity sets to relationship
types.
• Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
• Double Rectangle: Double Rectangle represents a Weak Entity.

Symbols used in ER Diagram

Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database
System.
Components of ER Diagram

Entity
An Entity may be an object with a physical existence – a particular person, car, house, or
employee – or it may be an object with a conceptual existence – a company, a job, or a
university course.
Entity Set: An Entity is an object of Entity Type and a set of all entities is called an entity
set. For Example, E1 is an entity having Entity Type Student and the set of all students is
called Entity Set. In ER diagram, Entity Type is represented as:

Entity Set

1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute. Strong Entity does not depend on
other Entity in the Schema. It has a primary key, that helps in identifying it uniquely, and it
is represented by a rectangle. These are called Strong Entity Types.
2. Weak Entity
An Entity type usually has a key attribute that uniquely identifies each entity in the entity set. But
some entity types exist for which key attributes can’t be defined. These are called Weak
Entity types.
For Example, a company may store the information of dependents (Parents, Children,
Spouse) of an Employee. But the dependents can’t exist without the employee. So
Dependent will be a Weak Entity Type and Employee will be the Identifying Entity type for
Dependent, which means it is a Strong Entity Type.
A weak entity type is represented by a Double Rectangle. The participation of weak entity
types is always total. The relationship between the weak entity type and its identifying strong
entity type is called identifying relationship and it is represented by a double diamond.

Strong Entity and Weak Entity

Attributes
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB,
Age, Address, and Mobile_No are the attributes that define entity type Student. In ER
diagram, the attribute is represented by an oval.

Attribute

1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key
attribute. For example, Roll_No will be unique for each student. In ER diagram, the key
attribute is represented by an oval with its name underlined.

Key Attribute

2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For
example, the Address attribute of the student Entity type consists of Street, City, State, and
Country. In ER diagram, the composite attribute is represented by an oval connected to the
ovals of its component attributes.
Composite Attribute

3. Multivalued Attribute
An attribute that can hold more than one value for a given entity is called a multivalued
attribute. For example, Phone_No (can be more than one for a given student). In ER diagram,
a multivalued attribute is represented by a double oval.

Multivalued Attribute

4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute, e.g., Age (can be derived from DOB). In ER diagram, the derived attribute is
represented by a dashed oval.

Derived Attribute

The Complete Entity Type Student with its Attributes can be represented as:
Entity and Attributes
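
The four kinds of attribute described above can be mirrored in a small Python sketch. The class layout and the sample values used to exercise it are assumptions for illustration; only the attribute names come from the Student example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Address:
    """Composite attribute: made up of several simpler attributes."""
    street: str
    city: str
    state: str
    country: str

@dataclass
class Student:
    roll_no: int                                          # key attribute
    name: str
    dob: date
    address: Address                                      # composite attribute
    phone_nos: List[str] = field(default_factory=list)    # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from dob instead of being stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)
```

Age is written as a method rather than a stored field because its value changes over time while DOB does not, which is the usual reason for treating it as derived.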

Relationship Type and Relationship Set


A Relationship Type represents the association between entity types. For example, ‘Enrolled
in’ is a relationship type that exists between entity type Student and Course. In ER diagram,
the relationship type is represented by a diamond and connecting the entities with lines.

Entity-Relationship Set

A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.

Relationship Set
Degree of a Relationship Set
The number of different entity sets participating in a relationship set is called the degree of a
relationship set.
1. Unary Relationship: When there is only ONE entity set participating in a relation, the
relationship is called a unary relationship. For example, one person is married to only one
person.

Unary Relationship

2. Binary Relationship: When there are TWO entity sets participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.

Binary Relationship

3. n-ary Relationship: When there are n entity sets participating in a relationship, the
relationship is called an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is known
as cardinality. Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in the
relationship, the cardinality is one-to-one. Let us assume that a male can marry one female
and a female can marry one male. So the relationship will be one-to-one.
The total number of tables that can be used in this is 2.

one to one cardinality

Using Sets, it can be represented as:


Set Representation of One-to-One

2. One-to-Many: When each entity in one entity set can be related to more
than one entity in the other entity set, the cardinality is one-to-many. Let us assume
that one surgeon department can accommodate many doctors. So the cardinality will be 1 to M.
It means one department has many Doctors.
This can be implemented with 2 tables by placing a foreign key on the "many" side, or with 3
tables if the relationship is stored in a table of its own.

one to many cardinality

Using sets, one-to-many cardinality can be represented as:

Set Representation of One-to-Many


3. Many-to-One: When entities in one entity set can take part only once in the relationship
set and entities in other entity sets can take part more than once in the relationship set,
cardinality is many to one. Let us assume that a student can take only one course but one
course can be taken by many students. So the cardinality will be n to 1. It means that for one
course there can be n students but for one student, there will be only one course.
The total number of tables that can be used in this is 3.

many to one cardinality

Using Sets, it can be represented as:

Set Representation of Many-to-One

In this case, each student is taking only 1 course but 1 course has been taken by many
students.
4. Many-to-Many: When entities in all entity sets can take part more than once in the
relationship, the cardinality is many-to-many. Let us assume that a student can take more than one
course and one course can be taken by many students. So the relationship will be many-to-many.
The total number of tables that can be used in this is 3.
many to many cardinality

Using Sets, it can be represented as:

Many-to-Many Set Representation

In this example, student S1 is enrolled in C1 and C3 and Course C3 is taken by S1, S3,
and S4. So it is a many-to-many relationship.
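
The four cardinalities can be told apart by counting how often each entity appears in a relationship set. The sketch below is a hypothetical helper, not a standard routine, and it only classifies the pairs it is given; the real cardinality is a design rule of the schema, which sample data can contradict but never prove.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a binary relationship set given as (left, right) pairs."""
    pairs = set(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    left_repeats = any(n > 1 for n in left_counts.values())    # a left entity has many partners
    right_repeats = any(n > 1 for n in right_counts.values())  # a right entity has many partners
    if left_repeats and right_repeats:
        return "many-to-many"
    if left_repeats:
        return "one-to-many"
    if right_repeats:
        return "many-to-one"
    return "one-to-one"

# Pairs taken from the example above: S1 takes C1 and C3; C3 is taken by S1, S3 and S4.
enrolled_in = {("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")}
```

With these pairs, S1 appears more than once on the left and C3 more than once on the right, so cardinality(enrolled_in) reports a many-to-many relationship.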
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of students will be total. Total
participation is shown by a double line in the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled by any of the students, the participation in the
course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
Total Participation and Partial Participation

Using Set, it can be represented as,

Set representation of Total Participation and Partial Participation

Every student in the Student Entity set participates in a relationship but there exists a course
C4 that is not taking part in the relationship.
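
Total and partial participation can be checked with a simple subset test. In the sketch below the entity names and the enrolment pairs are assumed sample data; the only detail taken from the text is that course C4 takes no part in the relationship.

```python
def participation(entity_set, participating):
    """Return 'total' if every entity takes part in the relationship, else 'partial'."""
    return "total" if set(entity_set) <= set(participating) else "partial"

students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3", "C4"}
# Assumed enrolment pairs for illustration; C4 is deliberately left out.
enrolled_in = {("S1", "C1"), ("S2", "C2"), ("S3", "C3")}

student_side = participation(students, {s for s, _ in enrolled_in})
course_side = participation(courses, {c for _, c in enrolled_in})
```

Every student appears in some pair, so the student side is total, while C4 appears in none, so the course side is partial.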
How to Draw ER Diagram?
• The very first step is identifying all the Entities, placing them in Rectangles, and
labeling them accordingly.
• The next step is to identify the relationships between them and place them
accordingly using Diamonds, making sure that Relationships are not connected to each
other.
• Attach attributes to the entities properly.
• Remove redundant entities and relationships.
• Add proper colors to highlight the data present in the database.

6. What are keys? Explain the types of keys and give an example of each.

What are Keys in DBMS?


KEYS in DBMS are attributes or sets of attributes which help you to identify a row (tuple) in
a relation (table). They allow you to find the relation between two tables. Keys help you
uniquely identify a row in a table by a combination of one or more columns in that table.
Example:
Employee ID FirstName LastName

11 Andrew Johnson

22 Tom Wood

33 Alex Hale
In the above-given example, employee ID is a primary key because it uniquely identifies an
employee record. In this table, no other employee can have the same employee ID.

Why we need a Key?


Here are some reasons for using keys in a DBMS.

• Keys help you to identify any row of data in a table. In a real-world application, a
table could contain thousands of records. Moreover, the records could be duplicated. Keys in
RDBMS ensure that you can uniquely identify a table record despite these challenges.
• Keys allow you to establish and identify relationships between
tables.
• Keys help you to enforce identity and integrity in the relationship.

Types of Keys in DBMS (Database Management System)


There are mainly eight different types of keys in DBMS, and each key has its own
functionality:

1. Super Key
2. Primary Key
3. Candidate Key
4. Alternate Key
5. Foreign Key
6. Compound Key
7. Composite Key
8. Surrogate Key

Let’s look at each of the keys in DBMS with example:

• Super Key – A super key is a set of one or more attributes that uniquely identifies rows
in a table.
• Primary Key – is a column or group of columns in a table that uniquely identifies
every row in that table.
• Candidate Key – is a minimal set of attributes that uniquely identifies tuples in a table.
A Candidate Key is a super key with no redundant attributes.
• Alternate Key – is a candidate key that was not chosen as the primary key; it still
uniquely identifies every row in that table.
• Foreign Key – is a column that creates a relationship between two tables. The
purpose of Foreign keys is to maintain data integrity and allow navigation between two
different instances of an entity.
• Compound Key – has two or more attributes that allow you to uniquely recognize a
specific record. It is possible that each column may not be unique by itself within the
database.
• Composite Key – is a combination of two or more columns that uniquely identify
rows in a table. The combination of columns guarantees uniqueness, though individual
uniqueness is not guaranteed.
• Surrogate Key – An artificial key which aims to uniquely identify each record is
called a surrogate key. These keys are created when you don’t
have any natural primary key.

What is the Super key?


A superkey is a set of one or more attributes which identifies rows in a table. A Super key
may have additional attributes that are not needed for unique identification.

Example:

EmpSSN EmpNum Empname

9812345098 AB05 Shown

9876512345 AB06 Roslyn

199937890 AB07 James


In the above-given example, EmpSSN and EmpNum are superkeys.

What is a Primary Key?


PRIMARY KEY in DBMS is a column or group of columns in a table that uniquely identifies
every row in that table. A Primary Key value can’t be duplicated, meaning the same value can’t
appear more than once in the table. A table cannot have more than one primary key.
Rules for defining Primary key:

• Two rows can’t have the same primary key value.
• Every row must have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column should not be modified or updated if any foreign
key refers to that primary key.

Example:

In the following example, StudID is a Primary Key.

StudID Roll No First Name LastName Email

1 11 Tom Price abc@gmail.com

2 12 Nick Wright xyz@gmail.com

3 13 Dana Natan mno@yahoo.com

What is the Alternate key?


ALTERNATE KEY is a candidate key that was not chosen as the primary key. A table can have
multiple choices for a primary key but only one can be set as the primary key. All the
candidate keys which are not the primary key are called Alternate Keys.
Example:

In this table, StudID, Roll No, and Email are qualified to become a primary key. But since StudID
is the primary key, Roll No and Email become the alternate keys.

StudID Roll No First Name LastName Email

1 11 Tom Price abc@gmail.com

2 12 Nick Wright xyz@gmail.com

3 13 Dana Natan mno@yahoo.com

What is a Candidate Key?


CANDIDATE KEY in SQL is a minimal set of attributes that uniquely identifies tuples in a table.
A Candidate Key is a super key with no redundant attributes. The Primary key should be selected
from the candidate keys. Every table must have at least a single candidate key. A table can
have multiple candidate keys but only a single primary key.
Properties of Candidate key:

• It must contain unique values
• A candidate key may have multiple attributes
• Must not contain null values
• It should contain the minimum fields needed to ensure uniqueness
• Uniquely identifies each record in a table

Candidate key Example: In the given table Stud ID, Roll No, and email are candidate keys
which help us to uniquely identify the student record in the table.

StudID Roll No First Name LastName Email

1 11 Tom Price abc@gmail.com

2 12 Nick Wright xyz@gmail.com

3 13 Dana Natan mno@yahoo.com


Candidate Key in DBMS
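
The difference between a super key and a candidate key can be sketched in Python by testing attribute sets against sample rows. The first three rows come from the table above; the fourth row is an invented addition so that First Name and LastName repeat, and the function names are hypothetical. A check like this can only show that a set of attributes is not a key for the given data; whether it is a key in general is a design decision.

```python
from itertools import combinations

COLUMNS = ("StudID", "Roll No", "First Name", "LastName", "Email")
ROWS = [
    (1, 11, "Tom", "Price", "abc@gmail.com"),
    (2, 12, "Nick", "Wright", "xyz@gmail.com"),
    (3, 13, "Dana", "Natan", "mno@yahoo.com"),
    (4, 14, "Tom", "Price", "pqr@gmail.com"),   # assumed extra row, not in the source table
]

def is_superkey(attrs, columns=COLUMNS, rows=ROWS):
    """True if no two rows agree on all of the given attributes."""
    idx = [columns.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(rows)

def candidate_keys(columns=COLUMNS, rows=ROWS):
    """Super keys that contain no smaller super key (minimal super keys)."""
    keys = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if not contains_smaller_key and is_superkey(attrs, columns, rows):
                keys.append(attrs)
    return keys
```

On this data, StudID together with First Name is a super key but not a candidate key, because StudID alone already identifies every row.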

What is the Foreign key?


FOREIGN KEY is a column that creates a relationship between two tables. The purpose of
Foreign keys is to maintain data integrity and allow navigation between two different
instances of an entity. It acts as a cross-reference between two tables as it references the
primary key of another table.
Example:

DeptCode DeptName

001 Science

002 English

005 Computer

Teacher ID Fname Lname

B002 David Warner

B017 Sara Joseph

B009 Mike Brunton


In this example, we have two tables, teacher and department, in a school. However,
there is no way to see which teacher works in which department.

By adding DeptCode as a foreign key in the teacher table, we can create a
relationship between the two tables.

Teacher ID DeptCode Fname Lname

B002 002 David Warner

B017 002 Sara Joseph

B009 001 Mike Brunton


This concept is also known as Referential Integrity.
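
The teacher and department tables can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, assuming an in-memory SQLite database; the table names, column types and helper functions are choices made for this example. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (DeptCode TEXT PRIMARY KEY, DeptName TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE teacher (
        TeacherID TEXT PRIMARY KEY,
        DeptCode  TEXT REFERENCES department(DeptCode),   -- foreign key
        Fname     TEXT,
        Lname     TEXT
    )
""")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("001", "Science"), ("002", "English"), ("005", "Computer")])
conn.executemany("INSERT INTO teacher VALUES (?, ?, ?, ?)",
                 [("B002", "002", "David", "Warner"),
                  ("B017", "002", "Sara", "Joseph"),
                  ("B009", "001", "Mike", "Brunton")])

def try_insert_teacher(teacher_id, dept_code, fname, lname):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO teacher VALUES (?, ?, ?, ?)",
                     (teacher_id, dept_code, fname, lname))
        return True
    except sqlite3.IntegrityError:
        return False

def teachers_in(dept_name):
    """First names of the teachers working in the named department."""
    rows = conn.execute(
        "SELECT t.Fname FROM teacher t JOIN department d ON d.DeptCode = t.DeptCode "
        "WHERE d.DeptName = ? ORDER BY t.Fname", (dept_name,))
    return [row[0] for row in rows]
```

A teacher row that points at a department code not present in the department table is rejected, while the join shows which teacher works in which department.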

What is the Compound key?


COMPOUND KEY has two or more attributes that allow you to uniquely recognize a
specific record. It is possible that each column may not be unique by itself within the
database. However, when combined with the other column or columns, the combination
becomes unique. The purpose of the compound key in database is to uniquely
identify each record in the table.
Example:

OrderNo ProductID Product Name Quantity

B005 JAP102459 Mouse 5

B005 DKT321573 USB 10

B005 OMG446789 LCD Monitor 20

B004 DKT321573 USB 15

B002 OMG446789 Laser Printer 3


In this example, neither OrderNo nor ProductID can be a primary key on its own, as neither
uniquely identifies a record. However, a compound key of OrderNo and ProductID could be used,
as together they uniquely identify each record.
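
The order table above can be declared with a two-column primary key in SQLite. This is a sketch using Python's sqlite3 module; the table name order_line and the helper add_line are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        OrderNo     TEXT NOT NULL,
        ProductID   TEXT NOT NULL,
        ProductName TEXT,
        Quantity    INTEGER,
        PRIMARY KEY (OrderNo, ProductID)   -- the two columns together form the key
    )
""")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)", [
    ("B005", "JAP102459", "Mouse", 5),
    ("B005", "DKT321573", "USB", 10),
    ("B005", "OMG446789", "LCD Monitor", 20),
    ("B004", "DKT321573", "USB", 15),
    ("B002", "OMG446789", "Laser Printer", 3),
])

def add_line(order_no, product_id, name, quantity):
    """Return True if the row was accepted, False if the key already exists."""
    try:
        conn.execute("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                     (order_no, product_id, name, quantity))
        return True
    except sqlite3.IntegrityError:
        return False
```

The same OrderNo or the same ProductID may appear in many rows, but repeating an existing (OrderNo, ProductID) pair is rejected.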

What is the Composite key?


COMPOSITE KEY is a combination of two or more columns that uniquely identify rows in
a table. The combination of columns guarantees uniqueness, though individual uniqueness
is not guaranteed. Hence, they are combined to uniquely identify records in a table.
The difference between the compound and the composite key is that any part of the compound
key can be a foreign key, but the composite key may or may not be a part of the foreign key.

What is a Surrogate key?


A SURROGATE KEY is an artificial key which aims to uniquely identify each record.
This kind of key is created when
you don’t have any natural primary key. Surrogate keys do not lend any meaning to the data in the
table. A surrogate key in DBMS is usually an integer, and its value is generated right
before the record is inserted into a table.
Fname Lastname Start Time End Time

Anne Smith 09:00 18:00

Jack Francis 08:00 17:00

Anna McLean 11:00 20:00

Shown Willam 14:00 23:00


The example above shows the shift timings of different employees. In this example, a
surrogate key is needed to uniquely identify each employee.

Surrogate keys are used when

• No attribute qualifies as a primary key.
• The primary key is too big or complicated.
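
One common way to obtain a surrogate key is to let the database generate an integer for each new row. The sketch below uses SQLite through Python's sqlite3 module; the table name shift, the column name ShiftID and the helper add_shift are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shift (
        ShiftID   INTEGER PRIMARY KEY AUTOINCREMENT,   -- surrogate key, carries no meaning
        Fname     TEXT,
        Lastname  TEXT,
        StartTime TEXT,
        EndTime   TEXT
    )
""")

def add_shift(fname, lastname, start, end):
    """Insert a row and return the surrogate key the database generated for it."""
    cur = conn.execute(
        "INSERT INTO shift (Fname, Lastname, StartTime, EndTime) VALUES (?, ?, ?, ?)",
        (fname, lastname, start, end))
    return cur.lastrowid

ids = [add_shift(*row) for row in [
    ("Anne", "Smith", "09:00", "18:00"),
    ("Jack", "Francis", "08:00", "17:00"),
    ("Anna", "McLean", "11:00", "20:00"),
    ("Shown", "Willam", "14:00", "23:00"),
]]
```

The application never supplies ShiftID; each row receives a distinct integer even if two employees share a name or a shift time.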

Difference Between Primary key & Foreign key


The main differences between a primary key and a foreign key are as follows:

• Purpose
Primary Key: Helps you to uniquely identify a record in the table.
Foreign Key: It is a field in the table that is the primary key of another table.

• Null values
Primary Key: Never accepts null values.
Foreign Key: May accept multiple null values.

• Indexing
Primary Key: In many DBMSs the primary key is backed by a clustered index by default, so
data in the table are physically organized in the sequence of that index.
Foreign Key: Does not necessarily get an index, clustered or non-clustered, automatically.
However, you can manually create an index on the foreign key.

• Number per table
Primary Key: A table can have only a single primary key.
Foreign Key: A table can have multiple foreign keys.

7. What are the integrity rules?


Database systems are integral to the operation of most businesses. A well-designed database
system can improve productivity, accuracy, and security. One key aspect of designing a
database is understanding integrity constraints.

This article will explain different integrity constraints and the importance of their role
in database design. We'll also look at some common types of integrity constraints in
DBMS. Understanding these concepts is essential for creating robust, efficient databases.
We will also provide examples of how these constraints can be used to protect your data.
So, if you’re curious to learn more about integrity constraints in DBMS, read on!

What are Integrity Constraints in DBMS?

Integrity constraints are rules that help to maintain the accuracy and consistency of data in
a database. They can be used to enforce business rules or to ensure that data is entered
correctly. For example, a simple integrity constraint in DBMS might state that all customers
must have a valid email address. This would prevent someone from accidentally entering an
invalid email address into the database. Integrity constraints can also be used to enforce
relationships between tables.

For example, if a customer can only have one shipping address, then an integrity constraint
can be used to ensure that only one shipping address is entered for each customer. Enforcing
integrity constraints in SQL can help prevent data inconsistencies and errors, making it
easier to manage and query the data.

What is the Purpose of Integrity Constraints?

Integrity constraints are an important part of maintaining database correctness. They ensure
that the data in the database adheres to a set of rules, which can help prevent errors and
inconsistencies. In some cases, integrity constraints can be used to enforce business rules,
such as ensuring that a customer's balance remains within a certain limit.

In other cases, they can be used to enforce data integrity, such as ensuring that all values in
a column are unique. Integrity constraints in SQL can be either enforced by the database
system or by application code. Enforcing them at the database level can help ensure that the
rules are always followed, even if the application code is changed. However, enforcing them
at the application level can give the developer more flexibility in how the rules are enforced.

Types of Integrity Constraints


Integrity constraints in DBMS are used to ensure that data is consistent and accurate. There
are four main types of integrity constraints: domain, entity, referential, and key. Here, we'll
take a closer look & explain the types of integrity constraints along with some examples.

1. Domain Constraint

A domain constraint is a restriction on the values that can be stored in a column. For
example, if you have a column for "age," domain integrity constraints in DBMS would
ensure that only values between 1 and 120 can be entered into that column. This ensures that
only valid data is entered into the database.
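
A domain constraint such as the age range above is usually written as a CHECK condition. The following sketch uses SQLite through Python's sqlite3 module; the table name person and the helper try_add are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name TEXT NOT NULL,
        age  INTEGER NOT NULL CHECK (age BETWEEN 1 AND 120)   -- domain of the age column
    )
""")

def try_add(name, age):
    """Return True if the row satisfies the domain constraint, False otherwise."""
    try:
        conn.execute("INSERT INTO person (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False
```

NOT NULL is added next to the CHECK because a NULL age would otherwise pass the range test: a CHECK condition only rejects rows for which it evaluates to false.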

2. Entity Integrity Constraint

An entity integrity constraint is a restriction on null values in the primary key. Null values
are values that are unknown or not applicable, and a row whose primary key is null cannot be
uniquely identified. Entity integrity constraints therefore ensure that null values are not
entered into any primary key column. For example, if "employee ID" is the primary key of an
employee table, an entity integrity constraint in DBMS would ensure that this column cannot
contain any null values.

3. Referential Integrity Constraint

A referential integrity constraint is a restriction on how foreign keys can be used. A foreign
key is a column in one table that references a primary key in another table. For example,
let's say you have a table of employees and a table of department managers. A "manager
ID" column in the employees table would be a foreign key that references the "manager ID"
column in the managers table.
Referential integrity constraints in DBMS would ensure that every manager ID stored in the
employees table has a corresponding manager ID in the managers table. In other
words, it would prevent you from assigning an employee to a manager who doesn't exist.

4. Key Constraint

Key constraints in DBMS are a restriction on duplicate values. A key is composed of one or
more columns whose values uniquely identify each row in the table. For example, let's say
you have a table of products with columns for "product ID" and "product name." The
combination of these two values would be the key for each product, and a key constraint
would ensure that no two products have the same combination of product ID and product
name.

Types of Key Constraints

Within databases, a key constraint is a rule that defines how data in a column(s) can be
stored in a table. There are several different types of key constraints in DBMS, each with its
own specific purpose. Now, we'll take a high-level look at the five most common types of
key constraints: primary key constraints, unique key constraints, foreign key constraints,
NOT NULL constraints, and check constraints.

1. Primary Key Constraints

A primary key constraint (also known as a "primary key") is a type of key constraint that
requires every value in a given column to be unique. In other words, no two rows in a table
can have the same value for their primary key column(s). A primary key can either be a
single column or multiple columns (known as a "composite" primary key). The null value
is not allowed in the primary key column(s).

2. Unique Key Constraints

A unique key constraint is a column or set of columns that ensures that the values stored in
the column are unique. A table can have more than one unique key constraint, unlike the
primary key. A unique key column can contain NULL values. Like primary keys, unique
keys can be made up of a single column or multiple columns.

3. Foreign Key Constraints

A foreign key constraint defines a relationship between two tables. A foreign key in one
table references a primary key in another table. Foreign keys prevent invalid data from being
inserted into the foreign key column. Foreign keys can reference a single column or multiple
columns.

4. NOT NULL Constraints

A NOT NULL constraint is used to ensure that no row can be inserted into the table without
a value being specified for the column(s) with this type of constraint. Thus, every row must
have a non-NULL value for these columns.

5. Check Constraints

A check constraint enforces data integrity by allowing you to specify conditions that must
be met for data to be inserted into a column. For example, you could use a check constraint
to ensure that only positive integer values are inserted into a particular column. Check
constraints are usually used in combination with other constraints (such as NOT NULL
constraints) to enforce more complex rules.
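
Several of these constraints can be combined on one table. The sketch below uses SQLite through Python's sqlite3 module; the customer table, its columns and the helper violation are assumptions for this example, and the exact wording of the error messages depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,                   -- primary key constraint
        email       TEXT NOT NULL UNIQUE,                  -- NOT NULL and unique key constraints
        balance     INTEGER NOT NULL CHECK (balance >= 0)  -- check constraint
    )
""")

def violation(email, balance):
    """Insert a customer; return None on success or the database's error message."""
    try:
        conn.execute("INSERT INTO customer (email, balance) VALUES (?, ?)",
                     (email, balance))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)
```

Each rejected insert comes back with a message naming the kind of constraint that failed, which is the feedback an application would pass on to its user.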

There are several different types of key constraints in DBMS that you can use in SQL
databases. Each type of constraint has its own specific use cases and benefits. By
understanding when to use each type of constraint, you can ensure that your database is both
reliable and consistent.
Advantages of Integrity Constraints

Integrity constraints in DBMS can be used to enforce rules at the database level, which
means that they are applied to all users and applications that access the database. There are
several advantages to using integrity constraints in SQL, which will be outlined in more
detail below.

1. Declarative Ease

One of the advantages of integrity constraints is that they can be declared easily. Integrity
constraints are written in a declarative language, which means that they can be specified
without having to write code. This makes it easy for even non-technical users to understand
and specify rules.

2. Centralized Rules

Another advantage of integrity constraints is that they provide a centralized way to specify
rules. Therefore, rules only have to be specified once and then they can be enforced across
the entire database. This is much more efficient than having to specify rules individually for
each application or user.

3. Flexibility When Loading Data

Integrity constraints also provide flexibility when loading data into the database. When data
is loaded into the database, the integrity constraints are checked automatically. In other
words, if there are any problems with the data, they can be detected and corrected
immediately.

4. Maximum Application Development Productivity

Using integrity constraints can also help to maximize application development productivity.
This is because developers do not have to write code to enforce rules; they can simply
specify the rules using an integrity constraint language. This saves time and effort during
development and makes it easier to create consistent and reliable applications.

5. Immediate User Feedback

Finally, using integrity constraints in DBMS provides immediate feedback to users when
they attempt to violate a rule. For example, if a user tries to insert an invalid value into a
database column, the database will reject the attempted insertion and return an error message
to the user instead. This provides a clear indication to the user that their input is incorrect
and needs to be corrected.

Why are Integrity Constraints Important?

Integrity constraints are important for several reasons. First, they help to ensure the accuracy
of data by preventing invalid data from being entered into the database. Second, they help
to maintain the consistency of data by ensuring that related values agree across different tables
and fields. Third, they guard against accidental damage to the data by authorized users; access
control itself is a separate concern, handled by the authorization mechanisms of the DBMS.

Finally, because every application can rely on the stored data being valid, applications do not
need to repeat the same checks themselves. By enforcing integrity constraints, databases can
maintain a high level of accuracy and consistency.

8. What do you understand from a data dictionary?


Data Dictionary in DBMS

So far, we have looked at relations and how they are represented. A relational database
system must also maintain information about every relation or table, from its schema to the
constraints applied to it. This information is metadata, that is, data about data. The structure
that stores the relational schemas and other metadata about the relations is known as the
Data Dictionary or System Catalog.

A data dictionary is like the A-Z dictionary of the relational database system holding all
information of each relation in the database.

The types of information a system must store are:

o Name of the relations
o Name of the attributes of each relation
o Lengths and domains of attributes
o Name and definitions of the views defined on the database
o Various integrity constraints

With this, the system also keeps the following data based on users of the system:
o Name of authorized users
o Accounting and authorization information about users.
o The authentication information for users, such as passwords or other related
information.

In addition to this, the system may also store some statistical and descriptive data about
the relations, such as:

o Number of tuples in each relation
o Method of storage for each relation, such as clustered or non-clustered.

A system may also store the storage organization, whether sequential, hash, or heap. It
also notes the location where each relation is stored:

o If relations are stored in files of the operating system, the data dictionary stores the
names of those files.
o If the database stores all the relations in a single file, the data dictionary stores the
blocks containing the records of each relation in a data structure similar to a linked list.

Finally, it also stores information about each index on the relations:

o Name of the index.
o Name of the relation being indexed.
o Attributes on which the index is defined.
o The type of index formed.

All the above information or metadata is stored in a data dictionary. The data dictionary is
also updated whenever the relations change. Such metadata constitutes a miniature database.
Some systems store the metadata in the form of relations in the database itself. The system
designers decide how the data dictionary is represented. A data dictionary often stores its
data in a non-normalized form, so that the data in the dictionary can be accessed quickly.

For example, in the data dictionary, it uses underline below the value to represent that the
following field contains a primary key.

So, whenever the database system needs to fetch records from a relation, it first looks up
the location and storage organization of that relation in the data dictionary. After confirming
these details, it retrieves the required records from the database.
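
As a concrete illustration of a system catalog stored as relations, the sketch below uses SQLite through Python's sqlite3 module. SQLite keeps its catalog in a table called sqlite_master and exposes column metadata through PRAGMA table_info. The employee table and the index name are assumptions made for this example; other database systems use different catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " emp_name TEXT NOT NULL,"
    " emp_state TEXT)"
)
conn.execute("CREATE INDEX idx_employee_state ON employee (emp_state)")

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info describes the attributes of one relation as
# (cid, name, type, notnull, dflt_value, pk) tuples.
columns = conn.execute("PRAGMA table_info(employee)").fetchall()
column_names = [col[1] for col in columns]
primary_key = [col[1] for col in columns if col[5] > 0]
```

The catalog is queried with ordinary SELECT statements, so the metadata is read with the same language that is used for the data.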

Why Use a Data Dictionary?


The term Data Dictionary is made up of two words: data, which means the information collected
from multiple sources, and dictionary, meaning the place where all this information is made
available.

A data dictionary is a crucial part of a relational database as it provides additional information
about the relationships between multiple tables in a database. The data dictionary in DBMS
helps the user to arrange data in a neat and well-organized way, thus helping to prevent data
redundancy.

Below is a data dictionary describing the table containing employee details.

Attribute Name   Data Type   Max Field Size   Description                      isRequired

Employee ID      Integer     10               A unique ID for each Employee    Yes

Name             Text        25               Name of the Employee             Yes

Date of Birth    DateTime    10               Date of Birth of the Employee    Yes

Mobile Number    Integer     10               Contact Number of the Employee   Yes

Some advantages of using a data dictionary are:

1. Data models in DBMS provide very little information about the database, so a data
dictionary is essential for proper knowledge of the entities, relationships, and
attributes that are present in a data model.
2. The Data Dictionary provides consistency by reducing data redundancy in the
collection and use of data across various members of a team.
3. The Data Dictionary provides structured analysis and design tools by enforcing the use
of data standards. Data standards are the set of rules that govern the way data is collected,
recorded, and represented.
4. Using a Data Dictionary helps to define naming conventions that are used in a model.

Types of Data Dictionary in DBMS

There are mainly two types of data dictionary in a database management system:

1. Integrated Data Dictionary
2. Stand Alone Data Dictionary

1. Integrated Data Dictionary

Every relational database has an Integrated Data Dictionary contained within the DBMS.
This integrated data dictionary acts as a system catalog that is accessed and updated by the
relational database. Older databases did not include an integrated data dictionary, so the
database administrator had to use a Stand Alone Data Dictionary instead. In DBMS, an
Integrated Data Dictionary can bind metadata to data.

The Integrated Data Dictionary can be further classified into two types:

• Active: An active data dictionary is updated automatically by the DBMS whenever any
changes are made to the database. This is also known as a self-updating dictionary as it keeps
the information up-to-date.

• Passive: In contrast to an active dictionary, a passive dictionary needs to be updated
manually whenever any changes are made to the database. This type of data dictionary is
difficult to maintain, because any change that is not recorded by hand leaves the database and
the data dictionary out of sync.

2. Stand Alone Data Dictionary

In DBMS, this type of data dictionary is very flexible as it allows the Database Administrator
to define and manage all the confidential data. It doesn't matter whether the data is
computerized or not. A stand-alone data dictionary allows database designers to interact with
end-users regardless of the data dictionary format.

There is no standard format for a data dictionary. Some of the common elements are given
below:

1. Data Elements: The Data Dictionary stores the definition of all the data elements such
as name, datatype, storage formats, and validation rules.
2. Tables: All information regarding the table, such as the user who created the table, the
number of rows and columns, the date on which the table was created and accessed, etc.
3. Index: Indexes for defined database tables are stored in the data dictionary. For each
index, the DBMS stores the index name, the attributes it uses, its location and characteristics,
as well as its date of creation.
4. Programs: Programs defined to access the database, including reports, application and
screen formats, SQL queries, etc., are also stored in the data dictionary.
5. Relationship between data elements: The Data Dictionary stores the type of
relationship; for example, if it is compulsory or optional, the cardinality of the relationship and
connectivity, etc.
6. Administrations and End-Users: The Data Dictionary stores all the information of
the administration along with the end-users.

The metadata stored in the Data Dictionary also acts like a monitor: it records the use of
the database and the permissions granted to users to access it.

How to Create a Data Dictionary?

As discussed above, most businesses rely on database management systems having an
integrated data dictionary as they are updated automatically and are easy to maintain.
Documentation for a data dictionary can be generated in various types of relational databases
like MySQL, SQL Server, Oracle, etc.

While creating a stand-alone data dictionary, the database administrator can take the help of a
template in SQL Server, Oracle, or even Microsoft Excel.

Various notations used to create a data dictionary are:

Data Construct   Notation   Stands For

Composition      =          is composed of

Sequence         +          AND

Selection        [ | ]      OR

Repetition       { }n       n repetitions of

Parentheses      ( )        optional data

Comment          *…*        delimits a comment

9. What is normalization? Explain the types of normalization with an example.

Normalization

A large database defined as a single relation may result in data duplication. This repetition of
data may result in:

o Making relations very large.
o Difficulty in maintaining and updating data, as it would involve searching many records in
the relation.
o Wastage and poor utilization of disk space and resources.
o An increased likelihood of errors and inconsistencies.

So to handle these problems, we should analyze and decompose the relations with redundant
data into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It
is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion
Anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?

The main reason for normalizing relations is to remove insertion, update, and deletion
anomalies. Failure to eliminate these anomalies leads to data redundancy and can cause data
integrity and other problems as the database grows. Normalization consists of a series of
guidelines that help you create a good database structure.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted
into a relation because some other data is missing.
o Deletion Anomaly: A deletion anomaly occurs when deleting some data results in the
unintended loss of other important data.
o Update Anomaly: An update anomaly occurs when changing a single data value
requires multiple rows of data to be updated.

Types of Normal Forms:

Normalization works through a series of stages called normal forms. The normal forms apply
to individual relations. A relation is said to be in a particular normal form if it satisfies
the constraints of that form.

Following are the various types of Normal forms:

Normal Form   Description

1NF    A relation is in 1NF if every attribute contains only atomic values.

2NF    A relation is in 2NF if it is in 1NF and all non-key attributes are fully
       functionally dependent on the primary key.

3NF    A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF   A stronger definition of 3NF is known as Boyce Codd's normal form.

4NF    A relation is in 4NF if it is in Boyce Codd's normal form and has no
       multi-valued dependency.

5NF    A relation is in 5NF if it is in 4NF, does not contain any join dependency,
       and joining is lossless.

Advantages of Normalization

o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization

o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.

First Normal Form (1NF)


o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values; it must hold only a single value
in each row.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab
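
The conversion to 1NF shown above can be sketched in a few lines of Python. The rows are taken from the example; the function name to_1nf and the comma-separated storage of the phone numbers are assumptions made for illustration.

```python
# Rows of the unnormalized EMPLOYEE table from the example.
# EMP_PHONE holds a comma-separated list, which breaks 1NF.
employee = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]

def to_1nf(rows):
    """Emit one row per phone number so that every attribute value is atomic."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result

employee_1nf = to_1nf(employee)
```

Each employee with two phone numbers now appears in two rows, so EMP_ID alone no longer identifies a row; the key of the 1NF table is {EMP_ID, EMP_PHONE}.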

Second Normal Form (2NF)


o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent on the
primary key.

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In
a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35
83 Math 38

83 Computer 38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of the candidate key.
That is why the table violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
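
The 2NF decomposition above is a pair of relational projections. The sketch below reproduces it in Python; the helper name project is an assumption, and the rows are those of the TEACHER table.

```python
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

def project(rows, indexes):
    """Relational projection: keep the chosen columns and drop duplicate rows."""
    result = []
    for row in rows:
        picked = tuple(row[i] for i in indexes)
        if picked not in result:
            result.append(picked)
    return result

# TEACHER_AGE depends only on TEACHER_ID, so it moves to its own table.
teacher_detail = project(teacher, (0, 2))   # TEACHER_ID, TEACHER_AGE
teacher_subject = project(teacher, (0, 1))  # TEACHER_ID, SUBJECT
```

After the projection each teacher's age is stored once, so changing an age touches a single row.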

Third Normal Form (3NF)


o A relation is in 3NF if it is in 2NF and contains no transitive dependency of a
non-prime attribute on a candidate key.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation is
in third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for every
non-trivial functional dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super keys in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on
the super key (EMP_ID), which violates the rule of third normal form.

That is why we need to move EMP_CITY and EMP_STATE to a new
<EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.

EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal
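
The functional dependencies used in this example can be checked against the sample rows. The sketch below is illustrative only, and the function name fd_holds is an assumption. Note that sample data can only refute a dependency: a dependency that holds on five rows is not thereby proven for the whole database.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns also agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

names = ("EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY")
employee_detail = [
    dict(zip(names, values))
    for values in [
        (222, "Harry", "201010", "UP", "Noida"),
        (333, "Stephan", "02228", "US", "Boston"),
        (444, "Lan", "60007", "US", "Chicago"),
        (555, "Katharine", "06389", "UK", "Norwich"),
        (666, "John", "462007", "MP", "Bhopal"),
    ]
]

id_to_zip = fd_holds(employee_detail, ["EMP_ID"], ["EMP_ZIP"])
zip_to_place = fd_holds(employee_detail, ["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"])
state_to_zip = fd_holds(employee_detail, ["EMP_STATE"], ["EMP_ZIP"])
```

The first two checks succeed, which is the chain EMP_ID → EMP_ZIP → {EMP_STATE, EMP_CITY} behind the transitive dependency. The third fails because the state US appears with two different ZIP codes.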

Boyce Codd normal form (BCNF)


o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super
key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.

Example: Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a super key, yet
each is the left side of a functional dependency.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300


Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now the design is in BCNF, because the left side of each functional dependency is a key of the
table that contains it.
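
The BCNF test can be automated with the attribute closure: X is a super key exactly when the closure of X contains every attribute of the relation. The following sketch applies this to the example. The function names closure and bcnf_violations are assumptions, and for each decomposed table only the dependencies that lie inside it are passed in.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs whose left side is not a super key of the relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not relation <= closure(lhs, fds)
    ]

fd_country = (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"}))
fd_dept = (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"}))

employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
before = bcnf_violations(employee, [fd_country, fd_dept])
key_closure = closure({"EMP_ID", "EMP_DEPT"}, [fd_country, fd_dept])

after = (
    bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [fd_country])
    + bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept])
    + bcnf_violations({"EMP_ID", "EMP_DEPT"}, [])
)
```

Both dependencies violate BCNF in the original table, while none of the three decomposed tables has a violation.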

Fourth normal form (4NF)


o A relation is in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o A multi-valued dependency A →→ B exists when a single value of A is associated with a
set of values of B, and that set is independent of the other attributes of the relation.

Example

STUDENT
STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
attributes. There is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math,
and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID,
which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY
21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey
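
A multi-valued dependency STU_ID →→ COURSE means that, for each student, every course appears with every hobby. The sketch below checks this condition on the sample rows; the function name mvd_holds is an assumption. The five-row STUDENT table lists only two of the four course and hobby combinations for student 21, so strictly the dependency does not hold on that instance until the missing rows are added. Joining the two decomposed tables supplies exactly those rows.

```python
from itertools import product

def mvd_holds(rows):
    """Check STU_ID ->> COURSE on (stu_id, course, hobby) rows."""
    groups = {}
    for stu_id, course, hobby in rows:
        courses, hobbies, pairs = groups.setdefault(stu_id, (set(), set(), set()))
        courses.add(course)
        hobbies.add(hobby)
        pairs.add((course, hobby))
    # Every course of a student must be paired with every hobby of that student.
    return all(
        pairs == set(product(courses, hobbies))
        for courses, hobbies, pairs in groups.values()
    )

student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]
student_course = sorted({(s, c) for s, c, _ in student})
student_hobby = sorted({(s, h) for s, _, h in student})

# Natural join of the two decomposed tables on STU_ID.
rejoined = [
    (s, c, h)
    for s, c in student_course
    for s2, h in student_hobby
    if s == s2
]
```

Storing courses and hobbies in separate tables avoids having to store every combination, which is the redundancy that 4NF removes.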

Fifth normal form (5NF)


o A relation is in 5NF if it is in 4NF and contains no join dependency other than those
implied by its candidate keys; any decomposition must join back losslessly.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order
to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he does
not take the Math class for Semester 2. In this case, a combination of all three fields is required
to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will
be taking it, so we would leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we cannot leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
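
The sketch below checks the join dependency on the sample data. It builds the three projections from the original rows, joins two of them, and then joins in the third. The variable names are assumptions made for illustration.

```python
# (SUBJECT, LECTURER, SEMESTER) rows of the original table.
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, subj) for subj, _, sem in original}    # SEMESTER, SUBJECT
p2 = {(subj, lect) for subj, lect, _ in original}  # SUBJECT, LECTURER
p3 = {(sem, lect) for _, lect, sem in original}    # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT produces rows that were never in the table.
two_way = {
    (subj, lect, sem)
    for sem, subj in p1
    for subj2, lect in p2
    if subj == subj2
}
spurious = two_way - original

# Joining in P3 as well filters the spurious rows out again.
three_way = {row for row in two_way if (row[2], row[1]) in p3}
```

On this data the join of P1 and P2 alone contains two spurious rows (Akash teaching Math in Semester 1 and John teaching Math in Semester 2), while the join of all three projections returns exactly the original five rows.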

10. What is Relational Decomposition (lossless join decomposition, Dependency Preserving)?

Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.

Types of Decomposition

Lossless Decomposition
o If no information is lost from the relation that is decomposed, then the
decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed relations yields the same
relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations
gives the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.
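
The same check can be written as a short Python sketch: project the original rows into the two tables, join them on EMP_ID, and compare the result with the original. The variable names are assumptions made for illustration.

```python
employee_department = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]

# EMPLOYEE(EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY)
employee = [row[:4] for row in employee_department]
# DEPARTMENT(DEPT_ID, EMP_ID, DEPT_NAME)
department = [(row[4], row[0], row[5]) for row in employee_department]

# Natural join on the common column EMP_ID.
joined = [
    emp + (dept_id, dept_name)
    for emp in employee
    for dept_id, emp_id, dept_name in department
    if emp[0] == emp_id
]
is_lossless = sorted(joined) == sorted(employee_department)
```

The join is lossless here because the common column EMP_ID is a key of the EMPLOYEE table, so each DEPARTMENT row matches exactly one EMPLOYEE row.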

Dependency Preserving
o It is an important property of a decomposition.
o In dependency preservation, every dependency must be satisfied by at least one
decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R
must either be a part of R1 or R2 or be derivable from the combination of the functional
dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency set
(A->BC). The relation R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A->BC is a part of relation R1(ABC).
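
The simplest case of the rule above, where a dependency lies entirely inside one decomposed relation, can be tested mechanically. The sketch below covers only that sufficient condition; a dependency that fails it may still be derivable from the other dependencies and would need the fuller closure-based test. The function name preserved_directly and the second example (R(A, B, C) with A->B and B->C) are assumptions added for contrast.

```python
def preserved_directly(fds, decomposition):
    """Map each FD to True if all of its attributes fall inside one relation."""
    return {
        (lhs, rhs): any(set(lhs) | set(rhs) <= set(rel) for rel in decomposition)
        for lhs, rhs in fds
    }

# Example from the text: R(A, B, C, D) with A -> BC, split into R1(ABC) and R2(AD).
example = preserved_directly([("A", "BC")], ["ABC", "AD"])

# Hypothetical contrast: R(A, B, C) with A -> B and B -> C, split into (AB) and (AC).
# B -> C is not contained in either relation.
contrast = preserved_directly([("A", "B"), ("B", "C")], ["AB", "AC"])
```

Attribute sets are written as strings of single-letter names, so "BC" stands for the attributes B and C.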

11. What is Multivalued Dependency and Join dependency?

o Multivalued dependency occurs when two attributes in a table are independent of each
other but both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a
third attribute, which is why it always requires at least three attributes.

Example: Suppose there is a bike manufacturing company which produces two colors (white
and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2001 2008 White

M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and
independent of each other.

In this case, these two columns are said to be multivalued dependent on BIKE_MODEL.
The representation of these dependencies is shown below:

1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL
multidetermines COLOR".

Join dependency is a further generalization of multivalued dependencies.

If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency
(JD) exists,

where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R (A,
B, C, D).

Equivalently, R1 and R2 are a lossless decomposition of R.

A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join
decomposition.

*((A, B, C), (C, D)) is a JD of R if the join of the projections on these attribute sets is equal
to the relation R.

Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so on are a JD of R.

12. What do you understand by inclusion dependencies?

Inclusion dependencies are a generalized form of referential constraints. They can be used to
guide the design of the database; however, they usually have little influence on how the
database is actually designed. Inclusion dependencies are widespread, but less prevalent than
functional, join, and multivalued dependencies.

An Inclusion Dependency is a statement that the values in some columns of one relation are
contained in the values of some columns of another relation. Inclusion dependencies, like
functional dependencies, represent one-to-many relationships. However, inclusion dependencies
are more commonly used to represent relationships between relations. A foreign key is an
example of an inclusion dependency: the values of the referencing column must be contained
in the primary key column of the relation it refers to.

Inclusion Dependency Example

Let's say we take two relations, R and S, created from two entity sets in such a way that
every entity in R is also an S entity. An inclusion dependency holds when projecting R on its
key attributes gives a relation that is contained in the relation obtained by projecting S on its
key attributes.

Let's take the relations student and teacher with the attribute teacher_id. Every teacher_id
value in student must also appear in teacher, so we can write:

• student.teacher_id --> teacher.teacher_id


teacher:

teacher_id (primary key) name department

1 Ram Kumar DBMS

student:

student_id name teacher_id (foreign key) age

1 Rahul Singh 1 18

teacher_id is the primary key of the teacher table and a foreign key in the student
table, so every teacher_id value stored in the student table must also exist in the teacher table.

So this foreign key concept makes the inclusion dependency possible.
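
A foreign key constraint makes the database enforce this inclusion dependency. The sketch below uses SQLite through Python's sqlite3 module, where foreign key checking has to be switched on with a PRAGMA. The second student row (with teacher_id 99) is a hypothetical row added only to show the rejection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE teacher ("
    " teacher_id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY, name TEXT,"
    " teacher_id INTEGER REFERENCES teacher (teacher_id), age INTEGER)"
)
conn.execute("INSERT INTO teacher VALUES (1, 'Ram Kumar', 'DBMS')")
conn.execute("INSERT INTO student VALUES (1, 'Rahul Singh', 1, 18)")

# Hypothetical row: teacher 99 does not exist, so the insert must fail.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Hypothetical Student', 99, 19)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The inclusion dependency itself: student.teacher_id is a subset of teacher.teacher_id.
student_refs = {row[0] for row in conn.execute("SELECT teacher_id FROM student")}
teacher_ids = {row[0] for row in conn.execute("SELECT teacher_id FROM teacher")}
inclusion_holds = student_refs <= teacher_ids
```

Because the invalid row is rejected, the set of teacher_id values in student always remains a subset of those in teacher.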

Inference Axioms for Inclusion Dependencies

Inference axioms for inclusion dependencies are described in the following table:

Axiom Formal Expression

Reflexive rule A -> A

Projection and Permutation rule IF AB -> CD THEN A -> C AND B -> D

Transitivity rule IF A -> B AND B -> C THEN A -> C

• Reflexive rule here states that a set of attributes always projects onto itself:
X -> X.
• Projection and Permutation rule here states that if AB -> CD, then A -> C and
B -> D.
• Transitivity rule here states that if a table A projects to B and B projects to C, then we
can conclude A -> C.

13. What are Codd rules?


Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems,
came up with twelve rules of his own, which, according to him, a database must obey in order
to be regarded as a true relational database.

These rules apply to any database system that manages stored data using only its
relational capabilities. This requirement is the foundation rule (often numbered Rule 0), which
acts as a base for all the other rules.

Rule 1: Information Rule


The data stored in a database, whether it is user data or metadata, must be a value of some table
cell. Everything in a database must be stored in a table format.

Rule 2: Guaranteed Access Rule

Every single data element (value) is guaranteed to be accessible logically with a combination
of table-name, primary-key (row value), and attribute-name (column value). No other means,
such as pointers, can be used to access data.

Rule 3: Systematic Treatment of NULL Values

The NULL values in a database must be given a systematic and uniform treatment. This is a
very important rule because a NULL can be interpreted as one of the following: data is missing,
data is not known, or data is not applicable.
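
SQL treats NULL uniformly as "unknown", which has visible consequences in queries. The sketch below shows a few of them using SQLite through Python's sqlite3 module; the employee table and its rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_phone TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(14, "7272826385"), (20, None), (12, None)],
)

def scalar(sql):
    """Run a query and return the single value it produces."""
    return conn.execute(sql).fetchone()[0]

# Comparing NULL with anything, even NULL, yields NULL (unknown), not true.
null_equals_null = scalar("SELECT NULL = NULL")
# So "= NULL" never matches a row; "IS NULL" is the correct test.
matched_by_equals = scalar("SELECT COUNT(*) FROM employee WHERE emp_phone = NULL")
matched_by_is_null = scalar("SELECT COUNT(*) FROM employee WHERE emp_phone IS NULL")
# Aggregates over a column skip NULL values.
counted_phones = scalar("SELECT COUNT(emp_phone) FROM employee")
```

The same treatment applies regardless of the column's data type, which is what the rule means by systematic.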

Rule 4: Active Online Catalog

The structure description of the entire database must be stored in an online catalog, known
as data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.

Rule 5: Comprehensive Data Sub-Language Rule

A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be
used directly or by means of some application. If the database allows access to data without
any help of this language, then it is considered as a violation.

Rule 6: View Updating Rule

All the views of a database, which can theoretically be updated, must also be updatable by the
system.

Rule 7: High-Level Insert, Update, and Delete Rule

A database must support high-level insertion, update, and deletion. This must not be limited
to a single row; that is, it must also support union, intersection, and minus operations to yield
sets of data records.

Rule 8: Physical Data Independence

The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data
is being accessed by external applications.

Rule 9: Logical Data Independence

The logical data in a database must be independent of its user’s view (application). Any change
in logical data must not affect the applications using it. For example, if two tables are merged
or one is split into two different tables, there should be no impact or change on the user
application. This is one of the most difficult rules to apply.

Rule 10: Integrity Independence

A database must be independent of the application that uses it. All its integrity constraints can
be independently modified without the need of any change in the application. This rule makes
a database independent of the front-end application and its interface.

Rule 11: Distribution Independence

The end-user must not be able to see that the data is distributed over various locations. Users
should always get the impression that the data is located at one site only. This rule has been
regarded as the foundation of distributed database systems.

Rule 12: Non-Subversion Rule

If a system has an interface that provides access to low-level records, then the interface must
not be able to subvert the system and bypass security and integrity constraints.
