Database Management System
(CSC 631)
Sanghita Bhattacharjee
Dept of CSE
NIT Durgapur
Course Objectives
Understand the basic concepts and appreciate the applications of database systems
Comprehend the fundamentals of design principles for logical design of relational
databases
Apply the query writing skill and its subsequent optimization
Discuss the basic issues of transaction processing and concurrency control
Course Content
This course highlights basic concepts of DBMS, the principals of
DBMS, architecture of DBMS, ER model and EER model,
relational algebra, tuple relational calculus, SQL, database design ,
normalization, transaction management, concurrency control,
recovery
This course gives a concept of distributed database
Acknowledgment / References
A Silberschatz, H F Korth and S Sudarshan, Database System Concepts, 5th Edition, 2006
Ramez Elmasriand Shamkant, B Navathe, Fundamentals of Database Systems, 3rd Edition,
Addison Wesley, 2000
Video lectures:
(i) Database Management System by Prof. Partha Pratim Das
(ii) Introduction to database systems by Prof. P. Sreenivasa Kumar
Lecture 1
Introduction
In computerized information system, data is the basic resource of
the organization / enterprise. So, proper organization and
management for data is required for the organization to run
smoothly. Database management system deals the knowledge of
how data stored and managed on a computerized information
system. In any organization, it requires accurate and reliable data for
better decision making, ensuring privacy of data and controlling data
efficiently.
Database
What is Data ?
By data, we mean the known facts that can be recorded and that have implicit meaning. It
can also be defined as the representation of facts, concepts or instruction in a formal
manner, which is suitable for understanding and processing
e.g: text, number, speech, image, audio, video, sensor data (traffic data)
Database is the collection of inter related data
What is Information ?
Information is the processed data on which decisions and actions are based. Or information
is processed meaningful data. Information can be defined as the organized and classified
data to provide meaningful values
Eg: “ Age of Rohan is 18”
Database Management System
DBMS is the set of programs that are used to access and modify the data in
database
DBMS is a general purpose software system that enables
Creation of large disk-resident databases
Efficient retrieval of data
Concurrent use of the system by allowing multiple users in a consistent manner
Avoid redundancy of data and provides correctness of data
Ensure availability of data in spite of system failures (disk failure, power failure,
software failure etc.)
It provides an environment that is both convenient and efficient to use in order to
store and access information in database
Database
Purpose of database
Collect and manage various data about a real-world enterprise or part of an
enterprise
Maintain the data to serve various purposes of enterprise
Update database to reflect the changes
An Example
A common example is : University Database
Data about students, faculty, courses, staff, alumni, course- taken, grade_report
etc.
Reflects the academic aspects of the university
Purpose: To keep an accurate track of the academic activities of the university
Other Examples:
Banking: all transactions
Airlines: reservations, schedules
Sales: customers, products, purchases
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
Databases touch all aspects of our lives
File System Storage based Approach
File-processing system is supported by a conventional operating system.
In the early days, file system was used to store data
A file is a collection of related information
A system of files and collection of application programs manipulating them is a file-based
system
File System Storage based Approach
Text file DBMS (My SQL)
1 Sohini Kolkata 9434578960 Sl no name Address phno
2 Deepika Haldia 9567123469 1 sohini Kolkata 943457896
…………… 0
…………..
2 Deepika Haldia 956712346
9
File system
File system
Memory/HD
Memory/HD
Structured data vs Unstructured data
Structured data Unstructured data
Has predefined format or predefined data model Cannot be arranged according to predefined
or schema. data set
Examples: Adhar no, address, name, credit card Examples: audio, video, image, geo spatial data,
no, age, price sensor data ( traffic data, weather data), email
content, social media data( whatsapp, FB data)
Textual only Textual and non textual too
Quantitative . Easy to store and access Qualitative. Difficult to mange and access
Relational data based to store it Relational database cannot be used to store
unstructured data
Human generated , machine generated Human generated , machine generated
SQL in RDBMS is used to access structured data Platform like No-SQL (Not only SQL) is used for
RDMS housing, managing and using unstructured data.
MONGO DB is used to store
Data is stored in data ware house Data is stored in data ware house, data lakes
Drawbacks of using File Systems (cont.)
Uses unstructured data ( e.g. :: images, audio, video, email content,
social media data)
Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
Difficulty in accessing data
Need to write a new program to carry out each new task i.e. each
different data access request (a query) performed by a separate
program
Data isolation — multiple files and formats
Integrity problems
Integrity constraints (e.g. account balance > 0) become part of
program code
Hard to add new constraints or change existing ones
Drawbacks of using File Systems (cont.)
Atomicity problem
Failures may leave database in an inconsistent state with partial updates
carried out
Example: transfer of funds from one account to another should either
complete or not happen at all
Concurrent access by multiple users
Concurrent accessed needed for performance
Concurrent accesses of system in uncontrolled manner can lead to
inconsistencies
Example: two people reading a balance and want to update the account at the
same time
Security problems
Database systems offer solutions to all the above problems
Advantages of DBMS
Controlling redundancy
Improved data sharing
Data integrity
Security
Data consistency
Efficient data access
Data independence
Disadvantages
Little bit complex
Need more memory to run
A Simple Database System Environment
Users/ programmers
Application programs/ queries
Software to process queries
DBMS software
Software to access the stored
data
Meta data ( stored Stored database
data definition)
Abstract View of Data
Major purpose of database system is to provide users with an
abstract view of data. This implies that the system hides certain
details of how data is stored and maintained.
Levels of Abstraction
Physical (Internal) level
This lowest level describes how data/ record (e.g., student
record) is stored in memory
type instructor = record
Logical (Conceptual) level id : string;
name: string;
Next higher level describes what data is stored in database dept_name: string;
and what relationship exits between those data, tables, salary : int;
attributes, and constraints end;
View (External) level
Application programs hide details of data types
Describes only part of the entire database
Consists of set of views
Views can also hide information (e.g., grade, salary) for
security purposes
Data Models
A data model- a collection of concepts that can be used to describe the structure of a database –
provides necessary means to achieve this abstraction. By structure of database, we mean
Data types
data relationships
data semantics
data constraint - that should hold on the data
Most data models also include a set of basic operations for specifying retrievals and updates on
the database
Conceptual data model or high level data model
Provide the concepts that close to the way many users can perceive
Useful for understanding . Stakeholders and database programmer create this
Low level or physical data model
Provide the concepts that describe how data is stored in memory i.e. file format, file
structure
This is meant for computer specialist or DBA. Not for end user
Representational model
It gives logical representation of data without giving details of physical representation.
Database programmer or developer create this
ER Model
ER (Entity Relationship Model)
An example of conceptual level data model
Graphical representation of database design
It provides the concepts of entities, relationships and attributes
Example : The University Database
Entities: student, faculty, course, department, alumni, staff etc.
Relationships: enrolment relationship between student & course, associate
relationship between faculty member, department etc.
Attributes: name, roll_number, address etc., of student entity ; name,
emp_code, phone_no, emp_name etc., of faculty entity etc.
Example of ER Model
department address C_code
m n
course
student enroll
C_name credit
name roll_no
Entity: Student, course
Relationship: Enroll
Attributes: Name, roll_no, address, credit
Relational Model
An example of representational model
Describes logical representation of database without giving physical details of database
Relational model uses a collection of tables to represent both data and relationship
among the data. Each table has multiple columns, and each column has a unique name.
Relation
name Tables are also known as relations.
Attributes
Student
Roll_no Name Address Department
CS19001 Ravi Delhi CSE
CS19002 Nitesh Kolkata CSE
BT19001 Rekha Kolkata BT
CH19005 Raja Kanpur CHE
Tuples
A Sample Relational Database
Other Models
Institute
Hierarchical model
Data is organized into tree based structure Departments Infrastructure
Older
models A child has one and only one parent
Useful for one to many relationship
Network model Teachers Subjects
Students
Data is organized into graph based structure
A child can have multiple parents
Cores Labs Electives
Useful for many to many relationship. More faster than hierarchical model to access data
Object-oriented model
Extension of relational database
Suitable for complex database design such as GIS, telecommunications, scientific
equipment, multi media, CAD/ CAM
Use object oriented languages (C++, java ) to develop such database
Use concept of class, object, object type, in heritance, encapsulation
Other Models
Structured data vs semi structured data
Structured data can be organized by the means of relational database,
while semi structure data cannot be organized into tabular representation
Self describing and flexible in representing data
However, hard to implement
XML can be used to represent semi structured data. HTML, JSON also
Semi structure data model
Permit the specification of data where individual data items of the same
type may have different attributes
Schema or Meta Data vs Instance
In data model, it is important to distinguish the description of database and
database itself
Schema
Description of database or logical structure of database, which is specified
during the time of database design and is not changed frequently
Table names, attributes names with their data types, constraints etc.
Student_name address Roll_no student
course_name Course_code credit course
Data and schema are stored separately
DBMS stores the description of schema of constructs and constraints, called
meta data, in DBMS catalog so that DBMS software can refer to the schema
whenever needs to
Analogous to type of variable in a programming language
Schema diagram of student database
The logical schema is essential for relational database design
Instance
Instance
The actual content of the database at a particular point in time
Snapshot of database at a time
Analogous to value of a variable
Example of Database Instance
Course instance
Student instance
Student_name Address Roll No
Rohit Sharma Mumbai 18CS004
Deva Sharma Delhi 18BT001
Alia Chatterjee Kolkata 18CS007
Udit Goswami Assam 18CH003
Three Schema Architecture
Goal is to separate the user application from the physical database
Level architecture is used to describe various schema
Physical/ internal schema: Internal level has internal schema which describes how data is
stored in the memory i.e. physical storage of the database. Internal schema uses physical data
model and describes data storage and access methods for the database
Storage allocation : hashing, B tree
Access paths: primary index, secondary index
Others: encryption, optimization of internal structure
Designed by DBA
Logical/ conceptual schema: Conceptual level has conceptual schema which describes
database for the users. It hides the details of physical storage structure and concentrates on
entities, attributes, data types, relationships, user operations, constraints etc. Representational
data model can be used to describe conceptual schema. It is designed by database developer
or application programmer
View level schema: View level includes a number of view schemas. Each view schema
describes a part of database that a particular group of users is interested and hides the rest of
the database from that user group. High level data model or representational data model can be used
at this level
Three schemas are only descriptions of data; the actual data is stored
at the physical level only
In the three-schema architecture, each user group refers to its own
external schema. Hence, the DBMS must transform a request
specified on an external schema into a request against the conceptual
schema, and then into a request on the internal schema for
processing over the stored database
If the request is a database retrieval, the data extracted from the
stored database must be reformatted to match the user’s external
view. The processes of transforming requests and results between
levels are called mappings
Illustrating the three schema
architecture
View View 1 View 2 View n
level
View / conceptual
mapping
Logical level
Relation 1 Relation 2 Relation n
Logical/ internal
mapping
Physical Physical schema
level
Why Data Independence
A database normally contains a lot of data in addition to user data. For
example, it stores data about data , meta data, to locate and retrieve
data easily.
It is difficult to update or modify set of meta data once it is stored in the
database. But as database expands, it needs to change over time to
satisfy the requirement of users
If entire data is dependent , it would become a tedious and highly
complex job
To over come this, we need data independence.
What is Data Independence?
Schema architecture can be used to explain data independence
Data independence is defined as the capacity to change schema at one level of
database system without having to change the schema at next higher level
Aim is to keep data separated from all programs that make use of it
Types:
Physical data independence
Logical data independence
Logical Data Independence
The capacity to change the conceptual schema without affecting the view
level schemas or application programs
We may change conceptual schema to expand database
Examples
merging / splitting of records
adding a new attribute to some relation
no need to change the programs or views that don’t require to use the
new attribute
deleting an attribute
no need to change the programs or views that use the remaining data
view definitions in VL-LL mapping only need to be changed for views
that use the deleted attribute
changes to constraints can be applied to conceptual schema without affecting
view level schema
Physical Data Independence
The ability to modify physical/internal schema without affecting the logical or view schema
Changes to internal schema may be needed as some physical files have to be reorganized by
creating additional access structure to improve the performance of retrieval
If same data as before remains in database, no need to change conceptual schema
Physical Data Independence – modification is localized
achieved by suitably modifying PL-LL mapping
A very important feature of modern DBMS
Examples:
modification at physical level creating a new index
changing access method
switching to different data structure
modifying file organization,
use of new storage structure
change of location of database say from C drive to D drive
Database languages
Database languages are used to read, manipulate and store the data in
database
Several languages : DDL, DML, DCL, TCL
Data Definition language (DDL)
Specification notation for defining the database schema. It is used to create schema,
table, index, constraints in database
DDL commands: create, drop, alter, truncate, rename
To create the database/ tables : create
To alter the structure of database : alter
To drop the database/ table: drop
To delete the all records from a table including space allocated for the records: truncate
Example
create table student (
student_name char(30),
address varchar (30),
roll_no char(10) NOT NULL PRIMARY KEY,
age int
);
DDL
Drop table student;
Alter table student add phone_no int(10);
Alter table student drop column age;
Alter table student modify student_ name varchar(30);
DDL
DDL compiler whose function is to process DDL statements in order to identify
descriptions of the schema constructs and to store the schema description in the
DBMS catalog (data dictionary )
Data dictionary is the table which contains the information about database objects
Database schema (description of entities , attributes as well as meaning of data
elements)
Integrity constraint
primary key uniquely identifies records
synonyms, authorization and security codes
database authorization ( who can access what)
Data Manipulation Language (DML)
Language for accessing and manipulating the database
DML also known as query language
Two classes of languages
Procedural – user instructs the system to perform sequence of operations on
database to compute the desired result ( relational algebra)
Non procedural – user describes the desired information without giving a specific
procedure for getting the information (tuple relational calculus, domain relational
calculus)
Pure query language (tuple relational calculus, domain relational calculus, relational algebra)
SQL is the most widely used commercial query language
DML compiler translates DML statements in query language into evaluation plan
consisting of low level instructions that query evaluation engine understand
DML commands: insert, update, delete, select
DML Examples
Insert into student values (‘Anupam’,’Kolkata’,’101’,’982345123’);
Delete * from student where roll_no=‘101’;
Update student set address =‘Delhi’ where roll_no=‘101’;
// careful with update. Where clause specifies which record(s) to be
updated. Omitting where clause, all records in the table will be update
Query
Relational database systems allow a rich class of questions
1. Find the name of the student with roll 18CS1003
2. Find the average marks in CSE department
3. Find the number of students have enrolled in CSE and EE
and many more……
These questions are called queries
To process the query we need query language
SQL
SQL (structured query language ): widely used non-procedural language
Uses to perform information retrieval from database
Used in relational database management system (RDMS) and useful for handling
structured data
E.g. find the name of the student with roll_no 18CS003
select student_name
from student
where roll_no= ’18CS003’;
select * from student; (select all records from student table)
Application programs generally access databases through
Application program interface (e.g. ODBC/JDBC) which allow SQL queries
to be sent to a database
SQL can be embedded in high level language
Copy all columns from one table to another table:
INSERT INTO table2 SELECT * FROM table1 WHERE condition;
Copy only some columns from one table into another table:
INSERT INTO table2 (column1, column2, column3,…)SELECT column1, colum
n2, column3, ... FROM table1 WHERE condition;
SQL
DCL and TCL
DCL: used to control the privilege in database
Grant privilege/ permission to a user to access the database using GRANT
statement
Revoke the privilege using REVOKE statement. Take back the privilege to
access the database
REVOKE and GRANT are examples of DCL
TCL: Manage the transactions in database
Commit: save the work
Roll back: Restores the database to the original since last commit
Savepoint: set a point in transaction to roll back later
Roll back vs Commit
select * from student
student delete * from student where age=17
commit
ID Name Age select * from student
1 A 10
2 B 13
student
ID Name Age
3 A 14
1 A 10
4 C 14
2 B 13
5 V 17
3 A 14
4 C 14
Roll back vs Commit
ID Name Age
1 A 10
student
2 B 13
select * from student
3 A 14
delete * from student where age=17
4 C 14
select * from student
roll back ID Name Age
1 A 10
select * from student 2 B 13
student
3 A 14
4 C 14
5 V 17
Database Design
How to say the database design is “good” or “bad”
Employee table
EmpID Name Salary Dept Building HOD
11 Raja 1000 CSE Annex 1 Dr. Ray
12 David 1000 IT Annex 2 Dr.
Sharma
13 Amir 4000 CHE Main Dr. Gupta
14 Natasha 4000 BT BIO Dr. Palit
15 Dona 3000 CSE Annex 1 Dr. Ray
16 Sanjay 2000 BT BIO Dr. Palit
Any problem in table !!!!!!!!!!!!
Major problem is redundancy
How to design good database
Proper modelling of database (ER model)
Normalization (normalize if the design is bad and make it good)
EmpID Name Salary Dept Dept Building HOD
11 Raja 1000 CSE
CSE Annex 1 Dr. Ray
12 David 1000 IT
13 Amir 4000 CHE IT Annex 2 Dr. Sharma
14 Natasha 4000 BT
CHE Main Dr. Gupta
15 Dona 3000 CSE
16 Sanjay 2000 BT BT BIO Dr. Palit
EMP Dept
Database Users and Administrators
Users are differentiated by the way they expect to interact with the system
Naive users
Unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously
No deep knowledge of database required
Use the GUI provided by an application program
Examples: people accessing database over the web, bank tellers, data entry operators
Database Users and Administrators
Sophisticated user / data analyst
Interact with the system without writing application programs
They form their requests in a database query language and submit each such query
to a query processor, whose function is to break down DML statements into
instructions that the storage manager understands.
Uses SQL to generate answers for complex queries
Application Programmers
Computer professionals who write application programs
Embed SQL in high level programming language and develop programs to satisfy requirements
Interact with system through DML calls
Should thoroughly understand the logical schema or relevant views
Testing of programs is necessary
DBA (Database Administrator)
Coordinates all the activities of the database system; the database
administrator has a good understanding of the enterprise’s information
resources and needs
Functionalities
Creation and modification of conceptual schema
Implementation of storage structure and access method
Physical organization modifications
Grant/ revoke authorization to other users for data access
Integrity constraints specification
Execute immediate recovery procedure in case of failures
Ensure physical security to database
Overall Database Architecture
Data storage and Querying
Storage manager
Query processor
Transaction management
Storage Management
Storage manager is a program module that provides the interface
between the low level data stored in the database and the application
programs and queries submitted to the system
Responsible for interaction with the file manager
The raw data are stored on the disk using the file system, which is
usually provided by a conventional operating system.
The storage manager translates the various DML statements into low-
level file-system commands. Thus, the storage manager is responsible
for storing, retrieving, and updating data in the database.
Query Processor
Parsing and translation
Optimization
Evaluation
Scanner,
Intermediate query
Query Parser
form
and
translator
Desired Execution Evaluation
Optimizer
o/p engine plan
data
Transaction Management
What happens if the system fails?
What happens if more than one users simultaneously access the system?
A transaction is a collection of operations (R/W) that performs a single logical
function in database application
It is unit of atomicity and consistency (correctness of data)
Transaction manager ensures atomicity and consistency even there is some
failures (power failure, system crashes) and it provides recovery from transaction
failures
Concurrency control manager is responsible for managing concurrent execution of
multiple transactions