Source: A Silberschatz, H F Korth and S Sudarshan, “Database System
Concepts”, McGRAW Hill.
Taught By: Rashmi Bhattad
Database system applications
Purpose of Database Systems
View of Data
Data models
Approaches to building a database
Database management system(DBMS)
Three levels of the architecture
Challenges in building a DBMS
Various components of a DBMS architecture.
2
DBMS is a
Collection of interrelated data i.e. database
Set of programs to access the data
An environment that is both convenient and efficient to use
Database Applications:
Banking: transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
Databases can be very large.
Databases touch all aspects of our lives
Sales:
For customer, product, and purchase information.
Accounting:
For payments, receipts, account balances, assets and other
accounting information.
Human resources:
For information about employees, salaries, payroll taxes, and
benefits, and for generation of paychecks.
Manufacturing:
For management of the supply chain and for tracking production
of items in factories, inventories of items in warehouses and
stores, and orders for items.
4
Add new students, faculties, and courses
Register students for courses, and generate class masters
Assign grades to students, compute Cumulative grade point
averages (CGPA) and generate transcripts
Maintain attendance, placement, etc. record
Maintain ORSP project, alumni etc. record
And so on.
Data redundancy
Example of data redundancy
Account Department
Student Name, Address, ID, branch, fee, year
Computer Department (SOT admin)
Student name, Address, ID, branch, class, batch, year
Placement Department
Student name, Address, ID, branch, class, batch, year
And so on
Data inconsistency
Various piece of information not agree. i.e. address in ICT
department file is updated while same address is not reflected in
account and placement.
Difficulty in accessing data
Consider that SOT admin wants to find the list of
students residing on the pincode 382007. how they do ?
First way : manually search in database
Second way : ask programmer to make program to search in
large database
After several days, they want to know list of students
having CGPA > 8 (may be for placement purpose)
Again they have to follow the same ways as earlier.
In short, need to write a new program to carry out each
new task
7
Data isolation
Data will be scattered among multiple files which
also scattered among various formats
Thus, if makes difficult for programmers to write
programs.
Like excel file may not open in linux or mac
environment very well.
8
Integrity problems
Integrity constraints (e.g., Marks >= 0 AND Marks
<=100)
Developers add these constraints by adding hard
coded lines in several programs.
However, when new constraints are added,
it is difficult to change
The problem is compounded when constraints
involve several data items from different files.
9
Atomicity of updates
Failures may leave database in an inconsistent state with partial
updates carried out
Example :
Consider a program to send INR 5000 from my account to you.
If system failure during program execution than it may happen that 5000
deducted from account while it may not deposited in your account.
10
Security problems
In general, it is suggested that every user have
specific access to certain files/data, not all.
Example : Account department user does not need
access to academic records of student.
Hard to provide user access to some, but not all,
data
11
Concurrent access by multiple users
Concurrent access needed for performance
Example : department C has balance 1,00,000. Clerk A will
see this and run program to withdraw 10,000. Same time
clerk B will run withdraw of 5,000.
What will final balance ?
Whoever’s program last executed that balance will be reflected.
Uncontrolled concurrent accesses can lead to
inconsistencies
Database systems offer solutions to all the above
problems
Physical level: describes how a record (e.g.,
student) is stored.
Logical level: describes data stored in database,
and the relationships among the data.
type student= record
ID : string;
name : string;
dept_name : string;
address: integer;
end;
View level: application programs hide details of
data types. Views can also hide information (such as
an student’s address) for security purposes.
Similar to types and variables in programming languages
Logical Schema – the overall logical structure of the database
Example: The database consists of information about a set of customers and
accounts in a bank and the relationship between them
Analogous to type information of a variable in a program
Physical schema– the overall physical structure of the database
Instance – the actual content of the database at a particular point
in time
Analogous to the value of a variable
Physical Data Independence – the ability to modify the physical
schema without changing the logical schema
Applications depend on the logical schema
In general, the interfaces between the various levels and components should
be well defined so that changes in some parts do not seriously influence
others.
Storage manager
Query processing
Transaction manager
Storage manager is a program module that
provides the interface between the low-level
data stored in the database and the application
programs and queries submitted to the system.
The storage manager is responsible to the
following tasks:
Interaction with the OS file manager
Efficient storing, retrieving and updating of data
Issues:
Storage access
File organization
Indexing and hashing
1. Parsing and translation
2. Optimization
3. Evaluation
What if the system fails?
What if more than one user is concurrently updating
the same data?
A transaction is a collection of operations that
performs a single logical function in a database
application
Transaction-management component ensures that
the database remains in a consistent (correct) state
despite system failures (e.g., power failures and
operating system crashes) and transaction failures.
Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.
Database
The architecture of a database systems is greatly influenced by
the underlying computer system on which the database is
running:
Centralized
Client-server
Parallel (multi-processor)
Distributed
24
Conceptual level
Like one can see the student_ID
Internal Level
Like one can see the type of student_ID and any constraints
External level
Like one can see the storage allocation of student_ID
25
26
27
28
29
30
Database applications are usually partitioned into two or three
parts.
In a two-tier architecture,
the application resides at the client machine,
where it invokes database system functionality at the server
machine through query language statements.
Application program interface standards like ODBC and JDBC are
used for interaction between the client and the server.
31
In contrast, in a three-tier architecture, the client machine acts
as merely a front end and does not contain any direct database
calls.
Instead, the client end communicates with an application
server, usually through a forms interface.
The application server in turn communicates with a database
system to access data.
32
The business logic of the application,
which says what actions to carry out under what conditions, is
embedded in the application server, instead of being distributed
across multiple clients.
Three-tier applications are more appropriate
for large applications, and for applications that run on the World
Wide Web.
33
A collection of tools for describing
Data
Data relationships
Data semantics
Data constraints
Relational model
Entity-Relationship data model
Object-based data models
Object-oriented
Object-relational
Semi-structured data model (XML)
Other older models:
Network model
Hierarchical model
attributes
(or columns)
tuples
(or rows)
35
The set of allowed values for each attribute is called the domain of the
attribute
The domain of the name attribute is the set of all possible instructor
names.
The domain of the ID attribute is the set of all possible ID i.e. from 1 to 70.
Attribute values are (normally) required to be atomic.
A domain is atomic if elements of the domain are considered to be
indivisible units.
For example, suppose the table instructor had an attribute phone number,
which can store a set of phone numbers corresponding to the instructor.
Then the domain of phone number would not be atomic, since an element
of the domain is a set of phone numbers, and it has subparts, namely the
individual phone numbers in the set.
The special value null is a member of every domain. Indicated that the
value is “unknown”
36
A1, A2, …, An are attributes
R = (A1, A2, …, An ) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
Formally, given sets D1, D2, …. Dn a relation r is a subset of
D1 x D2 x … x Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai Di
The current values (relation instance) of a relation are specified by
a table
An element t of r is a tuple, represented by a row in a table
37