DATABASE MANAGEMENT SYSTEMS
Part 1 : Introduction of DBMS
A very large, integrated collection of data.
Models real-world enterprise.
◦ Entities (e.g., students, courses)
◦ Relationships (e.g., Madonna is taking CS564)
A Database Management System (DBMS) is a
software package designed to store and manage
databases.
Data independence and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from
crashes.
DBMS contains information about a particular enterprise
◦ Collection of interrelated data
◦ Set of programs to access the data
◦ An environment that is both convenient and efficient to
use
Database Applications:
◦ Banking: all transactions
◦ Airlines: reservations, schedules
◦ Universities: registration, grades
◦ Sales: customers, products, purchases
◦ Online retailers: order tracking, customized
recommendations
◦ Manufacturing: production, inventory, orders, supply
chain
◦ Human resources: employee records, salaries, tax
deductions
Application must stage large datasets
between main memory and secondary
storage (e.g., buffering, page-oriented access,
32-bit addressing, etc.)
Special code for different queries
Must protect data from inconsistency due to
multiple concurrent users
Crash recovery
Security and access control
In the early days, database applications were built directly on top
of file systems
Drawbacks of using file systems to store data:
◦ Data redundancy and inconsistency
Multiple file formats, duplication of information in different
files
◦ Difficulty in accessing data
Need to write a new program to carry out each new task
◦ Data isolation — multiple files and formats
◦ Integrity problems
Integrity constraints (e.g. account balance > 0) become
“buried” in program code rather than being stated explicitly
Hard to add new constraints or change existing ones
Drawbacks of using file systems (cont.)
◦ Atomicity of updates
Failures may leave database in an inconsistent state
with partial updates carried out
Example: Transfer of funds from one account to another
should either complete or not happen at all
◦ Concurrent access by multiple users
Concurrent accessed needed for performance
Uncontrolled concurrent accesses can lead to
inconsistencies
Example: Two people reading a balance and updating it
at the same time
◦ Security problems
Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
An architecture for a database system
Physical level: describes how a record (e.g.,
customer) is stored.
Logical level: describes data stored in database,
and the relationships among the data.
type customer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : string;
end;
View level: application programs hide details of
data types. Views can also hide information (such
as an employee’s salary) for security purposes.
Similar to types and variables in programming
languages
Schema – the logical structure of the database
◦ Example: The database consists of information
about a set of customers and accounts and the
relationship between them)
◦ Analogous to type information of a variable in a
program
◦ Physical schema: database design at the
physical level
◦ Logical schema: database design at the logical
level
Instance – the actual content of the database
at a particular point in time
◦ Analogous to the value of a variable
Physical Data Independence – the ability to
modify the physical schema without changing
the logical schema
◦ Applications depend on the logical schema
◦ In general, the interfaces between the various
levels and components should be well defined
so that changes in some parts do not
seriously influence others.
A collection of tools for describing
◦ Data
◦ Data relationships
◦ Data semantics
◦ Data constraints
Relational model
Entity-Relationship data model (mainly for database design)
Object-based data models (Object-oriented and Object-
relational)
Semi structured data model (XML)
Other older models:
◦ Network model
◦ Hierarchical model
Oracle
Sybase
Informix (Unix)
DB2, SQL/DS (IBM)
Access, SQL Server (Microsoft)
Older (IMS, Focus)
Many limited to PC (MS Access,
dBASE, Paradox, …)
Open source: MySQL (more)
• First commercial standard (IBM’s IMS) Entry point
• Still used (legacy systems)
Customers Customer
files
XYZ
pointers
Order 1 Order 2
Order
files pointers
Item A Item B Item A Item C
To retrieve how many of item A are sold,
Items start at the top from Customer. Then all
Item#ItemName Quantity nested data are retrieved top-down and
998 Dog Food 12 left-right.
764 Cat Food 11
Different data models needed for different
retrieval tasks (e.g., Order at the root) =>
high data redundancy in DBS! 14
of
15
Entry point
Relationships between records also
supported by pointers;
Customer complex programming.
XYZ
Order 1 Order 2
Entry points
Item A Item B Item C
One data model supports different retrieval paths (by
customer, order, item).
15
of
15
Primary Key (PK, Key) Foreign Key (FK)
Customer(CustomerID, Name, …)
Order(OrderID, CustomerID, OrderDate, …)
ItemOrdered(OrderID, ItemID, Quantity, …)
Item(ItemID, Description, Price, …)
• Data organized as logical tables, consisted of rows
(records) and columns (attributes), and connected via key
attributes.
• Possible to retrieve almost any combination of rows and
columns, and a specific piece of data (field) within a row.
• Pointers transparent to developers, just need to specify
Key—Foreign Key relationships. 16
of
15
Order Customer
OrderID CustomerID Government
CustomerID Name Customer
… … Commercial
ContactName
NewOrder Add Customer ContactPhone
ContactName
DeleteOrder Drop Customer Discount, …
ContactPhone
… Change Address …
NewContact
Data
AddNewContact
Procedures (behavior,
methods) for processing
data.
Generalization/Specialization
supported; inheritance
17
of
15
More frequent than pure object-oriented
systems
Architecture:
◦ Database is relational
◦ Objects are created in main memory according to
class diagram and business rules, and populated by
data from the relational databases (data access
layer in system sequence diagrams)
◦ System operations, then, performed by objects
18
of
15
A data model is a collection of concepts for
describing data.
A schema is a description of a particular
collection of data, using the a given data
model.
The relational model of data is the most
widely used model today.
◦ Main concept: relation, basically a table with
rows and columns.
◦ Every relation has a schema, which describes the
columns, or fields.
Relational database: a set of relations
Relation: made up of 2 parts:
◦ Instance : a table, with rows and columns.
#Rows = cardinality, #fields = degree / arity.
◦ Schema : specifies name of relation, plus name
and type of each column.
E.G. Students (sid: string, name: string,
login: string, age: integer, gpa: real).
Can think of a relation as a set of rows or tuples
(i.e., all rows are distinct).
sid name login age gpa
53666 Jones jones@cs 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
Cardinality = 3, degree = 5, all rows distinct
Do all columns in a relation instance have to
be distinct?
Conceptual schema:
◦ Students(sid: string, name: string, login: string,
age: integer, gpa:real)
◦ Courses(cid: string, cname:string, credits:integer)
◦ Enrolled(sid:string, cid:string, grade:string)
Physical schema:
◦ Relations stored as unordered files.
◦ Index on first column of Students.
External Schema (View):
◦ Course_info(cid:string,enrollment:integer)
Applications insulated from how data is
structured and stored.
Logical data independence: Protection from
changes in logical structure of data.
Physical data independence: Protection from
changes in physical structure of data.
Example of tabular data in the relational model
Attributes
Users are differentiated by the way they expect to
interact with the system
Application programmers – interact with system
through DML calls
Sophisticated users – form requests in a database
query language
Specialized users – write specialized database
applications that do not fit into the traditional data
processing framework
Naïve users – invoke one of the permanent
application programs that have been written
previously
◦ Examples, people accessing database over the web,
bank tellers, clerical staff
Coordinates all the activities of the database
system
◦ has a good understanding of the enterprise’s
information resources and needs.
Database administrator's duties include:
◦ Storage structure and access method definition
◦ Schema and physical organization modification
◦ Granting users authority to access the database
◦ Backing up data
◦ Monitoring performance and responding to
changes
◦ Database tuning
Storage management
Query processing
Transaction processing
Storage manager is a program module that
provides the interface between the low-level data
stored in the database and the application
programs and queries submitted to the system.
The storage manager is responsible to the
following tasks:
◦ Interaction with the file manager
◦ Efficient storing, retrieving and updating of data
Issues:
◦ Storage access
◦ File organization
◦ Indexing and hashing
1.Parsing and translation
2.Optimization
3.Evaluation
Alternative ways of evaluating a given query
◦ Equivalent expressions
◦ Different algorithms for each operation
Cost difference between a good and a bad way of
evaluating a query can be enormous
Need to estimate the cost of operations
◦ Depends critically on statistical information
about relations which the database must
maintain
◦ Need to estimate statistics for intermediate
results to compute cost of complex expressions
A transaction is a collection of operations that
performs a single logical function in a database
application
Transaction-management component ensures
that the database remains in a consistent (correct)
state despite system failures (e.g., power failures
and operating system crashes) and transaction
failures.
Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.
The architecture of a database systems is greatly
influenced by the underlying computer system on which
the database is running:
Centralized
Client-server
Parallel (multiple processors and disks)
Distributed
(web browser)
Old Modern
1950s and early 1960s:
◦ Data processing using magnetic tapes for storage
Tapes provide only sequential access
◦ Punched cards for input
Late 1960s and 1970s:
◦ Hard disks allow direct access to data
◦ Network and hierarchical data models in
widespread use
◦ Ted Codd defines the relational data model
Would win the ACM Turing Award for this work
IBM Research begins System R prototype
UC Berkeley begins Ingres prototype
◦ High-performance (for the era) transaction
processing
1980s:
◦ Research relational prototypes evolve into commercial
systems
SQL becomes industry standard
◦ Parallel and distributed database systems
◦ Object-oriented database systems
1990s:
◦ Large decision support and data-mining applications
◦ Large multi-terabyte data warehouses
◦ Emergence of Web commerce
2000s:
◦ XML and XQuery standards
◦ Automated database administration
◦ Increasing use of highly parallel database systems
◦ Web-scale distributed data storage systems