01 Overview of Database Systems
Questions:
1) What is a DBMS, specifically a RDBMS?
2) Why considering it for managing data?
3) How is application data rep. in a DBMS?
4) How is data retrieved & manipulated, in a DBMS?
5) How does a DBMS support concurrent access & protect data during system failures?
6) Which are the main comp. of a DBMS?
7) Who is involved with DB in real life?
1. Short Definitions
• database = collection of data, typically describing the activities of one or more related organizations
o entities
o relationships between entities
• DBMS = software designed to assist in maintaining & utilizing large collections of data
2. History
• 1960s: Charles Bachman developed the 1st general-purpose DBMS, called Integrated Data Store =>
basis for the network data model
• late 1960s: IBM developed IMS (Information Mgmt. Syst.), used even today => hierarchical data
model
• 1970: Edgar Codd (IBM) proposed new model => relational data model (RDM)
• 1980s: RDM consolidates => dominant DBMS paradigm => SQL developed by IBM
• 1990s: advances in many areas of BD => data warehouse = consolidates data from several DBs
• 1999: James Gray contributes to DB transaction mgmt. = concurrent execution of DB programs
• enterprise resource planning (ERP) & mgmt. resource planning (MRP) emerged
• Internet age: 1st generation of websites stored data exclusively in OS files
• now: more accessible, new visions including multimedia DB, interactive videos, streaming data, etc.
3. File Systems vs DBMS
File Systems drawbacks DBMS advantages
- we may not have a lot of main memory to hold - data independence: provide an abstract view
all of the data of the data that hides details of its representation
- write special programs to answer queries - efficient data access: especially on external
storage devices
- protecting the data against inconsistent - data admin.: centralizing the admin. of data for
changes is difficult user's access
- ensuring data is restored consistently is hard - concurrent access & crash recovery: access
scheduled by DBMS as it's accessed by one user
at a time; users are protected from system
failures
- enforcing security policies for concurrent user's - reduced application dev. time: many actions
access is weak are already acted by DBMS (no need for debug)
Q: Is there a reason to not use a DBMS?
A: yes! – DBMS is complex
- when making application with real-time constraints or just a few well-defined operations for efficient
customer code => DBM too much.
- sometimes an app may need to manipulate data differently than a DBMS (query lang.)
- RDB do not support flexible analysis of text data
4. Describing & Storing Data in a DBMS
• data model = a collection of high-level data description constructs that hide many low-level storage
details
• semantic data model = a more abstract, high-level data model that makes it easier for a user to
come up with a good initial description of the data (ex ER diagram)
5. Relational Model
• central data description construct is the relation = a
set of records
• schema = a description of data in term of a data
model
• relational model -> schema for a relation specifies its
name, name of each fiel/attribute/column, type
ex: Students(sid: string, name: string,
login: string, age: integer, gpa: real)
• each row in the relation = record that describes a student
• integrity constraints = conditions that the records in a relation must satisfy (ex: every student has
a unique sid value)
6. Levels of Abstraction in a DBMS
• the DB description consists of a schema at 3 levels of
abstraction: conceptual, physical, external
• information about the conceptual, external, physical
schemas is stored in the system catalogs
• data definition language (DDL) -> defines the external &
conceptual schemas
a) Conceptual / Logical Schema
• described the stored data in terms of the data model
of the DBMS
• describes all relations that are stored in the DB
• in the university example, these relations contain
information about entities (students, faculty) &
relationships (student's enrolment in courses)
• there is only one per DB
b) Physical Schema
• specifies additional storage details
• what files organizations to use to store the relations and create auxiliary data structures =
indexes -> to speed up data retrieval operations
• there is only one per DB
• example
o store all relations as unsorted files of records
file in DBMS = collection of records or collection of pages, rather than a string of chars
as in an OS
o create indexes on the 1st column of Students, Faculty, Courses, the sal field of Faculty,
and capacity field of Rooms
c) External Schema
• allow data access to be customized at the level of individual users or groups of users
• several per DB guided by end user requirements
• it consists of one or more views = a relation, but with the records not stored in the DBMS
examples: Courseinfo(cid: string, fname: string, enrolment: integer)
• we won't include Courseinfo in the conceptual schema because we can compute it from the
relations of it, so adding it is redundant
7. Data Independence
• application programs are insulated from changes in the way the data is structured or stored
• achieved through conceptual & external schema
• relations in external schema are in principle generated on demand from the ones in the conceptual
schema => if the data is reorganized => the conceptual schema changes => the definition of a view
relation can be modified so the same relation is computed as before
ex:
if we change our Faculty by 2 relations:
Faculty_public(fid: string, fname: string, office: integer)
Faculty_private(fid: string, sal: real)
• Courseinfo will be adjusted for student's to only see the faculty public infos
• => users can be shielded from changes in the logical structure of data or ones in the choice of
relations to be stored = logical/conceptual data independence
• physical data independence is achieved by the fact that users can't change the storage details
8. Queries
• useful tools to retrieve information from the DB
• relational calculus = formal query language based on mathematical logic
• relational algebra = formal query language based on a collection of operators for manipulating
relations
• DBMS enable users to create, modify and query data through a data manipulation language
(DML), and the query language is a part of it
• DML + DDL => data sublanguages when embedded within a host language (C, COBOL)
9. Transaction Mgmt.
• at any given time, it is possible that several users are accessing (and possibly modify) a DB
concurrently => the DBMS must order their requests carefully to avoid conflicts
• the DBMS must protect users from the effects of system failures by ensuring that all data (and the
status of active applications) is restored to a consistent state when the system is restarted after a
crash
• transaction = any one execution of a user program in a DBMS; basic unit of change as seen by the
DBMS
• executing the same program several times => several transactions
• partial transactions aren't allowed
• group of transactions = some serial execution of all transactions
• example of concurrent use:
o DB of airline reservations. At any given time several agents are looking up information about
available seats on various flights and making new set reservations. If for example, one agent
looks up flight 100 on some given day and finds an empty seat, another travel agent may
simultaneously be making a reservation for that seat, thereby making the information by the 1st
one obsolete
• locking protocol = set of rules to be followed by each transaction to ensure that, even though
actions of several transactions might be interleaved, the net effect is identical to executing all of
them in some serial order
• lock = access control mechanism on an object of the DB
o shared lock -> an object can be held by 2 different transactions simultaneously
o exclusive lock -> no other transactions hold any lock on that object
• transaction might be interrupted before running to completion (ex: system crashes)
o to retake the state before the crash => log of all writes to the DB
o Write-Ahead Log (WAL) = property of the log which provides that each write action must be
recorded in the log (on disk) before the corresponding effects take place
o checkpoint = periodically forcing some info. to disk (often slows down normal execution)
10. Structure of a DBMS
• SQL commands generated from various UI -> DBMS -> evaluation, execution, returning of an
answer
• a user-issued query -> parsed through a query optimizer (produces an efficient execution plan)
• execution plan = blueprint for evaluating query, represented as a tree of relational operators
• code implementing relational operators -> sits on top of the file = collection of pages/ collection of
records
• heap files = files of unordered pages
• buffer manager = module which allows files
and access method layer code, which brings
pages from disk to main memory as needed
in response to read request
• disk space manager = lowest layer of the
DBMS, which deals with mgmt. of space on
disk; higher layers allocate, deallocate, read
and write pages through this layer (by
routines)
• transaction manager = ensures the
transaction request and release locks
according to a suitable locking protocol and
schedules the execution of transactions
• lock manager = keeps track of requests for
locks and grants locks on DB objects when
they become available
• recovery manager = responsible for
maintaining a log and restoring the system
to a consistent state after a crash
• all managers interact between themselves
11. People working with DBs
• DB implementors -> build DBMS software
• end user -> store data in a DBMS; they usually use application programs => little technical
knowledge
• Application programmers
o develop packages that facilitates data access for end users, using the host or data lang. and
software tools
o they ideally access data through the external scheme
• DB Administrator (DBA)
o maintains a DB
o design of the conceptual and physical schemas: decide what relations to store (conceptual),
how to store them (physical)
o security & authorization: unauthorized access not permitted
o data availability & recovery: system fails security and recovery, back up data, periodically
checkpoints & logs of system activity
o DB tuning: modify the DB (conceptual & physical schemas) to ensure adequate performance
as end users' requirements change.