Chapter 2
DATABASE SYSTEM CONCEPTS
AND ARCHITECTURE
Outline
2
Data Models and Their Categories
History of Data Models
Schemas, Instances, and States
Three-Schema Architecture
Data Independence
DBMS Languages and Interfaces
Database System Utilities and Tools
Centralized and Client-Server Architectures
Classification of DBMSs
The Three-Level ANSI-SPARC Architecture
1971 1975
An early proposal for a standard terminology The American National Standards
and general architecture for database systems Institute (ANSI) Standards Planning and
was produced in 1971 by the DBTG (Data Requirements Committee (SPARC),
Base Task Group) appointed by the ANSI/X3/SPARC, produced a similar
Conference on Data Systems and Languages terminology and architecture in 1975
(CODASYL, 1971). (ANSI, 1975).
The DBTG recognized the need for a two- ANSI-SPARC recognized the need for a
level approach with a system view called the three-level approach with a system
schema and user views called subschemas. catalog.
The levels form a three-level
architecture comprising an external, a
conceptual, and an internal level.
Three schema architecture
4
The Three-Level ANSI-SPARC Architecture
External level
Slide
2- 5
The users’ view of the database. This level describes that part of the
database that is relevant to each user.
The external view includes only those entities, attributes, and relationships
in the ‘real world’ that the user is interested in.
Different views may have different representations of the same data.
For example, one user may view dates in the form (day, month, year),
while another may view dates as (year, month, day).
Conceptual level
Slide
2- 6
The community view of the database. This level describes what data is stored
in the database and the relationships among the data.
The conceptual level represents:
All entities, their attributes, and their relationships;
The constraints on the data;
Semantic information about the data;
Security and integrity information
Internal level
Slide
2- 7
The physical representation of the database on the computer.
This level describes how the data is stored in the database.
The internal level is concerned with such things as:
Storage space allocation for data and indexes;
Record descriptions for storage (with stored sizes for data items);
Record placement;
Data compression and data encryption techniques.
Data Models
8
Data Model:
A set of concepts to describe the structure of a database, the
operations for manipulating these structures, and certain constraints
that the database should obey.
Data Model Structure and Constraints:
Constructs are used to define the database structure
Constructs typically include elements (and their data types) as well
as groups of elements (e.g entity, record, table), and relationships
among such groups
Constraints specify some restrictions on valid data; these constraints
must be enforced at all times
Data Models (continued)
9
Data Model Operations:
These operations are used for specifying database
retrievals and updates by referring to the constructs
of the data model.
Operations on the data model may include basic
model operations (e.g. generic insert, delete,
update) and user-defined operations (e.g.
compute_student_gpa, update_inventory)
Data models
Slide
2- 10
A data model can be thought of as comprising three components:
1. A structural part, consisting of a set of rules according to which databases
can be constructed;
2. A manipulative part, defining the types of operation that are allowed on the
data (this includes the operations that are used for updating or retrieving data
from the database and for changing the structure of the database);
3. Possibly a set of integrity constraints, which ensures that the data is accurate
The purpose of a data model is to represent data and to make
the data understandable.
Con …
Slide
2- 11
To reflect the ANSI-SPARC architecture, we can
identify three related data models:
1. an external data model, to represent each user’s view of the
organization, sometimes called the Universe of Discourse (UoD);
2. a conceptual data model, to represent the logical (or community)
view that is DBMS independent;
3. an internal data model, to represent the conceptual schema in
such a way that it can be understood by the DBMS.
Categories of Data Models
12
I. Conceptual (high-level, semantic) data models:
Provide concepts that are close to the way many users perceive data.
The way users perceive the data.
(Also called entity-based or object-based data models.)
The community view of the database.
This level describes what data is stored in the database and the
relationships among the data.
All the constraints on the data;
Semantic information about the data;
security and integrity information.
entities, their attributes, and their relationships;
Categories of Data Models
13
II. Physical (low-level, internal) data models:
The way the DBMS and the operating system perceive the data.
Provide concepts that describe details of how data is stored in the
computer.
The physical representation of the database on the computer.
The internal level is concerned with such things as:
Storage space allocation for data and indexes;
Record descriptions for storage (with stored sizes for data items);
Record placement;
Data compression and data encryption techniques.
Categories of Data Models
14
III. Implementation (representational) data models:
Provide concepts that fall between the above two, used by many
commercial DBMS implementations (e.g. relational data models
used in many commercial systems).
which provide concepts that may be easily
understood by end users.
Representational data models hide many details of
data storage on disk but can be implemented on a
computer system directly.
Con…
Slide
2- 15
There have been many data models proposed in the literature.
They fall into three broad categories: object-based, record-based,
and physical data models.
The first two are used to describe data at the conceptual and
external levels, the latter is used to describe data at the internal
level.
Object-Based Data Models
Slide
2- 16
Object-based data models use concepts such as entities, attributes, and relationships.
An entity is a distinct object (a person, place, thing, concept, event) in the
organization that is to be represented in the database.
An attribute is a property that describes some aspect of the object that we wish to
record, and a relationship is an association between entities.
Some of the more common types of object-based data model are:
Entity–Relationship
Semantic
Functional
Object-Oriented.
Record-Based Data Models
Slide
2- 17
In a record-based model, the database consists of a number of fixed-
format records possibly of differing types.
Each record type defines a fixed number of fields, each typically of a
fixed length.
There are three principal types of record-based logical data model:
The relational data model,
The network data model, and
The hierarchical data model.
The hierarchical and network data models were developed almost a
decade before the relational data model, so their links to traditional file
processing concepts are more evident.
Relational data model
Slide
2- 18
The relational data model is based on the concept of mathematical relations.
In the relational model, data and relationships are represented as tables, each
of which has a number of columns with a unique name.
Hierarchical model
Slide
2- 19
The hierarchical model organizes data into a tree-like structure, where each
record has a single parent or root.
Sibling records are sorted in a particular order.
That order is used as the physical order for storing the database.
This model is good for describing many real-world relationships.
Network model
Slide
2- 20
The network model builds on the hierarchical model by allowing many-to-many
relationships between linked records, implying multiple parent records.
Each set consists of one owner or parent record and one or more member or
child records.
It was most popular in the 70s after it was formally defined by the Conference
on Data Systems Languages (CODASYL).
Physical Data Models
Slide
2- 21
Physical data models describe how data is stored in the computer,
representing information such as record structures, record
orderings, and access paths.
There are not as many physical data models as logical data
models, the most common ones being the unifying model and the
frame memory.
Schemas, Mappings, and Instances
Slide
2- 22
Database Schemas-
A database schema is the skeleton structure that
represents the logical view of the entire database.
It defines how the data is organized and how the
relations among them are associated.
It formulates all the constraints that are to be applied
on the data.
Schemas versus Instances
23
Simply Schema is the description of a database.
Includes descriptions of the database structure, data types, and
the constraints on the database.
The data in the database at any particular point in time is called
a database instance.
Schema Diagram:
An illustrative display of (some aspects of) a database schema.
Schema Construct:
A component of the schema or an object within the schema,
e.g., STUDENT, COURSE.
Schemas versus Instances
24
Database State:
The actual data stored in a database at a particular
moment in time. This includes the collection of all the
data in the database.
Also called database instance (or occurrence or
snapshot).
The term instance is also applied to individual database
components, e.g. record instance, table instance, entity
instance
Database Schema vs. Database State
25
Database State:
Refers to the content of a database at a moment in time.
Initial Database State:
Refers to the database state when it is initially loaded
into the system.
Valid State:
A state that satisfies the structure and constraints of the
database.
Database Schema vs. Database State
26
Distinction
The database schema changes very infrequently.
The database state changes every time the database
is updated.
Schema is sometimes called the intension of the database
Instance is called an extension(or state) of the database.
Example of a Database Schema
27
Example of a database
28
state
STUDENT
Name Student_numb Class Major
er
Smith 17 1 CS
Brown 8 2 CS
Three-Schema Architecture
29
Proposed to support DBMS characteristics of:
Program-data independence.
Support of multiple views of the data.
Not explicitly used in commercial DBMS products,
but has been useful in explaining database system
organization
Three-Schema Architecture
30
Defines DBMS schemas at three levels:
Internal schema at the internal level to describe physical
storage structures and access paths.
Is a complete description of the internal model, containing the
definitions of stored records, the methods of representation, the
data fields, and the indexes and storage structures used.
Typically uses a physical data model.
31
Conceptual schema at the conceptual level to describe
the structure and constraints for the whole database for a
community of users.
Describes all the entities, attributes, and relationships
together with integrity constraints.
Uses a conceptual or an implementation data model.
External schemas (also called subschema) at the
external level to describe the various user views.
Correspond to different views of the data.
Usually uses the same data model as the conceptual level.
The three-schema architecture
32
The three-schema architecture
33
Examples: Differences between the three levels.
Three-Schema Architecture
34
Mappings among schema levels are needed to
transform requests and data.
Programs refer to an external schema, and are mapped
by the DBMS to the internal schema for execution.
Data extracted from the internal DBMS level is
reformatted to match the user’s external view (e.g.
formatting the results of an SQL query for display in a
Web page)
Data Independence
Slide
2- 35
Data independence is the ability to modify the schema definition
in one level without affecting the schema definition in the next
higher level.
Two types of Data Independence:
1. Physical Data Independence
2. Logical Data Independence
Data Independence
36
Logical Data Independence:
Logical data is data about database, that is, it stores information about how
data is managed inside.
The capacity to change the conceptual schema without having to change the
external schemas and their associated application programs.
Physical Data Independence:
The capacity to change the internal schema without having to change the
conceptual schema.
For example, the internal schema may be changed when certain file structures
are reorganized or new indexes are created to improve database performance
Data Independence (continued)
37
When a schema at a lower level is changed, only the mappings
between this schema and higher-level schemas need to be
changed in a DBMS that fully supports data independence.
The higher-level schemas themselves are unchanged.
Hence, the application programs need not be changed since they
refer to the external schemas.
DBMS Languages
38
A data sublanguage consists of two parts:
Data Definition Language(DDL)
The DDL is used to specify the database schema
Data Manipulation Language(DML).
The DML is used to both read and update the database.
DDL and DML not two separate languages; instead they simply
form parts of a single db language, such as SQL language.
DBMS Languages
39
These languages are called data sublanguages because they do
not include constructs for all computing needs such as
conditional or iterative statements, which are provided by the
high-level programming languages.
Many DBMSs have a facility for embedding the sublanguage in a
high-level programming language such as COBOL, Fortran,
Pascal, Ada, ‘C’, C++, Java, or Visual Basic.
DBMS Languages
40
High-Level or Non-procedural Languages: These include the relational
language SQL
May be used in a standalone way or may be embedded in a programming
language
Low Level or Procedural Languages: These must be embedded in a
programming language.
View definition language (VDL)- Specifies user views/mappings to
conceptual schema
Storage definition language (SDL)- Specifies the internal schema
Data Definition Language (DDL):
41
A language that is used to define database schemas.
The DDL statement is used to identify description of the schema construct and
store the schema description in the DBMS catalog (data dictionary).
A data dictionary contains metadata (i.e., data about data).
Example:
Create table Student (StNo number(14),name varchar(20),Bdate date);
When create a new table, it also update a special set of tables (data
dictionary).
An example of a database catalog for the database.
Slide
2- 42
Data Definition Language (DDL):
43
DDL is Used by the DBA and database designers to specify the conceptual
schema of a database.
In many DBMSs, the DDL is also used to define internal and external schemas
(views).
In some DBMSs, separate Storage Definition Language (SDL) And View
Definition Language (VDL) are used to define internal and external schemas.
SDL is typically realized via DBMS commands provided to the DBA
and database designers
Data Manipulation Language (DML)
44
DML is Used to specify database retrievals and updates
DML commands (data sublanguage) can be embedded in a
general-purpose programming language (host language), such as
COBOL, C, C++, or Java.
A library of functions can also be provided to access the DBMS
from a programming language
Alternatively, stand-alone DML commands can be
applied directly (called a query language).
Data Manipulation Language (DML)
45
Data manipulation operations usually include the following:
Insertion of new data into the database;
Modification of data stored in the database;
Retrieval of data contained in the database;
Deletion of data from the database.
Types of DML
46
There are two classes of DMLs:
High Level or Non-procedural Language:
Are set-oriented and specify what data to retrieve rather
than how to retrieve it.
Also called declarative languages.
Example the SQL relational language
Low Level or Procedural Language:
A language that allows the user to tell the system what data is needed
and exactly how to retrieve the data.
Retrieve data one record-at-a-time;
Constructs such as looping are needed to retrieve
multiple records, along with positioning pointers.
Example –PL/SQL
DBMS Interfaces
47
DBMS interface- is a user interface which allows for the ability to input
queries to a database without using the query language itself.
Stand-alone query language interfaces
Example: Entering SQL queries at the DBMS interactive SQL
interface (e.g. SQL*Plus in ORACLE)
Programmer interfaces for embedding DML in programming languages
User-friendly interfaces
Menu-based, forms-based, graphics-based, etc.
DBMS Interfaces cont…
48
Menu-based interfaces for Web clients or browsing
Forms-based interfaces
Graphical user interfaces
Natural language interfaces
Speech input and output
Interfaces for parametric users
Interfaces for the DBA
Database System Utilities
49
To perform certain functions such as:
Loading data stored in files into a database. Includes data
conversion tools.
Backing up the database periodically on tape.
Reorganizing database file structures.
Report generation utilities.
Performance monitoring utilities.
Other functions, such as sorting, user monitoring, data compression,
etc.
Other Tools
50
Data dictionary / repository:
Used to store schema descriptions and other
information such as design decisions, application
program descriptions, user information, usage
standards, etc.
Active data dictionary is accessed by DBMS
software and users/DBA.
Passive data dictionary is accessed by users/DBA
only.
Other Tools
51
Application Development Environments and CASE
(computer-aided software engineering) tools:
Examples:
PowerBuilder (Sybase)
JBuilder (Borland)
JDeveloper 10G (Oracle)
Typical DBMS Component Modules
52
Centralized and Client-Server DBMS
Architectures
53
Centralized DBMS:
Combines everything into single system including:-
DBMS software,
hardware,
application programs, and
user interface processing software.
User can still connect through a remote terminal –
however, all processing is done at centralized site.
A Physical Centralized Architecture
54
Basic 2-tier Client-Server Architectures
55
Specialized Servers with Specialized
functions
Print server
File server
DBMS server
Web server
Email server
Clients can access the specialized servers as
needed
Logical two-tier client server
architecture
56
Clients
57
Provide appropriate interfaces through a client software
module to access and utilize the various server resources.
Clients may be diskless machines or PCs or Workstations
with disks with only the client software installed.
Connected to the servers via some form of a network.
(LAN: local area network, wireless network, etc.)
DBMS Server
58
Provides database query and transaction services to the clients
Relational DBMS servers are often called SQL servers, query servers,
or transaction servers
Applications running on clients utilize an Application Program
Interface (API) to access server databases via standard interface Like
ODBC: Open Database Connectivity standard
JDBC: for Java programming access
Client and server must install appropriate client module and server
module software for ODBC or JDBC
Two Tier Client-Server Architecture
59
A client program may connect to several DBMSs, sometimes
called the data sources.
In general, data sources can be files or other non-DBMS
software that manages data.
Other variations of clients are possible: e.g., in some object
DBMSs, more functionality is transferred to clients including
data dictionary functions, optimization and recovery across
multiple servers, etc.
Three Tier Client-Server Architecture
60
Common for Web applications
Intermediate Layer called Application Server or Web Server:
Stores the web connectivity software and the business
logic part of the application used to access the
corresponding data from the database server
Acts like a conduit for sending partially processed
data between the database server and the client.
Three-tier Architecture Can Enhance Security:
Database server only accessible via middle tier
Clients cannot directly access database server
Three-tier client-server architecture
61
Classification of DBMSs
62
I. Based on the data model used
Traditional: Relational, Network, Hierarchical.
Emerging: Object-oriented, Object-relational.
II. Other classifications
Single-user (typically used with personal computers)
vs. multi-user (most DBMSs).
Centralized (uses a single computer with one database)
vs. distributed (uses multiple computers, multiple databases)
Summary
63
Data Models and Their Categories
History of Data Models
Schemas, Instances, and States
Three-Schema Architecture
Data Independence
DBMS Languages and Interfaces
Database System Utilities and Tools
Centralized and Client-Server Architectures
Classification of DBMSs
Questions
Slide
2- 64
1. What is Data Model?
Ans: A collection of conceptual tools for describing data, data relationships data
semantics and constraints.
2. What is DDL (Data Definition Language)?
Ans: A data base schema is specifies by a set of definitions expressed by a
special language called DDL.
3. What is VDL (View Definition Language)?
Ans: It specifies user views and their mappings to the conceptual schema.
4. What is SDL (Storage Definition Language)?
Ans: This language is to specify the internal schema. This language may specify
the mapping between two schemas.