Database System Concepts and Architecture
Data Models, Schemas, and Instances
One fundamental characteristic of the database approach is that it provides some
level of data abstraction by hiding details of data storage that are irrelevant to
database users.
A data model is a collection of concepts that can be used to describe the
conceptual/logical structure of a database.
The structure of a database means that holds the data’s data types, relationships,
and constraints.
According to C.J. Date (one of the leading database experts), a data model is an
abstract, self-contained, logical definition of the objects, operators, and so forth,
that together constitute the abstract machine with which users interact. The
objects allow us to model the structure of data; the operators allow us to model its
behavior.
Types of Data Models
1. High Level- Conceptual data model.
2. Low Level – Physical data model.
3. Relational or Representational
4. Object-oriented Data Models:
5. Object-Relational Models:
1. High Level-conceptual data model: User level data model is the high level or
conceptual model. This provides concepts that are close to the way that many
users perceive data.
2 .Low level-Physical data model: provides concepts that describe the details of
how data is stored in the computer model. Low level data model is only for
Computer specialists not for end-user.
3. Representation data model: It is between High level & Low level data model
Which provides concepts that may be understood by end-user but that are not too
far removed from the way data is organized by within the computer.
The most common data models are
1. Relational Model
The Relational Model uses a collection of tables both data and the relationship
among those data. Each table has multiple columns and each column has a unique
name.
Relational database comprising of two tables.
Advantages
1. The main advantage of this model is its ability to represent data in a simplified
format.
2. The process of manipulating record is simplified with the use of certain key
attributes used to retrieve data.
3. Representation of different types of relationship is possible with this model.
2. Network Model
The data in the network model are represented by collection of records and
relationships among data are represented by links, which can be viewed as
pointers.
The records in the database are organized as collection of arbitrary groups.
Advantages:
1. Representation of relationship between entities is implemented using pointers
which allows the representation of arbitrary relationship
2. Unlike the hierarchical model it is easy.
3. Data manipulation can be done easily with this model.
3. Hierarchical Model
A hierarchical data model is a data model which the data is organized into a tree
like structure. The structure allows repeating information using parent/child
relationships: each parent can have many children but each child has one parent.
All attributes of a specific record are listed under an entity type.
Advantages:
1. The representation of records is done using an ordered tree, which is
natural method of implementation of one–to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
Allows the use of virtual records. This result in a stable database especially when
modification of the data base is made
Data model Schema and Instance
o The data which is stored in the database at a particular moment of time is
called an instance of the database.
o The overall design of a database is called schema.
o A database schema is the skeleton structure of the database. It represents
the logical view of the entire database.
o A schema contains schema objects like table, foreign key, primary key,
views, columns, data types, stored procedure, etc.
o A database schema can be represented by using the visual diagram. That
diagram shows the database objects and relationship with each other.
o A database schema is designed by the database designers to help
programmers whose software will interact with the database. The process
of database creation is called data modeling.
A schema diagram can display only some aspects of a schema like the name of
record type, data type, and constraints. Other aspects can't be specified through
the schema diagram. For example, the given figure neither show the data type
of each data item nor the relationship among various files.
In the database, actual data changes quite frequently. For example, in the given
figure, the database changes whenever we add a new grade or add a student.
The data at a particular moment of time is called the instance of the database.
DBMS Architecture ( three Schema Architecture)
A commonly used view of data approach is the three-level architecture
suggested by the ANSI/SPARC (American National Standards
Institute/Standards Planning and Requirements Committee). ANSI/SPARC
proposed an architectural framework for databases.
The three levels of the architecture are three different views of the data:
External Schema - individual user view
Conceptual Schema- Logical or community user view Physical Schema -
Internal or storage view
The three level database architecture allows a clear separation of the information
meaning (conceptual view) from the external data representation and from the
physical data structure layout. A database system that is able to separate the
three different views of data is likely to be flexible and adaptable.
The External Schema is the view that the individual user of the database has. This
view is often a restricted view of the database and the same database may
provide a number of different views for different classes of users.
The Conceptual schema (sometimes called the logical schema) describes the
stored data in terms of the data model of the DBMS. In a relational DBMS, the
conceptual schema describes all relations that are stored in the database.
It hides physical storage details, concentrating upon describing entities, data
types, relationships, user operations, and constraints.
The physical schema specifies additional storage details. Essentially, the physical
schema summarizes how the relations described in the conceptual schema are
actually stored on secondary storage devices such as disks and tapes.
It tells us what data is stored in the database and how.
Mappings
Process of transforming request and results between three level it's called mapping.
There are the two types of mappings:
1. Conceptual/Internal Mapping
2. External/Conceptual Mapping
1. Conceptual/Internal Mapping:
The conceptual/internal mapping defines the correspondence between the
conceptual view and the store database.
It specifies how conceptual record and fields are represented at the internal level.
It relates conceptual schema with internal schema.
If structure of the store database is changed.
If changed is made to the storage structure definition-then the conceptual/internal
mapping must be changed accordingly, so that the conceptual schema can remain
invariant.
There could be one mapping between conceptual and internal levels.
2. External/Conceptual Mapping:
The external/conceptual mapping defines the correspondence between a particular
external view and conceptual view.
It relates each external schema with conceptual schema.
The differences that can exist between these two levels are analogous to those that
can exist between the conceptual view and the stored database.
Example: fields can have different data types; fields and record name can be
changed; several conceptual fields can be combined into a single external field.
Any number of external views can exist at the same time; any number of users can
share a given external view: different external views can overlap.
There could be several mapping between external and conceptual levels.
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers characteristic of being able to modify the schema at one
level of the database system without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
o Logical data independence refers characteristic of being able to change the
conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the
conceptual view.
o If we do any changes in the conceptual view of the data, then the user view of the
data would not be affected.
o Logical data independence occurs at the user interface level.
2. Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal
schema without having to change the conceptual schema.
o If we do any changes in the storage size of the database system server, then the
Conceptual structure of the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal
levels.
o Physical data independence occurs at the logical interface level.
Database languages and interfaces
A database system provides a data definition language to specify the database
schema and a data manipulation language to express database queries and
updates.
In practice, the data definition and data manipulation languages are not two
separate languages; instead they simply form parts of a single database language,
such as the widely used SQL language.
Data Definition Language
Data Definition Language (DDL) statements are used to define the database
structure or schema. Some examples:
o CREATE - to create objects in the database
o ALTER - alters the structure of the database
o DROP - delete objects from the database
o TRUNCATE - remove all records from a table, including all spaces allocated
for the records are removed
For instance, the following statement in the SQL language is used to create the
account table:
create table account
(accountnumber number(10),
balance number(8));
The storage definition language (SDL), is used to specify the internal schema. The
mappings between the two schemas may be specified in either one of these
languages.
The view definition language (VDL), to specify user views and their mappings to
the conceptual schema, but in most DBMSs the DDL is used to define both
conceptual and external schemas.
In addition, it updates a special set of tables called the data dictionary or data
directory.
A data dictionary contains metadata—that is, data about data. The schema of a
table is an example of metadata. A database system consults the data dictionary
before reading or modifying actual data.
Data Manipulation Language
Data manipulation is
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
Data Manipulation Language (DML) statements are used for managing data within
schema objects. Some examples:
o SELECT - retrieve data from the a database
o INSERT - insert data into a table
o UPDATE - updates existing data within a table
o DELETE - deletes all records from a table
A data-manipulation language (DML) is a language that enables users to access or
manipulate data as organized by the appropriate data model.
There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how
to get those data, The DML component of the PL/SQL language is procedural.
• Nonprocedural DMLs require a user to specify what data are needed
without specifying how to get those data. The DML component of the SQL
language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a
DML that involves information retrieval is called a query language.
The query in the SQL language finds the name of the customer whose customer-id
is 192:
select customername
from customer
where customerid = 192;
The query specifies that those rows from the table customer where the customerid
is 192 must be retrieved.
DBMS Interfaces
User-friendly interfaces provided by a DBMS may include the following.
Menu Based Interfaces for Web Clients or Browsing. These interfaces present the
user with lists of options, called menus, that lead the user through the formulation
of a request. Menus do away with the need to memorize the specific commands
and syntax of a query language; rather, the query is composed step by step by
picking options from a menu that is displayed by the system. Pull-down menus are
a very popular technique in Web-based user interfaces. They are also often used in
browsing interfaces, which allow a user to look through the contents of a database
in an exploratory and unstructured manner.
Forms Based Interfaces. A forms-based interface displays a form to each user.
Users can fill out all of the form entries to insert new data, or they fill out only
certain entries, in which case the DBMS will retrieve matching data for the
remaining entries. Forms are usually designed and programmed for naive users as
interfaces to canned transactions.
Graphical User Interfaces. A graphical interface (GUI) typically displays a schema
to the user in diagrammatic form. The user can then specify a query by
manipulating the diagram. In many cases, GUIs utilize both menus and forms.
Most GUIs use a pointing device, such as a mouse, to pick certain parts of the
displayed schema diagram.
Natural Language Interfaces. These interfaces accept requests written in English
or some other language and attempt to "understand" them. A natural language
interface usually has its own "schema," which is similar to the database conceptual
schema, as well as a dictionary of important words. The natural language interface
refers to the words in its schema, as well as to the set of standard words in its
dictionary, to interpret the request.
Interfaces for Parametric Users. Parametric users, such as bank tellers, often
have a small set of operations that they must perform repeatedly. Systems
analysts and programmers design implement a special interface for naive users.
Usually, a small set of abbreviated commands is included, with the goal of
minimizing the number of keystrokes required for each request.
Interfaces for the DBA. Most database systems contain privileged commands that
can be used only by the DBA's staff. These include commands for creating
accounts, setting system parameters, granting account authorization, changing a
schema, and reorganizing the storage structures of a database.
The Database System Environment
A DBMS is a complex software system. The database and the DBMS catalog are
usually stored on disk. Access to the disk is controlled primarily by the operating
system (OS), which schedules disk read/write. Many DBMSs have their own
buffer management module to schedule disk read/write, because this has a
considerable effect on performance. Reducing disk read/write improves
performance considerably. A higher-level stored data manager module of the
DBMS controls access to DBMS information that is stored on disk, whether it is
part of the database or the catalog.
the top part of Figure, It shows interfaces for the DBA staff, casual users who work
with interactive interfaces to formulate queries, application programmers who
create programs using some host programming languages, and parametric users
who do data entry work by supplying parameters to predefined transactions. The
DBA staff works on defining the database and tuning it by making changes to its
definition using the DDL and other privileged commands.
The DDL compiler processes schema definitions, specified in the DDL, and stores
descriptions of the schemas (meta-data) in the DBMS catalog. The catalog
includes information such as the names and sizes of files, names and data types of
data items, storage details of each file, mapping information among schemas, and
constraints.
Casual users and persons with occasional need for information from the database
interact using some form of interface, which we call the interactive query
interface. These queries are parsed and validated for correctness of the query
syntax, the names of files and a query compiler that compiles them into an
internal form. This internal query is subjected to query optimization, the query
optimizer is concerned with the rearrangement and possible reordering of
operations, elimination of redundancies, and use of correct algorithms and indexes
during execution.
The precompiler extracts DML commands from an application program written in
a host programming language. These commands are sent to the DML compiler for
compilation into object code for database access. The rest of the program is sent
to the host language compiler. The object codes for the DML commands and the
rest of the program are linked, forming a canned transaction whose executable
code includes calls to the runtime database processor.
It is now common to have the client program that accesses the DBMS running on
a separate computer from the computer on which the database resides. The
former is called the client computer running a DBMS client software and the
latter is called the database server. In some cases, the client accesses a middle
computer, called the application server, which in turn accesses the database
server.
Database System Utilities
In addition to possessing the software modules just described, most DBMSs have
database utilities that help the DBA manage the database system. Common
utilities have the following types of functions:
Loading. A loading utility is used to load existing data files—such as text
files or sequential files—into the database. Usually, the current (source) format of
the data file and the desired (target) database file structure are specified
to the utility, which then automatically reformats the data and stores it
in the database.
Backup. A backup utility creates a backup copy of the database, usually by
dumping the entire database onto tape or other mass storage medium. The
backup copy can be used to restore the database in case of disk failure.
Database storage reorganization. This utility can be used to reorganize a set
of database files into different file organizations, and create new access paths
to improve performance.
Performance monitoring. Such a utility monitors database usage and provides
statistics to the DBA. The DBA uses the statistics in making decisions
such as whether or not to reorganize files or whether to add or drop indexes
to improve performance.
Other utilities may be available for sorting files, handling data compression,
monitoring access by users, interfacing with the network, and performing other
functions.
Centralized and Client/Server Architectures for DBMSs
Centralized DBMSs Architecture
Architectures for DBMSs have followed trends similar to those for general
computer system architectures. Earlier architectures used mainframe computers to
provide the main processing for all system functions, including user application
programs and user interface programs, as well as all the DBMS functionality. The
reason was that most users accessed such systems via computer terminals that
did not have processing power and only provided display capabilities. Therefore, all
processing was performed remotely on the computer system, and only display
information and controls were sent from the computer to the display terminals,
which were connected to the central computer via various types of
communications networks.
As prices of hardware declined, most users replaced their terminals with PCs and
workstations. At first, database systems used these computers similarly to how
they had used display terminals, so that the DBMS itself was still a centralized
DBMS in which all the DBMS functionality,
application program execution, and
user interface processing were carried out
on one machine.
The Figure illustrates the physical
components in a centralized
architecture. Gradually,
DBMS systems started to exploit
the available processing power at
the user side, which led to
client/server DBMS architectures.
Client/Server Architectures for DBMSs
Database architecture essentially describes the location of all the pieces of
information that make up the database application.
Database architecture is logically divided into two types.
a) Logical two-tier Client / Server architecture
b) Logical three-tier Client / Server architecture
Two-tier Client / Server Architecture
Two-tier Client / Server architecture is used for User Interface program and
Application Programs that runs on client side. An interface called ODBC(Open
Database Connectivity) provides an API that allow client side program to call the
dbms. Most DBMS vendors provide ODBC drivers. A client program may connect to
several DBMS's. In this architecture some variation of client is also possible for
example in some DBMS's more functionality is transferred to the client including
data dictionary, optimization etc. Such clients are called Data server.
Three-tier Client / Server Architecture
Three-tier Client / Server database architecture is commonly used architecture for
web applications. Intermediate layer called Application server or Web Server
stores the web connectivity software and the business logic(constraints) part of
application used to access the right amount of data from the database server. This
layer acts like medium for sending partially processed data between the database
server and the client.
Differentiate between centralized and distributed data base
Centralized Distributed
Database is maintained at one site Database is maintained at a number of
different sites
If centralized system fails, entire If one system fails, system continues
system is halted.
work with other sites
Less reliable More reliable
Classification of Database Management Systems
Several criteria are normally used to classify DBMSs.
The first is the data model on which the DBMS is based. The
main data model used in many current commercial DBMSs is
the relational data model. The object data model has been
implemented in some commercial systems but has not had
widespread use. Many legacy applications still run on
database systems based on the hierarchical and network data
models.
The second criterion used to classify DBMSs is the number of
users supported by the system. Single-user systems support
only one user at a time and are mostly used with PCs. Multiuser
systems, which include the majority of DBMSs, support
concurrent multiple users.
The third criterion is the number of sites over which the database
is distributed. A DBMS is centralized if the data is stored at a
single computer site. A centralized DBMS can support multiple
users, but the DBMS and the database reside totally at a single
computer site. A distributed DBMS (DDBMS) can have the actual
database and DBMS software distributed over many sites,
connected by a computer network. Homogeneous DDBMSs use
the same DBMS software at all the sites, whereas heterogeneous
DDBMSs can use different DBMS software at each site.
The fourth criterion is cost. It is difficult to propose a classification
of DBMSs based on cost. Today we have open source (free) DBMS
products like MySQL and PostgreSQL that are supported by third-
party vendors with additional services. The main RDBMS products
are available as free examination 30-day copy versions as well as
personal versions,
We can also classify a DBMS on the basis of the types of access
path options for storing files. One well-known family of DBMSs is
based on inverted file structures. Finally, a DBMS can be general
purpose or special purpose. When performance is a primary
consideration, a special-purpose DBMS can be designed and built for
a specific application; such a system cannot be used for other
applications without major changes. Many airline reservations and
telephone directory systems developed in the past are special-
purpose DBMSs. These fall into the category of online transaction
processing (OLTP) systems, which must support a large number of
concurrent transactions without imposing excessive delays