Database System Concepts and Architecture (Unit - I)
Databases and database systems are an essential component of life in modern society: most of us
encounter several activities every day that involve some interaction with a database.
For example, if we go to the bank to deposit or withdraw funds, if we make a hotel or airline reservation,
if we access a computerized library catalog to search for a bibliographic item, or if we purchase
something online—such as a book, toy, or computer—chances are that our activities will involve
someone or some computer program accessing a database.
Even purchasing items at a supermarket often automatically updates the database that holds the
inventory of grocery items.
• A database represents some aspect of the real world, sometimes called the mini-world or the
universe of discourse (UoD). Changes to the
mini-world are reflected in the database.
In traditional file processing, each user defines and implements the files needed for a specific software
application as part of programming the application. In the database approach, a single repository
maintains data that is defined once and then accessed by various users.
In file systems, each application is free to name data elements independently. In contrast, in a database,
the names or labels of data are defined once, and used repeatedly by queries, transactions, and
applications.
The main characteristics of the database approach versus the file-processing approach are the
following:
A fundamental characteristic of the database approach is that the database system contains not only
the database itself but also a complete definition or description of the database structure and
constraints.
This definition is stored in the DBMS catalog, which contains information such as the structure of each
file, the type and storage format of each data item, and various constraints on the data.
The information stored in the catalog is called meta-data, and it describes the structure of the primary
database. The catalog is used by the DBMS software and also by database users who need information
about the database structure.
A general-purpose DBMS software package is not written for a specific database application. Therefore,
it must refer to the catalog to know the structure of the files in a specific database, such as the type
and format of data it will access.
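As an illustration, many relational DBMSs expose part of this catalog through the standard INFORMATION_SCHEMA views, which can be queried with ordinary SQL. The sketch below is hedged: the EMPLOYEE table name is only an assumed example, and the exact catalog views and columns available differ from one product to another.

    -- Hedged sketch: reading meta-data about an assumed EMPLOYEE table from
    -- the catalog. INFORMATION_SCHEMA is standard, but coverage varies by DBMS.
    SELECT column_name,      -- name of each data item (attribute)
           data_type,        -- type and format information
           is_nullable       -- one of the constraints recorded in the catalog
    FROM   information_schema.columns
    WHERE  table_name = 'EMPLOYEE';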
The structure of data files is stored in the DBMS catalog separately from the access programs. We call
this property program-data independence. In some types of database systems, such as object-oriented
and object-relational systems, users can define operations on data as part of the database definitions.
The interface (or signature) of an operation includes the operation name and the data types of its
arguments (or parameters).
The implementation (or method) of the operation is specified separately and can be changed without
affecting the interface.
User application programs can operate on the data by invoking these operations through their names
and arguments, regardless of how the operations are implemented. This may be termed program-
operation independence.
Informally, a data model is a type of data abstraction that is used to provide a conceptual representation of the data. The data model uses logical concepts, such as objects, their properties, and their
interrelationships, that may be easier for most users to understand than computer storage concepts.
Hence, the data model hides storage and implementation details that are not of interest to most
database users.
A database typically has many users, each of whom may require a different perspective or view of the
database. A view may be a subset of the database or it may contain virtual data that is derived from
the database files but is not explicitly stored.
A multiuser DBMS, as its name implies, must allow multiple users to access the database at the same
time. This is essential if data for multiple applications is to be integrated and maintained in a single
database.
The DBMS must include concurrency control software to ensure that several users trying to update the
same data do so in a controlled manner so that the result of the updates is correct.
For example, when several reservation agents try to assign a seat on an airline flight, the DBMS should
ensure that each seat can be accessed by only one agent at a time for assignment to a passenger. These
types of applications are generally called online transaction processing (OLTP) applications.
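To make the seat-assignment example concrete, the hedged sketch below shows one common way an application can ask the DBMS to serialize access to a single seat. The FLIGHT_SEAT table, its columns, and the values are assumptions made for illustration; SELECT ... FOR UPDATE row locking is widely available but not universal, and the real concurrency-control machinery is internal to the DBMS.

    -- Hedged sketch: assigning seat 12A on an assumed FLIGHT_SEAT table so that
    -- only one agent at a time can work on that seat.
    START TRANSACTION;

    -- Lock the seat row; a second agent issuing the same statement waits here.
    SELECT seat_no
    FROM   FLIGHT_SEAT
    WHERE  flight_no = 'AI101' AND seat_no = '12A' AND passenger_id IS NULL
    FOR UPDATE;

    -- If the row came back (the seat is still free), assign it.
    UPDATE FLIGHT_SEAT
    SET    passenger_id = 4711
    WHERE  flight_no = 'AI101' AND seat_no = '12A' AND passenger_id IS NULL;

    COMMIT;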
A fundamental role of multiuser DBMS software is to ensure that concurrent transactions operate
correctly and efficiently. The concept of a transaction has become central to many database
applications.
A transaction is an executing program or process that includes one or more database accesses, such
as reading or updating of database records. Each transaction is supposed to execute a logically correct
database access if executed in its entirety without interference from other transactions. The DBMS
must enforce several transaction properties.
The isolation property ensures that each transaction appears to execute in isolation from other
transactions, even though hundreds of transactions may be executing concurrently.
The atomicity property ensures that either all the database operations in a transaction are executed
or none are.
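The classic funds-transfer example illustrates both properties in SQL. The sketch below is hedged: the ACCOUNT table, its columns, and the account numbers are assumed for illustration, and transaction-control syntax differs slightly between products.

    -- Hedged sketch: transfer 50 from account A-101 to account A-102.
    -- Atomicity: both UPDATEs take effect, or (after a rollback) neither does.
    -- Isolation: other transactions must not see the in-between state.
    START TRANSACTION;

    UPDATE ACCOUNT SET balance = balance - 50 WHERE account_no = 'A-101';
    UPDATE ACCOUNT SET balance = balance + 50 WHERE account_no = 'A-102';

    COMMIT;   -- make both changes permanent together; a failure before this
              -- point causes the DBMS to undo the partial work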
For a small personal database, such as a list of addresses, one person typically defines, constructs,
and manipulates the database, and there is no sharing. However, in large organizations, many people
are involved in the design, use, and maintenance of a large database with hundreds of users.
Here we will identify the people whose jobs involve the day-to-day use of a large database; we call
them the actors on the scene. We will also consider people who may be called workers behind the scene—
those who work to maintain the database system environment but who are not actively interested in
the database contents as part of their daily job.
1. Database Administrators
2. Database Designers
3. End Users
1. Database Administrators
In any organization where many people use the same resources, there is a need for a chief
administrator to oversee and manage these resources. In a database environment, the primary
resource is the database itself, and the secondary resource is the DBMS and related software.
Administering these resources is the responsibility of the database administrator (DBA). The DBA is
responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring
software and hardware resources as needed.
The DBA is accountable for problems such as security breaches and poor system response time. In large
organizations, the DBA is assisted by a staff that carries out these functions.
2. Database Designers
Database designers are responsible for identifying the data to be stored in the database and for
choosing appropriate structures to represent and store this data. These tasks are mostly undertaken
before the database is actually implemented and populated with data.
It is the responsibility of database designers to communicate with all prospective database users in
order to understand their requirements and to create a design that meets these requirements. In many
cases, the designers are on the staff of the DBA and may be assigned other staff responsibilities after
the database design is completed.
Database designers typically interact with each potential group of users and develop views of the
database that meet the data and processing requirements of these groups. Each view is then analyzed
and integrated with the views of other user groups. The final database design must be capable of
supporting the requirements of all user groups.
3. End Users
End users are the people whose jobs require access to the database for querying, updating, and
generating reports; the database primarily exists for their use.
a) Casual end users occasionally access the database, but they may need different information each
time. They use a sophisticated database query language to specify their requests and are typically
middle- or high-level managers or other occasional browsers.
b) Naive or parametric end users make up a sizable portion of database end users. Their main job
function revolves around constantly querying and updating the database, using standard types of
queries and updates—called canned transactions—that have been carefully programmed and tested.
c) Stand-alone users maintain personal databases by using ready-made program packages that provide
easy-to-use menu-based or graphics-based interfaces. An example is the user of a tax package that
stores a variety of personal financial data for tax purposes.
System analysts determine the requirements of end users, especially naive and parametric end users,
and develop specifications for standard canned transactions that meet these requirements.
Application programmers implement these specifications as programs; then they test, debug,
document, and maintain these canned transactions. Such analysts and programmers—commonly
referred to as software developers or software engineers—should be familiar with the full range of
capabilities provided by the DBMS to accomplish their tasks.
In addition to those who design, use, and administer a database, others are associated with the design,
development, and operation of the DBMS software and system environment. These persons are
typically not interested in the database content itself. We call them the workers behind the scene, and
they include the following categories:
1. DBMS system designers and implementers
2. Tool developers
3. Operators and maintenance personnel
1. DBMS system designers and implementers design and implement the DBMS modules and interfaces
as a software package. A DBMS is a very complex software system that consists of many components,
or modules, including modules for implementing the catalog, query language processing, interface
processing, accessing and buffering data, controlling concurrency, and handling data recovery and
security. The DBMS must interface with other system software such as the operating system and
compilers for various programming languages.
2. Tool developers design and implement tools—the software packages that facilitate database
modeling and design, database system design, and improved performance. Tools are optional packages
that are often purchased separately.
They include packages for database design, performance monitoring, natural language or graphical
interfaces, prototyping, simulation, and test data generation. In many cases, independent software
vendors develop and market these tools.
3. Operators and maintenance personnel (system administration personnel) are responsible for the
actual running and maintenance of the hardware and software environment for the database system.
• Controlling Redundancy: Data redundancy refers to the duplication of data (i.e., storing the same data multiple times). In a database system, a centralized database under the centralized control of the DBA avoids unnecessary duplication of data. This also avoids the extra time needed to process the larger volume of data and saves storage space.
• Improved Data Sharing: DBMS allows a user to share the data in any number of application
programs.
• Data Integrity: Integrity means that the data in the database is accurate. Centralized control of the data permits the administrator to define integrity constraints on the data in the database. For example, in a customer database we can enforce the constraint that customers are accepted only from the cities of Noida and Meerut (a hedged SQL sketch of such a constraint appears after this list).
• Security: Having complete authority over the operational data enables the DBA to ensure that the only means of access to the database is through proper channels. The DBA can define authorization checks to be carried out whenever access to sensitive data is attempted.
• Data Consistency: By eliminating data redundancy, we greatly reduce the opportunities for inconsistency. For example, if a customer address is stored only once, we cannot have disagreement among the stored values. Updating data values is also greatly simplified when each value is stored in one place only. Finally, we avoid the wasted storage that results from redundant data storage.
• Efficient Data Access: In a database system, the data is managed by the DBMS, and all access to the data is through the DBMS, which provides the basis for efficient data processing.
• Enforcement of Standards: With centralized control of data, the DBA can establish and enforce data standards, which may include naming conventions, data quality standards, etc.
• Data Independence: In a database system, the database management system provides the interface between the application programs and the data. When changes are made to the data representation, the metadata maintained by the DBMS is changed, but the DBMS continues to provide the data to the application programs in the previously used form. The DBMS handles the task of transforming the data wherever necessary.
• Reduced Application Development and Maintenance Time: A DBMS supports many important functions that are common to many applications accessing data stored in the DBMS, which facilitates quick application development.
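As a concrete illustration of the data integrity point above, the hedged sketch below states the Noida/Meerut rule as a CHECK constraint. The CUSTOMER table and its columns are assumptions made only for this example, and how strictly CHECK constraints are enforced varies by product.

    -- Hedged sketch: declaring the example integrity constraint that customers
    -- are accepted only from Noida or Meerut.
    CREATE TABLE CUSTOMER (
        cust_id   INT          PRIMARY KEY,
        cust_name VARCHAR(50)  NOT NULL,
        city      VARCHAR(30)  NOT NULL,
        CONSTRAINT chk_city CHECK (city IN ('Noida', 'Meerut'))
    );

    -- An insert such as (1, 'Asha', 'Delhi') would be rejected because it
    -- violates chk_city.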
Disadvantages of DBMS
• It is somewhat complex. Since it supports multiple functionalities to give the user the best service, the underlying software has become complex. Designers and developers should have thorough knowledge of the software to get the most out of it.
• Because of its complexity and functionality, it uses a large amount of memory and needs substantial memory to run efficiently.
• A DBMS works as a centralized system, i.e., all the users from all over the world access the same database. Hence any failure of the DBMS will impact all the users.
View of Data
A database system is a collection of inter-related data and a set of programs that allow users to access
and modify these data. A major purpose of a database system is to provide users with an abstract view
of the data. That is, the system hides certain details of
how the data are stored and maintained.
Data Abstraction
Physical level (or Internal View / Schema): The lowest level of abstraction describes how the data are
actually stored. The physical level describes complex low-level data structures in detail.
Logical level (or Conceptual View / Schema): The next-higher level of abstraction describes what data
are stored in the database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex physical-level
structures, the user of the logical level does not need to be aware of this complexity. This is referred to
as physical data independence. Database administrators, who must decide what information to keep
in the database, use the logical level of abstraction.
View level (or External View / Schema): The highest level of abstraction describes only part of the
entire database. Even though the logical level uses simpler structures, complexity remains because of
the variety of information stored in a large database. Many users of the database system do not need
all this information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many views for
the same database.
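As a hedged illustration of the view level, the SQL below defines a virtual table that exposes only part of an assumed INSTRUCTOR base table, hiding the salary column from users who do not need it.

    -- Hedged sketch: an external-level view over an assumed INSTRUCTOR table.
    -- The view stores no data of its own; its contents are derived on demand.
    CREATE VIEW instructor_public AS
    SELECT id, name, dept_name          -- salary is deliberately omitted
    FROM   INSTRUCTOR;

    -- Users at the view level query the view exactly like a table.
    SELECT name FROM instructor_public WHERE dept_name = 'Physics';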
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database. The overall design
of the database is called the database schema.
By analogy with a program written in a programming language, a database schema corresponds to the variable declarations in the program. Each variable has a particular value at a given instant, and the values of the variables in a program at a point in time correspond to an instance of a database schema. Database systems have several schemas, partitioned according to the levels of abstraction.
The physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level.
A database may also have several schemas at the view level, sometimes called sub-schemas, which
describe different views of the database. Of these, the logical schema is by far the most important, in
terms of its effect on application programs, since programmers construct applications by using the
logical schema. The physical schema is hidden beneath the logical schema, and can usually be changed
easily without affecting application programs.
Application programs are said to exhibit physical data independence if they do not depend on the
physical schema, and thus need not be rewritten if the physical schema changes.
Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for describing
data, data relationships, data semantics, and consistency constraints. A data model provides a way to
describe the design of a database at the physical, logical, and view levels.
• Relational Model. The relational model uses a collection of tables to represent both data and
the relationships among those data. Each table has multiple columns, and each column has a
unique name. Tables are also known as relations. The relational model is an example of a
record-based model.
Record-based models are so named because the database is structured in fixed-format records of
several types. Each table contains records of a particular type. Each record type defines a fixed number
of fields, or attributes. The columns of the table correspond to the attributes of the record type. The
relational data model is the most widely used data model, and a vast majority of current database
systems are based on the relational model.
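For instance, a relation can be declared in SQL as a table whose columns are the attributes of the record type. The STUDENT table below is a hedged, assumed example rather than one taken from the text.

    -- Hedged sketch: a relation (table) in the relational model. Each row is a
    -- record of this type; each column is an attribute with a unique name.
    CREATE TABLE STUDENT (
        roll_no   INT          PRIMARY KEY,   -- key attribute
        name      VARCHAR(50)  NOT NULL,
        dept_name VARCHAR(30)
    );

    INSERT INTO STUDENT VALUES (1, 'Ravi', 'CSE');   -- one record (tuple)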
• Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of basic
objects, called entities, and relationships among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity
relationship model is widely used in database design.
• Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has
become the dominant software-development methodology. This led to the development of an
object-oriented data model that can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity. The object-relational data model
combines features of the object-oriented data model and relational data model.
• Semi-structured Data Model. The semi-structured data model permits the specification of data where individual data items of the same type may have different sets of attributes. This is in contrast to the data models mentioned earlier, where every data item of a particular type must have the same set of attributes.
Database Languages
A database system provides a data-definition language to specify the database schema and a data-
manipulation language to express database queries and updates. In practice, the data definition and
data-manipulation languages are not two separate languages; instead they simply form parts of a single
database language, such as the widely used SQL language.
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data as
organized by the appropriate data model. The types of access are:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data
are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a user
does not have to specify how to get the data, the database system has to figure out an efficient means
of accessing data. A query is a statement requesting the retrieval of information. The portion of a DML
that involves information retrieval is called a query language. Although technically incorrect, it is
common practice to use the terms query language and data-manipulation language synonymously.
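For example, the SQL query below is declarative: it states what is wanted (the names of customers in a given city) and leaves it to the DBMS to decide how to retrieve the data. It reuses the assumed CUSTOMER table sketched earlier.

    -- Hedged sketch of a declarative (nonprocedural) DML query: it specifies
    -- the desired result, not the access path used to compute it.
    SELECT cust_name
    FROM   CUSTOMER
    WHERE  city = 'Meerut';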
We specify a database schema by a set of definitions expressed by a special language called a data-
definition language (DDL). The DDL is also used to specify additional properties of the data.
We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called a data storage and definition language. These statements
define the implementation details of the database schemas, which are usually hidden from the users.
The data values stored in the database must satisfy certain consistency constraints.
• Domain Constraints. A domain of possible values must be associated with every attribute (for
example, integer types, character types, date/time types). Declaring an attribute to be of a particular
domain acts as a constraint on the values that it can take. Domain constraints are the most elementary
form of integrity constraint. They are tested easily by the system whenever a new data item is entered
into the database.
• Assertions. An assertion is any condition that the database must always satisfy. Domain constraints
and referential-integrity constraints are special forms of assertions. However, there are many
constraints that we cannot express by using only these special forms. For example, “Every department
must have at least five courses offered every semester” must be expressed as an assertion. When an
assertion is created, the system tests it for validity. If the assertion is valid, then any future modification
to the database is allowed only if it does not cause that assertion to be violated.
• Authorization. We may want to differentiate among the users as far as the type of access they are
permitted on various data values in the database. These differentiations are expressed in terms of
authorization, the most common being: read authorization, which allows reading, but not
modification, of data; insert authorization, which allows insertion of new data, but not modification of
existing data; update authorization, which allows modification, but not deletion, of data; and delete
authorization, which allows deletion of data.
We may assign the user all, none, or a combination of these types of authorization.
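A hedged sketch combining two of the ideas above: a domain constraint expressed through a column type plus a CHECK clause, and selective authorization granted with SQL's GRANT statement. The COURSE table and the user name clerk1 are assumptions made for this illustration.

    -- Hedged sketch: domain constraint via a data type and a CHECK clause.
    CREATE TABLE COURSE (
        course_id VARCHAR(8)  PRIMARY KEY,
        title     VARCHAR(50) NOT NULL,
        credits   INT CHECK (credits BETWEEN 1 AND 6)   -- allowed domain
    );

    -- Hedged sketch: authorization. clerk1 receives read and insert
    -- authorization only; no update or delete authorization is granted.
    GRANT SELECT, INSERT ON COURSE TO clerk1;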
The DDL, just like any other programming language, gets as input some instructions (statements) and
generates some output. The output of the DDL is placed in the data dictionary, which contains
metadata—that is, data about data.
The data dictionary is considered to be a special type of table that can only be accessed and updated
by the database system itself (not a regular user). The database system consults the data dictionary
before reading or modifying actual data.
Data Dictionary
We can define a data dictionary as a DBMS component that stores the definition of data characteristics
and relationships. You may recall that such “data about data” were labeled metadata. The DBMS data
dictionary provides the DBMS with its self-describing characteristic. In effect, the data dictionary resembles an X-ray of the company's entire data set and is a crucial element in the data administration function.
Two main types of data dictionary exist: integrated and stand-alone. An integrated data dictionary is included with the DBMS. For example, all relational DBMSs include a built-in data dictionary or system catalog that is frequently accessed and updated by the RDBMS. Other DBMSs, especially older types, do not have a built-in data dictionary; instead, the DBA may use third-party stand-alone data dictionary systems.
Data dictionaries can also be classified as active or passive. An active data dictionary is automatically
updated by the DBMS with every database access, thereby keeping its access information up-to-date.
A passive data dictionary is not updated automatically and usually requires a batch process to be run.
Data dictionary access information is normally used by the DBMS for query optimization purposes.
Although there is no standard format for the information stored in the data dictionary, several features are common. For example, the data dictionary typically stores descriptions of all:
▪ Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores the names, data types, display formats, internal storage formats, and validation rules. The data dictionary tells where an element is used, by whom it is used, and so on.
▪ Tables defined in all databases. For example, the data dictionary is likely to store the name of the table creator, the date of creation, access authorizations, the number of columns, and so on.
▪ Indexes defined for each database table. For each index, the DBMS stores at least the index name, the attributes used, the location, specific index characteristics, and the creation date.
▪ Defined databases: who created each database, the date of creation, where the database is located, who the DBA is, and so on.
▪ Programs that access the database, including screen formats, report formats, application formats, SQL queries, and so on.
▪ Relationships among data elements: which elements are involved, whether the relationships are mandatory or optional, the connectivity and cardinality, and so on.
A primary goal of a database system is to retrieve information from and store new information in the
database. People who work with a database can be categorized as database users or database
administrators.
There are four different types of database-system users, differentiated by the way they expect to
interact with the system. Different types of user interfaces have been designed for the different types
of users.
Naive users are unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously. For example, a bank teller who needs to transfer $50 from
account A to account B invokes a program called transfer. This program asks the teller for the amount
of money to be transferred, the account from which the money is to be transferred, and the account
to which the money is to be transferred.
Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces. Rapid application development
(RAD) tools are tools that enable an application programmer to construct forms and reports without writing a program.
Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the storage manager understands.
Analysts who submit queries to explore data in the database fall in this category.
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of
data in different ways. For instance, an analyst can see total sales by region (for example, North, South,
East, and West), or by product, or by a combination of region and product (that is, total sales of each
product in each region). The tools also permit the analyst to select specific regions, look at data in more
detail (for example, sales by city within a region) or look at the data in less detail (for example,
aggregate products together by category).
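The region/product summaries described above correspond directly to SQL aggregation. The hedged sketch below assumes a SALES(region, product, city, amount) table; the ROLLUP form is supported by many, but not all, systems and is sometimes spelled differently.

    -- Hedged sketch: OLAP-style summaries over an assumed SALES table.
    -- Total sales by region and product:
    SELECT region, product, SUM(amount) AS total_sales
    FROM   SALES
    GROUP  BY region, product;

    -- Many systems also accept ROLLUP (or WITH ROLLUP), which adds per-region
    -- subtotals and a grand total to the same result:
    SELECT region, product, SUM(amount) AS total_sales
    FROM   SALES
    GROUP  BY ROLLUP (region, product);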
Another class of tools for analysts is data mining tools, which help them find certain kinds of patterns
in data. Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework.
Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system on
which the database system runs. Database systems can be centralized, or client-server, where one
server machine executes work on behalf of multiple client machines. Database systems can also be
designed to exploit parallel computer architectures. Distributed databases span multiple geographically separated machines.
Query Processor
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give the
same result. The DML compiler also performs query optimization, that is, it picks the lowest cost
evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML compiler.
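Most relational systems let users inspect the evaluation plan the optimizer has chosen, commonly through an EXPLAIN statement, although the exact keyword and output format vary by product. The hedged sketch below reuses the assumed CUSTOMER table.

    -- Hedged sketch: asking the DBMS to display the evaluation plan it would
    -- pick for a query instead of executing it.
    EXPLAIN
    SELECT cust_name
    FROM   CUSTOMER
    WHERE  city = 'Noida';
    -- Depending on the indexes and statistics recorded in the catalog, the
    -- plan might be a full table scan or an index lookup.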
Storage Manager
A storage manager is a program module that provides the interface between the low-level data stored
in the database and the application programs and queries submitted to the system. The storage
manager is responsible for the interaction with the file manager. The raw data are stored on the disk
using the file system, which is usually provided by a conventional operating system. The storage
manager translates the various DML statements into low-level file-system commands. Thus, the storage
manager is responsible for storing, retrieving, and updating data in the database.
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding what data to cache in main memory. The buffer manager is a critical part of the database system, since it enables the database to handle data sizes that are much larger than the size of main memory.
E-R Model
The E-R (Entity-Relationship) model is a graphical technique for understanding and organizing the data independent of the actual database implementation.
Entity
An entity is a thing or object in the real world that is distinguishable from other objects and about which data is to be stored. Example for entity: Employee.
Entity instance
Entity instance is a particular member of the entity type. Example for entity instance: A particular
employee.
Regular Entity
An entity which has its own key attribute is a regular entity. Example for regular entity: Employee.
Weak entity
An entity which depends on another entity for its existence and doesn't have any key attribute of its own is a weak entity.
Example for a weak entity: In a parent/child relationship, a parent is considered as a strong entity and
the child is a weak entity.
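When a weak entity is mapped to tables, its key is typically formed by combining the owner entity's key with its own partial key. The hedged sketch below uses assumed EMPLOYEE and DEPENDENT tables to show this parent/child pattern.

    -- Hedged sketch: a strong (owner) entity and a weak entity as tables.
    CREATE TABLE EMPLOYEE (
        emp_id INT PRIMARY KEY,            -- key attribute of the strong entity
        name   VARCHAR(50)
    );

    CREATE TABLE DEPENDENT (
        emp_id     INT,                    -- key borrowed from the owner
        dep_name   VARCHAR(50),            -- partial key of the weak entity
        birth_date DATE,
        PRIMARY KEY (emp_id, dep_name),    -- no key of its own alone
        FOREIGN KEY (emp_id) REFERENCES EMPLOYEE (emp_id)
            ON DELETE CASCADE              -- existence depends on the owner
    );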
Attributes
Attributes are the properties that describe an entity. Example for attributes: the name, age, and salary of an employee.
Domain of Attributes
The set of possible values that an attribute can take is called the domain of the attribute. For example,
the attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence this set can be
termed as the domain of the attribute day.
Key attribute
An attribute (or combination of attributes) whose value uniquely identifies each entity instance is a key attribute. Example for key attribute: the employee ID of an employee.
Simple attribute
An attribute that cannot be divided into smaller component parts is a simple attribute. Example for simple attribute: the age of an employee.
Composite attribute
An attribute that can be divided into smaller component attributes is a composite attribute. Example for composite attribute: a name composed of a first name and a last name.
Single-valued Attributes
If an attribute can take only a single value for each entity instance, it is a single-valued attribute. Example for single-valued attribute: age of a student. It can take only one value for a particular student.
Multi-valued Attributes
If an attribute can take more than one value for each entity instance, it is a multi-valued attribute. Example for multi-valued attribute: the telephone number of an employee; a particular employee may have multiple telephone numbers.
Stored Attribute
An attribute whose value needs to be stored permanently is a stored attribute. Example for stored attribute: the name of a student.
Derived Attribute
An attribute whose value can be derived (calculated) from other stored attributes is a derived attribute. Example for derived attribute: the age of an employee, which can be calculated from the date of birth and the current date.
Relationships
A relationship is an association between two or more entities. In ER modeling, to connect a weak entity with other entities, a weak (identifying) relationship notation is used.
Degree of a Relationship
Degree of a relationship is the number of entity types involved. The n-ary relationship is the general
form for degree n. Special cases are unary, binary, and ternary, where the degree is 1, 2, and 3,
respectively.
Example for ternary relationship: a customer purchases an item from a shopkeeper.
Relationship cardinalities specify how many of each entity type are allowed. Relationships can have four possible connectivities, as given below. The minimum and maximum values of this connectivity are called the cardinality of the relationship.
1. One to One:
An entity in A is associated with at most one entity in B, and an entity in B is associated with at most
one entity in A.
2. One to Many:
An entity in A is associated with any number of entities in B. An entity in B is associated with at the
most one entity in A.
3. Many to One:
An entity in A is associated with at most one entity in B. An entity in B is associated with any number of entities in A.
4. Many to Many:
Entities in A and B are associated with any number of entities from each other.
Students enroll for courses. One student can enroll for many courses, and one course can be enrolled in by many students. Hence it is an M:N relationship and the cardinality is Many-to-Many (M:N).
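In a relational implementation, an M:N relationship such as the enrollment above is usually represented by a separate relationship table whose key combines the keys of both participating entities. The hedged sketch below assumes STUDENT(roll_no) and COURSE(course_id) tables like those sketched earlier.

    -- Hedged sketch: representing the M:N 'enrolls' relationship with a
    -- separate table that references both participating entity tables.
    CREATE TABLE ENROLLS (
        roll_no   INT,
        course_id VARCHAR(8),
        PRIMARY KEY (roll_no, course_id),                     -- allows M:N
        FOREIGN KEY (roll_no)   REFERENCES STUDENT (roll_no),
        FOREIGN KEY (course_id) REFERENCES COURSE (course_id)
    );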
Recursive Relationships:
When the same entity type participates more than once in a relationship type in different roles, the
relationship types are called recursive relationships.
Participation Constraints:
The participation constraints specify whether the existence of any entity depends on its being related
to another entity via the relationship. There are two types of participation constraints.
a) Total: When all the entities from an entity set participate in a relationship type, it is called total participation. For example, the participation of the entity set student in the relationship set 'opts' is said to be total, because every student enrolled must opt for a course.
b) Partial: When it is not necessary for all the entities from an entity set to participate in a relationship
type, it is called partial participation. For example, the participation of the entity set student in
‘represents’ is partial, since not every student in a class is a class representative.
Advantages
Disadvantages
1. A physical design derived from an E-R model may contain some ambiguities or inconsistencies.