1 Introduction to Databases
DATABASE
A database is a collection of related data or operational data extracted from any firm or
organization. For example, consider the names, telephone number, and address of people you
know. You may have recorded this data in an indexed address book, or you may have stored it on
a diskette, using a personal computer and software such as Microsoft Access of MS Office or
ORACLE, SQL SERVER etc.
The common use of the term database is usually more restricted.
A database has the following implicit properties:
• A database represents some aspect of the real world, sometimes called the miniworld or the
Universe of Discourse (U.D.). Changes to the miniworld are reflected in the database.
• A database is a logically coherent collection of data with some inherent meaning. A random
assortment of data cannot correctly be referred to as a database.
• A database is designed, built and populated with data for a specific purpose. It has an intended
group of users and some preconceived applications in which these users are interested.
In other words, a database has some source from which data is derived, some degree of interaction
with events and an audience that is actively interested in the contents of the database. A database
can be of any size and of varying complexity. For example, the list of names and addresses
referred to earlier may consist of only a few hundred records, each with a simple structure. On the
other hand, the card catalog of a large library may contain half a million cards stored under
different categories – by primary author’s last name, by subject, by book titles – with each
category organized in alphabetic order.
Here are several examples of databases.
1. Manufacturing company
2. Bank
3. Hospital
4. University
5. Government department
In general, it is a collection of files (tables)
Entity: A person, place, thing or event about which information must be kept.
Attribute: Pieces of information describing a particular entity. These are mainly the
characteristics about the individual entity. Individual attributes help to identify and distinguish
one entity from another.
2 Introduction to Databases
Student (Database Name)
Entity Attributes
Personnel Name, Age, Address, Father’s Name
Academic Name, Roll No., Course, Depts. Name
Hierarchy of Database
Bit 0,1
Byte 10101011 (8-bits)
Field (Attribute name like name, Age, Address)
Record (One or more rows in a table)
File (Table or collection of all files)
Database (Collection of files or tables)
e.g.
Student (Database Name)
Field name or attribute name
Personal (Table Name) Academic (Table Name)
Name Father Name Age Name
ROLL COURSE Dept.
NO Name
John Albert 24 RECORD
John 12 MSC Computer
Ramesh Suresh 18 Ramesh 15 BCA Computer
Why Database?
3 Introduction to Databases
Handling of a small shop’s database can be done normally but if you have a large database and
multiple users then in that case you have to maintain computerized database. The advantages of a
database system over traditional, paper-based methods of record-keeping tag will perhaps be more
readily apparent in these examples. Here are some of them.
• Compactness: No need for possibly voluminous paper files.
• Speed: The machine can retrieve and change data faster than a human can..
• Accuracy: Accurate, up-to-date information is available on demand at any time.
Benefits of the Database Approach
There are following benefits of the Database Approach:
• Redundancy and duplication can be reduced. In the database approach, the
views of different user groups are integrated during database design. For
consistency, we should have a database design that stores each logical data item –
such as student’s name or birth date – in only one place in the database. This does
not permit inconsistency, and it saves time. However, in some cases, controlled
redundancy may be useful for improving the performance of queries.
• Inconsistency can be avoided (to some extent). Employee E4 works in
department D5 – is represented by two distinct entries in the stored database.
Suppose also that the DBMS is not aware of this duplication (i.e. redundancy is not
controlled). Then there will necessarily be an occasion on which the two entries
will not agree, i.e., when one of the two has been updated and the other has not. At
such times the database is said to be inconsistent.
• The data can be shared. Same database can be used by variety of users, for their
different objectives, simultaneously.
• Security restrictions can be applied. It is likely that some users is often will not
be authorized to access all information in the database. For example, financial data
is often considered confidential, and hence only authorized persons are allowed to
access such data. In addition, some users may be permitted only to retrieve data,
whereas others are allowed both to retrieve and to-update.
• Integrity can be maintained. The problem of integrity is the problem of ensuring
that the data in the database in accurate it means if the data type of any field is
number then we cannot insert any string text here.
4 Introduction to Databases
CODD RULES
Rule 1 : The information Rule.
"All information in a relational data base is represented explicitly at the logical level and in
exactly one way - by values in tables."
Everything within the database exists in tables and is accessed via table access routines.
Rule 2 : Guaranteed access Rule.
"Each and every datum (atomic value) in a relational data base is guaranteed to be logically
accessible by resorting to a combination of table name, primary key value and column name."
To access any data-item you specify which column within which table it exists, there is no
reading of characters 10 to 20 of a 255 byte string.
Rule 3 : Systematic treatment of null values.
"Null values (distinct from the empty character string or a string of blank characters and distinct
from zero or any other number) are supported in fully relational DBMS for representing missing
information and inapplicable information in a systematic way, independent of data type."
If data does not exist or does not apply then a value of NULL is applied, this is understood by the
RDBMS as meaning non-applicable data.
Rule 4 : Dynamic on-line catalog based on the relational model.
"The data base description is represented at the logical level in the same way as-ordinary data, so
that authorized users can apply the same relational language to its interrogation as they apply to
the regular data."
The Data Dictionary is held within the RDBMS, thus there is no-need for off-line volumes to tell
you the structure of the database.
Rule 5 : Comprehensive data sub-language Rule.
"A relational system may support several languages and various modes of terminal use (for
example, the fill-in-the-blanks mode). However, there must be at least one language whose
statements are expressible, per some well-defined syntax, as character strings and that is
comprehensive in supporting all the following items
• Data Definition
• View Definition
• Data Manipulation (Interactive and by program).
• Integrity Constraints
• Authorization.
Every RDBMS should provide a language to allow the user to query the contents of the RDBMS
and also manipulate the contents of the RDBMS.
Rule 6 : .View updating Rule
"All views that are theoretically updatable are also updatable by the system."
Not only can the user modify data, but so can the RDBMS when the user is not logged-in.
5 Introduction to Databases
Rule 7 : High-level insert, update and delete.
"The capability of handling a base relation or a derived relation as a single operand applies not
only to the retrieval of data but also to the insertion, update and deletion of data."
The user should be able to modify several tables by modifying the view to which they act as base
tables.
Rule 8 : Physical data independence.
"Application programs and terminal activities remain logically unimpaired whenever any
changes are made in either storage representations or access methods."
The user should not be aware of where or upon which media data-files are stored
Rule 9 : Logical data independence.
"Application programs and terminal activities remain logically unimpaired when information-
preserving changes of any kind that theoretically permit un-impairment are made to the base
tables."
User programs and the user should not be aware of any changes to the structure of the tables
(such as the addition of extra columns).
Rule 10 : Integrity independence.
"Integrity constraints specific to a particular relational data base must be definable in the
relational data sub-language and storable in the catalog, not in the application programs."
If a column only accepts certain values, then it is the RDBMS which enforces these constraints
and not the user program, this means that an invalid value can never be entered into this column,
whilst if the constraints were enforced via programs there is always a chance that a buggy
program might allow incorrect values into the system.
Rule 11 : Distribution independence.
"A relational DBMS has distribution independence."
The RDBMS may spread across more than one system and across several networks, however to
the end-user the tables should appear no different to those that are local.
Rule 12 : Non-subversion Rule.
"If a relational system has a low-level (single-record-at-a-time) language, that low level cannot
be used to subvert or bypass the integrity Rules and constraints expressed in the higher level
relational language (multiple-records-at-a-time)."
0. Foundation Rule
Interestingly Codd defined a Rule 0 for relational database systems.
"For any system that is advertised as, or claimed to be, a relational database management system,
that system must be able to manage databases entirely through its relational capabilities, no
matter what additional capabilities the system may support." (Codd, 1990)
That means, no matter what additional features a relational database might support, in order to be
truly called relational it must comply with the 12 rules.
6 Introduction to Databases
DATA MODELS
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. The various data
models that have been proposed fall into three different groups: object-based logical models,
record-based logical models, and physical models.
Object-Based Logical Models
Object-based logical models are used in describing data at the logical and view levels. They are
characterized by the fact that they provide fairly flexible structuring capabilities and allow data
constraints to be specified explicitly. There are many different models, and more are likely to
come. Several of the more widely known ones are
• The entity-relationship model
• The object-oriented model
The Entity-Relationship Model
The entity-relationship (E-R) data model is based on a perception of a real world that consists of a
collection of basic objects, called entities, and of relationships
Social-Security Customer-street Account-number Balance
Customer-name Customer-city
D
ep
Customer os Account
it or
Figure 1 A sample E-R diagram.
among these objects. An entity is a “thing” or “object” in the real world that is distinguishable
from other objects. For example, each person is an entity, and bank accounts can be considered to
be entities. Entities are described in a database by a set of attributes. For example, the attributes
account-number and balance describe one particular account in a bank. A relationship is an
association among several entities. For example, a Depositor relationship associates a customer
with each account that she has. The set of all entities of the same type, and the set and relationship
of the same type, are termed an entity set and relationship set, respectively.
In addition to entities and relationships, the E-R model represents certain constraints to which the
contents of a database must conform. One important constraint is mapping cardinalities, which
express the number of entities to which another entity can be associated via a relationship set.
The overall logical structure of a database can be expressed graphically by an E-R diagram, which
is built up from the following components:
• Rectangles, which represent entity sets
7 Introduction to Databases
• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships
Each component is labeled with the entity or relationship that it represents.
The Object-Oriented Model
Like the E-R model, the object-oriented model is based on a collection of objects. An object
contains values stored in instance variables within the object. An object also contains bodies of
code that operate on the object. These bodies of code are called methods.
Objects that contain the same types of values and the same methods are grouped together into
classes. A class may be viewed as a type definition for objects. This combination of data and
methods comprising a type definition is similar to a programming-language abstract data type.
The only way in which one object can access the data of another object is by invoking a method of
that other object. This action is called sending a message to the object. Thus, the call interface of
the methods of an object defines that object’s externally visible part. The internal part of the
object-the instance variables and method code-are not visible externally. The result is two levels of
data abstraction.
To illustrate the concept, let us consider an object representing a bank account. Such an object
contains instance variables account-number and balance. It contains a method pay-interest, which
adds interest to the balance. Assume that the bank had been paying 6 percent interest on all
accounts, but now is changing its policy to pay 5 percent if the balance is less than $1000 or 6
percent if the balance is $1000 or greater. Under most data models, making this adjustment would
involve changing code in one or more application programs. Under the object-oriented model, the
only change is made within the pay-interest method. The external interface to the objects remains
unchanged.
bank
person branch
employee customer
teller secretary
Figure 2: Class hierarchy
Unlike entities in the E-R model, each object has its own unique identity, independent of the
values that it contains. Thus, two objects containing the same values are nevertheless distinct. The
distinction among individual objects is maintained in the physical level through the assignment of
distinct object identifiers.
8 Introduction to Databases
Record-Based Logical Models
Record-based logical models are used in describing data at the logical and view levels. In contrast
to object-based data models, they are used both to specify the overall logical structure of the
database and to provide a higher-level description of the implementation.
Record-based models are so named because the database structured in fixed-format records of
several types. Each record type defines a fixed number of fields, or attributes, and each field is
usually of a fixed length. The use of fixed-length records simplifies the physical-level
implementation of the database. This simplicity is in contrast to many of the object-based models,
whose richer structure often leads to variable-length records at the physical level.
The three most widely accepted record-based data models are the relational, network, and
hierarchical models. The relational model, which has gained favor over the other two in recent
years. The network and hierarchical models are still used in a large number of older databases.
Here, we present a brief overview of each model.
Relational Model
The relational model uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name. Figure 3
presents a sample relational database comprising of two tables: one shows bank customers, and the
other shows the accounts that belong to those customers. It shows, for example, that Customer
Johnson, with
Customer-name Social-security Customer-street Customer-city Account-
number
Johnson 192-83-7465 Alma Palo Alto A-101
Smith 019-28-3746 North Rye A-215
Hayes 677-89-9011 Main Harrison A-102
Turner 182-73-6091 Putnam Stamford A-305
Johnson 192-83-7465 Alma Palo alto A-201
Jones 321-12-3123 Main Harrison A-217
Lindsay 336-66-9999 Park Pittsfield A-222
Smith 019-28-3746 North Rye A-201
Account- Balance
number
A-101 500
A-215 700
A-102 400
A-305 350
A-201 900
A-217 750
A-222 700
Figure 3. A sample relational database
9 Introduction to Databases
Social-security number 192-83-7465, lives on Main in Harrison, and has two accounts: A-101,
with a balance of $500, and A-201, with a balance of $900. Note that customer Johnson and Smith
share account number A-201 (they may share a business venture)
Network Model
Data in the network model are represented by collections of records (in the Pascal sense), and
relationships among data are represented by links, which can be viewed as pointers. The records in
the database are organized as collections of arbitrary graphs. Figure 4 presents a sample network
database using the same information as in figure 3.
Figure 4. Network Model
Hierarchical Model
The hierarchical model is similar to the network model in the sense that data and relationships
among data are represented by records and links, respectively. It differs from the network model in
that the records are organized as collections of trees rather than arbitrary graphs. Figure 5 presents
a sample hierarchical database with the same information as in figure 4.
10 Introduction to Databases
Figure 5. A sample hierarchical database.
INSTANCES AND SCHEMAS
Database changes over time when and as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database. The
overall design of the database is called the database schema, schemas one changes infrequently, if
at all.
Analogies to the concepts of data types, variables and values in programming languages are useful
here. Returning to the customer record types definition, note that in declaring the type of customer,
we have not declared any variables. To declare such variables in a Pascal-like language, we write
Var customer: customer; variable customer2 now corresponds to an area of storage containing a
customer type record.
A database schema corresponds to the programming-language type definition. A variable of a
given type has a particular value at a given instant. Thus, the value of a variable in programming
languages corresponds to an instance of a database schema. In other words “the description of a
database is called the database schema, which is specified during database design and is not
expected to change frequently”, A displayed schema is called a schema diagram.
E.g. student-schema.
Name. Roll No Class Mayan
Cours
Course No Department
e.
Schema diagram
11 Introduction to Databases
A schema diagram displays only some aspects of a schema, such as the names of record types and
data items, and some types of constraints. Other aspects are not specified in the schema diagram.
As in the above diagram they’re neither in data type of each data item, nor in the relationships
among the various files.
DATA ABSTRACTION
For the system to be usable, it must retrieve data efficiently. This concern has led to the design of
complex data structures for the representation of data in the database. Since many database-system
users are not computer trained, developers hide the complexity from users through several levels
of abstraction, to simplify users’ interactions with the systems:
• Physical level. The lowest level of abstraction describes how the data are actually
stored. At the physical level, complex low-level data structures are described in detail.
• Logical level. The next higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The entire database is thus
described in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex
physical-level structures, the user of the logical level does not need to be aware of this
complexity. Database administrators, who must decide what information is to be kept
in the database, use the logical level of abstraction.
• View level. The highest level of abstraction describes only part of the entire database.
Despite the use of simpler structures at the logical level, some complexity remains,
because of the large size of the database. Many users of the database system will not be
concerned with all this information. Instead, such users need to access only a part of the
database. So that their interaction with the system is simplified, the view level of
abstraction is defined. The system may provide many views for the same database.
The interrelationship among these three levels of abstraction is illustrated in Figure given
below.
The Three Levels of Data Abstraction
12 Introduction to Databases
An analogy to the concept of data types in programming languages may clarify the distinction
among levels of abstraction. Most high-level programming languages support the notion of a
record type. For example, in a Pascal-like language, we may declare a record as follows:
type customer = record
customer-name : string;
social-security : string;
customer-street : string;
customer-city : string;
end
This code defines a new record called customer with three fields. Each field has a name and a type
associate with it. A banking enterprise may have several such record types, including
• Account, with fields account-number and balance
• Employee, with fields employee-name and salary
At the physical level, a customer, account, or employee record can be described as a block of
consecutive storage locations (for example, words or bytes). The language compiler hides this
level of detail from programmers. Similarly, the database system hides many of the lowest-level
storage details from database programmers. Database administrators may be aware of certain
details of the physical organization of the data.
At the logical level, each such record is described by a type definition, as illustrated in the
previous code segment, and the interrelationship among these record types is defined.
Programmers using a programming language work at this level of abstraction. Similarly, database
administrators usually work at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide details of the
data types. Similarly, at the view level, several views of the database are defined, and database
users see these views. In addition to hiding details of the logical level of the database, the views
also provide a security mechanism to prevent users from accessing parts of the database. For
example, tellers in a bank see only that part of the database that has information on customer
accounts; they cannot access information concerning salaries of employees.
13 Introduction to Databases