Chapter 1
The difference between data and
information
What a database is, the different types
of databases, and why they are valuable
assets for decision making
Why database design is important
How modern databases evolved from files
and file systems
About flaws in file system data
management
How a database system differs from a file
system, and how a DBMS functions within
the database system
Data:
◦ Raw facts; building blocks of information
◦ Not yet processed to reveal meaning
Information:
◦ Data processed to reveal meaning
Accurate, relevant, and timely information
is key to good decision making
Good decision making is key to an organization’s
survival in a global environment
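The step from data to information can be sketched in a few lines of Python. This is only an illustration: the sales figures, the region names, and the `summarize` helper are hypothetical and not part of the chapter.

```python
from collections import defaultdict

# Raw facts (data): individual sales records as (region, amount).
# These values are made up for illustration.
raw_sales = [("North", 120.0), ("South", 80.0), ("North", 45.5)]


def summarize(records):
    """Process raw records into totals per region (information)."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


# Information: data processed to reveal meaning, here sales per region.
information = summarize(raw_sales)
print(information)
```

The raw records alone do not say which region sells more; the processed totals do, and that is what a decision maker would use.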
Database: an organized collection of
relevant data.
A database should be a shared, integrated
computer structure that contains:
◦ End user data (raw facts)
◦ Metadata (data about data)
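To make the two kinds of content concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is chosen only because it ships with Python; the `customer` table and its single row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute("INSERT INTO customer (name, city) VALUES ('Ana', 'Lisbon')")

# End-user data: the raw facts stored in the table.
end_user_data = conn.execute("SELECT id, name, city FROM customer").fetchall()

# Metadata: data about the data. PRAGMA table_info returns one row per
# column: (cid, name, type, notnull, dflt_value, pk).
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(end_user_data)
print(metadata)
conn.close()
```

The first query returns what the end user stored; the second returns the column names and types that the database keeps about that data.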
DBMS (database management system):
◦ Collection of programs that manages database
structure and controls access to data
◦ Possible to share data among multiple applications
or users
◦ Makes data management more efficient and
effective
End users have better access to more and
better-managed data
◦ Promotes integrated view of organization’s
operations
◦ Probability of data inconsistency is greatly reduced
◦ Possible to produce quick answers to ad hoc
queries
Single-user:
◦ Supports only one user at a time
Desktop:
◦ Single-user database running on a personal
computer
Multi-user:
◦ Supports multiple users at the same time
Workgroup:
◦ Multi-user database that supports a small group of
users or a single department
Enterprise:
◦ Multi-user database that supports a large group of
users or an entire organization
Centralized:
◦ Supports data located at a single site; the whole
database is stored at one location.
Distributed:
◦ Supports data distributed across several sites.
Advantages of a centralized database:
1. Data integrity is easier to maintain because the
whole database is stored at a single physical location.
This makes it easier to coordinate the data
and keep it accurate and consistent.
2. Data redundancy is minimal in the
centralized database. All the data is stored
together and not scattered across different
locations, so it is easier to make sure there is no
redundant data.
3. Since all the data is in one place, security
measures can be concentrated around it, which makes
the centralized database easier to secure.
4. Data is easily portable because it is stored in
one place.
5. The centralized database is generally cheaper than
other types of databases, as it requires less power and
maintenance.
6. All the information in the centralized database
can be accessed from the same location
and at the same time.
Disadvantages of a centralized database:
1. Since all the data is at one location, it can take more
time to search and access it, especially for remote users.
If the network is slow, access takes even longer.
2. All data access traffic goes to the centralized
database. This may create a bottleneck.
3. Since all the data is at the same location, many
users accessing it simultaneously compete for the same
resources. This may reduce the efficiency of the system.
4. If there are no database recovery measures in place
and a system failure occurs, all the data in the
database may be lost.
Manual file systems were traditionally composed of a
collection of file folders kept in a file cabinet
Organization within folders was based on
data’s expected use (ideally logically related)
System was adequate for small amounts of
data with few reporting requirements
Finding and using data in growing
collections of file folders became time-
consuming and cumbersome
Conversion from a manual to a computer file system
could be technically complex, requiring
hiring of data processing (DP) specialists
DP specialists created file structures,
wrote software, and designed application
programs
Resulted in numerous “home-grown”
systems being created
Initially, computer files were similar in
design to manual files
As number of files increased, a small file
system evolved
Each file used its own application programs
Each file was owned by individual or
department who commissioned its creation
Every task requires extensive programming in
a third-generation language (3GL)
◦ Programmer must specify task and how it must be
done
Modern databases use fourth-generation
language (4GL)
◦ Allows user to specify what must be done without
specifying how it is to be done
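The contrast between "how" and "what" can be sketched with one small task done both ways. This is an illustration under assumptions: Python stands in for a 3GL, SQL (run through Python's built-in `sqlite3` module) stands in for a 4GL-style declarative language, and the `employees` rows and both function names are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (name, department, salary).
employees = [("Ana", "Sales", 52000), ("Ben", "IT", 61000), ("Cy", "Sales", 48000)]


def sales_names_procedural(rows):
    """3GL style: the programmer spells out HOW to find the answer."""
    result = []
    for name, dept, _salary in rows:  # visit every record
        if dept == "Sales":           # test it
            result.append(name)       # collect it
    result.sort()                     # order it
    return result


def sales_names_declarative(rows):
    """4GL style: the query states WHAT is wanted; the DBMS decides how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    found = conn.execute(
        "SELECT name FROM employee WHERE dept = 'Sales' ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in found]


print(sales_names_procedural(employees))
print(sales_names_declarative(employees))
```

Both functions return the same names. In the second one, the single `SELECT` statement replaces the loop, the test, and the sort, and a new question only needs a new query rather than a new program.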
Time-consuming, high-level activity
Programmer must be familiar with physical
file structure
As system becomes complex, access paths
become difficult to manage and tend to
produce malfunctions
Complex coding establishes precise
location of files and system components
and data characteristics
Ad hoc queries are impossible
Writing programs to design new reports is
time consuming
As number of files increases, system
administration becomes difficult
Making changes in existing file structure is
difficult
File structure changes require modifications
in all programs that use data in that file
Modifications are likely to produce errors,
requiring additional time to “debug” the
program
Security features hard to program and
therefore often omitted
Structural dependence
◦ Access to a file depends on its structure
Data dependence
◦ Changes in data storage characteristics (such as a
field’s data type) affect program’s ability to access data
◦ Logical data format
How a human being views the data
◦ Physical data format
How the computer “sees” the data
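Structural dependence can be sketched with a fixed-width record, a layout commonly used in file systems. The record layout, the field widths, and the `read_city` function below are hypothetical and chosen only for illustration.

```python
# Old file structure: name in columns 0-9, city in columns 10-19.
old_record = "Ana".ljust(10) + "Lisbon".ljust(10)


def read_city(record):
    """Application code that hard-codes the physical position of the city."""
    return record[10:20].strip()


# New file structure: a 5-character customer code is added in front.
# Every field after it shifts by 5 columns.
new_record = "C0001" + old_record

print(read_city(old_record))  # reads the city correctly
print(read_city(new_record))  # same program, wrong answer after the change
```

The program did not change, but the file structure did, so the program now reads the wrong columns. Every program that uses the file would need the same fix, which is the maintenance burden the chapter describes.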
Flexible record definition anticipates
reporting requirements by breaking up fields
into their component parts
Data redundancy results in data
inconsistency
◦ Different and conflicting versions of the same
data appear in different places
Errors more likely to occur when complex
entries are made in several different files
and recur frequently in one or more files
Data anomalies develop when required
changes in redundant data are not made
successfully
Modification anomalies
◦ Occur when changes must be made to existing
records
Insertion anomalies
◦ Occur when entering new records
Deletion anomalies
◦ Occur when deleting records
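The anomalies above can be sketched with a small file in which customer details are repeated on every order. The records, field names, and the `phones_on_file` helper are hypothetical.

```python
import copy

# A flat "orders file": the customer's phone is repeated on every order.
orders = [
    {"order_id": 1, "customer": "Ana", "customer_phone": "555-0100"},
    {"order_id": 2, "customer": "Ana", "customer_phone": "555-0100"},
    {"order_id": 3, "customer": "Ben", "customer_phone": "555-0199"},
]


def phones_on_file(rows, customer):
    """Return every distinct phone number stored for one customer."""
    return {r["customer_phone"] for r in rows if r["customer"] == customer}


# Modification anomaly: the phone is changed in one record but not the other,
# so the file now holds two conflicting versions of the same fact.
modified = copy.deepcopy(orders)
modified[0]["customer_phone"] = "555-0177"
conflicting = phones_on_file(modified, "Ana")

# Deletion anomaly: deleting Ben's only order also deletes his phone number.
after_delete = [r for r in orders if r["order_id"] != 3]
ben_phone_left = phones_on_file(after_delete, "Ben")

print(conflicting)     # two different numbers for Ana
print(ben_phone_left)  # nothing left for Ben
```

An insertion anomaly follows from the same layout: a new customer cannot be recorded in this file until there is an order to attach the details to.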
Disadvantages of a file processing system:
1. Data Redundancy:
The same information may be duplicated in
different files. This data redundancy
wastes storage space.
2. Data Inconsistency:
Because of data redundancy, it is possible
that data may not be in a consistent state.
3. Difficulty in Accessing Data:
Accessing data is not convenient and efficient
in file processing system.
4. Limited Data Sharing:
Data are scattered across various files. Different
files may have different formats, and they
may be stored in different folders belonging to
different departments.
5. Integrity Problems:
Data integrity means that the data contained in the
data file must be correct and consistent, which a
file processing system cannot fully guarantee.
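6. Dependency on application programs:
File structure changes require modifications in all
programs that use data in that file.
7. Data Security:
Security features are hard to program and are
therefore often omitted.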
Advantages of a database system:
1. Reduced data redundancy: Redundancy is removed by
data normalization. Less data duplication
saves storage and can improve access time.
2. Data Consistency and Integrity: As
discussed earlier, the root cause of data
inconsistency is data redundancy. Since data
normalization takes care of data
redundancy, it also takes care of
data inconsistency.
3. Data Security: It is easier to apply access
constraints in database systems so that only
authorized users are able to access the data.
Each user has a different set of access rights, so
data is protected from issues such as
identity theft, data leaks and misuse of data.
4. Privacy: Limited access means privacy of data.
5. Easy access to data: Database systems
manage data in such a way that the data
is easily accessible with fast response times.
6. Easy recovery: Since database systems keep
backups of data, it is easier to do a full
recovery of data in case of a failure.
7. Flexible: Database systems are more flexible
than file processing systems.
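Points 1 and 2 can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a full normalization procedure; the `customer` and `purchase` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each customer fact is stored once, in its own table.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        phone       TEXT NOT NULL
    );
    -- Purchases refer to the customer by key instead of repeating the phone.
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ana', '555-0100');
    INSERT INTO purchase VALUES (10, 1, 'Laptop');
    INSERT INTO purchase VALUES (11, 1, 'Mouse');
    """
)

# One update in one place changes the phone for every purchase.
conn.execute("UPDATE customer SET phone = '555-0177' WHERE customer_id = 1")

rows = conn.execute(
    """
    SELECT p.item, c.phone
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
    ORDER BY p.purchase_id
    """
).fetchall()
print(rows)
conn.close()
```

Because the phone number exists in only one row, there is no second copy that could be left out of date, which is how removing redundancy also removes the source of inconsistency.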
A database system is composed of 5 main parts:
1. Hardware
2. Software
Operating system software
DBMS software
Application programs and utility software
3. People
Programmers
End users
Database Administrator (DBA)
4. Procedures
5. Data
A DBMS performs functions that guarantee integrity
and consistency of data
◦ Data dictionary management
defines data elements and their relationships
◦ Data storage management
stores data and related data entry forms, report
definitions, etc.
◦ Data transformation and presentation
translates logical requests into commands to
physically locate and retrieve the requested data
◦ Security management
enforces user security and data privacy within
database
◦ Multi-user access control
creates structures that allow multiple users to
access the data
◦ Backup and recovery management
provides backup and data recovery procedures
◦ Data integrity management
promotes and enforces integrity rules to minimize
data integrity problems
◦ Database access languages and application
programming interfaces
provides data access through a query language
◦ Database communication interfaces
allows database to accept end-user requests within
a computer network environment
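Data integrity management, one of the functions listed above, can be sketched with Python's built-in `sqlite3` module. SQLite is used only as a convenient example of a DBMS; the tables, the rules, and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  INTEGER NOT NULL CHECK (salary >= 0),
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales');
    """
)


def try_insert(name, salary, dept_id):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (name, salary, dept_id) VALUES (?, ?, ?)",
                (name, salary, dept_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert("Ana", 52000, 1)        # valid row
bad_salary = try_insert("Ben", -5, 1)   # violates the CHECK rule
bad_dept = try_insert("Cy", 40000, 99)  # department 99 does not exist

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(ok, bad_salary, bad_dept, employee_count)
```

The rules are declared once in the table definitions, and the DBMS applies them to every program that writes to the table, so no application has to re-implement the checks.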