Introduction to
Database Systems
"The numbers have no way of speaking for themselves. We speak for them.
We imbue them with meaning." —Statistician Nate Silver in the book
The Signal and the Noise
Compiled and presented by:
Subarna Shrestha.
Lectures
Lecture slides in ppt format will be posted shortly before or after
the lecture
are to complement the lectures
Many issues discussed in the lectures will be covered in the
exams
hence try to attend lectures regularly
Will not cover ALL materials on the slides
attending lectures will tell you which is covered and which is
not
Terminologies
Data are plain facts. Think of data as a "raw material“- it needs to be
processed before it can be turned into something useful.
Data can be any character, text, word, number, and, if not put into
context, means little or nothing to a human.
Information is data that has been processed in such a way as to be
meaningful to the person who receives it.
Characteristics of information are Timely, Accuracy, Completeness .
Knowledge
Human mind purposefully organized the information and evaluate it to
produce knowledge. In other words the ability of the person recalls or
uses his information and experience is known as knowledge.
For example,
"386" is data,
"your marks are 386" is information,
And
"It is result of your hard work" is knowledge.
Terminologies cont…
Data, Information and Knowledge
Terminologies cont…
Database(A large collection of data.)
•is collection of related data and its metadata organized in a
structured format for optimized information management
•Examples: databases of customers, products,...
•A database usually models (some part of) a real- world enterprise.
Entities (e.g., students, courses)
Relationships (e.g., John Doe is taking DBS)
Database Management System (DBMS)
•is a software that enables easy creation, access, and modification of
databases for efficient and effective database management
•Many vendors: IBM, Sybase, Oracle, Microsoft, etc
Database System
is an integrated system of hardware, software, people, procedures,
and data that define and regulate the collection, storage,
management, and use of data within a database environment
Data Hierarchy- the systematic organization of data
Data Hierarchy-cont…
Database System versus File System
DBMS File Processing System
Minimal data redundancy problem in
Data Redundancy problem exits
DBMS
Data Inconsistency does not exist Data Inconsistency exist here
Accessing database is easier Accessing is comparatively difficult
Data is scattered in various files and files
The problem of data isolation is not
may be of different format, so data
found in database
isolation problem exists
Transactions like insert, delete, view, In file system, transactions are not
updating, etc are possible in database possible
Concurrent access and recovery is Concurrent access and recovery is not
possible in database possible
Security of data Security of data is not good
A database manager (administrator) A file manager is used to store all
stores the relationship in form of relationships in directories in file
Why learn about databases?
It used to be about boring things: employee records, bank records,
etc.
Today, the field covers all aspects of working with data:
Web search
Data mining
Scientific and medical databases
Integrating information
Databases are behind almost everything you do on the Web
Google searches
Queries at Amazon, eBay, etc.
Trip planning (expedia etc)
What is Data Management?
• Find data (search and query)
Query: Find the average enrollment in database courses at
BICTE?
How could we find this using a conventional
search within file system?
Do we get what we want?
Why is this hard?
How could we find this using a Database
Management System (DBMS)?
• Update or modify data
• Ensure data consistency
• Protect data
from unauthorized access (access control)
from failures (recovery)
from other programs or users (concurrency control)
Database Management System
- manages interaction between end users and database
Database Systems: Design, Implementation, & Management: Rob & Coronel
Database System Environment
Hardware
Software
- OS
- DBMS
- Applications
People
Procedures
Data
Database Systems: Design, Implementation, & Management: Rob & Coronel
Simplified database system environment
Database: Why
Purpose of Database
•Optimizes data management
•Transforms data into information
Importance of Database Design
•Defines the database’s expected use
-different approach needed for different types of databases
•Avoid data redundancy & ensure data integrity
-data is accurate and verifiable
•Poorly designed database generates errors
-leads to bad decisions
-can lead to failure of organization
Functions of DBMS/Database System
•Stores data and related data entry forms, report definitions, etc.
•Hides the complexities of relational database model from the user
-facilitates the construction/definition of data elements and their
relationships
-enables data transformation and presentation
•Enforces data integrity
•Implements data security management
-access, privacy, backup & restoration
Database: How
•Planning & Analysis
-Assess
•Goal of the organization
•Database environment
•existing hardware, software, raw data, data processing
procedures
-Identify
Database needs
•what database can do to further the goal of the organization
User needs and characteristics
•who the users are, what they want to do, how they envision
doing it
Database system requirements
•what the database system should do to satisfy the database
and user needs
•Design
From conceptual design to a detailed system specification
•Implementation
Create the database
•Maintenance
Troubleshoot, update, streamline the database
Database: User-centered
Perspective
The user is always right. If there is a problem with the use of the
system, the system is the problem, not the user.
Compliance
The user has the right to a system that performs exactly as promised.
Instruction
The user has the right to easy-to-use instructions (user guides, online
or contextual help, error messages) for understanding and utilizing a
system to achieve desired goals and recover efficiently and gracefully
from problem situations.
Usability
The user should be the master of software and hardware technology,
not vice-versa. Products should be natural and intuitive to use.
Database: Data Models
Importance
Abstraction of complex real-word data structures in relative simple
(graphical) representations
Facilitate interaction among the designer, the applications
programmer, and the end user
Basic Building Blocks
Entity
thing about which data are to be collected and stored
Attribute
a characteristic of an entity
Relationship
describes an association among entities
Constraint
restrictions placed on the data
Typical DBMS Functionality
Define a particular database in terms of its data types, structures,
and constraints
Construct or Load the initial database contents on a secondary
storage device
Manipulating the database:
Retrieval: Querying, generating reports
Modification: Insertions, deletions and updates to its content
Accessing the database through Web applications
Processing and Sharing by a set of concurrent users and application
programs – yet, keeping all data valid and consistent
Other features:
Protection or Security measures to prevent unauthorized access
Presentation and Visualization of data
Maintaining the database and associated programs over the
lifetime of the database application
What Can a DBMS Do for Us?
•Data Definition Language – DDL
A syntax similar to a computer programming language for defining data
structures, especially database schemas.
•Data Manipulation Language – DML
Query Language
•Storage management
•Transaction Management
concurrency control
Recovery
Automate a lot of boring/mundane operations on data
so that we don’t have to program over and over
so that we can write complex data manipulations in just a few lines, so that we can
concentrate on app logics
Make execution very fast
so that it scales up to very large data sets
Make concurrent access/modification possible
so that many users can use the data at the same time
Think buying a plane ticket! Can you do it without a DBMS?
Why Use a DBMS?
Self-describing nature of a database system:
A DBMS catalog stores the description of a particular database
(e.g. data structures, types, and constraints)
The description is called meta-data.
Data independence
You don’t need to know the implementation of the database to
access the data
Applications insulated from how data is structured and stored
change the order of tuples
add or modify other columns
add or modify indexes
Note that query does not change when physical structure changes
One of the most important benefits of using a DBMS
Why Use a DBMS? Cont…
Efficient access
queries are optimized.
Reduced application development time
Queries can be expressed declaratively, we do not need to
indicate how to execute them
Data integrity and security
Some constraints on the data are enforced
automatically.
Data Consistency
Data Constraints:
All students must have a student ID (sID)
No two students can have the same sID (uniqueness)
A student may only have one grade per course
Etc.
Why Use a DBMS? Cont…
Concurrent access, recovery from crashes
Many users can access/update the database at the same
time without any interference.
Concurrent execution of user programs is essential for good
DBMS performance.
Because disk accesses are frequent, and relatively slow
Interleaving actions of different user programs can lead to
inconsistency:
A cheque is cleared while account balance is being
computed.
DBMS ensures that such problems do not arise: users can
behave as if they were using a single- user system.
Speed – even when the data is huge, i.e.
IRS: 150 TB (1 TB ≈ 1012 B)
Yahoo: 2 PB (1 PB ≈ 1015 B)
National Energy Research Scientific Computing Center (USA):
3.5 PB
Why Use a DBMS? Cont…
Data Abstraction
•Many views,
•single logical schema and physical schema.
•Views (external schemas) describe how users see the data.
•Logical schema defines logical structure.
•Physical schema describes the files and indexes used.
Example: University Database View 1 View 2 View 3
Conceptual schema:
Logical Schema
Student(sid: string, name: string, login: string, age: integer, gpa:real)
Course(cid: string, cname:string, credits:integer)
Enrolled(sid:string, cid:string, grade:string) Physical Schema
• describes data in terms of the data model of the DBMS
Physical schema:
Relations stored as unordered files.
Index on first column of Students.
External Schema (View):
Course_info(cid:string,enrollment:integer)
Describing Data: What is a Data Model?
Mathematical representation of data
relational model = tables;
semistructured model = trees/graphs.
AND
Operations on data
Constraints
A schema is a description of a particular collection of data, using a
given data model.
The relational model of data is the most widely used model today.
Main concept: relation, basically a table with rows and columns.
Every relation has a schema, which describes the columns, or
fields.
Example Instance of Student Relation
sID Name Login Age GPA
53666 Jones Jones@cs 20 3.2
45453 Smith Smith@ai 19 3.1
Columns are attributes
Rows are tuples
The SQL Query Language
Find all students who are 20 years old
SELECT *
FROM Students WHERE age = 20
sID Name Login Age GPA
53666 Jones Jones@cs 20 3.2
Database Users
Three groups:
End users
Database administrators
Database developers
Historical Development of Database
Technology
Early Database Applications:
The Hierarchical and Network Models were introduced
in the mid 1960s and dominated during the 1970s.
Relational Model based Systems:
Relational model was originally introduced in 1969
(40th anniversary is celebrated this year!) by E.F. Codd
at IBM.
Relational DBMS Products emerged in the early
1980s.
Historical Development of Database
Technology (cont'd)
Object-oriented and emerging applications:
Object-Oriented Database Management Systems
(OODBMSs) were introduced in the late 1980s and
early 1990s to cater to the need of complex data
processing - but failed to take off.
Many relational DBMSs have incorporated object
database concepts, leading to a new category called
object-relational DBMSs (ORDBMSs)
Extended relational systems add further capabilities
(e.g. for multimedia data, XML, and other data types)
Historical Development of Database
Technology (cont'd)
Data on the Web and E-commerce Applications:
Web contains data in HTML with links among
pages.
This has given rise to a new set of applications
and E-commerce is using new standards like XML
(eXtended Markup Language).
Script programming languages such as PHP and
JavaScript allow generation of dynamic Web
pages that are partially generated from a
database
Also allow database updates through Web pages
Evolution of Data Models
Timeline
1960s 1970s 1980s 1990s 2000+
File-based
Hierarchical
Object-oriented
Network
Relational Web-based
Entity-Relationship
Database: Historical Roots
Manual File System
to keep track of data
used tagged file folders in a filing cabinet
organized according to expected use
e.g. file per customer
easy to create, but hard to
locate data
aggregate/summarize data
Computerized File System
to accommodate the data growth and information need
manual file system structures were duplicated in the computer
Data Processing (DP) specialists wrote customized programs to
write, delete, update data (i.e. management)
extract and present data in various formats (i.e. report)
File System: Example
Database Systems: Design, Implementation, & Management: Rob & Coronel
File System: Weakness
Weakness
“Islands of data” in scattered file systems.
Problems
Duplication
same data may be stored in multiple files
Inconsistency
same data may be stored by different names in different format
Rigidity
requires customized programming to implement any changes
cannot do ad-hoc queries
Implications
Waste of space
Data inaccuracies
High overhead of data manipulation and maintenance
File System: Problem Case
CUSTOMER file AGENT file SALES file
A_Name (15 char) A_Name (20 char) AGENT (20 char)
Carol Johnson Carol T. Johnson Carol J. Smith
- inconsistent field name, field size
- inconsistent data values
- data duplication
Database System vs. File System
Database Systems: Design, Implementation, & Management: Rob & Coronel
Hierarchical Database
Background
Developed to manage large amount of data for complex manufacturing
projects
e.g., Information Management System (IMS)
IBM-Rockwell joint venture
clustered related data together
hierarchically associated data clusters using pointers
Hierarchical Database Model
Assumes data relationships are hierarchical
One-to-Many (1:M) relationships
Each parent can have many children
Each child has only one parent
Logically represented by an upside down tree
Hierarchical Database: Example
Database Systems: Design, Implementation, & Management: Rob & Coronel
Hierarchical Database: Pros & Cons
Advantages
Conceptual simplicity
groups of data could be related to each other
related data could be viewed together
Centralization of data
reduced redundancy and promoted consistency
Disadvantages
Limited representation of data relationships
did not allow Many-to-Many (M:N) relations
Complex implementation
required in-depth knowledge of physical data storage
Structural Dependence
data access requires physical storage path
Lack of Standards
limited portability
Network Database
Objectives
Represent more complex data relationships
Improve database performance
Impose a database standard
Network Database Model
Similar to Hierarchical Model
Records linked by pointers
Composed of sets
Each set consists of owner (parent) and member (child)
Many-to-Many (M:N) relationships representation
Each owner can have multiple members (1:M)
A member may have several owners
Network Database: Example
Database Systems: Design, Implementation, & Management: Rob & Coronel
Network Database: Pros & Cons
Advantages
More data relationship types
More efficient and flexible data access
“network” vs. “tree” path traversal
Conformance to standards
enhanced database administration and portability
Disadvantages
System complexity
require familiarity with the internal structure for data access
Lack of structural independence
small structural changes require significant program changes
Relational Database
Problems with legacy database systems
Required excessive effort to maintain
Data manipulation (programs) too dependent on physical file structure
Hard to manipulate by end-users
No capacity for ad-hoc query (must rely on DB programmers).
Evolution in Data Organization
E. F. Codd’s Relational Model proposal
Separated the notion of physical representation (machine-view)
from logical representation (human-view)
Considered ingenious but computationally impractical in 1970
Relational Database Model
Dominant database model of today
Eliminated pointers and used tables to represent data
Tables
flexible logical structure for data representation
a series of row/column intersections
related by sharing common entity characteristic(s)
Relational Database: Example
Provides a logical “human-level” view of the data and associations among groups of data
(i.e., tables)
Customer_ID Customer_Account Agent_ID
1224 4556 23
1225 4558 25
Agent_ID Last_Name First_Name Phone
23 Sturm David 334-5678
25 Long Kyle 556-3421
Customer_ID Last_Name First_Name Phone Account_Balance
1224 Vira Dyne 678-9987 1223.95
1225 Davies Tricia 556-3342 234.25
Relational Database: Pros & Cons
Advantages
Structural independence
Separation of database design and physical data storage/access
Easier database design, implementation, management, and use
Ad hoc query capability with Structured Query Language (SQL)
SQL translates user queries to codes
Disadvantages
Substantial hardware and system software overhead
more complex system
Poor design and implementation is made easy
ease-of-use allows careless use of RDBMS
Entity Relationship Model
Peter Chen’s Landmark Paper in 1976
“The Relationship Model: Toward a Unified View of Data”
Graphical representation of entities and their relationships
Entity Relationship (ER) Model
Based on Entity, Attributes & Relationships
Entity is a thing about which data are to be collected and stored
e.g. EMPLOYEE
Attributes are characteristics of the entity
e.g. SSN, last name, first name
Relationships describe an associations between entities
i.e. 1:M, M:N, 1:1
Complements the relational data model concepts
Helps to visualize structure and content of data groups
entity is mapped to a relational table
Tool for conceptual data modeling (higher level representation)
Represented in an Entity Relationship Diagram (ERD)
Formalizes a way to describe relationships between groups of data
E-R Diagram: Chen Model
Entity
represented by a rectangle with its name in capital
letters.
Relationships
represented by an active or passive verb inside the
diamond that connects the related entities.
Connectivities
i.e., types of relationship
written next to each entity box.
Database Systems: Design, Implementation, & Management: Rob & Coronel
E-R Diagram: Crow’s Foot Model
Entity
represented by a rectangle with its name in
capital letters.
Relationships
represented by an active or passive verb
that connects the related entities.
Connectivities
indicated by symbols next to entities.
2 vertical lines for 1
“crow’s foot” for M
Database Systems: Design, Implementation, & Management: Rob & Coronel
E-R Model: Pros & Cons
Advantages
Exceptional conceptual simplicity
easily viewed and understood representation of database
facilitates database design and management
Integration with the relational database model
enables better database design via conceptual modeling
Disadvantages
Incomplete model on its own
Limited representational power
cannot model data constraints not tied to entity relationships
e.g. attribute constraints
cannot represent relationships between attributes within entities
No data manipulation language (e.g. SQL)
Loss of information content
Hard to include attributes in ERD
Object-Oriented Database
Semantic Data Model (SDM)
Modeled both data and their relationships in a single structure (object)
Developed by Hammer & McLeod in 1981
Object-oriented concepts became popular in 1990s
Modularity facilitated program reuse and construction of complex structures
Ability to handle complex data types (e.g. multimedia data)
Object-Oriented Database Model (OODBM)
Maintains the advantages of the ER model but adds more features
Object = entity + relationships (between & within entity)
consists of attributes & methods
attributes describe properties of an object
methods are all relevant operations that can be performed on an object
self-contained abstraction of real-world entity
Class = collection of similar objects with shared attributes and methods
e.g. EMPLOYEE class = (employ1 object, employ2 object, …)
organized in a class hierarchy
e.g. PERSON > EMPLOYEE, CUSTOMER
Incorporates the notion of inheritance
attributes and methods of a class are inherited by its descendent classes
OO Database Model vs. E-R Model
OODBM:
- can accommodate relationships within a object
- objects to be used as building blocks for autonomous structures
Database Systems: Design, Implementation, & Management: Rob & Coronel
Object-Oriented Database: Pros &
Cons
Advantages
Semantic representation of data
fuller and more meaningful description of data via object
Modularity, reusability, inheritance
Ability to handle
complex data
sophisticated information requirements
Disadvantages
Lack of standards
no standard data access method
Complex navigational data access
class hierarchy traversal
Steep learning curve
difficult to design and implement properly
More system-oriented than user-centered
High system overhead
slow transactions
Web Database
Internet is emerging as a prime business tool
Shift away from models (e.g. relational vs. O-O)
Emphasis on interfacing with the Internet
Characteristics of “Internet age” databases
Flexible, efficient, and secure Internet access
Support for complex data types & relationships
Seamless interfaces with multiple data sources and structures
Ease of use for end-user, database architect, and database administrator
Simplicity of conceptual database model
Many database design, implementation, and application development tools
Powerful DBMS GUI
In 2-tier architecture
application program directly interacts with the database.
Advantages of 2-tier Architecture
•Easy to understand as it directly communicates with the database.
•Requested data can be retrieved very quickly, when there is less number of users.
•Easy to modify – any changes required, directly requests can be sent to database
•Easy to maintain – When there are multiple requests, it will be handled in a queue
and there will not be any chaos.
Disadvantages of 2-tier architecture:
•It would be time consuming, when there is huge number of users. All the requests will
be queued and handed one after another. Hence it will not respond to multiple users
at the same time.
•This architecture would little cost effective.
3-tier Architecture
3-tier architecture is the most widely used database architecture. It can be viewed as below.
Presentation layer / User layer is the layer where user uses the database. He does not have
any knowledge about underlying database. He simply interacts with the database as though
he has all data in front of him. You can imagine this layer as a registration form where you will
be inputting your details. Did you ever guessed, after pressing ‘submit’ button where the data
goes? No right? You just know that your details are saved. This is the presentation layer where
all the details from the user are taken, sent to the next layer for processing.
Application layer is the underlying program which is responsible for saving the details that you
have entered, and retrieving your details to show up in the page. This layer has all the business
logics like validation, calculations and manipulations of data, and then sends the requests to
database to get the actual data. If this layer sees that the request is invalid, it sends back the
message to presentation layer. It will not hit the database layer at all.
Data layer or Database layer is the layer where actual database resides. In this layer, all the
tables, their mappings and the actual data present. When you save you details from the front
end, it will be inserted into the respective tables in the database layer, by using the programs
in the application layer. When you want to view your details in the web browser, a request is
sent to database layer by application layer. The database layer fires queries and gets the data.
These data are then transferred to the browser (presentation layer) by the programs in the
application layer.
Advantages of 3-tier architecture:
•Easy to maintain and modify. Any changes requested will not affect any other data in the
database. Application layer will do all the validations.
•Improved security. Since there is no direct access to the database, data security is increased.
There is no fear of mishandling the data. Application layer filters out all the malicious actions.
•Good performance. Since this architecture cache the data once retrieved, there is no need to
hit the database for each request. This reduces the time consumed for multiple requests and
hence enables the system to respond at the same time.
Disadvantages 3-tier Architecture
•it is little more complex and little more effort is required in terms of hitting the database.
Extending Database Capabilities
New functionality is being added to DBMSs in
the following areas:
Scientific Applications
XML (eXtensible Markup Language)
Image Storage and Management
Audio and Video Data Management
Data Warehousing and Data Mining
Spatial Data Management
Time Series and Historical Data Management
Interoperability, integrating data from different sources