System Databases
Introduction, Features
THE RELATIONAL DATABASE
MODEL
There are many types of databases
The relational database model is the most
popular
Relational database uses a series of logically
related two-dimensional tables or files to
store information in the form of a database
Databases Are
Collections of information
Created with logical structures
With logical ties within the information
With built-in integrity constraints
Databases Collections of
Information
Databases have many tables
Consider Solomon Enterprises that provides
concrete to home and commercial builders.
Tables or files include:
Order
Customer
Concrete Type
Employee
Truck
Databases Collections of
Information
3-5
Databases Created with
Logical Structures
In databases, the row number is irrelevant
Not true in spreadsheet software
In databases, column names are very
important. Column names are created in the
data dictionary
Data dictionary contains the logical
structure of the information in a database
Databases With Logical Ties
Within the Information
Logical ties must exist between the tables or
files in a database
Logical ties are created with primary and
foreign keys
Primary key field (or group of fields in some
cases) that uniquely describes each record
Databases With Logical Ties
Within the Information
Foreign key primary key of one file that
appears in another file
Foreign keys help you create logical ties within
the information in a database
Databases With Logical Ties
Within the Information
Databases With Built-In
Integrity Constraints
Integrity constraints rules that help ensure
the quality of the information
Examples
Primary keys must be unique
Foreign keys must be present
Sales price cannot be negative
Phone number must have area code
DATABASE MANAGEMENT
SYSTEM TOOLS
Database management system (DBMS)
helps you specify the logical organization for a
databases and access and use the information
within a database
Word processing software = document
Spreadsheet software = workbook
DBMS software = database
DATABASE MANAGEMENT
SYSTEM TOOLS
5 software components:
1.
2.
3.
4.
5.
DBMS engine
Data definition subsystem
Data manipulation subsystem
Application generation subsystem
Data administration subsystem
DATABASE MANAGEMENT
SYSTEM TOOLS
DBMS Engine
DBMS engine accepts logical requests from
the various other DBMS subsystems, converts
them into their physical equivalent, and
actually accesses the database and data
dictionary as they exist on a storage device
DBMS engine separates the logical from the
physical
DBMS Engine
Physical view how information is physically
arranged, stored, and accessed on some type
of storage device
Logical view how you as a knowledge
worker need to arrange and access
information
With a database, you only concern yourself
with your logical view
Data Definition Subsystem
Data definition subsystem helps you create
and maintain the data dictionary and define
the structure of the files in a database
You must create a data dictionary before
entering information into a database
Module J covers this for Microsoft Access
Data Manipulation Subsystem
Data manipulation subsystem helps you
add, change, and delete information
This is your primary DBMS interface as you
work with a database
Views
Report generators
QBE tools
SQL
Views
View allows you to see the contents of a
database file
Make whatever changes you want
Perform simple sorting
Query to find the location of information
Looks similar to a workbook with no row numbers
Views
Report Generators
Report generator helps you quickly define
formats of reports and what information you
want to see in a report
You can save report formats and generate
reports at any time with up-to-date
information
Report Generators
Report Generators
QBE Tools
Query-by-example (QBE) tool helps you
graphically design the answer to a question
What driver most often delivers concrete to
Triple A Homes?
QBE Tools
3-24
SQL
Structured query language (SQL)
standardized fourth-generation language
found in most DBMSs
Performs the same task as a QBE tool
But uses a sentence structure instead of pointand-click interface
SQL is used mostly by IT people
Application Generation
Subsystem
Application generation subsystem contains
facilities to help you develop transactionintensive applications
Data entry screen (called forms)
Programming languages
Used mostly by IT specialists
Data Administration Subsystem
Data administration subsystem helps you
manage the overall database environment
Backup and recovery
Security management
Query optimization
Concurrency control
Change management
Data Administration Subsystem
Backup and recovery
Periodically back up information
Recover a database if a failure occurs
Security management
Who has access to what information
Who can perform certain tasks (e.g., add, change,
or delete) on information
Data Administration Subsystem
Query optimization
Restructure physical view of information to
optimize response times to queries
Concurrency control
What happens if two people makes changes to the
same information at the same time?
Data Administration Subsystem
Change management
What is the effect of structural changes to a
database?
What if you add a new column?
What happens if you delete a column?
What happens if you change a columns
attributes?
DATA WAREHOUSES AND DATA
MINING
Data warehouses support OLAP and decision
making
Data warehouses do not support OLTP
Data-mining tools are the tools you use to
work with a data warehouse
DBMS software = database
Data-mining tools = data warehouse
What Is a Data Warehouse?
Data warehouse logical collection of
information gathered from operational
databases used to create business
intelligence that supports business analysis
activities and decision-making tasks
What Is a Data Warehouse?
What Is a Data Warehouse?
Multidimensional
Rows and columns
Also layers
Many times called hypercubes
What are the dimensions in Figure 3.8 on page
142?
What Are Data-Mining Tools?
Data-mining tools software tools that you
use to query information in a data warehouse
Query-and-reporting tools
Intelligence agents
Multidimensional analysis tools
Statistical tools
What Are Data-Mining Tools?
Query-And-Reporting Tools
Query-and-reporting tools similar to QBE
tools, SQL, and report generators in the typical
database environment
Intelligent Agents
Use various artificial intelligence tools such as
neural networks and fuzzy logic to form the
basis for information discovery and building
business intelligence
Help you find hidden patterns in information
Chapter 4 focuses more on these
Multidimensional Analysis
Tools
Multidimensional analysis (MDA) tools
slice-and-dice techniques that allow you to
view multidimensional information from
different perspectives
Bring new layers to the front
Reorganize rows and columns
Statistical Tools
Help you apply various mathematical models
to the information stored in a data warehouse
to discover new information
Regression
Analysis of variance
And so on
MANAGING THE INFORMATION
RESOURCE
Information is an organizational resource
Just like people, capital, and equipment
It must be managed effectively
MANAGING THE INFORMATION
RESOURCE
Who should oversee your organizations
information resource?
Chief information officer (CIO) oversees an
organizations information resource
Data administration plans for, oversees the
development of, and monitors the information
resource
Database administration technical and
operational aspects of managing information
MANAGING THE INFORMATION
RESOURCE
Is information ownership a consideration?
If you create information, you own it
You will also share it with others
Because you own it, you are responsible for its
quality
MANAGING THE INFORMATION
RESOURCE
How clean must your information be?
Duplicate information (records) must be
eliminated
Inaccurate information must be corrected
Information forms the basis of business
intelligence
If your business intelligence is bad, you will make
poor decisions
Files and Databases
File: A collection of records or documents
dealing with one organization, person, area or
subject (Rowley)
Manual (paper) files
Computer files
Database: A collection of similar records with
relationships between the records (Rowley)
Bibliographic, statistical, business data, images,
etc.
Introducing the Database
Major Database Concepts
Data and information
Data - Raw facts
Information - Processed data
Data management
Database
Metadata
Database management system (DBMS)
Data Storage Requirements
Sensor data
Temperature (15 @ 8 Kbps)
Humidity (15 @ 8 Kbps)
Gas (15 @ 8 Kbps)
Light (15 @ 8 Kbps)
Motion (15 @ 8 Kbps)
Pressure (100 @ 8 Kbps)
Microphone (15 @ 500 Kbps)
Camera (15 @ 10 Mbps)
Data Storage Requirements
User data
Multimedia
Phone messages/conversations (500 Kbps 10 Mbps)
Music (500 Kbps)
TV/Radio broadcasts (500 Kbps 10 Mbps)
Home movies (10 Mbps)
Images
Computer
Programs
Data files
Operating systems
Data Storage Issues
Issues
Query frequency and type
Sampling/recording rates
205 sensors (158,900 Kbps)
Multimedia recordings
Simultaneous playback
Analysis, prediction, decision-making queries
Transaction granularity
Historical data, decay
Security and privacy
Centralized vs. distributed
What Data to Store
Type of Data
Raw data
Pre-processed
Compressed
Frequency of Data Storage for Sensor
Data
Tradeoff between precision and quantity
General Concepts
Database definition
Organized collection of logically related data
Data
Known facts
Types: text, graphics, images, sound, videos
Database management system (DBMS)
Software package for defining and managing a
database
Database
What is a database ?
A collection of files storing related data
Give examples of databases
Accounts database; payroll database; UWs
students database; Amazons products
database; airline reservation database
Database Examples
Class roster
Hospital patients
Literature (published articles in a certain field)
Genomic information
Protein structure
Taxonomy
Single nucleotide polymorphism
Database Types (cont.)
Type
Typical number of
users
Typical
architecture
Typical size
Personal
Desktop/Laptop/
PDA
MB
Workgroup
5-25
Client/server:2 tier MB-GB
Department
25-100
Client/server:3 tier GB
Enterprise
>100
Client/server:
distributed
GB-TB
Internet
>1000
Web sever &
application
servers
MB-GB
Flat Files
Characteristics:
Data is stored as records in regular files
Records usually have a simple structure and fixed
number of fields
For fast access may support indexing of fields in the
records
No mechanisms for relating data between files
One needs special programs in order to access and
manipulate the data
Relational Database
Characteristics:
Data is organized into tables: rows & columns
Each row represents an instance of an entity
Each column represents an attribute of an entity
Metadata describes each table column
Relationships between entities are represented by
values stored in the columns of the corresponding
tables (keys)
Accessible through Standard Query Language (SQL)
Enterprise data model
Graphical representation of the high level entities
Example: Microbial database
each organism has multiple corresponding genes
One:Many relation
1
Organism
m
Gene
Database Types / Data Models
Relational
OO
Hybrid (Object-Relational)
Temporal
Deductive
Others
Spatial,
Example Data Representations
Relational
We all knowflat tables of atomic attributes with foreign
key relationships
OO
Complex data reps
multivalued, composite
Temporal
Relational model: add valid start, end dates to each table
(versions of info and when valid)
Includes time, events, durations
Operations
DDL/DML (data def/manip languages)
SQL
OQL
Update operations
Built-in insert, delete, update
Stored procedures for triggers, active (ECA) rules
Example Operations for
Temporal Databases
INCLUDES
Rows valid in a certain time period
BEFORE/AFTER a time condition
Set operations
Union, intersection of 2 time periods
Active DB
Event-Condition-Action rules
Allow for decisions to be made in the database instead
of a separate application
Relational
Implemented as triggers
Challenges
Rule consistency
(2+ rules do not contradict)
Guaranteed termination
Trigger loops (T1 <->T2)
Metadata
Data that describes the properties or
characteristics of other data
Does not include sample data
Allows database designers and users to
understand the meaning of the data
Metadata & Data Table
Organism
Name
Type
Max Length
Description
Name
Alphanumeric
100
Organism name
Size
Integer
10
Genome length (bases)
Gc
Float
Percent GC
Accession
Alphanumeric
10
Accession number
Release
Date
Release date
Center
Alphanumeric
100
Genome center name
Sequence
Alphanumeric
Variable
Sequence
Name
Size
Gc
Accession
Release
Center
Sequence
Escherichia coli K12
4,640,000
50
NC_000913
09/05/1997
Univ.
Wisconsin
AGCTTTTC
ATT
Streptococcus
pneumoniae R6
2,040,000
40
NC_003098
09/07/2001
Eli Lilly and
Company
TTGAAAGA
AAA
Metadata & Data Table (cont.)
Gene
Name
Type
Max Length
Description
Name
Alphanumeric
100
Gene name
Accession
Alphanumeric
10
Gene accession number
OAccesion
Alphanumeric
10
Organism accession number
Start
Integer
10
Gene start
End
Integer
10
Gene end
Strand
Character
Gene strand
Product
Alphanumeric
1000
Gene annotation
Sequence
Alphanumeric
Variable
Gene sequence
Name
Accession
OAccession
Start
End
Strand
Product
Sequence
thrL
16127995
NC_000913
190
255
the operon leader
peptide
MKRI
thrA
16127996
NC_000913
337
2799
homoserine
dehydrogenase I
MRVL
transposas
e_A
15902058
NC_003098
20207
20554
transposase
MWYN
Relationships
Used to connect tables
Field(s) that have the same value in the related tables
Organism.Accession=Gene.OAccession
Organism.Accession
Unique
Primary key
Gene.OAccession
Not unique
Secondary key
Database Systems
Types of Database Systems
Number of Users
Single-user
Desktop database
Multiuser
Workgroup database
Enterprise database
Scope
Desktop
Workgroup
Enterprise
Database Systems
Types of Database Systems
Location
Centralized
Distributed
Use
Transactional (Production)
Decision support
Data warehouse
Database
A Database is a collection of stored
operational data used by the application
systems of some particular enterprise (C.J.
Date)
Paper Databases
Still contain a large portion of the worlds knowledge
File-Based Data Processing Systems
Early batch processing of (primarily) business data
Database Management Systems (DBMS)
Why DBMS?
History
50s and 60s all applications were custom built for
particular needs
File based
Many similar/duplicative applications dealing with
collections of business data
Early DBMS were extensions of programming
languages
1970 - E.F. Codd and the Relational Model
1979 - Ashton-Tate and first Microcomputer DBMS
File Based Systems
Application
Delivery
List
Coal
Estimation
Just what
asked for
File
Toys
Addresses
Naughty
Nice Toys
From File Systems to DBMS
Problems with file processing systems
Inconsistent data
Inflexibility
Limited data sharing
Poor enforcement of standards
Excessive program maintenance
DBMS Benefits
Minimal data redundancy
Consistency of data
Integration of data
Sharing of data
Ease of application development
Uniform security, privacy, and integrity controls
Data accessibility and responsiveness
Data independence
Reduced program maintenance
Terms and Concepts
Data independence
Physical representation and location of data and
the use of that data are separated
The application doesnt need to know how or where
the database has stored the data, but just how to ask
for it
Moving a database from one DBMS to another should
not have a material effect on application program
Recoding, adding fields, etc. in the database should not
affect applications
Database Environment
CASE
Tools
Repository
User
Interface
DBMS
Application
Programs
Database
Database Components
DBMS
===============
Design tools
Database
Database contains:
Users Data
Metadata
Indexes
Application Metadata
Table Creation
Form Creation
Query Creation
Report Creation
Procedural
language
compiler (4GL)
=============
Run time
Form processor
Query processor
Report Writer
Language Run time
Application
Programs
User
Interface
Applications
Types of Database Systems
PC databases
Centralized database
Client/server databases
Distributed databases
Database models
PC Databases
E.g.:
Access
FoxPro
Dbase
Etc.
Centralized Databases
Central
Computer
Client Server Databases
Client
Client
Network
Database
Server
Client
Distributed Databases
Location C
Location B
computer
computer
computer
Location A
Homogeneous
Databases
Distributed Databases
Client
Heterogeneous
Or Federated
Databases
Database
Server
Remote
Comp.
Local Network
Comm
Server
Client
Remote
Comp.
Introducing the Database
Importance of DBMS
It helps make data management more efficient and effective.
Its query language allows quick answers to ad hoc queries.
It provides end users better access to more and better-managed
data.
It promotes an integrated view of organizations operations -- big
picture.
It reduces the probability of inconsistent data.
The DBMS Manages the Interaction
Between the End User and the Database
Figure 1.2
Introducing the Database
Why Database Design Is Important?
A well-designed database facilitates data management
and becomes a valuable information generator.
A poorly designed database is a breeding ground for
uncontrolled data redundancies.
A poorly designed database generates errors that lead to
bad decisions.
Historical Roots
Why Study File Systems?
It provides historical perspective.
It teaches lessons to avoid pitfalls of data management.
Its simple characteristics facilitate understanding of the design
complexity of a database.
It provides useful knowledge for converting a file system to a
database system.
Contents of the CUSTOMER File
Table 1.1 Basic File Terminology
Data
Raw facts that have little meaning unless they have been
organized in some logical manner. The smallest piece of data
that can be recognized by the computer is a single
character, such as the letter A, the number 5, or some
symbol such as; ? > * +. A single character requires one
byte of computer storage.
Field
A character or group of characters (alphabetic or numeric)
that has a specific meaning. A field might define a telephone
numbers, a birth date, a customer name, a year-to-date
(YTD) sales value, and so on.
Record
A logically connected set of one or more fields that describes
a person, place, or thing. For example, the fields that
comprise a record for a customer named J. D. Rudd might
consist of J. D. Rudds name, address, phone number, date
of birth, credit limit, unpaid balance, and so on.
File
A collection of related records. For example, a file might
contain data about ROBCOR Companys vendors; or, a file
might contain the records for the students currently enrolled
at Gigantic University.
Contents of the AGENT File
A Simple File System
DBMS
Software package for defining and managing a
database.
Examples:
Proprietary: MS Access, MS SQL Server, DB2,
Oracle, Sybase
Open source: MySql, PostgreSQL
DBMS Advantages
Program-data independence
Minimal data redundancy
Improved data consistency & quality
Access control
Transaction control
Improved accessibility & data sharing
Increased productivity of application development
Enforced standards
Web Databases
Data is accessible through Internet
Have different underlying database models
Example: biological databases
Molecular data: NCBI , Swissprot , PDB , GO
Protein interaction : DIP , BIND
Organism specific: Mouse , Worm, Yeast
Literature: Pubmed
Disease
Database Management System
What is a DBMS ?
A big C/C++ program written by someone else that
allows us to manage efficiently a large database and
allows it to persist over long periods of time
Give examples of DBMS
DB2 (IBM), SQL Server (MS), Oracle, Sybase
MySQL, Postgres,
What the Database Systems
Does
1.
2.
3.
4.
5.
6.
Create/store large datasets
Search/query/update
Change the structure
Concurrent access to many user
Recover from crashes
Security
Possible Organizations
Files
Spreadsheets
DBMS
1. Create/store Large Datasets
Files
Yes, but
Spreadsheets
Not really
DBMS
Yes
2. Search/Query/Update
Files
Simple queries (grep);
Updates are difficult
Spreadsheets
Simple queries;
Simple updates
DBMS
All
Updates: generally OK
3. Change the Structure
Add Address to each Actor
Files
Very hard
Spreadsheets
Yes
DBMS
Yes
4. Concurrent Access
Multiple users access/update the data
concurrently
Lost updates; inconsistent reads,
What can go wrong ?
How do we protect against that in OS ?
This is insufficient in databases; why ?
A logical action consists
of multiple updates
locks
5. Recover from crashes
Transfer $100 from account #4662 to #7199:
X = Read(Account, #4662);
X.amount = X.amount - 100;
Write(Account, #4662, X);
CRASH !
Y = Read(Account, #7199);
Y.amount = Y.amount + 100;
Write(Account, #7199, Y);
What is the problem ?
6. Security
Files
File-level
access control
Spreadsheets
Same [?]
DBMS
Table/attributelevel access control
124
Enters a DMBS
Two tier system or client-server
connection
(ODBC, JDBC)
Database server
(someone elses
C program)
Data files
Applications
125
Data Independence
Logical view
Directors:
Movie_Directors:
id
fName
lName
id
mid
15901
Francis Ford
Coppola
15901
130128
...
...
Movies:
mid
Title
Year
130128
The Godfather
1972
...
Directors_file
Moviews_title_index_file
Directors_fname_index_file
Movies_file
Physical view
126
What the Database Systems
Does
1.
2.
3.
4.
5.
6.
Create/store large datasets
Search/query/update
Change the structure
Concurrent access to many user
Recover from crashes
Security
Grant, Revoke, Roles
SQL DML
SQL DDL
Transactions
ACID
Distributed vs. Centralized
Centralized database can produce a bottleneck
Large volume of data input
Large database
Large volume of queries
In distributed databases, data consistency,
replication, and retrieval can be more problematic
Consistency of schemas
Retrieval in case the data location is not known
Communication overhead to ensure database
consistency
Database Systems
Commercial
DB2
Empress
Informix
Oracle
MS Access
MS SQL
Sybase
Free
Berkeley DB
PostgreSQL
MySQL
Relational Database
Definition:
Data stored in tables that are associated by shared
attributes (keys).
Any data element (or entity) can be found in the
database through the name of the table, the
attribute name, and the value of the primary key.
Relational Database Definitions
Entity: Object, Concept or event (subject)
Attribute: a Characteristic of an entity
Row or Record: the specific characteristics of
one entity
Table: a collection of records
Database: a collection of tables
The Relational Database model
Developed by E.F. Codd, C.J. Date (70s)
Table = Entity = Relation
Table row = tuple = instance
Table column = attribute
Table linkage by values
Entity-Relationship Model
The Relational Model
Each attribute has a unique name within an
entity
All entries in the column are examples of it
Each row is unique
Ordering of rows and columns is unimportant
Each position (tuple) is limited to a single
entry.
Data Model: Whats a model?
A data model is a representation of reality
Its used to define the storage and
manipulation of a data base.
Data Models have two components:
Structure: the structure of the data stored within
Operations: Facilities for manipulation of the data.
Relational Database Systems
Most popular DBMS model for GIS
Flexible approach to linkages between records
comes the closest to modeling the complexity
of spatial relationships between objects.
Database Tables
Tables represent entities
Tables are always named in the singular, such
as: Vehicle, Order, Grade, etc.
Tables in database jargon are flat files, dBase
or Spreadsheet like..
Attributes
Characteristics of an entity
Examples:
Vehicle (VIN, color, make, model, mileage)
Student (SSN, Fname, Lname, Address)
Fishing License (Type, Start_date, End_date)
Database Table Example
Figure 1: A simple and flawed table design.
Figure 2: An improved database table..
Database Views
A View is an individuals picture of a database.
It can be composed of many tables,
unbeknownst to the user.
Its a simplification of a complex data model
It provides a measure of database security
Views are useful, primarily for READ-only users
and are not always safe for CREATE, UPDATE, and
DELETE.
Table Indexing
An Index is a means of expediting the retrieval
of data.
Indexes are built on a column(s).
Indexes occupy disk space; occasionally a lot.
Indexes arent technically necessary for
operation and must be maintained by the
database administrator.
Database Table Keys
Definition:
A key of a relation is a subset of attributes with the
following attributes:
Unique identification
Non-redundancy
Types of Keys
PRIMARY KEY
Serves as the row level addressing mechanism in the relational
database model.
It can be formed through the combination of several items.
FOREIGN KEY
A column or set of columns within a table that are required to
match those of a primary key of a second table.
These keys are used to form a RELATIONAL JOIN - thereby
connecting row to row across the individual tables.
Relational Database Management
System (RDBMS)
Table A
Name
Address
Parcel #
John Smith
18 Lawyers Dr. 756554
T. Brown 14 Summers Tr. 887419
Table B
Parcel # Assessed Value
887419 152,000
446397 100,000
Database Keys
Primary Key - Indicates uniqueness within
records or rows in a table.
Foreign Key - the primary key from another
table, this is the only way join relationships
can be established.
There may also be alternate or secondary keys
within a table.
Constructing Join Relationships
One-to-many relationships include the
Primary Key of the one table and a Foreign
Key (FK) in the many table.
Other common terms
Cardinality: one-to-one, one-to-many, manyto-many relationships
Optionality: the relationship is either
mandatory or optional.
Ensuring Database Integrity
Database integrity involves the maintenance
of the logical and business rules of the
database.
There are two kinds of DB Integrity that
must be addressed:
Entity Integrity
Referential Integrity
Strategies for managing
Integrity
You could ignore it, but it costs you time.
Place the Burden on your customer or user.
Have the programmers fix the problem
Place the burden on the Database
Management System (DBMS)
Temporal integrity is one of the key challenges
of Address Database management.
Entity Integrity
Entity integrity deals
with within-entity
rules.
These rules deal with
ranges and the
permission of null
values in attributes or
possibly between
records
Examples of Entity Integrity
Data Type Integrity: very common and most
basic. Checks only for data type
compatibility with DB Schema, such as:
numeric, character, logical, date format, etc.
Commonly referred to in GIS manuals as:
Range and List domains
Ranges - acceptable Numeric ranges for input
List - acceptable text entries or drop-down lists.
Enforcing Integrity
Not a trivial task!
Not all database management systems or GIS
software enable users to enforce data
integrity during attribute entry or edit
sessions.
Therefore, the programmer or the Database
Administrator must enforce and/or check for
Integrity.
Referential Integrity
Referential integrity concerns two or more
tables that are related.
Example: IF table A contains a foreign key that
matches the primary key of table B
THEN
values of this foreign key either match the
value of the primary key for a row in table B or
must be null.
Functions of a Database
Management System
Data Storage, Retrieval and Update (CRUD)
Catalog or Data Dictionary
Shared Update Support
Backup and Recovery Services
Security Services
Integrity Services
Data Independence - independent from programs
Various Data Manipulation Utilities
CRUD
Four basic functions, for a given entity they
should all be performed with few exceptions,
in your system:
CREATE
READ
UPDATE
DELETE
Normalization
Normalization: a process for analyzing the
design of a relational database
Database Design - Arrangement of attributes into
entities
It permits the identification of potential
problems in your database design
Concepts related to Normalization:
KEYS and FUNCTIONAL DEPENDENCE
Ex: Database Normalization (1)
Sample Student
Activities DB Table
Poorly Designed
Non-unique records
John Smith
Test the Design by
developing sample
reports and queries
Ex: Database Normalization (2)
Created a unique ID
for each Record in the
Activities Table
Required the creation of
an ID look-up table
for reporting (Students
Table)
Converted the Flat-File
into a Relational
Database
Ex: Database Normalization (3)
Wasted Space
Redundant data entry
What about taking a 3rd
Activity?
Query Difficulties trying to find all
swimmers
Data Inconsistencies conflicting prices
Ex: Database Normalization (4)
Students table is fine
Elimination of two
columns and an
Activities Table
restructuring,
Simplifies the Table
BUT, we still have
Redundant data
(activity fees) and data
insertion anomalies.
Problem: If student #219
transfers we lose all references
to Golf and its price.
Ex: Database Normalization (5)
Modify the Design to
ensure that every
non-key field is
dependent on the
whole key
Creation of the
Participants Table,
corrects our problems
and forms a union
between 2 tables.
This is a Better Design!
Database Design: Basic Steps
Step 1: Determine the entities involved and create a
separate table for each type of entity (thing, concept,
event, theme) and name it.
Step 2: Determine the Primary Key for each table.
Step 3: Determine the properties for each entity (the
non-key attributes).
Step 4: Determine the relationships among the
entities
Design Example: Music CD
collection
Entities: the CD, Tracks, Composer
Attributes:
CD (ID, title, musician, cost, etc.)
Track (song title, length, order number)
Composer (name, genre, DOB, DOD)
Relationships: CD to Track, Composer to Track
Table Design Example
Figure 1: A simple and flawed table design.
Figure 2: An improved database table..
Step1: Creating a Data Model
Identify Candidate Entities
Identify Relationships
Define Entities & Relationships
Review Entity-Relationship Model
Step 2: Defining an Attribute
Model
List Candidate Attributes for each Entity
Add KEYS to model
Attribute & Normalize Model
Define Attributes
Review Logical Model
Step 3: Identify & Capture
Business Rules
Review & Verify Cardinalities
Define Referential Integrity
Identify Business Domains
Identify Attribute Default Values
Step 4: Define Physical Model
Select Target DBMS
Name Tables & Columns
Name & Define Indexes
Define Columns
Verify/Update Triggers
Generate Reports & Document Design
Step 5: Review Final Design
Verify Entities & Definitions
Verify Relationships & Definitions
Verify Attributes & Definitions
Verify Business Constraints
Approve Schema Design
A Review of the Advantages of
Database Processing
Lower cost (relative it what?)
More Information from same amount of data
Data Sharing is easier
Controlled or elimination of redundancy
Consistency, Integrity, Security
Increased Productivity
Some Disadvantage of
Database Processing
Greater Complexity
Possibly a greater impact of a failure
Recovery is more difficult
Although these are all debated issues,
opportunities for complete failure are often
reduced with the latest database products,
but reliability results in higher investment
costs.