I.
Database Systems and the Life Cycle
Introductory Concepts
data—a fact, something upon which an inference is based (information or knowledge has
value, data has cost)
data item—smallest named unit of data that has meaning in the real world (examples:
last name, address, ssn, political party)
data aggregate (or group) -- a collection of related data items that form a
whole concept; a simple group is a fixed collection, e.g. date (month, day, year);
a repeating group is a variable length collection, e.g. a set of aliases.
record—group of related data items treated as a unit by an application program (examples:
presidents, elections, congresses)
file—collection of records of a single type (examples: president, election)
database—computerized collection of interrelated stored data that serves the needs of
multiple users within one or more organizations, i.e. interrelated collections of
records of potentially many types. Motivation for databases over files: integration for
easy access and update, non-redundancy, multi-access.
database management system (DBMS) -- a generalized software system for
manipulating databases. Includes logical view (schema, sub-schema), physical view
(access methods, clustering), data manipulation language, data definition language,
utilities
- security, recovery, integrity, etc.
database administrator (DBA) -- person or group responsible for the effective use of
database technology in an organization or enterprise. Motivation: control over all phases of
the lifecycle.
Objectives of Database Management
1. Data availability—make an integrated collection of data available to a wide variety
of users
* at reasonable cost—performance in query update, eliminate or control
data redundancy
* in meaningful format—data definition language, data dictionary
* easy access—query language (4GL, SQL, forms, windows, menus);
embedded SQL, etc.; utilities for editing, report generation, sorting
2. Data integrity—insure correctness and validity
* checkpoint/restart/recovery
* concurrency control and multi-user updates
* accounting, audit trail (financial, legal)
3. Privacy (the goal) and security (the means)
* schema/sub-schema, passwords
4. Management control—DBA: lifecycle control, training, maintenance
1
5. Data independence (a relative term) -- avoids reprogramming of applications,
allows easier conversion and reorganization
* physical data independence—program unaffected by changes in the storage structure
or access methods
* logical data independence—program unaffected by changes in the schema
* Social Security Administration example (1980ís)
- changed benefit checks from $999.99 to $9999.99 format
- had to change 600 application programs
- 20,000 work hours needed to make the changes (10 work years)
* Student registration system—cannot go to a 4-digit or hexadecimal course
numbering system because of difficulty changing programs
*Y2K (year 2000) problem—many systems store 2-digit years (e.g. ‘02-OCT-98’) in
their programs and databases, that give incorrect results when used in date arithmetic
(especially subtraction), so that ‘00’ is still interpreted as 1900 rather than 2000. Fixing
this problem requires many hours of reprogramming and database alterations for many
companies and government agencies.
Relational Database Lifecycle
1. Requirements formulation and analysis
* natural data relationships (process-independent)
* usage requirements (process-dependent)
* hardware/software platform (OS, DBMS)
* performance and integrity constraints
* result: requirements specification document, data dictionary entries
2. Logical database design
2.1 ER modeling (conceptual design)
2.2 View integration of multiple ER models
2.3 Transformation of the ER model to SQL tables
2.4 Normalization of SQL tables (up to 3NF or BCNF)
*result: global database schema, transformed to table definitions
3. Physical database design
* index selection (access methods)
* clustering
4. Database distribution (if needed for data distributed over a network)
* data fragmentation, allocation, replication
5. Database implementation, monitoring, and modification
2
3
4
5
Characteristics of a Good Database Design Process
* iterative requirements analysis
- interview top-down
- use simple models for data flow and data relationships
- verify model
* stepwise refinement and iterative re-design
* well-defined design review process to reduce development costs review team
-database designers
-DBMS software group
-end users in the application areas when to review
- after requirements analysis & conceptual design
- after physical design
- after implementation (tuning) meeting format
- short documentation in advance
- formal presentation
- criticize product, not person
- goal is to locate problems, do solutions off line
- time limit is 1-2 hours
6
II. Requirements Analysis
Purpose - identify the real-world situation in enough detail
to be able to define database components. Collect two types of data: natural data (input to
the database) and processing data (output from the database).
Natural data requirements (what goes into the database)
1. Organizational objectives
- sell more cars this year
- move into to recreational vehicle market
2. Information system objectives
- keep track of competitors’ products and prices
- improve quality and timing of data to management regarding production schedule
delays, etc.
- keep track of vital resources needed to produce and market a product
3. Organizational structure/chart
4. Administrative and operational policies
- annual review of employees
- weekly progress reports
- monthly inventory check
- trip expense submission
5. Data elements, relationships, constraints, computing environment
Processing requirements (what comes out of the database)
1. Existing applications - manual, computerized
2. Perceived new applications
* quantifies how data is used by applications
* should be a subset of data identified in the natural
relationships (but may not be due to unforeseen applications)
* problem - many future applications may be unknown
7
Data and Process Dictionary Entries for Requirements Analysis
in the Database Design Lifecycle
Entity Description (possibly in a data dictionary)
Name customer
Reference-no 4201
Cardinality 10,000
Growth rate 100 per month
Synonyms user, buyer
Role (or description) someone who purchases or rents
a product made by the company.
Security level 0 (customer list is public)
Subtypes adults, minors
Key attribute(s) cust-no
Non-key attribute(s) cust-name, addr, phone, payment-status Relationship to
other entities salesperson, order, product
Used in which applications billing, advertising
Attribute description (data elements in a data dictionary)
Name cust-no
Reference-no 4202
Range of legal values 1 to 999,999
Synonyms cno, customer-number
Data type integer
Description customer id number set by the company.
Key or nonkey key
Source of data table of allowable id numbers
Used in applications billing
Attribute trigger /*describes actions that occur when
a data element is queried or
updated*/
Relationship description
Name purchase
Reference-no 511037
Degree binary
Entities and connectivity customer(0,n), product(1,n)
Synonyms buy
Attributes (of the relationship) quantity, order-no
Assertions a customer must have purchased at
least one product, but some products
may not have been purchased as yet
by any customers.
Process (application) description
Name payroll
Reference-no 163
Frequency bi-weekly
Priority 10
Deadline noon Fridays
Data elements used emp-name, emp-salary
Entities used employee
Data volume (how many entities) implicit from entity cardinality
8
Interviews at different levels
Top management - business definition, plan/objectives, future plans
Middle management - functions in operational areas, technical areas, job-titles, job functions
Employees - individual tasks, data needed, data out
Specific end-users of a DBMS - applications and data of interest
Basic rules in interviewing
1. Investigate the business first
2. Agree with the interviewee on format for documentation (ERD, DFD, etc.)
3. Define human tasks and known computer applications
4. Develop and verify the flow diagram(s) and ER diagram(s)
5. Relate applications to data (this helps your programmers)
Example: order entry clerk
Function: Take customer orders and either fill them or make adjustments.
Frequency: daily
Task Def Volume Data Elements
1. Create order 2000 A, B, E, H
2. Validate order 2000 A, B, G, H, J
3. Fill out error form 25 A, C
4. Reserve item/price 6000 A, D, H
5. Request alternate items 75 A, E, I, K,M
6. Enter unit price 5925 A, F, J, N
9
III. Entity-Relationship (ER) Modeling
Basic ER Modeling Concepts
Entity - a class of real world objects having common characteristics and properties about
which we wish to record information.
Relationship - an association among two or more entities
* occurrence - instance of a relationship is the collective instances of the related entities
* degree - number of entities associated in the relationship (binary, ternary, other n-ary)
* connectivity - one-to-one, one-to-many, many-to-many
* existence dependency (constraint) - optional/mandatory
Attribute - a characteristic of an entity or relationship
* Identifier - uniquely determines an instance of an entity
* Identity dependence - when a portion of an identifier is inherited from another entity
* Multi-valued - same attribute having many values for one entity
* Surrogate - system created and controlled unique key (e.g. Oracle’s “create sequence”)
10
11
12
13
Super-class (super-type)/subclass (subtype) relationship
Generalization
* similarities are generalized to a super-class entity, differences are specialized to a subclass
e called an “ISA” relationship (“specialization” is the inverse relationship)
* disjointness constraint - there is no overlap among subclasses
* completeness constraint - constrains subclasses to be all-inclusive of the super-class or
not total or partial coverage of the superclass)
* special property: hierarchical in nature
* special property: inheritance - subclass inherits the primary key of the super-class, super-
cla common nonkey attributes, each subclass has specialized non-key attributes
Aggregation
* “part-of” relationship among entities to a higher type aggregate entity (“contains” is the
inve relationship)
* attributes within an entity, data aggregate (mo-day-year)
* entity clustering variation: membership or “is-member-of” relationship
14