ISC 305: Database Management

● LECTURE ONE

1.0 INTRODUCTION TO DATABASE SYSTEMS

1.1 INTRODUCTION
In this lesson we define the concepts of a database and a database system. The lesson also discusses
the components of a database system and ends with a brief history of database systems.
1.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

1. Define the terms database and database system.
2. Discuss the components of a database system.
3. Explain the history behind database systems.

1.3 What is a Database?


A database is an organized collection of logically related data, managed in such a way as to
enable users or application programs to view the complete collection, or a logical subset of it,
as a single unit.
Answer from Kroenke: An integrated, self-describing collection of related data.

● Integrated: Data is stored in a uniform way, typically all in one place (on a single physical
computer, for example).
● Self-describing: A database maintains a description of the data it contains (the catalog).
● Related: Data has some relationship to other data. In a university, for example,
we have students who take courses taught by lecturers.

By taking advantage of these relationships and of integration, we can provide information to users as
opposed to mere data.
We can also say that the database is a model of what the users perceive. There are three main
categories of models:

1. User or conceptual model: How users perceive the world and/or the business.
2. Logical model: Represents the logic of how a business operates, for example the
relationships between different entities and how data flows through the
organization. It is based on the conceptual model.
3. Physical model: Represents how the database is actually implemented on a computer
system. This is based on the logical model.

1.4 Database systems


A database system is much more than a database or a database management system. A database
system has four major components:

● Database
● Hardware
● Software
● Users

Database
The data in the database is expected to be both integrated and shared, particularly on multi-user
systems.
Contents of a database:
1) User data
2) Metadata
3) Indexes
4) Application metadata
User data: This is the data users work with directly by entering, updating and viewing it.
Metadata: Data about data, i.e. data that describes how user data are stored, in terms of table
names, column names, data types, lengths, primary keys, etc.
Metadata are typically stored in system tables (the system catalog) and are typically directly
accessible only by the DBMS or by the system administrator.
Indexes: These provide an alternate means of accessing user data, supporting sorting and searching.
They allow the database to access a record without having to search through the entire table.
Application metadata: This is a storage facility for forms, reports, queries and other application
components. It can be accessed via database development programs.
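The self-describing nature of a database can be sketched with Python's built-in sqlite3 module, whose system catalog is a table called sqlite_master (the student table below is a made-up example):

```python
import sqlite3

# In-memory database with one hypothetical user table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (reg_no TEXT PRIMARY KEY, name TEXT)")

# The DBMS records metadata (table name, defining DDL) in its system
# catalog; in SQLite that catalog is the sqlite_master table.
rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()
```

Here `rows` pairs each table name with the DDL that defines its columns and keys, which is exactly the kind of metadata described above.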
Hardware: The hardware consists of the secondary storage on which the database physically
resides, together with the associated I/O devices, device controllers, etc.
Software: This is the database management system (DBMS), software used to build, maintain
and control database systems. All requests from the user to the database are handled by the
DBMS.
Types of DBMSs:
1) Relational DBMS
2) Hierarchical DBMS
3) Network DBMS
4) OO DBMS
5) Others
Users
There are five categories of users:

1. Application programmers: responsible for writing the application programs that use the
database.
2. Database designers: responsible for designing the logical model and the databases.
3. Database administrators (DBAs): technical people responsible for development of the
total system.
4. Data administrators: business managers responsible for controlling the overall corporate
data resources.
5. End users: interact with the system from an online terminal, for example using query
languages.

1.5 Brief history of database systems

● 1940s–50s: Initial use of computers as calculators. Limited data; focus on algorithms.
Scientific and military applications.
● 1960s: Business uses. Organizational data: customers, sales, inventory, accounting, etc.
File-system based, with heavy emphasis on application programs to extract and assimilate data.
● 1970s: The relational model. Data separated into individual tables, related by keys. Examples:
Oracle, Sybase, Informix, Digital Rdb, IBM DB2.
● 1980s: Microcomputers (the IBM PC). Database programs such as dBase, FoxPro, Paradox and
MS Access let individual users create and maintain small databases.
● Late 1980s: Local area networks. Workgroups share resources such as files, printers and
e-mail. In the client/server model, the database resides on a central server and application
programs run on client PCs attached to the server over a LAN.
● 1990s: The Internet and World Wide Web make databases of all kinds available from a single
type of client: the web browser.

Other types of databases:

● Object-oriented database systems: objects (data and methods) stored persistently.
● Distributed database systems: copies of data reside at different locations, for redundancy or
for performance reasons.


● LECTURE TWO
2.0 FILE SYSTEMS VS DATABASE SYSTEM
2.1 INTRODUCTION
In this lesson we define the concept of a file system and provide a comparison between file systems
and database systems. The lesson also discusses the advantages and disadvantages of database
systems versus file systems. The concept of a Database Management System (DBMS) is also
introduced, and the components of a DBMS and its advantages are discussed.
2.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

1. Define the term file system and discuss its advantages and disadvantages.
2. Compare file systems and database systems in terms of advantages and disadvantages.
3. Discuss the components of a DBMS.
4. Discuss the advantages of a DBMS.

2.3 What is a file system?


Files are collections of similar records. In a conventional file system, data storage is built
around the particular application that uses it.
File Processing Systems

● Storing data in individual files is a very old, but still often used, approach to system
development.
● Each program (system) often had its own unique set of files.

What is a file?

● A file is a collection of data about a single entity.


● Files are typically designed to meet needs of a particular department or user group.
● Files are also typically designed to be part of a particular computer application.

Advantages of files:

● They are relatively easy to design and implement, since they are normally based on a single
application or information system.
● Processing speed is faster than with other ways of storing data.
Disadvantages:

● Program-data dependence.
● Duplication of data.
● Limited data sharing.
● Lengthy program and system development time.
● Excessive program maintenance when the system changes.
● Duplication of data items in multiple files, which adds cost to input, maintenance and
storage, and can cause data integrity problems.
● Inflexibility and non-scalability: since conventional files are designed to support a single
application, the original file structure cannot support new requirements.

NOTE:
Today, the trend is in favor of replacing file-based systems and applications with database
systems.
2.4 Database Approach
A database is more than a file: it contains information about more than one entity and
information about relationships among the entities. Data about a single entity (e.g., Product,
Customer, Customer Order, Department) are each stored in a "table" in the database.
Databases are designed to meet the needs of multiple users and to be used in multiple
applications.
One significant development in the more user-friendly relational DBMS products is that users
can sometimes get their own answers from the stored data by learning to use data querying
methods.
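A sketch of such ad hoc querying, using Python's sqlite3 module as a stand-in for a relational DBMS (the tables, customers and amounts are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL,
                         FOREIGN KEY (customer_id) REFERENCES customer(id));
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# An ad hoc query a user could pose without any application program:
# total order value per customer, across both tables.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
conn.close()
```

Because the data for customers and orders lives in one database, a question spanning both entities needs only a query, not a new program.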

Advantages of a database system:

● Program-data independence.
● Minimal data redundancy, improved data consistency, enforcement of standards, improved
data quality.
● Improved data sharing, accessibility and responsiveness.
● Increased productivity of application development.
● Reduced program maintenance.
● Data can be shared by many applications and systems.
● Data are stored in flexible formats; with data independence and a well-designed database,
users can access different combinations of the same data for query and report purposes.
● Reduced redundancy.
2.5 Database Management Systems
Definition: software which enables users to create, manipulate and control access to a database.
Functions:

● Provides a Data Definition Language (DDL): specifies data types, structures and
constraints.
● Provides a Data Manipulation Language (DML): insert, delete, update and retrieve data.
● Provides controlled access to the database:

1. Security system
2. Integrity system
3. Concurrency control system
4. Recovery control system
5. User-accessible catalog

● Provides a view mechanism

Views allow different users to have their own view of the data, hiding complexity from users.
Views are created by the DDL and provide mechanisms for: security; customizing the appearance
of the database; a consistent, unchanging picture of the structure of the database; program-data
independence.
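A minimal sketch of these three facilities (DDL, DML and a view) through SQLite in Python; the employee table and its values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a structure with types and a constraint.
conn.execute("""CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary > 0))""")

# DML: insert, update, retrieve.
conn.execute("INSERT INTO employee VALUES (1, 'Mary', 40000)")
conn.execute("UPDATE employee SET salary = 45000 WHERE emp_no = 1")

# View mechanism: expose only a subset of the data (hide salary).
conn.execute("CREATE VIEW staff_names AS SELECT emp_no, name FROM employee")
names = conn.execute("SELECT name FROM staff_names").fetchall()
conn.close()
```

A user querying `staff_names` sees employee names but never the salary column, which illustrates the security role of views.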
NOTE:
The functionality offered by a DBMS:
differs from one product to another, e.g. shareability, graphics or video support;
is never static, e.g. MS Access 97 vs MS Access 2000.
Components of a DBMS

● A DBMS is highly complex and sophisticated software.
● It is partitioned into several software components (modules), each assigned specific
functions.
● It is supported by the underlying OS, which provides basic services.
● The design must take into account the interface between the DBMS and the OS.

Software components
● Query processor:

Process queries

● Database manager:

Interfaces with user-submitted application programs and queries. It has the following components:
Authorization control: checks the authorizations of users.
Command processor: carries out the operations.
Integrity checker: checks that a request satisfies all necessary integrity constraints.
Query optimizer: determines an optimal strategy for query execution.
Transaction manager: performs the required transaction processing.
Scheduler: controls concurrent operations.
Recovery manager: ensures a consistent state of the database in case of failure.
Buffer manager: transfers data between main memory and secondary storage.

● File manager:

Manages the physical storage of data

● DML processor:

Converts DML statements into standard functions

● DDL Compiler:

Converts DDL statements into a set of tables containing metadata

● Catalog manager:

Manages access to and maintains system catalog

Advantages of using a DBMS


There are three main features of a database management system that make it attractive to use a
DBMS in preference to more conventional software. These features are centralized data
management, data independence, and systems integration.
In a database system, the data is managed by the DBMS and all access to the data is through the
DBMS providing a key to effective data processing. This contrasts with conventional data
processing systems where each application program has direct access to the data it reads or
manipulates. In a conventional DP system, an organization is likely to have several files of
related data that are processed by several different application programs.
In the conventional data processing application programs, the programs usually are based on a
considerable knowledge of data structure and format. In such environment any change of data
structure or format would require appropriate changes to the application programs. These
changes could be as small as the following:

1. Coding of some field is changed. For example, a null value that was coded as -1 is now
coded as -9999.
2. A new field is added to the records.
3. The length of one of the fields is changed. For example, the maximum number of digits
in a telephone number field or a postcode field needs to be changed.
4. The field on which the file is sorted is changed.

If some major changes were to be made to the data, the application programs may need to be
rewritten. In a database system, the database management system provides the interface between
the application programs and the data. When changes are made to the data representation, the
metadata maintained by the DBMS is changed but the DBMS continues to provide data to
application programs in the previously used way. The DBMS handles the task of transformation
of data wherever necessary.
This independence between the programs and the data is called data independence. Data
independence is important because, every time some change needs to be made to the data
structure, the programs that were being used before the change continue to work. To
provide a high degree of data independence, a DBMS must include a sophisticated metadata
management system.
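Data independence can be sketched with SQLite: a program that names only the columns it needs keeps working after the stored representation changes (a new field is added to the records), because the DBMS mediates all access. All names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")

def fetch_names(c):
    # An "application program" that asks only for the columns it needs.
    return c.execute("SELECT name FROM customer").fetchall()

before = fetch_names(conn)

# The stored representation changes: a new field is added to the records.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")

after = fetch_names(conn)  # the unchanged program still works
conn.close()
```

Had the program depended on the physical record layout, as in a conventional file system, the schema change would have forced it to be rewritten.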
In a DBMS, all files are integrated into one system, thus reducing redundancies and making data
management more efficient. In addition, a DBMS provides centralized control of the operational
data. Some of the advantages of data independence, integration and centralized control are:
Redundancies and inconsistencies can be reduced
In conventional data systems, an organization often builds a collection of application programs
often created by different programmers and requiring different components of the operational
data of the organization. The data in conventional data systems is often not centralized. Some
applications may require data to be combined from several systems. These several systems could
well have data that is redundant as well as inconsistent (that is, different copies of the same data
may have different values). Data inconsistencies are often encountered in everyday life. For
example, we have all come across situations when a new address is communicated to an
organization that we deal with (e.g. a bank, or Telecom, or a gas company), we find that some of
the communications from that organization are received at the new address while others continue
to be mailed to the old address. Combining all the data in a database would involve reduction in
redundancy as well as inconsistency. It also is likely to reduce the costs for collection, storage
and updating of data.
Better service to the Users
A DBMS is often used to provide better service to the users. In conventional systems, availability
of information is often poor, since it normally is difficult to obtain information that the existing
systems were not designed for. Once several conventional systems are combined into one
centralized database, the availability of information and its up-to-dateness are likely to improve,
since the data can now be shared and the DBMS makes it easy to respond to unforeseen
information requests. Centralizing the data in a database also often means that users can obtain
new and combined information that would have been impossible to obtain otherwise. Also, use
of a DBMS should allow users who do not know programming to interact with the data more
easily.
The ability to quickly obtain new and combined information is becoming increasingly important
in an environment where various levels of governments are requiring organizations to provide
more and more information about their activities. An organization running a conventional data
processing system would require new programs to be written (or the information compiled
manually) to meet every new demand.
Flexibility of the system is improved
Changes are often necessary to the contents of data stored in any system. These changes are
more easily made in a database than in a conventional system in that these changes do not need
to have any impact on application programs.
Cost of developing and maintaining systems is lower
As noted earlier, it is much easier to respond to unforeseen requests when the data is centralized
in a database than when it is stored in conventional file systems. Although the initial cost of
setting up of a database can be large, one normally expects the overall cost of setting up a
database and developing and maintaining application programs to be lower than for similar
service using conventional systems since the productivity of programmers can be substantially
higher in using non-procedural languages that have been developed with modern DBMS than
using procedural languages.
Standards can be enforced
Since all access to the database must be through the DBMS, standards are easier to enforce.
Standards may relate to the naming of the data, the format of the data, the structure of the data
etc.
Security can be improved
In conventional systems, applications are developed in an ad hoc manner. Often different systems
of an organization access different components of the operational data. In such an
environment, enforcing security can be quite difficult. Setting up a database makes it easier to
enforce security restrictions, since the data is now centralized. It is easier to control who has
access to what parts of the database. However, setting up a database can also make it easier for a
determined person to breach security. We will discuss this in the next section.
Integrity can be improved
Since the data of the organization using a database approach is centralized and would be used by
a number of users at a time, it is essential to enforce integrity controls. Integrity may be
compromised in many ways. For example, someone may make a mistake in data input and the
salary of a full-time employee may be input as Ksh4,000 rather than Ksh40,000. A student may
be shown to have borrowed books but has no enrolment. Salary of a staff member in one
department may be coming out of the budget of another department. If a number of users are
allowed to update the same data item at the same time, there is a possibility that the result of the
updates is not quite what was intended. For example, in an airline DBMS we could have a
situation where the number of bookings made is larger than the capacity of the aircraft that is to
be used for the flight. Controls must therefore be introduced to prevent such errors from
occurring as a result of concurrent updating activities.
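The salary-entry example can be sketched as a DBMS-enforced integrity constraint; here a hypothetical CHECK rule in SQLite rejects the mistaken input:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical business rule: no full-time salary below Ksh10,000.
conn.execute("""CREATE TABLE staff (
    id     INTEGER PRIMARY KEY,
    salary REAL CHECK (salary >= 10000))""")

conn.execute("INSERT INTO staff VALUES (1, 40000)")  # valid entry

# A data-entry mistake (4,000 instead of 40,000) is rejected by the DBMS,
# not by each application program separately.
try:
    conn.execute("INSERT INTO staff VALUES (2, 4000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
conn.close()
```

Because the constraint lives in the centralized database, every application and every user is subject to it automatically.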
2.6 FURTHER READING
T. Connolly, C. Begg, A. Strachan: Database Systems: A Practical Approach to Design,
Implementation, and Management, 3rd Edition (2002), Addison-Wesley, Chapter 2.

● LECTURE THREE

3.0 DATA MODELS


3.1 INTRODUCTION
The lecture introduces the different data models which can be applied in the design of a
database. Four data models (network, hierarchical, relational and object-oriented) are discussed
in detail. The advantages and disadvantages of each are discussed.
3.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

1. Define the term data model and explain how the following data models work: network,
hierarchical, relational and object-oriented.
2. Discuss the advantages and disadvantages of the above data models.

3.3 Introduction
Data models are used for designing a database. Before the data available in an enterprise can be
put in a DBMS, an overall abstract view of the enterprise data must be developed. The view can
then be translated into a form that is acceptable by the DBMS. Although at first sight the task
may appear trivial, it is often a very complex process. The complexity of mapping the enterprise
data to a database management system can be reduced by dividing the process into two phases.
The first phase as noted above involves building an overall view (or model) of
the real world which is the enterprise (often called the logical database design). The objective
of the model is to represent, as accurately as possible, the information structures of the enterprise
that are of interest. This model is sometimes called the enterprise conceptual schema and the
process of developing the model may be considered as the requirements analysis of the database
development life cycle. This model is then mapped in the second phase to the user schema
appropriate for the database system to be used. The logical design process is an abstraction
process which captures the meaning of the enterprise data without getting involved in the details
of individual data values. The figure below shows these two steps.

Fig 3.1: Data Modeling Steps


The process is somewhat similar to modeling in the physical sciences. In logical database design,
similar to modeling in the physical sciences, a model is built to enhance understanding and
abstracting details. A model cannot be expected to provide complete knowledge (except for very
simple situations) but a good model should provide a reasonable interpretation of the real-life
situation. For example, a model of employee data in an enterprise may not be able to capture the
knowledge that employees May Adams and John Adams are married to each other.
We accept such imperfections of the model since we want to keep the model as simple as
possible. It is clearly impossible (and possibly undesirable) to record every available piece of
information that is available in an enterprise. We only desire that the meaning captured by a data
model should be adequate for the purpose that the data is to be used for. The person organizing
the data to set up the database not only has to model the enterprise but also has to consider the
efficiency and constraints of the DBMS although the two phase process separates those two
tasks. Nevertheless, we only consider data models that can be implemented on computers. When
the database is complex, the logical database design phase may be very difficult. A number of
techniques are available for modeling data but we will discuss only one such technique that is,
we believe, easy to understand. The technique uses a convenient representation that enables the
designer to view the data from the point of view of the whole enterprise. This well-known
technique is called the entity-relationship model.
3.4 Types of database models
A DBMS can take any one of several approaches to managing data; each approach constitutes
a database model. A data model is a collection of descriptions of data structures and their
contained fields, together with the operations or functions that manipulate them. It is
a comprehensive scheme for describing how data is to be represented for manipulation by
humans or computer programs. A thorough representation details the types of data, the
topological arrangements of data, spatial and temporal maps onto which data can be projected,
and the operations and structures that can be invoked to handle data and its maps. The main
database models are the following:
● Relational – data model based on tables.
● Network – data model based on graphs with records as nodes and relationships
between records as edges.
● Hierarchical – data model based on trees.
● Object-Oriented – data model based on the object-oriented programming paradigm.

Hierarchical Model
In a hierarchical model, links between record types are expressed as parent-child relationships:
a 1:N mapping between record types. This is done using trees, borrowed from mathematics (as
set theory was for the relational model). For example, an organization might store information
about an employee, such as name, employee number, department and salary. The organization
might also store information about an employee's children, such as name and date of birth. The
employee and children data form a hierarchy, where the employee data represents the parent
segment and the children data represent the child segments. If an employee has three children,
then there would be three child segments associated with one employee segment. In a
hierarchical database the parent-child relationship is one-to-many. This restricts a child segment
to having only one parent segment. Hierarchical DBMSs were popular from the late 1960s, with
the introduction of IBM's Information Management System (IMS) DBMS, through the 1970s.
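The employee/children hierarchy can be sketched in ordinary Python, with each parent segment owning its child segments (all names and dates are invented for illustration):

```python
# One parent (employee) segment owning its child (dependant) segments.
# Each child belongs to exactly one parent, so the database forms a tree.
employee = {
    "name": "J. Otieno", "emp_no": 101, "dept": "Sales", "salary": 52000,
    "children": [  # three child segments under one employee segment
        {"name": "Amina", "dob": "2010-04-12"},
        {"name": "Brian", "dob": "2013-09-30"},
        {"name": "Carol", "dob": "2016-01-05"},
    ],
}

# Navigation is strictly top-down, from parent segment to child segments.
child_names = [c["name"] for c in employee["children"]]
```

The one-to-many restriction is visible in the structure itself: a child record is reachable only through its single parent, which is also why queries that do not start from the root are awkward in hierarchical systems.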

Advantages

● Simplicity
● Data Security and Data Integrity
● Efficiency

Disadvantages

● Implementation Complexity
● Lack of structural independence
● Programming complexity

Network data model


A member record type in the network model can have that role in more than one set; hence the
multi-parent concept is supported. An owner record type can also be a member or owner in
another set. The data model is a simple network, and link and intersection record types (called
junction records by IDMS) may exist, as well as sets between them.
Thus, the complete network of relationships is represented by several pairwise sets; in each set
one record type is the owner (at the tail of the network arrow) and one or more record types
are members (at the head of the relationship arrow). Usually a set defines a 1:M relationship,
although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.

Advantages

● Conceptual Simplicity
● Ease of data access
● Data Integrity and capability to handle more relationship types
● Data independence
● Database standards

Disadvantages

● System complexity
● Absence Of Structural Independence

Relational Model
This is a database model that organizes data logically into tables. It is a formal theory of data
consisting of three major components: (a) a structural aspect, meaning that data in the database
is perceived as tables, and only tables; (b) an integrity aspect, meaning that those tables satisfy
certain integrity constraints; and (c) a manipulative aspect, meaning that the tables can be
operated upon by means of operators which derive tables from tables. Each table corresponds
to an application entity and each row represents an instance of that entity. A database based on
the relational model is managed by a relational database management system (RDBMS). The
relational model was developed by E.F. Codd. A relational database allows the definition of
data structures, storage and retrieval operations, and integrity constraints. In such a database
the data and the relations between them are organized in tables. A table is a collection of
records, and each record in a table contains the same fields.
Properties of Relational Tables:

● Values Are Atomic


● Each Row is Unique
● Column Values Are of the Same Kind
● The Sequence of Columns is Insignificant
● The Sequence of Rows is Insignificant
● Each Column Has a Unique Name
Certain fields may be designated as keys, which means that searches for specific values of those
fields will use indexing to speed them up. Often, but not always, the joined fields will have the
same name in both tables. For example, an "orders" table might contain (customer-ID,
product-code) pairs and a "products" table might contain (product-code, price) pairs, so to
calculate a given customer's bill you would sum the prices of all products ordered by that
customer by joining on the product-code fields of the two tables. This can be extended to
joining multiple tables on multiple fields. Because these relationships are only specified at
retrieval time, relational databases are classed as dynamic database management systems. The
relational model is based on relational algebra.
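The orders/products billing example can be run directly in SQLite; the customer IDs, product codes and prices are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (customer_id TEXT, product_code TEXT);
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL);
    INSERT INTO products VALUES ('P1', 100.0), ('P2', 250.0);
    INSERT INTO orders   VALUES ('C1', 'P1'), ('C1', 'P2'), ('C2', 'P1');
""")

# The relationship between the two tables is specified only at retrieval
# time, by joining on the shared product_code field.
bill = conn.execute("""
    SELECT SUM(p.price)
    FROM orders o JOIN products p ON o.product_code = p.product_code
    WHERE o.customer_id = 'C1'
""").fetchone()[0]
conn.close()
```

Customer C1 ordered P1 and P2, so the join sums their prices to produce the bill; no link between the tables had to be declared in advance.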
Advantages

● Structural independence
● Conceptual simplicity
● Ease of design, implementation, maintenance and usage
● Ad hoc query capability

Disadvantages

● Hardware overheads
● Ease of design can lead to bad design

The relational model is the most important in today's world, so we will spend most of our time
studying it. Some people today question whether the relational model is too simple, that is,
insufficiently rich to express complex data types.
NOTE:
Database theory evolved from file processing.
Object Oriented Data Models
Object DBMSs add database functionality to object programming languages. They bring much
more than persistent storage of programming language objects. Object DBMSs extend the
semantics of the C++, Smalltalk and Java object programming languages to provide full-featured
database programming capability, while retaining native language compatibility. A major benefit
of this approach is the unification of the application and database development into a seamless
data model and language environment. As a result, applications require less code, use more
natural data modeling, and code bases are easier to maintain. Object developers can write
complete database applications with a modest amount of additional effort.
In contrast to a relational DBMS where a complex data structure must be flattened out to fit into
tables or joined together from those tables to form the in-memory structure, object DBMSs have
no performance overhead to store or retrieve a web or hierarchy of interrelated objects. This one-
to-one mapping of object programming language objects to database objects has two benefits
over other storage approaches: it provides higher performance management of objects, and it
enables better management of the complex interrelationships between objects. This makes object
DBMSs better suited to support applications such as financial portfolio risk analysis systems,
telecommunications service applications, World Wide Web document structures, design and
manufacturing systems, and hospital patient record systems, which have complex relationships
between data.
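The core idea of persisting programming-language objects without flattening them into tables can be sketched with Python's standard shelve module; this illustrates only the persistence concept, not a full-featured object DBMS, and all names are hypothetical:

```python
import os
import shelve
import tempfile

class Account:
    """An ordinary in-memory object of the programming language."""
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

path = os.path.join(tempfile.mkdtemp(), "objects")

# Store the object persistently, keeping its in-memory structure;
# no decomposition into rows and columns is needed.
with shelve.open(path) as db:
    db["acct-1"] = Account("Ann", 500.0)

# Retrieve it later as the same kind of object, ready to use.
with shelve.open(path) as db:
    acct = db["acct-1"]
```

A relational DBMS would instead require mapping `Account` to a table and reassembling the object on every read; avoiding that round trip is the performance benefit the section describes.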
Advantages

● Capability to handle a large number of different data types
● Marriage of object-oriented programming and database technology
● Improved data access

Disadvantages

● Difficult to maintain
● Not suited for all applications

Some current and future trends


1. Object-oriented databases.
Can we apply object-oriented theory to databases? Some commercial object-oriented systems
have been developed, among them O2, Objectivity, and POET. None of these is dominant or
particularly successful, and there is no standard for OO databases. One trend is toward
"object-relational" systems (such as Oracle 8i and 9i); Oracle has added object-oriented features
to its existing relational system.
2. Multimedia data.
Traditional DBMSs handled records with fields that were numbers or character strings, a few
hundred characters long at most. Now, DBMSs must handle picture and sound files, HTML
documents, etc.
3. Data warehousing and data mining.
Data warehousing maintains storage of large amounts of data, often historical or from legacy
systems, to support planning and analysis. Data mining is the extraction of relevant data from a
data warehouse and the search for trends in it.

● LECTURE FOUR

4.0 AN OVERVIEW OF DATABASE DESIGN METHODOLOGY


4.1 INTRODUCTION
The lecture discusses the different stages involved in the development of a database: conceptual
design, logical design and physical design. It discusses in detail what is entailed in each of the
three levels.
4.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

1. Explain the concepts of conceptual design, logical design and physical design.
2. Explain what is required in conceptual design, logical design and physical design.

4.3 Design Methodology


A design methodology is a structured approach that uses procedures, techniques, tools, and
documentation aids to support and facilitate the process of design.
A database design methodology includes three main phases:
conceptual database design, logical database design and physical database design.
4.4 Conceptual database design:

● Is the process of constructing a model of the information used in an enterprise, independent
of all physical considerations, i.e. independent of implementation details such as the target
DBMS software, application programs, programming languages, hardware platform etc.
● It involves building a local conceptual data model for each view as follows:

1. Identify entity types
2. Identify relationship types
3. Identify and associate attributes with entity or relationship types
4. Determine attribute domains
5. Determine candidate and primary key attributes
6. Specialize/generalize entity types (optional step)
7. Draw the entity-relationship diagram
8. Review the local conceptual data model with the user
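The outputs of these steps can be captured in a simple, machine-checkable form. The sketch below (not part of the lecture; entity names, attributes and domains are invented for a hypothetical Staff/Branch view) records entity types, attribute domains, candidate and primary keys, and a relationship type:

```python
# A minimal sketch of recording the outputs of the conceptual design steps
# for a hypothetical Staff/Branch view. All names are illustrative.

# Steps 1-2: entity types (relationship types recorded separately below)
entities = {
    "Staff": {
        # Steps 3-4: attributes associated with the entity, and their domains
        "attributes": {
            "staff_no": "string",
            "name": "string",
            "salary": "decimal",
        },
        # Step 5: candidate keys identified, and the primary key chosen
        "candidate_keys": [["staff_no"]],
        "primary_key": ["staff_no"],
    },
    "Branch": {
        "attributes": {"branch_no": "string", "city": "string"},
        "candidate_keys": [["branch_no"]],
        "primary_key": ["branch_no"],
    },
}

relationships = [
    # A Branch has many Staff; each Staff member works at one Branch (1:m)
    {"name": "WorksAt", "entities": ("Branch", "Staff"), "cardinality": "1:m"},
]

def primary_key(model, entity):
    """Return the primary key attributes chosen for an entity type."""
    return model[entity]["primary_key"]
```

A model recorded this way can be reviewed with the user (step 8) and later fed into the logical design phase.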

4.5 Logical database design for the relation model

● Is the process of constructing a model of the information used in an enterprise based on a
specific data model, but independent of a particular DBMS and other physical
considerations.
● It involves building and validating a logical data model for each user view as follows:

1. Map the local conceptual data model to a logical data model
2. Derive relations from the logical data model
3. Validate the model using normalization
4. Validate the model against user transactions
5. Draw the entity-relationship diagram
6. Define integrity constraints
7. Review the logical data model with the user
8. Build and validate the global logical data model, i.e. merge the local logical data models
into a global model
9. Validate the global logical data model
10. Check for future growth
11. Draw the final entity-relationship diagram
12. Review the global logical data model with users

4.6 Physical database design

● Is the process of producing a description of the implementation of the database on
secondary storage.
● It describes the storage structures and the access methods used to achieve efficient access to
the data.
● Translate the global logical data model for the target DBMS: design base relations and
enterprise constraints for the target DBMS.
● Design the physical representation: analyze transactions, choose file organizations and
choose secondary indexes.
● Consider the introduction of controlled redundancy and estimate the required disk space.
● Design security mechanisms: design user views and design access rules.
● Monitor and tune the operational system.
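To make the physical design phase concrete, the sketch below (illustrative only; the table, columns and index are invented) translates a small logical relation into one target DBMS, SQLite, including a base relation, an enterprise constraint, and a secondary index chosen after analyzing the transactions that search by branch:

```python
# Illustrative physical design for one target DBMS (SQLite):
# a base relation, an enterprise constraint and a secondary index.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staff (
        staff_no  TEXT PRIMARY KEY,          -- primary key from logical design
        name      TEXT NOT NULL,
        salary    REAL CHECK (salary >= 0),  -- enterprise constraint
        branch_no TEXT NOT NULL
    )
""")
# Secondary index: chosen because transaction analysis showed frequent
# lookups of staff by branch.
conn.execute("CREATE INDEX idx_staff_branch ON staff(branch_no)")

conn.execute("INSERT INTO staff VALUES ('S1', 'Ann', 30000.0, 'B1')")
rows = conn.execute(
    "SELECT name FROM staff WHERE branch_no = 'B1'"
).fetchall()
```

The same logical model would be translated differently for another DBMS; choices such as file organization and controlled redundancy depend on the target system and its workload.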

4.7 FURTHER READING


T. Connolly, C. Begg, A. Strachan: Database Systems: A Practical Approach to Design,
Implementation, and Management (2002), 3rd Edition, Addison-Wesley, Chapter 4.

● LECTURE FIVE

5.0 ENTITY RELATIONSHIP MODEL


5.1 INTRODUCTION
The lecture discusses the concepts related to entity relationship modeling which include: Entity
relationship model, entities, entity sets, relationships, attributes degree of relationships, types of
relationships. Relevant examples are provided for each concept.
5.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:
Define and give a relevant example for an entity relationship model, entities, entity sets,
relationships, attributes, degree of relationships and types of relationships.
5.3 What is Entity-Relationship Model?
The entity-relationship model is useful because it facilitates communication between the database
designer and the end user during requirements analysis. To facilitate such communication the
model has to be simple and readable for the database designer as well as the end user.
Entity-Relationship Model enables an abstract global view (that is, the enterprise conceptual
schema) of the enterprise to be developed
without any consideration of efficiency of the ultimate database.
The entity-relationship model views an enterprise as being composed of entities that
have relationships between them. To develop a database model based on the E-R technique
involves identifying the entities and relationships of interest. These are often displayed in an E-
R diagram. Before discussing these, we need to present some definitions.
5.4 Entities and Entity Sets
An entity is a person, place, or a thing or an object or an event which can be
distinctly identified and is of interest. A specific student, for example, John Smith with
student number 84002345 or a subject Database Management Systems with subject number
CP3010 or an institution called James Cook University are examples of entities.
Entity Sets: Entities are classified into different entity sets (or entity types).
An entity set is a set of entity instances or entity occurrences of the same type.
For example, all employees of a company may constitute an entity set employee and all
departments may belong to an entity set Dept.
An entity may belong to one or more entity sets. For example,
some Company employees may belong to employee as well as other entity sets, for example,
managers. Entity sets therefore are not always disjoint.
A database is a collection of interrelated information about an enterprise. A database
therefore consists of a collection of entity sets; it also includes information about relationships
between the entity sets. We discuss the relationships now.
5.5 Relationships, Roles and Relationship Sets
Relationships are associations or connections that exist between entities and may be looked at
as mappings between two or more entity sets. A relationship set therefore is a subset of the cross
product of the entity sets involved in the relationship.
The associated entities may be of the same type or of different types. For example, working-for
is a relationship between an employee and a company. Supervising is a relationship between
two entities belonging to the same entity set (employee).
A relationship set R is a set of relationships of the same type. It may be looked at as a
mathematical relation among n entities, each taken from an entity set, not necessarily distinct.
Although we allow relationships among any number of entity sets, the most common cases
are binary relationships between two entity sets. The degree of a relationship is the number of
entities associated in the relationship. Unary, binary and ternary relationships therefore have
degree 1, 2 and 3 respectively.
We consider a binary relationship R between two entity types E1 and E2. The relationship may be
considered as two mappings E1 -> E2 and E2 -> E1. It is important to consider the constraints on
these two mappings. It may be that one object from E1 is associated with exactly one object
in E2, or that any number of objects in E1 may be associated with any number of objects in E2.
Often the relationships are classified as one-to-one, one-to-many, or many-to-many.


1. If for each employee there is at most one office and for each office there is at most one
employee, the relationship is one-to-one.
2. If an office may accommodate more than one employee but an employee has at most
one office, the relationship between office and employees is one-to-many.
3. If an office may accommodate more than one staff member and a staff member may be
assigned more than one office, the relationship is many-to-many. For example, an
engineer may have one office in the workshop and another in the design office, and each
design office may accommodate more than one staff member.
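The three classifications above can be checked mechanically from a sample of relationship instances. The sketch below (names and data invented for illustration) classifies a binary relationship from a set of (employee, office) pairs:

```python
# Classify a binary relationship as 1:1, 1:m or m:n from instance pairs.

def classify(pairs):
    """Classify a binary relationship given (left, right) instance pairs."""
    left_map, right_map = {}, {}
    for left, right in pairs:
        left_map.setdefault(left, set()).add(right)
        right_map.setdefault(right, set()).add(left)
    # Does any entity on either side map to more than one partner?
    left_many = any(len(v) > 1 for v in left_map.values())
    right_many = any(len(v) > 1 for v in right_map.values())
    if left_many and right_many:
        return "m:n"
    if left_many or right_many:
        return "1:m"
    return "1:1"

# Each employee has one office and each office one employee -> 1:1
one_to_one = classify([("ann", "o1"), ("bob", "o2")])
# One office accommodates several employees -> 1:m
one_to_many = classify([("ann", "o1"), ("bob", "o1")])
# An engineer with two offices, an office shared by two staff -> m:n
many_to_many = classify([("ann", "o1"), ("ann", "o2"), ("bob", "o1")])
```

In practice the cardinality is a business rule stated by the enterprise, not something inferred from sample data, but a check like this is useful for validating test data against a stated rule.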

5.6 Attributes, Values and Value Sets


As we have noted earlier, our database model consists of entities and relationships. Since these
are the objects of interest, we must have some information about them that is of interest. The
information of interest for each object is likely to be an object name or identifier and the object's
properties. The object name or identifier enables us to identify each object uniquely. Each object is
described by properties and their values at some given time. Each entity instance in an entity set
is described by a set of attributes that describe their qualities, characteristics or properties that
are relevant and the values of the attributes. Relationships may also have relevant information
about them. An employee in an employee entity set is likely to be described by its attributes
like employee number, name, date of birth, etc. A relationship like enrolment might have
attributes like the date of enrolment. A relationship shipment between suppliers and parts may
have date of shipment and quantity shipped as attributes. For each attribute there are a number
of legal values applicable, which form the value sets. For example:
1. EMPLOYEE - the set of all employees at a company. The attributes of Employees are: name,
employee number, address, telephone number, salary.
2. DEPT - the set of all departments at the company. The attributes are: department
number, department name, supervisor number.
3. PROJECTS - the set of all projects at the company. The attributes are: project number, project
name, project manager number.
5.7 Representing Entities and Relationships
It is essential that we be able to identify each entity instance in every entity set and each
relationship instance in every relationship set. Since entities are represented by the values
of their attribute set, it is necessary that the set of attribute values be different for each
entity instance. Sometimes an artificial attribute may need to be included in the attribute set to
simplify entity identification (for example, although each instance of the entity set student can be
identified by the values of its attributes student name, address and date of birth, it is convenient
to include an artificial attribute student number to make identification of each entity instance
easier).
A group of attributes (possibly one) used for identifying entities in an entity set is called
an entity key. For example, for the entity set student, student number is an entity key and so is
(student name, address, date of birth). Of course, if k is a key then so must be each superset of k.
To reduce the number of possible keys, it is usually required that a key be minimal. For
example, student name by itself could not be considered an entity key since two students could
have the same name. When several minimal keys exist (such keys are often called candidate
keys), any semantically meaningful key is chosen as the entity primary key.
Similarly each relationship instance in a relationship set needs to be identified. This
identification is always based on the primary keys of the entity sets involved in the relationship
set. The primary key of a relationship set is the combination of the attributes that form the
primary keys of the entity sets involved. In addition to the entity identifiers, the relationship key
also (perhaps implicitly) identifies the role that each entity plays in the relationship.
Let employee number be the primary key of an entity set EMPLOYEE and company name be the
primary key of COMPANY. The primary key of relationship set WORKS-FOR is then
(employee number, company name). The role of the two entities is implied by the order in which
the two primary keys are specified.
In certain cases, the entities in an entity set cannot be uniquely identified by the values of their
own attributes. For example, children of an employee in a company may have names that are not
unique, since a child of one employee is quite likely to have a name that is the same as the name of
a child of some other employee. One solution to this problem is to assign unique numbers to all
children of employees in the company. Another, more natural, solution is to identify each child
by his/her name and the primary key of the parent who is an employee of the company. We
expect names of the children of an employee to be unique. Such attribute(s) that discriminates
between all the entities that are dependent on the same parent entity is sometime called
a discriminator; it cannot be called a key since it does not uniquely identify an entity without the
use of the relationship with the employee. Similarly history of employment (Position,
Department) would not have a primary key without the support of the employee primary key.
Such entities that require a relationship to be used in identifying them are called weak entities.
Entities that have primary keys are called strong or regular entities. Similarly, a relationship may
be weak or strong (or regular). A strong relationship is between entities each of which is strong;
otherwise the relationship is a weak relationship. For example, any relationship between the
children of employees and the schools they attend would be a weak relationship. A relationship
between employee entity set and the employer entity set is a strong relationship.
A weak entity is also called subordinate entity since its existence depends on another entity
(called the dominant entity). This is called existence dependence. A weak entity may also
be ID dependent on the dominant entity, although existence dependency does not imply ID
dependency. If a weak entity is ID dependent, the primary key of the weak entity is the primary
key of its dominant entity plus a set of attributes of the weak entity that can act as a discriminator
within the weak entity set. This method of identifying entities by relationships with other entities
can be applied recursively until a strong entity is reached. Although the relationship between a
dominant entity and a weak entity is usually one-to-one, in some situations the relationship may
be many-to-many. For example, a company may employ both parents of some children. The
dependence is then many-to-many.
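The weak-entity identification scheme described above can be sketched in a relational implementation: the child table's primary key combines the parent's key with the discriminator. The table and column names below are invented for illustration, assuming SQLite as the target system:

```python
# A weak entity (an employee's child) whose key is the parent's primary key
# plus a discriminator (the child's name). Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce existence dependence
conn.execute("CREATE TABLE employee (emp_no TEXT PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE child (
        emp_no     TEXT,               -- key of the dominant entity
        child_name TEXT,               -- discriminator within one parent
        dob        TEXT,
        PRIMARY KEY (emp_no, child_name),
        FOREIGN KEY (emp_no) REFERENCES employee(emp_no)
    )
""")
conn.execute("INSERT INTO employee VALUES ('E1', 'Ann')")
conn.execute("INSERT INTO child VALUES ('E1', 'Sam', '2010-01-01')")

# Two different employees may each have a child named Sam: the composite
# key (emp_no, child_name) keeps the rows distinct.
conn.execute("INSERT INTO employee VALUES ('E2', 'Bob')")
conn.execute("INSERT INTO child VALUES ('E2', 'Sam', '2012-05-05')")
n_children = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

The foreign key captures the existence dependence: a child row cannot be inserted without its dominant employee row.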
5.8 FURTHER READING
1) David M. Kroenke, Database Processing: Fundamentals, Design
and Implementation (2002), 8th ed., Prentice Hall. Chapter 2.

● LECTURE SIX

6.0 ENTITY RELATIONSHIP DIAGRAMS


6.1 INTRODUCTION
The lecture discusses how entity relationship diagrams are drawn and how they are applied to
model the database design process. Relevant examples are given and class activities are included
at the end of the lecture.
6.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

● Draw an entity relationship diagram given a case study.

6.3 The ER Diagrams


As noted earlier, one of the aims of building an entity-relationship diagram is to facilitate
communication between the database designer and the end user during the requirements analysis.
To facilitate such communication, the designer needs adequate communication tools. Entity-
Relationship diagrams are such tools that enable the database designer to display the overall
database view of the enterprise (the enterprise conceptual schema).
An E-R diagram naturally consists of a collection of entity sets and relationship sets and their
associations. A diagram may also show the attributes and value sets that are needed to describe
the entity sets and the relationship sets in the ERD. In an ERD, as shown: Entity sets are
represented by rectangular shaped boxes. Relationships are represented by diamond shaped
boxes. Ellipses are used to represent attributes and lines are used to link attributes to entity sets
and entity sets to relationship sets.
The type of relationship is shown by placing a label of 1, m or n on each end of every arc. The
label 1 indicates that only one of those entity instances may participate in a given relationship. A
letter label (often m or n) indicates that a number of these entity instances may participate in a
given relationship. The E-R diagrams may also show roles as labels on the arcs.
The E-R diagram above shows a number of entities and a number of relationships. Also note the
entity Employment History, which is shown as a double box. A double box indicates
that Employment History is a weak entity and that its existence depends on the existence of a
corresponding employee entity. The existence dependency of employment history entity on
employee entity is indicated by an arrow. A relationship between three entities is shown. Note
that each entity may participate in several relationships.
6.4 Entity Type Hierarchy
Although entity instances of one entity type are supposed to be objects of the same type, it often
happens that objects of one entity type do have some differences. For example, a company might
have vehicles that could be considered an entity type. The vehicles could however include cars,
trucks and buses and it may then be necessary to include capacity of the bus for buses and the
load capacity of the trucks for trucks, information that is not relevant for cars. In such situations,
it may be necessary to deal with the subsets separately while maintaining the view that all cars,
trucks and buses are vehicles and share a lot of information.
Entity hierarchies are known by a number of different names. At least the following names are
used in the literature:

1. Super types and subtypes
2. Generalization hierarchies
3. ISA hierarchies

We will not discuss entity hierarchies any further although use of hierarchies is now recognized
to be important in conceptual modeling.
6.5 Guidelines for Building an ERM
We have so far discussed the basics of the E-R models and the representation of the model as an
E-R diagram. Although the E-R model approach is often simple, a number of problems can arise.
We discuss some of these problems and provide guidelines that should assist in the modeling
process.
Choosing Entities
As noted earlier, an entity is an object that is of interest. It is however not always easy to decide
when a given thing should be considered an entity. For example, in a supplier-part database, one
may have the following information about each supplier.

● Supplier number
● supplier name
● supplier rating
● supplier location (i.e. city)

It is clear that supplier is an entity but one must now make a decision whether city is an entity or
an attribute of the entity supplier. The rule of thumb that should be used in such situations is to ask
the question "Is city an object of interest to us?". If the answer is yes, we must have some
more information about each city than just the city name and then city should be considered an
entity. If however we are only interested in the city name, city should be considered an attribute
of supplier.
As a guideline therefore, each entity should contain information about its properties. If an object
has no information other than its identifier that interests us, the object should be an attribute.
Multivalued Attributes
If an attribute of an entity can have more than one value, the attribute should be considered an
entity. Although conceptually multi-value attributes create no difficulties, problems arise when
the E-R model is translated for implementation using a DBMS. Although we have indicated
above that an entity should normally contain information about its properties, a multi-valued
attribute that has no properties other than its value should be considered an entity.
6.6 Database Design Process
When using the E-R model approach to database design, one possible approach is to follow the
major steps listed below:

1. Study the description of the application.
2. Identify entity sets that are of interest to the application.
3. Identify relationship sets that are of interest to the application. Determine whether
relationships are 1:1, 1:n or m:n.
4. Draw an entity-relationship diagram for the application.
5. Identify value sets and attributes for the entity sets and the relationship sets.
6. Identify primary keys for entity sets.
7. Check that the ERD conforms to the description of the application.
8. Translate the ERD to the database model used by the DBMS.

6.7 Rigorous definition of entity


Although we have discussed the concept of an entity, we have not presented a rigorous definition
of an entity. A simple rigorous definition is not possible since there is no absolute distinction
between entity types and attributes. Usually an attribute exists only as related to an entity type
but in some contexts, an attribute can be viewed as an entity.
Further complications arise because an entity must finally be translated to a relation. Since
relations have a somewhat inflexible structure, an entity itself must satisfy several artificial
constraints. For example, problems arise when things that we consider entities have attributes
that are multi-valued. The attribute then needs to be declared an entity to avoid problems that
would appear in mapping an entity with multi-valued attributes to the relational database.
Consider for example, an entity called vendor. The vendor has several attributes including
vendor number, vendor name, location, telephone number etc. Normally the location will be
considered an attribute but should we need to cater for a vendor with several branches, the
location would need to be made an entity and a new relationship located-in would be needed.
To overcome some of the above problems, rules of thumb like the following should be followed:

1. Entities have descriptive information; identifying attributes do not.
2. Multivalued attributes should be classed as entities.
3. An object that has a many-to-one relationship with an entity can often be modeled as an
attribute of that entity.
4. Attach attributes to the entities that they describe most directly.
5. Avoid composite identifiers as much as possible.

Also there is no absolute distinction between an entity type and a relationship type although a
relationship is usually regarded as unable to exist on its own. For example, an enrolment cannot
be expected to exist on its own without the entities students and subjects.
A careful reader would have noticed our reluctance to discuss time in our data models. We have
assumed that only current attribute values are of interest. For example, when an employee's
salary changes, the last salary value disappears forever. In many applications this situation
would not be satisfactory. One way to overcome such a problem would be to have an entity
for salary history but this is not always the most satisfactory solution. Temporal data models
deal with the problem of modeling time dependent data.
6.8 The Facts-based View
If we study what we usually store in records we find that fields are character strings that surely
represent facts. Any field by itself conveys little useful information. For example, a field may
convey a name or a fact like department number but most useful information is conveyed only
when the interconnections between fields are also conveyed. Information therefore is expressed
not only by facts represented in fields but more importantly by relations among fields. Records
tie information together; they represent aggregation of facts. Records may be considered to have
three components:

1. What each field is about, i.e. the entity type.
2. How each field refers to the entity representation.
3. What information each field conveys, i.e. a relationship or a fact about the entity.

The facts-based view is based on the premise that rather than choosing entities as clustering
points and aggregating facts around them, we use a design based on aggregating single-valued
related facts together.
Kent suggests the following as a basis for the facts-based view of database design:

● Single-valued facts about things are maintained in records having that thing's identifier.
● Several single-valued facts about the same thing can be maintained in the same record.
● All other kinds of facts (e.g. multi-valued) are maintained in separate records, one record
for each fact.

NOTE:
It should be noted that database design is an iterative process.
6.9 Lecture Summary
Entity relationship model

● It is a high-level conceptual data model developed by Chen (1976) to facilitate database
design.
● It shows how data is logically grouped together and the relationship between the logical
data groupings as defined by the business requirements of the system.
● It comprises:

Entity-relationship diagram (ERD)


Associated documents of entities, relationships and attributes
An entity-relationship model represents:

1. The logical data structure used in the current system
2. The logical structure of data for the required system

Entity type:
An object or concept that is identified by the enterprise as having an independent existence.
It can be an object with a physical (or ‘real’) existence e.g. staff, suppliers, property etc. or a
conceptual (or ‘abstract’) existence e.g. sale, inspection, work experience etc.
Entity:
An instance of an entity type that is uniquely identifiable. It is also referred to as an entity
occurrence/entity instance.
Weak entity: dependent on the existence of another entity
Strong entity: its existence does not depend on the existence of another entity
Attribute: Is a property or characteristic of an entity
Attribute domain: a set of values that may be assigned to an attribute
Attributes can be classified as simple or composite; single-valued or multi-valued; or derived
Simple attribute: composed of a single component with independent existence e.g. sex, salary
etc
Composite attribute: composed of multiple components each with an independent
existence e.g. address (street, city, area, post-code)
Single-valued attribute: it holds a single value for a single entity e.g. customer No
Multi-valued attribute: it holds multiple values for a single entity e.g. address (Tel-no:)
Derived attribute: it represents a value that is derivable from the value of a related attribute or set
of attributes, not necessarily in the same entity, e.g. age derived from the date of birth (DOB)
attribute.
Keys: This is a data item that allows us to uniquely identify individual occurrences of an entity
type.
Candidate key: Is one or more attributes, whose value(s) uniquely identify each entity;
Primary key: The candidate key selected to be the primary key; examples ……
Composite key: A candidate key that consists of two or more attributes e.g. an advert
(Newspaper_name, Newspaper_date, Property _no)
Relationship: Is a link or association between entities.
Degree of relationship: The number of participating entities in a relationship.

● LECTURE SEVEN

7.0 NORMALIZATION
7.1 INTRODUCTION
The lecture discusses the meaning of the term normalization and its importance in database
design. Different types of normalization are discussed, including the 1st, 2nd and 3rd normal forms.

Relevant examples are discussed for each normal form.


7.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

1. Define the term normalization and explain its importance in database design.
2. Explain the meaning of the 1st, 2nd and 3rd normal forms.
3. Explain with a relevant example how the 1st, 2nd and 3rd normal forms are carried out.

7.3 NORMALIZATION

● A technique for producing a set of relations with desirable properties, given the data
requirements of an enterprise
● Relations can fall into one/more categories (or classes) called Normal Forms
● Normal Forms: A class of relations free from a certain set of anomalies
● Normal forms are given names such as:

1. First normal form (1NF)
2. Second normal form (2NF)
3. Third normal form (3NF)
4. Boyce-Codd normal form (BCNF)
5. Fourth normal form (4NF)
6. Fifth normal form (5NF)
7. Domain-key normal form (DK/NF)

NOTE:
These forms are cumulative, i.e. a relation in third normal form is also in 2NF and 1NF. Our
discussion will only consider the 1st, 2nd and 3rd normal forms.

Advantages of Normalization

● It’s a formal technique with each stage of normalization process eliminating a particular
type of undesirable dependency.
● It highlights constraints and dependencies in the data and hence aids in understanding the
nature of the data.
● The 3NF produces well-designed databases, which provide a high degree of independence.

Data Redundancy and Update Anomalies

● The aim of a relational database design is to group attributes into relations so as to
minimize data redundancy and thereby reduce the file storage space required by the
implemented base relations
● Relations that have redundant data may have problems called update anomalies, classified
as:

Insertion anomalies - inserting a tuple requires information from two or more entities

Deletion anomalies - deleting a tuple results in some related information (from another entity)
being lost
Modification anomalies - changing the value of one attribute in one row requires other rows
with the same attribute value to be changed as well
Functional Dependency

● A functional dependency describes a relationship between attributes in a single relation


● An attribute is functionally dependent on another if we can use the value of one attribute to
determine the value of another
● We use the symbol -> to indicate a functional dependency

A -> B is read "A functionally determines B"

E.g. Student_ID -> Student_Name
Student_ID, Course, Semester -> Grade
Staff_No -> Sname, Saddress, Position, Salary
The attributes listed on the left-hand side of the -> are called determinants
One can read A -> B as "A determines B"
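A functional dependency can be tested against sample data: if two rows agree on the determinant attributes but disagree on the dependent ones, the dependency does not hold. The sketch below (sample rows invented; attribute names follow the lecture's example) checks this:

```python
# Check whether a functional dependency A -> B holds in a sample of rows.

def holds(rows, determinant, dependent):
    """Return True if the determinant attributes determine the dependents."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True

rows = [
    {"Student_ID": 1, "Course": "DB", "Semester": 1, "Grade": "A"},
    {"Student_ID": 1, "Course": "OS", "Semester": 1, "Grade": "B"},
    {"Student_ID": 2, "Course": "DB", "Semester": 1, "Grade": "C"},
]

# (Student_ID, Course, Semester) -> Grade holds in this sample ...
fd_ok = holds(rows, ["Student_ID", "Course", "Semester"], ["Grade"])
# ... but Student_ID alone does not determine Grade.
fd_bad = holds(rows, ["Student_ID"], ["Grade"])
```

Note that sample data can only refute a dependency, never prove it: a true functional dependency is a rule of the enterprise that must hold for all possible rows, not just the ones observed.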
7.4 First Normal Form (1NF)

● A table (relation) is said to be in 1NF if and only if it contains no repeating groups, i.e. it has
no repeated values for a particular attribute within a single record, or if it meets the definition
of a relation:

Each column (attribute) value must be a single value only

All values for a given column (attribute) must be of the same type
Each column (attribute) name must be unique
No two rows (tuples) in a relation can be identical
The order of rows or columns is insignificant

● Example of a relation in 1NF: Stock (Company, Symbol, Date, Close-Date)


● Steps:

o Identify repeating groups


o Remove the repeating groups by forming new rows
o Identify the candidate keys
7.5 Second Normal Form (2NF)

● A table (relation) is in 2NF if and only if it is in 1NF and every non-key attribute is fully
functionally dependent on the key.
● Relations that have a single attribute as their key are automatically in 2NF. Full functional
dependency means that in A -> B, the removal of any attribute from A means the
dependency is no longer sustained.

7.6 Third Normal Form (3NF)


● A table (relation) is in 3NF if and only if it is in 2NF and every non-key attribute is
independent of all other non-key attributes, i.e. it contains no transitive dependencies.
● Consider a relation R containing attributes A, B and C:

If A -> B and B -> C, then A -> C; the three attributes with the above dependencies are said to
have a transitive dependency
7.7 Normalization Example
Consider a sales order with the following fields:
SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo,
ClerkName, ItemNo, Description, Qty, UnitPrice
Think of this as the baseline -- one large table
Normalization: First Normal Form

● Separate repeating groups into new tables.

● Repeating groups: fields that may be repeated several times for one document/entity
● Create a new table containing the repeating data
● The primary key of the new table (repeating group) is always a composite key, usually the
document number and a field uniquely describing the repeating line, like an item number.

First Normal Form Example


The new table is as follows:
SalesOrderNo, ItemNo, Description, Qty, UnitPrice
The repeating fields will be removed from the original data table, leaving the
following. SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo,
ClerkName
These two tables are a database in first normal form
What if we did not Normalize the Database to First Normal Form?
Repetition of Data – Sales Order Header data repeated for every line in sales order.
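The 1NF step above can be sketched in code. The example below (with invented sample data) splits the repeating item lines out of a sales order record, leaving a header table and a detail table keyed by the composite (SalesOrderNo, ItemNo):

```python
# Sketch of the 1NF step: split the repeating group out of a sales order.

order = {
    "SalesOrderNo": "SO1", "Date": "2024-01-01",
    "CustomerNo": "C1", "CustomerName": "Ann", "CustomerAdd": "Nairobi",
    "ClerkNo": "K1", "ClerkName": "Bob",
    "Items": [  # the repeating group: several lines per one order
        {"ItemNo": "I1", "Description": "Bolt", "Qty": 10, "UnitPrice": 2.5},
        {"ItemNo": "I2", "Description": "Nut", "Qty": 4, "UnitPrice": 1.0},
    ],
}

# Header table: the original record minus the repeating group.
header = {k: v for k, v in order.items() if k != "Items"}

# Detail table: one row per repeated line, keyed by (SalesOrderNo, ItemNo).
detail = [
    {"SalesOrderNo": order["SalesOrderNo"], **item}
    for item in order["Items"]
]
```

After this step the header data is stored once per order rather than once per line, which is exactly the repetition the text warns about.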
Normalization: Second Normal Form

● Remove Partial Dependencies.


● Functional dependency: the value of one attribute in a table is determined entirely by the
value of another.
● Partial dependency: a type of functional dependency where an attribute is functionally
dependent on only part of the primary key (the primary key must be a composite key).
● Create a separate table with the functionally dependent data and the part of the key on which
it depends. Tables created at this step will usually contain descriptions of resources.
Second Normal Form Example
The new table will contain the following fields:
Item No, Description
All of these fields except the primary key will be removed from the original table. The primary
key will be left in the original table to allow linking of data:
SalesOrderNo, ItemNo, Qty, UnitPrice
Never treat price as dependent on item. Price may be different for different sales orders
(discounts, special customers, etc.)
Along with the unchanged table below, these tables make up a database in second normal form:
SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName
What if we did not Normalize the Database to Second Normal Form?

● Repetition of Data -- Description would appear every time we had an order for the item
● Delete Anomalies -- All information about inventory items is stored in the Sales Order
Detail table. Delete a sales order, delete the item.
● Insert Anomalies -- To insert an inventory item, must insert sales order.
● Update Anomalies -- To change the description, must change it on every SO.

Normalization: Third Normal Form

● Remove transitive dependencies.


● Transitive Dependency: a type of functional dependency where an attribute is functionally
dependent on an attribute other than the primary key. Thus its value is only indirectly
determined by the primary key.
● Create a separate table containing the attribute and the fields that are functionally dependent
on it. Tables created at this step will usually contain descriptions of either resources or
agents. Keep a copy of the key attribute in the original file.

Third Normal Form Example


The new tables would be:
CustomerNo, CustomerName, CustomerAdd
ClerkNo, ClerkName
All of these fields except the primary key will be removed from the original table. The primary
key will be left in the original table to allow linking of data as follows:
SalesOrderNo, Date, CustomerNo, ClerkNo
Together with the unchanged tables below, these tables make up the database in third normal
form.
ItemNo, Description
SalesOrderNo, ItemNo, Qty, UnitPrice
What if we did not Normalize the Database to Third Normal Form?

● Repetition of Data -- Detail for Cust/Clerk would appear on every SO


● Delete Anomalies -- Delete a sales order, delete the customer/clerk
● Insert Anomalies -- To insert a customer/clerk, must insert sales order.
● Update Anomalies -- To change the name/address, etc, must change it on every SO.

Completed Tables in Third Normal Form


Customers: CustomerNo, CustomerName, CustomerAdd
Clerks: ClerkNo, ClerkName
Inventory Items: ItemNo, Description
Sales Orders: SalesOrderNo, Date, CustomerNo, ClerkNo
SalesOrderDetail: SalesOrderNo, ItemNo, Qty, UnitPrice
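The finished schema can be sketched in code. The following is a minimal illustration using Python's sqlite3 module (SQLite's dialect differs from the Access-style SQL used later in these notes, and all sample data values are invented); it builds the five third-normal-form tables and then joins them back together to reassemble one original sales-order line:

```python
import sqlite3

# Build the five 3NF tables from the example (column types simplified).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (CustomerNo INTEGER PRIMARY KEY, CustomerName TEXT, CustomerAdd TEXT);
CREATE TABLE Clerks (ClerkNo INTEGER PRIMARY KEY, ClerkName TEXT);
CREATE TABLE InventoryItems (ItemNo INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE SalesOrders (
    SalesOrderNo INTEGER PRIMARY KEY, Date TEXT,
    CustomerNo INTEGER REFERENCES Customers(CustomerNo),
    ClerkNo INTEGER REFERENCES Clerks(ClerkNo));
CREATE TABLE SalesOrderDetail (
    SalesOrderNo INTEGER REFERENCES SalesOrders(SalesOrderNo),
    ItemNo INTEGER REFERENCES InventoryItems(ItemNo),
    Qty INTEGER, UnitPrice REAL,
    PRIMARY KEY (SalesOrderNo, ItemNo));  -- composite key, as in the 1NF step
""")
con.execute("INSERT INTO Customers VALUES (1, 'Ann', 'Nairobi')")
con.execute("INSERT INTO Clerks VALUES (7, 'Otieno')")
con.execute("INSERT INTO InventoryItems VALUES (100, 'Stapler')")
con.execute("INSERT INTO SalesOrders VALUES (5001, '2024-01-15', 1, 7)")
con.execute("INSERT INTO SalesOrderDetail VALUES (5001, 100, 2, 350.0)")

# Joining the normalized tables reassembles the original un-normalized view.
row = con.execute("""
    SELECT s.SalesOrderNo, c.CustomerName, k.ClerkName, i.Description, d.Qty
    FROM SalesOrders s
    JOIN Customers c ON s.CustomerNo = c.CustomerNo
    JOIN Clerks k ON s.ClerkNo = k.ClerkNo
    JOIN SalesOrderDetail d ON d.SalesOrderNo = s.SalesOrderNo
    JOIN InventoryItems i ON i.ItemNo = d.ItemNo
""").fetchone()
print(row)  # (5001, 'Ann', 'Otieno', 'Stapler', 2)
```

Note that the design choice of 3NF shows up directly here: each fact (a customer's address, a clerk's name, an item's description) is stored once and reached through keys.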
7.7 Lecture Summary
Normalization processing:

● Represent the un-normalized table, identifying the key attribute
● Identify the repeating groups
● Convert to 1NF by isolating the repeating groups of attributes to form a separate table
● Understand the functional dependencies between attributes of the same table, i.e. given one
you should be able to identify the other; a functional dependency diagram may be used to
represent the dependencies
● Convert to 2NF by isolating non-key attributes that are not fully functionally dependent on
the key, i.e. remove partial functional dependence
● Convert to 3NF by isolating non-key attributes that functionally depend on other non-key
attributes
● Rationalize the results, i.e. consider combining any tables that have identical key attributes
● Discard any tables that are contained in other tables
● Ensure the names of the tables reflect the information contained
● Identify and mark foreign keys

7.8 FURTHER READING


T. Connolly, C. Begg, A. Strachan: Database Systems: A Practical Approach to Design,
Implementation, and Management (2002), 3rd Edition, Addison-Wesley, Chapter 1.
● LECTURE EIGHT

8.0 INTRODUCTION TO SQL


8.1 INTRODUCTION
The lecture discusses how the Structured Query Language (SQL) is used to create and manipulate
a database. Data definition commands and data manipulation commands are discussed, and
relevant examples are given on how to create a database using data definition commands and
manipulate it using data manipulation commands.
8.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

1. Identify the Data Definition Language (DDL) commands used to create a database.
2. Identify the Data Manipulation Language (DML) commands used to manipulate a
database.
3. Apply DDL and DML commands to create and manipulate a database.

8.3 What is SQL?


SQL stands for Structured Query Language and is sometimes pronounced as "sequel." At its
simplest, it is the language that is used to extract, manipulate, and structure data that resides in a
relational database management system (RDBMS).
Why Use SQL?
SQL is used in a variety of ways including queries, building reports, data-entry forms, combo
boxes, programs
8.4 Data Definition Language
DDL provides commands for creating and altering the structure of database objects:

● Column name
● Data Type
● Column Size etc.

DDL Commands
● CREATE - create table/view/index
● ALTER - modify table
● DROP - remove table/view/index

8.5 Data Manipulation Language


Command for Manipulating Databases

● adding
● inserting
● deleting
● sorting
● displaying etc.

DML Commands
SELECT: Queries data in the database
INSERT: Inserts data into a table
UPDATE: Changes the data in the database
DELETE: Removes data from a table
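The four DML commands can be seen end-to-end in one short session. This is an illustrative sketch using Python's sqlite3 module (table name follows the lecture's tblCustomers; the data values are invented):

```python
import sqlite3

# Exercise all four DML commands against a throwaway in-memory table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblCustomers (CustomerID INTEGER, [Last Name] TEXT)")

con.execute("INSERT INTO tblCustomers VALUES (1, 'Mwangi')")    # INSERT
con.execute("INSERT INTO tblCustomers VALUES (2, 'Achieng')")
con.execute("UPDATE tblCustomers SET [Last Name] = 'Omondi' "
            "WHERE CustomerID = 2")                             # UPDATE
con.execute("DELETE FROM tblCustomers WHERE CustomerID = 1")    # DELETE

rows = con.execute(
    "SELECT CustomerID, [Last Name] FROM tblCustomers").fetchall()  # SELECT
print(rows)  # [(2, 'Omondi')]
```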
Using DDL
When you are manipulating the structure of a database, there are three primary objects that you
will work with: tables, indexes, and relationships.
Tables are the database structure that contains the physical data, and they are organized by their
columns (or fields) and rows (or records).
Indexes are the database objects that define how the data in the tables is arranged and sorted in
memory.
Relationships define how one or more tables relate to one or more other tables.
The CREATE Command
CREATE TABLE tblCustomers (CustomerID INTEGER, [Last Name] TEXT(50),
[First Name] TEXT(50), Phone TEXT(10), Email TEXT(50))
If a field name includes a space or some other non-alphanumeric character, you must enclose that
field name within square brackets ([ ]).
If you do not declare a length for text fields, they will default to 255 characters. For consistency
and code readability, you should always define your field lengths.
For more information about the types of data that can be used in field definitions see separate
notes
You can declare a field to be NOT NULL, which means that null values cannot be inserted into
that particular field; a value is always required. A null value should not be confused with an
empty string or a value of 0; it is simply the database representation of an unknown value.
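The NOT NULL rule can be demonstrated directly. A minimal sketch, assuming SQLite as the engine (it also accepts square brackets around names containing spaces, though its type syntax differs slightly from the lecture's Access-style TEXT(n)):

```python
import sqlite3

# The lecture's tblCustomers, with Email declared NOT NULL.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE tblCustomers (
    CustomerID INTEGER,
    [Last Name] TEXT,
    [First Name] TEXT,
    Phone TEXT,
    Email TEXT NOT NULL)
""")

# Inserting a NULL into a NOT NULL column is rejected by the engine.
try:
    con.execute(
        "INSERT INTO tblCustomers VALUES (1, 'Kamau', 'Jane', '0700', NULL)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True
```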

Defining Relationships
When defining the relationships between tables, you must make the CONSTRAINT declarations
at the field level. This means that the constraints are defined within a CREATE TABLE
statement. To apply the constraints, use the CONSTRAINT keyword after a field declaration,
name the constraint, name the table that it references, and name the field or fields within that
table that will make up the matching foreign key.
8.6 Data Manipulation
The most basic and most often used SQL statement is the SELECT statement. SELECT
statements are the workhorses of all SQL statements, and they are commonly referred to as select
queries. You use the SELECT statement to retrieve data from the database tables, and the results
are usually returned in a set of records (or rows) made up of any number of fields (or columns).
You must designate which table or tables to select from with the FROM clause.
Retrieving Records

Restricting the Result Set


To qualify a SELECT statement, you must use a WHERE clause, which will allow you to
specify exactly which records you want to retrieve.
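For example, a qualified SELECT might pick out only one customer's invoices. An illustrative sketch via Python's sqlite3 module (the tblInvoices columns and values here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblInvoices "
            "(InvoiceNo INTEGER, CustomerID INTEGER, Amount REAL)")
con.executemany("INSERT INTO tblInvoices VALUES (?, ?, ?)",
                [(1, 10, 250.0), (2, 11, 75.0), (3, 10, 120.0)])

# The WHERE clause restricts the result set to customer 10's invoices only.
rows = con.execute(
    "SELECT InvoiceNo, Amount FROM tblInvoices "
    "WHERE CustomerID = 10 ORDER BY InvoiceNo").fetchall()
print(rows)  # [(1, 250.0), (3, 120.0)]
```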

Using Aggregate Functions to Work with Values


Aggregate functions are used to calculate statistical and summary information from data in
tables. These functions are used in SELECT statements, and all of them take fields or
expressions as arguments.
To count the number of records in a result set, use the Count function. Using an asterisk with the
Count function causes Null values to be counted as well.

To find the average value for a column or expression of numeric data, use the Avg function:
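The two functions can be contrasted on a column containing a NULL. A small sqlite3 sketch (invented data) showing that COUNT(*) counts every row while COUNT(column) and AVG skip NULLs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblInvoices (InvoiceNo INTEGER, Amount REAL)")
con.executemany("INSERT INTO tblInvoices VALUES (?, ?)",
                [(1, 100.0), (2, 200.0), (3, None)])

# COUNT(*) counts all rows, NULLs included; COUNT(col) skips NULLs.
total_rows = con.execute("SELECT COUNT(*) FROM tblInvoices").fetchone()[0]
non_null = con.execute("SELECT COUNT(Amount) FROM tblInvoices").fetchone()[0]
avg = con.execute("SELECT AVG(Amount) FROM tblInvoices").fetchone()[0]

print(total_rows, non_null, avg)  # 3 2 150.0 -- AVG ignores the NULL row
```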

Grouping Records
Sometimes there are records in a table that are logically related, as in the case of the invoices
table. Since one customer can have many invoices, it could be useful to treat all the invoices for
one customer as a group.
The key to grouping records is that one or more fields in each record must contain the same
value for every record in the group.
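Grouping invoices by customer can be sketched as follows (sqlite3 again, with invented data); GROUP BY collapses all rows sharing a CustomerID into one summary row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblInvoices "
            "(InvoiceNo INTEGER, CustomerID INTEGER, Amount REAL)")
con.executemany("INSERT INTO tblInvoices VALUES (?, ?, ?)",
                [(1, 10, 250.0), (2, 11, 75.0), (3, 10, 120.0)])

# One output row per customer: how many invoices, and their total value.
rows = con.execute("""
    SELECT CustomerID, COUNT(*), SUM(Amount)
    FROM tblInvoices
    GROUP BY CustomerID
    ORDER BY CustomerID
""").fetchall()
print(rows)  # [(10, 2, 370.0), (11, 1, 75.0)]
```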

Altering Tables
With the ALTER TABLE statement, you can add, remove, or change a column (or field), and
you can add or remove a constraint.
To add a field with the ALTER TABLE statement, use the ADD COLUMN clause with the
name of the field, its data type, and the size of the data type, if it is required.

To change the data type or size of a field, use the ALTER COLUMN clause with the name of the
field, the desired data type, and the desired size of the data type, if it is required.

If you want to change the name of a field, you will have to remove the field and then recreate it.
To remove a field, use the DROP COLUMN clause with the field name only.
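Adding a column can be sketched quickly; note that the exact ALTER TABLE syntax varies between dialects (SQLite, used here, has always supported ADD COLUMN, but only recent versions support DROP COLUMN, so only the add is shown):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblCustomers (CustomerID INTEGER)")

# ADD COLUMN appends a new field to the existing table definition.
con.execute("ALTER TABLE tblCustomers ADD COLUMN Phone TEXT")

# PRAGMA table_info lists (cid, name, type, ...) for each column.
cols = [r[1] for r in con.execute("PRAGMA table_info(tblCustomers)")]
print(cols)  # ['CustomerID', 'Phone']
```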

Data Types - Text


Text - TEXT, TEXT(n), CHAR, CHAR(n), VARCHAR, VARCHAR(n), ALPHANUMERIC,
ALPHANUMERIC(n), STRING, STRING(n), MEMO etc.

Data Types - Date


The DATETIME data type is used to store date, time, and combination date/time values for the
years ranging from 100 to 9999. It uses 8 bytes of memory for storage, and its synonyms are
DATE, TIME, DATETIME, and TIMESTAMP.
Data Types - OLEOBJECT
The OLEOBJECT data types are used to store large binary objects such as Word documents or
Excel spreadsheets. The number of bytes is not specified, and the maximum size is 2.14
gigabytes. Its synonyms are IMAGE, LONGBINARY, GENERAL, and OLEOBJECT

Other Data Types

● The CURRENCY data type is used to store numeric data that contains up to 15 digits on the
left side of the decimal point, and up to 4 digits on the right. It uses 8 bytes of memory for
storage, and its only synonym is MONEY
● The BOOLEAN data types are logical types that result in either True or False values
● The COUNTER data type is used to store long integer values that automatically increment
whenever a new record is inserted into a table

More on Data Manipulation Language


Predicates
A predicate is an SQL clause that qualifies a SELECT statement, similar to a WHERE clause,
except that the predicate is declared before the column list. Predicates can further restrict the set
of records you are retrieving, and in some instances filter out any duplicate data that may exist.
ALL - ALL keyword (default) is used when no predicate is declared in an SQL statement
DISTINCT - used to control how duplicate values in a result set are handled. Based on the
column(s) specified in the field list, those rows that have duplicate values in the specified
columns are filtered
DISTINCTROW - similar to the DISTINCT keyword except that it is based on entire rows, not
just individual fields
TOP - used to return a certain number of rows that fall at the top or bottom of a range that is
specified by an ORDER BY clause
SQL Expressions
You can use any combination of operators, constants, literal values, functions, field names,
controls, or properties to build your SQL expressions
IN - The IN operator is used to determine if the value of an expression is equal to any of several
values in a specified list.
BETWEEN - The BETWEEN operator is used to determine if the value of an expression falls
within a specified range of values
LIKE - The LIKE operator is used to determine if the value of an expression compares to that of
a pattern.
IS NULL - A null value is one that indicates missing or unknown data. The IS NULL operator is
used to determine if the value of an expression is equal to the null value.
SELECT INTO Statement - The SELECT INTO statement can be used to create a new table
from one or more existing tables
Subqueries
A subquery is a SELECT statement that is used inside another SELECT, SELECT INTO,
INSERT INTO, DELETE, or UPDATE statement. It can help further qualify a result set based
on the results of another result set. This is called nesting, and since a subquery is a SELECT
statement, you can also nest a subquery inside another subquery. When you use a subquery in an
SQL statement, it can be part of a field list, a WHERE clause, or a HAVING clause.
There are three basic forms of subqueries, and each uses a different kind of predicate.
The IN subquery - The IN subquery is used to check the value of a particular column against a
list of values from a column in another table or query. It is limited in that it can return only a
single column from the other table.
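An IN subquery can be sketched as follows (sqlite3, invented data): it keeps only the customers whose id appears in the invoices table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblCustomers (CustomerID INTEGER, Name TEXT);
CREATE TABLE tblInvoices (InvoiceNo INTEGER, CustomerID INTEGER);
INSERT INTO tblCustomers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Chep');
INSERT INTO tblInvoices VALUES (100, 1), (101, 3);
""")

# The inner SELECT builds the list of customer ids that have invoices;
# the outer WHERE ... IN keeps only matching customers.
rows = con.execute("""
    SELECT Name FROM tblCustomers
    WHERE CustomerID IN (SELECT CustomerID FROM tblInvoices)
    ORDER BY Name
""").fetchall()
print(rows)  # [('Ann',), ('Chep',)] -- Ben has no invoices
```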
EXISTS predicate is used in subqueries to check for the existence of values in a result set. If the
subquery does not return any rows, the comparison is False.
Joins
A JOIN statement enables you to retrieve records from tables that have defined relationships
Inner Join
The INNER JOIN, also known as an equi-join, is the most commonly used type of join. This join
is used to retrieve rows from two or more tables by matching a field value that is common
between the tables.
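A minimal inner-join sketch (sqlite3, invented data) makes the matching behaviour visible: a customer with no invoices simply does not appear in the result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblCustomers (CustomerID INTEGER, Name TEXT);
CREATE TABLE tblInvoices (InvoiceNo INTEGER, CustomerID INTEGER, Amount REAL);
INSERT INTO tblCustomers VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO tblInvoices VALUES (100, 1, 250.0), (101, 1, 75.0);
""")

# INNER JOIN returns one row per matching pair of rows; Ben has no
# invoices, so he is absent from the result entirely.
rows = con.execute("""
    SELECT c.Name, i.InvoiceNo
    FROM tblCustomers c
    INNER JOIN tblInvoices i ON c.CustomerID = i.CustomerID
    ORDER BY i.InvoiceNo
""").fetchall()
print(rows)  # [('Ann', 100), ('Ann', 101)]
```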

Self Join
It is also possible to join a table to itself by using an alias for the second table name in the FROM
clause. Let's suppose that we want to find all customer records that have duplicate last names.
We do this by creating the alias "A" for the second table and checking for first names that are
different.
SELECT tblCustomers.[Last Name], tblCustomers.[First Name]
FROM tblCustomers INNER JOIN tblCustomers AS A
ON tblCustomers.[Last Name] = A.[Last Name]
WHERE tblCustomers.[First Name] <> A.[First Name]
ORDER BY tblCustomers.[Last Name]
Outer Join
The OUTER JOIN is used to retrieve records from multiple tables while preserving records from
one of the tables, even if there is no matching record in the other table.
The LEFT OUTER JOIN will select all rows in the right table that match the relational
comparison criteria, and it will also select all rows from the left table, even if no match exists in
the right table.
The RIGHT OUTER JOIN is simply the reverse of the LEFT OUTER JOIN
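The preserving behaviour of a LEFT JOIN can be sketched as follows (sqlite3, invented data): every customer is kept, and a customer with no invoices comes back with NULL (None in Python) in the invoice column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblCustomers (CustomerID INTEGER, Name TEXT);
CREATE TABLE tblInvoices (InvoiceNo INTEGER, CustomerID INTEGER);
INSERT INTO tblCustomers VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO tblInvoices VALUES (100, 1);
""")

# LEFT JOIN preserves all rows of the left table (tblCustomers);
# unmatched rows get NULL for the right table's columns.
rows = con.execute("""
    SELECT c.Name, i.InvoiceNo
    FROM tblCustomers c
    LEFT JOIN tblInvoices i ON c.CustomerID = i.CustomerID
    ORDER BY c.CustomerID
""").fetchall()
print(rows)  # [('Ann', 100), ('Ben', None)]
```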
Cartesian Join
A Cartesian Join is defined as "all possible combinations of all rows in all tables."
SELECT * FROM tblCustomers, tblInvoices
This is discouraged, especially with tables that contain hundreds or thousands of rows
Union Operator
The UNION operator combines data from multiple sources into one result set (technically not a
join). It is used to splice together data from tables, SELECT statements, or queries, while leaving
out any duplicate rows. Both data sources must have the same number of fields, but the fields do
not have to be the same data type.
SELECT [Last Name], [First Name], Email
FROM tblCustomers
UNION
SELECT [Last Name], [First Name], Email
FROM tblEmployees
Views
An SQL view is a database object that allows you to organize and look at data from one or more
tables, and can be referenced as if it were a single, virtual table:

Procedures
A procedure is a database object that you can use to execute an SQL statement that is based on
values that are passed to it. These values are often called parameters. You can think of a
procedure as an SQL-based function. The procedure allows you to pass in parameters that are
then used by the SQL statement, usually as part of a WHERE clause. The benefit is that you can
write very generic procedures that can be used in a variety of ways, and they can be called or
executed from many different places in your programming code.
Syntax:
CREATE PROCEDURE ProcedureName
(Parameter1 datatype, Parameter2 datatype) AS SQLStatement
Example:
CREATE PROCEDURE DeleteInvoices
(InvDate DATETIME) AS
DELETE FROM tblInvoices
WHERE tblInvoices.InvoiceDate < InvDate
Data Control
The GRANT and REVOKE statements allow a user to control access to objects in their schema.
The Grant command grants authorization for a subject (another user or group) to perform some
action (SELECT, INSERT, UPDATE, DELETE, ALTER, INDEX) on an object.
COMMIT - Make all recent changes to the database permanent. Changes that have occurred
since the last commit are made permanent. A commit can be done explicitly using the following
syntax:
COMMIT;
A commit is also performed implicitly when a DDL statement is executed or when the user exits normally.
ROLLBACK - Undo all recent changes to the database. A rollback can only undo changes made
since the last commit. The syntax for the ROLLBACK command is:
ROLLBACK;
Database Object Permissions
To implement the security, you must establish permissions on database objects for each user or
group. There are two kinds of permissions that you can establish: explicit or implicit
permissions. Explicit permissions are the privileges you assign or grant to a specific user, while
implicit permissions are the privileges that you assign to a group and that are inherited by
members of that group.
The GRANT statement assigns or allows a user or group to perform some type of action with a
specified database object, and its general forms are as follows:
GRANT privilege ON TABLE tablename TO grantee
Or
GRANT privilege ON DATABASE TO grantee
The REVOKE statement removes or disallows a user or group from performing some type of
action, and its general forms are as follows:
REVOKE privilege ON TABLE tablename FROM grantee
Or
REVOKE privilege ON DATABASE FROM grantee
Privileges
SELECT Tables - Allows a user to read the data and read the design of a specified table
DELETE Tables - Allows a user to delete data from a specified table
INSERT Tables - Allows a user to insert data into a specified table
UPDATE Tables - Allows a user to update data in a specified table
DROP Tables - Allows a user to remove a specified table
SELECTSECURITY Tables - Allows a user to view the permissions for a specified table
UPDATESECURITY Tables - Allows a user to change the permissions for a specified table
UPDATEIDENTITY Tables - Allows a user to change the values in auto-increment columns
CREATE Tables - Allows a user to create a new table
SELECTSCHEMA Tables - Allows a user to view the design of a specified table
SCHEMA Tables - Allows a user to modify the design of a specified table
UPDATEOWNER Tables - Allows a user to change the owner of a specified table
ALL PRIVILEGES - Allows a user all permissions, including administrative, on a specified
table or database
CREATEDB Database - Allows a user to create a new database
EXCLUSIVECONNECT Database - Allows a user to open a database in exclusive mode
CONNECT Database - Allows a user to open a database
ADMINDB Database - Allows a user to administer a database
Example assigning read-only permissions to the Shipping group, and all data manipulation
permissions to the Billing group:
GRANT SELECT ON TABLE tblInvoices TO Shipping
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE tblInvoices TO Billing
● LECTURE NINE

9.0 FILE- SERVER SYSTEMS AND CLIENT- SERVER SYSTEMS


9.1 INTRODUCTION
The lecture discusses the concepts of file server systems, client server systems and distributed
database systems in terms of database access. It discusses the advantages and disadvantages of
the three approaches.
9.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

1. Explain how file-server systems are used to manage the database
2. Explain how client-server systems are used to manage the database
3. Explain how distributed database systems work
4. Explain the advantages and disadvantages of file-server, client-server, and
distributed database systems

9.3 File-Server

● In a file-server environment the processing is distributed about the network typically a local
area network (LAN)
● The file-server holds the files required by the applications and the DBMS run on each
workstation, requesting files from the file server when necessary
● File-server architecture diagram………………………………..
● The file server simply acts as a shared hard disk drive
● The DBMS in each workstation sends requests to the file server for all the data it
requires that is stored on disk

Disadvantages
There is a large amount of network traffic.
A full copy of the DBMS is required on each workstation.
Concurrency, recovery and integrity control are made complex because there can be multiple
DBMSs accessing the same files.
9.4 Client-server architecture

● As the name suggests, there is a client process, which requires some resource, and a server
process, which provides the resource. There is no requirement that the client and server reside
on the same machine; hence a server can be placed at one site in a LAN and the clients at
the other sites.
Advantages
It enables wider access to existing databases
Increased performance: different CPUs can process applications in parallel, since the server and
the clients reside on different computers.
Hardware costs may be reduced, since only the server requires storage and processing
power sufficient to store and manage the database.
Communication costs are reduced: applications carry out part of the processing on the client and
send only requests for database access across the network, resulting in less data being sent across
the network.
Increased consistency: the server can handle integrity checks, rather than each application
program performing its own checking.
It maps onto open-systems architecture quite naturally.
9.5 Distributed databases
A logically interrelated collection of shared data (and description of this data) physically
distributed over a computer network
Distributed DBMS

● The software system that permits the management of the distributed database and makes the
distribution transparent to users.
● It consists of a single logical database that is split into a number of fragments, each stored
on one or more computers under the control of a separate DBMS, with the computers
connected by a communications network.
● Local applications: do not require data from other sites.
● Global applications: require data from other sites.

A DDBMS has the following characteristics:

1. A collection of logically related shared data
2. The data is split into a number of fragments
3. Fragments may be replicated
4. Fragments/replicas are allocated to sites

The sites are linked by a communications network. The data at each site is under the control of a
DBMS.
The DBMS at each site can handle local applications autonomously. Each DBMS participates in
at least one global application.
DDBMS diagram…………………………
Advantages of DDBMS
Organizational structure: many organizations are naturally distributed over several locations.
Shareability and local autonomy: users have local control of data to some degree.
Improved availability: failure of one site does not affect the other sites.
Improved reliability: replication of data at more than one site.
Improved performance: location of data near the site of ‘greatest demand’.
Economics: more economical to partition the application and perform the processing locally.
Modular growth: much easier to handle expansion.
Disadvantages of DDBMS
Complexity: more complex than a centralized DBMS because of the need to hide the distributed
nature from the user.
Cost: because of the increased complexity, procurement and maintenance costs are higher than
for a centralized DBMS.
Security: access to replicated data has to be controlled, and the network itself has to be made
secure.
Integrity control more difficult: enforcing integrity constraints may carry a prohibitive cost.
Lack of standards: no standard communication and data access protocols, and no tools to
convert a centralized DBMS to a DDBMS.
Lack of experience: the level of experience is not the same as with centralized DBMSs.
Database design more complex: fragmentation, replication of data, and allocation of fragments
to specific sites.
9.6 Lecture Summary
The lecture compared three approaches to managing shared database access: file-server systems,
client-server systems, and distributed database systems, together with their respective advantages
and disadvantages.

● LECTURE TEN

10.0 TRANSACTIONS
10.1 INTRODUCTION
The lecture introduces the learner to concepts related to database transactions. It discusses
how transactions are handled by the DBMS and how data recovery can be achieved. Relevant
examples are given to illustrate the problems related to database transactions.
10.2 LECTURE OBJECTIVES
By the end of the lesson the learner should be able to:

1. Define the term database transaction and state the ACID properties of a transaction.
2. Explain how transactions are managed by a DBMS.
3. Explain how concurrency control is achieved in a transaction.
4. Explain, with a relevant example, the problems related to transactions.
5. Explain how data recovery is managed in a database.

10.3 Transactions

● A transaction is a series of actions, carried out by a single user or application program,
which must be treated as a logical unit of work, e.g. the transfer of funds from one account
to another.
● Either all of the operations are carried out or none of them is.
● Transactions transform the database from one consistent state to another consistent state.

Properties of a transaction
Atomicity: the ‘all or nothing’ property, i.e. a transaction is an indivisible unit that is
either performed in its entirety or not performed at all.
Consistency: a transaction must transform the database from one consistent state to another.
Isolation: transactions execute independently of one another, i.e. the partial effects
of incomplete transactions should not be visible to other transactions.
Durability (persistence): the effects of a successfully completed (committed) transaction
are permanently recorded in the database and must not be lost because of a subsequent failure.
10.4 Database architecture
Transaction Manager: coordinates transactions on behalf of application programs. It
communicates with the scheduler to ensure transactions do not interfere with one another.
Scheduler: responsible for implementing a particular strategy for concurrency control
(lock manager).
Recovery Manager: ensures that the database is restored to the state it was in before the start
of a transaction if a failure occurs during the transaction.
Buffer Manager: responsible for the transfer of data between disk storage and
main memory.
10.5 Concurrency control
This is the process of managing simultaneous operations on the database without having them
interfere with one another.
The need for concurrency control
There are many ways in which concurrently executing transactions can interfere with one
another and so compromise the integrity and consistency of the database when multiple users are
allowed to access the database simultaneously.
Without concurrency control the following problems could arise:
● Lost update problem
● Uncommitted dependency problem
● Inconsistent analysis problem

The lost update problem


An apparently successfully completed update operation by one user can be overridden by
another user, e.g.:
Consider two transactions T1 and T2 executing concurrently.
T1 is withdrawing Ksh 10 from an account whose balance, balx, is initially Ksh 100.
T2 is depositing Ksh 100 into the same account.
T1 and T2 start at nearly the same time, and both read balx as Ksh 100.
T2 increases balx by Ksh 100 to Ksh 200 and stores the update.
T1 decrements its copy of balx by Ksh 10 to Ksh 90 and stores this value in the database,
overwriting the previous update and thereby losing the Ksh 100 previously added to the balance.
The loss of T2’s update is avoided by preventing T1 from reading the value of balx until after
T2’s update has been completed, i.e. by locking.
NB: if these transactions were executed serially one after the other with no interleaving of
operations, the final balance would be Ksh 190 no matter which transaction is performed first.
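The interleaving described above can be simulated in a few lines of plain Python (an illustrative sketch; the variable names stand in for the shared account balance):

```python
# Lost update: both transactions read the shared balance before either writes.
balance = 100              # shared database value, initially Ksh 100

t1_copy = balance          # T1 reads the balance (100), will withdraw 10
t2_copy = balance          # T2 reads the balance (100), will deposit 100

balance = t2_copy + 100    # T2 writes 200
balance = t1_copy - 10     # T1 writes 90, overwriting T2's update

print(balance)             # 90 -- T2's deposit of Ksh 100 is lost

# Serial execution (either order) gives the correct result:
serial = 100
serial = serial + 100      # T2 runs to completion first
serial = serial - 10       # then T1
print(serial)              # 190
```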
The uncommitted dependency problem
Occurs when one transaction is allowed to see the intermediate results of another transaction
before it has committed.
From the previous example: suppose transaction T4 updates balx to Ksh 200 but then aborts, so
that balx should be restored to its original value of Ksh 100. Transaction T3, however, has
already read balx as Ksh 200 and is using this value as the basis of its Ksh 10 reduction, giving a
new balance of Ksh 190 instead of Ksh 90.
The problem is avoided by preventing T3 from reading balx until after the decision has been
made to either commit or abort T4.

The inconsistent analysis problem


Transactions that only read the database can obtain inaccurate results if they are allowed to read
partial results of incomplete transactions, which are simultaneously updating the database.
Occurs when a transaction reads several values from the database but a second transaction
updates some of them during the execution of the first, e.g. a transaction that is summing data in
a database (for example, totalling balances) will obtain inaccurate results if, while it is executing,
other transactions are updating the database.
Schedules and Serialization
Serial execution means executing one transaction at a time, with no interleaving of operations.
A Schedule shows the sequence of the operations of transactions. A schedule is Serializable if
it produces the same result as some serial schedule.
Concurrency Control techniques
Locking
This works on the principle that while one transaction is accessing the database, a lock may deny
access to other transactions in order to prevent incorrect results.
Locking is mostly used to ensure serializability of concurrent transactions. Read locks are
shared; write locks are exclusive.
Deadlock: a situation that may result when two or more transactions are each waiting for locks
held by the other to be released.
Time stamping
Timestamp: a unique identifier created by the DBMS that indicates the relative starting time of a
transaction.
This is a concurrency control protocol whose fundamental goal is to order transactions globally
in such a way that older transactions, i.e. transactions with smaller timestamps, get priority in the
event of conflict.
There is no locking, hence no deadlock.
Transactions involved in a conflict are simply rolled back and restarted.
Optimistic techniques
Are based on the assumption that conflict is rare, and that it is more efficient to allow
transactions to proceed without imposing delays to ensure serializability
When a transaction wishes to commit, a check is performed to determine whether conflict has
occurred, if there has been a conflict, the transaction must be rolled back and restarted.
10.6 Database Recovery
Definition: the process of restoring the database to a correct state in the event of failure
Causes of failure:
System crashes: due to hardware or software errors, resulting in the loss of main memory.
Media failures: such as head crashes or unreadable media, resulting in the loss of parts of
secondary storage.
Application software errors: such as logical errors in a program that is accessing the database,
which cause one or more transactions to fail.
Natural physical disasters: such as fires, floods, earthquakes or power failures.
Carelessness: unintentional destruction of data or facilities by operators or users.
Sabotage: intentional corruption or destruction of data, hardware or software facilities.
Recovery Techniques
A backup mechanism, which makes periodic backup copies of the database
Logging facilities, which keep track of the current state of transaction and database changes
A checkpoint facility, which enables updates to the database that are in progress to be made
permanent
A recovery manager, which allows the system to restore the database to a consistent state
following a failure
10.7 FURTHER READING
1) David M. Kroenke, Database Processing: Fundamentals, Design and Implementation
(2002), 8th ed., Prentice Hall. Chapters 5 and 6.

2) T. Connolly, C. Begg, A. Strachan: Database Systems: A Practical Approach to Design,
Implementation, and Management (2002), 3rd Edition, Addison-Wesley. Chapters 18 and 19.
REFERENCES
1) T. Connolly, C. Begg, A. Strachan: Database Systems: A Practical Approach to Design,
Implementation, and Management (2002), 3rd Edition, Addison-Wesley.

2) R. Elmasri, S. B. Navathe: Fundamentals of Database Systems (2000), 3rd Edition,
Addison-Wesley.
3) C. J. Date: An Introduction to Database Systems (2000), 7th International Edition,
Addison-Wesley.
4) David M. Kroenke, Database Processing: Fundamentals, Design and Implementation (2002),
8th ed., Prentice Hall.
