Isc 305
Isc 305
1.1 INTRODUCTION
In this lesson we define the concepts database and database systems. The Lesson also discusses
the components of a database system and ends up with a discussion on the brief history of
database systems.
1.2 LECTURE OBJECTIVES
By the end of the lesson the leaner should be able:
● Integrated: Data is stored in an uniform way, typically all in one place(a single physical
computer for example)
● Self-Describing: A database maintains a description of the data it contains (catalog)
● Related: Data has some relationship to other data. In a university
we have students who take courses taught by lecturers.
By taking the advantage of relationship and integration we can provide information to users as
opposed to simply data.
We can also say that the database is a model of what the users perceive Three main categories
of models:
0. User or Conceptual model: How users perceive the world and/or the business
1. Logical models: Represents the logical of how a business operates. For example, the
relationship between different entities and how the flow of the data through the
organization. Based on the logical model.
2. Physical model: Represent how the database is actually implemented on a computer
system. This is based on the logical model.
● Database
● Hardware
● Software
● Users
Database
The data in the database will be expected to be both integrated and shared particularly on multi-
user systems.
Contents of a database:
1) User data
2) Meta data
3) Indexes
4) Applications Metadata
User data: This is the data users work with directly by entering, updating and viewing.
Meta data: Data about data i.e. data that describe how user data are stored in terms of table
name, column name, data type, length, primary keys, etc.
Metadata are typically stored in systems tables or systems catalog and are typically only directly
accessible by the DBMS or by the system administrator.
Indexes: They provide an alternate means of accessing user data. Sorting and searching.
They allow the database to access a record without having to search the through the entire table.
Application Metadata: This is storage facility for forms, reports, queries and other application
components. It can be accessed via the database development programs.
Hardware: The hardware consists of the secondary storage on which the database physically
resides, together with the associated I/O devices, device controller’s etc.
Software: This is a database management system, which is a software used to build, maintain
and control database systems. All requests from the user to the database are handled by the
DBMS.
Types of DBMSs:
1) Relation DBMS
2) Hierarchical DBMS
3) Network DBMS
4) OO DBMS
5) Others
Users
There are five categories of users:
● 1940’s, 50’s Initial use of computers as calculators. Limited data, focus on algorithms.
Science, military application.
● 1960’s Business uses. Organizations data, customers data, sales, inventory, accounting, etc.
file system based, high emphasis on applications programs to extract and assimilate data.
● 1970’s The relation model. Data separated to individual tables. Related by keys. Examples:
oracle, Sybase, Informix, digital RDB, IBM DB.
● 1980’s Microcomputers-the IBM PC. Database program such as Dbase, FoxPro, Paradox,
MS Access. Individual user can create, maintain small databases.
● Late-1980’s Local area networks. Workgroups sharing resources such as files, printers, e-
mail. Client/Server Database resides on a central server, applications programs run on client
PCs attached to the server over a LAN.
● 1990’s Internet and World wide web make databases of all kinds available from a single
type of client- the web Browser.
● TerminologiesPage
● LECTURE TWO
2.0 FILE SYSTEMS VS DATABASE SYSTEM
2.1 INTRODUCTION
In this lesson we define the concept file systems and provide a comparison between a file system
and database systems. The Lesson also discusses the advantages and disadvantages of database
systems versus a file system. The concept of Database Management System (DBMS) is also
introduced and the components of a DBMS and its advantages are discussed.
2.2 LECTURE OBJECTIVES
By the end of the lesson the leaner should be able:
0. Define the term file systems and discuss its advantages and disadvantages.
1. Compare the file systems and database systems in terms advantages and disadvantages.
2. Discuss the components of DBMS.
3. Discuss the advantages of a DBMS
● Where data are stored to individual files is a very old, but often used approach to system
development.
● Each program (system) often had its own unique set of files.
What is a file?
Advantages of a file:
● Are relatively easy to design and implement since they are normally based on a single
application or information system.
● The processing speed is faster than other ways of storing data.
Disadvantages:
● Program-data dependence.
● Duplication of data.
● Limited data sharing.
● Lengthy program and system development time.
● Excessive program maintenance when the system changed.
● Duplication of data items in multiple files. Duplication can affect on input, maintenance,
storage and possibly data integrity problems.
● Inflexibility and non-scalability. Since the conventional files are designed to support single
● Application, the original file structure cannot support the new requirements.
NOTE:
Today, the trend is in favor of replacing file-based systems and applications with database
systems.
2.4 Database Approach
A database is more than a file – it contains information about more than one entity and
information about relationships among the entities. Data about a single entity (e.g., Product,
Customer, Customer Order, Department) areeach stored to a “table†in the database.
Databases are designed to meet the needs of multiple users and to be used in multiple
applications.
One significant development in the more user-friendly relational DBMS products is that users
can sometimes get their own answers from the stored data by learning to use data querying
methods.
● Program-data independence.
● Minimal data redundancy, improved data consistency, enforcement of standards, improved
data quality.
● Improved data sharing, improved data accessibility and responsiveness.
● Increased productivity of application development.
● Reduced program maintenance Data can be shared by many applications and systems.
● Data are stored in flexible formats. Data independence. If the data are well designed, the
user
● can access different combinations of same data for query and report purposes.
● Reduce redundancy.
2.5 Database Management Systems
Definition: - software which enables users to create, manipulate & control access to a database.
Functions:
● Provides the Data Definition Language (DDL): - specifies data types, structures and
constraints
● Provides Data Manipulation Language (DML): - insert, Delete, Update and retrieve data
● Provides controlled access to the database:
0. Security systems
1. Integrity systems
2. Concurrency control system
3. Recovery control system
4. User-accessible catalog
Allows different users to have their own view of data-eliminates complexity to users
Created by DDL and provides mechanisms for: Security; Customizes the appearance of DB;
Consistent-unchanging picture of the structure of the DB; Program- data dependence.
NOTE:
Functionality offered by DBMS:
Differ from one product to the other e.g. share ability, graphic or video support.
Never static e.g. Ms access 97, Ms access 2000
Components of a DBMS
Software components
● Query processor:
Process queries
● Database manager:
Interfaces with user submitted application program and queries It has the following components
Authorization control: - authorizations of users Command processor: -carry out operation
Integrity checker: - request satisfies all necessary integrity constraints Query optimizer: -
optimal strategy of query execution
Transaction manager: - performs required transaction processing Scheduler: - controls
concurrent operations
Recovery manager: - ensures consistent state of database incase of failure Buffer manager: -
transfer data between main memory and secondary storage.
● File manage:
● DML processor:
● DDL Compiler:
● Catalog manager:
1. Coding of some field is changed. For example, a null value that was coded as -1 is now
coded as -9999.
2. A new field is added to the records.
3. The length of one of the fields is changed. For example, the maximum number of digits
in a telephone number field or a postcode field needs to be changed.
4. The field on which the file is sorted is changed.
If some major changes were to be made to the data, the application programs may need to be
rewritten. In a database system, the database management system provides the interface between
the application programs and the data. When changes are made to the data representation, the
metadata maintained by the DBMS is changed but the DBMS continues to provide data to
application programs in the previously used way. The DBMS handles the task of transformation
of data wherever necessary.
This independence between the programs and the data is called data independence. Data
independence is important because every time some change needs to be made to the data
tructure, the programs that were being used before the change would continue to work.To
provide a high degree of data independence; a DBMS must include a sophisticated metadata
management system.
In DBMS, all files are integrated into one system thus reducing redundancies and making data
management more efficient. In addition, DBMS provides centralized control of the operational
data. Some of the advantages of data independence, integration and centralized control are:
Redundancies and inconsistencies can be reduced
In conventional data systems, an organization often builds a collection of application programs
often created by different programmers and requiring different components of the operational
data of the organization. The data in conventional data systems is often not centralized. Some
applications may require data to be combined from several systems. These several systems could
well have data that is redundant as well as inconsistent (that is, different copies of the same data
may have different values). Data inconsistencies are often encountered in everyday life. For
example, we have all come across situations when a new address is communicated to an
organization that we deal with (e.g. a bank, or Telecom, or a gas company), we find that some of
the communications from that organization are received at the new address while others continue
to be mailed to the old address. Combining all the data in a database would involve reduction in
redundancy as well as inconsistency. It also is likely to reduce the costs for collection, storage
and updating of data.
Better service to the Users
A DBMS is often used to provide better service to the users. In conventional systems,availability
of information is often poor since it normally is difficult to obtain information that the existing
systems were not designed for. Once several conventional systems are combined to form one
centralized data base, the availability of information and its up-to datedness is likely to improve
since the data can now be shared and the DBMS makes it easy to respond to unforeseen
information requests. Centralizing the data in a database also often means that users can obtain
new and combined information that would have been impossible to obtain otherwise. Also, use
of a DBMS should allow users that do not know programming to interact with the data more
easily.
The ability to quickly obtain new and combined information is becoming increasingly important
in an environment where various levels of governments are requiring organizations to provide
more and more information about their activities. An organization running a conventional data
processing system would require new programs to be written (or the information compiled
manually) to meet every new demand.
Flexibility of the system is improved
Changes are often necessary to the contents of data stored in any system. These changes are
more easily made in a database than in a conventional system in that these changes do not need
to have any impact on application programs.
Cost of developing and maintaining systems is lower
As noted earlier, it is much easier to respond to unforeseen requests when the data is centralized
in a database than when it is stored in conventional file systems. Although the initial cost of
setting up of a database can be large, one normally expects the overall cost of setting up a
database and developing and maintaining application programs to be lower than for similar
service using conventional systems since the productivity of programmers can be substantially
higher in using non-procedural languages that have been developed with modern DBMS than
using procedural languages.
Standards can be enforced
Since all access to the database must be through the DBMS, standards are easier to enforce.
Standards may relate to the naming of the data, the format of the data, the structure of the data
etc.
Security can be improved
In conventional systems, applications are developed in an ad hoc manner. Often different system
of an organization would access different components of the operational data. In such an
environment, enforcing security can be quite difficult. Setting up of a database makes it easier to
enforce security restrictions since the data is now centralized. It is easier to control that has
access to what parts of the database. However, setting up a database can also make it easier for a
determined person to breach security. We will discuss this in the next section.
Integrity can be improved
Since the data of the organization using a database approach is centralized and would be used by
a number of users at a time, it is essential to enforce integrity controls. Integrity may be
compromised in many ways. For example, someone may make a mistake in data input and the
salary of a full-time employee may be input as Ksh4,000 rather than Ksh40,000. A student may
be shown to have borrowed books but has no enrolment. Salary of a staff member in one
department may be coming out of the budget of another department. If a number of users are
allowed to update the same data item at the same time, there is a possibility that the result of the
updates is not quite what was intended. For example, in an airline DBMS we could have a
situation where the number of bookings made is larger than the capacity of the aircraft that is to
be used for the flight. Controls therefore must be introduced to prevent such errors to occur
because of concurrent updating activities.
2.6 FURTHER READING
T. Connolly, C. Begg, A. Strachan: Database Systems: A Practical Approach
to Design,Implementation, and Management (2002).3 Edition , Addison-Wesley, Chapter 2.
rd
● LECTURE THREE
0. Define the term data model and explain how the following data models work: Network,
Hierarchical, Relation and Object oriented.
1. Discuss the advantages and disadvantages of the above data models
3.3 Introduction
Data models are used for designing a database. Before the data available in an enterprise can be
put in a DBMS, an overall abstract view of the enterprise data must be developed. The view can
then be translated into a form that is acceptable by the DBMS. Although at first sight the task
may appear trivial, it is often a very complex process. The complexity of mapping the enterprise
data to a database management system can be reduced by dividing the process into two phases.
The first phase as noted above involves building an overall view (or model) of
the real world which is the enterprise (often called the logical database design). The objective
of the model is to represent, as accurately as possible, the information structures of the enterprise
that are of interest. This model is sometimes called the enterprise conceptual schema and the
process of developing the model may be considered as the requirements analysis of the database
development life cycle. This model is then mapped in the second phase to the user schema
appropriate for the database system to be used. The logical design process is an abstraction
process which captures the meaning of the enterprise data without getting involved in the details
of individual data values. The figure below shows this two steps.
Hierarchical Model
In a Hierarchical model you could create links between these record types; the hierarchical
model uses Parent Child Relationships. These are a 1: N mapping between record types. This is
done by using trees, like set theory used in the relational model, "borrowed" from maths. For
example, an organization might store information about an employee, such as name, employee
number, department, salary. The organization might also store information about an employee's
children, such as name and date of birth. The employee and children data forms a hierarchy,
where the employee data represents the parent segment and the children data represents the child
segment. If an employee has three children, then there would be three child segments associated
with one employee segment. In a hierarchical database the parent-child relationship is one to
many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs
were popular from the late 1960s, with the introduction of IBM's Information Management
System (IMS) DBMS, through the 1970s.
Advantages
● Simplicity
● Data Security and Data Integrity
● Efficiency
Disadvantages
● Implementation Complexity
● Lack of structural independence
● Programming complexity
Advantages
● Conceptual Simplicity
● Ease of data access
● Data Integrity and capability to handle more relationship types
● Data independence
● Database standards
Disadvantages
● System complexity
● Absence Of Structural Independence
Relational Model
This is a database model that organizes data logically into tables. A formal theory of data
consisting of three major components: (a) A structural aspect, meaning that data in the database
is perceived as tables, and only tables, (b) An integrity aspect, meaning that those tables satisfy
certain integrity constraints, and (c) A manipulative aspect, meaning that the tables can be
operated upon by means of operators which derive tables from tables. Here each table
corresponds to an application entity and each row represents an instance of that entity. (RDBMS
- relational database management system) A database based on the relational model was
developed by E.F. Codd. A relational database allows the definition of data structures, storage
and retrieval operations and integrity constraints. In such a database the data and relations
between them are organized in tables. A table is a collection of records and each record in a table
contains the same fields.
Properties of Relational Tables:
● Structural Independence
● Conceptual Simplicity
● Ease of design, implementation, maintenance and usage.
● Ad hoc query capability
● Disadvantages
● Hardware Overheads
● Ease of design can lead to bad design
The relational model is the most important in today's world, so we will spend most of our time
studying it. Some people today question whether the relational model is not too simple, that it is
insufficiently rich to express complex data types.
NOTE:
Database theory evolved from file processing.
Object Oriented Data Models
Object DBMSs add database functionality to object programming languages. They bring much
more than persistent storage of programming language objects. Object DBMSs extend the
semantics of the C++, Smalltalk and Java object programming languages to provide full-featured
database programming capability, while retaining native language compatibility. A major benefit
of this approach is the unification of the application and database development into a seamless
data model and language environment. As a result, applications require less code, use more
natural data modeling, and code bases are easier to maintain. Object developers can write
complete database applications with a modest amount of additional effort.
In contrast to a relational DBMS where a complex data structure must be flattened out to fit into
tables or joined together from those tables to form the in-memory structure, object DBMSs have
no performance overhead to store or retrieve a web or hierarchy of interrelated objects. This one-
to-one mapping of object programming language objects to database objects has two benefits
over other storage approaches: it provides higher performance management of objects, and it
enables better management of the complex interrelationships between objects. This makes object
DBMSs better suited to support applications such as financial portfolio risk analysis systems,
telecommunications service applications, World Wide Web document structures, design and
manufacturing systems, and hospital patient record systems, which have complex relationships
between data.
Advantages
● LECTURE FOUR
0. Explain the concept conceptual design, logical design and physical design.
1. Explain what is required in the conceptual design, logical design and physical design
● LECTURE FIVE
relation among n entities each taken from an entity set, not necessarily distinct:
Although we allow relationships among any number of entity sets, the most common cases
are binary relationships between two entity sets. The degree of a relationship is the number of
entities associated in the relationship. Unary, binary and ternary relationships therefore have
degree 1, 2 and 3 respectively.
We consider a binary relation R between two entity types E and E . The relationship may be
1 2
considered as two mappings E -> E and E -> E . It is important to consider the constraints on
1 2 2 1
these two mappings. It may be that one object from E may be associated with exactly one object
1
in E or any number of objects in E may be associated with any number of objects in E . Often the
2 1 2
● LECTURE SIX
We will not discuss entity hierarchies any further although use of hierarchies is now recognized
to be important in conceptual modeling.
6.5 Guidelines for Building an ERM
We have so far discussed the basics of the E-R models and the representation of the model as an
E-R diagram. Although the E-R model approach is often simple, a number of problems can arise.
We discuss some of these problems and provide guidelines that should assist in the modeling
process.
Choosing Entities
As noted earlier, an entity is an object that is of interest. It is however not always easy to decide
when a given thing should be considered an entity. For example, in a supplier-part database, one
may have the following information about each supplier.
● Supplier number
● supplier name
● supplier rating
● supplier location (i.e. city)
It is clear that supplier is an entity but one must now make a decision whether city is an entity or
an attribute of entity supplier. The rule of thumb that should be used in such situations is to ask
the question "Is city as an object of interest to us?". If the answer is yes, we must have some
more information about each city than just the city name and then city should be considered an
entity. If however we are only interested in the city name, city should be considered an attribute
of supplier.
As a guideline therefore, each entity should contain information about its properties. If an object
has no information other than its identifier that interests us, the object should be an attribute.
Multivalued Attributes
If an attribute of an entity can have more than one value, the attribute should be considered an
entity. Although conceptually multi-value attributes create no difficulties, problems arise when
the E-R model is translated for implementation using a DBMS. Although we have indicated
above that an entity should normally contain information about its properties, a multi-value
attribute that has no properties other than its value Should be considered an entity.
6.6 Database Design Process
When using the E-R model approach to database design, one possible approach is to Follow the
major steps that are listed below:
Also there is no absolute distinction between an entity type and a relationship type although a
relationship is usually regarded as unable to exist on its own. For example, an enrolment cannot
be expected to exist on its own without the entities students and subjects.
A careful reader would have noticed our reluctance to discuss time in our data models. We have
assumed that only current attribute values are of interest. For example, when an employee's
salary changes, the last salary value disappears for ever. In many applications this situation
would not be satisfactory. One way to overcome such problem would be to have an entity
for salary history but this is not always the most satisfactory solution. Temporal data models
deal with the problem of modeling time dependent data.
6.8 The Facts-based View
If we study what we usually store in records we find that fields are character strings that surely
represent facts. Any field by itself conveys little useful information. For example, a field may
convey a name or a fact like department number but most useful information is conveyed only
when the interconnections between fields are also conveyed. Information therefore is expressed
not only by facts represented in fields but more importantly by relations among fields. Records
tie information together; they represent aggregation of facts. Records may be considered to have
three components:
The facts based view is based on the premise that rather than choosing entities as clustering
points and aggregating facts around them we use a design based on aggregating single-valued
related facts together.
Kent suggests the following as basis for facts based view of database design:
● Single-valued facts about things are maintained in records having that thing's identifier.
● Several single-valued facts about the same thing can be maintained in the same record.
● All other kinds of facts (e.g. multi-valued) are maintained in separate records, one record
for each fact.
NOTE:
It should be noted that database design is an iterative process.
6.9 Lecture Summary
Entity relationship model
● It is a high level conceptual data model developed by chen (1976) to facilitate database
design.
● It shows how data is logically grouped together and the relationship between the logical
data groupings as defined by the business requirements of the system.
● It comprises of:
Entity type:
An object or concept that is identified by the enterprise as having an independent existence.
It can be an object wit a physical (or ‘real’) existence e.g. staff, suppliers, property etc or a
conceptual (or ‘abstract’) existence e.g. sale, inspection, work experience etc.
Entity:
An instance of an entity type that is uniquely identifiable. It is also referred as entity
occurrence/entity instance.
Weak entity: dependent on the existence of another entity
Strong entity: its existence does not depend on the existence of another entity
Attribute: Is a property or characteristic of an entity
Attribute domain: a set of values that may assigned to an attribute
Attributes can classified as simple attribute or composite attribute; single valued or multi-
valued; or derived attribute
Simple attribute: composed of a single component with independent existence e.g. sex, salary
etc
Composite attribute: composed of multiple components each with an independent
existence e.g. address (street, city, area, post-code)
Single-valued attribute: it holds a single value for a single entity e.g. customer No
Multi-valued attribute: it holds multiple values for a single entity e.g. address (Tel-no:)
Derived attribute: it represent a value that is derivable from the value of related attribute or set
of attributes, not necessary in the same entity, e.g. age derived from the date of birth (DOB)
attribute.
Keys: This is a data item that allows us to uniquely identify individual occurrences of an entity
type.
Candidate key: Is one or more attributes, whose value(s) uniquely identify each entity;
Primary key: The candidate key selected to be the primary key; examples ……
Composite key: A candidate key that consists of two or more attributes e.g. an advert
(Newspaper_name, Newspaper_date, Property _no)
Relationship: Is a link or association between entities.
Degree of relationship: The number of participating entities in a relationship.
● LECTURE SEVEN
7.0 NORMALIZATION
7.1 INTRODUCTION
The lecture discusses the meaning of the term normalization and its importance in database
design. Different types of normalization are discussed including: 1 , 2 and 3 normal forms.
st nd rd
0. Define the term normalization and explain its importance in database design.
1. Explain the meaning of 1st, 2 and 3 normal forms.
nd rd
2. Explain with a relevant example how 1 , 2 and 3 normal forms are carried out.
st nd rd
7.3 NORMALIZATION
● A technique for producing a set of relations with desirable properties, given the data
requirement of an enterprise
● Relations can fall into one/more categories (or classes) called Normal Forms
● Normal Forms: A class of relations free from a certain set of anomalies
● Normal forms are given names such as:
NOTE:
This forms are accumulative i.e. a relation in third form is also in 2NF and 1NF. Four our
discussions will only consider 1st, 2 and 3 normal forms.
nd rd
Advantages of Normalization
● It’s a formal technique with each stage of normalization process eliminating a particular
type of undesirable dependency.
● It highlights constraints and dependencies in the data and hence aids in understanding the
nature of the data.
● The 3NF produces well-designed databases, which provide a high degree of independence.
● A table (relation) is said to be in 1NF if and only it contains no repeating groups i.e. it has
no repeated values for a particular attributes with a single record or if it meets the definition
of a relation:
● A table (relation) is in 2NF if and only it is in 1NF and every non-key attribute is fully
dependent on the key attribute.
● Relations that have single attributes for a key are automatically in 2NF i.e. fully functional
dependency e.g. A B removal of A means the dependency not sustained any more.
If A B and B C then A C, then the three attributes with the above dependencies are said to have
transitive dependency
7.6 Normalization Example
Consider a sales order with the following fields:
SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo,
ClerkName, ItemNo, Description, Qty, UnitPrice
Think of this as the baseline -- one large table
Normalization: First Normal Form
● Repetition of Data -- Description would appear every time we had an order for the item
● Delete Anomalies -- All information about inventory items is stored in the Sales Order
Detail table. Delete a sales order, delete the item.
● Insert Anomalies -- To insert an inventory item, must insert sales order.
● Update Anomalies -- To change the description, must change it on every SO.
0. Identify the Data Definition Language (DDL) commands used to create a database.
1. Identify the Data Manipulation Language (DML) commands used to manipulate
database.
2. Apply the use of DDL and DML commands to create and manipulate a database.
● Column name
● Data Type
● Column Size etc DDL Commands
● CREATE - create table/view/index
● ALTER - modify table
● DROP - remove table/view/index
● adding
● inserting
● deleting
● sorting
● displaying etc.
DML Commands
SELECT: Queries data in the database
INSERT: Inserts data into a table
UPDATE: Changes the data in the database
DELETE: Removes data from a table
Using DDL
When you are manipulating the structure of a database, there are three primary objects that you
will work with: tables, indexes, and relationships.
Tables are the database structure that contains the physical data, and they are organized by their
columns (or fields) and rows (or records).
Indexes are the database objects that define how the data in the tables is arranged and sorted in
memory.
Relationships define how one or more tables relate to one or more other tables.
The CREATE Command
CREATE TABLE tblCustomers (CustomerID INTEGER, [Last Name] TEXT(50), [First Name]
TEXT(50), Phone TEXT(10),
Email TEXT(50))
The CREATE Command
If a field name includes a space or some other non-alphanumeric character, you must enclose that
field name within square brackets ([ ]).
If you do not declare a length for text fields, they will default to 255 characters. For consistency
and code readability, you should always define your field lengths.
For more information about the types of data that can be used in field definitions see separate
notes
You can declare a field to be NOT NULL, which means that null values cannot be inserted into
that particular field; a value is always required. A null value should not be confused with an
empty string or a value of 0; it is simply the database representation of an unknown value.
Defining Relationships
When defining the relationships between tables, you must make the CONSTRAINT declarations
at the field level. This means that the constraints are defined within a CREATE TABLE
statement. To apply the constraints, use the CONSTRAINT keyword after a field declaration,
name the constraint, name the table that it references, and name the field or fields within that
table that will make up the matching foreign key.
8.6 Data Manipulation
The most basic and most often used SQL statement is the SELECT statement. SELECT
statements are the workhorses of all SQL statements, and they are commonly referred to as select
queries. You use the SELECT statement to retrieve data from the database tables, and the results
are usually returned in a set of records (or rows) made up of any number of fields (or columns).
You must designate which table or tables to select from with the FROM clause.
Retrieving Records
To find the average value for a column or expression of numeric data, use the Avg function:
Grouping Records
Sometimes there are records in a table that are logically related, as in the case of the invoices
table. Since one customer can have many invoices, it could be useful to treat all the invoices for
one customer as a group.
The key to grouping records is that one or more fields in each record must contain the same
value for every record in the group.
Altering Tables
With the ALTER TABLE statement, you can add, remove, or change a column (or field), and
you can add or remove a constraint.
To add a field with the ALTER TABLE statement, use the ADD COLUMN clause with the
name of the field, its data type, and the size of the data type, if it is required.
To change the data type or size of a field, use the ALTER COLUMN clause with the name of the
field, the desired data type, and the desired size of the data type, if it is required.
If you want to change the name of a field, you will have to remove the field and then recreate it.
To remove a field, use the DROP COLUMN clause with the field name only.
● The CURRENCY data type is used to store numeric data that contains up to 15 digits on the
left side of the decimal point, and up to 4 digits on the right. It uses 8 bytes of memory for
storage, and its only synonym is MONEY
● The BOOLEAN data types are logical types that result in either True or False values
● The COUNTER data type is used to store long integer values that automatically increment
whenever a new record is inserted into a table
Self Join
It is also possible to join a table to itself by using an alias for the second table name in the FROM
clause. Let's suppose that we want to find all customer records that have duplicate last names.
We do this by creating the alias "A" for the second table and checking for first names that are
different.
SELECT tblCustomers.[Last Name], tblCustomers.[First Name]
FROM tblCustomers INNER JOIN tblCustomers AS A ON tblCustomers.[Last Name]=A.[Last
Name]
WHERE tblCustomers.[FirstName]<>A.[FirstName]
ORDER BY tblCustomers.[Last Name]
Outer Join
The OUTER JOIN is used to retrieve records from multiple tables while preserving records from
one of the tables, even if there is no matching record in the other table.
The LEFT OUTER JOIN will select all rows in the right table that match the relational
comparison criteria, and it will also select all rows from the left table, even if no match exists in
the right table.
The RIGHT OUTER JOIN is simply the reverse of the LEFT OUTER JOIN
Cartesian Join
A Cartesian Join is defined as "all possible combinations of all rows in all tables."
SELECT * FROM tblCustomers, tblInvoices
This is discouraged, especially with tables that contain hundreds or thousands of rows
Union Operator
Involve combining data from multiple sources of data into one result set (technically not a join)
used to splice together data from tables, SELECT statements, or queries, while leaving out any
duplicate rows. Both data sources must have the same number of fields, but the fields do not
have to be the same data type
SELECT [Last Name], [First Name], Email
FROM tblCustomers
UNION
SELECT [Last Name], [First Name], Email
FROM tblEmployees
Views
An SQL view is a database object that allows you to organize and look at data from one or more
tables, and can be referenced as if it were a single, virtual table:
Procedures
A procedure is a database object that you can use to execute an SQL statement that is based on
values that are passed to it. These values are often called parameters. You can think of a
procedure as an SQL-based function. The procedure allows you to pass in parameters that are
then used by the SQL statement, usually as part of a WHERE clause. The benefit is that you can
write very generic procedures that can be used in a variety of ways, and they can be called or
executed from many different places in your programming code.
Syntax:
CREATE PROCEDURE ProcedureName
(Parameter1 datatype, Parameter2 datatype) AS SQLStatement
Example:
CREATE PROCEDURE DeleteInvoices
(InvDate DATETIME) AS
DELETE FROM tblInvoices
WHERE tblInvoices.InvoiceDate < InvDate
Data Control
The GRANT and REVOKE statements allow a user to control access to objects in their schema.
The Grant command grants authorization for a subject (another user or group) to perform some
action (SELECT, INSERT, UPDATE, DELETE, ALTER, INDEX) on an object.
COMMIT - Make all recent changes to the database permanent. Changes that have occurred
since the last commit are made permanent. A commit can be done explicitly using the following
syntax:
COMMIT;
A commit is also done implicitly when the next SQL statement is executed or the user exits
ROLLBACK - Undo all recent changes to the database. A rollback can only undo changes made
since the last commit. The syntax for the ROLLBACK command is:
ROLLBACK;
Database Object Permissions
To implement the security, you must establish permissions on database objects for each user or
group. There are two kinds of permissions that you can establish: explicit or implicit
permissions. Explicit permissions are the privileges you assign or grant to a specific user, while
implicit permissions are the privileges that you assign to a group and that are inherited by
members of that group.
The GRANT statement assigns or allows a user or group to perform some type of action with a
specified database object, and its general forms are as follows:
GRANT privilege ON TABLE tablename TO grantee
Or
GRANT privilege ON DATABASE TO grantee
Database Object Permissions
The REVOKE statement removes or disallows a user or group from performing some type of
action, and its general forms are as follows:
REVOKE privilege ON TABLE tablename FROM grantee
Or
REVOKE privilege ON DATABASE TO grantee
Privileges
SELECT Tables - Allows a user to read the data and read the design of a specified table
DELETE Tables- Allows a user to delete data from a specified table
INSERT Tables - Allows a user to insert data into a specified table UPDATE Tables - Allows a
user to update data in a specified table DROP Tables - Allows a user to remove a specified table
SELECTSECURITY Tables - Allows a user to view the permissions for a specified table
UPDATESECURITY Tables, - Allows a user to change the permissions for a specified table
UPDATEIDENTITY Tables Allows a user to change the values in auto-increment columns
CREATE Tables - Allows a user to create a new table.
SELECTSCHEMA Tables -Allows a user to view the design of a specified table SCHEMA
Tables- Allows a user to modify the design of a specified table UPDATEOWNER Tables -
Allows a user to change the owner of a specified table.
ALL PRIVILEGES:- All Allows a user all permissions, including administrative, on a specified
table or database.
CREATEDB Database Allows a user to create a new database. EXCLUSIVECONNECT
Database Allows a user to open a database in exclusive mode.
CONNECT Database Allows a user to open a database. ADMINDB Database Allows a user to
administer a database. GRANT SELECT ON TABLE tblInvoices TO Shipping
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE tblInvoices TO Billing
Example assigning read-only permissions to the Shipping group, and all data manipulation
permissions to the Billing group:
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE tblInvoices TO Billing
● LECTURE NINE
0. Explain how file server systems are used to manage the database
1. Explain how the client server systems are used to manage the database
2. Explain how the distributed database systems work.
3. Explain the advantages and disadvantages of file server and client server and
Distributed database systems
9.3 File-Server
● In a file-server environment the processing is distributed about the network typically a local
area network (LAN)
● The file-server holds the files required by the applications and the DBMS run on each
workstation, requesting files from the file server when necessary
● File-server architecture diagram………………………………..
● The file server simply acts as a shared hard disk drive
● The DBMS in each workstation sends request to the file-server for all data that the DBMS
requires that is stored on disk
Disadvantageous
There is large amount of network traffic.
A fully copy of the DBMS is required on each workstation.
Concurrency, recovery and integrity control are made complex because there can be multiple
DBMSs accessing the same file.
9.4 Client-server architecture
● As the name suggest, there is a client process, which requires some resource, and a server,
which provides the resource. There is no requirement that the client and server must reside
on the same machine hence, a server can be placed at one site in a LAN and the clients on
the other sites.
Advantages
It enables wider access to existing databases
Increased performance: since different CPUs can be processing applications in parallel since the
server and the client reside on different computers.
Hardware cost may be reduced since it is only the server that requires storage and processing
power sufficient to store and manage the database.
Communication cost are reduced: applications carry out part of the processing on the client and
send only requests for database access across the network, resulting to less data being sent across
the network.
Increased consistency: the server can handle integrity checks, rather than each application
program performing its own checking.
It maps onto open-systems architecture quite naturally.
9.5 Distributed databases
A logically interrelated collection of shared data (and description of this data) physically
distributed over a computer network
Distributed DBMS
● The software system that permits the management of the distributed database and makes the
distribution transparent to users.
● It consists of a single logical database that is split into a number of fragments which is
stored in one or more computers under the control of a separate DBMS with computers
connected by communicational network.
● Logical application: do not require data from other applications.
● Global application: requires data from other sites.
● LECTURE TEN
10.0 TRANSACTIONS
10.1 INTRODUCTION
The lecture introduces to the learner to concepts related to database transactions. It discusses
how transactions are handled by the DBMS and how data recovery can be achieved. Relevant
examples are given to illustrate the problems related to database transactions.
10.2 LECTURE OBJECTIVES
By the end of the lesson the leaner should be able:
0. Define the terms database transaction and state the ACID properties of a transaction.
1. Explain how transactions are managed by a DBMS
2. Explain how concurrency control is achieved in a transaction.
3. Explain with a relevant example the problems related with transactions
4. Explain how data recovery is managed in database.
10.3 Transactions
Properties of a transaction
Atomicity: The ‘all’ or ‘nothing’ property i.e. A transaction is an indivisible unit that is
either performed in its entirety or it is not performed at all.
Consistent: A transaction must transform the database from one consistent state to another.
Isolation: Transaction execute independently of one another. I.e. the partial effects
of incomplete transactions should not be visible to other transaction.
Durability (Persistence): The effects of a successfully completed (committed) transactions
are permanently recorded in the database and must not be lost because of a subsequent failure
10.4 Database architecture
Transaction Manager: Coordinates transactions on behalf of the application programs
Communicates with the scheduler to ensure transaction do not interfere with one another.
Scheduler: Responsible for implementing a particular strategy for concurrency control
(lock manager).
Recovery manager: Ensures that the database is restored to the state it was in before the start
of the transaction in case a failure occurs during the transactions.
Buffer manager: Responsible for the transfer of data between the disk storage and
main memory.
10.5 Concurrency control
This is the process of managing simultaneously operations on the database without having them
interfere with one another.
The need for concurrency control
There are many ways in which concurrently executing transactions can interfere with one answer
and so comprise the integrity and consistent of the DB, when multiple users are allowed to
access the database simultaneously.
Without concurrency control the following problems could arise:
● Lost update problem
● Uncommitted dependency problem
● Inconsistent analysis problem
T and T nearly starts the same time both read baln. x as Ksh10
1 2
T increases baln. x by Ksh 100 to Ksh 200 and stores the update
2
T decrements its copy of baln. x by Ksh 10 to Ksh 90 and stores its value in the database
1
overwriting the previous update and there-by losing the Ksh 100 previously added to the balance
The loss of T ’s update is avoided by preventing T from reading the value of baln. x until after
2 1
NB: if these transactions were executed serially one after the other with no interleaving of
operations, the final balance would be Ksh 190 no matter which transaction is performed first.
The uncommitted dependency problem
Occurs when one transaction is allowed to see the intermediate results of another transaction
before it has committed.
From the previous example if transaction T updates balnx to Ksh200, but it aborts the transaction
4
so that the balnx should be restored to its original value of Ksh100, but transaction T has read
3
balnx as Ksh200 and is using this value as the basis of the Ksh10 reduction, giving a new
increment balance of Ksh190, instead of Ksh90
The problem is avoided by preventing T from reading balnx until after the decision has been
3
19
REFERENCES
1) T. Connolly, C. Begg, A. Strachan: Database Systems: A Practical Approach
to Design,Implementation, and Management (2002).3 Edition , Addison-Wesley.
rd
Wesley.
4) David M. kroenke, Database Processing, Fundamentals, design and Implementation (2002),
8 ed., Prentice Hall.
th