COME 301 DATABASE
MANAGEMENT
Lecture 4
RELATIONAL MODEL
• Relational model is a simple model in which the database is represented as
a collection of “relations” where each relation is represented by a two-
dimensional table
• Relational Model represents how data is stored in Relational Databases. A
relational database stores data in the form of relations (tables).
Properties of Relational Model
• It is column homogeneous. In other words, in any given column of a table,
all items are of the same kind.
• Each item is a simple number or a character string. That is a table must be
in first normal form.
• All rows of a table are distinct.
• The ordering of rows within a table is immaterial.
• The column of a table are assigned distinct names and the ordering of these
columns is immaterial
Domain, attributes, tuples, and relational
• Attributes: The name of each column in a table is used to interpret its
meaning and is called an attribute. Each table is called a relation.
• Tuple: Each row in a table represents a record and is called a tuple. A
table containing ‘n’ attributes in a record is called n-tuple.
• Domain: A domain is a set of values that can be given to attributes.
So every attribute in a table has a specific domain. Values to these
attributes can not be assigned outside their domains.
• Relation: A relation consists of
• Relational schema
• Relation instance
Important terminologies
• Relational Schema: A relational schema specifies the relation’s name, its
attributes, and the domain of each attribute. If R is the name of a relation
and A1, A2,…An is a list of attributes representing R then R(A1, A2,…, An) is
called a Relational Schema. Each attribute in this relational schema takes a
value from some specific domain called domain(Ai).
• Example:
• PERSON (PERSON_ID:INTEGER, NAME:STRING, AGE:INTEGER, ADDRESS:STRING)
• Total number of attributes in a relation denotes the degree of a
relationship since the PERSON relation scheme contains four attributes, so
this relation is of degree 4.
• Relation Instance: A relational instance denoted as r is a collection of
tuples for a given relational schema at a specific point in time.
Important terminologies
• Degree: The number of attributes in the relation is known as degree of the relation.
The STUDENT relation defined below has degree 5.
• Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation
defined below has cardinality 4.
• Column: Column represents the set of values for a particular attribute. The
column ROLL_NO is extracted from relation STUDENT.
• NULL Values: The value which is not known or unavailable is called NULL value. It is
represented by blank space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
Constraints
• RELATIONAL CONSTRAINTS: There are three types of constraints on
relational database that include:
• Domain Constraints
• Key Constraints
• Integrity Constraints
Constraints
• Domain constraints: It specifies that each attribute in a relation an atomic
value from the corresponding domains.
• The data types associated with commercial RDBMS domains include:
• Standard numeric data types for integer
• Real numbers
• Characters
• Fixed length strings and variable length strings
• Thus, domain constraints specify the condition that we put on each instance
of the relation. So the values that appear in each column must be drawn
from the domain associated with that column
Constraints
• Key Constraints: This constraints states that the key attribute value in each
tuple msut be unique .i.e, no two tuples contain the same value for the key
attribute.(null values can be allowed) Emp(encode, name, address). Here
empcode can be unique
• Integrity Constraints: There are two types of integrity constraints:
• Entity Integrity Constraints
• Referential Integrity constraints
Constraints
Key Constraints:
1. Entity Integrity Constraints: It states that no primary key value can be null
and unique. This is because the primary key is used to identify individual tuple
in the relation. So we will not be able to identify the records uniquely
containing null values for the primary key attributes. This constraint is specified
on one individual relation.
Constraints
Key Constraints:
2. Referential Integrity Constraints: It states that the tuple in one relation
that refers to another relation must refer to an existing tuple in that
relation. This constraints is specified on two relations. If a column is
declared as foreign key that must be primary key of another table.
• Department (deptcode, dname) Here the deptcode is the primary key.
• Emp (empcode, name, city, deptcode). Here the deptcode is foreign key
Anomalies
• An anomaly is an irregularity or something which deviates from the expected or
normal state. When designing databases, we identify three types of
anomalies: Insert, Update and Delete.
Anomalies
• Insertion Anomaly in Referencing Relation:
We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is
not present in referenced attribute value. e.g.; Insertion of a student with
BRANCH_CODE ‘ME’ in STUDENT relation will result in error because ‘ME’ is not
present in BRANCH_CODE of BRANCH.
Anomalies
• Deletion/ Updation Anomaly in Referenced Relation:
We can’t delete or update a row from REFERENCED RELATION if value of
REFERENCED ATTRIBUTE is used in value of REFERENCING ATTRIBUTE. e.g; if we
try to delete tuple from BRANCH having BRANCH_CODE ‘CS’, it will result in error
because ‘CS’ is referenced by BRANCH_CODE of STUDENT, but if we try to delete
the row from BRANCH with BRANCH_CODE CV, it will be deleted as the value is
not been used by referencing relation. It can be handled by following method:
Anomalies
• ON DELETE CASCADE: It will delete the tuples from REFERENCING RELATION if
value used by REFERENCING ATTRIBUTE is deleted from REFERENCED
RELATION. e.g;, if we delete a row from BRANCH with BRANCH_CODE ‘CS’, the
rows in STUDENT relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this
case) will be deleted.
Anomalies
• ON UPDATE CASCADE: It will update the REFERENCING ATTRIBUTE in
REFERENCING RELATION if attribute value used by REFERENCING ATTRIBUTE is
updated in REFERENCED RELATION. e.g;, if we update a row from BRANCH
with BRANCH_CODE ‘CS’ to ‘CSE’, the rows in STUDENT relation with
BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be updated with
BRANCH_CODE ‘CSE’.
Anomalies
• SUPER KEYS:
Any set of attributes that allows us to identify unique rows (tuples) in a given
relation are known as super keys. Out of these super keys we can always choose
a proper subset among these which can be used as a primary key. Such keys are
known as Candidate keys. If there is a combination of two or more attributes
which is being used as the primary key then we call it as a Composite key.
FILE ORGANISATION
• FILE ORGANISATION AND ITS TYPES: A file organization is a technique to
organize data in the secondary memory. File organization is a way of arranging
the records in a file when the file is stored on the disk. Data files are organized
so as to facilitate access to records and to ensure their efficient storage. A
DBMS supports several file organization techniques.
FILE ORGANISATION
1. Heap files (unordered file) Basically these files are unordered file. It is
the simplest and most basic type. These files consist of randomly ordered
records. The records will have no particular order. The operations we can
perform on the records are insert , retrieve and delete. The features of the
heap file organization are:
• New records can be inserted in any empty space that can accommodate
them.
• When old records are deleted, the occupied space becomes empty and
available for any new insertion.
• If updated records grow, they may need to be relocated to a new empty
space. This needs to keep a list of empty space.
FILE ORGANISATION
• Advantage of heap files:
• 1. This is a simple file organization method
• 2. Insertion is somehow efficient
• 3. Good for bulk-lading data into a table.
• 4. Best if file scans are common or insertions are frequent
• Disadvantages of heap files:
• 1. Retrieval requires a linear search and is inefficient
• 2. Deletion can result in unused space/need for reorganization
FILE ORGANISATION
2. Sequential file organization: The most basic way to organize the collection of
records in a file is to use sequential organization. Records of the file are stored
in sequence by the primary key field values/ They are accessible only in the
order stored i.e, in the primary key order.
• This kind is of file organization works well for tasks which need to access
nearly every record in a file. Eg. Payroll.. In a sequentially organized file
records are written consecutively when the file is created and must be
accessed consecutively when the file later used for input.
FILE ORGANISATION
• Advantages of sequential file organization:
• It is fast and efficient when dealing with large volumes of data that need to be
processed periodically(batch system)
• Disadvantages of sequential file organization:
• Requires that all new transactions be sorted into the proper sequence for sequential
access processing
• Locating, storing, modifying, deleting or adding records in the file require rearranging
the file
• This method is too slow to handle application requiring immediate updating or
responses.
FILE ORGANISATION
3. Indexed sequential file organization: It organized the file like a large
dictionary, i.e, records are stored in order of the key, but an index is kept which
also permits a type of direct access.
• The records are stored sequentially by primary key values and there is an
index built over the primary key field.
• An index is a set of index value, address pairs. Indexing associates a set of
objects to a set of orderable quantities, that are usually smaller in number or
their properties. Thus an index is a mechanism for faster search.
• A sequential file that is indexed on its primary key is called an index sequential
file.
• The index allows for random access to records, while the sequential storage of
the records of the file provides easy access to the sequential records.
FILE ORGANISATION
4. Hashed file organization: Hashing is the most common form of purely
random access to a file or database.
• It is also used to access columns that do not have an index as an optimization
technique.
• Hash functions calculate the address of the page in which the record is to be
stored based on one or more fields in the record.
• The records in a hash file appear randomly distributed across the available
space. It requires some hashing algorithm and the technique. Hashing
algorithm converts a primary key value into a record address.
FILE ORGANISATION
• Advantage of hashed file organization:
• Insertion or search on hash key is fast.
• 2. Best if equality search is needed on hash key
• Disadvantage of hashed file organization:
• It is a complex file organization method
• search is slow
• It suffers from disk space overhead