Unit 1 Notes
Unit 1 Notes
Introduction, History and Purpose of Database System – Introduction to Big data– Database Schema
and Instances- Views of data – Data Models – Database System Architecture – Introduction to
relational databases Relational Model – Keys – Relational Algebra.
COURSE OBJECTIVES:
● To understand the fundamentals of data models and to represent a database system using
ER diagrams and relational database design..
COURSE OUTCOMES:
Data
• Data is raw fact or figures or entity.
• When activities in the organization takes place, the effect of these activities need to be
recorded which is known as Data.
Information
• Processed data is called information
• The purpose of data processing is to generate the information required for carrying out the
business activities.
• Data capture: Which is the task associated with gathering the data as and when they
originate.
• Data classification: Captured data has to be classified based on the nature and intended usage.
• Data retrieval: Data will be required frequently for further processing, Hence it is very
important to create some indexes so that data can be retrieved easily.
• Data maintenance: Maintenance is the task concerned with keeping the data upto-date.
• Data Verification: Before storing the data it must be verified for any error.
• Data Coding: Data will be coded for easy reference. • Data Editing: Editing means re-
arranging the data or modifying the data for presentation.
• Data transcription: This is the activity where the data is converted from one form into
another.
• Data transmission: This is a function where data is forwarded to the place where it would be
used further.
Database
Characteristics of DBMS
• To incorporate the requirements of the organization, system should be designed for easy
maintenance.
• Information systems should allow interactive access to data to obtain new information without
writing fresh programs.
• System should be designed to co-relate different data to meet new requirements.
• An independent central repository, which gives information and meaning of available data is
required.
• Integrated database will help in understanding the inter-relationships between data stored in
different applications.
• The stored data should be made available for access by different users simultaneously.
• Automatic recovery feature has to be provided to overcome the problems with processing
system failure.
DBMS Utilities
• A data loading utility: Which allows easy loading of data from the external format without
writing programs.
• A backup utility: Which allows to make copies of the database periodically to help in cases
of crashes and disasters.
• Recovery utility: Which allows to reconstruct the correct state of the database from the
backup and history of transactions.
• Monitoring tools: Which monitors the performance so that internal schema can be changed
and database access can be optimized.
• File organization: Which allows restructuring the data from one type to another?
● Banking
● Airlines
● Universities
● Manufacturing and selling
● Human resources
System programmers wrote these application programs to meet the needs of the
university. New application programs are added to the system as the need arises. For example,
suppose that a university decides to create a new major (say, computer science).
As a result, the university creates a new department and creates new permanent files (or
adds information to existing files) to record information about all the instructors in the
department, students in that major, course offerings, degree requirements, etc.
The university may have to write new application programs to deal with rules specific to
the new major. New application programs may also have to be written to handle new rules in
the university. Thus, as time goes by, the system acquires more files and more application
programs.
This typical file-processing system is supported by a conventional operating system. The
system stores permanent records in various files, and it needs different application programs to
extract records from, and add records to, the appropriate files. Before database management
systems (DBMSs) were introduced, organizations usually stored information in such systems.
Data isolation:
Because data are scattered in various files, and files may be in different formats, writing
new application programs to retrieve the appropriate data is difficult.
Integrity problems:
The data values stored in the database must satisfy certain types of consistency
constraints. Suppose the university maintains an account for each department, and records the
balance amount in each account.
Suppose also that the university requires that the account balance of a department may
never fall below zero. Developers enforce these constraints in the system by adding appropriate
code in the various application programs. However, when new constraints are added, it is
difficult to change the programs to enforce them. The problem is compounded when constraints
involve several data items from different files.
Atomicity problems:
A computer system, like any other device, is subject to failure. In many applications, it is
crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to
the failure.
Consider a program to transfer $500 from the account balance of department A to the
account balance of department B. If a system failure occurs during the execution of the
program, it is possible that the $500 was removed from the balance of department A but was not
credited to the balance of department B, resulting in an inconsistent database state. Clearly, it is
essential to database consistency that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is
difficult to ensure atomicity in a conventional file-processing system.
Concurrent-access anomalies:
For the sake of overall performance of the system and faster response, many systems
allow multiple users to update the data simultaneously. Indeed, today, the largest Internet
retailers may have millions of accesses per day to their data by shoppers. In such an
environment, interaction of concurrent updates is possible and may result in inconsistent data.
Consider department A, with an account balance of $10,000. If two department clerks debit the
account balance (by say $500 and $100, respectively) of department A at almost exactly the
same time, the result of the concurrent executions may leave the budget in an incorrect (or
inconsistent) state.
Suppose that the programs executing on behalf of each withdrawal read the old balance,
reduce that value by the amount being withdrawn, and write the result back. If the two programs
run concurrently, they may both read the value $10,000, and write back $9500 and $9900,
respectively. Depending on which one writes the value last, the account balance of department
A may contain either $9500 or $9900, rather than the correct value of $9400. To guard against
this possibility, the system must maintain some form of supervision.
But supervision is difficult to provide because data may be accessed by many different
application programs that have not been coordinated previously.
As another example:
suppose a registration program maintains a count of students registered for a course, in
order to enforce limits on the number of students registered. When a student registers, the
program reads the current count for the courses, verifies that the count is not already at the limit,
adds one to the count, and stores the count back in the database. Suppose two students register
concurrently, with the count at (say) 39. The two program executions may both read the value
39, and both would then write back 40, leading to an incorrect increase of only 1, even though
two students successfully registered for the course and the count should be 41.
Furthermore, suppose the course registration limit was 40; in the above case both students
would be able to register, leading to a violation of the limit of 40 students.
Security problems:
Not every user of the database system should be able to access all the data. For example,
in a university, payroll personnel need to see only that part of the database that has financial
information. They do not need access to information about academic records. But, since
application programs are added to the file-processing system in an ad hoc manner, enforcing
such security constraints is difficult.
These difficulties, among others, prompted the development of database systems. In what
follows, we shall see the concepts and algorithms that enable database systems to solve the
problems with file-processing systems.
Advantages of DBMS:
Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e storing
same data multiple times). In a database system, by having a centralized database and
centralized control of data by the DBA the unnecessary duplication of data is avoided. It also
eliminates the extra time for processing the large volume of data. It results in saving the storage
space.
INTRODUCTION TO BIG DATA
Big Data is also data but with a huge size. Big Data is a term used to describe a collection of
data that is huge in volume and yet growing exponentially with time. In short such data is so
large and complex that none of the traditional data management tools are able to store it or
process it efficiently.
➤ ―Extremely large data sets that may be analyzed computationally to reveal patterns, trends
and association, especially relating to human behavior and interaction are known as Big Data.‖
Social Media:
The statistic shows that 500+terabytes of new data get ingested into the databases of social
media site Facebook, every day. This data is mainly generated in terms of photo and video
uploads, message exchanges, putting comments etc.
A single Jet engine can generate 10+terabytes of data in 30 minutes of flight time. With many
thousand flights per day, generation of data reaches up to many Petabytes.
Unlike structured data, which is stored in data warehouses, unstructured data is stored in data
lakes. Data lakes preserve the raw format of data as well as all of its information. Data lakes
make data more malleable, unlike data warehouses where data is limited to its defined schema.
DATABASE SCHEMA
Schema is the overall description of the database.
The basic structure of how the data will be stored in the database is called schema.
A database schema is the skeleton structure of the database. It represents the logical view of the
entire database.
A schema contains schema objects like table, foreign key, primary key, views, columns, data
types, stored procedure, etc.
A database schema can be represented by using the visual diagram. That diagram shows the
database objects and relationship with each other.
TYPES OF DB SCHEMA
1.Physical schema
2.Logical schema
3.View schema
Physical schema:
Physical schema is a term used in data management to describe how data is to be represented
and stored (files, indices, et al.) in secondary storage using a particular database management
system (DBMS) (e.g., Oracle RDBMS, Sybase SQL Server, etc.).
Logical schema:
A logical schema can be defined as the design of the database at its logical level. In this level,
the programmers, as well as the database administrator (DBA), work. At this level, data can be
described as certain types of data records that can be stored in the form of data structures.
View schema:
View schema can be defined as the design of the database at the view level, which generally
describes end-user interaction with database systems.
Example:
Let suppose you are storing students' information on a student's table. At the physical level,
these records are described as chunks of storage (in bytes, gigabytes, terabytes, or higher) in
memory, and these elements often remain hidden from the programmers.
Then comes the logical level; here at a logical level, these records can be illustrated as
fields and attributes along with their data type(s); their relationship with each other can be
logically implemented. Programmers generally work at this level because they are aware of
such things about database systems.
At view level, a user can able to interact with the system, with the help of GUI, and enter
the details on the screen. The users are not aware of the fact of how the data is stored and
what data is stored; such features are hidden from them.
DATABASE INSTANCES
The data which is stored in the database at a particular moment of time is called an instance of
the database.
One can easily change these instances using certainoperations, such as deletion and addition of
data and information.
search queries will not make any changes in any instances.
An instance is also called a current state or database state. The database schema that defines
variables in tables which belong to a specific database, the records of these variables at a
particular moment are called the instance of the database.
Example –
Let’s say a table teacher in our database whose name is School, suppose the table has 50 records
so the instance of the database has 50 records for now and tomorrow we are going to add
another fifty records so tomorrow the instance have total 100 records. This is called an
instance.
VIEWS OF DATA
Views of data refer to the different ways to show the database management system to the user
hiding its complexity.
Data Abstraction
Data Abstraction refers to the process of hiding irrelevant details from the user.
Example: If we want to access any mail from our Gmail then we don't know where that data is
physically stored i.e is the data present in India or USA or what data model has been used to
store that data? We are not concerned about these things. We are only concerned with our email.
So, information like these i.e. location of data and data models are irrelevant to us and in data
abstraction, we do this only. Apart from the location of data and data models, there are other
factors that we don't care of. We hide the unnecessary data from the user and this process of
hiding unwanted data is called Data Abstraction.
Data Independence
Data independence defines the extent to which the data schema can be changed at one
level without modifying the data schema at the next level.
Logical data independence describes the degree up to which the logical or conceptual
schema can be changed without modifying the external schema. Now, a question arises what is
the need to change the data schema at a logical or conceptual level?
Well, the changes to data schema at the logical level are made either to enlarge or reduce the
database by adding or deleting more entities, entity sets, or changing the constraints on data.
Mappings
Process of transforming request and results between three level it's called mapping.
There are the two types of mappings:
1.Conceptual/Internal Mapping
2.External/Conceptual Mapping
1. Conceptual/Internal Mapping:
The conceptual/internal mapping defines the correspondence between the conceptual
view and the store database.
It specifies how conceptual record and fields are represented at the internal level.
It relates conceptual schema with internal schema.
If structure of the store database is changed.
If changed is made to the storage structure definition-then the conceptual/internal mapping must
be changed accordingly, so that the conceptual schema can remain invariant.
There could be one mapping between conceptual and internal levels.
2. External/Conceptual Mapping:
The external/conceptual mapping defines the correspondence between a particular
external view and conceptual view.
It relates each external schema with conceptual schema.
The differences that can exist between these two levels are analogous to those that can exist
between the conceptual view and the stored database.
Example: fields can have different data types; fields and record name can be changed; several
conceptual fields can be combined into a single external field.
Any number of external views can exist at the same time; any number of users can share a given
external view: different external views can overlap.
There could be several mapping between external and conceptual levels.
DATA MODELS
Data Models are used to show how data is stored, connected, accessed and updated in the
database management system.
1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
Hierarchical Model was the first DBMS model. This model organises the data in the
hierarchical tree structure. The hierarchy starts from the root which has root data and then it
expands in the form of a tree adding child node to the parent node. This model easily represents
some of the real-world relationships like food recipes, sitemap of a website etc.
Example: We can represent the relationship between the shoes present on a shopping website
in the following way:
1. One-to-many relationship: The data here is organised in a tree-like structure where the
one-to-many relationship is between the datatypes. Also, there can be only one path from
parent to any node. Example: In the above example, if we want to go to the
node sneakers we only have one path to reach there i.e through men's shoes node.
2. Parent-Child Relationship: Each child node has a parent node but a parent node can
have more than one child node. Multiple parents are not allowed.
3. Deletion Problem: If a parent node is deleted then the child node is automatically
deleted.
4. Pointers: Pointers are used to link the parent node with the child node and are used to
navigate between the stored data. Example: In the above example the 'shoes' node points
to the two other nodes 'women shoes' node and 'men's shoes' node.
Advantages of Hierarchical Model
● It is very simple and fast to traverse through a tree-like structure.
● Any change in the parent node is automatically reflected in the child node so, the
integrity of data is maintained.
Disadvantages of Hierarchical Model
● As it does not support more than one parent of the child node so if we have some
complex relationship where a child node needs to have two parent node then that can't be
represented using this model.
Network Model
This model is an extension of the hierarchical model. It was the most popular model
before the relational model. This model is the same as the hierarchical model, the only
difference is that a record can have more than one parent. It replaces the hierarchical tree with a
graph. Example: In the example below we can see that node student has two parents i.e. CSE
Department and Library. This was earlier not possible in the hierarchical model.
1. Ability to Merge more Relationships: In this model, as there are more relationships so
data is more related. This model has the ability to manage one-to-one relationships as
well as many-to-many relationships.
2. Many paths: As there are more relationships so there can be more than one path to the
same record. This makes data access
● The data can be accessed faster as compared to the hierarchical model. This is because
the data is more related in the network model and there can be more than one path to
reach a particular node. So the data can be accessed in many ways.
● As more and more relationships need to be handled the system might get complex. So, a
user must be having detailed knowledge of the model to work with the model.
Entity-Relationship Model
● Relationship: Relationship tells how two attributes are related. Example: Teacher works
for a department.
Example:
In the above diagram, the entities are Teacher and Department. The attributes of Teacher entity
are Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The attributes of
entity Department entity are Dept_id, Dept_name. The two entities are connected using the
relationship. Here, each teacher works for a department.
Features of ER Model
● Database Design: This model helps the database designers to build the database and is
widely used in database design.
Advantages of ER Model
● Simple: Conceptually ER Model is very easy to build. If we know the relationship
between the attributes and the entities we can easily build the ER Diagram for the model.
● Effective Communication Tool: This model is used widely by the database designers
for communicating their ideas.
● Easy Conversion to any Model: This model maps well to the relational model and can
be easily converted relational model by converting the ER model to the table. This model
can also be converted to any other model like network model, hierarchical model etc.
Disadvatages of ER Model
Relational Model
Relational Model is the most widely used model. In this model, the data is maintained in the
form of a two-dimensional table. All the information is stored in the form of row and columns.
The basic structure of a relational model is tables. So, the tables are also called relations in the
relational model. Example: In this example, we have an Employee table.
● Tuples: Each row in the table is called tuple. A row contains all the information about
any instance of the object. In the above example, each row has all the information about
any specific individual like the first row has information about John.
● Attribute or field: Attributes are the property which defines the table or relation. The
values of the attribute should be from the same domain. In the above example, we have
different attributes of the employee like Salary, Mobile_no, etc.
Advantages of Relational Model
● Simple: This model is more simple as compared to the network and hierarchical model.
● Scalable: This model can be easily scaled as we can add as many rows and columns we
want.
● Hardware Overheads: For hiding the complexities and making things easier for the
user this model requires more powerful hardware computers and data storage devices.
● Bad Design: As the relational model is very easy to design and use. So the users don't
need to know how the data is stored in order to access it. This ease of design can lead to
the development of a poor database which would slow down if the database grows.
But all these disadvantages are minor as compared to the advantages of the relational model.
These problems can be avoided with the help of proper implementation and organisation.
The real-world problems are more closely represented through the object-oriented data model.
In this model, both the data and relationship are present in a single structure known as an object.
We can store audio, video, images, etc in the database which was not possible in the relational
model(although you can store audio and video in relational database, it is adviced not to store in
the relational database). In this model, two are more objects are connected through links. We
use this link to relate one object to other objects. This can be understood by the example given
below.
In the above example, we have two objects Employee and Department. All the data and
relationships of each object are contained as a single unit. The attributes like Name, Job_title of
the employee and the methods which will be performed by that object are stored as a single
object. The two objects are connected through a common attribute i.e the Department_id and the
communication between these two will be done with the help of this common id.
Object-Relational Model
As the name suggests it is a combination of both the relational model and the object-
oriented model. This model was built to fill the gap between object-oriented model and the
relational model. We can have many advanced features like we can make complex data types
according to our requirements using the existing data types. The problem with this model is that
this can get complex and difficult to handle. So, proper understanding of this model is required.
It is a simple model in which the database is represented as a table consisting of rows and
columns. To access any data, the computer has to read the entire table. This makes the modes
slow and inefficient.
Semi-Structured Model
Example: Web-Based data sources which we can't differentiate between the schema and
data of the website. In this model, some entities may have missing attributes while others may
have an extra attribute. This model gives flexibility in storing the data. It also gives flexibility to
the attributes. Example: If we are storing any value in any attribute then that value can be either
atomic value or a collection of values.
Associative Data Model
Associative Data Model is a model in which the data is divided into two parts. Everything
which has independent existence is called as an entity and the relationship among these entities
are called association. The data divided into two parts are called items and links.
● Item: Items contain the name and the identifier(some numeric value).
1. The world cup is being hosted by London. The source here is 'the world cup', the verb 'is
being' and the target is 'London'.
2. ...from 30 May 2020. The source here is the previous link, the verb is 'from' and the
target is '30 May 2020'.
This is represented using the table as follows:
Context Data Model is a collection of several models. This consists of models like
network model, relational models etc. Using this model we can do various types of tasks which
are not possible using any model alone.
DATABASE ARCHITECTURE
DML query
A data manipulation language (DML) is a computer programming language used for adding
(inserting), deleting, and modifying (updating) data in a database.
DDL Interpreter
The DDL interpreter interprets DDL statements and records the definition in the data
dictionary.
DML Compiler
The DML compiler translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands.
Query evaluation engine executes low level instructions generated by the DML compiler.
Object code
Storage Manager:
The storage manager is the component of a database system that provides the interface between
the low-level data stored in the database and the application programs and queries submitted to
the system.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database. The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory. The storage manager implements several data structures as part of the
physical system implementation:
• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which can provide fast access to data items. Like the index in this textbook, a
database index provides pointers to those data items that hold a particular value
A database is a means of storing information in such a way that information can be retrieved
from it. In simplest terms, a relational database is one that presents information in tables with
rows and columns. A table is referred to as a relation in the sense that it is a collection of objects
of the same type (rows). Data in a table can be related according to common keys or concepts,
and the ability to retrieve related data from a table is the basis for the term relational database.
RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL,
and for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and
Microsoft Access.
A Relational database management system (RDBMS) is a database management system
(DBMS) that is based on the relational model as introduced by E. F. Codd.
Table/Relation
Properties of a Relation:
o Each relation has a unique name by which it is identified in the database.
o Relation does not contain duplicate tuples.
o The tuples of a relation have no specific order.
o All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one
value.
A table is the simplest example of data stored in RDBMS.
Let's see the example of the student table.
ID Name AGE COURSE
1 Ajeet 24 B.Tech
2 aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
Row or record
A row of a table is also called a record or tuple. It contains the specific information of each
entry in the table. It is a horizontal entity in the table. For example, The above table contains 5
records.
Properties of a row:
o No two tuples are identical to each other in all their entries.
o All tuples of the relation have the same format and the same number of entries.
o The order of the tuple is irrelevant. They are identified by their content, not by their
position.
1 Ajeet 24 B.Tech
Column/attribute
A column is a vertical entity in the table which contains all information associated with a
specific field in a table. For example, "name" is a column in the above table which contains all
information about a student's name.
Properties of an Attribute:
o Every attribute of a relation must have a name.
o Null values are permitted for the attributes.
o Default values can be specified for an attribute automatically inserted if no other value is
specified for an attribute.
o Attributes that uniquely identify each tuple of a relation are the primary key.
Name
Ajeet
Aryan
Mahesh
Ratan
Vimal
Data item/Cells
The smallest unit of data in the table is the individual data item. It is stored at the intersection of
tuples and attributes.
Properties of data items:
o Data items are atomic.
o The data items for an attribute should be drawn from the same domain.
In the below example, the data item in the student table consists of Ajeet, 24 and Btech, etc.
1 Ajeet 24 B.Tech
Degree:
The total number of attributes that comprise a relation is known as the degree of the table.
For example, the student table has 4 attributes, and its degree is 4.
1 Ajeet 24 B.Tech
2 aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
Cardinality:
The total number of tuples at any one time in a relation is known as the table's cardinality. The
relation whose cardinality is 0 is called an empty table.
For example, the student table has 5 rows, and its cardinality is 5.
Domain:
The domain refers to the possible values each attribute can contain. It can be specified
using standard data types such as integers, floating numbers, etc. For example, An attribute
entitled Marital_Status may be limited to married or unmarried values.
NULL Values
The NULL value of the table specifies that the field has been left blank during record
creation. It is different from the value filled with zero or a field that contains space.
Relational Model
1. Attribute: Each column in a Table. Attributes are the properties which define a relation.
e.g., Student_Rollno, NAME,etc.
2. Tables – In the Relational model the, relations are saved in the table format. It is stored
along with its entities. A table has two properties rows and columns. Rows represent
records and columns represent attributes.
3. Tuple – It is nothing but a single row of a table, which contains a single record.
4. Relation Schema: A relation schema represents the name of the relation with its
attributes.
5. Degree: The total number of attributes which in the relation is called the degree of the
relation.
6. Cardinality: Total number of rows present in the Table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance – Relation instance is a finite set of tuples in the RDBMS system.
Relation instances never have duplicate tuples.
9. Relation key – Every row has one, two or multiple attributes, which is called relation
key.
10. Attribute domain – Every attribute has some pre-defined value and scope which is
known as attribute domain
Properties of Relations
Integrity Constraints
o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have to
be performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The
value of the attribute must be available in the corresponding domain.
Example:
o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation and
if the primary key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.
Example:
Example:
4. Key constraints
o Keys are the entity set that is used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, but out of which one key will be the primary key. A
primary key can contain a unique and null value in the relational table.
Example:
KEYS
For example, ID is used as a key in the Student table because it is unique for each student. In
the PERSON table, passport_number, license_number, SSN are keys since they are unique for
each person.
Types of keys:
Primary key
o It is the first key used to identify one and only one instance of an entity uniquely. An
entity can contain multiple keys, as we saw in the PERSON table. The key which is most
suitable from those lists becomes a primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. In the EMPLOYEE table, we can even select License_Number and
Passport_Number as primary keys since they are also unique.
o For each entity, the primary key selection is based on requirements and developers.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o Except for the primary key, the remaining attributes are considered a candidate key. The
candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the
attributes, like SSN, Passport_Number, License_Number, etc., are considered a candidate key.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key.
4. Foreign key
o Foreign keys are the column of the table used to point to the primary key of another
table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we can't store the department's information in
the employee table. That's why we link these two tables through the primary key of one
table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute
in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each
tuple in a relation. These attributes or combinations of the attributes are called the candidate
keys. One key is chosen as the primary key from these candidate keys, and the remaining
candidate key, if it exists, is termed the alternate key. In other words, the total number of the
alternate keys is the total number of candidate keys minus the primary key. The alternate key
may or may not exist. If there is only one candidate key in a relation, it does not have an
alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as
candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other
candidate key, PAN_No, acts as the Alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key.
This key is also known as Concatenated Key.
For example, in employee relations, we assume that an employee may be assigned multiple
roles, and an employee may work on multiple projects simultaneously. So the primary key will
be composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So
these attributes act as a composite key since the primary key comprises more than one attribute.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input
and yields instances of relations as output. It uses operators to perform queries. An operator can
be either unary or binary. They accept relations as their input and yield relations as their
output. Relational algebra is performed recursively on a relation and intermediate results are
also considered relations.
The fundamental operations of relational algebra are as follows −
● Select
● Project
● Union
● Set different
● Cartesian product
● Rename
The result of set difference operation is tuples, which are present in one relation but are not in
the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not articles.
The results of relational algebra are also relations but without any name. The rename operation
allows us to rename the output relation. 'rename' operation is denoted with small Greek
letter rho ρ.
Notation − ρ x (E)
Where the result of expression E is saved with name of x.
Note – We are going to use below relation tables to show different dbms relational algebra
examples. These tables will be used for different examples shown in this post.
R-Schema(id, name)
R – Relation
Id Name
101 Raj
102 Rahul
103 Sachin
104 Anil
105 Prasad
S-Schema(id, name)
S – Relation
Id Name
101 Raj
104 Anil
106 Kapil
107 Sumit
The first additional relational algebra operations in dbms is Set intersection operation –
To make this task easier, there is an operation defined in relational algebra. This is called set
intersection operation.
Set Intersection operation selects common tuples from the two relations. It is denoted by
(∩)
For set intersection operation, the two-argument relation must be compatible relation.
So, if R and S are two relations, P = R ∩ S has tuples drawn from R and S, such that each tuple in P is in
R and S.
P=R∩S
Id Name
101 Raj
104 Anil
It means only row (101, Raj) and (104, Anil) are common in two relations (R and S).
▪ T is in πR – S (r)
▪ For every tuple ts in S, there is a tuple tr in R satisfying both of the following:
▪
1. tr[s] = ts[s]
2. tr[R-S] = t
Or,
In other words, we can say
Division operation R ÷ S can only be applied if and only if –
P=R÷S
Where,
Pis result we get after applying division operator,
R and S stands for relation (name of the tables) on which division operation is applied.
A1 B1
A1 B2
A2 B1
A3 B1
A4 B2
A5 B1
A5 B2
Now, answer below questions related to join operation –
Question A. Find P ÷ Q if Q is
B1
B2
Question B. Find P ÷ Q if Q is
B1
Question C. Find P ÷ Q if Q is
or,
R ← E.
Where,
Ris relation,
rstands for relation variable.
Eis Expression whose result we wish to assign to relation variable R.
The result of the expression to the right hand side of ← is assigned to relation variable on the left
side of ←. The
relation variable may be used in subsequent expressions.
R1 ← πname(Customer)
R2 ← πname(Employee)
R = R1 – R2
The fourth additional relational algebra operations in dbms is Natural Join operation
(4) Natural Join Operation (⋈)
It is denoted by the join symbol ⋈. The natural join operation forms a Cartesian product of its two
arguments,
performs selection forcing equality on those attributes that appear in both relation schema, and finally
removes
duplicate attributes. Natural join can be defined as:
P=R⋈S
Where,
Pis resultant relation after applying natural join operation on R and S,
R and S stands for relation (name of the table).
Employee relation
Id Name
101 Sachin
103 Rahul
104 Kapil
107 Ajay
Salary relation
Id Salary
101 65000
103 35000
104 22000
107 21910
When we perform join operation on Relation Employee and Salary we get
The outer join operation is an extension of the join operation to deal with missing information.
Suppose we have the relations with the following schemas which contain data on permanent doctors.
Doctor relation
It contains id and name of the doctor.
Doc-id Name
1 Anil
2 Ganesh
3 Sunil
4 Reena
= (Doctors ⋈ Permanent-Doc)
The result of above is shown below :
It is seen that, we have information about Doctor with docid=3 and his address, birthdate, sal. Since the
tuple describing docid=3, is absent in Permanent-doc. Similarly we have lost the name of docid=5, since
the tuple describing docid=5, is absent from Doctors relation.
The outer join operation can be used to avoid this loss of information. There are 3 forms of the
operation.
The left outer join takes all tuples in the left relation that did not match with any tuple in the right
relation pads the tuples with NULL values for all other attributes from the right relation, and adds them
to the result of natural join.
P = R ⟕S
Where,
Pis resultant relation after applying natural join operation on R and S,
R and S stands for relation (name of the table).
Result of the left outer join operation when applied on Doctors and Permanent-Doc relation –
The right outer join is symmetric with the left outer join. Tuples from the right relation that did not
match any from the left relation are padded with Nulls and added to the result of Natural join.