Rdbms Unit - II
Rdbms Unit - II
Data Models:
A Data Model in Database Management System (DBMS) is the concept of tools that are
developed to summarize the description of the database. Data Models provide us with a
transparent picture of data which helps us in creating an actual database. It shows us from the
design of the data to its proper implementation of data.
Types of Relational Models
1. Conceptual Data Model
2. Representational Data Model
3. Physical Data Model
It is basically classified into 3 types: -
4 SURESH DELHI 18
Important Terminologies
Attribute: Attributes are the properties that define an entity.
e.g., ROLL_NO, NAME, ADDRESS
Relation Schema: A relation schema defines the structure of the relation and
represents the name of the relation with its attributes. e.g., STUDENT (ROLL_NO,
NAME, ADDRESS, PHONE, and AGE) is the relation schema for STUDENT. If a
schema has more than 1 relation, it is called Relational Schema.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4
tuples, one of which is shown as:
RA
1 DELHI 9455123451 18
M
NULL Values: The value which is not known or unavailable is called a NULL value.
It is represented by blank space. e.g., PHONE of STUDENT having ROLL_NO 4 is
NULL.
Relation Key: These are basically the keys that are used to identify the rows uniquely
or also help in identifying tables. These are of the following types.
Primary Key
Candidate Key
Super Key
Foreign Key
Alternate Key
Composite Key
Constraints in Relational Model
While designing the Relational Model, we define some conditions which must hold for data
present in the database are called Constraints. These constraints are checked before
performing any operation (insertion, deletion, and updation ) in the database. If there is a
violation of any of the constraints, the operation will fail.
Domain Constraints
These are attribute-level constraints. An attribute can only take values that lie inside the
domain range. e.g., If a constraint AGE>0 is applied to STUDENT relation, inserting a
negative value of AGE will result in failure.
Key Integrity
Every relation in the database should have at least one set of attributes that defines a tuple
uniquely. Those set of attributes is called keys. e.g., ROLL_NO in STUDENT is key. No two
students can have the same roll number. So, a key has two properties:
It should be unique for all tuples.
It can’t have NULL values.
Referential Integrity
When one attribute of a relation can only take values from another attribute of the same
relation or any other relation, it is called referential integrity. Let us suppose we have 2
relations.
Table Student
4 SURESH DELHI 18 IT
Table Branch
BRANCH_CODE BRANCH_NAME
CS COMPUTER SCIENCE
IT INFORMATION TECHNOLOGY
CV CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take the values which are present in
BRANCH_CODE of BRANCH which is called referential integrity constraint. The relation
which is referencing another relation is called REFERENCING RELATION (STUDENT in
this case) and the relation to which other relations refer is called REFERENCED RELATION
(BRANCH in this case).
Anomalies in the Relational Model
An anomaly is an irregularity or something which deviates from the expected or normal state.
When designing databases, we identify three types of anomalies: Insert, Update, and Delete.
Insertion Anomaly in Referencing Relation
We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is not
present in the referenced attribute value. e.g., Insertion of a student with BRANCH_CODE
‘ME’ in STUDENT relation will result in an error because ‘ME’ is not present in
BRANCH_CODE of BRANCH.
Deletion/ Updation Anomaly in Referenced Relation:
We can’t delete or update a row from REFERENCED RELATION if the value of
REFERENCED ATTRIBUTE is used in the value of REFERENCING ATTRIBUTE. e.g; if
we try to delete a tuple from BRANCH having BRANCH_CODE ‘CS’, it will result in an
error because ‘CS’ is referenced by BRANCH_CODE of STUDENT, but if we try to delete
the row from BRANCH with BRANCH_CODE CV, it will be deleted as the value is not used
by referencing relation. It can be handled by the following method:
On Delete Cascade
It would delete the tuples from REFERENCING RELATION if the value used by
REFERENCING ATTRIBUTE is deleted from REFERENCED RELATION. e.g., For, if we
delete a row from BRANCH with BRANCH_CODE ‘CS’, the rows in STUDENT relation
with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be deleted.
On Update Cascade
It will update the REFERENCING ATTRIBUTE in REFERENCING RELATION if the
attribute value used by REFERENCING ATTRIBUTE is updated in REFERENCED
RELATION. e.g; if we update a row from BRANCH with BRANCH_CODE ‘CS’ to ‘CSE’,
the rows in STUDENT relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case)
will be updated with BRANCH_CODE ‘CSE’.
Super Keys
Any set of attributes that allows us to identify unique rows (tuples) in a given relationship is
known as super keys. Out of these super keys, we can always choose a proper subset among
these that can be used as a primary key. Such keys are known as Candidate keys. If there is a
combination of two or more attributes that are being used as the primary key, then we call it a
Composite key.
Codd Rules in Relational Model
Edgar F Codd proposed the relational database model where he stated rules. Now these are
known as Codd’s Rules. For any database to be the perfect one, it has to follow the rules.
Advantages of the Relational Model
Simple model: Relational Model is simple and easy to use in comparison to other
languages.
Flexible: Relational Model is more flexible than any other relational model present.
Secure: Relational Model is more secure than any other relational model.
Data Accuracy: Data is more accurate in the relational data model.
Data Integrity: The integrity of the data is maintained in the relational model.
Operations can be Applied Easily: It is better to perform operations in the relational
model.
Disadvantages of the Relational Model
Relational Database Model is not very good for large databases.
Sometimes, it becomes difficult to find the relation between tables.
Because of the complex structure, the response time for queries is high.
Characteristics of the Relational Model
Data is represented in rows and columns called relations.
Data is stored in tables having relationships between them called the Relational
model.
The relational model supports the operations like Data definition, Data manipulation,
and Transaction management.
Each column has a distinct name, and they are representing attributes.
Each row represents a single entity.
Keys are one of the basic requirements of a relational database model. It is widely used to
identify the tuples(rows) uniquely in the table. We also use keys to set up relations amongst
various columns and tables of a relational database.
DOMAIN - TUPLE and Relation:
A domain is the set of possible values for a given attribute, and it can be considered as a
constraint on the value of the attribute, such as value of an attribute marks of a subject must
be integer value like 56 and it should be within the range of 0 to full marks.
A tuple is a complete row of the table. It is also called a record. It is data sets representing
attributes of a single item. A data model define how data are organized in database and how
the data can be accessed.
The candidate key can be simple (having only one attribute) or composite as well.
Example:
{STUD_NO, COURSE_NO} is a composite
candidate key for relation STUDENT_COURSE.
Table STUDENT_COURSE
STUD_NO TEACHER_NO COURSE_NO
1 001 C001
2 056 C005
Note: In SQL Server a unique constraint that has a nullable column, allows the value ‘null‘in
that column only once. That’s why the STUD_PHONE attribute is a candidate here but
cannot be a ‘null’ value in the primary key attribute.
2. Primary Key: There can be more than one candidate key in relation out of which one can
be chosen as the primary key. For Example, STUD_NO, as well as STUD_PHONE, are
candidate keys for relation STUDENT but STUD_NO can be chosen as the primary key
(only one out of many candidate keys).
It is a unique key.
It can identify only one tuple (a record) at a time.
It has no duplicate values; it has unique values.
It cannot be NULL.
Primary keys are not necessarily to be a single column; more than one column can
also be a primary key for a table.
Example:
STUDENT table -> Student (STUD_NO, SNAME,
ADDRESS, PHONE), STUD_NO is a primary key
Table STUDENT
3. Super Key: The set of attributes that can uniquely identify a tuple is known as Super Key.
For Example, STUD_NO, (STUD_NO, STUD_NAME), etc. A super key is a group of single
or multiple keys that identifies rows in a table. It supports NULL values.
Adding zero or more attributes to the candidate key generates the super key.
A candidate key is a super key but vice versa is not true.
Super Key values may also be NULL.
Example:
Consider the table shown above.
STUD_NO+PHONE is a super key.
1 005 C001
2 056 C005
It may be worth noting that, unlike the Primary Key of any given relation, Foreign Key can
be NULL as well as may contain duplicate tuples i.e., it need not follow uniqueness
constraint. For Example, STUD_NO in the STUDENT_COURSE relation is not unique. It
has been repeated for the first and third tuples. However, the STUD_NO in STUDENT
relation is a primary key and it needs to be always unique, and it cannot be null.
Relational Constraints
These are the restrictions or sets of rules imposed on the database contents. It validates the
quality of the database. It validates the various operations like data insertion, updating, and
other processes which must be performed without affecting the integrity of the data. It
protects us against threats/damages to the database. Mainly Constraints on the relational
database are of 4 types.
1. Domain constraints
2. Key constraints or Uniqueness Constraints
3. Entity Integrity constraints
4. Referential integrity constraints
Types of Relational Constraints
Let’s discuss each of the above constraints in detail.
1. Domain Constraints
1. Every domain must contain atomic values (smallest indivisible units) which means
composite and multi-valued attributes are not allowed.
2. We perform a datatype check here, which means when we assign a data type to a
column, we limit the values that it can contain. Eg. If we assign the datatype of
attribute age as int, we can’t give it values other than int datatype.
Example:
123456789
01 Bikash Dutta
234456678
Explanation: In the above relation, Name is a composite attribute and Phone is a multi-
values attribute, so it is violating domain constraint.
2. Key Constraints or Uniqueness Constraints
1. These are called uniqueness constraints since it ensures that every tuple in the relation
should be unique.
2. A relation can have multiple keys or candidate keys(minimal super key), out of which
we choose one of the keys as the primary key, we don’t have any restriction on
choosing the primary key out of candidate keys, but it is suggested to go with
the candidate key with less number of attributes.
3. Null values are not allowed in the primary key, hence Not Null constraint is also part
of the key constraint.
Example:
EID Name Phone
01 Bikash 6000000009
02 Paul 9000090009
01 Tuhin 9234567892
Explanation: In the above table, EID is the primary key, and the first and the last tuple have
the same value in EID ie 01, so it is violating the key constraint.
3. Entity Integrity Constraints:
1. Entity Integrity constraints say that no primary key can take a NULL value, since
using the primary key we identify each tuple uniquely in a relation.
Example:
01 Bikash 9000900099
02 Paul 600000009
Explanation: In the above relation, EID is made the primary key, and the primary key can’t
take NULL values but in the third tuple, the primary key is null, so it is violating Entity
Integrity constraints.
4. Referential Integrity Constraints
1. The Referential integrity constraint is specified between two relations or tables and
used to maintain the consistency among the tuples in two relations.
2. This constraint is enforced through a foreign key, when an attribute in the foreign
key of relation R1 has the same domain(s) as the primary key of relation R2, then the
foreign key of R1 is said to reference or refer to the primary key of relation R2.
3. The values of the foreign key in a tuple of relation R1 can either take the values of the
primary key for some tuple in relation R2, or can take NULL values, but can’t be
empty.
Example:
01 Divine 12
02 Dino 22
04 Vivian 14
DNO Place
12 Jaipur
13 Mumbai
14 Delhi
Explanation: In the above tables, the DNO of Table 1 is the foreign key, and DNO in Table 2
is the primary key. DNO = 22 in the foreign key of Table 1 is not allowed because DNO =
22 is not defined in the primary key of table 2. Therefore, Referential integrity constraints are
violated here.
Advantages of Relational Database Model
It is simpler than the hierarchical model and network model.
It is easy and simple to understand.
Its structure can be changed anytime upon requirement.
Data Integrity: The relational database model enforces data integrity through various
constraints such as primary keys, foreign keys, and unique constraints. This ensures
that the data in the database is accurate, consistent, and valid.
Flexibility: The relational database model is highly flexible and can handle a wide
range of data types and structures. It also allows for easy modification and updating of
the data without affecting other parts of the database.
Scalability: The relational database model can scale to handle large amounts of data
by adding more tables, indexes, or partitions to the database. This allows for better
performance and faster query response times.
Security: The relational database model provides robust security features to protect
the data in the database. These include user authentication, authorization, and
encryption of sensitive data.
Data consistency: The relational database model ensures that the data in the database
is consistent across all tables. This means that if a change is made to one table, the
corresponding changes will be made to all related tables.
Query Optimization: The relational database model provides a query optimizer that
can analyze and optimize SQL queries to improve their performance. This allows for
faster query response times and better scalability.
Disadvantages of the Relational Model
Few database relations have certain limits which can’t be expanded further.
It can be complex, and it becomes hard to use.
Complexity: The relational model can be complex and difficult to understand,
particularly for users who are not familiar with SQL and database design principles.
This can make it challenging to set up and maintain a relational database.
Performance: The relational model can suffer from performance issues when dealing
with large data sets or complex queries. Joins between tables can be slow, and
indexing strategies can be difficult to optimize.
Scalability: While the relational model is generally scalable, it can become difficult
to manage as the database grows. Adding new tables or indexes can be time-
consuming, and managing relationships between tables can become complex.
Cost: Relational databases can be expensive to license and maintain, particularly for
large-scale deployments. Additionally, relational databases often require dedicated
hardware and specialized software to run, which can add to the cost.
Limited flexibility: The relational model is designed to work with tables that have
predefined structures and relationships. This can make it difficult to work with data
that does not fit neatly into a table-based format, such as unstructured or semi-
structured data.
Data redundancy: In some cases, the relational model can lead to data redundancy,
where the same data is stored in multiple tables. This can lead to inefficiencies and
can make it difficult to ensure data consistency across the database.
Entity Relationship (ER) Model:
The Entity Relational Model is a model for identifying entities to be represented in the
database and representation of how those entities are related. The ER data model specifies
enterprise schema that represents the overall logical structure of a database graphically.
The Entity Relationship Diagram explains the relationship among the entities present in the
database. ER models are used to model real-world objects like a person, a car, or a company
and the relation between these real-world objects. In short, the ER Diagram is the structural
format of the database.
Why Use ER Diagrams In DBMS?
ER diagrams are used to represent the E-R model in a database, which makes them
easy to be converted into relations (tables).
ER diagrams provide the purpose of real-world modeling of objects which makes
them intently useful.
ER diagrams require no technical knowledge and no hardware support.
These diagrams are very easy to understand and easy to create even for a naive user.
It gives a standard solution for visualizing the data logically.
Symbols Used in ER Model
ER Model is used to model the logical view of the system from a data perspective which
consists of these symbols:
Rectangles: Rectangles represent Entities in the ER Model.
Ellipses: Ellipses represent Attributes in the ER Model.
Diamond: Diamonds represent Relationships among Entities.
Lines: Lines represent attributes to entities and entities sets with other relationship
types.
Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
Double Rectangle: Double Rectangle represents a Weak Entity.
Symbols used in ER Diagram
Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database
System.
Components of ER Diagram
Entity
An Entity may be an object with a physical existence – a particular person, car, house, or
employee – or it may be an object with a conceptual existence – a company, a job, or a
university course.
Entity Set: An Entity is an object of Entity Type, and a set of all entities is called an entity
set. For Example, E1 is an entity having Entity Type Student and the set of all students is
called Entity Set. In ER diagram, Entity Type is represented as:
Entity Set
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute. Strong Entity does not depend on
other Entity in the Schema. It has a primary key, that helps in identifying it uniquely, and it is
represented by a rectangle. These are called Strong Entity Types.
2. Weak Entity
An Entity type has a key attribute that uniquely identifies each entity in the entity set. But
some entity type exists for which key attributes can’t be defined. These are called Weak
Entity types.
For Example, A company may store the information of dependents (Parents, Children,
Spouse) of an Employee. But the dependents don’t have existed without the employee. So
Dependent will be a Weak Entity Type and Employee will be Identifying Entity type for
Dependent, which means it is Strong Entity Type.
A weak entity type is represented by a Double Rectangle. The participation of weak entity
types is always total. The relationship between the weak entity type and its identifying strong
entity type is called identifying relationship and it is represented by a double diamond.
Attributes
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB,
Age, Address, and Mobile_No are the attributes that define entity type Student. In ER
diagram, the attribute is represented by an oval.
Attribute
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key
attribute. For example, Roll_No will be unique for each student. In ER diagram, the key
attribute is represented by an oval with underlying lines.
Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For
example, the Address attribute of the student Entity type consists of Street, City, State, and
Country. In ER diagram, the composite attribute is represented by an oval comprising of
ovals.
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example, Phone_No
(can be more than one for a given student). In ER diagram, a multivalued attribute is
represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. e.g., Age (can be derived from DOB). In ER diagram, the derived attribute is
represented by a dashed oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Entity and Attributes
Relationship Type
A Relationship Type represents the association between entity types. For example, ‘Enrolled
in’ is a relationship type that exists between entity type Student and Course. In ER diagram,
the relationship type is represented by a diamond and connecting the entities with lines.
1. Unary Relationship: When there is only ONE entity set participating in a relation, the
relationship is called a unary relationship. For example, one person is married to only one
person.
Unary Relationship
2. Binary Relationship: When there are TWO entities set participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship
3. n-ary Relationship: When there are n entities set participating in a relation, the
relationship is called an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is known
as cardinality. Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in the
relationship, the cardinality is one-to-one. Let us assume that a male can marry one female
and a female can marry one male. So, the relationship will be one-to-one.
the total number of tables that can be used in this is 2.
The initial relational schema is expressed in the following format writing the table names
with the attributes list inside a parentheses as shown below for
Persons (personid, name, lastname, email)
Persons and Phones are Tables. name, lastname, are Table Columns (Attributes).
[info]personid is the primary key for the table: Person[/info]
2. Multi-Valued Attributes
A multi-valued attribute is usually represented with a double-line oval.
If you have a multi-valued attribute, take the attribute, and turn it into a new entity or table of
its own. Then make a 1: N relationship between the new entity and the existing one. In simple
words. 1. Create a table for the attribute. 2. Add the primary (id) column of the parent entity
as a foreign key within the new table as shown below:
Persons (personid , name, lastname, email )
Phones ( phoneid , personid, phone )
[info]personid within the table Phones is a foreign key referring to the personid of
Persons[/info]
3. 1:1 Relationships
To keep it simple and even for better performances at data retrieval, I would personally
recommend using attributes to represent such relationship. For instance, let us consider the
case where the Person has or optionally has one wife. You can place the primary key of the
wife within the table of the Persons which we call in this case foreign key as shown below.
Persons (personid , name, lastname, email , wifeid )
Wife ( wifeid , name )
Or vice versa to put the personid as a foreign key within the Wife table as shown below:
Persons (personid , name, lastname, email )
Wife ( wifeid , name , personid)
[info]For cases when the Person is not married i.e., has no wifeID, the attribute can set to
NULL[/info]
4. 1: N Relationships
This is the tricky part! For simplicity, use attributes in the same way as 1:1 relationship but
we have only one choice as opposed to two choices. For instance, the Person can have
a House from zero to many, but a House can have only one Person. To represent such
relationship the personid as the Parent node must be placed within the Child table as a
foreign key but not the other way around as shown next:
There are some points for converting the ER diagram to the table:
Backward Skip 10sPlay VideoForward Skip 10s
o Entity type becomes a table.
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE forms individual
tables.
o All single-valued attribute becomes a column for the table.
In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the column of
STUDENT table. Similarly, COURSE_NAME and COURSE_ID form the column of
COURSE table and so on.
o A key attribute of the entity type represented by the primary key.
In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID
are the key attribute of the entity.
o The multivalued attribute is represented by a separate table.
In the student table, a hobby is a multivalued attribute. So, it is not possible to represent
multiple values in a single column of STUDENT table. Hence, we create a table
STUD_HOBBY with column name STUDENT_ID and HOBBY. Using both the column, we
create a composite key.
o Composite attribute represented by components.
In the given ER diagram, student address is a composite attribute. It contains CITY, PIN,
DOOR#, STREET, and STATE. In the STUDENT table, these attributes can merge as an
individual column.
o Derived attributes are not considered in the table.
In the STUDENT table, Age is the derived attribute. It can be calculated at any point of time
by calculating the difference between current date and Date of Birth.
Using these rules, you can convert the ER diagram to tables and columns and assign the
mapping between the tables. Table structure for the given ER diagram is as below:
42 abc 17
43 pqr 18
44 xyz 18
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
42 abc 17
43 pqr 18
roll_no name age
44 xyz 18
45 abc 19
enrol_n
o name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect
functional dependency, hence called Transitive functional dependency.
5. Fully Functional Dependency
In full functional dependency an attribute or a set of attributes uniquely determines
another attribute or set of attributes. If a relation R has attributes X, Y, Z with the
dependencies X->Y and X->Z which states that those dependencies are fully functional.
6. Partial Functional Dependency
In partial functional dependency a non-key attribute depends on a part of the
composite key, rather than the whole key. If a relation R has attributes X, Y, Z where X and Y
are the composite key and Z is non key attribute. Then X->Z is a partial functional
dependency in RBDMS.
Normalization:
-is the process of minimizing redundancy from a relation or set of relations. Redundancy in
relation may cause insertion, deletion, and update anomalies. So, it helps to minimize the
redundancy in relations. Normal forms are used to eliminate or reduce redundancy in
database tables.
What is Database Normalization?
In database management systems (DBMS), normal forms are a series of guidelines that help
to ensure that the design of a database is efficient, organized, and free from data anomalies.
There are several levels of normalization, each with its own set of guidelines, known as
normal forms.
Important Points Regarding Normal Forms in DBMS
First Normal Form (1NF): This is the most basic level of normalization. In 1NF,
each table cell should contain only a single value, and each column should have a
unique name. The first normal form helps to eliminate duplicate data and simplify
queries.
Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each
non-key attribute be dependent on the primary key. This means that each column
should be directly related to the primary key, and not to other columns.
Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key
attributes are independent of each other. This means that each column should be
directly related to the primary key, and not to any other columns in the same table.
Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures
that each determinant in a table is a candidate key. In other words, BCNF ensures that
each non-key attribute is dependent only on the candidate key.
Normal forms help to reduce data redundancy, increase data consistency, and improve
database performance. However, higher levels of normalization can lead to more complex
database designs and queries. It is important to strike a balance between normalization and
practicality when designing a database.
Advantages of Normal Form:
Reduced data redundancy: Normalization helps to eliminate duplicate data in
tables, reducing the amount of storage space needed and improving database
efficiency.
Improved data consistency: Normalization ensures that data is stored in a consistent
and organized manner, reducing the risk of data inconsistencies and errors.
Simplified database design: Normalization provides guidelines for organizing tables
and data relationships, making it easier to design and maintain a database.
Improved query performance: Normalized tables are typically easier to search and
retrieve data from, resulting in faster query performance.
Easier database maintenance: Normalization reduces the complexity of a database
by breaking it down into smaller, more manageable tables, making it easier to add,
modify, and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase database
efficiency, and simplify database design and maintenance.
First Normal Form:
If a relation contains composite or multi-valued attribute, it violates first normal form, or a
relation is in first normal form if it does not contain any composite or multi-valued attribute.
A relation is in first normal form if every attribute in that relation is singled valued attribute.
Example 1 – Relation STUDENT in table 1 is not in 1NF because of multi-valued
attribute STUD_PHONE. Its decomposition into 1NF has been shown in table
2.
Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M C2, c3
In the above table Course is a multi-valued attribute so it is not in 1NF. Below Table is
in 1NF as there is no multi-valued attribute
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
Second Normal Form:
To be in second normal form, a relation must be in first normal form and relation must not
contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no
non-prime attribute (attributes which are not part of any candidate key) is dependent on any
proper subset of any candidate key of the table. Partial Dependency – If the proper subset of
candidate key determines non-prime attribute, it is called partial dependency.
Example 1 – Consider table-3 as following below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
{Note that, there are many courses having the same course fee} Here, COURSE_FEE
cannot alone decide the value of COURSE_NO or STUD_NO; COURSE_FEE
together with STUD_NO cannot decide the value of COURSE_NO; COURSE_FEE
together with COURSE_NO cannot decide the value of STUD_NO; Hence,
COURSE_FEE would be a non-prime attribute, as it does not belong to the one only
candidate key {STUD_NO, COURSE_NO} ; But, COURSE_NO -> COURSE_FEE,
i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the
candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of
the candidate key, which is a partial dependency and so this relation is not in 2NF. To
convert the above relation to 2NF, we need to split the table into two tables such as:
Table 1: STUD_NO, COURSE_NO Table 2: COURSE_NO, COURSE_FEE
Table 1 Table 2
STUD_NO COURSE_NO COURSE_NO COURSE_FEE
1 C1 C1 1000
2 C2 C2 1500
1 C4 C3 1000
4 C3 C4 2000
4 C1 C5 2000
NOTE: 2NF tries to reduce the redundant data getting stored in memory. For
instance, if there are 100 students taking C1 course, we don’t need to store its Fee as
1000 for all the 100 records, instead, once we can store it in the second table as the
course fee for C1 is 1000.
Example 2 – Consider following functional dependencies in relation R (A, B , C, D )
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e.,
any proper subset of AB doesn’t determine any non-prime attribute.
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).
Example 1:
In relation STUDENT given in Table 4, FD set: {STUD_NO -> STUD_NAME, STUD_NO -
> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true.
So, STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal
form.
To convert it in third normal form, we will decompose the relation STUDENT (STUD_NO,
STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY_STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Consider relation R (A, B, C, D, E) A -> BC, CD -> E, B -> D, E -> A All possible candidate
keys in above relation are {A, E, CD, BC} All attributes are on right sides of all functional
dependencies are prime.
Example 2: Find the highest normal form of a relation R (A, B, C, D, E) with FD set as {BC-
>D, AC->BE, B->E}
Step 1: As we can see, (AC)+ = {A, C, B, E, D} but none of its subset can determine all
attribute of relation, So AC will be candidate key. A or C can’t be derived from any other
attribute of the relation, so there will be only 1 candidate key {AC}.
Step 2: Prime attributes are those attributes that are part of candidate key {A, C} in this
example and others will be non-prime {B, D, E} in this example.
Step 3: The relation R is in 1st normal form as a relational DBMS does not allow multi-
valued or composite attribute. The relation is in 2nd normal form because BC->D is in 2nd
normal form (BC is not a proper subset of candidate key AC) and AC->BE is in 2nd normal
form (AC is candidate key) and B->E is in 2nd normal form (B is not a proper subset of
candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is
a prime attribute) and in B->E (neither B is a super key nor E is a prime attribute) but to
satisfy 3rd normal for, either LHS of an FD should be super key or RHS should be prime
attribute. So, the highest normal form of relation will be 2nd Normal form.
For example, consider relation R (A, B, C) A -> BC, B -> A and B both are super keys so
above relation is in BCNF.
Third Normal Form
A relation is said to be in third normal form, if we did not have any transitive dependency for
non-prime attributes. The basic condition with the Third Normal Form is that the relation
must be in Second Normal Form.
Below mentioned is the basic condition that must be hold in the non-trivial functional
dependency X -> Y:
X is a Super Key.
Y is a Prime Attribute (this means that element of Y is some part of Candidate Key).
In other words,
Note – If A->B and B->C are two FDs then A->C is called transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies. If a transitive dependency exists, we remove the transitively dependent
attribute(s) from the relation by placing the attribute(s) in a new relation along with a
copy of the determinant.
Consider the examples given below.
Example-1:
In relation STUDENT given in Table 4,
FD set:
{STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE ->
STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key:
{STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on
STUD_NO. It violates the third normal form. To convert it in third normal form, we will
decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY_STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Example-2:
Consider relation R (A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys in above relation are {A, E, CD, BC} All attribute are on right
sides of all functional dependencies are prime.
BCNF in DBMS
You came across a similar hierarchy known as the Chomsky Normal Form in the Theory of
Computation. Now, carefully study the hierarchy above. It can be inferred that every relation
in BCNF is also in 3NF. To put it another way, a relation in 3NF need not be in BCNF.
Ponder over this statement for a while.
To determine the highest normal form of a given relation R with functional dependencies, the
first step is to check whether the BCNF condition holds. If R is found to be in BCNF, it can
be safely deduced that the relation is also in 3NF, 2NF, and 1NF as the hierarchy shows. The
1NF has the least restrictive constraint – it only requires a relation R to have atomic values in
each tuple. The 2NF has a slightly more restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is less
restrictive than the BCNF. In this manner, the restriction increases as we traverse down the
hierarchy.
Examples
Here, we are going to discuss some basic examples which let you understand the properties of
BCNF. We will discuss multiple examples here.
Example 1
Let us consider the student database, in which data of the student are mentioned.
Computer
101 Science & DBMS B_001 201
Engineering
Computer
Computer
101 Science & B_001 202
Networks
Engineering
Electronics &
VLSI
102 Communication B_003 401
Technology
Engineering
Electronics &
Mobile
102 Communication B_003 402
Communication
Engineering
Stu_I
D Stu_Branch
101 201
101 202
102 401
102 402