UNIT 1 Notes
Case Study: ER Diagram on Online Streaming, Movie Ticket Recommendation, Bike Tracking
                     •   Data Independence.
                     •   Efficient Data Access.
                     •   Data Integrity and Security.
                     •   Data Administration.
                     •   Concurrent Access and Crash Recovery.
                     •   Reduced Application Development Time.
There are a few types of databases that are especially important and popular:
      •   Relational Database - a relational database is organized as a set of tables with columns
       and rows.
      •   Object-Oriented Database - an object-oriented database represents data in the form of
       objects.
      •   Distributed Database - consists of two or more files located at different sites. The
       database may be stored on multiple computers located in the same physical location, or
       scattered over different networks.
      •   NoSQL Database - a NoSQL, or nonrelational, database allows unstructured and
       semi-structured data to be stored and manipulated.
      •   Graph Database - stores data in terms of entities and the relationships between entities.
      •   Cloud Database - a collection of data, either structured or unstructured, that resides on a
       private, public, or hybrid cloud computing platform.
    •   An in-memory database (IMDB) is a database system that stores and retrieves data
       records residing in a computer's main memory, e.g., random-access memory (RAM).
    •   IMDBs can store relational (tabular) data, document data, key-value data, or a
       combination of these.
    •   The data and instructions for an in-memory database reside in main memory, usually
       RAM.
    •   This approach reduces I/O requests to the disk and improves overall database speed.
    •   Storing data in memory enables direct access to information and dramatically reduces the
       time needed to query data.
Common uses of in-memory databases:
    •   IoT and edge computing. IoT sensors stream massive amounts of data. An in-memory
       database can store and perform calculations on real-time data before sending it to an
       on-disk database.
    •   E-commerce applications. Shopping carts, search results, session management, and
       quick page loads are all possible with an in-memory database.
    •   Gaming industry. The gaming industry uses in-memory databases for updating
       leaderboards in real time.
    •   Real-time security and fraud detection. In-memory databases help perform complex
       processing and analytics in real time.
1.2.2 Distributed Databases:
   •   A distributed database is a database that is not limited to one computer system. It
       consists of two or more files located on different computers.
   •   Distributed databases are needed when the data in the database must be accessed by
       various users across the globe.
   Database management systems are divided into multiple levels of abstraction for proper
functioning. A database management system is not always directly accessible by the user or an
application. DBMS architectures follow a tier-based classification: an n-tier DBMS
architecture divides the whole DBMS into n related but independent layers or levels.
Database applications are usually partitioned into two or three parts. In a two-tier architecture,
the application resides at the client machine, where it invokes database system functionality at
the server machine through query language statements.
Application program interface standards like ODBC and JDBC are used for interaction between
the client and the server. In contrast, in a three-tier architecture, the client machine acts as
merely a front end and does not contain any direct database calls. Instead, the client end
communicates with an application server, usually through a forms interface.
The application server in turn communicates with a database system to access data. The business
logic of the application, which says what actions to carry out under what conditions, is embedded
in the application server, instead of being distributed across multiple clients.
Three-tier applications are more appropriate for large applications, and for applications that run
on the World Wide Web.
1.3.1 Database System Architecture
   A database system is partitioned into modules that deal with each of the responsibilities
of the overall system. The functional components of a database system can be broadly divided
into the storage manager and the query processor components. The storage manager is
important because databases typically require a large amount of storage space. The query
processor is important because it helps the database system simplify and facilitate access to
data.
It is the job of the database system to translate updates and queries written in a
nonprocedural language, at the logical level, into an efficient sequence of operations at the
physical level.
Users are differentiated by the way they expect to interact with the system.
Query Processor:
The query processor components include:
       · DDL interpreter, which interprets DDL statements and records the definitions in the
data dictionary.
       · DML compiler, which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all
give the same result. The DML compiler also performs query optimization, that is, it picks the
lowest cost evaluation plan from among the alternatives.
       · Query evaluation engine, which executes the low-level instructions generated by the DML
compiler.
Storage Manager:
       A storage manager is a program module that provides the interface between the low level
data stored in the database and the application programs and queries submitted to the system.
       The storage manager is responsible for the interaction with the file manager. The raw data
are stored on the disk using the file system, which is usually provided by a conventional operating
system.
       The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data
in the database.
The storage manager components include:
· Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
· Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
· File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
· Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than
the size of main memory.
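To make the buffer manager's role concrete, here is a toy Python sketch of a buffer pool. The class name BufferManager, the dict standing in for the disk, and the least-recently-used (LRU) replacement policy are assumptions for illustration only; a real buffer manager also writes modified (dirty) blocks back to disk, which this read-only sketch omits.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool that caches at most `capacity` disk blocks in main memory.
    Replacement policy (an assumption of this sketch): evict the least recently used block."""

    def __init__(self, disk, capacity):
        self.disk = disk            # dict block_id -> contents; stands in for the disk
        self.capacity = capacity
        self.pool = OrderedDict()   # cached blocks, ordered from least to most recently used
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.pool:              # hit: served from memory, no disk I/O
            self.pool.move_to_end(block_id)
            return self.pool[block_id]
        self.disk_reads += 1                   # miss: the block must be fetched from disk
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)      # evict the least recently used block
        self.pool[block_id] = self.disk[block_id]
        return self.pool[block_id]

disk = {i: f"block-{i}" for i in range(100)}   # 100 blocks, far more than the pool holds
bm = BufferManager(disk, capacity=3)
for b in [1, 2, 1, 3, 4, 1, 2]:
    bm.get_block(b)
print(bm.disk_reads, list(bm.pool))   # 5 [4, 1, 2]
```

Only 3 of the 100 blocks are ever in memory at once, which is how the buffer manager lets the database work with data much larger than main memory.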
Transaction Manager:
       A transaction is a collection of operations that performs a single logical function in a
database application.
       Each transaction is a unit of both atomicity and consistency. Thus, we require that
transactions do not violate any database-consistency constraints. That is, if the database was
consistent when a transaction started, the database must be consistent when the transaction
successfully terminates.
       The transaction manager ensures that the database remains in a consistent (correct) state
despite system failures (e.g., power failures and operating system crashes) and transaction
failures.
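A minimal sketch of atomicity using Python's built-in sqlite3 module with an in-memory database. The account table, its CHECK constraint and the transfer() helper are hypothetical; the point is that when the second UPDATE fails, the first UPDATE of the same transaction is rolled back too, so the database stays consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as one transaction: either both updates happen or neither does."""
    try:
        with conn:   # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False   # CHECK failed on src; the credit to dst was rolled back as well

ok1 = transfer(conn, "A", "B", 200)    # succeeds
ok2 = transfer(conn, "A", "B", 1000)   # fails: A would go negative
print(ok1, ok2)                        # True False
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(balances)                        # {'A': 300, 'B': 300}
```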
There are three levels of data abstraction:
   1. Physical level (internal level) - The lowest level of abstraction describes how the data
       are actually stored in the hardware.
   2. Logical level (conceptual level) - The next-higher level of abstraction describes what
       data are stored in the database, and what relationships exist among those data.
   3. View level (external level) - The highest level of abstraction describes only part of
       the entire database.
Example: Let's say we are storing customer information in a customer table. At the physical
level, these records can be described as blocks of storage (bytes, gigabytes, terabytes, etc.) in
memory. These details are usually hidden from the programmers.
At the logical level, these records can be described as fields and attributes along with their data
types, and their relationships with each other can be logically implemented. Programmers
generally work at this level because they are aware of such details of the database system.
       At the view level, users simply interact with the system through a GUI and enter details
on the screen; they are not aware of how or what data is stored, as such details are hidden
from them.
1.4.1 Schemas
       Schemas are the overall design of a database. For example, an employee table in a
database is designed with a fixed set of attributes, such as EMP_NAME, EMP_ID, EMP_ADDRESS
and EMP_CONTACT. There are three kinds of schema, one for each level of abstraction:
              Physical schema
              Logical schema
              View schema
       The schema represents the logical view of the database. It helps you understand what data
needs to go where, and it helps database users understand the relationships between data. For
example, a schema can show the relationships between three tables: Course, Student and
Section.
1.4.2 Definition of instance: The data stored in a database at a particular moment in time is
called an instance of the database. The database schema defines the attributes in the tables that
belong to a particular database. The values of these attributes at a moment in time form the
instance of that database.
EMP_NAME    EMP_ID   EMP_ADDRESS   EMP_CONTACT
---------   ------   -----------   -----------
Chaitanya   101      Noida         95********
Ajeet       102      Delhi         99********
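The distinction between schema and instance can be sketched with Python's sqlite3 module (an assumption; any SQL database would do). The CREATE TABLE statement fixes the schema, while each INSERT changes the instance. The contact numbers are masked in the table above, so this sketch stores NULL for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: the design (attribute names and types), defined once.
conn.execute("""CREATE TABLE Employee (
    EMP_NAME TEXT, EMP_ID INTEGER, EMP_ADDRESS TEXT, EMP_CONTACT TEXT)""")
# PRAGMA table_info returns one row per column; index 1 holds the column name.
schema = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]

def instance():
    """The instance: the rows stored at this moment."""
    return conn.execute("SELECT EMP_NAME, EMP_ID FROM Employee ORDER BY EMP_ID").fetchall()

conn.execute("INSERT INTO Employee VALUES ('Chaitanya', 101, 'Noida', NULL)")
first = instance()
conn.execute("INSERT INTO Employee VALUES ('Ajeet', 102, 'Delhi', NULL)")
second = instance()

print(schema)   # ['EMP_NAME', 'EMP_ID', 'EMP_ADDRESS', 'EMP_CONTACT'] (unchanged)
print(first)    # [('Chaitanya', 101)]
print(second)   # [('Chaitanya', 101), ('Ajeet', 102)]
```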
Logical Independence
       Logical data independence can be defined as the immunity of the external schemas to
changes in the conceptual schema.
Physical Independence
       Physical data independence can be defined as the immunity of the conceptual schema to
changes in the internal schema.
       The entire structure of a database can be described using a data model. A data model is
a collection of conceptual tools for describing data, data relationships and consistency constraints.
A data model provides a way to describe the design of a database at the physical, logical, and
view levels as it provides a clear picture of the data making it easier for developers to create a
physical database.
The data models can be classified into the following categories:
                             Relational Model
                             Entity-Relationship Model
                             Object-Based Data Model
                             Semi-structured Data Model
                             Network Data Model
                             Hierarchical Data Model
Entity-Relationship Model
       An Entity-Relationship (E-R) model is a high-level data model that describes the structure of
the database in a pictorial form, known as an ER diagram, which makes the logical structure of the
database easy to represent. The E-R data model uses a collection of basic objects, called
entities, and relationships among these objects.
Entities
       An entity has real-world properties called attributes, which describe the characteristics of
that entity. Example: The entity teacher has properties like teacher id, salary, age, etc.
Relationship
        A relationship tells how two entities are related. Example: Teacher works for a
department.
Semi-structured Data Model (example record in XML):
<employee id="1">
  <name>..............</name>
  <age>.................</age>
  <salary>.............</salary>
</employee>
ER Diagram - Relational Data Model
       An Entity–relationship model (ER model) describes the structure of a database with
the help of a diagram, which is known as Entity Relationship Diagram (ER Diagram). An
ER model is a design or blueprint of a database.
ER Diagrams
       An ER diagram shows the relationships among entity sets. An entity set is a group of
similar entities, and these entities can have attributes. In terms of a DBMS, an entity set
corresponds to a table and its attributes to the table's columns, so by showing the relationships
among tables and their attributes, an ER diagram shows the complete logical structure of a database.
ER Diagram
What is an ER Model?
       An Entity-Relationship Model represents the structure of the database with the help of a
diagram. ER Modelling is a systematic process to design a database as it would require you to
analyze all data requirements before implementing your database.
Components of ER Diagram
              1. Entity
              2. Attribute
              3. Relationship
Symbols used in ER Diagram
Entity
An entity is an object or component of data. It is anything in the real world, such as an object,
class, person, or place. Objects that physically exist or are logically constructed in the real world
are called entities.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its relationship
with another entity is called a weak entity. A weak entity is represented by a double rectangle.
For example, a bank account cannot be uniquely identified without knowing the bank to which
the account belongs, so bank account is a weak entity.
Attribute
   1. Key attribute
   2. Composite attribute
   3. Multivalued attribute
   4. Derived attribute
1. Key attribute:
              This attribute can uniquely identify an entity in an entity set. A key attribute is
       represented by an oval, like other attributes, but its text is underlined. For example,
       the Aadhaar number is a key attribute that can identify a person.
2. Composite attribute:
4. Derived attribute :
        These attributes are derived from other attributes in a database. An example of a derived
        attribute can be the age of an employee which must be derived from the Date of Birth of
        that employee.
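A small sketch of a derived attribute: age is not stored but computed on demand from the stored Date of Birth. The function name age_on is hypothetical.

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Derived attribute: compute age from the stored Date of Birth."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

print(age_on(date(2000, 8, 15), date(2024, 8, 14)))  # 23 (day before the birthday)
print(age_on(date(2000, 8, 15), date(2024, 8, 15)))  # 24
```

Because age is derived, it never goes stale: only the Date of Birth is stored, and the age is always recomputed from it.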
Relationship
It is used to define the relationship that exists between different entities in an ER diagram.
Participation Constraints
1. Partial Participation – Not all entities are involved in the relationship. Partial participation is
denoted by a single line.
2. Total Participation – All entities are involved in the relationship. Total participation is
denoted by a double line between the entity set and the relationship set.
       For example, if each college must have at least one associated student, then College
participates totally in that relationship.
Mapping Cardinalities
1. One-to-one – A relationship is denoted as ‘1:1’ when one instance of an entity is associated
with exactly one instance of another entity.
       For instance, consider a relationship between a person and a driver's license where each
person can have only one driver's license and each driver's license belongs to only one person.
2. One-to-many – A relationship is denoted as ‘1:N’ when one instance of an entity is associated
with multiple instances of another entity.
          For instance, a mother can have multiple children, but each child has only one biological
mother.
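3. Many-to-one – The relationship is denoted as ‘N:1’ when multiple instances of an entity are
associated with a single instance of another entity.
          For instance, many children can have the same biological mother.
4. Many-to-many – The relationship is denoted as ‘M:N’ when multiple instances of an entity are
associated with multiple instances of another entity.
      For instance, a student can enroll in multiple courses, and a course can have multiple
students.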
Example :
(a) Construct an E-R diagram for a car-insurance company whose customers own one or more
cars each. Each car has associated with it zero to any number of recorded accidents.
(b)   Suppose you are given the following requirements for a simple database for the National
Hockey League (NHL): the NHL has many teams, each team has a name, a city, a coach, a
captain, and a set of players, each player belongs to only one team, each player has a name, a
position (such as left wing or goalie), a skill level, and a set of injury records, a team captain is
also a player, a game is played between two teams (referred to as host_team and guest_team)
and has a date (such as May 11th, 1999) and a score (such as 4 to 2). Construct a clean and
concise ER diagram for the NHL database.
(c) A university registrar’s office maintains data about the following entities: courses, including
number, title, credits, syllabus, and prerequisites; course offerings, including course number,
year, semester, section number, instructor(s), timings, and classroom; students, including
student-id, name, and program; instructors, including identification number, name, department,
and title. Further, the enrollment of students in courses and grades awarded to students in each
course they are enrolled for must be appropriately modeled. Construct an E-R diagram for the
registrar’s office. Document all assumptions that you make about the mapping constraints.
1.6 Extended ER Model:
Extended ER (also called Enhanced ER) is a high-level data model that incorporates extensions to
the original ER model. Enhanced ER models represent the requirements and complexities of
complex databases.
The Extended Entity Relationship (EER) model adds three features, given below –
        •   Aggregation
        •   Specialization
        •   Generalization
1.6.1 Specialization:
      •   The process of designating sub-groupings within an entity set is called specialization.
      •   It is a top-down process.
      •   If the instances of an entity set can be differentiated according to the value of a given
       attribute, sub-classes (sub-entity sets) can be formed based on that attribute.
      •   For example, specialization of account creates two entity sets: savings account and
       current account.
      •   In the E-R diagram, specialization is represented by a triangle component labeled ISA. The
       ISA relationship is also referred to as the superclass-subclass relationship.
1.6.2 Generalization:
1.6.3 Aggregation:
      •   Aggregation is an abstraction in which relationship sets are treated as higher-level entity
       sets and can participate in relationships.
      •   Aggregation allows us to indicate that a relationship set participates in another relationship
       set.
      •   Aggregation is used to simplify the details of a given database, where ternary relationships
       are changed into binary relationships.
      •   A ternary relationship is a relationship among three entities.
Relational Algebra
              •   Relational algebra is a procedural query language: it specifies both what data is
               to be retrieved and how it is to be retrieved.
              •   Relational algebra works on a whole table at once, so we do not have to use
               loops to iterate over the rows (tuples) of data one by one.
              •   All we have to do is specify the table from which we need the data; in a single
               expression, relational algebra traverses the entire table to fetch the data.
Operations (1–6 are the basic/fundamental operations; 7–9 are additional operations that make
common queries easier to write):
               1. Select (σ)
               2. Project (∏)
               3. Union (∪)
               4. Set Difference (-)
               5. Cartesian product (X)
               6. Rename (ρ)
               7. Joins
               8. Assignment Operator
               9. Division Operator
      1. Select Operation (σ): This is used to fetch the rows (tuples) from a table (relation)
      that satisfy a given condition.
      Syntax: σp(r)
          σ is the selection operator
          r stands for the relation, which is the name of the table
          p is the predicate, a propositional-logic formula, e.g. σ age > 17 (Student)
      This will fetch the tuples (rows) from the table Student for which age is greater than 17.
      σ age > 17 and gender = 'Male' (Student)
      This will return the tuples (rows) from the table Student for male students older
      than 17.
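A minimal Python sketch of the select operation, assuming a relation is modelled as a list of dicts (one dict per tuple) and the predicate p is a Python function. The Student rows are hypothetical sample data.

```python
# A relation is modelled (as an assumption) as a list of dicts, one dict per tuple.
Student = [
    {"name": "Asha", "age": 19, "gender": "Female"},   # hypothetical sample rows
    {"name": "Ravi", "age": 17, "gender": "Male"},
    {"name": "Kiran", "age": 20, "gender": "Male"},
]

def select(predicate, relation):
    """sigma_p(r): keep only the tuples for which the predicate p is true."""
    return [t for t in relation if predicate(t)]

# σ age > 17 (Student)
adults = select(lambda t: t["age"] > 17, Student)

# σ age > 17 and gender = 'Male' (Student)
adult_males = select(lambda t: t["age"] > 17 and t["gender"] == "Male", Student)

print([t["name"] for t in adults])       # ['Asha', 'Kiran']
print([t["name"] for t in adult_males])  # ['Kiran']
```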
Input:
         σ BRANCH_NAME="perryride" (LOAN)
Output:
2. Project Operation (∏):
Project operation is used to project only a certain set of attributes of a relation. In simple
words, if you want to see only the names of all the students in the Student table, then
you can use the Project operation.
               It will only project or show the columns or attributes asked for, and will also
                remove duplicate tuples from the result.
Syntax of Project Operator (∏)
         ∏ column_name1, column_name2, ..., column_nameN (table_name)
         Example:
         ∏ Name, Age (Student)
         The above statement will show only the Name and Age columns for all the rows of
         data in the Student table.
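Continuing the list-of-dicts assumption, project keeps only the requested attributes and drops duplicate tuples, because a relation is a set. The Student rows below are hypothetical.

```python
Student = [
    {"roll": 1, "name": "Asha", "age": 19},   # hypothetical sample rows
    {"roll": 2, "name": "Ravi", "age": 18},
    {"roll": 3, "name": "Asha", "age": 19},   # same Name and Age as roll 1
]

def project(attributes, relation):
    """pi_attrs(r): keep only the listed columns and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:          # relations are sets, so duplicates are removed
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

print(project(["name", "age"], Student))
# [{'name': 'Asha', 'age': 19}, {'name': 'Ravi', 'age': 18}]
```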
Input:
         ∏ NAME, CITY (CUSTOMER)
Output:
NAME      CITY
Jones     Harrison
Smith     Rye
Hays      Harrison
Curry     Rye
Johnson   Brooklyn
Brooks    Brooklyn
3. Union Operation (∪):
               It returns all tuples that appear in either or both of the two relations.
               For this operation to work, the relations (tables) specified should have the same
                number of attributes (columns) and the same attribute domains. Also, duplicate
                tuples are automatically eliminated from the result.
         Syntax: A ∪ B
         ∏ Student (RegularClass) ∪ ∏ Student (ExtraClass)
     Example:
DEPOSITOR RELATION
CUSTOMER_NAME   ACCOUNT_NO
Johnson         A-101
Smith           A-121
Mayes           A-321
Turner          A-176
Johnson         A-273
Jones           A-472
Lindsay         A-284
BORROW RELATION
CUSTOMER_NAME   LOAN_NO
Jones           L-17
Smith           L-23
Hayes           L-15
Jackson         L-14
Curry           L-93
Smith           L-11
Williams        L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
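A sketch of the union example above in Python, using the DEPOSITOR and BORROW rows from the notes. Each relation is a list of (name, number) tuples, and projecting onto CUSTOMER_NAME into a Python set removes duplicates automatically.

```python
DEPOSITOR = [("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
             ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
             ("Lindsay", "A-284")]
BORROW = [("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
          ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
          ("Williams", "L-17")]

def project_name(relation):
    """∏ CUSTOMER_NAME: keep the first column; a set drops duplicate names."""
    return {row[0] for row in relation}

def union(a, b):
    """A ∪ B for two union-compatible relations (here: sets of one-column tuples' values)."""
    return a | b

names = union(project_name(BORROW), project_name(DEPOSITOR))
print(sorted(names))
# ['Curry', 'Hayes', 'Jackson', 'Johnson', 'Jones', 'Lindsay', 'Mayes', 'Smith', 'Turner', 'Williams']
```

Johnson, Smith and Jones occur more than once in the inputs but only once in the result, matching the 10-name output above.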
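4. Set Difference (-):
This operation is used to find data present in one relation and not present in the second
relation. Like the Union operation, it applies to two union-compatible relations.
Syntax: A - B
Set Intersection (∩) finds the tuples present in both relations. For example, the customers who
appear in both BORROW and DEPOSITOR:
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones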
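Set difference, and for comparison set intersection (which can be written as A − (A − B)), sketched on the projected customer names. The two sets below are ∏ CUSTOMER_NAME of the DEPOSITOR and BORROW relations listed earlier, with duplicates already removed.

```python
DEPOSITOR_NAMES = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
BORROW_NAMES = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

def difference(a, b):
    """A - B: tuples that are in A but not in B."""
    return {t for t in a if t not in b}

def intersection(a, b):
    """A ∩ B expressed with difference only: A - (A - B)."""
    return difference(a, difference(a, b))

print(sorted(difference(BORROW_NAMES, DEPOSITOR_NAMES)))    # ['Curry', 'Hayes', 'Jackson', 'Williams']
print(sorted(intersection(BORROW_NAMES, DEPOSITOR_NAMES)))  # ['Jones', 'Smith']
```

The names common to both relations (Smith and Jones) are the result of intersection; the difference BORROW − DEPOSITOR gives the borrowers who have no deposit account.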
5. Cartesian Product (X):
           This operation combines every tuple of one relation with every tuple of another.
           For example, if we want to find the information for the Regular Class and Extra
            Class sessions that are conducted in the morning, we can use the following
            operation:
           σ time = 'morning' (RegularClass X ExtraClass)
           For the above query to work, both RegularClass and ExtraClass should have the
            attribute time.
Notation: E X D
EMPLOYEE
EMP_ID   EMP_NAME   EMP_DEPT
1        Smith      A
2        Harry      C
3        John       B
DEPARTMENT
DEPT_NO   DEPT_NAME
A         Marketing
B         Sales
C         Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
EMP_ID   EMP_NAME   EMP_DEPT   DEPT_NO   DEPT_NAME
1        Smith      A          A         Marketing
1        Smith      A          B         Sales
1        Smith      A          C         Legal
2        Harry      C          A         Marketing
2        Harry      C          B         Sales
2        Harry      C          C         Legal
3        John       B          A         Marketing
3        John       B          B         Sales
3        John       B          C         Legal
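A sketch of EMPLOYEE X DEPARTMENT with the rows from the notes, each relation being a list of tuples (the column meanings EMP_ID, EMP_NAME, EMP_DEPT are assumed from the data). The last step keeps only the pairs where the employee's department equals DEPT_NO, which shows how a join is a Cartesian product followed by a selection.

```python
EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

def cartesian_product(r, s):
    """r X s: concatenate every tuple of r with every tuple of s."""
    return [tr + ts for tr in r for ts in s]

result = cartesian_product(EMPLOYEE, DEPARTMENT)
for row in result:
    print(*row)
print(len(result))   # 3 x 3 = 9 tuples

# σ EMP_DEPT = DEPT_NO (EMPLOYEE X DEPARTMENT): only the meaningful pairs remain
matched = [t for t in result if t[2] == t[3]]
print(matched)
```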
6. Rename Operation (ρ):
    •   The rename operation, denoted by rho (ρ), is used to rename the output relation of any
     query operation that returns a result, such as Select or Project, or simply to rename a
     relation (table).
    •   Syntax: ρ(RelationNew, RelationOld)
    •   Example: We can use the rename operator to rename the STUDENT relation to STUDENT1.
     ρ(STUDENT1, STUDENT)
Joins
    •   A JOIN clause is used to combine rows from two or more tables, based on a
     related column between them.
    •   A join in a DBMS is a binary operation that combines a Cartesian product and a
     selection in one single statement.
    •   The goal of a join condition is to combine the data from two or more DBMS tables.
    •   Tables in a DBMS are associated using primary keys and foreign keys.
Syntax:
EMPLOYEE ⋈ SALARY
Types of Joins:
1. Theta Join
•   Notation: r ⋈θ s, where θ is a condition on the attributes of r and s.
•   It is defined as r ⋈θ s = σθ(r × s): a Cartesian product followed by a selection.
Example: r(A, B, C) and s(D, E)
r                      s
A   B   C              D   E
1   2   3              3   1
4   5   6              6   2
7   8   9
r ⋈ B<D s
A   B   C   D   E
1   2   3   3   1
1   2   3   6   2
4   5   6   6   2
        2. Equi Join:
       •   When a theta join uses only the equality comparison operator "=", it is called an
        equijoin: a special case of the conditional join in which only equality conditions hold
        between pairs of attributes.
       •   Since the two compared attributes have equal values in every result tuple of an
        equijoin, one of them is redundant; the natural join keeps only one copy.
       •   Derivation: r ⋈C s = σC(r × s). If C involves only the comparison operator "=", the
        conditional join is also called an Equi-Join.
Example:
r                      ρS(SC, D)(s)
A   B   C              SC   D
4   5   6              6    8
7   8   9              10   12
r ⋈ C=SC (ρS(SC, D)(s))
A   B   C   SC   D
4   5   6   6    8
        3. Natural Join:
       •   The natural join matches tuples on all attributes with the same name, i.e. the join
        condition is
               r.A1 = s.A1 AND … AND r.An = s.An
               where {A1 … An} = attributes(r) ∩ attributes(s)
       •   If tr and ts have the same value for each of the attributes in R ∩ S ("same name
        attributes"), a tuple t is added to the result such that t has the same value as tr on R
        and the same value as ts on S; each common attribute appears only once.
Example: r(A, B, C, D) and s(B, D, E)
r                              s
A   B   C   D                  B   D   E
α   1   α   a                  1   a   α
β   2   γ   a                  3   a   β
γ   4   β   b                  1   a   γ
α   1   γ   a                  2   b   δ
δ   2   β   b                  3   b   τ
r ⋈ s
A   B   C   D   E
α   1   α   a   α
α   1   α   a   γ
α   1   γ   a   α
α   1   γ   a   γ
δ   2   β   b   δ
R ⋈ ρ(B, F, G)(S) = SELECT R.*, F, G FROM R, S WHERE B = E;
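The theta join and natural join examples from the notes can be checked with a short Python sketch. Tuples are modelled as dicts (an assumption of this sketch), so attributes with the same name collapse into one column in the natural join; an equi-join is simply theta_join with an == condition.

```python
def theta_join(r, s, theta):
    """r ⋈θ s = σθ(r × s). Tuples are dicts; theta is a predicate on a pair (tr, ts).
    Assumes r and s share no attribute names (rename one side first otherwise)."""
    return [{**tr, **ts} for tr in r for ts in s if theta(tr, ts)]

def natural_join(r, s):
    """r ⋈ s: pair tuples that agree on every common attribute; keep each common attribute once."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})   # shared keys collapse into one column
    return result

# Theta join example: r(A, B, C), s(D, E), condition B < D
r1 = [dict(A=1, B=2, C=3), dict(A=4, B=5, C=6), dict(A=7, B=8, C=9)]
s1 = [dict(D=3, E=1), dict(D=6, E=2)]
theta_rows = [tuple(t.values()) for t in theta_join(r1, s1, lambda tr, ts: tr["B"] < ts["D"])]

# Natural join example: r(A, B, C, D), s(B, D, E); common attributes are B and D
r2 = [dict(A="α", B=1, C="α", D="a"), dict(A="β", B=2, C="γ", D="a"),
      dict(A="γ", B=4, C="β", D="b"), dict(A="α", B=1, C="γ", D="a"),
      dict(A="δ", B=2, C="β", D="b")]
s2 = [dict(B=1, D="a", E="α"), dict(B=3, D="a", E="β"), dict(B=1, D="a", E="γ"),
      dict(B=2, D="b", E="δ"), dict(B=3, D="b", E="τ")]
natural_rows = [tuple(t.values()) for t in natural_join(r2, s2)]

print(theta_rows)    # [(1, 2, 3, 3, 1), (1, 2, 3, 6, 2), (4, 5, 6, 6, 2)]
for row in natural_rows:
    print(*row)      # five rows with columns A B C D E
```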
Outer Join – Left Join:
   •   Select all records from the first (left-most) table, together with the matching records
       from the right table.
   •   The join starts with the first (left-most) table.
   •   Then, any matched records from the second (right-most) table are included.
   •   If no matching tuple is found in the right relation, the attributes of the right
       relation in the join result are filled with null values.
   •   Syntax: R ⟕ S
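A sketch of the left outer join, assuming tuples are dicts and the join is on a single shared attribute named by `on`. The DEPARTMENT rows come from the Cartesian product example; the employee Mary in department D is hypothetical, added so that one tuple has no match. Python's None plays the role of NULL.

```python
def left_outer_join(r, s, on):
    """R ⟕ S on one shared attribute: keep every tuple of r;
    pad r-tuples that have no match in s with None (null) for s's other attributes."""
    s_only = [a for a in (s[0].keys() if s else []) if a != on]
    result = []
    for tr in r:
        matches = [ts for ts in s if ts[on] == tr[on]]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            result.append({**tr, **{a: None for a in s_only}})
    return result

EMPLOYEE = [{"EMP_NAME": "Smith", "DEPT_NO": "A"},
            {"EMP_NAME": "Harry", "DEPT_NO": "C"},
            {"EMP_NAME": "Mary", "DEPT_NO": "D"}]          # hypothetical: no department D
DEPARTMENT = [{"DEPT_NO": "A", "DEPT_NAME": "Marketing"},
              {"DEPT_NO": "B", "DEPT_NAME": "Sales"},
              {"DEPT_NO": "C", "DEPT_NAME": "Legal"}]

for row in left_outer_join(EMPLOYEE, DEPARTMENT, "DEPT_NO"):
    print(row)
# Mary's row comes back with 'DEPT_NAME': None
```

A right outer join is the same operation with the arguments swapped (left_outer_join(s, r, on)), and a full outer join additionally keeps the unmatched tuples from both sides.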
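Outer Join – Right Join:
   •   The mirror image of the left join: all tuples of the right relation are kept, and the
       attributes of the left relation are filled with null values where no match exists.
   •   Syntax: R ⟖ S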
Outer Join – Full Join:
   •   All tuples from both relations are included in the result, irrespective of the
       matching condition.
   •   Syntax: R ⟗ S
Semi Join:
   •   A semi-join matches the rows of two relations and then shows only the matching
       rows of the relation named on the left side of the ⋉ (Semi Join)
       operator.
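A sketch of R ⋉ S under the same dict-per-tuple assumption. The PROJECT relation is hypothetical; Smith appears only once in the result even though department A has two projects, because a semi-join returns tuples of the left relation only and never adds columns from the right one.

```python
def semi_join(r, s):
    """R ⋉ S: tuples of r that match at least one tuple of s on their common attributes.
    Only r's attributes appear in the result."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                result.append(tr)
                break          # one match is enough; r's tuple is not repeated
    return result

EMPLOYEE = [{"EMP_NAME": "Smith", "DEPT_NO": "A"},
            {"EMP_NAME": "Harry", "DEPT_NO": "C"},
            {"EMP_NAME": "John", "DEPT_NO": "B"}]
PROJECT = [{"DEPT_NO": "A", "PROJ": "P1"},        # hypothetical projects
           {"DEPT_NO": "A", "PROJ": "P2"},
           {"DEPT_NO": "C", "PROJ": "P3"}]

print(semi_join(EMPLOYEE, PROJECT))
# [{'EMP_NAME': 'Smith', 'DEPT_NO': 'A'}, {'EMP_NAME': 'Harry', 'DEPT_NO': 'C'}]
```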
Division Operator:
Syntax: P = R ÷ S
Where,
P is the result we get after applying the division operator,
R and S stand for the relations (names of the tables) on which the division operation is
applied.
For relations r(R) and s(S) with S ⊆ R, a tuple t is in r ÷ s if and only if:
   •   t is in π R−S (r)
   •   For every tuple ts in s, there is a tuple tr in r satisfying both of the following:
           • tr[S] = ts[S]
           • tr[R − S] = t
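A direct transcription of the formal definition of r ÷ s into Python. The ENROLLED and COURSE relations are hypothetical; the query answers "which students are enrolled in all the listed courses?", the kind of "for all" question division is meant for.

```python
def divide(r, r_attrs, s, s_attrs):
    """r ÷ s following the definition above.
    r and s are sets of tuples; r_attrs / s_attrs name their columns, with s_attrs ⊆ r_attrs."""
    keep = [a for a in r_attrs if a not in s_attrs]            # R - S
    idx_keep = [r_attrs.index(a) for a in keep]
    idx_s = [r_attrs.index(a) for a in s_attrs]
    candidates = {tuple(tr[i] for i in idx_keep) for tr in r}  # π R-S (r)
    result = set()
    for t in candidates:
        # t qualifies if, for every ts in s, some tr in r has tr[S] = ts and tr[R - S] = t
        if all(any(tuple(tr[i] for i in idx_keep) == t and tuple(tr[i] for i in idx_s) == ts
                   for tr in r)
               for ts in s):
            result.add(t)
    return result

ENROLLED = {("Asha", "DBMS"), ("Asha", "OS"), ("Ravi", "DBMS"),
            ("Kiran", "DBMS"), ("Kiran", "OS"), ("Kiran", "CN")}   # hypothetical rows
COURSE = {("DBMS",), ("OS",)}

print(sorted(divide(ENROLLED, ["student", "course"], COURSE, ["course"])))
# [('Asha',), ('Kiran',)]
```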
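Assignment Operator:
The assignment operator (←) stores the result of a relational-algebra expression in a temporary
relation variable, so that a long query can be written in steps.
Syntax: R ← E
where R is a relation (variable) and E is a relational-algebra expression.
Example:
R1 ← π name (Customer)
R2 ← π name (Employee)
R ← R1 – R2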
1.9 Constraints
      •   Key constraints
      •   Domain constraints
      •   Referential integrity constraints
Key constraints
There must be at least one minimal subset of attributes in the relation that can
identify a tuple uniquely. This minimal subset of attributes is called a key for that relation.
      •   In a relation with a key attribute, no two tuples can have identical values for the
       key attributes.
      •   A key attribute cannot have NULL values.
Domain Constraints
Attributes take values from specific domains in real-world scenarios. For example, age cannot be
less than zero and telephone numbers cannot contain a character outside 0-9.
Referential Integrity Constraints
Referential integrity ensures that every foreign key entry in the child table has a
corresponding primary key entry in the parent table.
2.0 Keys
A key in a DBMS is an attribute or set of attributes that helps you identify a
row (tuple) in a relation (table). Keys are also helpful for finding a unique record
or row in the table.
Primary key
There are two ways to create a Primary key for a table. The first way is to alter
an already created table to add the Primary key constraint on an attribute.
Here, I have chosen Id as the primary key attribute.
Now, if you try to add a new row with a duplicate Id value, it gives an error message.
The second way of adding a Primary key is during the creation of the table itself. All
you have to do is add the Primary Key constraint at the end after defining all the
attributes in the table.
To define a Primary Key constraint on multiple attributes, you can list all the attributes
inside the parentheses of the PRIMARY KEY clause, separated by commas.
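A sketch using Python's sqlite3 module (an assumption; the notes do not name a DBMS). It shows the second way, declaring the key inside CREATE TABLE, because SQLite's ALTER TABLE cannot add a primary key to an existing table. The Employee and Sales tables and their rows are hypothetical; Sales uses a composite primary key on Transaction_Id and Product_Id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database

# Primary key declared at the end of CREATE TABLE, after all the attributes.
conn.execute("""
    CREATE TABLE Employee (
        Id    INTEGER,
        Name  TEXT,
        Email TEXT UNIQUE,          -- another candidate key, left as an alternate key
        PRIMARY KEY (Id)
    )
""")
# Composite primary key: list every attribute inside PRIMARY KEY (...).
conn.execute("""
    CREATE TABLE Sales (
        Transaction_Id INTEGER,
        Product_Id     TEXT,
        Quantity       INTEGER,
        PRIMARY KEY (Transaction_Id, Product_Id)
    )
""")

def try_insert(sql, params):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError as err:
        print("Rejected:", err)
        return False

emp = "INSERT INTO Employee VALUES (?, ?, ?)"
sale = "INSERT INTO Sales VALUES (?, ?, ?)"
results = [
    try_insert(emp, (1, "Asha", "asha@example.com")),
    try_insert(emp, (1, "Ravi", "ravi@example.com")),   # duplicate Id -> rejected
    try_insert(sale, (1, "P1", 2)),
    try_insert(sale, (1, "P2", 1)),                     # same Transaction_Id, new pair -> accepted
    try_insert(sale, (1, "P1", 5)),                     # duplicate (1, 'P1') pair -> rejected
]
print(results)   # [True, False, True, True, False]
```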
Alternate Keys
Alternate keys are those candidate keys that are not the Primary key.
There can be only one Primary key for a table. Therefore all the remaining Candidate
keys are known as Alternate or Secondary keys. They can also uniquely identify tuples
in a table, but the database administrator chose a different key as the Primary key.
If we look at the Employee table once again, since I have chosen Id as the Primary
key, the other Candidate Key (Email) becomes the Alternate key for the table.
Foreign Keys
       A foreign key is an attribute that is a Primary key in its parent table but is
included as an attribute in another host table.
A Foreign key creates a relationship between the parent and host tables. For
example, in addition to the Employee table containing the employees' personal details,
we might have another table, Department, containing information related to the
department of the employee, with the schema Department(Id, Name,
Location).
The Primary key in the Department table is Id. We add this attribute to the Employee table,
making it a Foreign key there. A Foreign Key constraint may be added during table
creation or later by using the ALTER command. Here, the table was altered to create the Foreign Key:
Dep_Id is now the Foreign Key in the Employee table, while the attribute it refers to, Id, is the
Primary Key in the Department table.
The Foreign key creates a relationship between two tables in the database that links
data related to a particular field (employee and department here), so each department's details
are stored only once.
For example, if the Marketing department shifts from Kolkata to Pune, instead of
updating it in all the relevant rows of the Employee table, we can simply update the
location in the Department table.
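A sqlite3 sketch of the Employee/Department example. Apart from the Department(Id, Name, Location) schema and the Marketing, Kolkata and Pune values, the table layouts and rows are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The sketch updates Marketing's location once, shows every employee row seeing the change, and then shows an insert with a non-existent Dep_Id being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when this is on

conn.execute("CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT, Location TEXT)")
conn.execute("""
    CREATE TABLE Employee (
        Id     INTEGER PRIMARY KEY,
        Name   TEXT,
        Dep_Id INTEGER REFERENCES Department(Id)   -- foreign key to the parent table
    )
""")
conn.execute("INSERT INTO Department VALUES (1, 'Marketing', 'Kolkata')")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(101, "Asha", 1), (102, "Ravi", 1)])

# Moving the department changes one row in Department, not every employee row.
conn.execute("UPDATE Department SET Location = 'Pune' WHERE Name = 'Marketing'")
rows = conn.execute("""
    SELECT e.Name, d.Location FROM Employee e JOIN Department d ON e.Dep_Id = d.Id
    ORDER BY e.Id
""").fetchall()
print(rows)   # [('Asha', 'Pune'), ('Ravi', 'Pune')]

# Referential integrity: a Dep_Id with no matching Department row is rejected.
try:
    conn.execute("INSERT INTO Employee VALUES (103, 'Kiran', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)   # True
```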
Composite Keys
       A Composite key is a Candidate key or Primary key that consists of more than
one attribute. Sometimes no single attribute has the property of identifying
tuples in a table uniquely. In such cases, we can use a group of attributes
to guarantee uniqueness: the combination of these attributes uniquely identifies tuples in
the table.
Here, none of the attributes on its own contains unique values to identify the tuples. Therefore,
we can combine two or more attributes to create a key that uniquely identifies the tuples.
For example, we can group Transaction_Id and Product_Id to create a key that can
uniquely identify each tuple.
   •   A Super key is a single key or a group of multiple keys that can uniquely identify tuples
       in a table.
   •   Super keys can contain redundant attributes that might not be important for
       identifying tuples.
   •   Candidate keys are a subset of Super keys. They contain only those attributes that are
       required to identify tuples uniquely.
   •   All Candidate keys are Super keys, but the reverse is not true.
   •   There can be multiple Super keys and Candidate keys in a table, but there can be only one
       Primary key.
   •   Alternate keys are those Candidate keys that were not chosen to be the Primary key.
   •   A Foreign key is an attribute that is a Primary key in its parent table but is included as an
       attribute in another host table.
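The definitions of super key and candidate key can be tried out on a table instance with a small Python sketch. The Sales rows are hypothetical. A check on one instance can only show that a set of attributes is not a key; whether it is a key for all possible data is a design decision.

```python
from itertools import combinations

def is_superkey(relation, attrs):
    """True if no two tuples agree on all of attrs (checked on this instance only).
    Assumes the relation has no duplicate rows, as a relation is a set."""
    seen = {tuple(t[a] for a in attrs) for t in relation}
    return len(seen) == len(relation)

def candidate_keys(relation, attributes):
    """Minimal superkeys: superkeys with no proper subset that is also a superkey."""
    keys = []
    for size in range(1, len(attributes) + 1):          # smallest attribute sets first
        for combo in combinations(attributes, size):
            if is_superkey(relation, combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

Sales = [  # hypothetical rows
    {"Transaction_Id": 1, "Product_Id": "P1", "Quantity": 2},
    {"Transaction_Id": 1, "Product_Id": "P2", "Quantity": 2},
    {"Transaction_Id": 2, "Product_Id": "P1", "Quantity": 2},
]
attrs = ["Transaction_Id", "Product_Id", "Quantity"]
print(is_superkey(Sales, ("Transaction_Id",)))   # False: 1 appears twice
print(is_superkey(Sales, tuple(attrs)))          # True, but Quantity is redundant
print(candidate_keys(Sales, attrs))              # [('Transaction_Id', 'Product_Id')]
```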
Normalization divides larger tables into smaller tables and links them using relationships.
Data Anomalies - An anomaly is an issue in the data that is not
meant to be there.
Functional dependency is a relationship that exists between two sets of attributes
of a relational table where one set of attributes can determine the value of the other
set of attributes. It is denoted by X -> Y, where X is called a determinant and Y is
called dependent.
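Whether X -> Y holds in a set of rows can be tested directly from the definition: rows that agree on X must also agree on Y. The Python sketch below uses hypothetical rows modelled on the course table in these notes (the venues for CS201 and CS203 are invented). A sample can only refute a functional dependency, never prove it for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if X -> Y holds in rows: equal X values always give equal Y values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same determinant, different dependent values
    return True

# Hypothetical rows in the spirit of the course table.
rows = [
    {"course": "CS201", "venue": "SF01", "instructor": "Prof John", "phone": "9567889890"},
    {"course": "CS202", "venue": "SF03", "instructor": "Prof George", "phone": "9089090909"},
    {"course": "CS203", "venue": "SF04", "instructor": "Prof John", "phone": "9567889890"},
]
print(fd_holds(rows, ["instructor"], ["phone"]))   # True
print(fd_holds(rows, ["instructor"], ["course"]))  # False: Prof John has two courses

# Editing the phone number in only one row breaks instructor -> phone.
edited = [dict(r) for r in rows]
edited[0]["phone"] = "8975643567"
print(fd_holds(edited, ["instructor"], ["phone"]))  # False
```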
•  First, it helps reduce the amount of storage needed to store the data.
•  Second, it prevents data conflicts that may creep in because of the
   existence of multiple copies of the same data.
Course code   Venue   Instructor Name   Phone No.
CS202         SF03    Prof George       9089090909
Here, the table stores the course code, venue, instructor name, and phone
number.
At first, this design seems good. However, issues start to develop once we need to
modify information. For instance, suppose Prof. John changes his mobile number.
In such a situation, we have to make edits in 2 places.
What if someone edited the phone number against CS201, but forgot to edit it for
CS203? This would lead to wrong information in the database. This problem can be easily
tackled by dividing our table into 2 simpler tables:
Table 1 (Instructor):
      Instructor ID
      Instructor Name
      Phone no.
InstructorID   Instructor Name   Phone no.
F01            Prof John         9567889890, 8975643567
F02            Prof George       9089090909
Table 2 (Course):
      Course code
      Venue
      Instructor ID
Course code   Venue   InstructorID
For a table to be in the first normal form (1NF):
•  a single cell must not hold more than one value (atomicity)
•  each column must have only one value for each row in the table
EmployeeDetail
Employee_code   Employee_Name   Employee_PhoneNo.
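In the Instructor table, Prof John's cell holds two phone numbers, which breaks the atomicity rule. One common fix is to emit one row per value. This Python sketch assumes rows are dicts and that multiple values are comma-separated; both are assumptions made for illustration.

```python
def to_first_normal_form(rows, column, sep=","):
    """Split a multi-valued column so every cell holds a single atomic value."""
    flat = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)              # copy the other columns unchanged
            new_row[column] = value.strip()  # one atomic value per cell
            flat.append(new_row)
    return flat

instructors = [
    {"InstructorID": "F01", "Name": "Prof John", "Phone": "9567889890, 8975643567"},
    {"InstructorID": "F02", "Name": "Prof George", "Phone": "9089090909"},
]
for row in to_first_normal_form(instructors, "Phone"):
    print(row["InstructorID"], row["Phone"])
```

After the split, InstructorID alone no longer identifies a row, so the key of the flattened table becomes the combination of InstructorID and Phone.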
For a table to be in the second normal form (2NF), it must:
•  be in 1NF, and
•  have no partial dependency. That is, all non-key attributes must be fully dependent
   on the primary key.
If there is a partial dependency, split the table in two and move the partially
dependent attributes to another table where they fit better.
To better understand partial dependency and how to normalize a table to the second
normal form, let's use the following "EmployeeProjectDetail" table as an example:
EmployeeProjectDetail
Employee Code   Project ID   Employee Name   Project Name
Employee Code and Project ID together form the table's primary key. Because
Employee Code alone determines Employee Name, and Project ID alone determines
Project Name, the table has partial dependencies. As a result, the relational table
above violates 2NF.
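Partial dependencies can be found mechanically by computing the attribute closure of every proper part of the key. This Python sketch assumes exactly the two dependencies named above (Employee Code -> Employee Name and Project ID -> Project Name). It checks only the primary key; a complete 2NF test would check every candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Non-key attributes that depend on a proper part of the key."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            determined = closure(part, fds) & non_key
            if determined:
                found.append((part, sorted(determined)))
    return found

ATTRS = ("Employee Code", "Project ID", "Employee Name", "Project Name")
KEY = ("Employee Code", "Project ID")
FDS = [({"Employee Code"}, {"Employee Name"}), ({"Project ID"}, {"Project Name"})]
for part, attrs in partial_dependencies(KEY, ATTRS, FDS):
    print(part, "->", attrs)
```

Each reported pair points to one table that the decomposition should create: one for employees, one for projects.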
The EmployeeProjectDetail table can be divided into the following three tables to
eliminate partial dependencies and normalize it into the second normal form.
EmployeeDetail
Employee Code   Employee Name
201 John
202 George
203 Rehen
EmployeeProject
Employee Code   Project ID
201 P02
201 P03
202 P04
203 P05
ProjectDetail
Project ID   Project Name
P02 Project123
P03 Project126
P04 Project125
P05 Project124
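The decomposition is a set of projections, and a projection drops duplicate rows. That is why employee 201 appears only once in EmployeeDetail. In this Python sketch the original EmployeeProjectDetail rows are an assumption: they are rebuilt by joining the three tables above. The sketch projects them again and checks that joining through the keys gives back exactly the original rows.

```python
COLUMNS = ("Employee Code", "Project ID", "Employee Name", "Project Name")
# Rows rebuilt by joining the three tables above (an assumption about the original data).
ROWS = [
    ("201", "P02", "John", "Project123"),
    ("201", "P03", "John", "Project126"),
    ("202", "P04", "George", "Project125"),
    ("203", "P05", "Rehen", "Project124"),
]

def project(rows, keep):
    """Relational projection: keep some columns and drop duplicate rows."""
    idx = [COLUMNS.index(c) for c in keep]
    return sorted({tuple(r[i] for i in idx) for r in rows})

employee_detail = project(ROWS, ("Employee Code", "Employee Name"))
employee_project = project(ROWS, ("Employee Code", "Project ID"))
project_detail = project(ROWS, ("Project ID", "Project Name"))
print(employee_detail)  # [('201', 'John'), ('202', 'George'), ('203', 'Rehen')]

# Joining back through the keys rebuilds exactly the original rows.
names = dict(employee_detail)   # Employee Code -> Employee Name
titles = dict(project_detail)   # Project ID -> Project Name
rejoined = sorted((e, p, names[e], titles[p]) for e, p in employee_project)
print(rejoined == sorted(ROWS))  # True
```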
A transitive dependency means that a non-prime attribute (an attribute that is not part of any
candidate key) is dependent on another non-prime attribute. This is what the third normal form
(3NF) eliminates. For a table to be in 3NF, it must:
•  be in 2NF, and
•  have no transitive dependency.
EXCEPTION: Adhering to the third normal form is not always possible, despite being
theoretically desirable. If you have a customer database and want to eliminate every
potential inter-field dependency, you must create separate tables for cities, ZIP codes,
sales reps, customer classes, and any other factor that might be repeated in many records.
Normalization in DBMS is worthwhile to pursue in theory. However, numerous small tables
can degrade performance or exceed open-file and memory limits.
It may be more practical to apply the third normal form only to data that changes
regularly. If any dependent fields still exist, build your application to ask the
user to confirm changes to all linked fields.
Employee Code   Employee Name   Employee_City   Employee_zipcode
We can divide the "EmployeeDetail" table into the following two tables to
eliminate the transitive dependency from it and normalize it into the third normal
form:
<EmployeeDetail>
<EmployeeLocation>
Employee_City Employee_zipcode
Bangalore 501602
Chennai 641092
Coimbatore 641201
Madurai 652301
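A 3NF check can be automated. For every dependency X -> A, X must be a superkey or A must be a prime attribute. This Python sketch assumes the dependencies Employee Code -> {Employee Name, Employee_zipcode} and Employee_zipcode -> Employee_City; the notes do not state them explicitly. It finds candidate keys by brute force, tests only the listed FDs, and then re-checks the two decomposed tables.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(combo, fds) >= set(attributes):
                keys.append(combo)
    return keys

def violations_3nf(attributes, fds):
    """FDs X -> A where X is not a superkey and A is not prime."""
    everything = set(attributes)
    prime = {a for k in candidate_keys(attributes, fds) for a in k}
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= everything:
            continue  # X is a superkey
        found += [(tuple(sorted(lhs)), a) for a in sorted(rhs - lhs) if a not in prime]
    return found

ATTRS = ("Employee Code", "Employee Name", "Employee_City", "Employee_zipcode")
FDS = [({"Employee Code"}, {"Employee Name", "Employee_zipcode"}),
       ({"Employee_zipcode"}, {"Employee_City"})]
print(violations_3nf(ATTRS, FDS))  # [(('Employee_zipcode',), 'Employee_City')]

# After the split, each table keeps one FD and neither violates 3NF.
print(violations_3nf(("Employee Code", "Employee Name", "Employee_zipcode"), [FDS[0]]))  # []
print(violations_3nf(("Employee_zipcode", "Employee_City"), [FDS[1]]))  # []
```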
Boyce-Codd Normal Form (BCNF) is an improved form of 3NF that places more restrictions on a
table than 3NF does. A relational table must satisfy the following conditions to be in
Boyce-Codd normal form:
The table must be in the third normal form. For every non-trivial functional dependency
X -> Y, X must be a superkey of the table. As a result, if Y is a prime attribute, X cannot be
a non-prime attribute.
A superkey is a group of one or more attributes that can be used to uniquely identify a row
in a database table. To better understand how to normalize a table to BCNF, let's use the
following "EmployeeProjectLead" table as an example:
Employee Code   Project ID   ProjectLeader
The preceding table satisfies all normal forms up to 3NF. However, although "Employee Code,
Project ID" is a candidate key, the table violates BCNF. In the non-trivial functional
dependency ProjectLeader -> Project ID, the determinant ProjectLeader is not a superkey, while
Project ID is a prime attribute. 3NF allows this because Project ID is prime, but BCNF does not.
Divide the given table into smaller tables to bring it into BCNF:
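The gap between 3NF and BCNF can be checked mechanically. This Python sketch assumes the table's dependencies are {Employee Code, Project ID} -> ProjectLeader (implied by the candidate key) and ProjectLeader -> Project ID (from the text). It finds candidate keys by brute force and reports, for each normal form, the listed FDs that break it. Only the given FDs are tested, not every derivable one.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(combo, fds) >= set(attributes):
                keys.append(combo)
    return keys

def violations(attributes, fds, normal_form):
    """FDs X -> A that break 3NF or BCNF (checks only the given FDs)."""
    everything = set(attributes)
    prime = {a for k in candidate_keys(attributes, fds) for a in k}
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= everything:
            continue  # X is a superkey: allowed in both 3NF and BCNF
        for a in sorted(rhs - lhs):
            if normal_form == "3NF" and a in prime:
                continue  # 3NF still allows a prime attribute on the right
            found.append((tuple(sorted(lhs)), a))
    return found

ATTRS = ("Employee Code", "Project ID", "ProjectLeader")
FDS = [({"Employee Code", "Project ID"}, {"ProjectLeader"}),
       ({"ProjectLeader"}, {"Project ID"})]
print(violations(ATTRS, FDS, "3NF"))   # []
print(violations(ATTRS, FDS, "BCNF"))  # [(('ProjectLeader',), 'Project ID')]
```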
<EmployeeProject>
Employee Code Project ID
201 P02
201 P03
202 P04
203 P05
<ProjectLead>
Project ID ProjectLeader
P02 Jonson
P03 Swetha
P04 Arun
P05 Daniel
Let us understand 4NF with an example. Consider the table given below.
student_id   student_subject   student_hobby
S012         English           Listening to music
S013         Biology           Watching movies
S014         Civics            Cooking
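student_subject and student_hobby are independent of each other, but both of them depend on
student_id (a student can have several subjects and several hobbies). Hence, the above
relation is not in 4NF.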
Student_Subject Table
student_id student_subject
S011 Science
S012 English
S011 Maths
S013 Biology
S014 Civics
Student_Hobby Table
student_id student_hobby
S014 Cooking
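The independence of subjects and hobbies is a multivalued dependency, student_id ->> student_subject. On a given set of rows it holds when each student's (subject, hobby) pairs are exactly all combinations of that student's subjects and hobbies. In this Python sketch, S011's subjects come from the Student_Subject table, but S011's hobbies (Cricket, Hockey) are hypothetical. The sketch also checks that joining the two 4NF tables on student_id gives back the original rows.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows):
    """Check student_id ->> student_subject on (student_id, subject, hobby) rows.

    The MVD holds when, for every student, the stored (subject, hobby) pairs
    are exactly all combinations of that student's subjects and hobbies.
    """
    pairs = defaultdict(set)
    for student, subject, hobby in rows:
        pairs[student].add((subject, hobby))
    for combos in pairs.values():
        subjects = {s for s, _ in combos}
        hobbies = {h for _, h in combos}
        if combos != set(product(subjects, hobbies)):
            return False
    return True

rows = [
    ("S011", "Science", "Cricket"), ("S011", "Science", "Hockey"),  # hobbies hypothetical
    ("S011", "Maths", "Cricket"), ("S011", "Maths", "Hockey"),
    ("S012", "English", "Listening to music"),
    ("S013", "Biology", "Watching movies"),
    ("S014", "Civics", "Cooking"),
]
print(mvd_holds(rows))                 # True
print(mvd_holds(rows[:3] + rows[4:]))  # False: (S011, Maths, Hockey) is missing

# 4NF decomposition into the two tables; joining them on student_id loses nothing.
subject_table = {(s, subj) for s, subj, _ in rows}
hobby_table = {(s, h) for s, _, h in rows}
rejoined = {(s, subj, h) for s, subj in subject_table for s2, h in hobby_table if s == s2}
print(rejoined == set(rows))  # True
```

The original table must repeat every subject for every hobby, which is the redundancy that 4NF removes.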
5NF (Fifth Normal Form)
    1. It should be in the Fourth Normal Form.
    2. It should not have any Join Dependency, and there shouldn't be any losses while
       performing the join function.
    3. The relation can be considered to be in 5NF if it has been broken down into as many
       tables as possible and it has passed through all the Normal Forms. This helps to
       diminish data redundancy.
According to the table above, Arya handles both Maths and English classes for the 2nd semester,
but not for the 3rd semester. To identify valid data in this situation, a combination of all these
fields is needed.
Consider a scenario in which we add a new semester, say Semester 4, but are unsure of the
subject or the teacher who will be assigned to it, so we leave teacher_name and teacher_subject
as NULL. We cannot, however, leave those columns empty, because the three columns together
act as the primary key.
Due to these issues, the above relation is not in 5NF. 5NF can be achieved by breaking the
relation into three tables:
Semester teacher_subject
Teacher_Subject Table
teacher_subject teacher_name
Maths David
Maths Arya
English Arya
English Ashwini
Science Prateek
Semester_Teacher Table
teacher_name semester
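A join dependency means that a relation can be rebuilt losslessly from three projections, even though no pair of them is enough. The teacher/semester data in the notes is incomplete, so this Python sketch uses a small hypothetical relation (Meera, Kiran, Physics, Chemistry, semesters 1 and 2). For this relation, joining any two projections adds a spurious row, but joining all three reproduces it exactly. The code demonstrates one of the two-way joins.

```python
def natural_join(left, right):
    """Natural join of two relations, each given as (column_names, set_of_rows)."""
    lcols, lrows = left
    rcols, rrows = right
    common = [c for c in lcols if c in rcols]
    extra = [c for c in rcols if c not in lcols]
    out = set()
    for l in lrows:
        lrow = dict(zip(lcols, l))
        for r in rrows:
            rrow = dict(zip(rcols, r))
            if all(lrow[c] == rrow[c] for c in common):  # match on shared columns
                out.add(l + tuple(rrow[c] for c in extra))
    return (tuple(lcols) + tuple(extra), out)

def project(rel, keep):
    """Projection onto the columns in keep (duplicates vanish in the set)."""
    cols, rows = rel
    idx = [cols.index(c) for c in keep]
    return (tuple(keep), {tuple(row[i] for i in idx) for row in rows})

# Hypothetical relation; not the teacher/semester data from the notes.
R = (("teacher_name", "teacher_subject", "semester"), {
    ("Meera", "Physics", 2), ("Meera", "Chemistry", 1),
    ("Kiran", "Physics", 1), ("Meera", "Physics", 1),
})
teacher_subject = project(R, ("teacher_name", "teacher_subject"))
subject_semester = project(R, ("teacher_subject", "semester"))
teacher_semester = project(R, ("teacher_name", "semester"))

two_way = natural_join(teacher_subject, subject_semester)
three_way = natural_join(two_way, teacher_semester)
print(sorted(two_way[1] - R[1]))  # [('Kiran', 'Physics', 2)], a spurious tuple
print(three_way[1] == R[1])       # True: the three-way join is lossless
```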
There are many benefits to normalizing a database. Some of the main advantages are
as follows:
Disadvantages of Normalization
There are various drawbacks to normalizing a database. A few disadvantages are as
follows:
   1. When information is dispersed over many tables, they must be joined together,
       which adds work. Additionally, the database becomes harder to understand.
   2. Tables contain codes (such as IDs) rather than the actual data, because repeated
       data is stored as references to other tables. As a result, the lookup tables
       must constantly be consulted.
   3. Because the data model is designed for programs rather than ad hoc querying,
       it can be exceedingly difficult to query. Queries are built from SQL that has
       accumulated over time and are usually run through user-friendly query tools.
       As a result, it can be difficult to present useful information without first
       understanding the user's needs.
   4. Performance gradually slows down compared to a denormalized structure.
   5. To complete the normalization process successfully, a thorough understanding
       of the various normal forms is vital. Careless use can result in a bad design
       with substantial anomalies and data inconsistencies.
2.2 Armstrong Axioms
   •  We can use closures of attributes to determine if any FD follows from a given
      set of FDs.
   •  Use Armstrong's axioms: a complete set of inference rules from which it is
      possible to derive every FD that follows from a given set.
   •  The set of all attributes that can be functionally determined from an
      attribute set is called the closure of that attribute set.
   •  The closure of attribute set {X} is denoted as {X}+.
Step-01:
  Add the attributes contained in the attribute set for which closure is being calculated to the
  result set.
Step-02:
  Recursively add the attributes to the result set which can be functionally determined from
       the attributes already contained in the result set.
    Example:
    Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies:
    A → BC, BC → DE, D → F, CF → G
Now, let us find the closure of some attributes and attribute sets-
Closure of attribute A-
A+ = { A }
   = { A , B , C } ( Using A → BC )
   = { A , B , C , D , E } ( Using BC → DE )
   = { A , B , C , D , E , F } ( Using D → F )
   = { A , B , C , D , E , F , G } ( Using CF → G )
Thus,
A+ = { A , B , C , D , E , F , G }
Closure of attribute D-
D+ = { D }
   = { D , F } ( Using D → F )
We cannot determine any other attribute using attributes D and F contained in the result set.
Thus, D+ = { D , F }
Closure of attribute set {B, C}-
{ B , C }+= { B , C }
= { B , C , D , E } ( Using BC → DE )
= { B , C , D , E , F } ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
Thus,
{ B , C }+ = { B , C , D , E , F , G }
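Step-01 and Step-02 translate directly into a fixed-point loop. Start from the given attributes, then keep applying any FD whose left side is already in the result until nothing changes. In this Python sketch, single-letter attribute names are written as strings, so "BC" stands for {B, C}; that encoding is only a convenience. The closures match the worked example.

```python
def attribute_closure(attrs, fds):
    """Closure {X}+ of attrs under fds, given as (lhs, rhs) attribute strings."""
    result = set(attrs)  # Step-01: start with the attributes themselves
    changed = True
    while changed:       # Step-02: repeat until no FD adds a new attribute
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FDS = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]
for attrs in ("A", "D", "BC"):
    print(attrs, "+ =", "".join(sorted(attribute_closure(attrs, FDS))))
# A + = ABCDEFG
# D + = DF
# BC + = BCDEFG
```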
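If the closure of an attribute set contains all the attributes of the relation, and no proper
subset of it has this property, then that attribute set is called a candidate key of that relation.
We can determine the candidate keys of a given relation using the following steps-
Step-01:
Determine all essential attributes of the given relation. Essential attributes are the attributes
that are not present on the RHS of any functional dependency; they must be part of every
candidate key.
Example
       Let R(A, B, C, D, E, F) be a relation scheme with the functional dependencies
       A → B, C → D, D → E.
       Here, the attributes which are not present on the RHS of any functional dependency
       are A, C and F. So, the essential attributes are A, C and F.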
Step-02:
Case-01: If all essential attributes together can determine all remaining non-essential
attributes, then-
   •  The combination of essential attributes is the candidate key.
   •  It is the only possible candidate key.
Case-02:
   •  If all essential attributes together cannot determine all remaining non-essential
      attributes, then-
   •  The set of essential attributes and some non-essential attributes will be the
      candidate key(s).
   •  In this case, multiple candidate keys are possible.
   •  To find the candidate keys, we check different combinations of essential and
      non-essential attributes.
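Problem: Let R = (A, B, C, D, E, F) be a relation scheme with the following dependencies:
C → F, E → A, EC → D, A → B
Which of the following is a key for R?
1. CD
2. EC
3. AE
4. AC
Also, determine the total number of candidate keys and super keys.
Step-01:
The essential attributes of R are C and E, since they are not present on the RHS of any
functional dependency.
Step-02: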
We will check if the essential attributes together can determine all remaining non-
essential attributes.
To check, we find the closure of CE
So, we have-
{ CE }+
= { C , E }
= { C , E , F } ( Using C → F )
= { A , C , E , F } ( Using E → A )
= { A , C , D , E , F } ( Using EC → D )
= { A , B , C , D , E , F } ( Using A → B )
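We conclude that CE can determine all the attributes of the given relation. So, CE is
the only possible candidate key of the relation, and option 2 (EC) is the key for R.
Total number of candidate keys = 1. Every super key must contain C and E, and each of the
remaining four attributes (A, B, D, F) may be present or absent, so there are 2^4 = 16 super keys.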
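For a relation this small, the candidate key and the super key count can be cross-checked by brute force. Compute the closure of every attribute subset, keep the subsets whose closure is the whole relation (super keys), then keep the minimal ones (candidate keys). This Python sketch is exhaustive rather than the essential-attribute method above, and it is only practical for a handful of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def keys(relation, fds):
    """Return (candidate keys, super keys) by brute force over attribute subsets."""
    everything = set(relation)
    supers = [c for n in range(1, len(relation) + 1)
              for c in combinations(relation, n)
              if closure(c, fds) >= everything]
    candidates = [k for k in supers if not any(set(o) < set(k) for o in supers)]
    return candidates, supers

FDS = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
candidates, supers = keys("ABCDEF", FDS)
print(candidates)   # [('C', 'E')]
print(len(supers))  # 16
for option in ("CD", "EC", "AE", "AC"):
    print(option, closure(option, FDS) >= set("ABCDEF"))
```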
Decomposition of a relation can be of two types:
   •   Lossless
   •   Lossy
Lossless Decomposition:
Lossy Decomposition:
   •   When a relation is decomposed into multiple relational schemas in a lossy way, the
       original relation cannot be retrieved exactly by joining them, so some data/information
       is lost.
Properties of Decomposition:
1. Lossless Decomposition
2. Dependency Preservation
Examples:
Lossless Decomposition:
Lossy Decomposition:
Consider the relation A (X, Y, Z) with the following decomposition:
A1 (X, Y)
A2 (X, Z)
Example:
X Y Z
X1 Y1 Z1
X2 Y1 Z1
X1 Y2 Z2
X1 Y3 Z3
X Y
X1 Y1
X2 Y1
X1 Y2
X1 Y3
X Z
X1 Z1
X2 Z1
X1 Z2
X1 Z3
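A1 ⨝ A2 (natural join on X):
X Y Z
X1 Y1 Z1
X1 Y1 Z2
X1 Y1 Z3
X1 Y2 Z1
X1 Y2 Z2
X1 Y2 Z3
X1 Y3 Z1
X1 Y3 Z2
X1 Y3 Z3
X2 Y1 Z1
A ⊂ A1 ⨝ A2: the join contains spurious tuples, such as (X1, Y1, Z2), that are not in A,
so this decomposition is lossy.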
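The lossy example can be reproduced by projecting A onto (X, Y) and (X, Z) and then joining the projections back on X. This Python sketch uses exactly the four rows of A from the example and represents relations as sets of tuples.

```python
A = {("X1", "Y1", "Z1"), ("X2", "Y1", "Z1"), ("X1", "Y2", "Z2"), ("X1", "Y3", "Z3")}
A1 = {(x, y) for x, y, _ in A}   # projection onto (X, Y)
A2 = {(x, z) for x, _, z in A}   # projection onto (X, Z)

# Natural join of A1 and A2 on their common attribute X.
joined = {(x, y, z) for x, y in A1 for x2, z in A2 if x == x2}
spurious = joined - A
print(len(A), len(joined), len(spurious))  # 4 10 6
print(A < joined)  # True: A is a proper subset, so the decomposition is lossy
```

X1 appears with three Y values and three Z values, so the join pairs every Y with every Z for X1. That produces 9 tuples for X1 where A had only 3.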
Dependency Preservation:
   •   If a relation R with functional dependency set FD is decomposed into R1 with FD1
       and R2 with FD2, then three cases are possible (comparing the closures of the sets):
   •   FD1 ∪ FD2 = FD -> Decomposition is dependency preserving.
   •   FD1 ∪ FD2 is a subset of FD -> Not Dependency preserving.
   •   FD1 ∪ FD2 is a superset of FD -> This case is not possible.
Example:
Solution:
Consider the decomposition of R (P, Q, R, S) into R1 (P, Q, R) and R2 (R, S). To solve this
problem, we first find the closures of the Functional Dependencies FD1 and FD2 of the
relations R1 (P, Q, R) and R2 (R, S).
1) To find the closure of FD1, we consider all combinations of (P, Q, R), i.e.,
we find the closures of P, Q, R, PQ, QR, and RP.
closure (P) = {P} // Trivial
= {R, P}
= {P, Q, R}
(PQ --> R // Removing PQ from right side as these are trivial attributes)
= {P, Q, R}
(QR --> P // Removing QR from right side as these are trivial attributes)
(PR --> S // Removing PR from right side as these are trivial attributes)
   •   From these results, FD1 contains PQ --> R and FD2 contains R --> S. However,
       the functional dependency S --> P cannot be derived from FD1 ∪ FD2.
   •   FD1 U FD2 is a subset of FD.
   •   Consequently, the given decomposition is not dependency preserving.
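The notes do not restate the full FD set for this example. The Python sketch below assumes FD = {PQ → R, R → S, S → P}, which matches the dependencies named in the solution but remains an assumption. It uses a standard test that differs from the notes' method: for each X → Y, grow a set Z from X by repeatedly adding (closure of Z ∩ Ri) ∩ Ri for each decomposed schema Ri. X → Y is preserved exactly when Y ends up inside Z, so FD1 and FD2 never need to be listed explicitly.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs_set, rhs_set))."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, fds, schemas):
    """Test whether fd follows from the FDs that hold on the decomposed schemas."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            gained = closure(z & schema, fds) & schema  # what this schema can infer
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z

# Assumed FD set for R(P, Q, R, S); the decomposition is R1(P, Q, R), R2(R, S).
FDS = [({"P", "Q"}, {"R"}), ({"R"}, {"S"}), ({"S"}, {"P"})]
SCHEMAS = [{"P", "Q", "R"}, {"R", "S"}]
lost = ["".join(sorted(lhs)) + " -> " + "".join(sorted(rhs))
        for lhs, rhs in FDS if not is_preserved((lhs, rhs), FDS, SCHEMAS)]
print(lost)  # ['S -> P']
```

Under this assumed FD set, only S → P is lost, which agrees with the conclusion above.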