DBMS basics and normalizations unit.pptx

What is Data?
• Data is a collection of a distinct small unit
of information.
• It can be used in a variety of forms like
text, numbers, media, bytes, etc. it can be
stored in pieces of paper or electronic
memory, etc.
• Word 'Data' is originated from the word
'datum' that means 'single piece of
information.’
• It is plural of the word datum.

Database
• A database is
– An organized collection of data, so that it can be
easily accessed and managed.
– A collection of inter-related data used to retrieve,
insert and delete the data efficiently.
– used to organize the data in the form of a table,
schema, views, and reports, etc.
• For example: The college Database
organizes the data about the admin, staff,
students and faculty etc.

Database Management System
• Database management system is a software which is used to
manage the database. For example: MYSQL, Oracle, etc.
• DBMS provides an interface to perform various operations like
database creation, storing data in it, updating data, creating a table
in the database and a lot more.
• It provides protection and security to the database.
• In the case of multiple users, it also maintains data consistency.
• DBMS allows users the following tasks:
• Data Definition: It is used for creation, modification, and removal of
definition that defines the organization of data in the database.
• Data Updation: It is used for the insertion, modification, and deletion of
the actual data in the database.
• Data Retrieval: It is used to retrieve the data from the database which
can be used by applications for various purposes.
• User Administration: It is used for registering and monitoring users,
maintain data integrity, enforcing data security, dealing with concurrency
control, monitoring performance and recovering information corrupted
by unexpected failure.

Characteristics of DBMS
• It uses a digital repository established on a server to
store and manage the information.
• Provide a clear and logical view of the process that
manipulates data.
• Contains automatic backup and recovery procedures.
• Contains ACID properties which maintain data in a
healthy state in case of failure.
• Reduce the complex relationship between data.
• Support manipulation and processing of data.
• Provide security of data.
• View the database from different viewpoints according
to the requirements of the user.

Advantages of DBMS
• Controls database redundancy: It can control data
redundancy because it stores all the data in one single
database file and that recorded data is placed in the
database.
• Data sharing: In DBMS, the authorized users of an
organization can share the data among multiple users.
• Easier Maintenance: It can be easily maintainable due to the
centralized nature of the database system.
• Reduce time: It reduces development time and maintenance
need.
• Backup: It provides backup and recovery subsystems which
create automatic backup of data during failures and restores
the data if required.
• multiple user interface: It provides different types of user
interfaces like graphical user interfaces, application program
interfaces etc.

Disadvantages of DBMS
• Cost of Hardware and Software: It requires a
high speed of data processor and large memory
size to run DBMS software.
• Size: It occupies a large space of disks and large
memory to run them efficiently.
• Complexity: Database system creates additional
complexity and requirements.
• Higher impact of failure: Failure is highly
impacted the database because in most of the
organization, all the data stored in a single
database and if the database is damaged due to
electric failure or database corruption then the
data may be lost forever.

Centralized Database
• It is the type of database that stores data at a centralized database
system.
• Users can access the stored data from different locations through
several applications.
• These applications contain the authentication process to let users
access data securely.
• An example of a Centralized database can be Central Library that
carries a central database of each library in a college/university.
• Advantages of Centralized Database
• It has decreased the risk of data management, i.e., manipulation of data
will not affect the core data.
• Data consistency is maintained as it manages data in a central
repository.
• It provides better data quality, which enables organizations to establish
data standards.
• It is less costly because fewer vendors are required to handle the data
sets.
• Disadvantages of Centralized Database
• The size of the centralized database is large, which increases the
response time for fetching the data.
• It is not easy to update such an extensive database system.

Distributed Database
• Unlike a centralized database system, in distributed systems,
data is distributed among different database systems of an
organization.
• These database systems are connected via communication
links.
• Such links help the end-users to access the data easily.
• Distributed database system can further be divided into:
• Homogeneous DDB: Those database systems which execute
on the same operating system and use the same application
process and carry the same hardware devices.
• Heterogeneous DDB: Those database systems which execute
on different operating systems under different application
procedures, and carries different hardware devices.
• Advantages of Distributed Database
• Modular development is possible in a distributed database, i.e.,
the system can be expanded by including new computers and
connecting them to the distributed system.
• One server failure will not affect the entire data set.

Relational Database
• This database is based on the relational data
model, which stores data in the form of rows(tuple)
and columns(attributes), and together forms a
table(relation).
• A relational database uses SQL for storing,
manipulating, as well as maintaining the data.
• E. F. Codd(Edger Frank Codd) invented the
database in 1970.
• Each table in the database carries a key that
makes the data unique from others.
• Examples of Relational databases are MySQL,
Microsoft SQL Server, Oracle, etc.

Properties of Relational
Database
• There are following four commonly known properties of a relational
model known as ACID properties, where:
• A means Atomicity: This ensures the data operation will complete
either with success or with failure. It follows the 'all or nothing'
strategy. For example, a transaction will either be committed or will
abort.
• C means Consistency: If we perform any operation over the data,
its value before and after the operation should be preserved. For
example, the account balance before and after the transaction
should be correct, i.e., it should remain conserved.
• I means Isolation: There can be concurrent users for accessing
data at the same time from the database. Thus, isolation between
the data should remain isolated. For example, when multiple
transactions occur at the same time, one transaction effects should
not be visible to the other transactions in the database.
• D means Durability: It ensures that once it completes the operation
and commits the data, data changes should remain permanent.

NoSQL Database
• Non-SQL/Not Only SQL is a type of database that is used for storing a wide
range of data sets.
• It is not a relational database as it stores data not only in tabular form but in
several different ways.
• It came into existence when the demand for building modern applications
increased.
• Thus, NoSQL presented a wide variety of database technologies in
response to the demands.
• We can further divide a NoSQL database into the following four types:
1. Key-value storage: It is the simplest type of database storage where it stores every
single item as a key (or attribute name) holding its value, together.
2. Document-oriented Database: A type of database used to store data as JSON-like
document. It helps developers in storing data by using the same document-model
format as used in the application code.
3. Graph Databases: It is used for storing vast amounts of data in a graph-like structure.
Most commonly, social networking websites use the graph database.
4. Wide-column stores: It is similar to the data represented in relational databases.
Here, data is stored in large columns together, instead of storing in rows.

Cloud Database
• A type of database where data is stored in a virtual
environment and executes over the cloud
computing platform.
• It provides users with various cloud computing
services (SaaS, PaaS, IaaS, etc.) for accessing
the database.
• There are numerous cloud platforms, but the best
options are:
• Amazon Web Services(AWS)
• Microsoft Azure
• PhonixNAP
• ScienceSoft
• Google Cloud SQL, etc.

Object-oriented Databases
• The type of database that uses the object-based data model approach for storing
data in the database system.
• The data is represented and stored as objects which are similar to the objects used in
the object-oriented programming language.
• Advantages
– Complex data and a wider variety of data types.
– Easy to save and retrieve data quickly.
– Seamless integration with object-oriented programming languages.
– Easier to model the advanced real world problems.
– Extensible with custom data types.
• Disadvantages
– Not as widely adopted as relational databases.
– No universal data model. Lacks theoretical foundations and standards.
– Does not support views.
– High complexity causes performance issues.
– An adequate security mechanism and access rights to objects do not exist.

Hierarchical Databases
• It is the type of database that stores data in
the form of parent-children relationship
nodes.
• Here, it organizes data in a tree-like structure.

Network Databases
• It is the database that typically follows the
network data model.
• Here, the representation of data is in the
form of nodes connected via links between
them.
• Unlike the hierarchical database, it allows
each record to have multiple children and
parent nodes to form a generalized graph
structure.

DBMS – Three Level Architecture

External level
• It is also called view level.
• The reason this level is called “view” is because several
users can view their desired data from this level which
is internally fetched from database with the help of
conceptual and internal level mapping.
• The user doesn’t need to know the database schema
details such as data structure, table definition etc. user
is only concerned about data which is what returned
back to the view level after it has been fetched from
database (present at the internal level).
• External level is the “top level” of the Three Level
DBMS Architecture.

Conceptual level
• It is also called logical level. The whole design
of the database such as relationship among
data, schema of data etc. are described in this
level.
• Database constraints and security are also
implemented in this level of architecture. This
level is maintained by DBA (database
administrator).

Internal level
• This level is also known as physical level. This
level describes how the data is actually stored
in the storage devices. This level is also
responsible for allocating space to the data.
This is the lowest level of the architecture.

Example
• store customer information in a customer table.
• At physical level these records can be described as
blocks of storage (bytes, gigabytes, terabytes etc.) in
memory. These details are often hidden from the
programmers.
• At the logical level these records can be described as
fields and attributes along with their data types, their
relationship among each other can be logically
implemented. The programmers generally work at this
level because they are aware of such things about
database systems.
• At view level, user just interact with system with the
help of GUI and enter the details at the screen, they
are not aware of how the data is stored and what data
is stored; such details are hidden from them.

Instance and schema in DBMS
• DBMS Schema-Design of a database is called the
schema. Schema is of three types: Physical schema,
logical schema and view schema.
– The design of a database at physical level is called physical
schema, how the data stored in blocks of storage is
described at this level.
– Design of database at logical level is called logical schema,
• programmers and database administrators work at this level,
• at this level data can be described as certain types of data records
gets stored in data structures,
• however the internal details such as implementation of data
structure is hidden at this level.
– Design of database at view level is called view schema.
This generally describes end user interaction with
database systems.

Instance and schema in DBMS (cont..)
• DBMS Instance
– The data stored in database at a particular
moment of time is called instance of database.
– Database schema defines the variable declarations
in tables that belong to a particular database; the
value of these variables at a moment of time is
called the instance of that database.

DBMS Detailed Architecture
• The DBMS design depends upon its
architecture.
• The basic client/server architecture is used to
deal with a large number of PCs, web
servers, database servers and other
components that are connected with
networks.
• The client/server architecture consists of
many PCs and a workstation which are
connected via the network.
• DBMS architecture depends upon how users
are connected to the database to get their
request done.

Database Users:
• Users are differentiated by the way they expect to interact with the system:
• Application programmers:
– Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces.
– Rapid application development (RAD) tools are tools that enable an application programmer
to construct forms and reports without writing a program.
• Sophisticated users:
– Sophisticated users interact with the system without writing programs. Instead, they form
their requests in a database query language.
– They submit each such query to a query processor, whose function is to break down DML
statements into instructions that the storage manager understands.
• Specialized users :
– Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework.
– Among these applications are computer-aided design systems, knowledge base and expert
systems, systems that store data with complex data types (for example, graphics data and
audio data), and environment-modelling systems.
• Naïve users :
– Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously.
– For example, a bank teller who needs to transfer $50 from account A to account B invokes a
program called transfer. This program asks the teller for the amount of money to be
transferred, the account from which the money is to be transferred, and the account to which
the money is to be transferred.
• DBA

Database Administrator(DBA):
• Coordinates all the activities of the database system. The database
administrator has a good understanding of the enterprise’s
information resources and needs.
• Database administrator's duties include:
– Schema definition: The DBA creates the original database schema by
executing a set of data definition statements in the DDL.
– Storage structure and access method definition.
– Schema and physical organization modification: The DBA carries out
changes to the schema and physical organization to reflect the
changing needs of the organization, or to alter the physical
organization to improve performance.
– Granting user authority to access the database: By granting different
types of authorization, the database administrator can regulate which
parts of the database various users can access.
– Specifying integrity constraints.
– Monitoring performance and responding to changes in requirements.

Query Processor:
• The query processor will accept query from user and
solves it by accessing the database.
• Parts of Query processor:
– DDL interpreter
– This will interprets DDL statements and fetch the
definitions in the data dictionary.
– DML compiler
– a. This will translates DML statements in a query language
into low level instructions that the query evaluation engine
understands.
– b. A query can usually be translated into any of a number
of alternative evaluation plans for same query result DML
compiler will select best plan for query optimization.
– Query evaluation engine
– This engine will execute low-level instructions generated
by the DML compiler on DBMS.

Storage Manager/Storage
Management:
• A storage manager is a program module which acts like
– interface between the data stored in a database and the application programs
and
– queries submitted to the system.
• Thus, the storage manager is responsible for storing, retrieving and
updating data in the database.
• The storage manager components include:
– Authorization and integrity manager: Checks for integrity constraints and
authority of users to access data.
– Transaction manager: Ensures that the database remains in a consistent state
although there are system failures.
– File manager: Manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
– Buffer manager: It is responsible for retrieving data from disk storage into
main memory. It enables the database to handle data sizes that are much
larger than the size of main memory.
– Data structures implemented by storage manager.
– Data files: Stored in the database itself.
– Data dictionary: Stores metadata about the structure of the database.
– Indices: Provide fast access to data items.

Data dictionary
• Data dictionary is an inventory of data elements in a database or data model with
detailed description of its format, relationships, meaning, source and usage.
• A data dictionary is a collection of descriptions of the data objects or items
in a data model for the benefit of programmers and others who need to refer
to them.
• A data dictionary contains
• a list of all files in the database
• the number of records in each file
• and the names and types of each field.
• Most database management systems keep the data dictionary hidden from users to
prevent them from accidentally destroying its contents.
• Elements of Data Dictionary
• Data dictionary is a table with data elements (columns) as rows and their
attributes as columns.
• Specific attributes vary depending on the purpose of the data dictionary.
• Essential elements
Data dictionary has 2 essential elements:
– List of tables (or entities)
– List of columns (or fields, or attributes)

Types of Data Dictionary
• Active Data Dictionary
– The DBMS software manages the active data
dictionary automatically.
– The modification is an automatic task and most
RDBMS has active data dictionary.
– It is also known as integrated data dictionary.
• Passive Data Dictionary
– Managed by the users and is modified manually
when the database structure change.
– Also known as non-integrated data dictionary.

Example-Data Dictionary
Student_ID Student_Name Student_Address
Student_City

Meta Data
• Metadata is simply defined as data about data.
• It means it is a description and context of the data.
• It helps to organize, find and understand data.
• For example:
– Every time you take a photo with today’s cameras a bunch of metadata is gathered
and saved with it.
– Such as
• File name
• Size of the file
• Date and time
• Camera settings etc.
• Meta data in Relational database:
Relational databases store and provide access not only data but also
metadata in a structure called data dictionary or system catalog. It holds
information about:
• tables,
• columns,
• data types,
• table relationship,
• constraints etc.

Data Models
• Data models defines
– How the logical structure of a database is
modelled.
– How data is connected to each other and how
they are processed and stored inside the system.

Why use Data Model?
• Ensures that all data objects required by the database are
accurately represented.
• Omission of data will lead to creation of faulty reports and produce
incorrect results.
• A data model helps to design the database at the conceptual,
physical and logical levels.
• Data Model structure helps to define the relational tables, primary
and foreign keys and stored procedures.
• It provides a clear picture of the base data and can be used by
database developers to create a physical database.
• It is also helpful to identify missing and redundant data.
• Though the initial creation of data model is laborious and time
consuming, in the long run, it makes your IT infrastructure upgrade
and maintenance cheaper and faster.

Types of Data Model
• 1) Relational Data Model:
– This type of model designs the data in the form of
rows and columns within a table.
– Thus, a relational model uses tables for
representing data and in-between relationships.
– Tables are also called relations.
– This model was initially described by Edgar F.
Codd, in 1969.
– The relational data model is the widely used
model which is primarily used by commercial data
processing applications.

Types of Data Model
• 2) Entity-Relationship Data Model:
– An ER model is the logical representation of data as
objects and relationships among them.
– These objects are known as entities, and relationship is an
association among these entities.
– This model was designed by Peter Chen and published in
1976.
– It was widely used in database designing.
– A set of attributes describe the entities.
– For example, student_name, student_id describes the
'student' entity.
– A set of the same type of entities is known as an 'Entity
set', and the set of the same type of relationships is known
as 'relationship set'.

Types of Data Model
• 3) Object-based Data Model:
– An extension of the ER model with notions of
functions, encapsulation, and object identity.
– This model supports a rich type system that
includes structured and collection types.
– In 1980, various database systems following
the object-oriented approach were developed.
– Here, the objects are nothing but the data
carrying its properties.

Types of Data Model
• 4) Semi-structured Data Model:
– This type of data model is different from the other
three data models.
– The semi-structured data model allows the data
specifications at places where the individual data
items of the same type may have different attributes
sets.
– The Extensible Markup Language, also known as
XML, is widely used for representing the semi-
structured data.
– Although XML was initially designed for including the
markup information to the text document, it gains
importance because of its application in the exchange
of data.

DBMS languages
• Database languages are used to read, update and store data in a
database. There are several such languages that can be used for this
purpose; one of them is SQL.
• Types of DBMS languages:

Data Definition Language (DDL)
• DDL is used for specifying the database schema. It is used
for creating tables, schema, indexes, constraints etc. in
database.
• The operations that we can perform on database using
DDL:
• To create the database instance – CREATE
• To alter the structure of database – ALTER
• To drop database instances – DROP
• To delete tables in a database instance – TRUNCATE
• To rename database instances – RENAME
• To drop objects from database such as tables – DROP
• To Comment – Comment

Data Manipulation Language (DML)
• DML is used for accessing and manipulating
data in a database. The following operations
on database comes under DML:
• To read records from table(s) – SELECT
• To insert record(s) into the table(s) – INSERT
• Update the data in table(s) – UPDATE
• Delete all the records from the table – DELETE

Data Control language (DCL)
• DCL is used for granting and revoking user
access on a database –
• To grant access to user – GRANT
• To revoke access from user – REVOKE
• In practical data definition language, data
manipulation language and data control
languages are not separate language, rather
they are the parts of a single database
language such as SQL.

Transaction Control Language(TCL)
• The changes in the database that we made
using DML commands are either performed or
rollbacked using TCL.
• To persist the changes made by DML
commands in database – COMMIT
• To rollback the changes made to the database
– ROLLBACK

DBA
• A Database Administrator (DBA) is individual or person
responsible for controlling, maintenance, coordinating, and
operation of database management system.
• Managing, securing, and taking care of database system is
primary responsibility.
• They are responsible for authorizing access to database,
coordinating, capacity, planning, installation, and monitoring
uses and for acquiring and gathering software and hardware
resources as and when needed.
• Their role also varies from configuration, database design,
migration, security, troubleshooting, backup, and data
recovery.
• Database administration is major and key function in any firm
or organization that is relying on one or more databases.
• They are overall commander of Database system.

Responsibilities of DBA
• Installing and upgrading the database server and application tools
• Allocating system storage and planning storage requirements for
the database system
• Modifying the database structure, as necessary, from information
given by application developers
• Enrolling users and maintaining system security
• Ensuring compliance with database vendor license agreement
• Controlling and monitoring user access to the database
• Monitoring and optimizing the performance of the database
• Planning for backup and recovery of database information
• Maintaining archived data
• Backing up and restoring databases
• Contacting database vendor for technical support
• Generating various reports by querying from database as per need
• Managing and monitoring data replication

Data Independence
• Data independence can be explained
using the three-schema architecture.
• Data independence refers characteristic of
being able to modify the schema at one
level of the database system without
altering the schema at the next higher
level.
• There are two types of data
independence:
• Logical data independence
• Physical data independence

Physical Data Independence
• Physical data independence allows changes in the physical
storage devices or organisation of the files to be made
without requiring changes in the conceptual view or any of
the external views.
• If we do any changes in the storage size of the
database system server, then the Conceptual structure
of the database will not be affected.
• Thus, the files may migrate from one type of physical media
to another or the file structure may change without any
need for changes in the application programs.
• For example, in case we want to change or upgrade the
storage system itself − suppose we want to replace hard-
disks with SSD − it should not have any impact on the
logical data or schemas.

Logical Data Independence
• Logical data is data about database, that is, it stores
information about how data is managed inside.
• For example, a table (relation) stored in the database and
all its constraints, applied on that relation.
• Logical data independence implies that application
programs need not be changed if fields are added to an
existing record; nor do they have to be changed if fields not
used by application programs are deleted.
• If we do some changes on table format, it should not
change the data residing on the disk
• Logical data independence indicates that the conceptual
schema can be changed without affecting the existing
external schemas.

ER Modelling
• An Entity–relationship model (ER model) describes the structure of a database
with the help of a diagram, which is known as Entity Relationship Diagram (ER
Diagram).
• An ER model is a design or blueprint of a database that can later be implemented
as a database.
• It develops a conceptual design for the database.
• The main components of E-R model are: entity set and relationship set.
• An ER diagram shows the relationship among entity sets.
• An entity set is a group of similar entities and these entities can have attributes.
• In terms of DBMS, an entity is a table or attribute of a table in database, so by
showing relationship among tables and their attributes, ER diagram shows the
complete logical structure of a database.

Components of ER Diagram
• Entity
– Weak Entity
– Strong Entity
• Attribute
– Single valued
– Multivalued attribute(mobile No.)
– Composite attribute(name=fname+lname)
– Derived attribute
• Relation
– One to one
– One to many
– Many to one
– Many to many

Entity
• An entity may be any object, class, person or
place.
• In the ER diagram, an entity can be
represented as rectangles.
• Consider an organization as an example-
manager, product, employee, department etc.
can be taken as an entity.
• Types of entities
– Weak
– Strong

Weak and Strong Entity
• Weak Entity-
– An entity that depends on another entity called a
weak entity.
– The weak entity doesn't contain any key attribute
of its own.
– The weak entity is represented by a double
rectangle.
• Strong entity- An entity that has a key
attribute is called strong entity and
represented by rectangle.

Attribute
• The attribute is used to describe the
property of an entity. Eclipse is used to
represent an attribute.
• For example, id, age, contact number,
name, etc. can be attributes of a student.

Types of attributes
1. Key attribute
2. Simple and Composite attribute
3. Single Valued and Multivalued attribute
4. Derived attribute

Key Attribute
• A key attribute can uniquely identify an entity from an
entity set.
• For example, student roll number can uniquely identify
a student from a set of students.
• Key attribute is represented by oval same as other
attributes however the text of key attribute is
underlined.

Simple andComposite attribute:
• If an attribute cannot be divided into simpler components,
it is a simple attribute.
• Example for simple attribute : employee_id of an
employee.
• An attribute that is a combination of other attributes is known as
composite attribute.
• For example, In student entity, the student address is a composite
attribute as an address is composed of other attributes such as pin
code, state, country.

Single and Multivalued attribute:
• If an attribute can take only a single value for each entity
instance, it is a single valued attribute. example for single
valued attribute : age of a student. It can take only one
value for a particular student.
• An attribute that can hold multiple values is known as multivalued
attribute.
• It is represented with double ovals in an ER Diagram.
• For example – A person can have more than one phone numbers
so the phone number attribute is multivalued.

Derived Attribute
• A derived attribute is one whose value is dynamic and
derived from another attribute.
• It is represented by dashed oval in an ER Diagram.
• For example – Person age is a derived attribute as it
changes over time and can be derived from another
attribute (Date of birth).

Relationship
• A relationship is represented by diamond shape in
ER diagram, it shows the relationship among
entities.
• There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

One to one relationship
• When a single instance of an entity is associated
with a single instance of another entity then it is
called one to one relationship.
• For example, a person has only one passport and a
passport is given to one person.

One to Many Relationship
• When a single instance of an entity is associated
with more than one instances of another entity then
it is called one to many relationship.
• For example – a customer can place many orders
but a order cannot be placed by many customers.

Many toOne Relationship
• When more than one instances of an entity is
associated with a single instance of another entity
then it is called many to one relationship.
• For example – many students can study in a single
college but a student cannot study in many colleges
at the same time.

Many to Many Relationship
• When more than one instances of an entity is
associated with more than one instances of another
entity then it is called many to many relationship.
• For example, a student can be assigned to many
projects and a project can be assigned to many
students.

Partial andTotal Participation of an
Entity set
• Partial participation
– Consider the relationship - Employee is head of the department.
– Here all employees will not be the head of the department.
– Only one employee will be the head of the department.
– In other words, only few instances of employee entity participate in the relationship. So
employee entity's participation is partial in the said relationship.
• A Total participation of an entity set represents that each entity in entity set must have at
least one relationship in a relationship set.
– For example: In the below diagram each college must have at-least one associated Student.

Example-E-R diagram for hospital

Advantages and Disadvantages of ER
Modeling
• Advantages
– ER Modeling is simple and easily understandable.
– It is represented in business users language and it can be
understood by non-technical specialist.
– Intuitive and helps in Physical Database creation.
– Can be generalized and specialized based on needs.
– Can help in database design.
– Gives a higher level description of the system
• Disadvantages
– Physical design derived from E-R Model may have some
amount of ambiguities or inconsistency.
– Sometime diagrams may lead to misinterpretations

Keys
• Keys play an important role in the relational
database.
• Used to uniquely identify any record or row of
data from the table.
• Used to establish and identify relationships
between tables.
• For example: In Student table, ID is used as
a key because it is unique for each student.
In PERSON table, passport_number,
license_number, SSN are keys since they are
unique for each person.

Types of Keys
• Candidate key
• Primary key
• Foreign key
• Alternate key
• Super key
• Composite key
• Surrogate key

Keys
• Candidate Key: The minimal set of attribute which can uniquely identify a
tuple is known as candidate key. For Example, STUD_NO in STUDENT
relation.
– The value of Candidate Key is unique and non-null for every tuple.
– There can be more than one candidate key in a relation. For Example,
STUD_NO as well as STUD_PHONE both are candidate keys for relation
STUDENT.
– The candidate key can be simple (having only one attribute) or composite as
well. For Example, {STUD_NO, COURSE_NO} is a composite candidate key for
relation STUDENT_COURSE.
• Super Key: The set of attributes which can uniquely identify a tuple is
known as Super Key. For Example, STUD_NO, (STUD_NO, STUD_NAME)
etc.
– Adding zero or more attributes to candidate key generates super key.
– A candidate key is a super key but vice versa is not true.
• Primary Key: There can be more than one candidate key in a relation out
of which one can be chosen as primary key. For Example, STUD_NO as well
as STUD_PHONE both are candidate keys for relation STUDENT but
STUD_NO can be chosen as primary key (only one out of many candidate
keys).
• Composite(concatenated) Key: A key that is made up of two or
more attributes.
• Surrogate Key: A key that is added to a table to serve as the

Continue..
• Alternate Key: The candidate key other than primary key is
called as alternate key. For Example, STUD_NO as well as
STUD_PHONE both are candidate keys for relation
STUDENT but STUD_PHONE will be alternate key (only one
out of many candidate keys).
• Foreign Key: If an attribute can only take the values which
are present as values of some other attribute, it will be
foreign key to the attribute to which it refers.
– The relation which is being referenced is called referenced
relation and corresponding attribute is called referenced
attribute and
– the relation which refers to referenced relation is called
referencing relation and corresponding attribute is called
referencing attribute.
– Referenced attribute of referenced relation should be primary
key for it. For Example, STUD_NO in STUDENT_COURSE is a
foreign key to STUD_NO in STUDENT relation.

Advanced ER-diagram
• Abstraction is the simplification mechanism
used to hide extra details of a set of objects.
• It allows one to concentrate on the properties
that are of interest to the application.
• There are two main abstraction mechanism
used to model information:
– Generalization
– Specialization

Generalization
• Generalization is the abstracting process of viewing set of objects as a single general class by
concentrating on the general characteristics of the constituent sets while suppressing or
ignoring their differences.
• Generalization is like a bottom-up approach in which two or more entities of lower level
combine to form a higher level entity if they have some attributes in common.
• In generalization, an entity of a higher level can also combine with the entities of the lower
level to form a further higher level entity.
• Generalization is more like subclass and superclass system, but the only difference is the
approach. Generalization uses the bottom-up approach.
• In generalization, entities are combined to form a more generalized entity, i.e., subclasses are
combined to make a superclass.
• For instance, student is a generalization of graduate or undergraduate, full-time or part-time
students.
• Similarly, employee is generalization of the classes of objects cook, waiter, and cashier.
• Generalization is an IS_A relationship; therefore, manager IS_AN employee, cook IS_AN
employee, waiter IS_AN employee, and so forth.

Specialization
• Specialization is a top-down approach, and it is opposite to
Generalization. In specialization, one higher level entity can be
broken down into two lower level entities.
• Specialization is used to identify the subset of an entity set that
shares some distinguishing characteristics.
• Normally, the superclass is defined first, the subclass and its related
attributes are defined next, and relationship set are then added.

Example-Generalization and
specilization

Aggregation
• In aggregation, the relation between two entities is treated as a
single entity. In aggregation, relationship with its corresponding
entities is aggregated into a higher level entity.

Where Aggregation is required?

Reduction of ER diagram to Table
• Entity type becomes a table.
• All single-valued attribute becomes a column for the table.
• A key attribute of the entity type represented by the
primary key.
• The multi-valued attribute is represented by a separate
table.
• Composite attribute represented by components.
• Derived attributes are not considered in the table.
• Using these rules, you can convert the ER diagram to tables
and columns and assign the mapping between the tables.

ER- Diagram For College Database

Conversion of entity sets:
• 1. For each strong entity type E in the ER diagram, we create a relation R
containing all the single attributes of E.
• The primary key of the relation R will be one of the key attribute of R.
– STUDENT(rollno (primary key),name, address)
– FACULTY(id(primary key),name ,address, salary)
– COURSE(course-id,(primary key), course_name, duration)
– DEPARTMENT(dno(primary key),dname)
• 2. for each weak entity type W in the ER diagram, we create another
relation R that contains all simple attributes of W.
• If E is an owner entity of W then key attribute of E is also include In R.
• This key attribute of R is set as a foreign key attribute of R.
• Now the combination of primary key attribute of owner entity type and
partial key of the weak entity type will form the key of the weak entity
type GUARDIAN((rollno,name) (primary key),address,relationship)

Conversion of relationship sets
• One-to-one relationship:
– For each 1:1 relationship type R in the ER-diagram
involving two entities E1 and E2 we choose one of
entities(say E1) preferably with total participation and
add primary key attribute of another E as a foreign key
attribute in the table of entity(E1).
– Include all the simple attributes of relationship type R
in E1 if any,
– For example, the department relationship has been
extended to include head-id and attribute of the
relationship.
– DEPARTMENT(D_NO,D_NAME,HEAD_ID,DATE_FROM)

Conversion of relationship sets:
• One-to-many relationship:
– For each 1:n relationship type R involving two entities E1 and E2, we identify the entity
type (say E1) at the n-side of the relationship type R and include primary key of the
entity on the other side of the relation (say E2) as a foreign key attribute in the table of
E1.
– Include all simple attribute
– For example: The works in relationship between the DEPARTMENT and FACULTY.
– For this relationship choose the entity at N side, i.e, FACULTY and add primary key
attribute of another entity DEPARTMENT, ie, DNO as a foreign key attribute in FACULTY.
– FACULTY(CONSTAINS WORKS_IN RELATIOSHIP) (ID,NAME,ADDRESS,BASIC_SAL,DNO)
• Many-to-many relationship:
– For each m:n relationship type R, we create a new table (say S) to represent R, We also
include the primary key attributes of both the participating entity types as a foreign key
attribute in s.
– Any simple attributes of the m:n relationship type is also included as attributes of S.
– For example: The M:n relationship taught-by between entities COURSE; and FACULTY
shod be represented as a new table. The structure of the table will include primary key
of COURSE and primary key of FACULTY entities.
– TAUGHT-BY(ID (primary key of FACULTY table),course-id (primary key of COURSE table)

• N-ary relationship:
• For each n-ary relationship type R where n>2, create a new table S to represent R
• Include as foreign key attributes in s the primary keys of the relations that
represent the participating entity types.
• We also include any simple attributes of the n-ary relationship type as attributes of
S.
• The primary key of S is usually a combination of all the foreign keys that reference
the relations representing the participating entity types
loan
customer
employee
Loan
sanction

• Multi-valued attributes:
– For each multivalued attribute ‘A’, we create a new
relation R that includes an attribute corresponding to
plus the primary key attributes k of the relation that
represents the entity type or relationship that has as
an attribute.
– The primary key of R is then combination of A and k.
For example, if a STUDENT entity has rollno,name and
phone number where phone numer is a multivalued
attribute then create table PHONE(rollno,phoneno)
where primary key is the combination,
– In the STUDENT table we need not have phone
number, instead it can be simply (rollno,name) only.

• Converting Generalization /specification hierarchy to tables:
– A simple rule for conversion may be to decompose all the specialized
entities into table in case they are disjoint,
– for example, create 3 table as:
• Account(account_no,name,branch,balance)
• Saving account(account-no,intrest)
• Current_account(account-no,charges)

SQL
• SQL is Structured Query Language, which is a
computer language for storing, manipulating and
retrieving data stored in a relational database.
• SQL is the standard language for Relational
Database System. All the Relational Database
Management Systems (RDMS) like MySQL, MS
Access, Oracle, Sybase, Informix, Postgres and
SQL Server use SQL as their standard database
language.

Why SQL?
• SQL is widely popular because it offers the following
advantages −
• Allows users to access data in the relational database
management systems.
• Allows users to describe the data.
• Allows users to define the data in a database and
manipulate that data.
• Allows to embed within other languages using SQL
modules, libraries & pre-compilers.
• Allows users to create and drop databases and tables.
• Allows users to create view, stored procedure,
functions in a database.
• Allows users to set permissions on tables, procedures
and views.

Characteristics of SQL
• SQL is extremely flexible.
• SQL uses a free form syntax that gives the
ability to user to structure the SQL statements
in a best suited way.
• It is a high level language.
• It receives natural extensions to its functional
capabilities.
• It can execute queries against the database.

Advantages of SQL
• SQL provides a greater degree of abstraction than
procedural language.
• It is coded without embedded data-navigational
instructions.
• It enables the end users to deal with a number of
database management systems where it is
available.
• It retrieves quickly and efficiently huge amount of
records from a database.
• No coding required while using standard SQL.

DDL
• DDL stands for Data Definition Language.
• It is a language used for defining and modifying the data and its
structure.
• It is used to build and modify the structure of your tables and other
objects in the database.
• DDL commands are as follows,
1. CREATE
2. DROP
3. ALTER
4. RENAME
5. TRUNCATE
These commands can be used to add, remove or modify tables
within a database.
• DDL has pre-defined syntax for describing the data.

CREATE COMMAND
• CREATE TABLE <table_name>
( column_name1 datatype,
column_name2 datatype,
.
.
.
column_name_n datatype
);

DROP COMMAND
• DROP command allows to remove entire
database objects from the database.
• It removes entire data structure from the
database.
• It deletes a table, index or view.
•
Syntax:
DROP TABLE <table_name>;
OR
DROP DATABASE <database_name>;

ALTER COMMAND
• An ALTER command allows to alter or modify the
structure of the database.
• It modifies an existing database object.
• Using this command, you can add additional
column, drop existing column and even change
the data type of columns.
• Syntax:
ALTER TABLE <table_name>
ADD <column_name datatype>;

RENAME COMMAND
• RENAME command is used to rename an
object.
• It renames a database table.
• Syntax:
RENAME TABLE <old_name> TO
<new_name>;

TRUNCATE COMMAND
• TRUNCATE command is used to delete all the rows
from the table permanently.
• It removes all the records from a table, including all
spaces allocated for the records.
• This command is same as DELETE command, but
TRUNCATE command does not generate any rollback
data.
• Syntax:
TRUNCATE TABLE <table_name>;
Example:
TRUNCATE TABLE employee;

SQL Data Manipulation Language
(DML)
• Introduction to DML
• DML stands for Data Manipulation Language.
• It is a language used for selecting, inserting, deleting
and updating data in a database.
• It is used to retrieve and manipulate data in a relational
database.
• DML commands are as follows,
1. SELECT
2. INSERT
3. UPDATE
4. DELETE
DML performs read-only queries of data.

Basic Structure
• Basic structure of an SQL expression consists of select, from and
where clauses.
– select clause lists attributes to be copied - corresponds to relational
algebra project.
– from clause corresponds to Cartesian product - lists relations to be
used.
– where clause corresponds to selection predicate in relational algebra.
• Typical query has the form
• select A1,A2, A3,....from r1,r2,r3,..... where P
• where each Ai represents an attribute, each ri a relation, and P is a
predicate.
• This query is equivalent to the relational algebra expression:
• ΠA1,A2, ..., An(P(r1r2...rm))

Structure of DML statement
• Select [attributelist][*] from [relations/tables]
where [predicate/condition]
• Insert into tablename
([attributelist])values(value1,value2,....)
OR
• Insert into tablename
values(value1,value2,....))
• Delete * from tablename where predicate

Aggregate Functions
• These functions operate on the multiset of
values of a column of arelation, and return a
value
• avg: average value
• min: minimum value
• max:maximum value
• sum: sum of values
• count: number of values

• Find the average account balance at the
Perryridge branch.
• select avg(balance) from account where branch-
name= “Perryridge”
• Find the number of tuples in the customer
relation.
• select count(*) from customer
• Find the number of depositors in the bank.
• select count(distinctcustomer-name) from
depositor

SET Operations in SQL
• SQL supports few Set operations which can be
performed on the table data. These are used
to get meaningful results from data stored in
the table, under different special conditions.
• UNION
• UNION ALL
• INTERSECT
• MINUS

UNION Operation
• UNION is used to combine the results of two
or more SELECT statements. However it will
eliminate duplicate rows from its resultset. In
case of union, number of columns and
datatype must be same in both the tables, on
which UNION operation is being applied.

UNION ALL
• This operation is similar to Union. But it also
shows the duplicate rows.

INTERSECT
• Intersect operation is used to combine two
SELECT statements, but it only retuns the
records which are common from both SELECT
statements. In case of Intersect the number of
columns and datatype must be same.

ID NAME
1 abhi
2 adam
ID NAME
2 adam
3 Chester
The Second table
The First table
ID NAME
2 adam
Result of MINUS will be

Set difference(MINUS)
• The Minus operation combines results of two
SELECT statements and return only those in the
final result, which belongs to the first set of the
result.(returns unique records from table1)

ID NAME
1 abhi
2 adam
ID NAME
2 adam
3 Chester
The Second table
The First table
ID NAME
1 abhi
Result of MINUS will be

nested sub queries
• A Subquery or Inner query or a Nested query is a
query within another SQL query and embedded
within the WHERE clause.
• A subquery is used to return data that will be
used in the main query as a condition to further
restrict the data to be retrieved.
• Subqueries can be used with the SELECT, INSERT,
UPDATE, and DELETE statements along with the
operators like =, <, >, >=, <=, IN, BETWEEN, etc.

• There are a few rules that subqueries must follow −
• Subqueries must be enclosed within parentheses.
• A subquery can have only one column in the SELECT clause, unless
multiple columns are in the main query for the subquery to
compare its selected columns.
• An ORDER BY command cannot be used in a subquery, although the
main query can use an ORDER BY. The GROUP BY command can be
used to perform the same function as the ORDER BY in a subquery.
• Subqueries that return more than one row can only be used with
multiple value operators such as the IN operator.
• A subquery cannot be immediately enclosed in a set function.
• The BETWEEN operator cannot be used with a subquery. However,
the BETWEEN operator can be used within the subquery.

• SELECT column_name [, column_name ]
FROM table1 [, table2 ] WHERE column_name
OPERATOR (SELECT column_name [,
column_name ] FROM table1 [, table2 ]
[WHERE])

Using GRANT and REVOKE
• Data Control Language(DCL) is used to control
privileges in Database. To perform any operation in the
database, such as for creating tables, sequences or
views, a user needs privileges. Privileges are of two
types,
• System: This includes permissions for creating session,
table, etc and all types of other system privileges.
• Object: This includes permissions for any command or
query to perform any operation on the database
tables.
• GRANT: Used to provide any user access privileges or
other priviliges for the database.
• REVOKE: Used to take back permissions from any user.

Transaction Control language(TCL)
• Transaction Control Language(TCL) commands are used to manage
transactions in the database.
• These are used to manage the changes made to the data in a table
by DML statements.
• It also allows statements to be grouped together into logical
transactions.
• COMMIT command
• COMMIT command is used to permanently save any transaction
into the database.
• When we use any DML command like INSERT, UPDATE or DELETE,
the changes made by these commands are not permanent, until the
current session is closed, the changes made by these commands
can be rolled back.
• To avoid that, we use the COMMIT command to mark the changes
as permanent. Sql> commit;

ROLLBACK command
• This command restores the database to last
commited state. It is also used with SAVEPOINT
command to jump to a savepoint in an ongoing
transaction.
• If we have used the UPDATE command to make
some changes into the database, and realise that
those changes were not required, then we can
use the ROLLBACK command to rollback those
changes, if they were not commited using the
COMMIT command.
• SQL>ROLLBACK TO savepoint_name;

SAVEPOINT command
• SAVEPOINT command is used to temporarily
save a transaction so that you can rollback to
that point whenever required.

DBMS basics and normalizations unit.pptx

More Related Content

Similar to DBMS basics and normalizations unit.pptx

Recently uploaded

DBMS basics and normalizations unit.pptx