
DBMS - OVERVIEW

A database is a collection of related data, and data is a collection of facts and figures that can be processed to
produce information.
Mostly, data represents recordable facts. Data aids in producing information, which is based on facts. For
example, if we have data about the marks obtained by all students, we can then draw conclusions about toppers and
average marks.
A database management system stores data in such a way that it becomes easier to retrieve, manipulate, and
produce information.

Characteristics
Traditionally, data was organized in file formats. DBMS was a new concept then, and research was done
to make it overcome the deficiencies of the traditional style of data management.
A modern DBMS has the following characteristics −
• Real-world entity − A modern DBMS is more realistic and uses real-world entities to design its architecture.
It uses the behavior and attributes too. For example, a school database may use students as an entity and
their age as an attribute.
• Relation-based tables − DBMS allows entities and the relations among them to form tables. A user can
understand the architecture of a database just by looking at the table names.
• Isolation of data and application − A database system is entirely different from its data. A database is an
active entity, whereas data is said to be passive, on which the database works and organizes. DBMS also
stores metadata, which is data about data, to ease its own processes.
• Less redundancy − DBMS follows the rules of normalization, which splits a relation when any of its
attributes has redundant values. Normalization is a mathematically rich and scientific process that
reduces data redundancy.
• Consistency − Consistency is a state where every relation in a database remains consistent. There exist
methods and techniques that can detect any attempt to leave the database in an inconsistent state. A DBMS
can provide greater consistency than earlier forms of data-storing applications such as file-processing
systems.
• Query Language − A DBMS is equipped with a query language, which makes it more efficient to retrieve and
manipulate data. A user can apply as many and as varied filtering options as required to retrieve a set of
data. This was not possible with traditional file-processing systems.
• ACID Properties − DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability normally
shortened as ACID. These concepts are applied on transactions, which manipulate data in a database. ACID
properties help the database stay healthy in multi-transactional environments and in case of failure.
• Multiuser and Concurrent Access − DBMS supports a multi-user environment and allows users to access and
manipulate data in parallel. Although there are restrictions on transactions when multiple users attempt to
handle the same data item, users remain unaware of them.
• Multiple views − DBMS offers multiple views for different users. A user in the Sales department will
have a different view of the database than a person working in the Production department. This feature enables
users to have a focused view of the database according to their requirements.
• Security − Features like multiple views offer security to some extent, since users are unable to access data of
other users and departments. DBMS offers methods to impose constraints while entering data into the database
and while retrieving it at a later stage. DBMS offers many different levels of security features, which enable
multiple users to have different views with different features. For example, a user in the Sales department
cannot see data that belongs to the Purchase department. Additionally, how much of the Sales department's
data should be displayed to the user can also be managed. Because a DBMS does not store its data on disk as
plain files the way traditional file systems do, it is much harder for miscreants to read or tamper with the data directly.
DBMS - DATA MODELS
Data models define how the logical structure of a database is modeled. Data Models are fundamental entities to
introduce abstraction in a DBMS. Data models define how data is connected to each other and how they are processed
and stored inside the system.
The very first data models were flat data models, where all the data was kept in the same plane. These earlier
data models were not very scientific, and hence were prone to introducing a lot of duplication and update anomalies.
Entity-Relationship Model
The Entity-Relationship (ER) Model is based on the notion of real-world entities and the relationships among them. While
formulating a real-world scenario into a database model, the ER Model creates entity sets, relationship sets, general
attributes, and constraints.

ER Model is best used for the conceptual design of a database.


ER Model is based on −
• Entities and their attributes.
• Relationships among entities.

These concepts are explained below.

• Entity − An entity in an ER Model is a real-world entity having properties called attributes. Every
attribute is defined by its set of values, called its domain. For example, in a school database, a student is
considered an entity. A student has various attributes like name, age, class, etc.
• Relationship − The logical association among entities is called a relationship. Relationships are mapped
with entities in various ways. Mapping cardinalities define the number of associations between two
entities.
Relational Model
The most popular data model in DBMS is the Relational Model. It is a more scientific model than the others. This model
is based on first-order predicate logic and defines a table as an n-ary relation.

The main highlights of this model are −


• Data is stored in tables called relations. Relations can be normalized.
• In normalized relations, the values saved are atomic. Each row in a relation is unique.
• Each column in a relation contains values from the same domain.
ER MODEL - BASIC CONCEPTS
The ER model defines the conceptual view of a database. It works around real-world entities and the associations
among them. At view level, the ER model is considered a good option for designing databases.

Entity

An entity can be a real-world object, either animate or inanimate, that can be easily identified. For example, in a
school database, students, teachers, classes, and courses offered can be considered entities. All these entities have
some attributes or properties that give them their identity.

An entity set is a collection of similar types of entities. An entity set may contain entities whose attributes share similar
values. For example, a Students set may contain all the students of a school; likewise, a Teachers set may contain all the
teachers of a school from all faculties. Entity sets need not be disjoint.

Attributes

Entities are represented by means of their properties, called attributes. All attributes have values. For example, a
student entity may have name, class, and age as attributes.

There exists a domain, or range of values, that can be assigned to each attribute. For example, a student's name cannot be a
numeric value. It has to be alphabetic. A student's age cannot be negative, etc.

Types of Attributes
• Simple attribute − Simple attributes are atomic values, which cannot be divided further. For example, a
student's phone number is an atomic value of 10 digits.
• Composite attribute − Composite attributes are made of more than one simple attribute. For example, a
student's complete name may have first_name and last_name.
• Derived attribute − Derived attributes are attributes that do not exist in the physical database, but
whose values are derived from other attributes present in the database. For example, average_salary in a
department should not be saved directly in the database; instead, it can be derived. As another example,
age can be derived from date_of_birth.
• Single-value attribute − Single-value attributes contain single value. For example −
Social_Security_Number.
• Multi-value attribute − Multi-value attributes may contain more than one value. For example, a person
can have more than one phone number, email address, etc.
These attribute types can combine as follows −
• simple single-valued attributes
• simple multi-valued attributes
• composite single-valued attributes
• composite multi-valued attributes
Entity-Set and Keys
A key is an attribute or collection of attributes that uniquely identifies an entity within an entity set.
For example, the roll_number of a student makes him/her identifiable among students.
• Super Key − A set of one or more attributes that collectively identify an entity in an entity set.
• Candidate Key − A minimal super key is called a candidate key. An entity set may have more than one candidate
key.
• Primary Key − A primary key is one of the candidate keys chosen by the database designer to uniquely identify
the entity set.
Relationship
The association among entities is called a relationship. For example, an employee works_at a
department, a student enrolls in a course. Here, Works_at and Enrolls are called relationships.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set, which can be associated with the number of
entities of other set via relationship set.
• One-to-one − One entity from entity set A can be associated with at most one entity ofentity set B and
vice versa.

• One-to-many − One entity from entity set A can be associated with more than one entity of entity set B;
however, an entity from entity set B can be associated with at most one entity of A.

• Many-to-one − More than one entity from entity set A can be associated with at most one entity of entity
set B; however, an entity from entity set B can be associated with more than one entity from entity set A.

• Many-to-many − One entity from A can be associated with more than one entity from B, and vice
versa.
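
In SQL, these cardinalities are typically realized through key constraints. The sketch below is illustrative only; the department and employee tables and their columns are assumptions, not part of the original text. It models a one-to-many relationship: one department can be associated with many employees, while each employee belongs to at most one department.

```sql
-- One-to-many: one department row can relate to many employee rows,
-- while each employee references at most one department.
CREATE TABLE department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50) NOT NULL
);

CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(50) NOT NULL,
    dept_id  INT REFERENCES department (dept_id)
);
```

Adding a UNIQUE constraint on employee.dept_id would turn this into a one-to-one relationship, while a many-to-many relationship requires a separate junction table holding foreign keys to both entity sets.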
RELATIONAL DATA MODEL
Relational data model is the primary data model, which is used widely around the world for data storage and
processing. This model is simple and it has all the properties and capabilities required to process data with
storage efficiency.
Concepts

Tables − In the relational data model, relations are saved in the format of tables. This format stores the relations
among entities. A table has rows and columns, where rows represent records and columns represent the
attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents a relation instance. Relation
instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), the attributes, and their names.
Relation key − Each row has one or more attributes, known as relation key, which can identify the row in the
relation table uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.
Constraints

Every relation has some conditions that must hold for it to be a valid relation. These conditions are called
Relational Integrity Constraints.
There are three main integrity constraints −
• Key constraints
• Domain constraints
• Referential integrity constraints
Key Constraints

There must be at least one minimal subset of attributes in the relation that can identify a tuple uniquely. This
minimal subset of attributes is called a key for that relation. If there is more than one such minimal subset,
they are called candidate keys.
Key constraints enforce that −
• in a relation with a key attribute, no two tuples can have identical values for the key attributes.
• a key attribute cannot have NULL values.

Key constraints are also referred to as Entity Constraints.


Domain Constraints

Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. Similar
constraints are applied to the attributes of a relation. Every attribute is bound to have a specific
range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside
0-9.
Referential integrity Constraints

Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key attribute of a relation
that can be referenced in another relation.
The referential integrity constraint states that if a relation refers to a key attribute of a different (or the same) relation, then
that key element must exist.
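
All three constraint types map directly onto SQL column and table constraints. A minimal sketch, using a hypothetical student/enrollment pair of tables (the table and column names are assumptions for illustration):

```sql
CREATE TABLE student (
    roll_no INT PRIMARY KEY,            -- key constraint: unique and non-NULL
    name    VARCHAR(50) NOT NULL,
    age     INT CHECK (age >= 0)        -- domain constraint: age cannot be negative
);

CREATE TABLE enrollment (
    roll_no   INT REFERENCES student (roll_no),  -- referential integrity: value must exist in student
    course_no VARCHAR(10),
    PRIMARY KEY (roll_no, course_no)
);
```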
Difference between Fact Table and Dimension Table
A fact table's record is a combination of attributes from different dimension tables. The fact table helps the user
analyze business dimensions, which supports decision-making to improve the business.
Dimension tables, on the other hand, help the fact table gather the dimensions along which the measures are to be
taken.
The main difference between a fact table and a dimension table is that the dimension table contains the attributes
along which the fact table's measures are calculated.

Difference between Fact Table and Dimension Table:


| S.NO | Fact Table | Dimension Table |
| --- | --- | --- |
| 1. | Contains the measures of the attributes of a dimension table. | Contains the attributes along which the fact table calculates its metrics. |
| 2. | Has fewer attributes than a dimension table. | Has more attributes than a fact table. |
| 3. | Has more records than a dimension table. | Has fewer records than a fact table. |
| 4. | Forms a vertical table. | Forms a horizontal table. |
| 5. | Attribute format is numerical and text. | Attribute format is text. |
| 6. | Comes after the dimension table. | Comes before the fact table. |
| 7. | A schema has fewer fact tables than dimension tables. | A schema has more dimension tables than fact tables. |
| 8. | Used for analysis and decision-making. | Its main task is to store information about a business and its processes. |
Aggregate Fact Tables:
• Aggregate fact tables are a special kind of fact table in a data warehouse containing new metrics that are
derived from one or more aggregate functions (COUNT, AVERAGE, MIN, MAX, etc.) or from
specialized functions whose outputs are entirely derived from a grouping of base data.
• Aggregates are basically summarizations of fact-related data, used for the purpose of improving
performance.
• These new metrics, called "aggregate facts" or "summary statistics", are stored and maintained in the
data warehouse database in special fact tables at the grain of the aggregation.
• Similarly, the corresponding dimensions are rolled up and compressed to match the new grain
of the fact.
• These specialized tables are used as substitutes, whenever possible, for answering user queries.
The reason is speed.
• Querying a neat aggregate table is much faster and uses less disk I/O than querying the base, atomic fact
table, especially when the dimensions are large as well.
• If you want to amaze your users, start adding aggregates.
• You can use this technique in your operational systems as well, giving a boost to foundational
reports. A SQL sketch follows this list.
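
As a sketch of how such a table might be built, the statement below pre-computes monthly totals at a coarser grain than the base fact table. The sales_fact and date_dim tables and their columns are hypothetical, and the CREATE TABLE ... AS syntax varies by vendor (e.g., SQL Server uses SELECT ... INTO):

```sql
-- Roll daily, atomic sales up to a monthly grain and store the result
-- as an aggregate fact table.
CREATE TABLE sales_monthly_agg AS
SELECT d.year,
       d.month,
       f.product_key,
       SUM(f.sales_amount) AS total_sales,  -- aggregate fact
       COUNT(*)            AS txn_count     -- aggregate fact
FROM   sales_fact f
JOIN   date_dim  d ON d.date_key = f.date_key
GROUP  BY d.year, d.month, f.product_key;
```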

Limitations of Aggregate Fact Tables:


• They do not support exploratory analysis.
• They must be re-aggregated every time the source data changes, so that the changes are
reflected in the data warehouse.
• Their narrow capability leads to low and limited interactive use.
Types of Dimensions in Data Warehouse Model
1. Slowly Changing Dimensions
2. Rapidly Changing Dimensions
3. Junk Dimensions
4. Stacked dimensions
5. Inferred Dimensions
6. Conformed Dimensions
7. Degenerate Dimensions
8. Role-Playing Dimensions
9. Shrunken Dimensions
10. Static Dimensions
Slowly Changing Dimensions
Depending on the business requirement, the history of changes to a particular attribute may be preserved in the data
warehouse. Such an attribute is called a slowly changing attribute, and a dimension containing it is called a slowly changing dimension.
Rapidly Changing Dimensions
A dimension attribute that changes frequently is a rapidly changing attribute. If we do not need to track the changes, a
rapidly changing attribute is not a problem. If we do need to track the changes, using the standard slowly changing
dimension technique can cause massive inflation of the dimension's size. The solution is to move the attribute to its own
dimension, with a separate foreign key. This new dimension is called a rapidly changing dimension.
Junk Dimensions
A junk dimension is a single table combining several miscellaneous attributes, used to avoid having multiple foreign keys
in the fact table. Junk dimensions are also created to manage the foreign keys created by rapidly changing dimensions.

Note that a junk dimension is always of type 0 (constant). The table's name usually includes the word junk, typically
after the dim prefix rather than at the end. The data types are kept consistent; that is, a Y/N column is either BIT or CHAR(1), but
neither INT nor VARCHAR(N) - my preference is CHAR(1). A junk dimension does not have a business key.

Stacked Dimension

A stacked dimension is used where two or more dimensions are combined into a single dimension:

| Transaction_type_key | Transaction_type | Category |
| --- | --- | --- |
| 0 | Unknown | Unknown |
| 1 | Direct sale | Transaction type |
| 2 | Refund | Transaction type |
| 3 | Purchase | Transaction type |
| 4 | eCommerce | Transaction type |
| 5 | Down payment | Payment type |
| 6 | Full delivery | Payment type |
| 7 | On delivery | Payment type |

A stacked dimension has one or two attributes and is always SCD type 0 (no update).

We see many type and status columns: Product Type, Customer Status, Store Type, Security Type, Security Class, Broker
Type, etc. All such columns belong in their respective dimensions because they are properties of the dimension.
Inferred Dimension
When loading a fact record, a dimension record may not be ready yet. Such a record is technically called an inferred
member, but the dimension is often called an inferred dimension.
Conformed Dimension
A dimension that is used in many places is called a conformed dimension. A conformed dimension can be used in a single
database or across multiple fact tables in multiple data marts or data warehouses.
Degenerate Dimension
A degenerate dimension occurs when a dimension attribute is stored as part of the fact table, but not in a separate
dimension table. These dimensions are keys for which there are no other attributes. In a data warehouse, they are often used
in query results to analyze the source of the numbers that are aggregated in a report. We use these values to trace
transactions back to the OLTP system.
Role-playing Dimension
A role-playing dimension is one where the same dimension key includes more than one foreign key in the fact table. For
example, a fact table contains foreign keys for both the ship date and delivery date. But the same dimension data attributes
are applied to every foreign key. So, we can join the same dimension table for both foreign keys. Here, the date dimension
is taking many roles to map the ship date and the delivery date.
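
A minimal sketch of this (hypothetical orders_fact table and date_dim dimension; the names are assumptions): the same date dimension is joined twice under different aliases, once per role.

```sql
-- The same date dimension plays two roles via two aliases.
SELECT o.order_id,
       ship.full_date  AS ship_date,      -- role 1: ship date
       deliv.full_date AS delivery_date   -- role 2: delivery date
FROM   orders_fact o
JOIN   date_dim ship  ON ship.date_key  = o.ship_date_key
JOIN   date_dim deliv ON deliv.date_key = o.delivery_date_key;
```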
Shrunken Dimension
A shrunken dimension is a subset of another dimension. For example, the order fact table includes a foreign key for
product, while a target fact table at a coarser grain includes a foreign key for product category, which is an attribute of the
product dimension but less granular. Creating a small dimension table with product category as its primary key is one way
to deal with this situation of differing grains. If the product dimension is snowflaked, then there is already a separate table
for product category, which serves as the shrunken dimension.
Static Dimension
Static dimensions are not extracted from the real data source but are created within the data warehouse context.

Slowly Changing Dimensions

A slowly changing dimension is a set of data attributes that change slowly over a period of time rather than changing
regularly, e.g., address or name. Attributes that change over a period of time in this way are combined into a slowly
changing dimension. These dimensions can be classified into the following types:

• Type 0 (Retain original): Attributes never change. No history.


• Type 1 (Overwrite): Old values are overwritten with new values for attribute. No history.
• Type 2 (Add new row): A new row is created with either a start date / end date or a version for a new value.
This creates history.
• Type 3 (Add new attribute): A new column is created for a new value. History is limited to the number of
columns designated for storing historical data.
• Type 4 (Add history table): One table keeps the current value, while the history is saved in a second table.
• Type 5 (Combined Approach 1 + 4): Combination of type 1 and type 4. History is created through a second
history table.
• Type 6 (Combined Approach 1 + 2 + 3): Combination of type 1, type 2 and type 3. History is created through
separate row and attributes.
• Type 7 (Hybrid Approach): Both surrogate and natural keys are used.
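
As an illustration of Type 2, the sketch below expires the old row and inserts a new one when a customer's address changes. The customer_dim table and its start_date/end_date/is_current columns are assumptions for illustration, not a standard:

```sql
-- Expire the current row for customer 42 ...
UPDATE customer_dim
SET    end_date = CURRENT_DATE,
       is_current = 'N'
WHERE  customer_id = 42
  AND  is_current = 'Y';

-- ... then insert the new version; the old row remains as history.
INSERT INTO customer_dim (customer_id, address, start_date, end_date, is_current)
VALUES (42, '12 New Street', CURRENT_DATE, NULL, 'Y');
```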
Star Schema, Snowflake Schema and Fact Constellation Schema
Star Schema: The star schema is a type of multidimensional model used for data warehouses. A star schema contains
fact tables and dimension tables, and uses fewer foreign-key joins. It forms a star shape with a fact table surrounded by
dimension tables.

Snowflake Schema: The snowflake schema is also a type of multidimensional model used for data warehouses.
A snowflake schema contains fact tables, dimension tables, and sub-dimension tables. It forms a snowflake shape with
fact tables, dimension tables, and sub-dimension tables.

Fact Constellation Schema: The fact constellation schema is also a type of multidimensional model. The fact
constellation schema consists of dimension tables that are shared by several fact tables, i.e., it consists of more than one
star schema at a time. Unlike the snowflake schema, the fact constellation schema is not easy to operate, as it has
multiple joins between tables, and it uses heavily complex queries to access data from the database.
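
A minimal star-schema sketch (a hypothetical retail example; all table and column names are assumptions): a central sales fact table with one foreign key per dimension.

```sql
CREATE TABLE product_dim (
    product_key  INT PRIMARY KEY,
    product_name VARCHAR(50),
    category     VARCHAR(30)
);

CREATE TABLE store_dim (
    store_key INT PRIMARY KEY,
    city      VARCHAR(30),
    region    VARCHAR(30)
);

-- Central fact table: one foreign key per dimension, plus the measures.
CREATE TABLE sales_fact (
    product_key  INT REFERENCES product_dim (product_key),
    store_key    INT REFERENCES store_dim (store_key),
    sales_amount DECIMAL(10, 2),
    quantity     INT
);
```

Snowflaking this schema would mean normalizing category out of product_dim into its own sub-dimension table, referenced by a foreign key.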
Star Vs Snowflake
| S.NO | Star Schema | Snowflake Schema |
| --- | --- | --- |
| 1. | Contains fact tables and dimension tables. | Contains fact tables, dimension tables, and sub-dimension tables. |
| 2. | A top-down model. | A bottom-up model. |
| 3. | Uses more space. | Uses less space. |
| 4. | Takes less time for the execution of queries. | Takes more time than the star schema for the execution of queries. |
| 5. | Normalization is not used. | Both normalization and denormalization are used. |
| 6. | Its design is very simple. | Its design is complex. |
| 7. | Query complexity is low. | Query complexity is higher than in the star schema. |
| 8. | Very simple to understand. | Difficult to understand. |
| 9. | Has fewer foreign keys. | Has more foreign keys. |
| 10. | Has high data redundancy. | Has low data redundancy. |

Snowflake Vs Fact
| S.NO | Snowflake Schema | Fact Constellation |
| --- | --- | --- |
| 1. | Contains a large central fact table, dimension tables, and sub-dimension tables. | Dimension tables are shared by many fact tables. |
| 2. | Saves significant storage. | Does not save storage. |
| 3. | Consists of one star schema at a time. | Consists of more than one star schema at a time. |
| 4. | Tables can be maintained easily. | Tables are tough to maintain. |
| 5. | A normalized form of the star schema. | A normalized form of both the snowflake and star schemas. |
| 6. | Easy to operate compared to the fact constellation schema, as it has fewer joins between the tables. | Not easy to operate compared to the snowflake schema, as it has multiple joins between the tables. |
| 7. | Simple, less complex queries are used to access data from the database. | Heavier, more complex queries are used to access data from the database. |
Star Vs Fact

| S.NO | Star Schema | Fact Constellation Schema |
| --- | --- | --- |
| 1. | Depicts each dimension with only one dimension table. | Dimension tables are shared by many fact tables. |
| 2. | Tables can be maintained easily in comparison with the fact constellation schema. | Tables cannot be maintained as easily, comparatively. |
| 3. | Does not use normalization. | A normalized form of the star and snowflake schemas. |
| 4. | Simple queries are used to access data from the database. | Heavily complex queries are used to access data from the database. |
| 5. | Easy to operate compared to the fact constellation schema, as it has fewer joins between the tables. | Not easy to operate compared to the star schema, as it has many joins between the tables. |
| 6. | Uses less space compared to the fact constellation schema. | Uses more space, comparatively. |
| 7. | Very simple to understand due to its simplicity. | Very difficult to understand due to its complexity. |
Difference between OLAP and OLTP in DBMS
OLAP stands for Online Analytical Processing. OLAP systems have the capability to analyze database information from
multiple systems at the same time. The primary goal of an OLAP service is data analysis, not data processing.

OLTP stands for Online Transaction Processing. OLTP's job is to administer the day-to-day transactions of an
organization. The main goal of OLTP is data processing, not data analysis.

Online Analytical Processing (OLAP)


Online Analytical Processing (OLAP) refers to a type of software tool used for data analysis to support business
decisions. OLAP provides an environment for getting insights from data retrieved from multiple database systems at
one time.

OLAP Examples
Any type of Data Warehouse System is an OLAP system. The uses of the OLAP System are described below.

• Spotify analyzes the songs played by users to come up with a personalized homepage of songs and playlists.
• Netflix's movie recommendation system.
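
A typical OLAP workload is a read-only aggregation over a large volume of historical data. A sketch, using a hypothetical viewing_fact table (the table and column names are assumptions; EXTRACT is standard SQL but some systems use other date functions):

```sql
-- Read-only aggregation over historical viewing data.
SELECT genre,
       EXTRACT(YEAR FROM watched_on) AS yr,
       COUNT(*) AS views
FROM   viewing_fact
GROUP  BY genre, EXTRACT(YEAR FROM watched_on)
ORDER  BY views DESC;
```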

Benefits of OLAP Services


• OLAP services help in keeping calculations consistent.
• We can store planning, analysis, and budgeting for business analytics within one platform.
• OLAP services help in handling large volumes of data, which helps in enterprise-level business applications.
• OLAP services help in applying security restrictions for data protection.
• OLAP services provide a multidimensional view of data, which helps in applying operations on data in various
ways.
Drawbacks of OLAP Services
• OLAP Services requires professionals to handle the data because of its complex modeling procedure.
• OLAP services are expensive to implement and maintain in cases when datasets are large.
• We can perform analysis of data only after extraction and transformation of the data in the case of OLAP, which
delays the system.
• OLAP services are not efficient for real-time decision-making, as the data is updated only on a periodic basis.
Online Transaction Processing (OLTP)
Online transaction processing provides transaction-oriented applications in a 3-tier architecture. OLTP administers the
day-to-day transactions of an organization.

OLTP Examples
A classic example of an OLTP system is an ATM center: the person who authenticates first receives the amount first,
on the condition that the amount to be withdrawn is available in the ATM. The uses of the OLTP system are
described below.

• ATM center is an OLTP application.


• OLTP handles the ACID properties during data transactions via the application.
• It is also used for online banking, online airline ticket booking, sending a text message, and adding a book to a
shopping cart.

Benefits of OLTP Services


• OLTP services allow users to read, write and delete data operations quickly.
• OLTP services support a growing number of users and transactions, enabling real-time access to data.
• OLTP services help to provide better security by applying multiple security features.
• OLTP services help in making better decisions by providing accurate, current data.
• OLTP Services provide Data Integrity, Consistency, and High Availability to the data.
Drawbacks of OLTP Services
• OLTP has limited analysis capability, as it is not intended for complex analysis or reporting.
• OLTP has high maintenance costs because of frequent maintenance, backups, and recovery.
• OLTP Services get hampered in the case whenever there is a hardware failure which leads to the failure of online
transactions.
• OLTP services often experience issues such as duplicate or inconsistent data.
Difference between OLAP and OLTP

| Category | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing) |
| --- | --- | --- |
| Definition | It is well-known as an online database query management system. | It is well-known as an online database modifying system. |
| Data source | Consists of historical data from various databases. | Consists of only operational current data. |
| Method used | It makes use of a data warehouse. | It makes use of a standard database management system (DBMS). |
| Application | It is subject-oriented. Used for data mining, analytics, decision making, etc. | It is application-oriented. Used for business tasks. |
| Normalized | In an OLAP database, tables are not normalized. | In an OLTP database, tables are normalized (3NF). |
| Usage of data | The data is used in planning, problem-solving, and decision-making. | The data is used to perform day-to-day fundamental operations. |
| Task | It provides a multi-dimensional view of different business tasks. | It reveals a snapshot of present business tasks. |
| Purpose | It serves the purpose of extracting information for analysis and decision-making. | It serves the purpose of inserting, updating, and deleting information from the database. |
| Volume of data | A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small, as the historical data is archived in MB and GB. |
| Queries | Relatively slow, as the amount of data involved is large. Queries may take hours. | Very fast, as the queries operate on a small fraction (about 5%) of the data. |
| Update | The OLAP database is not often updated. As a result, data integrity is unaffected. | The data integrity constraint must be maintained in an OLTP database. |
| Backup and Recovery | It only needs backup from time to time as compared to OLTP. | The backup and recovery process is maintained rigorously. |
| Processing time | The processing of complex queries can take a long time. | Comparatively fast in processing because of simple and straightforward queries. |
| Types of users | This data is generally managed by the CEO, MD, and GM. | This data is managed by clerks and managers. |
| Operations | Only read and rarely write operations. | Both read and write operations. |
| Updates | Data is refreshed on a regular basis with lengthy, scheduled batch operations. | The user initiates data updates, which are brief and quick. |
| Nature of audience | The process is focused on the customer. | The process is focused on the market. |
| Database design | Design with a focus on the subject. | Design with a focus on the application. |
| Productivity | Improves the efficiency of business analysts. | Enhances the user's productivity. |
DDL (Data Definition Language)

DDL or Data Definition Language actually consists of the SQL commands that can be used to define the database
schema. It simply deals with descriptions of the database schema and is used to create and modify the structure of
database objects in the database.
DDL is a set of SQL commands used to create, modify, and delete database structures but not data. These commands are
normally not used by a general user, who should be accessing the database via an application.
List of DDL Commands

| Command | Description | Syntax |
| --- | --- | --- |
| CREATE | Create the database or its objects (table, index, function, view, stored procedure, trigger) | CREATE TABLE table_name (column1 data_type, column2 data_type, ...); |
| DROP | Delete objects from the database | DROP TABLE table_name; |
| ALTER | Alter the structure of the database | ALTER TABLE table_name ADD COLUMN column_name data_type; |
| TRUNCATE | Remove all records from a table, including all space allocated for the records | TRUNCATE TABLE table_name; |
| COMMENT | Add comments to the data dictionary | COMMENT 'comment_text' ON TABLE table_name; |
| RENAME | Rename an object existing in the database | RENAME TABLE old_table_name TO new_table_name; |
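
Putting a few of these together, a short DDL session might look like this. The employee table is hypothetical, and some details vary by DBMS (e.g., some systems omit the COLUMN keyword in ALTER TABLE, and COMMENT/RENAME syntax differs across vendors):

```sql
CREATE TABLE employee (
    emp_id INT PRIMARY KEY,
    name   VARCHAR(50)
);

ALTER TABLE employee ADD COLUMN salary DECIMAL(10, 2);  -- change the structure, not the data

TRUNCATE TABLE employee;  -- remove all rows but keep the table definition

DROP TABLE employee;      -- remove the table itself
```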
DQL (Data Query Language)

DQL statements are used for performing queries on the data within schema objects. The purpose of a DQL command
is to fetch relations from the schema based on the query passed to it. We can define DQL as the component of SQL
that allows getting data from the database and imposing order upon it. It includes the SELECT statement.
This command allows getting data out of the database in order to perform operations on it. When a SELECT is fired against
a table or tables, the result is compiled into a temporary table, which is displayed or received by the
program, i.e., a front end.

DQL Command

| Command | Description | Syntax |
| --- | --- | --- |
| SELECT | Retrieve data from the database | SELECT column1, column2, ... FROM table_name WHERE condition; |
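
For example, a query against a hypothetical student table; the result set is the temporary table mentioned above:

```sql
SELECT name, age
FROM   student
WHERE  age > 18
ORDER  BY name;  -- retrieve data and impose an order on it
```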

DML(Data Manipulation Language)


The SQL commands that deal with the manipulation of data present in the database belong to DML or Data
Manipulation Language and this includes most of the SQL statements.
It is the component of the SQL statement that controls access to data and to the database. Basically, DCL statements are
grouped with DML statements.

List of DML commands

| Command | Description | Syntax |
| --- | --- | --- |
| INSERT | Insert data into a table | INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...); |
| UPDATE | Update existing data within a table | UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition; |
| DELETE | Delete records from a database table | DELETE FROM table_name WHERE condition; |
| LOCK | Control table concurrency | LOCK TABLE table_name IN lock_mode; |
| CALL | Call a PL/SQL or Java subprogram | CALL procedure_name(arguments); |
| EXPLAIN PLAN | Describe the access path to data | EXPLAIN PLAN FOR SELECT * FROM table_name; |
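
A short DML sequence against the same hypothetical student table (the row values are illustrative):

```sql
INSERT INTO student (roll_no, name, age) VALUES (1, 'Asha', 20);

UPDATE student SET age = 21 WHERE roll_no = 1;  -- modify existing data

DELETE FROM student WHERE roll_no = 1;          -- remove the record
```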
DCL (Data Control Language)
DCL includes commands such as GRANT and REVOKE which mainly deal with the rights, permissions, and other controls
of the database system.

List of DCL commands:

| Command | Description | Syntax |
| --- | --- | --- |
| GRANT | Assigns new privileges to a user account, allowing access to specific database objects, actions, or functions. | GRANT privilege_type [(column_list)] ON [object_type] object_name TO user [WITH GRANT OPTION]; |
| REVOKE | Removes previously granted privileges from a user account, taking away their access to certain database objects or actions. | REVOKE [GRANT OPTION FOR] privilege_type [(column_list)] ON [object_type] object_name FROM user [CASCADE]; |
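
For example, granting and then partially revoking rights on a hypothetical employee table for a user named analyst (user management details vary by DBMS):

```sql
GRANT SELECT, INSERT ON employee TO analyst;  -- allow reads and inserts

REVOKE INSERT ON employee FROM analyst;       -- later, withdraw the insert right
```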

TCL (Transaction Control Language)


Transactions group a set of tasks into a single execution unit. Each transaction begins with a specific task and ends when
all the tasks in the group are successfully completed. If any of the tasks fail, the transaction fails.
Therefore, a transaction has only two results: success or failure.

List of TCL Commands

| Command | Description | Syntax |
| --- | --- | --- |
| BEGIN TRANSACTION | Starts a new transaction | BEGIN TRANSACTION [transaction_name]; |
| COMMIT | Saves all changes made during the transaction | COMMIT; |
| ROLLBACK | Undoes all changes made during the transaction | ROLLBACK; |
| SAVEPOINT | Creates a savepoint within the current transaction | SAVEPOINT savepoint_name; |
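
A sketch of a funds-transfer transaction using these commands, against a hypothetical account table (the exact begin keyword varies: BEGIN, BEGIN TRANSACTION, or START TRANSACTION depending on the DBMS). If either update fails, ROLLBACK restores the original state:

```sql
BEGIN TRANSACTION;

UPDATE account SET balance = balance - 100 WHERE acc_no = 'A1';
SAVEPOINT after_debit;  -- a point we can roll back to without undoing the debit
UPDATE account SET balance = balance + 100 WHERE acc_no = 'A2';

COMMIT;  -- on failure, ROLLBACK (or ROLLBACK TO after_debit) would be issued instead
```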
SQL offers several types of Keys, each serving a distinct purpose in the database ecosystem:
1) Primary Key: It is a unique identifier for each record in a table. It ensures data uniqueness and serves as a reference for
establishing relationships.
2) Unique Key: Like a Primary Key, a Unique Key enforces uniqueness but allows null values. It's used for columns that
must be unique but might contain missing information.
3) Foreign Key: A Foreign Key establishes a link between two tables based on a standard column. It maintains
referential integrity and enforces relationships between tables.
4) Composite Key: A Composite Key uses multiple columns to create a unique identifier. It's useful when a single
column cannot ensure uniqueness.
5) Candidate Key: Candidate Keys are potential options for Primary Keys. They share the properties of uniqueness
and minimal redundancy.
6) Alternate Key: An Alternate Key is a candidate key that isn't chosen as the Primary Key. It provides additional
options for uniquely identifying records.
7) Super Key: It is a set of attributes that, taken together, uniquely identify records. It can include more details than
necessary for a primary key.
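
Several of these key types can appear together in one schema. A sketch using hypothetical employee and project_assignment tables:

```sql
CREATE TABLE employee (
    emp_id INT PRIMARY KEY,   -- primary key: the chosen candidate key
    ssn    CHAR(9) UNIQUE     -- unique key: must be unique, but may be NULL
);

CREATE TABLE project_assignment (
    emp_id     INT REFERENCES employee (emp_id),  -- foreign key into employee
    project_id INT,
    PRIMARY KEY (emp_id, project_id)              -- composite key: unique only together
);
```

Here {emp_id} and {ssn} are both candidate keys of employee; emp_id was chosen as the primary key, which makes ssn an alternate key, and any superset such as {emp_id, ssn} is a super key.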
Normal Forms in DBMS
Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may
cause insertion, deletion, and update anomalies, so normalization helps to minimize redundancy in relations. Normal forms are
used to eliminate or reduce redundancy in database tables.

Normalization of DBMS
In database management systems (DBMS), normal forms are a series of guidelines that help to ensure that the design of
a database is efficient, organized, and free from data anomalies. There are several levels of normalization, each with its
own set of guidelines, known as normal forms.
Important Points Regarding Normal Forms in DBMS
• First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell should contain
only a single value, and each column should have a unique name. The first normal form helps to eliminate
duplicate data and simplify queries.
• Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each non-key attribute be
dependent on the primary key. This means that each column should be directly related to the primary key,
and not to other columns.
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key attributes are independent of
each other. This means that each column should be directly related to the primary key, and not to any other
columns in the same table.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that each determinant in a
table is a candidate key. In other words, BCNF ensures that each non-key attribute is dependent only on the
candidate key.
• Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does not contain
any multi-valued dependencies.
• Fifth Normal Form (5NF): 5NF is the highest level of normalization and involves decomposing a table into
smaller tables to remove data redundancy and improve data integrity.
Normal forms help to reduce data redundancy, increase data consistency, and improve database performance. However,
higher levels of normalization can lead to more complex database designs and queries. It is important to strike a balance
between normalization and practicality when designing a database.
Advantages of Normal Form
• Reduced data redundancy: Normalization helps to eliminate duplicate data in tables, reducing the amount
of storage space needed and improving database efficiency.
• Improved data consistency: Normalization ensures that data is stored in a consistent and organized manner,
reducing the risk of data inconsistencies and errors.
• Simplified database design: Normalization provides guidelines for organizing tables and data relationships,
making it easier to design and maintain a database.
• Improved query performance: Normalized tables are typically easier to search and retrieve data from,
resulting in faster query performance.
• Easier database maintenance: Normalization reduces the complexity of a database by breaking it down into
smaller, more manageable tables, making it easier to add, modify, and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase database efficiency, and simplify database
design and maintenance.
First Normal Form

If a relation contains a composite or multi-valued attribute, it violates first normal form; conversely, a relation is in first
normal form if it does not contain any composite or multi-valued attribute. That is, a relation is in first normal form if
every attribute in the relation is a single-valued attribute.

• Example 1 – Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its
decomposition into 1NF is shown in table 2.


• Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3

• In the above table, Course is a multi-valued attribute, so the relation is not in 1NF. The table below is in 1NF, as
there is no multi-valued attribute.

ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
Second Normal Form
To be in second normal form, a relation must be in first normal form and relation must not contain any partial dependency.
A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any
candidate key) is dependent on any proper subset of any candidate key of the table. Partial Dependency – If the proper
subset of candidate key determines non-prime attribute, it is called partial dependency.

• Example 1 – Consider table-3, given below.



STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000

• Note that many courses have the same course fee. Here, COURSE_FEE alone cannot decide the
value of COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO; and COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO. Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key
{STUD_NO, COURSE_NO}. But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key. A non-prime attribute depending on a proper
subset of the candidate key is a partial dependency, so this relation is not in 2NF. To
convert the above relation to 2NF, we need to split it into two tables, as follows:

Table 1: STUD_NO, COURSE_NO
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5

Table 2: COURSE_NO, COURSE_FEE
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000

• NOTE: 2NF tries to reduce the redundant data being stored in memory. For instance, if 100
students take course C1, we don't need to store its fee as 1000 for all 100 records; instead, we
can store it once in the second table, as the course fee for C1 is 1000. (A SQL sketch of this
decomposition follows Example 2 below.)

• Example 2 – Consider following functional dependencies in relation R (A, B , C, D )

AB -> C [A and B together determine C]


BC -> D [B and C together determine D]

In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any proper subset of AB doesn’t
determine any non-prime attribute.
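
In SQL, the decomposition from Example 1 could be sketched as follows (hypothetical table names and column types), so that each course fee is stored exactly once:

```sql
CREATE TABLE course_fee (
    course_no  VARCHAR(10) PRIMARY KEY,  -- COURSE_NO -> COURSE_FEE, now fully keyed
    course_fee INT
);

CREATE TABLE student_course (
    stud_no   INT,
    course_no VARCHAR(10) REFERENCES course_fee (course_no),
    PRIMARY KEY (stud_no, course_no)
);
```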

For Third Normal Form (defined formally in a later section), every non-trivial functional dependency X -> Y must satisfy at least one of the following:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).

Example 1: In relation STUDENT given in Table 4, FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE,
STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}

Candidate Key: {STUD_NO}

For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal form.
To convert it to third normal form, we decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY, STUD_AGE) as: STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_AGE) and STATE_COUNTRY (STATE, COUNTRY).
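
A sketch of this 3NF decomposition in SQL (the column types are assumptions), moving the transitive dependency STUD_STATE -> STUD_COUNTRY into its own table:

```sql
CREATE TABLE state_country (
    state   VARCHAR(30) PRIMARY KEY,  -- STUD_STATE -> STUD_COUNTRY lives here now
    country VARCHAR(30)
);

CREATE TABLE student (
    stud_no    INT PRIMARY KEY,
    stud_name  VARCHAR(50),
    stud_phone VARCHAR(15),
    stud_state VARCHAR(30) REFERENCES state_country (state),
    stud_age   INT
);
```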
Consider relation R(A, B, C, D, E) with FDs A -> BC, CD -> E, B -> D, E -> A. All possible candidate keys of this relation
are {A, E, CD, BC}. All attributes on the right sides of these functional dependencies are prime, so the relation is in 3NF.

Example 2: Find the highest normal form of a relation R(A,B,C,D,E) with FD set as {BC->D, AC->BE, B->E}

Step 1: As we can see, (AC)+ = {A, C, B, E, D}, but none of its subsets can determine all attributes of the relation, so AC is a
candidate key. Neither A nor C can be derived from any other attribute of the relation, so there is only one candidate key, {AC}.
Step 2: The prime attributes are those that are part of the candidate key: {A, C} in this example. The others, {B, D, E}, are
non-prime.
Step 3: The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes. The
relation is in 2nd normal form because BC -> D is in 2nd normal form (BC is not a proper subset of candidate key AC),
AC -> BE is in 2nd normal form (AC is a candidate key), and B -> E is in 2nd normal form (B is not a proper subset of candidate
key AC).
The relation is not in 3rd normal form because in BC -> D (neither BC is a super key nor D a prime attribute) and in B -> E
(neither B is a super key nor E a prime attribute); to satisfy 3rd normal form, either the LHS of an FD should be a super key
or the RHS should be a prime attribute. So the highest normal form of the relation is 2nd normal form.
As another example, consider relation R(A, B, C) with FDs A -> BC and B -> A. A and B are both super keys, so the above relation is in BCNF.

Third Normal Form


A relation is said to be in third normal form if it does not have any transitive dependency for non-prime attributes. The
basic condition of Third Normal Form is that the relation must be in Second Normal Form.
For every non-trivial functional dependency X -> Y, at least one of the following conditions must hold:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).

BCNF
BCNF (Boyce-Codd Normal Form) is an advanced version of Third Normal Form, with some additional rules.
The basic condition for any relation to be in BCNF is that it must be in Third Normal Form.
We have to focus on some basic rules that are for BCNF:

1. Table must be in Third Normal Form.


2. For every non-trivial functional dependency X -> Y, X must be a super key of the relation.

Fourth Normal Form


A relation in Fourth Normal Form contains no non-trivial multi-valued dependencies other than those implied by a
candidate key. The basic condition of Fourth Normal Form is that the relation must be in BCNF.
The basic rules are mentioned below.
1. It must be in BCNF.
2. It does not have any non-trivial multi-valued dependency.

Fifth Normal Form


Fifth Normal Form is also called Project-Join Normal Form. The basic conditions of Fifth Normal Form are mentioned below.
The relation must be in Fourth Normal Form.
The relation must not be further losslessly decomposable into smaller relations.

Applications of Normal Forms in DBMS


• Data consistency: Normal forms ensure that data is consistent and does not contain any redundant
information. This helps to prevent inconsistencies and errors in the database.
• Data redundancy: Normal forms minimize data redundancy by organizing data into tables that contain only
unique data. This reduces the amount of storage space required for the database and makes it easier to
manage.
• Response time: Normal forms can improve query performance by reducing the number of joins required to
retrieve data. This helps to speed up query processing and improve overall system performance.
• Database maintenance: Normal forms make it easier to maintain the database by reducing the amount of
redundant data that needs to be updated, deleted, or modified. This helps to improve database
management and reduce the risk of errors or inconsistencies.
• Database design: Normal forms provide guidelines for designing databases that are efficient, flexible, and
scalable. This helps to ensure that the database can be easily modified, updated, or expanded as needed.
Some Important Points about Normal Forms
• BCNF is free from redundancy caused by Functional Dependencies.
• If a relation is in BCNF, then 3NF is also satisfied.
• If all attributes of relation are prime attribute, then the relation is always in 3NF.
• A relation in a Relational Database is always and at least in 1NF form.
• Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
• If a Relation has only singleton candidate keys( i.e. every candidate key consists of only 1 attribute), then the
Relation is always in 2NF( because no Partial functional dependency possible).
• Sometimes going for BCNF form may not preserve functional dependency. In that case go for BCNF only if
the lost FD(s) is not required, else normalize till 3NF only.
• There are many more Normal forms that exist after BCNF, like 4NF and more. But in real world database
systems it’s generally not required to go beyond BCNF.
Conclusion
In Conclusion, relational databases can be arranged according to a set of rules called normal forms
in database administration (1NF, 2NF, 3NF, BCNF, 4NF, and 5NF), which reduce data redundancy and preserve data
integrity. By resolving various kinds of data anomalies and dependencies, each subsequent normal form expands upon
the one that came before it. The particular requirements and properties of the data being stored determine which normal
form should be used; higher normal forms offer stricter data integrity but may also result in more complicated database
structures.
