Dbms

The document discusses key concepts in database management, including the differences between logical and physical data independence, the concept of weak entities, and the distinctions between schema and instance. It also outlines the types of data models, the roles of various database users, and the characteristics and advantages of DBMS. Additionally, it explains the three-schema architecture and the types of keys used in relational databases.

MODULE 1

1) What is the difference between logical data independence and physical data independence? Which one is harder to achieve? Why?

Difference Between Logical and Physical Data Independence

Logical Data Independence:
• You can change the structure of the database (like adding or removing tables) without affecting how users see or use the data.
• You change how tables are related or change some attributes, but users still access the data the same way.
• Harder to achieve, because changes to the database structure affect how applications and users interact with it.

Physical Data Independence:
• You can change how data is stored (like using a different hard drive or changing storage methods) without affecting how users interact with it.
• You change how the data is physically stored (e.g., moving it from one system to another), but users don't notice any difference.
• Easier to achieve, because changes in storage don't affect how users or applications interact with the data.

Which is Harder to Achieve?


• Logical Data Independence is harder to achieve.
o Why?: When you change the logical structure of
the database (like how tables or relationships are
arranged), it often requires changes to the
applications or queries used by the users. So,
maintaining logical independence is more
difficult.
2) What is the concept of a weak entity used in data
modelling? Define the terms owner entity type,
Identifying relationship type.
Weak Entity
• It depends on another entity (owner entity) to be
identified.
• It does not have a primary key on its own.
• It is identified using a partial key + the primary key of the
owner.

Example: A Dependent entity depends on an Employee entity.
(Dependent Name alone may not be unique, but with Employee
ID, it becomes unique.)

Owner Entity Type


• The owner entity type is the strong (regular) entity to which
the weak entity is related.
• It provides the key to uniquely identify the weak entity.

In the example above, Employee is the owner entity of Dependent.

Identifying Relationship Type


• The relationship that links a weak entity to its owner is
called the identifying relationship.
• It is usually shown with a double diamond in ER diagrams.
• It ensures that the weak entity can be uniquely identified
through its owner.

Example: The relationship "HasDependent" between Employee and Dependent is an identifying relationship.
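The weak-entity idea above can be sketched in a relational schema, where the dependent's primary key combines the owner's key with the partial key. A minimal sketch using Python's sqlite3; the table and column names are illustrative, not from any fixed syllabus:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE employee (ssn INTEGER PRIMARY KEY, name TEXT)")
# Weak entity: its primary key is the owner's key (essn) plus the
# partial key (dependent_name).
conn.execute("""
    CREATE TABLE dependent (
        essn INTEGER REFERENCES employee(ssn),
        dependent_name TEXT,
        PRIMARY KEY (essn, dependent_name)
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Anu')")
conn.execute("INSERT INTO dependent VALUES (1, 'Ravi')")
# The same dependent name under a different owner is still allowed,
# because only the combination (essn, dependent_name) must be unique.
conn.execute("INSERT INTO employee VALUES (2, 'Rahul')")
conn.execute("INSERT INTO dependent VALUES (2, 'Ravi')")
count = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
```

Trying to insert (1, 'Ravi') a second time would fail, since that combination already exists.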

3) Distinguish between schema and instance

Schema:
• The overall design or structure of the database.
• Rarely changes (fixed design).
• Intensional – it defines what the data should be.
• Example: A table definition: STUDENT(RollNo, Name, Age)

Instance:
• The actual data stored in the database at a given time.
• Changes frequently as data is added, deleted, or updated.
• Extensional – it shows the current content.
• Example: Rows like: (101, "Anu", 20), (102, "Rahul", 21)

4) STRUCTURED DATA, SEMI-STRUCTURED DATA, UNSTRUCTURED DATA
1. Structured Data
• Well-organized and follows a fixed format (tables with rows
and columns).
• Stored in relational databases (SQL).
• Easy to search, analyze, and manage.
• Example: Relational data like student records in a table.

2. Semi-Structured Data
• Does not follow a strict format, but has some structure
that helps in organization.
• Not stored in traditional relational databases directly, but
can be processed and converted.
• Useful for handling flexible and hierarchical data.
• Example: XML, JSON

3. Unstructured Data
• Has no predefined format or data model.
• Cannot be stored easily in relational databases.
• Common in multimedia, logs, and documents.
• Needs special tools/platforms to store and analyze.
• Example: Text files, PDFs, Word documents, audio/video
files

5)MAJOR ADVANTAGES OF DBMS


1. Data Redundancy Control
o DBMS avoids storing the same data in multiple places.
o It helps in saving storage space and avoiding
confusion.
2. Data Consistency
o Since redundancy is minimized, the chances of having
conflicting data are reduced.
o Example: If a student’s name is updated in one table,
it’s updated everywhere.
3. Data Sharing
o Multiple users and applications can access the same
data at the same time.
o Useful in multi-user environments like banking,
reservations, etc.
4. Data Security
o DBMS provides controlled access to data using
usernames and passwords.
o Only authorized users can access or modify the data.
5. Data Integrity
o DBMS allows defining rules and constraints (like NOT
NULL, UNIQUE) to ensure accuracy and correctness
of data.
6. Backup and Recovery
o DBMS provides automatic backup and recovery
features in case of system failure.
o Helps in preventing data loss.
7. Program-Data Independence
o Changes in the structure of the data do not affect the
application programs.
o This is possible because of the data abstraction
provided by DBMS.
8. Better Decision Making
• Accurate, consistent, and up-to-date data supports better
analysis and decisions in organizations.

6)DATA MODEL AND TYPES OF DATA MODEL**


• A data model is a set of concepts used to describe the
structure of a database.
• It helps in defining:
o What kind of data is stored
o How data is organized
o How different data items are related
o What operations can be performed on the data
Categories of Data Models
There are 3 main types of data models:

A. High-level or Conceptual Data Models


• Closest to how users naturally think about data.
• Easy for users to understand.
• Uses concepts like:
o Entities (real-world objects like Employee, Student)
o Attributes (properties like Name, Age)
o Relationships (associations like works_on,
enrolled_in)
• Example: Entity-Relationship (ER) model

B. Representational or Implementation Data Models


• A balance between user understanding and how data is
actually stored.
• Hides some low-level storage details but still close to
actual implementation.
• Used in most commercial DBMSs.
• Also called record-based models because they represent
data using records.
Examples:
• Relational model (most popular)
• Hierarchical model (older)
• Network model (older)

C. Low-level or Physical Data Models


• Describes how data is stored in the computer.
• Deals with technical details like:
o File formats
o Record ordering
o Access paths (ways to quickly find data)
Example of Access Path:
• Index: Helps in finding data quickly using a keyword or ID.

7)DATABASE USERS
Database Users and Workers Behind the Scene
In a DBMS environment, many types of users and professionals
are involved in the design, use, and maintenance of the database
system. They can be grouped into:

1. Database Administrator (DBA)


• The DBA is responsible for managing the database and
related resources.
• Key responsibilities:
o Authorizing user access
o Monitoring database usage
o Installing and managing DBMS software and
hardware
o Ensuring security, performance, and reliability
• DBA is also responsible if there is a security breach or slow
system performance.

2. Database Designers
• They decide what data to store and how to store it.
• Design appropriate structures and models for the
database.
• Work closely with all types of users to understand their
requirements.
• Responsible for creating a design that supports user needs.

3. End Users
End users are those who directly interact with the database to
perform their tasks.
Types of End Users:
• Casual End Users:
o Use the database occasionally.
o Require different types of data each time.
o Example: Manager generating ad-hoc reports.
• Naive or Parametric End Users:
o Use the database frequently.
o Perform repetitive tasks using pre-defined queries.
o Use canned transactions (prewritten queries).
o Example: Bank clerk, ticket booking staff.
• Sophisticated End Users:
o Have deep knowledge of DBMS.
o Use advanced tools and features.
o Example: Data scientists, engineers, analysts.
• Stand-alone Users:
o Maintain personal databases.
o Use ready-made software with GUI or menu-based
tools.
o Example: MS Access users managing their own data.

4. System Analysts and Application Programmers


• System Analysts:
o Understand the needs of naive users.
o Define specifications for canned transactions.
o Act as a bridge between users and developers.
• Application Programmers:
o Write the actual application programs.
o Responsible for coding, testing, debugging, and
maintaining the programs.
o Ensure that programs match the user's requirements.

5. Workers Behind the Scene


These people are not end users but are essential for building and
running the DBMS.
• DBMS System Designers and Implementers:
o Develop the core DBMS software.
o Design internal modules and user interfaces.
• Tool Developers:
o Create tools that help in database design,
performance tuning, or management.
o Example: Query optimizers, performance monitoring
tools.
• Operators and Maintenance Personnel:
o Manage the hardware and system software.
o Responsible for the daily operation and maintenance
of the system environment.

8)CHARACTERISTICS OF DBMS***
a) Self-Describing Nature
• A DBMS stores not just the data, but also its structure and
constraints in a special place called the catalog.
• This is called metadata (data about data).
• Applications use metadata to understand how to access the
data.
b) Program-Data Independence
• In file systems, changes in data structure require changes in
all programs.
• In DBMS, data structure is separate from programs.
• So, structure can change without affecting application code.
c) Program-Operation Independence
• The way operations (functions) work can be changed without
affecting how they are called.
• Applications use only the operation names and parameters
(interface), not how they work inside.
d) Data Abstraction
• Users don’t need to know how data is stored.
• DBMS shows a conceptual view of the data.
• Hides low-level details like file formats, indexes, etc.
e) Support for Multiple Views
• Different users can have different views of the same
database.
• A view can be:
o A subset of the data.
o Virtual data derived from the main database.
• Views help in security and simplicity for users.
f) Data Sharing and Multiuser Access
• Multiple users can use the database at the same time.
• DBMS uses concurrency control to handle this safely.
• Ensures that updates made by one user don’t affect others
in the wrong way.
g) Transactions
• A transaction is a set of operations (like read, write).
• DBMS ensures that transactions are executed safely and
correctly.
• Transactions are:
o Atomic: All steps happen or none happen.
o Consistent: Keep database in valid state.
o Isolated: Don’t interfere with each other.
o Durable: Changes are saved even after a crash.

9)THREE SCHEMA ARCHITECTURE****


It is a design approach in DBMS that separates the database into
three levels to improve data independence and user flexibility.

1. Internal Level (Physical Level)


• Describes how data is physically stored on storage devices.
• Uses physical data models.
• Includes file structures, indexes, and access paths.
2. Conceptual Level (Logical Level)
• Describes the overall structure of the database for all
users.
• Hides physical storage details.
• Focuses on entities, data types, relationships, and
constraints.
• Uses a representational data model like relational model.
3. External Level (View Level)
• Describes user-specific views of the database.
• Each user sees only the data they need.
• Hides unnecessary data from users.
• Each view is defined using the same or a different data
model as the conceptual level.

Benefits of Three-Schema Architecture


• Helps separate user applications from the physical
database.
• Provides data abstraction at different levels.
• Allows data independence (changes at one level do not
affect other levels).
• Makes database design and management easier.

Mappings Between Levels


• Mapping means converting data from one level to another.
• DBMS converts user queries from external level →
conceptual level → internal level.
• Retrieved data is also reformatted from internal →
conceptual → external level.
• Some small DBMSs may skip the external level to improve
performance.
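The external level can be realized with database views: each user-facing view exposes only part of the conceptual schema. A minimal sketch using Python's sqlite3, with illustrative table and view names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full table definition.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.execute("INSERT INTO student VALUES (101, 'Anu', 8.2), (102, 'Rahul', 7.5)")
# External level: a view that hides the gpa column from this user.
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")
rows = conn.execute("SELECT * FROM student_names ORDER BY roll_no").fetchall()
```

Queries against `student_names` are mapped by the DBMS down to the underlying `student` table, which is exactly the external-to-conceptual mapping described above.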

MODULE 2
1)KEYS****
In a relational database, a key is used to uniquely identify tuples
(rows) in a relation (table). The main types of keys are:
1. Super Key
• A super key is any set of one or more attributes that can
uniquely identify a tuple.
• Example:
Consider the relation:
employee (ssn, ename, department, birthdate, gender)
o {ssn} is a super key because it can uniquely identify
each employee.
o {ssn, ename} is also a super key.
o Even all attributes together form a default super key.

2. Candidate Key
• A candidate key is a minimal super key (no extra attributes).
• Example:
o {ssn} is a candidate key.
o {ename, department} could also be a candidate key if
this combination is unique.
o {ssn, ename} is not a candidate key because ssn alone
is sufficient.

3. Primary Key
• The primary key is the candidate key selected by the
database designer to uniquely identify tuples.
• Example:
o If {ssn} is chosen from the candidate keys, then it
becomes the primary key of the employee relation.

4. Foreign Key
• A foreign key is an attribute in one relation that refers to the
primary key of another relation.
• It maintains referential integrity between relations.
• Example:
Consider two relations:
department (dept-name, building, budget)
instructor(ID, name, dept-name, salary)
o Here, dept-name in instructor is a foreign key
referencing the dept-name in the department relation.
o This ensures every dept-name in instructor must exist
in department.
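The department/instructor foreign key above can be demonstrated concretely. A minimal sketch using Python's sqlite3 (note SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT)")
conn.execute("""
    CREATE TABLE instructor (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_name TEXT REFERENCES department(dept_name)
    )
""")
conn.execute("INSERT INTO department VALUES ('CS', 'Block A')")
conn.execute("INSERT INTO instructor VALUES (1, 'Meera', 'CS')")   # OK: 'CS' exists
try:
    # 'EE' does not exist in department, so this violates referential integrity.
    conn.execute("INSERT INTO instructor VALUES (2, 'Vik', 'EE')")
    violated = False
except sqlite3.IntegrityError:
    violated = True
```

The rejected insert is exactly the rule stated above: every dept_name in instructor must exist in department.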

2)STEPS OF CONVERTING ER DIAGRAM TO SCHEMA


Step 1: Mapping of Regular Entity Types
1. Create a relation R:
For each regular entity type (E), create a relation R that
includes all simple attributes of the entity type.
2. Composite attributes:
If any attribute of E is composite, include only its simple
components in the relation.
3. Primary Key:
Choose one of the key attributes of E as the primary key of
R. If the key is composite (i.e., formed by multiple
attributes), then all attributes that make up the composite
key will collectively form the primary key of R.
4. Secondary Keys:
If multiple keys exist for E, each additional key will be
represented as a secondary key in the relation R.
Step 2: Mapping of Weak Entity Types
1. Create a relation R:
For each weak entity type (W), create a relation R that
includes all simple attributes of the weak entity type.
2. Foreign Key:
Include foreign key attributes which reference the primary
key of the owner (strong) entity type.
3. Primary Key:
The primary key of R will be a combination of the primary
key of the owner entity and the partial key (if any) of the
weak entity type W.

Step 3: Mapping of Binary 1:1 Relationship Types


1. Identify relations S and T:
Identify relations S and T that represent the entities
participating in the binary 1:1 relationship.
2. Three Approaches for Mapping:
There are 3 approaches to model the relationship:
o Foreign Key Approach:
Choose the relation S (preferably with total
participation).
Add a foreign key to S, which will reference the primary
key of T.
Include all simple attributes of the 1:1 relationship as
attributes in relation S.

o Merged Relationship Approach:
Merge the two entity relations into a single relation that represents both entities and the 1:1 relationship (suitable when both participations are total).
o Cross-Reference or Relationship Relation Approach:
Create a new relation to directly represent the 1:1
relationship and include the primary keys of both
relations.

Step 4: Mapping of Binary 1:N Relationship Types


1. Identify Relation S:
Identify the relation S corresponding to the entity at the N-
side of the relationship.
2. Foreign Key in S:
The foreign key in S will reference the primary key of the
entity at the 1-side of the relationship.
3. Simple Attributes:
Include any simple attributes of the 1:N relationship as
attributes in the relation S.
Step 5: Mapping of Binary M:N Relationship Types
1. Create Relation S:
For an M:N relationship type, create a new relation S to
represent the relationship.
2. Foreign Keys in S:
The foreign keys in S will be the primary keys of the entities
involved in the M:N relationship.
3. Attributes of Relationship:
Include any simple attributes of the M:N relationship as
attributes in S.

Step 6: Mapping of Multivalued Attributes


1. Create a New Relation R:
For each multivalued attribute A, create a new relation R.
2. Attributes in R:
The new relation R includes the attributes corresponding to
the multivalued attribute A, as well as the primary key of the
relation that has A as a multivalued attribute.
3. Foreign Key:
Include the foreign key referencing the relation containing
the multivalued attribute A.
4. Primary Key of R:
The primary key of R is a combination of the multivalued
attribute A and the primary key of the original relation.

Step 7: Mapping of N-ary Relationship Types


1. Create a New Relation S:
For each N-ary relationship type, create a new relation S to
represent the relationship.
2. Foreign Key Attributes:
Include foreign key attributes in S referencing the primary
keys of all the participating entity relations.
3. Include Simple Attributes:
Any simple attributes of the N-ary relationship should be
included in the relation S.
4. Primary Key of S:
The primary key of relation S will be a combination of all the
foreign keys that reference the participating entity relations.
5. Cardinality Constraints:
If any entity type in the relationship has a cardinality
constraint of 1 (i.e., total participation), then the primary key
of S should not include the foreign key referencing that
entity.
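Step 5 (the M:N mapping) can be sketched as a schema. A minimal illustration using Python's sqlite3, with hypothetical student/course entities and a relationship attribute `grade`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT)")
# Binary M:N mapping: a new relation whose primary key combines the
# foreign keys referencing both participating entities; 'grade' is a
# simple attribute of the relationship itself.
conn.execute("""
    CREATE TABLE enrolled_in (
        roll_no INTEGER REFERENCES student(roll_no),
        course_id TEXT REFERENCES course(course_id),
        grade TEXT,
        PRIMARY KEY (roll_no, course_id)
    )
""")
conn.execute("INSERT INTO student VALUES (101, 'Anu')")
conn.execute("INSERT INTO course VALUES ('CS101', 'DBMS')")
conn.execute("INSERT INTO enrolled_in VALUES (101, 'CS101', 'A')")
# Columns that participate in the primary key (pk flag > 0):
pk_cols = [r[1] for r in conn.execute("PRAGMA table_info(enrolled_in)") if r[5] > 0]
```

The composite primary key is what lets the same student take many courses and the same course have many students.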

3)RELATIONAL ALGEBRA OPERATIONS**


1. Select (σ) Operation:
• Picks specific rows (tuples) from a table that meet a
condition.
• Example: Find all employees with a salary greater than
$50,000.
o Syntax:
σSalary>50000(Employee)
o This will select only the rows (tuples) where the salary
is above $50,000.

2. Project (Π) Operation:


• Selects specific columns (attributes) from a table and
removes duplicates.
• Example: Get only the names and salaries of employees.
o Syntax:
Πname, salary(Employee)
o This will give you a list of employee names and salaries,
excluding any duplicate rows.
3. Union (U) Operation:
• Combines two tables by keeping all rows from both tables
(removes duplicates).
• Example: Combine two lists of students from different
classes.
o Syntax:
R=P∪Q
o It will give you a new table with all students from both
class lists (no repeats).

4. Set Difference (-) Operation:


• Finds the rows that are in the first table but not in the second
one.
• Example: Find students who are in Class A but not in Class
B.
o Syntax:
R=P−Q
o This will give you a list of students who are only in Class
A, excluding those who are also in Class B.

5. Cartesian Product (×) Operation:


• Combines every row from the first table with every row from
the second table.
• Example: Combine each student with each course (creates
all possible combinations).
o Syntax:
R=P×Q
o If table P has 2 rows and table Q has 3 rows, the result
will have 6 rows, each being a combination of a row
from P and a row from Q.
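The select and project operations above can be modeled directly in ordinary code, treating a relation as a list of tuples. A minimal sketch in plain Python (the relation and attribute names are made up for illustration):

```python
# A relation as a list of dicts (each dict is one tuple).
employees = [
    {"name": "Anu", "salary": 60000},
    {"name": "Rahul", "salary": 45000},
]

def select(relation, predicate):
    # sigma: keep only the tuples that satisfy the condition
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    # pi: keep only the listed attributes, removing duplicate rows
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

high_paid = select(employees, lambda t: t["salary"] > 50000)  # sigma Salary>50000
names = project(employees, ["name"])                          # pi name
```

Union, set difference, and Cartesian product follow the same pattern over these tuple lists.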

4)JOIN OPERATION**
JOIN Operation in Simple Language
The JOIN operation is used to combine related rows (tuples) from
two tables (relations) into a single row. It connects rows based on
a condition, usually involving matching values in columns from
both tables.
General Form of JOIN:
When joining two tables, say R and S, the result will be a new
table that includes columns from both tables. The syntax looks
like this:
R⋈S
The result is a table with all the columns from both R and S, but
only includes rows where the join condition is true (i.e., where
some column values match between the two tables).

Types of JOIN Operations:


1. Cartesian Product (without JOIN Condition):
• Combines every row of table R with every row of table S. This
results in every possible combination of rows.
• Example: If R has 3 rows and S has 2 rows, the Cartesian
product will result in 6 rows.
2. EQUIJOIN:
• A special type of join where we compare columns using the
equals (=) operator. The result includes matching rows from
both tables where column values are the same.
• Example: Suppose we are joining employees with their
departments. An EQUIJOIN would match employees with
the same department ID in both tables.
3. Outer Join (LEFT, RIGHT, FULL):
Outer joins are used to include all rows, even if there is no match
between the two tables.

• LEFT OUTER JOIN:

o Includes all rows from the left table (R), even if they don't match any row in the right table (S). For unmatched rows of R, the columns of S are filled with NULL values.
o Example: If a department has no employees, it will still be included in the result, with employee data as NULL.
o Syntax: R ⟕ S

• RIGHT OUTER JOIN:

o Includes all rows from the right table (S), even if they don't match any row in the left table (R). For unmatched rows of S, the columns of R are filled with NULL values.
o Example: If a project has no employees assigned to it, the project will still appear in the result, with employee data as NULL.
o Syntax: R ⟖ S

• FULL OUTER JOIN:


o Combines the results of both left and right outer joins.
It includes all rows from both tables, whether they have
matching rows or not. For unmatched rows, NULL
values are used for the missing data.
o Example: If some employees don’t have assigned
projects and some projects have no employees, both
will be included, with NULL in place of missing data.
o Syntax: R ⟗ S

Employee Table (R):

emp_id emp_name branch_name

1 John Mesa

2 Jane Seattle

3 Mike Redmond

Branch Table (S):

branch_name branch_location

Mesa Hollywood

Seattle Washington

Redmond Oregon

• EQUIJOIN (matching branch names):


o Result:

emp_id emp_name branch_name branch_location

1 John Mesa Hollywood

2 Jane Seattle Washington

3 Mike Redmond Oregon

• LEFT OUTER JOIN:


o Result (if we added an employee with no branch):
emp_id emp_name branch_name branch_location

1 John Mesa Hollywood

2 Jane Seattle Washington

3 Mike Redmond Oregon

4 Sarah NULL NULL

• FULL OUTER JOIN:


o Result (if there are unmatched branches):

emp_id emp_name branch_name branch_location

1 John Mesa Hollywood

2 Jane Seattle Washington

3 Mike Redmond Oregon

4 Sarah NULL NULL

NULL NULL Portland Oregon
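The LEFT OUTER JOIN result above can be reproduced directly. A minimal sketch using Python's sqlite3 with the same employee/branch data (trimmed to the rows needed to show the NULL fill-in):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, branch_name TEXT)")
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_location TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "John", "Mesa"), (2, "Jane", "Seattle"), (4, "Sarah", None)])
conn.executemany("INSERT INTO branch VALUES (?, ?)",
                 [("Mesa", "Hollywood"), ("Seattle", "Washington")])
rows = conn.execute("""
    SELECT e.emp_name, b.branch_location
    FROM employee e LEFT OUTER JOIN branch b
      ON e.branch_name = b.branch_name
    ORDER BY e.emp_id
""").fetchall()
# Sarah has no matching branch, so her branch_location comes back as NULL (None).
```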

5)DATA TYPES
CHAR(n):
• Fixed length string, always stores n characters. If the string
is shorter, it pads with spaces.
VARCHAR(n):
• Variable length string, up to n characters, no extra spaces.
NUMERIC(i, j) or DECIMAL(i, j):
• For exact numbers with i digits in total and j digits after the
decimal.
BOOLEAN:
• Stores TRUE, FALSE, or NULL.
DATE:
• Stores a calendar date (YYYY-MM-DD).
TIME:
• Stores the time of day (HH:MM:SS).

6)CREATE DROP ALTER


DDL is used to define, create, modify, or delete the structure
(schema) of database objects like tables.
1. CREATE Command
Used to create a new table (or other database objects like views,
indexes, etc.).

Syntax:
CREATE TABLE <table_name> (column1 datatype [constraints],
column2 datatype [constraints],
...
);

Example:
CREATE TABLE customer (customer_name CHAR(20) PRIMARY
KEY,customer_street CHAR(30),customer_city CHAR(30)
);
2. DROP Command
Used to completely delete a table (or any other database object)
from the database.

Syntax:
DROP TABLE <table_name>;

Example:
DROP TABLE customer;

This removes both the data and the table schema. After this,
the table no longer exists.

3. ALTER Command
Used to modify the structure of an existing table.
a) Add a new column:

Syntax:
ALTER TABLE <table_name> ADD <column_name> <datatype>;

Example:
ALTER TABLE customer ADD email VARCHAR(50);
b) Drop an existing column:

Syntax:
ALTER TABLE <table_name> DROP COLUMN <column_name>;

Example:
ALTER TABLE customer DROP COLUMN email;
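The CREATE, ALTER, and DROP commands above can be run end to end. A minimal sketch using Python's sqlite3 (ADD COLUMN is shown; DROP COLUMN is omitted because older SQLite versions do not support it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE: define the table schema.
conn.execute(
    "CREATE TABLE customer (customer_name TEXT PRIMARY KEY, customer_city TEXT)")
# ALTER: add a new column to the existing table.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
cols = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
# DROP: remove both the data and the table schema.
conn.execute("DROP TABLE customer")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
```

After the DROP, the catalog (sqlite_master here) no longer lists the table, which is what "the table no longer exists" means in practice.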
7) What is a foreign key constraint? Why are such constraints important? What is referential integrity?
Foreign Key Constraint:
A foreign key is a field (or collection of fields) in one table that
refers to the primary key in another table.
It is used to establish and enforce a link between the data in two
tables.
Syntax:
FOREIGN KEY (column_name) REFERENCES
parent_table(primary_key_column)

Importance of Foreign Key Constraints:


• Maintains referential integrity between tables.
• Prevents actions that would break links between related
tables.
• Ensures that the value in the foreign key column must match
a value in the referenced primary key column, or be NULL.

Referential Integrity:
Referential Integrity is a property of data stating that all its
references are valid.
It ensures that a foreign key value in a child table always refers to
an existing primary key value in the parent table.
Example:
If account.branch_name is a foreign key referencing
branch.branch_name, you cannot insert an account for a non-
existing branch.

8) What is a constraint? Discuss domain constraint, entity integrity and referential integrity constraint with suitable examples.
What is a Constraint?
A constraint in DBMS is a rule that restricts the values that can
be stored in a relation (table).
Constraints ensure the accuracy, validity, and integrity of data
in the database.

Types of Constraints:
1. Domain Constraint
• Specifies that the value of each attribute (column) must be
of a specific data type and format.
• Ensures values fall within a valid set or range.
Example:
age INT CHECK (age >= 18 AND age <= 60);
This restricts age to be only between 18 and 60.

2. Entity Integrity Constraint


• States that the primary key of a relation cannot be NULL.
• This ensures that each tuple (row) is uniquely identifiable.
Example:
CREATE TABLE student (
roll_no INT PRIMARY KEY,
name VARCHAR(50)
);
Here, roll_no cannot be NULL and must be unique for each
student.

3. Referential Integrity Constraint


• Ensures that a foreign key in one relation matches a
primary key in another relation.
• Maintains valid relationships between tables.
Example:
CREATE TABLE department (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(30)
);

CREATE TABLE employee (


emp_id INT PRIMARY KEY,
name VARCHAR(50),
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES department(dept_id)
);
In this case, employee.dept_id must match a department.dept_id
or be NULL.
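The domain constraint above (the age CHECK) can be exercised directly. A minimal sketch using Python's sqlite3, reusing the student example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Domain constraint via CHECK; entity integrity via PRIMARY KEY NOT NULL.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY NOT NULL,
        name TEXT,
        age INTEGER CHECK (age BETWEEN 18 AND 60)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Anu', 20)")  # OK: 20 is in range
try:
    conn.execute("INSERT INTO student VALUES (2, 'Rahul', 15)")  # violates CHECK
    domain_ok = True
except sqlite3.IntegrityError:
    domain_ok = False
```

The failed insert shows the DBMS rejecting a value outside the declared domain, rather than leaving the check to application code.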

9) Discuss the differences between an equi-join and a natural join.
Equi-Join
• A type of join where the condition uses only the equal (=)
operator.
• Combines rows from two tables based on a condition like
A.col = B.col.
• The result contains both matching columns, even if they
have the same name.
• Needs to explicitly mention the join condition.

Natural Join
• A special type of equi-join that automatically joins tables
based on all common columns.
• Removes duplicate columns from the result.
• No need to write join condition — it is implicitly based on
common attribute names.
• Cleaner and shorter query syntax.
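The duplicate-column difference can be observed by counting result columns. A minimal sketch using Python's sqlite3, with hypothetical instructor/department tables sharing a dept_name column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id INTEGER, name TEXT, dept_name TEXT)")
conn.execute("CREATE TABLE department (dept_name TEXT, building TEXT)")
conn.execute("INSERT INTO instructor VALUES (1, 'Meera', 'CS')")
conn.execute("INSERT INTO department VALUES ('CS', 'Block A')")
# Equi-join: explicit condition; dept_name appears once per table.
equi_cols = conn.execute("""
    SELECT * FROM instructor i JOIN department d
      ON i.dept_name = d.dept_name
""").description
# Natural join: implicit condition on the common column; duplicate removed.
natural_cols = conn.execute(
    "SELECT * FROM instructor NATURAL JOIN department").description
```

The equi-join returns 5 columns (3 + 2, with dept_name twice), while the natural join returns 4, keeping the shared column only once.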
MODULE 3
1) Explain a situation where a multi-level index would be significantly less effective than a single-level index, and vice versa*****(ESSAY)
When a Multi-Level Index is Less Effective than a Single-Level
Index:
1. Small Datasets:
If the database is small (just a few records), there's no need
for multiple levels of indexing. A single-level index works
just fine because it is simpler and faster. Adding extra levels
would slow things down instead of speeding up the search.
o Example: A small table with 200 records doesn’t need
a multi-level index. A single index is quick and enough
to find the data.
2. Extra Overhead:
Multi-level indexes take up more space and require extra
work to manage (like adding, deleting, or reorganizing). This
is unnecessary for smaller datasets, where a single-level
index can do the job faster and with less effort.

When a Multi-Level Index is More Effective than a Single-Level Index:
1. Large Datasets:
For large databases with millions of records, a multi-level
index becomes helpful because it makes the search faster.
A single-level index would be too large and slow to handle
such big data efficiently. The multi-level index reduces the
number of records it needs to check at each step, speeding
up the search.
o Example: A database with 10 million records would be
very slow with just a single-level index. A multi-level
index breaks it down, making the search much faster.
2. Better Use of Space:
In large datasets, a multi-level index helps save space and
speeds up the search by organizing the data in levels. Each
level helps narrow down the search, so it doesn’t have to
look through everything in a single big list.

2) Describe any three aggregate functions in SQL with examples.
Three Aggregate Functions in SQL
1. COUNT()
o The COUNT() function is used to count the number of
rows in a table or the number of non-NULL values in a
specific column.
o Syntax:
o COUNT(column_name)
o Example:
o SELECT COUNT(*) FROM employees;
This query will return the total number of rows in the employees
table.
o Example with a specific column:
o SELECT COUNT(salary) FROM employees;
This will return the number of employees who have a non-NULL
salary.
2. SUM()
o The SUM() function is used to calculate the total sum of
the numeric values in a column.
o Syntax:
o SUM(column_name)
o Example:
o SELECT SUM(salary) FROM employees;
This query will return the total sum of all the salaries in the
employees table.
3. AVG()
o The AVG() function is used to calculate the average
value of a numeric column.
o Syntax:
o AVG(column_name)
o Example:
o SELECT AVG(salary) FROM employees;
This will return the average salary of all employees in the
employees table.
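All three functions, plus the COUNT(*) vs COUNT(column) distinction, can be shown in one query. A minimal sketch using Python's sqlite3, with an illustrative employees table that includes one NULL salary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Anu", 40000), ("Rahul", 60000), ("Meera", None)])
total_rows, salaried, total, avg = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary) FROM employees"
).fetchone()
# COUNT(*) counts every row; COUNT(salary), SUM, and AVG skip the NULL salary.
```

Note that AVG divides by the number of non-NULL values (2 here), not the total row count.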

3) Differentiate***
(i) B Trees vs B+ Trees

B-Tree:
• Data pointers are stored in both internal and leaf nodes.
• Slower, as data can be at different levels.
• Leaf nodes are not linked.
• Internal and leaf nodes both hold data.
• Less efficient for range queries.
• Used for general indexing.

B+ Tree:
• Data pointers are stored only in leaf nodes.
• Faster, as all data is at the leaf level.
• Leaf nodes are linked (good for range queries).
• Internal nodes hold keys only; leaf nodes hold data.
• More efficient (due to linked leaf nodes).
• Used in databases and file systems.

3) What is a grid file? What are its advantages and
disadvantages?
A grid file is a type of indexing method used to quickly find
records in a database using two or more fields (like name and
age together).
It divides the space of values into a grid-like structure, like a
table, where each cell in the grid stores a pointer to the data
(record or block).
Each cell points to a set of records that match the value ranges
for the fields in that cell.
Advantages of Grid File
• Can search using more than one field (multikey search).
Example: You can search by both "age" and "salary"
together.
• Faster search – You can directly jump to the right cell in the
grid.
• Supports insert and delete easily – You don’t need to
rebuild the whole index.
Disadvantages of Grid File
• Takes more space – The grid directory can become large if
there are many fields or many value ranges.
• Harder to manage – Splitting and updating the grid can be
complex.
• Wastes space if data is uneven – Some cells in the grid
may be empty if data is not spread out evenly.
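The cell-lookup idea can be sketched in a few lines of Python. Note that the field names, range boundaries, and the GridFile class below are illustrative choices, not part of any standard API:

```python
# Minimal grid-file sketch: two fields (age, salary) are each split into
# value ranges; each (age_band, salary_band) cell holds the records that
# fall in that cell. All names and ranges here are illustrative.

AGE_BOUNDS = [30, 50]            # bands: <30, 30-49, >=50
SALARY_BOUNDS = [40000, 70000]   # bands: <40000, 40000-69999, >=70000

def band(value, bounds):
    """Return the index of the range that `value` falls into."""
    for i, b in enumerate(bounds):
        if value < b:
            return i
    return len(bounds)

class GridFile:
    def __init__(self):
        self.cells = {}  # (age_band, salary_band) -> list of records

    def insert(self, record):
        key = (band(record["age"], AGE_BOUNDS),
               band(record["salary"], SALARY_BOUNDS))
        self.cells.setdefault(key, []).append(record)

    def lookup(self, age, salary):
        # Jump directly to the one cell that can contain the record,
        # instead of scanning every record (multikey search).
        key = (band(age, AGE_BOUNDS), band(salary, SALARY_BOUNDS))
        return [r for r in self.cells.get(key, [])
                if r["age"] == age and r["salary"] == salary]

g = GridFile()
g.insert({"name": "Anu", "age": 25, "salary": 35000})
g.insert({"name": "Raj", "age": 55, "salary": 80000})
print(g.lookup(25, 35000))   # finds Anu by checking only one cell
```

The sketch also shows the space trade-off: cells for value combinations that never occur stay empty but still conceptually exist in the grid directory.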

4) What is the difference between the WHERE and HAVING clause? Illustrate with an example
WHERE Clause | HAVING Clause
Filters individual rows | Filters groups formed by GROUP BY
Used in SELECT, UPDATE, DELETE | Used in SELECT (only with GROUP BY)
Applied before grouping (GROUP BY) | Applied after grouping and aggregate functions
Cannot use aggregate functions directly | Can use aggregate functions like COUNT, SUM
Example
Consider the following EMPLOYEE table:

EmpID Dept Salary

1 HR 40000

2 IT 60000

3 HR 50000

4 IT 70000

Using WHERE (filters rows):


SELECT * FROM EMPLOYEE
WHERE Salary > 50000;

This filters individual rows where salary is greater than 50000.

Using HAVING (filters groups):


SELECT Dept, AVG(Salary) AS AvgSal
FROM EMPLOYEE
GROUP BY Dept
HAVING AVG(Salary) > 55000;

This filters groups (departments) where the average salary is greater than 55000.

5) B-TREES AND B+ TREES ******

B-tree
• B-tree is a balanced search tree used for indexing.
• Pointers to data blocks are stored in both internal and leaf
nodes.
• Leaf and internal nodes have the same structure, except
that internal nodes have tree pointers while leaf nodes may
have NULLs.
• Each node contains up to p−1 keys and p pointers, where p
is the order of the tree.
• The internal node stores: <P1, <K1, Pr1>, P2, <K2, Pr2>, ...,
<Kq–1, Prq–1>, Pq>, where
o Pi are tree pointers to child nodes,
o Pri are pointers to actual data records.
• Keys within nodes are sorted: K1 < K2 < ... < Kq−1.
• For a key X in subtree Pi:
Ki−1 < X < Ki for 1 < i < q,
X < Ki for i = 1,
Ki−1 < X for i = q.
• All leaf nodes are at the same level.
• When a node is full, it is split, and splitting may propagate
to the root.
• Allows both random and sequential access to data.
• Useful when search and update operations are frequent.
• Can be slower for range queries, since not all keys are in
leaf nodes.
B+-tree
• B+-tree is a variation of B-tree optimized for range queries
and block access.
• Data pointers are stored only in the leaf nodes.
• Internal nodes act as pure index nodes without actual data
pointers.
• Internal nodes are of the form:
<P1, K1, P2, K2, ..., Pq–1, Kq–1, Pq>, where Pi are tree
pointers.
• For a value X in subtree Pi:
Ki−1 < X ≤ Ki for 1 < i < q,
X ≤ Ki for i = 1,
Ki−1 < X for i = q.
• Leaf nodes are of the form:
<<K1, Pr1>, <K2, Pr2>, ..., <Kq–1, Prq–1>, Pnext>
where Pri points to actual data, and Pnext is a pointer to the
next leaf node.
• All keys are present at the leaf level, allowing efficient
sequential access.
• Internal nodes only maintain keys for routing the search.
• Leaf nodes are linked, supporting range queries efficiently.
• Like B-tree, all leaf nodes are at the same level.
• Requires more pointer space in leaf level but supports
faster searching.

6)INDEXING****
Indexing in DBMS
• Indexes are additional access structures used to speed up
retrieval of records.
• They provide secondary access paths, allowing alternative
ways to access data without changing the actual file
organization.
• Indexes are created using any field in the table (not just
primary keys).

Primary Indexes
• Primary index is built on the ordered data file, using the
primary key.
• It is a file with fixed-length records having two fields:
o First field: value of the primary key of a block.
o Second field: pointer to that block (block address).
• Only one index entry per block, not per record.
• The first record in each block is called the anchor record.
• Since not all keys are indexed, it is a sparse index.
• Requires about log₂(bᵢ) + 1 block accesses (binary search over the bᵢ index blocks, plus one access to the data block).
• Fewer blocks are needed for the index file than for the data file.
• Disadvantage: Insertion/deletion is costly because it may
require moving records and updating the index.

Dense vs Sparse Index


• Dense Index: Contains an index entry for every search key
value in the data file.
• Sparse Index: Contains index entries for some search
values only.
• Primary index is typically a sparse index.

Secondary Indexes
• A secondary index provides access using a field other than
the primary key.
• Can be built on a candidate key (unique) or non-key
(duplicate values).
• Since data is not ordered by secondary field, block anchors
can't be used.
• Hence, secondary indexes are usually dense.
• Each index entry has:
o A value from the secondary field.
o A pointer to the record or block where the record is
stored.
• Improves access time significantly when searching by non-
primary fields.
• Disadvantage: Needs more storage and has higher search
time than primary indexes.

Multilevel Indexing
• Used when the index file is too large to fit in memory.
• The first-level index is built on the data file.
• A second-level index is built on the first-level index (acts as
a primary index for the index file).
• If the second level also becomes large, a third-level index
can be created on top of it.
• This process continues until the top-level index fits in one
disk block.
• All levels have the same blocking factor (fan-out) because
index entries are of the same size.
• The number of levels t needed for r₁ entries at the first level
is
t = ⌈log₍fₒ₎(r₁)⌉, where fo is the fan-out.
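As a quick numeric check of this formula (the fan-out and entry counts below are illustrative):

```python
import math

def index_levels(r1, fan_out):
    """Number of multilevel index levels needed for r1 first-level
    entries when each index block holds `fan_out` entries:
    t = ceil(log_fo(r1))."""
    return math.ceil(math.log(r1, fan_out))

# Illustrative numbers: 30,000 first-level entries, fan-out 68.
# 68^2 = 4,624 < 30,000 <= 68^3, so three levels are needed.
print(index_levels(30000, 68))   # 3
```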

7) With the help of an example explain Single-level indexing and multi-level indexing. Also, compare and contrast single-level indexing with multi-level indexing*****
Single-Level Indexing:
In single-level indexing, all the index entries are stored at a single
level. Each index entry consists of a search key and a pointer to
the data record or block where the record is stored. In simple
terms, it's like creating a direct link from the search key to the
data.
Example:
Consider a database with employee records, where the employee
IDs are used to access the data. Suppose we create an index on
employee ID, which links the employee ID directly to the data
record.
• Index Table:

Employee ID Data Pointer

101 Block 1

102 Block 2

103 Block 3
Here, the index directly points to the block (or location) where
each employee's record is stored.
• When to Use: Single-level indexing works well when the
dataset is not too large and the index can fit into memory or
a few blocks.
Multi-Level Indexing:
In multi-level indexing, there are multiple levels of indexing. The
first level contains pointers to blocks of data or further index
blocks. Each index block in a multi-level index contains pointers
to other blocks. This type of indexing is used when the dataset is
large, and a single index is not enough to efficiently handle the
data.
Example:
For the same employee records, if there are too many records to
fit in a single index, we could use a multi-level index.
• Level 1 (First Level Index):

Employee ID Range Pointer to Data Block

101-150 Index Block 1

151-200 Index Block 2

• Level 2 (Second Level Index):


o Index Block 1 points to a block where records with IDs
101 to 150 are stored.
o Index Block 2 points to a block where records with IDs
151 to 200 are stored.
Here, the first level contains pointers to the second level, and the
second level contains pointers to the actual data blocks.
• When to Use: Multi-level indexing is useful when the data set
is large, and a single level index would not be efficient.

Difference Between Single-Level Indexing and Multi-Level Indexing:

Single-Level Indexing | Multi-Level Indexing
Contains only one level of index entries | Contains multiple levels of index entries, with each level pointing to the next
Works well for small data sets | More efficient for large data sets that cannot fit into a single index
Requires fewer disk accesses, as it directly points to the data | Requires more disk accesses, because it involves multiple levels
Suitable for small or moderately sized datasets | Suitable for large datasets where a single index is insufficient
Requires less storage, as there's only one index level | Requires more storage, as multiple index levels are needed
Simple to implement and manage | More complex due to multiple levels of indexing and management
Example: a small employee database indexed by employee ID | Example: a large employee database indexed by employee ID, with multiple index levels

8) Compare DDL and DML with the help of an example

DDL (Data Definition Language) | DML (Data Manipulation Language)
Defines the structure of the database | Manipulates the data in the database
CREATE, ALTER, DROP, TRUNCATE | INSERT, UPDATE, DELETE, SELECT
Works on the database schema (structure) | Works on table content (data)
Auto-committed (changes are permanent immediately) | Not auto-committed (requires COMMIT to save changes)
Cannot be rolled back | Can be rolled back (if not committed)
CREATE TABLE STUDENT (...) | INSERT INTO STUDENT (...)

Example:
DDL Example
Create a table called STUDENT:
CREATE TABLE STUDENT (
Roll_No INT,
Name VARCHAR(50),
Age INT
);
This defines the structure of the table.

DML Example
Insert a record into the STUDENT table:
INSERT INTO STUDENT (Roll_No, Name, Age)
VALUES (101, 'Anu', 20);
This adds data to the table.

9) Explain any three differences between Hash indexes and B+ tree indexes.***

Hash Index | B+ Tree Index
Uses a hash table to store key-value pairs | Uses a tree structure with nodes and levels
Good for exact match searches (e.g., = 101) | Supports range queries (e.g., >, <, BETWEEN)
No order among indexed values | Sorted order maintained in leaf nodes

10) VIEWS
Views in SQL
• A view is a virtual table created from one or more base
tables or other views.
• It does not store data physically like base tables do.
• You can query a view just like a table, but update
operations may have limitations.
Creating a View
• Syntax:
• CREATE VIEW view_name AS
• SELECT columns FROM table(s) WHERE condition;

Example 1: Simple View


Tables:
• EMPLOYEE(name, ssn, address, Dno, salary)
• WORKS_ON(Essn, Pno, Hours)
• PROJECT(Pname, Pnumber, Plocation, Dnum)
Goal: Get employee name and project name they work on.
CREATE VIEW WORKS_ON1 AS
SELECT name, Pname
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE ssn = Essn AND Pno = Pnumber;
Now, to find employees working on project 'x':
SELECT name FROM WORKS_ON1 WHERE Pname = 'x';

Example 2: View with Aggregate Functions


CREATE VIEW DEPT_INFO (Dept_name, No_of_emps, Total_sal)
AS
SELECT Dname, COUNT(*), SUM(salary)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber = Dno
GROUP BY Dname;
Advantages of Views
1. Simplifies complex queries.
2. Helps in data security by hiding certain columns or rows.
3. Automatically reflects changes from the base tables.

11) ASSERTION AND TRIGGER***

1. CREATE ASSERTION
• Used to add extra constraints (rules) that are not possible
using regular constraints like Primary Key, Unique, or Foreign
Key.
• Syntax:
• CREATE ASSERTION constraint_name CHECK (condition);
• Example:
Ensure that no employee earns more than their
department's manager:
• CREATE ASSERTION SALARY_CONSTRAINT
• CHECK (
• NOT EXISTS (
SELECT * FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
WHERE E.Salary > M.Salary
AND E.Dno = D.Dnumber
AND D.Mgr_ssn = M.Ssn
)
);
• Explanation:
o SALARY_CONSTRAINT is the name of the constraint.
o CHECK ensures that the condition must always be
true.
o If the condition becomes false, the database violates
the constraint.
o The NOT EXISTS clause is commonly used in
assertions to ensure that no invalid data exists.
o Assertions are checked by the DBMS continuously for
all database states.
• Difference from CHECK constraints:
o CHECK on attributes/domains/tuples: Checked only
during INSERT or UPDATE, so faster.
o ASSERTION: Checked on the entire database state,
used for more complex conditions.
2. CREATE TRIGGER
• Automatically perform actions when specific events occur
in the database, such as INSERT, DELETE, or UPDATE.
• Example Use Case: Alert the manager if an employee's
travel expenses go above a limit.
Triggers have 3 components:
1. Event:
o The database action that starts the trigger (INSERT,
DELETE, or UPDATE).
o Use BEFORE or AFTER to define when the trigger runs.
2. Condition:
o Optional.
o Defined using the WHEN clause.
o If true, the action is executed.
3. Action:
o The operation to be performed (SQL statements,
procedures, etc.).

Important Trigger Terms:


• :new → New value (used in INSERT and AFTER UPDATE).
• :old → Old value (used in DELETE and BEFORE UPDATE).

Types of Triggers:
1. Row-level Trigger:
o Runs once per row affected.
o Use: FOR EACH ROW.
o E.g., if 5 rows inserted → trigger runs 5 times.
2. Statement-level Trigger:
o Runs once per operation, no matter how many rows
are affected.
o No FOR EACH ROW used.

Trigger Example (Oracle Syntax):


Check if an employee’s salary is greater than their supervisor’s. If
so, notify the supervisor.
CREATE TRIGGER SALARY_VIOLATION
BEFORE INSERT OR UPDATE OF SALARY, SUPERVISOR_SSN ON EMPLOYEE
FOR EACH ROW
WHEN (
NEW.SALARY > (
SELECT SALARY
FROM EMPLOYEE
WHERE SSN = NEW.SUPERVISOR_SSN
)
)
INFORM_SUPERVISOR(NEW.SUPERVISOR_SSN, NEW.SSN);

Use Cases of Triggers:


• Enforcing business rules.
• Audit logging (recording changes).
• Automatic updates of related data.
• Monitoring changes in the database.

12) Difference Between Trigger and Constraint:

Constraint | Trigger
Applies to all existing and new data | Applies only to data added after the trigger is created
Fired on INSERT, UPDATE | Fired on INSERT, DELETE, UPDATE
Enforces simple rules | Performs actions or checks on events
Examples: NOT NULL, UNIQUE, FOREIGN KEY | Examples: alerting, logging, calculations

MODULE 4
1) What is the lossless join property of decomposition?
Why is it important
• When a relation R is decomposed into two or more sub-
relations (like R₁, R₂...),
the lossless join property ensures that no information is lost
during decomposition.
A decomposition of relation R into {R₁, R₂, ..., Rₙ} is lossless if:
R₁ ⨝ R₂ ⨝ ... ⨝ Rₙ = R
(Join of decomposed relations = original relation)
Why is it Important?
1. Prevents Data Loss
o Ensures original data can be fully recovered after decomposition.
2. Maintains Data Integrity
o No incorrect or incomplete data after joining the sub-relations.
3. Safe Normalization
o During normalization, we break tables to remove redundancy.
o Lossless join makes sure this process doesn’t affect data.
4. Without it, data may get lost or distorted when relations are joined again.

2) ALGORITHM FOR LOSSLESS JOIN PROPERTY***


Input:
• A universal relation R
• A decomposition D = {R₁, R₂, ..., Rₘ}
• A set of functional dependencies F
Output:
• Whether the decomposition has the lossless join property or
not
Steps:
Step 1: Create the matrix S
• Matrix has 1 row per relation Rᵢ in D
• 1 column per attribute Aⱼ in R
Step 2: Initialize matrix with symbols bᵢⱼ
• Each cell S(i, j) = bᵢⱼ → unique symbol for each cell
Step 3: Mark attributes present in each Rᵢ
• For every attribute Aⱼ present in Rᵢ, set S(i, j) = aⱼ
(each aⱼ is a fixed symbol for attribute Aⱼ)

Step 4: Apply Functional Dependencies


• Repeat until no more changes happen:
o For each FD X → Y in F:
▪ Find all rows in S that have the same symbols in the columns of X
▪ In those rows, make the symbols in the columns of Y the same:
▪ If any row has an "a" symbol, set all the others to that "a" symbol
▪ If no row has an "a", pick one "b" symbol and make all the others the same

Step 5: Check for Lossless Join


• If any row is entirely made of aⱼ symbols →
Lossless join property is preserved
• Otherwise →
Lossless join is not guaranteed
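The matrix test above translates almost line-for-line into code. A compact Python sketch (the encodings of "a" and "b" symbols as tuples are my own choices):

```python
def lossless_join(attrs, decomposition, fds):
    """Matrix (chase) test for the lossless join property.
    attrs: list of attribute names in R
    decomposition: list of sets of attributes (R1, R2, ...)
    fds: list of (X, Y) pairs, each a set of attributes."""
    col = {A: j for j, A in enumerate(attrs)}
    # Steps 1-3: b_ij everywhere, then a_j where attribute A_j is in R_i.
    S = [[('a', j) if A in Ri else ('b', i, j)
          for j, A in enumerate(attrs)]
         for i, Ri in enumerate(decomposition)]

    changed = True
    while changed:               # Step 4: apply FDs until nothing changes
        changed = False
        for X, Y in fds:
            xs = [col[A] for A in X]
            groups = {}          # rows that agree on all columns of X
            for row in S:
                groups.setdefault(tuple(row[j] for j in xs), []).append(row)
            for rows in groups.values():
                if len(rows) < 2:
                    continue
                for A in Y:      # equalize the Y columns in those rows
                    j = col[A]
                    syms = [row[j] for row in rows]
                    # Prefer an 'a' symbol; otherwise pick one 'b' symbol.
                    target = next((s for s in syms if s[0] == 'a'), syms[0])
                    for row in rows:
                        if row[j] != target:
                            row[j] = target
                            changed = True
    # Step 5: lossless iff some row consists entirely of 'a' symbols.
    return any(all(s[0] == 'a' for s in row) for row in S)

# R(A,B,C) with F = {B -> C}, decomposed into {A,B} and {B,C}: lossless.
print(lossless_join(['A', 'B', 'C'],
                    [{'A', 'B'}, {'B', 'C'}],
                    [({'B'}, {'C'})]))   # True
```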

3) Write an algorithm to find the closure of an attribute


Input:
• Attribute set X
• Set of functional dependencies F
Output:
• Closure of X (denoted as X⁺)

Step-by-Step Algorithm:
1. Initialize the closure:
o Let X⁺ = X
2. Repeat the following until no more attributes can be added
to X⁺:
o For each functional dependency Y → Z in F:
▪ If Y ⊆ X⁺, then add Z to X⁺
3. Return the final value of X⁺

Example:
Let F = {A → B, B → C, AC → D, D → E}
Find A⁺
Step 1: A⁺ = {A}
Step 2:
• A → B ⇒ A⁺ = {A, B}
• B → C ⇒ A⁺ = {A, B, C}
• AC → D ⇒ A, C ∈ A⁺ ⇒ A⁺ = {A, B, C, D}
• D → E ⇒ D ∈ A⁺ ⇒ A⁺ = {A, B, C, D, E}

Final A⁺ = {A, B, C, D, E}
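The same steps in Python (a sketch; each FD is represented as a pair of attribute sets):

```python
def closure(X, fds):
    """Compute X+ under a set of FDs given as (lhs, rhs) pairs of sets."""
    result = set(X)              # Step 1: X+ = X
    changed = True
    while changed:               # Step 2: repeat until no attribute is added
        changed = False
        for lhs, rhs in fds:
            # If Y ⊆ X+, add Z to X+ (the rule from the algorithm above).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result                # Step 3: return X+

# The example above: F = {A → B, B → C, AC → D, D → E}
F = [({'A'}, {'B'}), ({'B'}, {'C'}), ({'A', 'C'}, {'D'}), ({'D'}, {'E'})]
print(sorted(closure({'A'}, F)))   # ['A', 'B', 'C', 'D', 'E']
```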

4) What are the different anomalies that can occur in a poorly designed database? Provide examples for each.**
1. Insertion Anomaly
• Occurs when we cannot insert data into a table without
other unrelated data.

Example: Suppose we have a table:

StudentName Course Instructor

Rahul DBMS Dr. Ravi

• Now, if we want to add a new course (e.g., "OS") without any student enrolled yet, we can’t insert it because StudentName is required.

2. Deletion Anomaly
• Occurs when deleting one piece of data causes
unintentional loss of other valuable data.

Example:
StudentName Course Instructor

Rahul DBMS Dr. Ravi

• If Rahul is the only student enrolled in DBMS, and we delete his record, we also lose information about the course DBMS and instructor Dr. Ravi.

3. Update Anomaly
• Occurs when data is duplicated, and an update in one
place requires updates in multiple places.

Example:

StudentName Course Instructor

Rahul DBMS Dr. Ravi

Nisha DBMS Dr. Ravi

• If Dr. Ravi’s name or course info changes, we need to update it in multiple rows. Missing one row causes inconsistent data.

5) Define the term functional dependency. Why are some functional dependencies called trivial?**
• A functional dependency (FD) is a relationship between
two sets of attributes in a relation.
• It is written as A → B, which means:
o Attribute B is functionally dependent on A.
o If two rows have the same value for A, they must also
have the same value for B.

Example:
If RollNo → Name, then each RollNo is linked to one unique Name.

Trivial Functional Dependency


• A functional dependency A → B is trivial if B is a subset of A.

Examples of trivial FDs:


• A→A
• AB → A
• ABC → AB
These are always true and do not provide new information, so
they are called trivial.
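Since triviality is just a subset test, it is easy to check mechanically; a small illustration (the function name is mine):

```python
def is_trivial(lhs, rhs):
    """An FD lhs → rhs is trivial exactly when rhs ⊆ lhs."""
    return rhs <= lhs            # set subset test

print(is_trivial({'A', 'B'}, {'A'}))   # True:  AB → A is trivial
print(is_trivial({'A'}, {'B'}))        # False: A → B is non-trivial
```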

6) Normalization:
• Normalization is the process of organizing data in a
database to:
o Remove data redundancy (repetition)
o Avoid insertion, deletion, and update anomalies
o Ensure data integrity

Advantages:
• Removes duplicate data
• Ensures logical data structure
• Makes query processing more efficient
7)ARMSTRONG AXIOMS****

Armstrong’s Axioms
• Introduced by William W. Armstrong.
• Used to infer all functional dependencies (FDs) from
a given set.
• The complete set of FDs derived from a given set F is
called the closure of F (denoted as F⁺).
• These rules are sound (always correct) and complete
(can derive all true FDs).

Primary Axioms (Basic Rules)


1. Reflexivity
o If B ⊆ A, then A → B
(A set always determines its subsets)
Example: If A = {Roll, Name}, then A → Roll
2. Augmentation
o If A → B, then AC → BC (for any attribute set C)
(Adding the same attributes to both sides keeps
the FD valid)
Example: If A → B, then AD → BD
3. Transitivity
o If A → B and B → C, then A → C
(Like in algebra, dependencies can be chained)
Example: A → B, B → C ⟹ A → C

Secondary Rules (Derived Rules)


4. Union
o If A → B and A → C, then A → BC
(Multiple FDs from same LHS can be combined)
Example: A → D, A → E ⟹ A → DE
5. Decomposition
o If A → BC, then A → B and A → C
(A single FD can be split into two separate ones)
Example: A → XY ⟹ A → X and A → Y
6. Composition
o If A → B and C → D, then AC → BD
(Merge separate FDs on different sets)
Example: A → M, B → N ⟹ AB → MN
7. Pseudo Transitivity
o If A → B and BC → D, then AC → D
(Used when the RHS of one FD is part of the LHS
of another)
Example: A → B, BD → E ⟹ AD → E
These axioms are used to compute:
• Closure of attributes (A⁺)
• Minimal cover of FDs
• Canonical cover, and
• Normalization of relations
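In practice, checking whether a candidate FD X → Y follows from F does not require chaining axiom applications by hand: because the axioms are sound and complete, F implies X → Y exactly when Y ⊆ X⁺. A small sketch:

```python
def closure(X, fds):
    """Compute X+ under FDs given as (lhs, rhs) pairs of attribute sets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F ⊨ X → Y  iff  Y ⊆ X+ (equivalent to deriving it via the axioms)."""
    return rhs <= closure(lhs, fds)

F = [({'A'}, {'B'}), ({'B'}, {'C'})]
print(implies(F, {'A'}, {'C'}))   # True  (transitivity: A → B, B → C ⟹ A → C)
print(implies(F, {'C'}, {'A'}))   # False
```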
8) 1NF, 2NF, 3NF, BCNF*****
First Normal Form (1NF)
• A table is in 1NF if:
o It has only atomic (indivisible) values in each cell.
o There are no repeating groups (no multiple values in a
single cell).
o Every row has a unique identifier (Primary Key).
o It eliminates multivalued and composite attributes.

Conditions:
• Each column should contain unique and simple values.
• No arrays or sets allowed.
• Separate tables for related data groups.
Second Normal Form (2NF)
• A table is in 2NF if:
o It is already in 1NF.
o Every non-prime attribute is fully functionally
dependent on the entire primary key, not on part of it.

Important Terms:
• Full functional dependency: Attribute depends on the
whole composite key.
• Non-prime attribute: Not part of any candidate key.
Third Normal Form (3NF)
• A table is in 3NF if:
o It is already in 2NF.
o There is no transitive dependency between non-prime
attributes and the primary key.

Transitive Dependency:
• When A → B and B → C, then A → C is transitive.
• This should be removed in 3NF.

Boyce-Codd Normal Form (BCNF)


• BCNF is a stricter form of 3NF (also called 3.5NF).
• A table is in BCNF if:
o It is already in 3NF.
o For every functional dependency X → Y, X must be a
super key.

You might also like