Database management final notes
1) Aggregate Functions
Definition:
Aggregate functions are built-in SQL functions that perform calculations on a group of values
and return a single value. These functions are commonly used in data analysis and reporting.
They help to summarize data by applying mathematical operations.
Common Aggregate Functions:
Function Description
SUM() Adds all the values in a column
AVG() Calculates the average of values
COUNT() Counts the number of rows
MAX() Returns the highest value
MIN() Returns the lowest value
Usage with GROUP BY:
Aggregate functions are often used with the GROUP BY clause to group data based on one or more
columns.
Example:
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
Explanation:
This query shows the average salary of employees in each department.
2) Sub Queries
Definition:
A subquery (also called an inner query or nested query) is a query placed inside another SQL
query. It is used to return data that will be used in the main query. Subqueries help perform
operations that require intermediate results.
Types of Subqueries:
1. Scalar Subquery – returns a single value
2. Row Subquery – returns a single row
3. Table Subquery – returns multiple rows and columns
4. Correlated Subquery – depends on the outer query
Subquery Usage Locations:
• In the WHERE clause (commonly used)
• In the FROM clause (used as a temporary table)
• In the SELECT clause (returns single value for display)
Example:
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
Explanation:
The subquery calculates the average salary. The outer query then selects names of employees
who earn more than this average.
3) Joins and Its Types
Definition:
Joins are used to fetch data from two or more tables based on a related column between them. In
relational databases, data is often stored in separate tables, and joins help combine this data
logically.
Types of Joins:
1. INNER JOIN
Returns only the matching rows from both tables.
Example:
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;
2. LEFT JOIN (LEFT OUTER JOIN)
Returns all rows from the left table and matched rows from the right table. If no match is found,
NULL is shown.
Example:
SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;
3. RIGHT JOIN (RIGHT OUTER JOIN)
Returns all rows from the right table and matched rows from the left table.
Example:
SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;
4. FULL JOIN (FULL OUTER JOIN)
Returns all rows when there is a match in one of the tables. If no match is found in one table,
NULL is returned.
Note: Not all databases support FULL JOIN directly.
Why Use Joins?
• To retrieve data spread across tables
• To avoid duplication
• To maintain normalized structures
4) Normalization and Its Types
Definition:
Normalization is the process of organizing data in a database to reduce redundancy (duplicate
data) and improve data integrity. It involves dividing large tables into smaller, related tables.
Goals of Normalization:
• Eliminate redundant data
• Ensure logical data storage
• Improve query performance
• Make updates efficient and accurate
Types (Normal Forms):
1. First Normal Form (1NF):
• No repeating groups or arrays in a table
• Each column must contain atomic (indivisible) values
Example:
Incorrect: Courses: Math, English, Science
Correct: Create separate rows for each course
2. Second Normal Form (2NF):
• Must be in 1NF
• All non-key attributes should fully depend on the primary key
Example:
Split a table where course names depend only on student ID (partial dependency)
3. Third Normal Form (3NF):
• Must be in 2NF
• No transitive dependency (i.e., non-key fields should not depend on other non-key fields)
Example:
A student’s city depends on the ZIP code, not directly on the student. Move city and ZIP to a
separate table.
Benefits of Normalization:
• Avoids data duplication
• Saves storage space
• Improves data consistency
• Makes updates and inserts easier
5) Data Warehouse
Definition:
A data warehouse is a system used for reporting and data analysis. It collects data from different
sources, transforms it, and stores it in a structured format for business decision-making.
Main Features:
• Subject-Oriented: Organized by topic (e.g., sales, finance)
• Integrated: Data from multiple sources is combined
• Time-Variant: Contains historical data
• Non-Volatile: Data is read-only and does not change
Uses of Data Warehouses:
• Business intelligence
• Data mining
• Market research
• Trend analysis
• Forecasting
Example:
A retail company collects daily sales data from all its branches into a data warehouse. Managers
use this data to see which products are most sold during holidays.