Mock Interview Questions and Answers
In SQL, both INNER JOIN and LEFT JOIN are used to combine rows from two
or more tables based on a related column between them. However, they
differ in how they handle unmatched rows between the tables.
1. INNER JOIN:
INNER JOIN returns only the rows that have matching values in
both tables based on the join condition.
If there are duplicate values in the join column of either table,
the INNER JOIN will return all possible combinations of those
duplicates.
Example:
SELECT *
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;
2. LEFT JOIN:
LEFT JOIN returns all the rows from the left table (the first
table mentioned in the query) and the matched rows from the
right table. If there are no matching rows in the right table, it
returns NULL values for the columns of the right table.
If there are duplicate values in the join column of either table,
the LEFT JOIN will still return all rows from the left table, and
for each occurrence of a duplicate value, it will produce a
separate row with the corresponding matches from the right
table.
Example:
SELECT *
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;
In summary, INNER JOIN and LEFT JOIN treat duplicates the same way: every
matching pair of rows produces one output row. They differ only in how they
handle unmatched rows. INNER JOIN only returns matching rows, while
LEFT JOIN returns all rows from the left table and includes matching rows
from the right table, with NULL values for non-matching rows.
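To see how duplicates and unmatched rows behave, here is a minimal sketch in PostgreSQL syntax. The inline tables t1 and t2 and their id values are made up for illustration and are not part of the original answer.

```sql
-- Hypothetical sample data: id 1 appears twice in each table,
-- id 2 exists only in t1, id 3 exists only in t2.
WITH t1(id) AS (VALUES (1), (1), (2)),
     t2(id) AS (VALUES (1), (1), (3))
SELECT 'INNER JOIN' AS join_type, COUNT(*) AS row_count
FROM t1 INNER JOIN t2 ON t1.id = t2.id   -- 2 x 2 = 4 rows for id 1
UNION ALL
SELECT 'LEFT JOIN', COUNT(*)
FROM t1 LEFT JOIN t2 ON t1.id = t2.id;   -- the same 4 rows plus id 2 with NULLs = 5 rows
```

Both joins multiply the duplicates identically; the only extra row in the LEFT JOIN is the unmatched left row.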
2. Practical quest.: From the table restaurant_transactions containing the columns id, date and
final_bill, find out the highest revenue month in the year 2021.
Ans:-
1. Filter the data to include only transactions from the year 2021.
2. Group the data by month and calculate the total revenue for each
month.
3. Find the month with the highest total revenue.
SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(final_bill) AS total_revenue
FROM
restaurant_transactions
WHERE
EXTRACT(YEAR FROM date) = 2021
GROUP BY
EXTRACT(MONTH FROM date)
ORDER BY
total_revenue DESC
LIMIT 1;
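3. Practical quest.: From the above data set, find out the highest revenue month in each year.
Same approach as above, but group by both year and month, then keep the top-ranked month within each year instead of using LIMIT 1.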
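One possible way to write this is sketched below. It assumes the same restaurant_transactions(id, date, final_bill) table as the previous question; the CTE names and column aliases are illustrative choices.

```sql
WITH monthly_revenue AS (
    -- Total revenue per (year, month)
    SELECT
        EXTRACT(YEAR FROM date)  AS revenue_year,
        EXTRACT(MONTH FROM date) AS revenue_month,
        SUM(final_bill)          AS total_revenue
    FROM restaurant_transactions
    GROUP BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date)
),
ranked AS (
    -- Rank months within each year, highest revenue first
    SELECT
        revenue_year,
        revenue_month,
        total_revenue,
        RANK() OVER (PARTITION BY revenue_year ORDER BY total_revenue DESC) AS revenue_rank
    FROM monthly_revenue
)
SELECT revenue_year, revenue_month, total_revenue
FROM ranked
WHERE revenue_rank = 1
ORDER BY revenue_year;
```

RANK() returns every tied month if two months share the top revenue in a year; ROW_NUMBER() would return exactly one month per year.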
4. Theoretical quest.: What is the difference between the dense_rank() and rank() functions?
In SQL, both RANK() and DENSE_RANK() are window functions used to assign ranks to
rows within a partition of a result set based on the values of a specified column. However,
they differ in how they handle ties (rows with equal values) in the ranking process:
1. RANK() Function:
RANK() assigns a rank to each row within the partition.
If there are ties (rows with equal values), RANK() assigns the same rank to
each tied row, but leaves gaps in the ranking sequence for the next rank. For
example, if two rows tie for the first place, the next row would be ranked third
(not second).
The rank returned by RANK() is one plus the number of rows that come before
the row in the ordered partition, so it can jump by more than one after a tie.
2. DENSE_RANK() Function:
DENSE_RANK() also assigns ranks to rows within the partition.
If there are ties, DENSE_RANK() assigns the same rank to each tied row, but
does not leave gaps in the ranking sequence. It assigns consecutive ranks to
the tied rows, without any gaps.
The rank returned by DENSE_RANK() increments by one for each distinct
value in the ordered partition, unlike RANK(), so it never leaves
gaps in the ranking sequence when there are ties.
For example, if two rows are tied for second place, RANK() gives the next row rank 4
and skips rank 3, while DENSE_RANK() gives the next row rank 3, with no
gaps.
In summary, RANK() leaves gaps in the ranking sequence when there are ties, while
DENSE_RANK() assigns consecutive ranks without any gaps for tied rows.
5. Practical quest.: From two tables, one containing movies data and the other containing actors data
with movie_id as foreign key, find the actors who have not had any movie for 3 or more
years.
To find the actors who have not appeared in any movie for 3 or more years, you would typically
use a combination of SQL queries involving joins, date calculations, and filtering. Here's how
you can achieve this:
Assuming you have two tables: movies containing movie data and actors containing actor
data with movie_id as the foreign key.
You can start by joining the actors and movies tables based on the movie_id foreign key.
Then, you can calculate the maximum release date of each actor's movies and compare it with
the current date minus 3 years. If an actor's latest movie release date is earlier than the
calculated date, it means they haven't appeared in any movie for 3 or more years.
SELECT
actors.actor_id,
actors.actor_name
FROM
actors
LEFT JOIN
movies ON actors.movie_id = movies.movie_id
GROUP BY
actors.actor_id,
actors.actor_name
HAVING
MAX(movies.release_date) IS NULL OR
MAX(movies.release_date) <= DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
Explanation:
LEFT JOIN is used to join the actors and movies tables based on the movie_id.
GROUP BY is used to group the result set by actor_id and actor_name.
MAX(movies.release_date) calculates the maximum release date of each actor's
movies.
HAVING clause filters the result set to include actors whose maximum movie release
date is either NULL (indicating they have not appeared in any movie) or is less than
or equal to the date 3 years ago ( DATE_SUB(CURRENT_DATE(), INTERVAL 3
YEAR)).
This query will return the actor_id and actor_name of actors who have not appeared in
any movie for 3 or more years.
Other than these there were some random discussions about the query structure and CTEs.
Query structure:-
1. SELECT Clause:
The SELECT clause specifies which columns or expressions to
include in the query result set.
It is usually the first clause in an SQL query.
Example: SELECT column1, column2 FROM table_name;
2. FROM Clause:
The FROM clause specifies the table or tables from which to
retrieve data.
It comes after the SELECT clause.
Example: SELECT column1, column2 FROM table_name;
3. JOIN Clause:
The JOIN clause is used to combine rows from two or more
tables based on a related column between them.
It is used when data needs to be retrieved from multiple
tables.
Example: SELECT * FROM table1 JOIN table2 ON
table1.column_name = table2.column_name;
4. WHERE Clause:
The WHERE clause is used to filter rows based on specified
conditions.
It is optional but commonly used to restrict the number of
rows returned by the query.
Example: SELECT * FROM table_name WHERE condition;
5. GROUP BY Clause:
The GROUP BY clause is used to group rows that have the
same values into summary rows.
It is typically used with aggregate functions (e.g., COUNT,
SUM, AVG) to perform calculations on groups of rows.
Example: SELECT column1, SUM(column2) FROM table_name
GROUP BY column1;
6. HAVING Clause:
The HAVING clause is used in combination with the GROUP BY
clause to filter group rows based on specified conditions.
It is similar to the WHERE clause but operates on grouped
rows rather than individual rows.
Example: SELECT column1, SUM(column2) FROM table_name
GROUP BY column1 HAVING condition;
7. ORDER BY Clause:
The ORDER BY clause is used to sort the result set based on
one or more columns.
It can sort in ascending (default) or descending order.
Example: SELECT * FROM table_name ORDER BY column_name
ASC|DESC;
8. LIMIT Clause:
The LIMIT clause is used to restrict the number of rows
returned by the query.
It is commonly used with ORDER BY to retrieve a subset of
rows.
Example: SELECT * FROM table_name LIMIT 10;
CTEs:-
CTE stands for Common Table Expression. It's a temporary named result set that
you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs
are particularly useful for making complex queries more readable and
manageable by breaking them down into smaller, logical parts.
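As a small illustration, the sketch below names an intermediate result once and references it twice. The orders(customer_id, amount) table is a hypothetical example, not a table from the interview.

```sql
-- Hypothetical table: orders(customer_id, amount)
WITH customer_totals AS (
    -- Named intermediate result: total spend per customer
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, total_spent
FROM customer_totals
-- The CTE can be referenced again inside the same statement
WHERE total_spent > (SELECT AVG(total_spent) FROM customer_totals);
```

Without the CTE, the aggregation would have to be repeated as two separate subqueries.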
When working with SQL joins, handling NULL values can be important,
especially when dealing with outer joins where there might be unmatched
rows. Here's how you can handle NULLs in different types of joins:
1. INNER JOIN:
In an INNER JOIN, only the rows that have matching values in
both tables are returned.
If there are NULL values in the join columns of either table,
those rows will not be included in the result set.
So, there's no specific handling of NULLs needed in an INNER
JOIN because NULLs are implicitly excluded from the result
set.
2. LEFT JOIN:
In a LEFT JOIN, all the rows from the left table (the first table
mentioned in the query) are returned, along with matching
rows from the right table.
If there are no matching rows in the right table, NULL values
are returned for the columns of the right table.
You can use the IS NULL or IS NOT NULL operators to check for
NULL values in the columns from the right table.
Example:
-- The IS NULL condition keeps only table1 rows with no match in table2
SELECT * FROM table1 LEFT JOIN table2 ON table1.column_name =
table2.column_name WHERE table2.column_name IS NULL;
3. RIGHT JOIN:
In a RIGHT JOIN, all the rows from the right table (the second
table mentioned in the query) are returned, along with
matching rows from the left table.
If there are no matching rows in the left table, NULL values
are returned for the columns of the left table.
Similarly, you can use the IS NULL or IS NOT NULL operators to
check for NULL values in the columns from the left table.
4. FULL OUTER JOIN:
In a FULL OUTER JOIN, all rows from both tables are returned,
with NULL values for columns that do not have a match in the
other table.
You can use IS NULL or IS NOT NULL operators to filter rows
based on NULL values in the joined columns.
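The sketch below shows one way to label matched and unmatched rows after a FULL OUTER JOIN. It reuses the placeholder names table1, table2 and column_name from the examples above; the match_status labels are made up. MySQL does not support FULL OUTER JOIN, so this assumes a database such as PostgreSQL or SQL Server.

```sql
SELECT
    -- Take the join value from whichever side is present
    COALESCE(t1.column_name, t2.column_name) AS column_name,
    CASE
        WHEN t2.column_name IS NULL THEN 'only in table1'
        WHEN t1.column_name IS NULL THEN 'only in table2'
        ELSE 'in both'
    END AS match_status
FROM table1 AS t1
FULL OUTER JOIN table2 AS t2
    ON t1.column_name = t2.column_name;
```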
To calculate a rolling average over a 5-day period from a given table, you
can achieve this using a window function in SQL, provided that your
database system supports it (e.g., PostgreSQL, MySQL 8.0+, SQL Server,
etc.).
SELECT
date_column,
value_column,
AVG(value_column) OVER (ORDER BY date_column ROWS BETWEEN 4
PRECEDING AND CURRENT ROW) AS rolling_avg
FROM
your_table;
Window frames:-
SELECT
department_id,
MAX(salary) AS second_highest_salary
FROM (
SELECT
department_id,
salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
AS salary_rank
FROM
your_table
) ranked_salaries
WHERE
salary_rank = 2
GROUP BY
department_id;
This query will give you the second-highest salary within each
department. Because it uses DENSE_RANK(), ties for the highest salary within a
department share rank 1, so the query still returns the second-highest distinct salary.
With ROW_NUMBER(), a tie for the highest salary would be returned instead.
If you use the frame “UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING”, the window covers every row, so an aggregate gives
a SINGLE value based on all input rows, or on all rows of the PARTITION if one is used.
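A short sketch of the difference between the default frame and the full frame follows. It assumes the your_table(department_id, salary) placeholder used earlier; the output column names are illustrative.

```sql
SELECT
    department_id,
    salary,
    -- Default frame with ORDER BY: a running total up to the current row
    SUM(salary) OVER (
        PARTITION BY department_id
        ORDER BY salary
    ) AS running_total,
    -- Full frame: the same department total repeated on every row
    SUM(salary) OVER (
        PARTITION BY department_id
        ORDER BY salary
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS department_total
FROM your_table;
```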
Ranking Functions:-
Note:- Make sure you specify the column name in the LEAD and LAG functions.
Very important: also specify how many rows you are leading or lagging by as the second argument,
for example LEAD(new_id, 2) or LAG(new_id, 2). If it is omitted, the offset defaults to 1.
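A minimal sketch of both functions with an offset of 2. It assumes a table your_table with the new_id column mentioned above; ordering by new_id is an assumption made for illustration.

```sql
SELECT
    new_id,
    -- Value from two rows earlier; NULL for the first two rows
    LAG(new_id, 2)  OVER (ORDER BY new_id) AS two_rows_back,
    -- Value from two rows later; NULL for the last two rows
    LEAD(new_id, 2) OVER (ORDER BY new_id) AS two_rows_ahead
FROM your_table;
```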
To find the nth highest salary using dense rank, you can use the DENSE_RANK() window
function inside a CTE. Here's how you can do it:
WITH ranked_salaries AS (
SELECT
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_dense_rank
FROM
your_table
)
SELECT
salary
FROM
ranked_salaries
WHERE
salary_dense_rank = @n;
This query will return the salary corresponding to the nth dense rank value specified by @n.
Adjust the table name and column names as per your actual database schema.
SELECT DISTINCT
department_id,
NTH_VALUE(salary, @n) OVER (PARTITION BY department_id ORDER BY salary
DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS nth_salary
FROM
your_table;
This query will return the nth salary value for each department. The explicit frame
matters: with the default frame, rows before the nth one would get NULL. Adjust the
table name and column names as per your actual database schema.
Find the employees who work on the weekends for each company
Query :-
SELECT
company_name,
employee_name
FROM
employees
WHERE
DAYOFWEEK(work_day) IN (1, 7); -- MySQL: 1 = Sunday, 7 = Saturday
Hemanth document too to document all of the
questions:
order of execution:-
1. FROM: This clause specifies the tables from which the data will be
retrieved.
2. WHERE: This clause filters the rows returned by the FROM clause
based on the specified conditions.
3. GROUP BY: If grouping is specified, the rows are grouped based on
the columns specified in this clause.
4. HAVING: This clause filters the grouped rows based on the
specified conditions.
5. SELECT: This clause selects the columns that will be included in the
result set.
6. ORDER BY: This clause sorts the result set based on the specified
columns.
7. LIMIT / OFFSET: These clauses are used to limit the number of
rows returned or to skip a certain number of rows.
It's important to note that not all queries will include all of these clauses, and the
order of execution may vary based on the specific query. For example, if you're
not grouping data, the GROUP BY and HAVING clauses won't be present.
Similarly, if you're not sorting data, the ORDER BY clause won't be present.
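One practical consequence of this order is that a SELECT alias is not visible in WHERE but is visible in ORDER BY. The sketch below reuses the restaurant_transactions table from earlier; the 1.1 multiplier and the bill_with_tax alias are made up for illustration.

```sql
-- Fails in most databases: WHERE runs before SELECT, so the alias does not exist yet.
-- SELECT id, final_bill * 1.1 AS bill_with_tax
-- FROM restaurant_transactions
-- WHERE bill_with_tax > 100;

-- Works: repeat the expression in WHERE; ORDER BY runs after SELECT and can use the alias.
SELECT id, final_bill * 1.1 AS bill_with_tax
FROM restaurant_transactions
WHERE final_bill * 1.1 > 100
ORDER BY bill_with_tax DESC;
```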
In a cross join, each row from Table A is combined with every row from
Table B. With 3 rows in each table, the output has 3 x 3 = 9 rows.
Cross joins are not as commonly used as other types of joins because they
can lead to large result sets, especially when dealing with tables that
contain a large number of rows. However, they can be useful in certain
scenarios, such as when you need to generate all possible combinations of
rows from two tables.
It's important to exercise caution when using cross joins, as they can
easily lead to unintended consequences if not used carefully.
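A minimal cross join sketch in PostgreSQL syntax. The inline tables sizes and colors and their values are hypothetical.

```sql
WITH sizes(size_label) AS (VALUES ('S'), ('M'), ('L')),
     colors(color_name) AS (VALUES ('red'), ('green'), ('blue'))
SELECT size_label, color_name
FROM sizes
CROSS JOIN colors;   -- every size paired with every color: 3 x 3 = 9 rows
```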
Relational databases organize data into tables, where each table consists
of rows and columns. They follow the relational model based on the
principles of relational algebra and set theory. Relationships between
tables are established using keys (e.g., primary keys, foreign keys),
enabling efficient querying and data integrity enforcement.
This query will give you the profession with the highest average income in
each category. Adjust the table and column names according to your
database schema.
In above table find percentage of female users in each
profession compared to total users :-
To find the percentage of female users in each profession compared to
the total users, you can use a SQL query with a combination of
aggregation functions and window functions. Assuming you have a table
named users with columns profession, gender, and user_id, the following
query can be used:
In this query:
1. We first calculate the count of female users, the total count of users,
and the total count of users per profession using the COUNT()
function along with a CASE statement to count only female users.
2. We use a window function ROW_NUMBER() to assign a rank to each
profession based on the total count of users. The PARTITION BY
clause partitions the data by profession, and the ORDER BY clause
orders the data by the total count of users in descending order.
3. We then join the result of the first CTE with the result of the second
CTE to get the total count of users per profession.
4. Finally, we select the profession, gender, and the percentage of
female users in each profession compared to the total users, where
the gender is female and the rank is 1.
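The original query is not shown here, so the following is a simpler sketch that does not follow the steps above exactly. It assumes the users(profession, gender, user_id) table, that gender is stored as 'Female', and that the percentage is measured against all users in the table.

```sql
SELECT
    profession,
    -- Count only female users within each profession
    SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END) AS female_users,
    -- SUM(COUNT(*)) OVER () is the total number of users across all professions
    ROUND(
        100.0 * SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END)
              / SUM(COUNT(*)) OVER (),
        2
    ) AS female_pct_of_all_users
FROM users
GROUP BY profession;
```

To measure against the users of each profession instead, divide by COUNT(*) rather than the windowed total.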
1. ROW_NUMBER():
Assigns a unique integer to each row within the partition.
The numbering starts at 1 for the first row in the partition and
increments by 1 for each subsequent row.
It does not handle ties; each row gets a distinct number even
if multiple rows have the same values.
2. RANK():
Assigns a unique rank to each distinct value within the
partition.
If there are ties (i.e., rows with the same values), they receive
the same rank, and the next rank is skipped.
For example, if two rows tie for the first position, the next row
receives a rank of 3, not 2.
3. DENSE_RANK():
Similar to RANK(), assigns a unique rank to each distinct value
within the partition.
Handles ties like RANK(), but it does not skip ranks. Instead, it
assigns consecutive ranks to tied rows.
For example, if two rows tie for the first position, the next row
receives a rank of 2, not 3.
Here's a comparison using an example:
Student Score
Alice 90
Bob 85
Carol 90
David 80
Eve 85
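The three functions can be compared on the data above with a query like this sketch, written in PostgreSQL syntax. The student name is added as a tie-breaker for ROW_NUMBER() so the result is deterministic; that choice is an assumption.

```sql
WITH scores(student, score) AS (
    VALUES ('Alice', 90), ('Bob', 85), ('Carol', 90), ('David', 80), ('Eve', 85)
)
SELECT
    student,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC, student) AS row_num,    -- 1, 2, 3, 4, 5
    RANK()       OVER (ORDER BY score DESC)          AS rnk,        -- 1, 1, 3, 3, 5
    DENSE_RANK() OVER (ORDER BY score DESC)          AS dense_rnk   -- 1, 1, 2, 2, 3
FROM scores
ORDER BY score DESC, student;   -- Alice, Carol, Bob, Eve, David
```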
1. Subqueries:
Filtering and Aggregation: Subqueries are often used to filter results based
on conditions that cannot be directly expressed in a WHERE clause. For
example, you might want to filter rows based on the result of another query.
Nested Queries: Subqueries allow you to nest one query inside another. This
is useful when you need to perform a calculation or retrieve data based on the
result of another query.
Subquery Expressions: Subqueries can also be used to return a single value
or a list of values that can be used in various parts of a query, such as
SELECT, WHERE, HAVING, and even as part of an expression.
Correlated Subqueries: These are subqueries where the inner query
references a column from the outer query. Correlated subqueries can be used
to perform row-by-row processing or to filter data based on values from the
outer query.
2. Window Functions:
Analytical Calculations: Window functions are used to perform calculations
across a set of rows related to the current row. They allow you to calculate
running totals, moving averages, rank items, and perform other analytical
tasks without grouping the result set.
Avoiding Subqueries: Window functions can often replace subqueries and
are generally more efficient and easier to read. They allow you to achieve
similar results without the need for nested queries.
Partitioning Data: Window functions partition the result set into groups of
rows based on specified criteria, such as grouping by a particular column. This
allows you to perform calculations within each partition separately.
ORDER BY Clause: Window functions also allow you to specify an order for
the rows within each partition, which is useful for calculating running totals or
finding the "top N" items within each group.
In summary, while subqueries and window functions can both be used for similar tasks, they
have distinct capabilities and use cases. Subqueries are primarily used for filtering,
aggregation, and nested queries, while window functions are used for analytical calculations
and partitioning data within a query result set. Depending on the specific requirements of
your query, you may choose to use one or both of these features to achieve your desired
result.
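As an illustration, the same question ("which salaries are above their department's average?") can be answered both ways. This sketch assumes the your_table(department_id, salary) placeholder used earlier.

```sql
-- Correlated subquery: the inner query references e.department_id from the outer row
SELECT e.department_id, e.salary
FROM your_table AS e
WHERE e.salary > (
    SELECT AVG(d.salary)
    FROM your_table AS d
    WHERE d.department_id = e.department_id
);

-- Window function: compute the department average alongside each row, then filter
SELECT department_id, salary
FROM (
    SELECT
        department_id,
        salary,
        AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
    FROM your_table
) AS with_avg
WHERE salary > dept_avg;
```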
Running Totals:-
You can find running totals using window functions in SQL. Here's an
example of how you can achieve this:
Let's say you have a table named sales with columns date and amount,
and you want to calculate the running total of sales amount over time.
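A minimal sketch of such a query, assuming the sales(date, amount) table described above:

```sql
SELECT
    date,
    amount,
    -- Sum of every row from the start up to and including the current row
    SUM(amount) OVER (
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales
ORDER BY date;
```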
This query will return a result set with three columns: date, amount, and
running_total . The running_total column will contain the running total of
sales amount up to each date.
In this result, the running_total column contains the cumulative sum of the
amount column up to each date.
Similarly ,
To calculate the running total and percentage of running total for each
month in the sales table, you can use window functions along with
subqueries to achieve this. Here's how you can do it:
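The original query is not shown, so this is one possible sketch. It assumes the sales(date, amount) table and interprets the percentage as the running total's share of the overall total; the column aliases are illustrative.

```sql
WITH monthly AS (
    -- One row per calendar month with that month's total amount
    SELECT
        EXTRACT(YEAR FROM date)  AS sales_year,
        EXTRACT(MONTH FROM date) AS sales_month,
        SUM(amount)              AS amount
    FROM sales
    GROUP BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date)
)
SELECT
    sales_year,
    sales_month,
    amount,
    SUM(amount) OVER (ORDER BY sales_year, sales_month) AS running_total,
    -- Running total as a percentage of the total over all months
    100.0 * SUM(amount) OVER (ORDER BY sales_year, sales_month)
          / SUM(amount) OVER () AS pct_of_total
FROM monthly
ORDER BY sales_year, sales_month;
```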
Explanation:
This query will give you the month, amount, monthly running total, and
percentage of running total for each month in the sales table.
Moving average:-
To calculate the moving average of a specific column over a certain window of
rows, you can use the window function AVG() along with the ROWS or range
clause in SQL. Here's an example of how to calculate the moving average for a
column named amount over a window of the last 3 rows:
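A minimal sketch, assuming the same sales(date, amount) table used for the running total:

```sql
SELECT
    date,
    amount,
    -- Average of the current row and the two rows before it
    AVG(amount) OVER (
        ORDER BY date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_average
FROM sales
ORDER BY date;
```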
In this query:
This query will return a result set with three columns: date, amount, and
moving_average . The moving_average column will contain the moving
average of the amount column over the specified window of rows.
In this result, the moving_average column contains the moving average of the
amount column over the window of the last 3 rows, including the current row.
Note that for the first two rows, where there are not enough preceding rows to
fill the window, AVG() averages only the rows available so far rather than returning NULL. Adjust the window size and
column names according to your specific requirements and database schema.
Find Moving Average for each month and percentage of the total for each month :-
To calculate the moving average for each month and the percentage of the total
for each month, you can use a combination of window functions, subqueries, and
common table expressions (CTEs) in SQL. Below is an example of how you can
achieve this:
Let's say you have a table named sales with columns product, category,
and sales_amount, and you want to find the top 3 products within each
category based on their sales amount.
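A minimal sketch, assuming the sales(product, category, sales_amount) table described above; sales_rank and ranked are illustrative names.

```sql
SELECT product, category, sales_amount
FROM (
    SELECT
        product,
        category,
        sales_amount,
        -- Number products within each category, highest sales first
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales_amount DESC) AS sales_rank
    FROM sales
) AS ranked
WHERE sales_rank <= 3
ORDER BY category, sales_rank;
```

ROW_NUMBER() returns exactly three rows per category even when amounts tie; DENSE_RANK() would keep all tied products instead.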
In this query:
This query will give you the top 3 products within each category based on their sales amount.
Views:
DDL: Views are created using the DDL command CREATE VIEW. This
command defines the structure of the view based on the SELECT
query provided.
DML: Views can be queried using DML commands like SELECT. They
do not store data themselves but provide a way to query data
stored in tables.
DCL: Permissions can be granted or revoked on views using DCL
commands like GRANT and REVOKE. This allows controlling access
to the underlying tables through the views.
DQL: Views are primarily used for querying data, which falls under
DQL. They allow users to retrieve data from tables in a more
convenient or secure manner.
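The commands can be sketched end to end as follows. The view reuses the restaurant_transactions table from earlier; the view name monthly_revenue_2021 and the role reporting_user are hypothetical.

```sql
-- DDL: define the view from a SELECT query
CREATE VIEW monthly_revenue_2021 AS
SELECT
    EXTRACT(MONTH FROM date) AS revenue_month,
    SUM(final_bill)          AS total_revenue
FROM restaurant_transactions
WHERE EXTRACT(YEAR FROM date) = 2021
GROUP BY EXTRACT(MONTH FROM date);

-- DQL: query the view like a table
SELECT * FROM monthly_revenue_2021 ORDER BY total_revenue DESC;

-- DCL: give a role read access to the view only
GRANT SELECT ON monthly_revenue_2021 TO reporting_user;
```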
To delete duplicate rows from a table, you can use the DELETE statement along
with a common table expression (CTE) to identify the duplicate rows. Here's a
general approach:
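The sketch below is written for PostgreSQL. It assumes your_table has a unique id column and that two rows count as duplicates when col1 and col2 match; all of these names are hypothetical.

```sql
WITH ranked AS (
    SELECT
        id,
        -- Number the rows within each group of duplicates; the lowest id gets 1
        ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY id) AS row_num
    FROM your_table
)
DELETE FROM your_table
WHERE id IN (SELECT id FROM ranked WHERE row_num > 1);   -- keep the first row of each group
```

Running the CTE as a plain SELECT first is a cheap way to check which rows would be deleted.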
Asked about creating multiple tables under a database and how it
will perform:-
1. Creating Tables:
Use the CREATE TABLE statement to create individual tables
within your database.
Specify the table name, along with the columns and their data
types, constraints, indexes, and any other relevant attributes.
Here's a basic example of creating two tables in a SQL
database:
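A minimal sketch with two related tables. The table names, columns and types here are illustrative assumptions.

```sql
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date  DATE NOT NULL,
    -- Each order must point at an existing customer
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
```

The parent table is created first because the foreign key in orders references it.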
Let's say you have a table named orders with columns order_id and
order_date, and you want to find consecutive orders based on the order
date.
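-- Reconstructed: the original query was truncated. This version assumes the goal is to
-- find customers who placed orders in 12 consecutive calendar months.
WITH MonthlyOrders AS (
SELECT
CustomerID,
EXTRACT(YEAR FROM OrderDate) AS year,
EXTRACT(MONTH FROM OrderDate) AS month,
MIN(OrderDate) AS first_order,
MAX(OrderDate) AS last_order
FROM
Orders
GROUP BY
CustomerID,
EXTRACT(YEAR FROM OrderDate),
EXTRACT(MONTH FROM OrderDate)
),
ConsecutiveMonths AS (
SELECT
CustomerID,
first_order,
last_order,
-- Months in an unbroken run share the same month_rank value
(year * 12 + month) - ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY year, month) AS month_rank
FROM
MonthlyOrders
)
SELECT
CustomerID,
MIN(first_order) AS start_date,
MAX(last_order) AS end_date
FROM
ConsecutiveMonths
GROUP BY
CustomerID, month_rank
HAVING COUNT(*) = 12 -- use >= 12 to include longer runs
ORDER BY
CustomerID, start_date;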
To calculate the running average of a restaurant's orders for the next 3 days
including the current day, you can use a window function along with a suitable
date range condition in SQL.
WITH DailyOrders AS (
SELECT
order_date,
restaurant_id,
order_count,
AVG(order_count * 1.0) OVER (PARTITION BY restaurant_id ORDER BY
order_date ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS
running_average
FROM
orders
WHERE
order_date BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL '3 days'
)
SELECT
order_date,
restaurant_id,
order_count,
running_average -- AVG() uses only the rows in the frame, so the last days are not divided by 3
FROM
DailyOrders
ORDER BY
restaurant_id, order_date;