February SQL Questions Compiled

Uploaded by subhani
Copyright © All Rights Reserved

PROBLEM STATEMENT: Calculate the running total of the amount column in the transactions table.

Table: transactions

id amount
1 10
2 20
3 30
4 40
5 50
6 60
7 70
8 80
9 90
10 100

MENTAL APPROACH:
1. For each row, add its amount to the amounts of all the rows above it (all rows with a smaller id).

USING CORRELATED SUBQUERY:


A correlated subquery in SQL is a subquery that refers to columns of the outer query, so the inner query
depends on the outer query and is (logically) executed once for each row of the outer query. Its result can
be used in the SELECT list, as in this problem, or to filter the outer query in a WHERE clause.

QUERY:
QUERY EXPLANATION:
1. We select the id and the amount, and calculate the running total of the amount with a subquery.
For each row of the outer query, the inner query sums the amount over all rows whose id (t2.id) is
less than or equal to the id of the outer row (t1.id).
This acts as a loop: for each row, the subquery calculates the running total, then the process moves
to the next id of the outer query, evaluates the WHERE condition again, and continues until the
last row.
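The original query is not reproduced in this text. As a minimal sketch (not the author's exact query), assuming SQLite run through Python's built-in sqlite3 module and the transactions table shown above, the correlated subquery could look like this:

```python
import sqlite3

# In-memory SQLite database holding the transactions table from the problem
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions (id, amount) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 11)],  # ids 1..10, amounts 10..100
)

# Correlated subquery: the inner SELECT refers to t1.id from the outer query,
# so it is evaluated once per outer row.
rows = conn.execute("""
    SELECT t1.id,
           t1.amount,
           (SELECT SUM(t2.amount)
              FROM transactions AS t2
             WHERE t2.id <= t1.id) AS running_total
      FROM transactions AS t1
     ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)  # (id, amount, running_total)
```

Because the subquery rescans the table for every row, this approach does quadratic work; it is fine for small tables but the window-function version scales better.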

USING WINDOW FUNCTION:


QUERY:

QUERY EXPLANATION:
1. We SELECT id, amount, and SUM(amount).
2. SUM(amount) is used with an OVER clause containing ORDER BY id, so that for each id it sums the
amount of the current row and all the previous rows, which gives the running_total.
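A minimal sketch of the window-function version, again assuming SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The frame clause is written out explicitly here; it is an addition, not necessarily part of the author's query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions (id, amount) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 11)],
)

# SUM over every row from the first id up to and including the current id
rows = conn.execute("""
    SELECT id,
           amount,
           SUM(amount) OVER (
               ORDER BY id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
      FROM transactions
     ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

With only ORDER BY inside OVER, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal ORDER BY values as peers; since id is unique here, both forms give the same running total.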

OUTPUT:
id amount running_total
1 10 10
2 20 30
3 30 60
4 40 100
5 50 150
6 60 210
7 70 280
8 80 360
9 90 450
10 100 550
By Manish Kumar Chaudhary

PARETO PRINCIPLE: The Pareto Principle states that for many outcomes, roughly 80% of the consequences come
from 20% of the causes. It is also known as the 80/20 rule. In other words, a small portion of the inputs leads
to a large portion of the outcomes.
For example:
1. 80% of the productivity comes from 20% of the employees.
2. 80% of the sales come from 20% of the clients.
3. 80% of the sales come from 20% of the products or services.
4. 80% of the decisions made in a meeting are completed in 20% of the total meeting time.
5. 20% of the people hold 80% of the wealth.
6. In a company, the other 80% of employees together earn as much as the top 20% of employees.

PROBLEM STATEMENT: Find the products that together account for 80% of total sales (total revenue).
The dataset and video tutorial for this problem are by Ankit Bansal Sir.

MENTAL APPROACH:
1. Calculate the total sales and take 80% of that total.
2. Determine the sales of each individual product.
3. Sort the product sales in descending order, with the highest-selling products at the top.
4. Add up the product sales one by one (a running total) until it reaches 80% of the total sales.
The products obtained this way contribute 80% of the company's overall sales.

QUERY:

QUERY EXPLANATION:
1. The first CTE, "product_sales_cte," calculates sales for each product.
2. The second CTE, "running_total_cte," calculates the running total of product sales, starting from the
highest-selling product. This lets us see where the running total reaches 80% of the total revenue.
3. The SELECT statement queries the necessary fields and uses a WHERE clause to identify records where the
running total is less than 80% of the total revenue.
A subquery is used in the WHERE clause to determine 80% of the total revenue.
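The query itself is not reproduced in this text. Below is a minimal sketch of the same CTE structure, assuming SQLite via Python's sqlite3 module and a hypothetical orders(product_id, sales) table with made-up rows (the tutorial's dataset is not included here). It uses <= for the 80% cut-off and breaks sales ties by product_id so the running total is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order lines: (product_id, sales); not the tutorial's dataset
conn.execute("CREATE TABLE orders (product_id TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", 30), ("A", 20), ("B", 25), ("C", 12), ("D", 8), ("E", 5)],
)

rows = conn.execute("""
    WITH product_sales_cte AS (          -- total sales per product
        SELECT product_id, SUM(sales) AS product_sales
          FROM orders
         GROUP BY product_id
    ),
    running_total_cte AS (               -- running total, best sellers first
        SELECT product_id,
               product_sales,
               SUM(product_sales) OVER (
                   ORDER BY product_sales DESC, product_id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_sales
          FROM product_sales_cte
    )
    SELECT product_id, product_sales, running_sales
      FROM running_total_cte
     WHERE running_sales <= (SELECT 0.8 * SUM(sales) FROM orders)  -- 80% of revenue
     ORDER BY running_sales
""").fetchall()

print(rows)
```

With these made-up rows the total is 100, so the cut-off is 80: product A (running total 50) and product B (75) qualify, while adding C would push the total to 87.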
QUERY TO FIND WHAT % OF PRODUCTS CONTRIBUTE TO 80% OF TOTAL SALES:

QUERY EXPLANATION:
1. Here, we use the same CTEs as before, but the final SELECT query now also goes into a new, third CTE,
final_product_cte.
This gives us the list of products that account for 80% of total sales (total revenue).
2. In the final SELECT clause we COUNT the products in final_product_cte and divide that count by the total
number of distinct products (multiplied by 100 to express it as a percentage).
A subquery is used to count the total number of distinct products.
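A minimal sketch of the percentage query, assuming SQLite via Python's sqlite3 module and the same hypothetical orders(product_id, sales) table with made-up rows; it is not the author's query and will not reproduce the 22.18045 figure, which comes from the tutorial's dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order lines: (product_id, sales)
conn.execute("CREATE TABLE orders (product_id TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", 30), ("A", 20), ("B", 25), ("C", 12), ("D", 8), ("E", 5)],
)

pct = conn.execute("""
    WITH product_sales_cte AS (
        SELECT product_id, SUM(sales) AS product_sales
          FROM orders
         GROUP BY product_id
    ),
    running_total_cte AS (
        SELECT product_id,
               SUM(product_sales) OVER (
                   ORDER BY product_sales DESC, product_id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_sales
          FROM product_sales_cte
    ),
    final_product_cte AS (               -- products inside the 80% band
        SELECT product_id
          FROM running_total_cte
         WHERE running_sales <= (SELECT 0.8 * SUM(sales) FROM orders)
    )
    -- 100.0 forces real (not integer) division
    SELECT 100.0 * COUNT(*) / (SELECT COUNT(DISTINCT product_id) FROM orders) AS perct_product
      FROM final_product_cte
""").fetchone()[0]

print(pct)
```

Here 2 of the 5 made-up products fall inside the 80% band, so the sketch prints 40.0.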

OUTPUT:
perct_product
22.18045


PROBLEM STATEMENT: Identify projects that are at risk of going over budget. A project is considered to be
over budget if the cost of all employees assigned to the project is greater than the budget of the project.
You'll need to prorate the cost of the employees to the duration of the project. For example, if the budget for
a project that takes half a year to complete is $10K, then the total half-year salary of all employees assigned to
the project should not exceed $10K. Salary is defined on a yearly basis, so be careful how you calculate salaries
for projects that last less or more than one year.
Output a list of projects that are over budget with their project name, project budget, and prorated total
employee expense (rounded up to the next dollar amount).

Table: linkedin_projects

FIELD TYPE
id int
title varchar
budget int
start_date datetime
end_time datetime

Table: linkedin_emp_projects

FIELD TYPE
emp_id int
project_id int

Table: linkedin_employees

FIELD TYPE
id int
first_name varchar
last_name varchar
salary int

MENTAL APPROACH:
1. Join all the tables on their matching columns.
2. Determine the duration of each project.
3. Calculate each employee's monthly salary and, from it, the salary for the project period.
4. Sum these prorated salaries over all employees of each project.
5. Compare the budget with the total prorated salary.
The projects whose total prorated salary exceeds the budget are the at-risk projects.

QUERY:

QUERY EXPLANATION:
1. To calculate the employee expense per project, the CTE calc_cte joins all the tables together.
We use the DATEDIFF function to find the number of months each project lasts, divide the yearly salary
by 12 to get the monthly salary, and multiply it by the number of months.
The SUM function adds up these prorated salaries for each project.
2. With a SELECT statement, we obtain the required fields and use a WHERE condition to keep the records
where the budget is less than the total employee expense.
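A minimal sketch of the same idea, assuming SQLite via Python's sqlite3 module with hypothetical rows (not the dataset behind the output below). SQLite has no DATEDIFF, so the month count is computed as (end year × 12 + end month) minus (start year × 12 + start month), which counts month boundaries crossed the way SQL Server's DATEDIFF(MONTH, start, end) does. Rounding up uses an integer trick instead of CEIL(), which not every SQLite build includes. Column names follow the table listing above, including end_time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE linkedin_projects (id INTEGER, title TEXT, budget INTEGER,
                                    start_date TEXT, end_time TEXT);
    CREATE TABLE linkedin_emp_projects (emp_id INTEGER, project_id INTEGER);
    CREATE TABLE linkedin_employees (id INTEGER, first_name TEXT,
                                     last_name TEXT, salary INTEGER);
    -- Hypothetical sample rows
    INSERT INTO linkedin_projects VALUES
        (1, 'ProjectA', 10000, '2022-01-01', '2022-06-01'),  -- 5 months
        (2, 'ProjectB', 50000, '2022-01-01', '2023-01-01');  -- 12 months
    INSERT INTO linkedin_employees VALUES
        (1, 'Ann', 'Lee', 12000), (2, 'Bob', 'Roy', 13000), (3, 'Cy', 'Diaz', 30000);
    INSERT INTO linkedin_emp_projects VALUES (1, 1), (2, 1), (3, 2);
""")

rows = conn.execute("""
    WITH calc_cte AS (
        SELECT p.title,
               p.budget,
               -- yearly salaries / 12 * number of months in the project
               SUM(e.salary) / 12.0 * (
                   (CAST(strftime('%Y', p.end_time) AS INTEGER) * 12
                    + CAST(strftime('%m', p.end_time) AS INTEGER))
                 - (CAST(strftime('%Y', p.start_date) AS INTEGER) * 12
                    + CAST(strftime('%m', p.start_date) AS INTEGER))
               ) AS expense
          FROM linkedin_projects AS p
          JOIN linkedin_emp_projects AS ep ON ep.project_id = p.id
          JOIN linkedin_employees AS e ON e.id = ep.emp_id
         GROUP BY p.id, p.title, p.budget, p.start_date, p.end_time
    )
    SELECT title,
           budget,
           -- round up to the next dollar: integer part, plus 1 if a fraction remains
           CAST(expense AS INTEGER) + (expense > CAST(expense AS INTEGER)) AS employee_expense
      FROM calc_cte
     WHERE budget < expense
     ORDER BY title
""").fetchall()

print(rows)
```

ProjectA's two employees cost 25000 × 5 / 12 ≈ 10416.67, which rounds up to 10417 and exceeds the 10000 budget; ProjectB's prorated cost (30000) stays within its budget.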
FINAL OUTPUT:

title budget employee_expense


Project1 29498 39831
Project11 11705 31779
Project12 10468 60680
Project14 30014 33092
Project16 19922 22030
Project18 10302 45508
Project2 32487 55278
Project20 19497 56910
Project21 24330 58267
Project22 18590 23729
Project24 11918 79966
Project25 38909 54763
Project26 36190 83726
Project29 10935 51443
Project30 24011 52724
Project32 12356 62578
Project33 30110 50171
Project34 16344 21422
Project35 23931 26209
Project36 4676 22458
Project37 8806 64771
Project4 15776 32152
Project40 42941 43697
Project42 24934 29855
Project44 22885 49623
Project46 9824 38740
Project50 18915 24904
Project6 41611 61498
Project9 32341 46341
Interview Question
SQL

MONTHLY ACTIVE USERS

DIFFICULTY LEVEL: HARD

MONTHLY ACTIVE USERS
Monthly active users, or MAU, is a term that refers to the number of unique customers who interacted with a
service or product of a business within a month.

Essentially, MAU is a key performance indicator (KPI) that measures online user engagement. Many online
businesses use this metric, including online gaming, social networking, and mobile app companies. But other
types of companies use MAU as well.

Investors generally look closely at the number of MAUs of a business. Why? Because the metric provides a
quick overview of the business’s user growth. Furthermore, MAU delivers some crucial insights into the
business’s ability to attract new customers and retain existing ones.
PROBLEM STATEMENT
Assume you have the table below containing
information on Facebook user actions. Write a query
to obtain the active user retention in July 2022. Output
the month (in numerical format 1, 2, 3) and the
number of monthly active users (MAUs).
Hint: An active user is a user who has user action
("sign-in", "like", or "comment") in the current month
and last month.

user_actions Table:
Column Name Type

user_id integer

event_id integer

event_type string ("sign-in", "like", "comment")

event_date datetime
user_actions Example Input:
user_id event_id event_type event_date

445 7765 sign-in 05/31/2022 12:00:00

742 6458 sign-in 06/03/2022 12:00:00

445 3634 like 06/05/2022 12:00:00

742 1374 comment 06/05/2022 12:00:00

648 3124 like 06/18/2022 12:00:00

Example Output for June 2022:

month monthly_active_users

6 1

Example
In June 2022, there was only one monthly active user
(MAU), user_id 445.
MENTAL APPROACH

1. Conceptually this is simple: we compare the current month's users with the previous month's users.
If a user in the current month is also found in the previous month, he/she is an active user.

2. We do the same for every available month and count the number of active users in each month.
QUERY
(This is not the final query; it shows the intermediate result that the final query is built on.)
QUERY EXPLANATION
1. We get the records by self-joining the table on several conditions.

The first condition is a.user_id <= b.user_id.
This means a row from table b is joined whenever its user_id is greater than or equal to the user_id from
table a, so we get multiple combinations. (To check that the same user was active in both months, the ids
must be equal, a.user_id = b.user_id; with <=, a June action of one user can be paired with a July action
of a different user, as in the first row of the sample output below.)

Next, we add a condition on the date: event_date from table 'a' should be less than or equal to event_date
from table 'b'. This filters out the records where the date from table 'b' is earlier than the date from
table 'a', so that we always compare a later month's date with an earlier month's date.

Finally, we add conditions that the month difference between the two dates is 1 and that the later date
falls in July.
FINAL QUERY

QUERY EXPLANATION
1. In this query we simply count the distinct number of users and group them by month.
SAMPLE OUTPUT OF THE INTERMEDIATE QUERY
user_id event_id event_date user_id event_date

445 3634 06/05/2022 12:00:00 648 07/03/2022 12:00:00

648 3124 06/18/2022 12:00:00 648 07/03/2022 12:00:00

648 2725 06/22/2022 12:00:00 648 07/03/2022 12:00:00

445 3634 06/05/2022 12:00:00 445 07/05/2022 12:00:00

FINAL OUTPUT

month_no monthly_active_users

7 2
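A minimal sketch of a month-over-month retention count, assuming SQLite via Python's sqlite3 module and the example input above with its dates rewritten as ISO 8601 text. Unlike the intermediate query described above, it joins on equal user ids and on "previous calendar month"; the month to report is passed as a parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_actions (
    user_id INTEGER, event_id INTEGER, event_type TEXT, event_date TEXT)""")
# Example input from the problem, dates rewritten as ISO 8601 text
conn.executemany("INSERT INTO user_actions VALUES (?, ?, ?, ?)", [
    (445, 7765, "sign-in", "2022-05-31 12:00:00"),
    (742, 6458, "sign-in", "2022-06-03 12:00:00"),
    (445, 3634, "like",    "2022-06-05 12:00:00"),
    (742, 1374, "comment", "2022-06-05 12:00:00"),
    (648, 3124, "like",    "2022-06-18 12:00:00"),
])

MAU_SQL = """
    SELECT CAST(strftime('%m', curr.event_date) AS INTEGER) AS month_no,
           COUNT(DISTINCT curr.user_id) AS monthly_active_users
      FROM user_actions AS curr
      JOIN user_actions AS prev
        ON prev.user_id = curr.user_id              -- same user in both months
       AND strftime('%Y-%m', prev.event_date) =     -- prev action in the previous month
           strftime('%Y-%m', date(curr.event_date, 'start of month', '-1 month'))
     WHERE strftime('%Y-%m', curr.event_date) = ?   -- month to report, e.g. '2022-07'
       AND curr.event_type IN ('sign-in', 'like', 'comment')
       AND prev.event_type IN ('sign-in', 'like', 'comment')
     GROUP BY month_no
"""

june = conn.execute(MAU_SQL, ("2022-06",)).fetchall()
print(june)
```

For June 2022 this matches the example output: only user 445 acted in both May and June. Passing "2022-07" reports July instead; with only the example rows loaded it returns an empty list.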

Output the user id and current payment status sorted by the user id.

DIFFICULTY LEVEL: HARD

PROBLEM STATEMENT
Write a query to update the Facebook advertisers' status using the daily_pay table. advertiser is a two-column
table containing each user id and their payment status based on the last payment, and the daily_pay table has
the current information about their payments. Only advertisers who paid will show up in the daily_pay table.

Output the user id and current payment status sorted by the user id.
advertiser Table:

Column Name Type

user_id string

status string

advertiser Example Input:

user_id status

bing NEW

yahoo NEW

alibaba EXISTING
daily_pay Table:

Column Name Type

user_id string

paid decimal

daily_pay Example Input:

user_id paid

yahoo 45.00

alibaba 100.00

target 13.00
Example Output:
user_id new_status

bing CHURN

yahoo EXISTING

alibaba EXISTING

Bing's updated status is CHURN because no payment was made in the daily_pay table, whereas Yahoo, which
made a payment, is updated as EXISTING.
CONDITION

# Start End Condition

1 NEW EXISTING Paid on day T

2 NEW CHURN No pay on day T

3 EXISTING EXISTING Paid on day T

4 EXISTING CHURN No pay on day T

5 CHURN RESURRECT Paid on day T

6 CHURN CHURN No pay on day T

7 RESURRECT EXISTING Paid on day T

8 RESURRECT CHURN No pay on day T


QUERY
QUERY EXPLANATION
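To get the records from both tables we use a FULL JOIN, so that we get the user ids from both tables.

The problem is that for some rows the user id from one table will be NULL while the other table has a
value. To get those values into a single column we use COALESCE.
A CASE expression then assigns the new status from the old status and whether a payment was made,
following the CONDITION table; a user who paid but is not yet in the advertiser table (fitdata) becomes NEW.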
OUTPUT BEFORE COALESCE

user_id status user_id paid

bing NEW

yahoo NEW yahoo 45.00

alibaba EXISTING alibaba 100.00

baidu EXISTING

target CHURN target 13.00

tesla CHURN

morgan RESURRECT morgan 600.00

chase RESURRECT

fitdata 25.00
OUTPUT
user_id new_status

alibaba EXISTING

baidu CHURN

bing CHURN

chase CHURN

fitdata NEW

morgan EXISTING

target RESURRECT

tesla CHURN

yahoo EXISTING
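A minimal sketch, assuming SQLite via Python's sqlite3 module and the rows implied by the "OUTPUT BEFORE COALESCE" table. FULL OUTER JOIN only exists in SQLite 3.39 and later, so this sketch swaps in a UNION of user ids followed by two LEFT JOINs, which plays the same role as FULL JOIN plus COALESCE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE advertiser (user_id TEXT, status TEXT)")
conn.execute("CREATE TABLE daily_pay (user_id TEXT, paid REAL)")
conn.executemany("INSERT INTO advertiser VALUES (?, ?)", [
    ("bing", "NEW"), ("yahoo", "NEW"), ("alibaba", "EXISTING"),
    ("baidu", "EXISTING"), ("target", "CHURN"), ("tesla", "CHURN"),
    ("morgan", "RESURRECT"), ("chase", "RESURRECT"),
])
conn.executemany("INSERT INTO daily_pay VALUES (?, ?)", [
    ("yahoo", 45.00), ("alibaba", 100.00), ("target", 13.00),
    ("morgan", 600.00), ("fitdata", 25.00),
])

rows = conn.execute("""
    WITH all_users AS (            -- every user id from either table
        SELECT user_id FROM advertiser
        UNION
        SELECT user_id FROM daily_pay
    )
    SELECT u.user_id,
           CASE
               WHEN p.user_id IS NULL THEN 'CHURN'       -- no pay on day T (rows 2, 4, 6, 8)
               WHEN a.user_id IS NULL THEN 'NEW'         -- paid, not an advertiser yet
               WHEN a.status = 'CHURN' THEN 'RESURRECT'  -- row 5
               ELSE 'EXISTING'                           -- rows 1, 3, 7
           END AS new_status
      FROM all_users AS u
      LEFT JOIN advertiser AS a ON a.user_id = u.user_id
      LEFT JOIN daily_pay  AS p ON p.user_id = u.user_id
     ORDER BY u.user_id
""").fetchall()

for row in rows:
    print(row)
```

The order of the WHEN branches matters: the "no payment" check comes first, because every status moves to CHURN when nothing was paid on day T.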

Apple Product Counts
(Asked by Google)

DIFFICULTY LEVEL: MEDIUM

PROBLEM STATEMENT
Find the number of Apple product users and the
number of total users with a device and group the
counts by language. Assume Apple products are only
MacBook-Pro, iPhone 5s, and iPad-air. Output the
language along with the total number of Apple users
and users with any device. Order your results based
on the number of total users in descending order.

playbook_events
user_id: int
occurred_at: datetime
event_type: varchar
event_name: varchar
location: varchar
device: varchar

playbook_users
user_id: int
created_at: datetime
company_id: int
language: varchar
activated_at: datetime
state: varchar
MENTAL APPROACH
1. First, join both tables on user_id.
2. For each language, find the total number of users who have a device by simply counting them.
3. For each language, find the number of users who have an Apple device.
4. Now that we have, for each language, the number of users with any device and the number of users
with an Apple device, we sort the results in decreasing order of the number of users with a device.

QUERY
QUERY EXPLANATION
1. We use CASE WHEN with the SUM function to flag Apple products as 1 and other devices as 0,
and then SUM the flags to get their total count.
2. To get the total number of users with a device, we simply use the COUNT function.
3. We GROUP BY language, since we want these counts for each language.
4. Finally, we order the results in descending order of the total number of users with a device.
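A minimal sketch, assuming SQLite via Python's sqlite3 module, only the columns the query needs, and made-up rows. It also assumes the device values are spelled like the problem statement (MacBook-Pro, iPhone 5s, iPad-air) and compares them case-insensitively. As a variant of the SUM(CASE ...) flag described above, it counts distinct users, so a user with several Apple events is counted once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns this query needs; rows are hypothetical
conn.execute("CREATE TABLE playbook_users (user_id INTEGER, language TEXT)")
conn.execute("CREATE TABLE playbook_events (user_id INTEGER, device TEXT)")
conn.executemany("INSERT INTO playbook_users VALUES (?, ?)", [
    (1, "english"), (2, "english"), (3, "spanish"), (4, "english"),
])
conn.executemany("INSERT INTO playbook_events VALUES (?, ?)", [
    (1, "MacBook-Pro"), (1, "iPhone 5s"),   # one user, two Apple events
    (2, "dell inspiron notebook"),
    (3, "iPad-air"),
    (4, "nexus 5"),
])

rows = conn.execute("""
    SELECT u.language,
           COUNT(DISTINCT CASE
                     WHEN LOWER(e.device) IN ('macbook-pro', 'iphone 5s', 'ipad-air')
                     THEN u.user_id
                 END) AS no_of_apple_products_users,  -- NULLs (non-Apple) are not counted
           COUNT(DISTINCT u.user_id) AS no_of_total_users_with_device
      FROM playbook_users AS u
      JOIN playbook_events AS e ON e.user_id = u.user_id
     WHERE e.device IS NOT NULL
     GROUP BY u.language
     ORDER BY no_of_total_users_with_device DESC
""").fetchall()

print(rows)
```

A SUM(CASE WHEN ... THEN 1 ELSE 0 END) flag counts Apple event rows instead, which would give 2 for english here; which count is wanted depends on whether "users" means people or events.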
SAMPLE OUTPUT
language no_of_apple_products_users no_of_total_users_with_device

english 17 53

spanish 2 16

japanese 2 6

Write a query to get the average stars for each product every month.
(Asked by Amazon)

DIFFICULTY LEVEL: MEDIUM

PROBLEM STATEMENT
Given the reviews table, write a query to get the
average stars for each product every month.
The output should include the month in numerical
value, product id, and average star rating rounded to
two decimal places. Sort the output based on month
followed by the product id.

reviews Table:

Column Name Type

review_id integer

user_id integer

submit_date datetime

product_id integer

stars integer (1-5)


reviews Example Input:
review_id user_id submit_date product_id stars

6171 123 06/08/2022 00:00:00 50001 4

7802 265 06/10/2022 00:00:00 69852 4

5293 362 06/18/2022 00:00:00 50001 3

6352 192 07/26/2022 00:00:00 69852 3

4517 981 07/05/2022 00:00:00 69852 2

Example Output:
mth product avg_stars

6 50001 3.50

6 69852 4.00

7 69852 2.50

Explanation
In June (month #6), product 50001 had two ratings - 4
and 3, resulting in an average star rating of 3.5.
MENTAL APPROACH
1. For each product id and each month, collect the stars it received.
2. Find the average of the stars received by each product id in that month.

Query

Query EXPLANATION
1. We simply select the average of the stars and group by month number and product id.
In SQL Server we can use the MONTH() function directly to extract the month number.
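A minimal sketch, assuming SQLite via Python's sqlite3 module and the example input above with dates rewritten as ISO 8601 text. SQLite has no MONTH(), so strftime('%m', ...) stands in for it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE reviews (
    review_id INTEGER, user_id INTEGER, submit_date TEXT,
    product_id INTEGER, stars INTEGER)""")
# Example input from the problem, dates rewritten as ISO 8601 text
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?, ?)", [
    (6171, 123, "2022-06-08 00:00:00", 50001, 4),
    (7802, 265, "2022-06-10 00:00:00", 69852, 4),
    (5293, 362, "2022-06-18 00:00:00", 50001, 3),
    (6352, 192, "2022-07-26 00:00:00", 69852, 3),
    (4517, 981, "2022-07-05 00:00:00", 69852, 2),
])

rows = conn.execute("""
    SELECT CAST(strftime('%m', submit_date) AS INTEGER) AS mth,  -- MONTH() in SQL Server
           product_id,
           ROUND(AVG(stars), 2) AS avg_stars
      FROM reviews
     GROUP BY mth, product_id
     ORDER BY mth, product_id
""").fetchall()

for row in rows:
    print(row)
```

One portability note: in SQL Server, AVG over an INT column returns an INT, so the 3.5 for product 50001 would come out as 3. Casting first, for example AVG(CAST(stars AS DECIMAL(10, 2))), keeps the fraction.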
OUTPUT
month_no product_id average_star_rating

5 25255 4.00

5 25600 4.33

6 12580 4.50

6 50001 3.50

6 69852 4.00

7 11223 5.00

7 69852 2.50

Average Weight of Medal-Winning Judo
(Question asked by ESPN)

DIFFICULTY LEVEL: MEDIUM

PROBLEM STATEMENT
Find the average weight of medal-winning Judo
players of each team with a minimum age of 20 and
a maximum age of 30. Consider players at the age of
20 and 30 too. Output the team along with the
average player weight.
id: int
name: varchar
sex: varchar
age: float
height: float
weight: float
team: varchar
noc: varchar
games: varchar
year: int
season: varchar
city: varchar
sport: varchar
event: varchar
medal: varchar
MENTAL APPROACH
1. For each team, we first take all Judo players who have won a medal.
2. Since we want players aged between 20 and 30, we keep only those players' records.
3. We then find the average weight of these players for each team.

QUERY
QUERY EXPLANATION
1. We write a query that returns the team and the average weight in the SELECT clause.
2. In the WHERE clause, we require the sport to be Judo, the age to be between 20 and 30,
and the medal column to be NOT NULL.
Here, medal IS NOT NULL is used because we want players who have won a medal. BETWEEN
includes both endpoints, so players aged exactly 20 or 30 are kept.
3. GROUP BY groups the rows by team.
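A minimal sketch, assuming SQLite via Python's sqlite3 module. The table name athletes and all rows are hypothetical (the problem does not name the table here), and only the columns the query needs are created:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name and rows; only the columns the query needs
conn.execute("""CREATE TABLE athletes (
    name TEXT, age REAL, weight REAL, team TEXT, sport TEXT, medal TEXT)""")
conn.executemany("INSERT INTO athletes VALUES (?, ?, ?, ?, ?, ?)", [
    ("P1", 25, 80, "France", "Judo", "Gold"),
    ("P2", 20, 70, "France", "Judo", "Bronze"),   # age 20 is included
    ("P3", 31, 90, "France", "Judo", "Silver"),   # too old
    ("P4", 24, 100, "France", "Judo", None),      # no medal
    ("P5", 30, 66, "Japan", "Judo", "Gold"),      # age 30 is included
    ("P6", 22, 75, "Japan", "Swimming", "Gold"),  # different sport
])

rows = conn.execute("""
    SELECT team, AVG(weight) AS avg_wt
      FROM athletes
     WHERE sport = 'Judo'
       AND age BETWEEN 20 AND 30      -- inclusive on both ends
       AND medal IS NOT NULL          -- medal winners only
     GROUP BY team
     ORDER BY team
""").fetchall()

print(rows)
```

Only P1 and P2 qualify for France (average 75.0) and only P5 for Japan (66.0); the other rows are excluded by age, missing medal, or sport.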

OUTPUT

team avg_wt

France 77

Georgia 84

Japan 70

Romania 48

Find the cancellation rate of requests with unbanned users for each day.
(LeetCode Hard Problem)

DIFFICULTY LEVEL: HARD

Question From
Ankit Bansal YouTube Channel
PROBLEM STATEMENT
Write a SQL query to find the cancellation rate of
requests with unbanned users (both client and driver
must not be banned) each day between "2013-10-01"
and "2013-10-03". Round the cancellation rate to two
decimal points.
The cancellation rate is computed by dividing the number of cancelled (by client or driver) requests with
unbanned users by the total number of requests with unbanned users on that day.
trips Table
FIELD TYPE

id int

client_id int

driver_id int

city_id int

status varchar

request_at varchar

users Table
FIELD TYPE

user_id int

banned varchar

role varchar
trips Table Input

id client_id driver_id city_id status request_at

1 1 10 1 completed 2013-10-01

2 2 11 1 cancelled_by_driver 2013-10-01

3 3 12 6 completed 2013-10-01

4 4 13 6 cancelled_by_client 2013-10-01

5 1 10 1 completed 2013-10-02

6 2 11 6 completed 2013-10-02

7 3 12 6 completed 2013-10-02

8 2 12 12 completed 2013-10-03

9 3 10 12 completed 2013-10-03

10 4 13 12 cancelled_by_driver 2013-10-03
users Table Input
user_id banned role

1 No client

2 Yes client

3 No client

4 No client

10 No driver

11 No driver

12 No driver

13 No driver

MENTAL APPROACH
1. We join both tables and keep only the requests where neither the client nor the driver is banned.
2. Then, for each date, we count the total requests and the requests cancelled by either the client
or the driver.
3. Finally, we calculate the cancellation rate using the formula above.
QUERY
QUERY EXPLANATION
1. We use the CTE flag_count_cte to count the number of requests raised per day and the number of
requests cancelled by either the client or the driver.
Here, we join the users table twice because both the client id and the driver id refer to users in
the users table.

2. In the final SELECT query, we retrieve the request date along with the cancellation rate for each date.

OUTPUT OF CTE
request_at count_requests count_cancelled

2013-10-01 3 1

2013-10-02 2 0

2013-10-03 2 1
OUTPUT

request_at cancellation_rate

2013-10-01 33.33

2013-10-02 0.00

2013-10-03 50.00
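A minimal sketch, assuming SQLite via Python's sqlite3 module and the input tables above. Like the output above, it reports the rate as a percentage (LeetCode's version of this problem expects a fraction such as 0.33, which would mean dropping the factor of 100):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trips (id INTEGER, client_id INTEGER, driver_id INTEGER,
                                    city_id INTEGER, status TEXT, request_at TEXT)""")
conn.execute("CREATE TABLE users (user_id INTEGER, banned TEXT, role TEXT)")
conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 10, 1, "completed", "2013-10-01"),
    (2, 2, 11, 1, "cancelled_by_driver", "2013-10-01"),
    (3, 3, 12, 6, "completed", "2013-10-01"),
    (4, 4, 13, 6, "cancelled_by_client", "2013-10-01"),
    (5, 1, 10, 1, "completed", "2013-10-02"),
    (6, 2, 11, 6, "completed", "2013-10-02"),
    (7, 3, 12, 6, "completed", "2013-10-02"),
    (8, 2, 12, 12, "completed", "2013-10-03"),
    (9, 3, 10, 12, "completed", "2013-10-03"),
    (10, 4, 13, 12, "cancelled_by_driver", "2013-10-03"),
])
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "No", "client"), (2, "Yes", "client"), (3, "No", "client"),
    (4, "No", "client"), (10, "No", "driver"), (11, "No", "driver"),
    (12, "No", "driver"), (13, "No", "driver"),
])

rows = conn.execute("""
    WITH flag_count_cte AS (
        SELECT t.request_at,
               COUNT(*) AS count_requests,
               SUM(CASE WHEN t.status IN ('cancelled_by_driver', 'cancelled_by_client')
                        THEN 1 ELSE 0 END) AS count_cancelled
          FROM trips AS t
          JOIN users AS c ON c.user_id = t.client_id AND c.banned = 'No'  -- unbanned client
          JOIN users AS d ON d.user_id = t.driver_id AND d.banned = 'No'  -- unbanned driver
         WHERE t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
         GROUP BY t.request_at
    )
    SELECT request_at,
           ROUND(100.0 * count_cancelled / count_requests, 2) AS cancellation_rate
      FROM flag_count_cte
     ORDER BY request_at
""").fetchall()

for row in rows:
    print(row)
```

The inner joins drop every trip whose client or driver is banned (trips 2, 6, and 8 involve the banned client 2), which gives the counts in the CTE output above.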
DO YOU KNOW?
WHAT IS CASCADING
REFERENTIAL INTEGRITY?

CASCADING REFERENTIAL INTEGRITY
As we know, if tables are related through a foreign key and a primary key, we cannot delete (or change the
key of) a primary key row that existing foreign key rows still point to. To address this, SQL Server offers
four options that define the action it should take when a user tries to delete or update a key to which an
existing foreign key points. These options are called Cascading Referential Integrity constraints.
These four constraints are:
1. NO ACTION: This is SQL Server's default action. If an UPDATE or DELETE statement would affect rows
in foreign key tables, the statement is rejected and rolled back, and an error message is displayed.
2. Cascade: If a user attempts to delete data that
would impact the rows in a table with a foreign key
constraint, those rows will be deleted automatically
when the corresponding primary key record is
deleted. Similarly, if an update statement affects
rows in the foreign key table, those rows will be
updated with the new value from the primary key
record after it has been updated.
3. SET NULL: When a user attempts to delete or
update data that would impact rows in a table with a
foreign key constraint, the values in the foreign key
columns will be set to NULL when the corresponding
primary key record is deleted or updated in the
primary key table. It's important to note that the
foreign key columns must allow NULL values for this
to occur.
4. SET DEFAULT: If a delete or update statement
modifies rows in a table with a foreign key
constraint, then all rows that contain those foreign
keys are set to their default value. However, in order
for this to happen, all foreign key columns in the
related table must have default constraints defined
on them.
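The options above are declared per foreign key with ON DELETE / ON UPDATE clauses. A small demonstration, assuming SQLite via Python's sqlite3 module rather than SQL Server (SQLite accepts the same NO ACTION, CASCADE, SET NULL, and SET DEFAULT actions but only enforces foreign keys after PRAGMA foreign_keys = ON). The table names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp_cascade (id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department(id) ON DELETE CASCADE);
    CREATE TABLE emp_set_null (id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department(id) ON DELETE SET NULL);
    CREATE TABLE emp_no_action (id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department(id) ON DELETE NO ACTION);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'HR');
    INSERT INTO emp_cascade VALUES (10, 1);
    INSERT INTO emp_set_null VALUES (20, 1);
    INSERT INTO emp_no_action VALUES (30, 2);
""")

# Deleting department 1: CASCADE removes emp 10, SET NULL clears emp 20's dept_id
conn.execute("DELETE FROM department WHERE id = 1")
cascade_rows = conn.execute("SELECT * FROM emp_cascade").fetchall()
set_null_rows = conn.execute("SELECT * FROM emp_set_null").fetchall()

# Deleting department 2 is rejected because emp 30 still references it (NO ACTION)
try:
    conn.execute("DELETE FROM department WHERE id = 2")
    no_action_blocked = False
except sqlite3.IntegrityError:
    no_action_blocked = True

print(cascade_rows, set_null_rows, no_action_blocked)
```

SET NULL works here because dept_id is nullable, which matches the requirement stated for SET NULL above.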
CAN YOU SOLVE?
A teacher wants to swap each student with the adjacent student.

PROBLEM STATEMENT
In school, a teacher wants to swap the seat of each student with that of the adjacent student.

If the number of students is odd, the last student has no one to swap with, so he/she stays in the same seat.

Here, id is the seat number.

Assumption: All seats are occupied.


student_seat Table OUTPUT

id student id student

1 Hitachi 1 Hyundai

2 Hyundai 2 Hitachi

3 Bajaj 3 Hero

4 Hero 4 Bajaj

5 Tata 5 Tata
QUERY
QUERY EXPLANATION
I use a CTE to make the query look simpler.

1. In the CTE, we use LEAD to get the next student, the one sitting in the seat right after the current
student, and LAG to get the previous student, the one sitting in the seat right before the current student.

2. Next, we use a CASE statement to pick the student.

When the seat id is odd, we replace the current student with the next student. When the seat id is even,
we replace the current student with the previous student.

If the number of students is odd, the last student sits at an odd seat id and has no next student, so the
result would be NULL. To handle this, we use ISNULL, which replaces the NULL with the same student.
CTE OUTPUT

id student next_student prev_student

1 Hitachi Hyundai NULL

2 Hyundai Bajaj Hitachi

3 Bajaj Hero Hyundai

4 Hero Tata Bajaj

5 Tata NULL Hero


FINAL OUTPUT

id student

1 Hyundai

2 Hitachi

3 Hero

4 Bajaj

5 Tata
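A minimal sketch of the LEAD/LAG approach, assuming SQLite via Python's sqlite3 module and the student_seat rows above. SQLite has no ISNULL function, so COALESCE, the standard equivalent, takes its place:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_seat (id INTEGER PRIMARY KEY, student TEXT)")
conn.executemany("INSERT INTO student_seat VALUES (?, ?)", [
    (1, "Hitachi"), (2, "Hyundai"), (3, "Bajaj"), (4, "Hero"), (5, "Tata"),
])

rows = conn.execute("""
    WITH seat_cte AS (
        SELECT id,
               student,
               LEAD(student) OVER (ORDER BY id) AS next_student,
               LAG(student)  OVER (ORDER BY id) AS prev_student
          FROM student_seat
    )
    SELECT id,
           CASE
               WHEN id % 2 = 1 THEN COALESCE(next_student, student)  -- odd seat: take next
               ELSE prev_student                                     -- even seat: take previous
           END AS student
      FROM seat_cte
     ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

An even seat always has a previous student, so only the odd branch needs the NULL fallback.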

Write a query to list the top three cities that have the most completed trade orders in descending order.
(Asked by Robinhood)

DIFFICULTY LEVEL: EASY

PROBLEM STATEMENT
You are given the tables below containing
information on Robinhood trades and users. Write a
query to list the top three cities that have the most
completed trade orders in descending order.
trades Table:
Column Name Type

order_id integer

user_id integer

price decimal

quantity integer

status string('Completed' ,'Cancelled')

timestamp datetime
trades Example Input:
order_id user_id price quantity status timestamp

100101 111 9.80 10 Cancelled 08/17/2022 12:00:00

100102 111 10.00 10 Completed 08/17/2022 12:00:00

100259 148 5.10 35 Completed 08/25/2022 12:00:00

100264 148 4.80 40 Completed 08/26/2022 12:00:00

100305 300 10.00 15 Completed 09/05/2022 12:00:00

100400 178 9.90 15 Completed 09/09/2022 12:00:00

100565 265 25.60 5 Completed 12/19/2022 12:00:00

users Table:
Column Name Type

user_id integer

city string

email string

signup_date datetime
users Example Input:
user_id city email signup_date

111 San Francisco rrok10@gmail.com 08/03/2021 12:00:00

148 Boston sailor9820@gmail.com 08/20/2021 12:00:00

178 San Francisco harrypotterfan182@gmail.com 01/05/2022 12:00:00

265 Denver shadower_@hotmail.com 02/26/2022 12:00:00

300 San Francisco houstoncowboy1122@hotmail.com 06/30/2022 12:00:00

Example Output:

city total_orders

San Francisco 3

Boston 2

Denver 1
MENTAL APPROACH
1. Join both tables and, for each city, count the number of orders with status "Completed".
2. From these, select the top 3 cities with the highest number of orders.
QUERY
QUERY EXPLANATION
1. We use the CTE rank_cte to count the total number of completed orders in each city and to rank the
cities, so that we can get the top 3.
2. The final SELECT query fetches the records with rank <= 3, which gives us the top 3 cities.

OUTPUT

city total_orders

San Francisco 4

Boston 3

Denver 2
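A minimal sketch, assuming SQLite via Python's sqlite3 module, the example input above, and only the columns the query needs. The choice of RANK() is an assumption; with ties it can return more than three cities, while ROW_NUMBER() would cut ties arbitrarily:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns the query needs, rows from the example input
conn.execute("CREATE TABLE trades (order_id INTEGER, user_id INTEGER, status TEXT)")
conn.execute("CREATE TABLE users (user_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", [
    (100101, 111, "Cancelled"), (100102, 111, "Completed"),
    (100259, 148, "Completed"), (100264, 148, "Completed"),
    (100305, 300, "Completed"), (100400, 178, "Completed"),
    (100565, 265, "Completed"),
])
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (111, "San Francisco"), (148, "Boston"), (178, "San Francisco"),
    (265, "Denver"), (300, "San Francisco"),
])

rows = conn.execute("""
    WITH rank_cte AS (
        SELECT u.city,
               COUNT(*) AS total_orders,
               RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk  -- ranks the grouped rows
          FROM trades AS t
          JOIN users AS u ON u.user_id = t.user_id
         WHERE t.status = 'Completed'
         GROUP BY u.city
    )
    SELECT city, total_orders
      FROM rank_cte
     WHERE rnk <= 3
     ORDER BY total_orders DESC, city
""").fetchall()

print(rows)
```

On the example input this reproduces the example output: San Francisco 3, Boston 2, Denver 1.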

Data Science Skills
[LinkedIn SQL Interview Question]

PROBLEM STATEMENT
Given a table of candidates and their skills, you're
tasked with finding the candidates best suited for an
open Data Science job. You want to find candidates
who are proficient in Python, Tableau, and PostgreSQL.
Write a query to list the candidates who possess all of
the required skills for the job. Sort the output by
candidate ID in ascending order.
Assumption:
There are no duplicates in the candidates table.

candidates Table:

Column Name Type

candidate_id integer

skill varchar
candidates Example Input:

candidate_id skill

123 Python

123 Tableau

123 PostgreSQL

234 R

234 PowerBI

234 SQL Server

345 Python

345 Tableau
Example Output:
candidate_id

123

EXPLANATION
Candidate 123 is displayed because they have Python,
Tableau, and PostgreSQL skills. 345 isn't included in the
output because they're missing one of the required skills:
PostgreSQL.

The dataset you are querying against may have different input & output - this is just an example!

MENTAL APPROACH
1. Find all candidates who have all three skills: Python, Tableau, and PostgreSQL.
QUERY

QUERY EXPLANATION
1. Using SELECT, we get the candidate_id.
2. The WHERE clause is used with the IN keyword so that we get the records of candidates who have a
skill in Python, Tableau, or PostgreSQL.
Instead of the IN keyword, we could also combine three conditions with OR (using = or LIKE).
At this stage, the query lists every candidate who has any of these 3 skills.
3. We use GROUP BY along with HAVING, with the condition that the COUNT should be 3.
This keeps only the candidates who have all 3 skills; it works because the table has no duplicate rows.
OUTPUT

candidate_id

123

147
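A minimal sketch of the IN + GROUP BY + HAVING approach, assuming SQLite via Python's sqlite3 module and the example input above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (candidate_id INTEGER, skill TEXT)")
conn.executemany("INSERT INTO candidates VALUES (?, ?)", [
    (123, "Python"), (123, "Tableau"), (123, "PostgreSQL"),
    (234, "R"), (234, "PowerBI"), (234, "SQL Server"),
    (345, "Python"), (345, "Tableau"),
])

rows = conn.execute("""
    SELECT candidate_id
      FROM candidates
     WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')  -- any of the required skills
     GROUP BY candidate_id
    HAVING COUNT(*) = 3                                  -- all three present
     ORDER BY candidate_id
""").fetchall()

ids = [r[0] for r in rows]
print(ids)
```

If the table could contain duplicate rows, HAVING COUNT(DISTINCT skill) = 3 would be the safer condition.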

Q. Find the winning teams of DeepMind employment competition.
(Question by Google)
PROBLEM STATEMENT:
Find the winning teams of DeepMind employment competition.
Output the team along with the average team score. Sort records
by the team score in descending order.
google_competition_participants
FIELD TYPE
member_id int
team_id int

google_competition_scores
FIELD TYPE
member_id int
member_score float


MENTAL APPROACH:
1. First, join both tables.
2. For each team, add up the scores of all members of that team.
This gives us the total score for each team, and we can identify which team has won.
3. Along with step 2, find the average score for each team.
Query
QUERY EXPLANATION:
1. In the CTE team_scores_cte, we calculate the average score of each team and also rank the teams
by their total score.
The ranking orders the teams by the SUM of their members' scores in decreasing order.
2. With a SELECT statement, we obtain the required fields and use a WHERE clause to keep the records
whose rank equals 1.
This gives us the winning team along with its average score.
Here, we use RANK (rather than ROW_NUMBER) because several teams might have the same score, and
they should all be ranked 1.
Output

team_id avg_team_score

8 0.77
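A minimal sketch of the CTE described above, assuming SQLite via Python's sqlite3 module and hypothetical rows (two teams of two members), so it does not reproduce the 0.77 result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE google_competition_participants (member_id INTEGER, team_id INTEGER)")
conn.execute("CREATE TABLE google_competition_scores (member_id INTEGER, member_score REAL)")
# Hypothetical rows
conn.executemany("INSERT INTO google_competition_participants VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 2)])
conn.executemany("INSERT INTO google_competition_scores VALUES (?, ?)",
                 [(1, 0.5), (2, 0.7), (3, 0.9), (4, 0.8)])

rows = conn.execute("""
    WITH team_scores_cte AS (
        SELECT p.team_id,
               AVG(s.member_score) AS avg_team_score,
               RANK() OVER (ORDER BY SUM(s.member_score) DESC) AS rnk  -- ties share rank 1
          FROM google_competition_participants AS p
          JOIN google_competition_scores AS s ON s.member_id = p.member_id
         GROUP BY p.team_id
    )
    SELECT team_id, ROUND(avg_team_score, 2) AS avg_team_score
      FROM team_scores_cte
     WHERE rnk = 1
     ORDER BY avg_team_score DESC
""").fetchall()

print(rows)
```

Team 2 has the higher total (1.7 vs 1.2) and its average rounds to 0.85. Ranking by total favours larger teams; if the winner should be the team with the best average, ORDER BY AVG(s.member_score) in the OVER clause instead.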

Find out who got the most votes and won the election.
(Asked by Google)
DIFFICULTY LEVEL: MEDIUM
PROBLEM STATEMENT
The election is conducted in a city and everyone can
vote for one or more candidates, or choose not to
vote at all. Each person has 1 vote so if they vote for
multiple candidates, their vote gets equally split
across these candidates. For example, if a person
votes for 2 candidates, these candidates receive an
equivalent of 0.5 vote each. All candidates can vote
too. Find out who got the most votes and won the
election. Output the name of the candidate or
multiple names in case of a tie. To avoid issues with a
floating-point error you can round the number of
votes received by a candidate to 3 decimal places.

voting_results
voter: varchar
candidate: varchar
MENTAL APPROACH
1. First, find how many candidates each voter voted for.
2. Then, for each candidate, add up the votes received.
Here, if a voter voted for multiple candidates, each of those candidates receives 1 divided by the number
of candidates that voter voted for. This is done for each individual voter and candidate.

Many voters didn't vote for anyone (their candidate is empty), so we ignore those voters.
SAMPLE INPUT TABLE
voter candidate
Kathy NULL
Charles Ryan
Charles Christine
Charles Kathy
QUERY
QUERY EXPLANATION
1. With the CTE votes_cte, for each voter and each candidate, we get the votes received by that candidate.
Here, we have used a CASE condition on COUNT(voter) with PARTITION BY voter: for each voter we count the number of candidates they voted for, and then find the votes received by each candidate using the formula 1/no_of_votes.

SAMPLE OUTPUT FOR votes_cte


voter candidate votes

Alan Christine 1.000

Andrew Ryan 0.500

Andrew Christine 0.500

Anthony Paul 0.200

Anthony Anthony 0.200


QUERY EXPLANATION
2. With the CTE rank_cte, we add up the votes received by each candidate and then rank the candidates by the votes they received.

SAMPLE OUTPUT FOR rank_cte

candidate no_of_votes rnk

Christine 5.283 1

Ryan 5.149 2

Nicole 2.700 3

Anthony 2.400 4

3. Using a SELECT statement, we get the candidate(s) ranked 1.
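The three steps above can be sketched as one query. This is a minimal version that runs against an in-memory SQLite database through Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer for window functions, and the sample rows are hypothetical. It drops non-voters with a WHERE filter instead of the CASE expression described above. Ranking uses DENSE_RANK, so every candidate tied for the top total stays at rank 1.

```python
import sqlite3

# Hypothetical rows (voter, candidate); a NULL candidate means the person did not vote.
ROWS = [
    ("Kathy", None),
    ("Charles", "Ryan"),
    ("Charles", "Christine"),
    ("Charles", "Kathy"),
    ("Alan", "Christine"),
    ("Andrew", "Ryan"),
    ("Andrew", "Christine"),
]

QUERY = """
WITH votes_cte AS (
    -- split each voter's single vote equally across the candidates they picked
    SELECT voter,
           candidate,
           1.0 / COUNT(*) OVER (PARTITION BY voter) AS votes
    FROM voting_results
    WHERE candidate IS NOT NULL          -- ignore people who did not vote
),
totals_cte AS (
    -- total votes per candidate, rounded to 3 decimals to avoid float noise
    SELECT candidate, ROUND(SUM(votes), 3) AS no_of_votes
    FROM votes_cte
    GROUP BY candidate
),
rank_cte AS (
    -- DENSE_RANK gives every candidate tied for the top total rank 1
    SELECT candidate, no_of_votes,
           DENSE_RANK() OVER (ORDER BY no_of_votes DESC) AS rnk
    FROM totals_cte
)
SELECT candidate FROM rank_cte WHERE rnk = 1 ORDER BY candidate
"""


def election_winners(rows):
    """Load rows into an in-memory SQLite table and return the winner(s)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE voting_results (voter TEXT, candidate TEXT)")
    conn.executemany("INSERT INTO voting_results VALUES (?, ?)", rows)
    winners = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return winners


print(election_winners(ROWS))  # ['Christine']
```

With these rows, Christine gets 1/3 + 1 + 0.5 = 1.833 votes, Ryan gets 0.833 and Kathy gets 0.333. The window COUNT runs after the WHERE filter, so Charles's vote is split three ways.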
THANK YOU
Strive for progress, not perfection.

Interview Question
SQL

Find the number of customers without an order
(Asked by Amazon)

DIFFICULTY LEVEL: MEDIUM


Question From
PROBLEM STATEMENT
Find the number of customers without an order

orders
id: int
cust_id: int
order_date: datetime
order_details: varchar
total_order_cost: int

customers
id: int
first_name: varchar
last_name: varchar
city: varchar
address: varchar
phone_number: varchar
MENTAL APPROACH
1. Compare the two tables and count the customers whose id does not appear as cust_id in the orders table.

QUERY 1

QUERY2

OUTPUT
no_cust

9
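Two common ways to write this count are a LEFT JOIN that keeps only unmatched customers, and a NOT EXISTS filter. Below is a minimal sketch of both using Python's built-in sqlite3 module. The trimmed tables and rows are hypothetical, so the result differs from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down versions of the customers and orders tables.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, total_order_cost INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal'), (4, 'Dee');
INSERT INTO orders VALUES (10, 1, 50), (11, 1, 20), (12, 3, 70);
""")

# Approach 1: LEFT JOIN keeps every customer; customers with no order get NULL order columns.
LEFT_JOIN_QUERY = """
SELECT COUNT(*) AS no_cust
FROM customers c
LEFT JOIN orders o ON c.id = o.cust_id
WHERE o.cust_id IS NULL
"""

# Approach 2: NOT EXISTS keeps customers for whom no matching order row exists.
NOT_EXISTS_QUERY = """
SELECT COUNT(*) AS no_cust
FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.cust_id = c.id)
"""

no_cust_join = conn.execute(LEFT_JOIN_QUERY).fetchone()[0]
no_cust_exists = conn.execute(NOT_EXISTS_QUERY).fetchone()[0]
print(no_cust_join, no_cust_exists)  # 2 2 (customers 2 and 4)
conn.close()
```

A third form, `WHERE c.id NOT IN (SELECT cust_id FROM orders)`, also works. However, it returns a count of 0 if any cust_id is NULL, which is why the two forms above are usually safer.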

Interview Question
SQL

Write a query to find the winner in each group
(Leetcode Hard Problem)

DIFFICULTY LEVEL: HARD


Question From
Ankit Bansal
YouTube Channel
PROBLEM STATEMENT
Write a query to find the winner in each group.
The winner in each group is the player who scored the
maximum total points within the group.
In case of a tie, the lowest player_id wins.

players Table
FIELD TYPE
player_id int
group_id int

matches Table
FIELD TYPE
match_id int
first_player int
second_player int
first_score int
second_score int
players Input Table

player_id group_id

15 1

25 1

30 1

45 1

10 2

35 2

50 2

20 3

40 3
matches Table
match_id first_player second_player first_score second_score
1 15 45 3 0
2 30 25 1 2
3 30 15 2 0
4 40 20 5 2
5 35 50 1 1

MENTAL APPROACH
1. First, we need to find each player_id, their score, and the group they belong to.
2. Now we add together the points scored by the same player to get the total points they have scored.
3. After this, for each group, we find the player with the maximum total points.
If two or more players in a group have the same maximum total, we choose the lowest player_id.
QUERY WRITTEN BY ME
QUERY EXPLANATION
1. We use the CTE players_score_cte to get the details of the first and second players in a single column by using UNION ALL.
Here, we have joined the players table with the matches table and then used UNION ALL to get the details of both the first and second players.

2. We then use the second CTE, ranking_cte, to rank the players on the basis of their total score.
In the OVER clause, we use PARTITION BY group_id because for each group we need to find the player with the maximum points. We use ORDER BY SUM(scores) in descending order and player_id in ascending order, so the tie-break on the lowest player_id is built into the ranking.
So, the player who scored the maximum points in each group gets rank 1.

3. Finally, we use a SELECT query to fetch the required records, filtering on rank = 1. Thus, we get the player who scored the maximum points in each group_id.
QUERY WRITTEN BY Ankit Sir
Comparison Between Ankit Sir's query and my query
1. I joined the players and matches tables within the CTE to get the required records. This adds to the query's cost, so I think it should be avoided.

OUTPUT

group_id player_id

1 15

2 35

3 40
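Here is a minimal sketch of the approach with the players join done only once. Both sides of every match are stacked with UNION ALL first, then joined to players, totalled, and ranked. It uses Python's built-in sqlite3 module, assumes SQLite 3.25+ for ROW_NUMBER, and loads the input rows shown above. The CTE names mirror the explanation, but this is not a copy of either original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# players and matches rows copied from the input tables above.
conn.executescript("""
CREATE TABLE players (player_id INTEGER, group_id INTEGER);
CREATE TABLE matches (match_id INTEGER, first_player INTEGER, second_player INTEGER,
                      first_score INTEGER, second_score INTEGER);
INSERT INTO players VALUES (15,1),(25,1),(30,1),(45,1),(10,2),(35,2),(50,2),(20,3),(40,3);
INSERT INTO matches VALUES (1,15,45,3,0),(2,30,25,1,2),(3,30,15,2,0),(4,40,20,5,2),(5,35,50,1,1);
""")

QUERY = """
WITH players_score_cte AS (
    -- stack both sides of every match into one (player_id, score) list
    SELECT first_player AS player_id, first_score AS score FROM matches
    UNION ALL
    SELECT second_player, second_score FROM matches
),
totals_cte AS (
    -- join players only once, after the scores are stacked
    SELECT p.group_id, p.player_id, SUM(s.score) AS total_score
    FROM players_score_cte s
    JOIN players p ON p.player_id = s.player_id
    GROUP BY p.group_id, p.player_id
),
ranking_cte AS (
    SELECT group_id, player_id,
           ROW_NUMBER() OVER (
               PARTITION BY group_id
               ORDER BY total_score DESC, player_id ASC  -- tie -> lowest player_id
           ) AS rn
    FROM totals_cte
)
SELECT group_id, player_id FROM ranking_cte WHERE rn = 1 ORDER BY group_id
"""

winners = conn.execute(QUERY).fetchall()
print(winners)  # [(1, 15), (2, 35), (3, 40)]
conn.close()
```

In group 1, players 15 and 30 both total 3 points, so the player_id tie-break picks 15. In group 2, 35 and 50 both score 1, so 35 wins.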

SQL
Interview Question

Q. Write a query to find total rides and profit rides by each driver. (Question by Uber)

CREDIT TO: Ankit Bansal Sir


PROBLEM STATEMENT:
Write a query to find total rides and profit rides by each driver.

Profit ride is when the end location of the current ride is the same as
the start location of the next ride.

It is not necessary that the end time of the current ride should be
the same as the start time of the next ride to qualify as a profit ride.
drivers

FIELD TYPE

id varchar

start_time time

end_time time

start_loc varchar

end_loc varchar
MENTAL APPROACH:
1. First, we simply count the total rides taken by each driver.
2. We then compare the end location and start location of consecutive rides made by the same driver.
3. If the end location of a ride equals the start location of the driver's next ride, we count it as a profit ride.
Query
QUERY EXPLANATION:
1. First, we use the CTE row_nummber_cte to give a row_number to each row, in ascending order of start time for each driver.
This row number lets us self-join the CTE and compare each row with the next row.

2. We then use the CTE profit_rides_flag to flag profitable rides as 1 and non-profitable rides as 0 using a CASE WHEN statement.
Here, we have used r1.row_no+1=r2.row_no so that we can compare two consecutive rows.

3. We use a simple SELECT statement to get the desired output, grouping by driver.
Here, we use SUM() to count the profit rides from the CTE above.
Output

id total_rides profit_rides

dri_1 5 1

dri_2 2 0
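Below is a minimal sketch of the row-number self-join described above, run through Python's built-in sqlite3 module (SQLite 3.25+ assumed). The rides are hypothetical, so the counts differ from the output above. It uses a LEFT JOIN so each driver's last ride, which has no next ride, is still counted in total_rides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rides; times are 'HH:MM' strings, which sort correctly as text.
conn.executescript("""
CREATE TABLE drivers (id TEXT, start_time TEXT, end_time TEXT, start_loc TEXT, end_loc TEXT);
INSERT INTO drivers VALUES
    ('dri_1', '09:00', '09:30', 'a', 'b'),
    ('dri_1', '09:30', '10:30', 'b', 'c'),
    ('dri_1', '11:00', '11:30', 'd', 'e'),
    ('dri_2', '12:00', '12:30', 'f', 'g'),
    ('dri_2', '13:00', '13:30', 'h', 'i');
""")

QUERY = """
WITH row_number_cte AS (
    -- number each driver's rides in order of start time
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_time) AS row_no
    FROM drivers
),
profit_rides_flag AS (
    -- LEFT JOIN each ride to the same driver's next ride; the last ride has no match
    SELECT r1.id,
           CASE WHEN r1.end_loc = r2.start_loc THEN 1 ELSE 0 END AS profit_flag
    FROM row_number_cte r1
    LEFT JOIN row_number_cte r2
           ON r1.id = r2.id AND r1.row_no + 1 = r2.row_no
)
SELECT id, COUNT(*) AS total_rides, SUM(profit_flag) AS profit_rides
FROM profit_rides_flag
GROUP BY id
ORDER BY id
"""

rides = conn.execute(QUERY).fetchall()
print(rides)  # [('dri_1', 3, 1), ('dri_2', 2, 0)]
conn.close()
```

`LEAD(start_loc) OVER (PARTITION BY id ORDER BY start_time)` would give the next ride's start location without the self-join. This sketch keeps the self-join to match the explanation.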

Interview Question
SQL

Q.First and Last Day


PROBLEM STATEMENT
What percentage of transactions happened on the first and last day of the promotion? Segment results per promotion. Output the promotion id, the percentage of transactions on the first day, and the percentage of transactions on the last day.

facebook_sales_promotions
FIELD TYPE
promotion_id int
start_date datetime
end_date datetime
media_type varchar
cost int

facebook_sales
FIELD TYPE
product_id int
promotion_id int
cost_in_dollars int
customer_id int
date datetime
units_sold int
MENTAL APPROACH

1. First, join both tables on promotion_id.

2. For each promotion_id, count the number of transactions made on the first day.
We can do this by comparing the start_date of each promotion with the transaction date (the day on which the sale was made).

3. Follow the same procedure for the last day, using end_date.

4. Count the total number of transactions for each promotion.
This is needed because we want the percentage of transactions on the first and last day.

5. Calculate the percentage of transactions on the first and last day for each distinct promotion:

% transactions on first (or last) day = no. of transactions on that day * 100 / total number of transactions (computed separately for each promotion)
QUERY

QUERY EXPLANATION
1. With the CTE flag_cte, we flag first-day and last-day transactions as 1 for each promotion. Here, we also count the total number of transactions for each promotion.

2. Using a SELECT query, we get the required records along with the percentage of transactions performed on the first and last day of each promotion.
OUTPUT

promotion_id first_day_transaction_perc last_day_transaction_perc

1 16.667 33.333

2 70 30

3 16.667 0

4 0 0

INSIGHTS BASED ON OUTPUT

1. For promotion id 1, 16.67% of transactions happen on the first day, 33.33% on the last day, and the rest on other days.

2. For promotion id 2, all transactions happen on the first and last day only (70% + 30%). So, here we should improve sales on the other days.

3. For promotion id 4, there were no transactions on the first or last day of the promotion.
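The flag-then-average idea can be sketched as follows, using Python's built-in sqlite3 module. The promotions and sales are hypothetical and trimmed to the columns the query needs. DATE() strips the time part so a sale at any hour of the first day still counts. The original flag_cte also counts total transactions, but here that count is done with COUNT(*) in the final SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical promotions and sales, trimmed to the columns the query needs.
conn.executescript("""
CREATE TABLE facebook_sales_promotions (promotion_id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE facebook_sales (product_id INTEGER, promotion_id INTEGER, date TEXT);
INSERT INTO facebook_sales_promotions VALUES
    (1, '2019-01-01 00:00:00', '2019-01-03 00:00:00'),
    (2, '2019-02-01 00:00:00', '2019-02-05 00:00:00'),
    (3, '2019-03-01 00:00:00', '2019-03-10 00:00:00');
INSERT INTO facebook_sales VALUES
    (10, 1, '2019-01-01 00:00:00'), (11, 1, '2019-01-02 00:00:00'),
    (12, 1, '2019-01-02 00:00:00'), (13, 1, '2019-01-03 00:00:00'),
    (14, 2, '2019-02-01 00:00:00'), (15, 2, '2019-02-05 00:00:00'),
    (16, 3, '2019-03-05 00:00:00');
""")

QUERY = """
WITH flag_cte AS (
    -- 1 if the sale happened on the promotion's first (or last) day, else 0
    SELECT s.promotion_id,
           CASE WHEN DATE(s.date) = DATE(p.start_date) THEN 1 ELSE 0 END AS first_day_flag,
           CASE WHEN DATE(s.date) = DATE(p.end_date) THEN 1 ELSE 0 END AS last_day_flag
    FROM facebook_sales s
    JOIN facebook_sales_promotions p ON s.promotion_id = p.promotion_id
)
SELECT promotion_id,
       -- 100.0 forces real division; COUNT(*) is the promotion's total transactions
       ROUND(100.0 * SUM(first_day_flag) / COUNT(*), 3) AS first_day_transaction_perc,
       ROUND(100.0 * SUM(last_day_flag) / COUNT(*), 3) AS last_day_transaction_perc
FROM flag_cte
GROUP BY promotion_id
ORDER BY promotion_id
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 25.0, 25.0), (2, 50.0, 50.0), (3, 0.0, 0.0)]
conn.close()
```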

Interview Question
SQL

Find the number of unique product combinations that are purchased in the same transaction.
(Asked by Walmart)

DIFFICULTY LEVEL: MEDIUM


Question From
PROBLEM STATEMENT
Assume you are given the following tables on
Walmart transactions and products. Find the number
of unique product combinations that are purchased
in the same transaction.
For example, if there are 2 transactions where apples
and bananas are bought, and another transaction
where bananas and soy milk are bought, my output
would be 2 to represent the 2 unique combinations.
Assumptions:
For each transaction, a maximum of 2 products is
purchased.
You may or may not need to use the products
table.
transactions Table:
Column Name Type
transaction_id integer
product_id integer
user_id integer
transaction_date datetime

transactions Example Input:


transaction_id product_id user_id transaction_date

231574 111 234 03/01/2022 12:00:00

231574 444 234 03/01/2022 12:00:00

231574 222 234 03/01/2022 12:00:00

137124 111 125 03/05/2022 12:00:00

137124 444 125 03/05/2022 12:00:00


products Table:
Column Name Type

product_id integer

product_name string

products Example Input:

product_id product_name

111 apple

222 soy milk

333 instant oatmeal

444 banana

555 chia seed



MENTAL APPROACH
1. First, for each transaction, find the combinations of products bought together (pairs of 2 only).
2. Then count the unique combinations of products that were bought together.

Query To Find combination of products


EXPLANATION
We self-join the transactions table on transaction_id, with the extra condition that the product_id from the first copy is greater than the product_id from the second copy. Only pairs that satisfy this condition are returned.
Because of this condition, each pair appears only once, with its two product ids always in the same order (larger first), and a product is never paired with itself. Thus, we get each unique combination of two products within each transaction.
OUTPUT for this query
transaction_id product_id product_id

137124 444 111

231574 222 111

231574 444 222

231574 444 111

256234 333 222

523152 444 222


FINAL QUERY

EXPLANATION
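In this query, we concatenate the two product columns and count the distinct combinations. Because the larger product_id always comes first, the same pair always produces the same concatenated value.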
FINAL OUTPUT

unique_pairs

4
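Below is a minimal sketch of the self-join plus distinct-concatenation count, using Python's built-in sqlite3 module. SQLite concatenates with `||`; MySQL would use CONCAT(). Transactions 231574 and 137124 come from the example input. Transactions 256234 and 523152 are rebuilt from the intermediate output above, and their user_id and date columns are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 231574 and 137124 come from the example input; 256234 and 523152 are rebuilt
# from the intermediate pair output (user_id and transaction_date omitted).
conn.executescript("""
CREATE TABLE transactions (transaction_id INTEGER, product_id INTEGER);
INSERT INTO transactions VALUES
    (231574, 111), (231574, 444), (231574, 222),
    (137124, 111), (137124, 444),
    (256234, 222), (256234, 333),
    (523152, 222), (523152, 444);
""")

QUERY = """
WITH pairs AS (
    -- self-join within a transaction; '>' keeps one ordering of each pair
    -- and never pairs a product with itself
    SELECT t1.transaction_id, t1.product_id AS product_1, t2.product_id AS product_2
    FROM transactions t1
    JOIN transactions t2
      ON t1.transaction_id = t2.transaction_id
     AND t1.product_id > t2.product_id
)
-- the '-' separator stops pairs like (1, 23) and (12, 3) from colliding
SELECT COUNT(DISTINCT product_1 || '-' || product_2) AS unique_pairs
FROM pairs
"""

unique_pairs = conn.execute(QUERY).fetchone()[0]
print(unique_pairs)  # 4
conn.close()
```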

Interview Question
SQL

Write a query to compare the viewership on laptops versus mobile devices.
(Asked by New York Times)

DIFFICULTY LEVEL: EASY


Question From
PROBLEM STATEMENT
Assume that you are given the table below
containing information on viewership by device type
(where the three types are laptop, tablet, and phone).
Define “mobile” as the sum of tablet and phone
viewership numbers. Write a query to compare the
viewership on laptops versus mobile devices.
Output the total viewership for laptop and mobile
devices in the format of "laptop_views" and
"mobile_views".
viewership Table:

Column Name Type

user_id integer

device_type string ('laptop', 'tablet', 'phone')

view_time timestamp
viewership Example Input:
user_id device_type view_time

123 tablet 01/02/2022 00:00:00

125 laptop 01/07/2022 00:00:00

128 laptop 02/09/2022 00:00:00

129 phone 02/09/2022 00:00:00

145 tablet 02/24/2022 00:00:00

Example Output:

laptop_views mobile_views

2 3
MENTAL APPROACH
1. Count the number of times each device type was used.
2. Count phone and tablet views as mobile views, and laptop views as laptop views.

QUERY

QUERY EXPLANATION
1. We use a CASE statement to flag a row as 1 if the device is a phone or tablet, and then sum the flags using the SUM function.
2. We do the same for laptops.
OUTPUT WITHOUT SUM

laptop_views mobile_views

0 1

1 0

1 0

0 1

0 1

OUTPUT WITH SUM

mobile_views laptop_views

3 2
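Here is a minimal sketch of the CASE-and-SUM query on the example input, run through Python's built-in sqlite3 module. The timestamps are rewritten in ISO format for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows copied from the viewership example input (timestamps in ISO format).
conn.executescript("""
CREATE TABLE viewership (user_id INTEGER, device_type TEXT, view_time TEXT);
INSERT INTO viewership VALUES
    (123, 'tablet', '2022-01-02 00:00:00'),
    (125, 'laptop', '2022-01-07 00:00:00'),
    (128, 'laptop', '2022-02-09 00:00:00'),
    (129, 'phone',  '2022-02-09 00:00:00'),
    (145, 'tablet', '2022-02-24 00:00:00');
""")

QUERY = """
SELECT
    -- each CASE turns a row into a 1/0 flag; SUM adds the flags up
    SUM(CASE WHEN device_type = 'laptop' THEN 1 ELSE 0 END) AS laptop_views,
    SUM(CASE WHEN device_type IN ('tablet', 'phone') THEN 1 ELSE 0 END) AS mobile_views
FROM viewership
"""

laptop_views, mobile_views = conn.execute(QUERY).fetchone()
print(laptop_views, mobile_views)  # 2 3
conn.close()
```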

Interview Question
SQL

Write a query to find, for each seller, whether the brand of the second item (by date) they sold is their favorite brand.
(Leetcode Hard Problem)

DIFFICULTY LEVEL: HARD


Question From
Ankit Bansal
YouTube Channel
PROBLEM STATEMENT
MARKET ANALYSIS:
Write a query to find, for each seller, whether the brand of the second item (by date) they sold is their favorite brand.
If a seller sold fewer than two items, report the answer for that seller as no.

seller_id 2nd_item_fav_brand

1 yes/no

2 yes/no

users Table
FIELD TYPE

user_id int

join_date date

favorite_brand varchar
users Input Table
user_id join_date favorite_brand

1 2019-01-01 Lenovo

2 2019-02-09 Samsung

3 2019-01-19 LG

4 2019-05-21 HP

orders Table
FIELD TYPE

order_id int

order_date date

item_id int

buyer_id int

seller_id int
orders Input Table
order_id order_date item_id buyer_id seller_id

1 2019-08-01 4 1 2

2 2019-08-02 2 1 3

3 2019-08-03 3 2 3

4 2019-08-04 1 4 2

5 2019-08-04 1 3 4

6 2019-08-05 2 2 4

items Table

FIELD TYPE

item_id int

item_brand varchar
items Input Table
item_id item_brand

1 Samsung

2 Lenovo

3 LG

4 HP
MENTAL APPROACH
1. First, take all unique user ids from the users table (users act as both sellers and buyers).
We take the user id because we want an output row for every seller.

2. Now, for each seller, find the second item they sold.

3. After this, add a column with the seller's favorite brand next to the item sold, so that we can compare them.

4. Now, for each seller, if the brand of the second item matches the favorite brand, 2nd_item_fav_brand is yes. Otherwise (fewer than two items sold, or the brand does not match), it is no.
QUERY WRITTEN BY ME
QUERY EXPLANATION
1. We use the CTE data_cte to join all the tables and get the required records.

Here, we have used LEFT JOIN starting from the users table because we want every user id in our final output.

In data_cte, we have also used ROW_NUMBER to number the records in descending order of order date for each seller.

2. Finally, we use a SELECT query to fetch the required records, with a CASE statement to check the conditions.

In the first WHEN clause, we get 'no' for sellers who either sold fewer than 2 items, or sold 2 items but whose favorite brand is not the same as the item brand.

In the second WHEN clause, we get 'yes' for sellers who sold 2 items and whose favorite brand is the same as the item brand.

Here, I have filtered for row_no = 1 because, with descending order of order_date, it gives one row for every seller. For sellers who sold 1 item, that row is their only item. For sellers who sold exactly 2, it is the later item, which is the second item. For a seller with more than 2 sales, row_no = 1 would be the latest item rather than the second, so ordering by ascending order_date and filtering row_no = 2 is the general form.
CTE OUTPUT
seller_id item_id order_date row_no item_brand favorite_brand

1 NULL NULL 1 NULL Lenovo

2 1 2019-08-04 1 Samsung Samsung

2 4 2019-08-01 2 HP Samsung

3 3 2019-08-03 1 LG LG

3 2 2019-08-02 2 Lenovo LG

4 2 2019-08-05 1 Lenovo HP

4 1 2019-08-04 2 Samsung HP
QUERY WRITTEN BY Ankit Sir
Comparison Between Ankit Sir's query and my query
I joined the tables first and assigned row numbers in descending order of order date.

I could also have used the CASE statement inside the CTE itself.

OUTPUT
seller_id second_item_fav_brand

1 no

2 yes

3 yes

4 no
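Below is a minimal sketch of the general form, which numbers each seller's sales oldest-first and picks row 2. It runs on the input tables above through Python's built-in sqlite3 module (SQLite 3.25+ assumed). The row_no = 2 condition sits in the LEFT JOIN rather than in WHERE. This way sellers with fewer than two sales keep their row, get a NULL brand, and fall through to 'no'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows copied from the users, orders and items input tables above.
conn.executescript("""
CREATE TABLE users (user_id INTEGER, join_date TEXT, favorite_brand TEXT);
CREATE TABLE orders (order_id INTEGER, order_date TEXT, item_id INTEGER,
                     buyer_id INTEGER, seller_id INTEGER);
CREATE TABLE items (item_id INTEGER, item_brand TEXT);
INSERT INTO users VALUES (1,'2019-01-01','Lenovo'),(2,'2019-02-09','Samsung'),
                         (3,'2019-01-19','LG'),(4,'2019-05-21','HP');
INSERT INTO orders VALUES (1,'2019-08-01',4,1,2),(2,'2019-08-02',2,1,3),(3,'2019-08-03',3,2,3),
                          (4,'2019-08-04',1,4,2),(5,'2019-08-04',1,3,4),(6,'2019-08-05',2,2,4);
INSERT INTO items VALUES (1,'Samsung'),(2,'Lenovo'),(3,'LG'),(4,'HP');
""")

QUERY = """
WITH ranked_orders AS (
    -- number each seller's sales from oldest to newest
    SELECT o.seller_id, i.item_brand,
           ROW_NUMBER() OVER (PARTITION BY o.seller_id ORDER BY o.order_date) AS rn
    FROM orders o
    JOIN items i ON o.item_id = i.item_id
)
SELECT u.user_id AS seller_id,
       -- no second sale -> item_brand is NULL -> comparison is not true -> 'no'
       CASE WHEN r.item_brand = u.favorite_brand THEN 'yes' ELSE 'no' END
           AS second_item_fav_brand
FROM users u
LEFT JOIN ranked_orders r ON r.seller_id = u.user_id AND r.rn = 2
ORDER BY u.user_id
"""

answer = conn.execute(QUERY).fetchall()
print(answer)  # [(1, 'no'), (2, 'yes'), (3, 'yes'), (4, 'no')]
conn.close()
```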
DO YOU KNOW?
How to find the second highest salary?

Interview Question

SQL
I am sharing a few of the methods I know to get the second highest salary from a table.
1.
2.
3.
4.
5.
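Four widely used ways to get the second highest salary are sketched below. They are common approaches and not necessarily the five methods listed above. The employee table and its rows are hypothetical, and everything runs through Python's built-in sqlite3 module (SQLite 3.25+ assumed for DENSE_RANK). The top salary is duplicated on purpose, because a naive `ORDER BY salary DESC LIMIT 1 OFFSET 1` without DISTINCT would return 300 again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employee table; 300 appears twice to show that duplicates are handled.
conn.executescript("""
CREATE TABLE employee (id INTEGER, salary INTEGER);
INSERT INTO employee VALUES (1, 100), (2, 300), (3, 300), (4, 200);
""")

METHODS = {
    # largest salary that is below the overall maximum
    "max_below_max": """
        SELECT MAX(salary) FROM employee
        WHERE salary < (SELECT MAX(salary) FROM employee)""",
    # distinct salaries sorted high to low, skip the first one
    "limit_offset": """
        SELECT DISTINCT salary FROM employee
        ORDER BY salary DESC LIMIT 1 OFFSET 1""",
    # DENSE_RANK gives equal salaries the same rank, so rank 2 is the 2nd distinct value
    "dense_rank": """
        SELECT DISTINCT salary FROM (
            SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
            FROM employee) AS ranked
        WHERE rnk = 2""",
    # correlated subquery: exactly one distinct salary is higher than this one
    "correlated": """
        SELECT DISTINCT salary FROM employee e1
        WHERE 1 = (SELECT COUNT(DISTINCT e2.salary) FROM employee e2
                   WHERE e2.salary > e1.salary)""",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in METHODS.items()}
print(results)  # every method returns 200
conn.close()
```

When no second distinct salary exists, the methods behave differently. The MAX version returns a single NULL row, while the other three return no row at all.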

SQL
Interview Question

MOST COMMON INTERVIEW QUESTIONS


What is Data?
Data is a collection of raw, unorganised facts and details such as text, observations, figures, symbols, and object descriptions, among other things. In other words, data has no specific purpose and carries little meaning on its own.

e.g., A table containing student details and marks.

What is Information?
Information is processed, organised and structured data. It provides context for data and enables decision making. The information we get depends on the data we have, and it is information, rather than raw data, that provides insights.

e.g., Average marks scored by students.


What is DBMS and RDBMS?

DBMS: Database Management Systems (DBMS) are software systems that store and retrieve data and execute queries on it. A DBMS acts as a bridge between an end-user and a database, allowing users to create, read, edit, and remove data in the database.
It also helps users maintain data efficiently and retrieve it as needed with minimal effort.

RDBMS: RDBMS is an abbreviation for Relational Database Management Systems, which store data in the form of rows and columns in tables. They are used for storing structured data.
SQL Server, Oracle, MySQL, MariaDB, and SQLite are examples of RDBMS systems.
Difference between DBMS and RDBMS.

DBMS | RDBMS
DBMS stores data as files. | RDBMS stores data in tabular form.
Data elements need to be accessed individually. | Multiple data elements can be accessed at the same time.
No relationship between data. | Data is stored in the form of tables which are related to each other.
Normalization is not present. | Normalization is present.
DBMS does not support distributed databases. | RDBMS supports distributed databases.
How many types of Joins are there in SQL?
There are mainly four types of joins:
(INNER) JOIN: Returns records that have matching values in both tables.
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table (NULLs where there is no match).
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table (NULLs where there is no match).
FULL (OUTER) JOIN: Returns all records from both tables, matching rows where possible and filling NULLs where there is no match.
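A small sketch makes the INNER vs LEFT difference concrete. It uses two hypothetical single-column tables and Python's built-in sqlite3 module. RIGHT and FULL joins are left out because SQLite only supports them from version 3.39. A RIGHT JOIN gives the same result as the LEFT JOIN with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: ids 2 and 3 exist in both, 1 only in a, 4 only in b.
conn.executescript("""
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (2), (3), (4);
""")

inner = conn.execute(
    "SELECT a.id, b.id FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id").fetchall()
left = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id").fetchall()

print(inner)  # [(2, 2), (3, 3)]
print(left)   # [(1, None), (2, 2), (3, 3)]  (None is SQL NULL)
conn.close()
```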

SQL
Interview Question

MOST COMMON INTERVIEW QUESTIONS


Part -2
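If there are two tables A and B, what will be the result if we join them using an INNER JOIN and a LEFT JOIN?

A B
1 1
1 1
1 1
1

One table holds four rows and the other holds three rows, all with the value 1.

Output for Inner Join & Left Join
Both joins return 4 × 3 = 12 rows of (1, 1). Every row on one side matches every row on the other side, so no row is left unmatched and the LEFT JOIN adds no NULL rows.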
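The row count can be checked with a small sketch using Python's built-in sqlite3 module. It assumes table a holds the four 1s and table b the three 1s. Swapping them gives the same 12 rows for the INNER JOIN and the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: table a holds four rows with value 1, table b holds three.
conn.executescript("""
CREATE TABLE a (val INTEGER);
CREATE TABLE b (val INTEGER);
INSERT INTO a VALUES (1), (1), (1), (1);
INSERT INTO b VALUES (1), (1), (1);
""")

inner_rows = conn.execute("SELECT COUNT(*) FROM a INNER JOIN b ON a.val = b.val").fetchone()[0]
left_rows = conn.execute("SELECT COUNT(*) FROM a LEFT JOIN b ON a.val = b.val").fetchone()[0]
print(inner_rows, left_rows)  # 12 12
conn.close()
```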
What is the order of writing a query in SQL?

1. SELECT (DISTINCT if required) col_names, plus any window-function columns (if required)
2. FROM table_name
3. JOIN table_name
4. ON condition for joins
5. WHERE condition for filtering
6. GROUP BY col_names
7. HAVING condition on aggregates for filtering
8. ORDER BY col_names
9. LIMIT to get only the required number of records (SQL Server instead writes TOP n right after SELECT)
What is the order of execution of a query in SQL?

1. FROM and JOIN
2. WHERE condition
3. GROUP BY
4. HAVING condition
5. Window Function
6. SELECT
7. DISTINCT
8. ORDER BY
9. LIMIT/TOP
UNION VS UNION ALL

Both UNION and UNION ALL are used to combine the result sets of two different SELECT queries.

UNION removes duplicate rows from the combined result, including duplicates that come from within a single SELECT. UNION ALL keeps all rows, duplicates included.

Conditions for using UNION and UNION ALL operators:


1. There must be the same number of columns retrieved in each SELECT statement to
be combined.
2. The columns retrieved must be in the same order in each SELECT statement.
3. The columns retrieved must be of similar data types.
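Below is a small sketch of the difference, using two hypothetical one-column tables and Python's built-in sqlite3 module. The value 1 is repeated inside t1 and 2 appears in both tables, which shows that UNION removes both kinds of duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: t1 repeats 1 internally, and 2 appears in both tables.
conn.executescript("""
CREATE TABLE t1 (val INTEGER);
CREATE TABLE t2 (val INTEGER);
INSERT INTO t1 VALUES (1), (1), (2);
INSERT INTO t2 VALUES (2), (3);
""")

union_rows = conn.execute(
    "SELECT val FROM t1 UNION SELECT val FROM t2 ORDER BY val").fetchall()
union_all_rows = conn.execute(
    "SELECT val FROM t1 UNION ALL SELECT val FROM t2 ORDER BY val").fetchall()

print(union_rows)      # [(1,), (2,), (3,)]
print(union_all_rows)  # [(1,), (1,), (2,), (2,), (3,)]
conn.close()
```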
