
SQL for Aspiring Data Scientists

SQL is an important skill for data scientists to extract and prepare data from multiple sources for machine learning models. Some key reasons include: (1) In industry, datasets often need to be prepared from multiple tables using SQL queries involving joins, aggregations, etc. (2) As a machine learning engineer experiments with different features, SQL is useful to try new feature extractions. (3) SQL is needed for general analytics on big data beyond the limitations of tools like Excel. The document then provides examples of SQL case studies and questions often asked in interviews.

Uploaded by

Himanshu Patidar


Data Science

Part 2 - SQL

CANTILEVER LABS IS OFFICIAL TRAINING PARTNER OF

IIT BOMBAY | IIT MADRAS | IIT KHARGPUR | IIT HYDERABAD

BITS PILANI | BITS HYDERABAD | NIT ROURKELA | SYMBIOSIS

JNTU HYDERABAD | SREENIDHI | MAHINDRA UNIVERSITY | GITAM | SRM CHENNAI

GNITS | CMR-CET | GEETHANJALI | CHITKARA

& many more

Index
Part 2/4 - SQL
Part 1 - Python | Part 3 - P&S | Part 4 - Deep into Data Science

01
Let's first understand: as a data scientist, why do we need SQL?

02
Case study 1: (PAYPAL interview)

03
Case study 3 : Joins

04
Case study 4 : Analyzing telecom data

05
GENERAL SQL QUESTIONS

Let's first understand: as a data scientist, why do we need SQL?

When we learn machine learning academically, we use datasets from Kaggle or similar
websites. Many of those datasets are ready-made (we directly get a CSV file with x
rows and y columns). In an industrial setting, we have to prepare datasets from multiple
data sources (tables), build hypotheses and test them. There can be multiple SQL queries
running in the backend to prepare just one column (feature) in your dataset, which may
involve aggregation, ordering, windowing, joins and many other SQL operations.

A machine learning model's performance depends largely on the quality of the features it
has been trained on. So while a project is in the development phase, we have to do a lot
of experimentation with hyperparameters and features, trying new sets of features to
improve the model. That is where at least a basic understanding of SQL comes in handy.
Later, when pilots are successful, these data extraction pipelines will be automated by
data engineers, and that is where advanced knowledge is needed to optimize workflows.

Also, for general-purpose analytics to gain more insights from data, we need SQL, as
Excel has its limitations when it comes to big data.

Some examples of features that you might use in the input dataset of your machine
learning model in various industries are:

Telecom: how many times a customer recharged after the expiry of their prepaid plan;
average MRP of the last 3 recharges

Finance: sum/average of a customer's top 3 high-value transactions; days passed since
the most recent transaction; creditworthiness

Manufacturing: number of times maintenance activity was performed; days between
successive maintenance activities or breakdowns of a machine

Ecommerce: tag customers who made a purchase of more than $50 in their 2nd transaction
(high-value repeat customers)
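
As a rough illustration of how one such feature could be derived, the sketch below builds the Ecommerce flag with a window function. The `transactions` table, its columns and the sample rows are assumptions made up for this example, and the SQL is run through Python's built-in `sqlite3` module only so that it is easy to try out.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    txn_id      INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    txn_date    TEXT    NOT NULL,  -- ISO dates sort correctly as text
    amount      REAL    NOT NULL
);
INSERT INTO transactions (customer_id, txn_date, amount) VALUES
    (1, '2023-01-05', 20.0),
    (1, '2023-02-10', 75.0),
    (2, '2023-01-07', 90.0),
    (2, '2023-03-01', 30.0),
    (3, '2023-01-09', 60.0);
""")

# Number each customer's transactions in date order, then flag customers
# whose 2nd transaction is above 50.
query = """
WITH ordered AS (
    SELECT customer_id,
           amount,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY txn_date, txn_id
           ) AS txn_seq
    FROM transactions
)
SELECT customer_id,
       MAX(CASE WHEN txn_seq = 2 AND amount > 50 THEN 1 ELSE 0 END)
           AS high_value_repeat
FROM ordered
GROUP BY customer_id
ORDER BY customer_id;
"""
features = conn.execute(query).fetchall()
print(features)  # one (customer_id, flag) row per customer
```

Customers with only one transaction get a flag of 0 here; whether that is the right default is a modelling choice.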

In the following sections you will find 2 types of question sets: the first is case study
based, of the kind mostly asked in interviews, and the second covers fundamental
questions. At the end we also have an interview checklist for SQL and some useful links
to learn and practice SQL.

01 Data Science | Part 2 - SQL

Case study 1: (PAYPAL interview)

Table 1 (daily transaction data) columns : Pymt_ID, Pymt_Date, Sndr_ID, Rcvr_ID, Amt

Table 2 columns : Rcvr_ID, Rcvr_name, Rcvr_Industry

Table 3 columns : Sndr_id, Sndr_name, Sndr_age

Q. Which industry has the 3rd highest total receiving amount?
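
One possible way to approach this question is sketched below. The column names come from the table descriptions above, but the table names `payments` and `receivers` and all sample rows are assumptions. The idea is to total the amount per industry first and then rank the totals with DENSE_RANK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table names and rows are hypothetical; only the column names are from the question.
conn.executescript("""
CREATE TABLE payments (
    Pymt_ID INTEGER PRIMARY KEY, Pymt_Date TEXT,
    Sndr_ID INTEGER, Rcvr_ID INTEGER, Amt REAL
);
CREATE TABLE receivers (
    Rcvr_ID INTEGER PRIMARY KEY, Rcvr_name TEXT, Rcvr_Industry TEXT
);
INSERT INTO receivers VALUES
    (10, 'Acme', 'Retail'), (11, 'Bolt', 'Travel'), (12, 'Core', 'Gaming'),
    (13, 'Dune', 'Retail'), (14, 'Echo', 'Food');
INSERT INTO payments VALUES
    (1, '2023-01-01', 100, 10, 500),
    (2, '2023-01-01', 101, 11, 300),
    (3, '2023-01-02', 100, 12, 200),
    (4, '2023-01-02', 102, 13, 150),
    (5, '2023-01-03', 101, 14, 50),
    (6, '2023-01-03', 100, 12, 150);
""")

query = """
WITH industry_totals AS (
    -- total amount received per industry
    SELECT r.Rcvr_Industry, SUM(p.Amt) AS total_amt
    FROM payments AS p
    JOIN receivers AS r ON r.Rcvr_ID = p.Rcvr_ID
    GROUP BY r.Rcvr_Industry
),
ranked AS (
    -- DENSE_RANK keeps the ranking gap-free if two industries tie
    SELECT Rcvr_Industry, total_amt,
           DENSE_RANK() OVER (ORDER BY total_amt DESC) AS rnk
    FROM industry_totals
)
SELECT Rcvr_Industry, total_amt
FROM ranked
WHERE rnk = 3;
"""
third_highest = conn.execute(query).fetchall()
print(third_highest)
```

Table 3 (sender details) is not needed for this particular question, since the answer depends only on who received the money.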

Case study 2 : Window functions

Table Name: Employee_MST (keeps record of active employee salary and dept)

Table Name: Employee_DTL (keeps record of all employees associated with company)

Q. Refer to the above tables and write a query which gives the below output.


A. The output table has only the names of employees who joined most recently. The
concept used here is to first get a department-wise ranking using a window function
ordered by joining date in descending order, give that result an alias, and then keep
only the rows whose rank (row_number) is 1, i.e. the most recently joined employee in
each department.
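
The approach described in the answer could look roughly like the sketch below. The table names are the ones given above, but their columns (`emp_id`, `emp_name`, `dept`, `salary`, `join_date`) and the sample rows are assumptions, since the original table layouts are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names and rows are hypothetical.
conn.executescript("""
CREATE TABLE Employee_DTL (emp_id INTEGER PRIMARY KEY, emp_name TEXT, join_date TEXT);
CREATE TABLE Employee_MST (emp_id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
INSERT INTO Employee_DTL VALUES
    (1, 'Asha', '2019-04-01'), (2, 'Ben',  '2021-08-15'),
    (3, 'Chen', '2020-01-10'), (4, 'Dev',  '2022-03-20'),
    (5, 'Eli',  '2018-06-30');           -- no longer active, so not in Employee_MST
INSERT INTO Employee_MST VALUES
    (1, 'HR', 50000), (2, 'HR', 55000), (3, 'IT', 70000), (4, 'IT', 65000);
""")

query = """
SELECT emp_name, dept, join_date
FROM (
    -- rank employees inside each department, newest joiner first
    SELECT d.emp_name, m.dept, d.join_date,
           ROW_NUMBER() OVER (
               PARTITION BY m.dept
               ORDER BY d.join_date DESC
           ) AS rn
    FROM Employee_MST AS m
    JOIN Employee_DTL AS d ON d.emp_id = m.emp_id
) AS ranked
WHERE rn = 1
ORDER BY dept;
"""
recent_joiners = conn.execute(query).fetchall()
print(recent_joiners)
```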

Case study 3 : Joins

Employee_name

Employee_dtl

Q. Write a query which gives the below output.


Case study 4 : Analyzing telecom data

Table: 1 year of data on recharges done by subscribers

Q1. How many recharges has each subscriber done in the month of June?

Q2. Which recharge plan MRP is subscribed to the most?

Q3. Extract customers who have done more than 15 recharges.


Q4. Get the total, average and maximum of the 3 most recent recharge amounts of each
subscriber.

Q5. How many customers in the system have not done any recharge in the last 35 days?

Q6. Get the most recent recharge of each subscriber.
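
The questions above could be answered along the following lines. This is only a sketch: the `recharges` table, its three columns, the sample rows and the reference date used for Q5 are all assumptions, because the original table is not shown. Date handling uses SQLite's `strftime` and `julianday` functions; other databases have their own equivalents.

```python
import sqlite3

# Hypothetical table and column names; the source does not show the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recharges (subscriber_id TEXT, recharge_date TEXT, mrp INTEGER)"
)
rows = [
    ("S1", "2023-05-20", 199),
    ("S1", "2023-06-03", 199),
    ("S1", "2023-06-18", 299),
    ("S1", "2023-07-02", 399),
    ("S2", "2023-06-10", 199),
    ("S2", "2023-09-01", 499),
]
# S3 recharges on each of the first 16 days of August.
rows += [("S3", f"2023-08-{day:02d}", 99) for day in range(1, 17)]
conn.executemany("INSERT INTO recharges VALUES (?, ?, ?)", rows)

# Q1: recharges per subscriber in June.
q1 = conn.execute("""
    SELECT subscriber_id, COUNT(*) AS june_recharges
    FROM recharges
    WHERE strftime('%m', recharge_date) = '06'
    GROUP BY subscriber_id
    ORDER BY subscriber_id
""").fetchall()

# Q2: the plan MRP that was recharged most often.
q2 = conn.execute("""
    SELECT mrp, COUNT(*) AS times_bought
    FROM recharges
    GROUP BY mrp
    ORDER BY times_bought DESC
    LIMIT 1
""").fetchone()

# Q3: subscribers with more than 15 recharges.
q3 = conn.execute("""
    SELECT subscriber_id, COUNT(*) AS total_recharges
    FROM recharges
    GROUP BY subscriber_id
    HAVING COUNT(*) > 15
""").fetchall()

# Shared subquery: number each subscriber's recharges, newest first.
ranked = """
    SELECT subscriber_id, recharge_date, mrp,
           ROW_NUMBER() OVER (
               PARTITION BY subscriber_id ORDER BY recharge_date DESC
           ) AS rn
    FROM recharges
"""

# Q4: total, average and maximum of the 3 most recent recharge amounts.
q4 = conn.execute(f"""
    SELECT subscriber_id, SUM(mrp), AVG(mrp), MAX(mrp)
    FROM ({ranked}) AS r
    WHERE rn <= 3
    GROUP BY subscriber_id
    ORDER BY subscriber_id
""").fetchall()

# Q5: subscribers whose last recharge is more than 35 days before a
# reference date (the reference date is an assumption for this sketch).
q5 = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT subscriber_id
        FROM recharges
        GROUP BY subscriber_id
        HAVING julianday(:as_of) - julianday(MAX(recharge_date)) > 35
    ) AS inactive
""", {"as_of": "2023-09-30"}).fetchone()[0]

# Q6: the most recent recharge of each subscriber.
q6 = conn.execute(f"""
    SELECT subscriber_id, recharge_date, mrp
    FROM ({ranked}) AS r
    WHERE rn = 1
    ORDER BY subscriber_id
""").fetchall()

print(q1, q2, q3, q4, q5, q6, sep="\n")
```

Note that Q5 as written here can only count subscribers who appear in the recharge table at least once; customers who never recharged would need a separate subscriber master table.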

GENERAL SQL QUESTIONS

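Q1. What is the difference between ISNULL and COALESCE?

ISNULL is used when we want NULL values in the final table to be replaced (imputed) by a value we specify. In SQL Server it takes exactly two arguments: ISNULL(expression, replacement).

COALESCE takes any number of arguments and returns the first non-NULL one.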
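
A small sketch of the difference is given below. It runs on SQLite, which has no two-argument ISNULL function, so `IFNULL` is used as a stand-in that behaves like SQL Server's ISNULL for this purpose. The `contacts` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: some customers have no mobile number, some no phone at all.
conn.executescript("""
CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT);
INSERT INTO contacts VALUES
    ('Asha', '111', '222'),
    ('Ben',  NULL,  '333'),
    ('Chen', NULL,  NULL);
""")

rows = conn.execute("""
    SELECT name,
           -- first non-NULL value among several candidates
           COALESCE(mobile, landline, 'no phone') AS best_contact,
           -- two arguments only: replace NULL with one fixed value
           IFNULL(mobile, 'missing')              AS mobile_filled
    FROM contacts
    ORDER BY name
""").fetchall()
print(rows)
```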

Q2. What are the different types of SQL commands? (As data scientists we mainly deal
with DDL, DML and DQL.)
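
To make the three categories concrete, here is a minimal sketch with one or two statements of each kind. The `students` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL (Data Definition Language): defines or changes the structure of objects.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# DML (Data Manipulation Language): inserts, updates or deletes rows.
conn.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [("Asha", 82), ("Ben", 58), ("Chen", 35)],
)
conn.execute("UPDATE students SET marks = marks + 5 WHERE name = 'Ben'")
conn.execute("DELETE FROM students WHERE marks < 40")

# DQL (Data Query Language): reads data with SELECT.
rows = conn.execute("SELECT name, marks FROM students ORDER BY name").fetchall()
print(rows)
```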


Q3 . Types of joins in SQL :
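
As an illustration of the two joins used most often, the sketch below compares INNER JOIN and LEFT JOIN on two small made-up tables (`customers` and `orders` are assumptions, not tables from this document). RIGHT and FULL OUTER joins follow the same idea; they are left out here because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: Ben has no orders, and order 104 has no matching customer.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (101, 1, 250), (102, 1, 100), (103, 3, 75), (104, 4, 60);
""")

# INNER JOIN: only rows that have a match in both tables.
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every row from the left table; NULLs where the right side has no match.
left_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.order_id
""").fetchall()

print(inner_rows)
print(left_rows)
```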

Q4 . Data types in SQL :

Q5. What is the difference between DELETE, TRUNCATE and DROP?

DELETE: removes all rows or only targeted rows based on a condition (WHERE clause).

TRUNCATE: removes all rows from the table at once; the table structure remains.

DROP: removes the entire table, structure and data, from the database.


Q6. How is “PARTITION BY” different from “GROUP BY”?
PARTITION BY returns the aggregated value alongside each record of the specified table. If we have 15
records in the table, the output of a query using PARTITION BY also has 15 rows. GROUP BY, on the other
hand, gives one row per group in the result set.

E.g.: Suppose we have the below table of student heights in class A and B.

We want to know the avg. height of students from class A and B.

The GROUP BY clause will give the below output.

But now, if I want to see each student's height compared to their class avg. height, we will use the
PARTITION BY clause as below.

Output:

Now it is more informative for me, as I can see each student's height as well as the class avg.
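
The student-height example can be reproduced with a sketch like the one below. The table name, column names and heights are assumptions chosen for illustration; the point is only that GROUP BY collapses the rows while PARTITION BY keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: heights in cm for students in class A and B.
conn.executescript("""
CREATE TABLE students (name TEXT, class_name TEXT, height_cm INTEGER);
INSERT INTO students VALUES
    ('Asha', 'A', 150), ('Ben', 'A', 160),
    ('Chen', 'B', 170), ('Dev', 'B', 180), ('Eli', 'B', 175);
""")

# GROUP BY: one row per class.
grouped = conn.execute("""
    SELECT class_name, AVG(height_cm) AS avg_height
    FROM students
    GROUP BY class_name
    ORDER BY class_name
""").fetchall()

# PARTITION BY: one row per student, with the class average attached.
partitioned = conn.execute("""
    SELECT name, class_name, height_cm,
           AVG(height_cm) OVER (PARTITION BY class_name) AS class_avg
    FROM students
    ORDER BY class_name, name
""").fetchall()

print(grouped)       # 2 rows
print(partitioned)   # 5 rows, same as the table
```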

Q7. What is the order of each SQL clause?
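
As a hedged illustration, the query below uses the common clauses in the order in which they must be written: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT. The comments give the usual logical evaluation order, which is different: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. The `employees` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data.
conn.executescript("""
CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Asha', 'HR', 50000), ('Ben', 'HR', 25000), ('Fay', 'HR', 45000),
    ('Chen', 'IT', 70000), ('Dev', 'IT', 60000), ('Eli', 'Ops', 40000);
""")

query = """
SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary  -- 5. pick output columns
FROM employees                                                 -- 1. choose the source rows
WHERE salary > 30000                                           -- 2. filter individual rows
GROUP BY dept                                                  -- 3. form groups
HAVING COUNT(*) >= 2                                           -- 4. filter whole groups
ORDER BY avg_salary DESC                                       -- 6. sort the result
LIMIT 1;                                                       -- 7. keep the first row
"""
top_dept = conn.execute(query).fetchall()
print(top_dept)
```

Because WHERE runs before grouping, it cannot refer to aggregates such as COUNT(*); that is what HAVING is for.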


Q8. What is the difference between RANK(), ROW_NUMBER() and DENSE_RANK()?
RANK(): used as a window function, it ranks the rows as per the order given in the window. Rows that
tie get the same rank, and the ranks that follow are skipped (e.g. 1, 2, 2, 4).

DENSE_RANK(): works in a similar way to RANK(), but it does not skip ranks after a tie
(e.g. 1, 2, 2, 3).

ROW_NUMBER(): simply returns the sequential row number of each record within the window, so tied rows
still get different numbers.
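
The three functions can be compared side by side on a small table with one tie, as in the sketch below. The `scores` table and its rows are assumptions; a name tie-breaker is added to the ROW_NUMBER window only so that its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Ben and Chen are tied on 90.
conn.executescript("""
CREATE TABLE scores (name TEXT, score INTEGER);
INSERT INTO scores VALUES ('Asha', 95), ('Ben', 90), ('Chen', 90), ('Dev', 85);
""")

rows = conn.execute("""
    SELECT name, score,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num
    FROM scores
    ORDER BY score DESC, name
""").fetchall()

for row in rows:
    print(row)
```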

Q9.
Grouping Data and Using Aggregate Functions

Ordering Data Results

Selecting Data from Multiple Tables ( Joins )

Q10. Different types of aggregate functions

COUNT()
SUM()
MIN()
MAX()
AVG()
STDEV()
VAR()


Q11. What are constraints in SQL?
NOT NULL - Prevents NULL values from being inserted into a column.
CHECK - Verifies that all values in a field satisfy a condition.
DEFAULT - Automatically assigns a default value if no value has been specified for the field.
UNIQUE - Ensures that all values inserted into the field are unique.
INDEX - Indexes a field, providing faster retrieval of records (strictly speaking an index is a database object rather than a constraint, but it is often listed alongside them).
PRIMARY KEY - Uniquely identifies each record in a table.
FOREIGN KEY - Ensures referential integrity by linking a field to a record in another table.
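
The sketch below declares each of these on two made-up tables and then tries a few inserts that should be rejected. Table names, columns and rows are assumptions. One SQLite-specific detail: foreign keys are only enforced after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT    NOT NULL,
    salary   INTEGER CHECK (salary > 0),
    status   TEXT    DEFAULT 'active',
    dept_id  INTEGER REFERENCES departments (dept_id)
);
CREATE INDEX idx_employees_dept ON employees (dept_id);
INSERT INTO departments (dept_name) VALUES ('HR');
INSERT INTO employees (emp_name, salary, dept_id) VALUES ('Asha', 50000, 1);
""")

# DEFAULT: status was not supplied, so the default value is stored.
status = conn.execute(
    "SELECT status FROM employees WHERE emp_name = 'Asha'"
).fetchone()[0]

# Each of these statements violates exactly one constraint.
bad_inserts = {
    "NOT NULL": "INSERT INTO employees (emp_name, salary, dept_id) VALUES (NULL, 100, 1)",
    "CHECK": "INSERT INTO employees (emp_name, salary, dept_id) VALUES ('Ben', -5, 1)",
    "FOREIGN KEY": "INSERT INTO employees (emp_name, salary, dept_id) VALUES ('Chen', 100, 99)",
    "UNIQUE": "INSERT INTO departments (dept_name) VALUES ('HR')",
}
rejected = []
for label, statement in bad_inserts.items():
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError as exc:
        rejected.append(label)
        print(f"{label} violation rejected: {exc}")

print(status, rejected)
```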

Q12. What are ACID properties?

Atomicity: ensures that the transaction is completed in an all-or-nothing way.
Consistency: ensures that updates made to the database are valid and follow its rules and
restrictions.
Isolation: ensures that concurrent transactions do not interfere with each other; the intermediate
state of one transaction is not visible to other transactions.
Durability: ensures that committed transactions are stored permanently in the database.
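
Atomicity is the easiest of the four to demonstrate in a few lines. The sketch below is only an illustration on a made-up `accounts` table: a transfer is two UPDATE statements inside one transaction, and when the second one fails a CHECK constraint, the first one is rolled back as well. It relies on the documented behaviour of Python's `sqlite3` connection used as a context manager (commit on success, rollback on exception).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("Asha", 100), ("Ben", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money between two accounts; return True if it was committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer(conn, "Asha", "Ben", 30)    # fits within Asha's balance
ok_large = transfer(conn, "Asha", "Ben", 150)   # would make Asha's balance negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok_small, ok_large, balances)
```

In the failed transfer the credit to the receiver had already been applied inside the transaction, yet it does not survive, which is the all-or-nothing behaviour described above.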

Q13 . How to find the 5th highest salary in SQL?
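
Two common ways to approach this are sketched below, assuming the question means the 5th highest distinct salary. The `employees` table and its rows are made up. The first query skips four distinct values with LIMIT/OFFSET (SQLite and MySQL style syntax; SQL Server uses OFFSET ... FETCH instead), and the second uses DENSE_RANK, which works on most databases that support window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; 85000 appears twice on purpose.
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Asha', 90000), ('Ben', 85000), ('Chen', 85000), ('Dev', 80000),
    ('Eli', 75000), ('Fay', 70000), ('Gus', 65000);
""")

# Option 1: sort the distinct salaries and skip the first four.
via_offset = conn.execute("""
    SELECT DISTINCT salary
    FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 4
""").fetchone()[0]

# Option 2: rank salaries without gaps and pick rank 5.
via_dense_rank = conn.execute("""
    SELECT DISTINCT salary
    FROM (
        SELECT salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employees
    ) AS ranked
    WHERE rnk = 5
""").fetchone()[0]

print(via_offset, via_dense_rank)
```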

Q14. What is a CTE in SQL?

CTEs (Common Table Expressions) are named temporary result sets, defined with the WITH clause, that
exist only for the duration of a single query and from which data can be retrieved or used.
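
A minimal sketch of a CTE is given below: the WITH clause first computes the average salary per department, and the main query then uses that result to find employees who earn more than their department's average. The `employees` table and its rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data.
conn.executescript("""
CREATE TABLE employees (emp_name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Asha', 'HR', 50000), ('Ben', 'HR', 60000),
    ('Chen', 'IT', 70000), ('Dev', 'IT', 90000), ('Eli', 'IT', 80000);
""")

query = """
WITH dept_avg AS (
    -- temporary named result: one row per department
    SELECT dept, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
)
SELECT e.emp_name, e.dept, e.salary
FROM employees AS e
JOIN dept_avg AS a ON a.dept = e.dept
WHERE e.salary > a.avg_salary
ORDER BY e.emp_name;
"""
above_avg = conn.execute(query).fetchall()
print(above_avg)
```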

Interview checklist for SQL:

Before the interview, you should have at least solved problems that involve the following
SQL clauses and functions:

GROUP BY, ORDER BY, HAVING, window functions, IS NULL, RANK, DENSE_RANK, ROW_NUMBER,
MIN, MAX, AVG, STDEV, COUNT, all types of joins, LIKE, wildcards.


Useful links :

https://www.w3schools.com/sql/

https://www.codecademy.com/courses/learn-sql/lessons/aggregate-functions/exercises/intro

https://www.hackerrank.com/domains/sql


Part 1/4 - Python

Part 2 - SQL

Next Part 3 - P&S

Part 4 - Deep into Data Science

@cantilever_labs

@cantilever labs

www.cantileverlabs.com
