Real Data Analyst Interview Questions Answers

Interview questions

Uploaded by

sumitkumarbarnwal79

Data Analyst Interview Questions

(0-3 Years)
5-25 LPA
Data Analyst Questions
1. Write a query to find duplicate rows in a table.
To detect duplicates, identify columns that should be unique and group by them.
Example:
SELECT column1, column2, COUNT(*) AS count
FROM your_table
GROUP BY column1, column2
HAVING COUNT(*) > 1;
Explanation:
• GROUP BY combines rows with the same values in the specified columns.
• HAVING COUNT(*) > 1 filters those combinations that occur more than once, indicating
duplicates.
Tip: Use ROW_NUMBER() inside a CTE to flag or delete the extra copies while keeping one row per group.
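To try the GROUP BY / HAVING pattern end to end, here is a minimal sketch that runs it on an in-memory SQLite database through Python's sqlite3 module. The contacts table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a name/email pair is expected to be unique.
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Asha", "asha@example.com"),
     ("Ravi", "ravi@example.com"),
     ("Asha", "asha@example.com"),        # exact repeat of the first row
     ("Ravi", "ravi@work.example.com")],  # same name, different email: not a duplicate
)

# Group on the columns that should be unique and keep groups seen more than once.
duplicates = conn.execute(
    """
    SELECT name, email, COUNT(*) AS copies
    FROM contacts
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()
print(duplicates)
```

Only the repeated Asha row is reported, because the two Ravi rows differ in the email column.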

2. Replace missing values with column mean

import pandas as pd

# df is assumed to be an existing DataFrame with a numeric 'Age' column
df['Age'] = df['Age'].fillna(df['Age'].mean())

Explanation:

• fillna() fills NaN values.

• Using mean() keeps every row and preserves the column mean, although it reduces the column's variance.

Tip: For categorical columns, replace with mode (df['col'].mode()[0]). For ML tasks, consider
sklearn.impute.SimpleImputer.
3. Difference between mean, median, and mode
Mean: Average value.
Median: Middle value when sorted.
Mode: Most frequent value.

Explanation:
Mean is sensitive to outliers.
Median is robust for skewed distributions.
Mode is useful for categorical data.

Tip: In a right-skewed distribution (e.g., income), typically mean > median > mode.
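A small sketch with Python's statistics module shows how one outlier moves the three measures apart. The income figures are made up for the example.

```python
import statistics

# Hypothetical monthly incomes (in thousands); 150 is a deliberate outlier.
incomes = [20, 22, 22, 25, 27, 30, 35, 40, 150]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # middle value of the sorted list
mode_income = statistics.mode(incomes)      # most frequent value

print(mean_income, median_income, mode_income)
```

Here the mean lands well above the median, which in turn sits above the mode, matching the right-skew ordering described in the tip.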

4. Difference between a calculated column and a measure (Power BI/DAX).

• Calculated Column: Stored in the model, computed row by row (increases data size).
• Measure: Calculated on the fly, based on filters (lighter, more efficient).

Explanation:
• Calculated columns are like adding a new field in the dataset.
• Measures are dynamic, designed for aggregation in visuals.

Tip: Prefer measures for performance; use calculated columns only if you need
persistent row-level values.

5. Write a query to find employees earning more than their managers.
Assume the table employees has:
emp_id, name, salary, manager_id
SELECT e.name AS employee_name, e.salary, m.name AS manager_name, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.emp_id
WHERE e.salary > m.salary;
Explanation:
• Self-join: matches employees (e) with their managers (m).
• Filters those where employee's salary > manager's salary.
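The self-join can be checked on a few rows with Python's sqlite3 module. This is only a sketch: the four employees and their salaries are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, "
    "salary INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 90000, None),  # no manager, so the inner join drops this row
     (2, "Bob", 95000, 1),       # earns more than Alice
     (3, "Carol", 70000, 1),
     (4, "David", 99000, 2)],    # earns more than Bob
)

# e is the employee side of the join, m is the manager side of the same table.
rows = conn.execute(
    """
    SELECT e.name, e.salary, m.name, m.salary
    FROM employees e
    JOIN employees m ON e.manager_id = m.emp_id
    WHERE e.salary > m.salary
    ORDER BY e.emp_id
    """
).fetchall()
print(rows)
```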

6. Get top 3 highest-paid employees per department (medium).
SELECT department, employee, salary
FROM (
    SELECT department, employee, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) t
WHERE rn <= 3;

Explanation:

• ROW_NUMBER() ranks employees within each department.

• Filtering with rn <= 3 gives the top 3.

Tip: Use RANK() or DENSE_RANK() instead of ROW_NUMBER() if tied salaries should all be included (this can return more than three rows per department).
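The difference between ROW_NUMBER() and RANK() shows up only when there is a tie at the cut-off. The sketch below uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer (needed for window functions); the rows and the helper top_n are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, employee TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Sales", "B", 70), ("Sales", "D", 60),
     ("Sales", "A", 50), ("Sales", "C", 50),  # tie for third place
     ("Sales", "E", 40),
     ("IT", "F", 90), ("IT", "G", 80)],
)

def top_n(rank_function, n=3):
    """Return the top-n rows per department using the given ranking function name."""
    # rank_function is only ever a fixed literal below, never user input.
    query = f"""
        SELECT department, employee, salary
        FROM (
            SELECT department, employee, salary,
                   {rank_function}() OVER (
                       PARTITION BY department ORDER BY salary DESC
                   ) AS rn
            FROM employees
        ) t
        WHERE rn <= ?
    """
    return conn.execute(query, (n,)).fetchall()

top_by_row_number = top_n("ROW_NUMBER")  # exactly 3 Sales rows; the tie is broken arbitrarily
top_by_rank = top_n("RANK")              # both tied Sales rows survive
print(len(top_by_row_number), len(top_by_rank))
```

With this data ROW_NUMBER() keeps five rows in total (three for Sales, two for IT), while RANK() keeps six because employees A and C share rank 3.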

7. Write a query to fetch the top 3 performing products based on sales.
Assume table sales_data has:
product_id, product_name, total_sales
SELECT product_id, product_name, total_sales
FROM sales_data
ORDER BY total_sales DESC
LIMIT 3;
Alternate using RANK() (if ties matter):
SELECT product_id, product_name, total_sales
FROM (
    SELECT *, RANK() OVER (ORDER BY total_sales DESC) AS rank_num
    FROM sales_data
) ranked_sales
WHERE rank_num <= 3;
8. Explain the difference between UNION and UNION ALL.
Feature     | UNION                                                    | UNION ALL
Duplicates  | Removes duplicates                                       | Keeps all rows, including duplicates
Performance | Slower (extra de-duplication step, often a sort or hash) | Faster (no de-duplication)
Use case    | When you want distinct rows                              | When duplicates are meaningful

Example:
SELECT city FROM customers
UNION
SELECT city FROM vendors;
→ Returns a unique list of cities.
SELECT city FROM customers
UNION ALL
SELECT city FROM vendors;
→ Returns all cities, including duplicates.
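A quick way to see the two operators side by side is to run both on a tiny in-memory SQLite database. The city values below are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT)")
conn.execute("CREATE TABLE vendors (city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("Pune",), ("Delhi",), ("Pune",)])
conn.executemany("INSERT INTO vendors VALUES (?)", [("Delhi",), ("Mumbai",)])

# UNION de-duplicates across (and within) both inputs.
union_rows = conn.execute(
    "SELECT city FROM customers UNION SELECT city FROM vendors ORDER BY city"
).fetchall()

# UNION ALL simply concatenates the two result sets.
union_all_rows = conn.execute(
    "SELECT city FROM customers UNION ALL SELECT city FROM vendors ORDER BY city"
).fetchall()

print(union_rows)
print(union_all_rows)
```

UNION returns three distinct cities, while UNION ALL returns all five input rows.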
9. Convert categorical variable into dummy variables

df_encoded = pd.get_dummies(df, columns=['Gender', 'City'], drop_first=True)

Explanation:
• get_dummies() converts categories into binary columns (one-hot encoding).
• drop_first=True avoids dummy variable trap (perfect collinearity).

Tip: For ML, prefer sklearn.preprocessing.OneHotEncoder for pipeline compatibility.

10. Explain p-value in hypothesis testing

The p-value is the probability of observing the sample result (or a more extreme one) if the null hypothesis is true.
Explanation:
Small p-value (commonly < 0.05) → reject null (evidence against H₀).
Large p-value → fail to reject null.

Tip: p-value ≠ probability that H₀ is true. It measures consistency of data with H₀.
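For a concrete feel, the sketch below computes an exact one-sided p-value for a coin-flip experiment using only the standard library (math.comb needs Python 3.8 or newer). The scenario of 60 heads in 100 flips and the helper name binomial_p_value are illustrative assumptions.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided exact p-value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# H0: the coin is fair. Observed: 60 heads in 100 flips.
p_value = binomial_p_value(60, 100)
print(round(p_value, 4))
```

The result falls below 0.05, so at that significance level the fair-coin hypothesis would be rejected. It is still the probability of data this extreme under H₀, not the probability that H₀ is true.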
11. What is a CTE (Common Table Expression), and how is it used?
Definition:
A CTE (Common Table Expression) is a temporary, named result set that you can
reference within a SQL query.
It improves readability and simplifies complex subqueries or recursive logic.
Syntax:

WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;

Example – Filter top-paid employees using CTE:

WITH HighEarners AS (
SELECT emp_id, name, salary
FROM employees
WHERE salary > 100000
)
SELECT * FROM HighEarners;
Benefits:
• Reusable and readable
• Allows recursion (e.g., hierarchical data)
• Avoids repeating subqueries
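The recursion benefit is easiest to see on a reporting hierarchy. The following sketch runs a recursive CTE in SQLite via Python's sqlite3 module; the employees rows and the CTE name org_chart are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", None), (2, "Bob", 1), (3, "Carol", 1), (4, "David", 2)],
)

levels = conn.execute(
    """
    WITH RECURSIVE org_chart(emp_id, name, level) AS (
        -- Anchor member: employees without a manager sit at level 0.
        SELECT emp_id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: each report is one level below its manager.
        SELECT e.emp_id, e.name, o.level + 1
        FROM employees e
        JOIN org_chart o ON e.manager_id = o.emp_id
    )
    SELECT name, level FROM org_chart ORDER BY level, name
    """
).fetchall()
print(levels)
```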

12. Write a query to identify customers who have made transactions above $5,000 multiple times.
Assume transactions table has:
customer_id, transaction_amount
SELECT customer_id, COUNT(*) AS high_value_txns
FROM transactions
WHERE transaction_amount > 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;
Explanation:
• Filters high-value transactions (> $5000).
• Groups them by customer.
• Returns customers who’ve done this more than once.

13. Explain the difference between DELETE and TRUNCATE commands.
Feature          | DELETE                                     | TRUNCATE
Removes rows     | Yes (can use WHERE condition)              | Yes (removes all rows)
WHERE supported? | Yes                                        | No
Logging          | Logs each deleted row (slower)             | Minimal logging (faster)
Rollback         | Can be rolled back (if within transaction) | Can be rolled back (in some RDBMS)
Identity reset   | Retains identity                           | Resets identity (in most DBs)
Use case         | Partial deletion or audit trail needed     | Full data wipe without audit needed

14. How do you optimize SQL queries for better performance?

Here are key SQL optimization techniques:
1. Select only the required columns

-- Bad
SELECT * FROM orders;
-- Good
SELECT order_id, customer_id FROM orders;
2. Create proper indexes
• Index frequently used columns in JOIN, WHERE, ORDER BY.
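3. Avoid functions on indexed columns

-- Slower (usually cannot use an index on order_date)
WHERE YEAR(order_date) = 2024
-- Better (half-open range; also correct when order_date stores a time of day)
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
4. Use EXISTS instead of IN (for subqueries)

-- Prefer EXISTS (often better for large datasets; confirm with the execution plan)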

SELECT name FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
5. Avoid unnecessary joins or nested subqueries
6. Use appropriate data types and avoid implicit conversions
7. Analyze execution plans (EXPLAIN or EXPLAIN ANALYZE)
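When a YEAR() filter is rewritten as a date range, the upper bound needs care if order_date stores a time of day. The sketch below, which assumes timestamps stored as ISO-8601 text in SQLite, compares three predicates on four invented orders; strftime stands in for YEAR(), which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2023-12-31 23:59:59"),
     (2, "2024-06-15 10:00:00"),
     (3, "2024-12-31 18:30:00"),   # last day of 2024, after midnight
     (4, "2025-01-01 00:00:00")],
)

def count_where(condition):
    """Count orders matching a fixed, hard-coded WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM orders WHERE {condition}").fetchone()[0]

by_function = count_where("strftime('%Y', order_date) = '2024'")
by_between = count_where("order_date BETWEEN '2024-01-01' AND '2024-12-31'")
by_half_open = count_where("order_date >= '2024-01-01' AND order_date < '2025-01-01'")
print(by_function, by_between, by_half_open)
```

The function-based filter and the half-open range agree on two orders, while BETWEEN with '2024-12-31' as the upper bound misses the order placed at 18:30 on December 31.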

15. Write a query to find all customers who have not made any purchases in the last 6 months.
Assume:
• customers(customer_id, name)
• transactions(customer_id, transaction_date)

SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN transactions t
    ON c.customer_id = t.customer_id
    AND t.transaction_date >= CURRENT_DATE - INTERVAL '6 months'
WHERE t.customer_id IS NULL;
Explanation:
• LEFT JOIN includes all customers.
• WHERE t.customer_id IS NULL ensures the customer had no purchase in the last 6
months.
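The same anti-join can be tried in SQLite through Python's sqlite3 module. This sketch swaps the interval arithmetic for SQLite's date() function and pins a hypothetical reference date (as_of) so the result is reproducible; all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi"), (3, "Meera")]
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, "2024-05-20"),   # inside the six-month window
     (2, "2023-08-01")],  # too old; Meera has no transactions at all
)

as_of = "2024-06-30"  # fixed stand-in for CURRENT_DATE
inactive = conn.execute(
    """
    SELECT c.customer_id, c.name
    FROM customers c
    LEFT JOIN transactions t
      ON c.customer_id = t.customer_id
     AND t.transaction_date >= date(?, '-6 months')
    WHERE t.customer_id IS NULL
    ORDER BY c.customer_id
    """,
    (as_of,),
).fetchall()
print(inactive)
```

Keeping the date filter in the ON clause matters: moving it to WHERE would discard the NULL-extended rows and turn the query back into an inner join.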

16. How do you handle NULL values in SQL? Provide examples.
NULL represents missing or unknown data.

1. Using IS NULL / IS NOT NULL:

SELECT * FROM employees WHERE manager_id IS NULL;

2. Replace NULL using COALESCE() or IFNULL() (MySQL):

SELECT name, COALESCE(phone_number, 'Not Provided') AS contact
FROM customers;

3. Handling NULLs in aggregation (e.g., AVG, SUM):

• These functions ignore NULLs by default.
SELECT AVG(salary) FROM employees;

4. Conditional checks:
SELECT name,
CASE
WHEN salary IS NULL THEN 'Unknown'
ELSE 'Known'
END AS salary_status
FROM employees;
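Two NULL behaviors tend to surprise people: equality comparisons never match NULL, and aggregates skip it. A minimal sqlite3 sketch with three invented rows illustrates both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 100), ("Ravi", None), ("Meera", 200)],
)

# "= NULL" never matches because the comparison evaluates to NULL, not TRUE.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE salary = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE salary IS NULL"
).fetchone()[0]

# AVG skips the NULL row; COALESCE turns it into 0 first, which lowers the average.
avg_ignoring_null, avg_null_as_zero = conn.execute(
    "SELECT AVG(salary), AVG(COALESCE(salary, 0)) FROM employees"
).fetchone()

print(equals_null, is_null, avg_ignoring_null, avg_null_as_zero)
```

Whether a missing salary should count as zero or be left out is a business decision, so it is worth stating which one a reported average uses.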

17. Write a query to transpose rows into columns.

Assume a table sales with:
region, month, sales_amount
We want to pivot month values into columns.
Using CASE:
SELECT region,
SUM(CASE WHEN month = 'Jan' THEN sales_amount ELSE 0 END) AS Jan,
SUM(CASE WHEN month = 'Feb' THEN sales_amount ELSE 0 END) AS Feb,
SUM(CASE WHEN month = 'Mar' THEN sales_amount ELSE 0 END) AS Mar
FROM sales
GROUP BY region;

Using PIVOT (SQL Server syntax; Oracle's PIVOT clause is similar but does not use square brackets):

SELECT region, [Jan], [Feb], [Mar]
FROM (
SELECT region, month, sales_amount
FROM sales
) AS src
PIVOT (
SUM(sales_amount)
FOR month IN ([Jan], [Feb], [Mar])
) AS p;
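The CASE-based pivot is portable, so it can be checked in SQLite (which has no PIVOT keyword) with Python's sqlite3 module. The sales rows below are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, sales_amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Jan", 100), ("North", "Jan", 50), ("North", "Feb", 150),
     ("South", "Mar", 200)],
)

# One SUM(CASE ...) per target column; months with no rows fall back to 0.
pivoted = conn.execute(
    """
    SELECT region,
           SUM(CASE WHEN month = 'Jan' THEN sales_amount ELSE 0 END) AS Jan,
           SUM(CASE WHEN month = 'Feb' THEN sales_amount ELSE 0 END) AS Feb,
           SUM(CASE WHEN month = 'Mar' THEN sales_amount ELSE 0 END) AS Mar
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(pivoted)
```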
18. Explain indexing and how it improves query performance.

What is an index?
An index is a data structure that improves the speed of data retrieval operations on a
database table at the cost of additional space and write-time performance.

How indexing helps:

Feature            | With Index                           | Without Index
Search performance | Fast (uses binary/tree search)       | Slow (scans every row, i.e., a full scan)
Used in            | WHERE, JOIN, ORDER BY, GROUP BY      | Inefficient for large datasets
Types              | B-tree (default), Bitmap, Hash, etc. | -

Example:
-- Creating index
CREATE INDEX idx_customer_id ON transactions(customer_id);
• This helps queries like:
SELECT * FROM transactions WHERE customer_id = 101;

Important notes:
• Too many indexes can slow down INSERT/UPDATE.
• Avoid indexing columns with low cardinality (e.g., gender).
• Use composite indexes when querying multiple columns together.
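One way to confirm that an index is actually picked up is to compare the query plan before and after creating it. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the plan helper is hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER, amount REAL)"
)

def plan(sql):
    """Return the human-readable detail text of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail is the last column

query = "SELECT * FROM transactions WHERE customer_id = 101"

plan_without_index = plan(query)  # expected to be a full table scan
conn.execute("CREATE INDEX idx_customer_id ON transactions(customer_id)")
plan_with_index = plan(query)     # expected to search via idx_customer_id

print(plan_without_index)
print(plan_with_index)
```

Other databases expose the same idea through EXPLAIN or EXPLAIN ANALYZE, with different output formats.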

19. Write a query to fetch the maximum transaction amount for each customer.
Assume a transactions table:
Column         | Description
customer_id    | ID of the customer
transaction_id | Unique transaction ID
amount         | Transaction amount
Query:
SELECT customer_id, MAX(amount) AS max_transaction
FROM transactions
GROUP BY customer_id;
Explanation:
• GROUP BY groups all transactions by customer.
• MAX(amount) returns the highest transaction for each group (customer).

20. What is a self-join, and how is it used?

Definition:
A self-join is a regular join where a table is joined with itself.
It is useful when rows in a table are related to other rows in the same table.

Example Use Case – Employees and Managers:

Assume:
emp_id | name  | manager_id
1      | Alice | NULL
2      | Bob   | 1
3      | Carol | 1
4      | David | 2
Here, manager_id refers to emp_id of another employee.
Query: Get employee names along with their manager names
SELECT e.name AS employee_name, m.name AS manager_name
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.emp_id;
Explanation:
• e is an alias for employees (as employee).
• m is another alias for the same table (as manager).
• The join links an employee to their manager using manager_id = emp_id.

Data Analysis/Scenario-Based Questions

21. How would you design a database to store credit card transaction data?
To store credit card transaction data, we need to normalize the structure while keeping it
scalable, secure, and query-efficient.
Suggested Schema Design:
1. Customers Table: customer_id (PK), name, email, phone, address
2. Cards Table: card_id (PK), customer_id (FK), card_number (masked), card_type, status, issued_date
3. Merchants Table: merchant_id (PK), name, category, location
4. Transactions Table: transaction_id (PK), card_id (FK), merchant_id (FK), transaction_date, amount, currency, status, location

Best Practices:
• Mask sensitive fields (like card numbers).
• Store card_number as encrypted or tokenized.
• Use partitioning on date fields for faster querying.
• Add indexes on card_id, merchant_id, transaction_date.
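The schema above could be sketched as SQLite DDL and loaded through Python's sqlite3 module. Column types, the index names, and the orphan-row check are assumptions added for illustration; partitioning and encryption are database-specific and are not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT, email TEXT, phone TEXT, address TEXT
);
CREATE TABLE cards (
    card_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    card_number TEXT,            -- store a masked or tokenized value, never the raw number
    card_type TEXT, status TEXT, issued_date TEXT
);
CREATE TABLE merchants (
    merchant_id INTEGER PRIMARY KEY,
    name TEXT, category TEXT, location TEXT
);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    card_id INTEGER NOT NULL REFERENCES cards(card_id),
    merchant_id INTEGER NOT NULL REFERENCES merchants(merchant_id),
    transaction_date TEXT, amount REAL, currency TEXT, status TEXT, location TEXT
);
CREATE INDEX idx_txn_card ON transactions(card_id);
CREATE INDEX idx_txn_merchant ON transactions(merchant_id);
CREATE INDEX idx_txn_date ON transactions(transaction_date);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

tables = {row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}

# A transaction pointing at a non-existent card should be rejected.
try:
    conn.execute(
        "INSERT INTO transactions (card_id, merchant_id, transaction_date, amount, currency, status) "
        "VALUES (999, 999, '2024-01-01', 10.0, 'INR', 'approved')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(sorted(tables), orphan_rejected)
```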

22. Write a query to identify the most profitable regions based on transaction data.
Assume a transactions table:
(transaction_id, customer_id, amount, region, transaction_date)
Query to find top 3 profitable regions:
SELECT region, SUM(amount) AS total_revenue
FROM transactions
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 3;
Explanation:
• Aggregates transaction amounts per region.
• Orders regions by total revenue.
• Retrieves top 3 using LIMIT.
Optional: You could also calculate profit by subtracting costs (if a cost column is present).

23. How would you analyze customer churn using SQL?

Step-by-step SQL approach:

Step 1: Define churn

Let’s say a churned customer is one who hasn’t transacted in the last 6 months.

Step 2: Sample schema

• customers(customer_id, name, signup_date)
• transactions(customer_id, transaction_date, amount)
Step 3: Query to identify churned customers
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN transactions t
ON c.customer_id = t.customer_id
AND t.transaction_date >= CURRENT_DATE - INTERVAL '6 months'
WHERE t.customer_id IS NULL;

Step 4: Analyze churn metrics

You could extend this analysis by calculating:
• Churn rate = (Churned Customers / Total Customers) * 100
• Monthly churn trend
• Average spend of churned vs. active customers
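The churn-rate formula can be computed in a single query. Below is a minimal sqlite3 sketch with invented customers and a fixed reference date (as_of) standing in for CURRENT_DATE; the subquery alias recent is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi"), (3, "Meera"), (4, "Kabir")],
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2024-06-01", 100), (1, "2024-06-10", 40),  # active, two recent purchases
     (2, "2023-09-15", 50),                          # churned: last purchase too old
     (3, "2024-02-10", 75)],                         # active; Kabir never purchased
)

as_of = "2024-06-30"
churn_rate = conn.execute(
    """
    SELECT 100.0 * SUM(CASE WHEN recent.customer_id IS NULL THEN 1 ELSE 0 END) / COUNT(*)
    FROM customers c
    LEFT JOIN (
        -- DISTINCT keeps one row per active customer so nobody is counted twice.
        SELECT DISTINCT customer_id
        FROM transactions
        WHERE transaction_date >= date(?, '-6 months')
    ) recent ON c.customer_id = recent.customer_id
    """,
    (as_of,),
).fetchone()[0]
print(churn_rate)
```

Two of the four customers have no purchase in the window, so the rate comes out at 50 percent.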

24. Explain the difference between OLAP and OLTP databases.

Feature         | OLTP (Online Transaction Processing)         | OLAP (Online Analytical Processing)
Purpose         | Handles real-time transactional queries      | Used for analytical/reporting queries
Operations      | INSERT, UPDATE, DELETE                       | SELECT (aggregate, group, slice, dice)
Data Structure  | Highly normalized (3NF)                      | De-normalized (star/snowflake schema)
Speed           | Fast for read/write of single rows           | Fast for complex analytical queries
Examples        | Banking systems, e-commerce order processing | Business intelligence, dashboards, sales trends
Users           | Clerks, DBAs                                 | Analysts, Data Scientists
Backup/Recovery | Essential and frequent                       | Less frequent

In short:
• OLTP = operational, fast, real-time transactions.
• OLAP = analytical, slow-changing, historical data.

25. How would you determine the Average Revenue Per User (ARPU) from transaction data?
ARPU = Total Revenue / Total Number of Users
Assume a transactions table:
(transaction_id, customer_id, amount, transaction_date)
SQL Query:
SELECT
SUM(amount) * 1.0 / COUNT(DISTINCT customer_id) AS ARPU
FROM transactions;
Explanation:
• SUM(amount) gets total revenue.
• COUNT(DISTINCT customer_id) counts unique users.
• Multiply by 1.0 to ensure float division.
You can also compute monthly ARPU by grouping by month.
SELECT
DATE_TRUNC('month', transaction_date) AS month,
SUM(amount) * 1.0 / COUNT(DISTINCT customer_id) AS monthly_arpu
FROM transactions
GROUP BY month
ORDER BY month;
26. Describe a scenario where you would use a LEFT JOIN instead of an INNER JOIN.
Use LEFT JOIN when:
You want all records from the left table, even if there's no matching record in the right
table.

Real-life Scenario:
Question: List all customers and their transactions — even if they haven't made any.
Query:
SELECT c.customer_id, c.name, t.transaction_id, t.amount
FROM customers c
LEFT JOIN transactions t
ON c.customer_id = t.customer_id;
Why LEFT JOIN?
• Shows all customers, including those with no transactions (returns NULLs for
those).
• Using INNER JOIN would exclude customers with zero activity.

27. Write a query to calculate YoY (Year-over-Year) growth for a set of transactions.
Assume a table named transactions with:
(customer_id, transaction_date, amount)
Step 1: Extract year-wise revenue
SELECT
EXTRACT(YEAR FROM transaction_date) AS year,
SUM(amount) AS total_revenue
FROM transactions
GROUP BY EXTRACT(YEAR FROM transaction_date);
Step 2: Calculate YoY Growth using a CTE and Self-Join
WITH yearly_revenue AS (
SELECT
EXTRACT(YEAR FROM transaction_date) AS year,
SUM(amount) AS total_revenue
FROM transactions
GROUP BY EXTRACT(YEAR FROM transaction_date)
)
SELECT
    curr.year AS current_year,
    curr.total_revenue,
    prev.total_revenue AS previous_year_revenue,
    -- 100.0 forces decimal division; NULLIF avoids division by zero
    ROUND((curr.total_revenue - prev.total_revenue) * 100.0 / NULLIF(prev.total_revenue, 0), 2) AS yoy_growth_percent
FROM yearly_revenue curr
LEFT JOIN yearly_revenue prev
    ON curr.year = prev.year + 1;
Explanation:
• Joins each year to its previous year.
• Computes YoY growth as a percentage.
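Here is a small, self-contained run of the CTE and self-join approach on SQLite via Python's sqlite3 module. strftime replaces EXTRACT (which SQLite lacks), and the transaction rows are invented so that the expected percentages are easy to verify by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2022-03-01", 600), (2, "2022-09-01", 400),  # 2022 total: 1000
     (1, "2023-05-01", 1200),                         # 2023 total: 1200
     (2, "2024-02-01", 900)],                         # 2024 total: 900
)

yoy = conn.execute(
    """
    WITH yearly_revenue AS (
        SELECT CAST(strftime('%Y', transaction_date) AS INTEGER) AS year,
               SUM(amount) AS total_revenue
        FROM transactions
        GROUP BY CAST(strftime('%Y', transaction_date) AS INTEGER)
    )
    SELECT curr.year,
           ROUND((curr.total_revenue - prev.total_revenue) * 100.0
                 / NULLIF(prev.total_revenue, 0), 2) AS yoy_growth_percent
    FROM yearly_revenue curr
    LEFT JOIN yearly_revenue prev ON curr.year = prev.year + 1
    ORDER BY curr.year
    """
).fetchall()
print(yoy)
```

The first year has no predecessor, so its growth is NULL (None in Python) rather than zero. Where window functions are available, LAG(total_revenue) OVER (ORDER BY year) is a common alternative to the self-join.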

28. How would you implement fraud detection using transactional data?
Fraud detection typically involves pattern recognition, anomaly detection, and rule-based filtering.
Possible SQL-Based Checks:
Type              | Rule
Unusual Amounts   | Flag transactions > 3x average amount of that user
Rapid Repeats     | Detect multiple transactions from same user within seconds
Location Mismatch | Transactions from different countries within a short time
Card Sharing      | Same card used by different customers or IPs
Example Query – Unusual high amount per user:
WITH avg_txn AS (
SELECT customer_id, AVG(amount) AS avg_amount
FROM transactions
GROUP BY customer_id
)
SELECT t.*
FROM transactions t
JOIN avg_txn a
ON t.customer_id = a.customer_id
WHERE t.amount > 3 * a.avg_amount;
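To see the 3x-average rule fire, the sketch below runs the same query on a handful of invented transactions using Python's sqlite3 module. Note that the average includes the suspicious transaction itself, which raises the threshold; excluding it, or using a median, would be a design variation not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 10), (1, 10), (1, 200),  # avg 48, threshold 144
     (2, 100), (2, 120), (2, 110)],                 # avg 110, nothing above 330
)

flagged = conn.execute(
    """
    WITH avg_txn AS (
        SELECT customer_id, AVG(amount) AS avg_amount
        FROM transactions
        GROUP BY customer_id
    )
    SELECT t.customer_id, t.amount
    FROM transactions t
    JOIN avg_txn a ON t.customer_id = a.customer_id
    WHERE t.amount > 3 * a.avg_amount
    """
).fetchall()
print(flagged)
```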
29. Write a query to find customers who have used more than 2 credit cards for transactions in a given month.
Assume a transactions table:
(customer_id, card_id, transaction_date)
Query:
SELECT customer_id,
TO_CHAR(transaction_date, 'YYYY-MM') AS txn_month,
COUNT(DISTINCT card_id) AS cards_used
FROM transactions
GROUP BY customer_id, TO_CHAR(transaction_date, 'YYYY-MM')
HAVING COUNT(DISTINCT card_id) > 2;
Explanation:
• Groups by customer_id and month.
• Counts distinct card_id used.
• Filters where more than 2 cards were used in a month.

30. How would you approach a business problem where you need to analyze the spending patterns of premium customers?
Step-by-Step Structured Approach:

Step 1: Understand the Objective

• Clarify with stakeholders what "spending pattern" means.
o Is it frequency, amount, category, channel, or timing?
• Define “premium customer”:
o Based on credit score, card tier (e.g., Platinum, Centurion), monthly spend
threshold, etc.

Step 2: Data Collection

• Gather relevant datasets:
o Customer table (ID, tier, demographics)
o Transactions table (amount, date, category, location)
o Cards table (card_type, limits, activation)

Step 3: Data Cleaning & Preparation

• Handle missing values and outliers.
• Filter only premium customers using defined criteria.
• Enrich data (e.g., categorize merchant types or locations).

Step 4: Exploratory Data Analysis (EDA)

Use SQL/Python/Power BI to derive insights like:
Focus Area   | Example Analysis
Spend Amount | Average monthly/yearly spend
Time Trends  | Seasonality or weekly spending behavior
Categories   | Where they spend most (Travel, Dining, Shopping)
Geography    | City or region-wise behavior
Trends       | Is their spend increasing/decreasing YoY?

Step 5: Segmentation
• Use clustering or thresholds to group premium customers into:
o High spenders
o Frequent spenders
o Category loyalists (e.g., only travel)

• Identify anomalies or subgroups with unique patterns.

Step 6: Business Recommendations

• Personalize rewards or offers based on their dominant categories.
• Enhance retention strategies for segments showing decline.
• Promote premium card upgrades based on usage patterns.

Bonus: Sample SQL Query

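Get top 3 spending categories of premium customers monthly:
SELECT customer_id, txn_month, category, total_spend
FROM (
    SELECT customer_id,
           DATE_TRUNC('month', transaction_date) AS txn_month,
           category,
           SUM(amount) AS total_spend,
           -- rank categories by spend within each customer and month
           ROW_NUMBER() OVER (
               PARTITION BY customer_id, DATE_TRUNC('month', transaction_date)
               ORDER BY SUM(amount) DESC
           ) AS rn
    FROM transactions
    WHERE customer_id IN (
        SELECT customer_id FROM customers WHERE tier = 'Premium'
    )
    GROUP BY customer_id, DATE_TRUNC('month', transaction_date), category
) ranked
WHERE rn <= 3
ORDER BY customer_id, txn_month, total_spend DESC;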
