KEMBAR78
DBA Interview Questions | PDF | Principal Component Analysis | Factor Analysis
100% found this document useful (1 vote)
153 views21 pages

DBA Interview Questions

ok

Uploaded by

Mehul Prajapati
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
153 views21 pages

DBA Interview Questions

ok

Uploaded by

Mehul Prajapati
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 21

05/05/2018 50 Data Analyst Interview Questions

ABOUT INDEX WRITE FOR US

HOME SAS R PYTHON DATA SCIENCE SQL EXCEL VBA SPSS RESOURCES

INFOGRAPHICS MORE

SEARCH... GO

Home » Data Analyst Interview Questions » 50 Data Analyst Interview Follow us on Facebook
Questions

50 DATA ANALYST INTERVIEW QUESTIONS Join us with 5000+ Subscr


Deepanshu Bhalla 5 Comments Subscribe to Free Updates
Data Analyst Interview Questions
Enter your email... Su

This tutorial explains common and tricky data analyst


interview questions with answers. The main responsibility
of data analyst is to generate insights from data and
present it to stakeholders such as external or internal
clients. During this process, he/she extracts data from
database and then clean it up to prepare it for analysis.
Later data analysis step involves exploration of data with
descriptive statistics and then building predictive model for
prediction. Data analyst must know basics and
intermediary statistics and know how to apply it with SAS /
SPSS. Excel and SQL are the two most popular tools used
in data analytics so candidate must possess a good
knowledge of these tools. Excel is used for a variety of
purposes such as generating quick summaries and
presenting it in an interactive excel dashboard. Mostly
offline reporting deliverables are in either excel or
powerpoint report formats.

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 1/21
05/05/2018 50 Data Analyst Interview Questions

This tutorial covers interview questions on the


following topics:

1. MS Excel
2. Basic and Intermediate Statistics
3. SAS
4. SQL
5. HR / Project related questions

Excel Questions

The following is a list of some tricky or advanced excel


interview questions.

1. What is the default value of last parameter of


VLOOKUP?

TRUE/1 . It refers to finding the closest (approximate)


match and assuming the table is sorted in ascending order.
Whereas, FALSE/0 refers to exact match.

2. What is the main limitation of VLOOKUP function?

The lookup value should be at the most left side column in


the table array. VLOOKUP only looks right. It cannot look
right to left.

3. Does VLOOKUP look up case-sensitive values?

No, it is not case-sensitive. The text 'ram' and 'RAM' is


identical for VLOOKUP.

4. 2 ways to extract unique values in excel

Use Advanced Filter option (shortcut key : ALT D F A) and


'Remove Duplicates' option under Data tab.

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 2/21
05/05/2018 50 Data Analyst Interview Questions

5. How to find duplicates in a column?

Use CONDITIONAL FORMATING to highlight duplicate


values. OR use COUNTIF function as shown below. For
example, values are stored in cells D4:D7.

=COUNTIF(D4:D7,D4)

Apply filter on the column wherein you applied COUNTIF


function and select values greater than 1.

6. How to insert a drop down?

Go to Data tab >> Select Data Validation. Another way to


insert a drop down is to enable Developer tab and Insert
Combo box.

7. How to sum values based on some conditions?

Use SUMIF or SUMPRODUCT functions. The SUMIF


function is explained below -

=SUMIF(range, criteria, sum_range)


=SUMIF(A2:A5,"A",B2:B5)

Excel : SUMIF Function

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 3/21
05/05/2018 50 Data Analyst Interview Questions

8. How to create cross tabulation in Excel?

Use Pivot Table and select one variable in


Row label and the other variable in Column
label.

9. What is Excel Array Formula?

Excel Array Formula Explained

10. How to extract First Name from a full name?

Suppose you need to pull 'Neha' from 'Neha Sharma'. Use


MID and FIND functions.

=MID (A2,1,FIND(" ",A2)-1)

Tutorial : Practical Uses of MID Function

11. How Index and Match Function works?

Index function returns a value from a range based on row


number.

= INDEX(range, row_number)

See the image below -

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 4/21
05/05/2018 50 Data Analyst Interview Questions

Excel : Index Function

In this case, we are telling EXCEL to return second value


of the range A2:A4. It returns 30.

Match function returns the relative position of a value in


range.

= MATCH(lookup_value, range, match_type)

match_type can be exact match, largest/smallest value that


is less than or greater than equal to lookup_value.

Excel : Match Function

In this case, we are asking EXCEL to find the relative


position of 30 in the range A2:A4. It returns 2.

12. How Index and Match Function works together?

=INDEX(range, MATCH(lookup_value,
lookup_range, match_type))

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 5/21
05/05/2018 50 Data Analyst Interview Questions

Suppose information of Product and Sales are stored in


columns A and B. You need to look for product against
sales value so you need to tell EXCEL to look from right to
left as sales value is placed in the right hand side of the
range/table.

=INDEX(A2:A5,MATCH(45,B2:B5,0))

Nested INDEX MATCH Excel Functions

Basic and Intermediate Statistics

The following questions touch upon some basics and


intermediate statistics topics. These topics are generally
taught in undergraduate / graduate courses.

1. What is p-value?

It is the lowest level of significance at which you can reject


the null hypothesis. If p-value < 0.05, you reject the null
hypothesis at 5% level of significance.

2. Difference between MEAN. MEDIAN, MODE

Mean is calculated by summing all the values divided by


number of observations. Median is the middle value. And

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 6/21
05/05/2018 50 Data Analyst Interview Questions

Mode is the most occurring value.

3. In which data types MEAN, MEDIAN and MODE


are more suitable?

MEAN is suitable for continuous data with no outliers. It is


affected by extreme values (Outliers).
MEDIAN is suitable for continuous data with outliers or
ordinal data. Mode is suitable for categorical data
(including both nominal and ordinal data).

4. Different Types of Sampling Techniques?

There are following four main types of sampling


techniques.

1. Simple random sampling


2. Stratified sampling
3. Cluster sampling
4. Systematic sampling

5. Difference between Cluster and Stratified


Sampling?

The main difference between cluster and stratified


sampling is that in stratified sampling all the strata need to
be sampled. In cluster sampling one proceeds by first
selecting a number of clusters at random and then
sampling each cluster or conduct a census of each cluster.
But usually not all clusters would be included.

6. When should we use T-test than Z-test?

Theoretically, we should use T-test when sample size (N) is


less than 30. Practically, we always use t-test. It is because
t-test and z test are equivalent as N tends to infinity.

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 7/21
05/05/2018 50 Data Analyst Interview Questions

7. What is the difference between R-square and


Adjusted R-square?
Check out this link - R-square vs. Adjusted R-
square

8. How to detect outliers?

Box Plot Method - If a value is higher than the 1.5*IQR


above the upper quartile (Q3), the value will be considered
as outlier. Similarly, if a value is lower than the 1.5*IQR
below the lower quartile (Q1), the value will be considered
as outlier.

Standard Deviation Method - If a value is higher than the


mean plus or minus three Standard Deviation is considered
as outlier.

Ways to detect Outliers with SAS

9. Define Homoscedasticity?

In a linear regression model, there should be homogeneity


of variance of the residuals. In other words, the variance of
residuals are approximately equal for all predicted
dependent variable values.

Check Homoscedasticity with SAS

10. Difference between Standardized and


Unstandardized Coefficients?

To calculate Standardized Coefficients, first we need to


standardize both dependent and independent variables
and use the standardized variables in the regression model
to get standardized estimates. By 'standardize', it implies
subtracting mean from each observation and divide that by
the standard deviation. The standardized coefficient is

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 8/21
05/05/2018 50 Data Analyst Interview Questions

interpreted in terms of standard deviation. Whereas,


unstandardized coefficient is measured in actual units.

11. Difference between Factor Analysis and Principal


Component Analysis?

Both the analysis are very much similar but they are
different in terms of calculation and their practical usage :

1. In Principal Components Analysis, the components


are calculated as linear combinations of the raw
input variables. In Factor Analysis, the raw input
variables are defined as linear combinations of the
factors.
2. The main idea of using PCA is to explain as much of
the total variance in the variables as possible.
Whereas, the factor analysis explains the
covariances or correlations between the variables.
3. PCA is used when we need to reduce the number of
variables (dimensionality reduction) whereas FA is
used when we need to group variables into some
factors.

12. Difference between Linear and Logistic


Regression?

There are more than 10 differences between these two


algorithms. Check out the link below -
Linear vs. Logistic Regression

13. How to statistically compare means between


groups?

Use Independent T-test when a continuous variable and a


categorical variable having two independent categories.

Use Paired T-test when a continuous variable and a

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 9/21
05/05/2018 50 Data Analyst Interview Questions

categorical variable having two dependent or paired


categories.

Use one way ANOVA when a continuous variable and a


categorical variable having more than two independent
categories.

Use GLM Repeated Measures when a continuous


variable and a categorical variable more than two
dependent categories.

14. Explain eigenvalues and eigenvectors intuitively

Eigenvalues are variances explained by principal


components. By 'variances', i mean the diagonal values of
the covariance matrix below -

x y z

1.34 -0.16 0.19

-0.16 0.62 -0.13

0.19 -0.13 1.49

The sum of the diagonal values is 3.45.

Why eigenvalue greater than 1 is considered to retain


components? It is because the average eigenvalue will be
1, so > 1 implies higher than average.

Eigenvectors are the coefficients of orthogonal


(uncorrelated) transformation of variables into principal
components.

SAS

The following questions would help you to prepare for SAS


https://www.listendata.com/2017/01/data-analyst-interview-questions.html 10/21
05/05/2018 50 Data Analyst Interview Questions

interview round.

1. Difference between WHERE and IF statement?


WHERE statement can be used in procedures
to subset data while IF statement cannot be
used in procedures.
WHERE can be used as a data set option while
IF cannot be used as a data set option.
WHERE statement is more efficient than IF
statement. It tells SAS not to read all
observations from the data set
WHERE statement can not be used when
reading data using INPUT statement whereas
IF statement can be used.
When it is required to use newly created
variables, use IF statement as it doesn't require
variables to exist in the READIN data set.

2. How PROC MEANS works?

PROC MEANS DATA = dataset_name;


VAR analysis_variable;
CLASS grouping_variable;
RUN;

Detailed Explanation : PROC MEANS

3. Difference between INFORMAT and FORMAT?

Informat is used to read data whereas Format is used to


write or display data.

4. Difference between NODUPKEY and NODUP in


PROC SORT?

The NODUPKEY option removes duplicate observations


where value of a variable listed in BY statement is repeated
https://www.listendata.com/2017/01/data-analyst-interview-questions.html 11/21
05/05/2018 50 Data Analyst Interview Questions

while NODUP option removes duplicate observations


where values in all the variables are repeated.

5. How many maximum characters SAS library


name can take?

A valid library name must start with an alphabet and cannot


have more than 8 characters.

6. Which is more faster - Proc SQL or SAS data


step?

The SQL procedure performed better with the smaller


datasets whereas the data step performed better with the
larger datasets (more than approx. 100 MB).

7. Two main advantages of Proc SQL Joins over


Data Step Merging?

1. Proc SQL JOINS do not require variables to be


sorted prior to joining them whereas Data Step
Merging requires.
2. Proc SQL works perfectly when key variables have
different names

8. What would happen if i don't use 'BY statement in


MERGE?

Without 'BY' statement, SAS will merge the 1st observation


from dataset A and 1st observation from dataset B to form
the 1st observation of the final dataset. It might lead to
meaningless results.

9. What are the ways to create a macro variable?

There are 5 ways to create macro variables:

%Let

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 12/21
05/05/2018 50 Data Analyst Interview Questions

Iterative %DO statement


Call Symput
Proc SQl into clause
Macro Parameters

10. How to rename columns with PROC SQL?

Use AS alias.

Proc SQL;
select name as fullname from table1;
quit;

11. How to calculate percentile values with SAS?

We can use PROC MEANS or PROC UNIVARIATE to


calculate percentile values. For example, specify options
P10, P90 to calculate 10th and 90th percentile score.
PROC MEANS cannot calculate custom percentile values
such as 97.5th or 99.5th percentile. To calculate these
custom percentile values, you can use PCTLPTS= option
in PROC UNIVARIATE.

Tutorial : PROC UNIVARIATE

12. How to replace missing values of all the numeric


variables to 0 in a single run?

We can use _numeric_ to specify all the numeric variables


and dim functions in array to count the number of numeric
variables.

data temp;
set sampledata;

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 13/21
05/05/2018 50 Data Analyst Interview Questions

array Q(*) _numeric_;


do i= 1 to dim(Q);
if Q(i) = . then Q(i)= 0;
end;
run;

SQL

1. How to write conditional statements (IF ELSE) in


SQL?

In SQL, it is possible with CASE WHEN statements.

select
case when sex='M' then 1 else 0 end as
males
, case when sex='F' then 1 else 0 end
as females
from sashelp.class;
quit;

2. What are the common SQL data types ?

Data Type Format

Numeric NUMBER(L,D),
INTEGER,SMALLINT,DECIMAL(L,D)

Charact CHAR(L),VARCHAR(L)
er

Date DATE

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 14/21
05/05/2018 50 Data Analyst Interview Questions

3. How to subset or filter data in SQL?

We can use WHERE clause to subset or filter data.

SELECT *
FROM PRODUCT
WHERE SALES > 200

4. Difference between WHERE and HAVING clauses

The HAVING clause comes into effect when GROUP BY is


used. It runs after GROUP BY so it filters grouping
variable. Whereas WHERE clause runs prior to GROUP
BY clause so it does not filter a grouping variable.

5. Difference between Full Join and Cross Join?

A full join keeps all rows from both of the input tables even
if we cannot find a matching row.
Cross Join returns cartesian product of tables. It matches
every row of one table with every row of another table.

6. Difference between UNION and UNION ALL

The main use of UNION and UNION ALL is to join two


tables. The main difference between them is that UNION
removes duplicate records and UNION ALL keeps the
duplicate records. By 'duplicate records', all the values of
two or more rows are same.

7. How to create a blank table

Method I :

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 15/21
05/05/2018 50 Data Analyst Interview Questions

The following method creates a new table called temp2


with the same column names and attributes of table temp.

CREATE TABLE TEMP2 LIKE TEMP;

Method II :

In this case, we are creating a blank table by subsetting


data method. As 1 is not equal to 2, it returns zero row.

create table temp3 as


select * from temp
where 1=2;

8. What will be the result of the query below?

select case when null = null then 'Yes' else


'No' end as Result;

It will return NO as the above code is not right way to


compare null values. The correct way would be to use 'is'
keyword to compare -

select case when null is null then 'Yes' else


'No' end as Result;

9. Suppose you have a table named TEMP. You


need to recode values of column Y, Swap values 2
and 3 in column Y

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 16/21
05/05/2018 50 Data Analyst Interview Questions

Table

UPDATE TEMP
SET Y= CASE WHEN Y = 2 THEN 3
WHEN Y = 3 THEN 2
ELSE Y END;

10. Identify second maximum value

select max(y) from temp


where y not in (select max(y) from temp);

In this code, the logic is to remove the maximum value


from the main table and then calculate the max value to
figure out the second max value of the main table.

11. Identify second maximum value by a group

The code below first removes all the max values by a


group from the main table. Later we calculated the second
max value by a group.

select a.x, max(a.y) as maxy from temp a left


join
(select x, max(y) as maxy from temp group
by 1) b
on a.x = b. x and a.y = b.maxy
where b.x is null and b.maxy is null
group by 1;
https://www.listendata.com/2017/01/data-analyst-interview-questions.html 17/21
05/05/2018 50 Data Analyst Interview Questions

12. Is the query below correct? If not, what's the


issue?

SELECT custid, YEAR(ref_date) AS ref_year


FROM custmart
WHERE ref_year >= 2015;

The calculated column cannot be used in WHERE


condition so we need to modify the above code like this -

SELECT custid, YEAR(ref_date) AS ref_year


FROM custmart
WHERE YEAR(ref_date) >= 2015;

HR / Project Related Questions

1. Explain one of your project

Start from problem definition


Explain Data Cleaning, Exploration and Data
Preparation Steps
What technique / algorithm is used in the
project?
Financial (Dollar) value impact of the project

2. What are your strengths and weaknesses?

3. Why are you leaving the current organization?

4. Why should we hire you?

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 18/21
05/05/2018 50 Data Analyst Interview Questions

5. Where do you see yourself five years from now?

6. What was the toughest decision you ever had to


make?

End Notes

The above list of questions would assist you in preparing


for interviews for roles of senior / lead data analyst. Don't
just mug up answers, understand the concepts of the
topics covered in these questions.

Love this Post? Spread the Word


Facebook LinkedIn Twitter

About Author:
Deepanshu founded ListenData with a simple objective - Make analytics easy to
understand and follow. He has over 7 years of experience in data science and
predictive modeling. During his tenure, he has worked with global clients in
various domains like banking, Telecom, HR and Health Insurance.

While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

Get Free Email Updates :


Enter your email address Submit

*Please confirm your email address by clicking on the link


sent to your Email*

Related Posts:
50 Data Analyst Interview Questions

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 19/21
05/05/2018 50 Data Analyst Interview Questions

5 Responses to "50 Data Analyst Interview


Questions"

Anonymous 9 February 2017 at 11:36


Excellent blog

Reply

Anonymous 9 February 2017 at 11:37


Please add some more question related to excel and sas
base

Reply

Unknown 15 November 2017 at 23:46

extremely helpful and up to the mark blog. great job

Reply

Laxmiprasad DB 21 November 2017 at 21:00


3rd Answer is wrong. It should be 'Yes' cos its case
sensitive.

Reply

Replies

Deepanshu Bhalla 22 November 2017 at


12:30
It is not case-sensitive. can you try it once
using excel?

Reply

Enter your comment...

Comment as: Select profile...

Publish Preview

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 20/21
05/05/2018 50 Data Analyst Interview Questions

← PREV NEXT →

Copyright 2017 ListenData

https://www.listendata.com/2017/01/data-analyst-interview-questions.html 21/21

You might also like