Data Analysis Crash Course
Property of ICTech Solutions
MODULES
1. Review of Data Analysis Concepts
2. MS Excel for Data Analysis
3. IBM SPSS for Data Analysis
4. Programming: Python and R for Data Analysis
5. Exercises and Extras
MODULE 1
Review of Data Analysis Concepts
Data Analysis Concepts
Population
The entire collection from which data is sourced
Sample
A smaller set of the population that can represent the population
The sample can be the population
Respondent
Individual unit of the sample, in the case of a questionnaire or survey
Data
Records taken from the respondents, used for decision making or hypothesis testing
Hypothesis
A guess, a question based on the collected data. The statistician intends to reject the opposite of the guess (called a null
hypothesis)
Data Analysis Concepts
Data Measurement Levels
1. Nominal. Categorical data, order not important. E.g respondents’ religion (Christianity, Islam, etc)
2. Ordinal. Categorical data, order is important. E.g. respondents’ complexion (Dark, Chocolate, Fair, White)
3. Interval. Numerical data, discrete. E.g. respondents’ age (30, 50, 25)
4. Ratio. Numerical data, continuous. E.g. respondents’ weight (70, 65.8, 66.03)
Data Preparation
Involves different techniques employed by the analyst to achieve a desired result with the data. These
include sorting, filtering, coding, outlier management, rearrangement, hiding/unhiding, etc.
Data Visualisation
Using charts, graphs or other pictorial tools to present a part or the whole of the data in a more meaningful
way
Data Analysis Concepts
Descriptive Statistics
5 number summary is used to describe numerical data. It shows the lower quartile, median, upper quartile,
minimum and maximum. Boxplots can present these pictorially
Measures central tendency include mean, median and mode. These are used for numerical data, and work
best for data that are parametric (approximately normally distributed)
Measures of dispersion (or spread) include other statistics like range, quartiles and percentiles, standard
deviation and variance.
For categorical data, pictorial descriptives are most effective. These include forms like bar and contingency
charts
Test Statistics
Depending on the kind of data and the required question, several statical tests can be done, with each having
their test statistics. These tests include T-tests, ANOVAs, Chi-Square tests, etc., and non-parametric tests
MODULE 2
MS Excel for Data Analysis
Excel for Data Analysis
Table Operations
• Use CTRL + T to convert your data
into a table for easier analysis. Be sure
to select the required data range
• Click the arrow that appears on the
table headers. It will bring a drop down
menu from where further options like
sorting and filtering are possible.
Excel for Data Analysis
Pivot Tables
• Pivot tables can be used to summarise
data and present it in a meaningful way.
• Select Pivot Table from the Insert tab to
create a pivot table.
• Add columns to the different fields to
see the tabular visualization in real
time.
Excel for Data Analysis
Pivot Charts
• Pivot charts can be used to summarise
data and present it in a meaningful way.
• Select the data you want to visualize
and select “charts” from the Quick
Analysis balloon
• Add columns to the different fields to
see the visualization in real time.
• You can also change the type of chart
as other options as needed
MS Excel for Data Analysis: Other Visualization Methods
Pie Chart Line Graph Boxplot
Count of Response Text
Count of Response Text
7000
6000
5000
Agree 4000
Disagree
Not Applicable 3000
Strongly Agree 2000
Strongly Disagree
1000
(blank)
0
e )
ee re le
re
e
re
e k
gr ag ca
b
g ag la
n
A is i A is
pl (b
D p ly D
g
A n ly
ot tr
o
n
g
N S o
tr
S
MS Excel for Data Analysis: Data Analysis Toolkit
Excel has an inbuilt add-in for more advanced data analysis – the Analysis Toolkit.
It is not activated by default and must be turned on by navigating to File>Options>Add-ins.
Ensure that ‘Excel Add-ins’ under the ‘Manage’ section is selected, and click ‘Go’. Select the
‘Analysis ToolPak’ checkbox and hit OK. That will activate the Data Analysis toolkit and it will
appear in the analysis group in the Data Tab on the ribbon.
MS Excel for Data Analysis: Data Analysis Toolkit
From the Data Analysis Toolkit, you can Other analysis tools such as
Parametric and non-parametric tests
Descriptive Statistics
Other measures of dispersion and central tendency
MS Excel for Data Analysis: Making a Dashboard
A dashboard is a visual summary, that gets updated in real time, much like a
pivot table. The values in the dashboard are linked using the “=” operator
MODULE 3
IBM SPSS for Data Analysis
SPSS Views
SPSS displays data slightly differently from MS Excel. In SPSS, the excel columns are called
fields, and the Excel rows are called records.
There are two views used for data analysis and entry in SPSS: the Data View and the Variable
View.
The Data View is similar to the dataset organization in MS Excel, where it displays the vertical
fields and horizontal records. Unlike Excel, however, the fields are recorded as variables (var),
instead of alphabets.
The Variable View shows all the fields in the dataset, and the parameters of each of the fields
(variables). These parameters include the name, type, measure, among others. In this view, the
parameters of each of the variables can be adjusted and changed as required.