Data Analysis / Applied Statistics & Excel
Michael Wasdell, MA
Epidemiologist
Evaluation & Research Services
Michael.Wasdell@fraserhealth.ca
Fraser Health Authority, 2011
The Fraser Health Authority (FH) authorizes the use, reproduction and/or
modification of this publication for purposes other than commercial redistribution. In
consideration for this authorization, the user agrees that any unmodified
reproduction of this publication shall retain all copyright and proprietary notices. If
the user modifies the content of this publication, all FH copyright notices shall be
removed, however FH shall be acknowledged as the author of the source
publication.
Reproduction or storage of this publication in any form by any means for the purpose
of commercial redistribution is strictly prohibited.
This publication is intended to provide general information only, and should not be
relied on as providing specific healthcare, legal or other professional advice. The
Fraser Health Authority, and every person involved in the creation of this publication,
disclaims any warranty, express or implied, as to its accuracy, completeness or
currency, and disclaims all liability in respect of any actions, including the results of
any actions, taken or not taken in reliance on the information contained herein.
Department of Evaluation and Research Services
Susan Chunick
Director
Camille Viray
Education and Communications
Coordinator
Dina Shafey
Research Ethics Coordinator
Magdalena Swanson
Research and Grant
Development Facilitator
Michael Wasdell
Epidemiologist
2
http://research.fraserhealth.ca/
Objectives
Understand the relationship
between the research question,
research designs/method, type
of data and statistical analysis;
Use tools and resources to
identify the appropriate
statistical analysis, and;
Learn how to use the Excel
Data Analysis utility.
4
What is Statistics?
The collecting, summarizing, and
analyzing of data.
The term also refers to raw numbers, or
stats, and to the summarization of data.
Why is a physician held in much higher esteem than a
statistician?
A physician makes an analysis of a complex illness
whereas a statistician makes you ill with a complex
analysis!
5
Types of Statistics
Descriptive Statistics: Describe research
findings
z
e.g. Frequencies, averages.
Inferential Statistics: Makes inferences
about the population, usually based on a
random sample.
z
z
Allows generalization to population.
Quasi-experimental research does not
employ randomization, but might use
inferential statistics.
Statistics made simple
7
Tasty Statistics
1.
2.
3.
4.
5.
Everyone
Identify overall research goal.
Identify independent and dependent
variables.
Describe the level of the data.
Identify the number and pairing of
groups.
Check assumptions about data.
8
Identify Overall
Research Goal
Describe
Associate
Predict
Compare
9
Describe
Concerned with describing the status or
characteristics of a phenomenon
z
Case Study
an in-depth investigation of an individual, group, incident, or
community
Cross sectional
involve the collection of data from selected individuals in a
single time period.
Longitudinal
involve data collection at two or more times in order to
describe changes over time.
10
Associate
Concerned with identifying relationships and the
strength of relationships between two variables
Required before additional research is done to
assess causation
Two variables from the same subject are
assessed for association
z
Two different variables at the same point in time
(cross-sectional)
Same variable at two different points in time
(longitudinal)
11
Predict
Suspect certain factors contribute to a phenomenon
Concerned with identifying variables that are predictive
of particular outcomes
Independent (predictor) and dependent (outcome)
variables are identified
There may be more than one independent variable
There is a temporal order
z
Independent variable occurs before the dependent variable
Involves discovering a mathematical equation that can
be used to predict values for the data.
12
Compare
Interested in
z
identifying statistical differences between two
or more groups
Identifying statistical differences between
repeated observations within the same group
13
Identify independent and
dependent variables.
An independent variable is the variable
that you believe will influence some other
variable.
z
manipulated/assigned by researcher
(assigned to a treatment, workshop etc.)
pre-existing characteristic, not under control
of researcher (sex, age, exposure or
treatment)
A dependent variable is the variable that is
influenced by the independent variable(s).
14
Describe the Level
of the Data
Levels of Measurement
Categorical
Ordinal
Continuous
15
Variables Level of Measurement
Categorical mutually exclusive
unordered categories. E.g. food types,
gender, eye colour, ethnicity.
z
Categories cannot be arranged in any
particular order.
Can assign number codes, but calculations
would be meaningless.
Nominal, Dichotomous, Binary
16
Variables Level of Measurement
Ordinal - categories with an implied order,
but distance between intervals not always
equal or unimportant.
z
E.g. Low, middle and high income, or rating a
brand of soft drink on a scale of 1-5.
17
Variables Level of Measurement
Continuous
z
Interval - equal distance between each interval.
E.g. 1,2,3.
Arbitrary zero point
Celcius scale for temperature - temperature does not
cease to exist at 0 degrees.
Ratio - similar to interval scale, but has true zero
point meaning there is none of the variable. E.g.
Weight, salary ($0=$0).
with ratio variables, you can make assumptions about
the ratio of two measurements 6 grams is twice as
much as 3 grams.
18
Identify the Number
and Pairing of Groups
Study design and types of comparisons are
important determinants of statistical tests
How many groups are involved?
Are they paired (matched) in some way?
z
z
z
If the study uses a pre-post design, each participant is
assessed on the same measure at different points in
time.
Paired groups (two times)
Matched groups (three or more times)
19
Check Assumptions
About Data
There are various assumptions for each statistical test.
Before you select a test, be sure to check the
assumptions of each test.
Some examples of common assumptions are:
z
The dependent variable will need to be measured on a certain
level e.g. Continuous.
The independent variable(s) will need to be measured on a
certain level e.g. Categorical.
The population is normally distributed (not skewed).
If your data do not meet the assumptions for a specific
test, you may be able to use a non-parametric test
instead.
20
Parametric Tests
Parametric tests assume that the variable in question is
from a normal distribution (continuous).
Its a good idea to have a minimum of 30 cases for each
group.
Non-parametric tests do not require the assumption of
normality.
Most non-parametric tests do not require a
continuous/interval level of measurement; can be used
with nominal/ordinal level data.
21
1.
2.
3.
4.
5.
Identify overall research goal.
Identify independent and
dependent variables.
Describe the level of the data.
Identify the number and
pairing of groups.
Check assumptions about
data.
Having this recipe will
allow you to select an
appropriate statistical test.
Statistical test
decision tools link
recipe components
with the required
statistic.
Results in successful
statistical planning and
analysis.
22
1. Identify overall
research goal.
2. Identify independent
and dependent variables.
3. Describe the level of
the data.
4. Identify the number
and pairing of groups.
5. Check assumptions
about data.
Type of Dependent Variable Data
Goal
Continuous
Normal
Ordinal
Non-normal
Categorical
Describe one
group
Mean, SD
Median, interquartile
range
Proportion
Wilcoxon test
Chi-square
One-sample t test
Compare one
group to a
hypothetical value
Compare two
unpaired groups
Unpaired t test
Mann-Whitney test
Fisher's test
(chi-square for large
samples)
Compare two
paired groups
Paired t test
Wilcoxon test
McNemar's test
Compare three or One-way ANOVA
more unmatched
groups
Kruskal-Wallis test
Chi-square test
Compare three or Repeated-measures
ANOVA
more matched
groups
Friedman test
Cochrane Q
Quantify
association
between two
variables
Pearson correlation
Spearman correlation
Contingency coefficients
Predict value
from another
measured
variable
Simple linear regression
or
Nonlinear regression
Nonparametric
regression
Simple logistic regression
Predict value
from several
measured or
binomial
variables
Multiple linear regression
or
Multiple nonlinear
regression
Multiple logistic regression
23
Statistical Test Selection Group
Exercise
Using your table, select the appropriate
statistical tests for the research
scenarios.
24
During the group exercise
Steps to choose the appropriate statistical method
for the data analysis:
1.
Identify whether the research goal is one of describe,
associate, predict, or compare (difference).
Identify dependent and independent variables.
Identify the level of measurement in the dependent
variable (Categorical, Ordinal, Continuous).
Identify the number of groups. Are the groups
paired/matched (same group before and after) or
independent (not at all related, looking at different
groups)?
Select an appropriate statistical test using the decision
chart.
2.
3.
4.
5.
25
1. A pilot experiment designed to test the effectiveness of a
new approach to electrode placement for Electroconvulsive
Therapy (ECT) has been conducted over a one year time
period.
Patients from two different mood disorder clinics participated in
this study. Patients from Clinic X received ECT therapy
according to current practice guidelines. Patients from Clinic Y
received a new exploratory ECT treatment. Patients in each
clinic were matched for age, gender, and type of disorder. A
random sample of 30 was selected from each site for inclusion
in the study. At the end of one year, patients were administered
a memory test yielding a total score out of 100. What statistical
procedure needs to be selected to test for differences among
groups of patients on the memory test.
What is the goal: Compare
Indpendent variable: Type of therapy
Dependent variable: Memory Test
How many groups: 2
Paired/matched or independent: Independent
What is the level of measurement: Continuous
26
27
Statistical Analysis with
Excel Data Analysis
Check to see
if you have
excel Data
Analysis
28
Install Data Analysis Toolpack
Click on
Tools
Add-ins
Analysis
add Analysis ToolPak,
not Analysis ToolPakVBA
OK
29
Descriptive Statistics
Measures of central tendency (mean,
median, mode)
Variability (range, variance, standard
deviation)
Shape of Distribution (skewness, kurtosis)
Standard Error
Confidence Level
30
Descriptive Statistics - Procedure
Tools
Data Analysis
Descriptive
Statistics
Input Range
Rows/Columns
Output Options
z
z
New worksheet
Summary statistics
What is the age of the patients seen in our clinic?
31
Histogram
Graph showing the
frequency of data
falling within
various ranges.
Provides a visual
representation of
the central
tendency,
variability and
shape of the
distribution.
From http://www.microbiologybytes.com/maths/1011-17.html
32
Histogram Procedure
Tools
Data Analysis
Histogram
Input Range
Bin Range
Labels
Chart Output
33
Associate - Correlation
Allows an examination of the association
between variables
z
z
Information about the strength of association
Information about the direction of the association
A correlation coefficient of 0 means that there is
no relationship between the variables, -1
negative relationship, 1 positive relationship.
34
Correlation Procedure
Tools
Data Analysis
Correlation
Input Range
Rows/Columns
Output Options
Is there an association between age and pre-surgical functioning?
35
36
Predict Linear Regression
Linear Regression
Focuses on prediction
Involves discovering the equation for a line
that is the best fit for the given data.
The resulting linear equation is then used
to predict values for the data.
37
Linear Regression Procedure
Tools
Data Analysis
Regression
Input Y Range
(dependent variable)
Input X Range
(independent
variable)
Output Options
Does pre-surgical functioning predict post-surgical functioning?
38
Compare T-Test
Independent Samples T-Test
Allows the comparison of the means of 2
non-paired groups.
Compares actual difference between two
means in relation to the variation in the
data (expressed as the standard deviation
of the difference between the means).
39
Two Sample T-Test Procedure
Tools
Data Analysis
T-test: Two Sample
Assuming Equal
Variances
Input 1 Range
(dependent variable for
group 1)
Input 2 Range
(dependent variable for
group 2)
Output Options
Is there a difference in age between males and females?
40
Compare T-Test
Paired Samples T-Test
Allows the comparison of the means of 2
paired measures (eg., pre-post
measurement, repeated measurement
under different conditions).
41
Paired Samples T-Test Procedure
Tools
Data Analysis
T-test: PairedTwo
Sample for Means
Input 1 Range
(dependent
variable at time 1)
Input 2 Range
(dependent
variable at time 2)
Output Options
Is there a difference in functioning after surgery?
42
Compare ANOVA Single Factor
Single Factor (One Way) Analysis of Variance
T-test can only be used for comparison of two
groups
ANOVA allows us to identify differences among
the means of one variable measured in two or
more independent groups.
One way ANOVA comparing only two groups
provides similar outcomes to the t-test
43
Single Factor ANOVA Procedure
Tools
Data Analysis
ANOVA: Single
Factor
Input 1 Range
(dependent
variable organized
with one column or
row of data per
group)
Output Options
Is there a difference in pre-surgical functioning between
the three major age groups ?
44
Type of Data
Goal
Continuous
Ordinal or Non
Normal
Categorical
Describe one group
Mean, SD
Median, interquartile
range
Proportion
Compare one group
to a hypothetical
value
One-sample t test
Wilcoxon test
Chi-square
or
Binomial test **
Compare two
unpaired groups
Unpaired t test
Mann-Whitney test
Fisher's test
(chi-square for large
samples)
Compare two paired
groups
Paired t test
Wilcoxon test
McNemar's test
Compare three or
more unmatched
groups
One-way ANOVA
Kruskal-Wallis test
Chi-square test
Compare three or
more matched
groups
Repeated-measures
ANOVA
Friedman test
Cochrane Q**
Quantify association
between two
variables
Pearson correlation
Spearman correlation
Contingency
coefficients**
Predict value from
another measured
variable
Simple linear
regression
or
Nonlinear regression
Nonparametric
regression**
Simple logistic
regression*
Predict value from
several measured or
binomial variables
Multiple linear
regression*
or
Multiple nonlinear
regression**
Multiple logistic
regression*
45
Resources
Choosing a statistical test
http://www.wadsworth.com/psychology_d/templa
tes/student_resources/workshops/stat_workshp/
chose_stat/chose_stat_01.html
http://www.whichtest.info/
Online statistical calculators
http://statpages.org/
46
47