6CN010 - Dissertation
Quantitative Data Analysis
Quantitative data analysis
Coding
Descriptive statistics
Measures of distribution
Measures of central tendency
Measures of dispersion
From survey to usable data
Processing the data collected for analysis requires coding.
Coding: converting data into numeric form for analysis.
Knowing how the data is going to be analysed is essential to
designing surveys.
This has to be decided at the start, not after the data has been
collected!
Coding
Check the level of measurement and make sure it is
appropriate for the envisaged analysis.
Variables can be classified into types according to the level of
mathematical scaling that can be carried out on the data.
4 types of data or levels of measurement:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Coding nominal variables
Nominal (categorical) data comprises categories that
cannot be rank ordered; each category is just different.
There is no order to the categories, so coding can be in any order,
but it is good practice to code in order of appearance.
Examples:
What is your gender? (please tick): Male / Female
Did you enjoy the film? (please tick): Yes / No
Sometimes coded 0 and 1.
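As a minimal sketch of nominal coding, the snippet below maps each category to a number in order of appearance. The code values (1 and 2), the names `GENDER_CODES` and `ENJOYED_FILM_CODES`, and the sample responses are assumptions for illustration; they do not come from the slides.

```python
# Hypothetical code book: category label -> numeric code, in order of appearance.
# The numbers carry no rank; they are just labels the software can count.
GENDER_CODES = {"Male": 1, "Female": 2}
ENJOYED_FILM_CODES = {"Yes": 1, "No": 2}

# Hypothetical raw survey answers (invented for this example).
responses = [
    {"gender": "Female", "enjoyed_film": "Yes"},
    {"gender": "Male", "enjoyed_film": "No"},
    {"gender": "Female", "enjoyed_film": "No"},
]

# Convert every answer to its numeric code.
coded = [
    {
        "gender": GENDER_CODES[r["gender"]],
        "enjoyed_film": ENJOYED_FILM_CODES[r["enjoyed_film"]],
    }
    for r in responses
]

print(coded)
```

Because the codes are only labels, swapping them (or using 0 and 1) would not change any valid analysis of a nominal variable.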
Coding ordinal variables
Ordinal data comprises categories that can be
rank ordered.
Coding is similar to nominal data, but the codes follow the rank order.
Ranking can run from low to high or from high to low.
Examples:
How satisfied are you with the level of service you have received? (please tick)
Very satisfied
Somewhat satisfied
Neutral
Somewhat dissatisfied
Very dissatisfied
What is your highest level of qualification? (please tick)
Degree or higher
Level 3 or equivalent
Level 2 or equivalent
Level 1 or equivalent
No qualifications
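A possible way to code an ordinal variable is sketched below, using the satisfaction question. The direction chosen (1 = lowest, 5 = highest), the names `SATISFACTION_LEVELS` and `SATISFACTION_CODES`, and the sample answers are assumptions for illustration only.

```python
# Categories listed from lowest to highest rank (assumed direction: low to high).
SATISFACTION_LEVELS = [
    "Very dissatisfied",
    "Somewhat dissatisfied",
    "Neutral",
    "Somewhat satisfied",
    "Very satisfied",
]

# Build the code book so that the numeric code follows the rank order.
SATISFACTION_CODES = {
    label: rank for rank, label in enumerate(SATISFACTION_LEVELS, start=1)
}

# Hypothetical answers (invented for this example).
answers = ["Neutral", "Very satisfied", "Somewhat dissatisfied"]
coded_answers = [SATISFACTION_CODES[a] for a in answers]

print(coded_answers)
```

Reversing `SATISFACTION_LEVELS` would give the high-to-low coding; either direction works as long as it is applied consistently and recorded.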
Coding interval and ratio variables
Scale data (interval and ratio data):
data is in numeric format (50, 100, 150)
can be measured on a continuous scale
the distance between values can be observed and, as a result,
measured
data can be placed in rank order.
Coding scale data: just enter the value (age = 52, number
of bedrooms in a house = 3).
Descriptive statistics
Descriptive statistics describe what the data is or what the
data shows.
Descriptive statistics are different from inferential statistics.
Inferential statistics are used to infer conclusions from the
data and make generalisations to the population.
Descriptive statistics involve conducting analysis on one variable at
a time (univariate analysis).
Measures of distribution
Distribution is a summary for each variable of the frequency
or number of times a value or range of values occurs.
Examples:
number and percentage of male and female participants
ages of research participants.
Frequency tables
(1/3)
A frequency table is one of the most common methods used
to describe a single variable.
Used to describe nominal or ordinal variables, i.e. those with
categories (yes and no, strongly agree to strongly disagree, and
so on).
Shows number and/or percentage of the occurrence of a
category within a variable.
Frequency distributions can be depicted in two ways:
table
graph
Frequency table
(2/3)
Example
Age range       Number   Percentage
Less than 20       150         19.9
20-49              250         33.1
50-64              180         23.8
65-80              100         13.3
Over 80             75          9.9
Total              755        100.0
Frequency table
(3/3)
Example: other things you may see
Age range                Number   Percentage   Valid percentage   Cumulative percentage
Less than 20                150         19.3               19.9                    19.9
20-49                       250         32.2               33.1                    53.0
50-64                       180         23.2               23.8                    76.8
65-80                       100         12.9               13.3                    90.1
Over 80                      75          9.8                9.9                   100.0
Total                       755         97.4              100.0                   100.0
Missing (not recorded)       20          2.6
Total                       775        100.0
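The columns of a frequency table can be computed roughly as follows. This is a sketch that assumes "percentage" is taken over all responses including missing ones, "valid percentage" over recorded responses only, and "cumulative percentage" as the running total of valid percentages. The variable names are illustrative, and printed values may differ from the slide in the last decimal place because of rounding.

```python
# Counts per age range, taken from the example table.
counts = {
    "Less than 20": 150,
    "20-49": 250,
    "50-64": 180,
    "65-80": 100,
    "Over 80": 75,
}
missing = 20  # responses where the age range was not recorded

valid_total = sum(counts.values())    # recorded responses only
grand_total = valid_total + missing   # recorded + missing

rows = []
cumulative = 0.0
for label, n in counts.items():
    percent = 100 * n / grand_total        # share of all responses
    valid_percent = 100 * n / valid_total  # share of recorded responses
    cumulative += valid_percent            # running total of valid %
    rows.append((label, n, percent, valid_percent, cumulative))

for label, n, pct, valid, cum in rows:
    print(f"{label:<14}{n:>6}{pct:>8.1f}{valid:>8.1f}{cum:>8.1f}")
```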
Measures of central tendency
Measures of central tendency: quantification of the location
of the middle or centre of a data set, i.e. what the typical or
average score/result of a data set is.
So, identifying a typical value that best summarises the
distribution of values in a variable.
There are three main measures:
1 Mean: the average
Values: 180 220 280 320 380
$$\bar{X} = \frac{\sum X}{N} = \frac{1380}{5} = 276$$
2 Mode: the most frequently occurring value
Values: 180 220 280 320 280 180 350 280 330 220
Mode = 280 (occurs three times)
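The two worked examples above can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
import statistics

# Mean: sum of the values divided by how many there are.
values_for_mean = [180, 220, 280, 320, 380]
mean_value = statistics.mean(values_for_mean)

# Mode: the value that occurs most often.
values_for_mode = [180, 220, 280, 320, 280, 180, 350, 280, 330, 220]
mode_value = statistics.mode(values_for_mode)

print(mean_value)  # 276
print(mode_value)  # 280
```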
Measures of central tendency
(Contd)
2 Mode (contd)
Bi-modal: two most frequently occurring values in a
distribution (two pronounced views or patterns of response).
Multi-modal: where there are more than two modes in a
distribution (potentially several pronounced views or patterns
of response).
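To illustrate a bi-modal distribution, the sketch below uses `statistics.multimode`, which returns every value tied for most frequent. The responses are hypothetical codes on a 1 to 5 scale, invented for this example.

```python
import statistics

# Hypothetical coded responses (1 = lowest, 5 = highest); not from the slides.
# Opinions cluster at both ends of the scale.
coded_responses = [1, 1, 1, 2, 3, 5, 5, 5]

modes = statistics.multimode(coded_responses)
is_bimodal = len(modes) == 2

print(modes)       # [1, 5]
print(is_bimodal)  # True
```

Two modes at opposite ends of a scale would suggest two pronounced, opposing patterns of response, which a single "typical" value would hide.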
Measures of central tendency
(Contd)
3 Median
The median is the midpoint in a distribution when the values are
arranged in ascending or descending order.
Values: 180 220 280 320 380
Median = 280
Where there is an even number of observations, the median
is the average of the two middle values.
Values: 180 220 280 300 320 380
Median = (280 + 300) / 2 = 290
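Both median cases can be reproduced with `statistics.median`, which sorts the values and averages the two middle ones when the count is even. A short sketch with illustrative variable names:

```python
import statistics

# Odd number of observations: the middle value is the median.
odd_values = [180, 220, 280, 320, 380]
median_odd = statistics.median(odd_values)

# Even number of observations: average of the two middle values (280 and 300).
even_values = [180, 220, 280, 300, 320, 380]
median_even = statistics.median(even_values)

print(median_odd)   # 280
print(median_even)  # 290.0
```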
Appropriate measure
Level of measurement    Measure of central tendency
Nominal                 Mode
Ordinal                 Median and mode
Interval/Ratio          Mean, median and mode
Measures of dispersion
Measures of dispersion: statistical measures that summarise
the amount of spread or variation in the distribution of values
in a variable.
So, how values are spread within a distribution.
There are a number of different measures (applicable to
interval or ratio data):
Range
Standard deviation
Variance
Measures of dispersion
Type                  Description
Range                 Difference between the highest (maximum) and lowest (minimum) value in the distribution of values.
Variance              Measure of how far the values are spread around the mean, based on the squared differences from the mean.
Standard deviation    Square root of the variance; shows how far, on average, the values lie from the mean of the sample data.
Range
Range is the difference between the highest and lowest value
in the distribution of values.
Example:
Weekly income of 10 people:
180 220 280 320 280 180 310 280 330 220
Range is maximum income minus minimum
income: 330-180 = 150.
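The same calculation as a brief Python sketch (variable names are illustrative):

```python
# Weekly income of 10 people, from the example above.
weekly_income = [180, 220, 280, 320, 280, 180, 310, 280, 330, 220]

# Range = maximum value minus minimum value.
income_range = max(weekly_income) - min(weekly_income)

print(income_range)  # 150
```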
Range using ordinal data
Of course, ordinal data can be ordered and so can give
information on range.
Example:
Survey question: How useful did you find the book?
Responses: Very useful, Very unuseful, Useful, Unuseful, Very unuseful,
Useful, Very useful, Very useful, Useful, Unuseful
Range is from Very useful to Very unuseful.
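For ordinal data the range is reported as a pair of categories rather than a number. The sketch below assumes a four-point scale ordered from "Very unuseful" to "Very useful" and treats the listed answers as ten individual responses; the names `SCALE` and `RANK` are illustrative.

```python
# Assumed rank order of the scale, lowest first.
SCALE = ["Very unuseful", "Unuseful", "Useful", "Very useful"]
RANK = {label: position for position, label in enumerate(SCALE, start=1)}

responses = [
    "Very useful", "Very unuseful", "Useful", "Unuseful", "Very unuseful",
    "Useful", "Very useful", "Very useful", "Useful", "Unuseful",
]

# Compare responses by their rank, not alphabetically.
lowest = min(responses, key=RANK.get)
highest = max(responses, key=RANK.get)

print(f"Range is from {highest} to {lowest}")
```

Subtracting the ranks would not be meaningful here, because the distance between ordinal categories is not known.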
Inter quartile range
Inter quartile range (IQR) is another range measure, but this
time it looks at the data in terms of quarters (quartiles) or percentiles.
The ordered data is divided into four equal parts, each containing
a quarter (25%) of the observations.
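[Figure: distribution from Min to Max marked with Q1 (25th percentile), Q2 (median, 50th percentile) and Q3 (75th percentile); the IQR spans Q1 to Q3 and the range spans Min to Max.]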
Inter Quartile Range
IQR is the range of the middle 50% of the data. Because
it uses only the middle 50%, it is not affected by extreme minimum or
maximum values (outliers).
Outliers: values at the extreme lower or upper end
of the distribution. They are atypical, infrequent observations.
These will influence the (arithmetic) mean. Why?
10 people record their height:
160, 162, 164, 166, 168, 170, 172, 174, 176 and 200 cm.
With those values the mean is 171.2 cm.
(200 cm is the outlier; take it out and the mean is 168 cm.)
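The effect of the outlier on different measures can be compared with a short sketch. It uses `statistics.quantiles` with its default interpolation method to get the quartiles; other software may use a slightly different quartile rule and give slightly different IQR values. The helper `iqr` is a hypothetical name introduced here.

```python
import statistics

heights_cm = [160, 162, 164, 166, 168, 170, 172, 174, 176, 200]
without_outlier = heights_cm[:-1]  # drop the 200 cm observation


def iqr(data):
    """Inter quartile range: Q3 minus Q1 (default quartile method)."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


mean_with = statistics.mean(heights_cm)          # 171.2
mean_without = statistics.mean(without_outlier)  # 168

range_with = max(heights_cm) - min(heights_cm)                # 40
range_without = max(without_outlier) - min(without_outlier)   # 16

iqr_with = iqr(heights_cm)
iqr_without = iqr(without_outlier)

print(mean_with, mean_without)
print(range_with, range_without)
print(iqr_with, iqr_without)
```

The mean and the range move noticeably when the 200 cm value is removed, while the IQR changes far less, which is why the IQR is described as not being affected by outliers.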
Variance
Where the mean is a measure of the centre of a group of
numbers, the variance is the measure of the spread.
It involves measuring the distance between each of the
values and the mean.
To calculate the variance:
1. calculate the mean
2. for each value in the distribution, subtract the mean
and then square the result (the squared difference)
3. add up the squared differences and divide by the total number of scores minus 1.
$$\text{Variance} = s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
i.e. the sum of (observed value - mean score) squared, divided by
(total number of scores - 1).
The larger the variance value the further the observed
values of the data set are dispersed from the mean.
A variance value of zero means all observed values are the
same as the mean.
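The variance steps can be followed by hand in a few lines of Python. This sketch reuses the weekly income figures from the range example and divides by N - 1, which is also what `statistics.variance` does; variable names are illustrative.

```python
import statistics

weekly_income = [180, 220, 280, 320, 280, 180, 310, 280, 330, 220]
n = len(weekly_income)

# Step 1: calculate the mean.
mean_income = sum(weekly_income) / n

# Step 2: subtract the mean from each value and square the result.
squared_differences = [(x - mean_income) ** 2 for x in weekly_income]

# Step 3: sum the squared differences and divide by N - 1.
variance = sum(squared_differences) / (n - 1)

print(mean_income)
print(variance)
print(statistics.variance(weekly_income))  # library result for comparison
```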
Standard deviation
Standard deviation = the square root of the variance.
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
Because the square root is taken, the result is in the original
data units, e.g. if the variable is height recorded in cm then the
standard deviation can be interpreted in cm.
Standard deviation: roughly how far, on average, each value is from
the mean.
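As a small illustration of the units point, the sketch below takes nine heights in cm (the height example without the 200 cm outlier) and computes the sample variance and standard deviation with the `statistics` module. Variable names are illustrative.

```python
import math
import statistics

heights_cm = [160, 162, 164, 166, 168, 170, 172, 174, 176]

variance_cm2 = statistics.variance(heights_cm)  # in squared units (cm^2)
std_dev_cm = statistics.stdev(heights_cm)       # back in the original units (cm)

print(variance_cm2)
print(round(std_dev_cm, 2))

# The standard deviation is the square root of the variance.
assert math.isclose(std_dev_cm, math.sqrt(variance_cm2))
```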
Appropriate descriptive statistics: summary
Level of measurement: univariate analysis
Nominal
  Frequency table: count, %, valid %, cumulative %.
  Measure of central tendency: mode
  Measure of dispersion: no measure.
Ordinal
  Frequency table: count, %, valid %, cumulative %.
  Measure of central tendency: mode, median
  Measure of dispersion: no measure.
Interval/Ratio
  Frequency table: count, %, valid %, cumulative %.
  Measure of central tendency: mode, median, mean
  Measure of dispersion: range, variance, standard deviation
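One way to apply the summary table in practice is a small helper that only computes the measures listed for a given level of measurement. The function `summarise` and its `level` labels are hypothetical names for this sketch, and the values are assumed to be already coded as numbers. For ordinal data it uses `statistics.median_low` so that the median is always one of the observed codes.

```python
import statistics


def summarise(values, level):
    """Return the statistics the summary table lists for this level."""
    summary = {"mode": statistics.multimode(values)}
    if level == "ordinal":
        # median_low returns an actual observed code, never an in-between value
        summary["median"] = statistics.median_low(values)
    elif level == "interval/ratio":
        summary["median"] = statistics.median(values)
        summary["mean"] = statistics.mean(values)
        summary["range"] = max(values) - min(values)
        summary["variance"] = statistics.variance(values)
        summary["standard deviation"] = statistics.stdev(values)
    return summary


# Hypothetical nominal codes (e.g. 1 = Yes, 2 = No).
nominal_summary = summarise([1, 2, 2, 1, 2], "nominal")

# Interval/ratio example using the values from the mean example.
scale_summary = summarise([180, 220, 280, 320, 380], "interval/ratio")

print(nominal_summary)
print(scale_summary)
```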
Further Reading
Creswell, J.W. (1994) Research Design: Qualitative and Quantitative
Approaches. London: Sage Publications, pp. 116-171.
Holt, G. (1998) A Guide to Successful Dissertation Study for Students of the
Built Environment, 2nd edition. Wolverhampton: Built Environment
Research Unit. ISBN: 1-902010-01-9, pp. 100-118.
Naoum, S.G. (2007) Dissertation Research and Writing for Construction
Students, 2nd edition. Oxford: Butterworth Heinemann. ISBN: 0 7506 2988 6,
pp. 91-131.