Quantitative data analysis
Data preparation
Editing, coding, data entry
Editing
• The researcher must check the data for consistency
• In the event a respondent ticks two answers, e.g. for occupation:
• Teacher
• Environmental officer
• District Officer
• Banker
• Till operator
• Decide which answer is most consistent with the respondent's other answers and choose the most appropriate one
• In an experiment, check the schedules to confirm that the data captured are consistent (a consistency-check sketch follows below)
• Do not destroy originals; if an entry must be cancelled, strike it through with a single line so the data remain legible
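A minimal sketch of an automated consistency check, assuming the responses are held in a pandas DataFrame with hypothetical tick-box columns (the column names and layout are illustrative, not from the original survey):

```python
import pandas as pd

# Hypothetical tick-box responses: 1 = ticked, 0 = not ticked (assumed layout)
responses = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "occ_teacher": [1, 0, 1],
    "occ_environmental_officer": [0, 1, 1],  # respondent 3 ticked two boxes
    "occ_banker": [0, 0, 0],
})

occupation_cols = [c for c in responses.columns if c.startswith("occ_")]

# Flag respondents who ticked more than one occupation so the editor can
# decide which answer is most consistent with their other answers
flags = responses[responses[occupation_cols].sum(axis=1) > 1]
print(flags[["respondent_id"] + occupation_cols])
```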
Coding
• Coding: the procedure of assigning numbers to answers so that responses can be grouped for efficient analysis
• Rules for coding
– Appropriateness: the grouping of categories should suit the research problem
– Exhaustive: cover all responses and minimise reliance on an 'other' category
– Mutually exclusive: categories should not overlap, e.g. 'professional' and 'environmentalist' can describe the same person
– Single dimension: one concept per category
• Coding closed-ended questions (see the sketch below)
• Dealing with 'don't know' answers
• Thematic analysis: further discussion as part of
qualitative data analysis
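A minimal coding sketch, assuming occupation answers are collected as text and mapped to hypothetical numeric codes, with 99 used for 'don't know' (the codes and category names are illustrative only):

```python
# Illustrative codebook: categories are mutually exclusive, single-dimension,
# and include a residual "other" code so the scheme stays exhaustive
CODEBOOK = {
    "teacher": 1,
    "environmental officer": 2,
    "district officer": 3,
    "banker": 4,
    "till operator": 5,
    "other": 8,
    "don't know": 99,
}

def code_answer(answer: str) -> int:
    """Return the numeric code for a free-text occupation answer."""
    return CODEBOOK.get(answer.strip().lower(), CODEBOOK["other"])

answers = ["Teacher", "Banker", "don't know", "Farmer"]
print([code_answer(a) for a in answers])  # [1, 4, 99, 8]
```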
Data entry
• This is the procedure for converting the data collected into a medium for viewing and manipulation
• Entry methods include keyboard entry, OCR (optical character recognition), OMR (optical mark recognition), voice recognition, and bar-code entry
• Database and spreadsheet
• Database: record, data field and data file
Spreadsheets
• These are specialised databases that are essential for data analysis involving tabulating, organising, cross-referencing, and simple statistics
• Spreadsheets offer some data management, graphics, and presentation capabilities, e.g. Excel; packages such as Access, Oracle, and SPSS offer related database and statistical functions
• Try entering some data in Excel and creating a table (a pandas equivalent is sketched below)
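As an alternative to Excel, a small table can be built and summarised with pandas; this is only a sketch using made-up values:

```python
import pandas as pd

# Made-up survey data entered as a simple table
data = pd.DataFrame({
    "occupation": ["Teacher", "Banker", "Teacher", "Till operator", "Banker"],
    "age":        [33, 28, 45, 43, 47],
})

# Tabulate: frequency of each occupation
print(data["occupation"].value_counts())

# Simple statistics on age
print(data["age"].describe())
```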
Descriptive statistics
• This is a procedure that is used to organise,
summarise and visualise quantitative data
• It is sometimes defined as the set of mathematical techniques used to reveal underlying patterns in data
• Researchers use these statistics to identify underlying patterns in the data and as support or evidence for an argument or claim on the topic
• However, these statistics can be abused by researchers; the safeguard is to require researchers to show how their data were obtained and how their statistics were calculated. E.g. interviewing, say, 10 students on mining issues and drawing conclusions from that sample leads to serious bias because of the small sample size
Visual aids for quantitative data analysis
• Tables: easily produced in Word to show actual values; the emphasis is on how the data are presented
• Bar graphs: these are graphs designed to illustrate the
frequency of a score or the score per element of interest
• Pie charts: good for showing proportions; for easy reading, limit the number of segments to fewer than seven
• Scatter graphs: important for illustrating the relationship between two variables; plot the data points with the x-axis and y-axis representing the values of the respective variables
• If no line can be seen in the spread of the points there is no relationship, but if the points tend closely towards a line there is a relationship
• If the values of both variables increase together the relationship is positive; if one increases while the other decreases the relationship is negative
Example charts
• Example figures: a pie chart, a scatter graph, and a line graph (see the plotting sketch below)
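A minimal matplotlib sketch of the three example charts (pie, scatter, and line), using made-up values:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))

# Pie chart: proportions, kept to fewer than seven segments
ax1.pie([40, 30, 20, 10], labels=["A", "B", "C", "D"], autopct="%1.0f%%")
ax1.set_title("Pie chart")

# Scatter graph: relationship between two variables
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # points close to a line: positive relationship
ax2.scatter(x, y)
ax2.set_xlabel("x")
ax2.set_ylabel("y")
ax2.set_title("Scatter graph")

# Line graph: values over an ordered axis
ax3.plot([2019, 2020, 2021, 2022], [10, 12, 9, 15], marker="o")
ax3.set_title("Line graph")

plt.tight_layout()
plt.show()
```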
Skewness and kurtosis
• A distribution can be described by the shape of its curve as either symmetrical or asymmetrical; this is its skewness
• This reveals where the frequencies are concentrated and can be described as positive or negative skewness
• Alternatively, the shape can be described in terms of kurtosis, the flatness or peakedness of the distribution
• A bell-shaped, symmetrical (normal) distribution is called mesokurtic; a more peaked shape is leptokurtic and a flatter shape is platykurtic
Examples of skewness and kurtosis
• Example figures of skewed and peaked/flat distribution shapes (see the sketch below)
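A small sketch using scipy.stats to compute skewness and (excess) kurtosis for made-up samples; values near zero suggest an approximately symmetrical, mesokurtic shape:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
symmetric = rng.normal(size=1000)          # roughly bell-shaped (mesokurtic)
right_skewed = rng.exponential(size=1000)  # long right tail (positive skew)

print(f"symmetric:    skew={skew(symmetric):.2f}, kurtosis={kurtosis(symmetric):.2f}")
print(f"right-skewed: skew={skew(right_skewed):.2f}, kurtosis={kurtosis(right_skewed):.2f}")
```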
Central tendency
• Researchers are often keen to identify which value is central to a distribution, as this summarises the entire distribution
• Three measures of central tendency: mode, median,
mean
• Mode: the score in a sample that occurs with the greatest frequency; distinguish unimodal and bimodal distributions
• Median: the value or score such that half the observations fall above it and half below it; arrange the scores in ascending or descending order and pick the middle value
• Mean: the sum of the scores divided by the number of scores in the sample (see the sketch below)
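A sketch of the three measures using Python's statistics module on a made-up sample of scores:

```python
import statistics

scores = [3, 5, 5, 6, 7, 8, 9]

print("mode:  ", statistics.mode(scores))    # most frequent score (5)
print("median:", statistics.median(scores))  # middle value after ordering (6)
print("mean:  ", statistics.mean(scores))    # sum of scores / number of scores
```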
Describing the distribution
• Apart from knowing the central tendency researchers
are also keen to know about the distribution or
dispersion of the data
• Range: the distance from the highest to the lowest data value, but this can be misleading
• E.g. 1, 2, 7, 9, 11, 12, 13, 19, 39 gives a range of 38 even though all but one of the values lie within a span of 18
• Fractiles: the median divides the data into two parts, but we may need to divide the data into smaller parts (see the sketch below), such as:
• Quartiles
• Deciles
• Percentiles
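A sketch of quartiles, deciles, and percentiles with numpy.percentile, reusing the range example above:

```python
import numpy as np

values = [1, 2, 7, 9, 11, 12, 13, 19, 39]

print("range:    ", max(values) - min(values))            # 38, pulled up by the outlier 39
print("quartiles:", np.percentile(values, [25, 50, 75]))  # Q1, median, Q3
print("deciles:  ", np.percentile(values, list(range(10, 100, 10))))
print("90th pct: ", np.percentile(values, 90))
```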
Measure of spread
• Standard deviation: a measure of how far scores deviate from the mean (see next slide)
• The variance is the square of the standard deviation (see next slide)
• The standard error is an estimate of the size of the error introduced when the sample mean is used as an estimate of the population mean
Formula for standard deviation
• Sample standard deviation: s = √( Σ(xᵢ - x̄)² / (n - 1) )
Calculating the standard error
• Standard error of the mean: SE = s / √n
Exercise
• Standard deviation
• Given the following data as the ages of mining officers in a department: 33, 28, 45, 43, 47
• Calculate the sample standard deviation and the standard error (a checking sketch follows below)
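A sketch for checking the exercise with Python's statistics module (stdev uses the sample formula with n - 1 in the denominator):

```python
import math
import statistics

ages = [33, 28, 45, 43, 47]

mean = statistics.mean(ages)    # 39.2
sd = statistics.stdev(ages)     # sample standard deviation
se = sd / math.sqrt(len(ages))  # standard error of the mean

print(f"mean = {mean:.2f}, s = {sd:.2f}, SE = {se:.2f}")
```

With these numbers the sample standard deviation works out to about 8.26 and the standard error to about 3.69.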
Correlation Coefficient
• Researchers often want to measure the strength of an apparent link; the two common statistical measures of correlation are:
• Spearman rank correlation coefficient
• Pearson’s product moment correlation coefficient:
• The result of the calculation is a value between -1 and +1:
• A positive coefficient, e.g. 0.7, indicates a positive relationship; a negative coefficient such as -0.6 indicates a negative relationship, i.e. when one variable goes up the other goes down
• A coefficient of zero means no relationship (see the sketch below)
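A sketch of both coefficients using scipy.stats on made-up paired data:

```python
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 7, 8, 9, 10]  # broadly increases with x

r, p_pearson = pearsonr(x, y)      # Pearson's product moment coefficient
rho, p_spearman = spearmanr(x, y)  # Spearman rank correlation coefficient

print(f"Pearson r = {r:.2f} (p = {p_pearson:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_spearman:.3f})")
```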
Inferential statistics
Hypothesis Test
• The researcher starts with the assumption that there is no relationship; this is called the null hypothesis
• The test gives a figure indicating whether the relationship is or is not down to chance, expressed as a probability (p): if p < 0.05 (less than 1 in 20) the connection is regarded as significant, but if p ≥ 0.05 the null hypothesis stands, i.e. the result may be down to chance
• Typical tests for significance (see the sketch below):
• Chi-square (a non-parametric test for nominal data)
• Z- and t-tests (parametric tests for data from interval and ratio measurements, i.e. more powerful)
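A sketch of a chi-square test on a made-up contingency table of nominal data (e.g. occupation group by yes/no response), using scipy.stats:

```python
from scipy.stats import chi2_contingency

# Made-up contingency table: rows = occupation group, columns = yes/no response
observed = [[30, 10],
            [20, 25]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
if p < 0.05:
    print("Reject the null hypothesis: the variables appear related")
else:
    print("The null hypothesis stands: any association may be down to chance")
```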
Statistical testing procedure
• 1. State the null hypothesis
• 2. Choose the statistical test
• 3. Select the desired level of significance
• 4. Compute the calculated difference value
• 5. Obtain the critical test value
• 6. Interpret the test
• NB: a detailed manual calculation session is to be arranged; a worked sketch of the steps follows below
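A sketch walking through the six steps with an independent-samples t-test on made-up interval data (scipy assumed; the p-value approach stands in for looking up a critical value):

```python
from scipy.stats import ttest_ind

# 1. Null hypothesis: the two groups have equal mean scores (made-up data)
group_a = [12, 15, 14, 10, 13, 16, 11]
group_b = [18, 21, 17, 20, 19, 22, 16]

# 2. Chosen test: independent-samples t-test (parametric, interval data)
# 3. Desired level of significance
alpha = 0.05

# 4./5. Compute the test statistic and its p-value
t_stat, p_value = ttest_ind(group_a, group_b)

# 6. Interpret the test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the difference is significant")
else:
    print("The null hypothesis stands")
```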
Correlation
• https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php