Chapter 1

The document outlines the course 'Statistics for the Sciences' offered by the Department of Mathematics at the University of the West Indies for the academic year 2023-24, detailing class times, assessment methods, and key statistical definitions. It covers essential concepts such as statistics, data types, variables, and methods for graphing both qualitative and quantitative data. Additionally, it introduces measures of center and spread, including mean and median, along with practical examples and instructions for creating visual representations of data.


Statistics for the Sciences
2023-24

Department of Mathematics
University of the West Indies
Kingston, Jamaica
Course Code: STAT1001
Course Title: Statistics for the Sciences
Class Times & Venue: MONDAYS 8-10 (M3) / WEDNESDAY 1-2 (M3)

Lecturer: Ajani Ausaru


Office Hours: 9 – 10 a.m. Tuesdays
Location: Room #6, upstairs in the Department of Mathematics
Email: ajani.ausaru02@uwimona.edu.jm

Assessment
— Incourse exam: 20%
— Project: 15%
— Homework Assignments (x2): 15%
— Final exam: 50%
1.0 Introduction- Definitions
Definitions
— The following definitions are important in
understanding the underlying concepts behind
statistics:

— Statistics: “the science of data involving collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information” -- McClave, Dietrich, Sincich

— Observation: a single collected data value (point).

— Data or Data set: a set of numerical observations.


Definitions
— Individuals: The people or objects from whom, or about whom, data is collected.
Individuals may be people, but they may also be animals or things.
Example: Freshmen, 6-week-old babies, golden retrievers, fields of corn, cells

— Population: The entire group of individuals that we wish to get information about in a single study.

— Sample: A subgroup within a population.

[Figure: diagram labelling an Individual, the Population, and a Sample]
Definitions
— Variable: A characteristic of an individual.

— When we collect data about individuals, we collect values of variables. A variable can take different values for different individuals.

— More Examples of Variables: Weight, Age, Race, Shoe Size, Favorite Football Team
Two types of variables
A variable can be either
— Quantitative
— Something that can be counted or measured for each individual and then added,
subtracted, averaged, etc., across individuals in the population.
— Example: How tall you are, your age, your blood cholesterol level, the number of
credit cards you own.
— Numerical characteristics or quantities of the individuals such as Height (65 inches
tall), Weight (180 pounds), and Income ($40,000 per year).
— HINT: most quantitative variables are accompanied by units

— Qualitative variable / Categorical
— Something that falls into one of several categories. What can be counted is the number or proportion of individuals in each category.
— Example: Your blood type (A, B, AB, O), your hair color, your ethnicity, whether you
paid income tax last tax year or not.
— Non-numerical characteristics or labels of the individual such as their Race (Black,
White, Asian…), Gender (Male, Female), Political Party (Democrat, Republican,
Independent)
Two types of variables
Determine if the following variables are
Qualitative or Quantitative:

— Number of siblings –---------------------- ANSWER: ____

— Number on a football player’s jersey –- ANSWER:____

— Phone number – --------------------------- ANSWER:____

— Cost (in dollars) to fill up a Corvette –- ANSWER:____

— Marital status –---------------------------- ANSWER:____


Two Types of Quantitative Variables

— 1) Discrete variable: this kind of quantitative variable takes values from a finite set of numbers or from a countable set of numbers.

— Example 1: Suppose we ask a UWI student the number of days per week (7 days a week) that they go to school. The possible values are {0, 1, 2, 3, 4, 5, 6, 7}, which is a finite set that contains 8 numbers.

— Example 2: Suppose we ask UWI students how many times they have visited the library this year. The possible values are only 0, 1, 2, 3, ... (it cannot be 1.5 or 1.2). These values represent counts, so they are discrete data.
Two Types of Quantitative Variables
— 2) Continuous variable: this kind of quantitative variable takes values from infinitely many numbers with no gaps between them (i.e. the possible values are uncountable).

— Example of a continuous variable: Suppose we ask UWI students the time that it takes them to drive to school. It can be 10.0 minutes, 10.01 minutes, 10.001 minutes, etc. There are infinitely many possible answers from each individual, and the possible values are not countable, so these are continuous data.
Discrete vs. Continuous
Determine if the following variables are Discrete or Continuous:

— Volume of water lost each day through a leaky faucet — ANSWER:____

— Number of donors at blood bank — ANSWER:____

— Points scored in an NCAA basketball game — ANSWER:____

— Weight of a randomly selected person — ANSWER:____
Example BMW cars

Model            Body Style     Weight (pounds)   # of Seats
M/Z3 Coupe       Coupe          2945              2
M/Z3 Roadster    Convertible    2690              2
3 Series         Coupe          2780              5
5 Series         Sedan          3450              5
7 Series         Sedan          4255              5
Z8               Convertible    3600              2

— Identify the individuals.
— ____________________
— Identify the variables.
— _____________________
— Identify the data corresponding to the variables.
— _____________________
— _____________________
— _____________________
— Determine whether each variable is qualitative, continuous, or discrete.
— ___________________
— ___________________
— ___________________
Graphing Qualitative/Categorical Data
— Many different graphs (charts) can be used to visualize qualitative data. Only three common ones are shown here:

— Bar Chart
— Pareto Chart
— Pie Chart
Bar Chart
[Figure: Bar graph of accidents involving Firestone tire models]
Bar Chart:

— Each category is labeled on the x-axis and is represented by one bar.

— The bar’s height (y-axis) shows the count or the percentage for that particular category.

— NOTE: The categories can appear in any order regardless of the bar’s height.

— Interpretation:________________
____________________________
____________________________
Pareto Chart
[Figure: Pareto graph of accidents involving Firestone tire models]
Pareto Chart:

— A Pareto chart is just like a bar chart EXCEPT the categories on the graph appear in descending order of bar height.

— So, using the bar chart example above, the Pareto chart looks like the chart in the figure.

— Interpretation: From this chart, we can see which categories the majority of the individuals belong to ___________________.
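
The difference between bar-chart order and Pareto order can be sketched in a few lines of Python. This is a minimal illustration only: the blood-type observations below are made up (they are not the Firestone data from the figure), and the text bars stand in for a real chart.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (blood type).
# These values are invented purely for illustration.
observations = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

# Count how many individuals fall into each category.
counts = Counter(observations)

# A bar chart may list categories in any order; a Pareto chart
# lists them from the largest count down to the smallest.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, count in pareto_order:
    percent = 100 * count / len(observations)
    print(f"{category:>2}: {'#' * count} ({count}, {percent:.0f}%)")
```

Reading the first line of the output gives the category with the most individuals, which is the kind of statement the interpretation blank asks for.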
Pie Charts
Pie Chart:
— Each sector represents one category and
there must be no gaps between the
sectors.

— The proportion of the pie occupied by each sector is equal to the percentage contributed by the category it represents.

— All percentages must add up to 100%.

— NOTE: Categories may be displayed in any order around the pie.

— Interpretation:______________________
___________________________________
___________________________________
___________________________________
Graphing Quantitative Data
— Just as for qualitative data, there are many ways to visualize quantitative data. Only three common ones are shown here:

— 1) Histograms
— 2) Stem Plots
— 3) Time Plots
Histograms
— Histograms look like bar charts,
with some important differences.

— Data are grouped into classes and the class limits are marked on the horizontal axis.

— Counts are marked on the vertical axis, which must begin at zero.
— The data classes must be:
— non-overlapping,
— of equal width
— must cover the entire range of data
values without any gaps
The class intervals are labeled on the x-axis, and the counts or
percent of values that lie within a class are labeled on the y-axis.
Histograms
For large datasets and/or quantitative variables that take many values:
§ Divide the possible values into classes or intervals of equal widths.
§ Count how many observations fall into each interval. Instead of
counts, one may also use percents.
§ Draw a picture representing the distribution―each bar height is
equal to the number (or percent) of observations in its interval.
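
The three steps above can be sketched in Python as follows. The data values, the starting point of 10, and the class width of 10 are all assumptions chosen for illustration; they do not come from the course material.

```python
# Hypothetical data values, invented for illustration only.
data = [12, 15, 21, 22, 27, 30, 34, 38, 41, 49]

# Assumed classes of equal width: [10,20), [20,30), [30,40), [40,50).
# Each class includes its left limit and excludes its right limit,
# so the classes do not overlap and leave no gaps.
low, width, n_classes = 10, 10, 4

# Count how many observations fall into each class.
counts = [0] * n_classes
for x in data:
    index = int((x - low) // width)
    counts[index] += 1

# Draw one text bar per class; bar length equals the class count.
for i, count in enumerate(counts):
    left = low + i * width
    print(f"[{left}, {left + width}): {'#' * count} ({count})")
```

Dividing each count by `len(data)` would give the percent version of the same histogram.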

Interpreting histograms
When describing a quantitative variable, we look for the overall
pattern and for striking deviations from that pattern. We can describe
the overall pattern of a histogram by its shape, center, and spread.

[Figure, left panel: histogram with a line connecting each column (too detailed)]
[Figure, right panel: histogram with a smoothed curve highlighting the overall pattern of the distribution]
Shapes of histograms
[Figure: four example histograms]

— Uniform: the same number of data values in each class
— Symmetric: right half of histogram is mirror image of left half
— Right-skewed: more low and fewer high data values
— Left-skewed: fewer low and more high data values
Interpreting histograms
Overall Pattern

Shape: symmetric, right-skewed and left-skewed

Center: the value that has the property that roughly half of the values
(50% of the values) are larger than it and roughly half of the values are
smaller than it

Spread: lowest and highest values (or the distance between them)

Deviations from the overall pattern:

Outliers: values that fall outside of the overall pattern.
Interpreting histograms
Example: Histogram of Architectural Firm Staff Employees

— Shape: right-skewed
— Center: 60 staff
— Spread: 0 to 140 staff members
— No outliers

[Figure: histogram with Frequency on the vertical axis and Staff counts (10 to 150) on the horizontal axis]
Stem Plots
— Stem Plots: used to organize quantitative data by separating the numbers into stems and leaves.

— The stem is all digits except the last digit of the number

— The leaf is the last digit of the number you are given.

— In some cases, leaves may consist of more than just the last
digit and stems would then consist of all but those digits.
Stem Plots
Example:

— Given the number 118, the stem is 11, leaf is 8.

— Given the number 18, the stem is 1, leaf is 8.

— Given the number 2.1, the stem is 2, leaf is 1.


How to Construct a Stem Plot
— 1 – Separate each data value into a stem (all but the last digit) and a
leaf (the last digit)

— 2 – Draw up a table with two columns, label the left column "Stem" and the right column "Leaves"

— 3 – In the Stem column, write down the stems in ascending order from top to bottom, be sure to include all stems between the first and the last, even if there is no data value that has that stem

— 4 – In the Leaves column, write down the leaves beside the appropriate stem, in ascending order from left to right

— 5 – Write down a key with units for your stem plot: e.g. 8|9 = 8.9
grams.
Stem Plot - Example
Example:

— Construct a stem plot of the following lengths (in inches):

— 0.61, 0.70, 0.74, 0.82, 0.86, 0.63, 0.92, 0.98, 0.65, 0.49, 0.67, 0.78

Stem Leaves

KEY:
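
One way to check a hand-drawn stem plot for these lengths is the short Python sketch below. It follows the five construction steps given earlier, under one assumption that is not stated in the slides: the values are converted to hundredths of an inch so that the last digit can be split off with integer arithmetic. The function name `stem_plot` is made up for this example.

```python
lengths = [0.61, 0.70, 0.74, 0.82, 0.86, 0.63,
           0.92, 0.98, 0.65, 0.49, 0.67, 0.78]

# Work in hundredths of an inch so the last digit is the leaf.
hundredths = sorted(round(x * 100) for x in lengths)

def stem_plot(sorted_values):
    """Return {stem: [leaves]} for values already in ascending order.

    Every stem between the first and the last is included,
    even when no data value has that stem.
    """
    first_stem = sorted_values[0] // 10
    last_stem = sorted_values[-1] // 10
    stems = {stem: [] for stem in range(first_stem, last_stem + 1)}
    for value in sorted_values:
        stems[value // 10].append(value % 10)
    return stems

plot = stem_plot(hundredths)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("KEY: 6 | 1 = 0.61 inches")
```

Note that stem 5 appears with no leaves, as step 3 of the construction requires.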
Stem Plot
If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems.
Example: If all of the data values are between 150 and 179, then we may choose to use the following stems:

15
15
16
16
17
17

Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on each lower stem (second “15”).

Bar & Pie Charts in Excel
— This will be demonstrated in class using the Excel example.
— Bar Chart
— Commands – highlight the data, then click on the “insert” tab, then click on “column chart” for a bar chart.
— You can click on “chart design” to get different options for the charts.
— Pie Chart
— Commands – highlight the data, then click on the “insert” tab, then click on “pie chart” for a pie chart.
Excel Data & Output
Describing distributions with numbers
— Measures of center: mean and median

— Measures of spread: quartiles, IQR, standard deviation

— The five-number summary and boxplots

— Outliers

— Choosing among summary statistics


Measures of Center
— Measures of center (mean, median): a value that is used to describe the center of a data set.

— Mean: the average of a data set. To find the mean, divide the sum of all data values by the total number of values in the data set.

— Sample mean (statistic): The mean of a sample data set is denoted by $\bar{x}$ (this is what we are interested in finding, since statistics is based on sample data).

— Population mean (parameter): The mean of a population data set is denoted by the Greek letter $\mu$.
Mean: Mathematical notation
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

— Example: The following gives the volumes (in ounces) of the Coke in different cans. Find the mean of this sample: 12.3, 12.1, 12.4, 12.1, 12.2
— Step 1: Find n.
— Step 2: Use the formula to find the mean.

Learn right away how to get the mean using your calculators.
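
As an alternative to a calculator, the two steps of the Coke example can be checked with a minimal Python sketch. The variable names are chosen here for illustration.

```python
# Volumes (in ounces) from the Coke example.
volumes = [12.3, 12.1, 12.4, 12.1, 12.2]

n = len(volumes)                 # Step 1: find n
sample_mean = sum(volumes) / n   # Step 2: sum of the values divided by n

print(n, round(sample_mean, 2))
```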
Median
Median: the middle value of an arranged data set placed in
ascending order.
— (i.e. 50% of the values lie below this value, and 50% lie above this
value).
Calculation of Median
— 1) Arrange the data values in an ascending order from smallest to
largest.

— 1a) If the number of data values is odd, then the median is the
number in the exact middle. Find the location of the median by
counting (n+1)/2 observations up from the bottom of the list.

— 1b) If the number of data values is even, the median is the mean of the two middle numbers in the sorted data. The location of the median is again (n+1)/2 observations up from the bottom of the list, which now falls halfway between the two middle observations.
Median
Example 1: (Odd data set) Find the median of the
following five students’ scores of an exam: 90, 60, 50,
41, 92.
— Step 1: Place the data values in ascending order.
— Step 2: Do we have an even/odd data set?
— Step 3: Find median.
— Example 2: (Even data set) Find the median of the
following six people’s salary (in thousand dollars): 90,
60, 45, 46, 100, 46.
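
The median procedure can be sketched in Python and applied to both examples. This is an illustrative helper written for these notes (the name `median` is chosen here); it sorts the data first and then handles the odd and even cases separately.

```python
def median(values):
    """Median of a list: the middle value of the sorted data."""
    ordered = sorted(values)      # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the value in the exact middle, position (n + 1) / 2.
        return ordered[middle]
    # Even n: the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

scores = [90, 60, 50, 41, 92]            # Example 1 (odd data set)
salaries = [90, 60, 45, 46, 100, 46]     # Example 2 (even data set)

print(median(scores), median(salaries))
```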
Mode
— Mode – is the most frequently occurring number.
— Example: What is the mode of the following numbers?
— 3, 4, 7, 1, 3, 5, 8, 9, 3.
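
A quick way to confirm a mode is to tally how often each value occurs. The sketch below does this for the numbers in the example; it collects every value tied for the highest count, on the assumption that a data set may have more than one mode.

```python
from collections import Counter

numbers = [3, 4, 7, 1, 3, 5, 8, 9, 3]

# Tally how many times each value occurs.
counts = Counter(numbers)
highest = max(counts.values())

# Keep every value that reaches the highest count.
modes = [value for value, count in counts.items() if count == highest]

print(modes, "occurs", highest, "times")
```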
Quartiles
— Quartiles divide the ordered data set into four equal-sized
groups. Q1, Q2 & Q3

— Finding the quartiles

— (i) Order the data set from smallest to largest, smallest on the
leftmost end.
— (ii) For Q2: Find the overall median of the entire data set .
— (iii) For Q1: Find the median of the first half of observations that lie
to the left of Q2.
— (iv) For Q3: Find the median of the second half of observations
that lie to the right of Q2.
Five-Number Summary
— Five-number summary: used to describe the center, variation, and distribution of the data, and to reveal whether outlier(s) exist. The five numbers are as follows:
— Minimum data value (denote it as min.)
— Maximum data value (denote it as max.)
— Q2 (median)
— Q1
— Q3
Five-Number Summary Examples
Examples:
— Find the quartiles of the following data sets.
— (i) 36, 74 , 85 , 29 , 10
— Step 1: Place in ascending order. 10, 29, 36, 74, 85
— Step 2: Find Q2: 36
— Step 3: Find Q1: (10+29) / 2 = 19.5
— Step 4: Find Q3: (74+85) / 2 = 79.5

— (ii) 36 , 74 , 85 , 29 , 10 , 43
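
The quartile procedure can be sketched in Python using the same median-of-halves rule as the worked steps for data set (i): when the number of observations is odd, the overall median is left out of both halves. The function names are made up for this example. Statistical software often uses other quartile conventions, so its results may differ slightly from this method.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                   # left of Q2
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # right of Q2
    return median(lower), median(ordered), median(upper)

print(quartiles([36, 74, 85, 29, 10]))       # data set (i)
print(quartiles([36, 74, 85, 29, 10, 43]))   # data set (ii)
```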
Five-Number Summary and Box Plot
The Five-Number Summary of a Distribution:
Minimum, Q1, M, Q3, Maximum

Box Plot (Graphical Display of the Five-Number Summary):

[Figure: box plot with Min, Q1, M, Q3, and Max marked]

Box plots can also be displayed vertically.


Five-Number Summary and Box Plot
— Example: Write out the five number summary and construct a
box plot for the given data set: 25, 3, 40, 33, 99, 60, 58, 42, 44.
— Step 1: Place in ascending order. 3, 25, 33, 40, 42, 44, 58, 60, 99
— Step 2: Min = 3
— Step 3: Max = 99
— Step 4: Find Q2: 42
— Step 5: Find Q1: (33+25) / 2 = 29
— Step 6: Find Q3: (60+58) / 2 = 59
— Step 7: Construct box plot (in space below).
You should get something like
this:
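
As a numerical check on Steps 1 to 6, the following minimal Python sketch computes the five-number summary for the same data set. It assumes the median-of-halves rule used in the steps above, and the function names are chosen for illustration. Drawing the box plot itself is left to hand or to plotting software.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the overall median is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

data = [25, 3, 40, 33, 99, 60, 58, 42, 44]
summary = five_number_summary(data)
print(summary)
```

The box of the plot runs from the second to the fourth number in `summary`, with a line at the third, and the whiskers extend to the first and last.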
Measures of Variation/ Spread
— Measures of Variation- Range, IQR, Variance,
Standard Deviation: a value that is used to describe
the spread/variability within a data set.

— Range: the difference between the largest value and the smallest value in a data set.

— Range = largest value - smallest value


Range
— It is generally not a good measure of spread because
of its extreme sensitivity to outliers.
— We only use the max and min of the data set to
calculate the range, so if either of these is an outlier it
will increase the range dramatically.
— In that case the range will not give us a good measure of the spread within the data set.
— Example of Finding the Range
— Given the data set: 2.5, 5.7, 1.9, 4.8, 6.8, 3.1
Range = 6.8 - 1.9 = 4.9
Inter-quartile Range or IQR

— Inter-quartile Range or IQR: the difference between the third quartile and the first quartile.
IQR = Q3 - Q1

— So the IQR in the Box-Plot example is: IQR = 59 - 29 = 30
Identifying Outliers
Identifying Outliers - Calculating the “fences”
The "fences" of a data set help us to identify the
outliers (or absence thereof).

Lower fence = Q1 − (1.5 × IQR)

Upper fence = Q3 + (1.5 × IQR)

* You can show the above by letting L = Lower fence and U = Upper fence.
NOTE: Outliers are now any data points that are either less than
the lower fence or greater than the upper fence.
Calculating Outliers
Example:

— Consider the following sorted teachers’ salaries (dollars per academic year):
26700, 27500, 28000, 29000, 29750, 30000, 31000, 31500, 31500, 33000, 34000, 89000

— Conduct a formal statistical test for outliers.


Calculating Outliers –Example Cont’d
Answer

— 1) Find Q1, Q2, and Q3: Q1=28,500, Q2=30,500, Q3=32,250.

— 2) Calculate IQR: IQR= Q3-Q1=3750

— 3) U = Q3 + 1.5 * IQR = 32,250 + 1.5 * 3750 = 37875.

— 4) L = Q1 - 1.5 * IQR = 28,500 - 1.5 * 3750 = 22875.


— Low Outlier(s): there is no value that lies below 22,875.

— High Outlier(s): 89,000 is an outlier because it lies above 37,875.
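
The fence calculation in this answer can be reproduced with a minimal Python sketch. The quartile values are taken directly from step 1 of the answer rather than recomputed, and the variable names are chosen for illustration.

```python
salaries = [26700, 27500, 28000, 29000, 29750, 30000,
            31000, 31500, 31500, 33000, 34000, 89000]

# Quartiles as found in step 1 of the answer.
q1, q3 = 28500, 32250

iqr = q3 - q1                    # step 2
upper_fence = q3 + 1.5 * iqr     # step 3 (U)
lower_fence = q1 - 1.5 * iqr     # step 4 (L)

# An outlier lies below the lower fence or above the upper fence.
outliers = [x for x in salaries if x < lower_fence or x > upper_fence]

print(iqr, lower_fence, upper_fence, outliers)
```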


Standard deviation
— Sample standard deviation (statistic): measures the variation of sample data values from the sample mean $\bar{x}$.
The sample standard deviation is denoted by s.

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
Standard deviation
We need to calculate $\sum_{i=1}^{n}(x_i - \bar{x})^2$.
It is often simpler to complete this step using the tabular method as follows.
Step 1) Calculate $\bar{x}$

Step 2) Calculate the difference between each data value and the sample mean: $x_i - \bar{x}$

Step 3) Square each difference: $(x_i - \bar{x})^2$

Step 4) Add all the squared differences together: $\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard deviation
— While this gives a concise formula to follow, we will
break the calculations into parts to keep things simple
for everybody.

— Example: Find the sample standard deviation of the following sample data.
12, 12.3, 11.5, 10.5, 8.7, 2.5, 13.5
Standard deviation – Example Cont’d
— Step 1) First we calculate the sample mean. n = 7, so we have

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{7} x_i}{7} = \frac{12 + 12.3 + 11.5 + 10.5 + 8.7 + 2.5 + 13.5}{7} = \frac{71}{7} \approx 10.1429$$

— Although this example has 4 decimal places, you can round to 2 decimal places to keep things simple.
Standard deviation – Example Cont’d

$x_i$    Step 2: $x_i - \bar{x}$          Step 3: $(x_i - \bar{x})^2$
12       12 - 10.1429 = 1.8571            1.8571^2 = 3.4488
12.3     12.3 - 10.1429 = 2.1571          2.1571^2 = 4.6531
11.5     11.5 - 10.1429 = 1.3571          1.3571^2 = 1.8417
10.5     10.5 - 10.1429 = 0.3571          0.3571^2 = 0.1275
8.7      8.7 - 10.1429 = -1.4429          (-1.4429)^2 = 2.0820
2.5      2.5 - 10.1429 = -7.6429          (-7.6429)^2 = 58.4139
13.5     13.5 - 10.1429 = 3.3571          3.3571^2 = 11.2701

Step 4: $\sum_{i=1}^{n}(x_i - \bar{x})^2 = 81.8371$
Standard deviation – Example Cont’d
Step 5) Divide the sum (from step 4) by n-1. This gives

$$\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{81.8371}{7-1} = 13.6395$$

Step 6) Finally, take the square root, and this gives us what we are looking for.

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{13.6395} = 3.6932$$
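
The six steps of this example can be followed line by line in a minimal Python sketch. The variable names are chosen for illustration. Because the code keeps the unrounded mean, its intermediate values may differ from the table in the last decimal place, but the final results agree to four decimal places.

```python
import math

data = [12, 12.3, 11.5, 10.5, 8.7, 2.5, 13.5]

n = len(data)
mean = sum(data) / n                      # Step 1: sample mean
deviations = [x - mean for x in data]     # Step 2: x_i minus the mean
squared = [d ** 2 for d in deviations]    # Step 3: square each difference
total = sum(squared)                      # Step 4: add the squares
variance = total / (n - 1)                # Step 5: divide by n - 1
s = math.sqrt(variance)                   # Step 6: take the square root

print(round(mean, 4), round(total, 4), round(variance, 4), round(s, 4))
```

The value computed in Step 5 is the sample variance, and its square root in Step 6 is the sample standard deviation.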
Properties of the standard deviation
— 1. s is always positive or zero.

— 2. s = 0 only when there is absolutely no variation, i.e. when all the observations are the same.

— 3. s is not resistant: outliers and extreme values in skewed distributions increase the value of s.

— Variance: square of the standard deviation. For a sample, the sample variance is $s^2$.
— Example: What is the sample variance of the previous example?
When to use which
When the distribution is                    summarize center and spread with
fairly symmetric with no outliers           $\bar{x}$ and s
strongly skewed or has extreme outliers     M and IQR


Properties of Mean & Median
— The median is resistant to extreme values and the mean is not.

— The median is more appropriate for strongly skewed distributions or distributions with outliers.

— The mean should be reserved for fairly symmetric distributions with no outliers.

— It is appropriate to use the interquartile range as the measure of spread when the median is used as the measure of center, since the median itself is a quartile and the quartiles are resistant to extreme values and outliers.

— Standard deviation should be used as the measure of spread when the mean is used as the measure of center because it measures spread about the mean.
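
The resistance of the median can be illustrated with the teachers' salaries from the outlier example. The sketch below, which uses Python's standard `statistics` module, compares the mean and the median with and without the 89000 value; removing that value is done here only to show the effect and is not a recommendation to discard outliers.

```python
from statistics import mean, median

salaries = [26700, 27500, 28000, 29000, 29750, 30000,
            31000, 31500, 31500, 33000, 34000, 89000]

# Same data with the high outlier left out, for comparison only.
without_outlier = [x for x in salaries if x != 89000]

mean_shift = mean(salaries) - mean(without_outlier)
median_shift = median(salaries) - median(without_outlier)

print("with outlier:   ", round(mean(salaries), 2), median(salaries))
print("without outlier:", round(mean(without_outlier), 2), median(without_outlier))
print("shift in mean:", round(mean_shift, 2), " shift in median:", median_shift)
```

The single extreme salary moves the mean by several thousand dollars while the median moves by only a few hundred, which is why the median and IQR are preferred for skewed data or data with outliers.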
