
BIOM4025 - Statistical Modelling - QA Session 2

This document discusses a Q&A session on statistical modeling. It addresses questions about using R vs RStudio, fixing a broken URL, what to include in scientific papers, explaining variance and degrees of freedom, and clarifying the differences between standard deviation, standard error, and confidence intervals. The document also provides examples of the normal and t-distributions and discusses when to standardize data and how critical values are determined.

Uploaded by

Lauren Joslyn

BIOM4025 - Statistical Modelling - Q&A session 2

Data and distributions


Erik Postma / Centre for Ecology and Conservation / University of Exeter
Today
Data and distributions
Different types of data
Mean, median and mode
Variance, standard deviation and standard error
The normal distribution
Probability density functions
Probabilistic statements about data and estimates derived from these data
Standard normal (or 𝑧) distribution
𝑡-distribution
Questions about the lecture
"Should we use R or RStudio?"

Both!
‘R’ does the calculations
‘RStudio’ makes ‘R’ easier to use

Questions about the lecture
"I can't seem to get the links for the https://shiny01.cles.ex.ac.uk/biom4025/app_02_1_1/ to work. I've tried Safari and Google Chrome but no luck unfortunately. Not sure if it's just me?"

Sorry, I made a typo in the URL. Should be fixed now!

Next time, post questions like this in the Questions about the module channel, where I will see them earlier.

Questions about the lecture
"If we were to write a scientific paper, would we do stuff like this in the analysis or is it just for us to understand the principles of stats?"

See Practicals 2-5 for examples of what to write in a paper.

Means, variances, standard deviations, standard errors and confidence intervals are all commonly reported.
Degrees of freedom, 𝑧 and 𝑡-values are central to most statistical tests as they will provide you with the p-value. More in Lecture 3!

Questions about the lecture
"While explaining the n - 1 part of the equation for variation you say 'This is because we have first estimated the mean from our data'. You make reference to it again saying 'We lose one degree of freedom because we have estimated the mean from the data'. I didn't quite understand what that meant."

"Could you please explain the concept of variance and in particular the degrees of freedom again?"

"Please could you further explain why we subtract 1 from the sample size in variation?"

Variance
$$\sigma_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The mean squared deviation from the estimated mean is never larger (and usually smaller) than the mean squared deviation from the true mean, because the estimated mean is the value that sits closest to the observations in our sample
Our estimate of the true mean will explain some of the variance around the true mean
By dividing by 𝑛 − 1 we account for the fact that we estimate the mean from our data and we don’t use the (unknown) true mean
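
A small simulation can make the 𝑛 − 1 point concrete. The sketch below is illustrative only: it uses Python's standard library rather than R, and the function name, sample size, true mean and true standard deviation are arbitrary assumptions, not values from the lecture.

```python
import random


def average_variance_estimates(n=5, true_mean=0.0, true_sd=2.0, reps=20000, seed=1):
    """Average three variance estimates over many simulated samples."""
    rng = random.Random(seed)
    totals = {"around_true_mean": 0.0, "divide_by_n": 0.0, "divide_by_n_minus_1": 0.0}
    for _ in range(reps):
        x = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = sum(x) / n
        # Sum of squared deviations from the (normally unknown) true mean
        ss_true = sum((xi - true_mean) ** 2 for xi in x)
        # Sum of squared deviations from the mean estimated from this sample
        ss_est = sum((xi - x_bar) ** 2 for xi in x)
        totals["around_true_mean"] += ss_true / n
        totals["divide_by_n"] += ss_est / n
        totals["divide_by_n_minus_1"] += ss_est / (n - 1)
    return {name: total / reps for name, total in totals.items()}


estimates = average_variance_estimates()
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

With a true variance of 4 (standard deviation 2), dividing the squared deviations from the estimated mean by 𝑛 should average out too low, whereas dividing by 𝑛 − 1 should land close to 4.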

Degrees of freedom
The number of independent values that can vary freely
For example:
5 values: 6, 4, 5, 2, 3
$$\text{Mean} = \frac{6 + 4 + 5 + 2 + 3}{5} = \frac{20}{5} = 4$$

If you know four out of five values and the mean, you know the fifth
value
Every parameter we estimate from our data constrains the value of an
observation
Degrees of freedom (d.f.) is sample size minus number of parameters
estimated from the data
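
The five-value example can be worked through in a few lines. This is a minimal sketch in Python (an assumption; the module itself uses R), with variable names chosen purely for illustration.

```python
# Four of the five values from the example, plus the known mean
known_values = [6, 4, 5, 2]
n = 5
mean = 4

# The mean fixes the total, so the fifth value is no longer free to vary
fifth_value = mean * n - sum(known_values)

# One parameter (the mean) was estimated from the data
degrees_of_freedom = n - 1

print(fifth_value, degrees_of_freedom)
```

Once the mean is known, only four of the five values can vary freely, which is why one degree of freedom is lost.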
Questions about the lecture
"Could you explain the variance histogram?"
"When to use standard error?"
"Can you talk more about how and when standard deviation or standard error should be applied to data and on graphs? Can you go over confidence interval of the mean again?"

Questions about the lecture
"I am having a hard time understanding the difference between standard deviation and standard error - do you mind going over them again?"

"Can you go over when you would use standard error when reporting data and when you would use standard deviation?"

"Why do we use sample size as the denominator for standard error rather than degrees of freedom?"

Estimating the mean
[Interactive app: sliders for sample size (1 to 100) and number of repetitions (1 to 1,000); options to add samples one at a time, draw a new sample, and choose error bars.]

Standard deviation
$$\text{standard deviation (or } \sigma\text{)} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\text{variance (or } \sigma^2\text{)}}$$

Standard deviation: Measure of the amount of variation among individuals in a sample

Standard error
$$\mathrm{SE}_{\bar{x}} = \frac{SD}{\sqrt{n}}$$
Standard error: Measure of the uncertainty around an estimate

If we were to repeat our experiment many times, the standard error would be the standard deviation of our estimates

In practice we have just a single estimate, so we infer the standard error from the variation in our data (in the case of the mean, from the standard deviation)
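
The link between "repeat the experiment many times" and SD divided by the square root of 𝑛 can be checked with a simulation. The following is a rough sketch in Python's standard library; the helper names, the sample size of 30 and the true mean and standard deviation are assumptions made for illustration.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the mean, inferred from a single sample."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def sd_of_repeated_means(n, reps, true_mean, true_sd, rng):
    """Repeat the 'experiment' many times and return the SD of the estimated means."""
    means = []
    for _ in range(reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


rng = random.Random(42)
n, true_mean, true_sd = 30, 10.0, 3.0

single_sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
se_from_one_sample = standard_error(single_sample)
sd_of_means = sd_of_repeated_means(n, 5000, true_mean, true_sd, rng)
theoretical_se = true_sd / math.sqrt(n)

print(f"SE from one sample:      {se_from_one_sample:.3f}")
print(f"SD of 5000 sample means: {sd_of_means:.3f}")
print(f"True SD / sqrt(n):       {theoretical_se:.3f}")
```

The three numbers should be similar: the standard error inferred from one sample approximates the spread we would see among the means of many repeated samples.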

Standard deviation vs. standard error
Only report standard deviation if you want to quantify the amount of variation among observations (e.g. individuals)
Report standard errors whenever you are presenting estimates, e.g. of the mean, the regression coefficient, or the difference between two means.

Questions about the lecture
"Can you please explain the bit about confidence interval of the mean and whether a 0 is included again?"

The 95% confidence interval gives us a range that, in 95% of repeated experiments, contains the true mean

In 5% of repeated experiments, the true mean lies outside of this range

If the 95% confidence interval excludes zero, the probability of obtaining an estimate this far from zero if the true mean were zero is less than 5%

Testing a mean against zero is usually not very interesting, but the same logic applies to all estimates (e.g. of a slope or a difference between two means)
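
One way to see what the 95% refers to is to simulate many experiments and count how often the interval contains the true mean. This is a hedged sketch in Python's standard library: the function names, the true mean of 0, the standard deviation of 3 and the sample size of 100 are assumptions, and the interval uses the normal approximation (mean ± 1.96 standard errors), which is reasonable for a sample this large.

```python
import math
import random
import statistics


def confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval of the mean: mean +/- z standard errors."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * se, mean + z * se


def coverage(true_mean, true_sd, n, reps, seed=7):
    """Share of simulated experiments whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = confidence_interval(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / reps


share_containing_truth = coverage(true_mean=0.0, true_sd=3.0, n=100, reps=2000)
print(f"Share of intervals containing the true mean: {share_containing_truth:.3f}")
```

The share should come out close to 0.95. Because the true mean is zero here, the remaining intervals (roughly 5%) are exactly the ones that exclude zero.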
The normal distribution
[Interactive app: sliders for true mean (-100 to 100), true variance (1 to 100) and sample size (up to 1,000); option to add a fitted normal distribution to the plot.]
The mean of 𝑥: -0.2988209
The variance of 𝑥: 8.101418
The standard deviation of 𝑥: 2.846299
The standard error of the mean of 𝑥: 0.2860638
95% confidence interval of mean of 𝑥: -0.8595059 0.2618642
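
The numbers reported by the app can be checked against each other. The sketch below (Python standard library; variable names are illustrative) only uses the values shown above. It assumes the interval was computed as mean ± 1.96 standard errors, which is an inference from the reported numbers rather than something stated in the slides.

```python
import math

# Values as reported by the app
mean_x = -0.2988209
variance_x = 8.101418
sd_x = 2.846299
se_mean = 0.2860638

# The standard deviation is the square root of the variance
sd_from_variance = math.sqrt(variance_x)

# Assumed recipe for the interval: mean +/- 1.96 standard errors
ci_lower = mean_x - 1.96 * se_mean
ci_upper = mean_x + 1.96 * se_mean

print(round(sd_from_variance, 6), round(ci_lower, 7), round(ci_upper, 7))
```

Both checks agree with the reported output to within rounding.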


Questions about the lecture
"Do we always standardise data?"

No, but you can and it can be useful sometimes

We standardise parameters estimated from our data (e.g. slope, difference among groups) all the time

Express the slope or difference in standard error units

This allows us to obtain a p-value using the standard normal or 𝑡 distribution
What is the probability of finding a difference equal to or larger than 𝑥 standard errors if the true difference is 0?
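
Expressing an estimate in standard error units and turning it into a p-value can be sketched as follows. This uses the standard normal distribution from Python's standard library; the function name and the example estimate (1.2) and standard error (0.5) are made up for illustration, and for small samples the 𝑡 distribution would be used instead.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, standard_error):
    """Standardise an estimate and return (z, two-sided p-value) under a true value of 0."""
    z = estimate / standard_error
    # Probability of a value at least |z| standard errors away from 0, in either direction
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z, p = two_sided_p_value(estimate=1.2, standard_error=0.5)
print(f"z = {z:.2f}, p = {p:.4f}")

# An estimate exactly 1.96 standard errors from 0 sits at the 5% threshold
_, p_at_196 = two_sided_p_value(estimate=1.96, standard_error=1.0)
print(f"p at z = 1.96: {p_at_196:.4f}")
```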

Questions about the lecture
"What is the definition of a critical value? How is the critical value decided?"

The value of 𝑧 (and −𝑧) or 𝑡 (and −𝑡) for which the area under the curve between them equals the probability you would like
You decide on the critical value you want to use
For significance testing and a significance threshold of 5%, it is the value for which the area under the curve between −𝑧 and 𝑧 (or −𝑡 and 𝑡) is 0.95
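
For the standard normal distribution, the critical value that encloses a chosen area can be looked up directly. A minimal sketch using Python's standard library (the function name `z_critical` is an illustrative assumption):

```python
from statistics import NormalDist


def z_critical(central_area=0.95):
    """Value of z such that the area between -z and z equals central_area."""
    tail = (1 - central_area) / 2
    return NormalDist().inv_cdf(1 - tail)


z_95 = z_critical(0.95)
z_99 = z_critical(0.99)

# Check: the area between -z and z should give back the chosen probability
area_check = NormalDist().cdf(z_95) - NormalDist().cdf(-z_95)

print(f"95%: z = {z_95:.3f}, 99%: z = {z_99:.3f}, area check = {area_check:.3f}")
```

Choosing a stricter threshold (a larger central area) simply moves the critical value further from zero.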

t-distribution
$$t = \frac{\text{estimate}}{\text{s.e.}}$$
[Interactive app: sliders for critical value (0 to 4, shown at 1.96) and sample size (3 to 200); option to invert the shaded area.]
Size of shaded area (i.e. probability): 0.949
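
The shaded area can be reproduced by integrating the 𝑡 density numerically. This is a sketch under stated assumptions: it uses Python's standard library (which has no built-in 𝑡 distribution, so the density is written out and integrated with Simpson's rule), the function names are illustrative, and the degrees of freedom are taken to be sample size minus one.

```python
import math


def t_density(x, df):
    """Probability density of the t-distribution with df degrees of freedom."""
    log_constant = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                    - 0.5 * math.log(df * math.pi))
    return math.exp(log_constant - (df + 1) / 2 * math.log1p(x * x / df))


def area_between(critical_value, df, steps=2000):
    """Area under the t curve between -critical_value and +critical_value."""
    # Simpson's rule on [0, critical_value], doubled because the curve is symmetric
    h = critical_value / steps
    total = t_density(0.0, df) + t_density(critical_value, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    return 2 * total * h / 3


area_n200 = area_between(1.96, df=199)  # sample size 200
area_n3 = area_between(1.96, df=2)      # sample size 3

print(f"n = 200: {area_n200:.3f}")
print(f"n = 3:   {area_n3:.3f}")
```

For a sample size of 200 the area between −1.96 and 1.96 is close to the 0.949 shown above; for a sample size of 3 the same critical value encloses much less of the distribution, because the tails are heavier.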

Questions about the lecture
"In the normal distribution the confidence interval is mean(x) +/- 1.96 x S.E. as 95% of data falls between +/- 1.96 S.E., but as each t-distribution is different and there is no set value for where 95% of the data fall, how do you work out the 95% confidence interval if the data is instead from a t distribution?"

Quick and dirty: Mean ± 2 × standard error


Exact confidence interval depends on sample size
In R: use the confint() function
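
To illustrate how the exact critical value depends on sample size, the sketch below searches for the value of 𝑡 that encloses 95% of the distribution. It is written in Python's standard library rather than R, all function names are illustrative assumptions, the degrees of freedom are taken to be sample size minus one, and the numerical integration and bisection stand in for the lookup that statistical software performs for you.

```python
import math


def t_density(x, df):
    """Probability density of the t-distribution with df degrees of freedom."""
    log_constant = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                    - 0.5 * math.log(df * math.pi))
    return math.exp(log_constant - (df + 1) / 2 * math.log1p(x * x / df))


def area_between(critical_value, df, steps=2000):
    """Area between -critical_value and +critical_value (Simpson's rule)."""
    h = critical_value / steps
    total = t_density(0.0, df) + t_density(critical_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 2 * total * h / 3


def t_critical(df, central_area=0.95):
    """Find the critical t-value by bisection on the enclosed area."""
    low, high = 0.0, 50.0
    for _ in range(50):
        middle = (low + high) / 2
        if area_between(middle, df) < central_area:
            low = middle
        else:
            high = middle
    return (low + high) / 2


critical_values = {n: t_critical(df=n - 1) for n in (3, 10, 30, 200)}
for n, value in critical_values.items():
    print(f"sample size {n:>3}: mean +/- {value:.3f} x standard error")
```

For small samples the multiplier is well above 2, and it approaches 1.96 as the sample size grows, which is why "mean ± 2 × standard error" is only a quick approximation.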

Questions about the lecture
"I understood the math behind the confidence interval and t-distributions but I didn’t quite understand what it was useful for in real-life. Can we see an example?"

See Lecture 3.
