Statistics Notes

This document provides an overview of key concepts in statistics including: 1) Inferential statistics involves drawing conclusions about populations from samples and is based on uncertainty and variation. 2) Descriptive statistics are used to summarize and describe samples through measures like means, medians, and standard deviations. 3) Probability is the transition between descriptive statistics and inferential methods and is important for statistical inference. 4) Common sampling procedures include simple random sampling, stratified random sampling, and experimental design.

Uploaded by

anon_673298142

Chapter 1

Inferential statistics
- drawing of conclusions/inferences about a scientific system
- scientific judgments based on uncertainty and variation
- quality of process
- no variability = no need for statistics
Samples/observations
- gathered from the population (scientific system)
Experimental design
- factors can be selected
Observational study
- factor levels aren't preselected
Descriptive statistics
Statistical inference
Means
Medians
Standard deviation
Single number statistics
- Stem-and-leaf plots, dot plots, box plots
Probability
- transition between descriptive statistics and inferential methods
P-value
- "bottom line" in data interpretation

Relationship between probability and statistical inference
- For a statistical problem, the sample along with inferential statistics allows us to draw conclusions about the population, with inferential statistics making clear use of elements of probability. This reasoning is inductive in nature.
- Problems in probability allow us to draw conclusions about the characteristics of hypothetical data taken from the population, based on known features of the population. This type of reasoning is deductive in nature.
- The only certainty concerning the pedagogy of the two disciplines lies in the fact that if statistics is to be taught at more than merely a "cookbook" level, then the discipline of probability must be taught first.
- You can't learn anything about a population from a sample until the analyst learns about the uncertainty in that sample.

Sampling Procedures
❖ Simple random sampling - any sample of a specified size has the same chance of being selected as any other sample of the same size (avoids bias)
➢ Sample size - number of elements in the sample
➢ aids in the elimination of the problem of having the sample reflect a different (possibly more confined) population than the one about which inferences need to be made.
❖ Stratified random sampling - random selection of a sample within each stratum
➢ Strata - nonoverlapping groups
➢ In order not to disregard or overrepresent any group
❖ Experimental design
➢ Treatments or treatment combinations
➢ Variability
➢ Experimental unit
➢ Completely randomized design

Why assign experimental units randomly?
- Variability (avoid bias) - "wash away"
- Descriptive statistics

Measures of Location/Central Tendency

Sample mean - average
Sample median - reflects central tendency that is uninfluenced by outliers
$\tilde{x} = x_{(n+1)/2}$ if n is odd
$\tilde{x} = \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right)$ if n is even
Trimmed mean
- removing outliers when averaging
Statistical quality control
Sample standard deviation
- measure of variability
(n-1) - Degrees of freedom associated with the variance estimate; depicts the number of independent pieces of information available for computing variability
* Large variability in data set = large variance
Independent squared deviations
Average squared deviation
Variance
- measure of the average squared deviation from the mean
- measures how far a set of (random) numbers are spread out from their average value.
Population parameters (characteristic of population)
- Population mean
- Population variance

Discrete - countable as a whole
Continuous - measured
Count data
Sample proportion - mean of the ones and zeroes
Statistical modelling examples
● Postulated model
● Regression model
● Estimation theory

Relative frequency histogram
Quartiles - tails of distribution
- Third quartile - separates the upper quarter from the rest
- Second quartile - median
- First quartile - separates the lower quarter from the rest

Parallelism - same
Observational study - if factors are not controlled
Retrospective study - historical data
Disadvantages:
(i) Validity and reliability of historical data are often in doubt.
(ii) If time is an important aspect of the structure of the data, there may be data missing.
(iii) There may be errors in collection of the data that are not known.
(iv) Again, as is the case of observational data, there is no control on the ranges of the measured variables (the factors in a study). Indeed, the ranges found in historical data may not be relevant for current studies.

---------------------
Relative frequency = f/n
Stem-and-leaf = * separates 0-4 and 5-9


(1) The type of model used to describe the data often depends on the goal of the experiment; and
(2) the structure of the model should take advantage of nonstatistical scientific input.

Fundamental assumption - selection of model
Exploratory data analysis (plots)
Violation of assumptions

Probability distribution
- bell-shaped (symmetric or skewed)
Stem-and-leaf plot
- can be either double or single

Note: Sample STDev $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
- just a subset of the population

Population STDev $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
- everything

Why is it n-1?
- Makes the denominator smaller
- Actual standard deviation
- When we get a sample, we are trying to reach a conclusion about a population
- A population std estimated from the sample alone might be wrong because there are data outside the sample's range which might affect the std. Data far away from the peak make the variance larger
- Variance = measures how far a set of (random) numbers are spread out from their average value
- Subtracting 1 makes the denominator smaller, which gives a slightly bigger value that better reflects the real std
- Removes the bias
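The difference between the n-1 and N divisors can be checked numerically. The following is a minimal sketch, assuming the ten values from the stem-and-leaf sample problem in these notes are used as the data; the helper name `std_dev` is made up for illustration and is cross-checked against Python's standard `statistics` module.

```python
import math
import statistics

# Data reused from the stem-and-leaf sample problem in these notes.
data = [2.3, 2.5, 2.5, 2.7, 2.8, 3.2, 3.6, 3.6, 4.5, 5.0]


def std_dev(values, sample=True):
    """Standard deviation: divide by n-1 for a sample, by n for a population.

    Hypothetical helper written for illustration only.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)


s = std_dev(data, sample=True)       # sample standard deviation
sigma = std_dev(data, sample=False)  # population standard deviation

print(f"sample s = {s:.4f}, population sigma = {sigma:.4f}")
```

Because the sample version divides by a smaller number, `s` always comes out slightly larger than `sigma` on the same data, which is the "slightly bigger value" the notes describe.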

HOW TO STEM-AND-LEAF
(sample problem)

Data: 2.3, 2.5, 2.5, 2.7, 2.8, 3.2, 3.6, 3.6, 4.5, 5.0

Single stem:

STEM   LEAF
2      35578
3      266
4      5
5      0

WITH * (plain = 0 to 4 leaf digit; * = 5 to 9 leaf digit):

STEM   LEAF
2      3
2*     5578
3      2
3*     66
4
4*     5
5      0
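The grouping rule for stems and leaves can be sketched in a few lines. This is an illustrative sketch only, assuming one-decimal data where the integer part is the stem and the tenths digit is the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

data = [2.3, 2.5, 2.5, 2.7, 2.8, 3.2, 3.6, 3.6, 4.5, 5.0]


def stem_and_leaf(values, double=False):
    """Group one-decimal values by stem (integer part) and leaf (tenths digit).

    With double=True, leaves 5 to 9 go on a separate stem marked with '*'.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)        # 2.7 -> 27, avoids float noise
        stem, leaf = divmod(tenths, 10)   # 27 -> stem 2, leaf 7
        label = f"{stem}*" if double and leaf >= 5 else f"{stem}"
        rows[label].append(str(leaf))
    return {label: "".join(leaves) for label, leaves in rows.items()}


single = stem_and_leaf(data)
double = stem_and_leaf(data, double=True)

for label, leaves in double.items():
    print(f"{label:<3} {leaves}")
```

Note that this sketch only emits stems that have at least one leaf, so an empty stem (here the plain `4` row in the double display) is simply skipped.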

RELATIVE FREQUENCY TABLE

1.20. (midpoint is for histogram)

Class      Midpoint   Frequency (f)      Relative Frequency       Cumulative Relative
Interval              (total pop = 50)   (frequency/total pop)    Frequency
0-4        2          2                  2/50 = 0.04              0.04
5-9        7          17                 0.34                     0.04 + 0.34 = 0.38
10-14      12         16                 0.32                     0.70
15-19      17         10                 0.20                     0.90
20-24      22         3                  0.06                     0.96
25-29      27         1                  0.02                     0.98
30-34      32         1                  0.02                     1.00
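The columns of the table follow mechanically from the class intervals and frequencies. Here is a minimal sketch that recomputes them, using only the intervals and counts listed above; the variable names are illustrative.

```python
from itertools import accumulate

# Class intervals and frequencies copied from the table above.
class_intervals = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
frequencies = [2, 17, 16, 10, 3, 1, 1]

total = sum(frequencies)                                  # total population
midpoints = [(low + high) / 2 for low, high in class_intervals]
relative = [f / total for f in frequencies]               # f / n
cumulative = list(accumulate(relative))                   # running sum of relative frequencies

for (low, high), mid, f, rel, cum in zip(
    class_intervals, midpoints, frequencies, relative, cumulative
):
    print(f"{low}-{high:<4} {mid:>5.0f} {f:>4} {rel:>6.2f} {cum:>6.2f}")
```

The midpoints and relative frequencies are the x and y values one would plot for the relative frequency histogram.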


HISTOGRAM (skewed to the right)
x axis = class midpoints, y axis = relative frequency

Chapter 2
Experiment - generates a set of data

Definition 1
Sample space (S) - set of all possible outcomes of a statistical experiment
- Each outcome is called an element/sample point
- S = { A, B, C } - finite
- statement/rule - infinite/large number of sample points

Definition 2
Event - subset of the sample space

Definition 3
➢ The complement of an event A with respect to S is the subset of all elements of S that are not in A. We denote the complement of A by the symbol A'.

Definition 4
➢ The intersection of two events A and B, denoted by the symbol A ∩ B, is the event containing all elements that are common to A and B.

Definition 5
➢ Two events A and B are mutually exclusive, or disjoint, if A ∩ B = ϕ, that is, if A and B have no elements in common.

Definition 6
➢ The union of the two events A and B, denoted by the symbol A ∪ B, is the event containing all the elements that belong to A or B or both.

Definition 7
Permutation is an arrangement of all or part of a set of objects.
EXAMPLE: three letters a, b, and c. The possible permutations are abc, acb, bac, bca, cab, and cba.

Definition 8 - n factorial

Definition 9
- The probability of an event A is the sum of the weights of all sample points in A. Therefore,
0 ≤ P(A) ≤ 1, P(ϕ) = 0, and P(S) = 1.
Furthermore, if A1, A2, A3, ... is a sequence of mutually exclusive events, then
P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ...

Definition 10
Conditional probability - "the probability that B occurs given that A occurs" or "the probability of B, given A"
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, provided P(A) > 0

Definition 11

Definition 12

Rule 1
- If events A and B come from the same sample space, the probability that both A and B occur is equal to the probability that event A occurs times the probability that B occurs, given that A has occurred.

Rule 2
- If an operation can be performed in n1 ways, and if for each of these a second operation can be performed in n2 ways, and for each of the first two a third operation can be performed in n3 ways, and so forth, then the sequence of k operations can be performed in $n_1 n_2 \cdots n_k$ ways.

Rule 3
- If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes correspond to event A, then the probability of event A is
$P(A) = \frac{n}{N}$

Theorem 1
- The number of permutations of n objects is n!.

Theorem 2
- The number of permutations of n objects taken r at a time is
$_nP_r = \frac{n!}{(n-r)!}$

Find the number of ways that 6 teachers can be assigned to 4 sections of an introductory psychology course if no teacher is assigned to more than one section.
$_6P_4 = \frac{6!}{2!} = 360$

A second worked value, with n = 40 and r = 3:
$_{40}P_3 = \frac{40!}{37!} = 59{,}280$

Theorem 3 (Circular Permutations)
- The number of permutations of n objects arranged in a circle is (n - 1)!.

In how many ways can 5 different trees be planted in a circle?
(n - 1)! = 4! = 24

A second worked value, with n = 8:
(n - 1)! = 7! = 5040

Theorem 4
- The number of distinct permutations of n things of which n1 are of one kind, n2 of a second kind, ..., nk of a kth kind is
$\frac{n!}{n_1!\, n_2! \cdots n_k!}$

How many distinct permutations can be made from the letters of the word INFINITY?
8 - total letters
3 - letter I's
2 - letter N's
1 - letter F
1 - letter T
1 - letter Y
$\frac{8!}{3!\,2!} = 3360$

A second worked value:
$\frac{9!}{3!\,4!\,2!} = 1260$

Theorem 5
- The number of ways of partitioning a set of n objects into r cells with n1 elements in the first cell, n2 elements in the second, and so forth is
$\frac{n!}{n_1!\, n_2! \cdots n_r!}$

Theorem 6
- The number of combinations of n distinct objects taken r at a time is
$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$

How many ways are there to select 3 candidates from 8 equally qualified recent graduates for openings in an accounting firm?
$\frac{8!}{3!\,5!} = 56$
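The worked values in the counting theorems above can be verified with Python's standard `math` module. This is a small sketch; `distinct_permutations` is a hypothetical helper written for Theorem 4, not a library function.

```python
import math


def distinct_permutations(counts):
    """Theorem 4: n! / (n1! n2! ... nk!), where counts lists how many of each kind."""
    n = sum(counts)
    result = math.factorial(n)
    for count in counts:
        result //= math.factorial(count)
    return result


teachers = math.perm(6, 4)                           # Theorem 2: 6 teachers, 4 sections
trees = math.factorial(5 - 1)                        # Theorem 3: 5 trees in a circle
infinity = distinct_permutations([3, 2, 1, 1, 1])    # INFINITY: 3 I's, 2 N's, F, T, Y
candidates = math.comb(8, 3)                         # Theorem 6: choose 3 of 8

print(teachers, trees, infinity, candidates)
```

`math.perm` and `math.comb` are available from Python 3.8 onward, so they play the role of the nPr and nCr calculator keys.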

Theorem 7 (Additive Rule)
- If A and B are two events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Theorem 8
- For three events A, B, C
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)

Theorem 9
- If A and A' are complementary events, then
P(A) + P(A') = 1

Theorem 10
- If in an experiment the events A and B can both occur, then
P(A ∩ B) = P(A) P(B | A), provided P(A) > 0

Theorem 11

Theorem 12

Theorem 13

Theorem 14 (BAYES' RULE)

Example:
Sample Problem:
1.
● 1% of women have breast cancer (and therefore 99% do not).
● 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
● 9.6% of mammograms detect breast cancer when it's not there (and therefore 90.4% correctly return a negative result).

Question: What are the chances you have cancer, given a positive mammogram? (true positive)

(cross multiply)
true positive = (0.01)(0.80) = 0.008
false positive = (0.99)(0.096) = 0.09504

$P = \frac{\text{desired event}}{\text{all possibilities}} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}}$ (all possibilities = all positives)

$P = \frac{0.008}{0.008 + 0.09504} = 0.0776 \approx 7.8\%$

A firm is accustomed to training operators who do certain tasks on a production line. Those operators who
attend the training course are known to be able to meet their production quotas 90% of the time. New
operators who do not take the training course only meet their quotas 65% of the time. Fifty percent of new
operators attend the course. Given that a new operator meets her production quota, what is the
probability that she attended the program?

                            With training (T) (0.5)    Without training (T') (0.5)
Meets quota (M)             (0.9)(0.5) = 0.45          (0.65)(0.5) = 0.325
Does not meet quota (M')    (0.1)(0.5) = 0.05          (0.35)(0.5) = 0.175

$P(T \mid M) = \frac{\text{meets quota and with training}}{\text{everything in the meets-quota row}} = \frac{0.45}{0.45 + 0.325} = 0.581$
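Both Bayes problems above have the same two-branch shape: one prior, one likelihood under each branch. The sketch below captures that shape; the function name `posterior` and its parameter names are made up for illustration, and the numbers are the ones given in the two problems.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Two-branch Bayes' rule: P(H | E) = P(H)P(E|H) / [P(H)P(E|H) + P(H')P(E|H')]."""
    numerator = prior * likelihood                          # e.g. true positives
    evidence = numerator + (1 - prior) * likelihood_given_not  # all ways E can happen
    return numerator / evidence


# Mammogram problem: P(cancer | positive test)
cancer_given_positive = posterior(prior=0.01, likelihood=0.80, likelihood_given_not=0.096)

# Training problem: P(attended training | meets quota)
trained_given_quota = posterior(prior=0.50, likelihood=0.90, likelihood_given_not=0.65)

print(f"{cancer_given_positive:.4f}  {trained_given_quota:.3f}")
```

The denominator is the "everything in the row" total from the table method: every way the observed evidence can occur, summed over both branches.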

Corollary 1

Corollary 2

Corollary 3

Chapter 3
PROBABILITY DENSITY FUNCTION

$P(a < X < b) = \int_a^b f(x)\,dx$
$\int_{-\infty}^{\infty} f(x)\,dx = 1$

A continuous random variable X that can assume values between x = 1 and x = 3 has a density function given by f(x) = 1/2.

(a) Show that the area under the curve is equal to 1.
(b) Find P(2 < X < 2.5).
(c) Find P(X ≤ 1.6).
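Since the density in this problem is constant, each probability is just a rectangle area (width times height 1/2). The following is a sketch under that assumption; `prob_between` is a hypothetical helper, and exact fractions are used so no rounding is involved.

```python
from fractions import Fraction

LOW, HIGH = Fraction(1), Fraction(3)   # X takes values between 1 and 3
DENSITY = Fraction(1, 2)               # f(x) = 1/2 on that interval, 0 elsewhere


def prob_between(a, b):
    """P(a < X < b) for the constant density above: clipped width times height."""
    a = max(Fraction(a), LOW)
    b = min(Fraction(b), HIGH)
    return DENSITY * max(b - a, Fraction(0))


area_total = prob_between(LOW, HIGH)                  # (a) total area under the curve
p_b = prob_between(Fraction("2"), Fraction("2.5"))    # (b) P(2 < X < 2.5)
p_c = prob_between(LOW, Fraction("1.6"))              # (c) P(X <= 1.6)

print(area_total, p_b, p_c)
```

For a continuous variable the endpoints carry no probability, so P(X ≤ 1.6) and P(X < 1.6) are the same area.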

PROBABILITY MASS FUNCTION


The probability mass function, f(x) = P(X = x), of a discrete random variable X has the following properties:
1. All probabilities are nonnegative: f(x) ≥ 0.
2. Any event in the distribution (e.g. "scoring between 20 and 30") has a probability of happening between 0 and 1 (i.e. 0% and 100%).
3. The sum of all probabilities is 100% (i.e. 1 as a decimal): Σ f(x) = 1.
4. The probability of an event A is found by adding up f(x) over the x-values in A:
$P(X \in A) = \sum_{x \in A} f(x)$

● For a PMF built from repeated independent trials, you can use the binomial distribution formula instead of listing every outcome:
$b(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}$

n is the number of trials,
x is the number of successes (the binomial random variable),
p is the probability of success, and
(1-p) is the probability of failure, the complement of p

Example:

Question / Answer + Explanation

Question: From a box containing 4 black balls and 2 green balls, 3 balls are drawn in succession, each ball being replaced in the box before the next draw is made. Find the probability distribution for the number of green balls. (WITH REPLACEMENT)
Answer + Explanation: (Book answer) (by binomial distribution) - f(balls at a time)

Question: Two cards are drawn at random from a pack of cards with replacement. Let the random variable X be the number of cards drawn from the heart suit. (WITH REPLACEMENT)

Question: Three cards are drawn in succession from a deck of playing cards without replacement. Find the probability distribution for the number of hearts. Summarize in a table. (WITHOUT REPLACEMENT)

Question: An urn contains five green balls, two blue balls, and three red balls. You remove three balls at random without replacement. Let X denote the number of red balls. Find the probability mass function describing the distribution of X. (WITHOUT REPLACEMENT)
Answer + Explanation:
(0 red balls; 3 non-red balls) P(X = 0) = 7/24
(1 red ball; 2 non-red balls) P(X = 1) = 21/40
(2 red balls; 1 non-red ball) P(X = 2) = 7/40
(3 red balls; 0 non-red balls) P(X = 3) = 1/120
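The urn answers above (7/24, 21/40, 7/40, 1/120) come from counting combinations: ways to pick the red balls times ways to pick the non-red balls, over all ways to pick three balls. A minimal sketch of that count, with a hypothetical helper name `hypergeom_pmf`:

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, population, successes, draws):
    """P(X = x) successes when drawing without replacement."""
    favourable = comb(successes, x) * comb(population - successes, draws - x)
    return Fraction(favourable, comb(population, draws))


# Urn: 10 balls in total, 3 of them red, 3 drawn without replacement.
urn_pmf = {x: hypergeom_pmf(x, population=10, successes=3, draws=3) for x in range(4)}

for x, p in urn_pmf.items():
    print(f"P(X = {x}) = {p}")
```

The four probabilities sum to exactly 1, which is a quick sanity check that the table describes a valid probability mass function.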

CUMULATIVE DISTRIBUTION FUNCTION

F(x)=P(X≤x), for all x∈ℝ.

QUESTIONS ANSWERS
Chapter 4 (Expectations?)

$\sigma^2 = E(X^2) - \mu^2$

$\mu = E(X) = \sum_x x f(x)$ if X is discrete

$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ if X is continuous
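As an illustration of the discrete formulas, the sketch below applies them to the red-ball urn distribution from Chapter 3 of these notes (P(X = 0..3) = 7/24, 21/40, 7/40, 1/120). The choice of that distribution and the helper name `expectation` are assumptions made for the example.

```python
from fractions import Fraction

# PMF of the number of red balls from the urn problem in Chapter 3.
pmf = {0: Fraction(7, 24), 1: Fraction(21, 40), 2: Fraction(7, 40), 3: Fraction(1, 120)}


def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * f(x) for a discrete random variable."""
    return sum(g(x) * p for x, p in pmf.items())


mu = expectation(pmf)                                    # mu = E(X)
variance = expectation(pmf, lambda x: x ** 2) - mu ** 2  # sigma^2 = E(X^2) - mu^2

print(mu, variance)
```

Passing a function `g` lets the same helper compute E(X) and E(X^2), which is all the shortcut variance formula needs.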

Chapter 5
Binomial Distribution
- the number of successes in n Bernoulli trials is a binomial random variable

FORMULA:
$b(x; n, p) = \binom{n}{x} p^x q^{n-x}$

where n is the number of trials,
x is the number of successes (the binomial random variable),
p is the probability of success, and
q = 1 - p is the probability of failure, the complement of p

NOTE: FOR INPUT IN CALCULATOR, THE $\binom{n}{x}$ TERM IS JUST BASICALLY nCx

SAMPLE PROBLEMS:

Questions / Solution + explanation

a) Looking for 4 or more weld errors (x >= 4) so that it can be rejected using the standard of a good machine (99% success rate). IT SHOULD BE NOTED that we use the weld error here as the success of the binomial random variable.
b) Same as item a) but changing the success rate of the weld to 95% to reflect the inefficient machine. NOTE that we use P(x <= 3) because the standard of acceptance is 3 or fewer missed welds.

a) You want at least 1 of the dice to give the desired result, so P(x >= 1) is needed; use its complement, 1 - P(x = 0), where P(x = 0) is the likelihood of not getting a single 1. $\binom{4}{0}$ is used because there are four dice and you want 0 successes. From there get the complement of the formula.
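The dice explanation above can be sketched numerically. The original question is not shown in these notes, so this assumes four fair six-sided dice with "success" meaning a die shows a 1 (p = 1/6); `binom_pmf` is a hypothetical helper implementing the binomial formula.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Assumption: four fair dice, success = rolling a 1, so p = 1/6.
n_dice, p_one = 4, 1 / 6

p_no_ones = binom_pmf(0, n_dice, p_one)   # the (4 choose 0) term: zero successes
p_at_least_one = 1 - p_no_ones            # complement gives P(x >= 1)

print(f"P(at least one 1) = {p_at_least_one:.4f}")
```

Using the complement needs one term instead of summing P(x = 1) through P(x = 4), which is why the notes recommend it for "at least 1" questions.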

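Hypergeometric Distribution
- used in situations where no replacement is done between trials

FORMULA:
$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$
where N is the population size, n is the sample size, k is the number of successes in the population, N-k is the number of failures, and x is the number of successes in the sample.
TL;DR: multiply the combinations for each category on top, all over the total combinations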

SAMPLE PROBLEMS:

Questions / Solution + explanation

The 5 combinations on top represent the 5 categories or brands. The denominator is the number of combinations of chosen cars from the total number of cars.

a) There are 12 face cards and you choose 2 of them. There are 52 cards in a standard deck (not including the 2 jokers).
b) Since "at least 1", we find the complement of zero queens. There are 4 queens in a deck.

Geometric Distribution
Basically only one success out of many attempts (stops after the first success)
Formula: $g(x; p) = p\,q^{x-1}$, where p is the success probability, q = 1 - p is its complement, and x is the trial on which the success occurs.
SAMPLE PROBLEMS

Questions Answer
Negative Binomial Distribution
- The experiment consists of x repeated trials (e.g. 2 successes in 7 trials).
- Each trial can result in just two possible outcomes (success/failure).
- The probability of success, denoted by p, is the same on every trial.
- The trials are independent.
- The experiment continues until k successes are observed, where k is specified in advance.

$b^*(x; k, p) = \binom{x-1}{k-1} p^k q^{x-k}$
where x = number of trials; k = number of trials with a success; q = 1 - p

SAMPLE PROBLEMS

Questions / Solution + explanation

a) k = 2: the 2nd success occurs on or before the x = 6th trial
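"On or before the 6th trial" means summing the negative binomial probabilities over x = 2, ..., 6. The sketch below shows that sum; the success probability for the original problem is not given in these notes, so `p = 0.5` is a purely hypothetical value, and both function names are made up for illustration.

```python
from math import comb


def neg_binom_pmf(x, k, p):
    """P(the k-th success occurs exactly on trial x)."""
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)


def geometric_pmf(x, p):
    """P(the first success occurs on trial x): the k = 1 special case."""
    return p * (1 - p) ** (x - 1)


p = 0.5  # hypothetical success probability, NOT taken from the original problem

# 2nd success on or before the 6th trial: add up x = 2, 3, 4, 5, 6.
p_by_sixth = sum(neg_binom_pmf(x, k=2, p=p) for x in range(2, 7))

print(f"P(2nd success by trial 6) = {p_by_sixth:.6f}")
```

Setting k = 1 in the negative binomial formula reduces it to the geometric formula, which is why the two distributions sit next to each other in these notes.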

Poisson Distribution
- used when the population is not known but we have an observed rate, such as deaths/yr
- used when no one knows the probability of success for a single entity

FORMULA:
$p(x; \lambda t) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}$
where λ is the Poisson rate (average number of successes per unit time),
t is time, and
x is the number of successes

SAMPLE PROBLEMS:

Questions / Solution + explanation

a) Since t = 1 we only use λ = 5, and we sum from x = 0 to 3 because the question asks for "at most 3 cars".
b) Same as a) except we use the complement of "1 car and below" to get "more than 1 car".

b) mean is λ × t
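The two car calculations described above can be sketched as follows. λ = 5 and t = 1 come from the explanation for part a); reusing the same λt for the "more than 1 car" part is an assumption, since the original question is not shown. `poisson_pmf` is a hypothetical helper.

```python
from math import exp, factorial


def poisson_pmf(x, mean):
    """P(X = x) for a Poisson variable with mean = lambda * t."""
    return exp(-mean) * mean**x / factorial(x)


lam, t = 5, 1
mean = lam * t   # the mean of the distribution is lambda times t

# a) at most 3 cars: sum x = 0, 1, 2, 3
p_at_most_3 = sum(poisson_pmf(x, mean) for x in range(4))

# b) more than 1 car: complement of "1 car and below" (same lambda*t assumed)
p_more_than_1 = 1 - sum(poisson_pmf(x, mean) for x in range(2))

print(f"{p_at_most_3:.4f}  {p_more_than_1:.4f}")
```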

NORMAL DISTRIBUTION
Basically just the z table lmao
$Z = \frac{x - \mu}{\sigma}$
The probability is the value on the z-table of the z score.
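A z-table lookup can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+). In this sketch the numbers x = 115, μ = 100, σ = 15 are hypothetical values chosen only so that z comes out to 1; they are not from the notes.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


standard_normal = NormalDist()   # mean 0, standard deviation 1

# Hypothetical numbers, chosen so that z = 1.
z = z_score(x=115, mu=100, sigma=15)
area_left = standard_normal.cdf(z)   # what a z-table reports: P(Z <= z)

print(f"z = {z:.2f}, P(Z <= z) = {area_left:.4f}")
```

Most printed z-tables list this same left-tail area, so `cdf(z)` should match the table entry to the table's rounding.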

also

Also
$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx$

Concepts Discussed (Problems)

Concepts discussed:
1.18 Stem and leaf
Relative Frequency
Histogram
Mean, median, s

1.22. Mean, median


Histogram

2.41. Permutation
2.105. Bayes Theorem
2.126 Bayes Theorem

3.18 Density function


3.26. Probability distribution
3.42 Joint density function
3.43 Joint density function

5.4 Binomial
5.5 Binomial
5.16 Binomial Distribution
5.20 Multinomial
5.27 Binomial
5.33 Hypergeometric distribution
5.47 Hypergeometric
5.50 Negative binomial distribution
5.56 Poisson distribution
5.65 Poisson
5.67 Poisson
5.79 Hypergeometric
5.81 Multinomial
5.85 Binomial
5.92 Negative binomial
5.97 Binomial

6.2 Normal Distribution


6.3 Areas Under a Normal Curve

IMPT TABLES : PAGES 747 TO 757 AND SEPARATE PDF IN GC

Topics in the Test:


Expectation (Chap 4)
Discrete Distribution (Chap 5) <- didn't he say he will also add continuous?
Normal Curve (Chap 6)
Basic Probability (Chap 2)
