
MAST 6474 Introduction to Data Analysis I

Probability and Random Variables

Probabilities
We use probabilities to talk about uncertain events. Probability is defined as the chance, likelihood, or possibility that a
particular event will occur. We often use the letter “p” to represent a probability. When writing specific probability statements,
however, we will use “Pr(Event)” instead.

Probabilities can be represented as proportions (0 to 1) or as percentages (0% to 100%). A probability of 0 or 0% means that
the event never happens—it is impossible. A probability of 1 or 100% means that the event always happens—it is certain.

We can learn about probabilities in different ways. Some events have theoretical probabilities. When flipping a coin, for
example, we know that the probability that the coin will land heads (or tails) up is .5. When rolling dice with six faces, we know
that the probability that a die will land with any particular face up—either 1, 2, 3, 4, 5, or 6 — is 1/6 = .1667. Because a 52-card
deck includes 4 aces, we know the probability that a randomly selected card is an ace is 4/52 = .0769. We learn the probability
of other events empirically, using the actual frequency that the event occurs. For example, a manufacturer of children’s
products might want to know the probability that an American family has children aged 5 or younger. Using the most recent US
census, the manufacturer can divide the number of families with children aged 5 or younger by the total number of families. A financial
analyst might want to know the probability that Apple’s stock price increases after a new product introduction. Using Apple’s
stock price history, the analyst can divide the number of times that the stock has increased after a new product introduction by
the total number of new product introductions. It is important to note that using only a sample of US families or new product
introductions is not enough to determine the true probability, though we will talk about how to use samples of data in Module 3.
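
The difference between a theoretical and an empirical probability can be illustrated with a short Python sketch. It is not part of the original notes: the function name `empirical_probability`, the seed, and the number of simulated rolls are assumptions chosen for illustration, and the simulated rolls merely stand in for observed data such as census counts.

```python
import random
from fractions import Fraction


def empirical_probability(event_count, total_count):
    """Estimate a probability as (times the event occurred) / (number of trials)."""
    return event_count / total_count


# Theoretical probability that a fair six-sided die shows any one face.
theoretical = Fraction(1, 6)

# Simulated "observed" data: 100,000 rolls of a fair die (hypothetical sample).
rng = random.Random(2019)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]

# Empirical probability of rolling a 3: the observed frequency of the event.
empirical = empirical_probability(rolls.count(3), len(rolls))

print(f"theoretical: {float(theoretical):.4f}")
print(f"empirical:   {empirical:.4f}")
```

With many trials the empirical frequency should land close to the theoretical value of 1/6, though it will rarely match it exactly.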

Copyright Edward Fox and John Semple 2019 1



Basic properties of probability

Probabilities are numbers assigned to events and have the following rules.
- 0 ≤ Pr(A) ≤ 1: If Pr(A) = 0, event A cannot occur in a trial. If Pr(A) = 1, event A will certainly occur.
- Pr(A) = 1 − Pr(Aᶜ): The probability of an event occurring is 1 minus the probability of it not occurring. Here, Aᶜ is referred
to as the complement of the event A. For example, when tossing a coin, tails is the complement of heads.
- Independence: Two events are independent of each other if the occurrence of one has no influence on the probability
of the other. For example, when tossing two coins, the outcome of one coin will not affect the other.
o Pr(A ∧ B) = Pr(A) × Pr(B): For two independent events A and B, the probability that both A and B occur is the
product of their probabilities. When tossing two coins, the probability of getting heads both times is
Pr(Head) × Pr(Head) = 1/2 × 1/2 = 1/4.
- Two events are mutually exclusive (or disjoint) if they have no outcomes in common. For example, when
tossing a coin, we cannot get “heads” and “tails” at the same time.
o Pr(A ∨ B) = Pr(A) + Pr(B): The probability that one of two mutually exclusive events occurs is the sum of their probabilities. For
example, when tossing a coin, Pr(Head ∨ Tail) = Pr(Head) + Pr(Tail) = 1.
o Pr(A ∨ B) = Pr(A) + Pr(B) − Pr(A ∧ B): If two events are not mutually exclusive, we must subtract the probability of
both occurring.
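
The rules above can be checked by counting outcomes in small sample spaces. The following Python sketch is an illustration only: the helper `pr` and the card events (ace, heart) are assumptions introduced here, not part of the notes, and the helper applies only when every outcome is equally likely.

```python
from fractions import Fraction
from itertools import product


def pr(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


# Two coin tosses: 4 equally likely outcomes such as ("H", "T").
two_coins = list(product("HT", repeat=2))


def first_head(outcome):
    return outcome[0] == "H"


def second_head(outcome):
    return outcome[1] == "H"


def both_heads(outcome):
    return first_head(outcome) and second_head(outcome)


# Independence: Pr(A and B) = Pr(A) x Pr(B) = 1/2 x 1/2 = 1/4.
print(pr(both_heads, two_coins))
print(pr(first_head, two_coins) * pr(second_head, two_coins))

# A 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "hearts"


def ace_or_heart(card):
    return is_ace(card) or is_heart(card)


def ace_and_heart(card):
    return is_ace(card) and is_heart(card)


# Complement rule: Pr(not ace) = 1 - Pr(ace).
print(1 - pr(is_ace, deck))

# "Ace" and "heart" are not mutually exclusive (the ace of hearts is both),
# so the overlap is subtracted: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B).
print(pr(ace_or_heart, deck))
print(pr(is_ace, deck) + pr(is_heart, deck) - pr(ace_and_heart, deck))
```

Using `Fraction` keeps the arithmetic exact, so the two sides of each rule can be compared without rounding error.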


Random Variables

If an uncertain event has numeric outcomes (for example, the rate of return on an asset, the selling price of a commercial or
residential property, the daily/weekly demand for a product or service, the daily production of a plant, a customer’s satisfaction
score, etc.) we call it a random variable. The vast majority of this course will focus on random variables.

To describe a random variable completely, we need to know two things: (1) every possible numeric outcome and (2) the
probability of each outcome. If we have a complete description of both, then we know the random variable’s distribution.
There are two general types of random variables: discrete and continuous.

A discrete random variable is one for which all possible outcomes can be listed. A continuous random variable is one for which
the outcomes are so numerous that they cannot be listed. An example of a continuous random variable is the time it takes to
process a customer order. If measured with infinitesimal precision, one could not list all of the possible outcomes. In practice,
however, continuous distributions are often used to approximate discrete random variables if the number of possible outcomes
is very large.

Example: Craps, a Discrete Distribution. Define a random variable whose value is the sum of the dots observed when
rolling a pair of dice. Construct the probability distribution for this random variable.

Outcomes 2 3 4 5 6 7 8 9 10 11 12

Probabilities
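
One way to construct this distribution is to enumerate all 36 equally likely (first die, second die) pairs and count how often each sum appears. The Python sketch below takes that approach; the function name `dice_sum_distribution` is a hypothetical helper, not something defined in the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_distribution():
    """Map each possible sum of two fair dice to its probability."""
    # Count how many of the 36 equally likely pairs produce each sum.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


distribution = dice_sum_distribution()

# Print each outcome with its probability expressed in 36ths.
for total, p in distribution.items():
    print(f"{total:>2}  {int(p * 36)}/36")

# A complete distribution must have probabilities that sum to 1.
print(sum(distribution.values()))
```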


We frequently summarize information for a discrete random variable by means of a probability histogram. The probability
histogram is a visual display of the outcomes (plotted along the x-axis), and the probabilities of those outcomes, which are
represented by vertical bars.

[Figure: Histogram for Sum of Dice Roll. x-axis: Sum (2 through 12); y-axis: Probability (0 to 0.175).]


Describing a Distribution: Expectation and Variance (for a Discrete Random Variable)

The expected value or mean of random variable X is denoted E(X) or μ; for a discrete random variable, the formula is

μ = E(X) = ∑ᵢ xᵢ pᵢ

where i indexes the possible outcomes.

The expected value is the “theoretical” average that is computed by weighting each outcome by its probability and then
summing over all possible outcomes. For the sum of the dice:

Possible values (xᵢ)    Probability (pᵢ)    Product (xᵢ · pᵢ)

        2                    1/36                 2/36
        3                    2/36                 6/36
        4                    3/36                12/36
        5                    4/36                20/36
        6                    5/36                30/36
        7                    6/36                42/36
        8                    5/36                40/36
        9                    4/36                36/36
       10                    3/36                30/36
       11                    2/36                22/36
       12                    1/36                12/36

Sum of third column:                            252/36 = 7 = μ = E(X)

Find the value 7 on the histogram above. We say that E ( X ) or μ is a measure of central tendency for the random variable X.
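
The table's calculation can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic, not part of the original notes; `expected_value` is a hypothetical helper, and the expression `6 - abs(s - 7)` is simply a compact way to generate the counts 1, 2, ..., 6, ..., 2, 1 from the probability column.

```python
from fractions import Fraction


def expected_value(distribution):
    """E(X): weight each outcome by its probability, then sum."""
    return sum(x * p for x, p in distribution.items())


# Sum of two fair dice: outcome -> probability (1/36, 2/36, ..., 6/36, ..., 1/36).
dice_sum = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

print(expected_value(dice_sum))  # 7
```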


Example: Electric Motors. You sell large electric motors to a single customer. Based on your historical data, you know that
demand for your motors from your main customer can be 0, 1, 2, or 4 (4 come on a pallet). The distribution is

Demand (xᵢ)          0      1      2      4
Probability (pᵢ)    .40    .40    .10    .10

What is the expected demand for a week?

E(X) = (0)(.40) + (1)(.40) + (2)(.10) + (4)(.10) = 1.00

Another key measure is the expected value of (X − E(X))²; this is called the variance of X. The variance, written either as
Var(X) or σ², is defined by the formula

σ² = Var(X) = ∑ᵢ (xᵢ − E(X))² pᵢ.

Remember that pᵢ is the probability that X takes the value xᵢ. The formula looks complicated, but a few simple examples will
clarify its calculation and help us understand what it tells us. Observe that you must compute the expected value of X before
you can compute the variance. Var(X) or σ² measures the dispersion of a random variable.

To compute the variance of demand for the motor example discussed, follow these steps:

Step 1. Determine E(X). From a previous calculation, we know it is 1.00.


Step 2. List all outcomes for (xᵢ − E(X))², their associated probabilities, and the products.

Possible outcomes       Probability    Product
for (xᵢ − E(X))²        (pᵢ)           (xᵢ − E(X))² · pᵢ
-----------------       -----------    -----------------
(0 − 1)² = 1            .40            1 × .40 = .40
(1 − 1)² = 0            .40            0 × .40 = .00
(2 − 1)² = 1            .10            1 × .10 = .10
(4 − 1)² = 9            .10            9 × .10 = .90

Step 3. Sum the products → Sum = σ² = Var(X) = 1.40
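
The three steps can be mirrored in code. The sketch below is an illustration rather than part of the notes: `expected_value` and `variance` are hypothetical helper names, and the probabilities are stored as exact fractions (an implementation choice) so the results match the hand calculation without floating-point rounding.

```python
from fractions import Fraction


def expected_value(distribution):
    """Step 1. E(X): sum of each outcome times its probability."""
    return sum(x * p for x, p in distribution.items())


def variance(distribution):
    """Steps 2 and 3. Var(X): sum of squared deviations times probabilities."""
    mu = expected_value(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())


# Motor demand distribution from the example: outcome -> probability.
motor_demand = {
    0: Fraction(40, 100),
    1: Fraction(40, 100),
    2: Fraction(10, 100),
    4: Fraction(10, 100),
}

mean = expected_value(motor_demand)
var = variance(motor_demand)

print(float(mean))  # 1.0
print(float(var))   # 1.4
```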

The variance of a random variable is not the only measure of dispersion for a random variable. An easier measure to interpret
is the standard deviation, which is the square root of the variance. For the random variable in the preceding example, the
standard deviation, denoted by the Greek letter σ, is √1.40 ≈ 1.18.¹

The standard deviation helps us determine which outcomes are more or less likely. A general rule of thumb for practical
statistical applications is that about 95% of observed outcomes will come within two standard deviations (±2σ) of the mean.
Over 99.5% of all observed outcomes will be values that are within three standard deviations (±3σ) of the mean.
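
As a rough check of the two-standard-deviation rule of thumb, the Python sketch below applies it to the sum-of-two-dice distribution from earlier in these notes. The choice of that distribution and the helper name `coverage` are assumptions made for illustration. The rule is only approximate, and for small discrete distributions the coverage can differ noticeably from 95%.

```python
import math
from fractions import Fraction

# Distribution of the sum of two fair dice: outcome -> probability.
dice_sum = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

mean = sum(x * p for x, p in dice_sum.items())
var = sum((x - mean) ** 2 * p for x, p in dice_sum.items())
sigma = math.sqrt(var)  # standard deviation = square root of the variance


def coverage(distribution, center, half_width):
    """Total probability of outcomes within center +/- half_width."""
    return sum(p for x, p in distribution.items() if abs(x - center) <= half_width)


within_2_sigma = coverage(dice_sum, mean, 2 * sigma)
within_3_sigma = coverage(dice_sum, mean, 3 * sigma)

print(f"sigma = {sigma:.3f}")
print(f"within 2 sigma: {float(within_2_sigma):.3f}")  # 34/36, about 0.944
print(f"within 3 sigma: {float(within_3_sigma):.3f}")  # every outcome, 1.000
```

Here only the sums 2 and 12 fall outside two standard deviations of the mean, which gives a coverage close to the 95% figure.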

Note: The term six sigma refers to the probability of a defect or error in a production process. In a six-sigma (±6σ) process,
99.99966% of the products are expected to be free of defects or errors.

¹ Note that this is consistent with using σ² for the variance.


Translating and Scaling Random Variables


There are other rules for calculating means and variances when scaling or translating random variables that can help you save
time. These rules will be demonstrated in the associated video. The rules are summarized below.

1. If X is a random variable with mean E(X) and variance Var(X), then for any constant c, cX is a (new) random variable with mean cE(X)
and variance c²Var(X).

2. If X is a random variable with mean E(X) and variance Var(X), then for any constant d, d + X is a (new) random variable with mean d
+ E(X) and variance Var(X).
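
Both rules can be verified numerically by transforming every outcome of a distribution and recomputing the mean and variance from the definitions. The Python sketch below does this for the motor demand distribution. The constants c = 10 and d = 5 are arbitrary values chosen for illustration, and `mean_and_variance` and `transform` are hypothetical helper names.

```python
from fractions import Fraction


def mean_and_variance(distribution):
    """Return (E(X), Var(X)) computed directly from the definitions."""
    mu = sum(x * p for x, p in distribution.items())
    var = sum((x - mu) ** 2 * p for x, p in distribution.items())
    return mu, var


def transform(distribution, scale=1, shift=0):
    """Distribution of shift + scale * X (scale must be nonzero)."""
    return {shift + scale * x: p for x, p in distribution.items()}


# Motor demand distribution: outcome -> probability.
motor_demand = {
    0: Fraction(40, 100),
    1: Fraction(40, 100),
    2: Fraction(10, 100),
    4: Fraction(10, 100),
}

c, d = 10, 5  # arbitrary illustrative constants
mean, var = mean_and_variance(motor_demand)

# Rule 1: cX has mean c * E(X) and variance c^2 * Var(X).
scaled_mean, scaled_var = mean_and_variance(transform(motor_demand, scale=c))
print(scaled_mean == c * mean, scaled_var == c**2 * var)

# Rule 2: d + X has mean d + E(X) and the same variance as X.
shifted_mean, shifted_var = mean_and_variance(transform(motor_demand, shift=d))
print(shifted_mean == d + mean, shifted_var == var)
```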


Example: Translating and Scaling Demand for Motors. Using the random variable X from the previous motor problem and
the formulas above, calculate the mean and variance of:

(a) 3X

(b) 2+7X

Distribution of a Random Variable – Bernoulli Distribution

The simplest discrete probability distribution is the Bernoulli distribution. It assigns the numerical value of 1 to an event
occurring, the numerical value of 0 to the event not occurring. Such an event is often called a Bernoulli trial.

(a) What is the expected value or mean of a Bernoulli distribution?

(b) What is the variance of a Bernoulli distribution?

The Bernoulli distribution, including its mean and variance, will be important in discussing our second discrete distribution...the
Binomial.
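
One way to check your answers to (a) and (b) is to apply the definitions of expected value and variance to a Bernoulli distribution with a specific probability. The Python sketch below is an illustration under assumed values: the probability p = 3/10 is hypothetical, and `bernoulli`, `expected_value`, and `variance` are helper names introduced here.

```python
from fractions import Fraction


def bernoulli(p):
    """Bernoulli distribution: 1 with probability p, 0 with probability 1 - p."""
    return {1: p, 0: 1 - p}


def expected_value(distribution):
    """E(X): sum of each outcome times its probability."""
    return sum(x * p for x, p in distribution.items())


def variance(distribution):
    """Var(X): expected squared deviation from the mean."""
    mu = expected_value(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())


p = Fraction(3, 10)  # hypothetical probability that the event occurs
trial = bernoulli(p)

print(expected_value(trial))
print(variance(trial))
```

Rerunning the sketch with other values of p shows how the mean and variance depend on p.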
