Introduction to probability
Probability is a measure of how likely an event is to occur. It is also the name of the
branch of mathematics that deals with the likelihood, or chance, of events
occurring. It provides a framework for quantifying uncertainty, which is present in
many real-world situations, from predicting weather patterns to analyzing game
strategies or assessing risk in financial markets. Many events cannot be predicted
with total certainty; using probability, we can only predict the chance of an event
occurring, i.e., how likely it is to happen. Probability ranges from 0 to 1, where
0 means the event is impossible and 1 means the event is certain.
Probability for Class 10 is an important topic for students, as it introduces the
basic concepts of the subject. The probabilities of all the outcomes in a sample space
add up to 1.
1. What is Probability?
Probability measures how likely an event is to occur. It is always a number
between 0 and 1, inclusive:
• 0 means an event will not happen.
• 1 means an event will definitely happen.
• A probability of 0.5 means there’s an equal chance of the event happening
or not happening (like flipping a fair coin).
The probability of an event A is denoted as P(A).
2. Sample Space (S)
The sample space of an experiment is the set of all possible outcomes. For
example:
• Tossing a coin: The sample space is S={Heads, Tails}.
• Rolling a die: The sample space is S={1,2,3,4,5,6}.
3. Events
An event is any subset of the sample space. For example:
• In a dice roll, the event "rolling an even number" is E={2,4,6}.
• The event "getting heads" when flipping a coin is E={Heads}.
4. Calculating Probability
For equally likely outcomes, the probability of an event E occurring is given by:
P(E) = Number of favorable outcomes for E / Total number of possible outcomes in the sample space
Example: If you roll a fair six-sided die, the probability of rolling a 4 is:
P(4) = 1 / 6
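The counting formula above can be sketched in Python by listing the sample space and the event explicitly. This is a minimal sketch that assumes equally likely outcomes; the helper name `classical_probability` is our own, not from the text, and fractions are used so that results are exact.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """P(E) = (number of favorable outcomes) / (number of possible outcomes).

    Assumes every outcome in sample_space is equally likely.
    """
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]
print(classical_probability({4}, die))        # 1/6
print(classical_probability({2, 4, 6}, die))  # 1/2 (an even number)

# Two coin tosses: the sample space is {(H, H), (H, T), (T, H), (T, T)}.
two_coins = list(product("HT", repeat=2))
at_least_one_head = {o for o in two_coins if "H" in o}
print(classical_probability(at_least_one_head, two_coins))  # 3/4
```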
5. Basic Probability Rules
• Addition Rule: The probability that A or B (or both) occurs is:
P(A∪B)=P(A)+P(B)−P(A∩B)
If A and B are mutually exclusive (i.e., they cannot happen at the same
time), then P(A∩B) = 0, and the formula simplifies to:
P(A∪B)=P(A)+P(B)
• Multiplication Rule: The probability that two independent events A and B
both occur is:
P(A∩B)=P(A)×P(B)
If A and B are dependent events, you use the conditional probability:
P(A∩B)=P(A)×P(B∣A)
where P(B∣A) is the probability of event B happening given that A has
occurred.
6. Conditional Probability
Conditional probability is the probability of an event occurring given that another
event has already occurred. It is denoted as P(A∣B), and it’s calculated as:
P(A∣B) = P(A∩B) / P(B), provided P(B) > 0.
This is useful in many real-world situations, like determining the likelihood of rain
if it's already cloudy.
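The addition, multiplication and conditional-probability rules in Sections 5 and 6 can be checked by brute-force enumeration. The sketch below assumes two fair six-sided dice; the events A, B and C are illustrative choices of ours, not from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely ordered pairs.
S = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

A = lambda s: s[0] % 2 == 0      # first die is even
B = lambda s: s[1] == 6          # second die shows 6
C = lambda s: s[0] + s[1] >= 10  # total is at least 10

def both(e1, e2):
    return lambda s: e1(s) and e2(s)

def either(e1, e2):
    return lambda s: e1(s) or e2(s)

def given(e1, e2):
    """Conditional probability P(e1 | e2) = P(e1 and e2) / P(e2)."""
    return P(both(e1, e2)) / P(e2)

# Addition rule: P(A or C) = P(A) + P(C) - P(A and C)
assert P(either(A, C)) == P(A) + P(C) - P(both(A, C))

# Multiplication rule for independent events (A and B involve different dice)
assert P(both(A, B)) == P(A) * P(B)

# B and C are dependent, so the simple product rule fails ...
assert P(both(B, C)) != P(B) * P(C)
# ... but the conditional form P(B and C) = P(B) * P(C | B) holds
assert P(both(B, C)) == P(B) * given(C, B)

print(P(C), given(C, B))  # 1/6 1/2
```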
7. Complementary Events
The complement of an event A (denoted Aᶜ) is the event that A does not occur.
The probability of the complement is:
P(Aᶜ) = 1 − P(A)
For example, if the probability of it raining tomorrow is 0.3, the probability that it
does not rain is:
P(not rain)= 1 − 0.3 = 0.7
8. Common Distributions
Some common probability distributions include:
• Uniform Distribution: All outcomes are equally likely (e.g., rolling a fair die).
• Binomial Distribution: Models the number of successes in a fixed number
of independent trials, each with two possible outcomes (success/failure).
• Normal Distribution: A continuous probability distribution that is
symmetric about the mean, often used in statistics and natural sciences.
9. Bayes' Theorem
Bayes' Theorem is a powerful tool for updating probabilities based on new
evidence. It is written as:
P(A∣B) = P(B∣A) × P(A) / P(B)
This theorem is frequently used in areas like medical testing, machine learning,
and decision-making under uncertainty.
For example, when we toss a coin, we get either Heads (H) or Tails (T), so only two
outcomes are possible: {H, T}. When two coins are tossed, there are
four possible outcomes, i.e., {(H, H), (H, T), (T, H), (T, T)}.
Formula for Probability
The probability formula states that the probability of an event happening is equal
to the ratio of the number of favourable outcomes to the total number of
outcomes:
Probability of an event happening, P(E) = Number of favourable outcomes / Total number
of outcomes
Students sometimes confuse a "favourable outcome" with a "desirable outcome": a
favourable outcome is any outcome in which the event occurs, whether or not it is
desirable. This is the basic formula, but there are more formulas for
different situations or events.
1) There are 6 pillows on a bed: 3 are red, 2 are yellow and 1 is blue. What is the
probability of picking a yellow pillow at random?
Ans: The probability is equal to the number of yellow pillows on the bed divided by
the total number of pillows, i.e., 2/6 = 1/3.
2) There is a container full of coloured bottles: red, blue, green and orange.
A bottle is picked out at random, its colour is noted, and it is put back. Sumit did
this 1000 times and got the following results:
• No. of blue bottles picked out: 300
• No. of red bottles: 200
• No. of green bottles: 450
• No. of orange bottles: 50
a) What is the probability that Sumit will pick a green bottle?
Ans: For every 1000 bottles picked out, 450 are green.
Therefore, P(green) = 450/1000 = 0.45
b) If there are 100 bottles in the container, how many of them are likely to be
green?
Ans: The experiment suggests that the probability of picking a green bottle is 0.45.
Therefore, out of 100 bottles, about 100 × 0.45 = 45 are likely to be green.
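The relative-frequency reasoning in Sumit's experiment can be sketched as follows. The counts are the ones from the example; the function name `empirical_probabilities` is our own.

```python
from fractions import Fraction

# Results of Sumit's 1000 draws (with replacement), from the example above.
counts = {"blue": 300, "red": 200, "green": 450, "orange": 50}

def empirical_probabilities(counts):
    """Estimate each probability as (times observed) / (total trials)."""
    total = sum(counts.values())
    return {colour: Fraction(n, total) for colour, n in counts.items()}

probs = empirical_probabilities(counts)
print(float(probs["green"]))  # 0.45

# Expected number of green bottles among 100, using the estimated probability.
container_size = 100
print(container_size * probs["green"])  # 45
```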
Conclusion
Probability theory is a powerful tool for understanding and modeling uncertainty.
Whether you're flipping a coin, rolling a die, or analyzing complex systems, the
principles of probability help quantify the likelihood of events and guide decision-
making. As you dive deeper, you’ll encounter more advanced topics like random
variables, expectation, and statistical inference, which expand on these
foundational ideas.
Probability Theory
Probability theory is a branch of mathematics that deals with the analysis of
random events or phenomena. It provides a framework for understanding
uncertainty, helping us make informed predictions about the likelihood of various
outcomes in uncertain situations.
Probability theory has its roots in the 16th century, when J. Cardan, an Italian
mathematician and physician, wrote the first work on the topic, The Book on
Games of Chance. Since then, probability has attracted the attention of many great
mathematicians. Although there are many distinct interpretations of probability,
probability theory makes the concept precise by expressing it through a set of
axioms. These axioms describe probability in terms of a sample space together with
a probability measure: a function that assigns a value between 0 and 1 to each set
of possible outcomes of the sample space.
At its core, probability theory aims to quantify uncertainty by assigning a
numerical value between 0 and 1 to events, representing the likelihood of their
occurrence. The higher the value, the more likely the event is to occur.
Key Concepts in Probability Theory:
1. Experiment and Sample Space
• Experiment: An action or process that results in one of several possible
outcomes. For example, tossing a coin or rolling a die.
• Sample Space (S): The set of all possible outcomes of an experiment. For
example, the sample space for tossing a coin is S = {Heads, Tails}, and the
sample space for rolling a fair die is S={1,2,3,4,5,6}.
2. Events
• Event: A subset of the sample space. An event may consist of one or more
outcomes. For example, the event "rolling an even number" when rolling a
fair die is E={2,4,6}.
3. Probability of an Event
• The probability of an event E, denoted as P(E), is a number between 0 and 1
that represents the likelihood that event E will occur.
• For an experiment with equally likely outcomes, the probability of event E
is given by:
P(E) = Number of favorable outcomes / Total number of possible outcomes
For example, the probability of rolling an even number on a fair die is:
P(even number)= 3 / 6 =0.5
4. Basic Probability Rules
• Addition Rule (for mutually exclusive events): If A and B are two mutually
exclusive events (they cannot both happen at the same time), then the
probability of either A or B occurring is:
P(A∪B) = P(A) + P(B)
• Multiplication Rule (for independent events): If A and B are independent
events (the occurrence of one does not affect the other), then the probability
of both A and B occurring is:
P(A∩B)=P(A)×P(B)
• Complement Rule: The probability that event A does not occur is:
P(Aᶜ) = 1 − P(A)
5. Conditional Probability
• Conditional Probability refers to the probability of event A occurring given
that event B has already occurred, denoted as P(A∣B). It is calculated as:
P(A∣B) = P(A∩B) / P(B), if P(B) > 0.
6. Random Variables
• A random variable is a numerical outcome of a random process. It can be:
o Discrete: Takes on a finite or countably infinite set of values (e.g.,
number of heads in 3 coin tosses).
o Continuous: Takes on any value in a continuous range (e.g., the
height of a randomly chosen person).
• Probability Distribution: A function that gives the probabilities of the
possible values of a random variable.
o For a discrete random variable, this is known as the probability mass
function (PMF).
o For a continuous random variable, this is known as the probability
density function (PDF).
7. Expected Value and Variance
• Expected Value (Mean): The expected value of a random variable is a
measure of the "central tendency" or average value it takes. For a discrete
random variable X, it is given by:
E(X) = ∑ xi P(xi)
where xi are the possible values of X, and P(xi) is the probability of xi.
• Variance and Standard Deviation: The variance of a random variable
measures the spread or dispersion of its possible values. It is calculated as:
Var(X) = E[(X−E(X))²]
The standard deviation is the square root of the variance, providing a measure of
how spread out the values of X are around the expected value.
Applications of Probability Theory:
Probability theory is widely used in many fields, such as:
• Statistics: For analyzing and interpreting data, making inferences, and
estimating parameters.
• Finance and Economics: In risk assessment, pricing of options, and portfolio
management.
• Computer Science: In algorithms, machine learning, and artificial
intelligence (e.g., Monte Carlo simulations).
• Engineering: For reliability analysis, signal processing, and network theory.
• Games and Gambling: In understanding odds and devising optimal
strategies.
• Physics and Biology: In modeling phenomena with inherent randomness,
like quantum mechanics or genetic variation.
8. Central Limit Theorem
A fundamental theorem in probability theory that states that the sum (or
average) of a large number of independent, identically distributed random
variables with finite variance, regardless of their original distribution, will be approximately normally
distributed.
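A quick simulation illustrates the theorem. This sketch assumes sums of 30 independent Uniform(0, 1) variables (an illustrative choice of ours); each sum has mean 30 × 0.5 = 15 and variance 30 × 1/12 = 2.5, and if the sums are approximately normal, about 68% of them should fall within one standard deviation of the mean.

```python
import random
import statistics

def simulate_sums(n_terms, n_samples, seed=0):
    """Draw n_samples sums, each of n_terms independent Uniform(0, 1) values."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]

n_terms = 30
sums = simulate_sums(n_terms, n_samples=20_000)

mu = n_terms * 0.5             # mean of a sum of Uniform(0, 1) variables
sigma = (n_terms / 12) ** 0.5  # each Uniform(0, 1) has variance 1/12

# Fraction of simulated sums within one standard deviation of the mean.
within_one_sd = sum(abs(x - mu) <= sigma for x in sums) / len(sums)
print(round(statistics.mean(sums), 2), round(statistics.variance(sums), 2))
print(round(within_one_sd, 3))  # close to 0.683 for a normal distribution
```

The uniform distribution is flat, not bell-shaped, so seeing roughly 68% of the sums within one standard deviation shows the normal shape emerging from the summation itself.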
9. Moment Generating Function (MGF)
A function that helps describe the distribution of a random variable. It is defined
as: MX(t) = E[e^(tX)]
where t is a real number, and E[e^(tX)] is the expected value of e^(tX).
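As a small numerical check of the definition, the sketch below evaluates the MGF of a fair six-sided die (an example of ours) and uses the standard property that the derivatives of MX(t) at t = 0 give the moments: MX′(0) = E(X) and MX″(0) = E(X²). The derivatives are approximated by finite differences, so the results are close to, not exactly equal to, 3.5 and 91/6.

```python
import math

values = [1, 2, 3, 4, 5, 6]  # fair die, each value with probability 1/6

def mgf(t):
    """M_X(t) = E[e^(tX)] for a fair six-sided die."""
    return sum(math.exp(t * x) for x in values) / len(values)

h = 1e-4
# Central finite differences approximate the derivatives at t = 0.
first_moment = (mgf(h) - mgf(-h)) / (2 * h)               # approximates E[X]
second_moment = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approximates E[X^2]

variance = second_moment - first_moment ** 2  # Var(X) = E[X^2] - (E[X])^2
print(round(first_moment, 4), round(second_moment, 3), round(variance, 3))
```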
10. Cumulative Distribution Function (CDF)
A function that describes the probability that a random variable X takes on a value
less than or equal to a specific value x. It is given by:
FX(x)=P( X ≤ x)
These are just a few of the fundamental terms in probability theory. Many of
these concepts are interrelated and form the foundation for more advanced
studies in statistics, machine learning, and other fields involving uncertainty and
randomness.
Fundamental concept in probability: axioms of probability
The axioms of probability form the foundation for the mathematical theory of
probability. They are a set of three basic rules that must be satisfied by any
probability measure on a sample space. These axioms were formalized by the
Russian mathematician Andrey Kolmogorov in 1933, and they provide a rigorous
framework for understanding probability.
Here are the three axioms:
1. Non-negativity (Non-negativity of probability)
For any event A, the probability of A is greater than or equal to zero:
P(A)≥0
This means that the probability of any event must be a non-negative number. It
can never be negative.
2. Normalization (Total probability is 1)
The probability of the sample space Ω (the set of all possible outcomes) is 1:
P(Ω)=1
This axiom asserts that some outcome in the sample space must always occur,
i.e., one of the outcomes in the sample space will happen with
certainty.
3. Additivity (Countable additivity)
For any two mutually exclusive (disjoint) events A and B, the probability of their
union is the sum of their probabilities:
P(A∪B)=P(A)+P(B), if A∩B=∅
If two events A and B cannot occur simultaneously (i.e., their intersection is the
empty set), the probability of their union is simply the sum of the individual
probabilities.
This axiom can be extended to countably many mutually exclusive events. If
A1,A2,A3,… are disjoint events, then:
P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + …
Derived properties from the axioms:
• Complementary rule: For any event A, the probability of its complement Aᶜ
(the event that A does not happen) is:
P(Aᶜ) = 1 − P(A)
• Inclusion-Exclusion Principle: For two events A and B, if they are not
mutually exclusive, then the probability of their union is given by:
P(A∪B)=P(A)+P(B)−P(A∩B)
• Monotonicity: If A⊆B, then P(A)≤P(B).
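For a finite sample space, a probability measure is determined by the probabilities of the individual outcomes, so the axioms and the derived properties above can be checked by brute force over every event. The sketch below assumes a deliberately unfair die whose weights are our own hypothetical choice; the function names are ours as well.

```python
from fractions import Fraction
from itertools import chain, combinations

# A hypothetical loaded die: outcome -> probability (weights sum to 1).
weights = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 5),
           4: Fraction(1, 5), 5: Fraction(1, 10), 6: Fraction(3, 10)}
omega = frozenset(weights)

def P(event):
    """Probability measure: add up the probabilities of the outcomes in the event."""
    return sum((weights[o] for o in event), Fraction(0))

def all_events(space):
    """Every subset of the sample space (the power set)."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

events = all_events(omega)
assert all(P(A) >= 0 for A in events)  # Axiom 1: non-negativity
assert P(omega) == 1                   # Axiom 2: normalization
for A in events:
    assert P(omega - A) == 1 - P(A)    # complementary rule
    for B in events:
        if not (A & B):
            assert P(A | B) == P(A) + P(B)             # Axiom 3: additivity
        assert P(A | B) == P(A) + P(B) - P(A & B)      # inclusion-exclusion
        if A <= B:
            assert P(A) <= P(B)                        # monotonicity
print(len(events), "events checked")
```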
Intuition behind the axioms:
• The first axiom ensures that probabilities are always non-negative,
reflecting the fact that probabilities are measures of the likelihood of
events and cannot be negative.
• The second axiom guarantees that the total probability of all possible
outcomes (the entire sample space) sums to 1, reflecting certainty.
• The third axiom reflects the idea that if two events cannot occur together,
their probabilities simply add together. This is fundamental to the concept
of "mutually exclusive" events in probability theory.
These axioms provide a rigorous mathematical foundation for defining and
manipulating probabilities in various contexts.
Application of simple probability rules - Association rule learning
In association rule learning, probability rules are often applied to identify
relationships or patterns in large datasets, typically in the context of market
basket analysis, where we aim to find associations between different items that
are frequently purchased together. While association rule learning itself is not
typically framed in terms of classical probability theory, probability concepts are
fundamental in evaluating the strength of these associations.
Let’s break down how simple probability rules can be applied within association
rule learning:
1. Support (Frequency-based probability)
Support is a measure of how frequently a particular item or item set appears in
the dataset. In terms of probability, support can be viewed as the probability of
an item or item set occurring in the dataset.
• Support of item set X:
P(X) = Number of transactions containing X / Total number of transactions
This is the empirical probability of observing the item set X in the dataset.
For example, if in a retail dataset, 100 out of 1,000 transactions contain both
bread and butter, the support of the item set {bread, butter} is:
P({bread, butter}) = 100 / 1000=0.1
This means that 10% of transactions contain both bread and butter.
2. Confidence (Conditional Probability)
Confidence is the probability that an item Y will be purchased given that item X
has been purchased. It is a measure of the reliability of the rule X→Y, and can be
thought of as a conditional probability.
• Confidence of rule X→Y:
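P(Y∣X) = Support of (X∪Y) / Support of X
Here X∪Y is the item set made up of the items of both X and Y, so its support is the
fraction of transactions that contain both X and Y.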
This is the conditional probability of Y given X, or the probability that Y occurs
when X occurs.
For instance, if 80 transactions contain both bread and butter, and 100
transactions contain bread, then the confidence of the rule {bread}→{butter} is:
P({butter}∣{bread})=80/100=0.8
This means that if a customer buys bread, there’s an 80% chance they will also
buy butter.
3. Lift (Ratio of Joint Probability to Independent Probability)
Lift is a measure of how much more likely two items are to appear together than
if they were independent. It compares the joint probability of X and Y with the
product of their individual probabilities.
• Lift of rule X→Y:
Lift(X→Y) = P(X∩Y) / (P(X) × P(Y))
Where:
o P(X∩Y) is the joint probability of both X and Y occurring.
o P(X) and P(Y) are the individual probabilities of X and Y occurring.
Lift measures whether the occurrence of one item increases the likelihood of the
other item occurring:
• Lift > 1: Items are positively associated (purchased together more often
than expected by chance).
• Lift = 1: Items are independent (purchased together as expected by
chance).
• Lift < 1: Items are negatively associated (purchased together less often than
expected by chance).
For example, if:
• 50 out of 1,000 transactions contain both bread and butter, so P(bread ∩
butter ) = 50 / 1000 = 0.05 ,
• P(bread)=100/1000=0.1 and P(butter)=80/1000=0.08,
Then the lift of the rule {bread}→{butter} is:
Lift({bread}→{butter}) = 0.05 / (0.1 × 0.08) = 0.05 / 0.008 = 6.25
This means that bread and butter are bought together 6.25 times as often as would
be expected if the two purchases were independent (random chance).
4. Expected Confidence (Under Independence Assumption)
A variant of confidence is the expected confidence, which assumes that the items
are independent. It is used as a baseline to evaluate how much better the actual
rule is compared to what would be expected under the assumption of
independence.
• Expected Confidence of X→Y: P(Y), because under independence
P(Y∣X) = P(X∩Y) / P(X) = P(X) × P(Y) / P(X) = P(Y)
If the actual confidence is significantly higher than the expected confidence, then
the items are considered to have a strong association.
For example, if we have:
• P(X)=0.1 (probability of buying bread),
• P(Y)=0.08 (probability of buying butter),
• The expected joint probability under independence is 0.1×0.08=0.008, so the
expected confidence is 0.008 / 0.1 = 0.08, which equals P(Y).
If the actual confidence P(Y∣X) is much higher than 0.08, then this indicates a
strong association.
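The measures above can be computed directly from a list of transactions. This is a minimal sketch: the five-transaction basket list is invented purely for illustration, and the function names are our own, not those of any association-rule library.

```python
from fractions import Fraction

# Hypothetical toy dataset: each transaction is a set of purchased items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(x, y):
    """P(Y | X) = support(X ∪ Y) / support(X)."""
    return support(x | y) / support(x)

def lift(x, y):
    """P(X and Y) / (P(X) * P(Y)); equals 1 when X and Y are independent."""
    return support(x | y) / (support(x) * support(y))

def expected_confidence(y):
    """Confidence expected under independence, which is simply P(Y)."""
    return support(y)

x, y = {"bread"}, {"butter"}
print(support(x | y))          # 2/5
print(confidence(x, y))        # 2/3
print(expected_confidence(y))  # 3/5
print(lift(x, y))              # 10/9
```

Lift is also the ratio of the actual confidence to the expected confidence: (2/3) / (3/5) = 10/9.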
5. Statistical Significance (Chi-Squared Test)
In some cases, the associations found in rules might be evaluated for statistical
significance using tests like the chi-squared test. This is useful to determine if the
observed association is due to chance or if it is statistically significant.
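As a sketch of that idea, the code below runs a Pearson chi-squared test of independence (without continuity correction) on the 2 × 2 table implied by the lift example above: 1,000 transactions, 100 with bread, 80 with butter, 50 with both. The function name is ours. The p-value relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so its tail probability is erfc(√(x/2)); in practice a statistics library would normally be used instead.

```python
import math

def chi_squared_2x2(n_total, n_x, n_y, n_xy):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for independence of two items in a 2 x 2 contingency table."""
    observed = [
        [n_xy, n_x - n_xy],                        # X present: Y present / absent
        [n_y - n_xy, n_total - n_x - n_y + n_xy],  # X absent:  Y present / absent
    ]
    row_totals = [n_x, n_total - n_x]
    col_totals = [n_y, n_total - n_y]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if X and Y were independent.
            expected = row_totals[i] * col_totals[j] / n_total
            stat += (observed[i][j] - expected) ** 2 / expected
    # Tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = chi_squared_2x2(n_total=1000, n_x=100, n_y=80, n_xy=50)
print(round(stat, 1), p < 0.001)
```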
Bayes' Theorem
Bayes' Theorem is a fundamental principle in probability theory that describes
how to update the probability of a hypothesis based on new evidence. It
combines prior knowledge with new data to provide an updated probability.
The theorem is named after Thomas Bayes, an 18th-century statistician, and it is
expressed mathematically as:
P(H∣E) = P(E∣H) ⋅ P(H) / P(E)
Where:
• P(H∣E) is the posterior probability, which is the probability of the
hypothesis H being true given the evidence E.
• P(E∣H) is the likelihood, the probability of observing the evidence E
assuming the hypothesis H is true.
• P(H) is the prior probability, which is the initial probability of the hypothesis
before seeing the evidence.
• P(E) is the marginal likelihood or evidence, the total probability of
observing the evidence E under all possible hypotheses.
Intuitive Breakdown:
1. Prior Probability P(H): This is your belief about the hypothesis before
considering the new evidence. For instance, it could be based on past
knowledge or experience.
2. Likelihood P(E∣H): This represents how likely the observed evidence is,
assuming the hypothesis is true. For example, if you have a hypothesis that
it will rain today, the likelihood is how probable it is that you would see
cloudy skies if it were indeed going to rain.
3. Marginal Likelihood P(E): This is the total probability of observing the
evidence, considering all possible hypotheses. It can be seen as a
normalizing constant that ensures the posterior probabilities of all the hypotheses sum to 1.
4. Posterior Probability P(H∣E): This is the updated probability of the
hypothesis after considering the new evidence. It's the quantity that Bayes'
Theorem helps us compute.
Example:
Imagine you’re trying to diagnose whether a patient has a disease based on a
positive test result.
• Let H represent the hypothesis "The patient has the disease."
• Let E represent the evidence "The test result is positive."
Using Bayes' Theorem:
P(H∣E) = P(E∣H) ⋅ P(H) / P(E)
Where:
• P(H) is the prior probability that the patient has the disease (e.g., based on
general population statistics).
• P(E∣H) is the probability of a positive test result, given that the patient has
the disease (i.e., the sensitivity of the test).
• P(E) is the total probability of a positive test result, regardless of whether
the patient has the disease or not. This can be calculated as
P(E)=P(E∣H)⋅P(H)+P(E∣¬H)⋅P(¬H), where ¬H is the hypothesis that the patient
does not have the disease.
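The calculation can be sketched with illustrative numbers. The prevalence (1%), sensitivity (99%) and false-positive rate (5%) below are assumptions made up for the example, not figures from the text.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(H | E) via Bayes' Theorem for a positive test result.

    prior:               P(H), the probability of having the disease
    sensitivity:         P(E | H), a positive test given disease
    false_positive_rate: P(E | not H), a positive test given no disease
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(p, 3))  # 0.167
```

Under these assumed numbers, a positive result implies only about a 1-in-6 chance of disease: because the disease is rare, false positives from the large healthy group outnumber true positives.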
Why It’s Useful:
Bayes' Theorem allows for updating beliefs in light of new data, and it is widely
used in various fields such as medical diagnosis, machine learning, risk analysis,
and even legal reasoning. It essentially helps refine decisions by incorporating
new information in a statistically sound way.
Random Variable:
A random variable is a variable whose value is subject to chance. It is a
fundamental concept in probability theory and statistics, used to quantify
outcomes of random experiments or events. A random variable can take different
values based on the outcome of an uncertain process.
There are two main types of random variables:
1. Discrete Random Variable: Takes on a finite or countably infinite set of
values. These values are typically integers or specific categories. An
example would be the number of heads in 10 coin tosses, which could be
any integer between 0 and 10.
o Example: Rolling a six-sided die. The outcome (number of dots on the top
face) is a discrete random variable that can take values from 1 to 6.
2. Continuous Random Variable: Takes on any value within a given range or
interval. These variables are not countable but can take any value within a
continuous spectrum, such as real numbers. For example, the exact height
of a person or the time it takes to complete a race could be modeled by
continuous random variables.
o Example: The time taken for a computer to process a task. This could
be any positive real number.
Probability Distribution
The possible outcomes of a random variable and their probabilities are often
described using a probability distribution. For a discrete random variable, the
distribution is given by a probability mass function (PMF), which assigns a
probability to each possible outcome. For continuous random variables, the
distribution is described by a probability density function (PDF).
• For Discrete Random Variables: The sum of all probabilities in the
probability mass function must be 1.
• For Continuous Random Variables: The area under the probability density
curve must equal 1. The probability of the variable taking a specific value is
technically 0, but the probability of it falling within a range of values can be
non-zero.
Key Metrics for Random Variables
• Expected Value (Mean): The average or mean value of a random variable.
It is the weighted average of all possible outcomes, where each outcome is
weighted by its probability.
For a discrete random variable, the expected value is:
E[X]= ∑ xi P(xi)
where xi is a possible outcome and P(xi) is the probability of that outcome.
• Variance: Measures the spread or variability of the random variable's
possible values. A higher variance indicates that the values of the random
variable are more spread out from the expected value.
For a discrete random variable:
Var(X) = E[(X−E[X])²] = ∑ (xi − E[X])² P(xi)
• Standard Deviation: The square root of the variance, which also measures
the spread but in the same units as the original random variable.
Example of a Random Variable
Imagine a simple game where you roll a fair die, and the random variable X
represents the outcome of the roll. The values of X are 1, 2, 3, 4, 5, and 6, each
with probability 1/6. The expected value of X would be the average of these
values, weighted by their probabilities:
E[X] = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 21/6 = 3.5
This means that on average, you would expect the roll to result in a 3.5 (though in
reality, you can't roll a 3.5 on a die).
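The die example, together with the variance and standard deviation defined above, can be checked with exact fractions. This is a minimal sketch; the helper names are ours.

```python
import math
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(pmf):
    """E[X] = sum of x * P(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E[X])^2] = sum of (x - E[X])^2 * P(x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = expected_value(pmf)
var = variance(pmf)
print(mu, float(mu))        # 7/2 3.5
print(var, math.sqrt(var))  # 35/12 and a standard deviation of about 1.708
```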
Probability Mass Function (PMF) and Probability Density Function (PDF)
These are both fundamental concepts in probability theory, but they apply to
different types of random variables:
1. Probability Mass Function (PMF)
The Probability Mass Function (PMF) applies to discrete random variables—
variables that take on a finite or countably infinite number of distinct values. The
PMF gives the probability that a discrete random variable takes a specific value.
• Definition: A function PX(x) that assigns probabilities to each possible value
x of a discrete random variable X.
• Properties:
1. PX(x)≥0 for all x (non-negative).
2. The sum of all probabilities must equal 1:
∑ PX(x) = 1, summing over all possible values x.
Example: Suppose you roll a fair six-sided die. The random variable X represents
the outcome (1 through 6), and the PMF for X is:
PX(x) = 1/6 for x = 1, 2, 3, 4, 5, 6 (and 0 for any other x).
This means each outcome has a probability of 1/6.
2. Probability Density Function (PDF)
The Probability Density Function (PDF) applies to continuous random variables—
variables that can take on any value within a continuous range (e.g., between 0
and 1, or any interval of real numbers). The PDF describes the likelihood of a
random variable falling within a particular range of values.
• Definition: A function fX(x) that describes the relative likelihood for a
random variable X to take on a given value.
• Properties:
1. fX(x)≥0 for all x (non-negative).
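2. The total area under the curve (integral of fX(x)) must equal 1:
∫ fX(x) dx = 1, integrating over all x from −∞ to ∞.
3. The probability that X lies within an interval [a,b] is given by the
area under the curve between a and b:
P(a ≤ X ≤ b) = ∫ fX(x) dx, integrating from a to b.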
• Example: If X is a random variable representing the height of individuals in
a population, the PDF fX(x) might be a normal distribution (bell curve). The
exact shape of fX(x) depends on the parameters (mean and standard
deviation) of the distribution.
For a standard normal distribution (mean 0, standard deviation 1), the PDF is given by:
f(x) = (1/√(2π)) ⋅ e^(−x²/2)
The probability that X lies within a certain range, like between -1 and 1,
would be found by integrating this PDF over that range.
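To make the integration concrete, the sketch below approximates P(−1 ≤ X ≤ 1) for a standard normal variable with Simpson's rule (a standard numerical-integration method we chose for illustration) and compares the result with the closed form erf(1/√2) from Python's math module.

```python
import math

def standard_normal_pdf(x):
    """f(x) = exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with Simpson's rule (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights alternate 4, 2, 4, ...
        total += weight * f(a + i * h)
    return total * h / 3

area = simpson(standard_normal_pdf, -1.0, 1.0)
exact = math.erf(1 / math.sqrt(2))  # P(-1 <= Z <= 1) in closed form
print(round(area, 4), round(exact, 4))  # 0.6827 0.6827
```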
Key Differences:
1. Type of Random Variable:
o PMF: Used for discrete random variables.
o PDF: Used for continuous random variables.
2. Probability Interpretation:
o PMF: PX(x) gives the exact probability of X=x.
o PDF: fX(x) gives the relative likelihood of X being near x, but not the
exact probability. Probabilities for continuous variables are given
over intervals.
3. Summation vs. Integration:
o PMF: You sum the probabilities.
o PDF: You integrate the density function.
Example to Illustrate the Difference:
• Discrete: The number of heads when flipping a fair coin three times.
o The PMF gives the probability of each outcome (e.g., 0, 1, 2, or 3
heads).
• Continuous: The exact weight of a person.
o The PDF gives the likelihood of being a certain weight, but the
probability of having an exact weight (like 70.0 kg) is technically zero;
instead, you would calculate the probability of a range of weights.
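For the discrete example, the PMF can be built by enumerating all 2³ = 8 equally likely sequences of three fair coin flips and counting heads. This is a minimal sketch; the variable names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely sequences of three fair coin flips.
outcomes = list(product("HT", repeat=3))
head_counts = Counter(seq.count("H") for seq in outcomes)

# PMF: P(X = k) = (number of sequences with k heads) / 8
pmf = {k: Fraction(n, len(outcomes)) for k, n in sorted(head_counts.items())}
print(pmf)
assert sum(pmf.values()) == 1  # a PMF must sum to 1
```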
The Cumulative Distribution Function (CDF) is a fundamental concept in
probability and statistics. It describes the probability that a random variable X
takes a value less than or equal to a specific value x.
Definition:
The CDF of a random variable X, denoted by FX(x), is defined as:
FX(x)=P(X ≤ x)
Where:
• P( X ≤ x) is the probability that the random variable X takes a value less than
or equal to x.
• x is any value in the sample space of X.
Properties of the CDF:
1. Non-decreasing: The CDF is always non-decreasing. That is, if x1 ≤ x2, then
FX(x1)≤FX(x2).
2. Limits:
o lim x→−∞ FX(x) = 0 (the probability of the random variable being less
than or equal to a very small value approaches zero).
o lim x→∞ FX(x) = 1 (the probability of the random variable being less
than or equal to a very large value approaches one).
3. Right-continuous: The CDF is always right-continuous, meaning that FX(x)
equals its limit from the right at every point x.
For Discrete Random Variables:
For a discrete random variable X, the CDF is a step function, where it jumps at
each possible value that X can take.
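FX(x) = ∑ P(X = xi), summing over all possible values xi ≤ x. For example, if the
possible values are x1 < x2 < ⋯, then FX(xn) = P(X=x1) + P(X=x2) + ⋯ + P(X=xn).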
For Continuous Random Variables:
For a continuous random variable X, the CDF is the integral of the probability
density function (PDF), fX(x), from −∞ to x:
FX(x) = ∫ fX(t) dt, integrating t from −∞ to x.
Example:
Discrete CDF:
Consider a random variable X that can take the values 1, 2, or 3 with the
following probabilities:
• P(X=1)=0.2
• P(X=2)=0.5
• P(X=3)=0.3
The CDF FX(x) is:
• For x<1, FX(x)=0.
• For 1≤x<2, FX(x)=0.2.
• For 2≤ x <3, FX(x)=0.7.
• For x≥3, FX(x)=1.
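The step-function CDF in this example can be sketched by adding up the PMF over all values not exceeding x. The probabilities are the ones from the example, stored as exact fractions; the function name `discrete_cdf` is ours.

```python
from fractions import Fraction

# PMF from the example: P(X=1)=0.2, P(X=2)=0.5, P(X=3)=0.3 (as exact fractions).
pmf = {1: Fraction(2, 10), 2: Fraction(5, 10), 3: Fraction(3, 10)}

def discrete_cdf(pmf, x):
    """F_X(x) = sum of P(X = v) over all possible values v <= x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

# The CDF is flat between the possible values and jumps at 1, 2 and 3.
for x in [0.5, 1, 1.5, 2, 2.9, 3, 10]:
    print(x, discrete_cdf(pmf, x))
```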
Continuous CDF:
For a continuous random variable X with PDF fX(x), the CDF is given by:
FX(x) = ∫ fX(t) dt, integrating t from −∞ to x.
For example, if X follows a normal distribution with mean μ (mu) and standard
deviation σ (sigma), then its CDF is FX(x) = Φ((x − μ)/σ), where Φ denotes the
standard normal CDF.
Visualizing the CDF:
• The CDF graph starts at 0 on the far left and rises toward 1, which it either
reaches or approaches asymptotically.
• For discrete variables, the graph consists of horizontal segments with
jumps.
• For continuous variables, the graph is a smooth, non-decreasing curve.
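For the continuous case, a normal CDF can be computed by standardizing x to z = (x − μ)/σ and using the identity Φ(z) = (1 + erf(z/√2)) / 2 with Python's math.erf. The mean of 170 and standard deviation of 10 (read as heights in cm) are hypothetical values chosen only for illustration.

```python
import math

def standard_normal_cdf(z):
    """Φ(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_cdf(x, mu, sigma):
    """F_X(x) for a normal X with mean mu and standard deviation sigma."""
    return standard_normal_cdf((x - mu) / sigma)

mu, sigma = 170.0, 10.0  # hypothetical height distribution in cm
print(normal_cdf(170.0, mu, sigma))  # 0.5: half the probability lies below the mean
# P(160 < X <= 180) = F(180) - F(160), about 0.683
print(round(normal_cdf(180.0, mu, sigma) - normal_cdf(160.0, mu, sigma), 3))

# The curve is smooth and non-decreasing, running from near 0 to near 1.
grid = [mu + sigma * k / 4 for k in range(-20, 21)]
values = [normal_cdf(x, mu, sigma) for x in grid]
assert all(a <= b for a, b in zip(values, values[1:]))
```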
Binomial distribution
The binomial distribution is a discrete probability distribution that models the
number of successes in a fixed number of independent trials, each with two
possible outcomes: "success" or "failure." This distribution is commonly used
when you have a series of identical experiments (trials), and you want to find the
probability of a specific number of successes.
Key Features of the Binomial Distribution:
1. Fixed number of trials (n): The experiment is repeated a certain number of
times, and each trial is independent.
2. Two possible outcomes (success or failure): Each trial has only two
possible outcomes, often labeled "success" and "failure."
3. Constant probability of success (p): The probability of success on each trial
is the same (denoted by p), and the probability of failure is 1−p.
4. Independence of trials: The outcome of each trial does not affect the
others.
Probability Mass Function (PMF):
The probability of exactly k successes in n independent trials, each with
probability p of success, is given by the binomial probability formula:
P(X=k) = C(n, k) ⋅ p^k ⋅ (1−p)^(n−k)
Where:
• P(X=k) is the probability of having exactly k successes.
• n is the number of trials.
• k is the number of successes.
• p is the probability of success on a single trial.
• C(n, k) = n! / (k!(n−k)!) is the binomial coefficient, which counts the number of
ways to choose k successes out of n trials.
Example:
Suppose you flip a coin 5 times, and you're interested in the probability of getting
exactly 3 heads (successes), with the probability of heads being 0.5 on each flip
(fair coin).
Here:
• n=5 (the number of trials),
• k=3 (the number of successes you're interested in),
• p=0.5 (the probability of getting heads on each flip).
The binomial probability is:
P(X=3) = C(5, 3) ⋅ (0.5)^3 ⋅ (0.5)^2 = 10 × 0.125 × 0.25 = 0.3125
So, the probability of getting exactly 3 heads in 5 flips is 0.3125, or 31.25%.
Mean and Variance of the Binomial Distribution:
For a binomial random variable X, the mean and variance are given by:
• Mean (Expected value): μ = n⋅p
• Variance: σ² = n⋅p⋅(1−p)
• Standard deviation: σ = √(n⋅p⋅(1−p))
Conditions for Using a Binomial Distribution:
To model a situation with a binomial distribution, the following conditions must
be met:
1. The number of trials must be fixed.
2. Each trial must have only two possible outcomes (success or failure).
3. The trials must be independent of each other.
4. The probability of success must remain constant for each trial.
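The PMF and the mean and variance formulas above can be checked numerically with math.comb (available in Python 3.8+). The function name `binomial_pmf` is ours; the sketch reproduces the coin example and confirms μ = n⋅p and σ² = n⋅p⋅(1−p) by summing over the PMF.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 5, 0.5
print(binomial_pmf(3, n, p))  # 0.3125, the coin example above

# Mean and variance computed directly from the PMF, next to n*p and n*p*(1-p).
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(mean, n * p)           # 2.5 2.5
print(var, n * p * (1 - p))  # 1.25 1.25
```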
Applications:
The binomial distribution is widely used in various fields, such as:
• Quality control (e.g., the number of defective products in a batch),
• Medical research (e.g., the number of patients who respond to a
treatment),
• Sports (e.g., the number of games, out of a fixed number of games, in which a
player scores at least one goal),
• Finance (e.g., the number of favorable trades in a fixed number of
attempts).
Poisson Distribution
The Poisson distribution is a probability distribution that describes the number of
events occurring in a fixed interval of time or space, given that these events
happen independently and at a constant average rate. It's commonly used to
model rare events or events that occur with a known constant mean rate, such as
the number of phone calls received by a call center in an hour, or the number of
accidents at an intersection during a day.
Key Properties:
1. Parameter: The Poisson distribution is defined by a single parameter λ,
which represents the average number of occurrences (the rate) within a
specified interval.
2. Support: The distribution only takes non-negative integer values, i.e.,
x=0,1,2,3,…
3. Independence: Counts in non-overlapping intervals are independent of one
another. (The memoryless property belongs to the exponential waiting times
between Poisson events, not to the Poisson count itself.)
Probability Mass Function (PMF):
The probability of observing exactly x events in a given interval, when the average
rate of occurrences is λ, is given by the Poisson probability mass function:
$$P(X=x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$
Where:
• P(X=x) is the probability of observing x events.
• λ is the average number of events (the rate parameter).
• x is the number of events (a non-negative integer).
• e is Euler's number (approximately 2.71828).
Important Properties:
• Mean and Variance: For a Poisson-distributed random variable X, the mean
μ and variance σ² are both equal to λ.
Mean: E[X]=λ
Variance: Var(X)=λ
• Skewness: The Poisson distribution is positively skewed when λ is small, but
as λ increases, the distribution becomes more symmetric and approximates
a normal distribution.
When to Use the Poisson Distribution:
• Events happen independently.
• The rate at which events occur is constant.
• Events occur one at a time (i.e., no simultaneous occurrences).
• The probability of more than one event occurring in an infinitesimally small
time interval is negligible.
Example:
Suppose a call center receives an average of 5 calls per hour. The number of calls
received in a given hour follows a Poisson distribution with λ=5. The probability of
receiving exactly 3 calls in the next hour is:
$$P(X=3) = \frac{5^{3} e^{-5}}{3!} = \frac{125 \times 0.006738}{6} \approx 0.1404$$
Thus, the probability of receiving exactly 3 calls in the next hour is approximately
14.04%.
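The call-center example can be reproduced with the standard library. This is a minimal sketch; `poisson_pmf` is a hypothetical helper name, and the mean/variance check truncates the infinite sum at x = 99, which is harmless for λ = 5.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Poisson(lam), x = 0, 1, 2, ..."""
    return lam**x * exp(-lam) / factorial(x)


# Example from the text: lam = 5 calls per hour, exactly 3 calls.
print(f"{poisson_pmf(3, 5):.4f}")  # 0.1404

# Mean and variance should both be close to lam; the infinite sum is
# truncated at x = 99, where the remaining tail is negligible for lam = 5.
lam = 5
xs = range(100)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(mean, var)
```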
Approximation:
For large λ, the Poisson distribution can be approximated by a normal distribution
with mean λ and variance λ, i.e.,
$$X \approx N(\lambda, \lambda)$$
This approximation is useful for calculating probabilities for large values of λ,
especially when exact calculation is difficult.
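To see how close the approximation is, the sketch below compares the exact Poisson CDF with the normal CDF for λ = 50 (values chosen here for illustration). The normal CDF is written with `math.erf`. The +0.5 "continuity correction" is an assumption added here, a common refinement that the text does not mention.

```python
from math import erf, exp, factorial, sqrt


def poisson_cdf(k: int, lam: float) -> float:
    """Exact P(X <= k) for X ~ Poisson(lam), by summing the PMF."""
    return sum(lam**x * exp(-lam) / factorial(x) for x in range(k + 1))


def normal_cdf(z: float, mu: float, var: float) -> float:
    """P(Y <= z) for Y ~ N(mu, var), via the error function."""
    return 0.5 * (1 + erf((z - mu) / sqrt(2 * var)))


lam, k = 50, 55
exact = poisson_cdf(k, lam)
# The +0.5 is a continuity correction (not part of the text above): it
# accounts for approximating a discrete count with a continuous distribution.
approx = normal_cdf(k + 0.5, lam, lam)
print(f"exact={exact:.4f}  normal approx={approx:.4f}")
```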
Geometric Distribution
The Geometric Distribution is a discrete probability distribution that models the
number of trials needed to obtain the first success in a sequence of independent
Bernoulli trials. In simpler terms, it tells us the probability of getting the first
success on the k-th trial in a sequence of independent trials, where each trial has
two possible outcomes: success (with probability p) or failure (with probability
1−p).
Key Features of the Geometric Distribution:
1. Trial Characteristics:
o The trials are independent.
o Each trial has two possible outcomes: success or failure.
o The probability of success, p, is constant across trials.
o The random variable represents the number of trials until the first
success occurs.
2. Probability Mass Function (PMF): The probability that the first success
occurs on the k-th trial (where k=1,2,3,…) is given by the formula:
$$P(X=k) = (1-p)^{k-1}\, p$$
where:
o X is the random variable representing the number of trials until the
first success.
o p is the probability of success on each trial.
o k is the number of trials until the first success.
3. Cumulative Distribution Function (CDF): The cumulative probability that
the first success occurs on or before the k-th trial is given by:
$$P(X \le k) = 1 - (1-p)^{k}$$
4. Mean (Expected Value): The expected number of trials until the first
success is:
$$E[X] = \frac{1}{p}$$
This tells you, on average, how many trials it will take to get the first
success.
5. Variance: The variance of the number of trials until the first success is:
$$\mathrm{Var}(X) = \frac{1-p}{p^{2}}$$
Example:
Suppose you have a game where the probability of winning (success) on any given
trial is p=0.2. What is the probability that the first win happens on the 3rd trial?
Using the PMF formula:
$$P(X=3) = (1-0.2)^{3-1} \cdot 0.2 = (0.8)^{2} \cdot 0.2 = 0.64 \cdot 0.2 = 0.128$$
So, the probability that the first success occurs on the 3rd trial is 0.128 or 12.8%.
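The example can be verified with a short sketch (helper names are hypothetical). It also checks numerically that the mean is 1/p = 5 and the variance is (1−p)/p² = 20 for p = 0.2, truncating the sums at 999 trials, where the remaining tail is negligible.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p


def geometric_cdf(k: int, p: float) -> float:
    """P(first success occurs on or before trial k)."""
    return 1 - (1 - p) ** k


p = 0.2
print(f"{geometric_pmf(3, p):.3f}")  # 0.128
print(f"{geometric_cdf(3, p):.3f}")  # 0.488, the PMF summed over trials 1..3

# Mean and variance from a truncated sum; they should be close to
# 1/p = 5 and (1 - p)/p**2 = 20.
ks = range(1, 1000)
mean = sum(k * geometric_pmf(k, p) for k in ks)
var = sum((k - mean) ** 2 * geometric_pmf(k, p) for k in ks)
print(mean, var)
```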
Applications:
The Geometric Distribution is commonly used in scenarios such as:
• Modeling the number of coin flips needed to get the first head.
• Counting the number of attempts needed before a person successfully
answers a question in a quiz.
• Determining how many customer calls a call center agent must handle
before receiving a complaint.
Uniform Distribution
A uniform distribution is a type of probability distribution where all outcomes are
equally likely to occur. It can be either discrete or continuous, depending on the
nature of the data being modeled.
1. Discrete Uniform Distribution
In a discrete uniform distribution, there is a finite number of equally likely
outcomes. For example, when rolling a fair die, each of the six faces (1 through 6)
has an equal probability of occurring.
• Probability Mass Function (PMF): If you have n equally likely outcomes, the
probability of each outcome x_i is given by:
$$P(X = x_i) = \frac{1}{n}$$
where n is the total number of possible outcomes.
Example: If you roll a fair die, the probability of rolling any particular
number (e.g., 1) is:
$$P(X=1) = \frac{1}{6}$$
2. Continuous Uniform Distribution
In a continuous uniform distribution, any number within a given range is equally
likely to occur. For example, if you randomly select a real number from the
interval [a,b], the probability of any specific number is zero, but the probability of
choosing a number from a subinterval is proportional to the length of that
interval.
• Probability Density Function (PDF): The probability density function for a
continuous uniform distribution over the interval [a, b] is given by:
$$f(x) = \begin{cases} \frac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
The total probability over the entire range [a, b] must be 1, so the constant
1/(b−a) ensures that the area under the curve is 1.
Example: If X is uniformly distributed between 0 and 10 (i.e., a=0 and
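b=10), then the probability density function is:
$$f(x) = \frac{1}{10}, \quad 0 \le x \le 10$$
The probability that X falls within any subinterval [c, d] of [0, 10] is given by:
$$P(c \le X \le d) = \frac{d-c}{10}$$
For example, the probability that X falls between 3 and 7 is:
$$P(3 \le X \le 7) = \frac{7-3}{10} = 0.4$$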
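Both cases can be illustrated in a few lines of Python. This sketch assumes hypothetical helper names. It uses `fractions.Fraction` so the die probabilities stay exact, and it adds a seeded Monte Carlo check that simulated draws from Uniform(0, 10) land in [3, 7] about 40% of the time.

```python
import random
from fractions import Fraction


def discrete_uniform_pmf(outcomes):
    """Map each of the n equally likely outcomes to probability 1/n."""
    values = list(outcomes)
    return {x: Fraction(1, len(values)) for x in values}


def uniform_interval_prob(c: float, d: float, a: float, b: float) -> float:
    """P(c <= X <= d) for X ~ Uniform(a, b); [c, d] is clipped to [a, b]."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0) / (b - a)


die = discrete_uniform_pmf(range(1, 7))
print(die[1])              # 1/6
print(sum(die.values()))   # 1

print(uniform_interval_prob(3, 7, 0, 10))  # 0.4

# Monte Carlo check: the fraction of simulated draws landing in [3, 7]
# should be close to 0.4 (seeded so the run is repeatable).
rng = random.Random(42)
draws = [rng.uniform(0, 10) for _ in range(100_000)]
print(sum(3 <= x <= 7 for x in draws) / len(draws))
```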
Key Properties of Uniform Distributions:
• Mean (Expected Value):
o For a discrete uniform distribution with n equally likely outcomes,
the mean is the average of all possible values.
o For a continuous uniform distribution over [a, b], the mean is the
midpoint of the interval:
$$E[X] = \frac{a+b}{2}$$
• Variance:
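o For a discrete uniform distribution on n consecutive integers (for example
1, 2, ..., n), the variance is:
$$\mathrm{Var}(X) = \frac{n^{2}-1}{12}$$
o For a continuous uniform distribution over [a, b], the variance is:
$$\mathrm{Var}(X) = \frac{(b-a)^{2}}{12}$$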
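For a fair die (n = 6 consecutive integers), the variance computed directly from the definition should match (n²−1)/12 = 35/12. The sketch below checks this with exact fractions and then evaluates the continuous formulas (a+b)/2 and (b−a)²/12 for the interval [0, 10] used earlier. It is illustrative only.

```python
from fractions import Fraction

# Discrete uniform on the faces of a fair die (n = 6 consecutive integers).
faces = range(1, 7)
n = len(faces)
mean = Fraction(sum(faces), n)
var = sum((Fraction(x) - mean) ** 2 for x in faces) / n
print(mean, var, Fraction(n**2 - 1, 12))  # 7/2 35/12 35/12

# Continuous uniform on [a, b] = [0, 10]: mean (a + b)/2, variance (b - a)^2/12.
a, b = 0, 10
print((a + b) / 2, (b - a) ** 2 / 12)
```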
Applications:
• Discrete Uniform Distribution: Used for scenarios where all outcomes are
equally likely, such as rolling dice, drawing cards from a well-shuffled deck,
or flipping a fair coin.
• Continuous Uniform Distribution: Often used in simulations and models
where any value within a range is equally likely, such as generating random
numbers within a specific interval or modeling certain types of noise in
data.
Exponential Distribution
The Exponential Distribution is a continuous probability distribution that is often
used to model the time between events in a Poisson process. A Poisson process is
a type of random process where events occur independently and at a constant
average rate. The Exponential distribution describes the waiting time between
successive events in this process.
Key Features:
• Memoryless Property: The Exponential distribution is "memoryless,"
meaning the probability of an event occurring in the future is independent
of the past. If you've already waited for some amount of time without the
event happening, the distribution "forgets" that and treats the situation as
if you're starting fresh.
• Parameter: The Exponential distribution is usually parameterized by its rate
parameter λ > 0, the average number of events per unit time. The mean of the
distribution is 1/λ.
Probability Density Function (PDF):
The probability density function for the Exponential distribution is:
$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$
Where:
• x is the time between events (or the random variable),
• λ is the rate parameter (the average number of events per unit time),
• e is Euler's number (approximately 2.71828).
Cumulative Distribution Function (CDF):
The cumulative distribution function, which gives the probability that the time
until the next event is less than or equal to some value x, is:
$$F(x; \lambda) = 1 - e^{-\lambda x}, \quad x \ge 0$$
Mean and Variance:
• Mean of the Exponential distribution: 1/λ
• Variance of the Exponential distribution: 1/λ²
Applications:
• Queuing Theory: The Exponential distribution is commonly used to model
the time between arrivals in queuing systems (e.g., the time between
customer arrivals at a service desk).
• Reliability Engineering: It models the lifetime of systems or components,
especially when failure rates are constant over time.
• Physics and Biology: The distribution is used to model random decay
processes, such as radioactive decay or the time until an organism
experiences a certain event (like death or reproduction).
Example:
If a bus arrives at a station on average every 15 minutes, the time between two
consecutive bus arrivals follows an Exponential distribution with a rate parameter
λ=1/15 buses per minute.
To calculate the probability that the time until the next bus is less than 10
minutes:
$$P(T < 10) = 1 - e^{-10/15} = 1 - e^{-2/3} \approx 1 - 0.5134 = 0.4866$$
So, there's approximately a 48.66% chance that the next bus will arrive in less
than 10 minutes.
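The bus example and the memoryless property can both be checked with the CDF F(x) = 1 − e^(−λx). This is a minimal sketch with hypothetical helper names. The memoryless check uses the survival function P(T > x) = e^(−λx) and confirms that P(T > s + t | T > s) equals P(T > t).

```python
from math import exp


def exponential_cdf(x: float, lam: float) -> float:
    """P(T <= x) for T ~ Exponential(rate=lam)."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


def exponential_survival(x: float, lam: float) -> float:
    """P(T > x), which equals e^(-lam * x) for x >= 0."""
    return 1 - exponential_cdf(x, lam)


lam = 1 / 15  # one bus every 15 minutes on average
print(f"{exponential_cdf(10, lam):.4f}")  # 0.4866

# Memoryless property: having already waited s minutes does not change the
# distribution of the remaining wait, so P(T > s + t | T > s) = P(T > t).
s, t = 5, 10
conditional = exponential_survival(s + t, lam) / exponential_survival(s, lam)
print(conditional, exponential_survival(t, lam))  # equal up to rounding
```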
Chi-Square Distribution
The Chi-Square Distribution is a special case of the Gamma Distribution and is
commonly used in statistics, especially in hypothesis testing and in the
construction of confidence intervals. It is a continuous probability distribution
that arises when summing the squares of independent standard normal variables.
Key Characteristics of the Chi-Square Distribution:
1. Definition: The chi-square distribution is the distribution of a sum of the
squares of k independent standard normal random variables. If
$Z_1, Z_2, \dots, Z_k$ are independent random variables, each with a standard normal
distribution N(0,1), then:
$$X = Z_1^{2} + Z_2^{2} + \cdots + Z_k^{2}$$
follows a chi-square distribution with k degrees of freedom.
2. Degrees of Freedom (df): The number of degrees of freedom k is a key
parameter of the chi-square distribution, and it is typically related to the
number of independent variables that are summed. The distribution
becomes more symmetric and bell-shaped as k increases.
3. Probability Density Function (PDF): The probability density function of a
chi-square distribution with k degrees of freedom is given by:
$$f(x; k) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, \quad x > 0$$
where Γ(⋅) is the Gamma function. The function describes how probability density
is spread over the positive values of x.
4. Mean and Variance:
o The mean of a chi-square distribution is k, where k is the number of
degrees of freedom.
o The variance of the chi-square distribution is 2k.
5. Skewness and Kurtosis:
o For small values of k, the distribution is positively skewed (right-
skewed), but as k increases, the distribution becomes more
symmetric and approaches a normal distribution.
o The skewness of the chi-square distribution is $\sqrt{8/k}$ and the
excess kurtosis is 12/k.
6. Applications:
o Goodness-of-fit tests: In hypothesis testing, the chi-square
distribution is used in the chi-square test to assess whether a sample
data set fits an expected distribution.
o Independence tests: It’s used in tests of independence (e.g., chi-
square test for independence) to determine if two categorical
variables are independent.
o Confidence intervals for variances: In statistical inference, it is used
to construct confidence intervals for the variance of a normally
distributed population.
7. Critical Values: The chi-square distribution is often used in statistical
hypothesis testing, and critical values depend on the significance level α
and the degrees of freedom k. For example, to test a hypothesis at the 5%
significance level, we can compare the calculated chi-square statistic with a
critical value from the chi-square distribution table.
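The definition in item 1 and the moments in item 4 can be checked by simulation. The sketch below draws k = 4 independent standard normals with `random.gauss`, sums their squares, and compares the sample mean and variance with k and 2k. The seed, k, and the number of draws are arbitrary choices made here for illustration.

```python
import random
import statistics


def chi_square_draw(k: int, rng: random.Random) -> float:
    """One draw of Z_1^2 + ... + Z_k^2 with Z_i independent N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)  # fixed seed so the run is repeatable
k = 4
samples = [chi_square_draw(k, rng) for _ in range(100_000)]
print(statistics.fmean(samples))      # should be close to k = 4
print(statistics.pvariance(samples))  # should be close to 2k = 8
```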
Example Usage:
• Chi-Square Goodness-of-Fit Test: Suppose you want to test if a die is fair. You roll
it 60 times and record the outcomes. The chi-square test could be used to determine if
the observed frequencies match the expected frequencies (which would be 10 for each
of the six sides, assuming a fair die). The chi-square statistic would be computed as:
$$\chi^{2} = \sum_{i=1}^{6} \frac{(O_i - E_i)^{2}}{E_i}$$
where O_i is the observed frequency for each outcome and E_i is the
expected frequency.
• Chi-Square Test for Independence: You may use the chi-square test for
independence to test if two categorical variables are independent. For example, you
might want to test if there is an association between gender and whether people prefer
tea or coffee. The contingency table of observed frequencies is compared to the
expected frequencies under the assumption of independence.
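Both tests reduce to the statistic χ² = Σ (O_i − E_i)² / E_i. The sketch below applies it to two small tables. All counts are hypothetical, invented for illustration and not taken from the text. For independence, the expected count in cell (i, j) is (row total i × column total j) / grand total, and the degrees of freedom are (rows − 1)(columns − 1). The critical values quoted in the comments (11.07 for 5 df and 3.841 for 1 df at the 5% level) are standard table values. No Yates continuity correction is applied.

```python
def chi_square_statistic(obs, exp_counts):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp_counts))


# Goodness of fit: hypothetical counts from 60 rolls of a die (illustrative only).
observed = [8, 12, 9, 11, 6, 14]
expected = [sum(observed) / 6] * 6  # 10 per face if the die is fair
gof = chi_square_statistic(observed, expected)
print(gof)  # compare with the 5% critical value 11.07 for 6 - 1 = 5 df

# Independence: hypothetical 2x2 table (rows = two groups, columns = tea / coffee).
table = [[30, 20],
         [25, 25]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)
# Under independence, E_ij = (row total i) * (column total j) / grand total.
expected_table = [[r * c / grand_total for c in col_totals] for r in row_totals]
indep = chi_square_statistic(
    [o for row in table for o in row],
    [e for row in expected_table for e in row],
)
df = (len(table) - 1) * (len(table[0]) - 1)
print(expected_table, indep, df)  # compare indep with 3.841 for 1 df at the 5% level
```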
Student's t-Distribution and F-Distribution
The Student's t-distribution and the F-distribution are both probability
distributions commonly used in inferential statistics, particularly in hypothesis
testing and confidence intervals. Here's an overview of each:
1. Student’s t-Distribution:
The Student's t-distribution is used in situations where the sample size is small
and/or the population variance is unknown. It was introduced by William Sealy
Gosset under the pseudonym "Student" in 1908.
Characteristics:
• Shape: The t-distribution is symmetric and bell-shaped, similar to the
standard normal distribution, but with heavier tails. This means it accounts
for the increased variability when dealing with small sample sizes.
• Degrees of Freedom (df): The shape of the t-distribution depends on the
degrees of freedom, which typically equals n−1, where n is the sample size.
As the degrees of freedom increase, the t-distribution approaches a
standard normal distribution.
• Heavy Tails: The tails of the t-distribution are heavier than those of the
standard normal distribution. This reflects the greater variability in
estimates of the mean when the sample size is small.
When to use the t-distribution:
• When performing hypothesis testing or constructing confidence intervals
for the mean of a population when the sample size is small (typically n<30).
• When the population variance is unknown and must be estimated from the
sample.
Common Uses:
• One-sample t-test: To test if the mean of a population is equal to a specific
value.
• Two-sample t-test: To compare the means of two independent samples.
• Paired t-test: To compare the means of two related or paired samples.
• Confidence intervals: For estimating the population mean when the
population variance is unknown.
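The one-sample t-test described above rests on the statistic t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom, where s is the sample standard deviation. Below is a minimal sketch. The helper name and the 20 exam scores are hypothetical, made up for illustration. The critical value 2.093 (two-sided, 5% level, 19 df) is the standard table value.

```python
import statistics
from math import sqrt


def one_sample_t(sample, mu0):
    """Return (t, df) with t = (x_bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / sqrt(n)), n - 1


# Hypothetical exam scores for 20 students (illustrative data, not from the text).
scores = [78, 72, 81, 69, 85, 77, 74, 80, 71, 76,
          83, 79, 70, 75, 82, 73, 77, 79, 68, 84]
t_stat, df = one_sample_t(scores, 75)
print(t_stat, df)
# Two-sided test at the 5% level with df = 19: reject H0 (mean = 75)
# if abs(t_stat) exceeds the tabulated critical value of about 2.093.
```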
2. F-Distribution:
The F-distribution is used primarily in the analysis of variance (ANOVA), regression
analysis, and hypothesis testing involving variances.
Characteristics:
• Shape: The F-distribution is positively skewed (i.e., it has a longer right tail),
and its shape depends on two parameters: the degrees of freedom for the
numerator (df1) and the denominator (df2).
• Range: The F-distribution only takes positive values because it represents a
ratio of variances, which are always non-negative.
• Non-central F-distribution: When the null hypothesis is false (for example, when
the group means in an ANOVA really differ), the F statistic follows a non-central
F-distribution, which is shifted to the right of the central F-distribution.
Degrees of Freedom:
• Numerator degrees of freedom (df1): Associated with the variability of the
group means or treatment variances.
• Denominator degrees of freedom (df2): Associated with the variability of
the error or within-group variances.
When to use the F-distribution:
• When comparing the variances of two or more populations.
• In Analysis of Variance (ANOVA): To test whether there are significant
differences between the means of three or more groups.
• In regression analysis: To test the overall significance of a regression
model.
Common Uses:
• ANOVA: To test whether the means of multiple groups are significantly
different.
• F-test: To compare the variances of two populations.
• Regression analysis: To test the fit of a regression model by comparing the
model variance to the residual variance.
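In a one-way ANOVA, the F statistic is the ratio of the between-group mean square to the within-group mean square, with df1 = (number of groups − 1) and df2 = (total observations − number of groups). The sketch below computes it from scratch. The helper name and the three groups of scores are hypothetical, invented for illustration. The critical value of about 3.89 for F(2, 12) at the 5% level is the standard table value.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df1, df2) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df1, df2 = k - 1, n_total - k
    f_stat = (ss_between / df1) / (ss_within / df2)
    return f_stat, df1, df2


# Hypothetical scores under three teaching methods (illustrative data only).
method_a = [72, 75, 78, 70, 74]
method_b = [80, 83, 79, 85, 81]
method_c = [68, 71, 66, 70, 73]
f_stat, df1, df2 = one_way_anova_f([method_a, method_b, method_c])
print(f_stat, df1, df2)  # compare f_stat with about 3.89, the 5% critical value of F(2, 12)
```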
Key Differences:
Feature | t-Distribution | F-Distribution
Shape | Symmetric and bell-shaped | Positively skewed (right tail)
Parameters | Degrees of freedom (df) | Two sets of degrees of freedom (df1, df2)
Range of values | Entire real line (−∞ to +∞) | Positive real numbers (0 to ∞)
Main use | Estimating a population mean with small samples | Comparing variances or testing model fit
Type of test | t-tests (one-sample, two-sample, paired) | F-tests (ANOVA, comparing variances)
Assumptions | Normality of data; unknown population variance | Independence and normality of data
Example of Application:
• Student’s t-distribution: Suppose you are testing if the mean score of a
sample of 20 students on a final exam is different from 75. Since you don’t
know the population standard deviation, you would use a t-test and refer
to the t-distribution to determine the critical value based on your sample
size (df = 19).
• F-distribution: Suppose you have three different teaching methods and
want to test whether their effects on student performance differ
significantly. You would perform a one-way ANOVA, which involves
comparing the variance between the groups to the variance within each
group using an F-distribution.
Both distributions are fundamental for hypothesis testing, but they serve
different purposes depending on the structure of the data and the hypothesis
being tested.