Module 1 Topic 2

The document provides an overview of the Binomial Distribution, which models the number of successes in a fixed number of independent Bernoulli trials characterized by parameters n (number of trials) and p (probability of success). It details the probability mass function, properties, and formulas for mean, variance, and standard deviation, along with applications in AI and data science. Additionally, the document includes several problems and solutions illustrating the calculation of probabilities using the Binomial Distribution.

Uploaded by

aayushrasaily04

Module I: Probability Distributions

Topic 2: Binomial Distribution

Dr. P. Rajendra, Professor, Department of Mathematics,
CMRIT, Bengaluru.
Introduction to Binomial Distribution

▶ The Binomial Distribution is a discrete probability
distribution that models the number of successes in a fixed
number of independent Bernoulli trials.
▶ Each trial has two possible outcomes: success (with
probability p) or failure (with probability 1 − p).
▶ The Binomial Distribution is characterized by two parameters:
  ▶ n: The number of trials.
  ▶ p: The probability of success on each trial.

Example
In a binary classification problem, the Binomial Distribution can
model the number of correct predictions (successes) out of n total
predictions made by a machine learning model.
Binomial Distribution (Continued)
Let X be a discrete random variable counting the successes in n
trials, p be the probability of success, and q = 1 − p be the
probability of failure. The probability mass function of the
binomial distribution is defined as:
\[
P(X = x) = b(n, p, x) =
\begin{cases}
\binom{n}{x} p^{x} q^{n-x}, & x = 0, 1, \dots, n \\
0, & \text{otherwise}
\end{cases}
\]
where n (the number of trials) and p (the probability of success)
are the parameters.

The binomial distribution has the following properties:
1. P(X = x) = b(n, p, x) ≥ 0
2. p + q = 1
3. $\sum_{x=0}^{n} \binom{n}{x} p^{x} q^{n-x} = 1$

The mean, variance, and standard deviation of the binomial
distribution are given by:
▶ Mean: µ = np
▶ Variance: σ² = npq
▶ Standard Deviation: σ = √(npq)
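The PMF and the three summary formulas can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ (for `math.comb`); the function name `binom_pmf` and the values n = 10, p = 0.3 are illustrative choices, not taken from the slides.

```python
from math import comb, sqrt

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p); zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3          # illustrative parameters (assumption)
q = 1 - p
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                        # property 3: should be 1
mean = sum(x * px for x, px in enumerate(pmf))          # should equal n*p
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))  # should equal n*p*q
std_dev = sqrt(variance)                                # should equal sqrt(n*p*q)

print(total, mean, variance, std_dev)
```

Computing the moments directly from the PMF, rather than from the closed forms, is what makes this a check of µ = np and σ² = npq instead of a restatement of them.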
Applications in AI and Data Science

▶ Model Evaluation: The Binomial Distribution is used to
model the number of successes (correct predictions) in a fixed
number of trials (predictions).
▶ A/B Testing: In A/B testing, it can model the number of
users who perform a specific action (e.g., click a button) out
of a total number of users exposed to the variant.
▶ Natural Language Processing (NLP): Used to model the
occurrence of a particular word in a fixed number of text
documents.
▶ Spam Detection: Can model the number of spam emails
correctly identified out of a set number of emails.
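As an illustration of the model-evaluation use case, the sketch below computes the probability that a classifier gets at least 18 of 20 predictions right. The test-set size, the accuracy of 0.9, and the assumption that predictions are independent with the same success probability are all hypothetical, chosen only to make the example concrete.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_predictions = 20   # hypothetical number of predictions
accuracy = 0.9       # hypothetical probability that a single prediction is correct

# P(X >= 18): sum the PMF over 18, 19, and 20 correct predictions.
p_at_least_18 = sum(
    binom_pmf(k, n_predictions, accuracy) for k in range(18, n_predictions + 1)
)
print(round(p_at_least_18, 4))
```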
Derivation of the Mean and Variance of a Binomial Random Variable

Let X be a binomial random variable with parameters n (number
of trials) and p (probability of success). The probability mass
function (PMF) of X is given by:
\[
P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \dots, n
\]
where q = 1 − p.
The mean (expected value) $\mu_X$ of X is defined as:
\[
\mu_X = E(X) = \sum_{x=0}^{n} x \cdot P(X = x)
\]
Substituting the PMF:
\[
\mu_X = \sum_{x=0}^{n} x \binom{n}{x} p^{x} q^{n-x}
\]
The x = 0 term vanishes, and $x \binom{n}{x} = n \binom{n-1}{x-1}$
for x ≥ 1, so we can express this as:
\[
\mu_X = np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} q^{(n-1)-(x-1)}
\]
Letting y = x − 1, the sum becomes:
\[
\mu_X = np \sum_{y=0}^{n-1} \binom{n-1}{y} p^{y} q^{(n-1)-y} = np(p + q)^{n-1}
\]
Since p + q = 1, we have: $\mu_X = np$
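The mean derivation rests on the identity x·C(n, x) = n·C(n−1, x−1) and on the shifted sum being a full Binomial(n−1, p) PMF. The sketch below checks both numerically; the values n = 8 and p = 0.25 are arbitrary illustrative choices, and the variable names are not from the slides.

```python
from math import comb, isclose

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 8, 0.25  # illustrative parameters (assumption)

# Identity used to pull the factor n out of the sum: x*C(n,x) = n*C(n-1,x-1).
identity_holds = all(
    x * comb(n, x) == n * comb(n - 1, x - 1) for x in range(1, n + 1)
)

# Mean from the definition: sum of x * P(X = x).
mean_direct = sum(x * binom_pmf(x, n, p) for x in range(n + 1))

# Mean after the substitution y = x - 1: np times a Binomial(n-1, p) PMF sum.
mean_shifted = n * p * sum(binom_pmf(y, n - 1, p) for y in range(n))

print(identity_holds, mean_direct, mean_shifted, n * p)
```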
To find the variance, we first calculate E(X(X − 1)):
\[
E(X(X-1)) = \sum_{x=0}^{n} x(x-1) \cdot P(X = x)
\]
Substituting the PMF:
\[
E(X(X-1)) = \sum_{x=0}^{n} x(x-1) \binom{n}{x} p^{x} q^{n-x}
\]
The x = 0 and x = 1 terms vanish, and
$x(x-1) \binom{n}{x} = n(n-1) \binom{n-2}{x-2}$ for x ≥ 2, so this
can be simplified as:
\[
E(X(X-1)) = n(n-1)p^{2} \sum_{x=2}^{n} \binom{n-2}{x-2} p^{x-2} q^{(n-2)-(x-2)}
\]
Letting y = x − 2, the sum becomes:
\[
E(X(X-1)) = n(n-1)p^{2} \sum_{y=0}^{n-2} \binom{n-2}{y} p^{y} q^{(n-2)-y} = n(n-1)p^{2}(p + q)^{n-2}
\]
Since p + q = 1, we have:
\[
E(X(X-1)) = n(n-1)p^{2}
\]
The variance $\sigma_X^{2}$ of X is given by:
\[
\sigma_X^{2} = E(X^{2}) - (E(X))^{2}
\]
Expanding $E(X^{2})$:
\[
E(X^{2}) = E(X(X-1)) + E(X)
\]
Substituting the values:
\[
\sigma_X^{2} = n(n-1)p^{2} + np - (np)^{2}
\]
Simplifying:
\[
\sigma_X^{2} = n^{2}p^{2} - np^{2} + np - n^{2}p^{2} = np(1 - p) = npq
\]
Standard Deviation of the Binomial Distribution
The standard deviation $\sigma_X$ is the square root of the variance:
\[
\sigma_X = \sqrt{npq}
\]
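The variance derivation goes through the factorial moment E(X(X − 1)). As a rough numerical check, the sketch below computes that moment from the PMF and then rebuilds the variance and standard deviation from it. The parameters n = 12 and p = 0.4 are arbitrary illustrative values.

```python
from math import comb, sqrt, isclose

n, p = 12, 0.4  # illustrative parameters (assumption)
q = 1 - p
pmf = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]

e_x = sum(x * px for x, px in enumerate(pmf))                  # E(X)
e_fact = sum(x * (x - 1) * px for x, px in enumerate(pmf))     # E(X(X-1))
e_x2 = e_fact + e_x                                            # E(X^2) = E(X(X-1)) + E(X)
variance = e_x2 - e_x**2                                       # sigma^2 = E(X^2) - (E(X))^2
std_dev = sqrt(variance)

print(e_fact, n * (n - 1) * p**2)   # factorial moment vs. closed form
print(variance, n * p * q)          # variance vs. npq
print(std_dev, sqrt(n * p * q))     # standard deviation vs. sqrt(npq)
```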
Problem 1

Let X be a binomially distributed random variable based on 6
repetitions of an experiment with probability of success p = 0.3.
Evaluate the following probabilities: (i) P(X ≤ 3), (ii) P(X > 4).

Solution: The probability mass function of a binomial random
variable X with parameters n and p is given by:
\[
P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n-k}
\]
where n = 6 and p = 0.3.
(i). Calculate P(X ≤ 3):
\[
P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
\]
\[
P(X = 0) = \binom{6}{0} (0.3)^{0} (0.7)^{6} = 1 \cdot 1 \cdot (0.7)^{6} = (0.7)^{6}
\]
\[
P(X = 1) = \binom{6}{1} (0.3)^{1} (0.7)^{5} = 6 \cdot (0.3) \cdot (0.7)^{5}
\]
\[
P(X = 2) = \binom{6}{2} (0.3)^{2} (0.7)^{4} = 15 \cdot (0.3)^{2} \cdot (0.7)^{4}
\]
\[
P(X = 3) = \binom{6}{3} (0.3)^{3} (0.7)^{3} = 20 \cdot (0.3)^{3} \cdot (0.7)^{3}
\]
\[
\therefore P(X \le 3) = (0.7)^{6} + 6 \cdot 0.3 \cdot (0.7)^{5} + 15 \cdot (0.3)^{2} \cdot (0.7)^{4} + 20 \cdot (0.3)^{3} \cdot (0.7)^{3}
\]
\[
P(X \le 3) \approx 0.1176 + 0.3025 + 0.3241 + 0.1852 = 0.9294
\]
(ii). Calculate P(X > 4):
\[
P(X > 4) = P(X = 5) + P(X = 6)
\]
\[
P(X = 5) = \binom{6}{5} (0.3)^{5} (0.7)^{1} = 6 \cdot (0.3)^{5} \cdot (0.7)
\]
\[
P(X = 6) = \binom{6}{6} (0.3)^{6} (0.7)^{0} = 1 \cdot (0.3)^{6} \cdot 1
\]
\[
\therefore P(X > 4) = 6 \cdot (0.3)^{5} \cdot (0.7) + (0.3)^{6}
\]
\[
P(X > 4) \approx 0.0102 + 0.0007 = 0.0109
\]
Thus,
\[
P(X \le 3) \approx 0.9294, \quad P(X > 4) \approx 0.0109.
\]
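The two probabilities in Problem 1 can be reproduced with a short script. This is a sketch using only the standard library; `binom_pmf` is an illustrative helper name. Because it sums unrounded terms, it gives about 0.9295 for P(X ≤ 3), which differs in the last digit from the 0.9294 obtained above by adding terms already rounded to four decimals.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 6, 0.3  # parameters from Problem 1

p_le_3 = sum(binom_pmf(k, n, p) for k in range(0, 4))      # k = 0, 1, 2, 3
p_gt_4 = sum(binom_pmf(k, n, p) for k in range(5, n + 1))  # k = 5, 6

print(f"P(X <= 3) = {p_le_3:.4f}")
print(f"P(X > 4)  = {p_gt_4:.4f}")
```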


Problem 2

The probability that a pen manufactured by a company will be
defective is 0.1. If 12 such pens are selected at random, find the
probability that: (i) Exactly 2 pens will be defective, (ii) At most 2
pens will be defective, (iii) None will be defective.

Solution: Given that the probability of a pen being defective is
p = 0.1, and the number of pens selected is n = 12, we use the
binomial distribution formula:
\[
P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n-k}
\]
where X is the random variable representing the number of
defective pens.
(i). Probability that exactly 2 pens will be defective
\[
P(X = 2) = \binom{12}{2} (0.1)^{2} (0.9)^{10}
\]
Calculate $\binom{12}{2}$:
\[
\binom{12}{2} = \frac{12!}{2!\,(12 - 2)!} = \frac{12 \cdot 11}{2 \cdot 1} = 66
\]
Then:
\[
P(X = 2) = 66 \cdot (0.1)^{2} \cdot (0.9)^{10}
\]
\[
P(X = 2) \approx 66 \cdot 0.01 \cdot 0.3487 \approx 0.2301
\]
(ii). Probability that at most 2 pens will be defective
\[
P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)
\]
Calculate each term:
\[
P(X = 0) = \binom{12}{0} (0.1)^{0} (0.9)^{12} = 1 \cdot 1 \cdot (0.9)^{12} = (0.9)^{12} \approx 0.2824
\]
\[
P(X = 1) = \binom{12}{1} (0.1)^{1} (0.9)^{11} = 12 \cdot 0.1 \cdot (0.9)^{11} \approx 0.3766
\]
\[
\therefore P(X \le 2) \approx 0.2824 + 0.3766 + 0.2301 = 0.8891
\]
(iii). Probability that none will be defective (P(X = 0))
\[
P(X = 0) = (0.9)^{12} \approx 0.2824
\]
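"At most k" questions such as part (ii) are cumulative probabilities, so a small CDF helper is convenient. The sketch below is one way to write it with the standard library; the helper names `binom_pmf` and `binom_cdf` are illustrative, not part of the slides.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): sum of the PMF from 0 up to and including k."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

n, p = 12, 0.1  # parameters from Problem 2

exactly_two = binom_pmf(2, n, p)   # part (i)
at_most_two = binom_cdf(2, n, p)   # part (ii)
none_defective = binom_pmf(0, n, p)  # part (iii)

print(f"{exactly_two:.4f} {at_most_two:.4f} {none_defective:.4f}")
```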


Problem 3

The number of telephone lines busy at an instant is a binomial
variate with a probability of 0.1 that a line is busy. If 10 lines
are chosen at random, what is the probability that: (i) No line is
busy, (ii) All lines are busy, (iii) At least one line is busy,
(iv) At most two lines are busy.

Solution: Given that the probability of a line being busy is
p = 0.1, and the number of lines chosen is n = 10, we use the
binomial distribution formula:
\[
P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n-k}
\]
where X is the random variable representing the number of busy
lines.
(i). Probability that no line is busy (P(X = 0))
\[
P(X = 0) = \binom{10}{0} (0.1)^{0} (0.9)^{10} = 1 \cdot 1 \cdot (0.9)^{10} = (0.9)^{10}
\]
\[
P(X = 0) \approx 0.3487
\]
(ii). Probability that all lines are busy (P(X = 10))
\[
P(X = 10) = \binom{10}{10} (0.1)^{10} (0.9)^{0} = 1 \cdot (0.1)^{10} \cdot 1 = (0.1)^{10}
\]
\[
P(X = 10) = 10^{-10} = 0.0000000001
\]
(iii). Probability that at least one line is busy (P(X ≥ 1))
\[
P(X \ge 1) = 1 - P(X = 0)
\]
Using the value of P(X = 0) calculated earlier:
\[
P(X \ge 1) \approx 1 - 0.3487 = 0.6513
\]
(iv). Probability that at most two lines are busy (P(X ≤ 2))
\[
P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)
\]
Calculate P(X = 1) and P(X = 2):
\[
P(X = 1) = \binom{10}{1} (0.1)^{1} (0.9)^{9} = 10 \cdot 0.1 \cdot (0.9)^{9}
\]
\[
P(X = 1) \approx 10 \cdot 0.1 \cdot 0.3874 = 0.3874
\]
\[
P(X = 2) = \binom{10}{2} (0.1)^{2} (0.9)^{8} = 45 \cdot 0.01 \cdot (0.9)^{8}
\]
\[
P(X = 2) \approx 45 \cdot 0.01 \cdot 0.4305 \approx 0.1937
\]
Add these probabilities:
\[
P(X \le 2) \approx 0.3487 + 0.3874 + 0.1937 = 0.9298
\]
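Part (iii) uses the complement rule P(X ≥ 1) = 1 − P(X = 0) instead of adding ten terms. The sketch below computes the answer both ways to show they agree, and also evaluates the other three parts; the helper name `binom_pmf` is illustrative.

```python
from math import comb, isclose

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.1  # parameters from Problem 3

no_line_busy = binom_pmf(0, n, p)                            # part (i)
all_lines_busy = binom_pmf(n, n, p)                          # part (ii)
at_least_one = 1 - no_line_busy                              # part (iii), complement rule
at_least_one_direct = sum(binom_pmf(k, n, p) for k in range(1, n + 1))
at_most_two = sum(binom_pmf(k, n, p) for k in range(0, 3))   # part (iv)

print(f"{no_line_busy:.4f} {all_lines_busy:.1e} {at_least_one:.4f} {at_most_two:.4f}")
print(isclose(at_least_one, at_least_one_direct))
```

The complement form needs a single PMF evaluation, which is why it is the usual choice for "at least one" questions.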


Problem 4

When a fair coin is tossed 4 times, find the probability of getting:
(i) Exactly one head, (ii) At most three heads, (iii) At least two
heads.

Solution: Given that a coin is tossed 4 times, we have a binomial
distribution with n = 4 and probability of getting a head p = 0.5.
The binomial probability formula is:
\[
P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n-k}
\]
where X is the random variable representing the number of heads.
(i). Probability of getting exactly one head (P(X = 1))
\[
P(X = 1) = \binom{4}{1} (0.5)^{1} (0.5)^{4-1}
\]
Calculate P(X = 1):
\[
P(X = 1) = 4 \cdot (0.5)^{1} \cdot (0.5)^{3} = 4 \cdot 0.5 \cdot 0.125 = 0.25
\]
(ii). Probability of getting at most three heads (P(X ≤ 3))
\[
P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
\]
Calculate the remaining terms (P(X = 1) = 0.25 from part (i)):
\[
P(X = 0) = \binom{4}{0} (0.5)^{0} (0.5)^{4} = 1 \cdot 1 \cdot 0.0625 = 0.0625
\]
\[
P(X = 2) = \binom{4}{2} (0.5)^{2} (0.5)^{2} = 6 \cdot 0.25 \cdot 0.25 = 0.375
\]
\[
P(X = 3) = \binom{4}{3} (0.5)^{3} (0.5)^{1} = 4 \cdot 0.125 \cdot 0.5 = 0.25
\]
Add these probabilities:
\[
P(X \le 3) = 0.0625 + 0.25 + 0.375 + 0.25 = 0.9375
\]
(iii). Probability of getting at least two heads (P(X ≥ 2))
\[
P(X \ge 2) = 1 - P(X < 2) = 1 - (P(X = 0) + P(X = 1))
\]
Using the values calculated earlier:
\[
P(X < 2) = P(X = 0) + P(X = 1) = 0.0625 + 0.25 = 0.3125
\]
\[
P(X \ge 2) = 1 - 0.3125 = 0.6875
\]
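With only 2⁴ = 16 equally likely outcomes, Problem 4 can also be checked by brute-force enumeration rather than by the formula. The sketch below lists every sequence of four tosses and counts heads, using exact fractions; it assumes a fair coin, and the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

# All 16 equally likely sequences of 4 tosses, e.g. ('H', 'T', 'T', 'H').
outcomes = list(product("HT", repeat=4))

def prob(event) -> Fraction:
    """Exact probability that the number of heads satisfies `event`."""
    favourable = sum(1 for seq in outcomes if event(seq.count("H")))
    return Fraction(favourable, len(outcomes))

exactly_one = prob(lambda heads: heads == 1)     # part (i)
at_most_three = prob(lambda heads: heads <= 3)   # part (ii)
at_least_two = prob(lambda heads: heads >= 2)    # part (iii)

# Cross-check part (i) against the binomial formula C(4, 1) / 2^4.
formula_exactly_one = Fraction(comb(4, 1), 2**4)

print(exactly_one, at_most_three, at_least_two)
print(float(exactly_one), float(at_most_three), float(at_least_two))
```

Enumeration is only practical for small n, but it confirms that the binomial coefficients are simply counting outcomes.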


Assignment Problems
1. The probability of germination of a seed in a packet of seeds
is found to be 0.7. If 10 seeds are taken for experimenting on
germination in a laboratory, find the probability that: (i) 8
seeds germinate. [Ans: 0.2335], (ii) At least 8 seeds
germinate. [Ans: 0.3828], (iii) At most 8 seeds germinate.
[Ans: 0.8507]
2. A communication channel receives independent pulses at the
rate of 12 pulses per microsecond. The probability of
transmission error is 0.001 for each microsecond. Compute
the probability of: (i) No error during a microsecond. [Ans:
0.9880], (ii) 1 error. [Ans: 0.0118], (iii) At least 1 error. [Ans:
0.0120], (iv) 2 errors. [Ans: 0.000065], (v) At most 2 errors.
[Ans: 0.9999]
3. In 800 families with 5 children each, how many families would
be expected to have: (i) 3 boys. [Ans: 250], (ii) 5 girls. [Ans:
25], (iii) At most 2 girls. [Ans: 400], (iv) Either 2 or 3 boys.
[Ans: 500]. [Note: Assume that boys and girls are equally
likely.]
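Assignment problem 3 asks for expected counts, which are the number of families multiplied by the relevant binomial probability. The sketch below shows that pattern with exact fractions, assuming p = 1/2 for a boy as the note states; the helper name `expected_families` is illustrative.

```python
from fractions import Fraction
from math import comb

families = 800
children = 5
p_boy = Fraction(1, 2)  # boys and girls assumed equally likely, per the note

def pmf_boys(k: int) -> Fraction:
    """Exact P(exactly k boys among 5 children)."""
    return comb(children, k) * p_boy**k * (1 - p_boy) ** (children - k)

def expected_families(boy_counts) -> Fraction:
    """Expected number of families whose number of boys is in `boy_counts`."""
    return families * sum(pmf_boys(k) for k in boy_counts)

three_boys = expected_families([3])
five_girls = expected_families([0])               # 5 girls means 0 boys
at_most_two_girls = expected_families([3, 4, 5])  # at most 2 girls means at least 3 boys
two_or_three_boys = expected_families([2, 3])

print(three_boys, five_girls, at_most_two_girls, two_or_three_boys)
```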
