Introduction to Probability
Unit 3
Learning Objectives
 Understand uncertainty and how probability concepts are used for
  measuring and modelling uncertainty.
 Learn basic concepts in probability: axioms of probability, frequency
  estimate of probability, conditional probability and Bayes’ theorem.
 Learn how simple probability rules are used for solving business
  problems using association rule mining and its applications in market
  basket analysis and recommender systems.
 Understand the concept of random variables, discrete and continuous
  random variables, probability density function, and cumulative
  distribution function.
 Understand various discrete distributions such as binomial
  distribution, Poisson distribution, and geometric distribution and their
  applications for solving business problems.
 Understand various continuous distributions such as uniform,
  exponential, normal, chi-square, t, and F distributions and their
  applications for solving business problems.
Introduction to Probability
 One of the primary objectives in analytics is to measure the
  uncertainty associated with an event or key performance
  indicator.
 Axioms of probability and the concept of random variable
  are fundamental building blocks of analytics that are used
  for measuring uncertainty associated with key performance
  indicators of importance for a business.
 Probability theory is the foundation on which descriptive
  and predictive analytics models are built.
Introduction to Probability
Analytics applications involve tasks such as
 predicting the probability of occurrence of an
 event, testing a hypothesis, and building models to
 explain variation in a variable of importance to
 the business such as profitability, market share,
 demand, etc.
Many important tasks in analytics deal with
 uncertain events and it is essential to understand
 probability theory that can be used to predict and
 measure uncertain events.
Introduction to Probability
 Probability quantifies the uncertainty of the outcomes
  of a random variable, that is, how likely each possible
  outcome or event is.
 Specifically, it quantifies how likely a specific outcome
  is for a random variable, such as the flip of a coin, the
  roll of a die, or drawing a playing card from a deck.
 For a random variable x, P(x) is a function that assigns
  a probability to each value of x.
 Probability Density of x = P(x)
PROBABILITY THEORY – TERMINOLOGY
Random Experiment
Random experiment is an experiment in which the
 outcome is not known with certainty. That is, the output of
 a random experiment cannot be predicted with certainty.
Predictive analytics mainly deals with random
 experiments such as:
  predicting quarterly revenue of an organization
  customer churn (whether a customer is likely to churn or
   how many customers are likely to churn before next quarter)
  demand for a product at a future time period
  number of views for a YouTube video
  outcome of a football match (win, draw or lose), etc.
PROBABILITY THEORY – TERMINOLOGY
Sample Space
 Sample space is the universal set that consists of all possible
  outcomes of an experiment. Sample space is usually represented
  using the letter ‘S’ and individual outcomes are called the
  elementary events.
 The sample space can be finite or infinite.
 A few random experiments and their sample spaces are listed
  below:
 Experiment: Outcome of a football match
        Sample Space = S = {Win, Draw, Lose}
 Experiment: Predicting customer churn at an individual customer
  level
        Sample Space = S = {Churn, No Churn}
 Experiment: Predicting percentage of customer churn
        Sample Space = S = {X | X ∈ R, 0 ≤ X ≤ 100}, that is X is a real
number that can take any value between 0 and 100 percent.
 Experiment: Life of a turbine blade used in an aircraft engine
        Sample Space = S = {X | X ∈ R, 0 ≤ X < ∞}, that is X is a real
number that can take any value between 0 and ∞.
PROBABILITY THEORY – TERMINOLOGY
 Event
 Event (E) is a subset of a sample space and probability is
  usually calculated with respect to an event.
 An event can be represented using the Venn diagram in
  Figure below
 The Venn diagram in Figure indicates that the event E is a
  subset of the sample space S, that is, E ⊂ S (E is a subset of
  S).
 Consider the random experiment that predicts number of
  customers who are likely to churn within a quarter from a
  customer base of 100 customers.
 The corresponding sample space is S = {X | X ∈ Z, 0 ≤ X ≤ 100},
  that is, X is an integer that can take any value between 0 and
  100. Now we can define several events such as:
   Event A = Number of customers who churn is less than 10
   Event B = Number of customers who churn is between 10 and 30
   Event C = Number of customers who churn exceeds 30
PROBABILITY THEORY – TERMINOLOGY
Probability Estimation using Relative
 Frequency
 The classical approach to probability estimation of an
 event is based on the relative frequency of the
 occurrence of that event. According to frequency
 estimation, the probability of an event X, P(X), is given
 by:
        P(X) = n(X) / N
 where n(X) is the number of observations in favour of event X
 and N is the total number of observations.
 For example, say a company has 1000 employees and
 every year about 200 employees leave the job. Then the
 probability of attrition of an employee per annum is
 200/1000 = 0.2.
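The relative-frequency estimate can be sketched in a few lines of Python. This is only an illustration: the function name relative_frequency is hypothetical, and the die simulation (with its seed and number of rolls) is an assumption added to show the estimate settling near the theoretical value.

```python
import random

def relative_frequency(event_count, total_count):
    """Estimate P(event) as (observations of the event) / (total observations)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return event_count / total_count

# Attrition example from the text: 200 leavers out of 1000 employees.
p_attrition = relative_frequency(200, 1000)
print(p_attrition)  # 0.2

# Hypothetical simulation: the estimate for "die shows 3" should be
# close to 1/6 when the number of rolls is large.
rng = random.Random(42)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]
p_three = relative_frequency(rolls.count(3), len(rolls))
print(round(p_three, 3))
```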
Algebra of Events
Assume that X, Y and Z are three events of a sample space. Then the
following algebraic relationships are valid and are useful while
deriving probabilities of events:
 Commutative rule: X ∪ Y = Y ∪ X and X ∩ Y = Y ∩ X
 Associative rule: (X ∪ Y) ∪ Z = X ∪ (Y ∪ Z) and (X ∩ Y) ∩ Z = X ∩ (Y
  ∩ Z)
 Distributive rule: X ∪ (Y ∩ Z) = (X ∪ Y) ∩ (X ∪ Z)
 X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)
The following rules, known as De Morgan’s laws on complementary
sets, are also useful while deriving probabilities:
 (X ∪ Y)^C = X^C ∩ Y^C
 (X ∩ Y)^C = X^C ∪ Y^C
 where X^C and Y^C are the complementary events of X and Y,
  respectively.
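The identities above can be checked on a small example with Python sets. The sample space (one roll of a die) and the three events are assumptions chosen for illustration; checking one example illustrates the rules but does not prove them.

```python
# Sample space: outcomes of one roll of a die (illustrative choice).
S = set(range(1, 7))
X = {1, 3, 5}   # odd number
Y = {3, 6}      # multiple of 3
Z = {4, 5, 6}   # greater than 3

def complement(event):
    """Complement of an event relative to the sample space S."""
    return S - event

# De Morgan's laws
assert complement(X | Y) == complement(X) & complement(Y)
assert complement(X & Y) == complement(X) | complement(Y)

# Distributive rules
assert X | (Y & Z) == (X | Y) & (X | Z)
assert X & (Y | Z) == (X & Y) | (X & Z)

print("all identities hold for this example")
```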
FUNDAMENTAL CONCEPTS IN PROBABILITY
– AXIOMS OF PROBABILITY
According to axiomatic theory of probability, the probability
of an event E satisfies the following axioms:
1. The probability of event E always lies between 0 and
   1. That is, 0 ≤ P(E) ≤ 1.
2. The probability of the universal set S is 1. That is,
   P(S) = 1.
3. P(X ∪ Y) = P(X) + P(Y), where X and Y are two
   mutually exclusive events.
The following elementary rules of probability are directly deduced
from the original three axioms of probability, using the set theory
relationships:
Example
 The probability of an event not occurring is called the
  probability of its complement.
 It is calculated as one minus the probability of the
  event, or 1 – P(A).
 For example, the probability of not rolling a 5 on a fair die is
   1 – P(5) = 1 – 1/6 = 5/6, or about 0.833 (83.3%).
 Probability of Not Event A = 1 – P(A)
 Probability ranges from 0 to 1, where 0 means the
  event is impossible and 1 indicates a certain
  event.
 The probabilities of all the elementary events in a sample space
  add up to 1.
 Basic Probability Concepts
Marginal Probability
Joint Probability
Conditional Probability
Probability Trees and Bayes’ Theorem
Problems and Solutions on Probability
Question 1: Find the probability of ‘getting
3 on rolling a die’.
Solution:
 Sample Space = S = {1, 2, 3, 4, 5, 6}
 Total number of outcomes = n(S) = 6
 Let A be the event of getting 3.
 Number of favorable outcomes = n(A) = 1
 i.e. A = {3}
 Probability, P(A) = n(A)/n(S) = 1/6
 Hence, P(getting 3 on rolling a die) = 1/6
Question 2: Draw a random card from a pack of cards.
What is the probability that the card drawn is a face
card?
Solution:
  A standard deck has 52 cards.
  Total number of outcomes = n(S) = 52
  Let E be the event of drawing a face card.
  Number of favorable outcomes = n(E) = 4 x 3 = 12
   (4 suits x 3 face cards: Jack, Queen and King only)
  Probability, P = Number of Favorable Outcomes/Total
   Number of Outcomes
  P(E) = n(E)/n(S)
  = 12/52
  = 3/13
  P(the card drawn is a face card) = 3/13
Question 3: A vessel contains 4 blue balls, 5 red balls
and 11 white balls. If three balls are drawn from the
vessel at random, what is the probability that the first
ball is red, the second ball is blue, and the third ball is
white?
 Solution:
 The balls are drawn without replacement, so the number of balls
  in the vessel falls by one after each draw.
 The vessel holds 4 + 5 + 11 = 20 balls, so the probability that
  the first ball is red is 5/20.
 After the first draw, 20 – 1 = 19 balls remain, 4 of them blue.
  Hence, the probability that the second ball is blue is 4/19.
 After the second draw, 19 – 1 = 18 balls remain, 11 of them white,
  so the probability that the third ball is white is 11/18.
 Therefore, the probability is 5/20 x 4/19 x 11/18 = 44/1368 =
  11/342 ≈ 0.032.
 Or we can express it as: P ≈ 3.2%.
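The same calculation can be reproduced with exact fractions. This is a minimal sketch that assumes drawing without replacement, as in the solution above; the helper name sequence_probability is hypothetical.

```python
from fractions import Fraction

def sequence_probability(counts, sequence):
    """Probability of drawing the given colour sequence without replacement."""
    remaining = dict(counts)           # copy so the caller's dict is untouched
    total = sum(remaining.values())
    prob = Fraction(1)
    for colour in sequence:
        prob *= Fraction(remaining[colour], total)
        remaining[colour] -= 1         # the drawn ball is not put back
        total -= 1
    return prob

vessel = {"blue": 4, "red": 5, "white": 11}
p = sequence_probability(vessel, ["red", "blue", "white"])
print(p, round(float(p), 3))
```

Using Fraction keeps the result exact (11/342) and avoids rounding until the final print.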
Question 4: Two dice are rolled, find the probability that the sum
is:
1. equal to 1
2. less than 13
Solution:
 To find the probability that the sum is equal to 1 we have to first determine
  the sample space S of two dice as shown below.
 S = { (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6) }
 So, n(S) = 36
 1) Let E be the event “sum equal to 1”. Since the smallest possible
  sum is 1 + 1 = 2, no outcome has a sum equal to 1. Hence,
  P(E) = n(E) / n(S) = 0 / 36 = 0
 2) Let B be the event that the sum of the numbers on the dice is less than 13.
 The largest possible sum occurs when both dice show 6, i.e. (6,6),
  which gives 12. Every outcome in the sample space, such as (1,1),
  (1,6), (2,6) or (6,6), therefore has a sum less than 13.
  Thus, n(B) = 36
 Hence, P(B) = n(B) / n(S) = 36 / 36 = 1
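Both answers can be confirmed by enumerating the 36 outcomes. The sketch below assumes two fair six-sided dice; the helper name probability is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """P(event) = favourable outcomes / total outcomes, for equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_sum_is_1 = probability(lambda o: sum(o) == 1)    # impossible event
p_sum_lt_13 = probability(lambda o: sum(o) < 13)   # certain event
print(len(sample_space), p_sum_is_1, p_sum_lt_13)
```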
Equally Likely Events
 When events have the same theoretical probability of
  happening, they are called equally likely events.
 The outcomes of a sample space are called equally likely if all
  of them have the same probability of occurring.
 Examples of equally likely events on a fair die:
    Getting 3 and getting 5
    Getting an even number and getting an odd number
    Getting 1, getting 2 or getting 3
Complementary Events
 For a pair of complementary events there are only two
  possibilities: the event either occurs or it does not.
 The complement of an event A is the event that A does not
  occur, so P(A) + P(not A) = 1. Some
  examples are:
• It will rain or not rain today
• The student will pass the exam or not pass.
• You win the lottery or you don’t.
Independent Events
Independent events are events whose occurrence does not
depend on any other event. For example, if we flip a coin and
get Head, and then flip it again and get Tail, the outcome of the
second flip is not affected by the outcome of the first. The two
events are independent of each other.
   If the probability of occurrence of an event A is not affected by the
    occurrence of another event B, then A and B are said to be
    independent events.
   Consider an example of rolling a die.
   If A is the event ‘the number appearing is odd’ and B is the event
    ‘the number appearing is a multiple of 3’, then
   P(A)= 3/6 = 1/2 and P(B) = 2/6 = 1/3
   Also, A ∩ B is the event ‘the number appearing is odd and a
    multiple of 3’ so that
   P(A ∩ B) = 1/6
   P(A│B) = P(A ∩ B)/ P(B) = (1/6) / (1/3) = 1/2
   P(A) = P(A│B) = 1/2 , which implies that the occurrence of event B
    has not affected the probability of occurrence of the event A .
   If A and B are independent events, then P(A│B) = P(A)
   Using the multiplication rule of probability, P(A ∩ B) = P(B) . P(A│B),
    so for independent events P(A ∩ B) = P(A) . P(B)
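The die example can be checked numerically. This sketch assumes a fair die and uses a hypothetical helper P that counts outcomes; it tests both forms of the independence condition.

```python
from fractions import Fraction

S = set(range(1, 7))   # one roll of a fair die
A = {1, 3, 5}          # the number appearing is odd
B = {3, 6}             # the number appearing is a multiple of 3

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

p_a_given_b = P(A & B) / P(B)           # conditional probability P(A|B)
independent = P(A & B) == P(A) * P(B)   # product rule for independent events
print(P(A), p_a_given_b, independent)
```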
  
Mutually Exclusive Events
Two events are said to be mutually exclusive if they
cannot occur at the same time or simultaneously.
 They are also called disjoint events.
 If two events are disjoint, then the probability
  of both events occurring at the same time is zero.
 For any two events A and B, the probability
  of getting A or B, that is P (A ∪ B), is given as follows:
 P (A ∪ B) = P(A) + P(B) – P (A and B)
 Here P (A and B) means P(A ∩ B). If A and B are mutually
  exclusive, P(A ∩ B) is zero, so P (A ∪ B) = P(A) + P(B)
 When tossing a coin, the event of getting head and tail are
  mutually exclusive.
 In a six-sided die, the events “2” and “5” are mutually exclusive.
Marginal Probability
 Marginal probability is the probability of an event occurring,
  p(A). It may be thought of as an unconditional probability.
 It is not conditioned on another event.
 Example: the probability that a card drawn is red
        (p(red) = 0.5).
 Another example: the probability that a card drawn is 4
        (p(four)=1/13).
Joint Probability
 It is the probability of two different events, A and
  B, occurring at the same time.
 It is the probability of the intersection of two or more
  events.
 The probability of the intersection of A and B may be
  written p(A ∩ B) or p(A and B).
 Example: the probability that a card is a four and red =
  p(four and red) = 2/52 = 1/26.
 (There are two red fours in a deck of 52, the 4 of
  hearts and the 4 of diamonds).
Joint Probability
Question 1 At an e-commerce customer service centre a
total of 112 complaints were received. 78 customers
complained about late delivery of the items and 40
complained about poor product quality.
  (a) Calculate the probability that a customer complaint will be
  about both late delivery and product quality.
  (b) What is the probability that a complaint is only about poor
  quality of the product?
  Solution
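  Let A = Late delivery and B = Poor quality of the product. Let
  n(A) and n(B) be the number of cases in favour of A and B. So
  n(A) = 78 and n(B) = 40. Since the total number of
  complaints is 112 (here the set of complaints is treated as the
  sample space) and every complaint concerns at least one of the
  two issues,
         n(A ∩ B) = n(A) + n(B) – n(A ∪ B) = 118 – 112 = 6
  (a) P(A ∩ B) = 6/112 ≈ 0.0536
  (b) Complaints only about poor quality = n(B) – n(A ∩ B) = 40 – 6 = 34,
  so the required probability is 34/112 ≈ 0.3036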
Conditional Probability
 If A and B are events in a sample space, then the
  conditional probability of the event B given that the event A
  has already occurred, denoted by P(B|A), is defined as:
        P(B|A) = P(A ∩ B) / P(A),  P(A) > 0
 The conditional probability symbol P(B|A) is read as the
  probability of B given A. It is necessary to satisfy the
  condition that P(A) > 0, because it does not make sense to
  consider the probability of B given that event A is
  impossible.
 For example, if P(Default ∩ Divorced) = 0.013 and
  P(Divorced) = 0.05, the conditional probability of default
  given divorced is
       P(Default|Divorced) = 0.013/0.05 = 0.26
  Similarly, with P(Default ∩ Single) = 0.042 and P(Single) = 0.3,
  the probability of default given single is
       P(Default|Single) = 0.042/0.3 = 0.14
APPLICATION OF SIMPLE PROBABILITY
RULES – ASSOCIATION RULE LEARNING
 We can use simple probability concepts such as joint probability and
  conditional probability to solve analytics problems such as market
  basket analysis and recommender systems using algorithms such as
  Association Rule Learning (aka Association Rule Mining).
 Association rule mining is one of the popular algorithms used to
  solve problems such as market basket analysis and recommender
  systems.
 Market basket analysis (MBA) is used frequently by retailers to
  predict products a customer is likely to buy together, which can
  further be used for designing planograms and product promotions. The
  primary objective of MBA is to find the probability of buying two
  products (A and B) together.
 Recommender systems are models that produce a list of
  recommendations to a customer on products such as books, movies,
  news items, etc. and are an important analytics technique.
Association Rule Learning
  In general, association rule learning (also known as
    association rule mining) is a method of finding association
    between different entities in a database.
   In a retail context, association rule learning is a method for
    finding association relationships that exist in frequently
    purchased items.
   Association rule is a relationship of the form X → Y (that is,
    X implies Y). Here, X and Y are two mutually exclusive sets
    (set of stock keeping units or SKUs).
   In Table 3.2, transaction ID is the transaction reference
    number and apple, orange, etc. are the different SKUs sold by
    the store. Binary code is used to represent whether the SKU
    was purchased (equal to 1) or not (equal to 0) during a
    transaction. The strength of association between two
    mutually exclusive subsets can be measured using ‘support’,
    ‘confidence’ and ‘lift’.
Association Rule Learning
 Support between two sets (of products purchased) is
  calculated using the joint probability of those events:
        Support = P(X ∩ Y) = n(X ∩ Y) / N
 where n(X ∩ Y) is the number of times both X and Y are
  purchased together and N is the total number of transactions.
  That is, support is the proportion of times X and Y are purchased
  together.
 Confidence is the conditional probability of purchasing
  product Y given that product X is purchased:
        Confidence = P(Y|X) = P(X ∩ Y) / P(X)
 The third measure in association rule mining is lift, which is
  given by
        Lift = P(X ∩ Y) / (P(X) P(Y))
 Lift overcomes one of the disadvantages of using confidence.
  For example, P(X) could be very small, making it less
  attractive for MBA and recommendation among millions of
Association Rule Learning
 In Table 3.2, assume that X = Apple and Y = Banana. Then
 Association rules can be generated based on threshold
  values of support, confidence and lift. For example, assume
  that the cut-off for support is 0.25 and for confidence is 0.5 (lift
  should be greater than 1). Then we can conclude that X
  implies Y, that is, purchase of apple implies purchase of
  banana. However, this rule will be ineffective since lift is
  less than 1.
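Support, confidence and lift can be computed directly from a list of transactions. The sketch below does not use Table 3.2, which is not reproduced here: the eight transactions are hypothetical, chosen only so that the apple → banana rule passes the support and confidence cut-offs while its lift stays below 1. The helper names prob and rule_metrics are also assumptions.

```python
# Hypothetical transactions (NOT the data of Table 3.2).
transactions = [
    {"apple", "banana"},
    {"apple", "orange"},
    {"banana", "grapes"},
    {"apple", "banana", "orange"},
    {"orange"},
    {"apple", "banana", "grapes"},
    {"banana"},
    {"apple", "orange", "grapes"},
]

def prob(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(x, y):
    """Support, confidence and lift of the rule X -> Y."""
    support = prob(x | y)                  # P(X and Y bought together)
    confidence = support / prob(x)         # P(Y | X)
    lift = support / (prob(x) * prob(y))   # P(X and Y) / (P(X) P(Y))
    return support, confidence, lift

support, confidence, lift = rule_metrics({"apple"}, {"banana"})
print(support, confidence, round(lift, 2))

# Apply the cut-offs quoted in the text.
passes_cutoffs = support >= 0.25 and confidence >= 0.5
effective = passes_cutoffs and lift > 1
print(passes_cutoffs, effective)
```

A lift below 1 means that apple buyers purchase banana less often than customers in general, which is why the rule is treated as ineffective even though it passes the other two cut-offs.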
Bayes’ Theorem
 It describes the probability of an event, based on prior
  knowledge of conditions that might be related to that
  event.
 It is used when the probability of a particular event must be
  calculated given that other, related conditions have
  occurred, that is, when a conditional probability is required.
 For example: there are 3 bags, each containing some
  white marbles and some black marbles. A marble is drawn
  at random and turns out to be white. To find the probability
  that this white marble came from the first bag, we use
  Bayes’ theorem.
Bayes’ Theorem
 Bayes’ theorem is one of the most important concepts in analytics since several
  problems are solved using Bayesian statistics. Consider two events A and B. We can
  write the following two conditional probabilities:
        P(A|B) = P(A ∩ B) / P(B)   and   P(B|A) = P(A ∩ B) / P(A)
 Using the two equations, we can show that
        P(B|A) = P(A|B) P(B) / P(A)        (3.13)
 Equation (3.13) is the Bayes’ theorem. Bayes’ theorem helps data scientists to
  update the probability of an event (B) when additional information is provided.
  This makes Bayesian statistics a very attractive technique since it helps the
  decision maker to fine-tune their belief with every additional piece of data that is
  received.
 The following terminologies are used to describe various components in Eq.
(3.13).
   P(B) is called the prior probability (estimate of the probability without any additional
     information).
   P(B|A) is called the posterior probability (that is, given that the event A has
     occurred, what is the probability of occurrence of event B). That is, post the
     additional information (or additional evidence) that A has occurred, what is
     estimated probability of occurrence of B.
   P(A|B) is called the likelihood of observing evidence A if B is true.
Generalization of Bayes’ Theorem
 The probability of evidence P(A) may come from mutually
  exclusive subsets (events) as described in Figure 3.2.
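 For better understanding, consider a part manufactured by
  different suppliers B1 , B2 , ..., Bn . Let A denote a
  defective part. P(A) can be written as:
        P(A) = P(A|B1) P(B1) + P(A|B2) P(B2) + ... + P(A|Bn) P(Bn)
 Bayes’ theorem then gives, for each supplier Bi,
        P(Bi|A) = P(A|Bi) P(Bi) / P(A)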
Bayes’ Theorem-example
There are three urns containing 3 white and 2 black balls; 2
white and 3 black balls; and 4 white and 1 black ball, respectively.
Each urn has an equal probability of being chosen. One
ball is then drawn at random from the chosen urn. What is the
probability that a white ball is drawn?
 Let E1, E2, and E3 be the events of choosing the first,
  second, and third urn respectively. Then,
 P(E1) = P(E2) = P(E3) =1/3
 Let E be the event that a white ball is drawn. Then,
 P(E|E1) = 3/5, P(E|E2) = 2/5, P(E|E3) = 4/5
 By theorem of total probability, we have
 P(E) = P(E|E1) . P(E1) + P(E|E2) . P(E2) + P(E|E3) . P(E3)
     = (3/5 * 1/3) + (2/5 * 1/3) + (4/5 * 1/3)
     = 9/15 = 3/5
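The urn calculation can be reproduced with exact fractions. The sketch below also goes one step beyond the question asked and applies Bayes’ theorem to find which urn a white ball most likely came from; that extension and the dictionary names are assumptions added for illustration.

```python
from fractions import Fraction

# Prior probability of choosing each urn
priors = {"urn1": Fraction(1, 3), "urn2": Fraction(1, 3), "urn3": Fraction(1, 3)}
# Likelihood of drawing a white ball from each urn
likelihood_white = {"urn1": Fraction(3, 5), "urn2": Fraction(2, 5), "urn3": Fraction(4, 5)}

# Theorem of total probability: P(E) = sum of P(E|Ei) * P(Ei)
p_white = sum(likelihood_white[u] * priors[u] for u in priors)

# Bayes' theorem: P(Ei|E) = P(E|Ei) * P(Ei) / P(E)
posterior = {u: likelihood_white[u] * priors[u] / p_white for u in priors}

print(p_white)
for urn, p in posterior.items():
    print(urn, p)
```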
Bayes’ Theorem- example
At an electronics plant, it is known from past
 experience that the probability is 0.83 that a new
 worker who has attended the company’s training
 program will meet the production quota and that
 the corresponding probability is 0.35 for a new
 worker who has not attended the company’s
 training program. If 80% of all new workers
 attend the training program, what is the
 probability that a new worker will meet the
 production quota? Also, find the probability that a
 new worker who meets the production quota will
 have attended the company’s training program.
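One way to work this example is to apply the theorem of total probability for the first part and Bayes’ theorem for the second. The sketch below uses only the three figures given in the question; the variable names are hypothetical.

```python
p_trained = 0.80                 # P(attended training)
p_quota_given_trained = 0.83     # P(meets quota | attended)
p_quota_given_untrained = 0.35   # P(meets quota | did not attend)

# Total probability: P(quota) = P(quota|T) P(T) + P(quota|not T) P(not T)
p_quota = (p_quota_given_trained * p_trained
           + p_quota_given_untrained * (1 - p_trained))

# Bayes' theorem: P(T|quota) = P(quota|T) P(T) / P(quota)
p_trained_given_quota = p_quota_given_trained * p_trained / p_quota

print(round(p_quota, 3), round(p_trained_given_quota, 3))  # 0.734 0.905
```

So about 73.4% of new workers meet the quota, and about 90.5% of those who meet it attended the training program.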
Bayes’ Theorem- example
60% of the companies that increased their share
 price by more than 5% in the last three years
 replaced their CEOs during the period.
At the same time, only 35% of the companies that
 did not increase their share price by more than
 5% in the same period replaced their CEOs.
 Knowing that the probability that the stock prices
 grow by more than 5% is 4%, find the probability
 that the shares of a company that fires its CEO
 will increase by more than 5%.
Bayes’ Theorem- example
Before finding the probabilities, you must first
define the notation of the probabilities.
• P(A) – the probability that the stock price
  increases by more than 5%
• P(B) – the probability that the CEO is replaced
• P(A|B) – the probability that the stock price
  increases by more than 5% given that the CEO has been
  replaced
• P(B|A) – the probability of the CEO replacement
  given the stock price has increased by more than 5%.
Using the Bayes’ theorem, we can find the
required probability:
  P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|not A) P(not A)]
         = (0.60 x 0.04) / (0.60 x 0.04 + 0.35 x 0.96)
         = 0.024 / 0.36 ≈ 0.0667
Thus, the probability that the shares of a company
 that replaces its CEO will grow by more than 5% is
 6.67%.
Random Variable
 A variable is defined as any symbol that can take any
  particular set of values.
 If the value of a variable depends upon the outcome of
  a random experiment, it is a random variable; it takes
  real values.
 Such an experiment, where we know the set of all possible
  results but find it impossible to predict the result of any
  particular execution, is a random experiment.
 Mathematically, a random variable is a real-valued function
  whose domain is a sample space S of a random experiment.
 A random variable is conventionally denoted by a capital letter like
  X, Y, M.
 Lowercase letters like x, y, z, m etc. represent its values.
Random Variable
 P(X) denotes the probability distribution of the random variable
  X.
 P(X=x) denotes the probability that the random variable X
  takes a particular value, represented by x.
 Experiment: tossing a coin 2 times.
 Sample space (S) is {HH, TH, HT, TT}.
 X (random variable) is the number of heads when we
  toss a coin 2 times.
 Each outcome has probability 0.25: P(HH)=0.25, P(TH)=0.25,
  P(HT)=0.25, P(TT)=0.25
 Then, X(HH) = 2, X(TH) = 1, X(HT) = 1, X(TT) = 0, so that
  P(X=0) = 0.25, P(X=1) = 0.5 and P(X=2) = 0.25.
Random Variable
Since there are two forms of data, discrete and
 continuous, there are two types of random
 variables:
   Discrete Random Variable
   Continuous Random variable
Discrete random variable
 If the random variable X can assume only a finite or countably
  infinite set of values, then it is called a discrete random variable.
Examples of discrete random variables are:
 Credit rating (usually classified into different categories such as
  low, medium and high or using labels such as AAA, AA, A, BBB,
  etc.).
 Number of orders received at an e-commerce retailer which can
  be countably infinite.
 Customer churn [the random variable takes binary values: (a)
  Churn and (b) Do not churn].
 Fraud [the random variable takes binary values: (a) Fraudulent
  transaction and (b) Genuine transaction].
 Any experiment that involves counting (for example, number of
  returns in a day from customers of e-commerce portals such as
  Amazon, Flipkart; number of customers not accepting job offers
Continuous random variable
 A random variable X which can take any value from an uncountably
  infinite set of values, such as an interval of real numbers, is
  called a continuous random variable.
Examples of continuous random variables are listed below:
 Market share of a company (which can take any value from an
  infinite set of values between 0 and 100%).
 Percentage of attrition among employees of an organization.
 Time to failure of engineering systems.
 Time taken to complete an order placed at an e-commerce
  portal.
 Time taken to resolve a customer complaint at call and service
  centers.
 Height, Weight, Amount of rainfall, etc.
In many situations, a continuous variable may be converted to a
discrete random variable for modelling purpose.
Probability Distributions
 A probability distribution is a function that gives the
  probabilities of all possible values of a random variable.
 For any event of a random experiment, we can find its
  corresponding probability.
 For different values of the random variable, we can find
  the respective probabilities.
 The values of a random variable along with the
  corresponding probabilities are the probability
  distribution of the random variable.
 A discrete probability distribution is described using a
  probability mass function and a cumulative distribution function.
 A continuous probability distribution is described using a
  probability density function and a cumulative distribution
  function.
Discrete Random Variables
 The probability distribution of a discrete random variable is
  a list of probabilities associated with each of its possible
  values.
 It is also sometimes called the probability function or the
  probability mass function.
 More formally, the probability distribution of a discrete
  random variable X is a function which gives the probability
  p(xi) that the random variable equals xi, for each value xi:
 p(xi) = P(X=xi)
 It satisfies the following conditions:
• 0 <= p(xi) <= 1
• sum of all p(xi) is 1
Discrete Random Variables
 Consider a random variable X= number of heads after tossing
  a coin thrice.
 x ∈ {0,1,2,3}.
 All the possible outcomes after a coin is flipped thrice are,
  {HHH,HHT,HTT,TTT,TTH,THH,THT,HTH}.
 What will be the probability that 0 heads occur?
 We denote it as P(X=0)=1/8=0.125
 Probability of getting exactly 1 head = P(X=1) = 3/8 = 0.375
 P(X=2) = 3/8 = 0.375
 P(X=3) = 1/8 = 0.125
 If we sum up the probabilities of all outcomes, it will be equal
  to one. This gives us the Probability Distribution of that
  random variable.
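The distribution above can be obtained by listing the eight outcomes and counting heads in each. This is a small sketch assuming a fair coin, so that every outcome is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All outcomes of three tosses, e.g. "HHT"
outcomes = ["".join(tosses) for tosses in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X
heads_count = Counter(outcome.count("H") for outcome in outcomes)

# PMF: P(X = x) = (outcomes with x heads) / (total outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(heads_count.items())}

for x, p in pmf.items():
    print(x, p, float(p))
print("total:", sum(pmf.values()))
```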
Probability Mass Function(PMF) of Discrete
Random Variable
 In the case of Discrete Random Variables, the function that
  denotes the probability of the random variable for each x in
  the range of X is known as the Probability Mass
  Function(PMF).
 It can be shown using tables or graph or mathematical
  equation.
  Probability Distribution Function(PDF)
  of Discrete Random Variable
 In case of rolling of a die, the probability of each value X can
  take is the same. So the probability distribution in this case will
  be:
  P(X=1) = 1/6, P(X=2) = 1/6 and so on.
 Note that the values of x take on all possible cases. And the sum
  of the probabilities add to 1.
Probability Distribution Function(PDF)
of Discrete Random Variable
 Mathematically, this can be written as f(x) = P(X = x).
 The set of ordered pairs (x, f(x)) is called the probability
  function, probability     mass     function or probability
  distribution function of the discrete random variable X.
 f(x) is considered a probability mass function if it satisfies
  the following conditions:
   f(x) ≥ 0 for every value x
   the sum of f(x) over all values x is 1
  Example
Cumulative distribution Function(CDF) of
Discrete Random Variable
 Many times we may wish to compute the
  probability that the random variable X is less than or
  equal to some real number x.
 Writing F(x) = P(X ≤ x) for every real number x, we define
  F(x) to be the cumulative distribution function of the
  random variable X.
Example 1
Cumulative distribution function, F(xi), is the probability that the
random variable X takes values less than or equal to xi. That is,
F(xi) = P(X ≤ xi).
 F(2) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.60
Example 2
The probability that X is less than or equal to 1 is 0.1.
Similarly, probability of X is less than or equal to 2 is (0.1+0.3)
=0.4 and so on.
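A discrete CDF is a running total of the PMF. The tables for the two examples above are not reproduced here, so this sketch uses the PMF of the number of heads in three tosses of a fair coin as an assumed input; the function name F mirrors the notation F(x) = P(X ≤ x).

```python
from itertools import accumulate

# Assumed PMF: number of heads in three tosses of a fair coin
values = [0, 1, 2, 3]
pmf = [0.125, 0.375, 0.375, 0.125]

# Running total of the PMF gives the CDF at each possible value
cdf = dict(zip(values, accumulate(pmf)))
print(cdf)

def F(x):
    """P(X <= x) for any real x: sum the PMF over values not exceeding x."""
    return sum(p for v, p in zip(values, pmf) if v <= x)

print(F(2), F(1.5), F(-1))
```

Note that F is defined for every real number, not only for the values X can take: F(1.5) equals F(1) because X cannot fall strictly between 1 and 2.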
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
 The probability of a continuous random variable assuming
  exactly any of its values is 0.
 Hence, the probability distribution for a continuous random
  variable cannot be given in tabular form.
 The probability density function of a continuous random variable
  is a function which can be integrated to obtain the probability
  that the random variable takes a value in a given interval.
 The probability for a continuous random variable is always
  computed over an interval: P(a ≤ X ≤ b).
 The probability distribution of a continuous random variable can be
  stated as a formula; and f(x) is called the probability density function,
  or simply a density function, of X.
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
 Here probability is given by the area under the
  curve over an interval. To find the probability of a certain
  interval, say a to b, we find the area under the curve by
  integrating the PDF over that interval:
        P(a ≤ X ≤ b) = ∫_a^b f(x) dx
 f(x) is the probability density function of X.
 And the cumulative distribution function F(x) of a
  continuous random variable is given by:
        F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
PROBABILITY DENSITY FUNCTION (PDF) AND CUMULATIVE
DISTRIBUTION FUNCTION (CDF) OF A CONTINUOUS RANDOM
VARIABLE
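 Probability density function and cumulative distribution
  function of a continuous random variable satisfy the
  following properties:
   f(x) ≥ 0 for all x
   ∫_{−∞}^{∞} f(x) dx = 1
   F(x) is non-decreasing, with F(−∞) = 0 and F(∞) = 1
   P(a ≤ X ≤ b) = F(b) – F(a)
   f(x) = dF(x)/dx wherever the derivative exists
 The expected value of a continuous random variable, E(X),
  is given by
        E(X) = ∫_{−∞}^{∞} x f(x) dx
 The variance of a continuous random variable, Var(X), is
  given by
        Var(X) = ∫_{−∞}^{∞} (x – E(X))² f(x) dx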
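These integrals can be approximated numerically. The sketch below is an illustration under stated assumptions: it uses an exponential density with an arbitrarily chosen rate of 0.5, a simple midpoint rule written by hand, and a finite upper limit standing in for infinity. None of these choices come from the text.

```python
import math

LAM = 0.5  # assumed rate parameter, chosen only for illustration

def f(x):
    """Density of an exponential random variable with rate LAM (zero for x < 0)."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

UPPER = 60.0  # stands in for infinity; the tail beyond it is negligible here

total_area = integrate(f, 0, UPPER)                        # should be close to 1
p_1_to_3 = integrate(f, 1, 3)                              # P(1 <= X <= 3)
mean = integrate(lambda x: x * f(x), 0, UPPER)             # E(X)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), 0, UPPER)  # Var(X)

print(round(total_area, 4), round(p_1_to_3, 4))
print(round(mean, 4), round(variance, 4))
```

For this density the exact values are known (E(X) = 1/LAM and Var(X) = 1/LAM²), which gives a quick check on the numerical results.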
Probability Distribution
Summary
Depending on the type of random variable, a
  probability distribution can be categorized into:
1. Discrete probability distributions
2. Continuous probability distributions
Bernoulli Distribution
 This distribution arises when we perform an experiment
  once and it has only two possible outcomes – success and
  failure.
 Trials of this type are called Bernoulli trials, which form
  the basis for many distributions.
 Let p be the probability of success and 1 – p the probability
  of failure.
 The PMF is given as
        P(X = x) = p^x (1 – p)^(1 – x),  x ∈ {0, 1}
  where x = 1 denotes success and x = 0 denotes failure.
Examples:
 flipping a coin once. p is the probability of getting a head and
  1 – p is the probability of getting a tail.
 Will you pass or fail a test?
 Will your favourite sports team win or lose their next match?
 Will you be accepted or rejected for that job you applied for?
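A Bernoulli trial is easy to express in code. In this sketch the PMF returns p for a success (x = 1) and 1 – p for a failure (x = 0); the fair coin, the seed and the number of simulated trials are assumptions for illustration.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli trial with success probability p."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if x not in (0, 1):
        return 0.0            # only 0 (failure) and 1 (success) are possible
    return p if x == 1 else 1 - p

P_HEAD = 0.5  # assumed fair coin
print(bernoulli_pmf(1, P_HEAD), bernoulli_pmf(0, P_HEAD))

# Simulated trials: the share of successes should be close to p.
rng = random.Random(7)  # fixed seed so the run is repeatable
trials = [1 if rng.random() < P_HEAD else 0 for _ in range(20_000)]
share_of_heads = sum(trials) / len(trials)
print(round(share_of_heads, 3))
```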
Binomial Distribution
Binomial distribution is one of the most important discrete
probability distributions due to its applications in several contexts.
A random variable X is said to follow a Binomial distribution when
1. Each trial can have only two outcomes, success and failure
(such trials are also known as Bernoulli trials).
2. The objective is to find the probability of getting k successes out of n
trials.
3. The probability of success is p and thus the probability of failure is (1 −
p).
4. The probability p is constant and does not change between trials, and
the trials are independent of each other.
Success and failure are generic terminologies used in binomial
distribution; based on the context, the interpretation will change (winning
a lottery can be considered as success and not winning as failure).
Binomial Distribution
In analytics, the following are a few example problems that can
be associated with the binomial distribution:
 Customer churn where the outcomes are: (a) Customer churn and (b) No
  customer churn.
 Fraudulent insurance claims where the outcomes are: (a) Fraudulent
  claim and (b) Genuine claim.
 Loan repayment default by a customer where the outcomes are: (a)
  Default and (b) No default.
 Cart abandonment in e-commerce (a situation where the customer adds
  items to his/her cart but does not make the purchase), where the
  outcomes are: (a) Cart abandonment and (b) No cart abandonment.
 Employee attrition at a company where the outcomes are: (a) The
  employee leaves (exits) the company and (b) The employee does not
  leave the company.
Any business context in which each trial has only two outcomes can be
analysed using the binomial distribution.
Binomial Distribution
Examples:
 Flipping a coin n times and calculating the
  probability of getting a particular number of heads.
 More real-world examples include the number of successful
  sales calls for a company, or the number of patients for whom
  a drug works.
 Number of winning lottery tickets when you buy 10 tickets
  of the same kind.
 Number of left-handers in a randomly selected sample of
  100 unrelated people.
Binomial Distribution
 For example, suppose we shuffle a standard deck of cards,
  and we turn over the top card. We put the card back in the
  deck and reshuffle. We repeat this process five times. Let X
  equal the number of Jacks we observe. Is this a binomial
  distribution?
 B – binary – yes, either it’s a Jack or it isn’t
 I – independent – yes, because we replace the card each
  time, the trials are independent.
 N – number of trials fixed in advance – yes, we are told to
  repeat the process five times.
 S – successes (probability of success) are the same – yes, the
  likelihood of getting a Jack is 4 out of 52 each time you turn
  over a card.
 Therefore, this is an example of a binomial distribution.
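Because the five draws satisfy the binomial conditions, the distribution of the number of Jacks can be tabulated directly. This is a minimal sketch, assuming the standard binomial PMF C(n, k) p^k (1 − p)^(n − k); the function name binomial_pmf is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_draws = 5        # the card is turned over five times
p_jack = 4 / 52    # four Jacks in a 52-card deck, same on every draw

# Probability of seeing 0, 1, ..., 5 Jacks.
distribution = [binomial_pmf(k, n_draws, p_jack) for k in range(n_draws + 1)]

# Probability of at least one Jack is the complement of seeing none.
prob_at_least_one = 1 - distribution[0]

for k, prob in enumerate(distribution):
    print(f"P(X = {k}) = {prob:.5f}")
print(f"P(X >= 1) = {prob_at_least_one:.5f}")
```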
 Suppose Charlie makes a free throw with probability
  0.82 on any one try. Assuming that this probability doesn’t
  change, find the chance that Charlie makes 4 out of the next
  seven free throws.
 Let’s also determine the number of free throws
  Charlie should expect to make and the standard
  deviation.
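The free-throw question can be worked through numerically. The sketch below assumes the standard binomial results: P(X = k) = C(n, k) p^k (1 − p)^(n − k), mean np, and standard deviation √(np(1 − p)). The variable names are illustrative.

```python
from math import comb, sqrt

n_shots = 7      # number of free throws attempted
p_make = 0.82    # probability of making any single free throw
k_made = 4       # number of successes we are asking about

# Probability of exactly 4 makes out of 7.
prob_four = comb(n_shots, k_made) * p_make ** k_made * (1 - p_make) ** (n_shots - k_made)

# Expected number of makes and its standard deviation.
expected = n_shots * p_make
std_dev = sqrt(n_shots * p_make * (1 - p_make))

print(f"P(X = 4) = {prob_four:.4f}")   # 0.0923
print(f"E(X)     = {expected:.2f}")    # 5.74
print(f"SD(X)    = {std_dev:.4f}")     # 1.0165
```

So Charlie making exactly 4 of 7 is fairly unlikely (about 9%), because the expected number of makes is well above 4.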
Example (worked solution shown as a figure)
Poisson Distribution
The Poisson distribution describes the number of events that occur in a fixed
 interval of time or space.
 Examples:
Consider the number of calls received by
 a customer care center per hour. We can estimate the
 average number of calls per hour, but we cannot
 determine the exact number or the exact time at
 which a call arrives. Each occurrence of an event is
 independent of the other occurrences.
The PMF is given as
 P(X = x) = (e^(−λ) λ^x) / x!,  x = 0, 1, 2, ...
where λ is the average number of times the event
 occurs in a given interval of time or space,
x is the number of occurrences whose probability we want
 (a value of the Poisson random variable X),
and e is Euler’s number, the base of the natural logarithm,
 e = 2.71828 (approx).
Properties of Poisson
Distribution
Occurrences of the event are independent within an
 interval.
Any number of occurrences of the event, with no upper limit,
 is possible in the interval.
The probability of a single event in a short interval is
 proportional to the length of the interval.
In an infinitely small portion of the interval, the
 probability of more than one occurrence of the event is
 negligible.
The Poisson distribution is the limiting form of the binomial
 distribution when the number of trials n is indefinitely large
 and the probability of success p is small, with np = λ held fixed.
If the mean is large, then the Poisson distribution is
 approximately a normal distribution.
Poisson Distribution
In the Poisson distribution, the mean is
 μ = E(X) = λ.
The mean and the variance of the Poisson
 distribution are equal: E(X) = V(X) = λ,
where V(X) is the variance.
The standard deviation is therefore always equal to the
 square root of the mean: σ = √λ.
Applications of Poisson Distribution
 • To count the number of defects of a finished
     product
 •   To count the number of deaths in a country
     by any disease or natural calamity
 •   To count the number of infected plants in the
     field
 •   To count the number of bacteria in the
     organisms or the radioactive decay in atoms
 •   To model the waiting time between
     events (through the related exponential distribution).
Poisson Distribution-Example
In a cafe, customers arrive at a mean rate of 2 per
 minute. Find the probability that exactly 5 customers arrive in 1
 minute using the Poisson distribution formula.
Solution:
Given: λ = 2 and x = 5.
Using the Poisson distribution formula:
P(X = x) = (e^(−λ) λ^x) / x!
P(X = 5) = (e^(−2) × 2^5) / 5!
P(X = 5) = 0.036
Answer: The probability that exactly 5 customers arrive
 in one minute is about 3.6%.
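The cafe calculation can be reproduced in a few lines. This is a minimal sketch using only the standard library; the function name poisson_pmf is illustrative. It also checks numerically that the mean and variance both equal λ, truncating the infinite sum at 60 terms (an assumption that is harmless for λ = 2, since the remaining terms are vanishingly small).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2  # mean arrivals per minute

# Probability of exactly 5 arrivals in one minute.
prob_five = poisson_pmf(5, lam)
print(f"P(X = 5) = {prob_five:.3f}")  # 0.036

# Mean and variance computed from the PMF; both should be close to lam.
support = range(60)
mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```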
Poisson Distribution-Example
Find the value of the probability mass function at x = 6, if
 the mean is 3.4.
Solution:
Given: λ = 3.4 and x = 6.
Using the Poisson distribution formula:
P(X = x) = (e^(−λ) λ^x) / x!
P(X = 6) = (e^(−3.4) × 3.4^6) / 6!
P(X = 6) = 0.072
Answer: The probability that X = 6 is about 7.2%.
Poisson Distribution-Example
 Suppose 3% of the electronic units manufactured by a company are
  defective. Find the probability that in a sample of 200 units,
  fewer than 2 units are defective.
 Solution:
 The probability that a unit is defective is p = 3/100 = 0.03.
 Given n = 200.
 Here p is small and n is large, so the binomial distribution
  can be approximated by a Poisson distribution.
 Mean λ = np = 200 × 0.03 = 6
 P(X = x) is given by the Poisson distribution formula as (e^(−λ) λ^x) / x!
 P(X < 2) = P(X = 0) + P(X = 1)
 = (e^(−6) × 6^0) / 0! + (e^(−6) × 6^1) / 1!
 = e^(−6) + e^(−6) × 6
 = 0.00248 + 0.01487
 P(X < 2) = 0.01735
 Answer: The probability that fewer than 2 units are defective
  is about 0.0174 (1.74%).
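Since this example uses the Poisson distribution as an approximation to a binomial with large n and small p, it is instructive to compare the two. The sketch below computes P(X < 2) both ways, using the standard binomial PMF for the exact value; the function and variable names are illustrative.

```python
from math import comb, exp, factorial

n_units = 200
p_defective = 0.03
lam = n_units * p_defective  # Poisson mean, lambda = n * p

def poisson_pmf(x, lam):
    """P(X = x) under the Poisson approximation."""
    return exp(-lam) * lam ** x / factorial(x)

def binomial_pmf(k, n, p):
    """P(X = k) under the exact binomial model."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(X < 2) = P(X = 0) + P(X = 1) under each model.
poisson_estimate = sum(poisson_pmf(x, lam) for x in range(2))
binomial_exact = sum(binomial_pmf(k, n_units, p_defective) for k in range(2))

print(f"Poisson approximation: {poisson_estimate:.5f}")
print(f"Exact binomial:        {binomial_exact:.5f}")
```

The two values are close but not identical; the gap narrows as n grows and p shrinks with np held fixed.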
Continuous probability distributions
  These distributions model the probabilities of random
   variables that can take any real value within a range.
  Two functions are
   associated with such continuous random variables:
  Probability Density Function (PDF)
  Cumulative Distribution Function (CDF)
  For example, the random
   variable X that represents the weights of citizens in a town
   can take any value such as 34.5, 47.7, etc.
  Examples: Normal, Student’s t, Chi-square,
   Exponential, etc.
Probability Density function
The probability density function (PDF) describes the relative
 likelihood of each value of a continuous random variable. The
 probability that the variable lies within a particular range
 of values is the area under the PDF over that range.
Cumulative Distribution Function
The Cumulative Distribution Function of X,
 evaluated at x is the probability that X will
 take a value less than or equal to x.
 Continuous probability distributions
When working with a continuous random variable
 X, we only calculate the probability
 that X lies within a certain interval,
such as P(X ≤ k) or P(a ≤ X ≤ b).
We don't calculate the probability of X being
 equal to a specific value k.
In fact, P(X = k) = 0 is always true.
This is because a continuous random
 variable X has infinitely many possible values, and a single
 point has zero width, so the area under the PDF at that point is 0.
Continuous probability distributions
The idea is to integrate the probability density
 function f(x) to define a new function F(x),
 known as the cumulative distribution function.
To calculate the probability that X lies within a
 certain range,
say a ≤ X ≤ b, we calculate F(b) − F(a) using
 the cumulative distribution function.
Put simply, we calculate probabilities as:
P(a ≤ X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx
where f(x) is the variable's probability density
 function.
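The equivalence between F(b) − F(a) and the area under the density can be checked numerically. The sketch below assumes, purely for illustration, an exponential density with rate 0.5 and the interval [1, 3]; neither value comes from the slides, and the helper names are hypothetical.

```python
from math import exp

RATE = 0.5  # assumed rate parameter, chosen only for illustration

def pdf(x):
    """Exponential density f(x) = RATE * e^(-RATE * x) for x >= 0."""
    return RATE * exp(-RATE * x) if x >= 0 else 0.0

def cdf(x):
    """Exponential CDF F(x) = 1 - e^(-RATE * x) for x >= 0."""
    return 1.0 - exp(-RATE * x) if x >= 0 else 0.0

def area_under_pdf(a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of pdf over [a, b]."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

a, b = 1.0, 3.0
via_cdf = cdf(b) - cdf(a)             # F(b) - F(a)
via_integral = area_under_pdf(a, b)   # area under f between a and b

# An interval of zero width has zero area, so P(X = b) = 0.
point_probability = cdf(b) - cdf(b)

print(f"F(b) - F(a) = {via_cdf:.6f}")
print(f"integral    = {via_integral:.6f}")
print(f"P(X = b)    = {point_probability}")
```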
Normal Distribution
It has two parameters, namely the mean and the standard
 deviation.
The density is highest at the mean, and all other
 values are distributed equally on either side of the
 mean in a symmetric fashion.
The standard normal distribution is a special case
 where the mean is 0 and the standard deviation is
 1.
About 68% of the values lie within 1 standard deviation of the mean,
 95% within 2 standard deviations,
 and 99.7% within 3 standard deviations.
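The 68%, 95% and 99.7% figures can be verified with the error function from the standard library. This sketch relies on the identity P(−k ≤ Z ≤ k) = erf(k / √2) for a standard normal Z; the function name prob_within is illustrative.

```python
from math import erf, sqrt

def prob_within(k):
    """P(-k <= Z <= k) for a standard normal Z, via the error function."""
    return erf(k / sqrt(2))

# Prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

Because any normal variable can be standardized, the same proportions hold for every mean and standard deviation.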
Normal Distribution
 The standard normal distribution is a special form of the
 normal distribution: a normal random variable with a mean
 equal to zero and a standard deviation equal to one.
 It is centred at zero, and the standard deviation measures how far a
 given measurement deviates from the mean.
 A Z score represents how many standard deviations an
 observation is away from the mean. The mean of the standard
 normal distribution is 0. Z scores above the mean are positive
 and Z scores below the mean are negative.
 Once you have computed a Z-score, you can look up the
 probability in a table for the standard normal distribution.
Standard Normal Distribution
 The random variable of a standard normal distribution is known as
  the standard score or z-score. Every normal random variable X
  can be transformed into a z-score using the following
  formula:
 z = (X − μ) / σ
 where X is a normal random variable, μ is the mean of X, and σ is
  the standard deviation of X. In probability theory, the normal or
  Gaussian distribution is a very common continuous probability
  distribution.
 Standardizing a normal distribution: when you standardize a normal
  distribution, the mean becomes 0 and the standard deviation
  becomes 1.
 This allows you to easily calculate the probability of certain
  values occurring in your distribution, or to compare data sets
  with different means and standard deviations.
Normal Distribution
The PDF is given by
f(x) = (1 / (σ √(2π))) e^(−(x − μ)^2 / (2σ^2))
where μ is the mean of the random variable X and σ is the standard
deviation.
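The normal density and the z-score lookup can be sketched together. The numbers below (mean 70, standard deviation 10, observation 85) are hypothetical and not from the slides. NormalDist from Python's statistics module stands in for the printed Z table, and normal_pdf is an illustrative name for the density formula.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 70.0, 10.0   # hypothetical mean and standard deviation
x = 85.0                 # hypothetical observation

# Standardize: how many standard deviations x lies above the mean.
z = (x - mu) / sigma

# P(X <= 85), read from the standard normal CDF at z (the "Z table" step).
prob_below = NormalDist(0, 1).cdf(z)

# The same probability computed without standardizing first.
prob_below_direct = NormalDist(mu, sigma).cdf(x)

density = normal_pdf(x, mu, sigma)

print(f"z = {z}")                          # 1.5
print(f"P(X <= 85) = {prob_below:.4f}")    # 0.9332
print(f"f(85) = {density:.5f}")
```

Standardizing first and then using the standard normal CDF gives the same answer as working with the original mean and standard deviation, which is why a single Z table is enough.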
[Figures: worked examples and the Z table]
Exponential Distribution
The exponential distribution models the waiting time until the
  next event in a Poisson process (i.e., success,
  failure, arrival, etc.).
For example, we want to predict the following:
• The amount of time until the customer finishes
  browsing and actually purchases something in
  your store (success).
• The amount of time until the hardware on AWS
  EC2 fails (failure).
• The amount of time you need to wait until the bus
  arrives (arrival).
  Exponential Distribution
The PDF is given by
f(x) = λ e^(−λx), for x ≥ 0
where λ is the rate parameter: λ = 1/(average time between
events) = 1/μ,
and e = 2.71828 (approx).
The mean of the exponential distribution is 1/λ.
The variance of the exponential distribution is 1/λ^2.
 Exponential Distribution
For example, suppose the mean time
 between eruptions of a certain geyser is 40 minutes.
 If the geyser has just erupted, what is the probability that
 we’ll have to wait less than 50 minutes for the next
 eruption?
To solve this, we first calculate the rate parameter:
λ = 1/μ = 1/40 = 0.025
Then plug λ = 0.025 and x = 50 into the formula for the
 CDF:
P(X ≤ x) = 1 − e^(−λx), so P(X ≤ 50) = 1 − e^(−0.025 × 50)
P(X ≤ 50) = 0.7135
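The geyser calculation is easy to reproduce. A minimal sketch using the exponential CDF from the example; the function name exponential_cdf is illustrative.

```python
from math import exp

def exponential_cdf(x, lam):
    """P(X <= x) for an exponential waiting time with rate lam."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0

mean_wait = 40.0          # mean minutes between eruptions
rate = 1.0 / mean_wait    # rate parameter, lambda = 1 / mean

prob_within_50 = exponential_cdf(50.0, rate)
print(f"P(X <= 50) = {prob_within_50:.4f}")  # 0.7135
```

The units of the rate and the waiting time must match: here both are expressed in minutes.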
Exponential Distribution
 Assume that you usually get 2 phone calls per hour.
  Calculate the probability that a phone call will come
  within the next hour.
 Solution:
 It is given that there are 2 phone calls per hour,
  so we expect one phone call every half
  an hour on average (μ = 0.5 hours).
 With time measured in hours, the rate is λ = 1/μ = 1/0.5 = 2 per hour.
 So, the computation is as follows:
  P(X ≤ 1) = 1 − e^(−λx) = 1 − e^(−2 × 1) = 0.864665
 Therefore, the probability that a phone call arrives
  within the next hour is about 0.8647.