Probability Distributions Guide

This document provides an overview of probability distributions, which describe the possible values of random variables and the probabilities associated with those values. It discusses key concepts such as discrete and continuous distributions, probability mass functions, probability density functions, and cumulative distribution functions. It also describes how to simulate sampling from a probability distribution using the inverse of the cumulative distribution function.

Uploaded by

gregoriopiccoli
Copyright
© Attribution Non-Commercial (BY-NC)

Probability Distributions

PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information.
PDF generated at: Fri, 01 Oct 2010 15:16:46 UTC
Contents
Articles
Probability distribution

Continuous Distributions
Beta distribution
Burr distribution
Cauchy distribution
Chi-square distribution
Dirichlet distribution
F-distribution
Gamma distribution
Exponential distribution
Erlang distribution
Kumaraswamy distribution
Inverse Gaussian distribution
Laplace distribution
Lévy distribution
Log-logistic distribution
Log-normal distribution
Logistic distribution
Normal distribution
Pareto distribution
Student's t-distribution
Uniform distribution (continuous)
Weibull distribution

Discrete distributions
Bernoulli distribution
Beta-binomial distribution
Binomial distribution
Uniform distribution (discrete)
Geometric distribution
Hypergeometric distribution
Negative binomial distribution

Multivariate distributions
Multinomial distribution
Multivariate normal distribution
Wishart distribution

References
Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses
License

Probability distribution
In probability theory and statistics, a probability distribution identifies either the probability of each value of a
random variable (when the variable is discrete), or the probability of the value falling within a particular interval
(when the variable is continuous).[1] The probability distribution describes the range of possible values that a random
variable can attain and the probability that the value of the random variable is within any (measurable) subset of that
range.
When the random variable takes values
in the set of real numbers, the
probability distribution is completely
described by the cumulative
distribution function, whose value at
each real x is the probability that the
random variable is smaller than or
equal to x.

[Figure: The Normal distribution, often called the "bell curve".]

The concept of the probability distribution and the random variables that distributions describe underlies the mathematical discipline of probability
theory, and the science of statistics. There is spread or variability in almost any value that can be measured in a
population (e.g. height of people, durability of a metal, sales growth, traffic flow, etc.); almost all measurements are
made with some intrinsic error; in physics many processes are described probabilistically, from the kinetic properties
of gases to the quantum mechanical description of fundamental particles. For these and many other reasons, simple
numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

Many probability distributions arise across different applications. Two of the most important
are the normal distribution and the categorical distribution. The normal distribution, also known as the Gaussian
distribution, has a familiar "bell curve" shape and approximates many naturally occurring distributions over
real numbers. The categorical distribution describes the result of an experiment with a fixed, finite number of
outcomes. For example, the toss of a fair coin follows a categorical distribution whose possible outcomes are heads
and tails, each with probability 1/2.

Formal definition
In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable function $X$
from a probability space $(\Omega, \mathcal{F}, P)$ to a measurable space $(\mathcal{X}, \mathcal{A})$. A probability distribution is the pushforward
measure $X_*P = P X^{-1}$ on $(\mathcal{X}, \mathcal{A})$.

Probability distributions of real-valued random variables


Because a probability distribution Pr on the real line is determined by the probability of a real-valued random
variable X being in a half-open interval (-∞, x], the probability distribution is completely characterized by its
cumulative distribution function:

$$F(x) = \Pr[X \le x] \quad \text{for all } x \in \mathbb{R}.$$

Discrete probability distribution


A probability distribution is called discrete if its cumulative distribution function only increases in jumps. More
precisely, a probability distribution is discrete if there is a finite or countable set whose probability is 1.
For many familiar discrete distributions, the set of possible values is topologically discrete in the sense that all its
points are isolated points. But, there are discrete distributions for which this countable set is dense on the real line.
Discrete distributions are characterized by a probability mass function $p$ such that

$$\Pr[X = x] = p(x).$$

Continuous probability distribution


By one convention, a probability distribution is called continuous if its cumulative distribution function $F(x) = \Pr[X \le x]$
is continuous and, therefore, the probability measure of singletons is zero: $\Pr[X = x] = 0$ for all $x$.
Another convention reserves the term continuous probability distribution for absolutely continuous distributions.
These distributions can be characterized by a probability density function: a non-negative Lebesgue integrable
function $f$ defined on the real numbers such that

$$F(x) = \Pr[X \le x] = \int_{-\infty}^{x} f(t)\,dt.$$

Discrete distributions and some continuous distributions (like the Cantor distribution) do not admit such a density.

Terminology
The support of a distribution is the smallest closed interval/set whose complement has probability zero. It may be
understood as the points or elements that are actual members of the distribution.
A discrete random variable is a random variable whose probability distribution is discrete. Similarly, a continuous
random variable is a random variable whose probability distribution is continuous.

Simulated sampling
The following algorithm lets one sample from a probability distribution (either discrete or continuous). This
algorithm assumes that one has access to the inverse of the cumulative distribution function (easy to calculate for a discrete
distribution, and can be approximated for continuous distributions) and a computational primitive called "random()"
which returns a floating-point value in the range [0,1). The Python function random.random() plays that role here.

    import random

    def sample_from(cdf_inverse):
        # Input: cdf_inverse(u), the inverse of the CDF of the probability distribution.
        #   Example: if the distribution is Gaussian, one can use a Taylor
        #   approximation of the inverse of erf(x).
        #   Example: if the distribution is discrete, see the explanation below.
        # Output: a real number sampled from the distribution represented by cdf_inverse.
        r = random.random()
        while r == 0.0:  # make sure r is not equal to 0; a discontinuity is possible there
            r = random.random()
        return cdf_inverse(r)

For discrete distributions, the inverse of the cumulative distribution function (cdf_inverse below) can be calculated
from samples as follows: for each element in the sample range (discrete values along the x-axis), calculate the total
number of samples up to and including it, then normalize by the overall total. The resulting discrete function is the CDF, and can be
turned into an object which acts like a function: calling cdf_inverse(query) returns the smallest x-value such that the
CDF is greater than or equal to the query.

    def data_to_cdf_inverse(discrete_distribution):
        # Input: discrete_distribution, a dict mapping possible values to
        #   frequencies or probabilities.
        #   Example: {0: 1 - p, 1: p} is a Bernoulli distribution with chance p;
        #   with p = 0.5 this is a fair coin where 1 means "heads" and 0 means "tails".
        # Output: a function that represents the inverse CDF.
        items = sorted(discrete_distribution.items())
        total = sum(weight for _, weight in items)  # normalizing constant

        def cdf_inverse(u):
            cumulative = 0.0
            for value, weight in items:  # go through the mapping in sorted order
                cumulative += weight / total
                if cumulative >= u:  # smallest value whose CDF reaches u
                    return value
            return items[-1][0]  # guard against floating-point round-off

        return cdf_inverse

Note that often, mathematics environments and computer algebra systems will have some way to represent
probability distributions and sample from them. This functionality might even have been developed in third-party
libraries. Such packages greatly facilitate such sampling, most likely have optimizations for common distributions,
and are likely to be more elegant than the above bare-bones solution.
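
As an illustration of the inverse-CDF approach described in this section, the following Python sketch samples from an exponential distribution. The closed-form inverse F^{-1}(u) = -ln(1 - u) / rate is the standard quantile function of the exponential distribution and is not taken from the text above; the function names, the seed and the rate value are illustrative choices only.

```python
import math
import random

def exponential_inverse_cdf(u, rate=1.0):
    """Quantile function of the exponential distribution: -ln(1 - u) / rate."""
    return -math.log(1.0 - u) / rate

def sample_exponential(rate=1.0, rng=random):
    """Draw one sample by passing a uniform variate through the inverse CDF."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1] and the log is finite
    return exponential_inverse_cdf(u, rate)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_exponential(rate=2.0, rng=rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)  # should be close to 1 / rate = 0.5
```

Because the exponential quantile function is finite at u = 0, no rejection loop is needed in this particular case.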

Some properties
• The probability density function of the sum of two independent random variables is the convolution of
their density functions.
• The probability density function of the difference of two independent random variables is the cross-correlation
of their density functions.
• Probability distributions are not a vector space – they are not closed under linear combinations, as these do not
preserve non-negativity or total integral 1 – but they are closed under convex combination, thus forming a convex
subset of the space of functions (or measures).
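
The convolution property has a direct discrete analogue, which is easier to show in a few lines: the probability mass function of the sum of two independent discrete variables is the discrete convolution of their mass functions. The sketch below uses two fair dice as an assumed example; the function and variable names are illustrative.

```python
from fractions import Fraction

def convolve_pmfs(pmf_a, pmf_b):
    """PMF of A + B for independent A and B, each given as a {value: probability} dict."""
    result = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            result[a + b] = result.get(a + b, 0) + pa * pb
    return result

die = {face: Fraction(1, 6) for face in range(1, 7)}  # one fair six-sided die
two_dice = convolve_pmfs(die, die)                    # distribution of the total of two dice
```

Exact fractions are used so that the result can be compared with hand calculations without rounding.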

Common probability distributions


The following is a list of some of the most common probability distributions, grouped by the type of process that
they are related to. For a more complete list, see list of probability distributions, which groups by the nature of the
outcome being considered (discrete, continuous, multivariate, etc.)
Note also that all of the univariate distributions below are singly-peaked; that is, it is assumed that the values cluster
around a single point. In practice, actually-observed quantities may cluster around multiple values. Such quantities
can be modeled using a mixture distribution.

Related to real-valued quantities that grow linearly (e.g. errors, offsets)


• Normal distribution (aka Gaussian distribution), for a single such quantity; the most common continuous
distribution
• Multivariate normal distribution (aka multivariate Gaussian distribution), for vectors of correlated outcomes that
are individually Gaussian-distributed

Related to positive real-valued quantities that grow exponentially (e.g. prices, incomes,
populations)
• Log-normal distribution, for a single such quantity whose log is normally distributed
• Pareto distribution, for a single such quantity whose log is exponentially distributed; the prototypical power law
distribution

Related to real-valued quantities that are assumed to be uniformly distributed over a


(possibly unknown) region
• Discrete uniform distribution, for a finite set of values (e.g. the outcome of a fair die)
• Continuous uniform distribution, for continuously-distributed values

Related to Bernoulli trials (yes/no events, with a given probability)

Basic distributions
• Bernoulli distribution, for the outcome of a single Bernoulli trial (e.g. success/failure, yes/no)
• Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed total
number of independent occurrences
• Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of
failures before a given number of successes occurs
• Geometric distribution, for binomial-type observations but where the quantity of interest is the number of failures
before the first success; a special case of the negative binomial distribution

Related to sampling schemes over a finite population


• Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed
number of total occurrences, using sampling with replacement
• Hypergeometric distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a
fixed number of total occurrences, using sampling without replacement
• Beta-binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed
number of total occurrences, sampling using a Polya urn scheme (in some sense, the "opposite" of sampling
without replacement)

Related to categorical outcomes (events with K possible outcomes, with a given probability
for each outcome)
• Categorical distribution, for a single categorical outcome (e.g. yes/no/maybe in a survey); a generalization of the
Bernoulli distribution
• Multinomial distribution, for the number of each type of categorical outcome, given a fixed number of total
outcomes; a generalization of the binomial distribution
• Multivariate hypergeometric distribution, similar to the multinomial distribution, but using sampling without
replacement; a generalization of the hypergeometric distribution

Related to events in a Poisson process (events that occur independently with a given rate)
• Poisson distribution, for the number of occurrences of a Poisson-type event in a given period of time
• Exponential distribution, for the time before the next Poisson-type event occurs
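
The link between the two distributions above can be checked by simulation. The following sketch, under the assumption that gaps between events are independent exponential variables with the given rate, counts the events falling in a fixed window; the counts should then behave like a Poisson variable, whose mean and variance both equal rate × window length. All names and parameter values are illustrative.

```python
import random

def count_events(rate, horizon, rng):
    """Count events in [0, horizon] when gaps between events are Exponential(rate)."""
    t = rng.expovariate(rate)  # time of the first event
    count = 0
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)  # waiting time until the next event
    return count

rng = random.Random(1)  # fixed seed so the run is repeatable
counts = [count_events(rate=3.0, horizon=1.0, rng=rng) for _ in range(20_000)]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)
# For a Poisson distribution both values should be close to rate * horizon = 3.
```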

Useful for hypothesis testing related to normally-distributed outcomes


• Chi-square distribution, the distribution of a sum of squared standard normal variables; useful e.g. for inference
regarding the sample variance of normally-distributed samples (see chi-square test)
• Student's t distribution, the distribution of the ratio of a standard normal variable and the square root of a scaled
chi squared variable; useful for inference regarding the mean of normally-distributed samples with unknown
variance (see Student's t-test)
• F-distribution, the distribution of the ratio of two scaled chi squared variables; useful e.g. for inferences that
involve comparing variances or involving R-squared (the squared correlation coefficient)

Useful as conjugate prior distributions in Bayesian inference


• Beta distribution, for a single probability (real number between 0 and 1); conjugate to the Bernoulli distribution
and binomial distribution
• Gamma distribution, for a non-negative scaling parameter; conjugate to the rate parameter of a Poisson
distribution or exponential distribution, the precision (inverse variance) of a normal distribution, etc.
• Dirichlet distribution, for a vector of probabilities that must sum to 1; conjugate to the categorical distribution and
multinomial distribution; generalization of the beta distribution
• Wishart distribution, for a symmetric non-negative definite matrix; conjugate to the inverse of the covariance
matrix of a multivariate normal distribution; generalization of the gamma distribution

See also
• Copula (statistics)
• Cumulative distribution function
• Histogram
• Inverse transform sampling
• Likelihood function
• List of statistical topics
• Probability density function
• Random variable
• Riemann–Stieltjes integral application to probability theory

External links
• An 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the
randomness of the beans dropping through the quincunx pattern. [2] from Index Funds Advisors IFA.com [3],
youtube.com
• Interactive Discrete and Continuous Probability Distributions [4], socr.ucla.edu
• A Compendium of Common Probability Distributions [5]
• A Compendium of Distributions [6], vosesoftware.com

• Statistical Distributions - Overview [7], xycoon.com


• Probability Distributions [8] in Quant Equation Archive, sitmo.com
• A Probability Distribution Calculator [9], covariable.com
• Sourceforge.net [10], Distribution Explorer: a mixed C++ and C# Windows application that allows you to explore
the properties of 20+ statistical distributions, and calculate CDF, PDF & quantiles. Written using open-source
C++ from the Boost.org [11] Math Toolkit library.
• Explore different probability distributions and fit your own dataset online - interactive tool [12], xjtek.com

References
[1] Everitt, B.S. (2006) The Cambridge Dictionary of Statistics, Third Edition. pp. 313–314. Cambridge University Press, Cambridge. ISBN
0521690277
[2] http://www.youtube.com/watch?v=AUSKTk9ENzg
[3] http://www.ifa.com
[4] http://www.socr.ucla.edu/htmls/SOCR_Distributions.html
[5] http://www.causascientia.org/math_stat/Dists/Compendium.pdf
[6] http://www.vosesoftware.com/content/ebook.pdf
[7] http://www.xycoon.com/contdistroverview.htm
[8] http://www.sitmo.com/eqcat/8
[9] http://www.covariable.com/continuous.html
[10] http://sourceforge.net/projects/distexplorer/
[11] http://www.boost.org
[12] http://www.xjtek.com/anylogic/demo_models/111/

Continuous Distributions

Beta distribution
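Beta
[Figures: probability density function; cumulative distribution function]
parameters: $\alpha > 0$ shape (real); $\beta > 0$ shape (real)
support: $x \in (0, 1)$
pdf: $\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}$
cdf: $I_x(\alpha,\beta)$
mean: $\frac{\alpha}{\alpha+\beta}$
median: no closed form
mode: $\frac{\alpha-1}{\alpha+\beta-2}$ for $\alpha > 1,\ \beta > 1$
variance: $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
skewness: $\frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}$
ex.kurtosis: see text
entropy: see text
mgf: $1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{t^k}{k!}$
cf: ${}_1F_1(\alpha;\ \alpha+\beta;\ it)$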

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined
on the interval (0, 1) parameterized by two positive shape parameters, typically denoted by α and β. It is the special
case of the Dirichlet distribution with only two parameters. Just as the Dirichlet distribution is the conjugate prior of
the multinomial distribution and categorical distribution, the beta distribution is the conjugate prior of the binomial
distribution and Bernoulli distribution. In Bayesian statistics, it can be seen as the likelihood of the parameter p of a
binomial distribution from observing α − 1 independent events with probability p and β − 1 with probability 1 − p.

Characterization

Probability density function


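The probability density function of the beta distribution is:

$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1} = \frac{1}{\mathrm{B}(\alpha,\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},$$

where $\Gamma$ is the gamma function. The beta function, B, appears as a normalization constant to ensure that the total
probability integrates to unity.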
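
A minimal Python sketch of this density, assuming the standard form x^(α−1) (1−x)^(β−1) / B(α, β) with B(α, β) = Γ(α)Γ(β)/Γ(α+β), is given below. The helper names and the midpoint-rule check are illustrative; the check simply confirms numerically that the normalization constant makes the density integrate to one for one arbitrarily chosen parameter pair.

```python
import math

def beta_function(a, b):
    """B(a, b) expressed through the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x; zero outside the open interval (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

def midpoint_integral(f, lo, hi, steps=100_000):
    """Plain midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

total = midpoint_integral(lambda x: beta_pdf(x, 2.5, 4.0), 0.0, 1.0)  # close to 1
```

For large parameters `math.gamma` overflows, so a production implementation would more likely work with `math.lgamma`; that refinement is left out here for brevity.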

Cumulative distribution function


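The cumulative distribution function is

$$F(x; \alpha, \beta) = \frac{\mathrm{B}_x(\alpha,\beta)}{\mathrm{B}(\alpha,\beta)} = I_x(\alpha,\beta),$$

where $\mathrm{B}_x(\alpha,\beta)$ is the incomplete beta function and $I_x(\alpha,\beta)$ is the regularized incomplete beta function.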

Properties
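The expected value ($\mu$), variance (second central moment), skewness (standardized third central moment), and kurtosis excess
(derived from the fourth central moment) of a Beta distribution random variable X with parameters α and β are:

$$\operatorname{E}(X) = \frac{\alpha}{\alpha+\beta}, \qquad \operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

The skewness is

$$\frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.$$

The kurtosis excess is:

$$\frac{6\left[\alpha^3 - \alpha^2(2\beta-1) + \beta^2(\beta+1) - 2\alpha\beta(\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}.$$

In general, the $k$th raw moment is given by

$$\operatorname{E}(X^k) = \frac{\mathrm{B}(\alpha+k,\beta)}{\mathrm{B}(\alpha,\beta)} = \frac{(\alpha)_k}{(\alpha+\beta)_k},$$

where $(x)_k$ is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

$$\operatorname{E}(X^k) = \frac{\alpha+k-1}{\alpha+\beta+k-1}\,\operatorname{E}(X^{k-1}).$$

One can also show that

$$\operatorname{E}(\ln X) = \psi(\alpha) - \psi(\alpha+\beta),$$

where $\psi$ is the digamma function.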
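
The raw-moment formula can be exercised with a short sketch. It assumes the rising-factorial form E(X^k) = Π_{r=0}^{k−1} (α + r)/(α + β + r) and recovers the mean and variance from the first two raw moments. Integer parameters are an assumption made only so that exact fractions can be used; the names are illustrative.

```python
from fractions import Fraction

def beta_raw_moment(k, a, b):
    """E[X^k] for X ~ Beta(a, b) with integer a, b, as a product of rising-factorial terms."""
    moment = Fraction(1)
    for r in range(k):
        moment *= Fraction(a + r, a + b + r)
    return moment

a, b = 2, 3
mean = beta_raw_moment(1, a, b)                   # a / (a + b)
variance = beta_raw_moment(2, a, b) - mean ** 2   # E[X^2] - E[X]^2
closed_form_variance = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
```

The variance obtained from the raw moments agrees with the closed-form expression αβ / ((α+β)²(α+β+1)).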

Quantities of information
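Given two beta distributed random variables, X ~ Beta(α, β) and Y ~ Beta(α', β'), the information entropy of X is [1]

$$H(X) = \ln\mathrm{B}(\alpha,\beta) - (\alpha-1)\psi(\alpha) - (\beta-1)\psi(\beta) + (\alpha+\beta-2)\psi(\alpha+\beta),$$

where $\psi$ is the digamma function.
The cross entropy is

$$H(X,Y) = \ln\mathrm{B}(\alpha',\beta') - (\alpha'-1)\psi(\alpha) - (\beta'-1)\psi(\beta) + (\alpha'+\beta'-2)\psi(\alpha+\beta).$$

It follows that the Kullback–Leibler divergence between these two beta distributions is

$$D_{\mathrm{KL}}(X,Y) = \ln\frac{\mathrm{B}(\alpha',\beta')}{\mathrm{B}(\alpha,\beta)} - (\alpha'-\alpha)\psi(\alpha) - (\beta'-\beta)\psi(\beta) + (\alpha'-\alpha+\beta'-\beta)\psi(\alpha+\beta).$$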

Shapes
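The beta density function can take on different shapes depending on the values of the two parameters:
• $\alpha = 1,\ \beta = 1$ is the uniform [0,1] distribution
• $\alpha < 1,\ \beta < 1$ is U-shaped (red plot)
• $\alpha < 1,\ \beta \ge 1$ or $\alpha = 1,\ \beta > 1$ is strictly decreasing (blue plot)
  • $\alpha = 1,\ \beta > 2$ is strictly convex
  • $\alpha = 1,\ \beta = 2$ is a straight line
  • $\alpha = 1,\ 1 < \beta < 2$ is strictly concave
• $\alpha = 1,\ \beta < 1$ or $\alpha > 1,\ \beta \le 1$ is strictly increasing (green plot)
  • $\alpha > 2,\ \beta = 1$ is strictly convex
  • $\alpha = 2,\ \beta = 1$ is a straight line
  • $1 < \alpha < 2,\ \beta = 1$ is strictly concave
• $\alpha > 1,\ \beta > 1$ is unimodal (purple & black plots)
Moreover, if $\alpha = \beta$ then the density function is symmetric about 1/2 (red & purple plots).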

Parameter estimation
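Let

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

be the sample mean and

$$v = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

be the sample variance. The method-of-moments estimates of the parameters are

$$\hat{\alpha} = \bar{x}\left(\frac{\bar{x}(1-\bar{x})}{v} - 1\right), \qquad \hat{\beta} = (1-\bar{x})\left(\frac{\bar{x}(1-\bar{x})}{v} - 1\right).$$

When the distribution is required over an interval other than [0, 1], say $[\ell, h]$, then replace $\bar{x}$ with $\frac{\bar{x}-\ell}{h-\ell}$ and
$v$ with $\frac{v}{(h-\ell)^2}$ in the above equations.[2] [3]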
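
A minimal sketch of the method-of-moments estimator follows. It assumes the usual matching of the sample mean m and sample variance v to the beta mean and variance, giving α ≈ m (m(1−m)/v − 1) and β ≈ (1−m)(m(1−m)/v − 1). The function name, the seed and the choice of Beta(2, 5) test data are illustrative.

```python
import random

def beta_method_of_moments(data):
    """Estimate (alpha, beta) of a beta distribution on [0, 1] from data by matching moments."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # sample variance with divisor N
    common = mean * (1 - mean) / var - 1          # estimate of alpha + beta
    return mean * common, (1 - mean) * common

rng = random.Random(3)  # fixed seed so the run is repeatable
data = [rng.betavariate(2.0, 5.0) for _ in range(50_000)]
alpha_hat, beta_hat = beta_method_of_moments(data)  # should be near (2, 5)
```

The estimator is only meaningful when v < m(1 − m); otherwise the common factor is not positive and no beta distribution matches the two moments.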



There is no closed form for the maximum likelihood estimates of the parameters.

Related distributions
• If X has a beta distribution, then T = X/(1 − X) has a "beta distribution of the second kind", also called the beta
prime distribution.
• The connection with the binomial distribution is mentioned below.
• The Beta(1,1) distribution is identical to the standard uniform distribution.
• If X has the Beta(3/2,3/2) distribution and R > 0 is a real parameter, then Y := 2RX – R has the Wigner semicircle
distribution.
• If X and Y are independently distributed Gamma(α, θ) and Gamma(β, θ) respectively, then X / (X + Y) is
distributed Beta(α, β).
• If X and Y are independently distributed Beta(α,β) and F(2β, 2α) (Snedecor's F distribution with 2β and 2α
degrees of freedom), then Pr(X ≤ α/(α + xβ)) = Pr(Y > x) for all x > 0.
• The beta distribution is a special case of the Dirichlet distribution for only two parameters.
• The Kumaraswamy distribution resembles the beta distribution.
• If $X \sim U(0, 1]$ has a uniform distribution, then $X^2 \sim \mathrm{Beta}(1/2, 1)$, which is a special case of the Beta
distribution called the power-function distribution.
• Binomial opinions in subjective logic are equivalent to Beta distributions.
• Beta(1/2,1/2) is the Jeffreys prior for a proportion and is equivalent to arcsine distribution.
Beta(i, j) with integer values of i and j is the distribution of the i-th order statistic (the i-th smallest value) of a sample
of i + j − 1 independent random variables uniformly distributed between 0 and 1. The cumulative probability from 0
to x is thus the probability that the i-th smallest value is less than x, in other words, it is the probability that at least i
of the random variables are less than x, a probability given by summing over the binomial distribution with its p
parameter set to x. This shows the intimate connection between the beta distribution and the binomial distribution.
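
The order-statistic argument can be checked numerically. The sketch below compares, for integer i and j, the Beta(i, j) cumulative probability at x (computed here by Simpson's rule on the density, an implementation choice made for this illustration) with the binomial tail probability that at least i of n = i + j − 1 uniform variables fall below x. Function names are illustrative.

```python
import math

def beta_cdf_integer(x, i, j, steps=2000):
    """CDF of Beta(i, j) at x for positive integers i, j, via Simpson's rule (steps must be even)."""
    norm = math.factorial(i + j - 1) / (math.factorial(i - 1) * math.factorial(j - 1))

    def pdf(t):
        return norm * t ** (i - 1) * (1 - t) ** (j - 1)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return total * h / 3

def at_least_i_below(x, i, j):
    """P(at least i of n = i + j - 1 independent uniforms are below x), a binomial tail sum."""
    n = i + j - 1
    return sum(math.comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(i, n + 1))

cdf_value = beta_cdf_integer(0.5, 2, 3)    # integral of the Beta(2, 3) density up to 0.5
tail_value = at_least_i_below(0.5, 2, 3)   # at least 2 of 4 uniforms below 0.5
```

For i = 2, j = 3 and x = 0.5 both quantities equal 11/16, which can be confirmed by hand: 1 − (1 + 4)/16.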

Applications

Rule of succession
A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon
Laplace in the course of treating the sunrise problem. It states that, given s successes in n conditionally independent
Bernoulli trials with probability p, p should be estimated as $\frac{s+1}{n+2}$. This estimate may be regarded as the
expected value of the posterior distribution over p, namely Beta(s + 1, n − s + 1), which is given by Bayes' rule if one
assumes a uniform prior over p (i.e., Beta(1, 1)) and then observes that p generated s successes in n trials.
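
The update behind the rule of succession is short enough to write out. This sketch assumes the conjugate update Beta(a, b) → Beta(a + s, b + n − s) after s successes in n trials and the beta mean a / (a + b); the function names and the example counts (7 successes in 10 trials) are illustrative.

```python
from fractions import Fraction

def posterior_after_trials(successes, trials, prior_alpha=1, prior_beta=1):
    """Beta posterior parameters after observing Bernoulli trials, starting from a Beta prior."""
    return prior_alpha + successes, prior_beta + (trials - successes)

def beta_mean(alpha, beta):
    """Expected value of Beta(alpha, beta) for integer parameters, as an exact fraction."""
    return Fraction(alpha, alpha + beta)

alpha, beta = posterior_after_trials(successes=7, trials=10)  # uniform Beta(1, 1) prior
estimate = beta_mean(alpha, beta)                             # (7 + 1) / (10 + 2)
```

With the uniform prior the posterior mean is (s + 1)/(n + 2), here 8/12 = 2/3, rather than the raw frequency 7/10.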

Bayesian statistics
Beta distributions are used extensively in Bayesian statistics, since beta distributions provide a family of conjugate
prior distributions for binomial (including Bernoulli) and geometric distributions. The Beta(0,0) distribution is an
improper prior and is sometimes used to represent ignorance of parameter values.

Task duration modeling


The beta distribution can be used to model events which are constrained to take place within an interval defined by a
minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is
used extensively in PERT, critical path method (CPM) and other project management / control systems to describe
the time to completion of a task. In project management, shorthand computations are widely used to estimate the
mean and standard deviation of the beta distribution:

$$\mu(X) = \frac{a + 4b + c}{6}, \qquad \sigma(X) = \frac{c - a}{6},$$

where a is the minimum, c is the maximum, and b is the most likely value.
Using this set of approximations is known as three-point estimation. The approximations are exact only for particular values of α and
β, specifically when[4]:

$$\alpha = 3 - \sqrt{2}, \qquad \beta = 3 + \sqrt{2},$$

or vice versa.
These are notably poor approximations for most other beta distributions, exhibiting average errors of 40% in the
mean and 549% in the variance.[5] [6] [7]
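
The three-point shorthand can be compared against the exact beta moments in a few lines. The sketch assumes the usual PERT formulas, mean ≈ (a + 4b + c)/6 and standard deviation ≈ (c − a)/6, with b taken as the mode of the beta distribution on [0, 1]. It checks one parameter pair for which the shorthand is exact, α = 3 − √2 and β = 3 + √2, and one arbitrarily chosen pair, Beta(2, 5), for which it is not. Function names are illustrative.

```python
import math

def pert_estimates(a, b, c):
    """Three-point (PERT) approximations of the mean and standard deviation."""
    return (a + 4 * b + c) / 6, (c - a) / 6

def beta_mean_sd_mode(alpha, beta):
    """Exact mean, standard deviation and mode of Beta(alpha, beta), for alpha, beta > 1."""
    total = alpha + beta
    mean = alpha / total
    sd = math.sqrt(alpha * beta / (total ** 2 * (total + 1)))
    mode = (alpha - 1) / (total - 2)
    return mean, sd, mode

# Case where the shorthand is exact.
exact_mean, exact_sd, mode = beta_mean_sd_mode(3 - math.sqrt(2), 3 + math.sqrt(2))
approx_mean, approx_sd = pert_estimates(0.0, mode, 1.0)

# Case where it is only approximate.
mean_25, sd_25, mode_25 = beta_mean_sd_mode(2.0, 5.0)
pert_mean_25, pert_sd_25 = pert_estimates(0.0, mode_25, 1.0)
```

For Beta(2, 5) the exact mean is 2/7 ≈ 0.286 while the shorthand gives 0.3, a small illustration of the discrepancy discussed above.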

Information theory
We introduce one exemplary use of the beta distribution in information theory, particularly for the information-theoretic
performance analysis of a communication system. In sensor array systems, the distribution of the inner product of two vectors
is frequently used for performance estimation. Assume that s and v are isotropic i.i.d. vectors in the (M − 1)-dimensional
nullspace of h, where s, v and h are in $\mathbb{C}^M$ and the elements of h are i.i.d. complex Gaussian
random values. Then the absolute value of the inner product of s and v, $|s^H v|$, is beta(1, M − 2) distributed.

Four parameters
A beta distribution with the two shape parameters α and β is supported on the range [0,1]. It is possible to alter the
location and scale of the distribution by introducing two further parameters representing the minimum and maximum
values of the distribution.[8]
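The probability density function of the four parameter beta distribution is given by

$$f(y; \alpha, \beta, a, c) = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{\mathrm{B}(\alpha,\beta)\,(c-a)^{\alpha+\beta-1}}, \qquad a \le y \le c.$$

The standard form can be obtained by letting

$$x = \frac{y-a}{c-a}.$$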
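
The location-scale construction can be sketched as a change of variables. The code assumes minimum a and maximum c, maps y to x = (y − a)/(c − a), evaluates the standard beta density there, and divides by the interval length (c − a), which is the Jacobian of the substitution. The names `beta_pdf` and `beta4_pdf` are illustrative.

```python
import math

def beta_pdf(x, alpha, beta):
    """Standard beta density on (0, 1); zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)

def beta4_pdf(y, alpha, beta, a, c):
    """Four-parameter beta density on (a, c), obtained from the standard form."""
    x = (y - a) / (c - a)                        # map (a, c) onto (0, 1)
    return beta_pdf(x, alpha, beta) / (c - a)    # rescale by the Jacobian

density_at_midpoint = beta4_pdf(15.0, 2.0, 2.0, 10.0, 20.0)  # Beta(2, 2) stretched to [10, 20]
```

Dividing by (c − a) keeps the total probability equal to one after the interval is stretched.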

External links
• Weisstein, Eric W., "Beta Distribution [9]" from MathWorld.
• "Beta Distribution" [10] by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
• Beta Distribution – Overview and Example [11], xycoon.com
• Beta Distribution [12], brighton-webs.co.uk
• Beta Distributions [13] – Applet showing beta distributions in action.

References
[1] A. C. G. Verdugo Lazo and P. N. Rathie. "On the entropy of continuous probability distributions," IEEE Trans. Inf. Theory,
IT-24:120–122,1978.
[2] Engineering Statistics Handbook (http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm)
[3] Brighton Webs Ltd. Data & Analysis Services for Industry & Education (http://www.brighton-webs.co.uk/distributions/beta.asp)
[4] Grubbs, Frank E. (1962). Attempts to Validate Certain PERT Statistics or ‘Picking on PERT’. Operations Research 10(6), p. 912–915.
[5] Keefer, Donald L. and Verdini, William A. (1993). Better Estimation of PERT Activity Time Parameters. Management Science 39(9), p.
1086–1091.
[6] Keefer, Donald L. and Bodily, Samuel E. (1983). Three-point Approximations for Continuous Random variables. Management Science 29(5),
p. 595–609.
[7] DRMI Newsletter, Issue 12, April 8, 2005 (http://www.nps.edu/drmi/docs/1apr05-newsletter.pdf)
[8] Beta4 distribution (http://www.vosesoftware.com/ModelRiskHelp/Distributions/Continuous_distributions/Beta_distribution.htm)
[9] http://mathworld.wolfram.com/BetaDistribution.html
[10] http://demonstrations.wolfram.com/BetaDistribution/
[11] http://www.xycoon.com/beta.htm
[12] http://www.brighton-webs.co.uk/distributions/beta.asp
[13] http://isometricland.com/geogebra/geogebra_beta_distributions.php

Burr distribution
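Burr
[Figures: probability density function; cumulative distribution function]
parameters: $c > 0$; $k > 0$
support: $x > 0$
pdf: $\frac{c k\,x^{c-1}}{(1+x^c)^{k+1}}$
cdf: $1 - (1+x^c)^{-k}$
mean: $k\,\mathrm{B}(k - 1/c,\ 1 + 1/c)$ where B() is the beta function
median: $\left(2^{1/k} - 1\right)^{1/c}$
mode: $\left(\frac{c-1}{kc+1}\right)^{1/c}$
variance, skewness, ex.kurtosis, entropy, mgf, cf: not given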

In probability theory, statistics and econometrics, the Burr Type XII distribution or simply the Burr distribution
is a continuous probability distribution for a non-negative random variable. It is also known as the Singh-Maddala
distribution and is one of a number of different distributions sometimes called the "generalized log-logistic
distribution". It is most commonly used to model household income (See: Household income in the U.S. and
compare to magenta graph at right).
The Burr distribution has probability density function:[1] [2]

$$f(x; c, k) = \frac{c k\,x^{c-1}}{(1+x^c)^{k+1}}$$

and cumulative distribution function:

$$F(x; c, k) = 1 - (1+x^c)^{-k}.$$
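
Because the Burr cumulative distribution function has a closed form, it can be inverted by hand, which makes inverse-transform sampling straightforward. The sketch below assumes F(x; c, k) = 1 − (1 + x^c)^(−k) for x > 0, whose inverse is ((1 − u)^(−1/k) − 1)^(1/c). Function names, the seed and the parameter values are illustrative.

```python
import random

def burr_cdf(x, c, k):
    """Burr Type XII CDF for x > 0: 1 - (1 + x^c)^(-k)."""
    return 1.0 - (1.0 + x ** c) ** (-k)

def burr_quantile(u, c, k):
    """Inverse of burr_cdf for u in [0, 1)."""
    return ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [burr_quantile(rng.random(), c=2.0, k=3.0) for _ in range(5)]
median = burr_quantile(0.5, c=2.0, k=3.0)  # equals (2^(1/k) - 1)^(1/c)
```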

See also
Log-logistic distribution

References
[1] Maddala, G.S.. 1983, 1996. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press.
[2] Tadikamalla, Pandu R. (1980), "A Look at the Burr and Related Distributions" (http://links.jstor.org/sici?sici=0306-7734(198012)48:3<337:ALATBA>2.0.CO;2-Z),
International Statistical Review 48 (3): 337–344, doi:10.2307/1402945.

Cauchy distribution
Not to be confused with the Lorenz curve.

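Cauchy–Lorentz
[Figure: probability density function. The purple curve is the standard Cauchy distribution.]
[Figure: cumulative distribution function.]
parameters: $x_0$ location (real); $\gamma > 0$ scale (real)
support: $x \in (-\infty, +\infty)$
pdf: $\frac{1}{\pi\gamma\left[1 + \left(\frac{x-x_0}{\gamma}\right)^2\right]}$
cdf: $\frac{1}{\pi}\arctan\left(\frac{x-x_0}{\gamma}\right) + \frac{1}{2}$
mean: not defined
median: $x_0$
mode: $x_0$
variance: not defined
skewness: not defined
ex.kurtosis: not defined
entropy: $\ln(4\pi\gamma)$
mgf: not defined
cf: $\exp(i x_0 t - \gamma|t|)$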

The Cauchy–Lorentz distribution, named after Augustin Cauchy and Hendrik Lorentz, is a continuous probability
distribution. As a probability distribution, it is known as the Cauchy distribution, while among physicists, it is
known as the Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution.

Its importance in physics is due to its being the solution to the differential equation describing forced resonance.[1] In
mathematics, it is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in
the upper half-plane. In spectroscopy, it is the description of the shape of spectral lines which are subject to
homogeneous broadening in which all atoms interact in the same way with the frequency range contained in the line
shape. Many mechanisms cause homogeneous broadening, most notably collision broadening, and Chantler–Alda
radiation.[2]

Characterization

Probability density function


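The Cauchy distribution has the probability density function

$$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x-x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi}\left[\frac{\gamma}{(x-x_0)^2 + \gamma^2}\right],$$

where x0 is the location parameter, specifying the location of the peak of the distribution, and γ is the scale parameter
which specifies the half-width at half-maximum (HWHM). γ is also equal to half the interquartile range. Cauchy
himself exploited such a density function in 1827, with infinitesimal scale parameter, in defining a Dirac delta
function.
The amplitude of the above Lorentzian function is given by

$$\text{Amplitude (or height)} = \frac{1}{\pi\gamma}.$$

The special case when x0 = 0 and γ = 1 is called the standard Cauchy distribution with the probability density
function

$$f(x; 0, 1) = \frac{1}{\pi(1+x^2)}.$$

In physics, a three-parameter Lorentzian function is often used, as follows:

$$f(x; x_0, \gamma, I) = \frac{I}{1 + \left(\frac{x-x_0}{\gamma}\right)^2} = I\left[\frac{\gamma^2}{(x-x_0)^2 + \gamma^2}\right],$$

where I is the height of the peak.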



Cumulative distribution function


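The cumulative distribution function (cdf) is:

$$F(x; x_0, \gamma) = \frac{1}{\pi}\arctan\left(\frac{x-x_0}{\gamma}\right) + \frac{1}{2},$$

and the inverse cumulative distribution function of the Cauchy distribution is

$$F^{-1}(p; x_0, \gamma) = x_0 + \gamma\tan\left[\pi\left(p - \tfrac{1}{2}\right)\right].$$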
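
The closed-form inverse makes the Cauchy distribution a convenient example of inverse-transform sampling. The sketch below assumes the standard forms F(x) = arctan((x − x0)/γ)/π + 1/2 and F^(−1)(p) = x0 + γ tan(π(p − 1/2)); the function names and the example parameters x0 = 2, γ = 3 are illustrative.

```python
import math
import random

def cauchy_cdf(x, x0=0.0, gamma=1.0):
    """CDF of the Cauchy distribution with location x0 and scale gamma."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5

def cauchy_quantile(p, x0=0.0, gamma=1.0):
    """Inverse CDF of the Cauchy distribution for p in (0, 1)."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))

rng = random.Random(5)  # fixed seed so the run is repeatable
draws = [cauchy_quantile(rng.random(), x0=2.0, gamma=3.0) for _ in range(5)]
upper_quartile = cauchy_quantile(0.75, x0=2.0, gamma=3.0)  # x0 + gamma
```

The quartiles sit at x0 − γ and x0 + γ, which is the numerical counterpart of the statement that γ is half the interquartile range.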

Properties
The Cauchy distribution is an example of a distribution which has no mean, variance or higher moments defined. Its
mode and median are well defined and are both equal to x0.
When U and V are two independent normally distributed random variables with expected value 0 and variance 1,
then the ratio U/V has the standard Cauchy distribution.
If X1, ..., Xn are independent and identically distributed random variables, each with a standard Cauchy distribution,
then the sample mean (X1 + ... + Xn)/n has the same standard Cauchy distribution (the sample median, which is not
affected by extreme values, can be used as a measure of central tendency). To see that this is true, compute the
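characteristic function of the sample mean:

$$\varphi_{\overline{X}}(t) = \operatorname{E}\left(e^{i\overline{X}t}\right) = \prod_{j=1}^{n}\operatorname{E}\left(e^{iX_j t/n}\right) = \left(e^{-|t|/n}\right)^{n} = e^{-|t|},$$

where $\overline{X}$ is the sample mean and $e^{-|t|}$ is the characteristic function of the standard Cauchy distribution. This example serves to show that the hypothesis of finite variance in the central limit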
theorem cannot be dropped. It is also an example of a more generalized version of the central limit theorem that is
characteristic of all stable distributions, of which the Cauchy distribution is a special case.
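
The two properties above (the ratio of independent standard normals is standard Cauchy, and averaging does not tighten the distribution) can be checked by simulation. This is a rough sketch under assumed settings: the helper names, the seed and the replication counts are arbitrary, and the interquartile range is used as the spread measure because the variance is undefined.

```python
import random
import statistics


def standard_cauchy(rng):
    """Ratio of two independent standard normal draws."""
    return rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)


def interquartile_range(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def iqr_of_sample_means(n, replications, rng):
    """IQR of `replications` independent sample means, each over n draws."""
    means = []
    for _ in range(replications):
        means.append(sum(standard_cauchy(rng) for _ in range(n)) / n)
    return interquartile_range(means)


rng = random.Random(12345)
iqr_n1 = iqr_of_sample_means(1, 2000, rng)      # single observations
iqr_n100 = iqr_of_sample_means(100, 2000, rng)  # means of 100 observations
```

For a standard Cauchy variable the interquartile range is 2, so both `iqr_n1` and `iqr_n100` should land near 2; for a finite-variance distribution the second value would instead shrink by roughly a factor of 10.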
The Cauchy distribution is an infinitely divisible probability distribution. It is also a strictly stable distribution.
The standard Cauchy distribution coincides with the Student's t-distribution with one degree of freedom.
Like all stable distributions, the location-scale family to which the Cauchy distribution belongs is closed under linear
transformations with real coefficients. In addition, the Cauchy distribution is the only univariate distribution which is
closed under linear fractional transformations with real coefficients. In this connection, see also McCullagh's
parametrization of the Cauchy distributions.

Characteristic function
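Let X denote a Cauchy distributed random variable. The characteristic function of the Cauchy distribution is given
by

$$\varphi_X(t; x_0, \gamma) = \mathrm{E}\left(e^{iXt}\right) = \int_{-\infty}^{\infty} f(x; x_0, \gamma)\,e^{ixt}\,dx = e^{i x_0 t - \gamma|t|},$$

which is just the Fourier transform of the probability density. It follows that the probability density may be
expressed in terms of the characteristic function by:

$$f(x; x_0, \gamma) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\varphi_X(t; x_0, \gamma)\,e^{-ixt}\,dt$$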

Explanation of undefined moments

Mean
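If a probability distribution has a density function f(x) then the mean is

$$\int_{-\infty}^{\infty} x f(x)\,dx. \qquad (1)$$

The question is now whether this is the same thing as

$$\int_{0}^{\infty} x f(x)\,dx - \int_{-\infty}^{0} |x| f(x)\,dx. \qquad (2)$$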


If at most one of the two terms in (2) is infinite, then (1) is the same as (2). But in the case of the Cauchy
distribution, both the positive and negative terms of (2) are infinite. This means (2) is undefined. Moreover, if (1) is
construed as a Lebesgue integral, then (1) is also undefined, since (1) is then defined simply as the difference (2)
between positive and negative parts.
However, if (1) is construed as an improper integral rather than a Lebesgue integral, then (2) is undefined, and (1) is
not necessarily well-defined. We may take (1) to mean

$$\lim_{a\to\infty}\int_{-a}^{a} x f(x)\,dx,$$

and this is its Cauchy principal value, which is zero, but we could also take (1) to mean, for example,

$$\lim_{a\to\infty}\int_{-2a}^{a} x f(x)\,dx,$$

which is not zero, as can be seen easily by computing the integral.
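
As a worked illustration, take the standard Cauchy density f(x) = 1/(π(1 + x²)) (the choice of the standard case is an assumption made here for simplicity). The symmetric limit, i.e. the principal value, vanishes because the integrand is odd, while the asymmetric limit with lower bound −2a and upper bound a does not:

$$\int_{-a}^{a}\frac{x}{\pi(1+x^2)}\,dx = \frac{1}{2\pi}\Big[\ln(1+x^2)\Big]_{-a}^{a} = 0,$$

$$\int_{-2a}^{a}\frac{x}{\pi(1+x^2)}\,dx = \frac{1}{2\pi}\Big[\ln(1+a^2) - \ln(1+4a^2)\Big] \;\longrightarrow\; \frac{1}{2\pi}\ln\frac{1}{4} = -\frac{\ln 2}{\pi} \approx -0.22 \quad (a\to\infty).$$

Different ways of letting the two bounds grow therefore give different answers, which is why no single value can be assigned to the mean.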


Various results in probability theory about expected values, such as the strong law of large numbers, will not work in
such cases.

Second moment
Without a defined mean, it is impossible to consider the variance or standard deviation of a standard Cauchy
distribution, as these are defined with respect to the mean. But the second moment about zero can be considered. It
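turns out to be infinite:

$$\mathrm{E}(X^2) \propto \int_{-\infty}^{\infty}\frac{x^2}{1+x^2}\,dx = \int_{-\infty}^{\infty}dx - \int_{-\infty}^{\infty}\frac{1}{1+x^2}\,dx = \infty - \pi = \infty.$$
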
Estimation of parameters
Since the mean and variance of the Cauchy distribution are not defined, attempts to estimate these parameters will
not be successful. For example, if N samples are taken from a Cauchy distribution, one may calculate the sample
mean as:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Although the sample values x_i will be concentrated about the central value x0, the sample mean will become
increasingly variable as more samples are taken, due to the increased likelihood of encountering sample points with a
large absolute value. In fact, the distribution of the sample mean will be equal to the distribution of the samples
themselves, i.e., the sample mean of a large sample is no better (or worse) an estimator of x0 than any single
observation from the sample. Similarly, calculating the sample variance will result in values that grow larger as more
samples are taken.
Therefore, more robust means of estimating the central value x0 and the scaling parameter γ are needed. One
simple method is to take the median value of the sample as an estimator of x0 and half the sample interquartile
range as an estimator of γ. Other, more precise and robust methods have been developed.[3] For example, the
truncated mean of the middle 24% of the sample order statistics produces an estimate for x0 that is more efficient
than using either the sample median or the full sample mean.[4] [5] However, due to the fat tails of the Cauchy
distribution, the efficiency of the estimator decreases if more than 24% of the sample is used.[4] [5]
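
The simple estimators described above can be sketched as follows. The helper names, the seed, the sample size and the "true" parameters are all illustrative assumptions; the truncated mean here simply averages the central 24% of the sorted sample.

```python
import math
import random
import statistics


def median_and_half_iqr(sample):
    """Sample median as location estimate, half the IQR as scale estimate."""
    q1, q2, q3 = statistics.quantiles(sample, n=4)
    return q2, (q3 - q1) / 2.0


def middle_truncated_mean(sample, fraction=0.24):
    """Mean of the central `fraction` of the order statistics."""
    ordered = sorted(sample)
    n = len(ordered)
    keep = max(1, round(fraction * n))
    start = (n - keep) // 2
    return sum(ordered[start:start + keep]) / keep


# Simulated data (assumed parameters): x0 = 10, gamma = 2, via the inverse cdf.
rng = random.Random(7)
true_x0, true_gamma = 10.0, 2.0
sample = [true_x0 + true_gamma * math.tan(math.pi * (rng.random() - 0.5))
          for _ in range(5000)]

x0_median, gamma_half_iqr = median_and_half_iqr(sample)
x0_truncated = middle_truncated_mean(sample)
```

All three estimates should fall close to the assumed parameters, whereas `statistics.fmean(sample)` on the same data can be far from 10 because of a handful of extreme draws.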
Maximum likelihood can also be used to estimate the parameters x0 and γ. However, this tends to be complicated
by the fact that this requires finding the roots of a high degree polynomial, and there can be multiple roots that
represent local maxima.[6] Also, while the maximum likelihood estimator is asymptotically efficient, it is relatively
inefficient for small samples.[7] The log-likelihood function for the Cauchy distribution for sample size n is:

$$\hat\ell(x_0, \gamma \mid x_1, \dots, x_n) = n\log\gamma - \sum_{i=1}^{n}\log\left[\gamma^2 + (x_i - x_0)^2\right] - n\log\pi$$

Maximizing the log likelihood function with respect to x0 and γ produces the following system of equations:

$$\sum_{i=1}^{n}\frac{x_i - x_0}{\gamma^2 + (x_i - x_0)^2} = 0$$

$$\sum_{i=1}^{n}\frac{\gamma^2}{\gamma^2 + (x_i - x_0)^2} - \frac{n}{2} = 0$$

Solving just for x0 requires solving a polynomial of degree 2n − 1,[6] and solving just for γ requires solving a
polynomial of degree n (first for γ², then x0). It is also worthwhile to note that

$$\sum_{i=1}^{n}\frac{\gamma^2}{\gamma^2 + (x_i - x_0)^2}$$

is a monotone function in γ and that the solution γ must satisfy

$$\min_i |x_i - x_0| \le \gamma \le \max_i |x_i - x_0|.$$


Therefore, whether solving for one parameter or for both parameters simultaneously, a numerical solution on a
computer is typically required. The benefit of maximum likelihood estimation is asymptotic efficiency; estimating
x0 using the sample median is only about 81% as asymptotically efficient as estimating x0 by maximum
likelihood.[5] [8] The truncated sample mean using the middle 24% order statistics is about 88% as asymptotically
efficient an estimator of x0 as the maximum likelihood estimate.[5] When Newton's method is used to find the
solution for the maximum likelihood estimate, the middle 24% order statistics can be used as an initial solution for
x0.
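
The monotonicity and bracketing facts make the scale equation easy to solve numerically once a location value is fixed. The sketch below uses plain bisection rather than Newton's method, and it fixes x0 at the sample median, which is a simplifying assumption made here and not the joint maximum likelihood solution. It solves Σ γ²/(γ² + (x_i − x0)²) = n/2 for γ inside the bracket [min|x_i − x0|, max|x_i − x0|]. Function names, seed and parameters are illustrative.

```python
import math
import random
import statistics


def scale_score(gamma, squared_devs):
    """sum_i gamma^2 / (gamma^2 + (x_i - x0)^2) - n/2; zero at the ML scale."""
    g2 = gamma * gamma
    return sum(g2 / (g2 + d2) for d2 in squared_devs) - len(squared_devs) / 2.0


def ml_scale_given_location(sample, x0, iterations=100):
    """Bisection on the monotone score, bracketed by min and max |x_i - x0|."""
    squared_devs = [(x - x0) ** 2 for x in sample]
    lo = math.sqrt(min(squared_devs))
    hi = math.sqrt(max(squared_devs))
    if hi == 0.0:
        raise ValueError("all observations equal x0; the scale is not identifiable")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if scale_score(mid, squared_devs) < 0.0:
            lo = mid   # score too small: the root lies at a larger gamma
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Simulated data (assumed parameters): x0 = 0, gamma = 3.
rng = random.Random(42)
sample = [3.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2000)]
x0_hat = statistics.median(sample)          # assumption: location fixed at the median
gamma_hat = ml_scale_given_location(sample, x0_hat)
residual = scale_score(gamma_hat, [(x - x0_hat) ** 2 for x in sample])
```

A full maximum likelihood fit would alternate or jointly iterate over both equations; the bisection step is only the part that the monotonicity remark guarantees to be well behaved.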

Multivariate Cauchy distribution


A random vector X = (X1, …, Xk)′ is said to have the multivariate Cauchy distribution if every linear combination of
its components Y = a1X1 + … + akXk has a Cauchy distribution. That is, for any constant vector a ∈ Rk, the random
variable Y = a′X should have a univariate Cauchy distribution.[9] The characteristic function of a multivariate Cauchy
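distribution is given by:

$$\varphi_X(t) = e^{i x_0(t) - \gamma(t)},$$

where x0(t) and γ(t) are real functions with x0(t) a homogeneous function of degree one and γ(t) a positive
homogeneous function of degree one.[9] More formally:[9]

$$x_0(at) = a\,x_0(t) \quad\text{and}\quad \gamma(at) = |a|\,\gamma(t) \quad\text{for all } t.$$

An example of a bivariate Cauchy distribution can be given by:[10]

$$f(x, y; x_0, y_0, \gamma) = \frac{1}{2\pi}\left[\frac{\gamma}{\left((x - x_0)^2 + (y - y_0)^2 + \gamma^2\right)^{1.5}}\right].$$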

Note that in this example, even though there is no analogue to a covariance matrix, x and y are not statistically
independent.[10]

Related distributions
• The ratio of two independent standard normal random variables is a standard Cauchy variable, a Cauchy(0,1).
Thus the Cauchy distribution is a ratio distribution.
• The standard Cauchy(0,1) distribution arises as a special case of Student's t distribution with one degree of
freedom.
• Relation to stable distribution: if X ~ Stable(1, 0, γ, μ), then X ~ Cauchy(μ, γ).

Relativistic Breit–Wigner distribution


In nuclear and particle physics, the energy profile of a resonance is described by the relativistic Breit–Wigner
distribution, while the Cauchy distribution is the (non-relativistic) Breit–Wigner distribution.

See also
• McCullagh's parametrization of the Cauchy distributions
• Lévy flight and Lévy process
• Slash distribution
• Wrapped Cauchy distribution

External links
• Earliest Uses: The entry on Cauchy distribution has some historical information. [11]
• Weisstein, Eric W., "Cauchy Distribution [12]" from MathWorld.
• GNU Scientific Library – Reference Manual [13]

References
[1] http://webphysics.davidson.edu/Projects/AnAntonelli/node5.html Note that the intensity, which follows the Cauchy distribution, is the
square of the amplitude.
[2] E. Hecht (1987). Optics (2nd ed.). Addison-Wesley. p. 603.
[3] Cane, Gwenda J. (1974). "Linear Estimation of Parameters of the Cauchy Distribution Based on Sample Quantiles"
(http://www.jstor.org/stable/2285535). Journal of the American Statistical Association 69 (345): 243–245.
[4] Rothenberg, Thomas J.; Fisher, Franklin, M.; Tilanus, C.B. (1966). "A note on estimation from a Cauchy sample". Journal of the American
Statistical Association 59 (306): 460–463.
[5] Bloch, Daniel (1966). "A note on the estimation of the location parameters of the Cauchy distribution"
(http://www.jstor.org/pss/2282794). Journal of the American Statistical Association 61 (316): 852–855.
[6] Ferguson, Thomas S. (1978). "Maximum Likelihood Estimates of the Parameters of the Cauchy Distribution for Samples of Size 3 and 4"
(http://www.jstor.org/pss/2286549). Journal of the American Statistical Association 73 (361): 211.
[7] Cohen Freue, Gabriella V. (2007). "The Pitman estimator of the Cauchy location parameter"
(http://faculty.ksu.edu.sa/69424/USEPAP/Coushy dist.pdf). Journal of Statistical Planning and Inference 137: 1901.
[8] Barnett, V. D. (1966). "Order Statistics Estimators of the Location of the Cauchy Distribution" (http://www.jstor.org/pss/2283210).
Journal of the American Statistical Association 61 (316): 1205.
[9] Ferguson, Thomas S. (1962). "A Representation of the Symmetric Bivariate Cauchy Distribution" (http://www.jstor.org/pss/2237984).
Journal of the American Statistical Association: 1256.
[10] Molenberghs, Geert; Lesaffre, Emmanuel (1997). "Non-linear Integral Equations to Approximate Bivariate Densities with Given Marginals
and Dependence Function" (http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n310.pdf). Statistica Sinica 7: 713–738.
[11] http://jeff560.tripod.com/c.html
[12] http://mathworld.wolfram.com/CauchyDistribution.html
[13] http://www.gnu.org/software/gsl/manual/gsl-ref.html#SEC294

Chi-square distribution
Probability density function

Cumulative distribution function

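notation:     χ²(k) or χ²_k
parameters:   k ∈ N1 (degrees of freedom)
support:      x ∈ [0, +∞)
pdf:          $\frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1}e^{-x/2}$
cdf:          $\frac{1}{\Gamma(k/2)}\,\gamma(k/2,\,x/2)$
mean:         k
median:       $\approx k\left(1 - \frac{2}{9k}\right)^3$
mode:         max{ k − 2, 0 }
variance:     2k
skewness:     $\sqrt{8/k}$
ex.kurtosis:  12 / k
entropy:      $\frac{k}{2} + \ln\left(2\,\Gamma(k/2)\right) + \left(1 - \frac{k}{2}\right)\psi(k/2)$
mgf:          $(1 - 2t)^{-k/2}$ for t < ½
cf:           $(1 - 2it)^{-k/2}$ [1]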

In probability theory and statistics, the chi-square distribution (also chi-squared or χ²-distribution) with k degrees
of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of
the most widely used probability distributions in inferential statistics, e.g. in hypothesis testing, or in construction of
confidence intervals.[2] [3] [4] [5]
The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness
of fit of an observed distribution to a theoretical one, and of the independence of two criteria of classification of
qualitative data. Many other statistical tests also lead to a use of this distribution, like Friedman's analysis of variance
by ranks.
The chi-square distribution is a special case of the gamma distribution.

Definition
If X1, …, Xk are independent, standard normal random variables, then the sum of their squares

$$Q = \sum_{i=1}^{k} X_i^2$$

is distributed according to the chi-square distribution with k degrees of freedom. This is usually denoted as

$$Q \sim \chi^2(k) \quad\text{or}\quad Q \sim \chi^2_k.$$

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of
freedom (i.e. the number of Xi’s).
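
The definition translates directly into a simulation: square k independent standard normal draws and add them up. The sketch below (function name, seed and k = 5 are illustrative assumptions) compares the empirical mean and variance with the theoretical values k and 2k.

```python
import random
import statistics


def chi_square_variate(k, rng):
    """Sum of the squares of k independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(2010)
k = 5
draws = [chi_square_variate(k, rng) for _ in range(20000)]
sample_mean = statistics.fmean(draws)      # theory: k
sample_var = statistics.variance(draws)    # theory: 2k
```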

Characteristics
Further properties of the chi-square distribution can be found in the box at right.

Probability density function


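The probability density function (pdf) of the chi-square distribution is

$$f(x; k) = \begin{cases} \dfrac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1}e^{-x/2}, & x \ge 0; \\ 0, & \text{otherwise}, \end{cases}$$
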
where Γ(k/2) denotes the Gamma function, which has closed-form values at the half-integers.
For derivations of the pdf in the cases of one and two degrees of freedom, see Proofs related to chi-square
distribution.

Cumulative distribution function


Its cumulative distribution function is:

$$F(x; k) = \frac{\gamma(k/2,\,x/2)}{\Gamma(k/2)} = P(k/2,\,x/2),$$

where γ(k,z) is the lower incomplete Gamma function and P(k,z) is the regularized Gamma function.
In the special case of k = 2 this function has a simple form:

$$F(x; 2) = 1 - e^{-x/2}.$$

Tables of this distribution — usually in its cumulative form — are widely available and the function is included in
many spreadsheets and all statistical packages. For a closed form approximation for the CDF, see under Noncentral
chi-square distribution.
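
For readers without a statistical package at hand, the cdf P(k/2, x/2) can be evaluated with the standard power series of the regularized lower incomplete gamma function, P(a, z) = z^a e^(−z) Σ_{n≥0} z^n / Γ(a + n + 1). The code below is a minimal sketch of that series using only the standard library; the function names and the stopping rule are assumptions, and the series is practical only for moderate x.

```python
import math


def regularized_lower_gamma(a, z, eps=1e-15, max_terms=10000):
    """P(a, z) via the series z^a e^{-z} / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))."""
    if z <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > eps * abs(total) and n < max_terms:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_cdf(x, k):
    """Chi-square cdf with k degrees of freedom: P(k/2, x/2)."""
    return regularized_lower_gamma(k / 2.0, x / 2.0)


closed_form_k2 = 1.0 - math.exp(-3.0 / 2.0)   # special case k = 2 at x = 3
series_k2 = chi2_cdf(3.0, 2)
```

For k = 2 the series result agrees with the closed form 1 − e^(−x/2) to rounding error, which is a convenient sanity check.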

Additivity
It follows from the definition of the chi-square distribution that the sum of independent chi-square variables is also
chi-square distributed. Specifically, if X1, …, Xn are independent chi-square variables with k1, …, kn degrees of
freedom, respectively, then Y = X1 + ⋯ + Xn is chi-square distributed with k1 + ⋯ + kn degrees of freedom.

Information entropy
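The information entropy is given by

$$H = -\int_{0}^{\infty} f(x; k)\ln f(x; k)\,dx = \frac{k}{2} + \ln\left[2\,\Gamma\left(\frac{k}{2}\right)\right] + \left(1 - \frac{k}{2}\right)\psi\left(\frac{k}{2}\right),$$

where ψ(x) is the Digamma function.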

Noncentral moments
The moments about zero of a chi-square distribution with k degrees of freedom are given by[6] [7]

$$\mathrm{E}(X^m) = k(k+2)(k+4)\cdots(k+2m-2) = 2^m\,\frac{\Gamma(m + k/2)}{\Gamma(k/2)}.$$

Cumulants
The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the characteristic
function:

$$\kappa_n = 2^{n-1}(n-1)!\,k.$$

Asymptotic properties
By the central limit theorem, because the chi-square distribution is the sum of k independent random variables, it
converges to a normal distribution for large k (k > 50 is “approximately normal”).[8] Specifically, if X ~ χ²(k), then as
k tends to infinity, the distribution of $(X - k)/\sqrt{2k}$ tends to a standard normal distribution. However,
convergence is slow as the skewness is $\sqrt{8/k}$ and the excess kurtosis is 12/k.

Other functions of the chi-square distribution converge more rapidly to a normal distribution. Some examples are:
• If X ~ χ²(k) then $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2k - 1}$ and unit variance
(result credited to R. A. Fisher).
• If X ~ χ²(k) then $\sqrt[3]{X/k}$ is approximately normally distributed with mean $1 - \frac{2}{9k}$ and variance
$\frac{2}{9k}$ (Wilson and Hilferty, 1931).
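
Both approximations can be inverted to give quick quantile estimates: if z is the standard normal quantile for probability p, Fisher's result gives x ≈ (z + √(2k − 1))²/2 and the Wilson and Hilferty result gives x ≈ k(1 − 2/(9k) + z√(2/(9k)))³. The following sketch (function names are illustrative; `statistics.NormalDist` requires Python 3.8 or later) evaluates both for k = 10 and p = 0.95, where standard tables give 18.31.

```python
import math
from statistics import NormalDist


def wilson_hilferty_quantile(p, k):
    """Invert the cube-root normal approximation for chi-square(k)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * math.sqrt(c)) ** 3


def fisher_quantile(p, k):
    """Invert the sqrt(2X) ~ N(sqrt(2k - 1), 1) approximation."""
    z = NormalDist().inv_cdf(p)
    return 0.5 * (z + math.sqrt(2.0 * k - 1.0)) ** 2


wh_95_10 = wilson_hilferty_quantile(0.95, 10)
fisher_95_10 = fisher_quantile(0.95, 10)
```

In this case the Wilson and Hilferty value is noticeably closer to the tabulated quantile than the Fisher value, in line with its reputation as the more accurate of the two.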

Related distributions
A chi-square variable with k degrees of freedom is defined as the sum of the squares of k independent standard
normal random variables.
If Y is a k-dimensional Gaussian random vector with mean vector μ and rank k covariance matrix C, then
X = (Y−μ)^T C^(−1) (Y−μ) is chi-square distributed with k degrees of freedom.
The sum of squares of statistically independent unit-variance Gaussian variables which do not have mean zero yields
a generalization of the chi-square distribution called the noncentral chi-square distribution.
If Y is a vector of k i.i.d. standard normal random variables and A is a k×k symmetric idempotent matrix with rank
k−n then the quadratic form Y^T A Y is chi-square distributed with k−n degrees of freedom.
The chi-square distribution is also naturally related to other distributions arising from the Gaussian. In particular,

• Y is F-distributed, Y ~ F(k1, k2), if $Y = \frac{X_1/k_1}{X_2/k_2}$, where X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent.
• If X is chi-square distributed, then $\sqrt{X}$ is chi distributed.
• If X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent, then X1 + X2 ~ χ²(k1 + k2). If X1 and X2 are not
independent, then X1 + X2 need not be chi-square distributed.



Generalizations
The chi-square distribution is obtained as the sum of the squares of k independent, zero-mean, unit-variance
Gaussian random variables. Generalizations of this distribution can be obtained by summing the squares of other
types of Gaussian random variables. Several such distributions are described below.

Chi-square distributions

Noncentral chi-square distribution


The noncentral chi-square distribution is obtained from the sum of the squares of independent Gaussian random
variables having unit variance and nonzero means.

Generalized chi-square distribution


The generalized chi-square distribution is obtained from the quadratic form z′Az where z is a zero-mean Gaussian
vector having an arbitrary covariance matrix, and A is an arbitrary matrix.

Gamma, exponential, and related distributions


The chi-square distribution X ~ χ²(k) is a special case of the gamma distribution, in that X ~ Γ(k/2, 2) (using the
shape-scale parameterization of the gamma distribution).
Because the exponential distribution is also a special case of the Gamma distribution, we also have that if X ~ χ²(2),
then X ~ Exp(1/2), an exponential distribution with rate 1/2.
The Erlang distribution is also a special case of the Gamma distribution and thus we also have that if X ~ χ²(k) with
even k, then X is Erlang distributed with shape parameter k/2 and rate parameter 1/2 (equivalently, scale 2).

Applications
The chi-square distribution has numerous applications in inferential statistics, for instance in chi-square tests and in
estimating variances. It enters the problem of estimating the mean of a normally distributed population and the
problem of estimating the slope of a regression line via its role in Student’s t-distribution. It enters all analysis of
variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent
chi-squared random variables, each divided by its respective degrees of freedom.
Following are some of the most common situations in which the chi-square distribution arises from a
Gaussian-distributed sample.

• if X1, …, Xn are i.i.d. N(μ, σ²) random variables, then $\sum_{i=1}^{n}(X_i - \bar X)^2 \sim \sigma^2\chi^2_{n-1}$, where $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$.

• The box below shows probability distributions with name starting with chi for some statistics based on Xi ∼
Normal(μi, σ²i), i = 1, ⋯, k, independent random variables:

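Name                                  Statistic

chi-square distribution               $\sum_{i=1}^{k}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$

noncentral chi-square distribution    $\sum_{i=1}^{k}\left(\frac{X_i}{\sigma_i}\right)^2$

chi distribution                      $\sqrt{\sum_{i=1}^{k}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2}$

noncentral chi distribution           $\sqrt{\sum_{i=1}^{k}\left(\frac{X_i}{\sigma_i}\right)^2}$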

Table of χ² value vs P value


The P-value is the probability of observing a test statistic at least as extreme in a Chi-square distribution.
Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the
probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the
P-value. The table below gives a number of P-values matching to χ² for the first 10 degrees of freedom. A P-value of
0.05 or less is usually regarded as statistically significant.

Degrees of freedom (df)   χ² value [9]

1                         0.004   0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64  10.83
2                          0.10   0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21  13.82
3                          0.35   0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82  11.34  16.27
4                          0.71   1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49  13.28  18.47
5                          1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24  11.07  15.09  20.52
6                          1.63   2.20   3.07   3.83   5.35   7.23   8.56  10.64  12.59  16.81  22.46
7                          2.17   2.83   3.82   4.67   6.35   8.38   9.80  12.02  14.07  18.48  24.32
8                          2.73   3.49   4.59   5.53   7.34   9.52  11.03  13.36  15.51  20.09  26.12
9                          3.32   4.17   5.38   6.39   8.34  10.66  12.24  14.68  16.92  21.67  27.88
10                         3.94   4.86   6.18   7.27   9.34  11.78  13.44  15.99  18.31  23.21  29.59

P value (Probability)      0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01  0.001

                          Nonsignificant (P from 0.95 to 0.10)                    Significant (P ≤ 0.05)
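
The tabulated values can be cross-checked without incomplete-gamma routines by using a recurrence for the upper tail: with Q_k(x) = 1 − F(x; k), one has Q_1(x) = erfc(√(x/2)), Q_2(x) = e^(−x/2), and Q_{k+2}(x) = Q_k(x) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1). The sketch below implements this recurrence; the function name is illustrative, and the recurrence is a technique chosen here for compactness rather than anything prescribed by the table's source.

```python
import math


def chi2_p_value(x, df):
    """Upper-tail probability 1 - F(x; df) for a positive integer df."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q = math.erfc(math.sqrt(half))   # Q_1
        a = 0.5
    else:
        q = math.exp(-half)              # Q_2
        a = 1.0
    # Step the shape a = k/2 upward by 1 until it reaches df/2.
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


p_df1 = chi2_p_value(3.84, 1)     # table column P = 0.05
p_df4 = chi2_p_value(9.49, 4)     # table column P = 0.05
p_df10 = chi2_p_value(29.59, 10)  # table column P = 0.001
```

Each of these should reproduce the P value of its table column up to the rounding of the tabulated χ² entries.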

See also
• Cochran's theorem
• Degrees of freedom (statistics)
• Fisher's method for combining independent tests of significance
• Generalized chi-square distribution
• High-dimensional space
• Inverse-chi-square distribution
• Noncentral chi-square distribution
• Normal distribution
• Pearson's chi-square test
• Proofs related to chi-square distribution
• Wishart distribution

References

Notations
• Wilson, E.B. Hilferty, M.M. (1931) The distribution of chi-square. Proceedings of the National Academy of
Sciences, Washington, 17, 684–688.

External links
• Earliest Uses of Some of the Words of Mathematics: entry on Chi square has a brief history [11]
• Course notes on Chi-Square Goodness of Fit Testing [10] from Yale University Stats 101 class.
• Mathematica demonstration showing the chi-squared sampling distribution of various statistics, e.g. Σx², for a
normal population [11]
• Simple algorithm for approximating cdf and inverse cdf for the chi-square distribution with a pocket calculator
[12]

References
[1] M.A. Sanders. "Characteristic function of the central chi-square distribution" (http://www.planetmathematics.com/CentralChiDistr.pdf).
Retrieved 2009-03-06.
[2] Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26" (http://www.math.sfu.ca/~cbm/aands/page_940.htm), Handbook of
Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover, pp. 940, MR0167642, ISBN 978-0486612720.
[3] NIST (2006). Engineering Statistics Handbook - Chi-Square Distribution
(http://www.itl.nist.gov/div898/handbook/eda/section3/eda3666.htm)
[4] Johnson, N.L.; S. Kotz, N. Balakrishnan (1994). Continuous Univariate Distributions (Second Ed., Vol. 1, Chapter 18). John Wiley and
Sons. ISBN 0-471-58495-9.
[5] Mood, Alexander; Franklin A. Graybill, Duane C. Boes (1974). Introduction to the Theory of Statistics (Third Edition, p. 241-246).
McGraw-Hill. ISBN 0-07-042864-6.
[6] Chi-square distribution (http://mathworld.wolfram.com/Chi-SquaredDistribution.html), from MathWorld, retrieved Feb. 11, 2009
[7] M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN
978-0-387-34657-1
[8] Box, Hunter and Hunter. Statistics for experimenters. Wiley. p. 46.
[9] Chi-Square Test (http://www2.lv.psu.edu/jxm57/irp/chisquar.html) Table B.2. Dr. Jacqueline S. McLaughlin at The Pennsylvania State
University. In turn citing: R.A. Fisher and F. Yates, Statistical Tables for Biological Agricultural and Medical Research, 6th ed., Table IV
[10] http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm
[11] http://demonstrations.wolfram.com/StatisticsAssociatedWithNormalSamples/
[12] http://www.jstor.org/stable/2348373

Dirichlet distribution

Several images of the probability density of the Dirichlet distribution when K=3 for various parameter vectors α.
Clockwise from top left: α=(6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).

In probability and statistics, the Dirichlet distribution (after Johann Peter Gustav Lejeune Dirichlet), often denoted
Dir(α), is a family of continuous multivariate probability distributions parametrized by a vector α of positive
reals. It is the multivariate generalization of the beta distribution, and conjugate prior of the categorical distribution
and multinomial distribution in Bayesian statistics. That is, its probability density function returns the belief that the
probabilities of K rival events are x_i given that each event has been observed α_i − 1 times.
The support of the Dirichlet distribution (i.e. the set of values for which the density is non-zero) is the set of
K-dimensional vectors x of real numbers in the range (0, 1), all of which sum to 1. These can be viewed as the
probabilities of a K-way categorical event. Another way to express this is that each point in the domain of the
Dirichlet distribution is itself a probability distribution, specifically a K-dimensional discrete distribution. Note that
the technical term for the set of points in the support of a K-dimensional Dirichlet distribution is the open standard
(K − 1)-simplex, which is a generalization of a triangle, embedded in the next-higher dimension. For example, with
K = 3, the support looks like an equilateral triangle embedded in a downward-angle fashion in three-dimensional
space, with vertices at (1,0,0), (0,1,0) and (0,0,1), i.e. touching each of the coordinate axes at a point 1 unit
away from the origin.
A very common special case is the symmetric Dirichlet distribution, where all of the elements making up the
vector α have the same value. In this case, the distribution can be parametrized by a single scalar value α, called
the concentration parameter. When this value is 1, the symmetric Dirichlet distribution is equivalent to a uniform
distribution over the open standard (K − 1)-simplex, i.e. it is uniform over all points in its support. Values
of the concentration parameter above 1 prefer variates that are dense, evenly-distributed distributions, i.e. all
probabilities returned are similar to each other. Values of the concentration parameter below 1 prefer sparse
distributions, i.e. most of the probabilities returned will be close to 0, and the vast majority of the mass will be
concentrated in a few of the probabilities.
The infinite-dimensional generalization of the Dirichlet distribution is the Dirichlet process.

Probability density function


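The Dirichlet distribution of order K ≥ 2 with parameters α1, ..., αK > 0 has a probability density function with
respect to Lebesgue measure on the Euclidean space R^(K–1) given by

$$f(x_1,\dots,x_{K-1};\alpha_1,\dots,\alpha_K) = \frac{1}{\mathrm{B}(\alpha)}\prod_{i=1}^{K} x_i^{\alpha_i - 1}$$

for all x1, ..., xK–1 > 0 satisfying x1 + ... + xK–1 < 1, where xK is an abbreviation for 1 – x1 – ... – xK–1. The density is
zero outside this open (K − 1)-dimensional simplex.
The normalizing constant is the multinomial beta function, which can be expressed in terms of the gamma function:

$$\mathrm{B}(\alpha) = \frac{\prod_{i=1}^{K}\Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K}\alpha_i\right)}, \qquad \alpha = (\alpha_1,\dots,\alpha_K).$$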
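
A density of the form (1/B(α)) ∏ x_i^(α_i − 1), with B(α) = ∏ Γ(α_i) / Γ(Σ α_i), is most safely evaluated on the log scale. The sketch below does this with `math.lgamma`; the function names and the tolerance used to check that the point lies on the simplex are assumptions made for illustration.

```python
import math


def log_multinomial_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_pdf(x, alpha, tol=1e-9):
    """Density at x = (x_1, ..., x_K); zero outside the open simplex."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(xi <= 0.0 for xi in x) or abs(sum(x) - 1.0) > tol:
        return 0.0
    log_density = sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_density - log_multinomial_beta(alpha))


uniform_density = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
beta_like_density = dirichlet_pdf([0.4, 0.6], [2.0, 3.0])
```

With α = (1, 1, 1) the density is constant and equal to Γ(3) = 2 everywhere on the simplex, and with K = 2 it reduces to the Beta(α1, α2) density.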

Properties
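Let X = (X1, …, XK) ~ Dir(α), meaning that the first K – 1 components have the above density and

$$X_K = 1 - X_1 - \cdots - X_{K-1}.$$

Define $\alpha_0 = \sum_{i=1}^{K}\alpha_i$. Then

$$\mathrm{E}[X_i] = \frac{\alpha_i}{\alpha_0}, \qquad \mathrm{Var}[X_i] = \frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)};$$

in fact, the marginals are Beta distributions:

$$X_i \sim \mathrm{Beta}(\alpha_i, \alpha_0 - \alpha_i).$$

Furthermore, if i ≠ j,

$$\mathrm{Cov}[X_i, X_j] = -\frac{\alpha_i\alpha_j}{\alpha_0^2(\alpha_0 + 1)}$$

(note that the matrix so defined is singular). The mode of the distribution is the vector (x1, ..., xK) with

$$x_i = \frac{\alpha_i - 1}{\alpha_0 - K}, \qquad \alpha_i > 1.$$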

Conjugate to multinomial
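The Dirichlet distribution is conjugate to the multinomial distribution in the following sense: if

$$X \sim \mathrm{Dir}(\alpha), \qquad \beta = (\beta_1,\dots,\beta_K) \mid X \sim \mathrm{Mult}(n, X),$$

where βi is the number of occurrences of i in a sample of n points from the discrete distribution on {1, ..., K} defined
by X, then

$$X \mid \beta \sim \mathrm{Dir}(\alpha + \beta).$$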

This relationship is used in Bayesian statistics to estimate the hidden parameters, X, of a categorical distribution
(discrete probability distribution) given a collection of n samples. Intuitively, if the prior is represented as Dir(α),
then Dir(α + β) is the posterior following a sequence of observations with histogram β.
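
In code, the conjugate update is just an element-wise addition of the observed histogram to the prior parameters. The small example below is a sketch with made-up data: a uniform prior Dir(1, 1, 1) and seven hypothetical observations over three categories labelled 0, 1, 2. The posterior mean uses the standard Dirichlet mean α_i / (α_1 + ... + α_K).

```python
def dirichlet_posterior(alpha, counts):
    """Posterior parameters alpha + beta for a Dir(alpha) prior and histogram beta."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + b for a, b in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean vector alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


prior = [1.0, 1.0, 1.0]                       # assumed uniform prior
observations = [0, 2, 2, 1, 2, 0, 2]          # hypothetical category labels
counts = [observations.count(i) for i in range(len(prior))]
posterior = dirichlet_posterior(prior, counts)
posterior_mean = dirichlet_mean(posterior)
```

Here the histogram is (2, 1, 4), so the posterior is Dir(3, 2, 5) with mean (0.3, 0.2, 0.5).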

Entropy
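If X is a Dir(α) random variable, then we can use the exponential family differential identities to get an analytic
expression for the expectation of log(X_i) and its associated covariance matrix:

$$\mathrm{E}[\log X_i] = \psi(\alpha_i) - \psi(\alpha_0)$$

and

$$\mathrm{Cov}[\log X_i, \log X_j] = \psi'(\alpha_i)\,\delta_{ij} - \psi'(\alpha_0),$$

where α0 = α1 + ... + αK, ψ is the digamma function, ψ′ is the trigamma function, and δ_ij is the Kronecker delta.
The formula for E[log X_i] yields the following formula for the information entropy of X:

$$H(X) = \log\mathrm{B}(\alpha) + (\alpha_0 - K)\,\psi(\alpha_0) - \sum_{j=1}^{K}(\alpha_j - 1)\,\psi(\alpha_j).$$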

Aggregation
If X = (X1, …, XK) ~ Dir(α1, …, αK), then X′ = (X1, …, Xi + Xj, …, XK) ~ Dir(α1, …, αi + αj, …, αK). This
aggregation property may be used to derive the marginal distribution of Xi mentioned above.

Neutrality
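If X = (X1, …, XK) ~ Dir(α), then the vector X is said to be neutral[1] in the sense that X1 is
independent of (X2/(1 − X1), X3/(1 − X1), …, XK/(1 − X1)) and similarly for X2, …, XK−1.
Observe that any permutation of X is also neutral (a property not possessed by samples drawn from a generalized
Dirichlet distribution).
The derivation of the neutrality property:
Let (X1, …, XK) ~ Dir(α1, …, αK). And let

$$Y_1 = X_1,\quad Y_2 = \frac{X_2}{1 - X_1},\quad Y_3 = \frac{X_3}{1 - X_1},\quad \dots,\quad Y_{K-1} = \frac{X_{K-1}}{1 - X_1}.$$

For the purpose of convenience, we set $Y_K = 1 - \sum_{i=2}^{K-1} Y_i$, so that $X_K = (1 - Y_1)\,Y_K$. Here we aim to derive that

$$(Y_2, \dots, Y_K)$$

also follows a Dirichlet distribution, namely Dir(α2, …, αK).

We start the derivation with a change of variables from (X1, …, XK−1) to (Y1, …, YK−1).
The Jacobian can be calculated easily:

$$\left|\frac{\partial(x_1,\dots,x_{K-1})}{\partial(y_1,\dots,y_{K-1})}\right| = (1 - y_1)^{K-2}.$$

Thus, the probability density function of (Y1, …, YK−1) is the following:

$$f(y_1,\dots,y_{K-1}) = \frac{1}{\mathrm{B}(\alpha)}\,y_1^{\alpha_1 - 1}\,(1-y_1)^{\sum_{i=2}^{K}\alpha_i - 1}\prod_{i=2}^{K} y_i^{\alpha_i - 1}.$$

From the above equation, it is clear that the derived probability density function is a joint distribution of
two independent parts, a Beta distributed part Y1 ~ Beta(α1, α2 + ⋯ + αK) and a Dirichlet distributed part
(Y2, …, YK) ~ Dir(α2, …, αK). Integrating out Y1 gives the result.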

Related distributions
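• If, for i ∈ {1, 2, …, K},

$$Y_i \sim \mathrm{Gamma}(\text{shape} = \alpha_i,\ \text{scale} = \theta) \text{ independently},$$

then

$$V = \sum_{i=1}^{K} Y_i \sim \mathrm{Gamma}\left(\text{shape} = \sum_{i=1}^{K}\alpha_i,\ \text{scale} = \theta\right)$$

and

$$(X_1, \dots, X_K) = (Y_1/V, \dots, Y_K/V) \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K).$$

Though the Xi's are not independent from one another, they can be seen to be generated from a set of K
independent gamma random variables. Unfortunately, since the sum V is lost in forming X, it is not possible
to recover the original gamma random variables from these values alone. Nevertheless, because independent
random variables are simpler to work with, this reparametrization can still be useful for proofs about properties
of the Dirichlet distribution.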
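The following is a derivation of the Dirichlet distribution from the Gamma distribution.
Let Yi, i = 1, 2, ..., K be independent variables following Gamma distributions with shape parameters αi and the
same scale parameter θ,

$$Y_i \sim \mathrm{Gamma}(\alpha_i, \theta), \qquad f(y_i) = \frac{y_i^{\alpha_i - 1} e^{-y_i/\theta}}{\Gamma(\alpha_i)\,\theta^{\alpha_i}};$$

then the joint distribution of Yi, i = 1, 2, ..., K is

$$f(y_1, \dots, y_K) = \prod_{i=1}^{K}\frac{y_i^{\alpha_i - 1} e^{-y_i/\theta}}{\Gamma(\alpha_i)\,\theta^{\alpha_i}}.$$

Through the change of variables, set

$$\gamma = \sum_{i=1}^{K} Y_i, \qquad X_1 = \frac{Y_1}{\gamma},\ X_2 = \frac{Y_2}{\gamma},\ \dots,\ X_{K-1} = \frac{Y_{K-1}}{\gamma}.$$

Then, it's easy to derive that

$$Y_1 = \gamma X_1,\ \dots,\ Y_{K-1} = \gamma X_{K-1},\qquad Y_K = \gamma\left(1 - \sum_{i=1}^{K-1} X_i\right).$$

Then, the Jacobian is

$$\left|\frac{\partial(y_1,\dots,y_K)}{\partial(\gamma, x_1,\dots,x_{K-1})}\right| = \gamma^{K-1}.$$

It means

$$f(\gamma, x_1, \dots, x_{K-1}) = f(y_1, \dots, y_K)\,\gamma^{K-1}.$$

So, writing x_K = 1 − x_1 − ... − x_{K−1} and α0 = α1 + ... + αK,

$$f(\gamma, x_1, \dots, x_{K-1}) = \gamma^{\alpha_0 - 1} e^{-\gamma/\theta}\,\theta^{-\alpha_0}\prod_{i=1}^{K}\frac{x_i^{\alpha_i - 1}}{\Gamma(\alpha_i)}.$$

By integrating out γ, we can get the Dirichlet distribution as the following.

$$f(x_1, \dots, x_{K-1}) = \theta^{-\alpha_0}\prod_{i=1}^{K}\frac{x_i^{\alpha_i - 1}}{\Gamma(\alpha_i)}\int_{0}^{\infty}\gamma^{\alpha_0 - 1} e^{-\gamma/\theta}\,d\gamma.$$

According to the Gamma distribution,

$$\int_{0}^{\infty}\gamma^{\alpha_0 - 1} e^{-\gamma/\theta}\,d\gamma = \Gamma(\alpha_0)\,\theta^{\alpha_0}.$$

Finally, we get the following Dirichlet distribution

$$f(x_1, \dots, x_{K-1}) = \frac{\Gamma(\alpha_0)}{\prod_{i=1}^{K}\Gamma(\alpha_i)}\prod_{i=1}^{K} x_i^{\alpha_i - 1},$$

where XK is (1 − X1 − X2 − ... − XK−1).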


• Multinomial opinions in subjective logic are equivalent to Dirichlet distributions.

Random number generation

Gamma distribution
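A fast method to sample a random vector x = (x1, …, xK) from the K-dimensional Dirichlet distribution with
parameters (α1, …, αK) follows immediately from this connection. First, draw K independent random samples
y1, …, yK from gamma distributions each with density

$$\mathrm{Gamma}(\alpha_i, 1) = \frac{y_i^{\alpha_i - 1} e^{-y_i}}{\Gamma(\alpha_i)},$$

and then set

$$x_i = \frac{y_i}{\sum_{j=1}^{K} y_j}.$$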
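
The gamma-normalization recipe (draw independent Gamma(α_i, 1) variates y_i and divide each by their sum) maps directly onto the standard library's `random.gammavariate(alpha, beta)`, whose second argument is the scale. The sketch below uses an illustrative function name, seed and parameter vector.

```python
import random
import statistics


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent gamma draws."""
    y = [rng.gammavariate(a, 1.0) for a in alpha]   # shape a, scale 1
    total = sum(y)
    return [v / total for v in y]


rng = random.Random(2024)
alpha = [6.0, 2.0, 2.0]
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]
component_means = [statistics.fmean(d[i] for d in draws) for i in range(len(alpha))]
```

Each draw sums to 1 by construction, and the component means should be close to α_i divided by the sum of the parameters, here (0.6, 0.2, 0.2).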

Marginal beta distributions


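A less efficient algorithm[2] relies on the univariate marginal and conditional distributions being beta and proceeds as
follows. Simulate x1 from a $\mathrm{Beta}\left(\alpha_1, \sum_{i=2}^{K}\alpha_i\right)$ distribution. Then simulate x2, …, xK−1 in order, as follows.

For j = 2, …, K − 1, simulate φj from a $\mathrm{Beta}\left(\alpha_j, \sum_{i=j+1}^{K}\alpha_i\right)$ distribution, and let $x_j = \left(1 - \sum_{i=1}^{j-1} x_i\right)\varphi_j$.

Finally, set $x_K = 1 - \sum_{i=1}^{K-1} x_i$.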
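
The sequential beta scheme can be sketched with `random.betavariate`: at step j a Beta(α_j, α_{j+1} + ... + α_K) fraction of the remaining mass is assigned to component j, and the last component receives whatever is left. The function name, seed and parameters below are illustrative assumptions.

```python
import random
import statistics


def sample_dirichlet_via_betas(alpha, rng):
    """Draw one Dirichlet(alpha) vector from successive beta-distributed fractions."""
    x = []
    remaining = 1.0                      # mass not yet assigned: 1 - sum of x so far
    for j in range(len(alpha) - 1):
        phi = rng.betavariate(alpha[j], sum(alpha[j + 1:]))
        x.append(remaining * phi)
        remaining -= x[-1]
    x.append(remaining)                  # final component takes the leftover mass
    return x


rng = random.Random(11)
alpha = [2.0, 3.0, 4.0]
draws = [sample_dirichlet_via_betas(alpha, rng) for _ in range(20000)]
component_means = [statistics.fmean(d[i] for d in draws) for i in range(len(alpha))]
```

The component means should again be close to α_i over the parameter sum, here (2/9, 3/9, 4/9), matching what the gamma-based method produces.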

Intuitive interpretations of the parameters

String cutting
One example use of the Dirichlet distribution is if one wanted to cut strings (each of initial length 1.0) into K pieces
with different lengths, where each piece had a designated average length, but allowing some variation in the relative
sizes of the pieces. The α/α0 values specify the mean lengths of the cut pieces of string resulting from the
distribution. The variance around this mean varies inversely with α0.

Pólya's urn
Consider an urn containing balls of K different colors. Initially, the urn contains α1 balls of color 1, α2 balls of color
2, and so on. Now perform N draws from the urn, where after each draw, the ball is placed back into the urn with an
additional ball of the same color. In the limit as N approaches infinity, the proportions of different colored balls in
the urn will be distributed as Dir(α1,...,αK).[3]
For a formal proof, note that the proportions of the different colored balls form a bounded [0,1]K-valued martingale,
hence by the martingale convergence theorem, these proportions converge almost surely and in mean to a limiting
random vector. To see that this limiting vector has the above Dirichlet distribution, check that all mixed moments
agree.
Note that each draw from the urn modifies the probability of drawing a ball of any one color from the urn in the
future. This modification diminishes with the number of draws, since the relative effect of adding a new ball to the
urn diminishes as the urn accumulates increasing numbers of balls. This "diminishing returns" effect can also help
explain how large α values yield Dirichlet distributions with most of the probability mass concentrated around a
single point on the simplex.
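
The urn scheme is easy to simulate for integer starting counts. The sketch below is illustrative only: the function name, seed, starting counts (2, 1, 1), the number of draws per run and the number of runs are all assumptions, and a finite number of draws only approximates the limiting distribution. It records the final proportion of the first colour over many independent runs.

```python
import random
import statistics


def polya_urn_proportions(initial_counts, draws, rng):
    """Run the urn: draw a ball, return it with one extra ball of the same colour."""
    counts = list(initial_counts)
    colours = range(len(counts))
    for _ in range(draws):
        drawn = rng.choices(colours, weights=counts)[0]
        counts[drawn] += 1
    total = sum(counts)
    return [c / total for c in counts]


rng = random.Random(3)
alpha = [2, 1, 1]
runs = [polya_urn_proportions(alpha, 200, rng) for _ in range(1000)]
first_colour = [r[0] for r in runs]
mean_first = statistics.fmean(first_colour)
var_first = statistics.variance(first_colour)
```

For Dir(2, 1, 1) the first component is Beta(2, 2) distributed, with mean 0.5 and variance 0.05, and the simulated proportions should show roughly those moments while varying widely from run to run.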

See also
• Beta distribution
• Binomial distribution
• Categorical distribution
• Generalized Dirichlet distribution
• Latent Dirichlet allocation
• Dirichlet process
• Multinomial distribution
• Multivariate Polya distribution

External links
• Dirichlet Distribution [4]
• Estimating the parameters of the Dirichlet distribution [5]
• Non-Uniform Random Variate Generation [6], Luc Devroye

References
[1] Connor, Robert J.; Mosimann, James E (1969). "Concepts of Independence for Proportions with a Generalization of the Dirichlet
Distribution" (http://jstor.org/stable/2283728). Journal of the American statistical association (American Statistical Association) 64 (325):
194–206. doi:10.2307/2283728.
[2] A. Gelman and J. B. Carlin and H. S. Stern and D. B. Rubin (2003). Bayesian Data Analysis (2nd ed.). pp. 582. ISBN 1-58488-388-X.
[3] Blackwell, David; MacQueen, James B. (1973). "Ferguson distributions via Polya urn schemes". Ann. Stat. 1 (2): 353–355.
doi:10.1214/aos/1176342372.
[4] http://www.cis.hut.fi/ahonkela/dippa/node95.html
[5] http://research.microsoft.com/~minka/papers/dirichlet/minka-dirichlet.pdf
[6] http://cg.scs.carleton.ca/~luc/rnbookindex.html

F-distribution
Fisher-Snedecor

Probability density function

Cumulative distribution function

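parameters:   d1 > 0, d2 > 0 (deg. of freedom)
support:      x ∈ [0, +∞)
pdf:          $\frac{\sqrt{\frac{(d_1 x)^{d_1}\,d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x\,\mathrm{B}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}$
cdf:          $I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)$
mean:         $\frac{d_2}{d_2 - 2}$ for d2 > 2
median:
mode:         $\frac{d_1 - 2}{d_1}\cdot\frac{d_2}{d_2 + 2}$ for d1 > 2
variance:     $\frac{2\,d_2^2\,(d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$ for d2 > 4
skewness:     $\frac{(2 d_1 + d_2 - 2)\sqrt{8 (d_2 - 4)}}{(d_2 - 6)\sqrt{d_1 (d_1 + d_2 - 2)}}$ for d2 > 6
ex.kurtosis:  see text
entropy:
mgf:          does not exist, raw moments defined elsewhere [1] [2]
cf:           defined elsewhere [1] [2]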

In probability theory and statistics, the F-distribution is a continuous probability distribution.[1] [2] [3] [4] It is also
known as Snedecor's F distribution or the Fisher-Snedecor distribution (after R.A. Fisher and George W.
Snedecor). The F-distribution arises frequently as the null distribution of a test statistic, especially in likelihood-ratio
tests, perhaps most notably in the analysis of variance; see F-test.

Characterization
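A random variate of the F-distribution arises as the ratio of two chi-squared variates:

$$\frac{U_1 / d_1}{U_2 / d_2}$$

where
• U1 and U2 have chi-square distributions with d1 and d2 degrees of freedom respectively, and
• U1 and U2 are independent (see Cochran's theorem for an application).
The probability density function of an F(d1, d2) distributed random variable is given by

$$f(x; d_1, d_2) = \frac{\sqrt{\dfrac{(d_1 x)^{d_1}\, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x\, \mathrm{B}\!\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}$$

for real x ≥ 0, where d1 and d2 are positive integers, and B is the beta function.

The cumulative distribution function is

$$F(x; d_1, d_2) = I_{\frac{d_1 x}{d_1 x + d_2}}\!\left(\frac{d_1}{2}, \frac{d_2}{2}\right),$$

where I is the regularized incomplete beta function.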
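As a rough illustration of the density of an F(d1, d2) variable, the sketch below evaluates $\sqrt{(d_1 x)^{d_1} d_2^{d_2} / (d_1 x + d_2)^{d_1 + d_2}} \,/\, (x\,\mathrm{B}(d_1/2, d_2/2))$ in log space using only the Python standard library. The function name `f_pdf` is made up for this example, and the choice to return 0 for every x ≤ 0 (including the boundary x = 0) is a simplifying assumption, not part of the definition.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F(d1, d2) distribution at x (hypothetical helper).

    Works in log space so that large degrees of freedom do not overflow.
    Assumption: the boundary x = 0 is treated like x < 0 and returns 0.
    """
    if x <= 0:
        return 0.0
    # log B(d1/2, d2/2) via log-gamma
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    # log of the square-root term in the numerator
    log_num = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                     - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_num - math.log(x) - log_beta)

# For d1 = d2 = 2 the density reduces to 1 / (1 + x)^2.
value_at_one = f_pdf(1.0, 2, 2)
```

For d1 = d2 = 2 the general formula collapses to 1/(1 + x)², which gives a quick sanity check of the implementation.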
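The expectation, variance, and other details about the F(d1, d2) distribution are given in the sidebox; for $d_2 > 8$, the excess kurtosis is

$$\gamma_2 = 12\,\frac{d_1 (5 d_2 - 22)(d_1 + d_2 - 2) + (d_2 - 4)(d_2 - 2)^2}{d_1 (d_2 - 6)(d_2 - 8)(d_1 + d_2 - 2)}.$$
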
The F-distribution is a particular parametrization of the beta prime distribution, which is also called the beta
distribution of the second kind.

Generalization
A generalization of the (central) F-distribution is the noncentral F-distribution.

Related distributions and properties


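• If $X \sim F(d_1, d_2)$ then $Y = \lim_{d_2 \to \infty} d_1 X$ has the chi-square distribution $\chi^2_{d_1}$.
• $F(d_1, d_2)$ is equivalent to the scaled Hotelling's T-square distribution $\dfrac{d_2}{d_1 (d_1 + d_2 - 1)}\, T^2(d_1, d_1 + d_2 - 1)$.

• If $X \sim F(d_1, d_2)$ then $X^{-1} \sim F(d_2, d_1)$.

• If $X \sim t(n)$ has a Student's t-distribution then $X^2 \sim F(1, n)$.

• If $X \sim F(d_1, d_2)$ and $Y = \dfrac{d_1 X / d_2}{1 + d_1 X / d_2}$ then $Y \sim \mathrm{Beta}(d_1/2, d_2/2)$ has a Beta-distribution.

• If $Q_X(p)$ is the quantile p for $X \sim F(d_1, d_2)$ and $Q_Y(p)$ is the quantile p for $Y \sim F(d_2, d_1)$, then $Q_X(p) = 1 / Q_Y(1 - p)$.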

External links
• Table of critical values of the F-distribution [5]
• Earliest Uses of Some of the Words of Mathematics: entry on F-distribution contains a brief history [6]

References
[1] Johnson, Norman Lloyd; Samuel Kotz, N. Balakrishnan (1995). Continuous Univariate Distributions, Volume 2 (Second Edition, Section 27). Wiley. ISBN 0-471-58494-0.
[2] Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26" (http://www.math.sfu.ca/~cbm/aands/page_946.htm), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover, p. 946, MR0167642, ISBN 978-0486612720.
[3] NIST (2006). Engineering Statistics Handbook - F Distribution (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3665.htm)
[4] Mood, Alexander; Franklin A. Graybill, Duane C. Boes (1974). Introduction to the Theory of Statistics (Third Edition, pp. 246–249). McGraw-Hill. ISBN 0-07-042864-6.
[5] http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm
[6] http://jeff560.tripod.com/f.html
Gamma distribution 37

Gamma distribution
Gamma

Probability density function

Cumulative distribution function

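parameters: $k > 0$ shape
$\theta > 0$ scale
support: $x \in [0, \infty)$
pdf: $x^{k-1}\, \dfrac{e^{-x/\theta}}{\Gamma(k)\, \theta^k}$

cdf: $\dfrac{\gamma(k, x/\theta)}{\Gamma(k)}$

mean: $k\theta$
median: no simple closed form
mode: $(k - 1)\theta$ for $k \geq 1$
variance: $k\theta^2$
skewness: $\dfrac{2}{\sqrt{k}}$

ex.kurtosis: $\dfrac{6}{k}$

entropy: $k + \ln\theta + \ln\Gamma(k) + (1 - k)\,\psi(k)$

mgf: $(1 - \theta t)^{-k}$ for $t < 1/\theta$
cf: $(1 - \theta i t)^{-k}$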

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. It has a scale parameter θ and a shape parameter k. If k is an integer, then the distribution represents an Erlang distribution, i.e., the sum of k independent exponentially distributed random variables, each of which has a mean of θ (which is equivalent to a rate parameter of $\theta^{-1}$).
The gamma distribution is frequently a probability model for waiting times; for instance, in life testing, the waiting time until death is a random variable that is frequently modeled with a gamma distribution.[1] Gamma distributions were fitted to rainfall amounts from different storms, and differences in amounts from seeded and unseeded storms were reflected in differences in the estimated k and θ parameters.[2]

Characterization
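A random variable X that is gamma-distributed with scale θ and shape k is denoted

$$X \sim \Gamma(k, \theta) \quad\text{or}\quad X \sim \mathrm{Gamma}(k, \theta).$$

Probability density function

The probability density function of the gamma distribution can be expressed in terms of the gamma function parameterized in terms of a shape parameter k and scale parameter θ. Both k and θ will be positive values.
The equation defining the probability density function of a gamma-distributed random variable x is

$$f(x; k, \theta) = x^{k-1}\, \frac{e^{-x/\theta}}{\theta^k\, \Gamma(k)} \quad\text{for } x \geq 0 \text{ and } k, \theta > 0.$$

(This parameterization is used in the infobox and the plots.)

Alternatively, the gamma distribution can be parameterized in terms of a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter:

$$g(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x} \quad\text{for } x \geq 0.$$

If α is a positive integer, then $\Gamma(\alpha) = (\alpha - 1)!$.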

Both parametrizations are common because either can be more convenient depending on the situation.

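Cumulative distribution function
The cumulative distribution function is the regularized gamma function:

$$F(x; k, \theta) = \int_0^x f(u; k, \theta)\, du = \frac{\gamma(k, x/\theta)}{\Gamma(k)},$$

where $\gamma(k, x/\theta)$ is the lower incomplete gamma function.

Figure: Illustration of the Gamma PDF for parameter values over k and x with θ set to 1, 2, 3, 4, 5 and 6. One can see each θ layer by itself here [3] as well as by k [4] and x [5].

It can also be expressed as follows, if k is a positive integer (i.e., the distribution is an Erlang distribution)[6]:

$$F(x; k, \theta) = 1 - \sum_{i=0}^{k-1} \frac{(x/\theta)^i}{i!}\, e^{-x/\theta} = e^{-x/\theta} \sum_{i=k}^{\infty} \frac{(x/\theta)^i}{i!}.$$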
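For a positive integer shape k, the cumulative distribution function reduces to the finite sum $1 - e^{-x/\theta}\sum_{i=0}^{k-1} (x/\theta)^i / i!$, which needs no special functions. The following is a minimal sketch of that sum; the function name `erlang_cdf` is an assumption chosen for this example.

```python
import math

def erlang_cdf(x, k, theta):
    """CDF of Gamma(k, theta) for a positive integer shape k (Erlang case).

    Evaluates 1 - exp(-x/theta) * sum_{i=0}^{k-1} (x/theta)^i / i!.
    """
    if x <= 0:
        return 0.0
    z = x / theta
    term = 1.0    # (z ** i) / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= z / i          # update to (z ** i) / i!
        total += term
    return 1.0 - math.exp(-z) * total

# With k = 1 this is the exponential CDF 1 - exp(-x / theta).
p = erlang_cdf(2.0, 1, 2.0)
```

Building each term from the previous one avoids computing factorials and powers separately, which keeps the loop cheap and reduces overflow risk for moderate k.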

Properties

Summation
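If Xi has a Γ(ki, θ) distribution for i = 1, 2, ..., N, then

$$\sum_{i=1}^{N} X_i \sim \Gamma\!\left(\sum_{i=1}^{N} k_i,\ \theta\right)$$

provided all Xi are independent.

The gamma distribution exhibits infinite divisibility.

Scaling
If

$$X \sim \Gamma(k, \theta),$$

then for any α > 0,

$$\alpha X \sim \Gamma(k, \alpha\theta).$$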

Exponential family
The Gamma distribution is a two-parameter exponential family with natural parameters k − 1 and −1/θ, and natural
statistics X and ln (X).

Information entropy
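The information entropy is given by

$$H(X) = k + \ln\theta + \ln\Gamma(k) + (1 - k)\,\psi(k),$$

where ψ(k) is the digamma function.
One can also show that (if we use the shape parameter k and the inverse scale parameter β),

$$\mathrm{E}[\ln X] = \psi(k) - \ln\beta.$$

Or alternately, using the scale parameter θ,

$$\mathrm{E}[\ln X] = \psi(k) + \ln\theta.$$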


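Kullback–Leibler divergence
The directed Kullback–Leibler divergence between Γ(α0, β0) ('true' distribution) and Γ(α, β) ('approximating' distribution), for shape parameter α and inverse scale parameter β, is given by

$$D_{\mathrm{KL}}(\alpha_0, \beta_0 \,\|\, \alpha, \beta) = (\alpha_0 - \alpha)\,\psi(\alpha_0) - \ln\Gamma(\alpha_0) + \ln\Gamma(\alpha) + \alpha\,(\ln\beta_0 - \ln\beta) + \alpha_0\,\frac{\beta - \beta_0}{\beta_0}.$$

Figure: Illustration of the Kullback–Leibler (KL) divergence for two Gamma PDFs. Here β = β0 + 1 which are set to 1, 2, 3, 4, 5 and 6. The typical asymmetry for the KL divergence is clearly visible.

Laplace transform
The Laplace transform of the gamma PDF is

$$F(s) = (1 + \theta s)^{-k} = \frac{\beta^{\alpha}}{(s + \beta)^{\alpha}}.$$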

Parameter estimation

Maximum likelihood estimation


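The likelihood function for N iid observations (x1, ..., xN) is

$$L(k, \theta) = \prod_{i=1}^{N} f(x_i; k, \theta),$$

from which we calculate the log-likelihood function

$$\ell(k, \theta) = (k - 1) \sum_{i=1}^{N} \ln x_i - \sum_{i=1}^{N} \frac{x_i}{\theta} - N k \ln\theta - N \ln\Gamma(k).$$

Finding the maximum with respect to θ by taking the derivative and setting it equal to zero yields the maximum likelihood estimator of the θ parameter:

$$\hat{\theta} = \frac{1}{kN} \sum_{i=1}^{N} x_i.$$

Substituting this into the log-likelihood function gives

$$\ell = (k - 1) \sum_{i=1}^{N} \ln x_i - N k - N k \ln\!\left(\frac{\sum_{i=1}^{N} x_i}{kN}\right) - N \ln\Gamma(k).$$

Finding the maximum with respect to k by taking the derivative and setting it equal to zero yields

$$\ln k - \psi(k) = \ln\!\left(\frac{1}{N} \sum_{i=1}^{N} x_i\right) - \frac{1}{N} \sum_{i=1}^{N} \ln x_i,$$

where

$$\psi(k) = \frac{\Gamma'(k)}{\Gamma(k)}$$

is the digamma function.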


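There is no closed-form solution for k. The function is numerically very well behaved, so if a numerical solution is desired, it can be found using, for example, Newton's method. An initial value of k can be found either using the method of moments, or using the approximation

$$\ln k - \psi(k) \approx \frac{1}{k}\left(\frac{1}{2} + \frac{1}{12k + 2}\right).$$

If we let

$$s = \ln\!\left(\frac{1}{N} \sum_{i=1}^{N} x_i\right) - \frac{1}{N} \sum_{i=1}^{N} \ln x_i,$$

then k is approximately

$$k \approx \frac{3 - s + \sqrt{(s - 3)^2 + 24 s}}{12 s},$$

which is within 1.5% of the correct value. An explicit form for the Newton-Raphson update of this initial guess is given by Choi and Wette (1969) as the following expression:

$$k \leftarrow k - \frac{\ln k - \psi(k) - s}{\dfrac{1}{k} - \psi'(k)},$$

where $\psi'$ denotes the trigamma function (the derivative of the digamma function).
The digamma and trigamma functions can be difficult to calculate with high precision. However, approximations known to be good to several significant figures can be computed using the following approximation formulae:

$$\psi(k) = \begin{cases} \ln k - \dfrac{1}{2k}\left(1 + \dfrac{1}{6k}\left(1 - \dfrac{1}{k^2}\left(\dfrac{1}{10} - \dfrac{1}{21 k^2}\right)\right)\right), & k \geq 8 \\[2ex] \psi(k + 1) - \dfrac{1}{k}, & k < 8 \end{cases}$$

and

$$\psi'(k) = \begin{cases} \dfrac{1}{k}\left(1 + \dfrac{1}{2k}\left(1 + \dfrac{1}{3k}\left(1 - \dfrac{1}{k^2}\left(\dfrac{1}{5} - \dfrac{1}{7 k^2}\right)\right)\right)\right), & k \geq 8 \\[2ex] \psi'(k + 1) + \dfrac{1}{k^2}, & k < 8 \end{cases}$$

For details, see Choi and Wette (1969).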
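The estimation procedure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the helper names `digamma`, `trigamma` and `fit_gamma_mle` are invented here, the digamma and trigamma values come from the standard asymptotic series combined with the recurrences ψ(k) = ψ(k + 1) − 1/k and ψ'(k) = ψ'(k + 1) + 1/k², and the stopping rule is an assumption.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: shift x up to >= 8, then use the series."""
    acc = 0.0
    while x < 8.0:
        acc -= 1.0 / x            # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 / x - series

def trigamma(x):
    """Approximate psi'(x) for x > 0 with the matching recurrence and series."""
    acc = 0.0
    while x < 8.0:
        acc += 1.0 / (x * x)      # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = (1.0 / x ** 3) * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))
    return acc + 1.0 / x + 0.5 * inv2 + series

def fit_gamma_mle(xs, tol=1e-10, max_iter=50):
    """Return (k, theta) maximizing the gamma likelihood for positive data xs.

    Assumes the data are not all identical (otherwise s = 0 below).
    """
    n = len(xs)
    mean = sum(xs) / n
    s = math.log(mean) - sum(math.log(x) for x in xs) / n
    # Closed-form initial guess, then Newton-Raphson on ln k - psi(k) = s.
    k = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(max_iter):
        step = (math.log(k) - digamma(k) - s) / (1.0 / k - trigamma(k))
        k -= step
        if abs(step) < tol * k:
            break
    return k, mean / k            # theta_hat = sample mean / k
```

Because the initial guess is already close to the solution, a handful of Newton steps is typically enough; the scale estimate then follows directly from the sample mean.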

Bayesian minimum mean-squared error


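With known k and unknown θ, the posterior PDF for θ (using the standard scale-invariant prior for θ) is

$$P(\theta \mid k, x_1, \dots, x_N) \propto \frac{1}{\theta} \prod_{i=1}^{N} f(x_i; k, \theta).$$

Denoting

$$y \equiv \sum_{i=1}^{N} x_i, \qquad P(\theta \mid k, x_1, \dots, x_N) = C(x_i)\, \theta^{-Nk - 1} e^{-y/\theta}.$$

Integration over θ can be carried out using a change of variables, revealing that 1/θ is gamma-distributed with parameters $\alpha = Nk,\ \beta = y$:

$$\int_0^{\infty} \theta^{-Nk - 1 + m} e^{-y/\theta}\, d\theta = \int_0^{\infty} x^{Nk - 1 - m} e^{-xy}\, dx = y^{-(Nk - m)}\, \Gamma(Nk - m).$$

The moments can be computed by taking the ratio (m by m = 0)

$$\mathrm{E}[\theta^m] = \frac{\Gamma(Nk - m)}{\Gamma(Nk)}\, y^m,$$

which shows that the mean ± standard deviation estimate of the posterior distribution for θ is

$$\frac{y}{Nk - 1} \pm \frac{y}{(Nk - 1)\sqrt{Nk - 2}}.$$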

Generating gamma-distributed random variables


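Given the scaling property above, it is enough to generate gamma variables with β = 1, as we can later convert to any value of β with simple division. (In this section the second argument of Γ is the rate β = 1/θ.)
Using the fact that a Γ(1, 1) distribution is the same as an Exp(1) distribution, and noting the method of generating exponential variables, we conclude that if U is uniformly distributed on (0, 1], then −ln U is distributed Γ(1, 1). Now, using the "α-addition" property of gamma distribution, we expand this result:

$$\sum_{i=1}^{n} -\ln U_i \sim \Gamma(n, 1),$$

where the $U_i$ are all uniformly distributed on (0, 1] and independent.

All that is left now is to generate a variable distributed as Γ(δ, 1) for 0 < δ < 1 and apply the "α-addition" property once more. This is the most difficult part.
We provide an algorithm without proof. It is an instance of the acceptance-rejection method:
1. Let m be 1.
2. Generate $V_{3m-2}$, $V_{3m-1}$ and $V_{3m}$ as independent variables uniformly distributed on (0, 1].
3. If $V_{3m-2} \leq v_0$, where $v_0 = \dfrac{e}{e + \delta}$, then go to step 4, else go to step 5.
4. Let $\xi_m = V_{3m-1}^{1/\delta}$, $\eta_m = V_{3m}\, \xi_m^{\delta - 1}$. Go to step 6.
5. Let $\xi_m = 1 - \ln V_{3m-1}$, $\eta_m = V_{3m}\, e^{-\xi_m}$.
6. If $\eta_m > \xi_m^{\delta - 1} e^{-\xi_m}$, then increment m and go to step 2.
7. Assume $\xi = \xi_m$ to be the realization of Γ(δ, 1).
Now, to summarize,

$$\frac{1}{\beta}\left(\xi - \sum_{i=1}^{\lfloor k \rfloor} \ln U_i\right) \sim \Gamma(k, \beta),$$

where $\lfloor k \rfloor$ is the integral part of k, and ξ has been generated using the algorithm above with δ = {k} (the fractional part of k), and the $U_i$ and $V_l$ are distributed as explained above and are all independent.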
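The generation scheme above (a sum of −ln U terms for the integer part of the shape plus one acceptance-rejection draw for the fractional part) can be sketched as follows. The function names are hypothetical, the final result is multiplied by a scale θ (equivalent to dividing by the rate 1/θ), and the acceptance test in step 6 is written in its reduced form: comparing $\eta_m$ with $\xi_m^{\delta-1} e^{-\xi_m}$ is the same as comparing the third uniform with $e^{-\xi_m}$ in the step 4 branch and with $\xi_m^{\delta-1}$ in the step 5 branch.

```python
import math
import random

def sample_gamma_fraction(delta, rng):
    """Draw from Gamma(delta, 1) for 0 < delta < 1 by acceptance-rejection."""
    v0 = math.e / (math.e + delta)
    while True:
        v1 = 1.0 - rng.random()     # uniform on (0, 1]
        v2 = 1.0 - rng.random()
        v3 = 1.0 - rng.random()
        if v1 <= v0:
            # Proposal on (0, 1) with density proportional to x^(delta - 1)
            xi = v2 ** (1.0 / delta)
            accept = v3 <= math.exp(-xi)
        else:
            # Proposal on (1, inf) with density proportional to exp(-x)
            xi = 1.0 - math.log(v2)
            accept = v3 <= xi ** (delta - 1.0)
        if accept:
            return xi

def sample_gamma(k, theta, rng=random):
    """Draw one Gamma(shape k, scale theta) variate (illustrative sketch)."""
    n = int(math.floor(k))
    delta = k - n
    # Integer part: sum of n independent Exp(1) variates, each -ln(U).
    total = -sum(math.log(1.0 - rng.random()) for _ in range(n))
    if delta > 0.0:
        total += sample_gamma_fraction(delta, rng)
    return theta * total

rng = random.Random(12345)
draws = [sample_gamma(2.5, 2.0, rng) for _ in range(1000)]
```

In practice Python's own `random.gammavariate` would normally be used; the sketch is only meant to mirror the steps listed in the text.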

Related distributions

Specializations
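• If $X \sim \Gamma(k = 1, \theta = 1/\lambda)$, then X has an exponential distribution with rate parameter λ.
• If $X \sim \Gamma(k = \nu/2, \theta = 2)$, then X is identical to χ2(ν), the chi-square distribution with ν degrees of freedom. Conversely, if $Q \sim \chi^2(\nu)$ and c is a positive constant, then $c \cdot Q \sim \Gamma(k = \nu/2, \theta = 2c)$.
• If k is an integer, the gamma distribution is an Erlang distribution and is the probability distribution of the waiting time until the k-th "arrival" in a one-dimensional Poisson process with intensity 1/θ.
• If $X^2 \sim \Gamma(3/2, 2a^2)$, then X has a Maxwell-Boltzmann distribution with parameter a.
• If $X \sim \mathrm{SkewLogistic}(\theta)$, then $\log(1 + e^{-X}) \sim \mathrm{Exponential}(\theta)$ with rate θ, i.e. an exponential distribution: see skew-logistic distribution.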

Conjugate prior
In Bayesian inference, the gamma distribution is the conjugate prior to many likelihood distributions: the Poisson,
exponential, normal (with known mean), Pareto, gamma with known shape σ, and inverse gamma with known shape
parameter.
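The Gamma distribution's conjugate prior is [7]:

$$p(k, \theta \mid p, q, r, s) = \frac{1}{Z}\, \frac{p^{k-1}\, e^{-\theta^{-1} q}}{\Gamma(k)^r\, \theta^{k s}},$$

where Z is the normalizing constant, which has no closed form solution. The posterior distribution can be found by updating the parameters as follows:

$$p' = p \prod_i x_i, \qquad q' = q + \sum_i x_i, \qquad r' = r + n, \qquad s' = s + n,$$

where n is the number of observations, and $x_i$ is the i-th observation.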

Others
• If X has a Γ(k, θ) distribution, then 1/X has an inverse-gamma distribution with parameters k and $\theta^{-1}$.
• If X and Y are independently distributed Γ(α, θ) and Γ(β, θ) respectively, then X / (X + Y) has a beta distribution
with parameters α and β.
• If Xi are independently distributed Γ(αi,θ) respectively, then the vector (X1 / S, ..., Xn / S), where S = X1 + ... + Xn,
follows a Dirichlet distribution with parameters α1, ..., αn.
• For large k the gamma distribution converges to a Gaussian distribution with mean $\mu = k\theta$ and variance $\sigma^2 = k\theta^2$.
• The Gamma distribution is the conjugate prior for the precision of the normal distribution with known mean.
• The Wishart distribution is a multivariate generalization of the gamma distribution (samples are positive-definite
matrices rather than positive real numbers).
• The Gamma distribution is a special case of the generalized gamma distribution.
• Among the discrete distributions, the negative binomial distribution is sometimes considered the discrete analogue of the Gamma distribution.

Applications
The gamma distribution has been used to model the size of insurance claims and rainfalls. This means aggregate
insurance claims and the amount of rainfall accumulated in a reservoir are modelled by a gamma process. The
gamma distribution is also used to model errors in multi-level Poisson regression models, because the combination
of the Poisson distribution and a gamma distribution is a negative binomial distribution.

See also
• Gamma process
• Lukacs's proportion-sum independence theorem

References
• R. V. Hogg and A. T. Craig. Introduction to Mathematical Statistics, 4th edition. New York: Macmillan, 1978.
(See Section 3.3.)
• Weisstein, Eric W., "Gamma distribution [8]" from MathWorld.
• Engineering Statistics Handbook [9]
• S. C. Choi and R. Wette. (1969) Maximum Likelihood Estimation of the Parameters of the Gamma Distribution
and Their Bias, Technometrics, 11(4) 683–690

References
[1] See Hogg and Craig Remark 3.3.1. for an explicit motivation.
[2] Rice, John (1995), Mathematical Statistics and Data Analysis (Second ed.), Duxbury Press, p. 244, ISBN 0-534-20934-3
[3] http://commons.wikimedia.org/wiki/File:Gamma-PDF-3D-by-k.png
[4] http://commons.wikimedia.org/wiki/File:Gamma-PDF-3D-by-Theta.png
[5] http://commons.wikimedia.org/wiki/File:Gamma-PDF-3D-by-x.png
[6] Papoulis, Pillai, Probability, Random Variables, and Stochastic Processes, Fourth Edition
[7] Fink, D. 1995 A Compendium of Conjugate Priors (http://www.stat.columbia.edu/~cook/movabletype/mlm/CONJINTRnew+TEX.pdf). In progress report: Extension and enhancement of methods for setting data quality objectives. (DOE contract 95‑831).
[8] http://mathworld.wolfram.com/GammaDistribution.html
[9] http://www.itl.nist.gov/div898/handbook/eda/section3/eda366b.htm
Exponential distribution 45

Exponential distribution
Exponential

Probability density function

Cumulative distribution function

parameters: λ > 0 rate, or inverse scale

support: x ∈ [0, ∞)
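pdf: $\lambda e^{-\lambda x}$
cdf: $1 - e^{-\lambda x}$
mean: $\lambda^{-1}$
median: $\lambda^{-1} \ln 2$
mode: 0
variance: $\lambda^{-2}$
skewness: 2
ex.kurtosis: 6
entropy: $1 - \ln(\lambda)$
mgf: $\left(1 - \dfrac{t}{\lambda}\right)^{-1}$ for $t < \lambda$

cf: $\left(1 - \dfrac{it}{\lambda}\right)^{-1}$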

In probability theory and statistics, the exponential distributions (a.k.a. negative exponential distributions) are a
class of continuous probability distributions. They describe the times between events in a Poisson process, i.e. a
process in which events occur continuously and independently at a constant average rate.

Characterization

Probability density function


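The probability density function (pdf) of an exponential distribution is

$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0, \\ 0, & x < 0. \end{cases}$$

Here λ > 0 is the parameter of the distribution, often called the rate parameter. The distribution is supported on the interval [0, ∞). If a random variable X has this distribution, we write X ~ Exp(λ).

Cumulative distribution function

The cumulative distribution function is given by

$$F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x}, & x \geq 0, \\ 0, & x < 0. \end{cases}$$

Alternative parameterization
A commonly used alternative parameterization is to define the probability density function (pdf) of an exponential distribution as

$$f(x; \beta) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta}, & x \geq 0, \\ 0, & x < 0, \end{cases}$$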
where β > 0 is a scale parameter of the distribution and is the reciprocal of the rate parameter, λ, defined above. In this specification, β is a survival parameter in the sense that if a random variable X is the duration of time that a given biological or mechanical system manages to survive and X ~ Exponential(β) then E[X] = β. That is to say, the expected duration of survival of the system is β units of time. The parameterisation involving the "rate" parameter arises in the context of events arriving at a rate λ, when the time between events (which might be modelled using an exponential distribution) has a mean of $\beta = \lambda^{-1}$.
The alternative specification is sometimes more convenient than the one given above, and some authors will use it as a standard definition. This alternative specification is not used here. Unfortunately this gives rise to a notational ambiguity. In general, the reader must check which of these two specifications is being used if an author writes "X ~ Exponential(λ)", since either the notation in the previous section (using λ) or the notation in this section (here, using β to avoid confusion) could be intended.

Occurrence and applications


The exponential distribution occurs naturally when describing the lengths of the inter-arrival times in a homogeneous
Poisson process.
The exponential distribution may be viewed as a continuous counterpart of the geometric distribution, which
describes the number of Bernoulli trials necessary for a discrete process to change state. In contrast, the exponential
distribution describes the time for a continuous process to change state.
In real-world scenarios, the assumption of a constant rate (or probability per unit time) is rarely satisfied. For
example, the rate of incoming phone calls differs according to the time of day. But if we focus on a time interval
during which the rate is roughly constant, such as from 2 to 4 p.m. during work days, the exponential distribution can
be used as a good approximate model for the time until the next phone call arrives. Similar caveats apply to the
following examples which yield approximately exponentially distributed variables:

• The time until a radioactive particle decays, or the time between clicks of a geiger counter
• The time it takes before your next telephone call
• The time until default (on payment to company debt holders) in reduced form credit risk modeling
Exponential variables can also be used to model situations where certain events occur with a constant probability per
unit length, such as the distance between mutations on a DNA strand, or between roadkills on a given road.
In queuing theory, the service times of agents in a system (e.g. how long it takes for a bank teller etc. to serve a
customer) are often modeled as exponentially distributed variables. (The inter-arrival of customers for instance in a
system is typically modeled by the Poisson distribution in most management science textbooks.) The length of a
process that can be thought of as a sequence of several independent tasks is better modeled by a variable following
the Erlang distribution (which is the distribution of the sum of several independent exponentially distributed
variables).
Reliability theory and reliability engineering also make extensive use of the exponential distribution. Because of the
memoryless property of this distribution, it is well-suited to model the constant hazard rate portion of the bathtub
curve used in reliability theory. It is also very convenient because it is so easy to add failure rates in a reliability
model. The exponential distribution is however not appropriate to model the overall lifetime of organisms or
technical devices, because the "failure rates" here are not constant: more failures occur for very young and for very
old systems.
In physics, if you observe a gas at a fixed temperature and pressure in a uniform gravitational field, the heights of the
various molecules also follow an approximate exponential distribution. This is a consequence of the entropy property
mentioned below.

Properties

Mean, variance, and median


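The mean or expected value of an exponentially distributed random variable X with rate parameter λ is given by

$$\mathrm{E}[X] = \frac{1}{\lambda}.$$

In light of the examples given above, this makes sense: if you receive phone calls at an average rate of 2 per hour, then you can expect to wait half an hour for every call.
The variance of X is given by

$$\mathrm{Var}[X] = \frac{1}{\lambda^2}.$$

The median of X is given by

$$\mathrm{m}[X] = \frac{\ln 2}{\lambda} < \mathrm{E}[X],$$

where ln refers to the natural logarithm. Thus the absolute difference between the mean and median is

$$\left|\mathrm{E}[X] - \mathrm{m}[X]\right| = \frac{1 - \ln 2}{\lambda} < \frac{1}{\lambda} = \text{standard deviation},$$

in accordance with the median-mean inequality.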



Memorylessness
An important property of the exponential distribution is that it is memoryless. This means that if a random variable T is exponentially distributed, its conditional probability obeys

$$\Pr(T > s + t \mid T > s) = \Pr(T > t) \quad\text{for all } s, t \geq 0.$$

This says that the conditional probability that we need to wait, for example, more than another 10 seconds before the first arrival, given that the first arrival has not yet happened after 30 seconds, is equal to the initial probability that we need to wait more than 10 seconds for the first arrival. So, if we waited for 30 seconds and the first arrival didn't happen (T > 30), the probability that we'll need to wait another 10 seconds for the first arrival (T > 30 + 10) is the same as the initial probability that we need to wait more than 10 seconds for the first arrival (T > 10). This is often misunderstood by students taking courses on probability: the fact that Pr(T > 40 | T > 30) = Pr(T > 10) does not mean that the events T > 40 and T > 30 are independent.
To summarize: "memorylessness" of the probability distribution of the waiting time T until the first arrival means

$$\Pr(T > 40 \mid T > 30) = \Pr(T > 10).$$

It does not mean

$$\Pr(T > 40 \mid T > 30) = \Pr(T > 40).$$

(That would be independence. These two events are not independent.)

The exponential distributions and the geometric distributions are the only memoryless probability distributions.
The exponential distribution is consequently also necessarily the only continuous probability distribution that has a constant failure rate.
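The distinction between memorylessness and independence can be checked numerically from the survival function Pr(T > t) = e^{−λt}. The snippet below is a small sketch; the rate `lam = 0.05` per second and the helper name `survival` are arbitrary choices made for illustration.

```python
import math

def survival(t, lam):
    """Pr(T > t) for T ~ Exp(lam); equals 1 for t <= 0."""
    return math.exp(-lam * t) if t > 0 else 1.0

lam = 0.05  # hypothetical rate, in arrivals per second

# Pr(T > 40 | T > 30) = Pr(T > 40 and T > 30) / Pr(T > 30) = Pr(T > 40) / Pr(T > 30)
p_cond = survival(40, lam) / survival(30, lam)
p_10 = survival(10, lam)   # memorylessness: this matches p_cond
p_40 = survival(40, lam)   # independence would require this to match p_cond

print(p_cond, p_10, p_40)
```

The conditional probability agrees with Pr(T > 10) but not with Pr(T > 40), which is exactly the point made in the text.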

Quartiles
The quantile function (inverse cumulative distribution function) for Exponential(λ) is

$$F^{-1}(p; \lambda) = \frac{-\ln(1 - p)}{\lambda}$$

for 0 ≤ p < 1. The quartiles are therefore:

• first quartile: ln(4/3)/λ
• median: ln(2)/λ
• third quartile: ln(4)/λ
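A short sketch of the quantile function, which reproduces the three quartiles listed above. The name `exp_quantile` is an assumption for this example; `math.log1p(-p)` is used in place of `math.log(1 - p)` only because it is slightly more accurate for small p.

```python
import math

def exp_quantile(p, lam):
    """Quantile function of Exponential(lam): -ln(1 - p) / lam for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log1p(-p) / lam

lam = 2.0  # hypothetical rate
quartiles = [exp_quantile(p, lam) for p in (0.25, 0.5, 0.75)]
```

With λ = 2 the three values correspond to ln(4/3)/2, ln(2)/2 and ln(4)/2.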

Kullback–Leibler divergence
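The directed Kullback–Leibler divergence between Exp(λ0) ('true' distribution) and Exp(λ) ('approximating' distribution) is given by

$$\Delta(\lambda_0 \,\|\, \lambda) = \log(\lambda_0) - \log(\lambda) + \frac{\lambda}{\lambda_0} - 1.$$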

Maximum entropy distribution


Among all continuous probability distributions with support [0,∞) and mean μ, the exponential distribution with λ =
1/μ has the largest entropy.

Distribution of the minimum of exponential random variables


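Let X1, ..., Xn be independent exponentially distributed random variables with rate parameters λ1, ..., λn. Then

$$\min\{X_1, \dots, X_n\}$$

is also exponentially distributed, with parameter

$$\lambda = \lambda_1 + \cdots + \lambda_n.$$

This can be seen by considering the complementary cumulative distribution function:

$$\Pr(\min\{X_1, \dots, X_n\} > x) = \Pr(X_1 > x \text{ and } \dots \text{ and } X_n > x) = \prod_{i=1}^{n} \Pr(X_i > x) = \prod_{i=1}^{n} \exp(-x \lambda_i) = \exp\!\left(-x \sum_{i=1}^{n} \lambda_i\right).$$

The index of the variable which achieves the minimum is distributed according to the law

$$\Pr(X_k = \min\{X_1, \dots, X_n\}) = \frac{\lambda_k}{\lambda_1 + \cdots + \lambda_n}.$$

Note that

$$\max\{X_1, \dots, X_n\}$$

is not exponentially distributed.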
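Both statements about the minimum (its rate is the sum of the rates, and variable k attains the minimum with probability λk divided by that sum) can be checked with a quick Monte Carlo experiment. This is only a sketch: the rates, the seed and the helper name `simulate_minimum` are arbitrary, and the results are random estimates rather than exact values.

```python
import random

def simulate_minimum(rates, n_draws, rng):
    """Estimate E[min X_i] and the frequency with which each index is the minimum."""
    wins = [0] * len(rates)
    total = 0.0
    for _ in range(n_draws):
        draws = [rng.expovariate(r) for r in rates]
        smallest = min(draws)
        wins[draws.index(smallest)] += 1
        total += smallest
    return total / n_draws, [w / n_draws for w in wins]

rates = [1.0, 2.0, 3.0]            # hypothetical rate parameters
rng = random.Random(7)
mean_min, freqs = simulate_minimum(rates, 10000, rng)

# Theory: mean of the minimum is 1 / sum(rates); index k wins with rates[k] / sum(rates).
expected_mean = 1.0 / sum(rates)
expected_freqs = [r / sum(rates) for r in rates]
```

Comparing `mean_min` with `expected_mean` and `freqs` with `expected_freqs` should show agreement up to sampling noise.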

Parameter estimation
Suppose a given variable is exponentially distributed and the rate parameter λ is to be estimated.

Maximum likelihood
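The likelihood function for λ, given an independent and identically distributed sample x = (x1, ..., xn) drawn from the variable, is

$$L(\lambda) = \prod_{i=1}^{n} \lambda \exp(-\lambda x_i) = \lambda^n \exp\!\left(-\lambda \sum_{i=1}^{n} x_i\right) = \lambda^n \exp(-\lambda n \bar{x}),$$

where

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

is the sample mean.

The derivative of the likelihood function's logarithm is

$$\frac{d}{d\lambda} \ln L(\lambda) = \frac{d}{d\lambda}\left(n \ln\lambda - \lambda n \bar{x}\right) = \frac{n}{\lambda} - n \bar{x},$$

which is positive for $0 < \lambda < 1/\bar{x}$, zero at $\lambda = 1/\bar{x}$, and negative for $\lambda > 1/\bar{x}$.
Consequently the maximum likelihood estimate for the rate parameter is

$$\hat{\lambda} = \frac{1}{\bar{x}}.$$

While this estimate is the most likely reconstruction of the true parameter λ, it is only an estimate, and the more data points are available the better the estimate will be. One can compute an exact confidence interval, that is, a confidence interval that is valid for any number of samples, not just large ones. Because $2 n \lambda / \hat{\lambda} = 2 \lambda \sum_i x_i$ has a chi-squared distribution with 2n degrees of freedom, the 100(1 − α)% exact confidence interval for this estimate is given by[1]

$$\frac{1}{\hat{\lambda}}\, \frac{2n}{\chi^2_{2n;\, 1 - \alpha/2}} < \frac{1}{\lambda} < \frac{1}{\hat{\lambda}}\, \frac{2n}{\chi^2_{2n;\, \alpha/2}},$$

where $\hat{\lambda}$ is the MLE estimate, λ is the true value of the parameter, and $\chi^2_{k;\, x}$ is the value of the chi-squared distribution with k degrees of freedom that gives x cumulative probability (i.e. the value found in chi-squared tables [2]).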

Bayesian inference
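The conjugate prior for the exponential distribution is the gamma distribution (of which the exponential distribution is a special case). The following parameterization of the gamma pdf is useful:

$$\mathrm{Gamma}(\lambda; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} \exp(-\lambda\beta).$$

The posterior distribution p can then be expressed in terms of the likelihood function defined above and a gamma prior:

$$p(\lambda) \propto L(\lambda) \times \mathrm{Gamma}(\lambda; \alpha, \beta) = \lambda^n \exp(-\lambda n \bar{x}) \times \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} \exp(-\lambda\beta) \propto \lambda^{(\alpha + n) - 1} \exp\!\left(-\lambda(\beta + n\bar{x})\right).$$

Now the posterior density p has been specified up to a missing normalizing constant. Since it has the form of a gamma pdf, this can easily be filled in, and one obtains

$$p(\lambda) = \mathrm{Gamma}(\lambda; \alpha + n, \beta + n\bar{x}).$$

Here the parameter α can be interpreted as the number of prior observations, and β as the sum of the prior observations.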
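The conjugate update amounts to adding the number of observations to α and the sum of the observations to β. A minimal sketch, with a made-up prior and data set and a hypothetical helper name `posterior_params`:

```python
def posterior_params(alpha, beta, xs):
    """Gamma(alpha, beta) prior on the rate -> Gamma(alpha + n, beta + sum(xs)) posterior."""
    n = len(xs)
    return alpha + n, beta + sum(xs)   # sum(xs) equals n times the sample mean

# Hypothetical prior (alpha = 2, beta = 1) and three observed waiting times.
alpha_post, beta_post = posterior_params(2.0, 1.0, [0.5, 1.5, 1.0])

# Posterior mean of the rate for a shape/rate gamma is alpha / beta.
posterior_mean_rate = alpha_post / beta_post
```

This matches the interpretation in the text: the prior behaves like α earlier observations whose values sum to β.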

Prediction
Having observed a sample of n data points from an unknown exponential distribution, a common task is to use these samples to make predictions about future data from the same source. A common predictive distribution over future samples is the so-called plug-in distribution, formed by plugging a suitable estimate for the rate parameter λ into the exponential density function. A common choice of estimate is the one provided by the principle of maximum likelihood, and using this yields the predictive density over a future sample $x_{n+1}$, conditioned on the observed samples x = (x1, ..., xn), given by

$$p_{\mathrm{ML}}(x_{n+1} \mid x_1, \dots, x_n) = \frac{1}{\bar{x}} \exp\!\left(-\frac{x_{n+1}}{\bar{x}}\right).$$

The Bayesian approach provides a predictive distribution which takes into account the uncertainty of the estimated parameter, although this may depend crucially on the choice of prior. A recent alternative that is free of the issues of choosing priors is the Conditional Normalized Maximum Likelihood (CNML) predictive distribution [3]

$$p_{\mathrm{CNML}}(x_{n+1} \mid x_1, \dots, x_n) = \frac{n^{n+1}\, \bar{x}^{\,n}}{(n\bar{x} + x_{n+1})^{n+1}}.$$

The accuracy of a predictive distribution may be measured using the distance or divergence between the true exponential distribution with rate parameter λ0 and the predictive distribution based on the sample x. The Kullback–Leibler divergence is a commonly used, parameterisation-free measure of the difference between two distributions. Letting Δ(λ0||p) denote the Kullback–Leibler divergence between an exponential with rate parameter λ0 and a predictive distribution p, it can be shown that

$$\mathrm{E}_{\lambda_0}\!\left[\Delta(\lambda_0 \,\|\, p_{\mathrm{ML}})\right] = \psi(n) + \frac{1}{n - 1} - \log(n),$$
$$\mathrm{E}_{\lambda_0}\!\left[\Delta(\lambda_0 \,\|\, p_{\mathrm{CNML}})\right] = \psi(n) + \frac{1}{n} - \log(n),$$

where the expectation is taken with respect to the exponential distribution with rate parameter λ0 ∈ (0, ∞), and ψ( · ) is the digamma function. It is clear that the CNML predictive distribution is strictly superior to the maximum likelihood plug-in distribution in terms of average Kullback–Leibler divergence for all sample sizes n > 0.

Generating exponential variates


A conceptually very simple method for generating exponential variates is based on inverse transform sampling: Given a random variate U drawn from the uniform distribution on the unit interval (0, 1), the variate

$$T = F^{-1}(U)$$

has an exponential distribution, where $F^{-1}$ is the quantile function, defined by

$$F^{-1}(p) = \frac{-\ln(1 - p)}{\lambda}.$$

Moreover, if U is uniform on (0, 1), then so is 1 − U. This means one can generate exponential variates as follows:

$$T = \frac{-\ln U}{\lambda}.$$

Other methods for generating exponential variates are discussed by Knuth[4] and Devroye.[5]
The ziggurat algorithm is a fast method for generating exponential variates.
A fast method for generating a set of ready-ordered exponential variates without using a sorting routine is also
available.[5]
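Inverse transform sampling for the exponential distribution takes only a couple of lines. The sketch below draws U on (0, 1] (so the logarithm is always defined) and returns −ln(U)/λ; the function name `sample_exponential` and the seed are illustrative assumptions, and Python's built-in `random.expovariate` offers the same functionality.

```python
import math
import random

def sample_exponential(lam, rng=random):
    """Draw one Exponential(lam) variate by inverse transform sampling."""
    u = 1.0 - rng.random()      # uniform on (0, 1]; avoids log(0)
    return -math.log(u) / lam

rng = random.Random(1)
samples = [sample_exponential(4.0, rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)   # should be close to 1 / 4
```

The sample mean should be near 1/λ, which is a simple way to sanity-check the generator.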

Related distributions
• An exponential distribution is a special case of a gamma distribution with α = 1 (or k = 1 depending on the
parameter set used).
• Both an exponential distribution and a gamma distribution are special cases of the phase-type distribution.
• Y ∼ Weibull(γ, λ), i.e. Y has a Weibull distribution, if $Y = X^{1/\gamma}$ and $X \sim \mathrm{Exponential}(\lambda^{-\gamma})$. In particular, every exponential distribution is also a Weibull distribution.
• Y ∼ Rayleigh(σ), i.e. Y has a Rayleigh distribution, if $Y = \sqrt{2 X \sigma^2 \lambda}$ and X ∼ Exponential(λ).
• Y ∼ Gumbel(μ, β), i.e. Y has a Gumbel distribution if Y = μ − β log(Xλ) and X ∼ Exponential(λ).
• Y ∼ Laplace, i.e. Y has a Laplace distribution, if Y = X1 − X2 for two independent exponential distributions X1 and X2.
• Y ∼ Exponential, i.e. Y has an exponential distribution if Y = min(X1, …, XN) for independent exponential distributions Xi.
• Y ∼ Uniform(0, 1), i.e. Y has a uniform distribution if Y = exp(−Xλ) and X ∼ Exponential(λ).
• $X \sim \chi^2_2$, i.e. X has a chi-square distribution with 2 degrees of freedom, if $X \sim \mathrm{Exponential}(\lambda = 1/2)$.
• Let X1…Xn ∼ Exponential(λ) be exponentially distributed and independent and $Y = \sum_{i=1}^{n} X_i$. Then Y ∼ Gamma(n, 1/λ), i.e. Y has a Gamma distribution.
• If X ∼ SkewLogistic(θ), then $\log(1 + e^{-X}) \sim \mathrm{Exponential}(\theta)$: see skew-logistic distribution.
• Let X ∼ Exponential(λX) and Y ∼ Exponential(λY) be independent. Then $Z = \dfrac{\lambda_X X}{\lambda_Y Y}$ has probability density function $f_Z(z) = \dfrac{1}{(z + 1)^2}$. This can be used to obtain a confidence interval for $\dfrac{\lambda_X}{\lambda_Y}$.

Other related distributions:


• Hyper-exponential distribution – the distribution whose density is a weighted sum of exponential densities.
• Hypoexponential distribution – the distribution of a general sum of exponential random variables.
• exGaussian distribution – the sum of an exponential distribution and a normal distribution.

See also
• Dead time – an application of exponential distribution to particle detector analysis.

References
[1] K. S. Trivedi, Probability and Statistics with Reliability, Queueing and Computer Science applications, Chapter 10 Statistical Inference, http://www.ee.duke.edu/~kst/BLUEppt/chap10f_secure.pdf
[2] http://www.unc.edu/~farkouh/usefull/chi.html
[3] D. F. Schmidt and E. Makalic, "Universal Models for the Exponential Distribution", IEEE Transactions on Information Theory, Volume 55, Number 7, pp. 3087–3090, 2009. doi:10.1109/TIT.2009.2018331
[4] Donald E. Knuth (1998). The Art of Computer Programming, volume 2: Seminumerical Algorithms, 3rd edn. Boston: Addison–Wesley. ISBN 0-201-89684-2. See section 3.4.1, p. 133.
[5] Luc Devroye (1986). Non-Uniform Random Variate Generation (http://cg.scs.carleton.ca/~luc/rnbookindex.html). New York: Springer-Verlag. ISBN 0-387-96305-7. See chapter IX (http://cg.scs.carleton.ca/~luc/chapter_nine.pdf), section 2, pp. 392–401.
Erlang distribution 53

Erlang distribution
Erlang

Probability density function

Cumulative distribution function

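parameters: $k \in \{1, 2, 3, \dots\}$ shape
$\lambda > 0$ rate (real)
alt.: $\theta = 1/\lambda > 0$ scale (real)
support: $x \in [0, \infty)$
pdf: $\dfrac{\lambda^k x^{k-1} e^{-\lambda x}}{(k - 1)!}$

cdf: $\dfrac{\gamma(k, \lambda x)}{(k - 1)!} = 1 - \sum_{n=0}^{k-1} e^{-\lambda x}\, \dfrac{(\lambda x)^n}{n!}$

mean: $\dfrac{k}{\lambda}$
median: no simple closed form
mode: $\dfrac{k - 1}{\lambda}$ for $k \geq 1$
variance: $\dfrac{k}{\lambda^2}$
skewness: $\dfrac{2}{\sqrt{k}}$

ex.kurtosis: $\dfrac{6}{k}$

entropy: $(1 - k)\,\psi(k) + \ln\dfrac{\Gamma(k)}{\lambda} + k$

mgf: $\left(1 - \dfrac{t}{\lambda}\right)^{-k}$ for $t < \lambda$
cf: $\left(1 - \dfrac{it}{\lambda}\right)^{-k}$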

The Erlang distribution is a continuous probability distribution with wide applicability primarily due to its relation
to the exponential and Gamma distributions. The Erlang distribution was developed by A. K. Erlang to examine the
number of telephone calls which might be made at the same time to the operators of the switching stations. This
work on telephone traffic engineering has been expanded to consider waiting times in queueing systems in general.

The distribution is now used in the fields of stochastic processes and of biomathematics.

Overview
The distribution is a continuous distribution, which has a positive value for all real numbers greater than zero, and is given by two parameters: the shape k, which is a positive integer, and the rate λ, which is a positive real number. The distribution is sometimes defined using the inverse of the rate parameter, the scale θ.
When the shape parameter k equals 1, the distribution simplifies to the exponential distribution. The Erlang distribution is a special case of the Gamma distribution where the shape parameter k is an integer. In the Gamma distribution, this parameter is not restricted to the integers.

Characterization

Probability density function


The probability density function of the Erlang distribution is

$$f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{(k - 1)!} \quad\text{for } x \geq 0 \text{ and } \lambda > 0.$$

The parameter k is called the shape parameter and the parameter λ is called the rate parameter. An alternative, but equivalent, parametrization uses the scale parameter θ, which is the reciprocal of the rate parameter (i.e. θ = 1/λ):

$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^k (k - 1)!} \quad\text{for } x \geq 0 \text{ and } \theta > 0.$$

When the scale parameter θ equals 2, the distribution simplifies to the chi-square distribution with 2k degrees of freedom. It can therefore be regarded as a generalized chi-square distribution.
Because of the factorial function in the denominator, the Erlang distribution is only defined when the parameter k is a positive integer. In fact, this distribution is sometimes called the Erlang-k distribution (e.g., an Erlang-2 distribution is an Erlang distribution with k = 2). The Gamma distribution generalizes the Erlang by allowing k to be any positive real number, using the gamma function instead of the factorial function.

Cumulative distribution function


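The cumulative distribution function of the Erlang distribution is:

$$F(x; k, \lambda) = \frac{\gamma(k, \lambda x)}{(k - 1)!},$$

where $\gamma(\cdot)$ is the lower incomplete gamma function. The CDF may also be expressed as

$$F(x; k, \lambda) = 1 - \sum_{n=0}^{k-1} e^{-\lambda x}\, \frac{(\lambda x)^n}{n!}.$$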

Occurrence

Waiting times
Events which occur independently with some average rate are modeled with a Poisson process. The waiting times
between k occurrences of the event are Erlang distributed. (The related question of the number of events in a given
amount of time is described by the Poisson distribution.)
The Erlang distribution, which measures the time between incoming calls, can be used in conjunction with the
expected duration of incoming calls to produce information about the traffic load measured in Erlang units. This can
be used to determine the probability of packet loss or delay, according to various assumptions made about whether
blocked calls are aborted (Erlang B formula) or queued until served (Erlang C formula). The Erlang-B and C
formulae are still in everyday use for traffic modeling for applications such as the design of call centers.

Compartment models
The Erlang distribution also occurs as a description of the rate of transition of elements through a system of
compartments. Such systems are widely used in biology and ecology. For example, in mathematical epidemiology,
an individual may progress at an exponential rate from healthy to carrier and again exponentially from carrier to
infectious. The probability of seeing an infectious individual at time t would then be given by Erlang distribution
with k=2. Such models have the useful property that the variance in the infectious compartment is large. In a pure
exponential model the variance is - which is often unrealistically small.

Stochastic processes
The Erlang distribution is the distribution of the sum of k independent identically distributed random variables
each having an exponential distribution. The rate of the Erlang distribution is the rate of this exponential distribution.
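The sum-of-exponentials characterization can be sketched in code. The following is a minimal illustration, not a reference implementation: the class and method names are hypothetical, each exponential variate is drawn by inverting the exponential CDF, and the CDF is evaluated with the standard finite-sum form $1 - \sum_{n=0}^{k-1} e^{-\lambda x} (\lambda x)^n / n!$.

```java
import java.util.Random;

// Illustrative sketch; ErlangSketch and its methods are hypothetical names.
final class ErlangSketch {
    // Draws one Erlang(k, rate) variate as the sum of k independent
    // exponential variates with the same rate.
    static double sample(int k, double rate, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            // 1 - nextDouble() lies in (0, 1], so the logarithm is finite.
            sum += -Math.log(1.0 - rng.nextDouble()) / rate;
        }
        return sum;
    }

    // Erlang CDF as the finite sum 1 - sum_{n=0}^{k-1} e^{-rate*x} (rate*x)^n / n!.
    static double cdf(int k, double rate, double x) {
        if (x <= 0.0) {
            return 0.0;
        }
        double term = Math.exp(-rate * x); // n = 0 term
        double sum = term;
        for (int n = 1; n < k; n++) {
            term *= rate * x / n; // builds (rate*x)^n / n! incrementally
            sum += term;
        }
        return 1.0 - sum;
    }
}
```

With k = 1 both methods reduce to the exponential distribution, which gives a quick sanity check.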

See also
• Erlang B formula
• Exponential distribution
• Gamma distribution
• Poisson distribution
• Coxian distribution
• Poisson process
• Erlang unit
• Engset calculation
• Phase-type distribution
• Traffic generation model

External links
• Erlang Distribution [1]
• An Introduction to Erlang B and Erlang C by Ian Angus [2] (PDF Document - Has terms and formulae plus short
biography)
• Resource Dimensioning Using Erlang-B and Erlang-C [3]
• Erlang-C [4]
• Erlang-B and Erlang-C spreadsheets [5]

References
[1] http://www.xycoon.com/erlang.htm
[2] http://www.tarrani.net/linda/ErlangBandC.pdf
[3] http://www.eventhelix.com/RealtimeMantra/CongestionControl/resource_dimensioning_erlang_b_c.htm
[4] http://www.kooltoolz.com/Erlang-C.htm
[5] http://www.pccl.demon.co.uk/spreadsheets/

Kumaraswamy distribution
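Kumaraswamy
[Plots: probability density function; cumulative distribution function]
parameters: $a > 0$ (real), $b > 0$ (real)
support: $x \in [0, 1]$
pdf: $a b x^{a-1} (1 - x^{a})^{b-1}$
cdf: $1 - (1 - x^{a})^{b}$
mean: $\frac{b\,\Gamma(1 + 1/a)\,\Gamma(b)}{\Gamma(1 + 1/a + b)}$
median: $\left(1 - 2^{-1/b}\right)^{1/a}$
mode: $\left(\frac{a-1}{ab-1}\right)^{1/a}$ for $a \ge 1$, $b \ge 1$, $(a, b) \ne (1, 1)$
variance: (complicated-see text)
skewness: (complicated-see text)
ex.kurtosis: (complicated-see text)
entropy:
mgf:
cf: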

In probability and statistics, the Kumaraswamy's double bounded distribution is a family of continuous
probability distributions defined on the interval [0,1] differing in the values of their two non-negative shape
parameters, a and b.
It is similar to the Beta distribution, but much simpler to use especially in simulation studies due to the simple closed
form of both its probability density function and cumulative distribution function. This distribution was originally
proposed by Poondi Kumaraswamy for variables that are lower and upper bounded.
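As a sketch of why the closed forms matter for simulation: assuming the standard Kumaraswamy CDF $F(x) = 1 - (1 - x^a)^b$ on [0,1], the CDF can be inverted explicitly, so a variate is obtained from a single uniform draw. The class and method names below are hypothetical and only illustrate the idea.

```java
import java.util.Random;

// Illustrative sketch; KumaraswamySketch and its methods are hypothetical names.
final class KumaraswamySketch {
    // CDF on [0, 1]: F(x) = 1 - (1 - x^a)^b.
    static double cdf(double x, double a, double b) {
        if (x <= 0.0) {
            return 0.0;
        }
        if (x >= 1.0) {
            return 1.0;
        }
        return 1.0 - Math.pow(1.0 - Math.pow(x, a), b);
    }

    // Inverse CDF: solving u = 1 - (1 - x^a)^b for x gives
    // x = (1 - (1 - u)^(1/b))^(1/a).
    static double quantile(double u, double a, double b) {
        return Math.pow(1.0 - Math.pow(1.0 - u, 1.0 / b), 1.0 / a);
    }

    // Inverse-transform sampling from a single uniform draw.
    static double sample(double a, double b, Random rng) {
        return quantile(rng.nextDouble(), a, b);
    }
}
```

No comparably simple inversion exists for the Beta distribution, whose CDF is an incomplete beta function.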

Characterization

Probability density function


The probability density function of the Kumaraswamy distribution is

$$f(x; a, b) = a b x^{a-1} (1 - x^{a})^{b-1}.$$

Cumulative distribution function


The cumulative distribution function is therefore

$$F(x; a, b) = 1 - (1 - x^{a})^{b}.$$

Generalizing to arbitrary range


In its simplest form, the distribution has a range of [0,1]. In a more general form, we may replace the normalized
variable x with the unshifted and unscaled variable z where:

$$x = \frac{z - z_{\min}}{z_{\max} - z_{\min}}, \qquad z_{\min} \le z \le z_{\max}.$$

The distribution is sometimes combined with a "pike probability" or a Dirac delta function, e.g.:

Properties
The raw moments of the Kumaraswamy distribution are given by:

$$m_n = \frac{b\,\Gamma(1 + n/a)\,\Gamma(b)}{\Gamma(1 + b + n/a)} = b\,B(1 + n/a,\ b),$$

where B is the Beta function. The variance, skewness, and excess kurtosis can be calculated from these raw
moments. For example, the variance is:

$$\sigma^2 = m_2 - m_1^2.$$

Relation to the Beta distribution


The Kumaraswamy distribution is closely related to the Beta distribution. Assume that $X_{a,b}$ is a Kumaraswamy
distributed random variable with parameters a and b. Then $X_{a,b}$ is the a-th root of a suitably defined Beta distributed
random variable. More formally, let $Y_{1,b}$ denote a Beta distributed random variable with parameters $\alpha = 1$ and
$\beta = b$. One has the following relation between $X_{a,b}$ and $Y_{1,b}$:

$$X_{a,b} = Y_{1,b}^{1/a},$$

with equality in distribution.

One may introduce generalised Kumaraswamy distributions by considering random variables of the form $Y_{\alpha,\beta}^{1/\gamma}$,
with $\gamma > 0$ and where $Y_{\alpha,\beta}$ denotes a Beta distributed random variable with parameters $\alpha$ and $\beta$. The raw
moments of this generalized Kumaraswamy distribution are given by:

$$m_n = \frac{\Gamma(\alpha + \beta)\,\Gamma(\alpha + n/\gamma)}{\Gamma(\alpha)\,\Gamma(\alpha + \beta + n/\gamma)}.$$

Note that we can reobtain the original moments by setting $\alpha = 1$, $\beta = b$ and $\gamma = a$. However, in general the
cumulative distribution function does not have a closed form solution.

Example
A good example of the use of the Kumaraswamy distribution is the storage volume of a reservoir of capacity zmax
whose upper bound is zmax and lower bound is 0 (Fletcher, 1996).

References
• Kumaraswamy, P. (1980). "A generalized probability density function for double-bounded random processes".
Journal of Hydrology 46: 79–88. doi:10.1016/0022-1694(80)90036-0.
• Fletcher, S.G., and Ponnambalam, K. (1996). "Estimation of reservoir yield and storage distribution using
moments analysis". Journal of Hydrology 182: 259–275. doi:10.1016/0022-1694(95)02946-X.

Inverse Gaussian distribution


In probability theory, the inverse Gaussian distribution (also known as the Wald distribution) is a two-parameter
family of continuous probability distributions with support on (0,∞).
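Its probability density function is given by

$$f(x; \mu, \lambda) = \left[\frac{\lambda}{2\pi x^{3}}\right]^{1/2} \exp\left(\frac{-\lambda (x - \mu)^{2}}{2\mu^{2} x}\right)$$

for x > 0, where $\mu > 0$ is the mean and $\lambda > 0$ is the shape parameter.

Inverse Gaussian
[Plot: probability density function]
parameters: $\lambda > 0$, $\mu > 0$
support: $x \in (0, \infty)$
pdf: $\left[\frac{\lambda}{2\pi x^{3}}\right]^{1/2} \exp\left(\frac{-\lambda (x - \mu)^{2}}{2\mu^{2} x}\right)$
cdf: $\Phi\left(\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu} - 1\right)\right) + \exp\left(\frac{2\lambda}{\mu}\right) \Phi\left(-\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu} + 1\right)\right)$, where $\Phi$ is the standard normal (standard Gaussian) distribution c.d.f.
mean: $\mu$
median:
mode: $\mu\left[\left(1 + \frac{9\mu^{2}}{4\lambda^{2}}\right)^{1/2} - \frac{3\mu}{2\lambda}\right]$
variance: $\mu^{3}/\lambda$
skewness: $3\left(\mu/\lambda\right)^{1/2}$
ex.kurtosis: $15\mu/\lambda$
entropy:
mgf: $\exp\left[\frac{\lambda}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^{2} t}{\lambda}}\right)\right]$
cf: $\exp\left[\frac{\lambda}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^{2} i t}{\lambda}}\right)\right]$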

As λ tends to infinity, the inverse Gaussian distribution becomes more like a normal (Gaussian) distribution. The
inverse Gaussian distribution has several properties analogous to a Gaussian distribution. The name can be
misleading: it is an "inverse" only in that, while the Gaussian describes a Brownian Motion's level at a fixed time,
the inverse Gaussian describes the distribution of the time a Brownian Motion with positive drift takes to reach a
fixed positive level.
Its cumulant generating function (logarithm of the characteristic function) is the inverse of the cumulant generating
function of a Gaussian random variable.
To indicate that a random variable X is inverse Gaussian-distributed with mean μ and shape parameter λ we write
$X \sim IG(\mu, \lambda)$.

Properties

Summation
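If $X_i$ has an $IG(\mu_0 w_i, \lambda_0 w_i^2)$ distribution for i = 1, 2, ..., n and all $X_i$ are independent, then

$$S = \sum_{i=1}^{n} X_i \sim IG\left(\mu_0 \sum_{i=1}^{n} w_i,\ \lambda_0 \left(\sum_{i=1}^{n} w_i\right)^{2}\right).$$

Note that

$$\frac{\operatorname{Var}(X_i)}{\operatorname{E}(X_i)} = \frac{\mu_0^{2} w_i^{2}}{\lambda_0 w_i^{2}} = \frac{\mu_0^{2}}{\lambda_0}$$

is constant for all i. This is a necessary condition for the summation. Otherwise S would not be inverse Gaussian.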

Scaling
For any t > 0 it holds that

$$X \sim IG(\mu, \lambda) \quad \Rightarrow \quad tX \sim IG(t\mu, t\lambda).$$

Exponential family
The inverse Gaussian distribution is a two-parameter exponential family with natural parameters -λ/(2μ²) and -λ/2,
and natural statistics X and 1/X.

Relationship with Brownian motion


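The stochastic process $X_t$ given by

$$X_0 = 0, \qquad X_t = \nu t + \sigma W_t$$

(where $W_t$ is a standard Brownian motion and $\nu > 0$) is a Brownian motion with drift ν.

Then, the first passage time for a fixed level $\alpha > 0$ by $X_t$ is distributed according to an inverse Gaussian:

$$T_\alpha = \inf\{0 < t \mid X_t = \alpha\} \sim IG\left(\frac{\alpha}{\nu}, \frac{\alpha^{2}}{\sigma^{2}}\right).$$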

When drift is zero


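A common special case of the above arises when the Brownian motion has no drift. In that case, parameter μ tends to
infinity, and the first passage time for fixed level α has probability density function

$$f(x) = \frac{\alpha}{\sigma \sqrt{2\pi x^{3}}} \exp\left(-\frac{\alpha^{2}}{2 x \sigma^{2}}\right),$$

where σ is the diffusion coefficient of the Brownian motion. This is a Lévy distribution with scale parameter $c = \alpha^{2}/\sigma^{2}$.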

Maximum likelihood
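The model where

$$X_i \sim IG(\mu, \lambda w_i), \qquad i = 1, 2, \ldots, n$$

with all $w_i$ known, (μ, λ) unknown and all $X_i$ independent has the following likelihood function

$$L(\mu, \lambda) = \left(\frac{\lambda}{2\pi}\right)^{n/2} \left(\prod_{i=1}^{n} \frac{w_i}{X_i^{3}}\right)^{1/2} \exp\left(\frac{\lambda}{\mu} \sum_{i=1}^{n} w_i - \frac{\lambda}{2\mu^{2}} \sum_{i=1}^{n} w_i X_i - \frac{\lambda}{2} \sum_{i=1}^{n} \frac{w_i}{X_i}\right).$$

Solving the likelihood equation yields the following maximum likelihood estimates

$$\hat{\mu} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}, \qquad \frac{1}{\hat{\lambda}} = \frac{1}{n} \sum_{i=1}^{n} w_i \left(\frac{1}{X_i} - \frac{1}{\hat{\mu}}\right).$$

$\hat{\mu}$ and $\hat{\lambda}$ are independent and

$$\hat{\mu} \sim IG\left(\mu, \lambda \sum_{i=1}^{n} w_i\right), \qquad \frac{n}{\hat{\lambda}} \sim \frac{1}{\lambda} \chi^{2}_{n-1}.$$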

Generating random variates from an inverse-Gaussian distribution


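Generate a random variate from a normal distribution with a mean of 0 and a standard deviation of 1:

$$\nu \sim N(0, 1).$$

Square the value,

$$y = \nu^{2},$$

and use this relation:

$$x = \mu + \frac{\mu^{2} y}{2\lambda} - \frac{\mu}{2\lambda} \sqrt{4\mu\lambda y + \mu^{2} y^{2}}.$$

Generate another random variate, this time sampled from a uniform distribution between 0 and 1:

$$z \sim U(0, 1).$$

If

$$z \le \frac{\mu}{\mu + x}$$

then return $x$, else return $\mu^{2}/x$.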

Sample code in Java language:

import java.util.Random;

public double inverseGaussian(double mu, double lambda) {
    Random rand = new Random();
    // Sample from a normal distribution with mean 0 and standard deviation 1.
    double v = rand.nextGaussian();
    double y = v * v;
    // Smaller root of the quadratic relating y to the candidate value x.
    double x = mu + (mu * mu * y) / (2 * lambda)
            - (mu / (2 * lambda)) * Math.sqrt(4 * mu * lambda * y + mu * mu * y * y);
    // Sample from a uniform distribution between 0 and 1.
    double test = rand.nextDouble();
    if (test <= mu / (mu + x)) {
        return x;
    } else {
        return (mu * mu) / x;
    }
}

See also
• Generalized inverse Gaussian distribution
• Tweedie distributions

References
• The inverse gaussian distribution: theory, methodology, and applications by Raj Chhikara and Leroy Folks, 1989
ISBN 0-8247-7997-5
• System Reliability Theory by Marvin Rausand and Arnljot Høyland
• The Inverse Gaussian Distribution by Dr. V. Seshadri, Oxford Univ Press, 1993

External links
• Inverse Gaussian Distribution [1] in Wolfram website.

References
[1] http://mathworld.wolfram.com/InverseGaussianDistribution.html

Laplace distribution
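Laplace
[Plots: probability density function; cumulative distribution function]
parameters: $\mu$ location (real), $b > 0$ scale (real)
support: $x \in (-\infty, +\infty)$
pdf: $\frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$
cdf: see text
mean: $\mu$
median: $\mu$
mode: $\mu$
variance: $2b^{2}$
skewness: $0$
ex.kurtosis: $3$
entropy: $\log(2eb)$
mgf: $\frac{\exp(\mu t)}{1 - b^{2} t^{2}}$ for $|t| < 1/b$
cf: $\frac{\exp(\mu i t)}{1 + b^{2} t^{2}}$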

In probability theory and statistics, the Laplace distribution is a continuous probability distribution named after
Pierre-Simon Laplace. It is also sometimes called the double exponential distribution, because it can be thought of as
two exponential distributions (with an additional location parameter) spliced together back-to-back, but the term
double exponential distribution is also sometimes used to refer to the Gumbel distribution. The difference between
two independent identically distributed exponential random variables is governed by a Laplace distribution, as is a
Brownian motion evaluated at an exponentially distributed random time. Increments of Laplace motion or a variance
gamma process evaluated over the time scale also have a Laplace distribution.

Characterization

Probability density function


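A random variable has a Laplace(μ, b) distribution if its probability density function is

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right) = \frac{1}{2b} \begin{cases} \exp\left(-\frac{\mu - x}{b}\right) & \text{if } x < \mu \\ \exp\left(-\frac{x - \mu}{b}\right) & \text{if } x \ge \mu. \end{cases}$$

Here, μ is a location parameter and b > 0 is a scale parameter. If μ = 0 and b = 1, the density on the positive
half-line is exactly that of an exponential distribution with rate 1, scaled by 1/2.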
The pdf of the Laplace distribution is also reminiscent of the normal distribution; however, whereas the normal
distribution is expressed in terms of the squared difference from the mean μ, the Laplace density is expressed in
terms of the absolute difference from the mean. Consequently the Laplace distribution has fatter tails than the normal
distribution.

Cumulative distribution function


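The Laplace distribution is easy to integrate (if one distinguishes two symmetric cases) due to the use of the absolute
value function. Its cumulative distribution function is as follows:

$$F(x) = \begin{cases} \frac{1}{2} \exp\left(\frac{x - \mu}{b}\right) & \text{if } x < \mu \\ 1 - \frac{1}{2} \exp\left(-\frac{x - \mu}{b}\right) & \text{if } x \ge \mu \end{cases} \;=\; \frac{1}{2}\left[1 + \operatorname{sgn}(x - \mu)\left(1 - \exp\left(-\frac{|x - \mu|}{b}\right)\right)\right].$$

The inverse cumulative distribution function is given by

$$F^{-1}(p) = \mu - b \operatorname{sgn}(p - 0.5) \ln(1 - 2|p - 0.5|).$$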

Generating random variables according to the Laplace distribution


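Given a random variable U drawn from the uniform distribution in the open interval (-1/2, 1/2), the random variable

$$X = \mu - b \operatorname{sgn}(U) \ln(1 - 2|U|)$$

has a Laplace distribution with parameters μ and b. This follows from the inverse cumulative distribution function
of the Laplace distribution, evaluated at p = U + 1/2. The endpoints are excluded because the logarithm diverges at |U| = 1/2.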
A Laplace(0, b) variate can also be generated as the difference of two i.i.d. Exponential(1/b) random variables.
Equivalently, a Laplace(0, 1) random variable can be generated as the logarithm of the ratio of two iid uniform
random variables.
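The generation methods described here can be sketched as follows. This is an illustrative sketch with hypothetical class and method names; it uses the standard Laplace inverse CDF $F^{-1}(p) = \mu - b \operatorname{sgn}(p - 0.5) \ln(1 - 2|p - 0.5|)$ for the first method and the difference of two exponential variates (each with mean b) for the second.

```java
import java.util.Random;

// Illustrative sketch; LaplaceSketch and its methods are hypothetical names.
final class LaplaceSketch {
    // Inverse CDF for 0 < p < 1.
    static double quantile(double p, double mu, double b) {
        double u = p - 0.5; // u lies in (-1/2, 1/2)
        return mu - b * Math.signum(u) * Math.log(1.0 - 2.0 * Math.abs(u));
    }

    // Inverse-transform sampling; p = 0 is rejected so the logarithm stays finite.
    static double sample(double mu, double b, Random rng) {
        double p;
        do {
            p = rng.nextDouble();
        } while (p == 0.0);
        return quantile(p, mu, b);
    }

    // Alternative: mu plus the difference of two i.i.d. exponential
    // variates with rate 1/b (that is, mean b).
    static double sampleByDifference(double mu, double b, Random rng) {
        double e1 = -b * Math.log(1.0 - rng.nextDouble());
        double e2 = -b * Math.log(1.0 - rng.nextDouble());
        return mu + e1 - e2;
    }
}
```

The inverse-CDF route uses one uniform draw per variate, while the difference route uses two; both should produce the same distribution.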

Parameter estimation
Given N independent and identically distributed samples x1, x2, ..., xN, an estimator $\hat{\mu}$ of $\mu$ is the sample median,[1]
and the maximum likelihood estimator of b is

$$\hat{b} = \frac{1}{N} \sum_{i=1}^{N} |x_i - \hat{\mu}|$$

(revealing a link between the Laplace distribution and least absolute deviations).

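Moments

The r-th raw moment of a Laplace(μ, b) random variable is

$$\mu_r' = \frac{1}{2} \sum_{k=0}^{r} \left[\frac{r!}{(r-k)!}\, b^{k} \mu^{r-k} \left\{1 + (-1)^{k}\right\}\right].$$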

Related distributions
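• If $X \sim \mathrm{Laplace}(0, b)$ then $|X| \sim \mathrm{Exponential}(b^{-1})$ is an exponential distribution.
• If $X \sim \mathrm{Exponential}(\lambda)$ and $Y \sim \mathrm{Bernoulli}(0.5)$ independent of $X$, then
$X(2Y - 1) \sim \mathrm{Laplace}(0, \lambda^{-1})$.
• If $X_1 \sim \mathrm{Exponential}(\lambda_1)$ and $X_2 \sim \mathrm{Exponential}(\lambda_2)$ independent of $X_1$, then
$\lambda_1 X_1 - \lambda_2 X_2 \sim \mathrm{Laplace}(0, 1)$.
• If $V \sim \mathrm{Exponential}(1)$ and $Z \sim N(0, 1)$ independent of $V$, then
$X = \mu + b\sqrt{2V}\, Z \sim \mathrm{Laplace}(\mu, b)$.
• The generalized Gaussian distribution (version 1) equals the Laplace distribution when its shape parameter $\beta$ is
set to 1. The scale parameter $\alpha$ is then equal to $b$.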

Relation to the exponential distribution


A Laplace random variable can be represented as the difference of two iid exponential random variables. One way to
show this is by using the characteristic function approach. For any set of independent continuous random variables,
the characteristic function of their sum (which uniquely determines the distribution) is obtained by multiplying the
corresponding characteristic functions.
Consider two i.i.d. random variables $X, Y \sim \mathrm{Exponential}(\lambda)$. The characteristic functions for $X$ and $-Y$ are

$$\frac{\lambda}{-it + \lambda}, \qquad \frac{\lambda}{it + \lambda},$$

respectively. On multiplying these characteristic functions (equivalent to the characteristic function of the sum of the
random variables $X + (-Y)$), the result is

$$\frac{\lambda^{2}}{(-it + \lambda)(it + \lambda)} = \frac{\lambda^{2}}{t^{2} + \lambda^{2}}.$$

This is the same as the characteristic function for $Z \sim \mathrm{Laplace}(0, 1/\lambda)$, which is

$$\frac{1}{1 + t^{2}/\lambda^{2}}.$$

Sargan distributions
Sargan distributions are a system of distributions of which the Laplace distribution is a core member. A p'th order
Sargan distribution has density[2] [3]

for parameters α > 0, βj   ≥ 0. The Laplace distribution results for p=0.

See also
• Log-Laplace distribution
• Cauchy distribution, also called the "Lorentzian distribution" (the Fourier transform of the Laplace)
• Characteristic function (probability theory)

References
[1] Robert M. Norton (May 1984). "The Double Exponential Distribution: Using Calculus to Find a Maximum Likelihood Estimator"
(http://www.jstor.org/pss/2683252). The American Statistician (American Statistical Association) 38 (2): 135–136. doi:10.2307/2683252.
[2] Everitt, B.S. (2002) The Cambridge Dictionary of Statistics, CUP. ISBN 0-521-81099-x
[3] Johnson, N.L., Kotz S., Balakrishnan, N. (1994) Continuous Univariate Distributions, Wiley. ISBN 0-471-58495-9. p. 60

Lévy distribution
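Lévy (unshifted)
[Plots: probability density function; cumulative distribution function]
parameters: $c \in (0, \infty)$
support: $x \in [0, \infty)$
pdf: $\sqrt{\frac{c}{2\pi}}\, \frac{e^{-c/(2x)}}{x^{3/2}}$
cdf: $\operatorname{erfc}\left(\sqrt{\frac{c}{2x}}\right)$
mean: infinite
median: $\frac{c}{2 \left(\operatorname{erfc}^{-1}(1/2)\right)^{2}}$
mode: $c/3$
variance: infinite
skewness: undefined
ex.kurtosis: undefined
entropy: $\frac{1 + 3\gamma + \ln(16\pi c^{2})}{2}$, where $\gamma$ is the Euler gamma constant
mgf: undefined
cf: $e^{-\sqrt{-2ict}}$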

In probability theory and statistics, the Lévy distribution, named after Paul Pierre Lévy, is a continuous probability
distribution for a non-negative random variable. In spectroscopy this distribution, with frequency as the dependent
variable, is known as a van der Waals profile.[1]
It is one of the few distributions that are stable and that have probability density functions that are analytically
expressible, the others being the normal distribution and the Cauchy distribution. All three are special cases of the
stable distributions, which do not generally have an analytically expressible probability density function.

Definition
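The probability density function of the Lévy distribution over the domain $x \ge \mu$ is

$$f(x; \mu, c) = \sqrt{\frac{c}{2\pi}}\, \frac{e^{-\frac{c}{2(x - \mu)}}}{(x - \mu)^{3/2}},$$

where $\mu$ is the location parameter and $c$ is the scale parameter. The cumulative distribution function is

$$F(x; \mu, c) = \operatorname{erfc}\left(\sqrt{\frac{c}{2(x - \mu)}}\right),$$

where $\operatorname{erfc}(z)$ is the complementary error function. The shift parameter $\mu$ has the effect of shifting the curve to
the right by an amount $\mu$, and changing the support to the interval [$\mu$, $\infty$). Like all stable distributions, the
Lévy distribution has a standard form f(x;0,1) which has the following property:

$$f(x; \mu, c)\, dx = f(y; 0, 1)\, dy,$$

where y is defined as

$$y = \frac{x - \mu}{c}.$$

The characteristic function of the Lévy distribution is given by

$$\varphi(t; \mu, c) = e^{i\mu t - \sqrt{-2ict}}.$$

Note that the characteristic function can also be written in the same form used for the stable distribution with
$\alpha = 1/2$ and $\beta = 1$:

$$\varphi(t; \mu, c) = e^{i\mu t - |ct|^{1/2} (1 - i \operatorname{sign}(t))}.$$

Assuming $\mu = 0$, the nth moment of the unshifted Lévy distribution is formally defined by:

$$m_n = \sqrt{\frac{c}{2\pi}} \int_0^\infty \frac{e^{-c/(2x)}\, x^{n}}{x^{3/2}}\, dx,$$

which diverges for all n ≥ 1/2, and hence for every positive integer n, so that the integer moments of the Lévy
distribution do not exist. The moment generating function is then formally defined by:

$$M(t; c) = \sqrt{\frac{c}{2\pi}} \int_0^\infty \frac{e^{-c/(2x) + tx}}{x^{3/2}}\, dx,$$

which diverges for $t > 0$ and is therefore not defined in an interval around zero, so that the moment generating
function is not defined per se. Like all stable distributions except the normal distribution, the wing of the probability
density function exhibits heavy tail behavior falling off according to a power law:

$$f(x; \mu, c) \sim \sqrt{\frac{c}{2\pi}}\, \frac{1}{x^{3/2}} \quad \text{as } x \to \infty.$$

This is illustrated in the diagram below, in which the probability density functions for various values of c and
$\mu = 0$ are plotted on a log-log scale.

[Figure: Probability density function for the Lévy distribution on a log-log scale.]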

Related distributions
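• Relation to stable distribution: If $X \sim \mathrm{Levy}(\mu, c)$ then $X \sim \mathrm{Stable}(1/2, 1, c, \mu)$
• Relation to Scale-inverse-chi-square distribution: If $X \sim \mathrm{Levy}(0, c)$ then $X \sim \text{Scale-inv-}\chi^{2}(1, c)$
• Relation to inverse gamma distribution: If $X \sim \mathrm{Levy}(0, c)$ then $X \sim \text{Inv-Gamma}(1/2, c/2)$

• Relation to Normal distribution: If $Y \sim \mathrm{Normal}(\mu, \sigma^{2})$ then $(Y - \mu)^{-2} \sim \mathrm{Levy}(0, 1/\sigma^{2})$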

• Relation to Folded normal distribution: If then
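The relation to the normal distribution gives a simple way to simulate Lévy variates. The sketch below is illustrative only (the class and method names are hypothetical): if Z is standard normal, then $\mu + c/Z^{2}$ has a Lévy distribution with location $\mu$ and scale $c$, which is the normal relation above with $\sigma^{2} = 1/c$.

```java
import java.util.Random;

// Illustrative sketch; LevySketch and its method are hypothetical names.
final class LevySketch {
    // If Z ~ N(0, 1) then mu + c / Z^2 ~ Levy(mu, c).
    static double sample(double mu, double c, Random rng) {
        double z;
        do {
            z = rng.nextGaussian();
        } while (z == 0.0); // avoid division by zero
        return mu + c / (z * z);
    }
}
```

Because the mean is infinite, sample averages do not settle down; empirical quantiles are a more useful check, for example the fraction of unshifted samples not exceeding c should be close to P(Z² ≥ 1), roughly 0.317.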

Applications
• The Lévy distribution is of interest to the financial modeling community due to its empirical similarity to the
returns of securities.
• It is claimed that fruit flies follow a form of the distribution to find food (Lévy flight).[2]
• The frequency of geomagnetic reversals appears to follow a Lévy distribution
• The time of hitting a single point (different from the starting point 0) by the Brownian motion has the Lévy
distribution.
• The length of the path followed by a photon in a turbid medium follows the Lévy distribution. [3]
• The Lévy distribution has been used post 1987 crash by the Options Clearing Corporation for setting margin
requirements because its parameters are more robust to extreme events than those of a normal distribution, and
thus extreme events do not suddenly increase margin requirements which may worsen a crisis.[4]
• The statistics of solar flares are described by a non-Gaussian distribution. The solar flare statistics were shown to
be describable by a Lévy distribution and it was assumed that intermittent solar flares perturb the intrinsic
fluctuations in Earth’s average temperature. The end result of this perturbation is that the statistics of the
temperature anomalies inherit the statistical structure that was evident in the intermittency of the solar flare data.
[5]

References
• "Information on stable distributions" [6]. Retrieved July 13 2005. - John P. Nolan's introduction to stable
distributions, some papers on stable laws, and a free program to compute stable densities, cumulative distribution
functions, quantiles, estimate parameters, etc. See especially An introduction to stable distributions, Chapter 1 [7]

References
[1] "van der Waals profile" appears with lowercase "van" in almost all sources, such as: Statistical mechanics of the liquid surface by Clive
Anthony Croxton, 1980, A Wiley-Interscience publication, ISBN 0471276634, 9780471276630,
(http://books.google.it/books?id=Wve2AAAAIAAJ&q="Van+der+Waals+profile"&dq="Van+der+Waals+profile"&hl=en); and in Journal of technical
physics, Volume 36, by Instytut Podstawowych Problemów Techniki (Polska Akademia Nauk), publisher: Państwowe Wydawn. Naukowe.,
1995, (http://books.google.it/books?id=2XpVAAAAMAAJ&q="Van+der+Waals+profile"&dq="Van+der+Waals+profile"&hl=en)
[2] "The Lévy distribution as maximizing one's chances of finding a tasty snack"
(http://www.livescience.com/animalworld/070403_fly_tricks.html). Retrieved April 7 2007.
[3] Rogers, Geoffrey L, Multiple path analysis of reflectance from turbid media. Journal of the Optical Society of America A, 25:11, p 2879-2883
(2008).
[4] Do economists make markets?: on the performativity of economics (http://books.google.com/books?id=7BkByw1gtigC) by Donald A.
MacKenzie, Fabian Muniesa, Lucia Siu, Princeton University Press, 2007, ISBN 978 0 69113016 3, p. 80
(http://books.google.com/books?id=7BkByw1gtigC&pg=PA80)
[5] Scafetta, N., Bruce, J.W., Is climate sensitive to solar variability? Physics Today, 60, 50-51 (2008)
(http://www.fel.duke.edu/~scafetta/pdf/opinion0308.pdf).
[6] http://academic2.american.edu/~jpnolan/stable/stable.html
[7] http://academic2.american.edu/~jpnolan/stable/chap1.pdf

Log-logistic distribution
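Log-logistic
[Plots: probability density function and cumulative distribution function; values of $\beta$ as shown in legend]
parameters: $\alpha > 0$ scale, $\beta > 0$ shape
support: $x \in [0, \infty)$
pdf: $\frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left[1 + (x/\alpha)^{\beta}\right]^{2}}$
cdf: $\frac{1}{1 + (x/\alpha)^{-\beta}}$
mean: $\frac{\alpha \pi/\beta}{\sin(\pi/\beta)}$ if $\beta > 1$, else undefined
median: $\alpha$
mode: $\alpha \left(\frac{\beta - 1}{\beta + 1}\right)^{1/\beta}$ if $\beta > 1$, 0 otherwise
variance: See main text
skewness:
ex.kurtosis:
entropy:
mgf:
cf: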

In probability and statistics, the log-logistic distribution (known as the Fisk distribution in economics) is a
continuous probability distribution for a non-negative random variable. It is used in survival analysis as a parametric
model for events whose rate increases initially and decreases later, for example mortality from cancer following
diagnosis or treatment. It has also been used in hydrology to model stream flow and precipitation, and in economics
as a simple model of the distribution of wealth or income.
The log-logistic distribution is the probability distribution of a random variable whose logarithm has a logistic
distribution. It is similar in shape to the log-normal distribution but has heavier tails. Its cumulative distribution
function can be written in closed form, unlike that of the log-normal.

Characterisation
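There are several different parameterizations of the distribution in use. The one shown here gives reasonably
interpretable parameters and a simple form for the cumulative distribution function.[1] [2] The parameter $\alpha > 0$ is a
scale parameter and is also the median of the distribution. The parameter $\beta > 0$ is a shape parameter. The
distribution is unimodal when $\beta > 1$ and its dispersion decreases as $\beta$ increases.
The cumulative distribution function is

$$F(x; \alpha, \beta) = \frac{1}{1 + (x/\alpha)^{-\beta}} = \frac{(x/\alpha)^{\beta}}{1 + (x/\alpha)^{\beta}} = \frac{x^{\beta}}{\alpha^{\beta} + x^{\beta}},$$

where $x > 0$, $\alpha > 0$, $\beta > 0$.
The probability density function is

$$f(x; \alpha, \beta) = \frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left[1 + (x/\alpha)^{\beta}\right]^{2}}.$$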

Properties

Moments
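The $k$th raw moment exists only when $k < \beta$, when it is given by[3] [4]

$$\operatorname{E}(X^{k}) = \alpha^{k}\, B(1 - k/\beta,\ 1 + k/\beta) = \alpha^{k}\, \frac{k\pi/\beta}{\sin(k\pi/\beta)},$$

where B() is the beta function. Expressions for the mean, variance, skewness and kurtosis can be derived from this.
Writing $b = \pi/\beta$ for convenience, the mean is

$$\operatorname{E}(X) = \frac{\alpha b}{\sin b}, \quad \beta > 1,$$

and the variance is

$$\operatorname{Var}(X) = \alpha^{2} \left(\frac{2b}{\sin 2b} - \frac{b^{2}}{\sin^{2} b}\right), \quad \beta > 2.$$

Explicit expressions for the skewness and kurtosis are lengthy.[5] As $\beta$ tends to infinity the mean tends to $\alpha$, the
variance and skewness tend to zero and the excess kurtosis tends to 6/5 (see also related distributions below).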

Quantiles
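The quantile function (inverse cumulative distribution function) is:

$$F^{-1}(p; \alpha, \beta) = \alpha \left(\frac{p}{1 - p}\right)^{1/\beta}.$$

It follows that the median is $\alpha$, the lower quartile is $3^{-1/\beta} \alpha$ and the upper quartile is $3^{1/\beta} \alpha$.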
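Because the quantile function is available in closed form, quantiles and random variates are cheap to compute. The following sketch is illustrative (class and method names are hypothetical) and assumes the parameterization with scale $\alpha$ and shape $\beta$, for which $F(x) = 1/(1 + (x/\alpha)^{-\beta})$ and $F^{-1}(p) = \alpha\,(p/(1-p))^{1/\beta}$.

```java
import java.util.Random;

// Illustrative sketch; LogLogisticSketch and its methods are hypothetical names.
final class LogLogisticSketch {
    // CDF: F(x) = 1 / (1 + (x/alpha)^(-beta)) for x > 0.
    static double cdf(double x, double alpha, double beta) {
        if (x <= 0.0) {
            return 0.0;
        }
        return 1.0 / (1.0 + Math.pow(x / alpha, -beta));
    }

    // Quantile function: F^{-1}(p) = alpha * (p / (1 - p))^(1/beta), 0 <= p < 1.
    static double quantile(double p, double alpha, double beta) {
        return alpha * Math.pow(p / (1.0 - p), 1.0 / beta);
    }

    // Inverse-transform sampling; nextDouble() never returns 1, so 1 - p > 0.
    static double sample(double alpha, double beta, Random rng) {
        return quantile(rng.nextDouble(), alpha, beta);
    }
}
```

For example, `quantile(0.5, alpha, beta)` returns the median alpha, and `quantile(0.25, ...)` and `quantile(0.75, ...)` give the two quartiles.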

Applications

Survival analysis
The log-logistic distribution provides one parametric model for survival analysis. Unlike the more commonly-used
Weibull distribution, it can have a non-monotonic hazard function: when $\beta > 1$, the hazard function is unimodal
(when $\beta \le 1$, the hazard decreases monotonically). The fact that the cumulative distribution function can be written
in closed form is particularly useful for analysis of survival data with censoring.[6] The log-logistic distribution can
be used as the basis of an accelerated failure time model by allowing $\alpha$ to differ between groups, or more generally by
introducing covariates that affect $\alpha$ but not $\beta$ by modelling $\log(\alpha)$ as a linear function of the covariates.[7]
[Figure: Hazard function; values of $\beta$ as shown in legend.]
The survival function is

$$S(t) = 1 - F(t) = \left[1 + (t/\alpha)^{\beta}\right]^{-1},$$

and so the hazard function is

$$h(t) = \frac{f(t)}{S(t)} = \frac{(\beta/\alpha)(t/\alpha)^{\beta-1}}{1 + (t/\alpha)^{\beta}}.$$

Hydrology
The log-logistic distribution has been used in hydrology for modelling stream flow rates and precipitation.[1] [2]

Economics
The log-logistic has been used as a simple model of the distribution of wealth or income in economics, where it is
known as the Fisk distribution.[8] Its Gini coefficient is $1/\beta$.[9]

Related distributions
• If X has a log-logistic distribution with scale parameter $\alpha$ and shape parameter $\beta$ then Y = log(X) has a logistic
distribution with location parameter $\log(\alpha)$ and scale parameter $1/\beta$.
• As the shape parameter $\beta$ of the log-logistic distribution increases, its shape increasingly resembles that of a
(very narrow) logistic distribution. Informally, as $\beta \to \infty$, $LL(\alpha, \beta) \to L(\alpha, \alpha/\beta)$.
• The log-logistic distribution with shape parameter $\beta = 1$ and scale parameter $\alpha$ is the same as the generalized
Pareto distribution with location parameter $\mu = 0$, shape parameter $\xi = 1$ and scale parameter $\alpha$.

Generalizations
Several different distributions are sometimes referred to as the generalized log-logistic distribution, as they contain
the log-logistic as a special case.[9] These include the Burr Type XII distribution (also known as the Singh-Maddala
distribution) and the Dagum distribution, both of which include a second shape parameter. Both are in turn special
cases of the even more general generalized beta distribution of the second kind. Another more straightforward
generalization of the log-logistic is given in the next section.

Shifted log-logistic distribution

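Shifted log-logistic
[Plots: probability density function and cumulative distribution function; values of $\xi$ as shown in legend]
parameters: $\mu \in (-\infty, +\infty)$ location (real), $\sigma \in (0, +\infty)$ scale (real), $\xi \in (-\infty, +\infty)$ shape (real)
support: $x \ge \mu - \sigma/\xi$ if $\xi > 0$; $x \le \mu - \sigma/\xi$ if $\xi < 0$; $x \in (-\infty, +\infty)$ if $\xi = 0$
pdf: $\frac{(1 + \xi z)^{-(1/\xi + 1)}}{\sigma \left(1 + (1 + \xi z)^{-1/\xi}\right)^{2}}$, where $z = (x - \mu)/\sigma$
cdf: $\left(1 + (1 + \xi z)^{-1/\xi}\right)^{-1}$, where $z = (x - \mu)/\sigma$
mean: $\mu + \frac{\sigma}{\xi}\left(\alpha \csc(\alpha) - 1\right)$, where $\alpha = \pi \xi$
median: $\mu$
mode: $\mu + \frac{\sigma}{\xi}\left[\left(\frac{1 - \xi}{1 + \xi}\right)^{\xi} - 1\right]$
variance: $\frac{\sigma^{2}}{\xi^{2}}\left[2\alpha \csc(2\alpha) - (\alpha \csc(\alpha))^{2}\right]$, where $\alpha = \pi \xi$
skewness:
ex.kurtosis:
entropy:
mgf:
cf: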

The shifted log-logistic distribution is also known as the generalized log-logistic or the three-parameter
log-logistic distribution.[10] [11] It has also been called the generalized logistic distribution,[12] but this conflicts with
other uses of the term. It can be obtained from the log-logistic distribution by addition of a shift parameter $\delta$: if
$X$ has a log-logistic distribution then $X + \delta$ has a shifted log-logistic distribution. So $Y$ has a shifted log-logistic
distribution if $\log(Y - \delta)$ has a logistic distribution. The shift parameter adds a location parameter to the scale and
shape parameters of the (unshifted) log-logistic.
The properties of this distribution are straightforward to derive from those of the log-logistic distribution. However,
an alternative parameterisation, similar to that used for the generalized Pareto distribution and the generalized
extreme value distribution, gives more interpretable parameters and also aids their estimation.
In this parameterisation, the cumulative distribution function of the shifted log-logistic distribution is

$$F(x; \mu, \sigma, \xi) = \frac{1}{1 + \left(1 + \frac{\xi (x - \mu)}{\sigma}\right)^{-1/\xi}}$$

for $1 + \xi (x - \mu)/\sigma \ge 0$, where $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ the scale parameter and $\xi \in \mathbb{R}$ the
shape parameter. Note that some references use $\kappa = -\xi$ to parameterise the shape.[12] [13]
The probability density function is

$$f(x; \mu, \sigma, \xi) = \frac{\left(1 + \frac{\xi (x - \mu)}{\sigma}\right)^{-(1/\xi + 1)}}{\sigma \left[1 + \left(1 + \frac{\xi (x - \mu)}{\sigma}\right)^{-1/\xi}\right]^{2}},$$

again, for $1 + \xi (x - \mu)/\sigma \ge 0$.
The shape parameter $\xi$ is often restricted to lie in [-1,1], when the probability density function is bounded. When
$|\xi| > 1$, it has an asymptote at $x = \mu - \sigma/\xi$. Reversing the sign of $\xi$ reflects the pdf and the cdf about
$x = \mu$.

Related distributions
• When $\mu = \sigma/\xi$, the shifted log-logistic reduces to the log-logistic distribution.
• When $\xi \to 0$, the shifted log-logistic reduces to the logistic distribution.
• The shifted log-logistic with shape parameter $\xi = 1$ is the same as the generalized Pareto distribution with shape
parameter $\xi = 1$.

Applications
The three-parameter log-logistic distribution is used in hydrology for modelling flood frequency.[12] [13] [14]

See also
• Probability distributions: List of important distributions supported on semi-infinite intervals

References
[1] Shoukri, M.M.; Mian, I.U.M.; Tracy, D.S. (1988), "Sampling Properties of Estimators of the Log-Logistic Distribution with Application to
Canadian Precipitation Data" (http:/ / links. jstor. org/ sici?sici=0319-5724(198809)16:3<223:SPOEOT>2. 0. CO;2-E), The Canadian Journal
of Statistics (The Canadian Journal of Statistics / La Revue Canadienne de Statistique, Vol. 16, No. 3) 16 (3): 223–236, doi:10.2307/3314729,
[2] Ashkar, Fahim; Mahdi, Smail (2006), "Fitting the log-logistic distribution by generalized moments", Journal of Hydrology 328: 694–703,
doi:10.1016/j.jhydrol.2006.01.014
[3] Tadikamalla, Pandu R.; Johnson, Norman L. (1982), "Systems of Frequency Curves Generated by Transformations of Logistic Variables"
(http:/ / links. jstor. org/ sici?sici=0006-3444(198208)69:2<461:SOFCGB>2. 0. CO;2-Y), Biometrika 69 (2): 461–465,
doi:10.1093/biomet/69.2.461,
[4] Tadikamalla, Pandu R. (1980), "A Look at the Burr and Related Distributions" (http:/ / links. jstor. org/
sici?sici=0306-7734(198012)48:3<337:ALATBA>2. 0. CO;2-Z), International Statistical Review (International Statistical Review / Revue
Internationale de Statistique, Vol. 48, No. 3) 48 (3): 337–344, doi:10.2307/1402945,
[5] McLaughlin, Michael P. (2001), A Compendium of Common Probability Distributions (http:/ / www. causascientia. org/ math_stat/ Dists/
Compendium. pdf), p. A-37, , retrieved 2008-02-15
[6] Bennett, Steve (1983), "Log-Logistic Regression Models for Survival Data" (http:/ / links. jstor. org/
sici?sici=0035-9254(1983)32:2<165:LRMFSD>2. 0. CO;2-F), Applied Statistics (Journal of the Royal Statistical Society. Series C (Applied
Statistics), Vol. 32, No. 2) 32 (2): 165–171, doi:10.2307/2347295,
[7] Collett, Dave (2003), Modelling Survival Data in Medical Research (2nd ed.), CRC press, ISBN 1584883251
[8] Fisk, P.R. (1961), "The Graduation of Income Distributions" (http:/ / links. jstor. org/ sici?sici=0012-9682(196104)29:2<171:TGOID>2. 0.
CO;2-Y), Econometrica (Econometrica, Vol. 29, No. 2) 29 (2): 171–185, doi:10.2307/1909287,
[9] Kleiber, C.; Kotz, S (2003), Statistical Size Distributions in Economics and Actuarial Sciences, Wiley, ISBN 0471150649
[10] Venter, Gary G. (Spring 1994), "Introduction to selected papers from the variability in reserves prize program" (http:/ / www. casact. org/
pubs/ forum/ 94spforum/ 94spf091. pdf), Casualty Actuarial Society Forum 1: 91–101,
[11] Geskus, Ronald B. (2001), "Methods for estimating the AIDS incubation time distribution when date of seroconversion is censored",
Statistics in Medicine 20 (5): 795–812, doi:10.1002/sim.700, PMID 11241577
[12] Hosking, Jonathan R. M.; Wallis, James R (1997), Regional Frequency Analysis: An Approach Based on L-Moments, Cambridge University
Press, ISBN 0521430453
[13] Robson, A.; Reed, D. (1999), Flood Estimation Handbook, 3: "Statistical Procedures for Flood Frequency Estimation", Wallingford,
UK: Institute of Hydrology, ISBN 0948540893
[14] Ahmad, M. I.; Sinclair, C. D.; Werritty, A. (1988), "Log-logistic flood frequency analysis", Journal of Hydrology 98: 205–224,
doi:10.1016/0022-1694(88)90015-7

Log-normal distribution
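Log-normal
[Plots: probability density function; cumulative distribution function]
notation: $\ln \mathcal{N}(\mu, \sigma^{2})$
parameters: $\sigma^{2} > 0$, squared scale (real); $\mu \in \mathbb{R}$, location
support: $x \in (0, +\infty)$
pdf: $\frac{1}{x \sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}}$
cdf: $\frac{1}{2} + \frac{1}{2} \operatorname{erf}\left[\frac{\ln x - \mu}{\sqrt{2\sigma^{2}}}\right]$
mean: $e^{\mu + \sigma^{2}/2}$
median: $e^{\mu}$
mode: $e^{\mu - \sigma^{2}}$
variance: $(e^{\sigma^{2}} - 1)\, e^{2\mu + \sigma^{2}}$
skewness: $(e^{\sigma^{2}} + 2) \sqrt{e^{\sigma^{2}} - 1}$
ex.kurtosis: $e^{4\sigma^{2}} + 2e^{3\sigma^{2}} + 3e^{2\sigma^{2}} - 6$
entropy: $\frac{1}{2} + \frac{1}{2} \ln(2\pi\sigma^{2}) + \mu$
mgf: (defined only on the negative half-axis, see text)
cf: $\sum_{n=0}^{\infty} \frac{(it)^{n}}{n!}\, e^{n\mu + n^{2}\sigma^{2}/2}$ (representation is asymptotically divergent but sufficient for numerical purposes)
Fisher information: $\begin{pmatrix} 1/\sigma^{2} & 0 \\ 0 & 1/(2\sigma^{4}) \end{pmatrix}$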

In probability theory, a log-normal distribution is a probability distribution of a random variable whose logarithm
is normally distributed. If Y is a random variable with a normal distribution, then X = exp(Y) has a log-normal
distribution; likewise, if X is log-normally distributed, then Y = log(X) is normally distributed. (This is true regardless
of the base of the logarithmic function: if loga(Y) is normally distributed, then so is logb(Y), for any two positive
numbers a, b ≠ 1.)
Log-normal is also written log normal or lognormal. It is occasionally referred to as the Galton distribution or
Galton's distribution, after Francis Galton.
A variable might be modeled as log-normal if it can be thought of as the multiplicative product of many independent
random variables each of which is positive. For example, in finance, a long-term discount factor can be derived from
the product of short-term discount factors. In wireless communication, the attenuation caused by shadowing or slow
fading from random objects is often assumed to be log-normally distributed. See log-distance path loss model.

Characterization

Probability density function


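The probability density function of a log-normal distribution is:

$$f_X(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}}, \quad x > 0,$$

where μ and σ are the mean and standard deviation of the variable’s natural logarithm (by definition, the variable’s
logarithm is normally distributed).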

Cumulative distribution function

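$$F_X(x; \mu, \sigma) = \frac{1}{2} \operatorname{erfc}\left[-\frac{\ln x - \mu}{\sigma \sqrt{2}}\right] = \Phi\left(\frac{\ln x - \mu}{\sigma}\right),$$

where erfc is the complementary error function, and Φ is the standard normal cdf.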

Mean and standard deviation


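If X is a lognormally distributed variable, its expected value (mean), variance, and standard deviation are

$$\operatorname{E}[X] = e^{\mu + \sigma^{2}/2},$$
$$\operatorname{Var}[X] = (e^{\sigma^{2}} - 1)\, e^{2\mu + \sigma^{2}},$$
$$\operatorname{s.d.}[X] = \sqrt{\operatorname{Var}[X]} = e^{\mu + \sigma^{2}/2} \sqrt{e^{\sigma^{2}} - 1}.$$

Equivalently, parameters μ and σ can be obtained if the values of mean and variance are known:

$$\mu = \ln(\operatorname{E}[X]) - \frac{1}{2} \ln\left(1 + \frac{\operatorname{Var}[X]}{(\operatorname{E}[X])^{2}}\right), \qquad \sigma^{2} = \ln\left(1 + \frac{\operatorname{Var}[X]}{(\operatorname{E}[X])^{2}}\right).$$

The geometric mean of the log-normal distribution is $e^{\mu}$, and the geometric standard deviation is equal to $e^{\sigma}$.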
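The conversion between (μ, σ) and the mean and variance can be sketched as a pair of helper functions. This is an illustrative sketch with hypothetical names; it assumes the standard identities $\operatorname{E}[X] = e^{\mu + \sigma^{2}/2}$, $\operatorname{Var}[X] = (e^{\sigma^{2}} - 1) e^{2\mu + \sigma^{2}}$ and their inverses, and uses `Math.expm1` and `Math.log1p` to limit rounding error when σ is small.

```java
// Illustrative sketch; LogNormalSketch and its methods are hypothetical names.
final class LogNormalSketch {
    // Mean of X when ln X ~ N(mu, sigma^2).
    static double mean(double mu, double sigma) {
        return Math.exp(mu + 0.5 * sigma * sigma);
    }

    // Variance of X: (e^{sigma^2} - 1) * e^{2 mu + sigma^2}.
    static double variance(double mu, double sigma) {
        double s2 = sigma * sigma;
        return Math.expm1(s2) * Math.exp(2.0 * mu + s2);
    }

    // Recovers {mu, sigma} from a given mean and variance of X.
    static double[] fromMeanVariance(double mean, double variance) {
        double s2 = Math.log1p(variance / (mean * mean)); // sigma^2
        double mu = Math.log(mean) - 0.5 * s2;
        return new double[] { mu, Math.sqrt(s2) };
    }
}
```

A round trip such as `fromMeanVariance(mean(mu, sigma), variance(mu, sigma))` should return the original pair up to floating-point error.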

Mode and median


The mode is the point of global maximum of the pdf. In particular, it solves the equation (ln ƒ)′ = 0:

$$ \operatorname{Mode}[X] = e^{\mu-\sigma^2}. $$

The median is the point where F_X = ½:

$$ \operatorname{Med}[X] = e^{\mu}. $$

Confidence interval
If X is distributed log-normally with parameters μ and σ, then the (1 − α)-confidence interval for X will be

$$ X \in \left[e^{\mu-q^*\sigma},\; e^{\mu+q^*\sigma}\right], $$

where q* is the (1 − α/2)-quantile of the standard normal distribution: q* = Φ^{−1}(1 − α/2).
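A short sketch of this interval, assuming its endpoints are exp(μ − q*σ) and exp(μ + q*σ), i.e. the symmetric normal interval for ln X mapped through the exponential. The function name is illustrative; `statistics.NormalDist` (Python 3.8+) supplies the standard normal quantile.

```python
import math
from statistics import NormalDist


def lognormal_interval(mu, sigma, alpha=0.05):
    """Central (1 - alpha) interval for X = exp(Y), Y ~ N(mu, sigma^2)."""
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # q* = Phi^{-1}(1 - alpha/2)
    return math.exp(mu - q * sigma), math.exp(mu + q * sigma)
```

Note that the interval is symmetric around μ on the log scale but not on the original scale: the product of its endpoints equals exp(2μ).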

Moments
For any real or complex number s, the s-th moment of log-normal X is given by

$$ \operatorname{E}[X^s] = e^{s\mu + \frac12 s^2\sigma^2}. $$

A log-normal distribution is not uniquely determined by its moments E[X^k] for k ≥ 1, that is, there exists some other
distribution with the same moments for all k. In fact, there is a whole family of distributions with the same moments
as the log-normal distribution.

Characteristic function and moment generating function


The characteristic function E[e^{itX}] has a number of representations. The integral itself converges for Im(t) ≤ 0. The
simplest representation is obtained by Taylor expanding e^{itX} and using the formula for moments above:

$$ \varphi(t) = \sum_{n=0}^{\infty} \frac{(it)^n}{n!}\, e^{n\mu + n^2\sigma^2/2}. $$

This series representation is divergent for Re(σ²) > 0; however, it is sufficient for numerically evaluating the
characteristic function at positive t as long as the upper limit in the sum above is kept bounded, n ≤ N, where

$$ \max(|t|, |\mu|) \ll N \ll \frac{2}{\sigma^2}\ln\frac{2}{\sigma^2} $$

and σ² < 0.1. To bring the numerical values of parameters μ, σ into the domain where the strong inequality holds true
one could use the fact that if X is log-normally distributed then X^m is also log-normally distributed with parameters
μm, σm. Since σm → 0 as m → 0, the inequality could be satisfied for sufficiently small m. The sum of the series first
converges to the value of φ(t) with arbitrarily high accuracy if m is small enough and the left part of the strong
inequality is satisfied. If a considerably larger number of terms is taken into account, the sum eventually diverges
when the right part of the strong inequality is no longer valid.
Another useful representation was derived by Roy Leipnik (see references by this author and by Daniel Dufresne
below) by means of double Taylor expansion of e^{(ln x − μ)²/(2σ²)}.
The moment-generating function for the log-normal distribution does not exist on the domain R, but only exists on
the half-interval (−∞, 0].

Partial expectation
The partial expectation of a random variable X with respect to a threshold k is defined as g(k) = E[X | X > k]P[X > k].
For a log-normal random variable the partial expectation is given by

$$ g(k) = \int_k^{\infty} x f(x)\, dx = e^{\mu+\sigma^2/2}\, \Phi\!\left(\frac{\mu+\sigma^2-\ln k}{\sigma}\right). $$

This formula has applications in insurance and economics; it is used in solving the partial differential equation
leading to the Black–Scholes formula.

Maximum likelihood estimation of parameters


For determining the maximum likelihood estimators of the log-normal distribution parameters μ and σ, we can use
the same procedure as for the normal distribution. To avoid repetition, we observe that

$$ f_L(x;\mu,\sigma) = \frac{1}{x}\, f_N(\ln x;\mu,\sigma), $$

where by ƒ_L we denote the probability density function of the log-normal distribution and by ƒ_N that of the normal
distribution. Therefore, using the same indices to denote distributions, we can write the log-likelihood function thus:

$$ \ell_L(\mu,\sigma \mid x_1,\dots,x_n) = -\sum_k \ln x_k + \ell_N(\mu,\sigma \mid \ln x_1,\dots,\ln x_n). $$

Since the first term is constant with regard to μ and σ, both logarithmic likelihood functions, ℓ_L and ℓ_N, reach their
maximum with the same μ and σ. Hence, using the formulas for the normal distribution maximum likelihood
parameter estimators and the equality above, we deduce that for the log-normal distribution it holds that

$$ \hat\mu = \frac{\sum_k \ln x_k}{n}, \qquad \hat\sigma^2 = \frac{\sum_k \left(\ln x_k-\hat\mu\right)^2}{n}. $$
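The argument above reduces log-normal maximum likelihood to the normal case applied to the logarithms of the data. A minimal sketch follows, with a hypothetical function name; it assumes all observations are strictly positive and returns the maximum likelihood variance estimate (divisor n, not n − 1).

```python
import math


def lognormal_mle(samples):
    """Return (mu_hat, sigma2_hat) for positive observations `samples`."""
    logs = [math.log(x) for x in samples]  # raises ValueError if any x <= 0
    n = len(logs)
    mu_hat = sum(logs) / n
    sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / n
    return mu_hat, sigma2_hat
```

For the three observations e¹, e², e³ the logarithms are 1, 2, 3, so the estimates are μ̂ = 2 and σ̂² = 2/3.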

Generating log-normally-distributed random variates


Given a random variate N drawn from the normal distribution with 0 mean and 1 standard deviation, the variate

$$ X = e^{\mu+\sigma N} $$

has a log-normal distribution with parameters μ and σ.
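The recipe of exponentiating a shifted and scaled standard normal draw translates directly into code. This is a sketch with an illustrative function name; the `rng` argument is an assumption added here so that a seeded `random.Random` instance can be passed for reproducibility.

```python
import math
import random


def lognormal_variate(mu, sigma, rng=random):
    """Draw X = exp(mu + sigma * N) with N ~ N(0, 1)."""
    n = rng.gauss(0.0, 1.0)
    return math.exp(mu + sigma * n)
```

The standard library also offers `random.lognormvariate(mu, sigma)`, which follows the same parameterization.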

Related distributions
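• If X ~ N(μ, σ²) is a normal distribution, then exp(X) ~ Log-N(μ, σ²).
• If X ~ Log-N(μ, σ²) is distributed log-normally, then ln(X) ~ N(μ, σ²) is a normal random variable.
• If X_j ~ Log-N(μ_j, σ_j²) are n independent log-normally distributed variables, and Y = ∏_j X_j, then Y is
also distributed log-normally:

$$ Y \sim \operatorname{Log-N}\!\left(\sum_{j=1}^n \mu_j,\; \sum_{j=1}^n \sigma_j^2\right). $$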

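• Let X_j ~ Log-N(μ_j, σ_j²) be independent log-normally distributed variables with possibly varying σ and μ
parameters, and Y = Σ_j X_j. The distribution of Y has no closed-form expression, but can be reasonably
approximated by another log-normal distribution Z at the right tail. Its probability density function at the
neighborhood of 0 is characterized in (Gao et al., 2009) and it does not resemble any log-normal distribution. A
commonly used approximation (due to Fenton and Wilkinson) is obtained by matching the mean and variance:

$$ \sigma_Z^2 = \ln\!\left[\frac{\sum_j e^{2\mu_j+\sigma_j^2}\left(e^{\sigma_j^2}-1\right)}{\left(\sum_j e^{\mu_j+\sigma_j^2/2}\right)^2} + 1\right], \qquad \mu_Z = \ln\!\left[\sum_j e^{\mu_j+\sigma_j^2/2}\right] - \frac{\sigma_Z^2}{2}. $$

In the case that all X_j have the same variance parameter σ_j = σ, these formulas simplify to

$$ \sigma_Z^2 = \ln\!\left[\left(e^{\sigma^2}-1\right)\frac{\sum_j e^{2\mu_j}}{\left(\sum_j e^{\mu_j}\right)^2} + 1\right], \qquad \mu_Z = \ln\!\left[\sum_j e^{\mu_j}\right] + \frac{\sigma^2}{2} - \frac{\sigma_Z^2}{2}. $$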

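• If X ~ Log-N(μ, σ²), then X + c is said to have a shifted log-normal distribution with support x ∈ (c, +∞).
E[X + c] = E[X] + c, Var[X + c] = Var[X].
• If X ~ Log-N(μ, σ²) and a > 0, then Y = aX is also log-normal, Y ~ Log-N(μ + ln a, σ²).
• If X ~ Log-N(μ, σ²), then Y = 1⁄X is also log-normal, Y ~ Log-N(−μ, σ²).
• If X ~ Log-N(μ, σ²) and a ≠ 0, then Y = X^a is also log-normal, Y ~ Log-N(aμ, a²σ²).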
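The Fenton–Wilkinson moment matching mentioned in the list above, for a sum of independent log-normal variables, can be sketched as follows. The function name and the `(mu_j, sigma_j)` input convention are assumptions made for this example; the code simply computes the exact mean and variance of the sum and solves for the log-normal parameters that reproduce them.

```python
import math


def fenton_wilkinson(params):
    """Approximate Y = sum_j X_j, X_j ~ Log-N(mu_j, sigma_j^2) independent.

    `params` is an iterable of (mu_j, sigma_j) pairs. Returns (mu_Z, sigma_Z)
    of the log-normal Z with the same mean and variance as Y.
    """
    params = list(params)
    mean = sum(math.exp(m + s * s / 2.0) for m, s in params)
    var = sum(math.exp(2.0 * m + s * s) * (math.exp(s * s) - 1.0)
              for m, s in params)
    sigma2_z = math.log(var / mean ** 2 + 1.0)
    mu_z = math.log(mean) - sigma2_z / 2.0
    return mu_z, math.sqrt(sigma2_z)
```

With a single summand the approximation is exact and returns the input parameters unchanged, which is a quick way to check an implementation.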

Similar distributions
• A substitute for the log-normal whose integral can be expressed in terms of more elementary functions (Swamee,
2002) can be obtained based on the logistic distribution to get the CDF

$$ F(x;\mu,\sigma) = \left[\left(\frac{e^{\mu}}{x}\right)^{\pi/(\sigma\sqrt{3})} + 1\right]^{-1}. $$

This is a log-logistic distribution.


• An exGaussian distribution is the distribution of the sum of a normally distributed random variable and an
exponentially distributed random variable. This has a similar long tail, and has been used as a model for reaction
times.

Further reading
• Robert Brooks, Jon Corson, and J. Donal Wales. "The Pricing of Index Options When the Underlying Assets All
Follow a Lognormal Diffusion" [1], in Advances in Futures and Options Research, volume 7, 1994.

References
[1] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=5735

• Aitchison, J. and Brown, J.A.C. (1957). The Lognormal Distribution.
• Limpert, E., Stahel, W. and Abbt, M. (2001). Log-normal Distributions across the Sciences: Keys and Clues
(http://stat.ethz.ch/~stahel/lognormal/bioscience.pdf). BioScience, 51 (5), pp. 341–352.
• Weisstein, Eric W. et al. Log Normal Distribution (http://mathworld.wolfram.com/LogNormalDistribution.html)
at MathWorld. Electronic document, retrieved October 26, 2006.
• Swamee, P.K. (2002). Near Lognormal Distribution (http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=JHYEFF000007000006000441000001&idtype=cvips&gifs=yes).
Journal of Hydrologic Engineering, 7(6), pp. 441–444.
• Leipnik, Roy B. (1991). On Lognormal Random Variables: I - The Characteristic Function
(http://anziamj.austms.org.au/V32/part3/Leipnik.html). Journal of the Australian Mathematical Society Series B,
vol. 32, pp. 327–347.
• Gao et al. (2009). Asymptotic Behaviors of Tail Density for Sum of Correlated Lognormal Variables
(http://www.hindawi.com/journals/ijmms/2009/630857.html). International Journal of Mathematics and
Mathematical Sciences.
• Dufresne, Daniel (2009). Sums of Lognormals (http://www.soa.org/library/proceedings/arch/2009/arch-2009-iss1-dufresne.pdf).
Centre for Actuarial Studies, University of Melbourne.

See also
• Normal distribution
• Geometric mean
• Geometric standard deviation
• Error function
• Log-distance path loss model
• Slow fading
• Stochastic volatility

Logistic distribution
Logistic

Probability density function

Cumulative distribution function

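parameters: μ, location (real)
s > 0, scale (real)
support: x ∈ (−∞, +∞)
pdf: $\dfrac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2}$
cdf: $\dfrac{1}{1+e^{-(x-\mu)/s}}$
mean: μ
median: μ
mode: μ
variance: $\pi^2 s^2/3$
skewness: 0
ex.kurtosis: 6/5
entropy: ln(s) + 2
mgf: $e^{\mu t}\,\mathrm{B}(1-st,\,1+st)$ for |st| < 1, where B is the Beta function
cf: $e^{i\mu t}\,\dfrac{\pi s t}{\sinh(\pi s t)}$ for |ist| < 1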

In probability theory and statistics, the logistic distribution is a continuous probability distribution. Its cumulative
distribution function is the logistic function, which appears in logistic regression and feedforward neural networks. It
resembles the normal distribution in shape but has heavier tails (higher kurtosis).

Specification

Cumulative distribution function


The logistic distribution receives its name from its cumulative distribution function (cdf), which is an instance of the
family of logistic functions:

$$ F(x;\mu,s) = \frac{1}{1+e^{-(x-\mu)/s}} = \frac12 + \frac12\tanh\!\left(\frac{x-\mu}{2s}\right). $$

In this equation, x is the random variable, μ is the mean, and s is a parameter proportional to the standard deviation.

Probability density function


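The probability density function (pdf) of the logistic distribution is given by:

$$ f(x;\mu,s) = \frac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2} = \frac{1}{4s}\operatorname{sech}^2\!\left(\frac{x-\mu}{2s}\right). $$

Because the pdf can be expressed in terms of the square of the hyperbolic secant function "sech", it is sometimes
referred to as the sech-square(d) distribution.
See also: hyperbolic secant distribution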

Quantile function
The inverse cumulative distribution function of the logistic distribution is F^{−1}, a generalization of the logit
function, defined as follows:

$$ F^{-1}(p;\mu,s) = \mu + s\ln\!\left(\frac{p}{1-p}\right). $$
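Because the quantile function has a closed form, logistic variates are easy to generate by inverse transform sampling: draw U uniformly on (0, 1) and return F^{−1}(U). The sketch below assumes the cdf F(x) = 1/(1 + exp(−(x − μ)/s)) and the quantile μ + s·ln(p/(1 − p)); the function names and the `rng` parameter are illustrative.

```python
import math
import random


def logistic_cdf(x, mu, s):
    """F(x) = 1 / (1 + exp(-(x - mu) / s)); may overflow for extreme x."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


def logistic_quantile(p, mu, s):
    """Inverse cdf for 0 < p < 1: a shifted and scaled logit."""
    return mu + s * math.log(p / (1.0 - p))


def sample_logistic(mu, s, rng=random):
    """Inverse transform sampling: apply the quantile to a uniform draw."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, where the logit is undefined
        u = rng.random()
    return logistic_quantile(u, mu, s)
```

Composing `logistic_cdf` with `logistic_quantile` should return the original probability up to rounding, and the quantile at p = 0.5 is exactly μ.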

Alternative parameterization
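An alternative parameterization of the logistic distribution can be derived using the substitution σ² = π²s²/3.
This yields the following density function:

$$ g(x;\mu,\sigma) = f\!\left(x;\mu,\sigma\sqrt{3}/\pi\right) = \frac{\pi}{4\sqrt{3}\,\sigma}\operatorname{sech}^2\!\left(\frac{\pi}{2\sqrt{3}}\,\frac{x-\mu}{\sigma}\right). $$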

Applications
The logistic distribution and the S-shaped pattern that results from it have been extensively used in many different
areas, including:
• Biology – to describe how species populations grow in competition[1]
• Epidemiology – to describe the spreading of epidemics[2]
• Psychology – to describe learning[3]
• Technology – to describe how new technologies diffuse and substitute for each other[4]
• Market – the diffusion of new-product sales[5]
• Energy – the diffusion and substitution of primary energy sources[6]

Both the United States Chess Federation and FIDE have switched their formulas for calculating chess ratings from
the normal distribution to the logistic distribution; see Elo rating system.

Related distributions
If log(X) has a logistic distribution then X has a log-logistic distribution and X – a has a shifted log-logistic
distribution.

Derivations

Expected Value

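$$ \operatorname{E}[X] = \int_{-\infty}^{\infty} \frac{x\, e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2}\, dx $$

Substitute u = (x − μ)/s, du = dx/s:

$$ \operatorname{E}[X] = \int_{-\infty}^{\infty} \frac{(su+\mu)\, e^{-u}}{\left(1+e^{-u}\right)^2}\, du = s\int_{-\infty}^{\infty} \frac{u\, e^{-u}}{\left(1+e^{-u}\right)^2}\, du + \mu\int_{-\infty}^{\infty} \frac{e^{-u}}{\left(1+e^{-u}\right)^2}\, du $$

Note the odd function: e^{−u}/(1 + e^{−u})² is even in u, so the integrand of the first integral is odd and that integral
vanishes. The second integrand is the standard logistic density, which integrates to one. Hence

$$ \operatorname{E}[X] = \mu. $$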

Higher order moments


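The n-th order central moment can be expressed in terms of the quantile function:

$$ \operatorname{E}[(X-\mu)^n] = \int_{-\infty}^{\infty} (x-\mu)^n\, dF(x) = \int_0^1 \left(F^{-1}(p)-\mu\right)^n dp = s^n \int_0^1 \left[\ln\!\left(\frac{p}{1-p}\right)\right]^n dp. $$

This integral is well-known[7] and can be expressed in terms of Bernoulli numbers:

$$ \operatorname{E}[(X-\mu)^n] = s^n \pi^n \left(2^n-2\right) |B_n|. $$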

See also
• Generalized logistic distribution
• Logistic regression
• Sigmoid function

References
• Balakrishnan, N. (1992). Handbook of the Logistic Distribution. Marcel Dekker, New York.
ISBN 0-8247-8587-8.
• Johnson, N. L., Kotz, S., Balakrishnan, N. (1995). Continuous Univariate Distributions. Vol. 2 (2nd ed.).
ISBN 0-471-58494-0.

References
[1] P. F. Verhulst, "Recherches mathématiques sur la loi d'accroissement de la population", Nouveaux Mémoires de l'Académie Royale des
Sciences et des Belles-Lettres de Bruxelles, vol. 18 (1845); Alfred J. Lotka, Elements of Physical Biology, (Baltimore, MD: Williams &
Wilkins Co., 1925).
[2] Theodore Modis, Predictions: Society's Telltale Signature Reveals the Past and Forecasts the Future, Simon & Schuster, New York, 1992,
pp 97-105.
[3] Theodore Modis, Predictions: Society's Telltale Signature Reveals the Past and Forecasts the Future, Simon & Schuster, New York, 1992,
Chapter 2.
[4] J. C. Fisher and R. H. Pry , "A Simple Substitution Model of Technological Change", Technological Forecasting & Social Change, vol. 3, no.
1 (1971).
[5] Theodore Modis, Conquering Uncertainty, McGraw-Hill, New York, 1998, Chapter 1.
[6] Cesare Marchetti, "Primary Energy Substitution Models: On the Interaction between Energy and Society", Technological Forecasting &
Social Change, vol. 10, (1977).
[7] (http://www.research.att.com/~njas/sequences/A001896)

Normal distribution
Probability density function

The red line is the standard normal distribution


Cumulative distribution function

Colors match the image above


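notation: $\mathcal{N}(\mu, \sigma^2)$
parameters: μ ∈ R, mean (location)
σ² ≥ 0, variance (squared scale)
support: x ∈ R if σ² > 0
x = μ if σ² = 0
pdf: $\dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
cdf: $\dfrac12\left[1+\operatorname{erf}\!\left(\dfrac{x-\mu}{\sqrt{2\sigma^2}}\right)\right]$
mean: μ
median: μ
mode: μ
variance: σ²
skewness: 0
ex.kurtosis: 0
entropy: $\tfrac12 \ln(2\pi e\,\sigma^2)$
mgf: $\exp\!\left(\mu t + \tfrac12\sigma^2 t^2\right)$
cf: $\exp\!\left(i\mu t - \tfrac12\sigma^2 t^2\right)$
Fisher information: $\begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}$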

In probability theory and statistics, the normal distribution, or Gaussian distribution, is an absolutely continuous
probability distribution whose cumulants of all orders above two are zero. The graph of the associated probability
density function is “bell”-shaped, with peak at the mean, and is known as the Gaussian function or bell curve:[1]

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, $$

where parameters μ and σ² are the mean and the variance. The distribution with μ = 0 and σ² = 1 is called standard
normal.
The normal distribution is often used to describe, at least approximately, any variable that tends to cluster around the
mean. For example, the heights of adult males in the United States are roughly normally distributed, with a mean of
about 70 inches (1.8 m). Most men have a height close to the mean, though a small number of outliers have a height
significantly above or below the mean. A histogram of male heights will appear similar to a bell curve, with the
correspondence becoming closer if more data are used.
By the central limit theorem, under certain conditions the sum of a number of random variables with finite means
and variances approaches a normal distribution as the number of variables increases. For this reason, the normal
distribution is commonly encountered in practice, and is used throughout statistics, natural sciences, and social
sciences[2] as a simple model for complex phenomena. For example, the observational error in an experiment is
usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this
assumption.
The Gaussian distribution was named after Carl Friedrich Gauss, who introduced it in 1809 as a way of rationalizing
the method of least squares. One year later Laplace proved the first version of the central limit theorem,
demonstrating that the normal distribution occurs as a limiting distribution of arithmetic means of independent,
identically distributed random variables with finite second moment. For this reason the normal distribution is
sometimes called Laplacian, especially in French-speaking countries.

Definition
The simplest case of a normal distribution is known as the standard normal distribution, described by the probability
density function

$$ \phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2}. $$

The constant 1/√(2π) in this expression ensures that the total area under the curve ϕ(x) is equal to one, and the
factor 1⁄2 in the exponent makes the “width” of the curve (measured as half of the distance between the inflection
points of the curve) also equal to one. It is traditional[3] in statistics to denote this function with the Greek letter
ϕ (phi), whereas density functions for all other distributions are usually denoted with letters ƒ or p. The alternative
glyph φ is also used quite often; however, within this article “φ” is reserved to denote characteristic functions.
More generally, a normal distribution results from exponentiating a quadratic function (just as an exponential
distribution results from exponentiating a linear function):

$$ f(x) = e^{ax^2+bx+c}. $$

This yields the classic “bell curve” shape (provided that a < 0 so that the quadratic function is concave). Notice that
f(x) > 0 everywhere. One can adjust a to control the “width” of the bell, then adjust b to move the central peak of the
bell along the x-axis, and finally adjust c to control the “height” of the bell. For f(x) to be a true probability density
function over R, one must choose c such that ∫ f(x) dx = 1 over the whole real line (which is only possible when a < 0).
Rather than using a, b, and c, it is far more common to describe a normal distribution by its mean μ = −b/(2a) and
variance σ² = −1/(2a). Changing to these new parameters allows us to rewrite the probability density function in a
convenient standard form,

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right). $$

Notice that for a standard normal distribution, μ = 0 and σ² = 1. The last part of the equation above shows that any
other normal distribution can be regarded as a version of the standard normal distribution that has been stretched
horizontally by a factor σ and then translated rightward by a distance μ. Thus, μ specifies the position of the bell
curve’s central peak, and σ specifies the “width” of the bell curve.
The parameter μ is at the same time the mean, the median and the mode of the normal distribution. The parameter σ²
is called the variance; as for any random variable, it describes how concentrated the distribution is around its mean.
The square root of σ² is called the standard deviation and is the width of the density function.
The normal distribution is usually denoted by N(μ, σ²).[4] Commonly the letter N is written in calligraphic font (typed
as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with mean μ and variance σ²,
we write

$$ X \sim \mathcal{N}(\mu, \sigma^2). $$

Alternative formulations
Some authors[5] instead of σ² use its reciprocal τ = σ^{−2}, which is called the precision. This parameterization has an
advantage in numerical applications where σ² is very close to zero and is more convenient to work with in analysis as
τ is a natural parameter of the normal distribution. Another advantage of using this parameterization is in the study of
conditional distributions in the multivariate normal case.
The question of which normal distribution should be called the “standard” one is also answered differently by various
authors. Starting from the works of Gauss the standard normal was considered to be the one with variance σ² = 1/2:

$$ f(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2}. $$

Stigler (1982) goes even further and suggests the standard normal with variance σ² = 1/(2π):

$$ f(x) = e^{-\pi x^2}. $$

According to the author, this formulation is advantageous because of a much simpler and easier-to-remember
formula, the fact that the pdf has unit height at zero, and simple approximate formulas for the quantiles of the
distribution.

Characterization
In the previous section the normal distribution was defined by specifying its probability density function. However
there are other ways to characterize a probability distribution. They include: the cumulative distribution function, the
moments, the cumulants, the characteristic function, the moment-generating function, etc.

Probability density function


The probability density function (pdf) of a random variable describes the relative frequencies of different values for
that random variable. The pdf of the normal distribution is given by the formula explained in detail in the previous
section:

$$ f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right), \qquad x \in \mathbb{R}. $$

This is a proper function only when the variance σ² is not equal to zero. In that case this is a continuous smooth
function, defined on the entire real line, which is called the “Gaussian function”.
When σ² = 0, the density function doesn’t exist. However, we can consider a generalized function that would behave
in a manner similar to the regular density function (in the sense that it defines a measure on the real line, and it can
be plugged into an integral in order to calculate expected values of different quantities):

$$ f(x;\mu,0) = \delta(x-\mu). $$

This is the Dirac delta function: it is equal to infinity at x = μ and is zero elsewhere.
Properties:
• Function ƒ(x) is symmetric around the point x = μ, which is at the same time the mode, the median and the mean
of the distribution.
• The inflection points of the curve occur one standard deviation away from the mean (i.e., at x = μ − σ and x = μ +
σ).
• The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
• The function is supersmooth of order 2, implying that it is infinitely differentiable.
• The first derivative of ϕ(x) is ϕ′(x) = −x·ϕ(x); the second derivative is ϕ′′(x) = (x² − 1)ϕ(x). More generally, the
n-th derivative is given by ϕ^{(n)}(x) = (−1)^n H_n(x) ϕ(x), where H_n is the Hermite polynomial of order n.[6]

Cumulative distribution function


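The cumulative distribution function (cdf) describes probabilities for a random variable to fall in the intervals of the
form (−∞, x]. The cdf of the standard normal distribution is denoted with the capital Greek letter Φ (phi), and can be
computed as an integral of the probability density function:

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt = \frac12\left[1+\operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right], \qquad x \in \mathbb{R}. $$

This integral cannot be expressed in elementary functions; it is written in terms of a special function erf, called the
error function. The numerical methods for calculation of the standard normal cdf are discussed below. For a generic
normal random variable with mean μ and variance σ² > 0 the cdf will be equal to

$$ F(x;\mu,\sigma^2) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = \frac12\left[1+\operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right], \qquad x \in \mathbb{R}. $$

For a normal distribution with zero variance, the cdf is the Heaviside step function:

$$ F(x;\mu,0) = \mathbf{1}\{x \ge \mu\}. $$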

The complement of the standard normal cdf, Q(x) = 1 − Φ(x), is referred to as the Q-function, especially in
engineering texts.[7] [8] This represents the tail probability of the Gaussian distribution, that is the probability that a
standard normal random variable X is greater than the number x. Other definitions of the Q-function, all of which are
simple transformations of Φ, are also used occasionally.[9]
Properties:
• The standard normal cdf is 2-fold rotationally symmetric around point (0, ½):  Φ(−x) = 1 − Φ(x).
• The derivative of Φ(x) is equal to the standard normal pdf ϕ(x):  Φ′(x) = ϕ(x).
• The antiderivative of Φ(x) is:  ∫ Φ(x) dx = x Φ(x) + ϕ(x).
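The error-function representation gives a direct way to evaluate the normal cdf and the Q-function numerically. The sketch below uses only `math.erf` and `math.erfc` from the Python standard library; the function names are illustrative, and Q is computed through erfc rather than as 1 − Φ(x) to avoid cancellation in the far tail.

```python
import math


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """Cdf of N(mu, sigma^2) for sigma > 0, via standardization."""
    return std_normal_cdf((x - mu) / sigma)


def q_function(x):
    """Tail probability Q(x) = 1 - Phi(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))
```

The symmetry Φ(−x) = 1 − Φ(x) listed among the properties provides a simple numerical check of such an implementation.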

Quantile function
The inverse of the standard normal cdf, called the quantile function or probit function, is expressed in terms of the
inverse error function:

$$ \Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p-1), \qquad p \in (0,1). $$

Quantiles of the standard normal distribution are commonly denoted as z_p. The quantile z_p represents such a value
that a standard normal random variable X has the probability of exactly p to fall inside the (−∞, z_p] interval. The
quantiles are used in hypothesis testing, construction of confidence intervals and Q-Q plots. The most “famous”
normal quantile is 1.96 = z_{0.975}. A standard normal random variable is greater than 1.96 in absolute value in only 5%
of cases.
For a normal random variable with mean μ and variance σ², the quantile function is

$$ F^{-1}(p;\mu,\sigma^2) = \mu + \sigma\,\Phi^{-1}(p) = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(2p-1), \qquad p \in (0,1). $$
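Python's standard library has no inverse error function, but `statistics.NormalDist.inv_cdf` (Python 3.8+) evaluates the standard normal quantile directly. The following sketch, with an illustrative function name, applies the location-scale relation quantile = μ + σ·Φ^{−1}(p).

```python
from statistics import NormalDist


def normal_quantile(p, mu=0.0, sigma=1.0):
    """Quantile of N(mu, sigma^2) for 0 < p < 1."""
    # inv_cdf raises statistics.StatisticsError when p is outside (0, 1).
    return mu + sigma * NormalDist().inv_cdf(p)
```

Calling `normal_quantile(0.975)` gives approximately 1.96, the value quoted in the text.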

Characteristic function and moment generating function


The characteristic function φ_X(t) of a random variable X is defined as the expected value of e^{itX}, where i is the
imaginary unit, and t ∈ R is the argument of the characteristic function. Thus the characteristic function is the Fourier
transform of the density ϕ(x). For a normally distributed X with mean μ and variance σ², the characteristic function
is[10]

$$ \varphi(t;\mu,\sigma^2) = e^{i\mu t - \frac12\sigma^2 t^2}. $$

The moment generating function is defined as the expected value of e^{tX}. For a normal distribution, the moment
generating function exists and is equal to

$$ M(t;\mu,\sigma^2) = \operatorname{E}[e^{tX}] = e^{\mu t + \frac12\sigma^2 t^2}. $$

The cumulant generating function is the logarithm of the moment generating function:

$$ \kappa(t;\mu,\sigma^2) = \ln M(t;\mu,\sigma^2) = \mu t + \tfrac12\sigma^2 t^2. $$

Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.

Moments
The normal distribution has moments of all orders. That is, for a normally distributed X with mean μ and variance
σ², the expectation E|X|^p exists and is finite for all p such that Re[p] > −1. Usually we are interested only in moments
of integer orders: p = 1, 2, 3, ….
• Central moments are the moments of X around its mean μ. Thus, a central moment of order p is the expected
value of (X − μ)^p. Using standardization of normal random variables, this expectation will be equal to σ^p · E[Z^p],
where Z is standard normal:

$$ \operatorname{E}\left[(X-\mu)^p\right] = \begin{cases} 0 & \text{if } p \text{ is odd,} \\ \sigma^p\,(p-1)!! & \text{if } p \text{ is even.} \end{cases} $$

Here n!! denotes the double factorial, that is the product of every other number from n to 1.
• Central absolute moments are the moments of |X − μ|. They coincide with regular moments for all even orders,
but are nonzero for all odd p’s:

$$ \operatorname{E}\left[|X-\mu|^p\right] = \sigma^p\,(p-1)!! \cdot \begin{cases} \sqrt{2/\pi} & \text{if } p \text{ is odd,} \\ 1 & \text{if } p \text{ is even,} \end{cases} \;=\; \sigma^p\, \frac{2^{p/2}\,\Gamma\!\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}. $$

• Raw moments and raw absolute moments are the moments of X and |X| respectively. The formulas for these
moments are much more complicated, and are given in terms of confluent hypergeometric functions ₁F₁ and U.

These expressions remain valid even if p is not integer. See also generalized Hermite polynomials.
• The first two cumulants are equal to μ and σ² respectively, whereas all higher-order cumulants are equal to zero.

Order   Raw moment                                        Central moment   Cumulant
1       μ                                                 0                μ
2       μ² + σ²                                           σ²               σ²
3       μ³ + 3μσ²                                         0                0
4       μ⁴ + 6μ²σ² + 3σ⁴                                  3σ⁴              0
5       μ⁵ + 10μ³σ² + 15μσ⁴                               0                0
6       μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶                       15σ⁶             0
7       μ⁷ + 21μ⁵σ² + 105μ³σ⁴ + 105μσ⁶                    0                0
8       μ⁸ + 28μ⁶σ² + 210μ⁴σ⁴ + 420μ²σ⁶ + 105σ⁸           105σ⁸            0
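The central-moment column of the table follows the pattern σ^p·(p − 1)!! for even p and zero for odd p. A small sketch, with hypothetical function names, that reproduces those entries:

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 or 2; equals 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def normal_central_moment(p, sigma):
    """E[(X - mu)^p] for X ~ N(mu, sigma^2) and a non-negative integer p."""
    if p % 2 == 1:
        return 0.0  # odd central moments vanish by symmetry
    return sigma ** p * double_factorial(p - 1)
```

With σ = 1 the even orders 2, 4, 6, 8 give 1, 3, 15, 105, matching the coefficients in the table.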

Properties

Standardizing normal random variables


As a consequence of property 1, it is possible to relate all normal random variables to the standard normal. For
example if X is normal with mean μ and variance σ², then

$$ Z = \frac{X-\mu}{\sigma} $$

has mean zero and unit variance, that is Z has the standard normal distribution. Conversely, having a standard normal
random variable Z we can always construct another normal random variable with specific mean μ and variance σ²:

$$ X = \sigma Z + \mu. $$

This “standardizing” transformation is convenient as it allows one to compute the pdf and especially the cdf of a
normal distribution having the table of pdf and cdf values for the standard normal. They will be related via

$$ F_X(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right), \qquad f_X(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right). $$

Standard deviation and confidence intervals


About 68% of values drawn from a normal distribution are within one standard deviation σ > 0 away from the
mean μ; about 95% of the values are within two standard deviations and about 99.7% lie within three standard
deviations. This is known as the 68-95-99.7 rule, or the empirical rule, or the 3-sigma rule.

Figure: Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for
about 68% of the set (dark blue), while two standard deviations from the mean (medium and dark blue) account for
about 95%, and three standard deviations (light, medium, and dark blue) account for about 99.7%.

To be more precise, the area under the bell curve between μ − nσ and μ + nσ in terms of the cumulative normal
distribution function is given by

$$ F(\mu+n\sigma;\mu,\sigma^2) - F(\mu-n\sigma;\mu,\sigma^2) = \Phi(n) - \Phi(-n) = \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right), $$

where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma points are:

n   F(μ+nσ) − F(μ−nσ)   i.e. 1 minus ...   or 1 in ...
1   0.682689492137      0.317310507863     3.15148718753
2   0.954499736104      0.045500263896     21.9778945081
3   0.997300203937      0.002699796063     370.398347380
4   0.999936657516      0.000063342484     15,787.192684
5   0.999999426697      0.000000573303     1,744,278.331
6   0.999999998027      0.000000001973     506,842,375.7
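The first two numeric columns of this table can be recomputed from the error function. This is a minimal sketch with illustrative function names; it assumes the coverage within n standard deviations equals erf(n/√2), and it evaluates the complement through `math.erfc` so that the small tail probabilities at 5 and 6 sigma keep their precision.

```python
import math


def coverage(n):
    """Probability that a normal variable lies within n standard deviations."""
    return math.erf(n / math.sqrt(2.0))


def outside(n):
    """Complementary probability, computed without subtracting from 1."""
    return math.erfc(n / math.sqrt(2.0))


# Print n, the coverage, its complement, and the "1 in ..." odds.
for n in range(1, 7):
    print(n, f"{coverage(n):.12f}", f"{outside(n):.12f}", f"{1.0 / outside(n):.6f}")
```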

The next table gives the reverse relation of sigma multiples corresponding to a few often used values for the area
under the bell curve. These values are useful to determine (asymptotic) confidence intervals of the specified levels
based on normally distributed (or asymptotically normal) estimators:

Proportion within interval   n
0.80                         1.281551565545
0.90                         1.644853626951
0.95                         1.959963984540
0.98                         2.326347874041
0.99                         2.575829303549
0.995                        2.807033768344
0.998                        3.090232306168
0.999                        3.290526731492
0.9999                       3.890591886413
0.99999                      4.417173413469

where the value on the left of the table is the proportion of values that will fall within a given interval and n is a
multiple of the standard deviation that specifies the width of the interval.

Central limit theorem


The theorem states that under certain, fairly common conditions, the sum of a large number of random variables will
have an approximately normal distribution. For example if (x_1, …, x_n) is a sequence of iid random variables, each
having mean μ and variance σ² but otherwise distributions of the x_i’s can be arbitrary, then the central limit theorem
states that

$$ \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} x_i - \mu\right) \;\xrightarrow{d}\; \mathcal{N}(0, \sigma^2). $$

The theorem will hold even if the summands x_i are not iid, although some constraints on the degree of dependence
and the growth rate of moments still have to be imposed.
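A small simulation can make the statement concrete. The sketch below is an illustration under stated assumptions, not part of the theorem: it draws iid Uniform(0, 1) summands (mean 1/2, variance 1/12), forms the scaled and centred sample mean √n·(mean − μ), and leaves it to the caller to compare the empirical spread with σ² = 1/12. The function name and the choice of the uniform distribution are hypothetical.

```python
import math
import random


def standardized_means(n, reps, rng):
    """Return `reps` draws of sqrt(n) * (sample mean - 0.5) for Uniform(0, 1)."""
    mu = 0.5  # mean of Uniform(0, 1)
    draws = []
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(n)) / n
        draws.append(math.sqrt(n) * (mean - mu))
    return draws


rng = random.Random(42)
z = standardized_means(50, 2000, rng)
center = sum(z) / len(z)
spread = sum((t - center) ** 2 for t in z) / (len(z) - 1)
print(center, spread)  # expected to be near 0 and near 1/12 respectively
```

A histogram of `z` should look approximately bell-shaped even though the individual summands are uniform.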
The importance of the central limit theorem cannot be overemphasized. A great number of test statistics, scores, and
estimators encountered in practice contain sums of certain random variables in them, and even more estimators can be
represented as sums of random variables through the use of influence functions. All of these quantities are
governed by the central limit theorem and will have asymptotically normal distribution as a result.
Another practical consequence of the central limit theorem is that certain other distributions can be
approximated by the normal distribution, for example:
• The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too
close to zero or one.
• The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
• The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
• The Student’s t-distribution t(ν) is approximately normal N(0, 1) when ν is large.

Figure: As the number of discrete events increases, the function begins to resemble a normal distribution.

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the
rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in
the tails of the distribution.
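To get a feel for the accuracy of such approximations, one can compare an exact binomial cdf with its normal counterpart N(np, np(1 − p)). The sketch below uses illustrative function names and adds a continuity correction of 0.5, which is a common refinement assumed here rather than something stated in the text above.

```python
import math


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p), by summing the probability mass function."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k + 1))


def normal_approx_binomial_cdf(k, n, p):
    """Normal approximation N(np, np(1-p)) with a 0.5 continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    z = (k + 0.5 - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For n = 100 and p = 0.5 the two functions agree closely near the centre of the distribution; the agreement typically degrades in the tails and for p close to zero or one.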
A general upper bound for the approximation error in the central limit theorem is given by the Berry–Esseen
theorem; improvements of the approximation are given by the Edgeworth expansions.

Miscellaneous
1. The family of normal distributions is closed under linear transformations. That is, if X is normally distributed
with mean μ and variance σ², then a linear transform aX + b (for some real numbers a and b) is also normally
distributed:

$$ aX + b \sim \mathcal{N}(a\mu + b,\; a^2\sigma^2). $$

Also if X₁, X₂ are two independent normal random variables, with means μ₁, μ₂ and standard deviations σ₁, σ₂,
then their linear combination will also be normally distributed:

$$ aX_1 + bX_2 \sim \mathcal{N}(a\mu_1 + b\mu_2,\; a^2\sigma_1^2 + b^2\sigma_2^2). $$

2. The converse of (1) is also true: if X₁ and X₂ are independent and their sum X₁ + X₂ is distributed normally, then
both X₁ and X₂ must also be normal. This is known as Cramér’s theorem. The interpretation of this property is that
a normal distribution is only divisible by other normal distributions.
3. It is a common fallacy that if two normal random variables are uncorrelated then they are also independent. This
is false. The correct statement is that two random variables that are jointly normal and uncorrelated are
independent.
4. The normal distribution is infinitely divisible: for a normally distributed X with mean μ and variance σ² we can find n
independent random variables {X₁, …, Xₙ} each distributed normally with means μ/n and variances σ²/n such that

$$ X_1 + X_2 + \cdots + X_n \sim \mathcal{N}(\mu, \sigma^2). $$

5. The normal distribution is stable (with exponent α = 2): if X₁, X₂ are two independent N(μ, σ²) random variables and
a, b are arbitrary real numbers, then

$$ aX_1 + bX_2 \sim \sqrt{a^2+b^2}\cdot X_3 + \left(a + b - \sqrt{a^2+b^2}\right)\mu, $$

where X₃ is also N(μ, σ²). This relationship directly follows from property (1).
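6. The Kullback–Leibler divergence between two normal distributions X₁ ∼ N(μ₁, σ₁²) and X₂ ∼ N(μ₂, σ₂²) is given
by:[11]

$$ D_{\mathrm{KL}}(X_1 \,\|\, X_2) = \frac{(\mu_1-\mu_2)^2}{2\sigma_2^2} + \frac12\left(\frac{\sigma_1^2}{\sigma_2^2} - 1 - \ln\frac{\sigma_1^2}{\sigma_2^2}\right). $$

The Hellinger distance between the same distributions is equal to

$$ H^2(X_1, X_2) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}}\; e^{-\frac14\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}}. $$

7. The Fisher information matrix for the normal distribution is diagonal and takes the form

$$ \mathcal{I} = \begin{pmatrix} \dfrac{1}{\sigma^2} & 0 \\ 0 & \dfrac{1}{2\sigma^4} \end{pmatrix}. $$

8. The normal distribution belongs to an exponential family with natural parameters θ₁ = μ/σ² and θ₂ = −1/(2σ²), and
natural statistics x and x². The dual, expectation parameters for the normal distribution are η₁ = μ and η₂ = μ² + σ².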

9. Of all probability distributions over the reals with mean μ and variance σ2, the normal distribution N(μ, σ2) is the
one with the maximum entropy.
10. The family of normal distributions forms a manifold with constant curvature −1. The same family is flat with
respect to the (±1)-connections ∇(e) and ∇(m).[12]
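The two distance measures mentioned in item 6 have closed forms for normal distributions and are simple to compute. The sketch below assumes the standard expressions D_KL(X₁ ‖ X₂) = (μ₁ − μ₂)²/(2σ₂²) + ½(σ₁²/σ₂² − 1 − ln(σ₁²/σ₂²)) and H² = 1 − √(2σ₁σ₂/(σ₁² + σ₂²))·exp(−(μ₁ − μ₂)²/(4(σ₁² + σ₂²))); the function names, and the choice to pass variances rather than standard deviations, are illustrative.

```python
import math


def kl_normal(mu1, var1, mu2, var2):
    """Kullback-Leibler divergence D(N(mu1, var1) || N(mu2, var2))."""
    ratio = var1 / var2
    return (mu1 - mu2) ** 2 / (2.0 * var2) + 0.5 * (ratio - 1.0 - math.log(ratio))


def hellinger_sq_normal(mu1, var1, mu2, var2):
    """Squared Hellinger distance between N(mu1, var1) and N(mu2, var2)."""
    total = var1 + var2
    scale = math.sqrt(2.0 * math.sqrt(var1 * var2) / total)
    return 1.0 - scale * math.exp(-(mu1 - mu2) ** 2 / (4.0 * total))
```

Both quantities are zero when the two distributions coincide. The Kullback–Leibler divergence is not symmetric in its arguments, whereas the Hellinger distance is.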

Related distributions
• If X is distributed normally with mean μ and variance σ², then
• The exponent of X is distributed log-normally: e^X ~ lnN(μ, σ²).
• The absolute value of X has folded normal distribution: |X| ~ N_f(μ, σ²). If μ = 0 this is known as the
half-normal distribution.
• The square of X/σ has the non-central chi-square distribution with one degree of freedom: X²/σ² ~ χ²₁(μ²/σ²). If
μ = 0, the distribution is called simply chi-square.
• Variable X restricted to an interval [a, b] is called the truncated normal distribution.
• (X − μ)^{−2} has a Lévy distribution with location 0 and scale σ^{−2}.
• If X₁ and X₂ are two independent standard normal random variables, then
• Their sum and difference are distributed normally with mean zero and variance two: X₁ ± X₂ ∼ N(0, 2).
• Their product Z = X₁·X₂ follows the “product-normal” distribution[13] with density function f_Z(z) = π^{−1}K₀(|z|),
where K₀ is the modified Bessel function of the second kind. This distribution is symmetric around zero,
unbounded at z = 0, and has the characteristic function φ_Z(t) = (1 + t²)^{−1/2}.
• Their ratio follows the standard Cauchy distribution: X₁ ÷ X₂ ∼ Cauchy(0, 1).
• Their Euclidean norm √(X₁² + X₂²) has the Rayleigh distribution, also known as the chi distribution with 2
degrees of freedom.
• If X₁, X₂, …, Xₙ are independent standard normal random variables, then the sum of their squares has the
chi-square distribution with n degrees of freedom: X₁² + ⋯ + Xₙ² ∼ χ²ₙ.
• If X₁, X₂, …, Xₙ are independent normally distributed random variables with means μ and variances σ², then their
sample mean is independent from the sample standard deviation, which can be demonstrated using Basu’s
theorem or Cochran’s theorem. The ratio of these two quantities will have the Student’s t-distribution with n − 1
degrees of freedom:

$$ t = \frac{\bar{X}-\mu}{S/\sqrt{n}} = \frac{\frac1n(X_1+\cdots+X_n)-\mu}{\sqrt{\frac{1}{n(n-1)}\left[(X_1-\bar{X})^2+\cdots+(X_n-\bar{X})^2\right]}} \sim t_{n-1}. $$

• If X₁, …, Xₙ, Y₁, …, Yₘ are independent standard normal random variables, then the ratio of their normalized
sums of squares will have the F-distribution with (n, m) degrees of freedom:

$$ F = \frac{\left(X_1^2+X_2^2+\cdots+X_n^2\right)/n}{\left(Y_1^2+Y_2^2+\cdots+Y_m^2\right)/m} \sim F_{n,m}. $$

Extensions
The notion of normal distribution, being one of the most important distributions in probability theory, has been
extended far beyond the standard framework of the univariate (that is one-dimensional) case. All these extensions are
also called normal or Gaussian laws, so a certain ambiguity in names exists.
• Multivariate normal distribution describes the Gaussian law in the k-dimensional Euclidean space. A vector X ∈
Rᵏ is multivariate-normally distributed if any linear combination of its components $\sum_{j=1}^k a_j X_j$ has a
(univariate) normal distribution. The variance of X is a k×k symmetric positive-definite matrix V.
• Complex normal distribution deals with the complex normal vectors. A complex vector X ∈ Cᵏ is said to be
normal if both its real and imaginary components jointly possess a 2k-dimensional multivariate normal
distribution. The variance-covariance structure of X is described by two matrices: the variance matrix Γ, and the
relation matrix C.
• Matrix normal distribution describes the case of normally distributed matrices.
• Gaussian processes are the normally distributed stochastic processes. These can be viewed as elements of some
infinite-dimensional Hilbert space H, and thus are the analogues of multivariate normal vectors for the case k = ∞.
A random element h ∈ H is said to be normal if for any constant a ∈ H the scalar product (a, h) has a (univariate)
normal distribution. The variance structure of such Gaussian random element can be described in terms of the
linear covariance operator K: H → H. Several Gaussian processes became popular enough to have their own
names:
• Brownian motion,
• Brownian bridge,
• Ornstein-Uhlenbeck process.
• Gaussian q-distribution is an abstract mathematical construction which represents a  “q-analogue” of the normal
distribution.
One of the main practical uses of the Gaussian law is to model the empirical distributions of many different random
variables encountered in practice. In such case a possible extension would be a richer family of distributions, having
more than two parameters and therefore being able to fit the empirical distribution more accurately. The examples of
such extensions are:
• Pearson distribution — a four-parametric family of probability distributions that extend the normal law to include
different skewness and kurtosis values.

Normality tests
Normality tests assess the likelihood that the given data set {x1, …, xn} comes from a normal distribution. Typically
the null hypothesis H0 is that the observations are distributed normally with unspecified mean μ and variance σ2,
versus the alternative Ha that the distribution is arbitrary. A great number of tests (over 40) have been devised for
this problem, the more prominent of them are outlined below:
• “Visual” tests are more intuitively appealing but subjective at the same time, as they rely on informal human
judgement to accept or reject the null hypothesis.
• Q-Q plot: a plot of the sorted values from the data set against the expected values of the corresponding
quantiles from the standard normal distribution. That is, it is a plot of points of the form (Φ⁻¹(p_k), x_(k)), where
the plotting points p_k are equal to p_k = (k − α)/(n + 1 − 2α) and α is an adjustment constant which can be anything
between 0 and 1. If the null hypothesis is true, the plotted points should approximately lie on a straight line.
• P-P plot: similar to the Q-Q plot, but used much less frequently. This method consists of plotting the points
(Φ(z_(k)), p_k), where z_(k) = (x_(k) − μ̂)/σ̂. For normally distributed data this plot should lie on a 45° line between
(0,0) and (1,1).
• Shapiro–Wilk test (also written Wilk–Shapiro) employs the fact that the line in the Q-Q plot has the slope of σ. The test compares the least
squares estimate of that slope with the value of the sample variance, and rejects the null hypothesis if these two
quantities differ significantly.
• Normal probability plot (rankit plot)
• Moment tests:
• D’Agostino’s K-squared test
• Jarque–Bera test
• Empirical distribution function tests:
• Kolmogorov–Smirnov test
• Lilliefors test
• Anderson–Darling test
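The Q-Q plot construction described above can be sketched in a few lines. This is only an illustration, assuming the plotting positions p_k = (k − α)/(n + 1 − 2α) from the text; the function name `qq_points` and the default α = 0.375 are choices made for this example, not something the article prescribes. Note that α must stay below 1, otherwise p_1 = 0 and the standard normal quantile is undefined.

```python
from statistics import NormalDist


def qq_points(data, alpha=0.375):
    """Return the Q-Q plot points (Phi^{-1}(p_k), x_(k)) for a sample.

    alpha is the adjustment constant of the plotting positions
    p_k = (k - alpha) / (n + 1 - 2*alpha); 0.375 is an assumed default.
    """
    n = len(data)
    ordered = sorted(data)            # order statistics x_(1) <= ... <= x_(n)
    std_normal = NormalDist()         # standard normal, mu = 0, sigma = 1
    points = []
    for k, x_k in enumerate(ordered, start=1):
        p_k = (k - alpha) / (n + 1 - 2 * alpha)
        points.append((std_normal.inv_cdf(p_k), x_k))
    return points


# Hypothetical sample; under normality the points lie close to a straight line.
points = qq_points([4.1, 5.0, 5.9, 3.2, 6.8])
```

Plotting the second coordinate against the first (with any plotting tool) gives the Q-Q plot; the slope of the fitted line is a rough estimate of σ and the intercept a rough estimate of μ.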

Estimation of parameters
It is often the case that we do not know the parameters of the normal distribution, but instead want to estimate them.
That is, having a sample (x1, …, xn) from a normal N(μ, σ²) population we would like to learn the approximate
values of the parameters μ and σ². The standard approach to this problem is the maximum likelihood method, which
requires maximization of the log-likelihood function:
$$\ln\mathcal{L}(\mu,\sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2.$$
Taking derivatives with respect to μ and σ² and solving the resulting system of first order conditions yields the
maximum likelihood estimates:
$$\hat\mu = \bar x \equiv \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2.$$

The estimator μ̂ is called the sample mean, since it is the arithmetic mean of all observations. The statistic x̄ is complete
and sufficient for μ, and therefore by the Lehmann–Scheffé theorem, μ̂ is the uniformly minimum variance unbiased
(UMVU) estimator. In finite samples it is distributed normally:
$$\hat\mu \sim \mathcal{N}(\mu, \sigma^2/n).$$
The variance of this estimator is equal to the μμ-element of the inverse Fisher information matrix I⁻¹. This implies
that the estimator is finite-sample efficient. Of practical importance is the fact that the standard error of μ̂ is
proportional to 1/√n, that is, if one wishes to decrease the standard error by a factor of 10, one must increase the
number of points in the sample by a factor of 100. This fact is widely used in determining sample sizes for opinion
polls and the number of trials in Monte Carlo simulations.
From the standpoint of the asymptotic theory, μ̂ is consistent, that is, it converges in probability to μ as n → ∞. The
estimator is also asymptotically normal, which is a simple corollary of the fact that it is normal in finite samples:
$$\sqrt{n}\,(\hat\mu - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2).$$
The estimator σ̂² is called the sample variance, since it is the variance of the sample (x1, …, xn). In practice, another
estimator is often used instead of σ̂². This other estimator is denoted s², and is also called the sample variance,
which represents a certain ambiguity in terminology; its square root s is called the sample standard deviation. The
estimator s² differs from σ̂² by having (n − 1) instead of n in the denominator (the so-called Bessel’s correction):
$$s^2 = \frac{n}{n-1}\,\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2.$$
The difference between s² and σ̂² becomes negligibly small for large n. In finite samples however, the motivation
behind the use of s² is that it is an unbiased estimator of the underlying parameter σ², whereas σ̂² is biased. Also, by
the Lehmann–Scheffé theorem the estimator s² is uniformly minimum variance unbiased (UMVU), which makes it
the “best” estimator among all unbiased ones. However it can be shown that the biased estimator σ̂² is “better” than
s² in terms of the mean squared error (MSE) criterion. In finite samples both s² and σ̂² have a scaled chi-squared
distribution with (n − 1) degrees of freedom:
$$s^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}, \qquad \hat\sigma^2 \sim \frac{\sigma^2}{n}\,\chi^2_{n-1}.$$
The first of these expressions shows that the variance of s² is equal to 2σ⁴/(n−1), which is slightly greater than the
σσ-element of the inverse Fisher information matrix I⁻¹. Thus, s² is not an efficient estimator for σ², and moreover,
since s² is UMVU, we can conclude that a finite-sample efficient estimator for σ² does not exist.
Applying the asymptotic theory, both estimators s² and σ̂² are consistent, that is they converge in probability to σ² as
the sample size n → ∞. The two estimators are also both asymptotically normal:
$$\sqrt{n}\,(\hat\sigma^2 - \sigma^2) \simeq \sqrt{n}\,(s^2 - \sigma^2) \xrightarrow{d} \mathcal{N}(0, 2\sigma^4).$$
In particular, both estimators are asymptotically efficient for σ2.
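As a small numerical illustration of the three estimators discussed above (the sample mean, the maximum likelihood variance estimate with n in the denominator, and the Bessel-corrected s² with n − 1), here is a minimal sketch. The function name `normal_estimates` and the sample values are made up for this example.

```python
import math


def normal_estimates(xs):
    """Return (mu_hat, sigma2_hat, s2) for a sample xs.

    mu_hat     : sample mean (maximum likelihood estimate of mu)
    sigma2_hat : maximum likelihood estimate of sigma^2 (denominator n, biased)
    s2         : Bessel-corrected sample variance (denominator n - 1, unbiased)
    """
    n = len(xs)
    mu_hat = sum(xs) / n
    sum_sq = sum((x - mu_hat) ** 2 for x in xs)   # sum of squared deviations
    return mu_hat, sum_sq / n, sum_sq / (n - 1)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical data
mu_hat, sigma2_hat, s2 = normal_estimates(sample)

# Estimated standard error of the sample mean, s / sqrt(n).
std_error = math.sqrt(s2 / len(sample))
```

For this sample the sum of squared deviations is 32, so the two variance estimates are 32/8 and 32/7; the ratio s²/σ̂² = n/(n − 1) shrinks toward 1 as n grows.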


By Cochran’s theorem, for the normal distribution the sample mean μ̂ and the sample variance s² are independent,
which means there can be no gain in considering their joint distribution. There is also a reverse theorem: if in a
sample the sample mean and sample variance are independent, then the sample must have come from the normal
distribution. The independence between μ̂ and s can be employed to construct the so-called t-statistic:
$$t = \frac{\hat\mu - \mu}{s/\sqrt{n}} = \frac{\bar x - \mu}{\sqrt{\frac{1}{n(n-1)}\sum_{i=1}^n (x_i - \bar x)^2}} \sim t_{n-1}.$$
This quantity t has the Student’s t-distribution with (n − 1) degrees of freedom, and it is a pivotal quantity
(its distribution does not depend on the values of the parameters). Inverting the distribution of this t-statistic allows us to construct
the confidence interval for μ; similarly, inverting the χ² distribution of the statistic s² gives the confidence
interval for σ²:
$$\mu \in \left[\hat\mu + t_{n-1,\alpha/2}\,\frac{s}{\sqrt n},\ \ \hat\mu + t_{n-1,1-\alpha/2}\,\frac{s}{\sqrt n}\right] \approx \left[\hat\mu - |z_{\alpha/2}|\,\frac{s}{\sqrt n},\ \ \hat\mu + |z_{\alpha/2}|\,\frac{s}{\sqrt n}\right],$$
$$\sigma^2 \in \left[\frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}},\ \ \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}}\right] \approx \left[s^2 - |z_{\alpha/2}|\,\frac{\sqrt 2}{\sqrt n}\,s^2,\ \ s^2 + |z_{\alpha/2}|\,\frac{\sqrt 2}{\sqrt n}\,s^2\right],$$
where t_{k,p} and χ²_{k,p} are the pth quantiles of the t- and χ²-distributions respectively. These confidence intervals are of
the level 1 − α, meaning that the true values μ and σ² fall outside of these intervals with probability α. In practice
people usually take α = 5%, resulting in the 95% confidence intervals. The approximate formulas in the display
above were derived from the asymptotic distributions of μ̂ and s². The approximate formulas become valid for large
values of n, and are more convenient for manual calculation since the standard normal quantiles z_{α/2} do not
depend on n. In particular, the most popular value of α = 5% results in |z_{0.025}| = 1.96.
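The large-sample (approximate) intervals can be computed with the Python standard library alone. The sketch below assumes the approximations μ̂ ± |z_{α/2}|·s/√n for the mean and s² ± |z_{α/2}|·√(2/n)·s² for the variance; it does not compute the exact intervals, which would need Student’s t and chi-square quantiles that the standard library does not provide. The function name `approx_confidence_intervals` is hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def approx_confidence_intervals(xs, alpha=0.05):
    """Approximate (large-n) confidence intervals for mu and sigma^2.

    Returns ((mu_low, mu_high), (var_low, var_high)) at level 1 - alpha.
    """
    n = len(xs)
    mu_hat = mean(xs)
    s2 = variance(xs)                           # sample variance, denominator n - 1
    z = abs(NormalDist().inv_cdf(alpha / 2))    # |z_{alpha/2}|, about 1.96 for alpha = 0.05
    half_mu = z * math.sqrt(s2 / n)             # z * s / sqrt(n)
    half_var = z * math.sqrt(2.0 / n) * s2      # z * sqrt(2/n) * s^2
    return (mu_hat - half_mu, mu_hat + half_mu), (s2 - half_var, s2 + half_var)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical data
mu_interval, var_interval = approx_confidence_intervals(sample)
```

With a sample this small the approximation is crude (the variance interval can even extend below zero); it is shown only to make the formulas concrete. For small n the exact t- and χ²-based intervals should be preferred.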

Occurrence
The occurrence of normal distribution in practical problems can be loosely classified into three categories:
1. Exactly normal distributions;
2. Approximately normal laws, for example when such approximation is justified by the central limit theorem; and
3. Distributions modeled as normal — the normal distribution being one of the simplest and most convenient to use,
frequently researchers are tempted to assume that certain quantity is distributed normally, without justifying such
assumption rigorously. In fact, the maturity of a scientific field can be judged by the prevalence of the normality
assumption in its methods.

Exact normality
Certain quantities in physics are distributed normally, as was first
demonstrated by James Clerk Maxwell. Examples of such quantities
are:
• Velocities of the molecules in the ideal gas. More generally,
velocities of the particles in any system in thermodynamic
equilibrium will have normal distribution, due to the maximum
entropy principle.
• Probability density function of a ground state in a quantum harmonic oscillator.
• The density of an electron cloud in 1s state.
[Figure: The ground state of a quantum harmonic oscillator has the Gaussian distribution.]

• The position of a particle which experiences diffusion. If initially the particle is located at a specific point (that is,
its probability distribution is a Dirac delta function), then after time t its location is described by a normal
distribution with variance t, which satisfies the diffusion equation $\frac{\partial}{\partial t} f(x,t) = \frac{1}{2} \frac{\partial^2}{\partial x^2} f(x,t)$. If the initial location is
given by a certain density function g(x), then the density at time t is the convolution of g and the normal pdf.

Approximate normality
Approximately normal distributions occur in many situations, as explained by the central limit theorem. When the
outcome is produced by a large number of small effects acting additively and independently, its distribution will be
close to normal. The normal approximation will not be valid if the effects act multiplicatively (instead of additively),
or if there is a single external influence which has a considerably larger magnitude than the rest of the effects.
• In counting problems, where the central limit theorem includes a discrete-to-continuum approximation and where
infinitely divisible and decomposable distributions are involved, such as
• Binomial random variables, associated with binary response variables;
• Poisson random variables, associated with rare events;
• Thermal light has a Bose–Einstein distribution on very short time scales, and a normal distribution on longer
timescales due to the central limit theorem.

Assumed normality


I can only recognize the occurrence of the normal curve — the Laplacian curve of errors — as a very abnormal phenomenon. It is roughly
approximated to in certain distributions; for this reason, and on account for its beautiful simplicity, we may, perhaps, use it as a first
approximation, particularly in theoretical investigations. (Pearson, 1901)
There are statistical methods to empirically test that assumption; see the Normality tests section.
• In biology:
• The logarithm of measures of size of living tissue (length, height, skin area, weight);[14]
• The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth;
presumably the thickness of tree bark also falls under this category;
• Certain physiological measurements, such as blood pressure of adult humans (after separation on male/female
subpopulations).
• In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices, and
stock market indices are assumed normal (these variables behave like compound interest, not like simple interest,
and so are multiplicative). Some mathematicians such as Benoît Mandelbrot argue that log-Lévy distributions,
which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market
crashes.
• Measurement errors in physical experiments are often assumed to be normally distributed. This assumption
allows for particularly simple practical rules for how to combine errors in measurements of different quantities.
However, whether this assumption is valid or not in practice is debatable. A famous remark of Lippmann says: 
“Everyone believes in the [normal] law of errors: the mathematicians, because they think it is an experimental
fact; and the experimenters, because they suppose it is a theorem of mathematics.” [15]
• In standardized testing, results can be made to have a normal distribution. This is done by either selecting the
number and difficulty of questions (as in the IQ test), or by transforming the raw test scores into  “output” scores
by fitting them to the normal distribution. For example, the SAT’s traditional range of 200–800 is based on a
normal distribution with a mean of 500 and a standard deviation of 100.
• Many scores are derived from the normal distribution, including percentile ranks (“percentiles” or “quantiles”),
normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a number of behavioral statistical
procedures are based on the assumption that scores are normally distributed; for example, t-tests and ANOVAs.
Bell curve grading assigns relative grades based on a normal distribution of scores.

Generating values from normal distribution


For computer simulations, especially in applications of the Monte Carlo method, it is often useful to generate
values that have a normal distribution. All algorithms described here are concerned with generating the
standard normal, since a N(μ, σ²) can be generated as X = μ + σZ, where Z is standard normal. The algorithms
rely on the availability of a random number generator capable of producing random values distributed
uniformly.

[Figure: The bean machine, a device invented by Sir Francis Galton, can be called the first generator of normal
random variables. This machine consists of a vertical board with interleaved rows of pins. Small balls are dropped
from the top and then bounce randomly left or right as they hit the pins. The balls are collected into bins at the
bottom and settle down into a pattern resembling the Gaussian curve.]

• The most straightforward method is based on the probability integral transform property: if U is
distributed uniformly on (0,1), then Φ⁻¹(U) will have the standard normal distribution. The drawback
of this method is that it relies on calculation of the probit function Φ⁻¹, which cannot be done
analytically. Some approximate methods are described in Hart (1968) and in the erf article.
• A simple approximate approach that is easy to
program is as follows: simply sum 12 uniform (0,1) deviates and subtract 6 — the resulting random variable will
have approximately standard normal distribution. In truth, the distribution will be Irwin–Hall, which is a
12-section eleventh-order polynomial approximation to the normal distribution. This random deviate will have a
limited range of (−6, 6).[16]
• The Box–Muller method uses two independent random numbers U and V distributed uniformly on (0,1]. Then
the two random variables X and Y
$$X = \sqrt{-2\ln U}\,\cos(2\pi V), \qquad Y = \sqrt{-2\ln U}\,\sin(2\pi V)$$
will both have the standard normal distribution, and be independent. This formulation arises because for a
bivariate normal random vector (X Y) the squared norm X² + Y² will have the chi-square distribution with two
degrees of freedom, which is an easily generated exponential random variable corresponding to the quantity
−2ln(U) in these equations; and the angle is distributed uniformly around the circle, chosen by the random
variable V.
• Marsaglia polar method is a modification of the Box–Muller method, which does not require
computation of the functions sin() and cos(). In this method U and V are drawn from the uniform (−1,1)
distribution, and then S = U² + V² is computed. If S is greater than or equal to one then the method starts over,
otherwise the two quantities
$$X = U\sqrt{\frac{-2\ln S}{S}}, \qquad Y = V\sqrt{\frac{-2\ln S}{S}}$$
are returned. Again, X and Y here will be independent and standard normally distributed.
• Ratio method[17] starts with generating two independent uniform deviates U and V. The algorithm proceeds as
follows:
• Compute X = √(8/e) (V − 0.5)/U;
• If X² ≤ 5 − 4e^(1/4)U then accept X and terminate algorithm;
• If X² ≥ 4e^(−1.35)/U + 1.4 then reject X and start over from step 1;
• If X² ≤ −4 ln U then accept X, otherwise start over the algorithm.
• The ziggurat algorithm (Marsaglia & Tsang 2000) is faster than the Box–Muller transform and still exact. In
about 97% of all cases it uses only two random numbers, one random integer and one random uniform, one
multiplication and an if-test. Only in 3% of the cases where the combination of those two falls outside the  “core
of the ziggurat” a kind of rejection sampling using logarithms, exponentials and more uniform random numbers
has to be employed.
• There is also some investigation into the connection between the fast Hadamard transform and the normal
distribution, since the transform employs just addition and subtraction and by the central limit theorem random
numbers from almost any distribution will be transformed into the normal distribution. In this regard a series of
Hadamard transforms can be combined with random permutations to turn arbitrary data sets into normally
distributed data.
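Three of the generators in the list above (Box–Muller, the Marsaglia polar method and the ratio method) are short enough to sketch directly. The code below is an illustration under stated assumptions rather than a reference implementation: the function names are invented for this example, uniform deviates come from Python's `random.Random`, and the final acceptance test of the ratio method is taken to be X² ≤ −4 ln U. Each function returns standard normal values; a N(μ, σ²) value is obtained as μ + σZ.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard normal values (Box-Muller transform)."""
    u = 1.0 - rng.random()                 # uniform on (0, 1], so log(u) is finite
    v = rng.random()
    r = math.sqrt(-2.0 * math.log(u))      # radius: sqrt of a chi-square(2) variable
    return r * math.cos(2.0 * math.pi * v), r * math.sin(2.0 * math.pi * v)


def marsaglia_polar(rng):
    """Return two independent standard normal values without sin/cos."""
    while True:
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s = u * u + v * v
        if 0.0 < s < 1.0:                  # reject points outside the unit disc (and s == 0)
            f = math.sqrt(-2.0 * math.log(s) / s)
            return u * f, v * f


def ratio_method(rng):
    """Return one standard normal value by the ratio-of-uniforms method."""
    c = math.sqrt(8.0 / math.e)
    while True:
        u = 1.0 - rng.random()             # uniform on (0, 1]
        v = rng.random()
        x = c * (v - 0.5) / u
        x2 = x * x
        if x2 <= 5.0 - 4.0 * math.exp(0.25) * u:
            return x                       # quick acceptance
        if x2 >= 4.0 * math.exp(-1.35) / u + 1.4:
            continue                       # quick rejection
        if x2 <= -4.0 * math.log(u):
            return x                       # exact acceptance test


rng = random.Random(2010)                  # seeded only to make the run repeatable
z1, z2 = box_muller(rng)
x = 100.0 + 15.0 * z1                      # a N(100, 15^2) value from a standard normal
```

In the polar method roughly 21% of the candidate pairs are rejected (the unit disc covers π/4 of the square), which is the price paid for avoiding the trigonometric calls.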

Numerical approximations for the normal cdf


The standard normal cdf is widely used in scientific and statistical computing. The values Φ(x) may be approximated
very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and
continued fractions. Different approximations are used depending on the desired level of accuracy.
• Abramowitz & Stegun (1964) give the approximation for Φ(x) for x > 0 with the absolute error |ε(x)| < 7.5·10⁻⁸
(algorithm 26.2.17 [18]):
$$\Phi(x) = 1 - \phi(x)\left(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5\right) + \varepsilon(x), \qquad t = \frac{1}{1 + b_0 x},$$
where ϕ(x) is the standard normal pdf, and b0 = 0.2316419, b1 = 0.319381530, b2 =
−0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429.
• Hart (1968) lists almost a hundred rational function approximations for the erfc() function. His algorithms
vary in the degree of complexity and the resulting precision, with maximum absolute precision of 24 digits. An
algorithm by West (2009) combines Hart’s algorithm 5666 with a continued fraction approximation in the tail to
provide a fast computation algorithm with a 16-digit precision.
• Marsaglia (2004) suggested a simple algorithm[19] based on the Taylor series expansion for calculating Φ(x) with
arbitrary precision. The drawback of this algorithm is comparatively slow calculation time (for example it takes
over 300 iterations to calculate the function with 16 digits of precision when x = 10).
• The GNU Scientific Library calculates values of the standard normal cdf using Hart’s algorithms and
approximations with Chebyshev polynomials.
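The Abramowitz & Stegun approximation listed above, Φ(x) ≈ 1 − ϕ(x)(b1·t + b2·t² + b3·t³ + b4·t⁴ + b5·t⁵) with t = 1/(1 + b0·x), translates into a few lines of code. The sketch below uses the coefficients quoted in the text; the function names are illustrative, the extension to negative x through Φ(−x) = 1 − Φ(x) is an added convenience, and the comparison function based on `math.erf` is included only as a cross-check.

```python
import math

B0 = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)


def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf_as(x):
    """Approximate Phi(x) with Abramowitz & Stegun formula 26.2.17."""
    if x < 0.0:
        return 1.0 - normal_cdf_as(-x)     # symmetry of the normal cdf
    t = 1.0 / (1.0 + B0 * x)
    poly = 0.0
    for b in reversed(B):                  # Horner scheme for b1*t + ... + b5*t^5
        poly = (poly + b) * t
    return 1.0 - normal_pdf(x) * poly


def normal_cdf_erf(x):
    """Reference value of Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Largest discrepancy over a few hypothetical test points.
max_error = max(abs(normal_cdf_as(x) - normal_cdf_erf(x))
                for x in (0.0, 0.5, 1.0, 1.96, 3.0, 5.0))
```

The polynomial form is cheap to evaluate, which made it attractive for hand calculators and early software; where `math.erf` or a similar library routine is available, that routine is generally the more accurate choice.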

History
Some authors[20] [21] attribute at least partially the credit for the discovery of the normal distribution to de Moivre,
who in 1738 published in the second edition of his “The Doctrine of Chances”[22] [23] the study of the coefficients in
the binomial expansion of (a + b)ⁿ. De Moivre proved that the middle term in this expansion has the approximate
magnitude of 2/√(2πn), and that “If m or ½n be a Quantity infinitely great, then the Logarithm of the Ratio, which a
Term diſtant from the middle by the Interval ℓ, has to the middle Term, is −2ℓℓ/n.” Although this theorem can be
interpreted as the first obscure expression for the normal probability law, Stigler points out that de Moivre himself
did not interpret his results as anything more than the approximate rule for the binomial coefficients, and in
particular de Moivre lacked the concept of the probability density function.[24]

In 1809 Gauss published the monograph “Theoria motus corporum coelestium in sectionibus conicis solem ambientium” where among
other things he introduces and describes several important statistical
concepts, such as the method of least squares, the method of maximum
likelihood, and the normal distribution. Gauss used M, M′, M′′, … to
denote the measurements of some unknown quantity V, and sought the
“most probable” estimator: the one which maximizes the probability
φ(M−V) · φ(M′−V) · φ(M′′−V) · … of obtaining the observed
experimental results. In his notation φΔ is the probability law of the
measurement errors of magnitude Δ. Not knowing what the function φ
is, Gauss requires that his method should reduce to the well-known
answer: the arithmetic mean of the measured values.[25] Starting from
these principles, Gauss demonstrates that the only law which
rationalizes the choice of arithmetic mean as an estimator of the
location parameter, is the normal law of errors:[26]
$$\varphi\Delta = \frac{h}{\sqrt{\pi}}\, e^{-hh\Delta\Delta},$$
where h is “the measure of the precision of the observations”. Using this normal law as a generic model for errors in
the experiments, Gauss formulates what is now known as the non-linear weighted least squares (NWLS) method.[27]

[Figure: Carl Friedrich Gauss invented the normal distribution in 1809 as a way to rationalize the method of least squares.]

Although Gauss was the first to suggest the normal distribution law,
the merit of the contributions of Laplace should not be underestimated.[28]
It was Laplace who first posed the problem of aggregating several
observations in 1774,[29] although his own solution led to the Laplacian
distribution. It was Laplace who first calculated the value of the
integral ∫ e^(−t²) dt = √π in 1782, providing the normalization constant
for the normal distribution.[30] Finally, it was Laplace who in 1810
proved and presented to the Academy the fundamental central limit
theorem, which emphasized the theoretical importance of the normal
distribution.[31]

It is of interest to note that in 1809 an American mathematician Adrain
published two derivations of the normal probability law,
simultaneously and independently from Gauss.[32] His works remained
unnoticed until 1871 when they were rediscovered by Abbe,[33] mainly
because the scientific community was virtually non-existent in the
United States at that time.

[Figure: Marquis de Laplace proved the central limit theorem in 1810, consolidating the importance of the normal distribution in statistics.]

In the middle of the 19th century Maxwell demonstrated that the
normal distribution is not just a convenient mathematical tool, but may
also occur in natural phenomena:[34] “The number of particles whose velocity, resolved in a certain direction, lies
between x and x + dx is $N\,\frac{1}{\alpha\sqrt{\pi}}\, e^{-x^2/\alpha^2}\, dx$.”

Since its introduction, the normal distribution has been known by many different names: the law of error, the law of
facility of errors, Laplace’s second law, Gaussian law, etc. By the end of the 19th century some authors[35] had started to
occasionally use the name normal distribution, where the word “normal” is used as an adjective; the term was
derived from the fact that this distribution was seen as typical, common, normal. Around the turn of the 20th century
Pearson popularized the term normal as a designation for this distribution.[36]


Many years ago I called the Laplace–Gaussian curve the normal curve, which name, while it avoids an international question of priority, has
the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another ‘abnormal.’ (Pearson, 1920)
Also, it was Pearson who first wrote the distribution in terms of the standard deviation σ as in modern notation. Soon
after this, in 1915, Fisher added the location parameter to the formula for the normal distribution, expressing it in
the way it is written nowadays:
$$df = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx.$$
The term “standard normal”, which denotes the normal distribution with zero mean and unit variance, came into
general use around the 1950s, appearing in the popular textbooks by P.G. Hoel (1947) “Introduction to mathematical
statistics” and A.M. Mood (1950) “Introduction to the theory of statistics”.[37]

See also
• Behrens–Fisher problem — the long-standing problem of testing whether two normal samples with different
variances have same means;
• Erdős-Kac theorem — on the occurrence of the normal distribution in number theory
• Gaussian blur — convolution which uses the normal distribution as a kernel

Notes
[1] The designation  “bell curve” is ambiguous: there are many other distributions in probability theory which can be recognized as  “bell-shaped”:
the Cauchy distribution, Student’s t-distribution, generalized normal, logistic, etc.
[2] Gale Encyclopedia of Psychology: Normal Distribution (http://findarticles.com/p/articles/mi_g2699/is_0002/ai_2699000241)
[3] Halperin et al. (1965, item 7)
[4] McPherson (1990) page 110
[5] Bernardo & Smith (2000)
[6] Patel & Read (1996, [2.1.8])
[7] Scott, Clayton; Robert Nowak (August 7, 2003). "The Q-function" (http://cnx.org/content/m11537/1.2/). Connexions.
[8] Barak, Ohad (April 6, 2006). "Q function and error function" (http://www.eng.tau.ac.il/~jo/academic/Q.pdf). Tel Aviv University.
[9] Weisstein, Eric W., "Normal Distribution Function" (http://mathworld.wolfram.com/NormalDistributionFunction.html) from
MathWorld.
[10] Sanders, Mathijs A. "Characteristic function of the univariate normal distribution" (http://www.planetmathematics.com/CharNormal.pdf).
Retrieved 2009-03-06.
[11] http://www.allisons.org/ll/MML/KL/Normal/
[12] Amari & Nagaoka (2000)
[13] Mathworld entry for Normal Product Distribution (http://mathworld.wolfram.com/NormalProductDistribution.html)
[14] Huxley (1932)
[15] Whittaker, E. T.; Robinson, G. (1967). The Calculus of Observations: A Treatise on Numerical Mathematics. New York: Dover. p. 179.
[16] Johnson et al. (1995, Equation (26.48))
[17] Kinderman & Monahan (1976)
[18] http://www.math.sfu.ca/~cbm/aands/page_932.htm
[19] For example, this algorithm is given in the article Bc programming language.
[20] Johnson et al. (1994, page 85)
[21] Le Cam (2000, p. 74)
[22] De Moivre (1738)
[23] De Moivre first published his findings in 1733, in a pamphlet “Approximatio ad Summam Terminorum Binomii (a + b)ⁿ in Seriem Expansi”
that was designated for private circulation only. But it was not until the year 1738 that he made his results publicly available. The original
pamphlet was reprinted several times, see for example Helen M. Walker (1985).
[24] Stigler (1986, p. 76)
[25] “It has been customary certainly to regard as an axiom the hypothesis that if any quantity has been determined by several direct observations,
made under the same circumstances and with equal care, the arithmetical mean of the observed values affords the most probable value, if not
rigorously, yet very nearly at least, so that it is always most safe to adhere to it.” — Gauss (1809, section 177)
[26] Gauss (1809, section 177)
[27] Gauss (1809, section 179)
[28] “My custom of terming the curve the Gauss–Laplacian or normal curve saves us from proportioning the merit of discovery between the two
great astronomer mathematicians.” quote from Pearson (1905, p. 189)
[29] Laplace (1774, Problem III)
[30] Pearson (1905, p. 189)
[31] Stigler (1986, p. 144)
[32] Stigler (1978, p. 243)
[33] Stigler (1978, p. 244)
[34] Maxwell (1860), p. 23
[35] Such use is encountered in the works of Peirce, Galton and Lexis approximately around 1875.
[36] Kruskal & Stigler (1997)
[37] "Earliest uses… (entry STANDARD NORMAL CURVE)" (http://jeff560.tripod.com/s.html).

Literature
• Aldrich, John; Miller, Jeff. "Earliest uses of symbols in probability and statistics" (http://jeff560.tripod.com/stat.html).
• Aldrich, John; Miller, Jeff. "Earliest known uses of some of the words of mathematics" (http://jeff560.tripod.com/mathword.html).
In particular, the entries for “bell-shaped and bell curve” (http://jeff560.tripod.com/b.html),
“normal (distribution)” (http://jeff560.tripod.com/n.html), “Gaussian” (http://jeff560.tripod.com/g.html),
and “Error, law of error, theory of errors, etc.” (http://jeff560.tripod.com/e.html).
• Amari, Shun-ichi; Nagaoka, Hiroshi (2000). Methods of information geometry. Oxford University Press.
ISBN 0-8218-0531-2.
• Bernardo, J. M.; Smith, A.F.M. (2000). Bayesian Theory. Wiley. ISBN 0-471-49464-X.
• de Moivre, Abraham (1738). The Doctrine of Chances. ISBN 0821821032.
• Gavss, Carolo Friderico (1809) (in Latin). Theoria motvs corporvm coelestivm in sectionibvs conicis Solem
ambientivm [Theory of the motion of the heavenly bodies moving about the Sun in conic sections]. English
translation (http://books.google.com/books?id=1TIAAAAAQAAJ).
• Gould, Stephen Jay (1981). The mismeasure of man (first ed.). W.W. Norton. ISBN 0-393-01489-4.
• Halperin, Max; Hartley, H. O.; Hoel, P. G. (1965). "Recommended standards for statistical symbols and notation.
COPSS committee on symbols and notation" (http://jstor.org/stable/2681417). The American Statistician 19
(3): 12–14. doi:10.2307/2681417.
• Hart, John F.; et al (1968). Computer approximations. New York: John Wiley & Sons, Inc. ISBN 0882756427.
• Herrnstein, C.; Murray (1994). The bell curve: intelligence and class structure in American life. Free Press.
ISBN 0-02-914673-9.
• Huxley, Julian S. (1932). Problems of relative growth. London. ISBN 0486611140. OCLC 476909537.
• Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1994). Continuous univariate distributions, Volume 1. Wiley.
ISBN 0-471-58495-9.
• Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1995). Continuous univariate distributions, Volume 2. Wiley.
ISBN 0-471-58494-0.
• Kruskal, William H.; Stigler, Stephen M. (1997). Normative terminology: ‘normal’ in statistics and elsewhere.
Statistics and public policy, edited by Bruce D. Spencer. Oxford University Press. ISBN 0-19-852341-6.
• la Place, M. de (1774). "Mémoire sur la probabilité des causes par les évènemens". Mémoires de Mathématique et
de Physique, Presentés à l’Académie Royale des Sciences, par divers Savans & lûs dans ses Assemblées, Tome
Sixième: 621–656. Translated by S.M.Stigler in Statistical Science 1 (3), 1986: JSTOR 2245476.
• Laplace, Pierre-Simon (1812). Analytical theory of probabilities.
• McPherson, G. (1990). Statistics in scientific investigation: its basis, application and interpretation.
Springer-Verlag. ISBN 0-387-97137-8.
• Marsaglia, George; Tsang, Wai Wan (2000). "The ziggurat method for generating random variables"
(http://www.jstatsoft.org/v05/i08/paper). Journal of Statistical Software 5 (8).
• Marsaglia, George (2004). "Evaluating the normal distribution" (http://www.jstatsoft.org/v11/i05/paper).
Journal of Statistical Software 11 (4).
• Maxwell, James Clerk (1860). "V. Illustrations of the dynamical theory of gases. — Part I: On the motions and
collisions of perfectly elastic spheres". Philosophical Magazine, series 4 19 (124): 19–32.
doi:10.1080/14786446008642818 (inactive 2010-09-14).
• Patel, Jagdish K.; Read, Campbell B. (1996). Handbook of the normal distribution. ISBN 0824715411.
• Pearson, Karl (1905). "‘Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und Pearson’. A
rejoinder". Biometrika 4: 169–212. JSTOR 2331536.
• Pearson, Karl (1920). "Notes on the history of correlation". Biometrika 13 (1): 25–45.
doi:10.1093/biomet/13.1.25. JSTOR 2331722.
• Stigler, Stephen M. (1978). "Mathematical statistics in the early states". The Annals of Statistics 6 (2): 239–265.
doi:10.1214/aos/1176344123. JSTOR 2958876.
• Stigler, Stephen M. (1982). "A modest proposal: a new standard for the normal". The American Statistician 36
(2). JSTOR 2684031.
• Stigler, Stephen M. (1986). The history of statistics: the measurement of uncertainty before 1900. Harvard
University Press. ISBN 0-674-40340-1.
• Stigler, Stephen M. (1999). Statistics on the table. Harvard University Press. ISBN 0674836014.
• Walker, Helen M. (editor) (1985) "De Moivre on the law of normal probability" in: Smith, David Eugene (1985),
A Source Book in Mathematics, Dover. ISBN 0486646904 pages 566–575. (online pdf) (http://www.york.ac.
uk/depts/maths/histstat/demoivre.pdf)
• Weisstein, Eric W. "Normal distribution" (http://mathworld.wolfram.com/NormalDistribution.html).
MathWorld.
• West, Graeme (2009). "Better approximations to cumulative normal functions" (http://www.wilmott.com/pdfs/
090721_west.pdf). Wilmott Magazine: 70–76.
• Zelen, Marvin; Severo, Norman C. (1964). Probability functions (chapter 26) (http://www.math.sfu.ca/~cbm/
aands/page_931.htm). Handbook of mathematical functions with formulas, graphs, and mathematical tables, by
Abramowitz and Stegun: National Bureau of Standards. New York: Dover. ISBN 0-486-61272-4.
Pareto distribution 108

Pareto distribution
Pareto

Probability density function

Pareto probability density functions for various α  with xm = 1. The horizontal axis is the x  parameter. As α → ∞ the
distribution approaches δ(x − xm) where δ is the Dirac delta function.
Cumulative distribution function

Pareto cumulative distribution functions for various α  with xm = 1. The horizontal axis is the x  parameter.
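parameters: $x_\mathrm{m} > 0$ scale (real)
$\alpha > 0$ shape (real)
support: $x \in [x_\mathrm{m}, +\infty)$
pdf: $\dfrac{\alpha\, x_\mathrm{m}^{\alpha}}{x^{\alpha+1}}$ for $x \ge x_\mathrm{m}$
cdf: $1 - \left(\dfrac{x_\mathrm{m}}{x}\right)^{\alpha}$ for $x \ge x_\mathrm{m}$
mean: $\dfrac{\alpha\, x_\mathrm{m}}{\alpha - 1}$ for $\alpha > 1$
median: $x_\mathrm{m} \sqrt[\alpha]{2}$
mode: $x_\mathrm{m}$
variance: $\dfrac{x_\mathrm{m}^{2}\,\alpha}{(\alpha-1)^{2}(\alpha-2)}$ for $\alpha > 2$
skewness: $\dfrac{2(1+\alpha)}{\alpha-3}\sqrt{\dfrac{\alpha-2}{\alpha}}$ for $\alpha > 3$
ex.kurtosis: $\dfrac{6(\alpha^{3}+\alpha^{2}-6\alpha-2)}{\alpha(\alpha-3)(\alpha-4)}$ for $\alpha > 4$
entropy: $\ln\left(\dfrac{x_\mathrm{m}}{\alpha}\right) + \dfrac{1}{\alpha} + 1$
mgf: $\alpha(-x_\mathrm{m}t)^{\alpha}\,\Gamma(-\alpha, -x_\mathrm{m}t)$ for $t < 0$
cf: $\alpha(-i x_\mathrm{m}t)^{\alpha}\,\Gamma(-\alpha, -i x_\mathrm{m}t)$
Fisher information:
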
The Pareto distribution, named after the Italian economist Vilfredo Pareto, is a power law probability distribution
that coincides with social, scientific, geophysical, actuarial, and many other types of observable phenomena. Outside
the field of economics it is at times referred to as the Bradford distribution.
Pareto originally used this distribution to describe the allocation of wealth among individuals since it seemed to
show rather well the way that a larger portion of the wealth of any society is owned by a smaller percentage of the
people in that society. He also used it to describe the distribution of income.[1] This idea is sometimes expressed more
simply as the Pareto principle or the "80-20 rule", which says that 20% of the population controls 80% of the
wealth.[2] The probability density function (PDF) graph on the right shows that the "probability" or fraction of the
population that owns a small amount of wealth per person is rather high, and then decreases steadily as wealth
increases. This distribution is not limited to describing wealth or income; it applies to many situations in which an
equilibrium is found in the distribution of the "small" to the "large". The following examples are sometimes seen as
approximately Pareto-distributed:
• The sizes of human settlements (few cities, many hamlets/villages)
• File size distribution of Internet traffic which uses the TCP protocol (many smaller files, few larger ones)
• Hard disk drive error rates[3]
• Clusters of Bose–Einstein condensate near absolute zero
• The values of oil reserves in oil fields (a few large fields, many small fields)
• The length distribution of jobs assigned to supercomputers (a few large ones, many small ones)
• The standardized price returns on individual stocks
• Sizes of sand particles
• Sizes of meteorites
• Numbers of species per genus (There is subjectivity involved: The tendency to divide a genus into two or more
increases with the number of species in it)
• Areas burnt in forest fires
• Severity of large casualty losses for certain lines of business such as general liability, commercial auto, and
workers compensation.
Properties

Definition
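If X is a random variable with a Pareto distribution, then the probability that X is greater than some number x is given
by
$$\Pr(X > x) = \begin{cases} \left(\dfrac{x_\mathrm{m}}{x}\right)^{\alpha} & x \ge x_\mathrm{m}, \\ 1 & x < x_\mathrm{m}, \end{cases}$$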

where xm is the (necessarily positive) minimum possible value of X, and α is a positive parameter. The family of
Pareto distributions is parameterized by two quantities, xm and α. When this distribution is used to model the
distribution of wealth, then the parameter α is called the Pareto index.
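It follows from the above that the cumulative distribution function of a Pareto random variable with
parameters α and xm is
$$F_X(x) = \begin{cases} 1 - \left(\dfrac{x_\mathrm{m}}{x}\right)^{\alpha} & x \ge x_\mathrm{m}, \\ 0 & x < x_\mathrm{m}. \end{cases}$$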

Density function
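It follows (by differentiation) that the probability density function is
$$f_X(x) = \begin{cases} \dfrac{\alpha\, x_\mathrm{m}^{\alpha}}{x^{\alpha+1}} & x \ge x_\mathrm{m}, \\ 0 & x < x_\mathrm{m}. \end{cases}$$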
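As a minimal sketch, the survival function, CDF and density can be written in Python as below. It assumes the standard Pareto forms Pr(X > x) = (xm/x)^α and ƒ(x) = α xm^α / x^(α+1) for x ≥ xm; the function names are illustrative and do not come from any library. The last lines check numerically that the density is the derivative of the CDF.

```python
def pareto_sf(x, x_m, alpha):
    """Survival function Pr(X > x); equals 1 below the minimum x_m."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


def pareto_cdf(x, x_m, alpha):
    """Cumulative distribution function, the complement of the survival function."""
    return 1.0 - pareto_sf(x, x_m, alpha)


def pareto_pdf(x, x_m, alpha):
    """Density obtained by differentiating the CDF; zero below x_m."""
    return 0.0 if x < x_m else alpha * x_m ** alpha / x ** (alpha + 1)


# Central-difference check that d/dx CDF matches the density at one point.
x, x_m, alpha, h = 2.5, 1.0, 3.0, 1e-6
numeric = (pareto_cdf(x + h, x_m, alpha) - pareto_cdf(x - h, x_m, alpha)) / (2 * h)
print(numeric, pareto_pdf(x, x_m, alpha))
```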

Moments and characteristic function


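• The expected value of a random variable following a Pareto distribution with α > 1 is
$$E(X) = \frac{\alpha\, x_\mathrm{m}}{\alpha - 1}$$
(if α ≤ 1, the expected value does not exist).
• The variance, for α > 2, is
$$\operatorname{Var}(X) = \left(\frac{x_\mathrm{m}}{\alpha-1}\right)^{2} \frac{\alpha}{\alpha-2}$$
(if α ≤ 2, the variance does not exist).
• The raw moments are
$$\mu_n' = \frac{\alpha\, x_\mathrm{m}^{n}}{\alpha - n},$$
but the nth moment exists only for n < α.
• The moment generating function is only defined for non-positive values t ≤ 0 as
$$M(t; \alpha, x_\mathrm{m}) = E\left(e^{tX}\right) = \alpha(-x_\mathrm{m}t)^{\alpha}\,\Gamma(-\alpha, -x_\mathrm{m}t), \qquad M(0; \alpha, x_\mathrm{m}) = 1.$$
• The characteristic function is given by
$$\varphi(t; \alpha, x_\mathrm{m}) = \alpha(-i x_\mathrm{m}t)^{\alpha}\,\Gamma(-\alpha, -i x_\mathrm{m}t),$$
where Γ(a, x) is the incomplete gamma function.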
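The restriction n < α on the raw moments can be seen from a short direct integration. The sketch below assumes the standard Pareto density α xm^α / x^(α+1) on [xm, ∞); the boundary term at infinity vanishes only when n < α.

```latex
\begin{aligned}
E(X^{n}) &= \int_{x_\mathrm{m}}^{\infty} x^{n}\,\frac{\alpha\,x_\mathrm{m}^{\alpha}}{x^{\alpha+1}}\,dx
          = \alpha\,x_\mathrm{m}^{\alpha}\int_{x_\mathrm{m}}^{\infty} x^{\,n-\alpha-1}\,dx \\
         &= \alpha\,x_\mathrm{m}^{\alpha}\left[\frac{x^{\,n-\alpha}}{n-\alpha}\right]_{x_\mathrm{m}}^{\infty}
          = \frac{\alpha\,x_\mathrm{m}^{n}}{\alpha-n}, \qquad n < \alpha .
\end{aligned}
```

Setting n = 1 gives the mean, and combining n = 1 and n = 2 gives the variance.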


Degenerate case
The Dirac delta function is a limiting case of the Pareto density:
$$\lim_{\alpha\to\infty} f(x; \alpha, x_\mathrm{m}) = \delta(x - x_\mathrm{m}).$$

Conditional distributions
The conditional probability distribution of a Pareto-distributed random variable, given the event that it is greater than
or equal to a particular number x1 exceeding xm, is a Pareto distribution with the same Pareto index α but with
minimum x1 instead of xm.
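This follows in one line from the survival function. The sketch below assumes the standard form Pr(X > x) = (xm/x)^α for x ≥ xm, and uses the fact that X is continuous, so Pr(X ≥ x1) = Pr(X > x1).

```latex
\Pr(X > x \mid X \ge x_1)
  = \frac{\Pr(X > x)}{\Pr(X \ge x_1)}
  = \frac{(x_\mathrm{m}/x)^{\alpha}}{(x_\mathrm{m}/x_1)^{\alpha}}
  = \left(\frac{x_1}{x}\right)^{\alpha},
  \qquad x \ge x_1 > x_\mathrm{m}.
```

The right-hand side is the survival function of a Pareto distribution with minimum x1 and the same index α.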

Relation to the exponential distribution


The Pareto distribution is related to the exponential distribution as follows. If X is Pareto-distributed with minimum
xm and index α, then
$$Y = \log\left(\frac{X}{x_\mathrm{m}}\right)$$
is exponentially distributed with intensity α. Equivalently, if Y is exponentially distributed with intensity α, then
$$x_\mathrm{m}\, e^{Y}$$
is Pareto-distributed with minimum xm and index α.
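A small simulation illustrates the second direction of this relation. The sketch assumes the transformation X = xm·exp(Y) with Y exponential of rate α, and compares the empirical tail of X with the standard Pareto tail (xm/x)^α. The helper name, seed and parameter values are arbitrary choices for illustration.

```python
import math
import random


def pareto_from_exponential(y, x_m):
    """Map an exponential variate y to a Pareto variate x_m * exp(y)."""
    return x_m * math.exp(y)


rng = random.Random(0)            # fixed seed so the run is repeatable
x_m, alpha, n = 1.0, 2.0, 100_000
xs = [pareto_from_exponential(rng.expovariate(alpha), x_m) for _ in range(n)]

x = 2.0
empirical_tail = sum(v > x for v in xs) / n   # fraction of draws above x
expected_tail = (x_m / x) ** alpha            # Pareto tail, here 0.25
print(empirical_tail, expected_tail)
```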

A characterization theorem
Suppose Xi, i = 1, 2, 3, ... are independent identically distributed random variables whose probability distribution is
supported on the interval [xm, ∞) for some xm > 0. Suppose that for all n, the two random variables min{ X1, ..., Xn }
and (X1 + ... + Xn)/min{ X1, ..., Xn } are independent. Then the common distribution is a Pareto distribution.

Relation to Zipf's law


Pareto distributions are continuous probability distributions. Zipf's law, also sometimes called the zeta distribution,
may be thought of as a discrete counterpart of the Pareto distribution.

Relation to the "Pareto principle"


The "80-20 law", according to which 20% of all people receive 80% of all income, and 20% of the most affluent
20% receive 80% of that 80%, and so on, holds precisely when the Pareto index is α = log₄ 5 ≈ 1.161. Moreover, the
following have been shown[4] to be mathematically equivalent:
• Income is distributed according to a Pareto distribution with index α > 1.
• There is some number 0 ≤ p ≤ 1/2 such that 100p% of all people receive 100(1 − p)% of all income, and similarly
for every real (not necessarily integer) n > 0, 100pⁿ% of all people receive 100(1 − p)ⁿ% of all income.
This does not apply only to income, but also to wealth, or to anything else that can be modeled by this distribution.
This excludes Pareto distributions in which 0 < α ≤ 1, which, as noted above, have infinite expected value, and so
cannot reasonably model income distribution.
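The value α = log₄ 5 can be checked numerically. The sketch below assumes the standard Pareto Lorenz curve L(F) = 1 − (1 − F)^(1 − 1/α), from which the share of total income received by the richest fraction p is 1 − L(1 − p) = p^(1 − 1/α). The function name is illustrative.

```python
import math


def top_share(p, alpha):
    """Share of the total received by the richest fraction p (requires alpha > 1)."""
    return p ** (1.0 - 1.0 / alpha)


alpha = math.log(5, 4)               # log base 4 of 5, about 1.161
share_top_20 = top_share(0.2, alpha)     # richest 20% of people
share_top_4 = top_share(0.04, alpha)     # richest 20% of the richest 20%
print(alpha, share_top_20, share_top_4)
```

With this index the richest 20% receive 80% of the total and the richest 4% receive 64%, which is 80% of that 80%.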
Pareto, Lorenz, and Gini


The Lorenz curve is often used to characterize income and wealth distributions. For any distribution, the
Lorenz curve L(F) is written in terms of the PDF ƒ or the CDF F as
$$L(F) = \frac{\int_{x_\mathrm{m}}^{x(F)} x f(x)\,dx}{\int_{x_\mathrm{m}}^{\infty} x f(x)\,dx} = \frac{\int_0^F x(F')\,dF'}{\int_0^1 x(F')\,dF'}$$
where x(F) is the inverse of the CDF. For the Pareto distribution,
$$x(F) = \frac{x_\mathrm{m}}{(1-F)^{1/\alpha}}$$
and the Lorenz curve is calculated to be
$$L(F) = 1 - (1-F)^{1 - 1/\alpha},$$
where α must be greater than or equal to unity, since the denominator in the expression for L(F) is just the mean
value of x. Examples of the Lorenz curve for a number of Pareto distributions are shown in the graph on the right.
Figure: Lorenz curves for a number of Pareto distributions. The case α = ∞ corresponds to perfectly equal
distribution (G = 0) and the α = 1 line corresponds to complete inequality (G = 1).
The Gini coefficient is a measure of the deviation of the Lorenz curve from the equidistribution line which is a line
connecting [0, 0] and [1, 1], which is shown in black (α = ∞) in the Lorenz plot on the right. Specifically, the Gini
coefficient is twice the area between the Lorenz curve and the equidistribution line. The Gini coefficient for the
Pareto distribution is then calculated to be
$$G = 1 - 2\int_0^1 L(F)\,dF = \frac{1}{2\alpha - 1}$$
(see Aaberge 2005).
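The closed form of the Gini coefficient can be compared with a direct numerical integration. The sketch below assumes the standard Pareto Lorenz curve L(F) = 1 − (1 − F)^(1 − 1/α) and the closed form G = 1/(2α − 1), and uses a simple midpoint rule. Function names and the grid size are illustrative choices.

```python
def lorenz(F, alpha):
    """Lorenz curve of a Pareto distribution with index alpha >= 1."""
    return 1.0 - (1.0 - F) ** (1.0 - 1.0 / alpha)


def gini_numeric(alpha, n=100_000):
    """Gini coefficient as 1 - 2 * (area under the Lorenz curve), midpoint rule."""
    h = 1.0 / n
    area = sum(lorenz((i + 0.5) * h, alpha) for i in range(n)) * h
    return 1.0 - 2.0 * area


def gini_closed_form(alpha):
    """Closed-form Gini coefficient of the Pareto distribution."""
    return 1.0 / (2.0 * alpha - 1.0)


for alpha in (1.5, 2.0, 3.0):
    print(alpha, gini_numeric(alpha), gini_closed_form(alpha))
```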


Parameter estimation
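The likelihood function for the Pareto distribution parameters α and xm, given a sample x = (x1, x2, ..., xn), is
$$L(\alpha, x_\mathrm{m}) = \prod_{i=1}^{n} \alpha\,\frac{x_\mathrm{m}^{\alpha}}{x_i^{\alpha+1}} = \alpha^{n} x_\mathrm{m}^{n\alpha} \prod_{i=1}^{n} \frac{1}{x_i^{\alpha+1}}.$$
Therefore, the logarithmic likelihood function is
$$\ell(\alpha, x_\mathrm{m}) = n \ln \alpha + n\alpha \ln x_\mathrm{m} - (\alpha+1)\sum_{i=1}^{n} \ln x_i.$$
It can be seen that ℓ(α, xm) is monotonically increasing with xm, that is, the greater the value of xm, the greater
the value of the likelihood function. Hence, since x ≥ xm, we conclude that
$$\hat{x}_\mathrm{m} = \min_i x_i.$$
To find the estimator for α, we compute the corresponding partial derivative and determine where it is zero:
$$\frac{\partial \ell}{\partial \alpha} = \frac{n}{\alpha} + n \ln x_\mathrm{m} - \sum_{i=1}^{n} \ln x_i = 0.$$
Thus the maximum likelihood estimator for α is:
$$\hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \left(\ln x_i - \ln \hat{x}_\mathrm{m}\right)}.$$
The expected statistical error is:
$$\sigma = \frac{\hat{\alpha}}{\sqrt{n}}.$$
[5]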
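The estimators above translate directly into code. The following sketch assumes the standard results that the estimate of xm is the sample minimum, that the estimate of α is n divided by the sum of ln(xi / x̂m), and that the expected error is α̂/√n. The test data are simulated with `random.paretovariate` from the Python standard library (which uses minimum 1) and rescaled by a hypothetical true minimum.

```python
import math
import random


def pareto_mle(sample):
    """Return (x_m_hat, alpha_hat, std_err) for a sample assumed Pareto-distributed."""
    n = len(sample)
    x_m_hat = min(sample)                                   # MLE of the minimum
    alpha_hat = n / sum(math.log(x / x_m_hat) for x in sample)
    std_err = alpha_hat / math.sqrt(n)                      # expected statistical error
    return x_m_hat, alpha_hat, std_err


rng = random.Random(1)                    # fixed seed for repeatability
true_x_m, true_alpha = 2.0, 3.0           # illustrative values
sample = [true_x_m * rng.paretovariate(true_alpha) for _ in range(20_000)]

x_m_hat, alpha_hat, std_err = pareto_mle(sample)
print(x_m_hat, alpha_hat, std_err)
```

The function divides by the sum of logarithms, so it fails on a sample whose values are all equal; a production version would need to guard against that case.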

Graphical representation
When plotted on a linear scale, the distribution shows its characteristic curved 'long tail', which masks the underlying
simplicity of the function: on a log-log graph the density takes the form of a straight line with negative gradient −(α + 1).

Generating a random sample from Pareto distribution


Random samples can be generated using inverse transform sampling. Given a random variate U drawn from the
uniform distribution on the unit interval (0, 1), the variate
$$T = \frac{x_\mathrm{m}}{U^{1/\alpha}}$$
is Pareto-distributed.
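A minimal sketch of this sampler in Python follows. It assumes the inverse-transform formula T = xm / U^(1/α) and, as a sanity check, compares the sample median with the theoretical median xm·2^(1/α). The uniform variate is drawn as 1 − random() so that it lies in (0, 1] and never equals zero; the function name and seed are arbitrary.

```python
import random


def sample_pareto(x_m, alpha, rng):
    """Draw one Pareto(x_m, alpha) variate by inverse transform sampling."""
    u = 1.0 - rng.random()            # uniform on (0, 1], avoids division by zero
    return x_m / u ** (1.0 / alpha)


rng = random.Random(42)
x_m, alpha = 1.0, 2.0
draws = sorted(sample_pareto(x_m, alpha, rng) for _ in range(50_001))

sample_median = draws[len(draws) // 2]
theoretical_median = x_m * 2 ** (1.0 / alpha)
print(sample_median, theoretical_median)
```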

Bounded Pareto distribution


Bounded Pareto

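parameters: $L > 0$ location (real)
$H > L$ location (real)
$\alpha > 0$ shape (real)
support: $L \le x \le H$
pdf: $\dfrac{\alpha L^{\alpha} x^{-\alpha-1}}{1 - (L/H)^{\alpha}}$
cdf: $\dfrac{1 - L^{\alpha} x^{-\alpha}}{1 - (L/H)^{\alpha}}$
mean: $\dfrac{L^{\alpha}}{1 - (L/H)^{\alpha}} \cdot \dfrac{\alpha}{\alpha-1} \cdot \left(\dfrac{1}{L^{\alpha-1}} - \dfrac{1}{H^{\alpha-1}}\right)$, $\alpha \ne 1$
median: $L\left(1 - \tfrac{1}{2}\left(1 - (L/H)^{\alpha}\right)\right)^{-1/\alpha}$
mode:
variance: $\dfrac{L^{\alpha}}{1 - (L/H)^{\alpha}} \cdot \dfrac{\alpha}{\alpha-2} \cdot \left(\dfrac{1}{L^{\alpha-2}} - \dfrac{1}{H^{\alpha-2}}\right)$, $\alpha \ne 2$
skewness:
ex.kurtosis:
entropy:
mgf:
cf: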

The bounded Pareto distribution has three parameters α, L and H. As in the standard Pareto distribution α
determines the shape. L denotes the minimal value, and H denotes the maximal value. (The variance in the table on
the right should be interpreted as the second moment.)
The probability density function is
$$\frac{\alpha L^{\alpha} x^{-\alpha-1}}{1 - \left(\frac{L}{H}\right)^{\alpha}},$$
where L ≤ x ≤ H, and α > 0.

Generating bounded Pareto random variables


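If U is uniformly distributed on (0, 1), then
$$x = \left(-\frac{U H^{\alpha} - U L^{\alpha} - H^{\alpha}}{H^{\alpha} L^{\alpha}}\right)^{-1/\alpha}$$
is bounded Pareto-distributed.[6]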
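The following sketch implements this transformation. It assumes the inverse-CDF formula x = (−(U·H^α − U·L^α − H^α)/(H^α·L^α))^(−1/α) and checks the draws against the bounded Pareto CDF (1 − L^α x^(−α))/(1 − (L/H)^α). Function names, seed and parameter values are illustrative.

```python
import random


def sample_bounded_pareto(L, H, alpha, rng):
    """Draw one bounded Pareto variate on [L, H] by inverting the CDF."""
    u = rng.random()                              # uniform on [0, 1)
    ha, la = H ** alpha, L ** alpha
    return (-(u * ha - u * la - ha) / (ha * la)) ** (-1.0 / alpha)


def bounded_pareto_cdf(x, L, H, alpha):
    """CDF of the bounded Pareto distribution for L <= x <= H."""
    return (1.0 - L ** alpha * x ** (-alpha)) / (1.0 - (L / H) ** alpha)


rng = random.Random(5)
L, H, alpha, n = 1.0, 10.0, 1.5, 50_000
draws = [sample_bounded_pareto(L, H, alpha, rng) for _ in range(n)]

empirical = sum(d <= 2.0 for d in draws) / n      # empirical Pr(X <= 2)
print(empirical, bounded_pareto_cdf(2.0, L, H, alpha))
```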

Generalized Pareto distribution


Generalized Pareto

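parameters: $\mu \in (-\infty, \infty)$ location (real)
$\sigma \in (0, \infty)$ scale (real)
$\xi \in (-\infty, \infty)$ shape (real)
support: $x \ge \mu$ $(\xi \ge 0)$; $\mu \le x \le \mu - \sigma/\xi$ $(\xi < 0)$
pdf: $\dfrac{1}{\sigma}(1 + \xi z)^{-(1/\xi + 1)}$ where $z = \dfrac{x - \mu}{\sigma}$
cdf: $1 - (1 + \xi z)^{-1/\xi}$
mean: $\mu + \dfrac{\sigma}{1 - \xi}$ $(\xi < 1)$
median: $\mu + \dfrac{\sigma(2^{\xi} - 1)}{\xi}$
mode:
variance: $\dfrac{\sigma^{2}}{(1-\xi)^{2}(1-2\xi)}$ $(\xi < 1/2)$
skewness:
ex.kurtosis:
entropy:
mgf:
cf: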

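The family of generalized Pareto distributions (GPD) has three parameters μ, σ and ξ.
The cumulative distribution function is
$$F_{(\xi,\mu,\sigma)}(x) = 1 - \left(1 + \frac{\xi(x-\mu)}{\sigma}\right)^{-1/\xi}$$
for x ≥ μ, and x ≤ μ − σ/ξ when ξ < 0, where μ is the location parameter, σ > 0 the scale
parameter and ξ the shape parameter. Note that some references give the "shape parameter" as κ = −ξ.
The probability density function is:
$$f_{(\xi,\mu,\sigma)}(x) = \frac{1}{\sigma}\left(1 + \frac{\xi(x-\mu)}{\sigma}\right)^{-\frac{1}{\xi} - 1}$$
or
$$f_{(\xi,\mu,\sigma)}(x) = \frac{\sigma^{1/\xi}}{\left(\sigma + \xi(x-\mu)\right)^{\frac{1}{\xi} + 1}},$$
again, for x ≥ μ, and x ≤ μ − σ/ξ when ξ < 0.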


Generating generalized Pareto random variables


If U is uniformly distributed on (0, 1], then
$$X = \mu + \frac{\sigma\left(U^{-\xi} - 1\right)}{\xi} \sim \mathrm{GPD}(\mu, \sigma, \xi).$$
In the MATLAB Statistics Toolbox, the "gprnd" command generates generalized Pareto random numbers.
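For readers without MATLAB, the same inverse-transform idea can be sketched in Python. The code assumes the formula X = μ + σ(U^(−ξ) − 1)/ξ for ξ ≠ 0 and, as an added assumption, handles ξ = 0 through its exponential limit X = μ − σ ln U. The check compares the sample median with the standard GPD median μ + σ(2^ξ − 1)/ξ. The function name and parameter values are illustrative.

```python
import math
import random


def sample_gpd(mu, sigma, xi, rng):
    """Draw one generalized Pareto variate by inverse transform sampling."""
    u = 1.0 - rng.random()                    # uniform on (0, 1]
    if xi == 0.0:
        return mu - sigma * math.log(u)       # exponential limit as xi -> 0
    return mu + sigma * (u ** (-xi) - 1.0) / xi


rng = random.Random(3)
mu, sigma, xi = 0.0, 1.0, 0.25
draws = sorted(sample_gpd(mu, sigma, xi, rng) for _ in range(50_001))

sample_median = draws[len(draws) // 2]
theoretical_median = mu + sigma * (2 ** xi - 1.0) / xi
print(sample_median, theoretical_median)
```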

See also
• Pareto analysis
• Pareto efficiency
• Pareto interpolation
• Pareto principle
• The Long Tail
• Traffic generation model

References
• Lorenz, M. O. (1905). Methods of measuring the concentration of wealth. Publications of the American Statistical
Association. 9: 209–219.

External links
• The Pareto, Zipf and other power laws / William J. Reed – PDF [7]
• Gini's Nuclear Family / Rolf Aabergé. – In: International Conference to Honor Two Eminent Social Scientists [8],
May, 2005 – PDF [9]

References
[1] Pareto, Vilfredo, Cours d’Économie Politique: Nouvelle édition par G.-H. Bousquet et G. Busino, Librairie Droz, Geneva, 1964, pages
299–345.
[2] For a two-quantile population, where 18% of the population owns 82% of the wealth, the Theil index takes the value 1.
[3] Schroeder, Bianca; Damouras, Sotirios; Gill, Phillipa (2010-02-24), "Understanding latent sector error and how to protect against them"
(http://www.usenix.org/event/fast10/tech/full_papers/schroeder.pdf), 8th Usenix Conference on File and Storage Technologies (FAST
2010), retrieved 2010-09-10, "We experimented with 5 different distributions (Geometric,Weibull, Rayleigh, Pareto, and Lognormal), that
are commonly used in the context of system reliability, and evaluated their fit through the total squared differences between the actual and
hypothesized frequencies (χ² statistic). We found consistently across all models that the geometric distribution is a poor fit, while the Pareto
distribution provides the best fit."
[4] Michael Hardy (2010) "Pareto's Law", Mathematical Intelligencer, 32 (3), 38–43. doi:10.1007/s00283-010-9159-2
[5] Arxiv.org (http://arxiv.org/abs/cond-mat/0412004v3)
[6] USF.edu (http://www.csee.usf.edu/~christen/tools/syntraf1.c)
[7] http://linkage.rockefeller.edu/wli/zipf/reed01_el.pdf
[8] http://www.unisi.it/eventi/GiniLorenz05/
[9] http://www.unisi.it/eventi/GiniLorenz05/25%20may%20paper/PAPER_Aaberge.pdf
Student's t-distribution 117

Student's t-distribution
Student's t

Probability density function

Cumulative distribution function

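parameters: $\nu > 0$ degrees of freedom (real)
support: $x \in (-\infty, +\infty)$
pdf: $\dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \dfrac{x^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}$
cdf: $\dfrac{1}{2} + x\,\Gamma\left(\frac{\nu+1}{2}\right)\dfrac{{}_2F_1\left(\frac{1}{2}, \frac{\nu+1}{2}; \frac{3}{2}; -\frac{x^{2}}{\nu}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}$
where ${}_2F_1$ is the hypergeometric function
mean: $0$ for $\nu > 1$, otherwise undefined
median: $0$
mode: $0$
variance: $\dfrac{\nu}{\nu - 2}$ for $\nu > 2$, otherwise undefined
skewness: $0$ for $\nu > 3$
ex.kurtosis: $\dfrac{6}{\nu - 4}$ for $\nu > 4$
entropy: $\dfrac{\nu+1}{2}\left[\psi\left(\dfrac{1+\nu}{2}\right) - \psi\left(\dfrac{\nu}{2}\right)\right] + \log\left[\sqrt{\nu}\,B\left(\dfrac{\nu}{2}, \dfrac{1}{2}\right)\right]$
• $\psi$: digamma function,
• $B$: beta function
mgf: (Not defined)
cf: $\dfrac{K_{\nu/2}\left(\sqrt{\nu}\,|t|\right)\left(\sqrt{\nu}\,|t|\right)^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}}$ for $\nu > 0$ [1]
• $K_{\nu}(x)$: Bessel function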

In probability and statistics, Student's t-distribution (or simply the t-distribution) is a continuous probability
distribution that arises in the problem of estimating the mean of a normally distributed population when the sample
size is small. It is the basis of the popular Student's t-tests for the statistical significance of the difference between
two sample means, and for confidence intervals for the difference between two population means. The Student's
t-distribution also arises in the Bayesian analysis of data from a normal family. The Student's t-distribution is a
special case of the generalised hyperbolic distribution.
In statistics, the t-distribution was first derived as a posterior distribution by Helmert and Lüroth.[2] [3] [4] In the
English literature, a derivation of the t-distribution was published in 1908 by William Sealy Gosset[5] while he
worked at the Guinness Brewery in Dublin. Due to proprietary issues, the paper was written under the pseudonym
Student. The t-test and the associated theory became well-known through the work of R.A. Fisher, who called the
distribution "Student's distribution".[6]
Student's distribution arises when (as in nearly all practical statistical work) the population standard deviation is
unknown and has to be estimated from the data. Quite often, however, textbook problems will treat the population
standard deviation as if it were known and thereby avoid the need to use the Student's t-test. These problems are
generally of two kinds: (1) those in which the sample size is so large that one may treat a data-based estimate of the
variance as if it were certain, and (2) those that illustrate mathematical reasoning, in which the problem of estimating
the standard deviation is temporarily ignored because that is not the point that the author or instructor is then
explaining.

Etymology
The "Student's" distribution was actually published in 1908 by William Sealy Gosset. Gosset, however, was
employed at a brewery that forbade its staff from publishing scientific papers, because an earlier paper had
contained trade secrets. To circumvent this restriction, Gosset used the name "Student", and consequently the
distribution was named "Student's t-distribution".[7]

Characterization
Student's t-distribution is the probability distribution of the ratio[8]
$$T = \frac{Z}{\sqrt{V/\nu}}$$
where
• Z is normally distributed with expected value 0 and variance 1;
• V has a chi-square distribution with ν degrees of freedom;
• Z and V are independent.
For any given constant μ, the ratio
$$\frac{Z + \mu}{\sqrt{V/\nu}}$$
is a random variable with a noncentral t-distribution with noncentrality
parameter μ.
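The characterization suggests a direct way to simulate t-distributed values. The sketch below assumes the ratio T = Z / √(V/ν) with Z standard normal and V chi-square with ν degrees of freedom, and draws V as a gamma variate with shape ν/2 and scale 2 (a standard identity). It then checks that about 95% of the draws for ν = 4 fall below 2.132, the critical value listed for 4 degrees of freedom in the table of selected values later in this article. The function name and seed are arbitrary.

```python
import math
import random


def sample_t(nu, rng):
    """Draw one Student's t variate as Z / sqrt(V / nu)."""
    z = rng.gauss(0.0, 1.0)                   # standard normal numerator
    v = rng.gammavariate(nu / 2.0, 2.0)       # chi-square(nu) = Gamma(shape nu/2, scale 2)
    return z / math.sqrt(v / nu)


rng = random.Random(7)
nu, n = 4, 100_000
draws = [sample_t(nu, rng) for _ in range(n)]

frac_below = sum(t < 2.132 for t in draws) / n    # should be close to 0.95
print(frac_below)
```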
Probability density function


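Student's t-distribution has the probability density function
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}},$$
where ν is the number of degrees of freedom and Γ is the Gamma function.
For ν even,
$$\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} = \frac{(\nu-1)(\nu-3)\cdots 5\cdot 3}{2\sqrt{\nu}\,(\nu-2)(\nu-4)\cdots 4\cdot 2}.$$
For ν odd,
$$\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} = \frac{(\nu-1)(\nu-3)\cdots 4\cdot 2}{\pi\sqrt{\nu}\,(\nu-2)(\nu-4)\cdots 5\cdot 3}.$$
The overall shape of the probability density function of the t-distribution resembles the bell shape of a normally
distributed variable with mean 0 and variance 1, except that it is a bit lower and wider. As the number of degrees of
freedom grows, the t-distribution approaches the normal distribution with mean 0 and variance 1.
The following images show the density of the t-distribution for increasing values of ν. The normal distribution is
shown as a blue line for comparison. Note that the t-distribution (red line) becomes closer to the normal distribution
as ν increases.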
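The approach to the normal density can also be seen numerically. The sketch below assumes the standard t density Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) · (1 + t²/ν)^(−(ν+1)/2), evaluated through `math.lgamma` so that large ν does not overflow, and compares the peak height at t = 0 with that of the standard normal density. Function names are illustrative.

```python
import math


def t_pdf(t, nu):
    """Student's t density, computed in log space for numerical stability."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))


def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


# Peak heights for the degrees of freedom shown in the figure.
peaks = {nu: t_pdf(0.0, nu) for nu in (1, 2, 3, 5, 10, 30)}
for nu, peak in peaks.items():
    print(nu, round(peak, 4), "normal:", round(normal_pdf(0.0), 4))
```

The peak is lower than the normal peak for every ν and rises toward it as ν grows, which matches the "a bit lower and wider" description.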

Figure: Density of the t-distribution (red) for 1, 2, 3, 5, 10, and 30 degrees of freedom compared to the normal
distribution (blue). Previous plots shown in green.

Derivation
Suppose X1, ..., Xn are independent values that are normally distributed with expected value μ and variance σ². Let
$$\overline{X}_n = \frac{X_1 + \cdots + X_n}{n}$$
be the sample mean, and
$$S_n^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}_n\right)^{2}$$
be the sample variance. It can be shown that the random variable
$$V = (n-1)\frac{S_n^{2}}{\sigma^{2}}$$
has a chi-square distribution with n − 1 degrees of freedom (by Cochran's theorem). It is readily shown that the
quantity
$$Z = \left(\overline{X}_n - \mu\right)\frac{\sqrt{n}}{\sigma}$$
is normally distributed with mean 0 and variance 1, since the sample mean X̄n is normally distributed with mean μ
and standard error σ/√n. Moreover, it is possible to show that these two random variables (the normally
distributed one and the chi-square-distributed one) are independent. Consequently the pivotal quantity,
$$T \equiv \frac{Z}{\sqrt{V/\nu}} = \left(\overline{X}_n - \mu\right)\frac{\sqrt{n}}{S_n},$$
which differs from Z in that the exact standard deviation σ is replaced by the random variable Sn, has a Student's
t-distribution as defined above. Notice that the unknown population variance σ² does not appear in T, since it was in
both the numerator and the denominator, so it canceled. Gosset's work showed that T has the probability density
function
$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}},$$
with ν equal to n − 1.


This may also be written as
$$f(t) = \frac{1}{\sqrt{\nu}\,B\left(\frac{1}{2}, \frac{\nu}{2}\right)}\left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}},$$
where B is the Beta function.


The distribution of T is now called the t-distribution. The parameter ν is called the number of degrees of freedom.
The distribution depends on ν, but not μ or σ; the lack of dependence on μ and σ is what makes the t-distribution
important in both theory and practice.
Gosset's result can be stated more generally. (See, for example, Hogg and Craig, Sections 4.4 and 4.8.) Let Z have a
normal distribution with mean 0 and variance 1. Let V have a chi-square distribution with ν degrees of freedom.
Further suppose that Z and V are independent (see Cochran's theorem). Then the ratio
$$\frac{Z}{\sqrt{V/\nu}}$$
has a t-distribution with ν degrees of freedom.

Cumulative distribution function


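The cumulative distribution function is given by the regularized incomplete beta function,
$$\int_{-\infty}^{t} f(u)\,du = I_x\left(\frac{\nu}{2}, \frac{\nu}{2}\right)$$
with
$$x = \frac{t + \sqrt{t^{2} + \nu}}{2\sqrt{t^{2} + \nu}}.$$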

Properties

Moments
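The moments of the t-distribution are
$$E(T^{k}) = \begin{cases} 0 & k \text{ odd}, \quad 0 < k < \nu, \\ \dfrac{\Gamma\left(\frac{k+1}{2}\right)\Gamma\left(\frac{\nu-k}{2}\right)\nu^{k/2}}{\sqrt{\pi}\,\Gamma\left(\frac{\nu}{2}\right)} & k \text{ even}, \quad 0 < k < \nu, \\ \text{undefined} & k \text{ odd}, \quad 0 < \nu \le k, \\ \infty & k \text{ even}, \quad 0 < \nu \le k. \end{cases}$$
It should be noted that the term for 0 < k < ν, k even, may be simplified using the properties of the Gamma
function to
$$E(T^{k}) = \nu^{k/2}\prod_{i=1}^{k/2} \frac{2i - 1}{\nu - 2i}, \qquad k \text{ even}, \quad 0 < k < \nu.$$
For a t-distribution with ν degrees of freedom, the expected value is 0 if ν > 1, and its variance is ν/(ν − 2) if ν > 2.
The skewness is 0 if ν > 3 and the excess kurtosis is 6/(ν − 4) if ν > 4.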
Confidence intervals
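Suppose the number A is so chosen that
$$\Pr(-A < T < A) = 0.9,$$
when T has a t-distribution with n − 1 degrees of freedom. By symmetry, this is the same as saying that A satisfies
$$\Pr(T < A) = 0.95,$$
so A is the "95th percentile" of this probability distribution, or A = t(0.05, n−1). Then, with X̄n the sample mean and
Sn the sample standard deviation,
$$\Pr\left(-A < \frac{\overline{X}_n - \mu}{S_n/\sqrt{n}} < A\right) = 0.9,$$
and this is equivalent to
$$\Pr\left(\overline{X}_n - A\frac{S_n}{\sqrt{n}} < \mu < \overline{X}_n + A\frac{S_n}{\sqrt{n}}\right) = 0.9.$$
Therefore the interval whose endpoints are
$$\overline{X}_n \pm A\frac{S_n}{\sqrt{n}}$$
is a 90-percent confidence interval for μ. Therefore, if we find the mean of a set of observations that we can
reasonably expect to have a normal distribution, we can use the t-distribution to examine whether the confidence
limits on that mean include some theoretically predicted value - such as the value predicted on a null hypothesis.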
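As a minimal sketch, the interval can be computed from raw observations as follows. It assumes the endpoints X̄n ± A·Sn/√n, with Sn computed using the n − 1 denominator. The five observations are hypothetical, and the critical value A = 2.132 is the 95% one-sided entry for 4 degrees of freedom from the table of selected values later in this article.

```python
import math
import statistics


def t_confidence_interval(sample, t_crit):
    """Two-sided interval mean +/- t_crit * s / sqrt(n) for a normal sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)              # sample standard deviation (n - 1 denominator)
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


observations = [9.8, 10.2, 10.4, 9.9, 10.7]   # hypothetical data, n = 5, so 4 degrees of freedom
lo, hi = t_confidence_interval(observations, t_crit=2.132)
print(f"90% confidence interval for the mean: [{lo:.4f}, {hi:.4f}]")
```

The critical value is passed in explicitly because the Python standard library has no t quantile function; a statistics package would normally supply it.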
It is this result that is used in the Student's t-tests: since the difference between the means of samples from two
normal distributions is itself distributed normally, the t-distribution can be used to examine whether that difference
can reasonably be supposed to be zero.
If the data are normally distributed, the one-sided (1 − a)-upper confidence limit (UCL) of the mean can be
calculated using the following equation:
$$\mathrm{UCL}_{1-a} = \overline{X}_n + t_{a,\,n-1}\frac{S_n}{\sqrt{n}}.$$
The resulting UCL will be the greatest average value that will occur for a given confidence interval and sample
size. In other words, with X̄n being the mean of the set of observations, the probability that the mean of the distribution
is less than UCL1−a is equal to the confidence level 1 − a.
A number of other statistics can be shown to have t-distributions for samples of moderate size under null hypotheses
that are of interest, so that the t-distribution forms the basis for significance tests in other situations as well as when
examining the differences between means. For example, the distribution of Spearman's rank correlation coefficient ρ,
in the null case (zero correlation) is well approximated by the t distribution for sample sizes above about 20.

Prediction interval
The t-distribution can be used to construct a prediction interval for an unobserved sample from a normal distribution
with unknown mean and variance.

Monte Carlo sampling


There are various approaches to constructing random samples from the Student distribution. The matter depends on
whether the samples are required on a stand-alone basis, or are to be constructed by application of a quantile function
to uniform samples, e.g. in multi-dimensional applications based on copula dependency. In the case of stand-alone
sampling, Bailey's 1994 extension of the Box-Muller method and its polar variation are easily deployed. It has the
merit that it applies equally well to all real positive and negative degrees of freedom.
Integral of Student's probability density function and p-value


The function A(t | ν) is the integral of Student's probability density function, ƒ(t), between −t and t. It thus gives the
probability that a value of t less than that calculated from observed data would occur by chance. Therefore, the
function A(t | ν) can be used when testing whether the difference between the means of two sets of data is statistically
significant, by calculating the corresponding value of t and the probability of its occurrence if the two sets of data
were drawn from the same population. This is used in a variety of situations, particularly in t-tests. For the statistic t,
with ν degrees of freedom, A(t | ν) is the probability that t would be less than the observed value if the two means
were the same (provided that the smaller mean is subtracted from the larger, so that t > 0). It is defined for real t by
the following formula:
$$A(t \mid \nu) = \frac{1}{\sqrt{\nu}\,B\left(\frac{1}{2}, \frac{\nu}{2}\right)}\int_{-t}^{t}\left(1 + \frac{x^{2}}{\nu}\right)^{-\frac{\nu+1}{2}} dx,$$
where B is the Beta function. For t > 0, there is a relation to the regularized incomplete beta function Ix(a, b) as
follows:
$$A(t \mid \nu) = 1 - I_{\frac{\nu}{\nu + t^{2}}}\left(\frac{\nu}{2}, \frac{1}{2}\right).$$
For statistical hypothesis testing this function is used to construct the p-value.
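The integral can be evaluated numerically without special functions. The sketch below assumes the standard t density and approximates A(t | ν) by composite Simpson's rule on [−t, t]. The function names and the number of steps are illustrative choices, not part of any library.

```python
import math


def t_pdf(x, nu):
    """Student's t density with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def a_function(t, nu, steps=2000):
    """Integral of the t density over [-t, t] by composite Simpson's rule (steps must be even)."""
    h = 2.0 * t / steps
    total = t_pdf(-t, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2            # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(-t + i * h, nu)
    return total * h / 3.0


print(a_function(2.132, 4))   # close to 0.90 for 4 degrees of freedom
print(a_function(1.0, 1))     # close to 0.5; for nu = 1 the exact value is (2/pi) * atan(t)
```

A two-sided p-value for an observed |t| is then 1 − A(|t| | ν).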

Three-parameter version
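A generalization of the one-parameter Student's t distribution described above, also known as the "Student's t
distribution", is a three-parameter version that introduces a location parameter μ and an inverse scale parameter (i.e.
precision) λ, and has a density defined by
$$p(x \mid \mu, \lambda, \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{\lambda}{\pi\nu}\right)^{1/2}\left[1 + \frac{\lambda(x-\mu)^{2}}{\nu}\right]^{-\frac{\nu+1}{2}}.$$
Other properties of this version of the distribution are:
$$E(X) = \mu \ \text{ for } \nu > 1, \qquad \operatorname{var}(X) = \frac{1}{\lambda}\,\frac{\nu}{\nu-2} \ \text{ for } \nu > 2, \qquad \operatorname{mode}(X) = \mu.$$
This distribution results from compounding a Gaussian distribution with mean μ and unknown precision (the
reciprocal of the variance), with a gamma distribution with parameters a = ν/2 and b = ν/(2λ). In other words,
the random variable X is assumed to have a normal distribution with an unknown precision distributed as gamma,
and then this is marginalized over the gamma distribution.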

Related distributions
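• X ~ t(ν) has a t-distribution if σ² ~ Inv-χ²(ν, 1) has a scaled inverse-χ² distribution and X ~ N(0, σ²)
has a normal distribution.
• Y ~ F(ν1 = 1, ν2 = ν) has an F-distribution if Y = X² and X ~ t(ν) has a Student's t-distribution.
• Y ~ N(0, 1) has a normal distribution as Y = lim X for ν → ∞, where X ~ t(ν).
• X ~ Cauchy(0, 1) has a Cauchy distribution if X ~ t(ν = 1).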
Special cases
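Certain values of ν give an especially simple form.

ν=1
Distribution function:
$$F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(x).$$
Density function:
$$f(x) = \frac{1}{\pi(1 + x^{2})}.$$
See Cauchy distribution

ν=2
Distribution function:
$$F(x) = \frac{1}{2}\left[1 + \frac{x}{\sqrt{2 + x^{2}}}\right].$$
Density function:
$$f(x) = \frac{1}{\left(2 + x^{2}\right)^{3/2}}.$$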
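These two closed forms are simple enough to evaluate directly. The sketch below assumes the standard results F(x) = 1/2 + arctan(x)/π for ν = 1 and F(x) = (1/2)(1 + x/√(2 + x²)) for ν = 2, and checks them against the 95% one-sided critical values 6.314 and 2.920 listed for 1 and 2 degrees of freedom in the table of selected values. Function names are illustrative.

```python
import math


def t_cdf_nu1(x):
    """CDF of Student's t with 1 degree of freedom (the Cauchy distribution)."""
    return 0.5 + math.atan(x) / math.pi


def t_cdf_nu2(x):
    """CDF of Student's t with 2 degrees of freedom."""
    return 0.5 * (1.0 + x / math.sqrt(2.0 + x * x))


print(t_cdf_nu1(6.314))   # approximately 0.95
print(t_cdf_nu2(2.920))   # approximately 0.95
```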

Occurrences

Hypothesis testing
Confidence intervals and hypothesis tests rely on Student's t-distribution to cope with uncertainty resulting from
estimating the standard deviation from a sample, whereas if the population standard deviation were known, a normal
distribution would be used.

Robust parametric modeling


The t-distribution is often used as an alternative to the normal distribution as a model for data.[9] It is frequently the
case that real data have heavier tails than the normal distribution allows for. The classical approach was to identify
outliers and exclude or downweight them in some way. However, it is not always easy to identify outliers (especially
in high dimensions), and the t-distribution is a natural choice of model for such data and provides a parametric
approach to robust statistics.
Lange et al. explored the use of the t-distribution for robust modeling of heavy tailed data in a variety of contexts. A
Bayesian account can be found in Gelman et al. The degrees of freedom parameter controls the kurtosis of the
distribution and is correlated with the scale parameter. The likelihood can have multiple local maxima and, as such,
it is often necessary to fix the degrees of freedom at a fairly low value and estimate the other parameters taking this
as given. Some authors report that values between 3 and 9 are often good choices. Venables and Ripley suggest that
a value of 5 is often a good choice.
Table of selected values


Most statistical textbooks list t distribution tables. Nowadays, a more convenient way to obtain a fully precise critical
t value or a cumulative probability is to use the statistical functions implemented in spreadsheets (Office Excel,
OpenOffice Calc, etc.) or an interactive calculating web page. The relevant spreadsheet functions are TDIST and TINV,
while online calculating pages spare the user from remembering the order of parameters or the names of functions.
For example, a MediaWiki page supported by the R extension can easily give the interactive result [10] of critical values
or cumulative probability, even for the noncentral t-distribution.
The following table lists a few selected values for t-distributions with ν degrees of freedom for a range of one-sided
or two-sided critical regions. For an example of how to read this table, take the fourth row, which begins with 4; that
means ν, the number of degrees of freedom, is 4 (and if we are dealing, as above, with n values with a fixed sum, n
= 5). Take the fifth entry, in the column headed 95% for one-sided (90% for two-sided). The value of that entry is
"2.132". Then the probability that T is less than 2.132 is 95%, or Pr(−∞ < T < 2.132) = 0.95; equivalently,
Pr(−2.132 < T < 2.132) = 0.9.
This can be calculated by the symmetry of the distribution,
Pr(T < −2.132) = 1 − Pr(T > −2.132) = 1 − 0.95 = 0.05,
and so
Pr(−2.132 < T < 2.132) = 1 − 2(0.05) = 0.9.
Note that the last row also gives critical points: a t-distribution with infinitely many degrees of freedom is a normal
distribution. (See above: Related distributions).

One Sided  75%    80%    85%    90%    95%    97.5%  99%    99.5%  99.75% 99.9%  99.95%
Two Sided  50%    60%    70%    80%    90%    95%    98%    99%    99.5%  99.8%  99.9%
1          1.000  1.376  1.963  3.078  6.314  12.71  31.82  63.66  127.3  318.3  636.6
2          0.816  1.061  1.386  1.886  2.920  4.303  6.965  9.925  14.09  22.33  31.60
3          0.765  0.978  1.250  1.638  2.353  3.182  4.541  5.841  7.453  10.21  12.92
4          0.741  0.941  1.190  1.533  2.132  2.776  3.747  4.604  5.598  7.173  8.610
5          0.727  0.920  1.156  1.476  2.015  2.571  3.365  4.032  4.773  5.893  6.869
6          0.718  0.906  1.134  1.440  1.943  2.447  3.143  3.707  4.317  5.208  5.959
7          0.711  0.896  1.119  1.415  1.895  2.365  2.998  3.499  4.029  4.785  5.408
8          0.706  0.889  1.108  1.397  1.860  2.306  2.896  3.355  3.833  4.501  5.041
9          0.703  0.883  1.100  1.383  1.833  2.262  2.821  3.250  3.690  4.297  4.781
10         0.700  0.879  1.093  1.372  1.812  2.228  2.764  3.169  3.581  4.144  4.587
11         0.697  0.876  1.088  1.363  1.796  2.201  2.718  3.106  3.497  4.025  4.437
12         0.695  0.873  1.083  1.356  1.782  2.179  2.681  3.055  3.428  3.930  4.318
13         0.694  0.870  1.079  1.350  1.771  2.160  2.650  3.012  3.372  3.852  4.221
14         0.692  0.868  1.076  1.345  1.761  2.145  2.624  2.977  3.326  3.787  4.140
15         0.691  0.866  1.074  1.341  1.753  2.131  2.602  2.947  3.286  3.733  4.073
16         0.690  0.865  1.071  1.337  1.746  2.120  2.583  2.921  3.252  3.686  4.015
17         0.689  0.863  1.069  1.333  1.740  2.110  2.567  2.898  3.222  3.646  3.965
18         0.688  0.862  1.067  1.330  1.734  2.101  2.552  2.878  3.197  3.610  3.922
19         0.688  0.861  1.066  1.328  1.729  2.093  2.539  2.861  3.174  3.579  3.883
20         0.687  0.860  1.064  1.325  1.725  2.086  2.528  2.845  3.153  3.552  3.850
21         0.686  0.859  1.063  1.323  1.721  2.080  2.518  2.831  3.135  3.527  3.819
22         0.686  0.858  1.061  1.321  1.717  2.074  2.508  2.819  3.119  3.505  3.792
23         0.685  0.858  1.060  1.319  1.714  2.069  2.500  2.807  3.104  3.485  3.767
24         0.685  0.857  1.059  1.318  1.711  2.064  2.492  2.797  3.091  3.467  3.745
25         0.684  0.856  1.058  1.316  1.708  2.060  2.485  2.787  3.078  3.450  3.725
26         0.684  0.856  1.058  1.315  1.706  2.056  2.479  2.779  3.067  3.435  3.707
27         0.684  0.855  1.057  1.314  1.703  2.052  2.473  2.771  3.057  3.421  3.690
28         0.683  0.855  1.056  1.313  1.701  2.048  2.467  2.763  3.047  3.408  3.674
29         0.683  0.854  1.055  1.311  1.699  2.045  2.462  2.756  3.038  3.396  3.659
30         0.683  0.854  1.055  1.310  1.697  2.042  2.457  2.750  3.030  3.385  3.646
40         0.681  0.851  1.050  1.303  1.684  2.021  2.423  2.704  2.971  3.307  3.551
50         0.679  0.849  1.047  1.299  1.676  2.009  2.403  2.678  2.937  3.261  3.496
60         0.679  0.848  1.045  1.296  1.671  2.000  2.390  2.660  2.915  3.232  3.460
80         0.678  0.846  1.043  1.292  1.664  1.990  2.374  2.639  2.887  3.195  3.416
100        0.677  0.845  1.042  1.290  1.660  1.984  2.364  2.626  2.871  3.174  3.390
120        0.677  0.845  1.041  1.289  1.658  1.980  2.358  2.617  2.860  3.160  3.373
∞          0.674  0.842  1.036  1.282  1.645  1.960  2.326  2.576  2.807  3.090  3.291

The number at the beginning of each row in the table above is ν, which has been defined above as n − 1. The
percentage along the top is 100%(1 − α). The numbers in the main body of the table are tα,ν. If a quantity T is
distributed as a Student's t distribution with ν degrees of freedom, then there is a probability 1 − α that T will be less
than tα,ν. (Calculated as for a one-tailed or one-sided test as opposed to a two-tailed test.)
For example, given a sample with a sample variance 2 and sample mean of 10, taken from a sample set of 11 (10
degrees of freedom), using the formula
$$\overline{X}_n \pm t_{\alpha,\nu}\frac{S_n}{\sqrt{n}},$$
we can determine that at 90% confidence, we have a true mean lying below
$$10 + 1.37218\,\frac{\sqrt{2}}{\sqrt{11}} = 10.58510.$$
(In other words, on average, 90% of the times that an upper threshold is calculated by this method, the true mean lies
below this upper threshold.) And, still at 90% confidence, we have a true mean lying over
$$10 - 1.37218\,\frac{\sqrt{2}}{\sqrt{11}} = 9.41490.$$
(In other words, on average, 90% of the times that a lower threshold is calculated by this method, the true mean lies
above this lower threshold.) So that at 80% confidence, we have a true mean lying within the interval
$$\left(10 - 1.37218\,\frac{\sqrt{2}}{\sqrt{11}},\ 10 + 1.37218\,\frac{\sqrt{2}}{\sqrt{11}}\right).$$
This is generally expressed in interval notation, e.g., for this case, at 80% confidence the true mean is within the
interval [9.41490, 10.58510].
(In other words, on average, 80% of the times that upper and lower thresholds are calculated by this method, the true
mean is both below the upper threshold and above the lower threshold. This is not the same thing as saying that there
is an 80% probability that the true mean lies between a particular pair of upper and lower thresholds that have been
calculated by this method—see confidence interval and prosecutor's fallacy.)
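The arithmetic of this example can be checked with a short sketch. The function name and the critical value 1.37218 (taken here as the 90% one-sided value for 10 degrees of freedom) are assumptions made for illustration, not part of the original text.

```python
import math

def t_thresholds(sample_mean, sample_variance, n, t_crit):
    """Return (lower, upper) thresholds: mean -/+ t_crit * s / sqrt(n)."""
    half_width = t_crit * math.sqrt(sample_variance) / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Assumed critical value: 90% one-sided, 10 degrees of freedom.
T_90_ONE_SIDED_10_DF = 1.37218

lower, upper = t_thresholds(10.0, 2.0, 11, T_90_ONE_SIDED_10_DF)
print(f"80% two-sided interval: [{lower:.5f}, {upper:.5f}]")
```

Each threshold on its own is a 90% one-sided bound; taken together they form the 80% two-sided interval.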

For information on the inverse cumulative distribution function see Quantile function.

See also
• Student's t-statistic
• F-distribution
• Gamma function
• Hotelling's T-square distribution
• Noncentral t-distribution
• Multivariate Student distribution
• Confidence interval
• Variance

References
• Helmert, F. R. (1875). Über die Bestimmung des wahrscheinlichen Fehlers aus einer endlichen Anzahl wahrer
Beobachtungsfehler. Z. Math. Phys. 20, 300-3.
• Helmert, F. R. (1876a). Über die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler und über einige
damit in Zusammenhang stehende Fragen. Z. Math. Phys. 21, 192-218.
• Helmert, F. R. (1876b). Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen
Beobachtungsfehlers directer Beobachtungen gleicher Genauigkeit Astron. Nachr. 88, 113-32.
• Senn, S. & Richardson, W. (1994). The first t-test. Statist. Med. 13, 785-803.
• Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26" [11], Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables, New York: Dover, pp. 948, MR0167642, ISBN 978-0486612720.
• R.V. Hogg and A.T. Craig (1978). Introduction to Mathematical Statistics. New York: Macmillan.
• Press, William H.; Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery (1992). Numerical Recipes in C:
The Art of Scientific Computing [12]. Cambridge University Press. pp. 228–229 [13]. ISBN 0-521-43108-5.
• Bailey, R. W. (1994). Polar generation of random variates with the t-distribution. Mathematics of Computation
62(206), 779–781.
• W.N. Venables and B.D. Ripley, Modern Applied Statistics with S (Fourth Edition), Springer, 2002
• Gelman, Andrew; John B. Carlin, Hal S. Stern, Donald B. Rubin (2003). Bayesian Data Analysis (Second
Edition) [14]. CRC/Chapman & Hall. ISBN 1-584-88388-X.

External links
• Earliest Known Uses of Some of the Words of Mathematics (S) [15] (Remarks on the history of the term "Student's
distribution")

References
[1] Hurst, Simon, The Characteristic Function of the Student-t Distribution (http://wwwmaths.anu.edu.au/research.reports/srr/95/044/),
Financial Mathematics Research Report No. FMRR006-95, Statistics Research Report No. SRR044-95
[2] Lüroth, J (1876). "Vergleichung von zwei Werten des wahrscheinlichen Fehlers". Astron. Nachr. 87: 209–20.
doi:10.1002/asna.18760871402.
[3] Pfanzagl, J.; Sheynin, O. (1996). "A forerunner of the t-distribution (Studies in the history of probability and statistics XLIV)"
(http://biomet.oxfordjournals.org/cgi/content/abstract/83/4/891). Biometrika 83 (4): 891–898. doi:10.1093/biomet/83.4.891. MR1766040.
[4] Sheynin, O (1995). "Helmert's work in the theory of errors". Arch. Hist. Ex. Sci. 49: 73–104. doi:10.1007/BF00374700.
[5] Student [William Sealy Gosset] (March 1908). "The probable error of a mean" (http://www.york.ac.uk/depts/maths/histstat/student.pdf).
Biometrika 6 (1): 1–25. doi:10.1093/biomet/6.1.1.
[6] Fisher, R. A. (1925). "Applications of "Student's" distribution" (http://digital.library.adelaide.edu.au/coll/special/fisher/43.pdf). Metron
5: 90–104.
[7] Walpole, Ronald; Myers, Raymond; Ye, Keying. Probability and Statistics for Engineers and Scientists. Pearson Education, 2002, 7th
edition, pg. 237
[8] Johnson, N.L., Kotz, S., Balakrishnan, N. (1995) Continuous Univariate Distributions, Volume 2, 2nd Edition. Wiley, ISBN 0-471-58494-0
(Chapter 28)
[9] Lange, Kenneth L.; Little, Roderick J.A.; Taylor, Jeremy M.G. (1989). "Robust statistical modeling using the t-distribution"
(http://www.jstor.org/stable/2290063). JASA 84 (408): 881–896.
[10] http://mars.wiwi.hu-berlin.de/mediawiki/slides/index.php/Comparison_of_noncentral_and_central_distributions
[11] http://www.math.sfu.ca/~cbm/aands/page_948.htm
[12] http://www.nr.com/
[13] http://www.nrbook.com/a/bookcpdf/c6-4.pdf
[14] http://www.stat.columbia.edu/~gelman/book/
[15] http://jeff560.tripod.com/s.html

Uniform distribution (continuous)


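Uniform

[Figure: probability density function, using the maximum convention at the transition points]

[Figure: cumulative distribution function]

parameters: $-\infty < a < b < \infty$
support: $a \le x \le b$
pdf: $\frac{1}{b-a}$ for $a \le x \le b$; $0$ otherwise
cdf: $0$ for $x < a$; $\frac{x-a}{b-a}$ for $a \le x < b$; $1$ for $x \ge b$
mean: $\frac{a+b}{2}$
median: $\frac{a+b}{2}$
mode: any value in $[a, b]$
variance: $\frac{(b-a)^2}{12}$
skewness: 0
ex.kurtosis: $-\frac{6}{5}$
entropy: $\ln(b-a)$
mgf: $\frac{e^{tb}-e^{ta}}{t(b-a)}$
cf: $\frac{e^{itb}-e^{ita}}{it(b-a)}$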

In probability theory and statistics, the continuous uniform distribution is a family of probability distributions such
that for each member of the family, all intervals of the same length on the distribution's support are equally probable.
The support is defined by the two parameters, a and b, which are its minimum and maximum values. The
distribution is often abbreviated U(a,b).

Characterization

Probability density function


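The probability density function of the continuous uniform distribution is:

$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b, \\ 0 & \text{for } x < a \text{ or } x > b. \end{cases}$$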

The values at the two boundaries a and b are usually unimportant because they do not alter the values of the integrals
of f(x) dx over any interval, nor of x f(x) dx or any higher moment. Sometimes they are chosen to be zero, and
sometimes chosen to be 1/(b − a). The latter is appropriate in the context of estimation by the method of maximum
likelihood. In the context of Fourier analysis, one may take the value of f(a) or f(b) to be 1/(2(b − a)), since then the
inverse transform of many integral transforms of this uniform function will yield back the function itself, rather than
a function which is equal "almost everywhere", i.e. except on a set of points with zero measure. Also, it is consistent
with the sign function which has no such ambiguity.
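In terms of mean μ and variance σ², the probability density may be written as:

$$f(x) = \begin{cases} \dfrac{1}{2\sigma\sqrt{3}} & \text{for } -\sigma\sqrt{3} \le x-\mu \le \sigma\sqrt{3}, \\ 0 & \text{otherwise.} \end{cases}$$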

Cumulative distribution function


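The cumulative distribution function is:

$$F(x) = \begin{cases} 0 & \text{for } x < a, \\ \dfrac{x-a}{b-a} & \text{for } a \le x < b, \\ 1 & \text{for } x \ge b. \end{cases}$$

Its inverse is:

$$F^{-1}(p) = a + p(b-a) \quad \text{for } 0 < p < 1.$$

In mean and variance notation, the cumulative distribution function is:

$$F(x) = \begin{cases} 0 & \text{for } x-\mu < -\sigma\sqrt{3}, \\ \dfrac{1}{2}\left(\dfrac{x-\mu}{\sigma\sqrt{3}} + 1\right) & \text{for } -\sigma\sqrt{3} \le x-\mu < \sigma\sqrt{3}, \\ 1 & \text{for } x-\mu \ge \sigma\sqrt{3}, \end{cases}$$

and the inverse is:

$$F^{-1}(p) = \sigma\sqrt{3}\,(2p-1) + \mu \quad \text{for } 0 \le p \le 1.$$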

Generating functions

Moment-generating function
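The moment-generating function is

$$M_x = E(e^{tx}) = \frac{e^{tb}-e^{ta}}{t(b-a)}$$

from which we may calculate the raw moments $m_k$

$$m_1 = \frac{a+b}{2}, \qquad m_2 = \frac{a^2+ab+b^2}{3}, \qquad m_k = \frac{1}{k+1}\sum_{i=0}^{k} a^i b^{k-i}.$$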



For a random variable following this distribution, the expected value is then $m_1 = (a + b)/2$ and the variance is
$m_2 - m_1^2 = (b - a)^2/12$.
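As a minimal sketch, the standard closed forms for U(a, b) can be written as small functions and checked against each other. The function names and the values a = 2, b = 5 are illustrative assumptions.

```python
def uniform_pdf(x, a, b):
    """Density of U(a, b), taking the value 1/(b - a) at both endpoints."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Cumulative distribution function of U(a, b)."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_quantile(p, a, b):
    """Inverse CDF of U(a, b) for 0 <= p <= 1."""
    return a + p * (b - a)

def uniform_raw_moment(k, a, b):
    """k-th raw moment: (1 / (k + 1)) * sum over i of a^i * b^(k - i)."""
    return sum(a**i * b**(k - i) for i in range(k + 1)) / (k + 1)

a, b = 2.0, 5.0  # hypothetical endpoints
mean = uniform_raw_moment(1, a, b)
variance = uniform_raw_moment(2, a, b) - mean**2
print(mean, variance, (b - a) ** 2 / 12)
```

The variance obtained from the raw moments should agree with (b − a)²/12, and applying the CDF to a quantile should return the original probability.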

Cumulant-generating function
For n ≥ 2, the nth cumulant of the uniform distribution on the interval [0, 1] is $b_n/n$, where $b_n$ is the nth Bernoulli
number.

Properties

Generalization to Borel sets


This distribution can be generalized to more complicated sets than intervals. If S is a Borel set of positive, finite
measure, the uniform probability distribution on S can be specified by defining the pdf to be zero outside S and
constantly equal to 1/K on S, where K is the Lebesgue measure of S.

Order statistics
Let $X_1, \ldots, X_n$ be an i.i.d. sample from U(0,1). Let $X_{(k)}$ be the kth order statistic from this sample. Then the probability
distribution of $X_{(k)}$ is a Beta distribution with parameters k and n − k + 1. The expected value is

$$E(X_{(k)}) = \frac{k}{n+1}.$$

This fact is useful when making Q-Q plots.

The variances are

$$\operatorname{Var}(X_{(k)}) = \frac{k(n-k+1)}{(n+1)^2(n+2)}.$$

Uniformity
The probability that a uniformly distributed random variable falls within any interval of fixed length is independent
of the location of the interval itself (but it is dependent on the interval size), so long as the interval is contained in the
distribution's support.
To see this, if X ~ U(0,b) and [x, x+d] is a subinterval of [0,b] with fixed d > 0, then

$$P\left(X \in [x, x+d]\right) = \int_x^{x+d} \frac{dy}{b} = \frac{d}{b},$$

which is independent of x. This fact motivates the distribution's name.



Standard uniform
Restricting a = 0 and b = 1, the resulting distribution U(0,1) is called a standard uniform distribution.
One interesting property of the standard uniform distribution is that if u1 has a standard uniform distribution, then so
does 1-u1. This property can be used for generating antithetic variates, among other things.

Related distributions
• If X has a standard uniform distribution, then by the inverse transform sampling method, Y = − ln(X) / λ has an
exponential distribution with (rate) parameter λ.
• If X has a standard uniform distribution, then $Y = 1 - X^{1/n}$ has a beta distribution with parameters 1 and n. (Note this
implies that the standard uniform distribution is a special case of the beta distribution, with parameters 1 and 1.)
• The Irwin–Hall distribution is the sum of n i.i.d. U(0,1) distributions.
• The sum of two independent, equally distributed, uniform distributions yields a symmetric triangular distribution.
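The first two relationships in the list above can be illustrated with a small simulation. This is a sketch only: the seed, the sample size, the rate 2.0 and the value n = 4 are arbitrary assumptions, and the check is limited to comparing sample means with the theoretical means 1/λ and 1/(1 + n).

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
N_SAMPLES = 100_000

def standard_uniform():
    """Draw from (0, 1]; 1 - U is again standard uniform and avoids log(0)."""
    return 1.0 - rng.random()

rate = 2.0  # hypothetical rate parameter (lambda)
n = 4       # hypothetical second parameter of the beta distribution

# Y = -ln(X) / lambda is exponential with rate lambda.
exp_draws = [-math.log(standard_uniform()) / rate for _ in range(N_SAMPLES)]
# Y = 1 - X**(1/n) is Beta(1, n).
beta_draws = [1.0 - standard_uniform() ** (1.0 / n) for _ in range(N_SAMPLES)]

exp_mean = sum(exp_draws) / N_SAMPLES    # theory: 1 / rate
beta_mean = sum(beta_draws) / N_SAMPLES  # theory: 1 / (1 + n)
print(exp_mean, beta_mean)
```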

Relationship to other functions


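As long as the same conventions are followed at the transition points, the probability density function may also be
expressed in terms of the Heaviside step function:

$$f(x) = \frac{H(x-a) - H(x-b)}{b-a},$$

or in terms of the rectangle function

$$f(x) = \frac{1}{b-a}\,\operatorname{rect}\left(\frac{x - \frac{a+b}{2}}{b-a}\right).$$

There is no ambiguity at the transition point of the sign function. Using the half-maximum convention at the
transition points, the uniform distribution may be expressed in terms of the sign function as:

$$f(x) = \frac{\operatorname{sgn}(x-a) - \operatorname{sgn}(x-b)}{2(b-a)}.$$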

Applications
In statistics, when a p-value is used as a test statistic for a simple null hypothesis, and the distribution of the test
statistic is continuous, then the test statistic (p-value) is uniformly distributed between 0 and 1 if the null hypothesis
is true.

Sampling from a uniform distribution


There are many applications in which it is useful to run simulation experiments. Many programming languages have
the ability to generate pseudo-random numbers which are effectively distributed according to the standard uniform
distribution.
If u is a value sampled from the standard uniform distribution, then the value a + (b − a)u follows the uniform
distribution parametrised by a and b, as described above.
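A minimal sketch of this rescaling, assuming Python's standard `random` module as the source of standard uniform values; the endpoints −3 and 7 and the seed are arbitrary.

```python
import random

def sample_uniform(a, b, rng):
    """Rescale one standard uniform draw u in [0, 1) to a + (b - a) * u."""
    u = rng.random()
    return a + (b - a) * u

rng = random.Random(1)
draws = [sample_uniform(-3.0, 7.0, rng) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)  # theory: (a + b) / 2 = 2.0
print(min(draws), max(draws), sample_mean)
```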

Sampling from an arbitrary distribution


The uniform distribution is useful for sampling from arbitrary distributions. A general method is the inverse
transform sampling method, which uses the cumulative distribution function (CDF) of the target random variable.
This method is very useful in theoretical work. Since simulations using this method require inverting the CDF of the
target variable, alternative methods have been devised for the cases where the cdf is not known in closed form. One
such method is rejection sampling.
The normal distribution is an important example where the inverse transform method is not efficient. However, there
is an exact method, the Box-Muller transformation, which uses the inverse transform to convert two independent
uniform random variables into two independent normally distributed random variables.
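The basic (trigonometric) form of the Box-Muller transformation can be sketched as follows. The seed and sample size are arbitrary assumptions, and the check is limited to the sample mean and variance being close to 0 and 1.

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent uniforms (u1 in (0, 1]) to two independent N(0, 1) values."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(2)
normals = []
for _ in range(100_000):
    # 1 - random() lies in (0, 1], which keeps log(u1) finite.
    z0, z1 = box_muller(1.0 - rng.random(), rng.random())
    normals.extend((z0, z1))

sample_mean = sum(normals) / len(normals)
sample_var = sum((z - sample_mean) ** 2 for z in normals) / len(normals)
print(sample_mean, sample_var)
```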

Estimation

Estimation of maximum
Given a uniform distribution on [0, N] with unknown N, the UMVU estimator for the maximum is given by

$$\hat{N} = \frac{k+1}{k}\,m = m + \frac{m}{k}$$

where m is the sample maximum and k is the sample size, sampling without replacement (though this distinction
almost surely makes no difference for a continuous distribution). This follows for the same reasons as estimation for
the discrete distribution, and can be seen as a very simple case of maximum spacing estimation. This problem is
commonly known as the German tank problem, due to application of maximum estimation to estimates of German
tank production during World War II.
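A small simulation illustrates why the sample maximum is scaled by (k + 1)/k. This is a sketch under assumed values (true maximum 100, sample size 5, fixed seed); it only compares the averages of the raw maximum and of the corrected estimator.

```python
import random

def umvu_max(sample):
    """Estimate the upper endpoint N of U(0, N) as (k + 1) / k * max(sample)."""
    k = len(sample)
    return (k + 1) / k * max(sample)

rng = random.Random(3)
N_TRUE, K, TRIALS = 100.0, 5, 20_000  # hypothetical settings

estimate_sum = 0.0
maximum_sum = 0.0
for _ in range(TRIALS):
    sample = [rng.uniform(0.0, N_TRUE) for _ in range(K)]
    estimate_sum += umvu_max(sample)
    maximum_sum += max(sample)

avg_estimate = estimate_sum / TRIALS  # close to N_TRUE (unbiased)
avg_maximum = maximum_sum / TRIALS    # close to N_TRUE * K / (K + 1) (biased low)
print(avg_estimate, avg_maximum)
```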

Estimation of midpoint
The midpoint of the distribution (a + b) / 2 is both the mean and the median of the uniform distribution. Although
both the sample mean and the sample median are unbiased estimators of the midpoint, neither is as efficient as the
sample mid-range, i.e. the arithmetic mean of the sample maximum and the sample minimum, which is the UMVU
estimator of the midpoint (and also the maximum likelihood estimate).
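The efficiency claim can be illustrated by simulation. The following sketch, with an arbitrary seed, sample size 20 and 5000 repetitions, compares the empirical variances of the three estimators of the midpoint of U(0, 1).

```python
import random
import statistics

rng = random.Random(4)
N_OBS, TRIALS = 20, 5_000  # hypothetical settings

means, medians, midranges = [], [], []
for _ in range(TRIALS):
    xs = [rng.random() for _ in range(N_OBS)]
    means.append(statistics.mean(xs))
    medians.append(statistics.median(xs))
    midranges.append((min(xs) + max(xs)) / 2.0)

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
var_midrange = statistics.pvariance(midranges)
print(var_midrange, var_mean, var_median)
```

With these settings the mid-range should show the smallest variance and the median the largest.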

See also
• Beta distribution
• Box-Muller transform
• Probability plot
• Q-Q plot
• Random number
• Uniform distribution (discrete)

Weibull distribution
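Weibull (2-Parameter)

[Figure: probability density function]

[Figure: cumulative distribution function]

parameters: $\lambda > 0$ scale (real)
$k > 0$ shape (real)
support: $x \in [0, +\infty)$
pdf: $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}$ for $x \ge 0$; $0$ for $x < 0$
cdf: $1 - e^{-(x/\lambda)^k}$
mean: $\lambda\,\Gamma\left(1 + \frac{1}{k}\right)$
median: $\lambda (\ln 2)^{1/k}$
mode: $\lambda\left(\frac{k-1}{k}\right)^{1/k}$ if $k > 1$
variance: $\lambda^2\,\Gamma\left(1 + \frac{2}{k}\right) - \mu^2$
skewness: $\frac{\Gamma(1 + 3/k)\lambda^3 - 3\mu\sigma^2 - \mu^3}{\sigma^3}$
ex.kurtosis: (see text)
entropy: $\gamma\left(1 - \frac{1}{k}\right) + \ln\left(\frac{\lambda}{k}\right) + 1$
mgf: $\sum_{n=0}^{\infty} \frac{t^n \lambda^n}{n!}\,\Gamma\left(1 + \frac{n}{k}\right),\ k \ge 1$
cf: $\sum_{n=0}^{\infty} \frac{(it)^n \lambda^n}{n!}\,\Gamma\left(1 + \frac{n}{k}\right)$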

In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It is named
after Waloddi Weibull who described it in detail in 1951, although it was first identified by Fréchet (1927) and first
applied by Rosin & Rammler (1933) to describe the size distribution of particles.

Definition
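The probability density function of a Weibull random variable X is[1]:

$$f(x;\lambda,k) = \begin{cases} \dfrac{k}{\lambda}\left(\dfrac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k} & x \ge 0, \\ 0 & x < 0, \end{cases}$$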

where k > 0 is the shape parameter and λ >0 is the scale parameter of the distribution. Its complementary
cumulative distribution function is a stretched exponential function. The Weibull distribution is related to a number
of other probability distributions; in particular, it interpolates between the exponential distribution (k = 1) and the
Rayleigh distribution (k = 2).
If the quantity X is a "time-to-failure", the Weibull distribution gives a distribution for which the failure rate is
proportional to a power of time. The shape parameter, k, is that power plus one, and so this parameter can be
interpreted directly as follows:
• A value of k<1 indicates that the failure rate decreases over time. This happens if there is significant "infant
mortality", or defective items failing early and the failure rate decreasing over time as the defective items are
weeded out of the population.
• A value of k=1 indicates that the failure rate is constant over time. This might suggest random external events are
causing mortality, or failure.
• A value of k>1 indicates that the failure rate increases with time. This happens if there is an "aging" process, or
parts that are more likely to fail as time goes on.
In the field of materials science, the shape parameter k of a distribution of strengths is known as the Weibull
modulus.
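The three cases listed above can be seen numerically from the standard Weibull failure rate h(x) = (k/λ)(x/λ)^(k−1). A minimal sketch, with λ = 1 and the evaluation points chosen arbitrarily:

```python
def weibull_hazard(x, k, lam):
    """Failure rate h(x) = (k / lam) * (x / lam)**(k - 1) for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1)

for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(x, k, lam=1.0) for x in (0.5, 1.0, 2.0)]
    print(f"k = {k}: hazard at x = 0.5, 1, 2 ->", rates)
```

For k = 0.5 the printed rates fall with x, for k = 1 they are constant, and for k = 2 they rise.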

Properties
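The cumulative distribution function for the Weibull distribution is

$$F(x;k,\lambda) = 1 - e^{-(x/\lambda)^k}$$

for x ≥ 0, and F(x; k, λ) = 0 for x < 0.

The failure rate h (or hazard rate) is given by

$$h(x;k,\lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}.$$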

Moments
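The moment generating function of the logarithm of a Weibull distributed random variable is given by[2]

$$E\left[e^{t\log X}\right] = \lambda^t\,\Gamma\left(\frac{t}{k} + 1\right)$$

where Γ is the gamma function. Similarly, the characteristic function of log X is given by

$$E\left[e^{it\log X}\right] = \lambda^{it}\,\Gamma\left(\frac{it}{k} + 1\right).$$

In particular, the nth raw moment of X is given by:

$$m_n = \lambda^n\,\Gamma\left(1 + \frac{n}{k}\right).$$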



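The mean and variance of a Weibull random variable can be expressed as:

$$E(X) = \lambda\,\Gamma\left(1 + \frac{1}{k}\right)$$

and

$$\operatorname{var}(X) = \lambda^2\left[\Gamma\left(1 + \frac{2}{k}\right) - \Gamma^2\left(1 + \frac{1}{k}\right)\right].$$

The skewness is given by:

$$\gamma_1 = \frac{\Gamma\left(1 + \frac{3}{k}\right)\lambda^3 - 3\mu\sigma^2 - \mu^3}{\sigma^3}.$$

The excess kurtosis is given by:

$$\gamma_2 = \frac{-6\Gamma_1^4 + 12\Gamma_1^2\Gamma_2 - 3\Gamma_2^2 - 4\Gamma_1\Gamma_3 + \Gamma_4}{\left[\Gamma_2 - \Gamma_1^2\right]^2}$$

where $\Gamma_i = \Gamma(1 + i/k)$. The kurtosis excess may also be written as:

$$\gamma_2 = \frac{\lambda^4\,\Gamma\left(1 + \frac{4}{k}\right) - 4\gamma_1\sigma^3\mu - 6\mu^2\sigma^2 - \mu^4}{\sigma^4} - 3.$$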

Moment generating function


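A variety of expressions are available for the moment generating function of X itself. As a power series, since the
raw moments are already known, one has

$$E\left[e^{tX}\right] = \sum_{n=0}^{\infty} \frac{t^n \lambda^n}{n!}\,\Gamma\left(1 + \frac{n}{k}\right).$$

Alternatively, one can attempt to deal directly with the integral

$$E\left[e^{tX}\right] = \int_0^{\infty} e^{tx}\,\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}\,dx.$$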

If the parameter k is assumed to be a rational number, expressed as k = p/q where p and q are integers, then this
integral can be evaluated analytically.[3] With t replaced by −t, one finds
where G is the Meijer G-function.
The characteristic function has also been obtained by Muraleedharan et al. (2007).

Information entropy
The information entropy is given by

$$H = \gamma\left(1 - \frac{1}{k}\right) + \ln\left(\frac{\lambda}{k}\right) + 1$$

where γ is the Euler–Mascheroni constant.



Related distributions
The translated Weibull distribution contains an additional parameter, and is also often found in the literature.[2] It has
the probability density function

$$f(x;k,\lambda,\theta) = \frac{k}{\lambda}\left(\frac{x-\theta}{\lambda}\right)^{k-1} e^{-\left(\frac{x-\theta}{\lambda}\right)^k}$$

for x ≥ θ and f(x; k, λ, θ) = 0 for x < θ, where k > 0 is the shape parameter, λ > 0 is the scale parameter and
θ is the location parameter of the distribution. When θ = 0, this reduces to the 2-parameter distribution.
The Weibull distribution can be characterized as the distribution of a random variable X such that the random
variable

$$Y = \left(\frac{X}{\lambda}\right)^k$$

has the standard exponential distribution with intensity 1.[2] The Weibull distribution interpolates between the
exponential distribution with intensity 1/λ when k = 1 and a Rayleigh distribution of mode $\sigma = \lambda/\sqrt{2}$ when k = 2.
The density function of the Weibull distribution changes character radically as k varies between 0 and 3, particularly
in terms of its behaviour near x = 0. For k < 1 the density approaches ∞ as x nears zero and the density is J-shaped. For
k = 1 the density has a finite positive value at x = 0. For 1 < k < 2 the density is zero at x = 0, has an infinite slope at
x = 0 and is unimodal. For k = 2 the density has a finite positive slope at x = 0. For k > 2 the density is zero and has a zero
slope at x = 0 and the density is unimodal. As k goes to infinity, the Weibull distribution converges to a Dirac delta
distribution centred at x = λ.
The Weibull distribution can also be characterized in terms of a uniform distribution: if X is uniformly distributed on
(0,1), then the random variable $Y = \lambda(-\ln X)^{1/k}$ is Weibull distributed with parameters k and λ. This leads to an
easily implemented numerical scheme for simulating a Weibull distribution.
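Such a scheme can be sketched as below, assuming the transformation Y = λ(−ln X)^(1/k) for a standard uniform X. The seed, the parameters k = 1.5 and λ = 2, and the helper names are illustrative; the sample mean and variance are compared with the closed forms λΓ(1 + 1/k) and λ²[Γ(1 + 2/k) − Γ²(1 + 1/k)].

```python
import math
import random

def weibull_sample(k, lam, rng):
    """Draw one Weibull(k, lam) value as lam * (-ln U)**(1/k)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return lam * (-math.log(u)) ** (1.0 / k)

def weibull_mean(k, lam):
    return lam * math.gamma(1.0 + 1.0 / k)

def weibull_variance(k, lam):
    return lam**2 * (math.gamma(1.0 + 2.0 / k) - math.gamma(1.0 + 1.0 / k) ** 2)

rng = random.Random(5)
k, lam = 1.5, 2.0  # hypothetical shape and scale
draws = [weibull_sample(k, lam, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / len(draws)
print(sample_mean, weibull_mean(k, lam))
print(sample_var, weibull_variance(k, lam))
```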
The Weibull distribution (usually sufficient in reliability engineering) is a special case of the three parameter
Exponentiated Weibull distribution where the additional exponent equals 1. The Exponentiated Weibull distribution
accommodates unimodal, bathtub-shaped[4] and monotone failure rates.
The Weibull distribution is a special case of the generalized extreme value distribution. It was in this connection that
the distribution was first identified by Maurice Fréchet in 1927. The closely related Fréchet distribution, named for
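this work, has the probability density function

$$f_{\mathrm{Frechet}}(x;k,\lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{-1-k} e^{-(x/\lambda)^{-k}}.$$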

The Weibull distribution can also be generalized to the 3 parameter exponentiated Weibull distribution. This models
the situation when the failure rate of a system is due to a combination of factors, and may increase for some times
and decrease for other times (see bathtub curve).

Uses
The Weibull distribution is used
• In survival analysis
• In reliability engineering and failure analysis
• In industrial engineering to represent manufacturing and delivery times
• In extreme value theory
• In weather forecasting
• To describe wind speed distributions, as the natural distribution often matches the Weibull shape
• In communications systems engineering
• In radar systems to model the dispersion of the received signal level produced by some types of clutter
• To model fading channels in wireless communications, as the Weibull fading model seems to exhibit good fit
to experimental fading channel measurements
• In general insurance to model the size of reinsurance claims, and the cumulative development of asbestosis
losses
• In forecasting technological change (also known as the Sharif-Islam model)
The 2-Parameter Weibull distribution is used to describe the particle size distribution of particles generated by
grinding, milling and crushing operations. The Rosin-Rammler distribution predicts fewer fine particles than the
Log-normal distribution. It is generally most accurate for narrow PSDs.
Using the cumulative distribution function:
• F(x; k; λ) is the mass fraction of particles with diameter < x
• λ is the mean particle size
• k is a measure of particle size spread

Bibliography
• Fréchet, Maurice (1927), "Sur la loi de probabilité de l'écart maximum", Annales de la Société Polonaise de
Mathematique, Cracovie 6: 93–116.
• Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1994), Continuous univariate distributions. Vol. 1, Wiley
Series in Probability and Mathematical Statistics: Applied Probability and Statistics (2nd ed.), New York: John
Wiley & Sons, MR1299979, ISBN 978-0-471-58495-7
• Muraleedharan, G.; Rao, A.G.; Kurup, P.G.; Nair, N. Unnikrishnan; Sinha, Mourani (2007), "Coastal
Engineering", Coastal Engineering 54 (8): 630–638, doi:10.1016/j.coastaleng.2007.05.001
• Rosin, P.; Rammler, E. (1933), "The Laws Governing the Fineness of Powdered Coal", Journal of the Institute of
Fuel 7: 29–36.
• Sagias, Nikos C.; Karagiannidis, George K. (2005), "Gaussian class multivariate Weibull distributions: theory and
applications in fading channels", Institute of Electrical and Electronics Engineers. Transactions on Information
Theory 51 (10): 3608–3619, doi:10.1109/TIT.2005.855598, MR2237527, ISSN 0018-9448
• Weibull, W. (1951), "A statistical distribution function of wide applicability", J. Appl. Mech.-Trans. ASME 18
(3): 293–297.
• "Engineering statistics handbook" [5]. National Institute of Standards and Technology. 2008.
• Nelson, Jr, Ralph (2008-02-05). "Dispersing Powders in Liquids, Part 1, Chap 6: Particle Volume Distribution" [6].
Retrieved 2008-02-05.

External links
• The Weibull plot. [7]
• Mathpages - Weibull Analysis [8]

References
[1] Papoulis, Pillai, "Probability, Random Variables, and Stochastic Processes", 4th Edition
[2] Johnson, Kotz & Balakrishnan 1994
[3] See (Cheng, Tellambura & Beaulieu 2004) for the case when k is an integer, and (Sagias & Karagiannidis 2005) for the rational case.
[4] "System evolution and reliability of systems" (http://www.sys-ev.com/reliability01.htm). Sysev (Belgium). 2010-01-01.
[5] http://www.itl.nist.gov/div898/handbook/eda/section3/eda3668.htm
[6] http://www.erpt.org/014Q/nelsa-06.htm
[7] http://www.itl.nist.gov/div898/handbook/eda/section3/weibplot.htm
[8] http://www.mathpages.com/home/kmath122/kmath122.htm

Discrete distributions

Bernoulli distribution
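Bernoulli

parameters: $0 < p < 1,\ p \in \mathbb{R}$
support: $k \in \{0, 1\}$
pmf: $q = 1 - p$ for $k = 0$; $p$ for $k = 1$
cdf: $0$ for $k < 0$; $q$ for $0 \le k < 1$; $1$ for $k \ge 1$
mean: $p$
median: N/A
mode: $0$ if $q > p$; $0$ and $1$ if $q = p$; $1$ if $q < p$
variance: $pq$
skewness: $\frac{q - p}{\sqrt{pq}}$
ex.kurtosis: $\frac{6p^2 - 6p + 1}{p(1-p)}$
entropy: $-q\ln(q) - p\ln(p)$
mgf: $q + pe^t$
cf: $q + pe^{it}$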

In probability theory and statistics, the Bernoulli distribution, named after Swiss scientist Jacob Bernoulli, is a
discrete probability distribution, which takes value 1 with success probability p and value 0 with failure probability
q = 1 − p. So if X is a random variable with this distribution, we have:

$$\Pr(X = 1) = 1 - \Pr(X = 0) = 1 - q = p.$$

The probability mass function f of this distribution is

$$f(k;p) = \begin{cases} p & \text{if } k = 1, \\ 1 - p & \text{if } k = 0, \\ 0 & \text{otherwise.} \end{cases}$$

This can also be expressed as

$$f(k;p) = p^k (1-p)^{1-k} \quad \text{for } k \in \{0, 1\}.$$

The expected value of a Bernoulli random variable X is E(X) = p, and its variance is

$$\operatorname{Var}(X) = p(1-p).$$

The excess kurtosis goes to infinity for high and low values of p, but for p = 1/2 the Bernoulli distribution has a lower
excess kurtosis than any other probability distribution, namely −2.

The Bernoulli distribution is a member of the exponential family.

Related distributions
• If $X_1, \ldots, X_n$ are independent, identically distributed (i.i.d.) random variables, all Bernoulli distributed with
success probability p, then $Y = \sum_{k=1}^{n} X_k \sim \mathrm{Binomial}(n, p)$ (binomial distribution). The Bernoulli
distribution is simply $\mathrm{Binomial}(1, p)$.
• The Categorical distribution is the generalization of the Bernoulli distribution for variables with any constant
number of discrete values.
• The Beta distribution is the conjugate prior of the Bernoulli distribution.
• The Geometric distribution is the number of Bernoulli trials needed to get one success.
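The first relationship in the list above can be illustrated by simulation: sums of i.i.d. Bernoulli draws are tallied and compared with the binomial probability mass function C(n, k) p^k (1 − p)^(n − k). This is a sketch with an arbitrary seed and arbitrary values n = 10 and p = 0.3.

```python
import math
import random

def bernoulli(p, rng):
    """Return 1 with probability p and 0 otherwise."""
    return 1 if rng.random() < p else 0

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

rng = random.Random(6)
n, p, TRIALS = 10, 0.3, 50_000  # hypothetical settings

counts = [0] * (n + 1)
for _ in range(TRIALS):
    y = sum(bernoulli(p, rng) for _ in range(n))
    counts[y] += 1

empirical = [c / TRIALS for c in counts]
max_gap = max(abs(empirical[k] - binomial_pmf(k, n, p)) for k in range(n + 1))
print(max_gap)
```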

References
• Weisstein, Eric W., "Bernoulli Distribution [1]" from MathWorld.

See also
• Bernoulli trial
• Bernoulli process
• Bernoulli sampling
• Binary entropy function
• Sample size

References
[1] http://mathworld.wolfram.com/BernoulliDistribution.html

Beta-binomial distribution
Probability mass function

Cumulative distribution function

parameters: n ∈ N0, number of trials
$\alpha > 0$ (real)
$\beta > 0$ (real)
support: k ∈ { 0, …, n }
pmf: $\binom{n}{k}\frac{B(k+\alpha,\, n-k+\beta)}{B(\alpha,\beta)}$

cdf:
where 3F2(a,b,k) is the Generalized Hypergeometric
Function
=3F2(1,α+k+1,-n+k+1;k+2,-β-n+k+2;1)
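mean: $\frac{n\alpha}{\alpha+\beta}$
median:
mode:
variance: $\frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}$
skewness: $\frac{(\alpha+\beta+2n)(\beta-\alpha)}{\alpha+\beta+2}\sqrt{\frac{1+\alpha+\beta}{n\alpha\beta(n+\alpha+\beta)}}$
ex.kurtosis: See text
entropy:
mgf: ${}_2F_1(-n, \alpha; \alpha+\beta; 1-e^{t})$
cf: ${}_2F_1(-n, \alpha; \alpha+\beta; 1-e^{it})$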

In probability theory and statistics, the beta-binomial distribution is a family of discrete probability distributions
arising when the probability of success in each of a fixed or known number of Bernoulli trials is either unknown or
random. It is frequently used in Bayesian statistics, empirical Bayes methods and classical statistics as an
overdispersed binomial distribution.
It reduces to the Bernoulli distribution as a special case when n = 1. For α = β = 1, it is the discrete uniform
distribution from 0 to n. It also approximates the binomial distribution arbitrarily well for large α and β. The
beta-binomial is a two-dimensional multivariate Polya distribution, as the binomial and beta distributions are special
cases of the multinomial and Dirichlet distributions, respectively.

Motivation and derivation

Beta-binomial distribution as a compound distribution


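The Beta distribution is a conjugate distribution of the binomial distribution. This fact leads to an analytically
tractable compound distribution where one can think of the p parameter in the binomial distribution as being
randomly drawn from a beta distribution. Namely, if

$$L(k|p) = \operatorname{Bin}(n,p) = \binom{n}{k} p^k (1-p)^{n-k}$$

is the binomial distribution where p is a random variable with a beta distribution

$$\pi(p|\alpha,\beta) = \mathrm{Beta}(\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}$$

then the compound distribution is given by

$$f(k|\alpha,\beta) = \int_0^1 L(k|p)\,\pi(p|\alpha,\beta)\,dp = \binom{n}{k}\frac{1}{B(\alpha,\beta)}\int_0^1 p^{k+\alpha-1}(1-p)^{n-k+\beta-1}\,dp = \binom{n}{k}\frac{B(k+\alpha,\, n-k+\beta)}{B(\alpha,\beta)}.$$

Using the properties of the beta function, this can alternatively be written

$$f(k|\alpha,\beta) = \frac{\Gamma(n+1)}{\Gamma(k+1)\Gamma(n-k+1)}\,\frac{\Gamma(\alpha+k)\Gamma(n+\beta-k)}{\Gamma(\alpha+\beta+n)}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}.$$
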
It is within this context that the beta-binomial distribution appears often in Bayesian statistics: the beta-binomial is
the predictive distribution of a binomial random variable with a beta distribution prior on the success probability.
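The compound probability mass function C(n, k) B(k + α, n − k + β) / B(α, β) can be evaluated stably through log-gamma functions. The sketch below uses hypothetical parameter values and checks three properties: the probabilities sum to 1, the mean equals nα/(α + β), and α = β = 1 gives the discrete uniform distribution on 0, …, n.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_pmf(k, n, alpha, beta):
    """P(K = k) = C(n, k) * B(k + alpha, n - k + beta) / B(alpha, beta)."""
    log_ratio = log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return math.comb(n, k) * math.exp(log_ratio)

n, alpha, beta = 12, 2.5, 4.0  # hypothetical parameters
pmf = [betabinom_pmf(k, n, alpha, beta) for k in range(n + 1)]
total = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
flat = [betabinom_pmf(k, 5, 1.0, 1.0) for k in range(6)]
print(total, mean, flat)
```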

Beta-binomial as an urn model


The beta-binomial distribution can also be motivated via an urn model for positive integer values of α and β.
Specifically, imagine an urn containing α red balls and β black balls, where random draws are made. If a red ball is
observed, then two red balls are returned to the urn. Likewise, if a black ball is drawn, it is replaced and another
black ball is added to the urn. If this is repeated n times, then the probability of observing k red balls follows a
beta-binomial distribution with parameters n, α and β.
Note that if the random draws are with simple replacement (no balls over and above the observed ball are added to
the urn), then the distribution follows a binomial distribution and if the random draws are made without replacement,
the distribution follows a hypergeometric distribution.
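The urn scheme can be simulated directly. In this sketch the urn starts with α = 2 red and β = 3 black balls and n = 5 draws are made; these values and the seed are arbitrary. The sample mean and variance are compared with the beta-binomial values nα/(α + β) = 2 and nαβ(α + β + n)/((α + β)²(α + β + 1)) = 2, the latter being larger than the binomial variance 1.2 for the same n and p = 0.4.

```python
import random

def polya_urn_reds(n, alpha, beta, rng):
    """Count red draws in n draws; each drawn ball is returned with one extra of its colour."""
    red, black = alpha, beta
    reds_seen = 0
    for _ in range(n):
        if rng.random() < red / (red + black):
            red += 1
            reds_seen += 1
        else:
            black += 1
    return reds_seen

rng = random.Random(7)
n, alpha, beta, TRIALS = 5, 2, 3, 50_000  # hypothetical settings
results = [polya_urn_reds(n, alpha, beta, rng) for _ in range(TRIALS)]

sample_mean = sum(results) / TRIALS
sample_var = sum((r - sample_mean) ** 2 for r in results) / TRIALS
print(sample_mean, sample_var)
```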

Moments and properties


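The first three raw moments are

$$\mu_1 = \frac{n\alpha}{\alpha+\beta}$$

$$\mu_2 = \frac{n\alpha\left[n(1+\alpha)+\beta\right]}{(\alpha+\beta)(1+\alpha+\beta)}$$

$$\mu_3 = \frac{n\alpha\left[n^2(1+\alpha)(2+\alpha) + 3n(1+\alpha)\beta + \beta(\beta-\alpha)\right]}{(\alpha+\beta)(1+\alpha+\beta)(2+\alpha+\beta)}$$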

and the kurtosis is

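Letting $\pi = \frac{\alpha}{\alpha+\beta}$ we note, suggestively, that the mean can be written as

$$\mu = \frac{n\alpha}{\alpha+\beta} = n\pi$$

and the variance as

$$\sigma^2 = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)} = n\pi(1-\pi)\,\frac{\alpha+\beta+n}{\alpha+\beta+1} = n\pi(1-\pi)\left[1+(n-1)\rho\right]$$

where $\rho = \frac{1}{\alpha+\beta+1}$ is the correlation between the n Bernoulli draws and is called the over-dispersion parameter.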

Point estimates

Method of moments
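The method of moments estimates can be gained by noting the first and second moments of the beta-binomial,
namely

$$\mu_1 = \frac{n\alpha}{\alpha+\beta}, \qquad \mu_2 = \frac{n\alpha\left[n(1+\alpha)+\beta\right]}{(\alpha+\beta)(1+\alpha+\beta)},$$

and setting these raw moments equal to the sample moments

$$\hat{\mu}_1 = m_1, \qquad \hat{\mu}_2 = m_2,$$

and solving for α and β we get

$$\hat{\alpha} = \frac{n m_1 - m_2}{n\left(\frac{m_2}{m_1} - m_1 - 1\right) + m_1}, \qquad \hat{\beta} = \frac{(n - m_1)\left(n - \frac{m_2}{m_1}\right)}{n\left(\frac{m_2}{m_1} - m_1 - 1\right) + m_1}.$$

Note that these estimates can be nonsensically negative, which is evidence that the data is either undispersed or
underdispersed relative to the binomial distribution. In this case, the binomial distribution and the hypergeometric
distribution are alternative candidates respectively.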

Maximum likelihood estimation


While closed-form maximum likelihood estimates are impractical, given that the pmf consists of common functions
(gamma function and/or Beta functions), they can be easily found via direct numerical optimization. Maximum
likelihood estimates from empirical data can be computed using general methods for fitting multinomial Polya
distributions, methods for which are described in (Minka 2003). The R package VGAM through the function vglm,
via maximum likelihood, facilitates the fitting of glm type models with responses distributed according to the
beta-binomial distribution. Note also that there is no requirement that n is fixed throughout the observations.

Example
The following data gives the number of male children among the first 12 children of family size 13 in 6115 families
taken from hospital records in 19th century Saxony (Sokal and Rohlf, p. 59 from Lindsey). The 13th child is ignored
to assuage the effect of families non-randomly stopping when a desired gender is reached.

Males 0 1 2 3 4 5 6 7 8 9 10 11 12

Families 3 24 104 286 670 1033 1343 1112 829 478 181 45 7

We note the first two sample moments are

$$m_1 = 6.23, \qquad m_2 = 42.31$$

and therefore the method of moments estimates are

$$\hat{\alpha} = 34.1350, \qquad \hat{\beta} = 31.6085.$$

The maximum likelihood estimates can be found numerically

and the maximized log-likelihood is

from which we find the AIC

The AIC for the competing Binomial model is AIC=25070.34 and thus we see that the beta-binomial model provides
a superior fit to the data i.e. there is evidence for overdispersion. Trivers and Willard posit a theoretical justification
for heterogeneity in gender-proneness among families (i.e. overdispersion).
The superior fit is evident especially among the tails

Males 0 1 2 3 4 5 6 7 8 9 10 11 12

Observed Families 3 24 104 286 670 1033 1343 1112 829 478 181 45 7

Predicted (Beta-Binomial) 2.3 22.6 104.8 310.9 655.7 1036.2 1257.9 1182.1 853.6 461.9 177.9 43.8 5.2

Predicted (Binomial p = 0.519215) 0.9 12.1 71.8 258.5 628.1 1085.2 1367.3 1265.6 854.2 410.0 132.8 26.1 2.3
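The sample moments and the method-of-moments estimates for this example can be recomputed from the table of observed families. The sketch below assumes the closed-form solution α = (n·m1 − m2) / (n(m2/m1 − m1 − 1) + m1) and β = (n − m1)(n − m2/m1) / (n(m2/m1 − m1 − 1) + m1); the variable names are illustrative.

```python
# Observed number of families with k male children among the first 12, k = 0..12.
families = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
n = 12

total = sum(families)
m1 = sum(k * f for k, f in enumerate(families)) / total      # first sample moment
m2 = sum(k * k * f for k, f in enumerate(families)) / total  # second sample moment

denominator = n * (m2 / m1 - m1 - 1) + m1
alpha_hat = (n * m1 - m2) / denominator
beta_hat = (n - m1) * (n - m2 / m1) / denominator

print(f"families = {total}, m1 = {m1:.2f}, m2 = {m2:.2f}")
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}")
```

The ratio m1/n also reproduces the success probability used for the competing binomial fit.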

Further Bayesian considerations


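It is convenient to reparameterize the distributions so that the expected mean of the prior is a single parameter: Let

$$\pi(\theta|\mu,M) = \mathrm{Beta}\left(M\mu,\, M(1-\mu)\right) = \frac{\Gamma(M)}{\Gamma(M\mu)\,\Gamma(M(1-\mu))}\,\theta^{M\mu-1}(1-\theta)^{M(1-\mu)-1}$$

where

$$\mu = \frac{\alpha}{\alpha+\beta}, \qquad M = \alpha+\beta$$

so that

$$E(\theta|\mu,M) = \mu, \qquad \operatorname{Var}(\theta|\mu,M) = \frac{\mu(1-\mu)}{M+1}.$$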

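The posterior distribution ρ(θ|k) is also a beta distribution:

$$\rho(\theta|k) \propto \ell(k|\theta)\,\pi(\theta|\mu,M) = \mathrm{Beta}\left(k+M\mu,\; n-k+M(1-\mu)\right) = \frac{\Gamma(M+n)}{\Gamma(M\mu+k)\,\Gamma(n-k+M(1-\mu))}\,\theta^{k+M\mu-1}(1-\theta)^{n-k+M(1-\mu)-1}$$

And

$$E(\theta|k) = \frac{k+M\mu}{n+M}$$

while the marginal distribution m(k|μ, M) is given by

$$m(k|\mu,M) = \int_0^1 \ell(k|\theta)\,\pi(\theta|\mu,M)\,d\theta = \frac{\Gamma(M)}{\Gamma(M\mu)\,\Gamma(M(1-\mu))}\binom{n}{k}\frac{\Gamma(k+M\mu)\,\Gamma(n-k+M(1-\mu))}{\Gamma(n+M)}.$$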


Because the marginal is a complex, non-linear function of Gamma and Digamma functions, it is quite difficult to
obtain a marginal maximum likelihood estimate (MMLE) for the mean and variance. Instead, we use the method of
iterated expectations to find the expected value of the marginal moments.
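Let us write our model as a two-stage compound sampling model. Let $k_i$ be the number of successes out of $n_i$ trials for
event i:

$$k_i \sim \operatorname{Bin}(n_i, \theta_i), \qquad \theta_i \sim \mathrm{Beta}\left(M\mu,\, M(1-\mu)\right) \ \text{i.i.d.}$$

We can find iterated moment estimates for the mean and variance using the moments for the distributions in the
two-stage model:

$$E\left(\frac{k}{n}\right) = E\left[E\left(\left.\frac{k}{n}\right|\theta\right)\right] = E(\theta) = \mu$$

$$\operatorname{var}\left(\frac{k}{n}\right) = E\left[\operatorname{var}\left(\left.\frac{k}{n}\right|\theta\right)\right] + \operatorname{var}\left[E\left(\left.\frac{k}{n}\right|\theta\right)\right] = E\left[\frac{1}{n}\theta(1-\theta)\right] + \operatorname{var}(\theta) = \frac{\mu(1-\mu)}{n}\left[1 + \frac{n-1}{M+1}\right]$$

(Here we have used the law of total expectation and the law of total variance.)
We want point estimates for μ and M. The estimated mean $\hat{\mu}$ is calculated from the sample

$$\hat{\mu} = \frac{\sum_{i=1}^{N} k_i}{\sum_{i=1}^{N} n_i}.$$

The estimate of the hyperparameter M is obtained using the moment estimates for the variance of the two-stage
model:

$$s^2 = \frac{1}{N}\sum_{i=1}^{N}\operatorname{var}\left(\frac{k_i}{n_i}\right) = \frac{1}{N}\sum_{i=1}^{N}\frac{\hat{\mu}(1-\hat{\mu})}{n_i}\left[1 + \frac{n_i-1}{\hat{M}+1}\right]$$

Solving:

$$\hat{M} = \frac{\hat{\mu}(1-\hat{\mu}) - s^2}{s^2 - \frac{\hat{\mu}(1-\hat{\mu})}{N}\sum_{i=1}^{N} 1/n_i}$$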

where

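Since we now have parameter point estimates, $\hat{\mu}$ and $\hat{M}$, for the underlying distribution, we would like to find a
point estimate $\tilde{\theta}_i$ for the probability of success for event i. This is the weighted average of the event
estimate $\hat{\theta}_i = k_i/n_i$ and $\hat{\mu}$. Given our point estimates for the prior, we may now plug in these values to find a point
estimate for the posterior

$$\tilde{\theta}_i = E(\theta|k_i) = \frac{k_i + \hat{M}\hat{\mu}}{n_i + \hat{M}} = \frac{\hat{M}}{n_i+\hat{M}}\,\hat{\mu} + \frac{n_i}{n_i+\hat{M}}\,\frac{k_i}{n_i}.$$

Shrinkage factors
We may write the posterior estimate as a weighted average:

$$\tilde{\theta}_i = \hat{B}_i\,\hat{\mu} + (1-\hat{B}_i)\,\hat{\theta}_i$$

where $\hat{B}_i = \frac{\hat{M}}{\hat{M}+n_i}$ is called the shrinkage factor.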
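A minimal sketch of the shrinkage calculation, assuming the shrinkage factor M/(M + n_i) and the posterior mean (k_i + Mμ)/(n_i + M). The prior estimates and the event counts below are made-up values for illustration only.

```python
def shrinkage_estimate(k_i, n_i, mu_hat, m_hat):
    """Return (posterior point estimate, shrinkage factor) for one event."""
    b_i = m_hat / (m_hat + n_i)
    theta_i = b_i * mu_hat + (1.0 - b_i) * (k_i / n_i)
    return theta_i, b_i

mu_hat, m_hat = 0.3, 20.0                  # hypothetical prior point estimates
events = [(1, 2), (30, 100), (450, 1000)]  # hypothetical (k_i, n_i) pairs

estimates = [shrinkage_estimate(k, n, mu_hat, m_hat) for k, n in events]
for (k, n), (theta, b) in zip(events, estimates):
    print(f"k = {k}, n = {n}: raw = {k / n:.3f}, shrunk = {theta:.3f}, B = {b:.3f}")
```

Events with few trials are pulled strongly towards the prior mean, while events with many trials stay close to their raw proportion.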

See also
• multivariate Polya distribution

References
• Minka, Thomas P. (2003). Estimating a Dirichlet distribution [1]. Microsoft Technical Report.

External links
• Empirical Bayes for Beta-Binomial model [2]
• Using the Beta-binomial distribution to assess performance of a biometric identification device [3]
• Extended Beta-Binomial Model for Demand Forecasting of Multiple Slow-Moving Items with Low Consumption
and Short Request History [4]
• Fastfit [5] contains Matlab code for fitting Beta-Binomial distributions (in the form of two-dimensional Polya
distributions) to data.

References
[1] http://research.microsoft.com/~minka/papers/dirichlet/
[2] http://www.cs.ubc.ca/~murphyk/Teaching/Stat406-Spring07/reading/ebHandout.pdf
[3] http://it.stlawu.edu/~msch/biometrics/papers.htm
[4] http://www.emse.fr/g2i/publications/rapports/RR_2005-500-012.pdf
[5] http://research.microsoft.com/~minka/software/fastfit/

Binomial distribution
Probability mass function

Cumulative distribution function

notation: B(n, p)
parameters: n ∈ N0 — number of trials
p ∈ [0,1] — success probability in each trial
support: k ∈ { 0, …, n }
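pmf: $\binom{n}{k} p^k (1-p)^{n-k}$
cdf: $I_{1-p}(n-k,\, 1+k)$
mean: np
median: ⌊np⌋ or ⌈np⌉
mode: ⌊(n + 1)p⌋ or ⌊(n + 1)p⌋ − 1
variance: np(1 − p)
skewness: $\frac{1-2p}{\sqrt{np(1-p)}}$
ex.kurtosis: $\frac{1-6p(1-p)}{np(1-p)}$
entropy: $\frac{1}{2}\log_2\big(2\pi e\, np(1-p)\big) + O\left(\frac{1}{n}\right)$
mgf: $(1 - p + pe^t)^n$
cf: $(1 - p + pe^{it})^n$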

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of
successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such
a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when n = 1, the binomial
distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of
statistical significance.

It is frequently used to model the number of successes in a sample of size n drawn from a population of size N. If
the sampling is carried out without replacement, the draws are not independent, and the resulting distribution is a
hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a
good approximation and is widely used.

Examples
An elementary example is this: roll a standard die ten times and count the number of fours. The distribution of this
random number is a binomial distribution with n = 10 and p = 1/6.
As another example, flip a coin three times and count the number of heads. The distribution of this random number
is a binomial distribution with n = 3 and p = 1/2.

Specification

Probability mass function


In general, if the random variable K follows the binomial distribution with parameters n and p, we write K ~ B(n, p).
The probability of getting exactly k successes in n trials is given by the probability mass function:

$$f(k; n, p) = \Pr(K = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

for k = 0, 1, 2, ..., n, where

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is the binomial coefficient (hence the name of the distribution) "n choose k", also denoted C(n, k), ${}_nC_k$, or ${}^nC_k$. The
formula can be understood as follows: we want k successes ($p^k$) and n − k failures ($(1-p)^{n-k}$). However, the k
successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a
sequence of n trials.
In creating reference tables for binomial distribution probability, usually the table is filled in up to n/2 values. This is
because for k > n/2, the probability can be calculated by its complement as

$$f(k; n, p) = f(n-k;\, n,\, 1-p).$$

So, one must look to a different k and a different p (the binomial is not symmetrical in general). However, its
behavior is not arbitrary. There is always an integer m that satisfies

$$(n+1)p - 1 < m \le (n+1)p.$$

As a function of k, the expression ƒ(k; n, p) is monotone increasing for k < m and monotone decreasing for k > m,
with the exception of the case where (n + 1)p is an integer. In this case, the maximum is attained at two values,
m = (n + 1)p and m − 1. m is known as the most probable (most likely) outcome of Bernoulli trials. Note that the
probability of it occurring can be fairly small.
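The probability mass function f(k; n, p) = C(n, k) p^k (1 − p)^(n − k) can be evaluated directly. The following is a minimal sketch rather than a reference implementation; the function name `binomial_pmf` is chosen here for illustration, and the parameters are those of the die example (n = 10, p = 1/6).

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Die example: number of fours in ten rolls of a standard die.
n, p = 10, 1 / 6
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(sum(probs))   # the pmf sums to 1 up to rounding error
print(probs[2])     # probability of exactly two fours

# Complement identity used when filling reference tables:
# f(k; n, p) = f(n - k; n, 1 - p)
print(binomial_pmf(7, n, p), binomial_pmf(n - 7, n, 1 - p))
```

Floating-point evaluation of this kind is adequate for moderate n; for very large n the individual factors overflow or underflow, and working with logarithms would be the safer choice.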

Cumulative distribution function


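The cumulative distribution function can be expressed as:

$$F(x; n, p) = \Pr(X \le x) = \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i} p^i (1-p)^{n-i},$$

where $\lfloor x \rfloor$ is the "floor" under x, i.e. the greatest integer less than or equal to x.
It can also be represented in terms of the regularized incomplete beta function, as follows:

$$F(k; n, p) = \Pr(X \le k) = I_{1-p}(n-k,\, k+1) = (n-k)\binom{n}{k}\int_0^{1-p} t^{n-k-1}(1-t)^k \, dt.$$

For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's
inequality yields the bound

$$F(k; n, p) \le \exp\left(-2\,\frac{(np-k)^2}{n}\right),$$

and Chernoff's inequality can be used to derive the bound

$$F(k; n, p) \le \exp\left(-\frac{1}{2p}\,\frac{(np-k)^2}{n}\right).$$

Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8:[1]

$$F\left(k; n, \tfrac{1}{2}\right) \ge \frac{1}{15}\exp\left(-\frac{16\,(n/2 - k)^2}{n}\right).$$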
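As a rough numerical illustration, the sketch below evaluates the cumulative distribution function by direct summation and compares the lower tail with the Hoeffding bound exp(−2(np − k)²/n) for k ≤ np. The function names and the choice n = 100, p = 1/2 are assumptions made for this example.

```python
from math import comb, exp, floor

def binomial_cdf(x: float, n: int, p: float) -> float:
    """Pr(X <= x) for X ~ B(n, p), by summing the pmf up to floor(x)."""
    top = min(floor(x), n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(top + 1))

def hoeffding_bound(k: int, n: int, p: float) -> float:
    """Upper bound on Pr(X <= k), valid for k <= n*p."""
    return exp(-2 * (n * p - k) ** 2 / n)

n, p = 100, 0.5
for k in (30, 40, 50):
    # The exact tail probability never exceeds the bound.
    print(k, binomial_cdf(k, n, p), hoeffding_bound(k, n, p))
```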

Mean and variance


If X ~ B(n, p) (that is, X is a binomially distributed random variable), then the expected value of X is

$$E[X] = np$$

and the variance is

$$\operatorname{Var}[X] = np(1-p).$$

This fact is easily proven as follows. Suppose first that we have a single Bernoulli trial. There are two possible
outcomes: 1 and 0, the first occurring with probability p and the second having probability 1 − p. The expected value
in this trial will be equal to μ = 1 · p + 0 · (1 − p) = p. The variance in this trial is calculated similarly:
σ² = (1 − p)² · p + (0 − p)² · (1 − p) = p(1 − p).
The generic binomial distribution is a sum of n independent Bernoulli trials. The mean and the variance of such
distributions are equal to the sums of means and variances of each individual trial:

$$\mu_n = \sum_{k=1}^n \mu = np, \qquad \sigma_n^2 = \sum_{k=1}^n \sigma^2 = np(1-p).$$

Mode and median


Usually the mode of a binomial B(n, p) distribution is equal to ⌊(n + 1)p⌋, where ⌊ ⌋ is the floor function. However,
when (n + 1)p is an integer and p is neither 0 nor 1, then the distribution has two modes: (n + 1)p and (n + 1)p − 1.
When p is equal to 0 or 1, the mode will be 0 and n respectively. These cases can be summarized as follows:

$$\text{mode} = \begin{cases} \lfloor (n+1)p \rfloor & \text{if } (n+1)p \text{ is 0 or a non-integer}, \\ (n+1)p \text{ and } (n+1)p - 1 & \text{if } (n+1)p \in \{1, \dots, n\}, \\ n & \text{if } (n+1)p = n+1. \end{cases}$$

In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique.
However, several special results have been established:
• If np is an integer, then the mean, median, and mode coincide.[2]
• Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.[3]
• A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.[4]
• The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or
|m − np| ≤ min{p, 1 − p} (except for the case when p = ½ and n is odd).[3] [4]
• When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial
distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
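The mode rule stated at the start of this section can be checked by brute force: compute the whole probability mass function and collect the values of k at which it is largest. This is only an illustrative sketch; `binomial_pmf` and `modes` are names introduced here, and the small relative tolerance is an assumption used to treat numerically tied values as equal.

```python
from math import comb, floor

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def modes(n: int, p: float) -> list[int]:
    """All k in {0, ..., n} at which the pmf attains its maximum."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    best = max(probs)
    return [k for k, q in enumerate(probs) if abs(q - best) <= 1e-12 * best]

# (n + 1)p = 11/6 is not an integer: a single mode at floor((n + 1)p).
print(modes(10, 1 / 6), floor(11 / 6))

# (n + 1)p = 2 is an integer: two modes, (n + 1)p - 1 and (n + 1)p.
print(modes(3, 0.5))
```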

Covariance between two binomials


If two binomially distributed random variables X and Y are observed together, estimating their covariance can be
useful. Using the definition of covariance, in the case n = 1 we have

$$\operatorname{Cov}(X, Y) = E(XY) - \mu_X \mu_Y.$$

The first term is non-zero only when both X and Y are one, and $\mu_X$ and $\mu_Y$ are equal to the two probabilities $p_X$ and $p_Y$. Defining
$p_B$ as the probability of both happening at the same time, this gives

$$\operatorname{Cov}(X, Y) = p_B - p_X p_Y,$$

and for n such trials, again due to independence,

$$\operatorname{Cov}(X, Y)_n = n\,(p_B - p_X p_Y).$$

If X and Y are the same variable, this reduces to the variance formula given above.

Algebraic derivations of mean and variance


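We derive these quantities from first principles. Certain particular sums occur in these two derivations. We rearrange
the sums and terms so that sums solely over complete binomial probability mass functions (pmf) arise, which are
always unity:

$$\sum_{k=0}^n \Pr(X = k) = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} = 1.$$

We apply the definition of the expected value of a discrete random variable to the binomial distribution:

$$E(X) = \sum_{k=0}^n k \Pr(X = k) = \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k}.$$

The first term of the series (with index k = 0) has value 0 since the first factor, k, is zero. It may thus be discarded,
i.e. we can change the lower limit to k = 1:

$$E(X) = \sum_{k=1}^n k\,\frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} = \sum_{k=1}^n k\,\frac{n\,(n-1)!}{k\,(k-1)!\,(n-k)!}\; p \cdot p^{k-1} (1-p)^{n-k}.$$

We've pulled factors of n and k out of the factorials, and one power of p has been split off. We are preparing to
redefine the indices:

$$E(X) = np \sum_{k=1}^n \frac{(n-1)!}{(k-1)!\,(n-k)!}\, p^{k-1} (1-p)^{n-k}.$$

We rename m = n − 1 and s = k − 1. The value of the sum is not changed by this, but it now becomes readily
recognizable:

$$E(X) = np \sum_{s=0}^m \frac{m!}{s!\,(m-s)!}\, p^s (1-p)^{m-s} = np \sum_{s=0}^m \binom{m}{s} p^s (1-p)^{m-s}.$$

The ensuing sum is a sum over a complete binomial pmf (of one order lower than the initial sum, as it happens).
Thus[5]

$$E(X) = np \cdot 1 = np.$$

Variance
It can be shown that the variance is equal to (see: Computational formula for the variance):

$$\operatorname{Var}(X) = E(X^2) - (E(X))^2.$$

In using this formula we see that we now also need the expected value of X²:

$$E(X^2) = \sum_{k=0}^n k^2 \Pr(X = k) = \sum_{k=0}^n k^2 \binom{n}{k} p^k (1-p)^{n-k}.$$

We can use our experience gained above in deriving the mean. We know how to process one factor of k. This gets us
as far as

$$E(X^2) = np \sum_{s=0}^m (s+1) \binom{m}{s} p^s (1-p)^{m-s}$$

(again, with m = n − 1 and s = k − 1). We split the sum into two separate sums and we recognize each one:

$$E(X^2) = np \left( \sum_{s=0}^m s \binom{m}{s} p^s (1-p)^{m-s} + \sum_{s=0}^m \binom{m}{s} p^s (1-p)^{m-s} \right).$$

The first sum is identical in form to the one we calculated in the derivation of the mean above. It sums to mp. The
second sum is unity, so

$$E(X^2) = np\,(mp + 1) = np\,((n-1)p + 1) = np\,(np - p + 1).$$

Using this result in the expression for the variance, along with the mean (E(X) = np), we get

$$\operatorname{Var}(X) = E(X^2) - (E(X))^2 = np\,(np - p + 1) - (np)^2 = np(1-p).$$

Using falling factorials to find E(X²)


We have

$$E(X^2) = E(X(X-1)) + E(X).$$

But

$$E(X(X-1)) = \sum_{k=0}^n k(k-1) \binom{n}{k} p^k (1-p)^{n-k} = n(n-1)p^2 \sum_{k=2}^n \binom{n-2}{k-2} p^{k-2} (1-p)^{n-k} = n(n-1)p^2.$$

So

$$E(X^2) = n(n-1)p^2 + np.$$

Thus

$$\operatorname{Var}(X) = n(n-1)p^2 + np - (np)^2 = np(1-p).$$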
Relationship to other distributions

Sums of binomials
If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables, then X + Y is again a binomial variable; its
distribution is

$$X + Y \sim B(n + m,\, p).$$
Bernoulli distribution
The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has
the same meaning as X ~ Bern(p). Conversely, any binomial distribution, B(n, p), is the sum of n independent
Bernoulli trials, Bern(p), each with the same probability p.
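The representation of B(n, p) as a sum of n independent Bernoulli trials gives a very simple, if inefficient, way to draw binomial random variates. The sketch below assumes Python's standard `random` module as the uniform source; `sample_binomial` is a name chosen for this example, and the specialised algorithms in the literature are much faster for large n.

```python
import random

def sample_binomial(n: int, p: float, rng: random.Random) -> int:
    """Draw one B(n, p) variate as a sum of n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)          # fixed seed so the run is repeatable
n, p = 10, 1 / 6
draws = [sample_binomial(n, p, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, n * p)              # sample mean against np
print(var, n * p * (1 - p))     # sample variance against np(1 - p)
```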

Poisson binomial distribution


The binomial distribution is a special case of the Poisson binomial distribution, which is a sum of n independent
non-identical Bernoulli trials Bern($p_i$). If X has the Poisson binomial distribution with $p_1 = \dots = p_n = p$, then
X ~ B(n, p).

Normal approximation
If n is large enough, then the skew of the distribution is not too great. In this case, if a suitable continuity
correction is used, then an excellent approximation to B(n, p) is given by the normal distribution

$$\mathcal{N}\big(np,\; np(1-p)\big).$$

Binomial PDF and normal approximation for n = 6 and p = 0.5

The approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1.[6]
Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of
zero or unity:
• One rule is that both x = np and n(1 − p) must be greater than 5. However, the specific number varies from source
to source, and depends on how good an approximation one wants; some sources give 10, which gives virtually the
same results as the following rule for large n until n is very large (e.g. x = 11, n = 7752).
• That rule[6] is that for n > 5 the normal approximation is adequate if

$$\left| \frac{1}{\sqrt{n}} \left( \sqrt{\frac{1-p}{p}} - \sqrt{\frac{p}{1-p}} \right) \right| < 0.3.$$

• Another commonly used rule holds that the normal approximation is appropriate only if everything within 3
standard deviations of its mean is within the range of possible values, that is if

$$\mu \pm 3\sigma = np \pm 3\sqrt{np(1-p)} \in [0, n].$$

• Also, as the approximation generally improves, it can be shown that the inflection points occur at

$$np \pm \sqrt{np(1-p)}.$$
The following is an example of applying a continuity correction: Suppose one wishes to calculate Pr(X ≤ 8) for a
binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is
approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation
gives considerably less accurate results.
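The effect of the continuity correction can be seen numerically. The sketch below compares the exact value of Pr(X ≤ 8) with the corrected and uncorrected normal approximations. The text does not fix n and p for this example, so n = 20 and p = 0.5 are assumptions made purely for illustration, as are the function names.

```python
from math import comb, erf, sqrt

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 20, 0.5                      # assumed values, not from the text
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = binomial_cdf(8, n, p)
corrected = normal_cdf(8.5, mu, sigma)    # with continuity correction
uncorrected = normal_cdf(8.0, mu, sigma)  # without it

print(exact, corrected, uncorrected)
```

With these parameters the corrected value lies much closer to the exact probability than the uncorrected one.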
This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver (exact calculations with large n are
very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book
The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem since
B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis
of a hypothesis test, a "proportion z-test," for the value of p using x/n, the sample proportion and estimator of p, in a
common test statistic.[7]
For example, suppose you randomly sample n people out of a large population and ask them whether they agree with
a certain statement. The proportion of people who agree will of course depend on the sample. If you sampled groups
of n people repeatedly and truly randomly, the proportions would follow an approximate normal distribution with
mean equal to the true proportion p of agreement in the population and with standard deviation σ = √(p(1 − p)/n).
Large sample sizes n are good because the standard deviation, as a proportion of the expected value, gets smaller,
which allows a more precise estimate of the unknown parameter p.

Poisson approximation
The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while
the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an
approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According
to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.[8]
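A quick way to judge the quality of the Poisson approximation is to compare the two probability mass functions term by term. In the sketch below the parameters n = 100 and p = 0.03 are assumptions chosen to satisfy the second rule of thumb (n ≥ 100 and np ≤ 10); the function names are likewise illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

n, p = 100, 0.03        # assumed example values
lam = n * p             # Poisson parameter lambda = np

# Largest pointwise difference between the two pmfs over the binomial support.
max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))
print(max_diff)

for k in range(6):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))
```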

Limits
• As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0 or at least np approaches λ > 0, then the
Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.
• As n approaches ∞ while p remains fixed, the distribution of

$$\frac{X - np}{\sqrt{np(1-p)}}$$

approaches the normal distribution with expected value 0 and variance 1. This result is sometimes loosely
stated by saying that the distribution of X approaches the normal distribution with expected value np and
variance np(1 − p). That loose statement cannot be taken literally because the thing asserted to be approached
actually depends on the value of n, and n is approaching infinity. This result is a specific case of the central
limit theorem.

Generating binomial random variates


• Luc Devroye, Non-Uniform Random Variate Generation, New York: Springer-Verlag, 1986. See especially
Chapter X, Discrete Univariate Distributions [9].
• Kachitvichyanukul, V.; Schmeiser, B. W. (1988). "Binomial random variate generation". Communications of the
ACM 31: 216–222. doi:10.1145/42372.42381.

See also
• Bean machine / Galton box
• Beta distribution
• Binomial proportion confidence interval
• Hypergeometric distribution
• Logistic regression
• Multinomial distribution
• Negative binomial distribution
• Beta-binomial distribution

• Normal distribution
• Poisson distribution
• Sample size (estimating proportions)
• SOCR

External links
• Binomial Probabilities Simple Explanation [10]
• SOCR Binomial Distribution Applet [4]
• CAUSEweb.org [11] Many resources for teaching Statistics including Binomial Distribution
• "Binomial Distribution" [12] by Chris Boucher, Wolfram Demonstrations Project, 2007.
• Binomial Distribution [13] Properties and Java simulation from cut-the-knot
• Statistics Tutorial: Binomial Distribution [14]

References
[1] Matousek, J, Vondrak, J: The Probabilistic Method (lecture notes) (http://kam.mff.cuni.cz/~matousek/prob-ln.ps.gz).
[2] Neumann, P. (1966). "Über den Median der Binomial- and Poissonverteilung" (in German). Wissenschaftliche Zeitschrift der Technischen
Universität Dresden 19: 29–33.
[3] Kaas, R.; Buhrman, J.M. (1980). "Mean, Median and Mode in Binomial Distributions". Statistica Neerlandica 34 (1): 13–18.
doi:10.1111/j.1467-9574.1980.tb00681.x.
[4] Hamza, K. (1995). "The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson
distributions". Statistics & Probability Letters 23: 21–21. doi:10.1016/0167-7152(94)00090-U.
[5] Morse, Philip (1969). Thermal Physics. New York: W. A. Benjamin. ISBN 0805372024.
[6] Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
[7] NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?"
(http://www.itl.nist.gov/div898/handbook/prc/section2/prc24.htm), e-Handbook of Statistical Methods.
[8] NIST/SEMATECH, "6.3.3.1. Counts Control Charts" (http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm),
e-Handbook of Statistical Methods.
[9] http://cg.scs.carleton.ca/~luc/chapter_ten.pdf
[10] http://faculty.vassar.edu/lowry/binomialX.html
[11] http://www.causeweb.org
[12] http://demonstrations.wolfram.com/BinomialDistribution/
[13] http://www.cut-the-knot.org/Curriculum/Probability/BinomialDistribution.shtml
[14] http://stattrek.com/Lesson2/Binomial.aspx
Uniform distribution (discrete) 155

Uniform distribution (discrete)


discrete uniform

Probability mass function

n = 5 where n = b − a + 1
Cumulative distribution function

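parameters: a ∈ {…, −2, −1, 0, 1, 2, …}; b ≥ a; n = b − a + 1
support: k ∈ {a, a + 1, …, b − 1, b}
pmf: 1/n for a ≤ k ≤ b; 0 otherwise
cdf: 0 for k < a; (⌊k⌋ − a + 1)/n for a ≤ k ≤ b; 1 for k > b
mean: (a + b)/2
median: (a + b)/2
mode: N/A
variance: (n² − 1)/12
skewness: 0
ex.kurtosis: $-\frac{6(n^2+1)}{5(n^2-1)}$
entropy: ln(n)
mgf: $\frac{e^{at} - e^{(b+1)t}}{n(1 - e^t)}$
cf: $\frac{e^{iat} - e^{i(b+1)t}}{n(1 - e^{it})}$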

In probability theory and statistics, the discrete uniform distribution is a probability distribution whereby a finite
number of equally spaced values are equally likely to be observed; every one of n values has equal probability 1/n.
Another way of saying "discrete uniform distribution" would be "a known, finite number of equally spaced outcomes
equally likely to happen."
If a random variable has any of n possible values $k_1, k_2, \dots, k_n$ that are equally spaced and equally probable,
then it has a discrete uniform distribution. The probability of any outcome $k_i$ is 1/n. A simple example of the
discrete uniform distribution is throwing a fair die. The possible values of k are 1, 2, 3, 4, 5, 6; and each time the die
is thrown, the probability of a given score is 1/6. If two dice are thrown and their values added, the uniform
distribution no longer fits since the values from 2 to 12 do not have equal probabilities.
The cumulative distribution function (CDF) can be expressed in terms of a degenerate distribution as

$$F(k; a, b, n) = \frac{1}{n} \sum_{i=1}^n H(k - k_i),$$

where the Heaviside step function $H(x - x_0)$ is the CDF of the degenerate distribution centered at $x_0$, using the
convention that $H(0) = 1$.
Estimation of maximum
This example is described by saying that a sample of k observations is obtained from a uniform distribution on the
integers 1, 2, …, N, with the problem being to estimate the unknown maximum N. This problem is commonly
known as the German tank problem, following the application of maximum estimation to estimates of German tank
production during World War II.
The UMVU estimator for the maximum is given by

$$\hat{N} = \frac{k+1}{k}\, m - 1 = m + \frac{m}{k} - 1,$$

where m is the sample maximum and k is the sample size, sampling without replacement.[1] [2] This can be seen as a
very simple case of maximum spacing estimation.
The formula may be understood intuitively as:
"The sample maximum plus the average gap between observations in the sample",
the gap being added to compensate for the negative bias of the sample maximum as an estimator for the population
maximum.[3]
This has a variance of[1]

$$\frac{1}{k}\,\frac{(N-k)(N+1)}{k+2} \approx \frac{N^2}{k^2} \quad \text{for small samples } k \ll N,$$

so a standard deviation of approximately N/k, the (population) average size of a gap between samples; compare
m/k above.

The sample maximum is the maximum likelihood estimator for the population maximum, but, as discussed above, it
is biased.
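The bias of the sample maximum and the correction applied by the estimator m + m/k − 1 can be illustrated by simulation. This is a sketch under assumed values: the population maximum N = 1000, the sample size k = 20 and the number of repetitions are hypothetical, and `umvu_max_estimate` is a name introduced here.

```python
import random

def umvu_max_estimate(sample: list[int]) -> float:
    """Estimate the population maximum as m + m/k - 1."""
    m, k = max(sample), len(sample)
    return m + m / k - 1

rng = random.Random(1)          # fixed seed so the run is repeatable
N, k = 1000, 20                 # hypothetical population maximum and sample size

estimates, maxima = [], []
for _ in range(5000):
    sample = rng.sample(range(1, N + 1), k)   # sampling without replacement
    estimates.append(umvu_max_estimate(sample))
    maxima.append(max(sample))

mean_est = sum(estimates) / len(estimates)
mean_max = sum(maxima) / len(maxima)
print(mean_est)   # close to N: the estimator is unbiased
print(mean_max)   # systematically below N: the sample maximum is biased
```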
If samples are not numbered but are recognizable or markable, one can instead estimate population size via the
capture-recapture method.

Random permutation
See rencontres numbers for an account of the probability distribution of the number of fixed points of a uniformly
distributed random permutation.

See also
• Delta distribution
• Uniform distribution (continuous)

References
[1] Johnson, Roger (1994), "Estimating the Size of a Population", Teaching Statistics (http://www.rsscse.org.uk/ts/index.htm) 16 (2
(Summer)): 50, doi:10.1111/j.1467-9639.1994.tb00688.x
[2] Johnson, Roger (2006), "Estimating the Size of a Population" (http://www.rsscse.org.uk/ts/gtb/johnson.pdf), Getting the Best from
Teaching Statistics (http://www.rsscse.org.uk/ts/gtb/contents.html),
[3] The sample maximum is never more than the population maximum, but can be less, hence it is a biased estimator: it will tend to
underestimate the population maximum.

Geometric distribution
In probability theory and statistics, the geometric distribution is either of two discrete probability distributions:
• The probability distribution of the number X of Bernoulli trials needed to get one success, supported on the
set { 1, 2, 3, ...}
• The probability distribution of the number Y = X − 1 of failures before the first success, supported on the
set { 0, 1, 2, 3, ... }
Which of these one calls "the" geometric distribution is a matter of convention and convenience.

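Geometric

Probability mass function

Cumulative distribution function

In each row, the first entry refers to the number of trials X and the second to the number of failures Y = X − 1.

Parameters: 0 < p ≤ 1 success probability (real); 0 < p ≤ 1 success probability (real)
Support: k ∈ {1, 2, 3, …}; k ∈ {0, 1, 2, 3, …}
Probability mass function (pmf): $(1-p)^{k-1}\,p$; $(1-p)^k\,p$
Cumulative distribution function (cdf): $1 - (1-p)^k$; $1 - (1-p)^{k+1}$
Mean: $\frac{1}{p}$; $\frac{1-p}{p}$
Median: $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil$; $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1$ (not unique if $-1/\log_2(1-p)$ is an integer)
Mode: 1; 0
Variance: $\frac{1-p}{p^2}$; $\frac{1-p}{p^2}$
Skewness: $\frac{2-p}{\sqrt{1-p}}$; $\frac{2-p}{\sqrt{1-p}}$
Excess kurtosis: $6 + \frac{p^2}{1-p}$; $6 + \frac{p^2}{1-p}$
Entropy: $\frac{-(1-p)\log_2(1-p) - p\log_2 p}{p}$; $\frac{-(1-p)\log_2(1-p) - p\log_2 p}{p}$
Moment-generating function (mgf): $\frac{pe^t}{1-(1-p)e^t}$ for $t < -\ln(1-p)$; $\frac{p}{1-(1-p)e^t}$
Characteristic function: $\frac{pe^{it}}{1-(1-p)e^{it}}$; $\frac{p}{1-(1-p)e^{it}}$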

These two different geometric distributions should not be confused with each other. Often, the name shifted
geometric distribution is adopted for the former one (distribution of the number X); however, to avoid ambiguity, it
is considered wise to indicate which is intended, by mentioning the range explicitly.
If the probability of success on each trial is p, then the probability that the kth trial (out of k trials) is the first success
is

$$\Pr(X = k) = (1-p)^{k-1}\,p$$

for k = 1, 2, 3, ....
Equivalently, if the probability of success on each trial is p, then the probability that there are k failures before the
first success is

$$\Pr(Y = k) = (1-p)^k\,p$$

for k = 0, 1, 2, 3, ....
In either case, the sequence of probabilities is a geometric sequence.
For example, suppose an ordinary die is thrown repeatedly until the first time a "1" appears. The probability
distribution of the number of times it is thrown is supported on the infinite set { 1, 2, 3, ... } and is a geometric
distribution with p = 1/6.
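The two conventions differ only by a shift of one in the argument, which the following sketch makes explicit for the die example (p = 1/6). The function names are illustrative and not taken from any particular library.

```python
def geometric_pmf_trials(k: int, p: float) -> float:
    """Pr(X = k): the first success occurs on trial k, for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def geometric_pmf_failures(k: int, p: float) -> float:
    """Pr(Y = k): k failures precede the first success, for k = 0, 1, 2, ..."""
    return (1 - p) ** k * p if k >= 0 else 0.0

p = 1 / 6   # probability of throwing a "1" with an ordinary die

# Probabilities that the first "1" appears on throw 1, 2, 3, 4.
first_four = [geometric_pmf_trials(k, p) for k in range(1, 5)]
print(first_four)

# Successive probabilities have the constant ratio 1 - p (a geometric sequence).
print([first_four[i + 1] / first_four[i] for i in range(3)], 1 - p)

# The pmf sums to 1 (truncated here after 500 terms).
print(sum(geometric_pmf_trials(k, p) for k in range(1, 501)))
```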

Moments and cumulants


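The expected value of a geometrically distributed random variable X is 1/p and the variance is (1 − p)/p²:

$$E(X) = \frac{1}{p}, \qquad \operatorname{var}(X) = \frac{1-p}{p^2}.$$

Similarly, the expected value of the geometrically distributed random variable Y is (1 − p)/p, and its variance is
(1 − p)/p²:

$$E(Y) = \frac{1-p}{p}, \qquad \operatorname{var}(Y) = \frac{1-p}{p^2}.$$

Let μ = (1 − p)/p be the expected value of Y. Then the cumulants $\kappa_n$ of the probability distribution of Y satisfy the
recursion

$$\kappa_{n+1} = \mu(\mu + 1)\,\frac{d\kappa_n}{d\mu}.$$

Outline of proof: That the expected value is (1 − p)/p can be shown in the following way. Let Y be as above. Then

$$E(Y) = \sum_{k=0}^\infty (1-p)^k p \cdot k = p(1-p) \sum_{k=0}^\infty k (1-p)^{k-1} = p(1-p)\left[ \frac{d}{dp}\left( -\sum_{k=0}^\infty (1-p)^k \right) \right] = p(1-p)\,\frac{d}{dp}\left( -\frac{1}{p} \right) = \frac{1-p}{p}.$$

(The interchange of summation and differentiation is justified by the fact that convergent power series converge
uniformly on compact subsets of the set of points where they converge.)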

Parameter estimation
For both variants of the geometric distribution, the parameter p can be estimated by equating the expected value with
the sample mean. This is the method of moments, which in this case happens to yield maximum likelihood estimates
of p.
Specifically, for the first variant let $k = k_1, \dots, k_n$ be a sample where $k_i \ge 1$ for i = 1, ..., n. Then p can be estimated as

$$\hat{p} = \left( \frac{1}{n} \sum_{i=1}^n k_i \right)^{-1} = \frac{n}{\sum_{i=1}^n k_i}.$$

In Bayesian inference, the Beta distribution is the conjugate prior distribution for the parameter p. If this parameter is
given a Beta(α, β) prior, then the posterior distribution is

$$p \sim \mathrm{Beta}\left( \alpha + n,\; \beta + \sum_{i=1}^n (k_i - 1) \right).$$

The posterior mean E[p] approaches the maximum likelihood estimate as α and β approach zero.
In the alternative case, let $k_1, \dots, k_n$ be a sample where $k_i \ge 0$ for i = 1, ..., n. Then p can be estimated as

$$\hat{p} = \left( 1 + \frac{1}{n} \sum_{i=1}^n k_i \right)^{-1} = \frac{n}{\sum_{i=1}^n k_i + n}.$$

The posterior distribution of p given a Beta(α, β) prior is

$$p \sim \mathrm{Beta}\left( \alpha + n,\; \beta + \sum_{i=1}^n k_i \right).$$

Again the posterior mean E[p] approaches the maximum likelihood estimate as α and β approach zero.
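The estimators for the two variants can be sketched as follows. The sample values are hypothetical, and the function names are introduced only for this illustration; the posterior update assumes the Beta(α, β) prior described above, under which the first parameter grows by the number of observations and the second by the total number of failures.

```python
def estimate_p_trials(sample: list[int]) -> float:
    """Estimate of p when each observation counts trials (k_i >= 1)."""
    return len(sample) / sum(sample)

def estimate_p_failures(sample: list[int]) -> float:
    """Estimate of p when each observation counts failures (k_i >= 0)."""
    return len(sample) / (len(sample) + sum(sample))

def posterior_trials(sample: list[int], alpha: float, beta: float) -> tuple[float, float]:
    """Parameters of the Beta posterior for trial-count data."""
    n = len(sample)
    return alpha + n, beta + sum(sample) - n

trials = [3, 1, 7, 2, 5]                  # hypothetical data: trials until first success
failures = [k - 1 for k in trials]        # the same data expressed as failure counts

print(estimate_p_trials(trials))          # n / sum(k_i)
print(estimate_p_failures(failures))      # same value, as both describe one experiment

a, b = posterior_trials(trials, 1.0, 1.0) # uniform Beta(1, 1) prior
print(a, b, a / (a + b))                  # posterior parameters and posterior mean
```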

Other properties
• The probability-generating functions of X and Y are, respectively,

$$G_X(s) = \frac{sp}{1 - s(1-p)}, \qquad G_Y(s) = \frac{p}{1 - s(1-p)}, \qquad |s| < (1-p)^{-1}.$$
• Like its continuous analogue (the exponential distribution), the geometric distribution is memoryless. That means
that if you intend to repeat an experiment until the first success, then, given that the first success has not yet
occurred, the conditional probability distribution of the number of additional trials does not depend on how many
failures have been observed. The die one throws or the coin one tosses does not have a "memory" of these
failures. The geometric distribution is in fact the only memoryless discrete distribution.
• Among all discrete probability distributions supported on {1, 2, 3, ... } with given expected value μ, the geometric
distribution X with parameter p = 1/μ is the one with the largest entropy.
• The geometric distribution of the number Y of failures before the first success is infinitely divisible, i.e., for any
positive integer n, there exist independent identically distributed random variables Y1, ..., Yn whose sum has the
same distribution that Y has. These will not be geometrically distributed unless n = 1; they follow a negative
binomial distribution.
• The decimal digits of the geometrically distributed random variable Y are a sequence of independent (and not
identically distributed) random variables. For example, the hundreds digit D has this probability distribution:

$$\Pr(D = d) = \frac{q^{100d}}{1 + q^{100} + q^{200} + \cdots + q^{900}},$$

where q = 1 − p, and similarly for the other digits, and, more generally, similarly for numeral systems with
other bases than 10. When the base is 2, this shows that a geometrically distributed random variable can be
written as a sum of independent random variables whose probability distributions are indecomposable.
• Golomb coding is the optimal prefix code for the geometric discrete distribution.

Related distributions
• The geometric distribution Y is a special case of the negative binomial distribution, with r = 1. More generally, if
$Y_1, \dots, Y_r$ are independent geometrically distributed variables with parameter p, then the sum

$$Z = \sum_{m=1}^r Y_m$$

follows a negative binomial distribution with parameters r and p.

• If $Y_1, \dots, Y_r$ are independent geometrically distributed variables (with possibly different success parameters $p_m$),
then their minimum

$$W = \min_m Y_m$$

is also geometrically distributed, with parameter

$$p = 1 - \prod_m (1 - p_m).$$

• Suppose 0 < r < 1, and for k = 1, 2, 3, ... the random variable $X_k$ has a Poisson distribution with expected value
$r^k/k$. Then

$$\sum_{k=1}^\infty k\,X_k$$

has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value r/(1 − r).
• The exponential distribution is the continuous analogue of the geometric distribution. If X is an exponentially
distributed random variable with parameter λ, then

$$Y = \lfloor X \rfloor$$

(where $\lfloor\ \rfloor$ is the floor (or greatest integer) function)
is a geometrically distributed random variable with parameter $p = 1 - e^{-\lambda}$ (thus λ = −ln(1 − p)[1]) and taking
values in the set {0, 1, 2, ...}. This can be used to generate geometrically distributed pseudorandom numbers
by first generating exponentially distributed pseudorandom numbers from a uniform pseudorandom number
generator: then $\lfloor \ln(U) / \ln(1-p) \rfloor$ is geometrically distributed with parameter p, if U is uniformly
distributed in [0,1].
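The generation method just described, taking the floor of ln(U)/ln(1 − p) for a uniform U, can be sketched as below. Python's `random.random()` returns values in [0, 1), so the code uses 1 − U to stay clear of ln(0); this detail, the function name and the choice p = 0.25 are assumptions of the example. The sketch requires 0 < p < 1.

```python
import random
from math import floor, log

def sample_geometric_failures(p: float, rng: random.Random) -> int:
    """Draw Y, the number of failures before the first success, for 0 < p < 1."""
    u = 1.0 - rng.random()          # uniform on (0, 1], so log(u) is finite
    return floor(log(u) / log(1 - p))

rng = random.Random(2)              # fixed seed so the run is repeatable
p = 0.25
draws = [sample_geometric_failures(p, rng) for _ in range(40000)]

mean = sum(draws) / len(draws)
share_zero = draws.count(0) / len(draws)
print(mean, (1 - p) / p)            # sample mean against (1 - p)/p
print(share_zero, p)                # Pr(Y = 0) should be close to p
```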

See also
• Hypergeometric distribution
• Coupon collector's problem

External links
• Geometric distribution [2] on PlanetMath
• Geometric distribution [3] on MathWorld.

References
[1] http://www.wolframalpha.com/input/?i=inverse+p+%3D+1+-+e^-l
[2] http://planetmath.org/?op=getobj&from=objects&id=3456
[3] http://mathworld.wolfram.com/GeometricDistribution.html

Hypergeometric distribution
Hypergeometric

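parameters: N ∈ {0, 1, 2, …}; m ∈ {0, 1, …, N}; n ∈ {0, 1, …, N}
support: k ∈ {max(0, n + m − N), …, min(m, n)}
pmf: $\frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$
cdf:
mean: $\frac{nm}{N}$
median:
mode: $\left\lfloor \frac{(n+1)(m+1)}{N+2} \right\rfloor$
variance: $\frac{nm(N-m)(N-n)}{N^2(N-1)}$
skewness: $\frac{(N-2m)(N-1)^{1/2}(N-2n)}{[nm(N-m)(N-n)]^{1/2}(N-2)}$
ex.kurtosis:
entropy:
mgf: $\frac{\binom{N-m}{n}\; {}_2F_1(-n, -m;\, N-m-n+1;\, e^t)}{\binom{N}{n}}$
cf: $\frac{\binom{N-m}{n}\; {}_2F_1(-n, -m;\, N-m-n+1;\, e^{it})}{\binom{N}{n}}$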

In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that
describes the number of successes in a sequence of n draws from a finite population without replacement, just as the
binomial distribution describes the number of successes for draws with replacement.
The notation is illustrated by this contingency table:

         drawn    not drawn        total
white    k        m − k            m
black    n − k    N + k − n − m    N − m
total    n        N − n            N

Perhaps the easiest way to understand this distribution is in terms of urn models. Suppose you are to draw "n"
marbles without replacement from an urn containing "N" marbles in total, "m" of which are white. The
hypergeometric distribution describes the distribution of the number of white marbles drawn from the urn.
A random variable X follows the hypergeometric distribution with parameters N, m and n if the probability is given
by

$$P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}},$$

where the binomial coefficient $\binom{a}{b}$ is defined to be the coefficient of $x^b$ in the polynomial expansion of $(1 + x)^a$.
The probability is positive when max(0, n + m − N) ≤ k ≤ min(m, n).

The formula can be understood as follows: There are $\binom{N}{n}$ possible samples (without replacement). There are
$\binom{m}{k}$ ways to obtain k white marbles and there are $\binom{N-m}{n-k}$ ways to fill out the rest of the sample with black marbles.
The sum of the probabilities for all possible values of k is equal to 1 as one would expect intuitively; this is
essentially Vandermonde's identity from combinatorics. Also note that the following identity holds:

$$\frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}} = \frac{\binom{n}{k}\binom{N-n}{m-k}}{\binom{N}{m}}.$$

This follows clearly from the symmetry of the problem, but it can also be shown easily by expressing the binomial
coefficients in terms of factorials, and rearranging the latter.

Application and example


The classical application of the hypergeometric distribution is sampling without replacement. Think of an urn with
two types of marbles, black ones and white ones. Define drawing a white marble as a success and drawing a black
marble as a failure (analogous to the binomial distribution). If the variable N describes the number of all marbles in
the urn (see contingency table above) and m describes the number of white marbles, then N − m corresponds to the
number of black marbles.
Now, assume that there are 5 white and 45 black marbles in the urn. Standing next to the urn, you close your eyes
and draw 10 marbles without replacement. What is the probability that exactly 4 of the 10 are white? Note that
although we are looking at success/failure, the data cannot be modeled under the binomial distribution, because the
probability of success on each trial is not the same, as the size of the remaining population changes as we remove
each marble.
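This problem is summarized by the following contingency table:

                 drawn        not drawn              total
white marbles    k = 4        m − k = 1              m = 5
black marbles    n − k = 6    N + k − n − m = 39     N − m = 45
total            n = 10       N − n = 40             N = 50

The probability of drawing exactly k white marbles can be calculated by the formula

$$P(X = k) = f(k; N, m, n) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}.$$

Hence, in this example calculate

$$P(X = 4) = f(4; 50, 5, 10) = \frac{\binom{5}{4}\binom{45}{6}}{\binom{50}{10}} = \frac{5 \cdot 8145060}{10272278170} = 0.003964583\ldots$$

Intuitively we would expect it to be even more unlikely for all 5 marbles to be white.

$$P(X = 5) = f(5; 50, 5, 10) = \frac{\binom{5}{5}\binom{45}{5}}{\binom{50}{10}} = \frac{1 \cdot 1221759}{10272278170} = 0.0001189375\ldots$$

As expected, the probability of drawing 5 white marbles is much lower than that of drawing 4.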
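The urn calculation can be reproduced in a few lines. The sketch below uses the parameters of the example (N = 50 marbles, m = 5 white, n = 10 drawn); the function name `hypergeom_pmf` is chosen here for illustration.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, m: int, n: int) -> float:
    """Probability of k white marbles among n drawn without replacement
    from an urn of N marbles, m of which are white."""
    if k < max(0, n + m - N) or k > min(m, n):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

N, m, n = 50, 5, 10
p4 = hypergeom_pmf(4, N, m, n)   # exactly 4 white marbles
p5 = hypergeom_pmf(5, N, m, n)   # all 5 white marbles
print(p4, p5, p4 / p5)

# The probabilities over the whole support sum to 1 (Vandermonde's identity).
print(sum(hypergeom_pmf(k, N, m, n) for k in range(0, min(m, n) + 1)))
```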

Symmetries
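Swapping the roles of black and white marbles:

$$f(k; N, m, n) = f(n - k;\, N,\, N - m,\, n)$$

Swapping the roles of drawn and not drawn marbles:

$$f(k; N, m, n) = f(m - k;\, N,\, m,\, N - n)$$

Swapping the roles of white and drawn marbles:

$$f(k; N, m, n) = f(k;\, N,\, n,\, m)$$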
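Writing f(k; N, m, n) for the hypergeometric probability, the three swaps correspond to the identities f(k; N, m, n) = f(n − k; N, N − m, n), f(k; N, m, n) = f(m − k; N, m, N − n) and f(k; N, m, n) = f(k; N, n, m). The sketch below checks them exactly with rational arithmetic; the parameter values are borrowed from the urn example and the function name is an assumption of this illustration.

```python
from fractions import Fraction
from math import comb

def f(k: int, N: int, m: int, n: int) -> Fraction:
    """Exact hypergeometric probability, zero outside the support."""
    if k < max(0, n + m - N) or k > min(m, n):
        return Fraction(0)
    return Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))

N, m, n = 50, 5, 10
for k in range(-1, 12):
    assert f(k, N, m, n) == f(n - k, N, N - m, n)   # black <-> white
    assert f(k, N, m, n) == f(m - k, N, m, N - n)   # drawn <-> not drawn
    assert f(k, N, m, n) == f(k, N, n, m)           # white <-> drawn
print("all three symmetries hold for N = 50, m = 5, n = 10")
```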

Symmetry application
The metaphor of defective and drawn objects depicts an application of the hypergeometric distribution in which the
interchange symmetry between n and m is not of foremost concern. Here is an alternate metaphor which brings this
symmetry into sharper focus, as there are also applications where it serves no purpose to distinguish n from m.
Suppose you have a set of N children who have been identified with an unusual bone marrow antigen. The doctor
wishes to conduct a heredity study to determine the inheritance pattern of this antigen. For the purposes of this study,
the doctor wishes to draw tissue from the bone marrow from the biological mother and biological father of each
child. This is an uncomfortable procedure, and not all the mothers and fathers will agree to participate. Of the
mothers, m participate and N − m decline. Of the fathers, n participate and N − n decline.
We assume here that the decisions made by the mothers are independent of the decisions made by the fathers. Under
this assumption, the doctor, who is given n and m, wishes to estimate k, the number of children where both parents
have agreed to participate. The hypergeometric distribution can be used to determine the distribution of k. It is not
obvious why the doctor would know n and m, but not k. Perhaps n and m are dictated by the experimental
design, while the experimenter is left blind to the true value of k.
It is important to recognize that for given N, n and m a single degree of freedom partitions N into four
sub-populations:
1. Children where both parents participate
2. Children where only the mother participates
3. Children where only the father participates and
4. Children where neither parent participates.

Knowing any one of these four values determines the other three by simple arithmetic relations. For this reason, each
of these quadrants is governed by an equivalent hypergeometric distribution. The mean, mode, and values of k
contained within the support differ from one quadrant to another, but the size of the support, the variance, and other
high order statistics do not.
For the purpose of this study, it might make no difference to the doctor whether the mother participates or the father
participates. If this happens to be true, the doctor will view the result as a three-way partition: children where both
parents participate, children where one parent participates, children where neither parent participates. Under this
view, the last remaining distinction between n and m has been eliminated. The distribution where one parent
participates is the sum of the distributions where either parent alone participates.

Symmetry and sampling


To express how the symmetry of the clinical metaphor degenerates to the asymmetry of the sampling language used
in the drawn/defective metaphor, we will restate the clinical metaphor in the abstract language of decks and cards.
We begin with a dealer who holds two prepared decks of N cards. The decks are labelled left and right. The left deck
was prepared to hold n red cards, and N-n black cards; the right deck was prepared to hold m red cards, and N-m
black cards.
These two decks are dealt out face down to form N hands. Each hand contains one card from the left deck and one
card from the right deck. If we determine the number of hands that contain two red cards, by symmetry relations we
will necessarily also know the hypergeometric distributions governing the other three quadrants: hand counts for
red/black, black/red, and black/black. How many cards must be turned over to learn the total number of red/red
hands? Which cards do we need to turn over to accomplish this? These are questions about possible sampling
methods.
One approach is to begin by turning over the left card of each hand. For each hand showing a red card on the left, we
then also turn over the right card in that hand. For any hand showing a black card on the left, we do not need to
reveal the right card, as we already know this hand does not count toward the total of red/red hands. Our treatment of
the left and right decks no longer appears symmetric: one deck was fully revealed while the other deck was partially
revealed. However, we could just as easily have begun by revealing all cards dealt from the right deck, and partially
revealed cards from the left deck.
In fact, the sampling procedure need not prioritize one deck over the other in the first place. Instead, we could flip a
coin for each hand, turning over the left card on heads, and the right card on tails, leaving each hand with one card
exposed. For every hand with a red card exposed, we reveal the companion card. This will suffice to allow us to
count the red/red hands, even though under this sampling procedure neither the left nor right deck is fully revealed.
By another symmetry, we could also have elected to determine the number of black/black hands rather than the
number of red/red hands, and discovered the same distributions by that method.
The symmetries of the hypergeometric distribution provide many options in how to conduct the sampling procedure
to isolate the degree of freedom governed by the hypergeometric distribution. Even if the sampling procedure
appears to treat the left deck differently from the right deck, or governs choices by red cards rather than black cards,
it is important to recognize that the end result is essentially the same.
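The symmetries described above can be checked numerically. The following is a minimal sketch, assuming the usual hypergeometric mass function written in terms of N hands, n red cards in the left deck and m red cards in the right deck; the function name `hypergeom_pmf` and the numbers N = 20, n = 7, m = 12 are illustrative choices, not taken from the text.

```python
from math import comb

def hypergeom_pmf(k, N, n, m):
    """P(exactly k red/red hands) for N hands, n red cards on the left, m on the right."""
    if k < max(0, n + m - N) or k > min(n, m):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

N, n, m = 20, 7, 12  # illustrative values only
for k in range(0, min(n, m) + 1):
    swapped = hypergeom_pmf(k, N, m, n)          # exchange the roles of the two decks
    black_black = N - n - m + k                  # k red/red hands force this many black/black hands
    mirrored = hypergeom_pmf(black_black, N, N - n, N - m)
    print(k, hypergeom_pmf(k, N, n, m), swapped, mirrored)
```

Each row prints the same probability three times: once for the original problem, once with the decks exchanged, and once when black/black hands are counted instead of red/red hands.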
Relationship to Fisher's exact test


The test (see above) based on the hypergeometric distribution (hypergeometric test) is identical to the corresponding
one-tailed version of Fisher's exact test. Reciprocally, the p-value of a two-sided Fisher's exact test can be calculated
as the sum of two appropriate hypergeometric tests (for more information see [1] ).

Order of draws
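The probability of drawing any sequence of white and black marbles (the hypergeometric distribution) depends only
on the number of white and black marbles, not on the order in which they appear; i.e., it is an exchangeable
distribution. As a result, the probability of drawing a white marble in the $i^{\text{th}}$ draw is

$$ P(W_i) = \frac{m}{N}. $$

This can be shown by induction. First, it is certainly true for the first draw that:

$$ P(W_1) = \frac{m}{N}. $$

Also, we can show that $P(W_{i+1}) = P(W_i)$ by writing:

$$ P(W_{i+1}) = P(W_{i+1} \mid W_i)\,P(W_i) + P(W_{i+1} \mid \neg W_i)\,P(\neg W_i)
 = \frac{m-1}{N-1}\cdot\frac{m}{N} + \frac{m}{N-1}\cdot\frac{N-m}{N} = \frac{m}{N}, $$

which makes it true for every draw.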
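As an independent check of the claim that every draw is white with the same probability, the sketch below computes the exact probability for draw i by conditioning on the colour of the first marble removed. The function name `prob_white_on_draw` and the urn sizes (4 white marbles out of 10) are assumptions made for illustration.

```python
from fractions import Fraction

def prob_white_on_draw(i, white, total):
    """Exact P(draw number i is white) when drawing without replacement."""
    if i == 1:
        return Fraction(white, total)
    p_first_white = Fraction(white, total)
    result = Fraction(0)
    if white > 0:
        # First marble white: one fewer white marble remains for the later draws.
        result += p_first_white * prob_white_on_draw(i - 1, white - 1, total - 1)
    if total - white > 0:
        # First marble black: the number of white marbles is unchanged.
        result += (1 - p_first_white) * prob_white_on_draw(i - 1, white, total - 1)
    return result

white, total = 4, 10  # illustrative urn
probabilities = [prob_white_on_draw(i, white, total) for i in range(1, total + 1)]
print(probabilities)
```

Exact rational arithmetic is used so that equality with 4/10 can be checked without rounding error.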

Related distributions
Let $X \sim \mathrm{Hypergeometric}(m, N, n)$ and $p = m/N$.
• If $n = 1$ then $X$ has a Bernoulli distribution with parameter $p$.
• Let $Y$ have a binomial distribution with parameters $n$ and $p$; this models the number of successes in the
analogous sampling problem with replacement. If $N$ and $m$ are large compared to $n$ and $p$ is not close to 0 or
1, then $X$ and $Y$ have similar distributions, i.e., $P(X \le k) \approx P(Y \le k)$.
• If $n$ is large, $N$ and $m$ are large compared to $n$ and $p$ is not close to 0 or 1, then

$$ P(X \le k) \approx \Phi\!\left(\frac{k - np}{\sqrt{np(1-p)}}\right), $$

where $\Phi$ is the standard normal distribution function.
• If the probabilities of drawing a white or a black marble are not equal (e.g. because their sizes differ), then $X$
has a noncentral hypergeometric distribution.

Multivariate hypergeometric distribution


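Multivariate Hypergeometric Distribution

parameters: $c \in \mathbb{N}$ (number of colors); $(m_1, \ldots, m_c) \in \mathbb{N}^c$ (marbles of each color);
$N = \sum_{i=1}^{c} m_i$; $n \in \{0, \ldots, N\}$ (marbles drawn)
support: $\{(k_1, \ldots, k_c) \in \mathbb{Z}_{\ge 0}^{c} : k_i \le m_i \text{ for all } i,\ \sum_{i=1}^{c} k_i = n\}$
pmf: $\dfrac{\prod_{i=1}^{c} \binom{m_i}{k_i}}{\binom{N}{n}}$
mean: $E(X_i) = \dfrac{n\,m_i}{N}$
variance: $\operatorname{var}(X_i) = \dfrac{m_i}{N}\left(1 - \dfrac{m_i}{N}\right) n\,\dfrac{N-n}{N-1}$;
$\operatorname{cov}(X_i, X_j) = -\dfrac{n\,m_i\,m_j}{N^2}\cdot\dfrac{N-n}{N-1}$ for $i \ne j$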

The model of an urn with black and white marbles can be extended to the case where there are more than two colors
of marbles. If there are mi marbles of color i in the urn and you take n marbles at random without replacement, then
the number of marbles of each color in the sample (k1,k2,...,kc) has the multivariate hypergeometric distribution. This
has the same relationship to the multinomial distribution that the hypergeometric distribution has to the binomial
distribution--the multinomial distribution is the "with-replacement" distribution and the multivariate hypergeometric
is the "without-replacement" distribution.
The properties of this distribution are given in the adjacent table, where c is the number of different colors and
$N = \sum_{i=1}^{c} m_i$ is the total number of marbles.

Example
Suppose there are 5 black, 10 white, and 15 red marbles in an urn. You reach in and randomly select six marbles
without replacement. What is the probability that you pick exactly two of each color?
Note: When picking the six marbles without replacement, the expected number of black marbles is 6*(5/30) = 1, the
expected number of white marbles is 6*(10/30) = 2, and the expected number of red marbles is 6*(15/30) = 3.
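The probability asked for in this example follows directly from the multivariate hypergeometric mass function (a product of binomial coefficients, one per color, divided by the number of ways to choose the sample). A minimal sketch, in which the function name `multivariate_hypergeom_pmf` is an illustrative assumption:

```python
from math import comb, prod

def multivariate_hypergeom_pmf(counts, sizes):
    """P(drawing counts[i] marbles of color i) from an urn holding sizes[i] of each color."""
    n = sum(counts)   # marbles drawn
    N = sum(sizes)    # marbles in the urn
    return prod(comb(m, k) for m, k in zip(sizes, counts)) / comb(N, n)

# 5 black, 10 white and 15 red marbles; draw two of each color.
p_two_each = multivariate_hypergeom_pmf([2, 2, 2], [5, 10, 15])
print(round(p_two_each, 6))  # 47250 / 593775, about 0.079576
```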

See also
• Binomial distribution
• Multinomial distribution
• Fisher's exact test
• Noncentral hypergeometric distributions


• Sampling (statistics)
• Coupon collector's problem
• Geometric distribution
• Keno

External links
• Hypergeometric Distribution Calculator [2]
• Hypergeometric Distribution Calculator with source (Ruby, C++) [3]
• The Hypergeometric Distribution [4] and Binomial Approximation to a Hypergeometric Random Variable [5] by
Chris Boucher, Wolfram Demonstrations Project.
• Weisstein, Eric W., "Hypergeometric Distribution [6]" from MathWorld.
• Hypergeometric distribution online calculator (.XBAP) [7]
• Hypergeometric tail inequalities: ending the insanity [8] by Matthew Skala.
• Survey Analysis Tool [9] using discrete hypergeometric distribution based on A. Berkopec, HyperQuick algorithm
for discrete hypergeometric distribution, Journal of Discrete Algorithms, Elsevier, 2006 [10].

References
[1] K. Preacher and N. Briggs. "Calculation for Fisher's Exact Test: An interactive calculation tool for Fisher's exact probability test for 2 x 2
tables (interactive page)" (http://www.people.ku.edu/~preacher/fisher/fisher.htm). Retrieved 2008-04-08.
[2] http://www.adsciengineering.com/hpdcalc
[3] http://www.nerdbucket.com/statistics/hypergeometric/
[4] http://demonstrations.wolfram.com/TheHypergeometricDistribution/
[5] http://demonstrations.wolfram.com/BinomialApproximationToAHypergeometricRandomVariable/
[6] http://mathworld.wolfram.com/HypergeometricDistribution.html
[7] http://pcarvalho.com/things/hypegeocalc/HypergeometricCalculator.xbap
[8] http://ansuz.sooke.bc.ca/professional/hypergeometric.pdf
[9] http://www.i-marvin.si
[10] http://dx.doi.org/10.1016/j.jda.2006.01.001
Negative binomial distribution


Probability mass function

The orange line represents the mean, which is equal to 10 in each of these plots;
the green line shows the standard deviation.
notation: $\mathrm{NB}(r,\, p)$
parameters: r > 0 — number of failures until the experiment is stopped (integer,
but the definition can also be extended to reals)
p ∈ (0,1) — success probability in each experiment (real)
support: k ∈ { 0, 1, 2, 3, … }
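pmf: $\binom{k+r-1}{k}\,(1-p)^r\,p^k$
cdf: $1 - I_p(k+1,\, r)$, where $I_p$ is the regularized incomplete beta function
mean: $\dfrac{pr}{1-p}$
mode: $\left\lfloor \dfrac{p(r-1)}{1-p} \right\rfloor$ if $r > 1$; $0$ if $r \le 1$
variance: $\dfrac{pr}{(1-p)^2}$
skewness: $\dfrac{1+p}{\sqrt{pr}}$
ex.kurtosis: $\dfrac{6}{r} + \dfrac{(1-p)^2}{pr}$
mgf: $\left(\dfrac{1-p}{1 - p e^{t}}\right)^{r}$ for $t < -\ln p$
cf: $\left(\dfrac{1-p}{1 - p e^{it}}\right)^{r}$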

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the
number of successes in a sequence of Bernoulli trials before a specified (non-random) number r of failures occurs.
For example, if one throws a die repeatedly until the third time “1” appears, then the probability distribution of the
number of non-“1”s that had appeared will be negative binomial.
The Pascal distribution (after Blaise Pascal) and Polya distribution (for George Pólya) are special cases of the
negative binomial. There is a convention among engineers, climatologists, and others to reserve “negative binomial”
in a strict sense or “Pascal” for the case of an integer-valued stopping-time parameter r, and use “Polya” for the
real-valued case. The Polya distribution more accurately models occurrences of “contagious” discrete events, like
tornado outbreaks, than does the Poisson distribution.
Definition
Suppose there is a sequence of independent Bernoulli trials, each trial having two potential outcomes called
“success” and “failure”. In each trial the probability of success is p and of failure is 1 − p. We are observing this
sequence until a predefined number r of failures has occurred. Then the random number of successes we have seen,
X, will have the negative binomial (or Pascal) distribution:

$$ X \sim \mathrm{NB}(r,\, p). $$

When applied to real-world situations, the words success and failure need not be associated with outcomes that we
see as good or bad. For example, we may use the negative binomial distribution to model the number of days a
certain machine works before it breaks down. In that case “failure” means the machine breaking down, whereas
“success” means it working properly. In another case we can use the negative binomial distribution to model the
number of attempts an athlete misses before scoring a goal. Then the “failure” is scoring the goal, whereas the
“successes” are the misses.
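The probability mass function of the negative binomial distribution is

$$ f(k) \equiv \Pr(X = k) = \binom{k+r-1}{k}\,(1-p)^r\,p^k \quad \text{for } k = 0, 1, 2, \ldots $$

Here the quantity in parentheses is called the binomial coefficient, and is equal to

$$ \binom{k+r-1}{k} = \frac{(k+r-1)!}{k!\,(r-1)!} = \frac{(k+r-1)(k+r-2)\cdots(r)}{k!}. $$

This quantity can alternatively be written in the following manner, explaining the name “negative binomial”:

$$ \frac{(k+r-1)\cdots(r)}{k!} = (-1)^k\,\frac{(-r)(-r-1)(-r-2)\cdots(-r-k+1)}{k!} = (-1)^k \binom{-r}{k}. $$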

Extension to real-valued r
It is possible to extend the definition of the negative binomial distribution to the case of real-valued r’s. Although it
is impossible to visualize a non-integer number of “failures”, we can still formally define the distribution through its
probability mass function.
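As before, we say that X has a negative binomial (or Pólya) distribution if it has a probability mass function:

$$ f(k) \equiv \Pr(X = k) = \binom{k+r-1}{k}\,(1-p)^r\,p^k \quad \text{for } k = 0, 1, 2, \ldots $$

Here r is a real, positive number, and the binomial coefficient can be interpreted through the gamma function:

$$ \binom{k+r-1}{k} = \frac{\Gamma(k+r)}{k!\;\Gamma(r)}. $$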
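The gamma-function form of the mass function, Γ(k + r) / (k! Γ(r)) · (1 − p)^r · p^k, can be evaluated for non-integer r as in the following minimal sketch. The function name `neg_binomial_pmf` and the test values are illustrative assumptions; the log-gamma function is used only to avoid overflow for large k.

```python
from math import exp, lgamma, log

def neg_binomial_pmf(k, r, p):
    """P(X = k): k successes (probability p each) before the r-th failure; r > 0 may be non-integer."""
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)  # log of Gamma(k+r) / (k! Gamma(r))
    return exp(log_coeff + r * log(1 - p) + k * log(p))

r, p = 2.5, 0.3  # illustrative, with a non-integer r
total = sum(neg_binomial_pmf(k, r, p) for k in range(2000))
mean = sum(k * neg_binomial_pmf(k, r, p) for k in range(2000))
print(total, mean, r * p / (1 - p))
```

The probabilities sum to one and the numerically computed mean agrees with pr / (1 − p), which supports reading the formula as a proper distribution even when r is not an integer.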

Alternative formulations
Some textbooks may define the negative binomial distribution slightly differently than it is done here. The most
common variations are:
• The definition where X is the total number of trials needed to get r failures, not simply the number of successes.
Since the total number of trials is equal to the number of successes plus the number of failures, this definition
differs from ours by adding constant r.
In order to convert formulas written with this definition into the one used in the article, replace everywhere “k”
with “k + r”, and also subtract r from the mean, the median, and the mode. In order to convert formulas of this
article into this alternative definition, replace “k” with “k − r” and add r to the mean, the median and the mode.
• The definition where p denotes the probability of a failure, not of a success. This may also be formulated as “X is
the number of failures before r successes”, in which case p will be the probability of a success, but the words
“failure” and “success” have been swapped around.
In order to convert formulas between this definition and the one used in the article, replace “p” with “1 − p”
everywhere.
• The two alterations above may be applied simultaneously.
Occurrence

Waiting time in a Bernoulli process


For the special case where r is an integer, the negative binomial distribution is known as the Pascal distribution. It
is the probability distribution of a certain number of failures and successes in a series of independent and identically
distributed Bernoulli trials. For k + r Bernoulli trials with success probability p, the negative binomial gives the
probability of k successes and r failures, with a failure on the last trial. In other words, the negative binomial
distribution is the probability distribution of the number of successes before the rth failure in a Bernoulli process,
with probability p of successes on each trial. A Bernoulli process is a discrete time process, and so the number of
trials, failures, and successes are integers.
Consider the following example. Suppose we repeatedly throw a die, and consider a “1” to be a “failure”. The
probability of failure on each trial is 1/6. The number of successes before the third failure belongs to the infinite set {
0, 1, 2, 3, ... }. That number of successes is a negative-binomially distributed random variable.
When r = 1 we get the probability distribution of the number of successes before the first failure (i.e. the probability
of the first failure occurring on the (k + 1)st trial), which is a geometric distribution:

$$ f(k) = (1-p)\,p^k. $$

Overdispersed Poisson
The negative binomial distribution, especially in its alternative parameterization described above, can be used as an
alternative to the Poisson distribution. It is especially useful for discrete data over an unbounded positive range
whose sample variance exceeds the sample mean. If a Poisson distribution is used to model such data, the model
mean and variance are equal. In that case, the observations are overdispersed with respect to the Poisson model.
Since the negative binomial distribution has one more parameter than the Poisson, the second parameter can be used
to adjust the variance independently of the mean. See Cumulants of some discrete probability distributions. In the
case of modest overdispersion, this may produce substantially similar results to an overdispersed Poisson
distribution.[1]

Related distributions
• The geometric distribution is a special case of the negative binomial distribution, with
$\mathrm{Geom}(p) = \mathrm{NB}(1,\, 1-p)$.
• The negative binomial distribution is a special case of the discrete phase-type distribution.

Poisson distribution
Consider a sequence of negative binomial distributions where the stopping parameter r goes to infinity, whereas the
probability of success in each trial, p, goes to zero in such a way as to keep the mean of the distribution constant.
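Denoting this mean λ, the parameter p will have to be

$$ \lambda = r\,\frac{p}{1-p} \quad\Rightarrow\quad p = \frac{\lambda}{r+\lambda}. $$

Under this parametrization the probability mass function will be

$$ f(k) = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}\,p^k (1-p)^r
 = \frac{\lambda^k}{k!}\cdot\frac{\Gamma(r+k)}{\Gamma(r)\,(r+\lambda)^k}\cdot\frac{1}{\left(1+\frac{\lambda}{r}\right)^{r}}. $$

Now if we consider the limit as r → ∞, the second factor will converge to one, and the third to the exponential
function:

$$ \lim_{r\to\infty} f(k) = \frac{\lambda^k}{k!}\cdot 1\cdot\frac{1}{e^{\lambda}}, $$

which is the mass function of a Poisson-distributed random variable with expected value λ.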
In other words, the alternatively parameterized negative binomial distribution converges to the Poisson distribution
and r controls the deviation from the Poisson. This makes the negative binomial distribution suitable as a robust
alternative to the Poisson, which approaches the Poisson for large r, but which has larger variance than the Poisson
for small r.
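The convergence to the Poisson distribution can be observed numerically. The sketch below fixes the mean λ, sets p = λ / (r + λ) so that the negative binomial mean stays equal to λ, and reports the largest pointwise gap to the Poisson mass function as r grows. The function names and the choice λ = 4 are illustrative assumptions.

```python
from math import exp, factorial, lgamma, log

def nb_pmf_with_mean(k, r, lam):
    """Negative binomial mass at k with stopping parameter r and mean lam."""
    p = lam / (r + lam)
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + r * log(1 - p) + k * log(p))

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

lam = 4.0
gaps = []
for r in (1, 10, 100, 10000):
    gap = max(abs(nb_pmf_with_mean(k, r, lam) - poisson_pmf(k, lam)) for k in range(40))
    gaps.append(gap)
    print(r, gap)
```

The gap shrinks as r increases, in line with the variance λ + λ²/r approaching the Poisson variance λ.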

Gamma–Poisson mixture
The negative binomial distribution also arises as a continuous mixture of Poisson distributions where the mixing
distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ)
distribution, where λ is itself a random variable, distributed according to Gamma(r, p/(1 − p)).
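Formally, this means that the mass function of the negative binomial distribution can be written as

$$ f(k) = \int_0^\infty f_{\mathrm{Poisson}(\lambda)}(k)\, f_{\mathrm{Gamma}\left(r,\,\frac{p}{1-p}\right)}(\lambda)\, d\lambda
 = \int_0^\infty \frac{\lambda^k}{k!} e^{-\lambda}\cdot\frac{\lambda^{r-1} e^{-\lambda(1-p)/p}}{\left(\frac{p}{1-p}\right)^{r}\Gamma(r)}\, d\lambda
 = \frac{(1-p)^r\,p^{-r}}{k!\,\Gamma(r)} \int_0^\infty \lambda^{r+k-1} e^{-\lambda/p}\, d\lambda
 = \frac{\Gamma(r+k)}{k!\,\Gamma(r)}\,p^k (1-p)^r. $$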
Because of this, the negative binomial distribution is known as the gamma–Poisson (mixture) distribution.
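The mixture representation can be verified by numerical integration. The sketch below integrates the product of a Poisson(λ) mass and a gamma density with shape r and scale p / (1 − p) over λ, using a plain midpoint rule, and compares the result with the closed-form negative binomial mass. The function names, the integration limits and the values r = 3, p = 0.4 are illustrative assumptions.

```python
from math import exp, lgamma, log

def nb_pmf(k, r, p):
    """Closed-form negative binomial mass: k successes before the r-th failure."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r) + r * log(1 - p) + k * log(p))

def gamma_poisson_mixture(k, r, p, upper=80.0, steps=80_000):
    """Midpoint-rule approximation of the Poisson mass averaged over a gamma-distributed rate."""
    scale = p / (1 - p)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_poisson = k * log(lam) - lam - lgamma(k + 1)
        log_gamma = (r - 1) * log(lam) - lam / scale - lgamma(r) - r * log(scale)
        total += exp(log_poisson + log_gamma)
    return total * h

r, p = 3.0, 0.4
pairs = [(nb_pmf(k, r, p), gamma_poisson_mixture(k, r, p)) for k in range(6)]
for exact, mixed in pairs:
    print(exact, mixed)
```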

Sum of geometric distributions


If Yr is a random variable following the negative binomial distribution with parameters r and p, and support
{0, 1, 2, ...}, then Yr is a sum of r independent variables following the geometric distribution with parameter p. As a
result of the central limit theorem, Yr (properly scaled and shifted) is therefore approximately normal for sufficiently
large r.
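Furthermore, if $B_{s+r}$ is a random variable following the binomial distribution with parameters s + r and p, then

$$ \Pr(Y_r \le s) = 1 - I_p(s+1,\, r) = \Pr(B_{s+r} \le s), $$

since at most s successes precede the rth failure exactly when the first s + r trials contain at most s successes.
In this sense, the negative binomial distribution is the "inverse" of the binomial distribution.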
The sum of independent negative-binomially distributed random variables with the same value of the parameter p
but the "r-values" r1 and r2 is negative-binomially distributed with the same p but with "r-value" r1 + r2.
The negative binomial distribution is infinitely divisible, i.e., if Y has a negative binomial distribution, then for any
positive integer n, there exist independent identically distributed random variables Y1, ..., Yn whose sum has the same
distribution that Y has. These will not be negative-binomially distributed in the sense defined above unless n is a
divisor of r (more on this below).
Properties

Cumulative distribution function


The cumulative distribution function can be expressed in terms of the regularized incomplete beta function:

$$ F(k) = \Pr(X \le k) = 1 - I_p(k+1,\, r). $$

Sampling and point estimation of p


Suppose p is unknown and an experiment is conducted where it is decided ahead of time that sampling will continue
until r successes are found. A sufficient statistic for the experiment is k, the number of failures.
In estimating p, the minimum variance unbiased estimator is

$$ \hat{p} = \frac{r-1}{r+k-1}. $$

The maximum likelihood estimate of p is

$$ \tilde{p} = \frac{r}{r+k}, $$

but this is a biased estimate. Its inverse, (r + k)/r, is however an unbiased estimate of 1/p.[2]

Relation to the binomial theorem


Suppose K is a random variable with a negative binomial distribution with parameters r and p. The statement that the
sum from k = 0 to infinity, of the probability Pr[K = k], is equal to 1, can be shown algebraically to be equivalent to
the statement that $(1-p)^{-r}$ is what Newton's binomial theorem says it should be.
Suppose Y is a random variable with a binomial distribution with parameters n and p. The statement that the sum
from y = 0 to n, of the probability Pr[Y = y], is equal to 1, says that $1 = (p + (1-p))^n$ is what the strictly finitary
binomial theorem of rudimentary algebra says it should be.
Thus the negative binomial distribution bears the same relationship to the negative-integer-exponent case of the
binomial theorem that the binomial distribution bears to the positive-integer-exponent case.
Assume p + q = 1. Then the binomial theorem of elementary algebra implies that

$$ 1 = 1^n = (p+q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}. $$

This can be written in a way that may at first appear to some to be incorrect, and perhaps perverse even if correct:

$$ (p+q)^n = \sum_{k=0}^{\infty} \binom{n}{k} p^k q^{n-k}, $$

in which the upper bound of summation is infinite. The binomial coefficient

$$ \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} $$

is defined even when n is negative or is not an integer. But in our case of the binomial distribution it is zero when k >
n. So why would we write the result in that form, with a seemingly needless sum of infinitely many zeros? The
answer comes when we generalize the binomial theorem of elementary algebra to Newton's binomial theorem. Then
we can say, for example

$$ (p+q)^{8.3} = \sum_{k=0}^{\infty} \binom{8.3}{k} p^k q^{8.3-k}. $$

Now suppose r > 0 and we use a negative exponent:

$$ 1 = p^r \cdot p^{-r} = p^r (1-q)^{-r} = p^r \sum_{k=0}^{\infty} \binom{-r}{k} (-q)^k. $$

Then all of the terms are positive, and the term

$$ p^r \binom{-r}{k} (-q)^k $$

is just the probability that the number of failures before the rth success is equal to k, provided r is an integer. (If r is a
negative non-integer, so that the exponent is a positive non-integer, then some of the terms in the sum above are
negative, so we do not have a probability distribution on the set of all nonnegative integers.)
Now we also allow non-integer values of r. Then we have a proper negative binomial distribution, which is a
generalization of the Pascal distribution, which coincides with the Pascal distribution when r happens to be a positive
integer.
Recall from above that
The sum of independent negative-binomially distributed random variables with the same value of the
parameter p but the "r-values" r1 and r2 is negative-binomially distributed with the same p but with "r-value"
r1 + r2.
This property persists when the definition is thus generalized, and affords a quick way to see that the negative
binomial distribution is infinitely divisible.

Examples
(After a problem by Dr. Diane Evans, professor of mathematics at Rose-Hulman Institute of Technology)
Pat is required to sell candy bars to raise money for the 6th grade field trip. There are thirty houses in the
neighborhood, and Pat is not supposed to return home until five candy bars have been sold. So the child goes door to
door, selling candy bars. At each house, there is a 0.4 probability of selling one candy bar and a 0.6 probability of
selling nothing.
What's the probability mass function for selling the last candy bar at the nth house?
Recall that the NegBin(r, p) distribution describes the probability of k failures and r successes in k+r Bernoulli(p)
trials with success on the last trial. Selling five candy bars means getting five successes. The number of trials (i.e.
houses) this takes is therefore k+5 = n. The random variable we are interested in is the number of houses, so we
substitute k = n − 5 into a NegBin(5, 0.4) mass function and obtain the following mass function of the distribution of
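houses (for n ≥ 5):

$$ f(n) = \binom{(n-5)+5-1}{n-5}\,0.4^5\,0.6^{n-5} = \binom{n-1}{n-5}\,\frac{2^5\,3^{n-5}}{5^n}. $$

What's the probability that Pat finishes on the tenth house?

$$ f(10) = 0.1003290624. $$

What's the probability that Pat finishes on or before reaching the eighth house?
To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those
probabilities:

$$ f(5) = 0.01024, \quad f(6) = 0.03072, \quad f(7) = 0.055296, \quad f(8) = 0.0774144, $$
$$ \sum_{j=5}^{8} f(j) = 0.1736704. $$

What's the probability that Pat exhausts all 30 houses in the neighborhood?
This can be expressed as the probability that Pat does not finish on the fifth through the thirtieth house:

$$ 1 - \sum_{j=5}^{30} f(j) = 1 - I_{0.4}(5,\, 30-5+1) \approx 1 - 0.99849 = 0.00151. $$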
See also
• Coupon collector's problem
• Negative multinomial distribution

Further reading
• Hilbe, Joseph M., Negative Binomial Regression, Cambridge, UK: Cambridge University Press (2007) Negative
Binomial Regression - Cambridge University Press [3]

References
[1] McCullagh, Peter; Nelder, John (1989). Generalized Linear Models, Second Edition. Boca Raton: Chapman and Hall/CRC.
ISBN 0-412-31760-5.
[2] J. B. S. Haldane, "On a Method of Estimating Frequencies", Biometrika, Vol. 33, No. 3 (Nov., 1945), pp. 222–225. JSTOR 2332299
[3] http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521857727
Multivariate distributions

Multinomial distribution
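Multinomial

parameters: $n > 0$, number of trials (integer)
$p_1, \ldots, p_k$, event probabilities ($\sum_{i=1}^{k} p_i = 1$)
support: $X_i \in \{0, \ldots, n\}$, $\sum_{i=1}^{k} X_i = n$
pmf: $\dfrac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$
mean: $E(X_i) = n p_i$
variance: $\operatorname{Var}(X_i) = n p_i (1 - p_i)$; $\operatorname{Cov}(X_i, X_j) = -n p_i p_j$ for $i \ne j$
mgf: $\left(\sum_{i=1}^{k} p_i e^{t_i}\right)^{n}$
cf: $\left(\sum_{j=1}^{k} p_j e^{i t_j}\right)^{n}$, where $i^2 = -1$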

In probability theory, the multinomial distribution is a generalization of the binomial distribution.


The binomial distribution is the probability distribution of the number of "successes" in n independent Bernoulli
trials, with the same probability of "success" on each trial. In a multinomial distribution, the analog of the Bernoulli
distribution is the categorical distribution, where each trial results in exactly one of some fixed finite number k of
possible outcomes, with probabilities $p_1, \ldots, p_k$ (so that $p_i \ge 0$ for i = 1, ..., k and $\sum_{i=1}^{k} p_i = 1$), and there are n
independent trials. Then let the random variables $X_i$ indicate the number of times outcome number i was observed
over the n trials. The vector $X = (X_1, \ldots, X_k)$ follows a multinomial distribution with parameters n and p, where
$p = (p_1, \ldots, p_k)$.
Note that, in some fields, such as natural language processing, the categorical and multinomial distributions are
conflated, and it is common to speak of a "multinomial distribution" when a categorical distribution is actually
meant. This stems from the fact that it is sometimes convenient to express the outcome of a categorical distribution
as a "1-of-K" vector (a vector with one element containing a 1 and all other elements containing a 0) rather than as
an integer in the range $1 \ldots K$; in this form, a categorical distribution is equivalent to a multinomial distribution
over a single observation.
Specification

Probability mass function


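The probability mass function of the multinomial distribution is:

$$ f(x_1, \ldots, x_k;\, n, p_1, \ldots, p_k) = \Pr(X_1 = x_1 \text{ and } \ldots \text{ and } X_k = x_k)
 = \begin{cases} \dfrac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}, & \text{when } \sum_{i=1}^{k} x_i = n, \\[1ex] 0, & \text{otherwise,} \end{cases} $$

for non-negative integers $x_1, \ldots, x_k$.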

Properties
The expected number of times the outcome i was observed over n trials is

$$ E(X_i) = n p_i. $$

The covariance matrix is as follows. Each diagonal entry is the variance of a binomially distributed random variable,
and is therefore

$$ \operatorname{var}(X_i) = n p_i (1 - p_i). $$

The off-diagonal entries are the covariances:

$$ \operatorname{cov}(X_i, X_j) = -n p_i p_j $$

for i, j distinct.
All covariances are negative because for fixed n, an increase in one component of a multinomial vector requires a
decrease in another component.
This is a k × k positive-semidefinite matrix of rank k − 1.
The off-diagonal entries of the corresponding correlation matrix are

$$ \rho(X_i, X_j) = -\sqrt{\frac{p_i\, p_j}{(1-p_i)(1-p_j)}}. $$

Note that the sample size drops out of this expression.


Each of the k components separately has a binomial distribution with parameters n and pi, for the appropriate value
of the subscript i.
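The support of the multinomial distribution is the set

$$ \{(n_1, \ldots, n_k) \in \mathbb{N}^k \mid n_1 + \cdots + n_k = n\}. $$

Its number of elements is

$$ \binom{n+k-1}{k-1}, $$

the number of n-combinations of a multiset with k types, or multiset coefficient.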

Example
In a recent three-way election for a large country, candidate A received 20% of the votes, candidate B received 30%
of the votes, and candidate C received 50% of the votes. If six voters are selected randomly, what is the probability
that there will be exactly one supporter for candidate A, two supporters for candidate B and three supporters for
candidate C in the sample?
Note: Since we’re assuming that the voting population is large, it is reasonable and permissible to think of the
probabilities as unchanging once a voter is selected for the sample. Technically speaking this is sampling without
replacement, so the correct distribution is the multivariate hypergeometric distribution, but the distributions
converge as the population grows large.
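Under the multinomial model, the probability asked for in this example is n! / (x1! x2! x3!) · p1^x1 · p2^x2 · p3^x3 with n = 6. A minimal sketch of that calculation follows; the function name `multinomial_pmf` is an illustrative assumption.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(observing counts[i] outcomes of type i) in sum(counts) independent trials."""
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)  # exact: each partial quotient is an integer
    return coeff * prod(p ** x for p, x in zip(probs, counts))

# One supporter of A (20%), two of B (30%), three of C (50%) among six voters.
p_sample = multinomial_pmf([1, 2, 3], [0.2, 0.3, 0.5])
print(round(p_sample, 6))  # 60 * 0.2 * 0.09 * 0.125 = 0.135
```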
Sampling from a multinomial distribution


First, reorder the parameters $p_1, \ldots, p_k$ such that they are sorted in descending order (this is only to speed up
computation and not strictly necessary). Now, for each trial, draw an auxiliary variable X from a uniform (0, 1)
distribution. The resulting outcome is the component

$$ j = \min\left\{ j' \in \{1, \ldots, k\} : \left(\sum_{i=1}^{j'} p_i\right) - X \ge 0 \right\}. $$
This is a sample for the multinomial distribution with n = 1. A sum of independent repetitions of this experiment is a
sample from a multinomial distribution with n equal to the number of such repetitions.
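The procedure just described (pick the first component whose cumulative probability reaches the uniform draw, then repeat n times and count) can be sketched as follows. The function names, the 0-based indexing of outcomes and the fixed seed are illustrative assumptions; the sorting step is omitted because it only affects speed.

```python
import random
from itertools import accumulate

def sample_categorical(probs, rng):
    """Return the first index j whose cumulative probability is at least a uniform draw."""
    u = rng.random()
    for j, cumulative in enumerate(accumulate(probs)):
        if cumulative - u >= 0:
            return j
    return len(probs) - 1  # guard against rounding when the probabilities sum to slightly under 1

def sample_multinomial(n, probs, rng):
    """Sum n independent single-trial samples into a vector of counts."""
    counts = [0] * len(probs)
    for _ in range(n):
        counts[sample_categorical(probs, rng)] += 1
    return counts

rng = random.Random(0)  # seeded so the run is repeatable
counts = sample_multinomial(10_000, [0.5, 0.3, 0.2], rng)
print(counts)
```

With 10,000 trials the observed proportions should lie close to 0.5, 0.3 and 0.2.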

Related distributions
• When k = 2, the multinomial distribution is the binomial distribution.
• The continuous analogue is the multivariate normal distribution.
• Categorical distribution, the distribution of each trial; for k = 2, this is the Bernoulli distribution
• The Dirichlet distribution is the conjugate prior of the multinomial in Bayesian statistics.
• Multivariate Polya distribution
• Beta-binomial model

See also
• Multinomial theorem
• Negative multinomial distribution

References
Evans, Merran; Hastings, Nicholas; Peacock, Brian (2000). Statistical Distributions (3rd ed.). New York: Wiley.
pp. 134–136. ISBN 0-471-37124-6.
Multivariate normal distribution


Probability density function

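Multivariate (bivariate) Gaussian distribution centered at (1,3) with a standard deviation of 3 in roughly the
(0.878, 0.478) direction and of 1 in the orthogonal direction.
parameters: $\mu \in \mathbb{R}^k$ (location)
$\Sigma \in \mathbb{R}^{k \times k}$ (covariance, a nonnegative-definite matrix)
support: $x \in \mu + \operatorname{span}(\Sigma) \subseteq \mathbb{R}^k$
pdf: $(2\pi)^{-k/2}\,|\Sigma|^{-1/2}\, \exp\!\left(-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right)$
(pdf exists only for positive-definite Σ)
cdf: (no analytic expression)
mean: μ
mode: μ
variance: Σ
entropy: $\tfrac{1}{2}\ln\!\left((2\pi e)^k\,|\Sigma|\right)$
mgf: $\exp\!\left(\mu' t + \tfrac{1}{2}\, t'\Sigma t\right)$
cf: $\exp\!\left(i\mu' t - \tfrac{1}{2}\, t'\Sigma t\right)$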

In probability theory and statistics, the multivariate normal distribution or multivariate Gaussian distribution, is
a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. A random vector is
said to be multivariate normally distributed if every linear combination of its components has a univariate normal
distribution.

Notation and parametrization


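The multivariate normal distribution of a k-dimensional random vector $X = [X_1, X_2, \ldots, X_k]$ can be written in the
following notation:

$$ X \sim \mathcal{N}(\mu,\, \Sigma), $$

or to make it explicitly known that X is k-dimensional,

$$ X \sim \mathcal{N}_k(\mu,\, \Sigma), $$

with k-dimensional mean vector

$$ \mu = [\,E[X_1],\, E[X_2],\, \ldots,\, E[X_k]\,] $$

and k × k covariance matrix

$$ \Sigma = [\operatorname{Cov}[X_i, X_j]], \quad i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, k. $$
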
Definition
A random vector $X = (X_1, \ldots, X_k)'$ is said to have the multivariate normal distribution if it satisfies the following
equivalent conditions [1]:
• Every linear combination of its components $Y = a_1X_1 + \cdots + a_kX_k$ is normally distributed. That is, for any constant
vector $a \in \mathbb{R}^k$, the random variable $Y = a'X$ has a univariate normal distribution.
• There exists a random ℓ-vector Z, whose components are independent standard normal random variables, a
k-vector μ, and a k×ℓ matrix A, such that X = AZ + μ. Here ℓ is the rank of the covariance matrix Σ = AA′.
• There is a k-vector μ and a symmetric, nonnegative-definite k×k matrix Σ, such that the characteristic function of
X is

$$ \varphi_X(u;\, \mu, \Sigma) = \exp\!\left(i\mu' u - \tfrac{1}{2}\, u'\Sigma u\right). $$

• (Only in the case when the support of X is the entire space $\mathbb{R}^k$.) There exists a k-vector μ and a symmetric
positive-definite k×k matrix Σ, such that the probability density function of X can be expressed as

$$ f_X(x) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right), $$

where |Σ| is the determinant of Σ, and where $(2\pi)^{k/2}|\Sigma|^{1/2}$ could instead be written as $|2\pi\Sigma|^{1/2}$. This expression
reduces to the density of the univariate normal distribution if Σ is a scalar (i.e., a 1×1 matrix).
The covariance matrix is allowed to be singular (in which case the corresponding distribution has no density). This
case arises frequently in statistics; for example, in the distribution of the vector of residuals in the ordinary least
squares regression. Note also that the Xi are in general not independent; they can be seen as the result of applying the
matrix A to a collection of independent Gaussian variables Z.

Bivariate case
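In the 2-dimensional nonsingular case (k = rank(Σ) = 2), the probability density function of a vector [X Y]′ is

$$ f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}
 \exp\!\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right), $$

where ρ is the correlation between X and Y. In this case,

$$ \mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \qquad
 \Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}. $$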

In the bivariate case, we also have a theorem that makes the first equivalent condition for multivariate normality less
restrictive: it is sufficient to verify that countably many distinct linear combinations of X and Y are normal in order
to conclude that the vector [X Y]′ is bivariate normal.[2]

Properties

Cumulative distribution function


The cumulative distribution function (cdf) F(x) of a random vector X is defined as the probability that all
components of X are less than or equal to the corresponding values in the vector x. Though there is no closed form
for F(x), there are a number of algorithms that estimate it numerically. For example, see MVNDST under [3]
(includes FORTRAN code) or [4] (includes MATLAB code).

Normally distributed and independent


If X and Y are normally distributed and independent, this implies they are "jointly normally distributed", i.e., the pair
(X, Y) must have bivariate normal distribution. However, a pair of jointly normally distributed variables need not be
independent.
Two normally distributed random variables need not be jointly bivariate normal
The fact that two random variables X and Y both have a normal distribution does not imply that the pair (X, Y) has a
joint normal distribution. A simple example is one in which X has a normal distribution with expected value 0 and
variance 1, and Y = X if |X| > c and Y = −X if |X| < c, where c is about 1.54. There are similar counterexamples for
more than two random variables.
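One way to see that the pair in this counterexample is not jointly normal is to look at the linear combination X + Y: it equals exactly zero whenever |X| ≤ c, so it has a point mass at zero and cannot be normally distributed. The simulation below is a minimal sketch of that observation; the function name `sample_pair`, the sample size and the seed are illustrative assumptions.

```python
import random

def sample_pair(c, rng):
    """Draw X ~ N(0, 1) and set Y = X when |X| > c, otherwise Y = -X."""
    x = rng.gauss(0.0, 1.0)
    y = x if abs(x) > c else -x
    return x, y

rng = random.Random(1)
c = 1.54
pairs = [sample_pair(c, rng) for _ in range(20_000)]

# Fraction of draws in which the linear combination X + Y is exactly zero.
zero_fraction = sum(1 for x, y in pairs if x + y == 0.0) / len(pairs)
print(zero_fraction)  # close to P(|X| <= 1.54), roughly 0.88
```

For a jointly normal pair, every linear combination would itself be normal, so a non-constant combination could not place positive probability on a single value.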

Conditional distributions
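If μ and Σ are partitioned as follows

$$ \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} q \times 1 \\ (k-q) \times 1 \end{bmatrix}, $$

$$ \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \quad \text{with sizes} \quad
 \begin{bmatrix} q \times q & q \times (k-q) \\ (k-q) \times q & (k-q) \times (k-q) \end{bmatrix}, $$

then the distribution of $x_1$ conditional on $x_2 = a$ is multivariate normal, $(x_1 \mid x_2 = a) \sim \mathcal{N}(\bar{\mu}, \bar{\Sigma})$, where

$$ \bar{\mu} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2) $$

and covariance matrix

$$ \bar{\Sigma} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. $$

This matrix is the Schur complement of $\Sigma_{22}$ in Σ. This means that to calculate the conditional covariance matrix, one
inverts the overall covariance matrix, drops the rows and columns corresponding to the variables being conditioned
upon, and then inverts back to get the conditional covariance matrix.
Note that knowing that $x_2 = a$ alters the variance, though the new variance does not depend on the specific value of a;
perhaps more surprisingly, the mean is shifted by $\Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$; compare this with the situation of not
knowing the value of a, in which case $x_1$ would have distribution $\mathcal{N}_q(\mu_1, \Sigma_{11})$.
The matrix $\Sigma_{12}\Sigma_{22}^{-1}$ is known as the matrix of regression coefficients.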
In the bivariate case the conditional distribution of Y given X is
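The conditioning rule can be sketched numerically. The code below assumes the standard expressions for the conditional mean, μ1 + Σ12 Σ22⁻¹ (a − μ2), and covariance, Σ11 − Σ12 Σ22⁻¹ Σ21; the function name `conditional_mvn` and the example numbers are hypothetical, chosen only for illustration. It also checks the "invert, drop rows and columns, invert back" description of the Schur complement.

```python
import numpy as np

def conditional_mvn(mu, sigma, idx1, idx2, a):
    """Mean and covariance of X[idx1] given X[idx2] = a (illustrative helper)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s11 = sigma[np.ix_(idx1, idx1)]
    s12 = sigma[np.ix_(idx1, idx2)]
    s22 = sigma[np.ix_(idx2, idx2)]
    # Regression coefficients Sigma12 Sigma22^{-1}, computed with a linear
    # solve instead of an explicit inverse (Sigma22 is symmetric).
    coef = np.linalg.solve(s22, s12.T).T
    mean = mu[idx1] + coef @ (np.asarray(a, dtype=float) - mu[idx2])
    cov = s11 - coef @ s12.T
    return mean, cov

mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])
cond_mean, cond_cov = conditional_mvn(mu, sigma, [0, 1], [2], [4.0])

# Same covariance via the precision matrix: invert, drop, invert back.
precision = np.linalg.inv(sigma)
cov_via_precision = np.linalg.inv(precision[np.ix_([0, 1], [0, 1])])

print(cond_mean)
print(cond_cov)
```

Note that the conditional covariance does not involve the observed value a, in line with the remark above.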

Bivariate conditional expectation


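In the case
$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$
then
$$E(X_1 \mid X_2 > z) = \rho\,\frac{\phi(z)}{\Phi(-z)},$$
where φ and Φ are the standard normal density and cumulative distribution function; this latter ratio is often called the inverse Mills ratio.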

Marginal distributions
To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to drop the
irrelevant variables (the variables that one wants to marginalize out) from the mean vector and the covariance matrix.
The proof for this follows from the definitions of multivariate normal distributions and some advanced linear algebra.[5]
Example
Let X = [X1, X2, X3] be multivariate normal random variables with mean vector μ = [μ1, μ2, μ3] and covariance matrix Σ
(standard parametrization for the multivariate normal distribution). Then the joint distribution of X′ = [X1, X3] is
multivariate normal with mean vector μ′ = [μ1, μ3] and covariance matrix
$$\Sigma' = \begin{bmatrix} \Sigma_{11} & \Sigma_{13} \\ \Sigma_{31} & \Sigma_{33} \end{bmatrix}.$$

Affine transformation
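If Y = c + BX is an affine transformation of $X \sim N(\mu, \Sigma)$, where c is an M × 1 vector of constants and B is a
constant M × N matrix, then Y has a multivariate normal distribution with expected value c + Bμ and variance
$B\Sigma B^T$, i.e., $Y \sim N(c + B\mu, B\Sigma B^T)$. In particular, any subset of the Xi has a marginal distribution that is also
multivariate normal. To see this, consider the following example: to extract the subset $(X_1, X_2, X_4)^T$, use
$$B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & 0 & \cdots & 0 \end{bmatrix},$$
which extracts the desired elements directly.

Another corollary is that the distribution of Z = b · X, where b is a constant vector of the same length as X and the dot
indicates the dot (inner) product, is univariate Gaussian with $Z \sim N(b \cdot \mu,\ b^T \Sigma b)$. This result follows by using
$$B = \begin{bmatrix} b_1 & b_2 & \cdots & b_N \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$
and considering only the first component of the product (the first row of B is the vector b). Observe how the
positive-definiteness of Σ implies that the variance of the dot product must be positive for any nonzero b.
An affine transformation of X such as 2X is not the same as the sum of two independent realisations of X: 2X has
covariance 4Σ, whereas the sum of two independent realisations has covariance 2Σ.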
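The affine rule can be illustrated with a short NumPy sketch. It assumes the standard result that Y = c + BX has mean c + Bμ and covariance BΣBᵀ; the helper name `affine_mvn` and the example matrices are hypothetical and not taken from the source.

```python
import numpy as np

def affine_mvn(mu, sigma, B, c):
    """Mean and covariance of Y = c + B X when X ~ N(mu, sigma) (illustrative helper)."""
    B = np.asarray(B, dtype=float)
    mean = np.asarray(c, dtype=float) + B @ np.asarray(mu, dtype=float)
    cov = B @ np.asarray(sigma, dtype=float) @ B.T
    return mean, cov

mu = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
# Build a positive-definite covariance as L L^T with an invertible lower-triangular L.
L = np.tril(np.arange(1.0, 26.0).reshape(5, 5)) / 10.0
sigma = L @ L.T

# Selection matrix extracting (X1, X2, X4): rows 0, 1 and 3 of the identity.
keep = [0, 1, 3]
B = np.eye(5)[keep]
mean_sub, cov_sub = affine_mvn(mu, sigma, B, np.zeros(3))

# Dot product Z = b . X, written as a 1 x N matrix acting on X.
b = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
mean_z, var_z = affine_mvn(mu, sigma, b[None, :], np.zeros(1))

# 2X has covariance 4 * sigma, unlike the sum of two independent copies (2 * sigma).
_, cov_2x = affine_mvn(mu, sigma, 2.0 * np.eye(5), np.zeros(5))

print(mean_sub)
print(var_z[0, 0])
```

The selection matrix reproduces the "drop the irrelevant rows and columns" recipe for marginal distributions as a special case of the affine rule.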

Geometric interpretation
The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations
of hyperspheres) centered at the mean.[6] The directions of the principal axes of the ellipsoids are given by the
eigenvectors of the covariance matrix Σ. The squared relative lengths of the principal axes are given by the
corresponding eigenvalues.
If $\Sigma = U\Lambda U^T = U\Lambda^{1/2}(U\Lambda^{1/2})^T$ is an eigendecomposition where the columns of U are unit eigenvectors and Λ is a
diagonal matrix of the eigenvalues, then we have
$$X \sim N(\mu, \Sigma) \iff X \sim \mu + U\Lambda^{1/2}N(0, I) \iff X \sim \mu + U N(0, \Lambda).$$
Moreover, U can be chosen to be a rotation matrix, as inverting an axis does not have any effect on N(0, Λ), but
inverting a column changes the sign of U's determinant. The distribution N(μ, Σ) is in effect N(0, I) scaled by $\Lambda^{1/2}$,
rotated by U and translated by μ.
Conversely, any choice of μ, full rank matrix U, and positive diagonal entries $\Lambda_i$ yields a non-singular multivariate
normal distribution. If any $\Lambda_i$ is zero and U is square, the resulting covariance matrix $U\Lambda U^T$ is singular.
Geometrically this means that every contour ellipsoid is infinitely thin and has zero volume in n-dimensional space,
as at least one of the principal axes has length of zero.
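The scale-rotate-translate picture can be made concrete with an eigendecomposition. This is a minimal sketch using `numpy.linalg.eigh`; the 2 × 2 covariance matrix is an arbitrary illustrative choice, and flipping a column is just one possible way to turn U into a proper rotation.

```python
import numpy as np

sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])          # illustrative positive-definite covariance

# Columns of U are unit eigenvectors; eigvals holds the diagonal of Lambda.
eigvals, U = np.linalg.eigh(sigma)

# Flipping the sign of one column keeps U Lambda U^T unchanged but makes det(U) = +1.
if np.linalg.det(U) < 0:
    U[:, 0] = -U[:, 0]

# A = U Lambda^{1/2} satisfies A A^T = Sigma, so mu + A z with z ~ N(0, I) has covariance Sigma.
A = U @ np.diag(np.sqrt(eigvals))

# Relative lengths of the principal semi-axes of the equidensity ellipses.
semi_axes = np.sqrt(eigvals)

print(U)
print(semi_axes)
```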

Correlations and independence


In general, random variables may be uncorrelated but highly dependent. But if a random vector has a multivariate
normal distribution then any two or more of its components that are uncorrelated are independent. This implies that
any two or more of its components that are pairwise independent are independent.
But it is not true that two random variables that are (separately, marginally) normally distributed and uncorrelated
are independent. Two random variables that are normally distributed may fail to be jointly normally distributed, i.e.,
the vector whose components they are may fail to have a multivariate normal distribution. For an example of two
normally distributed random variables that are uncorrelated but not independent, see normally distributed and
uncorrelated does not imply independent.

Higher moments
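The kth-order moments of X are defined by
$$\mu_{1,\dots,N}(X) \ \stackrel{\mathrm{def}}{=}\ \mu_{r_1,\dots,r_N}(X) \ \stackrel{\mathrm{def}}{=}\ E\left[\prod_{j=1}^{N} X_j^{r_j}\right],$$
where r1 + r2 + ⋯ + rN = k.
The kth-order central moments are given as follows:
(a) If k is odd, $\mu_{1,\dots,N}(X - \mu) = 0$.
(b) If k is even with k = 2λ, then
$$\mu_{1,\dots,2\lambda}(X - \mu) = \sum \left(\sigma_{ij}\sigma_{k\ell}\cdots\sigma_{XZ}\right),$$
where the sum is taken over all allocations of the set {1, …, 2λ} into λ (unordered) pairs. That is, for a
kth (= 2λ = 6) central moment, one sums the products of λ = 3 covariances (the −μ notation has been
dropped in the interests of parsimony):
$$\begin{aligned}
E[X_1X_2X_3X_4X_5X_6] = {} & E[X_1X_2]E[X_3X_4]E[X_5X_6] + E[X_1X_2]E[X_3X_5]E[X_4X_6] + E[X_1X_2]E[X_3X_6]E[X_4X_5] \\
& + E[X_1X_3]E[X_2X_4]E[X_5X_6] + E[X_1X_3]E[X_2X_5]E[X_4X_6] + E[X_1X_3]E[X_2X_6]E[X_4X_5] \\
& + E[X_1X_4]E[X_2X_3]E[X_5X_6] + E[X_1X_4]E[X_2X_5]E[X_3X_6] + E[X_1X_4]E[X_2X_6]E[X_3X_5] \\
& + E[X_1X_5]E[X_2X_3]E[X_4X_6] + E[X_1X_5]E[X_2X_4]E[X_3X_6] + E[X_1X_5]E[X_2X_6]E[X_3X_4] \\
& + E[X_1X_6]E[X_2X_3]E[X_4X_5] + E[X_1X_6]E[X_2X_4]E[X_3X_5] + E[X_1X_6]E[X_2X_5]E[X_3X_4].
\end{aligned}$$
This yields $(2\lambda - 1)!/(2^{\lambda-1}(\lambda-1)!)$ terms in the sum (15 in the above case), each being the product of λ (in
this case 3) covariances. For fourth-order moments (four variables) there are three terms. For sixth-order moments
there are 3 × 5 = 15 terms, and for eighth-order moments there are 3 × 5 × 7 = 105 terms.
The covariances are then determined by replacing the terms of the list [1, …, 2λ] by the corresponding terms of
the list consisting of r1 ones, then r2 twos, etc. To illustrate this, examine the following 4th-order central moment
case:
$$\begin{aligned}
E[X_i^4] &= 3\sigma_{ii}^2 \\
E[X_i^3 X_j] &= 3\sigma_{ii}\sigma_{ij} \\
E[X_i^2 X_j^2] &= \sigma_{ii}\sigma_{jj} + 2\sigma_{ij}^2 \\
E[X_i^2 X_j X_k] &= \sigma_{ii}\sigma_{jk} + 2\sigma_{ij}\sigma_{ik} \\
E[X_i X_j X_k X_n] &= \sigma_{ij}\sigma_{kn} + \sigma_{ik}\sigma_{jn} + \sigma_{in}\sigma_{jk},
\end{aligned}$$
where σij is the covariance of Xi and Xj. The idea with the above method is to first find the general case for a kth
moment with k different X variables, $E[X_i X_j X_k X_n]$, and then to simplify it accordingly.
For example, to obtain $E[X_i^2 X_k X_n]$, simply let Xi = Xj and note that $\sigma_{ii} = \sigma_i^2$.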
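The pairing rule lends itself to a direct implementation. The sketch below enumerates all ways of splitting the index positions into unordered pairs and sums the products of the corresponding covariances; the function names `pairings` and `central_moment` and the 2 × 2 covariance matrix are hypothetical, introduced only for illustration.

```python
import numpy as np

def pairings(items):
    """Yield every way of splitting the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest)):
        partner = rest[k]
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def central_moment(sigma, indices):
    """E[prod_j (X_{i_j} - mu_{i_j})] for zero-based `indices` (repeats allowed)."""
    if len(indices) % 2 == 1:
        return 0.0                       # odd-order central moments vanish
    positions = list(range(len(indices)))
    return sum(
        np.prod([sigma[indices[a], indices[b]] for a, b in pairing])
        for pairing in pairings(positions)
    )

sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

n_terms_6 = sum(1 for _ in pairings(list(range(6))))   # number of pairings of 6 items
m_iiii = central_moment(sigma, [0, 0, 0, 0])           # 3 * sigma_ii^2
m_iijj = central_moment(sigma, [0, 0, 1, 1])           # sigma_ii sigma_jj + 2 sigma_ij^2
m_iiij = central_moment(sigma, [0, 0, 0, 1])           # 3 * sigma_ii sigma_ij

print(n_terms_6, m_iiii, m_iijj, m_iiij)
```

Repeated indices are handled by pairing positions rather than variables, which mirrors the substitution of "r1 ones, then r2 twos" described above.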

Kullback–Leibler divergence
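The Kullback–Leibler divergence from $N_0(\mu_0, \Sigma_0)$ to $N_1(\mu_1, \Sigma_1)$, for non-singular matrices Σ0 and Σ1, is:[7]
$$D_{\mathrm{KL}}(N_0 \,\|\, N_1) = \frac{1}{2}\left( \log_e\left(\frac{\det \Sigma_1}{\det \Sigma_0}\right) + \operatorname{tr}\left(\Sigma_1^{-1}\Sigma_0\right) + (\mu_1 - \mu_0)^T \Sigma_1^{-1} (\mu_1 - \mu_0) - N \right),$$
where N is the dimension of the distributions.
The logarithm must be taken to base e since the two terms following the logarithm are themselves base-e logarithms
of expressions that are either factors of the density function or otherwise arise naturally. The equation therefore gives
a result measured in nats. Dividing the entire expression above by loge 2 yields the divergence in bits.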
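As a rough illustration, the divergence between two multivariate normals can be computed with NumPy as follows. The sketch assumes the standard closed form ½[ln(det Σ1 / det Σ0) + tr(Σ1⁻¹Σ0) + (μ1 − μ0)ᵀΣ1⁻¹(μ1 − μ0) − N]; the function name `kl_mvn` and the test values are illustrative.

```python
import numpy as np

def kl_mvn(mu0, sigma0, mu1, sigma1):
    """KL divergence from N(mu0, sigma0) to N(mu1, sigma1), in nats (illustrative helper)."""
    mu0, mu1 = np.asarray(mu0, dtype=float), np.asarray(mu1, dtype=float)
    sigma0, sigma1 = np.asarray(sigma0, dtype=float), np.asarray(sigma1, dtype=float)
    n = mu0.shape[0]
    diff = mu1 - mu0
    # slogdet avoids overflow or underflow in the determinants.
    _, logdet0 = np.linalg.slogdet(sigma0)
    _, logdet1 = np.linalg.slogdet(sigma1)
    trace_term = np.trace(np.linalg.solve(sigma1, sigma0))   # tr(Sigma1^{-1} Sigma0)
    maha = diff @ np.linalg.solve(sigma1, diff)              # Mahalanobis-type term
    return 0.5 * (logdet1 - logdet0 + trace_term + maha - n)

# Univariate check: N(0, 1) against N(1, 4).
kl_nats = kl_mvn([0.0], [[1.0]], [1.0], [[4.0]])
kl_bits = kl_nats / np.log(2.0)

print(kl_nats, kl_bits)
```

For the univariate check the result agrees with the familiar scalar expression ln(σ1/σ0) + (σ0² + (μ0 − μ1)²)/(2σ1²) − ½.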

Estimation of parameters
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is
perhaps surprisingly subtle and elegant. See estimation of covariance matrices.
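In short, the probability density function (pdf) of an N-dimensional multivariate normal is
$$f(x) = (2\pi)^{-N/2} \det(\Sigma)^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
and the ML estimator of the covariance matrix from a sample of n observations is
$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)(X_i - \bar X)^T,$$
where $\bar X$ is the sample mean; this is simply the sample covariance matrix. It is a biased estimator whose expectation is
$$E[\hat\Sigma] = \frac{n-1}{n}\Sigma.$$
An unbiased sample covariance is
$$\hat\Sigma = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)(X_i - \bar X)^T.$$
The Fisher information matrix for estimating the parameters of a multivariate normal distribution has a closed form
expression. This can be used, for example, to compute the Cramér–Rao bound for parameter estimation in this
setting. See Fisher information (section on the multivariate normal distribution) for more details.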
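The two covariance estimators differ only in their normalisation, which a short NumPy sketch makes explicit. The true parameters, sample size and seed below are illustrative assumptions; `numpy.cov` is used only as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(1)                     # illustrative seed
true_sigma = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
x = rng.multivariate_normal(mean=[1.0, -1.0], cov=true_sigma, size=500)

n = x.shape[0]
centered = x - x.mean(axis=0)                      # subtract the sample mean

sigma_ml = centered.T @ centered / n               # maximum-likelihood estimate (biased)
sigma_unbiased = centered.T @ centered / (n - 1)   # unbiased sample covariance

print(sigma_ml)
print(sigma_unbiased)
```

Here `np.cov(x, rowvar=False, bias=True)` corresponds to the ML estimate and `np.cov(x, rowvar=False)` to the unbiased one.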

Entropy
The differential entropy of the multivariate normal distribution is [8]
$$h(f) = \frac{1}{2}\ln\left((2\pi e)^N \,|\Sigma|\right),$$
where N is the dimension and $|\Sigma|$ is the determinant of the covariance matrix Σ.
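A minimal sketch of the entropy computation, assuming the closed form ½ ln((2πe)^N |Σ|) in nats; the function name `mvn_entropy` is hypothetical.

```python
import math
import numpy as np

def mvn_entropy(sigma):
    """Differential entropy (in nats) of N(mu, sigma); it does not depend on mu."""
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    n = sigma.shape[0]
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("sigma must be positive definite")
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) + logdet)

h_standard = mvn_entropy([[1.0]])                 # entropy of a standard normal
h_diag = mvn_entropy(np.diag([2.0, 3.0]))         # independent components

print(h_standard, h_diag)
```

For a diagonal covariance matrix the entropy is the sum of the univariate entropies, which gives a simple sanity check.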


Multivariate normality tests


Multivariate normality tests check a given set of data for similarity to the multivariate normal distribution. The null
hypothesis is that the data set is similar to the normal distribution, therefore a sufficiently small p-value indicates
non-normal data. Multivariate normality tests include the Cox-Small test [9] and Smith and Jain's adaptation [10] of
the Friedman-Rafsky test.[11]

Drawing values from the distribution


A widely used method for drawing a random vector X from the N-dimensional multivariate normal distribution with
mean vector μ and covariance matrix Σ (required to be symmetric and positive-definite) works as follows:
1. Find any matrix A such that $AA^T = \Sigma$. Often this is a Cholesky decomposition, though a square root of Σ would
also suffice.
2. Let $Z = (z_1, \dots, z_N)^T$ be a vector whose components are N independent standard normal variates (which can be
generated, for example, by using the Box-Muller transform).
3. Let X be μ + AZ. This has the desired distribution due to the affine transformation property.
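The three steps above can be sketched with NumPy as follows. This is an illustration rather than a reference implementation: it uses `numpy.linalg.cholesky` for step 1 and NumPy's own standard normal generator in place of the Box-Muller transform for step 2; the function name `sample_mvn` and the example parameters are assumptions.

```python
import numpy as np

def sample_mvn(mu, sigma, size, rng):
    """Draw `size` vectors from N(mu, sigma) via X = mu + A Z (illustrative helper)."""
    mu = np.asarray(mu, dtype=float)
    # Step 1: lower-triangular Cholesky factor A with A A^T = Sigma.
    A = np.linalg.cholesky(np.asarray(sigma, dtype=float))
    # Step 2: each row of Z holds N independent standard normal variates.
    Z = rng.standard_normal((size, mu.shape[0]))
    # Step 3: each row of the result is mu + A z.
    return mu + Z @ A.T

rng = np.random.default_rng(42)                   # illustrative seed
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
draws = sample_mvn(mu, sigma, 100_000, rng)

print(draws.mean(axis=0))
print(np.cov(draws, rowvar=False))
```

With a large sample, the empirical mean and covariance should be close to μ and Σ.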

See also
• Chi distribution, the pdf of the 2-norm (or Euclidean norm) of a multivariate normally-distributed vector.

References
[1] Gut, Allan: An Intermediate Course in Probability, 2009, chapter 5
[2] Hamedani & Tata (1975)
[3] http://www.math.wsu.edu/faculty/genz/software/software.html
[4] http://alex.strashny.org/a/Multivariate-normal-cumulative-distribution-function-(cdf)-in-MATLAB.html
[5] The formal proof for marginal distribution is shown here: http://fourier.eng.hmc.edu/e161/lectures/gaussianprocess/node7.html
[6] Nikolaus Hansen. "The CMA Evolution Strategy: A Tutorial" (http://www.bionik.tu-berlin.de/user/niko/cmatutorial.pdf) (PDF).
[7] Penny & Roberts, PARG-00-12, (2000) (http://www.allisons.org/ll/MML/KL/Normal). p. 18
[8] Gokhale, DV; NA Ahmed, BC Res, NJ Piscataway (May 1989). "Entropy Expressions and Their Estimators for Multivariate Distributions".
Information Theory, IEEE Transactions on 35 (3): 688–692. doi:10.1109/18.30996.
[9] Cox, D. R.; N. J. H. Small (August 1978). "Testing multivariate normality". Biometrika 65 (2): 263–272. doi:10.1093/biomet/65.2.263.
[10] Smith, Stephen P.; Anil K. Jain (September 1988). "A test to determine the multivariate normality of a dataset". IEEE Transactions on
Pattern Analysis and Machine Intelligence 10 (5): 757–761. doi:10.1109/34.6789.
[11] Friedman, J. H. and Rafsky, L. C. (1979) "Multivariate generalizations of the Wald-Wolfowitz and Smirnov two sample tests". Annals of
Statistics, 7, 697–717.

Literature
Hamedani, G. G.; Tata, M. N. (1975). "On the determination of the bivariate normal distribution from distributions
of linear combinations of the variables" (http://jstor.org/stable/2318494). The American Mathematical Monthly
82 (9): 913–915. doi:10.2307/2318494. JSTOR 2318494.
Wishart distribution 186

Wishart distribution
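Wishart

parameters: n > p − 1 degrees of freedom (real)
            V > 0 scale matrix (p × p, pos. def.)
support: W, a p × p positive definite matrix
pdf: $\dfrac{|W|^{(n-p-1)/2} \exp\left(-\operatorname{tr}(V^{-1}W)/2\right)}{2^{np/2}\,|V|^{n/2}\,\Gamma_p(n/2)}$
mean: nV
mode: (n − p − 1)V for n ≥ p + 1
variance: $\operatorname{Var}(W_{ij}) = n\,(v_{ij}^2 + v_{ii}v_{jj})$
cf: $\Theta \mapsto \left|I - 2i\,\Theta V\right|^{-n/2}$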

In statistics, the Wishart distribution is a generalization to multiple dimensions of the chi-square distribution, or, in
the case of non-integer degrees of freedom, of the gamma distribution. It is named in honor of John Wishart, who
first formulated the distribution in 1928.[1]
It is any of a family of probability distributions defined over symmetric, nonnegative-definite matrix-valued random
variables ("random matrices"). These distributions are of great importance in the estimation of covariance matrices in
multivariate statistics. In Bayesian inference, the Wishart distribution is of particular importance, as it is the
conjugate prior of the inverse of the covariance matrix (the precision matrix) of a multivariate normal distribution.

Definition
Suppose X is an n × p matrix, each row of which is independently drawn from a p-variate normal distribution with zero
mean:
$$X_{(i)} = (x_i^1, \dots, x_i^p)^T \sim N_p(0, V).$$
Then the Wishart distribution is the probability distribution of the p×p random matrix
$$S = X^T X,$$
known as the scatter matrix. One indicates that S has that probability distribution by writing
$$S \sim W_p(V, n).$$
The positive integer n is the number of degrees of freedom. Sometimes this is written W(V, p, n). For n ≥ p the
matrix S is invertible with probability 1 if V is invertible.
If p = 1 and V = 1 then this distribution is a chi-square distribution with n degrees of freedom.
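The definition translates directly into a simulation: draw the rows of X from a zero-mean normal distribution and form the scatter matrix XᵀX. The sketch below does this for many replicates and compares the average with nV, the known mean of the Wishart distribution; the scale matrix, degrees of freedom, number of replicates and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)                    # illustrative seed
V = np.array([[1.0, 0.3],
              [0.3, 0.5]])
n, p, reps = 5, 2, 20_000

# X has shape (reps, n, p): for each replicate, n independent rows from N_p(0, V).
X = rng.multivariate_normal(np.zeros(p), V, size=(reps, n))

# Scatter matrix S = X^T X for every replicate, shape (reps, p, p).
S = np.einsum('rni,rnj->rij', X, X)

mean_S = S.mean(axis=0)                           # should be close to n * V
all_invertible = bool(np.all(np.linalg.det(S) > 0))

print(mean_S)
print(all_invertible)
```

Because n ≥ p and V is invertible here, every simulated scatter matrix is expected to be invertible.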

Occurrence
The Wishart distribution arises as the distribution of the sample covariance matrix for a sample from a multivariate
normal distribution. It occurs frequently in likelihood-ratio tests in multivariate statistical analysis. It also arises in
the spectral theory of random matrices and in multidimensional Bayesian analysis.

Probability density function


The Wishart distribution can be characterized by its probability density function, as follows.
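Let W be a p × p symmetric matrix of random variables that is positive definite. Let V be a (fixed) positive definite
matrix of size p × p.
Then, if n ≥ p, W has a Wishart distribution with n degrees of freedom if it has a probability density function given
by
$$f(W) = \frac{|W|^{(n-p-1)/2}}{2^{np/2}\,|V|^{n/2}\,\Gamma_p(n/2)} \exp\left(-\frac{1}{2}\operatorname{tr}\left(V^{-1}W\right)\right),$$
where Γp(·) is the multivariate gamma function defined as
$$\Gamma_p(n/2) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left(\frac{n}{2} + \frac{1-j}{2}\right).$$
In fact the above definition can be extended to any real n > p − 1. If n is an integer with n ≤ p − 1, then the Wishart no longer has a
density; instead it represents a singular distribution.[2]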
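The density can be evaluated on the log scale to avoid overflow in the gamma functions. The sketch below assumes the standard Wishart density and multivariate gamma function, with Γp(a) = π^{p(p−1)/4} ∏ Γ(a + (1 − j)/2); the function names `log_multigamma` and `wishart_logpdf` are hypothetical. For p = 1 and V = 1 the result should reduce to the chi-square density.

```python
import math
import numpy as np

def log_multigamma(a, p):
    """log Gamma_p(a) = p(p-1)/4 * log(pi) + sum_{j=1}^{p} log Gamma(a + (1 - j)/2)."""
    return p * (p - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2.0) for j in range(1, p + 1)
    )

def wishart_logpdf(W, V, n):
    """Log-density of W ~ W_p(V, n) for real n > p - 1 (illustrative helper)."""
    W = np.atleast_2d(W).astype(float)
    V = np.atleast_2d(V).astype(float)
    p = V.shape[0]
    _, logdet_w = np.linalg.slogdet(W)
    _, logdet_v = np.linalg.slogdet(V)
    return (
        0.5 * (n - p - 1) * logdet_w
        - 0.5 * np.trace(np.linalg.solve(V, W))   # tr(V^{-1} W) / 2
        - 0.5 * n * p * math.log(2.0)
        - 0.5 * n * logdet_v
        - log_multigamma(0.5 * n, p)
    )

# p = 1, V = 1, n = 2: chi-square density with 2 degrees of freedom, exp(-w/2) / 2.
density_at_1 = math.exp(wishart_logpdf([[1.0]], [[1.0]], 2))
print(density_at_1)
```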

Characteristic function
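The characteristic function of the Wishart distribution is
$$\Theta \mapsto \left|I - 2i\,\Theta V\right|^{-n/2}.$$
In other words,
$$\Theta \mapsto E\left\{\exp\left[i \cdot \operatorname{tr}(W\Theta)\right]\right\} = \left|I - 2i\,\Theta V\right|^{-n/2},$$
where E denotes expectation. (Here Θ and I are matrices the same size as V (I is the identity matrix); and
i is the square root of −1).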

Theorem
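If a p × p random matrix W has a Wishart distribution with m degrees of freedom and variance matrix V (written $W \sim W_p(V, m)$) and
C is a q × p matrix of rank q, then
$$C W C^T \sim W_q\left(C V C^T, m\right).$$

Corollary 1
If z is a nonzero p × 1 constant vector, then $z^T W z \sim \sigma_z^2 \chi_m^2$.
In this case, $\chi_m^2$ is the chi-square distribution and $\sigma_z^2 = z^T V z$ (note that $\sigma_z^2$ is a constant; it is positive because
V is positive definite).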

Corollary 2
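Consider the case where $z^T = (0, \dots, 0, 1, 0, \dots, 0)$ (that is, the jth element is one and all others zero). Then
corollary 1 above shows that
$$w_{jj} \sim \sigma_{jj}\,\chi_m^2,$$
where $\sigma_{jj}$ is the jth diagonal element of V,
gives the marginal distribution of each of the elements on the matrix's diagonal.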
Noted statistician George Seber points out that the Wishart distribution is not called the "multivariate chi-square
distribution" because the marginal distribution of the off-diagonal elements is not chi-square. Seber prefers to reserve
the term multivariate for the case when all univariate marginals belong to the same family.

Estimator of the multivariate normal distribution


The Wishart distribution is the sampling distribution of the maximum-likelihood estimator (MLE) of the covariance
matrix of a multivariate normal distribution. The derivation of the MLE is perhaps surprisingly subtle and elegant. It
involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a 1×1 matrix than as
a mere scalar. See estimation of covariance matrices.

Bartlett decomposition
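The Bartlett decomposition of a matrix W from a p-variate Wishart distribution with scale matrix V and n degrees
of freedom is the factorization:
$$W = L A A^T L^T,$$
where L is the Cholesky factor of V, and:
$$A = \begin{pmatrix} \sqrt{c_1} & 0 & 0 & \cdots & 0 \\ n_{21} & \sqrt{c_2} & 0 & \cdots & 0 \\ n_{31} & n_{32} & \sqrt{c_3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n_{p1} & n_{p2} & n_{p3} & \cdots & \sqrt{c_p} \end{pmatrix},$$
where $c_i \sim \chi^2_{n-i+1}$ and $n_{ij} \sim N(0, 1)$ independently. This provides a useful method for obtaining random
samples from a Wishart distribution.[3]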
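A sampler based on the Bartlett decomposition can be sketched as follows. It assumes the usual form of the factor A: lower triangular, with the square roots of independent chi-square variates (n, n − 1, …, n − p + 1 degrees of freedom) on the diagonal and independent standard normals below it. The function name `sample_wishart_bartlett`, the parameters and the seed are illustrative and not from the source.

```python
import numpy as np

def sample_wishart_bartlett(V, n, rng):
    """One draw from W_p(V, n) via W = L A A^T L^T (illustrative helper, needs n > p - 1)."""
    V = np.asarray(V, dtype=float)
    p = V.shape[0]
    L = np.linalg.cholesky(V)                     # lower-triangular factor, L L^T = V
    A = np.zeros((p, p))
    # Diagonal: sqrt(c_i) with c_i ~ chi-square(n - i + 1), i = 1, ..., p.
    dof = n - np.arange(p)
    A[np.diag_indices(p)] = np.sqrt(rng.chisquare(dof))
    # Strictly lower triangle: independent standard normal entries.
    rows, cols = np.tril_indices(p, k=-1)
    A[rows, cols] = rng.standard_normal(rows.size)
    LA = L @ A
    return LA @ LA.T

rng = np.random.default_rng(3)                    # illustrative seed
V = np.array([[1.0, 0.3],
              [0.3, 0.5]])
n = 5
draws = np.array([sample_wishart_bartlett(V, n, rng) for _ in range(5000)])
mean_W = draws.mean(axis=0)                       # should be close to n * V

print(mean_W)
```

Only p chi-square variates and p(p − 1)/2 normal variates are needed per draw, instead of the n × p normal variates required when forming XᵀX directly.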

The possible range of the shape parameter


It can be shown that the Wishart distribution can be defined for all values of the shape parameter n in
$$\Lambda_p := \{0, \dots, p-1\} \cup (p-1, \infty).$$
This set is named after Gindikin, who introduced it in the sixties in the context of gamma distributions on
homogeneous cones. However, for the new parameters in the discrete spectrum of the Gindikin ensemble, namely
$n^* = 0, \dots, p-1$, the corresponding Wishart distribution has no Lebesgue density.

See also
• Hotelling's T-square distribution
• Inverse-Wishart distribution

References
[1] Wishart, J. (1928). "The generalised product moment distribution in samples from a normal multivariate population". Biometrika 20A (1-2):
32–52. doi:10.1093/biomet/20A.1-2.32. JFM 54.0565.02.
[2] "On singular Wishart and singular multivariate beta distributions" by Harald Uhlig, The Annals of Statistics, 1994, 395-405 projecteuclid
(http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1176325375)
[3] Smith, W. B.; Hocking, R. R. (1972). "Algorithm AS 53: Wishart Variate Generator". Journal of the Royal Statistical Society. Series C
(Applied Statistics) 21 (3): 341–345. JSTOR 2346290.
Article Sources and Contributors 190

Article Sources and Contributors


Probability distribution  Source: http://en.wikipedia.org/w/index.php?oldid=388038117  Contributors: (:Julien:), 198.144.199.xxx, 3mta3, A.M.R., A5, Abhinav316, AbsolutDan, Adrokin,
Alansohn, Alexius08, Ap, Applepiein, Avenue, AxelBoldt, BD2412, Baccyak4H, Benwing, Bfigura's puppy, Bhoola Pakistani, Bkkbrad, Bryan Derksen, Btyner, Calvin 1998, Caramdir,
Cburnett, Chirlu, Chris the speller, Classical geographer, Closedmouth, Conversion script, Courcelles, Damian Yerrick, Davhorn, David Eppstein, David Vose, DavidCBryant, Dcljr, Delldot, Den
fjättrade ankan, Dick Beldin, Digisus, Dino, Domminico, Dysprosia, Eliezg, Emijrp, Epbr123, Eric Kvaalen, Fintor, Firelog, Fnielsen, G716, Gaius Cornelius, Gala.martin, Gandalf61,
Gate2quality, Giftlite, Gjnyasa, GoodDamon, Graham87, Hu12, ImperfectlyInformed, It Is Me Here, Iwaterpolo, J.delanoy, JRSpriggs, Jan eissfeldt, JayJasper, Jclemens, Jipumarino, Jitse
Niesen, Jon Awbrey, Josuechan, Jsd115, Jsnx, Jtkiefer, Knutux, Larryisgood, LiDaobing, Lilac Soul, Lollerskates, Lotje, Loupeter, MGriebe, MarkSweep, Markhebner, Marner, Megaloxantha,
Melcombe, Mental Blank, Michael Hardy, Miguel, MisterSheik, Morton.lin, MrOllie, Napzilla, Nbarth, Noodle snacks, NuclearWarfare, O18, OdedSchramm, Ojigiri, OverInsured, Oxymoron83,
PAR, Pabristow, Patrick, Paul August, Pax:Vobiscum, Pgan002, Phys, Ponnu, Poor Yorick, Populus, Ptrf, Quietbritishjim, Qwfp, Riceplaytexas, Rich Farmbrough, Richard D. LeCour,
Rinconsoleao, Roger.simmons, Rursus, Salgueiro, Salix alba, Samois98, Sandym, Schmock, Seglea, Serguei S. Dukachev, ShaunES, Shizhao, Silly rabbit, SiobhanHansa, Sky Attacker, Statlearn,
Stpasha, TNARasslin, TakuyaMurata, Tarotcards, Tayste, Techman224, Thamelry, The Anome, The Thing That Should Not Be, TheCoffee, Tomi, Topology Expert, Tordek ar, Tsirel, Ttony21,
Unyoyega, Uvainio, VictorAnyakin, Whosasking, Whosyourjudas, X-Bert, Zundark, 218 anonymous edits

Beta distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385210788  Contributors: Adamace123, AnRtist, Art2SpiderXL, Baccyak4H, Betadistribution, BlaiseFEgan, Bootstoots,
Bryan Derksen, Btyner, Cburnett, Crasshopper, Cronholm144, DFRussia, Dean P Foster, Dshutin, Eric Kvaalen, FilipeS, Fintor, Fnielsen, Giftlite, Gill110951, GregorB, Gökhan, Henrygb,
Hilgerdenaar, J04n, Jamessungjin.kim, Janlo, Jhapk, Jheald, Josang, Ketiltrout, Kts, Ladislav Mecir, Linas, Livius3, MarkSweep, Mcld, Melcombe, Michael Hardy, MisterSheik, Mochan
Shrestha, MrOllie, Nbarth, O18, Oberobic, Ohanian, Oleg Alexandrov, Ott2, PAR, PBH, Pleasantville, Pnrj, Qwfp, Robbyjo, Robinh, Rodrigo braz, Rumping, SJP, ST47, Schmock, SharkD,
Steve8675309, Stoni, Sukisuki, Tomi, Urhixidur, Wile E. Heresiarch, 102 anonymous edits

Burr distribution  Source: http://en.wikipedia.org/w/index.php?oldid=377637548  Contributors: Melcombe, PoochieR, Qwfp, Selket, Yoctobarryc, 2 anonymous edits

Cauchy distribution  Source: http://en.wikipedia.org/w/index.php?oldid=384918645  Contributors: 1diot, Abdullah Chougle, Albmont, Arthur Rubin, AxelBoldt, Baccyak4H, Beaumont,
Bfigura's puppy, BoH, Bryan Derksen, Btyner, Clíodhna-2, Conversion script, Cretog8, DJIndica, Dicklyon, Emilpohl, Fjhickernell, Fnielsen, FrankH, Gareth Owen, Giftlite, HEL, Hannes Eder,
Headbomb, Henrygb, Heron, Hidaspal, Hxu, Igny, Jitse Niesen, K.F., KSmrq, Kribbeh, Kurykh, LOL, Lambiam, Leendert, Lightst, MarkSweep, Melchoir, Melcombe, Metacomet, Michael
Hardy, Miguel, MisterSheik, MrOllie, Nbarth, Nichtich, O18, Oleg Alexandrov, Ott2, PAR, PBH, Paul August, PeterC, PhilipHeller55, Pizzadeliveryboy, Quietbritishjim, Qwfp, Rlendog,
Rogerbrent, Romanm, Skbkekas, Snoyes, Sterrys, Stpasha, Sławomir Biały, The Anome, Thesilverbail, Tkuvho, Tomeasy, Tomi, Weialawaga, Wikid77, ZeroOne, Zeycus, Zundark, Zvika, 89
anonymous edits

Chi-square distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385849300  Contributors: A.R., AaronSw, AdamSmithee, Afa86, Alvin-cs, Animeronin, Ap, AstroWiki, AxelBoldt,
Blaisorblade, Bluemaster, Bryan Derksen, Btyner, CBM, Cburnett, Chaldor, Chris53516, Constructive editor, DanSoper, Dbachmann, Dbenbenn, Den fjättrade ankan, Digisus, EOBarnett, Eliel
Jimenez, Eliezg, Emilpohl, Etoombs, Ettrig, Fergikush, Fibonacci, Fieldday-sunday, Fintor, G716, Gaara144, Gauss, Giftlite, Gperjim, Henrygb, Herbee, Hgamboa, HyDeckar, Iav, Icseaturtles,
Isopropyl, It Is Me Here, Iwaterpolo, J-stan, Jackzhp, Jaekrystyn, Jason Goldstick, Jdgilbey, Jitse Niesen, Johnlemartirao, Jspacemen01-wiki, Knetlalala, KnightRider, Kotasik, LeilaniLad,
Leotolstoy, LilHelpa, Lixiaoxu, Loodog, Loren.wilton, Lovibond, MATThematical, MER-C, MarkSweep, Markg0803, Master of Puppets, Mcorazao, Mdebets, Melcombe, Mgiganteus1, Michael
Hardy, Microball, Mikael Häggström, Mindmatrix, MisterSheik, MrOllie, MtBell, Nbarth, Neon white, Nm420, Notatoad, O18, Oleg Alexandrov, PAR, Pabristow, Pahan, Paul August, Paulginz,
Pstevens, Qiuxing, Quantling, Quietbritishjim, Qwfp, Rflrob, Rich Farmbrough, Robinh, Ronz, Saippuakauppias, Sam Blacketer, SamuelTheGhost, Sander123, Schmock, Schwnj, Seglea,
Shoefly, Silly rabbit, Sligocki, Stephen C. Carlson, Steve8675309, Stpasha, Tarkashastri, The Anome, TheProject, TimBentley, Tombomp, Tomi, TomyDuby, Tony1, User A1, Volkan.cevher,
Voyagerfan5761, Wasell, Wassermann7, Weialawaga, Willem, Zero0000, Zfr, Zvika, 245 anonymous edits

Dirichlet distribution  Source: http://en.wikipedia.org/w/index.php?oldid=388000539  Contributors: A5, Adfernandes, BSVulturis, Barak, Benwing, Btyner, Charles Matthews, Cretog8, Daf,
Dlwhall, Erikerhardt, Finnancier, Franktuyl, Frigyik, Giftlite, Herve1729, Ipeirotis, Ivan.Savov, J04n, Josang, Kzollman, M0nkey, MarkSweep, Mathknightapprentice, Mcld, Melcombe, Michael
Hardy, MisterSheik, Mitch3, Nbarth, Prasenjitmukherjee, Qwfp, Robinh, Rvencio, Salgueiro, Schmock, Slxu.public, Tomi, Tomixdf, Whosasking, Wolfman, Zvika, 47 anonymous edits

F-distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385210914  Contributors: Albmont, Art2SpiderXL, Bluemaster, Brenda Hmong, Jr, Bryan Derksen, Btyner, Cburnett,
DanSoper, Dysprosia, Elmer Clark, Emilpohl, Fnielsen, Ged.R, Giftlite, Gperjim, Hectorlamadrid, Henrygb, Jan eissfeldt, Jitse Niesen, JokeySmurf, MarkSweep, Markjoseph125, Mdebets,
Melcombe, Michael Hardy, MrOllie, Nehalem, O18, Oscar, PBH, Quietbritishjim, Qwfp, Robinh, Salix alba, Seglea, TedE, The Squicks, Tomi, Unyoyega, Zorgkang, 24 anonymous edits

Gamma distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385688382  Contributors: A5, Aastrup, Abtweed98, Adam Clark, Adfernandes, Albmont, Aple123, Apocralyptic, Arg,
Asteadman, Autopilot, Baccyak4H, Barak, Bdmy, Berland, Bo Jacoby, Bobo192, Brenton, Bryan Derksen, Btyner, Cburnett, Cerberus0, ClaudeLo, Complex01, Darin, David Haslam, Dobromila,
Donmegapoppadoc, Dshutin, Erik144, Eug, Fangz, Fnielsen, Frau K, Frobnitzem, Gandalf61, Gauss, Giftlite, Gjnyasa, Henrygb, Hgkamath, Iwaterpolo, Jason Goldstick, Jlc46,
JonathanWilliford, Jshadias, Linas, Lovibond, LukeSurl, Luqmanskye, MarkSweep, Mcld, Mebden, Melcombe, Michael Hardy, MisterSheik, MrOllie, MuesLee, Mundhenk, Narc813, O18, PAR,
PBH, Patrke, Paul Pogonyshev, Paulginz, Pichote, Popnose, Qiuxing, Quietbritishjim, Qwfp, RSchlicht, Robbyjo, Robinh, Samsara, Schmock, Smmurphy, Stephreg, Stevvers, Supergrane,
Tayste, TestUser001, Thomas stieltjes, Thric3, Tomi, Tommyjs, True rover, Umpi77, User A1, Wiki me, Wiki5d, Wikid77, Wile E. Heresiarch, Zvika, 208 anonymous edits

Exponential distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385264451  Contributors: A.M.R., A3r0, ActivExpression, Aiden Fisher, Avabait, Avraham, AxelBoldt, Bdmy,
Beaumont, Bryan Derksen, Btyner, Butchbrody, CYD, Calmer Waters, CapitalR, Cazort, Cburnett, Closedmouth, Coffee2theorems, Cyp, Dcljr, Dcoetzee, Decrypt3, Den fjättrade ankan,
Dudubur, Duoduoduo, Enchanter, Fvw, Gauss, Giftlite, GorillaWarfare, Grinofadrunkwoman, Henrygb, Hsne, IanOsgood, Igny, Ilmari Karonen, Isis, Iwaterpolo, Jason Goldstick, Jester7777,
Kan8eDie, Kappa, Karl-Henner, Kyng, LOL, MStraw, MarkSweep, Markjoseph125, Mattroberts, Mdf, MekaD, Melcombe, Memming, Michael Hardy, Mindmatrix, MisterSheik, Mwanner,
Nothlit, Oysindi, PAR, Paul August, Qwfp, Remohammadi, Rich Farmbrough, Rp, Sergey Suslov, Shaile, Shingkei, Skbkekas, Skittleys, Smack, Spartanfox86, Stpasha, Taral, Taw, The Thing
That Should Not Be, Thegeneralguy, TimBentley, Tomi, Ularevalo98, User A1, Vsmith, WDavis1911, Wilke, Woohookitty, Wyatts, Yoyod, Z.E.R.O., Zeno of Elea, Zeycus, Zvika, Zzxterry, 179
anonymous edits

Erlang distribution  Source: http://en.wikipedia.org/w/index.php?oldid=375279482  Contributors: Acuster, Afa86, Aranel, Autopilot, Avatar, Basten, Bobo192, Bryan Derksen, Bullmoose953,
Calltech, Cnmirose, CraigNYC, Derisavi, Donmegapoppadoc, DudMc3, Giftlite, Gustavf, Ian Geoffrey Kennedy, Iwaterpolo, Jbfung, Jim.henderson, Joshdick, Jrdioko, Jwortman, Kitsonk,
Luckyz, Mange01, MarkSweep, McKay, Michael Hardy, MisterSheik, Myleslong, PAR, Pichote, Qwfp, RHaworth, Salsa Shark, TedPavlic, User A1, Welsh, Zvika, 79 anonymous edits

Kumaraswamy distribution  Source: http://en.wikipedia.org/w/index.php?oldid=377356832  Contributors: Apankrat, Baccyak4H, Btyner, CanadianLinuxUser, Charles Matthews,
Cronholm144, Ganeshk, Giftlite, John Vandenberg, KSmrq, MarkSweep, Michael Hardy, MisterSheik, Oleg Alexandrov, PAR, Pejman47, Ponnu, Qwfp, Ricky81682, Tomi, WikHead, 20
anonymous edits

Inverse Gaussian distribution  Source: http://en.wikipedia.org/w/index.php?oldid=378529633  Contributors: Aastrup, Abtweed98, Baccyak4H, Batman50, Btyner, David Haslam, Deavik,
Dima373, Felipehsantos, Giftlite, Iwaterpolo, LachlanA, LandruBek, Memming, Michael Hardy, MisterSheik, NickMulgan, Oleg Alexandrov, Qwfp, Rhfeng, Sterrys, The real moloch57, Tomi,
User A1, Vana Seshadri, Wikid77, 44 anonymous edits

Laplace distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385025078  Contributors: Alektzin, Btyner, CRGreathouse, Cburnett, Charles Matthews, Comfortably Paranoid, Dcljr,
Dcoetzee, Fasten, Fnielsen, Foobarhoge, Giftlite, Henrygb, Igny, Iwaterpolo, Jurgen, Kabla002, Ludovic89, M.A.Dabbah, MarkSweep, Mashiah Davidson, Melcombe, Memming, Michael Hardy,
MisterSheik, Mohammad Al-Aggan, PAR, Qwfp, Rlendog, Sterrys, User A1, Vovchyck, Wastle, Wolf87, Zundark, Zvika, 40 anonymous edits

Lévy distribution  Source: http://en.wikipedia.org/w/index.php?oldid=368630726  Contributors: 84user, Badger Drink, Btyner, Caviare, DBrane, Digfarenough, Dysmorodrepanis, Eric Kvaalen,
Gaius Cornelius, Gbellocchi, Gene Nygaard, Giftlite, Kloveland, Lovibond, Melcombe, Michael Hardy, Nbarth, Night Gyr, PAR, Ptrf, PyonDude, Qwfp, Rlendog, Saihtam, SebastianHelm,
Tsirel, Wainson, Xcentaur, Ynhockey, 17 anonymous edits

Log-logistic distribution  Source: http://en.wikipedia.org/w/index.php?oldid=368470690  Contributors: Alexenderius, DonAndre, Fetchcomms, John Vandenberg, Melcombe, PeterSymonds,
Qwfp, Rjwilmsi, Sevilledade, 3 anonymous edits

Log-normal distribution  Source: http://en.wikipedia.org/w/index.php?oldid=379551137  Contributors: 2D, A. Pichler, Acct4, Albmont, Alue, Autopilot, AxelBoldt, Baccyak4H, BenB4,
Berland, Biochem67, Bryan Derksen, Btyner, Cburnett, Ciberelm, Ciemo, Cleared as filed, ColinGillespie, Constructive editor, Encyclops, Evil Monkey, Fredrik, Gausseliminering, Giftlite,
Humanengr, Hxu, IanOsgood, Iwaterpolo, Jackzhp, Jeff3000, Jitse Niesen, Khukri, Letsgoexploring, Lojikl, Lunch, Mange01, Martinp23, Melcombe, Michael Hardy, MisterSheik, NonDucor,
Ocatecir, Occawen, Osbornd, Oxymoron83, PAR, PBH, Paul Pogonyshev, Philip Trueman, Philtime, Phoxhat, Pichote, Pontus, Qwfp, Rgbcmy, Rhowell77, Ricardogpn, Rlendog, Rmaus,
Safdarmarwat, Sairvinexx, Schutz, Seriousme, Skunkboy74, SqueakBox, Sterrys, Stigin, Stpasha, Ta bu shi da yu, Techman224, The Siktath, Till Riffert, Tkinias, Tomi, Umpi, Unyoyega,
Urhixidur, User A1, Weialawaga, Wikomidia, Wile E. Heresiarch, ZeroOne, ^demon, ‫ןורי‬, 164 anonymous edits

Logistic distribution  Source: http://en.wikipedia.org/w/index.php?oldid=377768181  Contributors: Abdel Hameed Nawar, Ahoerstemeier, Army1987, Betacommand, Br77rino, Bubba73,
Carbonate, Cazort, ChrisCork, Coachaxis, Dicklyon, Draco flavus, Eric Kvaalen, Fnielsen, Giftlite, Home Row Keysplurge, Hongooi, Iwaterpolo, Leonard G., LokiClock, MarkSweep,
Melcombe, Michael Hardy, Mwbaxter, PAR, Quicksilvre, Qwfp, Radon210, Rlendog, Shoessss, SimonP, Stpasha, Tmh, Tomi, Trevor.tombe, 32 anonymous edits

Normal distribution  Source: http://en.wikipedia.org/w/index.php?oldid=384847890  Contributors: 0, 119, 194.203.111.xxx, 213.253.39.xxx, 5:40, A. Pichler, A.M.R., AaronSw, Abecedare,
Abtweed98, Alektzin, Ali Obeid, AllanBz, Alpharigel, Amanjain, AndrewHowse, Anna Lincoln, Appoose, Aude, Aurimus, Awickert, AxelBoldt, Aydee, Aylex, Baccyak4H, Beetstra,
BenFrantzDale, Bhockey10, Bidabadi, Bluemaster, Bo Jacoby, Boreas231, Boxplot, Br43402, Brock, Bryan Derksen, Bsilverthorn, Btyner, Bubba73, Burn, CBM, CRGreathouse, Calvin 1998,
Can't sleep, clown will eat me, CapitalR, Cburnett, Cenarium, Charles Matthews, Charles Wolf, Chill doubt, Chris53516, ChrisHodgesUK, Christopher Parham, Ciphergoth, Coffee2theorems,
ComputerPsych, Conversion script, Coolhandscot, Coppertwig, Coubure, Courcelles, Crescentnebula, Cruise, Cwkmail, Cybercobra, DFRussia, Damian Yerrick, DanSoper, Dannya222,
Darwinek, David Haslam, DavidCBryant, Den fjättrade ankan, Denis.arnaud, Dima373, Dj thegreat, Doood1, Drilnoth, Drostie, Dudzcom, Dzordzm, EOBarnett, Eclecticos, Ed Poor, Edin1,
EelkeSpaak, Egorre, Elektron, Elockid, Enochlau, Epbr123, Eric Kvaalen, Ericd, Evan Manning, Fang Aili, Fangz, Fergusq, Fgnievinski, Fibonacci, Fintor, Firelog, Fledylids, Fnielsen,
Fresheneesz, G716, GB fan, Galastril, Gandrusz, Gary King, Gauravm1312, Gauss, Geekinajeep, Gex999, GibboEFC, Giftlite, Gil Gamesh, Gioto, GordontheGorgon, Gperjim, Graft, Graham87,
Gunnar Larsson, Gzornenplatz, Gökhan, Habbie, Heimstern, Henrygb, HereToHelp, Heron, Hiihammuk, Hiiiiiiiiiiiiiiiiiiiii, Hu12, Hugo gasca aragon, Ian Pitchford, It Is Me Here, Ivan Štambuk,
Iwaterpolo, J heisenberg, JaGa, JahJah, JanSuchy, Jason.yosinski, Jeff560, Jim.belk, Jitse Niesen, Jmlk17, Joebeone, Jorgenumata, Joris Gillis, Josephus78, Josuechan, Jpk, Jpsauro, Junkinbomb,
KMcD, KP-Adhikari, Karl-Henner, Kaslanidi, Kay Dekker, Keilana, KipKnight, Kjtobo, Knutux, LOL, Lansey, Laurifer, Lee Daniel Crocker, Leon7, Lilac Soul, Livius3, Lixy, Loadmaster,
Lpele, Lscharen, Lself, MATThematical, MIT Trekkie, Manticore, MarkSweep, Markus Krötzsch, Marlasdad, Mateoee, Mcorazao, Mdebets, Mebden, Meelar, Melcombe, Message From Xenu,
Michael Hardy, Michael Zimmermann, Miguel, Millerdl, Mindmatrix, MisterSheik, Mkch, Mm 202, Morqueozwald, Mr Minchin, Mr. okinawa, MrOllie, MrZeebo, Mundhenk, Mwtoews,
Mysteronald, Naddy, Nbarth, Nicholasink, Nicolas1981, Nilmerg, NoahDawg, Noe, Nolanbard, O18, Ohnoitsjamie, Ojigiri, Oleg Alexandrov, Oliphaunt, Olivier, Orderud, Ossiemanners,
Owenozier, PAR, PGScooter, Pablomme, Pabristow, Paclopes, Patrick, Paul August, Paulpeeling, Pcody, Pdumon, Personman, Petri Krohn, Pfeldman, Pgan002, Pinethicket, Piotrus, Plantsurfer,
Policron, Prodego, Prumpf, Ptrf, Qonnec, Quietbritishjim, Qwfp, R3m0t, RDBury, RHaworth, RSStockdale, Rabarberski, Rajah, Rajasekaran Deepak, Randomblue, Rbrwr, RexNL, Rich
Farmbrough, Richwales, Rjwilmsi, Rmrfstar, Robbyjo, Romanski, Ronz, RxS, Ryguasu, SGBailey, SJP, Saintrain, SamuelTheGhost, Samwb123, Sander123, Schmock, Schwnj, Scohoust,
Seidenstud, Seliopou, Seraphim, Sergey Suslov, SergioBruno66, Shabbychef, Shaww, Siddiganas, Sirex98, Snoyes, Somebody9973, Stan Lioubomoudrov, Stephenb, Stpasha, StradivariusTV,
Sullivan.t.j, SusanLarson, Sverdrup, Svick, Taxman, Tdunning, TeaDrinker, The Anome, The Tetrast, TheSeven, Thekilluminati, TimBentley, Tomeasy, Tomi, Tommy2010, Trewin, Tristanreid,
Trollderella, Troutinthemilk, Tryggvi bt, Tschwertner, Tstrobaugh, Unyoyega, Vakulgupta, Velocidex, Vhlafuente, Vijayarya, Vinodmp, Vrkaul, Waagh, Wakamex, Wavelength, Why Not A
Duck, Wile E. Heresiarch, Wilke, Will Thimbleby, Willking1979, Wissons, Wwoods, XJamRastafire, Yoshigev, Zero0000, Zhurov, Zrenneh, Zundark, Zvika, 588 anonymous edits

Pareto distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385983692  Contributors: A. B., Alexxandros, Am rods, Antandrus, Avraham, AxelBoldt, Beland, Boxplot, Bryan
Derksen, Btyner, Bubbleboys, Carrionluggage, ChemGardener, Clark Kent, Courcelles, Cyberyder, DaveApter, David Haslam, Doobliebop, Dreftymac, Edward, Enigmaman, Fenice, Fpoursafaei,
Giftlite, Gruzd, Henrygb, Heron, Ida Shaw, Iwaterpolo, JavOs, Jive Dadson, Joseph Solis in Australia, Lendu, LunaDeFerrari, Mack2, Mange01, MarkSweep, Mcld, Melchoir, Melcombe,
Michael Hardy, Mindmatrix, MisterSheik, Msghani, Nbarth, Noe, O18, Olaf, P3^1$Problems, PAR, PBH, Paintitblack ft, Paul Pogonyshev, PhysPhD, Qwfp, Reedy, Rock soup, Rror,
SanderSpek, Sergey Suslov, Shell Kinney, Shomoita, Stpasha, Tatrgel, The Anome, Tomi, User A1, Vivacissamamente, Vyznev Xnebara, 117 anonymous edits

Student's t-distribution  Source: http://en.wikipedia.org/w/index.php?oldid=386074519  Contributors: 3mta3, A bit iffy, A.M.R., Addone, Afluent Rider, Albmont, AlexAlex, Alvin-cs,
Arsenikk, Arthur Rubin, Asperal, Avraham, AxelBoldt, B k, Beetstra, Benwing, Bless sins, Bobo192, BradBeattie, Bryan Derksen, Btyner, CBM, Cburnett, Chris53516, Chriscf, Classical
geographer, Coppertwig, Count Iblis, Crouchy7, Daige, DanSoper, Danko Georgiev, Daveswahl, Dchristle, Ddxc, Dejo, Dkf11, Dmcalist, Dmcg026, Duncharris, EPadmirateur, EdJohnston, Eric
Kvaalen, Ethan, F.morett, Finnancier, Fnielsen, Freerow@gmail.com, Furrykef, G716, Gabrielhanzon, Giftlite, Gperjim, Hadleywickham, Hemanshu, Hirak 99, Icairns, Ichbin-dcw, Ichoran,
Ilhanli, Iwaterpolo, JMiall, Jitse Niesen, Johnson Lau, Kiefer.Wolfowitz, Kotar, Kroffe, Kummi, Kyosuke Aoki, Lifeartist, Linas, Lvzon, MATThematical, Madcoverboy, Maelgwn, MarkSweep,
Mdebets, Melcombe, Michael C Price, Michael Hardy, Mig267, Millerdl, MisterSheik, MrOllie, Muzzamo, Nbarth, Ngwt, O18, Ocorrigan, Oliphaunt, PBH, Pegasus1457, Petter Strandmark, Phb,
Piotrus, Pmanderson, Quietbritishjim, Qwfp, R'n'B, R.e.b., Rich Farmbrough, Rjwilmsi, Robert Ham, Robinh, Salgueiro, Sam Derbyshire, Sander123, Secretlondon, Seglea, Serdagger, Sgb 85,
Shaww, Shoefly, Skbkekas, Sonett72, Sougandh, Sprocketoreo, Srbislav Nesic, Stasyuha, Steve8675309, TJ0513, Techman224, The Anome, Theodork, Thermochap, ThorinMuglindir, Tjfarrar,
Tolstoy the Little Black Cat, TomCerul, Tomi, Tutor dave, Uncle G, Unknown, User A1, Valravn, Wastle, Wikid77, Wile E. Heresiarch, Xenonice, ZantTrang, 250 anonymous edits

Uniform distribution (continuous)  Source: http://en.wikipedia.org/w/index.php?oldid=367151161  Contributors: A.M.R., Abdullah Chougle, Aegis Maelstrom, Albmont, AlekseyP, Algebraist,
Amatulic, ArnoldReinhold, B k, Baccyak4H, Brianga, Brumski, Btyner, Capricorn42, Cburnett, Ceancata, DaBler, DixonD, Euchiasmus, Fasten, FilipeS, Gala.martin, Gareth Owen, Giftlite,
Gilliam, Gritzko, Henrygb, Iwaterpolo, Jamelan, Jitse Niesen, Melcombe, Michael Hardy, MisterSheik, Nbarth, Nsaa, Oleg Alexandrov, Ossska, PAR, Qwfp, Ray Chason, Robbyjo, Sl, Stpasha,
Stwalkerster, Tpb, User A1, Warriorman21, Wikomidia, Zundark, 70 anonymous edits

Weibull distribution  Source: http://en.wikipedia.org/w/index.php?oldid=384705100  Contributors: Agriculture, Alfpooh, Argyriou, Avraham, AxelBoldt, Bender235, Bryan Derksen, Btyner,
Calimo, Cburnett, Corecode, Corfuman, Craigy144, Darrel francis, David Haslam, Dhatfield, Diegotorquemada, Dmh, Doradus, Eliezg, Emilpohl, Felipehsantos, Gausseliminering, Gcm, Giftlite,
Gobeirne, GuidoGer, Iwaterpolo, J6w5, Janlo, Jason A Johnson, Jfcorbett, Joanmg, KenT, Kghose, Lachambre, LachlanA, MH, Mack2, Mebden, Melcombe, Michael Hardy, MisterSheik, Noodle
snacks, O18, Olaf, Oznickr, PAR, Pleitch, Policron, Prof. Frink, Qwfp, RekishiEJ, Robertmbaldwin, Saad31, Sam Blacketer, Samikrc, Sandeep4tech, Slawekb, Smalljim, Stern, Sławomir Biały,
TDogg310, Tassedethe, Tom harrison, Tomi, Uppland, WalNi, Wiki5d, Yanyanjun, Zundark, 122 anonymous edits

Bernoulli distribution  Source: http://en.wikipedia.org/w/index.php?oldid=375701945  Contributors: Albmont, AlekseyP, Aquae, Aziz1005, Bando26, Bgeelhoed, Bryan Derksen, Btyner,
Camkego, Cburnett, Charles Matthews, Complex01, El C, Eric Kvaalen, FilipeS, Flatland1, Giftlite, ILikeHowMuch, Iwaterpolo, Jitse Niesen, Jpk, Kyng, Lilac Soul, MarkSweep, Melcombe,
Michael Hardy, Miguel, MrOllie, Olivier, Ozob, PAR, Pabristow, Poor Yorick, Qwfp, RDBury, Rdsmith4, TakuyaMurata, Tomash, Tomi, Typofier, Urhixidur, Weialawaga, Whkoh, Wikid77,
Wtanaka, Zven, 39 anonymous edits

Beta-binomial distribution  Source: http://en.wikipedia.org/w/index.php?oldid=387509672  Contributors: Auntof6, Baccyak4H, Benwing, Charlesmartin14, Domminico, Giftlite, GoingBatty,
Massbless, Melcombe, Michael Hardy, Nschuma, PigFlu Oink, Qwfp, Rjwilmsi, Thouis.r.jones, Thtanner, Tomixdf, Willy.feng, 27 anonymous edits

Binomial distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385210753  Contributors: -- April, Aarond10, AchatesAVC, AdamRetchless, Ahoerstemeier, Ajs072,
Alexb@cut-the-knot.com, Alexius08, Atemperman, Atlant, AxelBoldt, Ayla, BPets, Baccyak4H, BenFrantzDale, Bill Malloy, Blue520, Br43402, Bryan Derksen, Btyner, Can't sleep, clown will
eat me, Cburnett, Cdang, Cflm001, Charles Matthews, Conversion script, Coppertwig, Crackerbelly, David Martland, DavidFHoughton, Daytona2, Deville, Dick Beldin, Eesnyder, Elipongo, Eric
Kvaalen, Falk Lieder, Fisherjs, G716, Gary King, Gauravm1312, Gauss, Gerald Tros, Giftlite, GorillaWarfare, Gperjim, Graham87, Hede2000, Henrygb, Hirak 99, Ian.Shannon, Ilmari Karonen,
Intelligentsium, Iwaterpolo, J04n, JB82, JEH, Janlo, Johnstjohn, Kakofonous, Knutux, Koczy, LOL, Larry_Sanger, LiDaobing, Linas, Logan, MER-C, ML5, MSGJ, Madkaugh, MarkSweep,
Marvinrulesmars, Materialscientist, Mboverload, McKay, Meisterkoch, Melcombe, Michael Hardy, MichaelGensheimer, Miguel, MisterSheik, Mmustafa, Moseschinyama, Mr Ape, MrOllie,
Musiphil, N6ne, NatusRoma, Nbarth, Neshatian, New Thought, Nguyenngaviet, Nschuma, Oleg Alexandrov, PAR, Paul August, Ph.eyes, PhotoBox, Phr, Pleasantville, Postrach, PsyberS, Pt,
Pufferfish101, Qonnec, Quietbritishjim, Qwertyus, Qwfp, Redtryfan77, Rgclegg, Rich Farmbrough, Rjmorris, Rlendog, Ruber chiken, Seglea, Smachet, SoSaysChappy, Spellcast, Stebulus,
Steven J. Anderson, Stigin, Stpasha, Supergroupiejoy, TakuyaMurata, Tayste, The Thing That Should Not Be, Tim1357, Timwi, Tomi, VectorPosse, Wikid77, WillKitch, Xiao Fei, Youandme,
ZantTrang, Zmoboros, 327 anonymous edits

Uniform distribution (discrete)  Source: http://en.wikipedia.org/w/index.php?oldid=378602647  Contributors: Alansohn, Alstublieft, Bob.warfield, Btyner, DaBler, Dec1707, DixonD,
Duoduoduo, Fangz, Fasten, FilipeS, Furby100, Giftlite, Gvstorm, Henrygb, Iwaterpolo, Jamelan, Klausness, LimoWreck, Melcombe, Michael Hardy, Mike74dk, Nbarth, O18, P64, PAR,
Postrach, Qwfp, Stannered, Taylorluker, The Wordsmith, User A1, 53 anonymous edits

Geometric distribution  Source: http://en.wikipedia.org/w/index.php?oldid=374173415  Contributors: AdamSmithee, Alexf, Apocralyptic, Bjcairns, Bo Jacoby, Bryan Derksen, Btyner, Calbaer,
Capricorn42, Cburnett, Classicalecon, Count ludwig, Damian Yerrick, Deineka, Digfarenough, El C, Eraserhead1, Felipehsantos, Gauss, Giftlite, Gogobera, Gsimard, Gökhan, Iwaterpolo,
Juergik, K.F., LOL, MarkSweep, MathKnight, Mav, Michael Hardy, MichaelRutter, Mikez, Mr.gondolier, NeonMerlin, Nov ialiste, PhotoBox, Qwfp, Ricklethickets, Rumping, Ryguasu,
Serdagger, Skbkekas, Squizzz, Steve8675309, SyedAshrafulla, TakuyaMurata, Tomi, Wafulz, Wikid77, Wrogiest, Wtruttschel, Youandme, 86 anonymous edits

Hypergeometric distribution  Source: http://en.wikipedia.org/w/index.php?oldid=385036428  Contributors: Alexius08, Arnold90, Baccyak4H, Bo Jacoby, Booyabazooka, Bryan Derksen,
Btyner, Burn, Cburnett, ChevyC, Commander Keane, David Shay, DavidLDill, Drcrnc, Eidolon232, El C, Eug, FedeLebron, Felipehsantos, Gauss, Giftlite, Herr blaschke, It Is Me Here,
Iwaterpolo, Jack joff, Johnlv12, Josh Cherry, Kamrik, Kingboyk, LOL, Linas, MSGJ, MaxEnt, Maximilianh, Melcombe, Michael Hardy, Nbarth, Nerdmaster, Ott2, PAR, PBH, Peteraandrews,
Pleasantville, Pol098, Qwfp, Randomactsofkindness2, Reb42, Rgclegg, Schutz, Screech1941, SkatingNerd, TakuyaMurata, Tomi, User A1, Veryhuman, Wtmitchell, Wtruttschel, Yvswan,
Zigger, ‫کشرز‬, 144 anonymous edits

Negative binomial distribution  Source: http://en.wikipedia.org/w/index.php?oldid=369581392  Contributors: Alexius08, Ascánder, Asymmetric, AxelBoldt, Bo Jacoby, Bryan Derksen, Btyner,
Burn, CALR, Cburnett, Charles Matthews, Chocrates, Damian Yerrick, Dcljr, DutchCanadian, Econstatgeek, Eggstone, Evra83, Facorread, Felipehsantos, Formivore, Gabbe, Gauss, Giftlite,
Henrygb, Iowawindow, Iwaterpolo, Jahredtobin, Keltus, Kevinhsun, Linas, Ludovic89, MarkSweep, McKay, Melcombe, Michael Hardy, Moldi, Nov ialiste, O18, Odysseuscalypso,
Oxymoron83, Phantomofthesea, Pmokeefe, Qwfp, Rar, Rje, Rumping, Salgueiro, Sapphic, Shreevatsa, Sleempaster21229, Sleepmaster21229, Statone, Steve8675309, Stpasha, TGS, Taraborn,
Tomi, Trevor.maynard, User A1, Waltpohl, Wikid77, Wile E. Heresiarch, Zvika, 122 anonymous edits

Multinomial distribution  Source: http://en.wikipedia.org/w/index.php?oldid=387210294  Contributors: A5, Albmont, Baccyak4H, Benwing, Btyner, CaAl, Charles Matthews, ChevyC,
Dysprosia, Giftlite, Gjnyasa, Icairns, Iwaterpolo, J04n, Jamelan, Jamie King, Karlpearson, Killerandy, Linas, McKay, Mebden, Melcombe, Michael Hardy, MisterSheik, Nbarth, O18, Qwfp,
Robinh, Sohale, Squidonius, Stephan sand, Steve8675309, Tomi, Tomixdf, Wolfman, Zvika, 33 anonymous edits

Multivariate normal distribution  Source: http://en.wikipedia.org/w/index.php?oldid=387881005  Contributors: Alanb, Anthony5429, Arvinder.virk, AussieLegend, AxelBoldt, BenFrantzDale,
BernardH, Breno, Bryan Derksen, Btyner, Cburnett, Cfp, Chromaticity, Ciphergoth, Coffee2theorems, Colin Rowat, Delirium, Delldot, Derfugu, Giftlite, Hongooi, HyDeckar, J heisenberg,
Jondude11, Jorgenumata, Josuechan, KHamsun, Kaal, KipKnight, KrodMandooon, KurtSchwitters, Lambiam, Lockeownzj00, MER-C, MarkSweep, Mauryaan, MaxSem, Mcld, Mct mht, Mdf,
Mebden, Meduz, Melcombe, Michael Hardy, Miguel, Mjdslob, Moriel, Mrwojo, Nabla, O18, Ogo, Oli Filth, Omrit, Opabinia regalis, Orderud, Paul August, Peni, PhysPhD, Picapica, Pycoucou,
Quantling, Qwfp, Riancon, RickK, Rjwilmsi, Robinh, Rumping, SebastianHelm, Selket, SgtThroat, Steve8675309, Stpasha, Strashny, Tabletop, TedPavlic, Tommyjs, Ulner, Waldir, Wikomidia,
Winterstein, Yoderj, Zelda, Zero0000, Zvika, 146 anonymous edits

Wishart distribution  Source: http://en.wikipedia.org/w/index.php?oldid=387694111  Contributors: 3mta3, Aetheling, Aleenf1, AtroX Worf, Baccyak4H, Benwing, Bryan Derksen, Btyner,
David Eppstein, Deacon of Pndapetzim, Dean P Foster, Entropeneur, Erki der Loony, Gammalgubbe, Giftlite, Ixfd64, Joriki, Jrennie, Kurtitski, Lockeownzj00, MDSchneider, Melcombe, Michael
Hardy, P omega sigma, P.wirapati, Perturbationist, PhysPhD, Qwfp, Robbyjo, Robinh, Ryker, Shae, Srbauer, TNeloms, Tomi, WhiteHatLurker, Zvika, 42 anonymous edits

Image Sources, Licenses and Contributors


File:Standard deviation diagram.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Standard_deviation_diagram.svg  License: Public Domain  Contributors: Chesnok, Juiced lemon,
Krinkle, Manuelt15, Mwtoews, Petter Strandmark, Revolus, Tom.Reding, Wknight94, 17 anonymous edits
Image:Beta distribution pdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Beta_distribution_pdf.png  License: GNU General Public License  Contributors: Cburnett, It Is Me
Here, LeaW, MarkSweep, WikipediaMaster, 1 anonymous edits
Image:Beta distribution cdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Beta_distribution_cdf.png  License: GNU General Public License  Contributors: MarkSweep,
WikipediaMaster, 1 anonymous edits
Image:Burr pdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Burr_pdf.png  License: Creative Commons Attribution-Sharealike 2.5  Contributors: User:Selket
Image:Burr cdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Burr_cdf.png  License: Creative Commons Attribution-Sharealike 2.5  Contributors: User:Selket
Image:cauchy_pdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Cauchy_pdf.svg  License: Creative Commons Attribution 3.0  Contributors: User:Skbkekas
Image:cauchy_cdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Cauchy_cdf.svg  License: Creative Commons Attribution 3.0  Contributors: User:Skbkekas
File:Chi-square distributionPDF-English.png  Source: http://en.wikipedia.org/w/index.php?title=File:Chi-square_distributionPDF-English.png  License: Public Domain  Contributors:
User:Mikael Häggström
File:chi-square distributionCDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Chi-square_distributionCDF.png  License: Public Domain  Contributors: EugeneZelenko, PAR,
WikipediaMaster
Image:Dirichlet distributions.png  Source: http://en.wikipedia.org/w/index.php?title=File:Dirichlet_distributions.png  License: Public Domain  Contributors: Bender235, Euku, Kilom691, Mdd,
Timeshifter
Image:Dirichlet_example.png  Source: http://en.wikipedia.org/w/index.php?title=File:Dirichlet_example.png  License: Public Domain  Contributors: User:Mitch3
Image:F_distributionPDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:F_distributionPDF.png  License: GNU Free Documentation License  Contributors: Lovibond, Olaf
Image:F_distributionCDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:F_distributionCDF.png  License: GNU Free Documentation License  Contributors: Olaf
Image:Gamma distribution pdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Gamma_distribution_pdf.svg  License: GNU General Public License  Contributors: User:Autopilot
Image:Gamma distribution cdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Gamma_distribution_cdf.svg  License: GNU General Public License  Contributors: User:Autopilot
Image:Gamma-PDF-3D.png  Source: http://en.wikipedia.org/w/index.php?title=File:Gamma-PDF-3D.png  License: Creative Commons Attribution-Sharealike 3.0  Contributors:
User:Mundhenk
Image:Gamma-KL-3D.png  Source: http://en.wikipedia.org/w/index.php?title=File:Gamma-KL-3D.png  License: Creative Commons Attribution-Sharealike 3.0  Contributors: User:Mundhenk
Image:exponential pdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Exponential_pdf.svg  License: Creative Commons Attribution 3.0  Contributors: User:Skbkekas
Image:exponential cdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Exponential_cdf.svg  License: Creative Commons Attribution 3.0  Contributors: User:Skbkekas
Image:KumaraswamyT pdf.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:KumaraswamyT_pdf.jpg  License: Public Domain  Contributors: Lovibond, MarkSweep, PAR, Ponnu,
Ricky81682
Image:Kumaraswamy cdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Kumaraswamy_cdf.png  License: Creative Commons Attribution-Sharealike 2.5  Contributors:
User:Anarkman
Image:PDF invGauss.png  Source: http://en.wikipedia.org/w/index.php?title=File:PDF_invGauss.png  License: Creative Commons Attribution-Sharealike 2.5  Contributors: Thomas Steiner
Image:Laplace distribution pdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Laplace_distribution_pdf.png  License: GNU General Public License  Contributors: It Is Me Here,
MarkSweep
Image:Laplace distribution cdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Laplace_distribution_cdf.png  License: GNU General Public License  Contributors: Bender235,
MarkSweep
Image:Levy0 distributionPDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Levy0_distributionPDF.png  License: Public Domain  Contributors: PAR, Tano4595, 1 anonymous
edits
Image:Levy0 distributionCDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Levy0_distributionCDF.png  License: Public Domain  Contributors: PAR, Tano4595, 1 anonymous
edits
Image:Levy0 LdistributionPDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Levy0_LdistributionPDF.png  License: Public Domain  Contributors: PAR, Tano4595, 1
anonymous edits
Image:Loglogisticpdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Loglogisticpdf.svg  License: GNU Free Documentation License  Contributors: Qwfp (talk) Original uploader
was Qwfp at en.wikipedia
Image:Loglogisticcdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Loglogisticcdf.svg  License: GNU Free Documentation License  Contributors: Qwfp (talk) Original uploader
was Qwfp at en.wikipedia
Image:Loglogistichaz.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Loglogistichaz.svg  License: GNU Free Documentation License  Contributors: Qwfp (talk) Original uploader
was Qwfp at en.wikipedia
Image:Shiftedloglogisticpdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Shiftedloglogisticpdf.svg  License: Creative Commons Attribution-Sharealike 3.0  Contributors:
User:Qwfp
Image:Shiftedloglogisticcdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Shiftedloglogisticcdf.svg  License: Creative Commons Attribution-Sharealike 3.0  Contributors:
User:Qwfp
Image:Lognormal distribution PDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Lognormal_distribution_PDF.svg  License: GNU Free Documentation License  Contributors:
User:Autopilot, User:Par
Image:Lognormal distribution CDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Lognormal_distribution_CDF.svg  License: GNU Free Documentation License  Contributors:
User:Autopilot, User:PAR
Image:Logisticpdfunction.png  Source: http://en.wikipedia.org/w/index.php?title=File:Logisticpdfunction.png  License: GNU Free Documentation License  Contributors: Anarkman,
Pfctdayelise, RandomP, WikipediaMaster
Image:Logistic cdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Logistic_cdf.png  License: Creative Commons Attribution-Sharealike 2.5  Contributors: Anarkman
Image:Normal Distribution PDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Normal_Distribution_PDF.svg  License: Public Domain  Contributors: User:Inductiveload
Image:Normal Distribution CDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Normal_Distribution_CDF.svg  License: Public Domain  Contributors: User:Inductiveload
Image:standard deviation diagram.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Standard_deviation_diagram.svg  License: Public Domain  Contributors: Chesnok, Juiced lemon,
Krinkle, Manuelt15, Mwtoews, Petter Strandmark, Revolus, Tom.Reding, Wknight94, 17 anonymous edits
Image:De moivre-laplace.gif  Source: http://en.wikipedia.org/w/index.php?title=File:De_moivre-laplace.gif  License: Public Domain  Contributors: User:Stpasha
Image:QHarmonicOscillator.png  Source: http://en.wikipedia.org/w/index.php?title=File:QHarmonicOscillator.png  License: GNU Free Documentation License  Contributors: Inductiveload,
Maksim, Pieter Kuiper
Image:Fisher iris versicolor sepalwidth.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Fisher_iris_versicolor_sepalwidth.svg  License: Creative Commons Attribution-Sharealike
3.0  Contributors: User:Pbroks13
Image:Planche de Galton.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Planche_de_Galton.jpg  License: Creative Commons Attribution-Sharealike 3.0  Contributors:
User:Antoinetav
Image:Carl Friedrich Gauss.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Carl_Friedrich_Gauss.jpg  License: unknown  Contributors: Bcrowell, Blösöf, Conscious, Gabor,
Joanjoc, Kaganer, Kilom691, Luestling, Mattes, Rovnet, Schaengel89, Ufudu, 4 anonymous edits
Image:Pierre-Simon Laplace.jpg  Source: http://en.wikipedia.org/w/index.php?title=File:Pierre-Simon_Laplace.jpg  License: unknown  Contributors: Ashill, Ecummenic, Elcobbola,
Gene.arboit, Jimmy44, Olivier2, 霧木諒二
Image:Pareto distributionPDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Pareto_distributionPDF.png  License: Public Domain  Contributors: EugeneZelenko, G.dallorto, It Is
Me Here, Juiced lemon, PAR, 1 anonymous edits
Image:Pareto distributionCDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Pareto_distributionCDF.png  License: Public Domain  Contributors: EugeneZelenko, G.dallorto,
Juiced lemon, PAR, 2 anonymous edits
Image:Pareto distributionLorenz.png  Source: http://en.wikipedia.org/w/index.php?title=File:Pareto_distributionLorenz.png  License: GNU Free Documentation License  Contributors:
G.dallorto, Grafite, Juiced lemon, Magister Mathematicae, PAR, 1 anonymous edits
Image:student t pdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Student_t_pdf.svg  License: Creative Commons Attribution 3.0  Contributors: User:Skbkekas
Image:student t cdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Student_t_cdf.svg  License: Creative Commons Attribution 3.0  Contributors: User:Skbkekas
Image:T distribution 1df.png  Source: http://en.wikipedia.org/w/index.php?title=File:T_distribution_1df.png  License: GNU Free Documentation License  Contributors: Juiced lemon, Maksim,
1 anonymous edits
Image:T distribution 2df.png  Source: http://en.wikipedia.org/w/index.php?title=File:T_distribution_2df.png  License: GNU Free Documentation License  Contributors: Juiced lemon, Maksim,
1 anonymous edits
Image:T distribution 3df.png  Source: http://en.wikipedia.org/w/index.php?title=File:T_distribution_3df.png  License: GNU Free Documentation License  Contributors: Juiced lemon, Maksim,
1 anonymous edits
Image:T distribution 5df.png  Source: http://en.wikipedia.org/w/index.php?title=File:T_distribution_5df.png  License: GNU Free Documentation License  Contributors: Juiced lemon, Maksim,
1 anonymous edits
Image:T distribution 10df.png  Source: http://en.wikipedia.org/w/index.php?title=File:T_distribution_10df.png  License: GNU Free Documentation License  Contributors: Juiced lemon,
Maksim, 1 anonymous edits
Image:T distribution 30df.png  Source: http://en.wikipedia.org/w/index.php?title=File:T_distribution_30df.png  License: GNU Free Documentation License  Contributors: Juiced lemon,
Maksim, 1 anonymous edits
image:Uniform distribution PDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Uniform_distribution_PDF.png  License: Public Domain  Contributors: EugeneZelenko, It Is Me
Here, PAR
image:Uniform distribution CDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:Uniform_distribution_CDF.png  License: Public Domain  Contributors: EugeneZelenko, PAR
Image:Weibull PDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Weibull_PDF.svg  License: GNU Free Documentation License  Contributors: User:Calimo
Image:Weibull CDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Weibull_CDF.svg  License: GNU Free Documentation License  Contributors: User:Calimo
Image:Beta-binomial distribution pmf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Beta-binomial_distribution_pmf.png  License: Creative Commons Attribution-Sharealike 3.0
 Contributors: User:Nschuma
Image:Beta-binomial cdf.png  Source: http://en.wikipedia.org/w/index.php?title=File:Beta-binomial_cdf.png  License: GNU Free Documentation License  Contributors: User:Nschuma
Image:Binomial distribution pmf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Binomial_distribution_pmf.svg  License: Public Domain  Contributors: User:Tayste
Image:Binomial distribution cdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Binomial_distribution_cdf.svg  License: Public Domain  Contributors: User:Tayste
Image:Binomial Distribution.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Binomial_Distribution.svg  License: Creative Commons Attribution-Sharealike 3.0  Contributors:
User:Cflm001
Image:DUniform_distribution_PDF.png  Source: http://en.wikipedia.org/w/index.php?title=File:DUniform_distribution_PDF.png  License: GNU Free Documentation License  Contributors:
EugeneZelenko, PAR, WikipediaMaster
Image:Dis_Uniform_distribution_CDF.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Dis_Uniform_distribution_CDF.svg  License: GNU General Public License  Contributors:
User:Stannered
Image:geometric_pmf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Geometric_pmf.svg  License: Creative Commons Attribution 3.0  Contributors: User:Skbkekas
Image:geometric_cdf.svg  Source: http://en.wikipedia.org/w/index.php?title=File:Geometric_cdf.svg  License: Creative Commons Attribution 3.0  Contributors: User:Skbkekas
File:Negbinomial.gif  Source: http://en.wikipedia.org/w/index.php?title=File:Negbinomial.gif  License: Public Domain  Contributors: User:Stpasha
Image:GaussianScatterPCA.png  Source: http://en.wikipedia.org/w/index.php?title=File:GaussianScatterPCA.png  License: GNU Free Documentation License  Contributors:
User:BenFrantzDale

License
Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/
