Bayesian Inference under Small Sample Sizes Using General
Noninformative Priors
Jingjing He 1 , Wei Wang 1 , Min Huang 1 , Shaohua Wang 2 and Xuefei Guan 3, *

1 School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China;
hejingjing@buaa.edu.cn (J.H.); w.wei@buaa.edu.cn (W.W.); buaahuang@163.com (M.H.);
2 China Aviation Power Plant Research Institute, Zhuzhou 412002, China; wangshaohua@buaa.edu.cn
3 Graduate School of China Academy of Engineering Physics, Beijing 100193, China
* Correspondence: xfguan@gscaep.ac.cn

Abstract: This paper proposes a Bayesian inference method for problems with small sample sizes. A
general type of noninformative prior is proposed to formulate the Bayesian posterior. It is shown that
this type of prior can represent a broad range of priors such as classical noninformative priors and
asymptotically locally invariant priors and can be derived as the limiting states of normal-inverse-
Gamma conjugate priors, allowing for analytical evaluations of Bayesian posteriors and predictors.
The performance of different noninformative priors under small sample sizes is compared using
the likelihood combining both fitting and prediction performances. Laplace approximation is used
to evaluate the likelihood. A realistic fatigue reliability problem is used to illustrate the method.
Following that, an actual aeroengine disk lifing application with two test samples is presented, and
the results are compared with the existing method.

Keywords: Bayesian inference; noninformative prior; Jeffreys' prior; invariant




1. Introduction

Sample sizes are often quite small in many engineering fields due to time, economic, and physical constraints, for example, life testing data of high-reliability mechanical components, large and complex engineering systems, and so on. The influence of sample size on interpreting classical significance test results has been discussed in several studies [1–3]. The predictive models established based on scarce samples may highly depend on the chosen parameter estimation method [4,5]. In those cases, the probabilistic approach is usually preferred over the deterministic approach [6–11]. The Bayes rule provides a consistent and rational mathematical device to incorporate relevant information and prior knowledge for probabilistic inference.

To motivate the discussion, consider an observable random variable X with a conditional probability density function (PDF) (or simply density) of p(x|φ), where φ ∈ Φ is also a random variable. The inverse problem is to make inferences about φ given an observed value x of X. The Bayesian approach to the solution is to use some density p(φ) over Φ to represent the prior information of φ. In this way, the prior knowledge of φ can be encoded through the Bayes rule to obtain the posterior PDF of φ given x,

p(φ|x) ∝ p(φ) p(x|φ). (1)

The forward inference, e.g., the PDF of a certain variable or the probability of an event involving φ, can be made on the basis of the posterior PDF. Using the method of Markov chain Monte Carlo (MCMC), samples can directly be drawn from the posterior distribution without knowing the normalizing constant p(x) = ∫ p(φ) p(x|φ) dφ in Equation (1). The Bayesian method has been successfully demonstrated in all important disciplines [12–19].
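For illustration, the sketch below (an addition to this text, not part of the original paper; the zero-mean Gaussian data and sampler settings are assumed) draws samples of a scale parameter σ directly from the unnormalized product p(φ)p(x|φ) with a random-walk Metropolis sampler:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=5)  # a small sample of zero-mean observations

def log_unnorm_post(sigma):
    """ln[p(sigma) p(x|sigma)] with Jeffreys' prior p(sigma) proportional to 1/sigma."""
    if sigma <= 0:
        return -np.inf
    log_like = -x.size * np.log(sigma) - np.sum(x**2) / (2 * sigma**2)
    return -np.log(sigma) + log_like  # prior + likelihood, up to a constant

sigma, samples = 1.0, []
for _ in range(20000):  # random-walk Metropolis; no normalizing constant needed
    prop = sigma + 0.3 * rng.standard_normal()
    if np.log(rng.uniform()) < log_unnorm_post(prop) - log_unnorm_post(sigma):
        sigma = prop
    samples.append(sigma)
print(np.mean(samples[5000:]))  # posterior mean of sigma after burn-in
```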
The choice of p(φ) can have a great influence on the inference result. The proper choice of priors has been extensively discussed in the probability and statistics communities, and it
can never be overlooked as it is one of the fundamental pieces of Bayesian inference [20–26].
For one thing, the formal rule of constructing a prior regardless of the data and likelihood
is sought in the field of physics [9,27]. Jaynes and Bretthorst [28] argued that “problems of
inference are ill-posed until we recognize three essential things. (A) The prior probabilities
represent our prior information, and are to be determined, not by introspection but by logical
analysis of that information. (B) Since the final conclusions depend necessarily on both the
prior information and the data, it follows that, in formulating a problem, one must specify
the prior information to be used just as fully as one specifies the data. (C) Our goal is that
inferences are to be completely ‘objective’ in the sense that two persons with the same prior
information must assign the same prior probabilities.” (p. 373). For another, the choice of a
prior may highly depend on data and likelihood in practice. Gelman et al. [29] argued that
“a prior can in general only be interpreted in the context of the likelihood with which it will
be paired” (p. 12).
To ensure a consistent and objective inference, rules for constructing priors with minimal
subjective constraints are sought. Early work on the construction of such priors is based on the
“ignorance” over the parameter space using invariance techniques [27,30,31]. The fundamental
reasoning is that the priors should carry the same amount of information such that a change
of scale and/or a shift of location do not affect the inference results on those parameters. The
ignorant prior can systematically be derived using the concept of the transformation group.
Using different transformation groups, different priors can be obtained. For simplicity, such
priors are loosely referred to as noninformative priors. One of the most notable priors for
a scale parameter σ of a distribution is Jeffreys’ prior, i.e., p(σ ) ∝ 1/σ. Jeffreys’ prior can
be obtained using the tool of the transformation group under the condition that a change of
scale does not change the state of knowledge. A further extension of the noninformative
priors is “reference priors” [32,33]. The theoretical framework has been discussed in many
studies, including, but not limited to, [31,34]. Apart from the noninformative approach
to derive the priors, there are several less objective approaches to construct the priors.
Gelman [35] proposed the weakly informative priors based on the idea of conditional
conjugacy for hierarchical Bayesian models. The conditionally conjugate priors provide
some computational convenience as a Gibbs sampler can be used to draw samples from a
posterior distribution, and some parameters for inverse-Gamma distributions in Bayesian
applications are suggested. Simpson et al. [36] proposed a method to build priors. The
basic idea is to penalize the complexity induced by deviating from a simpler base model
using the Kullback–Leibler divergence as a metric [37].
For critical problems with small sample sizes, the choice of the prior in Bayesian
inference can have a great impact on the inference results, rendering unreliable decision-making. Despite a great deal of research, including the aforementioned, having been conducted, the choice of an optimal prior is still nontrivial to make from the practical point of view, and a systematic method to cope with such cases is rarely seen. Moreover, a
quantitative measure to evaluate the performance of a prior is equally important to identify
the optimal prior. This study develops a noninformative Bayesian approach to probabilistic
inference under small sample sizes. A general 1/σ^q-type of noninformative prior for the
location-scale family of distributions is proposed. This type of prior is further shown to
be the limiting state of the commonly used normal-inverse-Gamma conjugate prior, thus
allowing for an analytical evaluation of the posterior of the parameter. Given a linear model
with a Gaussian error variable, the analytical form of the posterior of the model prediction
can also be obtained. The informative or noninformative nature of the prior is centered
more on the concept of whether the prior encodes subjective information; therefore, it is
less relevant to the likelihood function (and data). The subjective information is the source
of inconsistency in the inference result as different practitioners can use different subjective
information in the form of assumptions, which in most cases may only be justified by
oneself. However, the prior has to be built with a certain state of knowledge. This state
of knowledge, or loosely called information, is related to neither experience nor assumptions,
but is a result of a logical reasoning process [27]. The purpose of the study is not to argue
whether these assumptions are justified or not, nor to question the suitable assumptions
made during the inference. This study provides an alternative to existing methods in
such cases to avoid introducing subjectiveness into the Bayesian inference. In particular,
when the number of samples is small, the influence of the prior can be as important as the
likelihood (and data). In such cases, subjective information encoded in the prior can lead
to biased results.
The remainder of the paper is organized as follows. First, the Bayesian model with noninformative priors is developed; in particular, a general 1/σ^q-type of noninformative prior for the location-scale family of distributions is proposed for small sample problems.
The noninformative priors are further shown as the limiting states of the normal-inverse-
Gamma (NIG) conjugate priors. The closed-form expressions of the Bayesian posterior
and predictors using the proposed noninformative priors are obtained under a Gaussian
likelihood. Next, a performance measure considering both the fitting performance and
predictive performance is proposed using the concept of Bayes factors. Different priors are
treated as models in a Bayesian hypothesis testing context for comparisons. Following that,
the method is illustrated using two examples.

2. Bayesian Linearized Models with Noninformative Priors


To motivate the discussion, consider a general linear or linearized model:

y i = x i θ + ei , (2)

where θ is a k-dimensional column vector and eᵢ are independent and identically distributed
random error variables. The distribution of ei determines the likelihood function or vice
versa. Without loss of generality, the distribution belongs to a location-scale family of
distributions. Furthermore, it is enough to use a single scale parameter to characterize the
distribution of e since any constant nonzero mean, no matter known or unknown, can be
grouped into θ. Denote the scale parameter of e as σ2 . A Bayesian model incorporates
both the prior information and the observation through Bayes’ rule. Using the matrix form
y = xθ + e, the Bayesian posterior of (θ, σ2 ) writes:

p(θ, σ2 |y) ∝ p(θ, σ2 ) p(y|θ, σ2 ). (3)

The common Gaussian error variable, i.e., eᵢ ∼ N(0, σ²), corresponds to the following likelihood function for n observations y = (y₁, y₂, ..., yₙ)ᵀ and n input vectors x = (x₁, x₂, ..., xₙ)ᵀ, where xᵢ, i = 1, ..., n, is a row vector of size k:

p(y|θ, σ²) = N(xθ, σ²I) ∝ σ⁻ⁿ exp[ −(1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − xᵢθ)² ]. (4)

It should be noted that the error variable does not necessarily follow a Gaussian PDF,
and other types of error distributions can be used. For example, the extreme-value PDF for
the error corresponds to a Weibull likelihood for the log-transformed model prediction.

2.1. A General Form of Noninformative Priors—∝ 1/σ^q


A general ∝ 1/σ^q (q ≥ 0) form of priors is considered here. The classical Jeffreys' prior and the related asymptotically locally invariant priors are introduced first for the purpose of completeness.
Consider the PDF of a random variable x characterized by a parameter vector φ;
Jeffreys’ noninformative prior distribution of φ is proportional to the square root of the
determinant of the Fisher information matrix, e.g.,
p(φ) ∝ [det I(φ)]^(1/2), (5)

where det(·) is the determinant operator and I(·) is the Fisher information matrix. The key
feature of it is invariance under the monotone transformation of φ. This feature is achieved
by using the change of variables theorem. Denote the reparameterized variable or vector
as ψ; it can be shown that:
q
∂φi
p(ψ) = p(φ) det = detI( ), (6)
∂ψj

For a Gaussian likelihood with unknown parameters µ and σ2 ,

p(x|µ, σ²) = (1/√(2πσ²)) exp[ −(x − µ)²/(2σ²) ]. (7)

Jeffreys’ prior for the joint parameter (µ, σ2 ) is:


p(µ, σ²) ∝ [det I(µ, σ²)]^(1/2), (8)

where I(µ, σ2 ) is:

I = E[ (∂ln p(x|·)/∂µ)²                    (∂ln p(x|·)/∂µ)(∂ln p(x|·)/∂(σ²))
       (∂ln p(x|·)/∂µ)(∂ln p(x|·)/∂(σ²))   (∂ln p(x|·)/∂(σ²))² ] (9)

and E(·) is the expectation operator. Using algebraic deduction and integration, I(µ, σ2 ) is
simplified to:

I = [ 1/σ²   0
      0      1/(2σ⁴) ]. (10)
As a result, Jeffreys’ prior for (µ, σ2 ) is:

p(µ, σ2 ) ∝ 1/σ3 . (11)

It is noted that the distribution of interest is (µ, σ²), not (µ, σ); therefore, the derivative is taken with respect to σ² as a whole instead of σ. In other words, the parameter space is on σ², not σ. For example, the Jeffreys' prior for σ², with a fixed value of µ, is:

p(σ²) ∝ 1/σ². (12)

The Jeffreys’ prior for (µ, σ ) and σ with a fixed µ are ∝ 1/σ2 and ∝ 1/σ, respectively.
It is shown in Appendix B that the Fisher information matrix is also the Hessian matrix
of the Kullback–Leibler (KL) distance of a deviated distribution with respect to the true
distribution evaluated at the true parameters. For exponential families of distributions,
the KL distance has analytical forms, allowing for the evaluation of the Fisher information
matrix without involving integrals in Equation (9).
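As a quick numerical check of Equations (8)–(11) (this sketch is an illustrative addition, not from the original text), the Fisher information matrix of (µ, σ²) can be estimated by Monte Carlo, and the square root of its determinant reproduces the 1/σ³ dependence of the Jeffreys' prior:

```python
import numpy as np

rng = np.random.default_rng(1)

def fisher_info(mu, var, m=200000):
    """Monte Carlo estimate of I(mu, sigma^2) in Equation (9)."""
    x = rng.normal(mu, np.sqrt(var), size=m)
    d_mu = (x - mu) / var                            # d ln p / d mu
    d_var = -0.5 / var + (x - mu)**2 / (2 * var**2)  # d ln p / d(sigma^2)
    g = np.stack([d_mu, d_var])
    return g @ g.T / m                               # E[grad grad^T]

for var in (1.0, 4.0, 9.0):
    I = fisher_info(0.0, var)
    # sqrt(det I) = 1/(sqrt(2) sigma^3), i.e., proportional to 1/sigma^3
    print(var, np.sqrt(np.linalg.det(I)), 1 / (np.sqrt(2) * var**1.5))
```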
The asymptotically locally invariant (ALI) prior is another type of prior that satisfies
the invariance under certain transformations. An ALI prior can be uniquely determined
using the following equation according to [31],

∂ ln p(φ)/∂φ = −E[f₁ f₂]/E[f₂], (13)

where f₁ = ∂ln p(x|·)/∂φ and f₂ = ∂²ln p(x|·)/∂φ². For a Gaussian distribution with a fixed mean and a random σ, the two terms are E[f₁ f₂] = −6/σ³ and E[f₂] = −2/σ². Solving

∂ ln p(σ)/∂σ = −3/σ (14)

yields the ALI prior for σ:

p(σ) ∝ 1/σ³. (15)

Similarly, the ALI prior for σ² is:

p(σ²) ∝ 1/σ⁴. (16)

When both µ and σ are random, the joint ALI prior for (µ, σ) is:

p(µ, σ) ∝ 1/σ⁵. (17)
It is noticed that the Jeffreys', ALI, and uniform priors are reproduced from ∝ 1/σ^q as q takes different integer values. In the following, the derivation of a 1/σ^q prior from NIG conjugates is shown.

2.2. Derivation of the 1/σ^q Priors as the Limiting States of NIG Conjugates under Gaussian Likelihood
In Bayesian models, for a given likelihood function, the posterior p(φ|x) and the prior p(φ) are called conjugate distributions if both are in the same family of distributions. The prior can be viewed as being reconditioned by encoding the evidence through the likelihood; the evidence or data merely change the distribution parameters of the prior and yield another distribution of the same type, but with a different set of parameters.
For a location-scale parameter vector (θ, σ2 ) used in linear or linearized Bayesian mod-
els with a Gaussian likelihood, the corresponding conjugate prior is the NIG distribution.
The closed-form posterior PDFs for the parameter and prediction are given in Appendix A.
Noninformative priors, including the Jeffreys', ALI, and reference priors, for (θ, σ²) are mostly in the form of 1/σ^q, q ∈ {1, 2, ...}. The uniform prior can be seen as a special case of 1/σ^q with q = 0. It is shown as follows that these noninformative priors can be obtained as certain limiting states of NIG conjugates of (θ, σ²). For example, the NIG distribution with
parameters (α, β, Σ, µ) given by Equation (A3) can reduce to the Jeffreys’ prior:

p(θ, σ²) → 1/σ² (18)

as:

α → −k/2
β → 0⁺
Σ⁻¹ → 0
|µ| < ∞. (19)
Furthermore, the 1/σ^q-type of priors can all be derived as reduced NIG distributions. By assigning the initial values of α, β, and Σ of the NIG distribution, different q values are obtained. Table 1 presents priors with different q values as reduced NIG distributions, the corresponding α, β, and Σ of the NIG distributions, and the α*, β*, and Σ* of the
resulting NIG posterior distributions. The posterior distribution of σ2 is an inverse-Gamma
distribution IG(α*, β*), and the PDF is given by:

p(σ²|y) = (β*^α* / Γ(α*)) (1/σ²)^(α*+1) exp(−β*/σ²). (20)
Table 1. 1/σ^q-type of prior as limiting state cases of the NIG distribution and its NIG posteriors. In every case, β → 0 and Σ⁻¹ → 0, so that β* = β + SSE/2 = SSE/2 and Σ* = (Σ⁻¹ + xᵀx)⁻¹ = (xᵀx)⁻¹.

Prior p(θ, σ²)   NIG(α, β, Σ, µ) → Prior   Posterior NIG(α*, β*, Σ*, µ*)
flat             α = −k/2 − 1              α* = α + n/2 = (n − k − 2)/2
1/σ              α = −k/2 − 1/2            α* = α + n/2 = (n − k − 1)/2
1/σ²             α = −k/2                  α* = α + n/2 = (n − k)/2
1/σ³             α = −k/2 + 1/2            α* = α + n/2 = (n − k + 1)/2
1/σ⁴             α = −k/2 + 1              α* = α + n/2 = (n − k + 2)/2
1/σ⁵             α = −k/2 + 3/2            α* = α + n/2 = (n − k + 3)/2

The posterior distribution of θ is obtained by integrating out σ2 from the joint posterior
distribution of Equation (A6) as,
p(θ|y) = ∫ p(θ, σ²|y) dσ²
       ∝ ∫ (1/σ²)^(α*+k/2+1) exp{ −(1/σ²) [ β* + (1/2)(θ − µ*)ᵀ Σ*⁻¹ (θ − µ*) ] } dσ² (21)
       ∝ [ 1 + (1/(2β*)) (θ − µ*)ᵀ Σ*⁻¹ (θ − µ*) ]^(−(2α*+k)/2),

which is a (2α*) degree-of-freedom multivariate t-distribution with a location vector of µ* and a shape matrix of (β*/α*) Σ*. In particular, when the NIG prior for (θ, σ²) is reduced to
the Jeffreys' prior 1/σ² by Equation (19), the resulting posteriors of σ² and θ are:

p(σ²|y) = IG( (n − k)/2, SSE/2 ), (22)

and:

p(θ|y) = MVT_(n−k)( (xᵀx)⁻¹xᵀy, SSE/(n − k) · (xᵀx)⁻¹ ), (23)
respectively. The term SSE in Equation (22) is the sum of squared errors, given by:
SSE = Σᵢ₌₁ⁿ (yᵢ − xᵢθ)². (24)

The prediction posterior in this case is:
p(ỹ|y) = MVT_(n−k)( x̃(xᵀx)⁻¹xᵀy, SSE/(n − k) · (I + x̃(xᵀx)⁻¹x̃ᵀ) ). (25)
The advantage of treating the 1/σ^q-type of noninformative prior as reduced NIG conjugates is that the Bayesian posteriors of the model prediction and parameters all have analytical forms, allowing for efficient evaluations without resorting to MCMC techniques, as done in regular Bayesian analysis. It is worth noting that the analytical results are obtained under the condition (or assumption) that the likelihood is Gaussian. This condition implies that the conjugate prior is an NIG. The values of the NIG parameters are obtained by equating the noninformative prior to the NIG prior. In other words, equating the two distributions for the purpose of mathematical convenience is the premise of resolving those parameter values.
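The closed-form results can be exercised directly. The following sketch (the synthetic data are assumed for demonstration) computes the posterior quantities of Table 1 and Equations (22)–(23) under the Jeffreys' prior 1/σ² (q = 2):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])  # design matrix x
y = X @ np.array([1.0, 2.0]) + 0.3 * rng.standard_normal(n)   # synthetic data

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # posterior location (x^T x)^-1 x^T y
SSE = float(np.sum((y - X @ theta_hat)**2))    # sum of squared errors, Equation (24)

alpha_star, beta_star = (n - k) / 2, SSE / 2   # p(sigma^2|y) = IG(alpha*, beta*)
shape = SSE / (n - k) * np.linalg.inv(X.T @ X) # shape matrix of the (n-k)-DOF MVT

print(theta_hat)                # location of p(theta|y), Equation (23)
print(alpha_star, beta_star)    # inverse-Gamma parameters, Equation (22)
print(np.sqrt(np.diag(shape)))  # marginal scales of the t posterior of theta
```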

3. Assessment of Noninformative Priors


To evaluate the performance of priors, the Bayesian model assessment method using
efficient asymptotic approximations is proposed. The idea is to recast the assessment as
a model comparison problem. The participating models here are the posteriors obtained
with different priors. The comparison can then be made using Bayes’ factors in a Bayesian
hypothesis testing context.

3.1. Fitting Performance


The Bayes factor, on the basis of observed data y, evaluating the plausibility of two
different models, M1 and M2 , can be expressed as:

B₁₂ = P(y|M₁) / P(y|M₂) = [∫ p(y|θ, σ², M₁) p(θ, σ²|M₁) dθ dσ²] / [∫ p(y|θ, σ², M₂) p(θ, σ²|M₂) dθ dσ²]. (26)

The comparison of two models can be made based on the ratio of the posterior
probabilities of the models:

P(M₁|y) / P(M₂|y) = [P(y|M₁) / P(y|M₂)] · [P(M₁) / P(M₂)] = B₁₂ · P(M₁)/P(M₂). (27)

The ratio, when the prior probabilities of the models are equal, is reduced to the Bayes
factor of Equation (26). The assessment of the Bayes factors involves two integrals over
the parameter space of (θ, σ2 ). The integrands for models M1 and M2 are p(y|θ, σ2 , M1 ) ·
p(θ, σ2 | M1 ) and p(y|θ, σ2 , M2 ) · p(θ, σ2 | M2 ), respectively.
For general multidimensional integration, asymptotic approximation or simulation-
based estimation are two commonly used methods. The Laplace approximation method
evaluates the integral as a multivariate normal distribution. Consider the above integrand
term in Equation (26), i.e., p(y|θ, σ2 , M1 ) · p(θ, σ2 | M1 ). Drop the model symbol for sim-
plicity of the derivation, and denote (θ, σ2 ) as φ. The natural logarithm of the integrand,
p(y|φ) p(φ), i.e., p(y, φ), can be expressed using Taylor expansion around its mode φ0 as:

ln p(y, φ) = ln p(y, φ⁰) + (φ − φ⁰)ᵀ ∇ln p(y, φ⁰) + (1/2)(φ − φ⁰)ᵀ [∇² ln p(y, φ⁰)] (φ − φ⁰) + O[(φ − φ⁰)³], (28)

where ∇ln p(y, φ⁰) is the gradient of ln p(y, φ) evaluated at φ⁰, ∇² ln p(y, φ⁰) is the Hessian matrix of ln p(y, φ) evaluated at φ⁰, and O[·] are higher-order terms. When the higher-order terms are negligible,

ln p(y, φ) ≈ ln p(y, φ⁰) + (φ − φ⁰)ᵀ ∇ln p(y, φ⁰) + (1/2)(φ − φ⁰)ᵀ [∇² ln p(y, φ⁰)] (φ − φ⁰), (29)

where the gradient term (φ − φ⁰)ᵀ ∇ln p(y, φ⁰) is denoted as (◦).
The term (◦) is zero at the mode of the distribution where the gradient is zero;
therefore, expanding ln p(y, φ) around φ0 eliminates the term (◦) and yields:

ln p(y, φ) ≈ ln p(y, φ⁰) + (1/2)(φ − φ⁰)ᵀ [∇² ln p(y, φ⁰)] (φ − φ⁰). (30)
Exponentiate the above equation to obtain:

p(y, φ) ≈ p(y, φ⁰) exp{ −(1/2)(φ − φ⁰)ᵀ [−∇² ln p(y, φ⁰)] (φ − φ⁰) }. (31)

Realizing the first term of the above equation is a constant and the last term is the
variable part of a multivariate normal distribution with a mean vector of φ0 and a covariance
matrix Σ⁰ = [−∇² ln p(y, φ⁰)]⁻¹, the integration of p(y, φ) writes,

P(y) = p(y, φ⁰) · ∫_Φ exp{ −(1/2)(φ − φ⁰)ᵀ (Σ⁰)⁻¹ (φ − φ⁰) } dφ
     = p(y, φ⁰) √[ (2π)^(k+1) |Σ⁰| ]. (32)

Notice that θ is a k-dimensional vector, so φ = (θ, σ²) is a (k + 1)-dimensional vector. Using the result of Equation (32), the integral associated with the jth model in Equation (26) is:

P(y|Mⱼ) ≈ p( y, (θ, σ²)⁰ | Mⱼ ) √[ (2π)^(k+1) |Λⱼ⁰| ], (33)

where (θ, σ2 )0 is the mode of the distribution and Λ0j is the covariance matrix, i.e., the
inverse of the negative Hessian matrix of ln p(y, θ, σ2 | M j ) evaluated at (θ, σ2 )0 . Proper
numerical derivative methods based on finite difference schemes can achieve reliable
results for the gradient and the Hessian in the following equation:

(θ, σ²)⁰ = arg max_(θ,σ²) ln p(y, θ, σ²|Mⱼ),
Λ⁰ = [ −∇² ln p(y, θ, σ²|Mⱼ) |_(θ,σ²)⁰ ]⁻¹. (34)

Experience has shown that for a distribution with a single mode that is approximately symmetric, the Laplace approximation method can achieve accurate results for engineering applications [38,39].
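A minimal sketch of the Laplace approximation of Equations (32)–(34) is given below (an illustrative addition; the Nelder–Mead mode search and the finite-difference step size are implementation choices, not prescribed by the method):

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_evidence(log_joint, phi0, eps=1e-4):
    """Approximate ln P(y) = ln of the integral of p(y, phi), Equations (32)-(33).

    log_joint(phi) returns ln p(y, phi); phi0 starts the mode search.
    """
    res = minimize(lambda p: -log_joint(p), phi0, method="Nelder-Mead")
    mode, d = res.x, len(phi0)
    H = np.zeros((d, d))  # central finite-difference Hessian of ln p(y, phi)
    for i in range(d):
        for j in range(d):
            ei, ej = eps * np.eye(d)[i], eps * np.eye(d)[j]
            H[i, j] = (log_joint(mode + ei + ej) - log_joint(mode + ei - ej)
                       - log_joint(mode - ei + ej) + log_joint(mode - ei - ej)
                       ) / (4 * eps**2)
    cov = np.linalg.inv(-H)  # covariance Lambda^0 in Equation (34)
    _, logdet = np.linalg.slogdet(cov)
    return log_joint(mode) + 0.5 * (d * np.log(2 * np.pi) + logdet)
```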
It should be noted that the 1/σ^q-type of noninformative prior is not a proper prior, i.e., the normalizing constant for 1/σ^q is not bounded for σ ∈ (0, +∞); therefore, it is necessary to limit the support of 1/σ^q to a proper range so as to have a finite normalizing constant. Assume the effective range of σ is (σ₋, σ₊) for q ∈ {0, 1, ...}; the normalizing constant is:

Z(σ) = ln(σ₊) − ln(σ₋),                   q = 1
Z(σ) = (σ₊^(1−q) − σ₋^(1−q)) / (1 − q),   q ≠ 1. (35)

Substitute p(θ, σ²) with 1/(Z(σ)σ^q) in Equation (3). The likelihood of Equation (33) compares the fitting performance of different 1/σ^q-type priors.

3.2. Predictive Performance


The predictive performance for a model is measured by the likelihood of the data that
are not used for model parameter estimation. The likelihood of the future measurement
data ỹ with the posterior PDF of p(θ, σ2 |y) writes:
P(ỹ) = ∫ p(ỹ|θ, σ²) p(θ, σ²|y) dθ dσ²
     = (1/P(y)) ∫ p(ỹ|θ, σ²) p(y|θ, σ²) p(θ, σ²) dθ dσ², (36)
where P(y) is evaluated using Equation (33).
The comprehensive performance integrating both fitting performance and predictive
performance in terms of likelihood in the joint space of (y × ỹ) is:
P(ỹ)P(y) = ∫ p(ỹ|θ, σ²) p(y|θ, σ²) p(θ, σ²) dθ dσ². (37)

Realize that Equations (32), (36), and (37) can all be evaluated using the method of
Laplace approximation, and the performance of different priors can quantitatively be
compared. It is noted that the likelihood is the product of the fitting component and
the prediction component, and the influence of the prior is incorporated in the fitting
component, i.e., P(y).
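Reusing the laplace_log_evidence helper sketched above (the toy data and the Gaussian likelihood below are assumptions for demonstration), the fitting score P(y) and the comprehensive score P(ỹ)P(y) of Equation (37) reduce to two evidence evaluations, one on the fitting data and one on the combined data:

```python
import numpy as np
# Assumes laplace_log_evidence from the Section 3.1 sketch is in scope.

rng = np.random.default_rng(4)
y_fit = rng.normal(1.0, 0.5, size=6)  # data used for fitting
y_new = rng.normal(1.0, 0.5, size=1)  # held-out data for prediction
q = 2                                 # Jeffreys' prior 1/sigma^2

def make_log_joint(data):
    def log_joint(phi):
        mu, sigma = phi
        if sigma <= 0:
            return -np.inf
        # ln[prior * likelihood]; when comparing different q, the prior must be
        # normalized over (sigma-, sigma+) as in Equation (35)
        return (-q * np.log(sigma) - data.size * np.log(sigma)
                - np.sum((data - mu)**2) / (2 * sigma**2))
    return log_joint

log_P_y = laplace_log_evidence(make_log_joint(y_fit), np.array([1.0, 0.5]))
log_P_total = laplace_log_evidence(
    make_log_joint(np.concatenate([y_fit, y_new])), np.array([1.0, 0.5]))
print(log_P_y, log_P_total)  # ln P(y) and ln[P(y~)P(y)], Equation (37)
```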

4. Application Examples
To illustrate the proposed Bayesian inference with noninformative priors, two exam-
ples are presented. The results were compared with the classical least squares approach.
The fitting and comprehensive performance of the 1/σ^q-type of priors were evaluated
using the method of Laplace approximation.

4.1. Low-Cycle Fatigue Reliability Assessment with Sparse Samples


Noninformative priors offer an advantage over the flat prior in probabilistic inference under small sample sizes. To examine this usefulness in realistic problems and to further compare the performance of different noninformative priors, a fatigue life prediction problem is presented.
The low-cycle fatigue testing data reported in ASTM E739-10 [40] were used and
shown in Table 2. The cyclic strain amplitudes (∆ε/2) were (∼0.016, ∼0.0068, ∼0.0016,
∼0.0005), and each level had two or three data points. The following log-linear model
was adopted,
ln N = a0 + a1 ln(∆ε/2), (38)
where a0 and a1 are model parameters that need to be identified. It should be noted that
other variants of the above equation can be used to include material plasticity, the mean
stress effect, etc. The discrepancy between the experimental data and the model prediction
can be modeled using an uncertain variable with zero mean and a standard deviation of σe .
To compare with the regular least squares method, a Gaussian likelihood was assumed,
and the Bayesian posterior is:

p(a₀, a₁, σₑ|N) ∝ (1/σₑ^q) · (1/σₑ²)^(n/2) · exp[ −Σᵢ₌₁ⁿ (ln Nᵢ − a₀ − a₁ ln(∆εᵢ/2))² / (2σₑ²) ]. (39)

Table 2. Low-cycle fatigue testing data. Source: [40].

i       1        2        3        4        5        6        7        8        9
∆ε/2    0.01636  0.01609  0.00675  0.00682  0.00179  0.00160  0.00165  0.00053  0.00054
N       168      200      1000     1180     4730     8035     5254     28,617   32,650

The exponent q takes the values of 0, 1, 2, ... to generate a flat prior and different noninformative priors, and (∆εᵢ, Nᵢ), i = 1, ..., n, is the ith of the total n experimental data points used for parameter estimation. Notice that when q = 0, the above equation reduces to a regular least squares format. One data point (i = 6) was arbitrarily chosen for prediction performance evaluation, and the remaining eight data points were used to estimate the model parameters using Equation (39). To compare the performance of 1/σ^q-type priors, q = 0, ..., 5 were used to represent the regular least squares, Jeffreys', and ALI priors. The likelihood of the fitting performance and prediction performance was evaluated using the method of Laplace approximation, and the results are shown in Figure 2. The total likelihood of the fitting and prediction results indicates that the Jeffreys' prior 1/σ² outperformed the others. The result associated with q = 0 is the Bayesian version of the regular least squares estimation.
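The closed-form route can reproduce this computation (a sketch; the held-out index follows the text, while the strain level used for the prediction is an arbitrary illustration): the predictive distribution under the Jeffreys' prior is the Student-t of Equation (25), and its POF-quantile gives the fatigue life at a prescribed probability of failure:

```python
import numpy as np
from scipy.stats import t as student_t

# Table 2 data
strain = np.array([0.01636, 0.01609, 0.00675, 0.00682, 0.00179,
                   0.00160, 0.00165, 0.00053, 0.00054])
life = np.array([168, 200, 1000, 1180, 4730, 8035, 5254, 28617, 32650])

idx = np.delete(np.arange(9), 5)  # hold out data point i = 6 for prediction
X = np.column_stack([np.ones(idx.size), np.log(strain[idx])])
y = np.log(life[idx])

theta = np.linalg.solve(X.T @ X, X.T @ y)  # (a0, a1) of Equation (38)
SSE = float(np.sum((y - X @ theta)**2))
n, k = idx.size, 2

x_new = np.array([1.0, np.log(1e-3)])  # illustrative strain level 0.001
mean = x_new @ theta
scale = np.sqrt(SSE / (n - k) * (1 + x_new @ np.linalg.inv(X.T @ X) @ x_new))
N_pof = np.exp(mean + student_t.ppf(1e-5, n - k) * scale)  # POF = 1e-5 quantile
print(theta, N_pof)  # fitted parameters and the POF = 1e-5 life at this strain
```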

Figure 1. (a) MCMC samples drawn from the noninformative Bayesian posterior with Jeffreys' prior (q = 2) and (b) the fatigue life results at POF = 10⁻⁵ obtained with the regular least squares (LS) estimator and noninformative Bayesian (NB).

To further demonstrate the difference between the regular least squares and the noninformative Bayesian approach under this sample size, the following reliability problem was considered. The fatigue life given a prescribed probability of failure (POF) was calculated for the cyclic strain range (∆ε/2) between 10⁻⁴ and 10⁻¹. Given a cyclic strain range and a POF, the fatigue life N_POF can be expressed as:

Pr(N > N_POF) = 1 − POF. (40)

For the regular least squares estimator, it can be obtained using the t-statistics given by Equation (A9), where γ/2 should be replaced with the prescribed POF because only the lower one-sided bound is needed. For the Bayesian method with Jeffreys' prior, the one-sided bound can trivially be obtained using the POF-quantile of the prediction results evaluated using MCMC samples of the posterior. Figure 1 presents the comparisons of the fatigue life results, where an obvious difference is observed. The results of the noninformative Bayesian method were larger than those of regular least squares, indicating a longer fatigue life for the same risk level, i.e., 10⁻⁵ POF.

4.2. Aeroengine Turbine Disk Lifing and Failure Rate Assessment

The prescribed safe life is a critical parameter for airworthiness and service health management of turbine disks in aeroengines. To determine the prescribed safe life for turbine disks, life testing is performed using disk samples in a rotating pit. The number of tested pieces is usually less than five due to cost and time constraints; sometimes, only one or two test pieces are available. The determination of the prescribed safe life must be carefully made to mitigate the risk. A general and standard procedure described in [41] is centered on the idea that two sources of uncertainty, namely the sampling uncertainty and the life distribution uncertainty, should be incorporated. To account for the sampling uncertainty, the 95% single-sided lower bound of the sample mean is used as the mean of the life distribution. To account for the uncertainty of the life distribution itself, no more than one in seven hundred fifty service disks is allowed to develop an engineering crack. The total safety margin is a 1/750 cumulative failure probability at a 95% confidence.

Figure 2. (a) The fitting performance P(N) evaluated with different 1/σ^q priors and (b) the prediction performance P(Ñ)P(N) evaluated with different 1/σ^q priors. One data point Ñ is considered for the prediction.

The prescribed safe life is assumed to be a log-normal random variable, and the log-life
is a normally distributed variable. Denote the mean and standard deviation of the normal
distribution of the log-life as ln Nµ and σ. The 95% single-sided lower bound of the sample
mean can be expressed as:
ln Nµ^95% = ln Nµ − λα σ/√n, (41)

where Nµ is the geometric mean of the life, Nµ^95% is the single-sided lower bound of the geometric mean, λα = Φ⁻¹(1 − α), and Φ⁻¹(·) is the inverse cumulative density function (CDF) of the standard normal variable when the standard deviation σ is known. For the 95% single-sided lower bound, λα = Φ⁻¹(0.95) = 1.645. When the standard deviation is unknown, the estimate of the standard deviation based on the samples should be used. In such cases, the value of λα is obtained using the Student distribution with a degree of freedom (DOF) of n − 1, i.e., λα = t⁻¹(1 − α, n − 1). The function t⁻¹(·, n − 1) is the inverse CDF of the Student distribution with a DOF of n − 1. The 1/750 failure probability log-life corresponds to the −3σ shift from the mean of the log-life distribution. When using ln Nµ^95% as the mean of the log-life distribution, the 1/750 failure probability life N(µ−3σ)^95% writes:

ln N(µ−3σ)^95% = ln Nµ^95% − 3σ. (42)

Combine Equations (41) and (42) to have:


ln Nµ − ln N(µ−3σ)^95% = λα σ/√n + 3σ. (43)

The safety factor that ensures a 1/750 failure to a 95% confidence can be expressed by
transforming Equation (43) into the linear scale as:

S = Nµ / N(µ−3σ)^95% = exp[ (λα/√n + 3) σ ]. (44)

It is seen that the prescribed safe life depends on the number of tested samples n
and the standard deviation of the distribution. The deterministic prescribed safe life Ar is
finally obtained as:
Ar = Nµ /S. (45)
The cumulative burst probability when the disk is used y cycles can be expressed as:

F_b(y|x′) = ∫₀^y (1/(√(2π) σ_b)) exp[ −(t − x′)²/(2σ_b²) ] dt, (46)

where x′ is the log-transformed geometric mean of the samples and y is the log-transformed number of cycles, i.e., y = ln N. The failure rate in terms of the failures per cycle can be expressed as:

h(y|x′) = [∂F_b(y|x′)/∂y] / [1 − F_b(y|x′)] = f_b(y|x′) / [1 − F_b(y|x′)], (47)
where f_b(y|x′) is the burst PDF:

f_b(y|x′) = (1/(√(2π) σ_b)) exp[ −(y − x′)²/(2σ_b²) ]. (48)

The sampling uncertainty in the log-transformed geometric mean x′ can be quantified using the sampling PDF. The sample mean x′ distributes according to:

f_s(x′) = (1/(√(2π) σ_s)) exp[ −(x′ − ln Nµ)²/(2σ_s²) ],   σ known
x′ ~ ln Nµ + (s/√n) · t(n − 1),                            σ unknown, (49)

where s is the standard deviation of the sample, t(n − 1) is the standard Student distribution with a DOF of n − 1, and σ_s = σ/√n.
The final failure rate can be obtained by marginalizing out the sampling error variable x′ as:

h(y) = ∫ h(y|x′) f_s(x′) dx′. (50)

With the above discussion, turbine disk life testing results on two disks were obtained
as shown in Table 3.

Table 3. Spin rig testing results on two disks.

Disk No.    No. of Cycles
1           27,000
2           21,700

It is seen that the key information is the standard deviation of the population. For one thing, whether it is considered as a known quantity requires justification, and its actual value relies on empirical evidence and historical data. For another, when it needs to be estimated from the samples, a proper method should be used due to the limited sample size.

4.2.1. Existing Method


Previous disk life testing results showed that the scatter factor, defined as the ratio
of the maximum life (Nmax ) over the minimum life (Nmin ), is less than six. Furthermore,
Nmax and Nmin are considered as the +3σ and −3σ points of the normal distribution of
the log-life. The empirical evidence of Nmax/Nmin = 6, combined with the assumption that ln Nmax − ln Nmin = 6σ, yields:

σ = (1/6)(ln Nmax − ln Nmin) = (1/6) ln(Nmax/Nmin) = (ln 6)/6. (51)

The safety factor that ensures a 1/750 failure to a 95% confidence can be expressed, by
substituting Equation (51) into Equation (43) as,

S = Nµ / N(µ−3σ)^95% = 6^( 1/2 + 1.645/(6√n) ). (52)

Given the assumption that the standard deviation of the log-life distribution is known, the first case of Equation (49) is used. In addition, the term 1 − F_b(y|x′) ≈ 1 for rare events. In such a case, the hazard rate function of Equation (50) reduces to the convolution of two normal PDFs, and the resulting hazard rate function is:

h(y) = ∫ h(y|x′) f_s(x′) dx′ ≈ Norm(y; ln Nµ, σ_b² + σ_s²). (53)

Note that y = ln N; the hazard rate in terms of failures per flight hour is found by using the chain rule as,

h(H) = (1/H) · Norm(ln(βH); ln Nµ, σ_b² + σ_s²), (54)

where β is the exchange rate (cycles/hour) converting cycles to equivalent engine flight hours. Using the data in Table 3, the geometric mean is Nµ = 24,205 cycles, the standard deviation of the log-life distribution is σ_b = (ln 6)/6, and the standard deviation of the sampling distribution is σ_s = (ln 6)/(6√2). Substituting these parameters into Equation (45) yields the prescribed safe life Ar = 6981 cycles.
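These numbers can be verified directly from Table 3 and Equations (45), (51), and (52) (a sketch; small rounding differences against the reported values are expected):

```python
import numpy as np

lives = np.array([27000.0, 21700.0])           # Table 3 spin rig results
n = lives.size
ln_Nmu = np.log(lives).mean()                  # log of the geometric mean
sigma = np.log(6) / 6                          # Equation (51)
S = 6 ** (0.5 + 1.645 / (6 * np.sqrt(n)))      # Equation (52)
Ar = np.exp(ln_Nmu) / S                        # Equation (45)
print(round(np.exp(ln_Nmu)), round(S, 3), round(Ar))  # approx. 24,205, 3.467, 6981
```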

4.2.2. The Proposed Noninformative Bayesian Method


Without relying on the empirical evidence and the assumption, the noninformative
Bayesian estimator for (µ, σ ) of the log-life distribution can be obtained by writing the
posterior of the likelihood of samples with a noninformative prior. For a demonstration,
the prior of ∝ 1/σ² was used, and the posterior writes,

p(µ, σ|D) ∝ (1/σ²) · (1/σ²)^(n/2) · exp[ −Σᵢ₌₁ⁿ (ln Nᵢ − µ)² / (2σ²) ], (55)
where D denotes the event of observed life samples N₁, ..., Nₙ. To estimate µ and σ, 1 × 10⁷ samples were drawn from Equation (55) using the MCMC method. The resulting histogram of the samples is shown in Figure 3. The 95% single-sided confidence bound µ95% can be estimated using the 0.05-quantile of the MCMC samples as 9.86565, or 19,259 in the linear scale. Centering on µ95% = 9.86565, the value corresponding to a 1/750 cumulative failure probability was found by subtracting 3σ from µ95%. The term σ was estimated using the MCMC sample mean, and in this case, it was found to be 0.1893. Consequently, after applying the two safety margins, the prescribed safe life (in log scale) using the noninformative Bayesian method was found as µ95% − 3σ = 9.2976, or Ar = 10,911 cycles.
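A sketch of this estimation is given below (the sampler settings and chain length are illustrative; the paper used 1 × 10⁷ MCMC samples):

```python
import numpy as np

rng = np.random.default_rng(3)
lnN = np.log(np.array([27000.0, 21700.0]))  # Table 3 data
n = lnN.size

def log_post(mu, sigma):
    """ln p(mu, sigma|D) of Equation (55) with the prior proportional to 1/sigma^2."""
    if sigma <= 0:
        return -np.inf
    return -(n + 2) * np.log(sigma) - np.sum((lnN - mu)**2) / (2 * sigma**2)

mu, sigma, mus, sigmas = lnN.mean(), 0.2, [], []
for _ in range(500000):  # random-walk Metropolis
    mu_p = mu + 0.2 * rng.standard_normal()
    sigma_p = sigma + 0.1 * rng.standard_normal()
    if np.log(rng.uniform()) < log_post(mu_p, sigma_p) - log_post(mu, sigma):
        mu, sigma = mu_p, sigma_p
    mus.append(mu)
    sigmas.append(sigma)

mu95 = np.quantile(mus[50000:], 0.05)            # 95% single-sided lower bound
Ar = np.exp(mu95 - 3 * np.mean(sigmas[50000:]))  # both safety margins applied
print(mu95, Ar)  # the paper reports about 9.8657 and about 10,911 cycles
```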
The failure rates calculated using the existing method and the proposed noninforma-
tive Bayesian method are compared in Figure 4. The failure rate per flight hour was based
on an exchange rate β = 2.5.
Based on the results shown in Figure 4, the usable flight hours obtained using the
noninformative Bayesian method were about 1000 h more than those obtained using the
existing method given the same risk constraint. For example, the flight hours corresponding
to a failure rate of 10−8 /h were 2734 and 1734, respectively, for the two methods. When
the failure rate was required to be 10−6 /h, the usable flight hours were 3817 and 2720,
respectively. It should be noted that the proposed noninformative Bayesian method does
not rely on empirical evidence and assumptions other than the log-normal distribution for
disk life.

Figure 3. Samples drawn from the posterior PDF and histograms.

Figure 4. Comparisons of the failure rate results of the two methods. The solid line and the dashed line are results from the noninformative Bayesian (NB) and existing method (Ref.), respectively.

5. Conclusions
The study developed a noninformative Bayesian inference method for small sample problems. In particular, a 1/σ^q-type of prior was proposed to formulate the Bayesian posterior. It was shown that this type of prior can represent the classical Jeffreys' prior (Fisher information), flat prior, and asymptotically locally invariant prior. More importantly, this type of prior was derived as the limiting state of normal-inverse-Gamma (NIG) conjugate priors, allowing for fast and efficient evaluations of the Bayesian posteriors and predictors in closed-form expressions under a Gaussian likelihood. To illustrate the overall method and compare the performance of noninformative priors, a realistic fatigue reliability problem
was discussed in detail. The combined fitting and prediction performance in terms of
the likelihood was used to evaluate different priors. It was observed that Jeffreys’ prior,
1/σ2 , yielded the maximum likelihood. The developed method was further applied to
an actual aeroengine disk lifing problem with two test samples, and the results on the
prescribed safe life and risk in terms of failure/hour were comparable with those obtained
using the existing method incorporating the empirical evidence and assumptions. It should
be emphasized that the study did not promote abandoning justified experience and/or
information in prior construction. It was argued in this study that in the absence of those
data or experience, the noninformative prior should be used to avoid introducing unjus-
tified information into the prior. The following conclusions were drawn based on the
current results:
• The 1/σ^q-type of prior can be used equivalently as the NIG conjugate priors at their limiting states. The great features of conjugate priors, such as having analytical posterior and prediction PDFs under a Gaussian likelihood, can be retained when using noninformative priors for Bayesian linear regression analysis;
• For the 1/σ^q-type of noninformative prior, the classical Jeffreys' prior 1/σ² yielded an optimal fitting and prediction performance in terms of the likelihood or Bayes factors. When q = 0, the Bayesian estimator reduces to the regular least squares estimator. The results of the two case studies showed the advantage of using the noninformative Bayesian estimator with the prior of ∝ 1/σ² over the regular least squares estimator and other empirical methods under small sample sizes.

Author Contributions: Conceptualization, X.G.; Data curation, S.W.; Formal analysis, W.W. and
M.H.; Methodology, J.H., W.W. and X.G.; Resources, M.H. and S.W. All authors have read and agreed
to the published version of the manuscript.
Funding: The study was supported by National Natural Science Foundation of China, Nos. 11872088,
12088101, NSAF U1930403, CAEP CX20200035, and 2020-JCJQ-JJ-465. The support is gratefully acknowledged.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Acknowledgments: The study was supported by National Natural Science Foundation of China,
Nos. 11872088, 12088101, NSAF U1930403, CAEP CX20200035, and 2020-JCJQ-JJ-465. The support is
gratefully acknowledged.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Conjugate Prior, Posterior, and Prediction


Assume a general linear model with a parameter vector θ and a Gaussian likelihood
with a scale parameter σ2 , i.e., the modeling error ei = (yi − xi θ ) ∼ Norm(0, σ2 ).

Appendix A.1. Conjugate Prior


It is known that when the likelihood function belongs to the exponential family, a
conjugate prior exists and belongs also to the exponential family. To obtain the joint
conjugate using p(θ, σ2 ) = p(θ |σ2 ) p(σ2 ), it is necessary to look at the conjugate prior of σ2
first. For a Gaussian likelihood with an unknown variance σ2 and zero mean, the conjugate
prior for σ² is an inverse-Gamma distribution, i.e., p(σ²) ∼ IG(α, β), expressed as:

p(σ²) = (β^α / Γ(α)) (1/σ²)^(α+1) exp(−β/σ²), (A1)

where α > 0 and β > 0 are shape and scale parameters, respectively. The posterior
distribution of σ2 , given n data points, has the following new shape and scale parameters:
(α′, β′) = ( α + n/2, β + Σᵢⁿ (yᵢ − xᵢθ)²/2 ) = ( α + n/2, β + SSE/2 ), (A2)
where SSE = ∑in (yi − xi θ )2 is the sum of squared errors. Consider the joint conjugate
of (θ, σ2 ), a widely adopted conjugate prior for Bayesian linear regression is the normal-
inverse-Gamma (NIG) prior for (θ, σ2 ), and it can be written as Equation (A3).

p(θ, σ²) = p(θ|σ²) p(σ²) = NIG(µ_θ, Σ_θ, α, β)
         = Z · exp{ −(1/σ²) [ β + (1/2)(θ − µ_θ)ᵀ Σ_θ⁻¹ (θ − µ_θ) ] }
         ∝ Z′ · exp{ −(1/σ²) [ β + (1/2)(θ − µ_θ)ᵀ Σ_θ⁻¹ (θ − µ_θ) ] }, (A3)

where:

Z = [ β^α / ( (2π)^(k/2) Γ(α) √|Σ_θ| ) ] · (1/σ²)^(α+k/2+1), (A4)

and:

Z′ = (1/σ²)^(α+k/2+1). (A5)
Using the inverse-Gamma distribution for σ², the initial distribution parameters (α, β) still need to be determined. Unfortunately, there is no formal rule to do that. Gelman [35] discussed the inverse-Gamma distribution IG(e, e) where e was set to a low value such as 1, 0.01, or 0.001. A difficulty of this prior is that e must be set to a proper value. Inferences become very sensitive to e for cases where low values of σ² are possible. For this reason, the IG(e, e) family of priors for σ² was not recommended by the author. It was further argued that the resulting inference based on it for cases where σ² is estimated near zero was the case for which classical and Bayesian inferences differed the most [35]. Browne and Draper [42] also mentioned this disadvantage and suggested using a uniform prior U(0, 1/e) to reduce this defect. However, choosing an appropriate value of e for U(0, 1/e) still requires attention and understanding of the data. Indeed, the effect of the prior and its parameters is highly subject to the data and problem, and its sensitivity to Bayesian inferences based on sparse data or extensive data can be dramatically different as the data dominate the posterior due to the law of large numbers.

Appendix A.2. Posterior Distribution


The posterior distribution of (θ, σ2 ) is expressed as:

p(θ, σ²|y) = p(θ, σ²) p(y|θ, σ²) / p(y). (A6)

The term p(y) = ∫ p(θ, σ²) p(y|θ, σ²) dθ dσ² is the marginal distribution of the data. It can be shown that, after some algebraic operations, the posterior can be written as:

p(θ, σ²|y) ∝ (1/σ²)^(α*+k/2+1) exp{ −(1/σ²) [ β* + (1/2)(θ − µ*)ᵀ Σ*⁻¹ (θ − µ*) ] }, (A7)
where:

µ* = (Σ⁻¹ + xᵀx)⁻¹ (Σ⁻¹µ + xᵀy),
Σ* = (Σ⁻¹ + xᵀx)⁻¹,
α* = α + n/2,
β* = β + (1/2)( µᵀΣ⁻¹µ + yᵀy − µ*ᵀΣ*⁻¹µ* ).
The initial NIG conjugate parameter (µ, Σ, α, β) is updated to a new set of parameters
(µ∗ , Σ ∗ , α ∗ , β ∗ ).
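A compact sketch of this update is given below (an illustrative addition mirroring the equations above):

```python
import numpy as np

def nig_update(mu, Sigma, alpha, beta, X, y):
    """Conjugate NIG parameter update for Bayesian linear regression, (A7)."""
    Sigma_inv = np.linalg.inv(Sigma)
    precision_star = Sigma_inv + X.T @ X  # Sigma*^-1
    Sigma_star = np.linalg.inv(precision_star)
    mu_star = Sigma_star @ (Sigma_inv @ mu + X.T @ y)
    alpha_star = alpha + y.size / 2
    beta_star = beta + 0.5 * (mu @ Sigma_inv @ mu + y @ y
                              - mu_star @ precision_star @ mu_star)
    return mu_star, Sigma_star, alpha_star, beta_star
```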

Appendix A.3. Prediction


The Bayesian prediction of the response given the data y can be obtained using the
above distributions with a new set of input variables x̃. Denote the corresponding prediction
as ỹ. The prediction posterior distribution of ỹ can be expressed as:
p(ỹ|y) = ∫ p(ỹ|θ, σ²) p(θ, σ²|y) dθ dσ²
       = ∫ MVN(x̃θ, σ²I) × NIG(µ*, Σ*, α*, β*) dθ dσ² (A8)
       = MVT_(2α*)( x̃µ*, (β*/α*)(I + x̃Σ*x̃ᵀ) ).

When a single point is evaluated, Equation (A8) reduces to a one-dimensional Student distribution, and the confidence interval is readily evaluated using the inverse Student distribution. For multipoint simultaneous evaluations, the interval contours can be approximated using a few methods as suggested in [43].
For comparison, the simple linear regression results of the prediction mean and bounds are given. The prediction mean is x̃θ̂, and the prediction bounds of level (1 − γ)% are:

ỹ_(1−γ) = x̃θ̂ ± tinv(γ/2, n − k) · σ_y, (A9)

where θ̂ = (x T x)−1 (x T y) is the maximum likelihood estimator of θ. It is known that θ̂ is


also identical to the least squares estimator and the Bayesian estimator, i.e., Equation (23).
The term tinv(γ/2, n − k) represents the γ/2-percentile value of the standard Student
t-distribution with n − k degrees of freedom, and σy is the standard error of the prediction
given by:
σ_y² = ( eᵀe / (n − k) ) · [ I + x̃(xᵀx)⁻¹x̃ᵀ ], (A10)
where the term e = y − xθ̂ is the discrepancy vector between the model and observation,
also called the residual or error. As multiple points are estimated, the F-statistic can be
used instead of the t-statistic to yield the Working–Hotelling confidence intervals [44],
ỹ_(1−γ) = x̃θ̂ ± √( 2 Finv(γ, k, n − k) ) · σ_y, (A11)

where Finv(γ, k, n − k) represents the γ-percentile value of the F-distribution with (k, n − k)
degrees of freedom.
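A sketch of the frequentist bounds of Equations (A9) and (A11) follows (illustrative; the upper-tail orientation of the F-quantile is assumed for a (1 − γ) band):

```python
import numpy as np
from scipy.stats import t, f

def prediction_bounds(X, y, X_new, gamma=0.05, simultaneous=False):
    """Least squares prediction mean and (1 - gamma) bounds, (A9)/(A11)."""
    n, k = X.shape
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ theta_hat                          # residual vector
    s2 = (e @ e) / (n - k)                         # residual variance
    V = np.linalg.inv(X.T @ X)
    se = np.sqrt(s2 * (1.0 + np.sum((X_new @ V) * X_new, axis=1)))
    if simultaneous:                               # Working-Hotelling, (A11)
        w = np.sqrt(2 * f.ppf(1 - gamma, k, n - k))
    else:                                          # pointwise t bounds, (A9)
        w = t.ppf(1 - gamma / 2, n - k)
    mean = X_new @ theta_hat
    return mean, mean - w * se, mean + w * se
```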

Appendix B. Fisher Information Matrix and Kullback–Leibler Divergence


The equivalence of the Fisher information matrix and the Kullback–Leibler divergence
can be shown as follows. The matrix form of the Fisher information matrix of a continuous
PDF with a parameter vector θ writes,

I(θ) = E_θ[ ∇ln p_θ(x) · ∇ln p_θ(x)ᵀ ]. (A12)

Under the condition that the integration and derivation may be exchanged and the
log-function is twice differentiable, the following Equation (A13) can be attained.
∇² ln p_θ(x) = ∇²p_θ(x)/p_θ(x) − [∇p_θ(x) ∇p_θ(x)ᵀ] / [p_θ(x)]²
             = ∇²p_θ(x)/p_θ(x) − ∇ln p_θ(x) · ∇ln p_θ(x)ᵀ. (A13)

It is noted that:

E_θ[ ∇²p_θ(x)/p_θ(x) ] = ∫ ∇²p_θ(x) dx = ∇²1 = 0. (A14)

The following result, that is, the negative expected Hessian matrix of the log-function is equal to the Fisher information matrix, can be obtained:

I(θ) = E_θ[ ∇ln p_θ(x) · ∇ln p_θ(x)ᵀ ] = −E_θ[ ∇² ln p_θ(x) ], (A15)

where ∇² ln p_θ(x) is recognized as the Hessian matrix of ln p_θ(x), denoted as H_(ln p_θ(x)). The Fisher information matrix can loosely be thought of as the curvature matrix of the log-function graph.
The Kullback–Leibler divergence, also called the relative entropy, between two distri-
butions can be written as:

D(θ||θ′) = ∫ p_θ(x) ln[ p_θ(x)/p_θ′(x) ] dx = E_θ[ ln p_θ(x) − ln p_θ′(x) ]. (A16)

Expand ln p_θ′(x) around θ to the second order to have,

ln p_θ′(x) ≈ ln p_θ(x) + ∆θᵀ ∇ln p_θ(x) + (1/2) ∆θᵀ ∇² ln p_θ(x) ∆θ, (A17)

where ∆θ = (θ′ − θ).
Substitute Equation (A17) into Equation (A16):

D(θ||θ′) ≈ −E_θ[ ∆θᵀ ∇ln p_θ(x) ] − (1/2) E_θ[ ∆θᵀ ∇² ln p_θ(x) ∆θ ]. (A18)

The first term of the right-hand side of Equation (A18) is zero, and the second term of the right-hand side is related to Equation (A15). The KL divergence can finally be expressed as,

D(θ||θ′) ≈ (1/2) ∆θᵀ · I(θ) · ∆θ. (A19)
2
Furthermore, differentiate Equation (A16) with respect to θ 0 to obtain:

∇_θ′ D(θ||θ′) = ∇_θ′ E_θ[ ln p_θ(x) − ln p_θ′(x) ] = −E_θ[ ∇p_θ′(x) / p_θ′(x) ]. (A20)

Continue to differentiate Equation (A20) with respect to θ′ to obtain:

∇²_θ′ D(θ||θ′) = E_θ[ ∇p_θ′(x) ∇p_θ′(x)ᵀ / p_θ′(x)² − ∇²p_θ′(x) / p_θ′(x) ]. (A21)

Notice that the expectation of the second term in Equation (A21), when θ 0 = θ, reads,

E_θ[ ∇²p_θ′(x) / p_θ′(x) ] = ∫ ∇²p_θ(x) dx = ∇²1 = 0, (A22)
and the first term is the Fisher information matrix. As a result, the Fisher information matrix
is the Hessian matrix of the Kullback–Leibler distance evaluated at the true parameter θ.

∇²_θ′ D(θ||θ′) |_(θ′=θ) = I(θ). (A23)

References
1. Royall, R.M. The effect of sample size on the meaning of significance tests. Am. Stat. 1986, 40, 313–315.
2. Raudys, S.J.; Jain, A.K. Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Trans.
Pattern Anal. Mach. Intell. 1991, 13, 252–264. [CrossRef]
3. Nelson, C.R.; Kim, M.J. Predictable stock returns: The role of small sample bias. J. Financ. 1993, 48, 641–661. [CrossRef]
4. Fan, X.; Thompson, B.; Wang, L. Effects of sample size, estimation methods, and model specification on structural equation
modeling fit indexes. Struct. Equ. Model. A Multidiscip. J. 1999, 6, 56–83. [CrossRef]
5. McNeish, D. On using Bayesian methods to address small sample problems. Struct. Equ. Model. A Multidiscip. J. 2016, 23, 750–773.
[CrossRef]
6. Winkler, R.L. Probabilistic prediction: Some experimental results. J. Am. Stat. Assoc. 1971, 66, 675–685. [CrossRef]
7. Melchers, R.E.; Beck, A.T. Structural Reliability Analysis and Prediction; John Wiley & Sons: Hoboken, NJ, USA, 2018.
8. He, J.; Chen, J.; Guan, X. Lifetime distribution selection for complete and censored multi-level testing data and its influence on
probability of failure estimates. Struct. Multidiscip. Optim. 2020, 62, 1–17. [CrossRef]
9. Du, Y.M.; Ma, Y.H.; Wei, Y.F.; Guan, X.; Sun, C. Maximum entropy approach to reliability. Phys. Rev. E 2020, 101, 012106.
[CrossRef]
10. Zhou, D.; He, J.; Du, Y.M.; Sun, C.; Guan, X. Probabilistic information fusion with point, moment and interval data in reliability
assessment. Reliab. Eng. Syst. Saf. 2021, 213, 107790. [CrossRef]
11. Gao, C.; Fang, Z.; Lin, J.; Guan, X.; He, J. Model averaging and probability of detection estimation under hierarchical uncertainties
for Lamb wave detection. Mech. Syst. Signal Process. 2022, 165, 108302. [CrossRef]
12. Gregory, P. Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support; Cambridge
University Press: Cambridge, UK, 2005.
13. Guan, X.; Jha, R.; Liu, Y. Model selection, updating, and averaging for probabilistic fatigue damage prognosis. Struct. Saf. 2011,
33, 242–249. [CrossRef]
14. Hamdia, K.M.; Zhuang, X.; He, P.; Rabczuk, T. Fracture toughness of polymeric particle nanocomposites: Evaluation of models
performance using Bayesian method. Compos. Sci. Technol. 2016, 126, 122–129. [CrossRef]
15. Yang, J.; He, J.; Guan, X.; Wang, D.; Chen, H.; Zhang, W.; Liu, Y. A probabilistic crack size quantification method using in-situ
Lamb wave test and Bayesian updating. Mech. Syst. Signal Process. 2016, 78, 118–133. [CrossRef]
16. Wang, D.; He, J.; Guan, X.; Yang, J.; Zhang, W. A model assessment method for predicting structural fatigue life using Lamb
waves. Ultrasonics 2018, 84, 319–328. [CrossRef]
17. Guan, X.; He, J. Life time extension of turbine rotating components under risk constraints: A state-of-the-art review and case
study. Int. J. Fatigue 2019, 129, 104799. [CrossRef]
18. He, J.; Huo, H.; Guan, X.; Yang, J. A Lamb wave quantification model for inclined cracks with experimental validation. Chin. J.
Aeronaut. 2021, 34, 601–611. [CrossRef]
19. Huo, H.; He, J.; Guan, X. A Bayesian fusion method for composite damage identification using Lamb wave. Struct. Health Monit.
2020, 1475921720945000. [CrossRef]
20. Berger, J.O. Robust Bayesian analysis: Sensitivity to the prior. J. Stat. Plan. Inference 1990, 25, 303–328. [CrossRef]
21. Berger, J.O.; Moreno, E.; Pericchi, L.R.; Bayarri, M.J.; Bernardo, J.M.; Cano, J.A.; De la Horra, J.; Martín, J.; Ríos-Insúa, D.; Betrò, B.;
et al. An overview of robust Bayesian analysis. Test 1994, 3, 5–124. [CrossRef]
22. Kass, R.E.; Wasserman, L. The selection of prior distributions by formal rules. J. Am. Stat. Assoc. 1996, 91, 1343–1370. [CrossRef]
23. Bishop, P.; Bloomfield, R.; Littlewood, B.; Povyakalo, A.; Wright, D. Toward a formalism for conservative claims about the
dependability of software-based systems. IEEE Trans. Softw. Eng. 2010, 37, 708–717. [CrossRef]
24. Gelman, A.; Hennig, C. Beyond subjective and objective in statistics. J. R. Stat. Soc. Ser. A 2017, 180, 967–1033. [CrossRef]
25. Zhang, J.; Shields, M.D. The effect of prior probabilities on quantification and propagation of imprecise probabilities resulting
from small datasets. Comput. Methods Appl. Mech. Eng. 2018, 334, 483–506. [CrossRef]
26. Schervish, M.J.; DeGroot, M.H. Probability and Statistics; Pearson Education: London, UK, 2014.
27. Jaynes, E.T. Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 227–241. [CrossRef]
28. Jaynes, E.; Bretthorst, G. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
29. Gelman, A.; Simpson, D.; Betancourt, M. The prior can often only be understood in the context of the likelihood. Entropy 2017,
19, 555. [CrossRef]
30. Jeffreys, H. An Invariant Form for the Prior Probability in Estimation Problems. Proc. R. Soc. Lond. Ser. A 1946, 186, 453–461.
31. Hartigan, J. Invariant prior distributions. Ann. Math. Stat. 1964, 35, 836–845. [CrossRef]
32. Bernardo, J.M. Reference posterior distributions for Bayesian inference. J. R. Stat. Soc. Ser. B 1979, 41, 113–147. [CrossRef]
33. Berger, J.O.; Bernardo, J.M. Ordered group reference priors with application to the multinomial problem. Biometrika 1992,
79, 25–37. [CrossRef]
34. Jeffreys, H. The Theory of Probability; Oxford University Press: Oxford, UK, 1998.
35. Gelman, A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).
Bayesian Anal. 2006, 1, 515–534. [CrossRef]
36. Simpson, D.; Rue, H.; Riebler, A.; Martins, T.G.; Sørbye, S.H. Penalising model component complexity: A principled, practical
approach to constructing priors. Stat. Sci. 2017, 32, 1–28. [CrossRef]
37. Kullback, S.; Leibler, R. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [CrossRef]
38. Guan, X.; He, J.; Jha, R.; Liu, Y. An efficient analytical Bayesian method for reliability and system response updating based on
Laplace and inverse first-order reliability computations. Reliab. Eng. Syst. Saf. 2012, 97, 1–13. [CrossRef]
39. He, J.; Guan, X.; Jha, R. Improve the accuracy of asymptotic approximation in reliability problems involving multimodal
distributions. IEEE Trans. Reliab. 2016, 65, 1724–1736. [CrossRef]
40. American Society for Testing and Materials. ASTM E739-10(2015)-Standard Practice for Statistical Analysis of Linear or Linearized
Stress-Life (S-N) and Strain-Life (ε-N) Fatigue Data; ASTM International: West Conshohocken, PA, USA, 2015.
41. Boyd-Lee, A.; Harrison, G.; Henderson, M. Evaluation of standard life assessment procedures and life extension methodologies
for fracture-critical components. Int. J. Fatigue 2001, 23, 11–19. [CrossRef]
42. Browne, W.J.; Draper, D. A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Anal.
2006, 1, 473–514. [CrossRef]
43. Kotz, S.; Nadarajah, S. Multivariate T-Distributions and Their Applications; Cambridge University Press: Cambridge, UK, 2004.
44. Graybill, F.A.; Bowden, D.C. Linear segment confidence bands for simple linear models. J. Am. Stat. Assoc. 1967, 62, 403–408.
[CrossRef]
