Bayesian Hierarchical Analysis On Crash Prediction Models


This is the author’s version of a work that was submitted/accepted for publication in the
following source:

Huang, Helai, Chin, Hoong Chor, & Haque, Md. Mazharul (2008) Bayesian hierarchical analysis
on crash prediction models. In 87th Annual Meeting of Transportation Research Board (TRB),
Transportation Research Board, Capital Hilton, Washington DC.

This file was downloaded from: https://eprints.qut.edu.au/51218/

© Copyright 2008 [please consult the author]


Notice: Changes introduced as a result of publishing processes such as copy-editing and
formatting may not be reflected in this document. For a definitive version of this work,
please refer to the published source:

http://pubsindex.trb.org/view.aspx?id=844461
Huang et al. 1

Please cite this article as:


Huang, H., Chin, H. C. and Haque, M. M. "Bayesian hierarchical analysis on crash
prediction models." In Proc. 87th Annual Meeting of Transportation Research Board (TRB),
Washington DC, USA, 2008.

Bayesian Hierarchical Analysis on Crash Prediction Models

Helai Huang *
Research Fellow
Department of Civil Engineering
National University of Singapore
Engineering Drive 2,
Singapore, 117576
Tel: 65 6516 2255
Email: huanghelai@nus.edu.sg

Hoong Chor Chin


Associate Professor
Department of Civil Engineering
National University of Singapore
Singapore 117576
Tel: 65 6516 1359
Email: cvechc@nus.edu.sg

Md. Mazharul Haque


Research Scholar
Department of Civil Engineering
National University of Singapore
Singapore, 117576
Tel: 65 6516 2255
Email: mmh@nus.edu.sg

* Corresponding author

ABSTRACT

Traditional crash prediction models, such as generalized linear regression models, cannot
account for the multilevel data structure that is pervasive in crash data. Disregarding the
possible within-group correlations can produce models that give unreliable and biased
estimates of the unknowns. This study proposes a 5×T-level hierarchy, viz. (Geographic region
level – Traffic site level – Traffic crash level – Driver-vehicle unit level – Vehicle-occupant
level) × Time level, to establish a general form of multilevel data structure in traffic safety
analysis. To properly model the potential cross-group heterogeneity due to the multilevel data
structure, a framework of Bayesian hierarchical models that explicitly specifies the multilevel
structure and yields correct parameter estimates is introduced and recommended. The proposed
method is illustrated in an individual-severity analysis of intersection crashes using
Singapore crash records. The study demonstrates the importance of accounting for within-group
correlations and the flexibility and effectiveness of the Bayesian hierarchical method in
modeling the multilevel structure of traffic crash data.

INTRODUCTION

The crash prediction model (also called a safety performance function) is one of the most
important techniques for investigating the relationship between crash occurrence and the risk
factors associated with various traffic entities. These risk factors are assumed to provide
information on the behavior of crash occurrence, which is commonly measured by crash frequency
at various degrees of crash severity. Appropriate probabilistic forms and statistically
significant factors are identified by examining the crash occurrence mechanism and the fit of
the model to historical crash data. The typical structure of such models can be expressed in
the following general form:

$$Y \mid \theta \sim \mathrm{Dist}(\theta), \quad \text{with } \theta = f(X, \beta, \varepsilon) \qquad (1)$$

where
Y: Dependent variable of interest, e.g. crash frequency or severity
Dist(θ): Adopted distribution for Y | θ, with parameters θ
X: Covariates representing various risk factors related to crash occurrence
β: Effects of X on Y
f(⋅): Link function relating X and Y
ε: Disturbance/error terms in the model

The dependent variable Y is assumed to follow some distribution with parameters θ, which are
further modeled through a link function f(X, β, ε). The selection of the distribution depends
on the nature of the crash features of interest. In particular, for predicting crash
frequency, the Poisson distribution is traditionally employed to model the count data (e.g.
1-4). In contrast, when crash severity is of concern, discrete outcome distributions are
generally used, such as those in nominal models (e.g. 5-9) or ordered discrete models (10-13).
The distribution parameters (θ) are then related to the risk factors through a link function
which, conceptually, consists of three components:
i) a suitable transformation function for θ based on the type of data, for example, a
logistic function for binary data or an exponential function for count data;
ii) an expression combining X and β, typically a linear combination of X or their
transforms, i.e. Xβ; and
iii) the term ε, representing the various disturbance/error terms assumed in the model.
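
The three components of the link function can be sketched in a few lines of Python. This is only an illustration of the general form in Equation (1), not the model fitted in this paper: the function names, the two covariates and the coefficient values are all assumptions chosen for the example.

```python
import math

def linear_predictor(x, beta, eps=0.0):
    """Components (ii) and (iii): the linear combination X*beta plus a disturbance eps."""
    return sum(xi * bi for xi, bi in zip(x, beta)) + eps

def poisson_mean(x, beta, eps=0.0):
    """Component (i) for count data: exponential transform, theta = exp(X*beta + eps)."""
    return math.exp(linear_predictor(x, beta, eps))

def logistic_probability(x, beta, eps=0.0):
    """Component (i) for binary data: logistic transform of X*beta + eps."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x, beta, eps)))

# Hypothetical covariates: a constant term and the log of a traffic volume.
x = [1.0, math.log(1000.0)]
beta = [-5.0, 0.8]  # illustrative values only, not estimates from any data

mu = poisson_mean(x, beta)                            # mean crash count for this unit
p = logistic_probability([1.0, 0.0], [0.0, 1.0])      # linear predictor 0 gives probability 0.5
```

The same linear predictor feeds both transforms; only the transformation in component (i) changes with the type of the dependent variable.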

A significant number of studies have investigated the suitability of various crash prediction
models for both crash frequency and severity. Traditionally, generalized linear regression
models (GLMs), such as the Poisson, Logit or Probit models, are broadly applied to build
probabilistic formulations of the relationship between crash occurrence and a variety of
possible covariates. In most of these classical models, the disturbance term ε is inherently
determined by the adopted distribution, which constrains the mean and the variance of the
model (e.g. ‘variance = mean’ in the Poisson model, or ‘variance = mean × (1-mean)’ in the
Binomial Logit model). Hence, they may not be adequate for over-dispersed data, which are
commonly found in crash frequency data. To overcome the over-dispersion problem in count data,
overdispersed Poisson models have proven useful by relaxing the ‘variance = mean’ condition of
the standard Poisson model. Without explicitly distinguishing the source of over-dispersion, an
additional stochastic component ε is introduced into the link function. Assuming that exp(ε)
follows a Gamma or a Lognormal distribution yields, respectively, the Poisson-Gamma model
(also called the NB model) (e.g. 14-18) or the Poisson-Lognormal model (19).
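
A small simulation can make the effect of the extra disturbance concrete. The sketch below is an illustration under assumed settings (a mean of 5, a Gamma shape of 2, and a hand-written Poisson sampler, since the Python standard library does not provide one); it is not taken from the cited studies. With exp(ε) drawn from a Gamma distribution with mean 1, the counts keep the same mean but their variance exceeds it.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate(mu, shape, n, seed=0):
    """Return n standard Poisson counts and n Poisson-Gamma counts with the same mean mu."""
    rng = random.Random(seed)
    plain = [sample_poisson(mu, rng) for _ in range(n)]
    # exp(eps) ~ Gamma(shape, scale=1/shape) has mean 1 and variance 1/shape,
    # so the mixed counts have variance mu + mu**2 / shape in theory.
    mixed = [sample_poisson(mu * rng.gammavariate(shape, 1.0 / shape), rng)
             for _ in range(n)]
    return plain, mixed

plain, mixed = simulate(mu=5.0, shape=2.0, n=20000)
plain_mean, plain_var = statistics.mean(plain), statistics.variance(plain)
mixed_mean, mixed_var = statistics.mean(mixed), statistics.variance(mixed)
print(f"Poisson:       mean={plain_mean:.2f}  variance={plain_var:.2f}")
print(f"Poisson-Gamma: mean={mixed_mean:.2f}  variance={mixed_var:.2f}")
```

Note that every observation here still receives its own independent draw of ε, which is exactly the limitation discussed next.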
However, these GLMs share an underlying limitation: each observation (e.g. a crash or a
vehicle involvement) in the estimation procedure is assumed to correspond to an individual
situation, so the residuals from the model are assumed to be independent. This “independence”
assumption often does not hold, since multilevel data structures arise extensively from the
way traffic data are collected and clustered. Disregarding the possible within-group
correlations may produce models with unreliable parameter estimates and statistical
inferences.
The objective of this study is to propose a Bayesian hierarchical method that explicitly
models the multilevel data structure in crash data. In the following, potential multilevel
data structures in traffic safety are comprehensively examined. A 5×T-level hierarchy is
proposed to establish a general form of the multilevel data structure. To properly model the
potential between-group heterogeneity, a framework of Bayesian hierarchical models that
explicitly specifies the multilevel data is developed. A case study is then summarized to
illustrate the proposed method, followed by concluding remarks.

MULTILEVEL DATA STRUCTURE IN TRAFFIC SAFETY

A Common Neglect in Existing Models: Cross-Group Heterogeneity


To explain the underlying limitation of the ‘independence’ assumption in GLMs, we take as an
example a simple regression relationship between crash frequency and crash exposure. Crash
exposure is defined as the amount of opportunity for crashes of a certain type that drivers or
the traffic system experience. In this example, the crash exposure is assumed to have a linear
relationship with the logarithm of the mean crash count (log µ): higher exposure is associated
with more crashes. A standard GLM may generate the relationship shown in Case (a) of Figure 1.
Given the crash exposure (x), the variation between different observations (y) is restricted
by the adopted distribution (Dist(µ)). Put another way, the only stochastic component of
variation is introduced by Dist(µ).
(Insert Figure 1 here)
In particular, the standard Poisson model assumes a fixed variance for different observations
given µ, which exactly equals µ. Hence the variation of y is determined only through observed
heterogeneity, i.e. crash exposure in this example. In overdispersed Poisson models, an
additional disturbance (ε) is added to relax the ‘variance = mean’ constraint, so the new mean
crash count (µ̃) is subject both to the deterministic variation associated with crash exposure
and to the unobserved heterogeneity introduced by ε. For a given crash exposure (x), there is
a distribution of µ̃ values rather than a single value of the mean crash count µ. However,
overdispersed Poisson models impose one common distribution on the disturbance across
individual observations, so different observations are still independent of each other. The
structural improvement of the overdispersed Poisson model over the standard Poisson model is
only that it can account for unobserved cross-individual heterogeneity in addition to the
observed variations.
However, the “independence” assumption often does not hold, since multilevel data structures
exist extensively, either intrinsically in the traffic data or extrinsically as a result of
the manner in which data are collected or clustered. For example, to study the relationship
between crash count and exposure, a number of selected road segments may be nested in several
areas of interest (e.g. cities). Moreover, for each selected road segment, there may be
several observations from different time periods. In this case, some cross-group
heterogeneities, either observed or unobserved, may exist due to the spatial and temporal
effects in crash data. Indeed, some characteristic variations necessarily exist between
different areas or between road segments.
For instance, suppose the data in the above example are collected from four different areas,
in each of which a number of road segments are included in the study. Cases (b)-(f) in
Figure 1 illustrate various potential relationships between crash count and exposure. As
discussed previously, if a standard GLM is used on the aggregate dataset (Case (a)), the
area-level context to which the road segments belong is completely ignored: the same single
straight-line relationship is held to exist everywhere. In effect the model has explained
“everything in general and nothing in particular”. However, given the different features among
the areas, such as road density, the crash count/exposure relationships may vary. One possible
result, shown in Case (b) of Figure 1, is the varying-intercept pattern. Here each of the four
areas has its own crash count/exposure relation, represented by a separate line. The single
thicker line represents the general relationship across all four areas. The parallel lines
imply that, while the crash count/exposure relation in each area is the same, some areas have
uniformly higher crash frequency than others. In Cases (c) and (d), the situations are more
complicated as the steepness of the lines varies from area to area. In Case (c), areas make
very little difference for roads with relatively low exposure, but there is a high degree of
between-area variation in crash count for roads with higher exposure. In contrast, Case (d)
shows that large area-specific differentials exist for road segments with lower exposure. In
Case (e), there is a complex interaction between crash count and exposure: in some areas it is
the lower-exposure roads that have relatively high crash frequency, whereas in others it is
the higher-exposure roads. While the final plot, Case (f), is unlikely to occur in the present
example, it can be expected for some other risk factors. Across all the areas there is no
relationship between the crash count and the risk factor (the single thicker line is
horizontal), but in specific areas there are distinctive relationships. This situation is
similar to Case (c), but here the differences result from some areas having high crash
frequency for high values of the risk factor, while others have the lowest frequency.
The cross-area variations of slopes and intercepts could be caused by various area-specific
heterogeneities. For observable heterogeneities, it is theoretically possible to factorize
them and account for them using classical techniques such as GLMs with interaction terms,
ANOVA, or ANCOVA. But a traffic crash is a complex event involving a large number of factors.
Ideally, all of the relevant factors at the different levels (e.g. road segment level and area
level) should be considered in the model. In practice, however, some of the factors may not be
available or even collectable. A model may consider only the most important factors and omit
the others. It then assumes that similar groups (i.e. those with the same selected observable
factors) have the same pattern of crash occurrence. In the real world, however, similar groups
(e.g. areas) may differ in the omitted factors and thus may have different means. These
unobservable or omitted heterogeneities introduce additional variance into the data and cause
over-dispersion. Consequently, if the cross-group heterogeneities are not appropriately
accounted for, the standard errors of the regression coefficients may be underestimated.
The patterns shown in the above example exist almost everywhere in traffic safety studies,
since most crash datasets are collected with an inherent multilevel structure. For example, in
predicting crash severities, it is reasonable to assume that the characteristics of the
vehicles in which casualties are traveling will affect their probability of survival. If this
is the case, then casualties within the same vehicle would tend to have more similar
severities than casualties in different vehicles, rendering the assumption of residual
independence invalid. The same argument may be extended to encompass the effect of
similarities between different crashes, traffic sites, or geographical regions.

A 5 × T -level Hierarchy in Traffic Safety Data


To systematically examine the possible multilevel data, a 5×T-level hierarchy, as shown in
Figure 2, is proposed to represent the general framework of multilevel data structures in
traffic safety. Along the vertical axis of this triangular prism is a five-level hierarchy
representing various traffic entities, from the macroscopic to the microscopic: Geographic
region level – Traffic site level – Traffic crash level – Driver-vehicle unit level – Vehicle
occupant level. All these traffic entity levels are structured along a horizontal time axis,
defined as the Time level, thus resulting in a “5×T”-level hierarchy. The involvement of, and
emphasis on, different subsets of these levels depend on the research purpose and on the
examination of heterogeneity in the crash data employed. Generally, macro-analyses focus on
the top three levels, i.e. Geographic region level, Traffic site level, and Traffic crash
level, while micro-analyses concern the bottom three levels, i.e. Traffic crash level,
Driver-vehicle unit level, and Vehicle occupant level.
(Insert Figure 2 here)
Specifically, the Geographic region level could comprise a number of regions, countries,
states, or cities. Inter-regional studies generally include the traffic data collected from
the regions of interest. This level is normally associated with a number of contextual factors
potentially affecting the traffic safety situation, such as driving regulations, road density,
spatial features, population and other socio-economic features. Nested under the Geographic
region level is the Traffic site level, which is of greatest interest in many traffic safety
studies. It consists of the elements that constitute the basic road network, namely road
segments (links) and road junctions (nodes). Various collective or comparative safety studies
are conducted regarding road design, operation and assessment. While traffic sites necessarily
reside in some geographic region, various types of traffic crash occur at different traffic
sites. The Traffic crash level has been the most direct, and thus the most used, criterion for
monitoring the safety situation of traffic sites. It is intuitively reasonable that the
characteristics of crashes occurring at the same site should be correlated, owing to the
shared context in terms of geometric, traffic, and regulatory control factors. Measures such
as crash severity, collision type and possible crash causes are used to characterize traffic
crashes. The Driver-vehicle unit level, including driver and vehicle crash involvements, is
the entity of greatest concern in traffic safety because it relates directly to the loss of
life and property. The individual severity of driver injury or vehicle damage may show a
strong correlation among units involved in the same multi-vehicle crash. Various driver and
vehicle characteristics distinguish the different involved units at this level. The lowest
level in the hierarchy is the vehicle occupants involved in crashes. This level is commonly
concerned with such aspects as driver behavior and vehicle design. Finally, traffic data at
any entity level are necessarily marked by a time scale (the horizontal axis in the prism),
along which studies may focus on the serial correlation of the traffic safety situation over
time.
In the framework of the 5×T-level hierarchy, typical data clustering designs in traffic safety
research vary with the research purpose. For example, in some inter-regional studies, with the
Geographic region level as the higher level, the study subjects at the lower level could be
the safety performance of various traffic sites, drivers or vehicles. In these cases, a
two-entity-level design could be used to explicitly examine the safety effects of risk factors
at both the individual and contextual levels. The two-entity-level design can be naturally
extended to reflect a three-entity-level data structure, for example, Geographic region level
– Traffic site level – Traffic crash level. Moreover, when time series are considered, a panel
data design or a repeated cross-sectional design could be used. In a panel data design, a set
of sites within regions is pre-selected and measured repeatedly over time, whereas a repeated
cross-sectional design considers a number of time periods, in each of which the selected sites
may be different.

BAYESIAN HIERARCHICAL METHOD ON MULTILEVEL CRASH DATA

The previous sections show that an appropriate method is needed to account for the multilevel
data structure in traffic safety. In this section, a methodological framework using Bayesian
hierarchical modeling is established to properly model the potential heterogeneities due to
the multilevel data structure. A number of advantages of this method give it great potential
for extensive application in traffic safety.

Hierarchical Models
To model the multilevel data structure, several potential solutions have been found in the
literature. For example, some researchers have employed artificial intelligence (AI) models in
crash prediction, most widely neural networks (NN) and Bayesian NN (20-23). But NNs have been
criticized as black boxes incapable of generating explicit functional relationships and
statistically interpretable results. Another useful technique for correlated data is
generalized estimating equations (GEE), which is regarded as an extension of the GLM (24, 25).
GEE is also called a ‘marginal’ model, as distinguished from a ‘subject specific’ model such
as the hierarchical model in this paper. When dealing with a multilevel data structure, GEE
aims to provide estimates with acceptable properties only for the fixed parameters in the
model, while treating the existence of any random parameters as a necessary ‘nuisance’. Hence,
GEE may be superior only in the case where the exact form of the multilevel data structure is
unknown.
Another way to explicitly address the multilevel data structure is to use hierarchical models
(also called multilevel models or random effect models). Hierarchical modeling is a
statistical technique that allows multilevel data structures to be properly specified and
estimated (see 26). Although the basic theory of hierarchical models has been developed and
discussed for many years, it is only recently that many practical limitations on the use of
hierarchical analysis have been overcome. Hierarchical models have now become commonplace in a
variety of other disciplines such as sociology, education, political science, and public
health. In the first application of a hierarchical model to traffic crash studies, Shankar et
al. (27) showed that introducing site-specific random effects and time indicators into the NB
regression model can significantly improve the explanatory power of crash models. Jones and
Jorgenson (28) presented a good exploration and discussion of the potential applications of
hierarchical models. Since then, the hierarchical modeling technique has gained increasing
attention as a way to account for the multilevel data structure in crash prediction. For
example, while some researchers (29-32) employed hierarchical models for predicting crash
frequency, others (28, 33, 34) developed hierarchical models to identify factors affecting
crash severity.
As defined by Gelman and Hill (26), a multilevel/hierarchical model is a regression (a linear
or generalized linear model) in which the parameters – the regression coefficients – are given
a probability model. This higher-level model has parameters of its own – the hyperparameters
of the model – which are also estimated from the data. In the context of GLMs, hierarchical
modeling (also called hierarchical GLMs) works mainly on the link function: disturbance terms
are added to the model corresponding to different sources of variation in the multilevel data.
Specifically, recall the general expression of statistical modeling in Equation (1). While the
first part of the expression, Dist(θ), still represents the characteristics of the crash
feature of interest, it is the disturbance term ε that distinguishes hierarchical modeling
from classical statistical models. Note that ε here represents a general concept for the
disturbances. In fact, it may consist of many components, some acting on the intercept and
others on the slopes in the link function.
A two-level hierarchical model is used to show mathematically how the method works on
multilevel data. As in most practice, a basic linear combination of X and β is assumed to
simplify the interpretation. Furthermore, the covariate vector X is divided into three
components, $(1, X^{L1}, X^{L2})$, representing respectively the factors associated with the
intercept, the individual level (level 1) and the group level (level 2). Correspondingly, β
and ε are also divided into different components serving different functions, with bold
symbols representing vectors or matrices. Hence, the link function becomes a combination of
models at the two levels,

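$$
\begin{aligned}
\text{Level 1 model:}\quad & f^{-1}(\theta) = \beta_{0}^{L1} + X^{L1}\beta^{L1} + \varepsilon^{L1} \\
\text{Level 2 model:}\quad & \beta_{0}^{L1} = \beta_{00}^{L2} + X^{L2}\beta_{0}^{L2} + \varepsilon_{0}^{L2} \\
& \beta^{L1} = \beta_{01}^{L2} + X^{L2}\beta_{1}^{L2} + \varepsilon_{1}^{L2}
\end{aligned}
\qquad (2)
$$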

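The combined model is obtained by substituting the Level 2 model into the Level 1 model,

$$
f^{-1}(\theta) = \left(\beta_{00}^{L2} + X^{L1}\beta_{01}^{L2} + X^{L2}\beta_{0}^{L2} + X^{L1}X^{L2}\beta_{1}^{L2}\right) + \left(\varepsilon^{L1} + \varepsilon_{0}^{L2} + X^{L1}\varepsilon_{1}^{L2}\right) \qquad (3)
$$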

The link function now consists of two parts: a fixed part and a random part. The fixed part is
a deterministic relationship that depends fully on the covariates X, while the random part is
stochastically determined by a number of disturbance terms. The components of both parts are
interpreted as follows,

Fixed part:

1) $\beta_{00}^{L2}$: The intercept, which is the main effect when all covariates equal zero.
If all covariates are centered on their means, this term represents the main effect at the
average values of the covariates.
2) $X^{L1}\beta_{01}^{L2}$: $\beta_{01}^{L2}$ is the mean of the main-effect coefficient of
the level 1 covariates $X^{L1}$ on the dependent variable.
3) $X^{L2}\beta_{0}^{L2}$: $\beta_{0}^{L2}$ is the main-effect coefficient of the level 2
covariates $X^{L2}$ on the dependent variable.
4) $X^{L1}X^{L2}\beta_{1}^{L2}$: $\beta_{1}^{L2}$ is the interactive-effect coefficient of
$X^{L1}$ and $X^{L2}$. This component makes it possible to understand in depth how contextual
factors (level 2 covariates) could affect individual factors (level 1 covariates).
Random part:

1) $\varepsilon^{L1}$: The disturbance term associated with the level 1 analysis. Normally, it
is assumed to be independent and identically distributed (IID) among individuals, with mean
zero and a variance to be estimated. The unknown variance structure of this term facilitates
the estimation of unobservable or omitted between-individual heterogeneity. The additional
disturbance in overdispersed Poisson models is a typical example: a Gamma distribution
assumption on $\exp(\varepsilon)$ results in the Poisson-Gamma model, and a Lognormal
distribution assumption in the Poisson-Lognormal model.
2) $\varepsilon_{0}^{L2}$: The disturbance term associated with the level 2 analysis. It is
also commonly assumed to be IID among groups (level 2), with mean zero and a variance to be
estimated. With this term, individuals (level 1) belonging to the same group (level 2) share
the same random component, which results in a within-group covariance. As a result, the model
intercept now consists of three parts, $\beta_{00}^{L2} + \varepsilon^{L1} +
\varepsilon_{0}^{L2}$, and hence varies with the between-individual (or within-group)
variation $\varepsilon^{L1}$ as well as the between-group variation $\varepsilon_{0}^{L2}$.
3) $X^{L1}\varepsilon_{1}^{L2}$: $\varepsilon_{1}^{L2}$ is the disturbance vector on the slope
of the level 1 covariates $X^{L1}$ associated with level 2. $\varepsilon_{1}^{L2}$ makes the
slope of $X^{L1}$ vary according to the data clustering. In other words, individuals in the
same group share the same random component on the slope. As a result, the slope of $X^{L1}$
consists of two parts, $\beta_{01}^{L2} + \varepsilon_{1}^{L2}$, and hence varies with the
between-group variation $\varepsilon_{1}^{L2}$. Note that while $X^{L1}$ has a varying
main-effect slope, the main-effect slope for $X^{L2}$ is fixed. A higher-level analysis, for
example in a three-level model, could make this level 2 slope vary.

The terms $\varepsilon_{0}^{L2}$ and $\varepsilon_{1}^{L2}$ are the features unique to
hierarchical models, while all the remaining components could be included and estimated in
classical models. It is these two stochastic terms that make it possible to account for the
unobservable or omitted heterogeneity in the Level 2 model.
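
The within-group covariance induced by the shared level 2 disturbance can be made explicit with a short derivation. This is a sketch under added assumptions that are not stated in the text: only the intercept varies, individuals are indexed by $i$ and groups by $j$, the disturbances are mutually independent, and $\sigma^2_{L1}$ and $\sigma^2_{L2}$ denote their variances.

```latex
\begin{aligned}
\operatorname{Var}\!\left(\varepsilon^{L1}_{ij} + \varepsilon^{L2}_{0j}\right)
  &= \sigma^2_{L1} + \sigma^2_{L2}, \\
\operatorname{Cov}\!\left(\varepsilon^{L1}_{ij} + \varepsilon^{L2}_{0j},\;
                          \varepsilon^{L1}_{i'j} + \varepsilon^{L2}_{0j}\right)
  &= \sigma^2_{L2} \qquad (i \neq i'), \\
\rho &= \frac{\sigma^2_{L2}}{\sigma^2_{L1} + \sigma^2_{L2}}.
\end{aligned}
```

Here $\rho$ is the correlation between the random parts of two individuals in the same group. It vanishes only when $\sigma^2_{L2} = 0$, which is the situation that models assuming independent observations take for granted.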
In the framework of hierarchical modeling, the two-level model shown in Equation (3) is also
called a varying-intercept and varying-slope model, as defined by Gelman and Hill (26). This
full model can be simplified by allowing only the intercept or only the slope to vary,
resulting in the varying-intercept model and the varying-slope model.

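Varying-intercept model:

$$
f^{-1}(\theta) = \left(\beta_{00}^{L2} + X^{L1}\beta_{01}^{L2} + X^{L2}\beta_{0}^{L2} + X^{L1}X^{L2}\beta_{1}^{L2}\right) + \left(\varepsilon^{L1} + \varepsilon_{0}^{L2}\right) \qquad (4)
$$

Varying-slope model:

$$
f^{-1}(\theta) = \left(\beta_{00}^{L2} + X^{L1}\beta_{01}^{L2} + X^{L2}\beta_{0}^{L2} + X^{L1}X^{L2}\beta_{1}^{L2}\right) + \left(\varepsilon^{L1} + X^{L1}\varepsilon_{1}^{L2}\right) \qquad (5)
$$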

Clearly, all of these models could be expanded to accommodate more complicated designs. The
above derivation also shows that hierarchical modeling provides a rather flexible technique to
accommodate various study purposes and different degrees of model complexity, such as
within-level or between-level interactions, varying intercepts, and varying slopes.
However, fitting hierarchical models, as well as displaying, checking, and analyzing the model
results, necessarily becomes much more complicated than with classical models. Despite the
increased ‘costs’ of using hierarchical models, a number of major advantages are identified in
the traffic safety research context.
1) Hierarchical modeling provides a coherent model that simultaneously incorporates both
individual-level and group-level models. In classical models, it is feasible to include
covariates from all levels, but not to also include the group indicator to account for the
omitted or unobservable cross-group heterogeneity.
2) Hierarchical modeling is more efficient in inference for parameters. With multilevel data,
modeling with complete pooling across all groups would give the average estimate, ignoring
variation among groups in contextual safety performance, as shown in Case (a) of Figure 1.
Whereas complete pooling ignores variation between groups, the no-pooling analysis (modeling
each region separately) overstates it. The modeling paradigm of hierarchical analysis
represents a preferred partial pooling, a compromise between these two extremes.
3) Because hierarchical modeling combines information from both the overall individual-level
variation and the group-level effect, it is feasible to use all the data to perform inference
for groups with small sample sizes. This is impossible in classical models, where only the
local information is used.
4) Crash prediction models are sometimes used for prediction purposes. Since hierarchical
models allow for variation by group, we can make predictions for new units in existing groups
or in new groups. The latter is difficult to do in classical models: if a model ignores group
effects, it will tend to understate the error in predictions for new groups.
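
The idea of partial pooling in advantages 2) and 3) can be illustrated with the precision-weighted average commonly used to approximate a group-level mean in a varying-intercept model with no covariates. This is a sketch under assumed, known variances; the function name and all numbers are hypothetical and are not taken from the case study.

```python
def partial_pooling_estimate(group_mean, n_group, overall_mean, var_within, var_between):
    """Precision-weighted compromise between a group's own mean and the overall mean."""
    weight_group = n_group / var_within   # information carried by the group's own data
    weight_overall = 1.0 / var_between    # information carried by the population of groups
    return ((weight_group * group_mean + weight_overall * overall_mean)
            / (weight_group + weight_overall))

# Hypothetical setting: overall mean 4, one group with an observed mean of 10.
small_group = partial_pooling_estimate(10.0, 1, 4.0, var_within=9.0, var_between=1.0)
large_group = partial_pooling_estimate(10.0, 900, 4.0, var_within=9.0, var_between=1.0)
```

With a single observation the estimate stays close to the overall mean, while with 900 observations it is almost the group's own mean. Complete pooling and no pooling correspond to the limits in which the between-group variance goes to zero or to infinity.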

Bayesian Inference


Algorithms for likelihood-based inference (such as maximum likelihood estimation, or MLE) for
some hierarchical GLMs have been well established for many years. Currently, such hierarchical
models can be fitted from a frequentist perspective with specialized computer software such as
“MLwiN” and “HLM”.
An alternative approach is to put the hierarchical model into a Bayesian framework,
which explicitly models the hierarchical structure. With recent advances in computing
capacity and Bayesian analysis techniques, some researchers have been estimating
such models in a Bayesian framework (26, 35). Bayesian inference (BI) is the
process of fitting a probability model to a set of data and summarizing the result by a
probability distribution on the parameters of the model and on unobserved quantities such as
predictions for new observations. Rather than giving “maximum likelihood” point estimates of
the unknowns based solely on the sample data, as in MLE, Bayesian methods are
characterized by their explicit use of probability to quantify uncertainty
in inferences based on statistical data analysis. Specifically, the ultimate aim of Bayesian data
analysis is to obtain the joint posterior distribution of all unknowns, and then integrate
this distribution over the unknowns that are not of immediate interest to obtain the desired
marginal distribution.
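
The integration step can be illustrated with a small grid approximation. The sketch below is not taken from the paper: it assumes a normal model with unknown mean and standard deviation, a flat prior over a hypothetical grid, and invented observations. The marginal posterior of the mean is obtained by summing the joint posterior over the nuisance parameter.

```python
import math

def log_likelihood(data, mu, sigma):
    """Log-likelihood (up to a constant) of i.i.d. normal data."""
    return sum(-math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2 for y in data)

data = [1.2, 0.8, 1.5, 1.1, 0.9]                  # invented observations
mu_grid = [i * 0.05 for i in range(61)]           # 0.00 to 3.00
sigma_grid = [0.1 + i * 0.05 for i in range(39)]  # 0.10 to 2.00

# Unnormalized joint posterior under a flat prior on the grid (an assumption).
joint = [[math.exp(log_likelihood(data, m, s)) for s in sigma_grid]
         for m in mu_grid]
total = sum(sum(row) for row in joint)

# Marginal posterior of mu: sum the joint over the nuisance parameter sigma.
marginal_mu = [sum(row) / total for row in joint]
posterior_mean_mu = sum(m * p for m, p in zip(mu_grid, marginal_mu))
```

In realistic hierarchical models this integral has no convenient grid form, which is why simulation methods are used instead.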
BI is recommended for the hierarchical models proposed in this paper. As
indicated by a large number of theoretical studies and applications, BI offers numerous
theoretical and practical advantages over the “classical” likelihood-based inference methods
(also called frequentist methods). Several major advantages in the traffic safety context are
identified as follows.
1) BI can accumulate evidence from any information source regarding crash prediction. A
distinctive property of crash prediction models among traffic safety problems is
that the data are difficult to collect and become available only gradually over time, e.g. year by
year. Furthermore, the prediction models themselves may change as influential factors
change, e.g. the installation of a red light camera or an adjustment of the amber interval.
This means that, to keep the models valid, they must be updated periodically as new data
arrive. Bayesian methods provide a flexible and reliable means of performing this updating. In a
Bayesian context, the previous model, engineering experience or justified previous
findings can be used as prior knowledge for the updated model (37).
2) Missing data occur very commonly in crash records. In Bayesian methods, missing data are
automatically modeled as latent variables in a manner that takes into account the
information contained in the other observed data.
3) Bayesian posterior distributions for parameters are valid for any sample size.
One of the most important strengths of BI is its capability to handle small data sets. The
extensive application of the empirical Bayesian approach in observational before-after studies
of safety treatment evaluation is a good supporting example (Hauer, 1997).
4) Regarding model comparison, frequentist hypothesis tests require that only two models
be compared, and these models must be nested. In a Bayesian setting, any number of
non-nested models may be compared.
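
The year-by-year updating described in item 1 can be sketched with a conjugate model. The snippet below is an illustration under stated assumptions, not the paper's procedure: it assumes that yearly crash counts at one site are Poisson with a Gamma prior on the rate, and the prior parameters and counts are invented. Each year's posterior becomes the prior for the next year.

```python
def update_gamma_poisson(shape, rate, yearly_counts):
    """Sequentially update a Gamma(shape, rate) prior on a Poisson crash rate."""
    history = []
    for count in yearly_counts:
        # Conjugate update for one year of exposure: add the observed count
        # to the shape and one unit of exposure to the rate.
        shape += count
        rate += 1.0
        history.append((shape, rate, shape / rate))  # posterior mean = shape / rate
    return history

# Hypothetical prior (mean 2 crashes per year) and invented counts for one site.
history = update_gamma_poisson(shape=2.0, rate=1.0, yearly_counts=[3, 5, 4])
```

The same final posterior would be obtained by processing all three years at once, which is what makes periodic updating coherent in this setting.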

The general computing approach for BI is Markov chain Monte Carlo (MCMC).
MCMC is a general method based on drawing values of the unknowns from
approximate distributions and then correcting those draws to better approximate the target
posterior distribution. The Gibbs sampler and the Metropolis-Hastings algorithm are the most
widely used simulation algorithms in MCMC. The BUGS modeling language (Bayesian inference
Using Gibbs Sampling) is a widely used tool that allows computation with MCMC algorithms
for all sorts of Bayesian models, including most of the hierarchical models applied.
The WinBUGS package (37) provides a flexible and simplified platform for modeling with
BUGS programs. In particular, since specification of the full conditional densities is not
necessary in WinBUGS, small changes in program code can achieve a wide variation in
modeling options, thus facilitating sensitivity analysis with respect to modeling and prior
assumptions. In the following, a case study is summarized to illustrate the proposed Bayesian
hierarchical method regarding model development, calibration and evaluation.
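
To make the idea of drawing and correcting concrete, the following is a minimal random-walk Metropolis sampler in Python. It is a sketch for illustration only and is not the sampler WinBUGS uses: the target density, step size, chain length and burn-in are all assumptions chosen for the example.

```python
import math
import random

def metropolis(log_post, start, n_iter, step, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_lp = log_post(current)
    draws = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)   # symmetric proposal
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws

def log_post(theta):
    """Illustrative target: log-density of Normal(1, 0.5^2), up to a constant."""
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

draws = metropolis(log_post, start=0.0, n_iter=20000, step=1.0)
kept = draws[5000:]                 # discard an assumed burn-in period
estimate = sum(kept) / len(kept)    # Monte Carlo estimate of the posterior mean
```

Only the unnormalized log-posterior is needed, which is the property that makes MCMC practical for hierarchical models whose normalizing constants are intractable.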

ILLUSTRATIVE EXAMPLE

In this example, a two-level hierarchical design (Individual severity ~ Traffic crash level
– Driver-vehicle unit level) was adopted to investigate the individual severity of driver
injury and vehicle damage in intersection crashes (see 34 for details of this study). A total of
4095 crashes occurring at signalized intersections during 2003-2005 were extracted and used
in the model. These crashes involved 7840 driver-vehicle units, an average of 1.91
driver-vehicle units per crash.
To yield a net effect estimate of each potential factor on individual severity, a binary
dependent variable is defined by combining the severity of driver injury and vehicle damage.
In particular, for the ith driver-vehicle unit involved in the jth crash, high severity ($Y_{ij} = 1$)
is defined when the driver was killed or seriously injured or the vehicle was extensively
damaged, and low severity ($Y_{ij} = 0$) otherwise. Ten covariates at the Traffic crash
level (level 2) and five at the Driver-vehicle unit level (level 1) are employed to explain the
severity variations. The factors included in the model are listed in Table 1.
(Insert Table 1 here)
A preliminary examination of potential within-crash correlation in the collected data
set identified a significant correlation between individuals involved in the same multi-vehicle
crash. In particular, in a multi-vehicle crash, if one driver-vehicle unit was in the
high severity state, then the others had a probability of 31% of also being in the high
severity state. On the other hand, if a driver-vehicle unit was in the low severity state, then
the others had only a 12% chance of being in the high severity state. This markedly lower
proportion suggests that the individual severities within a multi-vehicle crash are correlated.
Hence, a hierarchical Binomial Logistic model (HBL) may be more appropriate for modeling the
data than an ordinary Binomial Logistic model (BL).
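
One way such conditional proportions could be tabulated is sketched below. The paper does not state how its 31% and 12% figures were computed, so the pairing rule here (each unit is compared with every other unit in the same crash) is an assumption, and the severity records are invented.

```python
def conditional_high_rates(crashes):
    """Share of co-involved units in high severity, given one unit's severity.

    crashes: list of crashes, each a list of 0/1 severity indicators.
    Returns a dict mapping the given unit's severity to that share.
    """
    counts = {0: [0, 0], 1: [0, 0]}   # severity -> [others high, others total]
    for units in crashes:
        if len(units) < 2:
            continue                  # single-vehicle crashes form no pairs
        for i, given in enumerate(units):
            for k, other in enumerate(units):
                if k != i:
                    counts[given][0] += other
                    counts[given][1] += 1
    return {g: high / total for g, (high, total) in counts.items() if total}

# Invented severity records for five multi-vehicle crashes.
toy_crashes = [[1, 1], [1, 0], [0, 0], [0, 0, 1], [0, 0]]
rates = conditional_high_rates(toy_crashes)
```

A clear gap between `rates[1]` and `rates[0]` is the kind of pattern that motivates a crash-level random effect.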
In the model specification, only the varying-intercept model was investigated, to avoid
excessive complexity given the large set of covariates used. Specifically, the probability of
$Y_{ij} = 1$ is denoted by $\pi_{ij} = \Pr(Y_{ij} = 1)$; hence the HBL model with varying
intercept can be expressed as

$$\mathrm{Logit}(\pi_{ij}) = \log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \beta_0 + X_i^{L1}\beta_1 + X_j^{L2}\beta_2 + \varepsilon_j \qquad (6)$$

where $X_i^{L1}$ is the vector of five covariates at the Driver-vehicle unit level (level 1), while
$X_j^{L2}$ is the vector of ten covariates at the Traffic crash level (level 2). $\beta_0$, $\beta_1$
and $\beta_2$ are the regression coefficients to be estimated. $\varepsilon_j$ is the disturbance
term at the crash level (level 2), introducing a random effect on the model intercept for
different crashes. As a result, driver-vehicle units (level 1) belonging to the same group
(level 2) share the same variance component, resulting in a within-group covariance.
$\varepsilon_j$ is assumed to follow a normal distribution with mean zero and variance
$\tau_0^2$. The variance of the outcome ($Y_{ij}$) therefore consists of two components: the
variance of $\varepsilon_j$ ($\tau_0^2$), which captures the between-crash variability
(level 2), and the variance associated with the logistic distribution, which captures the
within-crash variability (level 1).
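
The role of the crash-level term in Eq. (6) can be seen in a few lines of Python. This is a sketch with hypothetical coefficient values, not the calibrated model: `unit_effects` stands in for the level-1 terms, `crash_effect` for the level-2 term, and `eps_j` for the random intercept shared by all units in crash j.

```python
import math

def inv_logit(x):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def severity_probabilities(beta0, unit_effects, crash_effect, eps_j):
    """Return pi_ij for each driver-vehicle unit i in crash j, following Eq. (6).

    unit_effects : level-1 terms X_i^{L1} beta_1, one value per unit
    crash_effect : level-2 term X_j^{L2} beta_2, shared by the whole crash
    eps_j        : crash-level random intercept, also shared by the whole crash
    """
    return [inv_logit(beta0 + u + crash_effect + eps_j) for u in unit_effects]

# Hypothetical coefficients and linear-predictor terms, for illustration only.
base = severity_probabilities(-1.0, [0.0, 0.5], 0.2, eps_j=0.0)
shifted = severity_probabilities(-1.0, [0.0, 0.5], 0.2, eps_j=1.0)
```

Because `eps_j` enters every unit's linear predictor, a positive draw raises the high-severity probability of all units in that crash together, which is the source of the within-crash correlation.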
In the absence of reliable informative priors, uninformative priors were employed:
Normal distributions (0, 1000) for all regression coefficients ($\beta_0$, $\beta_1$ and
$\beta_2$), and an Inverse-Gamma distribution (0.001, 0.001) for the variance $\tau_0^2$. In the
model calibration using the WinBUGS package, three chains of 20,000 iterations each produced
trace plots with a good degree of mixing, and the Brooks, Gelman and Rubin convergence
diagnostics indicated convergence.
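
A convergence check of this kind can be sketched as follows. The function below computes the basic variance-based potential scale reduction factor of Gelman and Rubin, which is a simplified stand-in and not the exact diagnostic reported by WinBUGS. The chains are simulated here purely for illustration.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)       # W: mean within-chain variance
    between = n * variance(chain_means)              # B: between-chain variance
    pooled = (n - 1) / n * within + between / n      # pooled posterior variance estimate
    return (pooled / within) ** 0.5

rng = random.Random(1)
# Three simulated chains that sample the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three simulated chains stuck around different values (not converged).
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 2.0, 4.0)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to 1 are consistent with the chains having reached a common distribution, whereas values well above 1 indicate that they have not.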
To check model adequacy, the normality assumption for $\varepsilon_j$ was assessed. In the
MCMC simulation, 200 random effects $\varepsilon_j$ were randomly sampled, and their
average was confirmed to be very close to zero. Normal probability plots, revealing no strong
abnormalities, also supported the normality and exchangeability assumptions.
In the results, the variance of $\varepsilon_j$, indicating the magnitude of the between-crash
variance ($\tau_0^2$), is 1.34. An Intra-class Correlation Coefficient (ICC) $\rho$ can be
defined to examine the proportion of crash-level variance (level 2) in the overall residual
variance (28, 32). Since the logistic distribution for the individual-level (level 1) residual
implies a variance of $\pi^2/3 = 3.29$, the ICC for the between-crash residual in the HBL model
of this example is

$$\rho = \frac{\tau_0^2}{\tau_0^2 + \pi^2/3} = \frac{1.34}{1.34 + \pi^2/3} = 28.9\% \qquad (7)$$

The ICC is an indicator of the magnitude of the within-crash correlation. A value of $\rho$ close to
zero means that there is very little variation between different crashes, whereas a
relatively large value of $\rho$ favors the hierarchical model. Here, 28.9% of the
unexplained variation in individual severity resulted from between-crash heterogeneity,
which strongly supports specifying a hierarchical structure.
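
The arithmetic of Eq. (7) can be checked directly. The short Python sketch below uses the between-crash variance of 1.34 reported above; the function name `latent_icc` is introduced here only for illustration.

```python
import math

def latent_icc(tau0_sq):
    """ICC for a random-intercept logistic model, as in Eq. (7)."""
    level1_var = math.pi ** 2 / 3.0   # variance of the standard logistic distribution
    return tau0_sq / (tau0_sq + level1_var)

icc = latent_icc(1.34)   # between-crash variance reported in the text
```

The function also shows how the ICC behaves at the extremes: it tends to zero as the between-crash variance vanishes and approaches one as that variance dominates.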
To further confirm the advantage of HBL over BL, a BL model with the
same covariates and dataset was also estimated for comparison with the calibrated HBL model.
The BL model was obtained by dropping the random effects $\varepsilon_j$, which means ignoring
the severity correlations between driver-vehicle units within the same crash.
For model comparison, the Deviance Information Criterion (DIC) proposed by
Spiegelhalter et al. (38) is used. In complex hierarchical models where parameters may
outnumber observations, DIC provides Bayesian measures of model complexity and fit that
can be combined to compare models of arbitrary structure. Specifically, DIC is defined as:

$$\mathrm{DIC} = D(\bar{\gamma}) + 2p_D = \overline{D(\gamma)} + p_D \qquad (8)$$

where $D(\bar{\gamma})$ is the deviance evaluated at the posterior means of the estimated
unknowns ($\bar{\gamma}$), and the posterior mean deviance $\overline{D(\gamma)}$ can be taken
as a Bayesian measure of fit or “adequacy”. $p_D$ is motivated as a complexity measure for the
effective number of parameters in a model, defined as the difference between
$\overline{D(\gamma)}$ and $D(\bar{\gamma})$, i.e., the mean deviance minus the deviance at the
means. As a generalization of the Akaike Information Criterion (AIC), DIC can thus be
considered a Bayesian measure of fit or adequacy, penalized by an additional complexity term
$p_D$. As with AIC, models with lower DIC values are preferred.
(Insert Table 2 here)
As shown in Table 2, model comparison between HBL and BL using DIC further
confirmed the advantage of the hierarchical model. Specifically, the posterior mean deviance
$\overline{D(\gamma)}$ of the HBL model (1984.5) is less than one third of that of the BL model
(6165.5). After penalization by $p_D$, the DIC value for the HBL model (3067.9) is still far
lower than that of the BL model (6191.9). This further indicates that the use of crash-level
random effects in the HBL model can substantially improve the model fit.
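
The quantities in Eq. (8) can be reproduced from the two deviance values in Table 2. The Python sketch below takes the posterior mean deviance and the deviance at the posterior means as inputs; the function name `dic_summary` is introduced here only for illustration.

```python
def dic_summary(mean_deviance, deviance_at_mean):
    """Return (p_D, DIC) as defined in Eq. (8).

    mean_deviance    : posterior mean of the deviance
    deviance_at_mean : deviance evaluated at the posterior means
    """
    p_d = mean_deviance - deviance_at_mean   # effective number of parameters
    return p_d, mean_deviance + p_d

# Deviance values reported in Table 2.
bl_pd, bl_dic = dic_summary(6165.5, 6139.1)
hbl_pd, hbl_dic = dic_summary(1984.5, 901.1)
```

The much larger effective number of parameters for the HBL model reflects the crash-level random effects, yet its DIC remains well below that of the BL model.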
CONCLUSION

This paper attempts to promote the use of multilevel analysis in crash prediction models. On
the one hand, it was shown that multilevel data structures exist extensively in traffic safety
because of data collection and clustering processes. For this, a 5 × T-level hierarchy was
proposed in this paper to give a general form for potential multilevel data in
traffic crash analysis. On the other hand, it was found that traditional crash prediction models,
such as the widely used GLMs, are incapable of taking into account the heterogeneities due to
multilevel data structure. Disregarding the possible within-group correlations may lead to
models with unreliable parameter estimates and statistical inferences.

To appropriately model the potential cross-group heterogeneities in multilevel data, a
methodological framework was established. In this framework, hierarchical models that allow
the multilevel data structure to be explicitly specified and estimated are employed. Bayesian
inference using Markov chain Monte Carlo algorithms is introduced and recommended to
calibrate the proposed hierarchical models. The proposed method was illustrated in an
individual-severity analysis of intersection crashes using Singapore crash data. The
illustrative example showed the flexibility and effectiveness of the Bayesian hierarchical
method in modeling the multilevel structure of traffic crash data.

The proposed multilevel analysis has great potential in the traffic safety discipline. While most
previous studies ignored the multilevel structure in traffic crash data, this study suggests the
importance of accounting for cross-group heterogeneities in yielding reliable effect
estimates for various risk factors as well as predictions of traffic safety at existing
or new traffic sites.
REFERENCES

1. Jovanis, P., Chang, H., 1986. Modeling the relationship of accident to mile traveled.
Transportation Research Record 1068, 42-51.
2. Joshua, S.C., Garber, N.J., 1990. Estimating truck accidents rate and involvements using
linear and Poisson regression models. Transportation Planning and Technology 15(1), 41-
58.
3. Jones, B., Jansen, L., Mannering, F.L., 1991. Analysis of the frequency and duration of
the freeway accidents in Seattle. Accident Analysis and Prevention 23(4), 239-55.
4. Miaou, S.P., Lum, H., 1993. Modeling vehicle accidents and highway geometric design
relationships. Accident Analysis and Prevention 25(6), 689-709.
5. Mannering, F.L., Grodsky, L.L., 1995. Statistical analysis of motorcyclists’ perceived
accident risk. Accident Analysis and Prevention 27(1), 21–31.
6. Shankar, V.N., Mannering, F., 1996. An exploratory multinomial Logit analysis of single-
vehicle motorcycle accident severity. Journal of Safety Research 27(3), 183-194.
7. Mercier, C.R., Shelley, M.C., Rimkus, J., Mercier, J. M., 1997. Age and gender as
predictors of injury severity in head-on highway vehicular collisions. Transportation
Research Record 1581, 37-46.
8. Simoncic, M., 2001. Road fatalities in Slovenia involving a pedestrian, cyclist or
motorcyclist and a car. Accident Analysis and Prevention 33(2), 147-156.
9. Al-Ghamdi, A.S., 2002. Using logistic regression to estimate the influence of accident
factors on accident severity. Accident Analysis and Prevention 34(6), 729-741.
10. O’Donnell, C.J., Connor, D.H., 1996. Predicting the severity of motor vehicle accident
injuries using models of ordered multiple choice. Accident Analysis and Prevention 28(6),
739-753.
11. Quddus, M.A., Noland, R.B., Chin, H.C., 2002. An analysis of motorcycle injury and
vehicle damage severity using ordered probit models. Journal of Safety Research 33(4),
445-462.
12. Rifaat, S.M., Chin, H.C., 2005. Analysis of severity of single-vehicle crashes in
Singapore. In: TRB 2005 Annual Meeting CD-ROM, Transportation Research Board,
National Research Council, Washington D.C.
13. Abdel-Aty, M., Keller, J., 2005. Exploring the overall and specific crash severity levels at
signalized intersections. Accident Analysis and Prevention 37(3), 417-425.
14. Miaou, S.P., 1994. The relationship between truck accidents and geometric design of road
section: Poisson versus negative binomial regression. Accident Analysis and Prevention
26(4), 471-482.
15. Shankar, V. N., Mannering, F. L., Barfield, W., 1995. Effect of roadway geometric and
environmental factors on rural freeway accident frequencies. Accident Analysis and
Prevention 27 (3), 371-389.
16. Kulmala, R., 1995. Safety at rural three- and four-arm junctions: development and
application of accident prediction models. Technical Research Center at Finland, VTT
Publications, Espoo.
17. Poch, M., Mannering, F. L., 1996. Negative binomial analysis of intersection accident
frequencies. Journal of Transportation Engineering 122(2), 105-113.
18. Abdel-Aty, M., Radwan, E., 2000. Modeling traffic accident occurrence and involvement.
Accident Analysis and Prevention 32(5), 633-642.
19. Lord, D., Miranda-Moreno, L.F., 2007. Effects of low sample mean values and small
sample size on the estimation of the fixed dispersion parameter of Poisson-gamma models
for modeling motor vehicle crashes: a Bayesian perspective. Safety Science, in press.
20. Mussone, L., Ferrari, A. and Oneta, M., 1999. An analysis of urban collisions using an
artificial intelligence model. Accident Analysis and Prevention 31(6), 705-718.
21. Abdelwahab, H.T., Abdel-Aty, M.A., 2001. Development of artificial neural network
models to predict driver injury severity in traffic accidents at signalized intersections.
Transportation Research Record 1746, 6-13.
22. Riviere, C., Lauret, P., Ramsamy, J.F.M. and Page, Y., 2006. A Bayesian neural network
approach to estimating the energy equivalent speed. Accident Analysis and Prevention
38(2), 248-259.
23. Xie, Y., Lord, D. and Zhang, Y., 2007. Predicting Motor Vehicle Collisions Using
Bayesian Neural Network Models: An Empirical Analysis. Accident Analysis &
Prevention, in press.
24. Abdel-Aty, M.A., Abdalla, M.F., 2004. Linking roadway geometrics and real-time traffic
characteristics to model daytime freeway crashes: generalized estimating equations for
correlated data. Transportation Research Record 1879, 106-115.
25. Lord, D. and Persaud, B.N., 2000. Accident prediction models with and without trend:
application of the generalized estimating equations procedure. Transportation Research
Record 1717, 102-108.
26. Gelman, A., Hill, J., 2007. Data Analysis Using Regression and Multilevel/Hierarchical
Models. Cambridge University Press.
27. Shankar, V.N., Albin, R.B., Milton, J.C., Mannering, F.L., 1998. Evaluation of median
crossover likelihoods with clustered accident counts: an empirical inquiry using the
random effect negative binomial model. Transportation Research Record 1635, 44-48.
28. Jones, A.P., Jorgensen, S.H., 2003. The use of multilevel models for the prediction of
road accident outcomes. Accident Analysis and Prevention 35(1), 59-69.
29. Mitra, S., Washington, S., 2007. On the nature of over-dispersion in motor vehicle crash
prediction models. Accident Analysis and Prevention 39(3), 459-468.
30. Chin, H.C., Quddus, M.A., 2003. Applying the random effect negative binomial model to
examine traffic accident occurrence at signalized intersections. Accident Analysis and
Prevention 35(2), 253-259.
31. MacNab, Y.C., 2003. A Bayesian hierarchical model for accident and injury
surveillance. Accident Analysis and Prevention 35(1), 91-102.
32. Kim, D.G., Lee, Y., Washington, S., Choi, K., 2007. Modeling crash outcome
probabilities at rural intersections: application of hierarchical binomial logistic models.
Accident Analysis and Prevention 39(1), 125-134.
33. Lenguerrand, E., Martin, J.L., Laumon, B., 2006. Modeling the hierarchical structure of
road crash data: application to severity analysis. Accident Analysis and Prevention 38(1),
43-53.
34. Huang, H.L., Chin, H.C., Haque, M.M., 2007. Severity of driver injury and vehicle
damage in traffic crashes at intersections: A Bayesian hierarchical analysis, Accident
Analysis and Prevention, Article in press.
35. Gelman, A., Carlin, J.B., Stern, H.S., 2003. Bayesian Data Analysis, 2nd edition.
Chapman & Hall, New York.
36. Oh, J., Washington, S., 2006. Bayesian methodology incorporating expert judgment for
ranking countermeasure effectiveness under uncertainty: example applied to at grade
railroad crossings in Korea. Accident Analysis and Prevention 38 (2), 346-356.
37. Spiegelhalter, D.J., Thomas, A., Best, N.G., Lunn, D., 2003. WinBUGS version 1.4.1
User Manual. MRC Biostatistics Unit, Cambridge, UK.
38. Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A., 2002. Bayesian measures of
model complexity and fit (with discussion). Journal of the Royal Statistical Society,
Series B, 64(4), 583-616.
TABLE 1 Covariates Used in the Study


Covariates at the crash level
  Day of week          Road surface condition       Vehicle movement
  Time of day          Weather condition            Presence of red light camera
  Intersection type    Street lighting condition    Pedestrian involved
  Nature of lane       Road speed limit
Covariates at the individual level
  Vehicle type         Driver gender                Passenger on board
  Driver age           Offending party
TABLE 2 Model Comparison Using DIC and ICC


Model        $\overline{D(\gamma)}$    $D(\bar{\gamma})$    $p_D$      DIC
BL model     6165.5                    6139.1               26.4       6191.9
HBL model    1984.5                    901.1                1083.4     3067.9
FIGURE 1 Possible Relationships between Crash Occurrence and Risk Factors


FIGURE 2 A 5 × T -Level Hierarchy in Traffic Safety Data
