Proceedings of the 1991 Winter Simulation Conference
Barry L. Nelson, W. David Kelton, Gordon M. Clark (eds.)
SIMULATION MODEL VERIFICATION AND VALIDATION*
Robert G. Sargent
Simulation Research Group
449 Link Hall
Syracuse University
Syracuse, New York 13244
ABSTRACT
This paper discusses verification and validation of
simulation models. The different approaches to deciding
model validity are described; how model verification and
validation relate to the model development process is
specified; various validation techniques are defined;
conceptual model validity, model verification,
operational validity, and data validity are discussed; ways
to document results are given; and a recommended
validation procedure is presented.
1 INTRODUCTION
Simulation models are increasingly being used in
problem-solving and to aid in decision-making. The
developers and users of these models, the decision-makers
using information derived from the results of the models,
and people affected by decisions based on such models are
all rightly concerned with whether a model and its results
are "correct". This concern is addressed through model
verification and validation. Model validation is usually
defined to mean "substantiation that a computerized
model within its domain of applicability possesses a
satisfactory range of accuracy consistent with the
intended application of the model" (Schlesinger et al.
1979) and is the definition used here. Model verification
is often defined as "ensuring that the computer program
of the computerized model and its implementation is
correct", and is the definition adopted here. A related
topic is model credibility (or acceptability), which is
developing in the (potential) users of information from
the models (e.g., decision-makers) sufficient confidence
in the information that they are willing to use it.
A model should be developed for a specific purpose or
* This paper is an updated version of "A Tutorial on
Validation and Verification of Simulation Models",
Proceedings of the 1988 Winter Simulation Conference,
pp. 33-39.
application and its validity determined with respect to
that purpose. If the purpose of a model is to answer a
variety of questions, the validity of the model needs to be
determined with respect to each question. (Different
models of the same system are sometimes developed for
different purposes.) Several sets of experimental
conditions are usually required to define the domain of a
model's intended applicability. A model may be valid for
‘one set of experimental conditions and be invalid in
another. A model is considered valid for a set of
experimental conditions if its accuracy is within its
acceptable range of accuracy which is the amount of
accuracy required for the model's intended purpose. This,
generally requires that the variables of interest, i. the
variables used in answering the questions in the purpose
of the model, be identified and their required accuracy
determined. If the variables of interest are random
variables, then properties and functions of the random
variables such as their means and variances are frequently
what is of primary interest and are what are used in
determining model validity. Several versions of a model
are usually developed prior to obtaining a satisfactory
valid model, The substantiation that a model is valid,
ie, model and verification validation, is generally
considered to be a process and is usually part of the
model development process.
It is often too costly and time consuming to
determine that a model is absolutely valid over the
complete domain of its intended applicability. Instead,
tests and evaluations are conducted until sufficient
confidence is obtained that a model can be considered
valid for its intended application (Sargent 1982, 1984 and
Shannon 1975, 1981). The relationships of the cost (and a
similar relationship holds for the amount of time) of
performing model validation and the value of the model
to the user, as functions of model confidence, are
illustrated in Figure 1. The cost of model validation is
usually quite significant, in particular where extremely
high confidence is required because of the consequences of
an invalid model.
[Figure: curves showing the cost of model validation and the value of the model to the user as functions of model confidence, from 0% to 100%.]
Figure 1: Model Confidence
The remainder of this paper is organized as follows:
Section 2 discusses the three basic approaches used in
deciding model validity; Section 3 defines the validation
techniques used; Sections 4, 5, 6, and 7 contain
descriptions of data validity, conceptual model validity,
computerized model verification, and operational validity,
respectively; Section 8 describes ways of presenting
results; Section 9 contains a recommended validation
procedure; and Section 10 gives the conclusions.
2 VALIDATION PROCESS
There are three basic decision-making approaches
used in determining that a simulation model is valid.
Each of these approaches requires the model development
team to conduct verification and validation as part of the
model development process, and this is discussed below
in some detail. The most common decision-making
approach is for the model development team to make the
decision that the model is valid. This decision is a
subjective decision based on the results of the various
tests and evaluations conducted as part of the model
development process.
Another approach, often called independent
verification and validation (IV&V), uses a third
(independent) party to decide whether the model is valid.
The third party is independent of both the model
development team and the model sponsor/user(s). After
the model has been developed, the third party conducts an
evaluation to determine whether the model is valid.
Based upon this validation, the third party makes a
subjective decision on the validity of the model. This
approach is usually used when there is a large cost
associated with the problem the simulation model is
being used for and/or to help in model credibility.
The evaluation used in the IV&V approach can be as
simple as reviewing the verification and validation
performed by the model development team or it may
involve a complete verification and validation effort.
Wood (1986) describes experiences over this range of
evaluation by a third party on energy models. One
conclusion that Wood (1986) makes is that a complete
IV&V evaluation is extremely costly and time
consuming for what is gained. This author's view is that
if a third party is to be used, it should be
involved during the model development process. If the
model has already been developed, this author believes
that a third party should usually only evaluate what
verification and validation has already been performed and
not repeat earlier work. (Also see Davis (1986) for an
approach that simultaneously specifies and validates a
model.)
The last decision-making approach is to use a
scoring model (see, e.g., Balci (1989) and Gass (1979)) to
determine whether a model is valid. Scores (or weights)
are determined subjectively when conducting various
aspects of the validation process. Then these scores are
combined to determine category scores and an overall
score for the simulation model. A simulation model is
considered valid if its overall and category scores are
greater than some passing score(s). This approach is
infrequently used in practice.
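To make the mechanics concrete, the following minimal sketch shows how such a scoring model might combine subjectively assigned category scores; the category names, weights, scores, and passing thresholds are illustrative assumptions of this sketch, not values from any published scoring model.

```python
# Hypothetical scoring-model sketch; categories, weights, scores, and passing
# thresholds are illustrative assumptions only.
weights = {"conceptual model validity": 0.3,
           "computerized model verification": 0.2,
           "operational validity": 0.4,
           "data validity": 0.1}

# Subjective scores (0-100) assigned while conducting the validation process.
scores = {"conceptual model validity": 85,
          "computerized model verification": 90,
          "operational validity": 70,
          "data validity": 80}

overall = sum(weights[c] * scores[c] for c in weights)
passing_overall, passing_category = 75, 60

valid = (overall >= passing_overall
         and all(s >= passing_category for s in scores.values()))
print(f"overall score = {overall:.1f}, considered valid: {valid}")
```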
This author does not believe in the use of a scoring
model for determining validity. One reason is that the
subjectiveness of this approach tends to be hidden and
thus it appears to be objective. A second reason is how
the "passing scores" are to be determined. A third reason is
that a model may receive a passing score and yet have a
defect that needs correction. A fourth reason is that the
score(s) may cause overconfidence in a model or be used
to argue that one model is better than another.
We will now discuss how model verification and
validation relate to the model development process.
There are two common ways to view this relationship.
One way uses a detailed model development process and the
other uses a simple model development process. Banks,
Gerstein, and Searles (1988) reviewed work in both of
these ways and concluded that the simple way more
clearly illuminates model verification and validation.
This author recommends the use of the simple way (see,
e.g., Sargent 1982), and it is the way presented here.
Consider the simplified version of the modelling
process in Figure 2. The problem entity is the system
(real or proposed), idea, situation, policy, or phenomenon
to be modelled; the conceptual model is the
mathematical/logical/verbal representation (mimic) of the
problem entity developed for a particular study; and the
computerized model is the conceptual model
implemented on a computer. The conceptual model is
developed through an analysis and modelling phase, the
computerized model is developed through a computer
programming and implementation phase, and inferences
about the problem entity are obtained by conducting
computer experiments on the computerized model in the
experimentation phase.
[Figure: the problem entity, conceptual model, and computerized model connected by the analysis and modelling, computer programming and implementation, and experimentation phases, with operational validity, conceptual model validity, and computerized model verification indicated on the corresponding links.]
Figure 2: Simplified Version of the Modelling Process
We will now relate model validation and verification
to this simplified version of the modelling process (see
Figure 2). Conceptual model validity is defined as
determining that the theories and assumptions underlying
the conceptual model are correct and that the model
representation of the problem entity is "reasonable" for
the intended purpose of the model. Computerized model
verification is defined as ensuring that the computer
programming and implementation of the conceptual
model is correct. Operational validity is defined as
determining that the model's output behavior has
sufficient accuracy for its intended purpose over the
domain of the model's intended applicability. Data
validity is defined as ensuring that the data necessary for
model building, model evaluation and testing, and
conducting the model experiments to solve the problem
are adequate and correct.
Several versions of a model are usually developed in
the modelling process prior to obtaining a satisfactory
valid model. During each model iteration, model
verification and validation are performed (Sargent 1984).
A variety of (validation) techniques are used, which are
described below. Unfortunately, no algorithm or
procedure exists to select which techniques to use. Some
of their attributes are discussed in Sargent (1984).
3 VALIDATION TECHNIQUES
This section describes various validation techniques
(and tests) used in model verification and validation.
Most of the techniques described here are found in the
literature (see Balci and Sargent (1984a) for a detailed
bibliography), although some may be described slightly
differently. They can be used either subjectively or
objectively. By objectively, we mean using some type
of statistical test or procedure, e.g., hypothesis tests or
confidence intervals. A combination of techniques is
usually used. These techniques are used for validating
and verifying both the overall model and submodels.
Animation (Operational Graphics): The model's
operational behavior is displayed graphically as the
model moves through time. Examples are (i) the
graphical plot of the status of a server over time, e.g., is
it busy, idle, or blocked, and (ii) the graphical display of
parts moving through a factory.
Comparison to Other Models: Various results (e.g.,
outputs) of the simulation model being validated are
compared to results of other (valid) models. Examples
are (i) simple cases of a simulation model may be
compared to known results of analytic models, and (ii)
the model may be compared to other (simpler) models
that have been validated. (Sometimes the simulation
model being validated requires modification to allow
comparisons to be made.)
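As an illustration of the first example, a simple queueing simulation can be checked against a known analytic result. The sketch below is an assumption-laden toy (an M/M/1 queue simulated with the Lindley recursion), not a model from the paper; it compares the simulated mean waiting time in queue with the analytic value λ/(μ(μ - λ)).

```python
import random

def mm1_mean_wait(lam, mu, n_customers, seed=1):
    """Estimate the mean waiting time in queue for an M/M/1 queue via the
    Lindley recursion W[k+1] = max(0, W[k] + S[k] - A[k+1])."""
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n_customers):
        total += wait
        service = rng.expovariate(mu)
        interarrival = rng.expovariate(lam)
        wait = max(0.0, wait + service - interarrival)
    return total / n_customers

lam, mu = 0.5, 1.0                      # arrival and service rates
simulated = mm1_mean_wait(lam, mu, 200_000)
analytic = lam / (mu * (mu - lam))      # known M/M/1 result for the mean wait in queue
print(f"simulated Wq = {simulated:.3f}, analytic Wq = {analytic:.3f}")
```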
Degenerate Tests: The degeneracy of the model's
behavior is tested by removing portions of the model or
by appropriate selection of values of the input and
internal parameters. For example, does the average
number in the queue of a single server continue to
increase with respect to time when the arrival rate is
larger than the service rate?
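A minimal sketch of this test, again using an illustrative single-server queue: with the arrival rate set larger than the service rate, the time-average number waiting in the queue should keep growing as the run length increases.

```python
import heapq
import random

def avg_number_in_queue(lam, mu, horizon, seed=7):
    """Time-average number waiting in queue for a single-server queue,
    simulated with a simple next-event loop up to the given time horizon."""
    rng = random.Random(seed)
    events = [(rng.expovariate(lam), "arrival")]
    t_prev, in_system, area = 0.0, 0, 0.0
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            area += max(in_system - 1, 0) * (horizon - t_prev)
            break
        area += max(in_system - 1, 0) * (t - t_prev)   # number waiting, not in service
        t_prev = t
        if kind == "arrival":
            in_system += 1
            heapq.heappush(events, (t + rng.expovariate(lam), "arrival"))
            if in_system == 1:                          # server was idle: start service
                heapq.heappush(events, (t + rng.expovariate(mu), "departure"))
        else:
            in_system -= 1
            if in_system > 0:                           # start the next service
                heapq.heappush(events, (t + rng.expovariate(mu), "departure"))
    return area / horizon

# Arrival rate larger than the service rate: the average queue length should grow.
for horizon in (1_000, 5_000, 25_000):
    print(horizon, round(avg_number_in_queue(lam=1.2, mu=1.0, horizon=horizon), 1))
```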
Event Validity: The "events" of occurrences of the
simulation model are compared to those of the real
system to determine if they are the same. An example of
an event is a death in a given fire department simulation.
Extreme-Condition Tests: The model structure and
output should be plausible for any extreme and unlikely
combination of levels of factors in the system; e.g., if
in-process inventories are zero, production output should
be zero. Also, the model should be bounded to restrict its
behavior outside of normal operating ranges.
Face Validity: Face validity is asking people
knowledgeable about the system whether the model
and/or its behavior is reasonable. This technique can be
used in determining if the logic in the conceptual model
is correct and if a model's input-output relationships are
reasonable.
Fixed Values: Fixed values are used for all model
input and internal variables. This should allow checking
the model results against hand calculated values.
Historical Data Validation: If historical data exist
(or if data is collected on a system for building or testing
the model), part of the data is used to build the model and
the remaining data is used to determine (test) if the model
behaves as the system does. (This testing is conducted
by driving the simulation model with either
distributions or traces (Balci and Sargent 1982a, 1982b,
1984b).)
Historical Methods: The three historical methods of
validation are Rationalism, Empiricism, and Positive
Economics. Rationalism assumes that everyone knows
whether the underlying assumptions of a model are true.
Logic deductions are then used from these assumptions
to develop the correct (valid) model. Empiricism requires
every assumption and outcome to be empirically
validated. Positive Economics requires only that the
model be able to predict the future and is not concerned
with a model's assumptions or structure (causal
relationships or mechanisms).
Internal Validity: Several replications (runs) of a
stochastic model are made to determine the amount of
internal stochastic variability in the model. A high
amount of variability (lack of consistency) may cause the
model's results to be questionable and, if typical of the
problem entity, may bring into question the appropriateness of the
policy or system being investigated.
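A sketch of the idea, assuming a hypothetical run_model(seed) that returns one replication's value of an output variable of interest: several independent replications are made and the across-replication variability is summarized.

```python
import random
import statistics

def run_model(seed):
    """Hypothetical stand-in for one replication of a stochastic simulation;
    here it simply returns the mean of 100 exponential samples."""
    rng = random.Random(seed)
    return statistics.fmean(rng.expovariate(1.0) for _ in range(100))

replications = [run_model(seed) for seed in range(1, 11)]   # 10 independent runs
mean = statistics.fmean(replications)
stdev = statistics.stdev(replications)
print(f"across-replication mean = {mean:.3f}, std dev = {stdev:.3f}")
# A large standard deviation relative to the mean would make the model's
# results questionable for decision-making.
```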
Multistage Validation: Naylor and Finger (1967)
proposed combining the three historical methods of
Rationalism, Empiricism, and Positive Economics into a
multistage process of validation. This validation
method consists of (1) developing the model's
assumptions on theory, observations, general knowledge,
and function, (2) validating the model's assumptions
where possible by empirically testing them, and (3)
comparing (testing) the input-output relationships of the
model to the real system.
Parameter Variability - Sensitivity Analysis: This
validation technique consists of changing the values of
the input and internal parameters of a model to determine
the effect upon the model behavior and its output. The
same relationships should occur in the model as in the
real system. Those parameters that are sensitive, i.e.,
cause significant changes in the model's behavior, should
be made sufficiently accurate prior to using the model.
(This may require iterations in model development.)
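A minimal sketch, reusing the illustrative single-server queue from above as the model: one internal parameter (the service rate) is perturbed and the resulting change in an output of interest is observed. Parameters whose small perturbations produce large output changes would need to be estimated with correspondingly high accuracy.

```python
import random

def mean_wait(arrival_rate, service_rate, n=100_000, seed=3):
    """Illustrative single-server queue (Lindley recursion) used as the model."""
    rng = random.Random(seed)
    w, total = 0.0, 0.0
    for _ in range(n):
        total += w
        w = max(0.0, w + rng.expovariate(service_rate) - rng.expovariate(arrival_rate))
    return total / n

base = mean_wait(arrival_rate=0.5, service_rate=1.0)
for delta in (-0.1, -0.05, 0.05, 0.1):               # perturb the service rate
    out = mean_wait(arrival_rate=0.5, service_rate=1.0 + delta)
    print(f"service rate {1.0 + delta:.2f}: mean wait {out:.3f} "
          f"(change {out - base:+.3f} from {base:.3f})")
```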
Predictive Validation: The model is used to predict
(forecast) the system behavior, and comparisons are made
to determine if the system behavior and the model's
forecast are the same. The system data may come from
an operational system or from experiments performed on
the system, e.g., field tests.
Traces: The behavior of different types of specific
entities in the model is traced (followed) through the
model to determine if the model's logic is correct and if
the necessary accuracy is obtained.
Turing Tests: People who are knowledgeable about
the operations of a system are asked if they can
discriminate between system and model outputs.
(Schruben (1980) contains statistical procedures for
Turing tests.)
4 DATA VALIDITY
Even though data validity is usually not considered
part of model validation, we discuss it because it is
usually difficult, time consuming, and costly to obtain
sufficient, accurate, and appropriate data, and this is frequently
the reason that early attempts to validate a model fail.
Basically, data is needed for three purposes: for building
the conceptual model, for validating the model, and for
performing experiments with the validated model. In
model validation, we are concerned only with the first
two types of data.
To build a conceptual model, we must have
sufficient data on the problem entity to develop theories
that can be used in building the model, to develop the
mathematical and logical relationships in the model for it
to adequately represent the problem entity for its intended
purpose, and to test the model's underlying assumptions.
Also needed is behavior data on the problem entity to be
used in the operational validity step of comparing the
problem entity's behavior with the model's behavior.
(Usually, these data are system input/output data.) If
these data are not available, high model confidence
usually cannot be obtained because sufficient operational
validity cannot be achieved.
The concern with data is that appropriate, accurate,
and sufficient data are available, and, if any data
transformations are made, such as disaggregation, that they
are correctly performed. Unfortunately, there is not
much that can be done to ensure that the data are correct.
The best that one can do is to develop good procedures
for collecting and maintaining data, and to test the collected
data using such techniques as internal consistency
checks and screening for outliers to determine if they
are correct. If the amount of data is large, a data base
should be developed and maintained.
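As a small illustration of such testing (the data values and thresholds below are made up), an internal consistency check and a simple interquartile-range outlier screen can flag suspect observations for review:

```python
import statistics

service_times = [2.1, 1.8, 2.4, -0.5, 2.0, 13.7, 1.9, 2.2, 2.3, 2.0]  # illustrative data

# Internal consistency check: durations must be positive.
inconsistent = [x for x in service_times if x <= 0]

# Outlier screen: flag values beyond 1.5 * IQR from the quartiles.
q1, _, q3 = statistics.quantiles(service_times, n=4)
iqr = q3 - q1
outliers = [x for x in service_times if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("failed consistency check:", inconsistent)
print("flagged as outliers:", outliers)
```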
5 CONCEPTUAL MODEL VALIDATION
Conceptual model validity is determining that the
theories and assumptions underlying the conceptual
model are correct, and that the model representation of
the problem entity and the model's structure, logic, and
mathematical and causal relationships are “reasonable”
for the intended purpose of the model. The theories and
assumptions underlying the model should be tested using
mathematical analysis and statistical methods on
problem entity data. Examples of theories and
assumptions are linearity, independence, stationarity, and
Poisson arrivals. Examples of applicable statistical
methods are fitting distributions to data; estimating
parameter values, means, variances, and correlations among
data observations; and plotting data to see if it is
stationary. In addition, all theories used should be
reviewed to ensure they were applied correctly; for
example, if a Markov chain is used, does the system
have the Markov property and are the states and
transition probabilities correct?
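As one illustration of empirically testing such an assumption (the interarrival data below are synthetic and the particular test is an assumption of this sketch), the Poisson-arrivals assumption can be checked by testing whether observed interarrival times are consistent with an exponential distribution:

```python
import random
from scipy import stats

# Illustrative "observed" interarrival times; in practice these come from
# problem entity data.
rng = random.Random(11)
interarrivals = [rng.expovariate(0.8) for _ in range(500)]

# Fit the exponential scale (mean interarrival time) and test the fit.
# Estimating the scale from the same data makes the test approximate.
mean_gap = sum(interarrivals) / len(interarrivals)
statistic, p_value = stats.kstest(interarrivals, "expon", args=(0, mean_gap))
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3f}")
# A very small p-value would cast doubt on the Poisson-arrivals assumption.
```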
Next, each submodel and the overall model must be
evaluated to determine if they are reasonable and correct
for the intended purpose of the model. This should
include determining if the appropriate detail and aggregate
relationships have been used for the model's intended
purpose, and if the appropriate structure, logic, and
mathematical and causal relationships have been used.
The primary validation techniques used for these
evaluations are face validation and traces. Face validation
is having experts on the problem entity evaluate the
conceptual model to determine if they believe it is correct
and reasonable for its purpose. This usually means
examining the flowchart or graphical model, or the set of
model equations. The use of traces is the tracking of
entities through each submodel and the overall model to
determine if the logic is correct and the necessary
accuracy is maintained. If any errors are found in the
conceptual model, it must be revised and conceptual
model validation performed again.
6 COMPUTERIZED MODEL VERIFICATION
Computerized model verification is ensuring that the
computer programming and implementation of the
conceptual model is correct. To help ensure that a
correct computer program is obtained, program design
and development procedures found in the field of
Software Engineering should be used in developing and
implementing the computer program. These include
such techniques as top-down design, structured
programming, and program modularity. A separate
program module should be used for each submodel, the
overall model, and for each simulation function (e.g.,
time-flow mechanism, random number and random
variate generators, and integration routines) when using
general purpose higher order languages, e.g., FORTRAN
or PASCAL, and where possible when using simulation
languages (Chattergy and Pooch 1977). (See Whitner and
Balci (1989) for a more detailed discussion on model
verification.)
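A skeletal illustration of that modular layout, in which the module and function names and the toy submodels are assumptions of this sketch: the random variate generator, the time-flow mechanism, and each submodel are kept in separate, separately testable routines.

```python
import heapq
import random

# Simulation-function modules (illustrative), kept separate so each can be
# verified on its own.
def exponential_variate(rng, rate):
    """Random variate generator."""
    return rng.expovariate(rate)

class EventList:
    """Time-flow mechanism: a simple next-event scheduler."""
    def __init__(self):
        self._heap = []
    def schedule(self, time, handler):
        heapq.heappush(self._heap, (time, id(handler), handler))
    def next_event(self):
        time, _, handler = heapq.heappop(self._heap)
        return time, handler
    def empty(self):
        return not self._heap

# Submodel modules (illustrative placeholders for real submodels).
def arrival_submodel(state):
    state["arrivals"] += 1

def inspection_submodel(state):
    state["inspections"] += 1

# Overall model: drives the submodels through the time-flow mechanism.
def overall_model(horizon=10.0, seed=5):
    rng = random.Random(seed)
    events = EventList()
    state = {"arrivals": 0, "inspections": 0}
    events.schedule(exponential_variate(rng, 1.0), arrival_submodel)
    events.schedule(exponential_variate(rng, 0.5), inspection_submodel)
    while not events.empty():
        time, handler = events.next_event()
        if time > horizon:
            break
        handler(state)
        # Each placeholder submodel simply reschedules itself.
        events.schedule(time + exponential_variate(rng, 1.0), handler)
    return state

print(overall_model())
```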
One should be aware that the use of different types
of computer languages affects the probability of having a
correct program. The use of a special purpose
simulation language, if appropriate, generally will result
in having fewer errors than if a general purpose simulation
language is used, and using a general purpose simulation
language will generally result in having fewer errors than
if a general purpose higher order language is used. Not
only does the use of simulation languages increase the
probability of having a correct program, it usually
reduces programming time as well.
After the computer program has been developed,
implemented, and hopefully most of the programming
"bugs" removed, the program must be tested for
correctness and accuracy. First, the simulation functions
should be tested to see if they are correct. Usually
straightforward tests can be used here to determine if they
are working properly. Next, each submodel and the
overall model should be tested to see if they are correct.
Here the testing is much more difficult. There are two
basic approaches to testing: static and dynamic testing
(analysis) (Fairley 1976). In static testing the computer
program of the computerized model is analyzed to
determine if it is correct by using such techniques as
correctness proofs, structured walk-throughs, and
examining the structure properties of the program. The
commonly used structured walk-through technique
consists of each program developer explaining their
computer program code statement by statement to other
members of the modelling team until all are convinced it
is correct (or incorrect).
In dynamic testing, the computerized model is
executed under different conditions, and the values
obtained are used to determine if the computer program
and its implementations are correct. This includes both
the values obtained during the program execution and the
final values obtained. There are three different strategies
to use in dynamic testing: bottom-up testing, which
means, e.g., testing the submodels first and then the
overall model; top-down testing, which means, e.g.,
testing the overall model first using programming stubs
(sets of data) for each of the submodels and then testing
the submodels; and mixed testing, which is using a
combination of bottom-up and top-down testing (Fairley
1976). The techniques commonly used in dynamic
testing are traces, investigations of input-output relations
using the validation techniques, internal consistency
checks, and reprogramming critical components to
determine if the same results are obtained. If there are a
large number of variables, one might aggregate to reduce
the number of tests needed or use certain types of design
of experiments (Kleijnen 1987), e.g., factor screening
experiments (Smith and Mauro 1982), to identify the key
variables, in order to reduce the number of experimental
conditions that need to be tested.
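A brief sketch of bottom-up and top-down dynamic testing, assuming a hypothetical service_time submodel and an overall model that uses it: the submodel is exercised first on its own, and the overall model is then exercised with a stub substituted for the submodel so its result can be checked against a hand calculation.

```python
import random

def service_time(rng):
    """Submodel under test (illustrative): returns a non-negative duration."""
    return rng.expovariate(1.0)

def overall_model(n_jobs, service_time_fn, rng):
    """Overall model (illustrative): total time to process n_jobs serially."""
    return sum(service_time_fn(rng) for _ in range(n_jobs))

rng = random.Random(2)

# Bottom-up: test the submodel first under different conditions.
samples = [service_time(rng) for _ in range(10_000)]
assert all(s >= 0 for s in samples)                    # internal consistency check
assert abs(sum(samples) / len(samples) - 1.0) < 0.05   # close to its intended mean

# Top-down: test the overall model with a stub (fixed value) for the submodel.
stub = lambda _rng: 2.0
assert overall_model(5, stub, rng) == 10.0             # hand-calculated expectation
print("dynamic tests passed")
```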
One must continuously be aware, while checking the
correctness of the computer program and its
implementation, that errors may be caused by the data,
the conceptual model, the computer program, or the
computer implementation.
7 OPERATIONAL VALIDITY
Operational validity is primarily concerned with
determining that the model's output behavior has the
accuracy required for the model's intended purpose over
the domain of its intended applicability. This is where
most of the validation testing and evaluation takes place.
The computerized model is used in operational validity,
and thus any deficiencies found may be due to an
inadequate conceptual model, an improperly programmed
or implemented conceptual model (e.g., due to
programming errors or insufficient numerical accuracy),
or due to invalid data.
All of the validation techniques discussed in Section
3 are applicable to operational validity. Which
techniques and whether to use them objectively or
subjectively must be decided by the model development
team and other interested parties. The major attribute
affecting operational validity is whether the problem
entity (or system) is observable or not, where observable
means it is possible to collect data on the operational
behavior of the problem entity. Figure 3 gives one
classification of the validation approaches for operational
validity. To "explore model behavior" means to
examine the behavior of the model using appropriate
validation techniques for various sets of experimental
conditions from the domain of the model's intended
applicability, and usually includes parameter variability-
sensitivity analysis.
To obtain a high degree of confidence in a model
and its results, comparison of the model's and system's
input-output behavior for at least two different sets of
experimental conditions is usually required. There are
three basic comparison approaches used: (i) graphs of
the model and system behavior data, (ii) confidence
intervals, and (iii) hypothesis tests. Graphs are the most
commonly used approach and confidence intervals are
next.
                        OBSERVABLE SYSTEM              NON-OBSERVABLE SYSTEM

SUBJECTIVE APPROACH     * Comparison of data using     * Explore model behavior
                          graphical displays           * Comparison to other models
                        * Explore model behavior

OBJECTIVE APPROACH      * Comparison of data using     * Comparison to other models
                          statistical tests and          using statistical tests
                          procedures                     and procedures

Figure 3: Operational Validity Classification
7.1 Graphical Comparison of Data
The model's and system's behavior data are plotted
on graphs for various sets of experimental conditions to
determine if the model's output behavior has sufficient
accuracy for its intended purpose. (See Figures 4 and 5
for examples of such graphs.) A variety of graphs using
different types of measures and relationships are required.
Examples of measures and relationships are (i) time
series, means, variances, and maximums of each output
variable, (ii) relationships between parameters of each
output variable, e.g., means and standard deviations, and
(iii) relationships between different output variables. It
is important that appropriate measures and relationships
be used in validating a model and that they be determined
with respect to the model's intended purpose. As an
example of a set of graphs used in the validation of a
model, see Anderson and Sargent (1974).
These graphs can be used in model validation in
three ways. First, the model development team can use
the graphs in the model development process to make a
subjective judgement on whether the model does or does
not possess sufficient accuracy for its intended purpose.
[Figure: graphical comparison of real system data and simulation model data plotted against the number of disk accesses.]
Figure 4: Disk Access
[Figure: graphical comparison of real system and simulation model data for the average value of reaction time (seconds).]
Figure 5: Reaction Time
Secondly, they can be used in the face validity technique,
where experts are asked to make subjective judgements
on whether a model does or does not possess sufficient
accuracy for its intended purpose.
The third way the graphs can be used is in Turing
Tests. Sets of data from the model and from the system
are plotted on separate graphs. The graphs are shuffled
and then experts are asked to determine which graphs are
from the system and which are from the model. The
results for each measure and relationship can be evaluated
either subjectively or statistically. The subjective
method requires that a subjective decision be made
whether the results are satisfactory or not. The statistical
method requires that the results be analyzed statistically.
See Schruben (1980) for a variety of statistical methods
for analyzing the results of Turing Tests and examples of
their use.
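A minimal plotting sketch, assuming matplotlib and made-up paired observations of one output measure: system and model data are drawn on the same axes so their behavior can be compared visually (or shown to experts, or shuffled onto separate graphs for a Turing test).

```python
import random
import matplotlib.pyplot as plt

rng = random.Random(4)
# Illustrative observations of one output measure at increasing offered loads.
loads = [0.1 * k for k in range(1, 10)]
system_obs = [load / (1 - load) + rng.gauss(0, 0.05) for load in loads]  # "real" data
model_obs = [load / (1 - load) + rng.gauss(0, 0.05) for load in loads]   # model output

plt.plot(loads, system_obs, "x", label="system")
plt.plot(loads, model_obs, "o", mfc="none", label="model")
plt.xlabel("offered load")
plt.ylabel("mean number in system")
plt.legend()
plt.show()
```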
7.2 Confidence Intervals
Confidence intervals (c.i.), simultaneous confidence
intervals (s.c.i.), and joint confidence regions (j.c.r.) can
be obtained for the differences between the population
parameters, e.g., means and variances, and distributions
of the model and system output variables for each set of
experimental conditions. These c.i., s.c.i., and j.c.r. can
be used as the model range of accuracy for model
validation.
To construct the model range of accuracy, a
statistical procedure containing a statistical technique and
a method of data collection must be developed for each
set of experimental conditions and for each parameter of
interest. The statistical techniques used can be divided
into two groups: (A) univariate statistical techniques and
(B) multivariate statistical techniques. The univariate
techniques can be used to develop c.i. and, with the use of
the Bonferroni inequality (Law and Kelton 1991), s.c.i.
The multivariate techniques can be used to develop s.c.i.
and j.c.r. Both parametric and nonparametric techniques
can be used.
The method of data collection must satisfy the
underlying assumptions of the statistical technique being
used. The standard statistical techniques and data
collection methods used in simulation output analysis
can be used for developing the model range of accuracy:
namely (1) replication, (2) batch means, (3) regenerative,
(4) spectral, (5) time series, and (6) standardized time
series (Banks and Carson 1984, Law and Kelton 1991).
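A minimal sketch of the replication approach with made-up data: a t-based confidence interval is built for the difference between the model and system means of each of two output variables, and the Bonferroni inequality is used to make the two intervals simultaneous. Pairing the model and system observations by replication is an assumption of this sketch (it presumes corresponding observation periods for the model and the system).

```python
import math
import random
import statistics
from scipy import stats

rng = random.Random(9)
n = 20                                   # paired model replications / system observations
outputs = ["mean wait", "utilization"]   # illustrative output variables
model = {"mean wait":   [rng.gauss(4.0, 0.5) for _ in range(n)],
         "utilization": [rng.gauss(0.80, 0.03) for _ in range(n)]}
system = {"mean wait":   [rng.gauss(4.2, 0.5) for _ in range(n)],
          "utilization": [rng.gauss(0.82, 0.03) for _ in range(n)]}

overall_alpha = 0.05
alpha = overall_alpha / len(outputs)     # Bonferroni: split alpha over the s.c.i.
for name in outputs:
    diffs = [m - s for m, s in zip(model[name], system[name])]
    mean_d = statistics.fmean(diffs)
    half = stats.t.ppf(1 - alpha / 2, n - 1) * statistics.stdev(diffs) / math.sqrt(n)
    print(f"{name}: difference in means in [{mean_d - half:.3f}, {mean_d + half:.3f}]")
```

More replications, variance reduction, or lower confidence levels shorten the intervals and thus tighten the model range of accuracy, as discussed next.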
It is usually desirable to construct the model range
of accuracy with the lengths of the c.i. and s.c.i. and the
sizes of the j.c.r. as small as possible. The shorter the
lengths or the smaller the sizes, the more useful and
meaningful the specification of the model range of
accuracy will usually be. The lengths and the sizes of
the joint confidence regions are affected by the values of
the confidence levels, the variances of the model and system
response variables, and the sample sizes. The lengths can be
shortened or sizes made smaller by decreasing the
confidence levels. Variance reduction techniques (Law
and Kelton 1991) can be used when collecting
observations from a simulation model to decrease the
variability and thus obtain a smaller range of accuracy.
The lengths can also be shortened or the sizes decreased by
increasing the sample sizes. A tradeoff needs to be made
among the sample sizes, confidence levels, and estimates
of the lengths or sizes of the model range of accuracy. In
those cases where the cost of data collection is
significant for either the model or system, the data
collection cost should also be considered in the tradeoff
analysis. Tradeoff curves can be constructed to aid in the
tradeoff analysis. Figure 6 is an example of a set of
tradeoff curves which contain the relationship between
the significance level, the estimated half lengths of the
confidence interval, and the cost of data collection.
[Figure: tradeoff curves relating the confidence interval half-length estimate to the data collection cost in dollars.]
Figure 6: Tradeoff Curves
Details on the use of c.i., s.c.i., and j.c.r. for
operational validity, including a general methodology,
are contained in Balci and Sargent (1984b). A brief
discussion on the use of c.i. for model validation is also
contained in Law and Kelton (1991).
7.3 Hypothesis Tests
Hypothesis tests can be used in the comparison of
parameters, distributions, and time series of the output
data of a model and a system for each set of experimental
conditions to determine if the model's output behavior
has an acceptable range of accuracy. An acceptable range
of accuracy is the amount of accuracy that is required of a
model to be valid for its intended purpose.
The first step in hypothesis testing is to state the
hypotheses to be tested:

  H0: Model is valid for the acceptable range of
      accuracy under the set of experimental conditions.   (1)
  H1: Model is invalid for the acceptable range of
      accuracy under the set of experimental conditions.
Two types of errors are possible in testing the
hypotheses in (1). The first, or type I, error is rejecting
the validity of a valid model; the second, or type II, error is
accepting the validity of an invalid model. The
probability of a type I error is called the model builder's risk
(α) and the probability of a type II error is called the model
user's risk (β). In model validation, the model user's risk is
extremely important and must be kept small. Thus both
type I and type II errors must be considered in using
hypothesis testing for model validation.
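As a small illustration (the data below are made up), a two-sample t test can compare the mean of one model output variable with the corresponding system mean; the sample sizes would then be chosen so that both the model builder's risk and the model user's risk are acceptably small.

```python
import random
from scipy import stats

rng = random.Random(13)
system_obs = [rng.gauss(10.0, 1.0) for _ in range(30)]  # illustrative system data
model_obs = [rng.gauss(10.3, 1.0) for _ in range(30)]   # illustrative model output

alpha = 0.05                                             # model builder's risk
t_stat, p_value = stats.ttest_ind(model_obs, system_obs)
decision = "reject" if p_value < alpha else "fail to reject"
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, {decision} H0 at alpha = {alpha}")
# The sample sizes also determine beta, the model user's risk of accepting an
# invalid model, so both error probabilities should be examined together.
```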
The amount of agreement between a model and a
system can be measured by a validity measure. The
validity measure is chosen such that the model accuracy
or the amount of agreement between the model and the
system decreases as the value of the validity measure
increases. The acceptable range of accuracy can be used
to determine an acceptable validity range, 0 ≤ λ