Bank Example: Comparison with Real System Data
• Six statistically independent replications of the model, each of 2-hour duration, are run.
Prof. Dr. Mesut Güneş ▪ Ch. 10 Verification and Validation of Simulation Models

Replication    Y4 Arrivals/Hour    Y5 Service Time [Minutes]    Y2 Average Delay [Minutes]
1              51.0                1.07                         2.79
2              40.0                1.12                         1.12
3              45.5                1.06                         2.24
4              50.5                1.10                         3.45
5              53.0                1.09                         3.13
6              49.0                1.07                         2.38
Sample mean [Delay]: 2.51
Standard deviation [Delay]: 0.82

Bank Example: Hypothesis Testing
• Compare the average delay from the model, Y2, with the actual delay, Z2.
• Null hypothesis testing: evaluate whether the simulation and the real system are the same (w.r.t. output measures):

  H0: E(Y2) = 4.3 minutes
  H1: E(Y2) ≠ 4.3 minutes

• If H0 is not rejected, there is no reason to consider the model invalid.
• If H0 is rejected, the current version of the model is rejected, and the modeler needs to improve the model.

Bank Example: Hypothesis Testing
• Conduct the t test:
• Choose a level of significance (α = 0.05) and a sample size (n = 6).
• Compute the sample mean and sample standard deviation over the n replications:

  Ȳ2 = (1/n) Σ Y2i = 2.51 minutes
  S = √( Σ (Y2i − Ȳ2)² / (n − 1) ) = 0.82 minutes

• Compute the test statistic:

  |t0| = |Ȳ2 − µ0| / (S/√n) = |2.51 − 4.3| / (0.82/√6) = 5.34 > t0.025,5 = 2.571 (for a 2-sided test)

• Hence, reject H0. Conclude that the model is inadequate.
• Check the assumptions justifying a t test: that the observations (Y2i) are normally and independently distributed.

Bank Example: Hypothesis Testing
• Similarly, compare the model output with the observed output for other measures:

  Y4 ↔ Z4    Y5 ↔ Z5    Y6 ↔ Z6
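The t test above can be reproduced with a short script. The replication data come from the table earlier in this example; the critical value t0.025,5 = 2.571 is hard-coded. A minimal sketch using only the Python standard library:

```python
import math
import statistics

# Average delay Y2 per replication, from the bank-example table (minutes)
delays = [2.79, 1.12, 2.24, 3.45, 3.13, 2.38]
MU0 = 4.3        # observed average delay Z2 of the real system (minutes)
T_CRIT = 2.571   # t_{0.025,5}: two-sided critical value for alpha = 0.05, n = 6

n = len(delays)
ybar = statistics.mean(delays)          # sample mean, ~2.52
s = statistics.stdev(delays)            # sample standard deviation, ~0.82
t0 = (ybar - MU0) / (s / math.sqrt(n))  # test statistic

reject_h0 = abs(t0) > T_CRIT            # True here: the model is judged inadequate
```

Working from the unrounded data gives |t0| ≈ 5.32 rather than the slide's 5.34, which is computed from the rounded values 2.51 and 0.82; the conclusion is the same.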
Power of a test

The power of a test is the probability of detecting an invalid model:

  Power = 1 − P(failing to reject H0 | H1 is true) = 1 − P(Type II error) = 1 − β

• For validation: failure to reject H0 is considered a strong conclusion, so the modeler would want β to be small.

Power of a test
• The value of β depends on:
  • the sample size n, and
  • the true difference, δ, between E(Y) and µ0:

    δ = |E(Y) − µ0| / σ

• In general, the best approach to controlling β is:
  • Specify the critical difference, δ.
  • Choose a sample size, n, by making use of the operating characteristic curve (OC curve).
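In place of reading n off an OC curve, the relationship between δ, n, and β can be sketched numerically. The following uses a normal approximation to the power of the two-sided test (an approximation, not the exact noncentral-t power; the function names are illustrative):

```python
import math
from statistics import NormalDist

_nd = NormalDist()

def approx_power(delta, n, alpha=0.05):
    """Normal approximation to the power of a two-sided test, where
    delta = |E(Y) - mu0| / sigma is the standardized critical difference."""
    z = _nd.inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    shift = delta * math.sqrt(n)     # standardized shift of the mean under H1
    return _nd.cdf(shift - z) + _nd.cdf(-shift - z)

def sample_size_for_beta(delta, beta_target, alpha=0.05, n_max=100_000):
    """Smallest n whose approximate Type II error beta(delta) <= beta_target."""
    for n in range(2, n_max):
        if 1.0 - approx_power(delta, n, alpha) <= beta_target:
            return n
    raise ValueError("no n found up to n_max")
```

For example, halving the critical difference δ roughly quadruples the required sample size, which is exactly the OC-curve message: for the same error probability, a smaller difference demands more replications.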
Power of a test
• Operating characteristic curve (OC curve):
  • Graphs of the probability of a Type II error, β(δ), versus δ for a given sample size n.
  • For the same error probability, a smaller difference δ requires a larger sample size!

Power of a test
• Type I error (α):
  • Error of rejecting a valid model.
  • Controlled by specifying a small level of significance α.
• Type II error (β):
  • Error of accepting a model as valid when it is invalid.
  • Controlled by specifying the critical difference δ and finding the corresponding sample size n.
• For a fixed sample size n, increasing α will decrease β.

Statistical Terminology                   Modeling Terminology                   Associated Risk
Type I: rejecting H0 when H0 is true      Rejecting a valid model                α
Type II: failure to reject H0 when        Failure to reject an invalid model     β
H1 is true
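Both error types in the table can be estimated by Monte Carlo: simulate many 6-replication experiments with a known true mean and count how often the t test rejects. The true means and σ below are illustrative choices for the bank setting, not values from the slides:

```python
import math
import random
import statistics

random.seed(42)
N, SIGMA, MU0 = 6, 0.8, 4.3   # replications per experiment, assumed sd, H0 mean
T_CRIT = 2.571                # t_{0.025,5}, two-sided, alpha = 0.05
TRIALS = 20_000

def rejects(true_mean):
    """Run one 6-replication experiment and apply the two-sided t test."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    t0 = (statistics.mean(sample) - MU0) / (statistics.stdev(sample) / math.sqrt(N))
    return abs(t0) > T_CRIT

# Type I error: rejecting a valid model (the true mean really is mu0)
alpha_hat = sum(rejects(MU0) for _ in range(TRIALS)) / TRIALS

# Type II error: failing to reject an invalid model (true mean shifted to 2.5)
beta_hat = sum(not rejects(2.5) for _ in range(TRIALS)) / TRIALS
```

The estimated Type I error should land near the nominal α = 0.05; with a shift as large as δ = |2.5 − 4.3|/0.8 ≈ 2.25, the Type II error is small even at n = 6.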
Confidence interval testing

Confidence Interval Testing
• Confidence interval testing: evaluate whether the simulation and the real system performance measures are close enough.
• If Y is the simulation output and µ = E(Y), the confidence interval (CI) for µ is:

  [ Ȳ − t_{α/2,n−1} S/√n , Ȳ + t_{α/2,n−1} S/√n ]

Confidence Interval Testing
• µ0 is the unknown true value; ε is a difference value chosen by the analyst that is small enough to allow valid decisions to be based on simulations!
• CI does not contain µ0:
  • If the best-case error is > ε, the model needs to be refined.
  • If the worst-case error is ≤ ε, accept the model.
  • If the best-case error is ≤ ε but the worst-case error is > ε, additional replications are necessary.
• CI contains µ0:
  • If either the best-case or worst-case error is > ε, additional replications are necessary.
  • If the worst-case error is ≤ ε, accept the model.

Confidence Interval Testing
• Bank example: µ0 = 4.3, and "close enough" is ε = 1 minute of expected customer delay.
• A 95% confidence interval, based on the 6 replications, is

  Ȳ ± t0.025,5 S/√n = 2.51 ± 2.571 · 0.82/√6 = [1.65, 3.37]

• Because
  • µ0 = 4.3 falls outside the confidence interval,
  • the best case |3.37 − 4.3| = 0.93 < 1, but
  • the worst case |1.65 − 4.3| = 2.65 > 1,
  additional replications are needed to reach a decision.
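The interval and the decision rule can be scripted; a sketch using the same replication data and the hard-coded critical value t0.025,5 = 2.571:

```python
import math
import statistics

delays = [2.79, 1.12, 2.24, 3.45, 3.13, 2.38]   # Y2 per replication (minutes)
MU0, EPS = 4.3, 1.0     # system mean and the analyst's "close enough" epsilon
T_CRIT = 2.571          # t_{0.025,5} for a 95% CI with n = 6

n = len(delays)
ybar = statistics.mean(delays)
half_width = T_CRIT * statistics.stdev(delays) / math.sqrt(n)
lo, hi = ybar - half_width, ybar + half_width   # CI, roughly [1.66, 3.38]

best = min(abs(lo - MU0), abs(hi - MU0))    # best-case error
worst = max(abs(lo - MU0), abs(hi - MU0))   # worst-case error

if lo <= MU0 <= hi:                         # CI contains mu0
    verdict = "accept model" if worst <= EPS else "more replications"
elif best > EPS:                            # CI misses mu0 by more than epsilon
    verdict = "refine model"
elif worst <= EPS:
    verdict = "accept model"
else:
    verdict = "more replications"
```

For the bank data, µ0 lies outside the interval with best-case error below ε but worst-case error above it, so the script reaches the slide's verdict: more replications.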
Other approaches

Using Historical Output Data
• An alternative to generating input data:
  • Use the actual historical record.
  • Drive the simulation model with the historical record and then compare model output to system data.
• In the bank example, use the recorded interarrival and service times for the customers, {An, Sn, n = 1, 2, …}.
• Procedure and validation process: similar to the approach used for system-generated input data.
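Driving a single-server model directly from a recorded trace can be sketched with Lindley's recurrence for the delay in queue. The bank's actual trace {An, Sn} is not given here, so the sample data below are made up:

```python
def delays_from_trace(interarrivals, services):
    """Delay in queue of each customer in a FIFO single-server queue,
    computed from a recorded trace via Lindley's recurrence:
        W[0] = 0;  W[i] = max(0, W[i-1] + S[i-1] - A[i])
    where A[i] is the interarrival time ahead of customer i and
    S[i] is the service time of customer i."""
    w = [0.0]
    for i in range(1, len(interarrivals)):
        w.append(max(0.0, w[-1] + services[i - 1] - interarrivals[i]))
    return w

# Made-up trace: customer 0 arrives at t = 0, then gaps of 1, 1, and 5 minutes
trace_delays = delays_from_trace([0.0, 1.0, 1.0, 5.0], [3.0, 1.0, 1.0, 1.0])
```

The per-customer delays (and their average) produced this way can then be compared with the delays the real system produced on the same recorded days.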
Using a Turing Test
• Use in addition to statistical tests, or when no statistical test is readily applicable.
• Utilizes persons' knowledge about the system.
• For example:
  • Present 10 system performance reports to a manager of the system. Five of them are from the real system and the rest are "fake" reports based on simulation output data.
  • If the person identifies a substantial number of the fake reports, interview the person to get information for model improvement.
  • If the person cannot distinguish between fake and real reports with consistency, conclude that the test gives no evidence of model inadequacy.

Turing Test: Described by Alan Turing in 1950. A human judge is involved in a natural language conversation with a human and a machine. If the judge cannot reliably tell which of the partners is the machine, then the machine has passed the test.

Summary
• Model validation is essential:
  • Model verification
  • Calibration and validation
  • Conceptual validation
• It is best to compare system data to model data, and to make the comparison using a wide variety of techniques.
• Some techniques that we covered:
  • Ensure high face validity by consulting knowledgeable persons.
  • Conduct simple statistical tests on assumed distributional forms.
  • Conduct a Turing test.
  • Compare model output to system output by statistical tests.