Verification and Validation of Simulation Models
Complete 30-Slide Presentation Notes
Slide 1: Title Slide
Verification and Validation of Simulation Models
Prof. Dr. Mesut Güneş
Chapter 10: Verification and Validation of Simulation Models
Essential processes for ensuring simulation model credibility
Slide 2: Contents Overview
Chapter 10 Contents:
Model-Building, Verification, and Validation
Verification of Simulation Models
Calibration and Validation
Statistical Testing Methods
Power Analysis and Error Types
Alternative Validation Approaches
Slide 3: Purpose & Overview
The Goal of the Validation Process:
To produce a model that represents true behavior closely enough for decision-making purposes
To increase the model's credibility to an acceptable level
Key Points:
Validation is an integral part of model development
Verification: building the model correctly, i.e., the conceptual model is correctly implemented, with good input and structure
Validation: building the correct model, i.e., an accurate representation of the real system
Most methods are informal, subjective comparisons, while a few are formal statistical procedures
Slide 4: Model-Building Process Overview
The Complete Process with Feedback Loops:
Real System ↔ Conceptual Model ↔ Operational Model (Computerized representation)
Key Validation/Verification Points:
Conceptual validation: Real system ↔ Conceptual model
Calibration and validation: Real system ↔ Operational model
Model verification: Conceptual model ↔ Operational model
Conceptual Model Components:
1. Assumptions on system components
2. Structural assumptions, which define the interactions between system components
3. Input parameters and data assumptions
Slide 5: Steps in Model-Building
Step 1: Real System Analysis
Observe the real system
Study interactions among the components
Collect data on system behavior
Step 2: Conceptual Model
Construction of a conceptual model based on observations
Step 3: Simulation Program
Implementation of an operational model
Translation of conceptual model into executable code
Slide 6: Verification - Definition and Purpose
Purpose: Ensure the conceptual model is reflected accurately in the computerized representation
Key Verification Activities:
Code review and debugging
Logic verification
Input/output validation
Documentation completeness
Animation and visualization checks
Slide 7: Common-Sense Verification Suggestions
Practical Verification Techniques:
Have someone else check the model (independent review)
Make a flow diagram that includes each logically possible action a system can take when an event occurs
Closely examine the model output for reasonableness under a variety of input parameter settings
Print the input parameters at the end of the simulation, make sure they have not been changed inadvertently
Make the operational model as self-documenting as possible
If the operational model is animated, verify that what is seen in the animation imitates the actual system
Use the debugger
If possible, use a graphical representation of the model
Slide 8: Examination of Model Output for Reasonableness
Two Key Statistics for Quick Model Assessment:
Current Content:
The number of items in each component of the system at a given time
Total Counts:
Total number of items that have entered each component of the system by a given time
Additional Analysis:
Compute certain long-run measures of performance
Compare long-run server utilization with simulation results
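As an illustration (not from the lecture), a minimal Python sketch of such a check: the arrival rate, mean service time, and simulated utilization are assumed placeholder values, and for a single-server queue the long-run utilization should approach ρ = λ·E[S].

```python
# Reasonableness check: long-run utilization of a single server should
# approach rho = lambda * E[S]. All numbers below are assumed placeholders.
arrival_rate = 45 / 60      # customers per minute (assumed)
mean_service = 1.1          # mean service time in minutes (assumed)

rho = arrival_rate * mean_service   # analytic long-run utilization, 0.825
simulated_util = 0.84               # busy time / run length reported by the model (placeholder)

print(f"analytic rho = {rho:.3f}, simulated = {simulated_util:.3f}")
assert abs(simulated_util - rho) < 0.05, "utilization far from the analytic value"
```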
Slide 9: Complex Queue Network Analysis
Model of Complex Network of Queues with Many Service Centers:
Warning Signs to Watch For:
Unstable Queue: If the current content grows in a more or less linear fashion as the simulation run time increases, a queue is likely unstable
No Entry Indication: If the total count for some subsystem is zero, no items have entered that subsystem, a highly suspect occurrence
Resource Capture: If the total count and the current count both equal one, an entity may have captured a resource and never freed it
Slide 10: Documentation Requirements
Documentation: A Means of Clarifying Logic and Verifying Completeness
Essential Documentation Elements:
Comment the operational model thoroughly
Definition of all variables (including default values)
Definition of all constants (including default values)
Functions and parameters documentation
Relationship of objects
Important: Default values should be explained!
Slide 11: Trace Implementation
Trace Definition: A trace is a detailed printout of the state of the simulation model over time
Implementation Considerations:
Can be very labor intensive if the programming language does not support statistics collection
Labor can be reduced by a centralized tracing mechanism
In an object-oriented simulation framework, trace support can be integrated into the class hierarchy
New classes then need to add only a little code for trace support, as the sketch below illustrates
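A minimal sketch of such a centralized mechanism in Python (all class names are hypothetical): trace formatting and output live in one base class, so subclasses get tracing for free.

```python
# Minimal sketch of a centralized tracer (all class names are hypothetical):
# formatting and output are defined once in a mixin, so every simulation
# class that inherits it gets trace support with no extra code.
class TraceMixin:
    trace_enabled = True    # one global switch for the whole model

    def trace(self, clock, message):
        if TraceMixin.trace_enabled:
            print(f"{clock:8.2f}  {type(self).__name__:<12} {message}")

class Server(TraceMixin):
    def start_service(self, clock, customer):
        self.trace(clock, f"starts serving customer {customer}")

class WaitingLine(TraceMixin):
    def enqueue(self, clock, customer):
        self.trace(clock, f"customer {customer} joins the line")

Server().start_service(3.0, 1)    # ->  3.00  Server       starts serving customer 1
WaitingLine().enqueue(12.0, 3)    # -> 12.00  WaitingLine  customer 3 joins the line
```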
Slide 12: Trace Example - Simple Queue
Simple Queue from Chapter 2: A trace over the time interval [0, 16] allows the results to be checked by the pen-and-paper method
Definition of Variables:
CLOCK = Simulation clock
EVTYP = Event type (Start, Arrival, Departure, Stop)
NCUST = Number of customers in system at time CLOCK
STATUS = Status of server (1=busy, 0=idle)
Slide 13: Trace Example - System States
State of System Just After the Named Event Occurs:
CLOCK  EVTYP    NCUST  STATUS  Notes
    0  Start        0       0  System initialization
    3  Arrival      1       0  Customer arrives, server idle
    5  Depart       0       0  Customer departs
   11  Arrival      1       0  New customer arrives
   12  Arrival      2       1  Second customer, server now busy
   16  Depart       1       1  One customer departs, server still busy
Critical Issue Identified: At CLOCK = 3 and CLOCK = 11, there is a customer in the system (NCUST=1),
but the server status is 0 (idle). This indicates a potential logic error in the model - the server should be
busy when there are customers to serve.
Verification Value: This trace allows manual verification and identification of model bugs that might not
be obvious from summary statistics alone.
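A minimal Python sketch of the same queue with the corrected server-status logic; the arrival times (3, 11, 12) and service times (2, 5, 4) are hard-coded assumptions chosen to reproduce the interval above. Comparing its trace line by line with the buggy one isolates the error at CLOCK = 3 and CLOCK = 11.

```python
import heapq

# Arrival and service times hard-coded (an assumption) to reproduce the
# interval [0, 16] above; a real model would sample them from distributions.
arrivals = [3, 11, 12]
service_times = iter([2, 5, 4])

def trace(clock, evtyp, ncust, status):
    print(f"{clock:>5}  {evtyp:<8}{ncust:>5}{status:>8}")

print(f"{'CLOCK':>5}  {'EVTYP':<8}{'NCUST':>5}{'STATUS':>8}")
events = [(t, 'Arrival') for t in arrivals]   # future event list as a heap
heapq.heapify(events)
ncust, status = 0, 0
trace(0, 'Start', ncust, status)
while events:
    clock, evtyp = heapq.heappop(events)
    if clock > 16:
        break
    if evtyp == 'Arrival':
        ncust += 1
        if status == 0:
            status = 1   # corrected logic: the buggy model left STATUS at 0 here
            heapq.heappush(events, (clock + next(service_times), 'Depart'))
    else:  # 'Depart'
        ncust -= 1
        if ncust > 0:    # begin serving the next waiting customer
            heapq.heappush(events, (clock + next(service_times), 'Depart'))
        else:
            status = 0
    trace(clock, evtyp, ncust, status)
trace(16, 'Stop', ncust, status)
```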
Slide 14: Calibration and Validation - Definitions
Validation: The overall process of comparing the model and its behavior to the real system
Calibration: The iterative process of comparing the model to the real system and making adjustments
Two Types of Comparison:
Subjective tests: People who are knowledgeable about the system
Objective tests: Requires data on the real system's behavior and the output of the model
Slide 15: Calibration Challenges and Solutions
Danger During the Calibration Phase:
Typically few data sets are available, in the worst case only one
The model is only validated for these specific datasets
Solution:
If possible collect new data sets for broader validation
Key Principle:
No model is ever a perfect representation of the system
The modeler must weigh the possible, but not guaranteed, increase in model accuracy versus the cost of increased validation effort
Slide 16: Three-Step Validation Approach
Comprehensive Validation Strategy:
Step 1: Build a model that has high face validity
Step 2: Validate model assumptions
Step 3: Compare the model input-output transformations with the real system's data
Slide 17: Step 1 - High Face Validity
Ensure a High Degree of Realism:
Potential users should be involved in model construction from its conceptualization to its
implementation
Sensitivity analysis can also be used to check a model's face validity
Example: In most queueing systems, if the arrival rate of customers were to increase, it would be expected that server utilization, queue length, and delays would tend to increase
Practical Considerations:
For large-scale simulation models, there are many input variables and thus possibly many sensitivity tests
Sometimes it is not possible to perform all of these tests; select the most critical ones
Slide 18: Step 2 - Validate Model Assumptions
General Classes of Model Assumptions:
Structural Assumptions: How the system operates
Data Assumptions: Reliability of data and its statistical analysis
Bank Example - Customer Queueing and Service Facility:
Structural Assumptions:
Customer waiting in one line versus many lines
Customers are served according to FCFS versus priority
Data Assumptions:
Interarrival time of customers
Service times for commercial accounts
Verify data reliability with bank managers
Test correlation and goodness of fit for data
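A sketch of both checks in Python with scipy (assumption: synthetic data stands in for the 90 recorded interarrival times): a lag-1 autocorrelation as a simple independence check, and a Kolmogorov-Smirnov test for the exponential fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical stand-in for the observed interarrival times (minutes);
# real validation would use the 90 recorded values.
interarrivals = rng.exponential(scale=60 / 45, size=90)

# Lag-1 autocorrelation: should be near 0 if the observations are independent
x = interarrivals
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]

# Kolmogorov-Smirnov test against the fitted exponential distribution.
# Note: estimating the scale from the same data biases the KS test
# (a Lilliefors-type correction would be more exact).
loc, scale = stats.expon.fit(x, floc=0)
ks_stat, p_value = stats.kstest(x, 'expon', args=(loc, scale))

print(f"lag-1 autocorrelation: {r1:.3f}")
print(f"KS statistic: {ks_stat:.3f}, p-value: {p_value:.3f}")
```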
Slide 19: Step 3 - Input-Output Transformation Validation
Goal: Validate the model's ability to predict future behavior
Key Characteristics:
The only objective test of the model
The structure of the model should be accurate enough to make good predictions for the range of input data sets of interest
One possible approach: use historical data that have been reserved for validation purposes only
Criteria: use the main responses of interest
Model as Input-Output Transformation:
System Input → System Output
Model Input → Model Output
Slide 20: Bank Example - System Description
Example: One drive-in window serviced by one teller; only one or two transactions are allowed
Data Collection:
90 customers observed between 11am and 1pm
Observed service times {Si, i =1,2, ..., 90}
Observed interarrival times {Ai, i =1,2, ..., 90}
Data Analysis Results:
Interarrival times: exponentially distributed with rate λ = 45/hour
Service times: N(1.1, 0.2²)
Slide 21: Bank Example - The Black Box Model
Model Development Process:
A model was developed in close consultation with bank management and employees
Model assumptions were validated
Resulting model is now viewed as a "black box"
Black Box Input-Output Structure:
[Input Variables] → [Model "black box" f(X,D) = Y] → [Model Output Variables, Y]
Input Variables:
Uncontrolled variables, X:
Poisson arrivals λ = 45/hr: X₁₁, X₁₂, ...
Service times, N(D₂, 0.2²): X₂₁, X₂₂, ...
Controlled Decision variables, D:
D₁ = 1 (one teller)
D₂ = 1.1 min (mean service time)
D₃ = 1 (one line)
Model Function: f(X,D) = Y
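A minimal Python sketch of this black box under stated assumptions: the function name bank_model is hypothetical, the queue is a single-teller FCFS line evaluated with the Lindley recurrence, and the inputs X are generated from the fitted distributions above (they can also be supplied externally, which is reused for the historically driven run on the last slide).

```python
import numpy as np

def bank_model(D2=1.1, horizon=120.0, interarrivals=None, services=None, rng=None):
    """Hypothetical sketch of the black box f(X, D) = Y: one teller, FCFS.
    interarrivals/services may be supplied (e.g. historical data, in minutes)
    or are generated internally from the fitted distributions."""
    rng = rng or np.random.default_rng()
    if interarrivals is None:
        interarrivals = rng.exponential(scale=60 / 45, size=500)   # X1: exponential, rate 45/hr
    if services is None:
        services = np.maximum(rng.normal(D2, 0.2, size=500), 0.0)  # X2: N(D2, 0.2^2), truncated at 0

    arrivals = np.cumsum(interarrivals)
    delays, w = [], 0.0
    for i, t in enumerate(arrivals):
        if t > horizon or i + 1 >= len(interarrivals):
            break
        delays.append(w)
        # Lindley recurrence: next delay = max(0, current delay + service - next interarrival)
        w = max(0.0, w + services[i] - interarrivals[i + 1])
    n = len(delays)
    Y4 = n / (horizon / 60)              # observed arrival rate [1/hour]
    Y5 = float(np.mean(services[:n]))    # average service time [minutes]
    Y2 = float(np.mean(delays))          # average delay [minutes]
    return Y4, Y5, Y2

# Six independent replications of 2 hours each, as on Slide 24
for rep in range(1, 7):
    Y4, Y5, Y2 = bank_model(rng=np.random.default_rng(rep))
    print(f"replication {rep}: Y4 = {Y4:5.1f}/hr, Y5 = {Y5:.2f} min, Y2 = {Y2:.2f} min")
```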
Slide 22: Bank Example - Output Variables
Model Output Variables, Y:
Primary Interest (Main Performance Measures):
Y₁ = teller's utilization
Y₂ = average delay
Y₃ = maximum line length
Secondary Interest (Supporting Measures):
Y₄ = observed arrival rate
Y₅ = average service time
Y₆ = sample std. dev. of service times
Y₇ = average length of the waiting line
Focus for Validation: Primary interest variables (Y₁, Y₂, Y₃) are most critical for decision-making and
should receive priority in validation testing.
Slide 23: Bank Example - Real System Comparison
Real System Data Requirements:
Real system data are necessary for validation
System responses should have been collected during the same time period (from 11am to 1pm on the same day)
Comparison Focus:
Compare average delay from the model Y₂ with actual delay Z₂
Average delay observed Z₂ = 4.3 minutes
Consider this to be the true mean value μ₀ = 4.3
When the model is run with generated random variates X₁ₙ and X₂ₙ, Y₂ should be close to Z₂
Slide 24: Bank Example - Simulation Results
Six Statistically Independent Replications: Each of 2-hour duration
Replication   Y₄ Arrivals/Hour   Y₅ Service Time [min]   Y₂ Average Delay [min]
    1              51.0                 1.07                     2.79
    2              40.0                 1.12                     1.12
    3              45.5                 1.06                     2.24
    4              50.5                 1.10                     3.45
    5              53.0                 1.09                     3.13
    6              49.0                 1.07                     2.38
Sample Statistics:
Sample mean [Delay]: 2.51
Standard deviation [Delay]: 0.82
Slide 25: Hypothesis Testing Framework
Compare Average Delay from Model Y₂ with Actual Delay Z₂
Null Hypothesis Testing: Evaluate whether the simulation and the real system are the same (w.r.t. output measures)
Hypotheses:
H₀: E(Y₂) = 4.3 minutes
H₁: E(Y₂) ≠ 4.3 minutes
Decision Rules:
If H₀ is not rejected, then there is no reason to consider the model invalid
If H₀ is rejected, the current version of the model is rejected, and the modeler needs to improve the model
Slide 26: Hypothesis Testing - Calculations
Conduct the t-test:
Parameters:
Level of significance (α = 0.05) and sample size (n = 6)
Sample Statistics:
Sample mean: Ȳ₂ = (1/n)∑Yᵢ₂ = 2.51 minutes
Sample standard deviation: S₂ = √[(1/(n-1))∑(Yᵢ₂ - Ȳ₂)²] = 0.82 minutes
Test Statistic: t₀ = (Ȳ₂ - μ₀)/(S/√n) = (2.51 - 4.3)/(0.82/√6) = -5.34
Critical Value: t₀.₀₂₅,₅ = 2.571 (for a 2-sided test)
Decision: |t₀| = 5.34 > 2.571, hence reject H₀
Conclusion: The model is inadequate
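The same computation as a short Python sketch, cross-checked against scipy's one-sample t-test (small differences from the slide's -5.34 come from using unrounded sample statistics):

```python
import numpy as np
from scipy import stats

delays = np.array([2.79, 1.12, 2.24, 3.45, 3.13, 2.38])  # Y2 from the six replications
mu0 = 4.3                                                 # observed real-system delay Z2

t0 = (delays.mean() - mu0) / (delays.std(ddof=1) / np.sqrt(len(delays)))
t_crit = stats.t.ppf(1 - 0.025, df=len(delays) - 1)

print(f"t0 = {t0:.2f}, critical value = {t_crit:.3f}")   # -5.32 vs 2.571
print("reject H0" if abs(t0) > t_crit else "fail to reject H0")

# Cross-check with scipy's built-in one-sample t-test
t_scipy, p = stats.ttest_1samp(delays, mu0)
print(f"scipy: t = {t_scipy:.2f}, p = {p:.4f}")
```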
Slide 27: Additional Comparisons and Assumptions
Similarly, Compare Model Output with Observed Output:
Y₄ ↔ Z₄ (arrival rates)
Y₅ ↔ Z₅ (service times)
Y₆ ↔ Z₆ (service time variability)
Check t-test Assumptions:
The observations Yᵢ₂ are normally and independently distributed
This assumption must be verified for the test to be valid
Slide 28: Power of a Test
Power Definition: The power of a test is the probability of detecting an invalid model
Formula: Power = 1 − P(failing to reject H₀ | H₁ is true) = 1 − P(Type II error) = 1 − β
For Validation: To treat failure to reject H₀ as a strong conclusion, the modeler would want β to be small
Factors Affecting β:
Sample size n
The true difference, δ, between E(Y) and μ₀
Where δ = |E(Y) - μ₀|/σ
Slide 29: Operating Characteristics and Sample Size
Best Approach to Control β:
Specify the critical difference, δ
Choose a sample size, n, by making use of the operating characteristics curve (OC curve)
Operating Characteristics Curve (OC curve):
Graphs of the probability of a Type II Error β(δ) versus δ for a given sample size n
Key Insight: To keep the same error probability while detecting a smaller difference δ, the required sample size increases!
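A sketch of how points on an OC curve can be computed, assuming the two-sided one-sample t-test: β(δ) is the probability that the test statistic lands in the acceptance region, obtained from the noncentral t distribution (scipy.stats.nct).

```python
import numpy as np
from scipy import stats

def beta(delta, n, alpha=0.05):
    """P(Type II error) for the two-sided one-sample t-test,
    with standardized difference delta = |E(Y) - mu0| / sigma."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    nc = delta * np.sqrt(n)   # noncentrality parameter under H1
    return stats.nct.cdf(t_crit, df, nc) - stats.nct.cdf(-t_crit, df, nc)

# One point per delta on the OC curves for several sample sizes:
# for a fixed beta, detecting a smaller delta requires a larger n
for n in (6, 10, 20):
    row = "  ".join(f"beta({d}) = {beta(d, n):.2f}" for d in (0.5, 1.0, 1.5))
    print(f"n = {n:2d}:  {row}")
```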
Error Types Summary:
Statistical Terminology                          Modeling Terminology                 Associated Risk
Type I: rejecting H₀ when H₀ is true             Rejecting a valid model              α
Type II: failure to reject H₀ when H₁ is true    Failure to reject an invalid model   β
Trade-off: For a fixed sample size n, increasing α will decrease β
Slide 30: Confidence Interval Testing & Alternative Approaches
Confidence Interval Testing: Evaluate whether the simulation and the real system performance measures are close enough
If Y is the simulation output and μ = E(Y), the confidence interval (CI) for μ is:
CI = [Ȳ - t_{α/2,n-1} × S/√n, Ȳ + t_{α/2,n-1} × S/√n]
Decision Framework:
ε = difference value chosen by the analyst, small enough to allow valid decisions based on simulations
μ₀ = the unknown true value
Decision Rules:
When CI does not contain μ₀:
If the best-case error is > ε, model needs to be refined
If the worst-case error is ≤ ε, accept the model
If best-case error is ≤ ε, additional replications are necessary
When CI contains μ₀:
If either the best-case or worst-case error is > ε, additional replications are necessary
If the worst-case error is ≤ ε, accept the model
Bank Example Application:
μ₀ = 4.3, and "close enough" is ε = 1 minute of expected customer delay
95% confidence interval, based on 6 replications: [1.65, 3.37]
Calculation: 2.51 ± 2.571 × (0.82/√6)
μ₀ = 4.3 falls outside the confidence interval
Best case |3.37 - 4.3| = 0.93 < 1, but worst case |1.65 - 4.3| = 2.65 > 1
Conclusion: Additional replications are needed to reach a decision
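The decision logic above in a short Python sketch (minor differences from the slide's [1.65, 3.37] come from using unrounded sample statistics):

```python
import numpy as np
from scipy import stats

delays = np.array([2.79, 1.12, 2.24, 3.45, 3.13, 2.38])
mu0, eps, alpha = 4.3, 1.0, 0.05

n = len(delays)
half = stats.t.ppf(1 - alpha / 2, n - 1) * delays.std(ddof=1) / np.sqrt(n)
lo, hi = delays.mean() - half, delays.mean() + half   # ~[1.66, 3.38]

best = min(abs(lo - mu0), abs(hi - mu0))    # best-case error
worst = max(abs(lo - mu0), abs(hi - mu0))   # worst-case error
print(f"CI = [{lo:.2f}, {hi:.2f}], best = {best:.2f}, worst = {worst:.2f}")

if lo <= mu0 <= hi:
    print("accept the model" if worst <= eps else "more replications needed")
else:
    if best > eps:
        print("model needs refinement")
    elif worst <= eps:
        print("accept the model")
    else:
        print("more replications needed")   # the bank example lands here
```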
Alternative Validation Approaches
Using Historical Output Data:
Use the actual historical record to drive the simulation model
Compare model output to system data
In the bank example, use recorded interarrival and service times {Aₙ, Sₙ, n = 1,2,...}
Validation process similar to system-generated input data approach
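Sketch of this approach, reusing the hypothetical bank_model from the Slide 21 sketch; recorded_A and recorded_S are placeholders for the 90 observed values.

```python
# Assumes bank_model from the Slide 21 sketch is defined. Drive the model
# with the recorded interarrival and service times {A_n, S_n} instead of
# generated variates; the lists below are placeholders for the 90 values.
recorded_A = [1.4, 0.2, 3.1]   # ... observed interarrival times (minutes)
recorded_S = [1.0, 1.3, 0.9]   # ... observed service times (minutes)

Y4, Y5, Y2 = bank_model(interarrivals=recorded_A, services=recorded_S)
print(f"historically driven run: Y2 = {Y2:.2f} min (compare with Z2 = 4.3)")
```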
Using a Turing Test:
Use in addition to statistical tests, or when no statistical test is readily applicable
Utilize persons' knowledge about the system
Example: Present 10 system performance reports to a manager: five from the real system and five "fake" reports from the simulation
If the person identifies a substantial number of the fake reports, interview them to obtain information for model improvement
If the person cannot consistently distinguish real from fake reports, conclude there is no evidence of model inadequacy
Turing Test Background: Described by Alan Turing in 1950. A human judge is involved in a natural
language conversation with a human and a machine. If the judge cannot reliably tell which partner is the
machine, then the machine has passed the test.
Summary - Model Validation is Essential:
Key Components:
Model verification ensures correct implementation
Calibration and validation ensure correct representation
Conceptual validation through expert involvement
Best practice: compare system data to model data using a wide variety of techniques
Techniques Covered:
Ensure high face validity by consulting knowledgeable persons
Conduct simple statistical tests on assumed distributional forms
Conduct a Turing test
Compare model output to system output by statistical tests
Use confidence interval testing for practical decision-making
Consider power analysis for adequate sample sizes
Final Principle: The goal is not perfect accuracy, but sufficient credibility for sound decision-making
within the intended scope of the model.