
Chapter 9

Input Modeling

Banks, Carson, Nelson & Nicol

Discrete-Event System Simulation

Purpose & Overview

Input models provide the driving force for a simulation model.
The quality of the output is no better than the quality of the inputs.
In this chapter, we will discuss the 4 steps of input model development:
1. Collect data from the real system
2. Identify a probability distribution to represent the input process
3. Choose parameters for the distribution
4. Evaluate the chosen distribution and parameters for goodness of fit

Data Collection

- One of the biggest tasks in solving a real problem. GIGO: garbage in, garbage out.
- Suggestions that may enhance and facilitate data collection:
  - Plan ahead: begin with a practice or pre-observing session, and watch for unusual circumstances.
  - Analyze the data as they are being collected: check their adequacy.
  - Combine homogeneous data sets, e.g., successive time periods, or the same time period on successive days.
  - Be aware of data censoring: the quantity is not observed in its entirety, so there is a danger of leaving out long process times.
  - Check for relationships between variables, e.g., by building a scatter diagram.
  - Check for autocorrelation.
  - Collect input data, not performance data.
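The "check for autocorrelation" suggestion can be made concrete with the sample lag-h autocorrelation of the observations, kept in the order they were collected. The sketch below assumes the common estimator that divides the lag-h sum of cross-products by the total sum of squared deviations; the function name `lag_autocorrelation` is hypothetical.

```python
def lag_autocorrelation(xs, h=1):
    """Sample lag-h autocorrelation of observations kept in collection order.

    r_h = sum_{t=1}^{n-h} (x_t - mean)(x_{t+h} - mean) / sum_t (x_t - mean)^2
    """
    n = len(xs)
    if not 0 < h < n:
        raise ValueError("lag h must satisfy 0 < h < len(xs)")
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant data: autocorrelation is undefined")
    num = sum(dev[t] * dev[t + h] for t in range(n - h))
    return num / denom


# Alternating values are strongly negatively correlated at lag 1.
print(round(lag_autocorrelation([1, 2, 1, 2, 1, 2], h=1), 4))  # -0.8333
```

Values close to zero at small lags are roughly consistent with independent observations; values far from zero suggest that a time-series input model may be needed.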

Identifying the Distribution

Histograms
Selecting families of distribution
Parameter estimation
Goodness-of-fit tests
Fitting a non-stationary process

Histograms [Identifying the distribution]

- A frequency distribution or histogram is useful in determining the shape of a distribution.
- The number of class intervals depends on:
  - the number of observations
  - the dispersion of the data
  - Suggested: the square root of the sample size
- For continuous data, the histogram corresponds to the probability density function of a theoretical distribution.
- For discrete data, it corresponds to the probability mass function.
- If few data points are available, combine adjacent cells to eliminate the ragged appearance of the histogram.
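A minimal sketch of building a histogram for continuous data with the square-root rule for the number of class intervals. It assumes equal-width intervals spanning the observed range (the last interval closed so the maximum is counted); `histogram_counts` is a hypothetical name, and the data are the 15 door installation times listed in the q-q plot example of this chapter.

```python
import math


def histogram_counts(data, k=None):
    """Count observations in k equal-width class intervals.

    If k is not given, use the rule of thumb k = round(sqrt(n)).
    Intervals are [lo, lo + w), [lo + w, lo + 2w), ..., and the last one is
    closed so that the largest observation is counted.
    """
    n = len(data)
    if k is None:
        k = max(1, round(math.sqrt(n)))
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # Interval index of x; clamp the maximum into the last cell.
        idx = k - 1 if width == 0 else min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return lo, width, counts


door_times = [99.55, 99.56, 99.62, 99.65, 99.79, 99.98, 100.02, 100.06,
              100.17, 100.23, 100.26, 100.27, 100.33, 100.41, 100.47]
lo, width, counts = histogram_counts(door_times)
print(len(counts), counts)  # 4 [4, 2, 4, 5]
```

With only 15 observations the result is coarse, which is exactly the situation where the slide suggests combining cells rather than refining them.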

Histograms [Identifying the distribution]

Vehicle Arrival Example: the number of vehicles arriving at an intersection between 7:00 a.m. and 7:05 a.m. was monitored for 100 random workdays.

Arrivals per Period   Frequency
0                     12
1                     10
2                     19
3                     17
4                     10
5                     8
6                     7
7                     5
8                     5
9                     3
10                    3
11                    1

[Figure: histograms of the same data with different interval sizes]

There are ample data, so the histogram may have a cell for each possible value in the data range.
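To see how the same data look with different interval sizes, the sketch below regroups the frequencies from the table above into wider cells and prints a simple text histogram. The helper names `regroup` and `text_histogram` are hypothetical.

```python
# Vehicle Arrival Example: frequency of each arrivals-per-period value (0..11).
freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
        6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}


def regroup(freq, width):
    """Merge adjacent integer values into cells of the given width.

    Returns a list of ((first_value, last_value), total_frequency) pairs.
    """
    lo, hi = min(freq), max(freq)
    cells = []
    for start in range(lo, hi + 1, width):
        end = min(start + width - 1, hi)
        total = sum(freq.get(v, 0) for v in range(start, end + 1))
        cells.append(((start, end), total))
    return cells


def text_histogram(cells):
    """Print one row per cell with one '*' per observation."""
    for (a, b), total in cells:
        label = str(a) if a == b else f"{a}-{b}"
        print(f"{label:>5} | {'*' * total}")


text_histogram(regroup(freq, 1))  # one cell per possible value
text_histogram(regroup(freq, 2))  # same data, cells twice as wide
```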

Selecting the Family of Distributions
[Identifying the distribution]

A family of distributions is selected based on:
- the context of the input variable
- the shape of the histogram
Frequently encountered distributions:
- Easier to analyze: exponential, normal, and Poisson
- Harder to analyze: beta, gamma, and Weibull

Selecting the Family of Distributions

[Identifying the distribution]

Use the physical basis of the distribution as a guide, for example:
- Binomial: number of successes in n trials
- Poisson: number of independent events that occur in a fixed amount of time or space
- Normal: distribution of a process that is the sum of a number of component processes
- Exponential: time between independent events, or a process time that is memoryless
- Weibull: time to failure for components
- Discrete or continuous uniform: models complete uncertainty
- Triangular: a process for which only the minimum, most likely, and maximum values are known
- Empirical: resamples from the actual data collected
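The physical descriptions above translate directly into ways of generating values. This sketch uses Python's standard `random` module; the binomial, Poisson, and normal draws are built from their physical descriptions (counting successes, counting exponential interarrivals in a fixed interval, summing components) rather than from library routines. All parameter values are arbitrary illustrations, except that the Poisson rate reuses the vehicle-arrival sample mean of 3.64.

```python
import random

rng = random.Random(2024)  # seeded so the draws are reproducible


def binomial(n, p):
    """Number of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))


def poisson_count(rate, horizon=1.0):
    """Number of events in [0, horizon] when times between independent events
    are exponential with the given rate; this count is Poisson(rate * horizon)."""
    count, t = 0, rng.expovariate(rate)
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)
    return count


def sum_of_components(k=12):
    """Sum of k independent uniform(0, 1) components, centred at 0.
    By the central limit theorem it is approximately normal (variance k/12)."""
    return sum(rng.random() for _ in range(k)) - k / 2


observed = [99.55, 99.56, 99.62, 99.65, 99.79]  # a few door installation times

draws = {
    "binomial": binomial(10, 0.3),
    "poisson": poisson_count(3.64),               # e.g., arrivals per period
    "normal": sum_of_components(),
    "exponential": rng.expovariate(1 / 2.0),      # mean time between events 2.0
    "weibull": rng.weibullvariate(5.0, 1.5),      # scale 5.0, shape 1.5
    "uniform": rng.uniform(0.0, 10.0),
    "triangular": rng.triangular(1.0, 9.0, 3.0),  # min, max, most likely
    "empirical": rng.choice(observed),            # resample collected data
}
for name, value in draws.items():
    print(f"{name:>11}: {value}")
```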
Selecting the Family of Distributions
[Identifying the distribution]

Remember the physical characteristics of the process:
- Is the process naturally discrete or continuous valued?
- Is it bounded?
There is no “true” distribution for any stochastic input process.
Goal: obtain a good approximation.

Quantile-Quantile Plots [Identifying the distribution]

The q-q plot is a useful tool for evaluating distribution fit.

If X is a random variable with cdf F, then the q-quantile of X is the $\gamma$ such that
$$F(\gamma) = P(X \le \gamma) = q, \qquad 0 < q < 1$$

When F has an inverse, $\gamma = F^{-1}(q)$.

Let $\{x_i,\ i = 1, 2, \ldots, n\}$ be a sample of data from X and $\{y_j,\ j = 1, 2, \ldots, n\}$ be the observations in ascending order. Then
$$y_j \approx F^{-1}\!\left(\frac{j - 0.5}{n}\right)$$

where j is the ranking or order number.
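The quantile definition is easiest to see with a distribution whose cdf has a closed-form inverse. This sketch assumes an exponential F with rate $\lambda$ (mean $1/\lambda$), for which $F^{-1}(q) = -\ln(1-q)/\lambda$; the rate 0.5 and sample size 5 are arbitrary, and the function names are hypothetical.

```python
import math


def exp_cdf(x, lam):
    """cdf of the exponential distribution with rate lam (mean 1/lam)."""
    return -math.expm1(-lam * x) if x > 0 else 0.0


def exp_quantile(q, lam):
    """q-quantile gamma = F^{-1}(q) = -ln(1 - q) / lam, for 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return -math.log1p(-q) / lam


n, lam = 5, 0.5
for j in range(1, n + 1):
    q = (j - 0.5) / n             # plotting position of the j-th ordered value
    gamma = exp_quantile(q, lam)  # y_j should be near this if F fits
    print(j, round(q, 2), round(gamma, 3), round(exp_cdf(gamma, lam), 2))
```

The last column recovers q, confirming that $F(F^{-1}(q)) = q$.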

Quantile-Quantile Plots [Identifying the distribution]

The plot of $y_j$ versus $F^{-1}((j - 0.5)/n)$ is:
- approximately a straight line if F is a member of an appropriate family of distributions;
- a line with slope 1 if F is a member of an appropriate family of distributions with appropriate parameter values.

Quantile-Quantile Plots [Identifying the distribution]

Example: Check whether the door installation times follow a normal distribution.
The observations are now ordered from smallest to largest:

j   Value     j   Value     j   Value
1   99.55     6   99.98     11  100.26
2   99.56     7   100.02    12  100.27
3   99.62     8   100.06    13  100.33
4   99.65     9   100.17    14  100.41
5   99.79     10  100.23    15  100.47

The values $y_j$ are plotted versus $F^{-1}((j - 0.5)/n)$, where F is a normal distribution with the sample mean (99.99 sec) and sample variance ($0.2832^2$ sec²).
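A minimal sketch of the q-q pairs for this example using `statistics.NormalDist` from Python's standard library. It assumes the stated sample variance means $0.2832^2$ sec², i.e., a standard deviation of 0.2832 sec. The least-squares slope is a hypothetical numeric aid for judging the line, not part of the original procedure.

```python
from statistics import NormalDist

# Door installation times (sec), sorted from smallest to largest.
y = [99.55, 99.56, 99.62, 99.65, 99.79, 99.98, 100.02, 100.06,
     100.17, 100.23, 100.26, 100.27, 100.33, 100.41, 100.47]

# Hypothesized F: normal with mean 99.99 sec and standard deviation 0.2832 sec.
F = NormalDist(mu=99.99, sigma=0.2832)
n = len(y)

# Pair each model quantile F^{-1}((j - 0.5)/n) with the ordered observation y_j.
qq = [(F.inv_cdf((j - 0.5) / n), y_j) for j, y_j in enumerate(y, start=1)]


def least_squares_slope(points):
    """Slope of the least-squares line through (x, y) points."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in points)
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


for x_q, y_j in qq:
    print(f"{x_q:8.3f}  {y_j:7.2f}")
print("slope:", round(least_squares_slope(qq), 3))
```

Points near a straight line support the normal family; a slope near 1 additionally supports the chosen parameter values.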

Quantile-Quantile Plots [Identifying the distribution]

Example (continued): Check whether the door installation times follow a normal distribution.

[Figure: q-q plot of the door installation times, whose points lie close to a straight line, supporting the hypothesis of a normal distribution; histogram with the superimposed density function of the normal distribution]

Quantile-Quantile Plots [Identifying the distribution]

Consider the following while evaluating the linearity of a q-q plot:
- The observed values never fall exactly on a straight line.
- The ordered values are ranked and hence not independent, so the points are unlikely to be scattered randomly about the line.
- The variance of the extremes is higher than that of the middle, so linearity of the points in the middle of the plot is more important.
The q-q plot can also be used to check homogeneity:
- Check whether a single distribution can represent both sample sets.
- Plot the ordered values of the two data samples against each other.
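The homogeneity check above pairs the ordered values of two samples. This sketch assumes both samples have the same size (unequal sizes would need interpolation, which is omitted); the service times are hypothetical data for illustration only.

```python
def two_sample_qq(a, b):
    """Pair the ordered values of two equal-size samples for a q-q plot.

    Points close to the line y = x suggest that a single distribution
    can represent both samples.
    """
    if len(a) != len(b):
        raise ValueError("this simple sketch assumes equal sample sizes")
    return list(zip(sorted(a), sorted(b)))


# Hypothetical service times (min) collected on two different days.
day1 = [4.2, 3.1, 5.0, 2.7, 3.9, 4.4]
day2 = [3.3, 4.8, 2.9, 4.1, 3.8, 4.6]
for x, y in two_sample_qq(day1, day2):
    print(f"{x:4.1f}  {y:4.1f}  diff={y - x:+.1f}")
```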

Parameter Estimation [Identifying the distribution]

Next step after selecting a family of distributions.

If observations in a sample of size n are $X_1, X_2, \ldots, X_n$ (discrete or continuous), the sample mean and variance are:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$$
If the data are discrete and have been grouped in a frequency distribution:
$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n}, \qquad S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$$

where $f_j$ is the observed frequency of value $X_j$ and k is the number of distinct values of X.
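Both formulas can be checked in a few lines. The sketch below implements them as written (the "computing formula" with $\sum X^2 - n\bar{X}^2$) and cross-checks against Python's `statistics` module; the data are hypothetical and the function names are illustrative.

```python
import statistics


def mean_var_raw(xs):
    """Sample mean and variance from raw observations (computing formula)."""
    n = len(xs)
    mean = sum(xs) / n
    var = (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)
    return mean, var


def mean_var_freq(values, freqs):
    """Sample mean and variance when value X_j was observed f_j times."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = (sum(f * x * x for x, f in zip(values, freqs)) - n * mean ** 2) / (n - 1)
    return mean, var


data = [2, 4, 4, 5, 7]                            # hypothetical observations
print(mean_var_raw(data))
print(mean_var_freq([2, 4, 5, 7], [1, 2, 1, 1]))  # same data, grouped
print(statistics.mean(data), statistics.variance(data))  # stdlib cross-check
```

The computing formula subtracts two large, nearly equal quantities when the mean is large relative to the spread, so it can lose floating-point precision; `statistics.variance` avoids that, which is why it is a useful cross-check.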

Parameter Estimation [Identifying the distribution]

When raw data are unavailable (data are grouped into class intervals), the approximate sample mean and variance are:
$$\bar{X} \approx \frac{\sum_{j=1}^{c} f_j m_j}{n}, \qquad S^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$$

where $f_j$ is the observed frequency in the jth class interval, $m_j$ is the midpoint of the jth interval, and c is the number of class intervals.

A parameter is an unknown constant, but an estimator is a statistic.
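A sketch of the class-interval approximation: each observation is replaced by the midpoint $m_j$ of its interval. The interval edges and frequencies below are hypothetical, and `mean_var_grouped` is an illustrative name.

```python
def mean_var_grouped(edges, freqs):
    """Approximate sample mean and variance from data grouped into class intervals.

    edges: the c + 1 interval boundaries; freqs: the c observed frequencies.
    """
    if len(edges) != len(freqs) + 1:
        raise ValueError("need exactly one more edge than frequencies")
    mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]  # m_j
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    var = (sum(f * m * m for f, m in zip(freqs, mids)) - n * mean ** 2) / (n - 1)
    return mean, var


# Hypothetical grouped data: intervals 0-2, 2-4, 4-6 with frequencies 3, 5, 2.
print(mean_var_grouped([0, 2, 4, 6], [3, 5, 2]))
```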

Parameter Estimation [Identifying the distribution]

Vehicle Arrival Example (continued): The table in the histogram example (Table 9.1 in the book) can be analyzed to obtain:
$$n = 100,\; f_1 = 12,\; X_1 = 0,\; f_2 = 10,\; X_2 = 1,\; \ldots,\qquad \sum_{j=1}^{k} f_j X_j = 364,\qquad \sum_{j=1}^{k} f_j X_j^2 = 2080$$

The sample mean and variance are
$$\bar{X} = \frac{364}{100} = 3.64, \qquad S^2 = \frac{2080 - 100\,(3.64)^2}{99} = 7.63$$

The histogram suggests that X has a Poisson distribution.

However, note that the sample mean is not equal to the sample variance, although a Poisson distribution has equal mean and variance.
Reason: each estimator is a random variable and is therefore not perfect.
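The arithmetic of the vehicle arrival example can be reproduced from the frequency table with the grouped-data formulas:

```python
# Vehicle Arrival Example (Table 9.1): arrivals per period and frequencies.
values = list(range(12))
freqs = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]

n = sum(freqs)
s1 = sum(f * x for x, f in zip(values, freqs))      # sum of f_j X_j
s2 = sum(f * x * x for x, f in zip(values, freqs))  # sum of f_j X_j^2
mean = s1 / n
var = (s2 - n * mean ** 2) / (n - 1)

print(n, s1, s2)                      # 100 364 2080
print(round(mean, 2), round(var, 2))  # 3.64 7.63
```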

Goodness-of-Fit Tests [Identifying the distribution]

Conduct hypothesis testing on the input data distribution using:
- the Kolmogorov-Smirnov test
- the chi-square test
No single correct distribution exists in a real application.
- If very little data are available, it is unlikely that any candidate distribution will be rejected.
- If a lot of data are available, it is likely that all candidate distributions will be rejected.
Chi-Square test [Goodness-of-Fit Tests]

Intuition: compare the histogram of the data to the shape of the candidate density or mass function.
Valid for large sample sizes when parameters are estimated by maximum likelihood.
By arranging the n observations into a set of k class intervals or cells, the test statistic is:
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency in the ith interval and $E_i = n p_i$ is the expected frequency, $p_i$ being the theoretical probability of the ith interval (suggested minimum $E_i = 5$).

The statistic approximately follows the chi-square distribution with k − s − 1 degrees of freedom, where s is the number of parameters of the hypothesized distribution estimated by the sample statistics.
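The test statistic, the degrees of freedom, and the minimum-expected-frequency check can be sketched as small helpers. The names are hypothetical and the example counts are made-up data for 100 observations against four equally likely cells (so no parameters are estimated, s = 0).

```python
def chi_square_statistic(observed, expected):
    """Chi-square statistic: sum over the k cells of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, s):
    """k cells and s parameters estimated from the data: k - s - 1."""
    return k - s - 1


def small_cells(expected, minimum=5.0):
    """Indices of cells whose expected frequency is below the suggested minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Hypothetical check of 100 observations against 4 equally likely cells.
obs, exp = [18, 22, 20, 40], [25.0] * 4
print(chi_square_statistic(obs, exp), degrees_of_freedom(len(obs), 0))
print(small_cells(exp))  # no cell needs combining here
```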

Chi-Square test [Goodness-of-Fit Tests]

The hypothesis of a chi-square test is:

H0: The random variable, X, conforms to the distributional
assumption with the parameter(s) given by the estimate(s).
H1: The random variable X does not conform.

Chi-Square test [Goodness-of-Fit Tests]

Vehicle Arrival Example (continued):

H0: the random variable is Poisson distributed.
H1: the random variable is not Poisson distributed.

The expected frequencies are $E_i = n\,p(x_i) = n\,\dfrac{e^{-\alpha}\alpha^{x_i}}{x_i!}$, with $\alpha$ estimated by $\bar{X} = 3.64$.

x_i    Observed Frequency, O_i   Expected Frequency, E_i   (O_i - E_i)^2 / E_i
0      12                          2.6                      7.87 (0 and 1 combined)
1      10                          9.6
2      19                         17.4                      0.15
3      17                         21.1                      0.80
4      10                         19.2                      4.41
5       8                         14.0                      2.57
6       7                          8.5                      0.26
7       5                          4.4                      11.62 (7 through ≥11 combined)
8       5                          2.0
9       3                          0.8
10      3                          0.3
≥11     1                          0.1
Total  100                        100.0                     27.68

Cells are combined because of the minimum expected frequency ($E_i \ge 5$), leaving k = 7 cells.

The degrees of freedom are k − s − 1 = 7 − 1 − 1 = 5; since $\chi_0^2 = 27.68 > \chi^2_{0.05,5} = 11.1$, the hypothesis is rejected at the 0.05 level of significance.
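A sketch of the whole computation for this example: estimate the Poisson mean, compute the expected frequencies (the last cell takes the whole upper tail), merge tail cells until every expected count reaches 5, and form the statistic. The observed frequencies are taken from the histogram table (Table 9.1). The merging rule is a simple assumption that only combines cells from the two ends, which matches the grouping in the table. Because the sketch keeps full precision for $E_i$ instead of rounding to one decimal, its statistic comes out near 27.5 rather than exactly 27.68; the 7 cells, 5 degrees of freedom, and the rejection are unchanged.

```python
import math

values = list(range(12))                         # 0..11; the last cell means ">= 11"
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
n = sum(observed)
alpha = sum(x * o for x, o in zip(values, observed)) / n  # Poisson MLE: 3.64

# p(x) = e^{-alpha} alpha^x / x!; the final cell collects P(X >= 11).
probs = [math.exp(-alpha) * alpha ** x / math.factorial(x) for x in values[:-1]]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]


def combine_tails(obs, exp, minimum=5.0):
    """Merge cells inward from each end until both end cells reach the minimum."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[0] < minimum:
        o, e = obs.pop(0), exp.pop(0)
        obs[0] += o
        exp[0] += e
    while len(exp) > 1 and exp[-1] < minimum:
        o, e = obs.pop(), exp.pop()
        obs[-1] += o
        exp[-1] += e
    return obs, exp


obs_c, exp_c = combine_tails(observed, expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
k, s = len(obs_c), 1  # one parameter (alpha) estimated from the data
print(obs_c)
print([round(e, 1) for e in exp_c])
print(f"chi2 = {chi2:.2f}, df = {k - s - 1}")
```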
Kolmogorov-Smirnov Test
[Goodness-of-Fit Tests]
Intuition: formalize the idea behind examining a q-q plot.
Recall from Chapter 7.4.1:
- The test compares the continuous cdf, F(x), of the hypothesized distribution with the empirical cdf, $S_N(x)$, of the N sample observations.
- It is based on the maximum difference statistic $D = \max_x |F(x) - S_N(x)|$, whose critical values are tabulated in Table A.8.
More powerful than the chi-square test, and particularly useful when:
- sample sizes are small, and
- no parameters have been estimated from the data.
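A sketch of the statistic D for a continuous hypothesized cdf. Because $S_N$ jumps at each ordered observation, the maximum difference is found by comparing $F(x_{(i)})$ with $i/N$ and $(i-1)/N$. As an illustration it reuses the door installation times and the normal model from the q-q example; since that model's parameters came from the data, the note above about estimated parameters applies, so the resulting D is illustrative only. `ks_statistic` is a hypothetical name.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D = max |F(x) - S_N(x)| for a continuous cdf F."""
    xs = sorted(data)
    n = len(xs)
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return max(d_plus, d_minus)


door_times = [99.55, 99.56, 99.62, 99.65, 99.79, 99.98, 100.02, 100.06,
              100.17, 100.23, 100.26, 100.27, 100.33, 100.41, 100.47]
D = ks_statistic(door_times, NormalDist(mu=99.99, sigma=0.2832).cdf)
print(round(D, 4))  # compare with the Table A.8 critical value for N = 15
```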

p-Values and “Best Fits”

[Goodness-of-Fit Tests]

p-value for the test statistic:
- The significance level at which one would just reject H0 for the given value of the test statistic.
- A measure of fit: the larger, the better.
  - Large p-value: good fit
  - Small p-value: poor fit

Vehicle Arrival Example (cont.):
- H0: the data are Poisson.
- Test statistic: $\chi_0^2 = 27.68$, with 5 degrees of freedom.
- p-value = 0.00004, meaning H0 would be rejected at any significance level of 0.00004 or higher; hence the Poisson distribution is a poor fit.
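The p-value and the critical value $\chi^2_{0.05,5}$ can be computed without tables. Python's standard library has no chi-square distribution, so this sketch uses the closed-form upper-tail probabilities that exist for integer degrees of freedom, and finds the critical value by bisection; with SciPy installed, `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` would be the usual route. The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # e^{-x/2} * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} * sum_{j=1}^{(df-1)/2} x^{j-1/2} / (1*3*...*(2j-1))
    total, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def chi2_critical(alpha, df):
    """chi-square_{alpha, df}: the x with P(chi-square_df > x) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


p_value = chi2_sf(27.68, 5)
print(f"p-value = {p_value:.5f}")                        # 0.00004
print(f"critical value = {chi2_critical(0.05, 5):.2f}")  # 11.07
```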

EAR(1) Time-Series Input Models
[Multivariate/Time Series]

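Consider the time-series model:
$$X_t = \begin{cases} \phi X_{t-1}, & \text{with probability } \phi \\ \phi X_{t-1} + \varepsilon_t, & \text{with probability } 1-\phi \end{cases} \qquad t = 2, 3, \ldots$$
where $\varepsilon_2, \varepsilon_3, \ldots$ are i.i.d. exponentially distributed with $\mu_\varepsilon = 1/\lambda$, and $0 \le \phi < 1$.

If $X_1$ is chosen appropriately, then:
- $X_1, X_2, \ldots$ are exponentially distributed with mean $1/\lambda$;
- the autocorrelation is $\rho_h = \phi^h$, so only positive correlation is allowed.

To estimate $\phi$ and $\lambda$:
$$\hat{\lambda} = \frac{1}{\bar{X}}, \qquad \hat{\phi} = \hat{\rho} = \frac{\widehat{\mathrm{cov}}(X_t, X_{t+1})}{\hat{\sigma}^2}$$
where $\widehat{\mathrm{cov}}(X_t, X_{t+1})$ is the lag-1 autocovariance.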
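A sketch that generates an EAR(1) sequence and then recovers $\lambda$ and $\phi$ with the estimators above. It assumes the "appropriate" choice of $X_1$ is a draw from the exponential distribution with mean $1/\lambda$ (the stationary choice), and that the autocovariance and variance both use the divisor $n - 1$. The true values $\phi = 0.6$, $\lambda = 2$, the seed, and the function names are arbitrary illustrations.

```python
import random


def ear1(n, phi, lam, rng):
    """Generate X_1..X_n from the EAR(1) model with 0 <= phi < 1.

    X_1 is exponential with mean 1/lam, so every X_t is exponential with mean 1/lam.
    """
    x = rng.expovariate(lam)
    out = [x]
    for _ in range(n - 1):
        x = phi * x
        if rng.random() >= phi:         # with probability 1 - phi add the innovation
            x += rng.expovariate(lam)   # epsilon_t: exponential with mean 1/lam
        out.append(x)
    return out


def estimate_ear1(xs):
    """lambda_hat = 1 / mean; phi_hat = lag-1 autocovariance / variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    cov1 = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1)) / (n - 1)
    return 1.0 / mean, cov1 / var


rng = random.Random(7)
sample = ear1(50_000, phi=0.6, lam=2.0, rng=rng)
lam_hat, phi_hat = estimate_ear1(sample)
print(round(lam_hat, 3), round(phi_hat, 3))  # should land near 2.0 and 0.6
```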

Summary
In this chapter, we described the 4 steps in developing input data models:
1. Collecting the raw data
2. Identifying the underlying statistical distribution
3. Estimating the parameters
4. Testing for goodness of fit
