Defense Simulation Validation
Simpkins, Paulo, and Whitaker
operations and a tool to provide realistic air defense training to maneuver force exercises. EADSIM models fixed- and rotary-wing aircraft, tactical ballistic missiles, cruise missiles, infrared and radar sensors, satellites, command and control structures, and fire support in a dynamic environment which includes the effects of terrain and attrition on the outcome of the battle.

2.2 Wargame 2000

The Department of Defense is developing a software simulation for ballistic missile defense that can be used for command and control analysis, provide insight into technology development, and provide a training platform for system operators/users. Wargame 2000 is a virtual, real-time, discrete-event, command and control missile defense simulation used to investigate human interactions. It is the successor to the Advanced Real-time Gaming Universal Simulation (ARGUS) that has been used for years. WG2K is intended to provide a simulated combat environment that allows war-fighting commanders, their staffs, and the acquisition community to examine missile and air defense concepts of operation. This is accomplished through the … accuracy and ability to perform required tasks. Primary attention has been paid to NMD in the past and now develop…

… meaningful, and the accuracy demand increases as the missile progresses. For instance, 10 percent of the time-to-go-to-impact gets smaller as impact approaches, demanding a better match, while 10 percent of time-after-launch becomes large. This is to say that accuracy near impact is more important than accuracy near launch if the objective is to avoid impact. However, impact time is not one of the reported fields for the baseline EADSIM data, and time since launch was used under the assumption that detection and intercept occur early enough in flight for time-since-launch to be equally significant as time-until-impact.

Six MOEs for three batteries against six threats were adopted: detection time, detection range, 1st launch time, 1st intercept time, last launch time, and last intercept time (see Figure 1). This leads to potentially 3·6·6 = 108 comparisons of output variables between EADSIM and WG2K. There are cases among the batteries and threats where no engagement occurs because the threat was beyond the capability or range of the battery, and some cases where no values were reported; this reduced the data set and analysis. Ultimately only 76 output variables were collected.

[Figure 1: Flight Geometry — detection range, first launch, last launch, and impact for the defense system (Battery #1) against the threat (Threat #1)]
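The bookkeeping behind the 3·6·6 = 108 potential comparisons can be sketched as follows. The dropped battery/threat pairs below are hypothetical placeholders; the paper does not list which combinations produced no engagement or missing values.

```python
from itertools import product

MOES = ["detection time", "detection range", "1st launch time",
        "1st intercept time", "last launch time", "last intercept time"]
BATTERIES = [1, 2, 3]
THREATS = [1, 2, 3, 4, 5, 6]

# every potential EADSIM-vs-WG2K output comparison: 3 * 6 * 6 = 108
all_comparisons = list(product(BATTERIES, THREATS, MOES))

# combinations where the threat was beyond the battery's capability
# (no engagement) or where no value was reported are dropped before
# analysis -- these pairs are hypothetical, not the paper's data
no_engagement = {(1, 6), (2, 5), (3, 4)}
usable = [(b, t, m) for b, t, m in all_comparisons
          if (b, t) not in no_engagement]
```

Each dropped battery/threat pair removes all six of its MOE comparisons, which is how the usable set shrinks well below 108 (to 76 in the paper's actual data).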
…effectiveness is not whether it found the threat but when, and with how much delay, it identified and began tracking.

The defense was established as one-on-one; even though a threat was detected by radar and tracked, it is possible the threat was not engaged if the flight path toward the impact point was beyond the defended area of the battery. An interceptor missile was launched when a detected threat entered the defended area, or when a threat already in the defended area was detected, based on the limitations of the individual system.

Characteristic output between systems is shown below. Figure 3 shows invariability of interceptor launch and engagement times, implying the interceptor is limiting the system (i.e., the interceptor's region of coverage is smaller than the radar's area of coverage). Figure 4 demonstrates high variance in both launch and engagement, indicating the system is limited by radar performance.

4.4 Scope and Limitations

Only one-on-one scenarios are considered at this stage of WG2K development. There is no hand-off between radar systems, as would happen if threats were detected early by one system and then 'passed' to another system with higher Pk. Therefore, no interaction between threats or interceptors is found. It is anticipated that there will be dependence on threat and interceptor type when there are multiple systems.

WG2K essentially has one random variable: stochastic radar detection using a Normal distribution with almost no variability. EADSIM uses two stochastic variables: radar detection, which is Normal, and sensor frame time, which is Uniform, producing detection times distributed as the convolution of the two input distributions. In other cases only one stochastic variable is used, sensor frame time, which is Uniform. Interceptor flyout is always deterministic in both simulations.
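The input randomness described above can be sketched as follows. The means and spreads are hypothetical placeholders, since the paper does not publish the underlying parameter values.

```python
import random

def eadsim_detection(mean=0.75, sd=0.005, frame=0.02):
    """EADSIM-style detection time: a Normal radar-detection draw
    plus a Uniform sensor-frame delay, so the output is distributed
    as the convolution of the two input distributions."""
    return random.gauss(mean, sd) + random.uniform(0.0, frame)

def wg2k_detection(mean=0.76, sd=0.0005):
    """WG2K-style detection time: a single Normal draw with almost
    no variability; interceptor flyout is deterministic."""
    return random.gauss(mean, sd)

random.seed(1)
eadsim_times = [eadsim_detection() for _ in range(100)]
wg2k_times = [wg2k_detection() for _ in range(100)]
```

With these stand-in parameters, the EADSIM replications spread noticeably while the WG2K values are nearly constant, mirroring the variability contrast the paper describes.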
[Figure 3: Interceptor Limitations Lead to Small Variance in Launch and Intercept — Threat 1, System #1-EADSIM Baseline; MOE times (0.3-0.7 of threat flight time) over runs 10-90]

[Figure 4: Threat 4, System #2-EADSIM Baseline — detect, 1st launch, 1st intercept, last launch, and last intercept times (0.3-0.9 of threat flight time) over runs 10-90]

5 RESULTS

… independent. As a quick check for independence of EADSIM output, a test was used to see if MOE times from EADSIM were truly independent. Here, a 'runs test' for above and below the median is used to test the null hypothesis that the sequence of detection times for the 100 iterations of the simulation is indeed independent. Specifically, a sequence of 100 binary variables is constructed where the ith variable takes value 1 if the MOE time of the corresponding ith simulation run is above the median, and 0 otherwise. The number of runs above the median m, the number below the median n, and the total number of runs R = m + n are computed. In the example of Figure 5 there are three runs of ones (of lengths 2, 3, 3) and two runs of zeros.

[Figure 5: Example of MOE Times Indicating Distribution Above and Below the Median — sequence 1 1 0 0 0 1 1 1 0 0 0 0 1 1 1, giving m = 3 runs above the median, n = 2 runs below, and R = 5]

For large samples, the test statistic

    Z = ( R − 2m/(1+γ) ) / ( 2·sqrt(mγ) / (1+γ)^(3/2) )
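The above-and-below-the-median runs test described above can be sketched in Python. The helper uses the conventional large-sample Normal approximation for the runs test (asymptotically equivalent to the γ = m/n form the paper cites from Lehmann and D'Abrera); the 100 detection times below are simulated stand-ins, not EADSIM output.

```python
import math
import random

def runs_test_z(xs):
    """Runs test for randomness above/below the sample median.
    Returns (R, z): the number of runs and the large-sample Normal
    test statistic. Ties with the median are dropped."""
    med = sorted(xs)[len(xs) // 2]
    signs = [x > med for x in xs if x != med]
    m = sum(signs)                 # observations above the median
    n = len(signs) - m             # observations below the median
    R = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    N = m + n
    mean_R = 2.0 * m * n / N + 1.0
    var_R = 2.0 * m * n * (2.0 * m * n - N) / (N * N * (N - 1.0))
    return R, (R - mean_R) / math.sqrt(var_R)

# Figure 5 example: three runs of ones and two runs of zeros
seq = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
R_fig5 = 1 + sum(a != b for a, b in zip(seq, seq[1:]))

# 100 simulated "detection times" standing in for one EADSIM MOE;
# |z| < 1.96 fails to reject independence at the 5% level
random.seed(0)
times = [random.gauss(0.75, 0.01) for _ in range(100)]
R100, z = runs_test_z(times)
```

Run on each of the 76 sets of 100 replications, a |z| below 1.96 corresponds to the paper's finding that none of the sequences failed the test for randomness at the 5% level.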
where γ = m/n, has a standard Normal null distribution (Lehmann and D'Abrera 1975). At the 5% level of significance, none of the 76 sets of simulation runs failed the test for randomness.

As illustrated in Figures 6 and 7, the mean and variance of detection times differed significantly between systems. This may be attributed to specific limitations of the batteries. The distribution of detection times also differed from system to system.

[Figure 6: Detection Times, System #1-EADSIM Baseline — detection times (0.7-0.9 of threat flight time) over runs 10-90]

There is a clear difference between these two systems, even though both are in the same place and threats follow the same flight path against them. Figure 6 shows an average near 80% of threat flight time for three missiles, while Figure 7 shows none of the detection times above 50%. Variance for Threat 4 is three times higher in System #2 than in System #1. Variance in detection times is acceptable. However, critical to analysis by comparison is that WG2K demonstrate similar behavior when modeling the same combinations. Graphic analysis can quickly identify areas of interest or anomalies within the data. But, when provided with a single output value, graphic analysis is limited in comparing the two simulations. An interesting observation taken from Figures 6 and 7 is that for most threats the variance in detection is the same for each system, even though System #1 takes twice as long to detect on average.

Stochastic models can be viewed in two distinct classes. The first class involves sampling from a probability distribution of inputs such that, once a sample of inputs is generated, the model is deterministic. In this situation, …

[Figure 8: Distribution of Random Detection Times for System #1 Against Threat 5 — histogram of detection times between 0.72 and 0.84 of threat flight time]
[Figure 9: Distribution of Random Detection Times for System #1 Against Threat 1 — histogram of System #1 detection times against Threat 1]

In the case of WG2K, only one run of the simulation is provided, so a complete analysis of the output cannot be compared to a distribution function. An analysis of the distributions resulting from EADSIM provides some insight into the expected behavior of WG2K. Histograms in Figures 8 and 9 contrast detection time distributions from one system against two separate threats.

However, comparing Figures 8 and 9, it is clear the detection times do not come from the same distribution. What causes such a different distribution for the same system against similar threats? The battery modeled here has a search pattern that makes its detection time highly dependent on threat flight trajectory … while threats launched from close range are detected uni… No commonly used parametric distribution captures all of the detection … system/threat combinations.

5.2 Comparing Wargame 2000 with EADSIM

The spread of times can be further broken down into quartiles, separating the 100 observations into groups of 25 divided in sequence by the 1st, 2nd (the median), and 3rd quartiles. One expects the result provided by WG2K to fall between the 1st and 3rd quartiles of EADSIM, implying the value is relatively close to that expected for validation.

A box plot places a box around the middle 50% of the data, with the upper edge at the 3rd quartile and the lower edge at the 1st quartile (Devore 1995). The whiskers in box plots for all MOEs extend from the box up to the largest observation and down to the smallest observation. In general, extreme observations are reported as points beyond … box plot at the mean, indicating a one-to-one match between the WG2K result and the average EADSIM result. Figure 10 displays several comparisons where EADSIM and WG2K outputs agree and are statistically similar. Even System #1's first launch times compare nicely, although EADSIM exhibits zero variance, as represented by the flat line in lieu of a box plot. Small deviation, as shown in System #3's first intercept times by the intersection of the box and line, is acceptable. In general, Figure 10 is good news for the new simulation. A preliminary look at these comparisons indicates WG2K is producing detection, launch, and intercept times very close to EADSIM.

[Figure 10: Box Plots Reveal Trends when Compared — EADSIM spread (box plots) vs. WG2K data (by missile) for panels including System #1 First Launch and System #2 Detection, Threats 3-6, roughly 0.65-0.90 of threat flight time]

Further inspection of the data reveals system/threat combinations with larger variability, however. Figure 11 indicates two detection times far outside the baseline distribution for two of the threats. One of the threats was not detected by WG2K, leaving only five; this further confounds the results and implies simulation issues larger than interceptor flight time, such as sensor detection modeling or sensitivity parameters.

[Figure 11: System #1 Detection — WG2K detection times relative to the EADSIM baseline distribution]

5.2.2 Inference

Although each scenario was replicated 100 times for EADSIM, because WG2K is run in real time, only one realization of WG2K is available for each scenario. Often, there is the temptation to treat the output of such a run as an expected value (i.e., to treat a detection …
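The quartile acceptance criterion of Section 5.2 — the single WG2K value should fall between EADSIM's 1st and 3rd quartiles — can be sketched as follows. The detection times are illustrative, and the simple index-based quartile rule is an assumption; textbook quartile definitions vary slightly.

```python
def within_iqr(eadsim_times, wg2k_time):
    """Return (q1, q3, accept): the 1st and 3rd quartiles of the
    EADSIM replications and whether the single WG2K value falls
    between them. Index-based quartiles for ~100 observations."""
    xs = sorted(eadsim_times)
    q1 = xs[len(xs) // 4]            # ~25th percentile
    q3 = xs[(3 * len(xs)) // 4]      # ~75th percentile
    return q1, q3, q1 <= wg2k_time <= q3

# hypothetical data: 100 EADSIM detection times and one WG2K value
eadsim = [0.700 + 0.001 * i for i in range(100)]
q1, q3, accept = within_iqr(eadsim, 0.76)
```

A WG2K value far outside the box, as in the Figure 11 cases, would return `accept = False` here.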
… ȲE and YW are the mean detection time of EADSIM and the detection time from one run of WG2K, respectively. They define the test statistic:

    Tstat = ( ȲE − YW ) / ȲE

The test statistic (Tstat) represents the percent difference between EADSIM's MOE and WG2K's. Bootstrapping is used to estimate the sampling distribution of Tstat under the null hypothesis. Sampling from the empirical distribution of EADSIM detection times is equivalent to draws with replacement from the 100 actual EADSIM values. In total 1000·(100 + 1) draws with replacement were made from the 100 EADSIM values. The results are depicted in Figure 13; the X̄Ei, i = 1, 2, …, 1000, are the 1000 averages of 100 draws each, while the XWi consist of 1000 individual draws. From the X̄Ei and XWi, the bootstrapped values of the test sta…

However, there are very low p-values for system #1, indicating rejection for nearly every MOE, clearly based on the invariability of EADSIM. System #2 showed mixed results depending on the MOE.

[Figure: Typical Time Distribution for EADSIM — histogram of number of observations vs. time (250-350), with extreme regions marked and Tstat indicated on the time axis]
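The bootstrap of Tstat can be sketched as follows. The EADSIM values are simulated placeholders, and pairing each of 1000 resampled means with one extra single draw follows the 1000·(100 + 1) resampling scheme described above.

```python
import random

def t_stat(e_mean, w_value):
    """Percent difference between the EADSIM mean MOE and the
    single WG2K value: Tstat = (YEbar - YW) / YEbar."""
    return (e_mean - w_value) / e_mean

random.seed(2)
# hypothetical baseline: 100 EADSIM detection times and one WG2K time
eadsim = [random.gauss(0.75, 0.01) for _ in range(100)]
wg2k = 0.76

# observed statistic from the actual WG2K realization
obs = t_stat(sum(eadsim) / len(eadsim), wg2k)

# bootstrap null distribution: 1000 replicates, each drawing 100
# values (with replacement) for the EADSIM mean plus 1 extra draw
# standing in for the WG2K observation under the null hypothesis
boot = []
for _ in range(1000):
    sample = [random.choice(eadsim) for _ in range(100)]
    w_star = random.choice(eadsim)
    boot.append(t_stat(sum(sample) / 100.0, w_star))

# two-sided bootstrap p-value for the observed statistic
p_value = sum(abs(b) >= abs(obs) for b in boot) / len(boot)
```

A very small `p_value` corresponds to the paper's rejections for system #1: when EADSIM is nearly invariant, even a tiny percent difference lands in the extreme region of the bootstrapped null distribution.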