FUZZY MODEL IDENTIFICATION BASED ON CLUSTER ESTIMATION

Stephen L. Chiu
Rockwell Science Center
Thousand Oaks, California 91360
ABSTRACT

We present an efficient method for estimating cluster centers of numerical data. This method can be used to determine the number of clusters and their initial values for initializing iterative optimization-based clustering algorithms such as fuzzy C-means. Here we use the cluster estimation method as the basis of a fast and robust algorithm for identifying fuzzy models. A benchmark problem involving the prediction of a chaotic time series shows that this model identification method compares favorably with other, more computationally intensive methods. We also illustrate an application of this method in modeling the relationship between automobile trips and demographic factors. © 1994 John Wiley and Sons, Inc.
INTRODUCTION
Clustering of numerical data forms the basis of many classification and system modeling algorithms. The purpose of clustering is to distill natural groupings of data from a large data set, producing a concise representation of a system's behavior. In particular, the fuzzy C-means (FCM) clustering algorithm (Dunn, 1974; Bezdek, 1974; Bezdek et al., 1987) has been widely studied and applied. The FCM algorithm is an iterative optimization algorithm that minimizes the cost function

    J = \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{m} \| x_k - v_i \|^2

where n is the number of data points, c is the number of clusters, x_k is the kth data point, v_i is the ith cluster center, \mu_{ik} is the degree of membership of the kth data point in the ith cluster, and m is a constant greater than 1 (typically m = 2). The degree of membership \mu_{ik} is defined by
    \mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \| x_k - v_i \| / \| x_k - v_j \| \right)^{2/(m-1)}}    (1)
Starting with a desired number of clusters c and an initial guess for each cluster center v_i, i = 1, 2, ..., c, FCM will converge to a solution for v_i that represents either a local minimum or a saddle point of the cost function (Bezdek et al., 1987). The quality of the FCM solution, like that of most nonlinear optimization problems, depends strongly on the choice of initial values (i.e., the number c and the initial cluster centers).
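As an illustrative aside, one FCM iteration can be sketched in a few lines of NumPy. The sketch below is ours, not the implementation referred to in this article; the function name, array layout, and the small constant guarding against division by zero are assumptions of the sketch.

import numpy as np

def fcm_step(X, V, m=2.0):
    """One fuzzy C-means iteration: memberships via eq. (1), then center update.

    X: (n, d) data points; V: (c, d) current cluster centers.  Illustrative sketch.
    """
    # squared distances from every center to every point, shape (c, n)
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    d2 = np.fmax(d2, 1e-12)                          # guard against exact matches
    # eq. (1): mu_ik = 1 / sum_j (||x_k - v_i|| / ||x_k - v_j||)^(2/(m-1))
    mu = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
    # center update that minimizes the cost for the memberships just computed
    w = mu ** m
    V_new = (w @ X) / w.sum(axis=1, keepdims=True)
    return mu, V_new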
Yager and Filev (1992) proposed a simple and effective algorithm, called the Mountain Method, for estimating the number and initial locations of cluster centers. Their method is based on making a grid of the data space and computing a potential value for each grid point based on its distances to the actual data points; a grid point with many data points nearby will have a high potential value. The grid point with the highest potential value is chosen as the first cluster center. The key idea in their method is that once the first cluster center is chosen, the potential of all grid points is reduced according to their distance from the cluster center. Grid points near the first cluster center will have greatly reduced potential. The next cluster center is then placed at the grid point with the highest remaining potential value. This procedure of acquiring a new cluster center and reducing the potential of surrounding grid points repeats until the potential of all grid points falls below a threshold. Although this method is simple and effective, the computation grows exponentially with the dimension of the problem. For example, a clustering problem with 4 variables and a resolution of 10 grid lines per dimension would result in 10^4 grid points that must be evaluated.
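The exponential growth of the grid is easy to see in a direct transcription of the idea; the sketch below is ours, with an illustrative decay constant. It enumerates resolution**dim grid points over the unit hypercube and scores each one by its proximity to the data, in the spirit of the Mountain Method.

import numpy as np
from itertools import product

def mountain_potentials(X, resolution=10, alpha=5.0):
    """Grid-point potentials in the spirit of the Mountain Method (sketch only).

    X: (n, dim) data in the unit hypercube.  The grid has resolution**dim points,
    which is why the cost grows exponentially with dim; alpha is illustrative.
    """
    dim = X.shape[1]
    axes = [np.linspace(0.0, 1.0, resolution)] * dim
    grid = np.array(list(product(*axes)))                    # (resolution**dim, dim)
    d = np.linalg.norm(grid[:, None, :] - X[None, :, :], axis=2)
    # the original method decays with the distance itself, not its square
    return grid, np.exp(-alpha * d).sum(axis=1)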
We present a modified form of the Mountain Method for cluster estimation. We consider each data point, not a grid point, as a potential cluster center. Using this method, the number of effective "grid points" to be evaluated is simply equal to the number of data points, independent of the dimension of the problem. Another advantage of this method is that it eliminates the need to specify a grid resolution, in which trade-offs between accuracy and computational complexity must be considered. We also improve the computational efficiency and robustness of the original method via other modifications.
Although clustering is generally associated with classification problems, here we use the cluster estimation method as the basis of a fuzzy model identification algorithm. The key to the speed of this model identification algorithm is that it does not involve any iterative nonlinear optimization; in addition, the computation grows only linearly with the dimension of the problem. We use a benchmark problem in chaotic time series prediction to compare the performance of this algorithm with the published results of other algorithms. We also show an application of this method in estimating the number of automobile trips generated from an area based on its demographics.
CLUSTER ESTIMATION
Consider a collection of n data points {x_1, x_2, ..., x_n} in an M-dimensional space. Without loss of generality, we assume that the data points have been normalized in each dimension so that their coordinate ranges in each dimension are equal, i.e., the data points are bounded by a hypercube. We consider each data point as a potential cluster center and define a measure of the potential of data point x_i as

    P_i = \sum_{k=1}^{n} e^{-\alpha \| x_i - x_k \|^2}    (2)

where

    \alpha = 4 / r_a^2

and r_a is a positive constant. Thus, the measure of potential for a data point is a function of its distances to all other data points. A data point with many neighboring data points will have a high potential value. The constant r_a is effectively the radius defining a neighborhood; data points outside this radius have little influence on the potential. This measure of potential differs from that proposed by Yager and Filev in two ways: (1) the potential is associated with an actual data point instead of a grid point; (2) the influence of a neighboring data point decays exponentially with the square of the distance instead of the distance itself. Using the square of the distance eliminates the square root operation that otherwise would be needed for determining the distance itself.
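In NumPy, the initial potentials of eq. (2) amount to one pairwise-distance computation; the sketch below is ours, with r_a expressed as a fraction of the (unit) hypercube width as in the examples later in the article.

import numpy as np

def initial_potentials(X, ra=0.25):
    """Potential of every data point, eq. (2): P_i = sum_k exp(-alpha ||x_i - x_k||^2).

    X: (n, d) data assumed normalized into the unit hypercube; ra is illustrative.
    """
    alpha = 4.0 / ra ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # squared distances
    return np.exp(-alpha * d2).sum(axis=1)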
After the potential of every data point has been computed, we select the data point with the highest potential as the first cluster center. Let x_1^* be the location of the first cluster center and P_1^* be its potential value. We then revise the potential of each data point x_i by the formula

    P_i \leftarrow P_i - P_1^* e^{-\beta \| x_i - x_1^* \|^2}    (3)

where

    \beta = 4 / r_b^2

and r_b is a positive constant. Thus, we subtract an amount of potential from each data point as a function of its distance from the first cluster center. The data points near the first cluster center will have greatly reduced potential, and therefore are unlikely to be selected as the next cluster center. The constant r_b is effectively the radius defining the neighborhood that will have measurable reductions in potential. To avoid obtaining closely spaced cluster centers, we set r_b to be somewhat greater than r_a; a good choice is r_b = 1.5 r_a.
When the potential of all data points has been revised according to eq. (3), we select the data point with the highest remaining potential as the second cluster center. We then further reduce the potential of each data point according to its distance to the second cluster center. In general, after the kth cluster center has been obtained, we revise the potential of each data point by the formula

    P_i \leftarrow P_i - P_k^* e^{-\beta \| x_i - x_k^* \|^2}

where x_k^* is the location of the kth cluster center and P_k^* is its potential value.

In Yager and Filev's procedure, the process of acquiring new cluster centers and revising potentials repeats until

    P_k^* < \varepsilon P_1^*
where \varepsilon is a small fraction. The choice of \varepsilon is an important factor affecting the results: if \varepsilon is too large, too few data points will be accepted as cluster centers; if \varepsilon is too small, too many cluster centers will be generated. We have found it difficult to establish a single value for \varepsilon that works well for all data patterns, and have therefore developed additional criteria for accepting/rejecting cluster centers. We use the following criteria:

    if P_k^* > \bar{\varepsilon} P_1^*
        Accept x_k^* as a cluster center and continue.
    else if P_k^* < \underline{\varepsilon} P_1^*
        Reject x_k^* and end the clustering process.
    else
        Let d_min = shortest of the distances between x_k^* and all previously found cluster centers.
        if d_min / r_a + P_k^* / P_1^* >= 1
            Accept x_k^* as a cluster center and continue.
        else
            Reject x_k^* and set its potential to 0; select the data point with the next highest potential as the new x_k^* and re-test.
        end if
    end if

Here \bar{\varepsilon} specifies a threshold above which a data point is definitely accepted as a cluster center, and \underline{\varepsilon} specifies a threshold below which it is definitely rejected; in between, acceptance depends on whether the data point offers a good trade-off between having sufficient potential and being sufficiently far from the existing cluster centers.
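Putting the pieces together, the cluster estimation loop can be sketched as follows. This sketch is ours and, for brevity, uses only the simple stopping test P_k^* < \varepsilon P_1^* rather than the full accept/reject criteria above; the parameter defaults are illustrative.

import numpy as np

def subtractive_clustering(X, ra=0.25, eps=0.15):
    """Cluster estimation sketch: every data point is a candidate cluster center.

    X: (n, d) data normalized to the unit hypercube.  Returns the selected centers.
    """
    rb = 1.5 * ra
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    P = np.exp(-alpha * d2).sum(axis=1)              # eq. (2)
    p_first = P.max()
    centers = []
    while True:
        k = int(P.argmax())
        if P[k] < eps * p_first:                     # simple stopping criterion
            break
        centers.append(X[k])
        P = P - P[k] * np.exp(-beta * d2[k])         # eq. (3), generalized
    return np.array(centers)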
MODEL IDENTIFICATION

We consider each cluster center found in this manner to define a fuzzy rule of the Takagi-Sugeno type: the cluster center provides the rule premise, the consequent of the ith rule is a linear function of the input variables, z_i^* = G_i y + h_i, and the model output is a weighted average of the individual rule outputs, with the weight of each rule given by its degree of fulfillment \mu_i (viz. eq. 4). Writing this input/output relation for every training data point yields the matrix equation of eq. (9), in which the first matrix on the right-hand side is constant, while the second matrix contains all the parameters to be optimized. To minimize the squared error between the model output and that of the training data, we solve the linear least-squares estimation problem given by eq. (9), replacing the matrix on the left-hand side by the actual output of the training data. Of course, implicit in the least-squares estimation problem is the assumption that the number of training data points is greater than the number of parameters to be optimized.
In matrix form, eq. (9) reads

    \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}
    =
    \begin{bmatrix}
    \rho_{1,1} y_1^T & \rho_{1,1} & \cdots & \rho_{c,1} y_1^T & \rho_{c,1} \\
    \vdots & & & & \vdots \\
    \rho_{1,n} y_n^T & \rho_{1,n} & \cdots & \rho_{c,n} y_n^T & \rho_{c,n}
    \end{bmatrix}
    \begin{bmatrix} G_1^T \\ h_1 \\ \vdots \\ G_c^T \\ h_c \end{bmatrix}    (9)

where z_k and y_k are the output value and input vector of the kth training data point, and \rho_{i,k} = \mu_{i,k} / \sum_{j=1}^{c} \mu_{j,k} is the normalized degree of fulfillment of the ith rule for that point. Using the standard notation adopted in most of the literature, the least-squares estimation problem of eq. (9) has the form

    A X = B
where B is a matrix of output values, A is a constant matrix, and X is a matrix of parameters to be estimated. The well-known pseudo-inverse solution that minimizes \| A X - B \|^2 is given by

    X = (A^T A)^{-1} A^T B

However, computing (A^T A)^{-1} is computationally expensive when A^T A is a large matrix (A^T A is c(N+1) \times c(N+1)); numerical problems also arise when A^T A is nearly singular. We use another well-known method for solving for X, a procedure often referred to as recursive least-squares estimation (Astrom and Wittenmark, 1984; Strobach, 1990). This is a computationally efficient and well-behaved method that determines X via the iterative formulas
    X_{i+1} = X_i + S_{i+1} a_{i+1} (b_{i+1}^T - a_{i+1}^T X_i)    (10)

    S_{i+1} = S_i - \frac{S_i a_{i+1} a_{i+1}^T S_i}{1 + a_{i+1}^T S_i a_{i+1}},    i = 0, 1, ..., n-1    (11)
where X_i is the estimate of X at the ith iteration, S_i is a c(N+1) \times c(N+1) covariance matrix, a_i^T is the ith row vector of A, and b_i^T is the ith row vector of B. The least-squares estimate of X corresponds to the value of X_n. The initial conditions for this iterative procedure are X_0 = 0 and S_0 = \gamma I, where I is an identity matrix and \gamma is a large positive value.
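A direct NumPy transcription of eqs. (10) and (11) might look like the sketch below; the function name and the value of \gamma are our own illustrative choices.

import numpy as np

def recursive_least_squares(A, B, gamma=1e6):
    """Solve A X = B by the recursive update of eqs. (10) and (11) (sketch).

    A: (n, p) constant matrix; B: (n, q) output matrix; gamma*I initializes S.
    """
    n, p = A.shape
    q = B.shape[1]
    X = np.zeros((p, q))
    S = gamma * np.eye(p)
    for i in range(n):
        a = A[i:i + 1].T                                    # column vector (p, 1)
        b = B[i:i + 1]                                      # row vector (1, q)
        S = S - (S @ a @ a.T @ S) / (1.0 + a.T @ S @ a)     # eq. (11)
        X = X + S @ a @ (b - a.T @ X)                       # eq. (10)
    return X

# the result should agree, up to numerical effects, with np.linalg.lstsq(A, B, rcond=None)[0]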
To summarize, the model identification method consists of two distinct steps: (1) find cluster centers to establish the number of fuzzy rules and the rule premises, and (2) optimize the rule consequents. Neither of these steps involves nonlinear optimization, and both steps have well-bounded computation time. In step 1, the bulk of the computation time is consumed by evaluating the initial potential of each data point. Each subsequent iteration to select a cluster center and subtract potential consumes the same amount of time as evaluating the potential of one data point. Assuming the number of cluster centers that will be obtained is much less than the total number of data points, we can accurately estimate the computation time of step 1 based on the number of data points alone. However, the number of cluster centers found in step 1 affects the computation time of step 2 linearly, because the number of parameters to be optimized grows linearly with the number of clusters. Hence, we can determine the computation time of step 2 only after step 1 is completed.
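As a sketch of step 2 (ours, not the original code), the snippet below assembles the constant matrix A of eq. (9) from normalized rule-fulfillment weights and solves for the consequent parameters G_i and h_i. It assumes the exponential degree of fulfillment \mu_i = exp(-\alpha ||y - y_i^*||^2), with \alpha = 4 / r_a^2, for the eq. (4) referred to in the text.

import numpy as np

def fit_consequents(Y, z, centers, ra=0.25):
    """Build A of eq. (9) and solve A X = B for the consequent parameters (sketch).

    Y: (n, N) training inputs; z: (n,) training outputs; centers: (c, N).
    """
    alpha = 4.0 / ra ** 2
    d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)     # (n, c)
    mu = np.exp(-alpha * d2)
    rho = mu / mu.sum(axis=1, keepdims=True)          # normalized fulfillment
    n, N = Y.shape
    c = centers.shape[0]
    # row k of A: [rho_1k*y_k^T, rho_1k, ..., rho_ck*y_k^T, rho_ck] -> c*(N+1) columns
    A = np.concatenate(
        [np.concatenate([rho[:, i:i + 1] * Y, rho[:, i:i + 1]], axis=1) for i in range(c)],
        axis=1)
    X, *_ = np.linalg.lstsq(A, z, rcond=None)
    params = X.reshape(c, N + 1)                      # stacked [G_i, h_i] per rule
    return params[:, :N], params[:, N]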
Although the number of clusters (or rules) is automatically determined by this method, we should note that the user-specified parameter r_a (i.e., the radius of influence of a cluster center) strongly affects the number of clusters that will be generated. A large r_a generally results in fewer clusters and hence a coarser model, while a small r_a can produce an excessive number of clusters and a model that does not generalize well (i.e., by overfitting the training data). Therefore, we may regard r_a as an approximate specification of the desired resolution of the model, which can be adjusted based on the resultant complexity and generalization ability of the model.
RESULTS
We will first apply the model identification method to a simple 2-dimensional function approximation problem to illustrate some of its properties. Next we will consider a benchmark problem involving the prediction of a chaotic time series and compare the performance of this method with the published results of other methods. Lastly, we will show an application in modeling the relationship between the number of automobile trips generated from an area and the demographics of the area.
FUNCTION APPROXIMATION
For illustrative purposes, we consider the simple problem of modeling a nonlinear function z(y) of a single input y. For the range [-10, 10], we used equally spaced y values to generate 100 training data points. Because the training data are normalized before clustering so that they are bounded by a hypercube, we find it convenient to express r_a as a fraction of the width of the hypercube; in this example we chose r_a = 0.25. Applying the cluster estimation method to the training data, 7 cluster centers were found. Figure 1 shows the training data and the locations of the cluster centers.
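A small sketch (ours) of the pre-processing used in this example: scale the data to the unit hypercube so that r_a can be read as a fraction of the hypercube width, then run the cluster estimation of the earlier sketch.

import numpy as np

def normalize_to_hypercube(data):
    """Scale each dimension to [0, 1]; returns the scaling so it can be undone later."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo), lo, hi

# usage sketch, reusing subtractive_clustering() from the earlier sketch:
#   data = np.column_stack([y, z])                 # the 100 (y, z) training pairs
#   norm, lo, hi = normalize_to_hypercube(data)
#   centers = subtractive_clustering(norm, ra=0.25)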
Figure 1. Comparison of training data with unoptimized 0th order model output.

Figure 2. Degree of fulfillment of each rule as a function of input y.
Figure 1 also shows the output of a 0th order fuzzy model that uses a constant z_i^* as given by the z coordinate of each cluster center. We see that the modeling error is quite large. Because the clusters are closely spaced with respect to the input dimension, there is significant overlap between the premise conditions of the rules. The degree of fulfillment \mu of each rule as a function of y (viz. eq. 4) is shown in Figure 2, where it is evident that several rules can have high firing strength simultaneously even when the input precisely matches the premise condition of one rule. Therefore, the model output is typically a neutral point interpolated from the z coordinates of strongly competing cluster centers.
One way to minimize competition among closely spaced cluster centers is to use the fuzzy C-means definition of \mu, viz. eq. (1). When the input precisely matches the location of a cluster center, the FCM definition ensures that the input will have zero degree of membership in all other clusters. Using the FCM definition, the degree of fulfillment \mu as a function of y is shown in Figure 3 for a typical rule. We see that \mu is 1 when y is at the cluster center associated with the rule and drops sharply to zero as y approaches a neighboring cluster center. The output of the 0th order fuzzy model when the FCM definition of \mu is adopted is shown in Figure 4. It is evident that the modeling accuracy has improved significantly; in particular, the model output trajectory is now compelled to pass through the cluster centers.

Figure 3. Degree of fulfillment of a typical rule as a function of input y, based on the fuzzy C-means definition of \mu.

Figure 4. Comparison of training data with unoptimized 0th order model output, for the case where inference computation is based on the fuzzy C-means definition of \mu.
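The two definitions of the degree of fulfillment are easy to compare side by side; the sketch below (ours) assumes the exponential form \mu_i = exp(-\alpha (y - y_i^*)^2) for the eq. (4) referred to in the text, and uses eq. (1) for the FCM form.

import numpy as np

def memberships(y, centers, ra=0.25, m=2.0):
    """Degree of fulfillment of each rule at a scalar input y (illustrative sketch).

    Returns (exponential mu, FCM mu) for 1-D cluster-center locations `centers`.
    """
    d2 = (y - centers) ** 2
    mu_exp = np.exp(-(4.0 / ra ** 2) * d2)
    d2 = np.fmax(d2, 1e-12)                          # guard against exact matches
    mu_fcm = 1.0 / ((d2[:, None] / d2[None, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
    return mu_exp, mu_fcm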
Although using the FCM definition of \mu can improve the accuracy of unoptimized 0th order models, we note that the \mu function and the resultant model output trajectory have highly nonlinear "kinks" compared to those obtained with the exponential definition of \mu. These kinks tend to limit the ultimately attainable accuracy when optimization is applied to the model.
We now apply least-squares optimization to z_i^*. For illustrative purposes, we consider the two cases: (1) z_i^* = h_i, and (2) z_i^* = G_i y + h_i. In the first case, we retain the assumption of a 0th order model, but now optimize the constant value assigned to z_i^*. In the second case, we assume a 1st order model and optimize both G_i and h_i. Table I shows the root-mean-square (RMS) modeling error resulting from the different extents of optimization. Table I also shows the effects of using the exponential definition of \mu versus the FCM definition.

Table I. Comparison of RMS Modeling Error for Different Extents of Optimization and for Different Definitions of \mu

    Optimization            RMS error (EXP)    RMS error (FCM)
    z_i^* unoptimized       0.180              0.119
    z_i^* = h_i             0.084              0.098
    z_i^* = G_i y + h_i     0.010              0.015

Figure 5. Comparison of training data with optimized 0th order model output.
Using the exponential definition of \mu generally results in more accurate optimized models. In what follows, we will not draw any further comparisons between using the exponential definition versus using the FCM definition, but present only the results obtained from using the exponential definition. The output of the optimized 0th order model is shown in Figure 5 and the output of the optimized 1st order model is shown in Figure 6. The consequent function z_i^* = G_i y + h_i of each rule is shown in Figure 7.

Figure 6. Comparison of training data with optimized 1st order model output.

Figure 7. Consequent functions for the 1st order model.
Although our method can be used to approximate a function from uniformly distributed data points as in the above example, the method is best suited for identifying models from experimental data, where there are repetitive behavior patterns that create distinct clusters in the data space. In the next example, we will examine a chaotic time series that does create such a data set.
CHAOTIC TIME SERIES PREDICTION
We now consider a benchmark problem in model identification: that of predicting the time series generated by the chaotic Mackey-Glass differential delay equation (Mackey and Glass, 1977), defined by

    \frac{dx(t)}{dt} = \frac{0.2\, x(t - \tau)}{1 + x(t - \tau)^{10}} - 0.1\, x(t)
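For readers who wish to reproduce the data set, the series can be generated by simple forward-Euler integration of the delay equation; the step size, initial history, and sampling below are illustrative assumptions of this sketch, not specifications from the article.

import numpy as np

def mackey_glass(n_samples=1200, tau=17, dt=0.1, x0=1.2):
    """Integrate the Mackey-Glass equation with forward Euler, sampling at t = 0, 1, 2, ... (sketch)."""
    per_sample = int(round(1.0 / dt))
    delay = int(round(tau / dt))
    n_steps = n_samples * per_sample + delay
    x = np.full(n_steps, x0)                       # constant initial history
    for i in range(delay, n_steps - 1):
        x_tau = x[i - delay]
        x[i + 1] = x[i] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[i])
    return x[delay::per_sample][:n_samples]

# each training pattern:  [x(t-18), x(t-12), x(t-6), x(t)]  ->  x(t+6)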
The task is to use past values of x up to the time t to predict the value of x at some time t + \Delta t in the future. The standard input for this type of prediction is N points in the time series spaced S apart, i.e., the input vector is y = [x(t - (N-1)S), ..., x(t - 2S), x(t - S), x(t)]. To allow comparison with the published results of other methods, we use \tau = 17, N = 4, S = 6, and \Delta t = 6. Therefore, each data point in the training set consists of

    [x(t-18), x(t-12), x(t-6), x(t); x(t+6)]

where the first 4 elements correspond to the input variables and the last element corresponds to the output variable. We will compare the performance of the model identification method with the Adaptive-Network-Based Fuzzy Inference System (ANFIS) proposed by Jang (1993) as well as other neural network-based and polynomial-fitting methods reported by Crowder (1990). The ANFIS algorithm also produces fuzzy models consisting of Takagi-Sugeno type rules. After specifying the number of membership functions for each input variable, the ANFIS algorithm iteratively learns the parameters of the premise membership functions via backpropagation and optimizes the parameters of the consequent equations via linear least-squares estimation. ANFIS has the advantage of being significantly faster and more accurate than many pure neural network-based methods. Fast computation speed is attained by requiring far fewer tunable parameters than traditional neural networks to achieve a highly nonlinear mapping, and by optimizing a large fraction of the parameters via linear least-squares estimation, thus further reducing the use of backpropagation. Because ANFIS typically has far fewer tunable parameters than a traditional neural network, it can avoid the pitfall of over-fitting the training data, thereby achieving excellent generalization ability. Comparison of ANFIS with our model identification method is particularly interesting because of the similarity in model structure. The performance of ANFIS provides a good indicator of the added benefit and computational burden that accompany nonlinear optimization of Takagi-Sugeno type rules.
For the Mackey-Glass time series problem, we used the same data set as that used in Jang (1993), which consisted of 1000 data points extracted from t = 118 to t = 1117. The first 500 data points were used for training the model, and the last 500 data points were used for checking the generalization ability of the model. As mentioned previously, the cluster radius r_a is an approximate specification of the desired resolution of the model, which can be adjusted based on the resultant complexity and generalization ability of the model. To illustrate this principle, we applied the cluster estimation method with different values of r_a ranging from 0.15 to 0.5 (i.e., the cluster radius was varied from 0.15 to 0.5 times the width of the data hypercube). This produced models of varying size, ranging from 69 rules to 9 rules. For each model, the consequent equations for a 1st order model were then optimized. Figure 8 shows the number of rules generated, as well as the modeling errors with respect to the training data and checking data, as functions of the cluster radius.
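The radius sweep itself is a short loop around the earlier sketches (subtractive_clustering and fit_consequents); the prediction helper and error printout below are our own illustrative additions, again assuming the exponential degree of fulfillment and inputs scaled to the unit hypercube.

import numpy as np

def predict(Y, centers, G, h, ra):
    """1st order model output: fulfillment-weighted average of rule consequents (sketch)."""
    alpha = 4.0 / ra ** 2
    d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    rho = np.exp(-alpha * d2)
    rho /= rho.sum(axis=1, keepdims=True)
    return (rho * (Y @ G.T + h)).sum(axis=1)

def sweep_radius(train, check, radii=(0.15, 0.2, 0.25, 0.3, 0.4, 0.5)):
    """Vary r_a and report model size and RMS error; train/check are (n, 5) arrays."""
    for ra in radii:
        centers = subtractive_clustering(train[:, :4], ra=ra)
        G, h = fit_consequents(train[:, :4], train[:, 4], centers, ra=ra)
        for name, data in (("train", train), ("check", check)):
            err = predict(data[:, :4], centers, G, h, ra) - data[:, 4]
            print(f"ra={ra:.2f}  rules={len(centers)}  {name} RMS={np.sqrt(np.mean(err**2)):.4f}")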
Figure 8. Model size and prediction error as functions of cluster radius.

Figure 9. Comparison of training data with model output for Mackey-Glass time series.
We see that the error with respect to the training data and the error with respect to the checking data begin to diverge when the cluster radius is less than 0.3, showing that the model is over-fitting the training data as the number of fitting parameters becomes too large. We use the results at r_a = 0.3 (a model with 25 rules) as the basis for comparison with other algorithms. Figures 9 and 10 show the model output evaluated with respect to training data and checking data, respectively. We see that the model output is indistinguishable from both training and checking data. The cluster centers and consequent equations for this model are listed in Appendix A.
The modeling error with respect to the checking data is listed in Table II along with the results from the other methods as reported in Jang (1993) and Crowder (1990). The error index shown in Table II is a non-dimensional error defined as the RMS error divided by the standard deviation of the actual time series (Jang, 1993; Crowder, 1990). Comparison between the various methods shows that the cluster estimation-based method can provide approximately the same degree of accuracy as the more complex methods. Only ANFIS produced a result that is more accurate than the cluster estimation-based method. The ANFIS model referenced here uses two membership functions for each input variable; for the 4-input Mackey-Glass problem, this leads to 2^4 = 16 fuzzy partitions of the input space, and thus 16 rules. This ANFIS model has 24 parameters optimized via back-propagation (3 parameters for each premise membership function) and 80 parameters optimized via linear least-squares estimation (5 parameters for each consequent equation). The cluster estimation-based method does not involve any nonlinear optimization, but the resultant model has 125 consequent parameters optimized via linear least-squares estimation. Identifying the Mackey-Glass model via the ANFIS algorithm (coded in C) required 1.5 hours on an Apollo 700 series workstation (Jang, 1993), while the cluster estimation-based algorithm (also coded in C) was able to identify the model in 2 minutes on a Macintosh Centris 660AV (68040 processor running at 25 MHz).

Figure 10. Comparison of checking data with model output for Mackey-Glass time series.

Table II. Comparison of Results from Different Methods of Model Identification*

    Method                        # Training Data    Error Index
    Cluster estimation-based      500                0.014
    ANFIS                         500                0.007
    Auto-regressive model         500                0.19
    Cascade-correlation NN        500                0.06
    Back-prop NN                  500                0.02
    6th-order polynomial          500                0.08
    Linear predictive method      2000               0.55

* Rows 2 and 3 are from Jang (1993); the last 4 rows are from Crowder (1990).
TRIP GENERATION MODELING
We have applied the model identification method to estimate the number of automobile trips generated from an area based on the demographics of the area. Five demographic factors were considered: population, number of dwelling units, vehicle ownership, median household income, and total employment. Hence, the model has 5 input variables and 1 output variable.

Demographic and trip data for 100 traffic analysis zones in New Castle County, Delaware, were used; this data was transcribed directly from Kikuchi et al. (1994). Of the 100 data points, we randomly selected 75 as training data and 25 as checking data. Using r_a = 0.5, the model identification algorithm generated a 1st order model comprised of 3 rules. The computation time was 2 seconds on a Macintosh Centris 660AV. The cluster centers and consequent equations for this model are listed in Appendix B along with the data set. A comparison of the model output with that of the checking data is shown in Figure 11. The average modeling error with respect to the training data was 0.34 and that with respect to the checking data was 0.37, indicating the model generalizes well.

Figure 11. Comparison of checking data and model output for trip generation modeling.
The fact that we can accurately cover a 6-dimensional data space with only 3 rules attests to the particular advantages of using Takagi-Sugeno type rules.
CONCLUSION
We presented a cluster estimation method based on computing a measure of potential for each data point and iteratively reducing the potential of data points near new cluster centers. The computation grows only linearly with the dimension of the problem and as the square of the number of data points. This method can be used to estimate the number of clusters and their locations for initializing iterative optimization-based clustering algorithms such as fuzzy C-means, or it can be used as a stand-alone approximate clustering algorithm.

Combining the cluster estimation method with a linear least-squares estimation procedure provides a fast and robust algorithm for identifying fuzzy models from numerical data. Fast computation and robustness with respect to initial parameter values are achieved by avoiding any form of nonlinear optimization. Robustness with respect to noisy data is achieved by the data averaging that takes place in both the cluster estimation and least-squares estimation procedures. The cluster center selection criteria also avoid engendering a rule from a few erroneous outlying data points. Although there exist even simpler and faster model identification methods based on look-up tables (Wang and Mendel, 1992) and nearest neighbor clustering (Wang, 1993), they are sensitive to noisy data and prone to generating a rule from a single outlying data point. Compared with more complex model identification methods, our method can provide a similar degree of accuracy and robustness with respect to noisy data while significantly reducing computational complexity.
The author thanks Jyh-Shing Jang at MathWorks Inc.
for providing the data set for the Mackey-Glass
benchmark problem.
APPENDIX A: MACKEY-GLASS EXAMPLE
[The input coordinates of the 25 cluster centers obtained in the Mackey-Glass example, y_i^* for i = 1, ..., 25, and the corresponding output equations z_i^* = G_i y + h_i are tabulated here.]

In each output equation, y is a column vector of the input values [x(t-18), x(t-12), x(t-6), x(t)].
APPENDIX B: TRIP GENERATION MODELING
The input coordinates of the 3 cluster centers obtained for the trip generation model are

    y_1^* = [1.5070  0.6570  0.7060  15.7350  0.6360]
    y_2^* = [3.1160  1.1930  1.4870  19.7330  0.6030]
    y_3^* = [0.0440  0.0240  0.0210   9.3400  0.8500]

[The corresponding output equations z_i^* = G_i y + h_i are tabulated here.]

In each output equation, y is a column vector of the input values [population, number of dwelling units, vehicle ownership, median income, total employment]. The trip generation data are given in Table B-1. All numbers are expressed in units of a thousand.
Table B-1. Trip Generation Data

[75 training and 25 checking data points are tabulated here; the columns are population, dwelling units, vehicle ownership, median household income, total employment, and number of trips, all in thousands.]

REFERENCES
Astrom K, Wittenmark B (1984): Computer Controlled Systems: Theory and Design. Englewood Cliffs, NJ: Prentice Hall.

Bezdek J (1974): "Cluster validity with fuzzy sets." J. Cybernetics 3(3): 58-73.

Bezdek J, Hathaway R, Sabin M, Tucker W (1987): "Convergence theory for fuzzy c-means: Counterexamples and repairs." The Analysis of Fuzzy Information, Bezdek J (ed). CRC Press, Vol. 3, Chap. 8.

Crowder RS (1990): "Predicting the Mackey-Glass time series with cascade-correlation learning." In Proc. 1990 Connectionist Models Summer School, Carnegie Mellon University, pp. 117-123.

Dunn J (1974): "A fuzzy relative of the ISODATA process and its use in detecting compact, well separated clusters." J. Cybernetics 3(3): 32-57.

Jang JSR (1993): "ANFIS: Adaptive-network-based fuzzy inference system." IEEE Trans. on Systems, Man & Cybernetics 23(3): 665-685.

Kikuchi S, Nanda R, Perincherry V (1994): "Estimation of trip generation using the fuzzy regression method." 1994 Annual Meeting of Transportation Research Board, Washington, D.C.

Mackey M, Glass L (1977): "Oscillation and chaos in physiological control systems." Science 197: 287-289.

Powell MJD (1987): "Radial basis functions for multivariable interpolation: A review." In Algorithms for Approximation, Mason J, Cox M (eds), Oxford: Clarendon Press, pp. 143-167.

Strobach P (1990): Linear Prediction Theory: A Mathematical Basis for Adaptive Systems. New York: Springer-Verlag.

Sugeno M, Tanaka K (1991): "Successive identification of a fuzzy model and its applications to prediction of a complex system." Fuzzy Sets and Systems 42(3): 315-334.

Takagi T, Sugeno M (1985): "Fuzzy identification of systems and its application to modeling and control." IEEE Trans. on Systems, Man & Cybernetics 15: 116-132.

Wang LX, Mendel JM (1992): "Generating fuzzy rules by learning from examples." IEEE Trans. on Systems, Man & Cybernetics 22(6): 1414-1427.

Wang LX (1993): "Training of fuzzy logic systems using nearest neighborhood clustering." Proc. 2nd IEEE Int'l Conf. on Fuzzy Systems (FUZZ-IEEE), San Francisco, CA, pp. 13-17.

Yager RR, Filev DP (1992): "Approximate clustering via the mountain method." Tech. Report #MII-1305, Machine Intelligence Institute, Iona College, New Rochelle, NY. Also to appear in IEEE Trans. on Systems, Man & Cybernetics.

Yager RR, Filev DP (1993): "Learning of fuzzy rules by mountain clustering." Proc. SPIE Conf. on Applications of Fuzzy Logic Technology, Boston, MA, pp. 246-254.
Received January 1994
Accepted June 1994