
Information & Management 24 (1993) 159-167
North-Holland

Applications

Forecasting with neural networks
An application using bankruptcy data
Desmond Fletcher
The University of Southern Mississippi, Hattiesburg, MS, USA

Ernie Goss
Creighton University, Omaha, NE, USA

In the business environment, least-squares estimation has long been the principal statistical method for forecasting a variable from available data, with the logit regression model emerging as the principal methodology where the dependent variable is binary. Due to rapid hardware and software innovations, neural networks can now improve over the usual logit prediction model and provide a robust and less computationally demanding alternative to nonlinear regression methods. In this research, a back-propagation neural network methodology has been applied to a sample of bankrupt and non-bankrupt firms. Results indicate that this technique more accurately predicts bankruptcy than the logit model. The methodology represents a new paradigm in the investigation of causal relationships in data and offers promising results.

Keywords: Neural networks; Logistic regression; Forecasting models; Bankruptcy prediction; Neural network forecasting; Back-propagation; Cross-validation.

Desmond Fletcher is an Assistant Professor of Engineering Technology at The University of Southern Mississippi. He received his Master of Architecture degree from The University of Texas at Austin in 1978. Prior to his academic appointment, Mr. Fletcher was president of Earthforms Construction Co., Inc., in central Texas. Current research involves computer augmented analysis and design related to construction and architecture. Mr. Fletcher is Project Director for the Mississippi Residential Radon Survey.

Ernie Goss is currently the Jack MacAllister Chair in Regional Economics at Creighton University. Dr. Goss was formerly Professor of Management Information Systems at the University of Southern Mississippi and 1990 and 1991 NASA Faculty Research Fellow at Marshall Space Flight Center. He received his Ph.D. in Economics from The University of Tennessee in 1983. He has published over fifty research studies focusing primarily on methods of data analysis. He is a member of the Editorial Board of The Review of Regional Studies.

Correspondence to: Desmond Fletcher, M. Arch., Assistant Professor, School of Engineering Technology, The University of Southern Mississippi, SS Box 5137, Hattiesburg, Mississippi 39406, USA. Tel. 601-266-5185, Fax 601-266-5829.

0378-7206/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved

Introduction

Forecasting bankruptcy from financial ratios is one of the more common probability models encountered and represents one of the earliest attempts to use logit regression to estimate probability models in the business world [1]. Recent empirical research has shown the logit model to be viable for predicting business failure or bankruptcy [3]. Despite the popularity of the logit model, it is our hypothesis that artificial neural networks (ANNs), due to their generalization and abstract mapping capacity, offer forecasting capabilities superior to the logit model.

As an outgrowth of stochastic approximation methodologies, ANNs have been in the development stage for several decades and have been successfully applied to problems ranging from pattern recognition, process control, and image enhancement to data classification and forecasting. These successes in a broad range of fields can be attributed to the development of new ANN mechanisms during the "second wave of artificial intelligence" [13]. Having been previously limited to linearly separable data during the 'perceptron' phase, the advent within the last decade of network mechanisms such as back-propagation of error has brought a tremendous



resurgence of interest in neural networks as a viable alternative to the usual linear regression methods and the more computationally demanding nonlinear regression methods [19].

In the business environment, ANNs have been developed for a variety of applications such as proprietary stock market forecasting, corporate bond rating prediction [6] [17], emulation of mortgage underwriting judgements [5], and systems development projects [11]. For example, CREDITVIEW, a hybrid neural network, is used to reduce losses on loans made to public and private corporations. Developed by Chase Manhattan Bank, this model performs three-year forecasts that indicate the likelihood of a company being given a risk assignment of good, problematic, or charged-off. CREDITVIEW uses historical financial data on good and bad obligors, along with industry norms from COMPUSTAT, to formulate forecasts [14].

The objective of this study is to illustrate the development of a forecasting model using a particular class of ANNs (Back-Propagation Neural Networks, or BPNNs) in a standard business application. First, the optimal generalizing BPNN model (indexed by λ) is determined from available historical financial data using a v-fold cross-validation technique to estimate prediction risk. Prediction risk is used here as a measure of the expected performance of an estimator on previously unseen data. Second, the results of the BPNN(λ) are compared to the standard logit regression (LR), a widely used prediction and classification methodology. This comparison is made in terms of (1) minimization of prediction risk, (2) efficiency of the estimator, and (3) maximization of the correctness ratios for unseen test data. And finally, a family of mean risk curves is generated from the optimal BPNN(λ) to illustrate forecasting empirical probabilities of bankruptcy.

Description of application

A typical business forecasting application involves the binary "go, no-go" decision. An example might include the make-or-buy decision, with the dependent variable Y represented as 0 or 1. The dependent variable, whether to make (0) or buy (1), is a function of independent variables such as the cost of capital and other specific opportunity costs. The accuracy of the estimated dependent variable Ŷ, and therefore the risk of making a decision, is a function of the amount of information contained within the independent variables and the degree to which the estimator can generalize across the domain of the problem for new data.

Table 1
Sample of failed and non-failed firms.

ID    Failed (1)                CR    QR     IR      ID    Non-failed (0)     CR    QR     IR
1-1   Westates Petroleum        1.39  0.67   0.34    0-1   Universal          6.43  4.49   0.19
1-2   Cott Corp.                2.00  0.27   0.02    0-2   MEI Corp.          1.29  0.33   0.98
1-3   American Mfg. Co.         3.20  0.73   0.93    0-3   Gaynor-Stafford    5.20  0.78  -0.20
1-4   Scottex Corp.             1.59  0.09  -0.33    0-4   Compo Ind.         3.77  0.32  -0.09
1-5   Lynnwear                  1.70  0.24   0.09    0-5   Movie Star Inc.    2.80  0.23   0.13
1-6   Nelly Don, Inc.           1.70  0.15   0.13    0-6   Decorator Ind.     3.18  0.57   0.02
1-7   Mansfield Tire & Rubber   2.50  0.09   0.06    0-7   Pope & Talbot      1.90  0.15   0.09
1-8   Brody Seating Co.         2.70  0.14   0.01    0-8   Ohio-Sealy         2.60  0.97   0.35
1-9   Paterson Parchment Paper  2.41  0.09  -0.04    0-9   Clevepak Corp.     3.28  0.30   0.42
1-10  Rowland Inc.              1.73  0.08   0.02    0-10  Park Chemical      4.91  1.15   0.15
1-11  Pasco Inc.                1.51  0.12   0.16    0-11  Holly Corp.        1.55  0.11   0.37
1-12  RAI Inc.                  1.16  0.01   0.09    0-12  Barry (R.G.)       3.00  0.38   0.15
1-13  Gray Mfg. Co.             3.31  0.49   0.11    0-13  Struthers Wells    1.61  0.21   0.39
1-14  Gladding Corp.            2.08  0.04   0.07    0-14  Watkins-Johnson    4.01  0.09   0.19
1-15  Merchants, Inc.           2.73  0.35   0.23    0-15  Banner Ind.        2.30  0.32   0.19
1-16  Shulman Transport         1.13  0.13   0.22    0-16  WTC Inc.           1.17  0.33   0.85
1-17  Reeves Telecom Corp.      3.20  0.63   0.20    0-17  Gross Telecasting  8.80  6.91   0.25
1-18  Plaza Group Inc.          0.91  0.03  -1.09    0-18  Total Petroleum    2.15  0.25   0.33

Typically, a risk cutoff value is selected by the analyst and for categorization purposes is normally between 0.5 and 1. All observations with predicted risks equal to or above this value are categorized as 1 (failed), and as 0 (non-failed) when less than this value. The choice of the cutoff value depends on the relative cost of incorrectly categorizing an observation as a 0 when it is not, versus the converse.
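To make the categorization rule concrete, here is a minimal Python sketch; the function name and the sample risks are illustrative rather than drawn from the study:

import numpy as np

def categorize(predicted_risk, cutoff=0.5):
    """Label an observation 1 (failed) when its predicted risk is at
    or above the cutoff, and 0 (non-failed) otherwise."""
    return (np.asarray(predicted_risk) >= cutoff).astype(int)

# Illustrative risks only. Lowering the cutoff flags more firms as
# failing, trading false alarms against missed bankruptcies.
risks = [0.12, 0.48, 0.51, 0.97]
print(categorize(risks, cutoff=0.50))   # [0 0 1 1]
print(categorize(risks, cutoff=0.25))   # [0 1 1 1]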
This binary decision approach is also used to develop an estimator to forecast bankruptcy. In this application, the independent variables are financial ratios as described below. The dependent variable is determined to be 0 for a non-failed firm and 1 for a failed firm. If the information contained within the independent variables is sufficient, the estimated dependent variable Ŷ then represents an empirical probability, in the range of 0 to 1, of the event occurring. Furthermore, since determining a firm's trend toward bankruptcy is of more interest than categorizing bankruptcy ex post, we are concerned with the accuracy of the model in risk categories less than 0.5.

Data for the empirical tests were drawn from an earlier study by Gentry et al. [8]. In their sample selection, failed companies were matched with a sample of non-failed companies that were in the same industries and approximately the same size in terms of total assets. Additionally, in order to control for general economic conditions, the time frames for the failed and non-failed firms were matched [12]. After the deletion of firms with incomplete data, there were 18 bankrupt firms. Each of the 18 failed companies was then matched with a non-failed firm based on asset size and sales for the fiscal year previous to their bankruptcy.

A listing of the 36 companies used for the empirical analysis is presented in Table 1. The current ratio (CR), quick ratio (QR) and income ratio (IR) of each firm are also presented in the table and defined as follows:

Dependent Variable (Y):
1 for those companies that failed and 0 for those that did not.

Independent (Explanatory) Variables:
(1) CR: current ratio [(current assets)/(current liabilities)].
(2) QR: quick ratio [(cash + other near-cash assets)/(current liabilities)].
(3) IR: income ratio [(net income)/(working capital)].

Model training and selection methodology

Due to the small sample size after matching, direct training-to-test set validation is not advisable. A variation of the cross-validation method known as v-fold cross-validation (CV_v) was selected to conduct analyses of the models. This method was introduced by Geisser [7] and Wahba and Wold [18] and is illustrated in a case study of corporate bond rating predictions by Utans and Moody [17]. The selection of the optimal model(λ) architecture is based on an estimator CV_v of the prediction risk P_λ, which is a measure of the generalization ability of model(λ). Using this approach, the data is divided into v = 18 subsets of six observations for each test set, with three rotationally selected from the failed group and three from the non-failed group, for a total of 108 observations in the test sets. This also provides 18 training sets of 30 observations. Each observation is represented three times in the test sets. The cross-validation mean square error of each subset j is defined by

CV_j(\lambda) = \frac{1}{6} \sum_{k} \left( t_k - \hat{\mu}_{\lambda}^{(j)}(x_k) \right)^2,   (1)

where t_k is the actual targeted dependent variable for a given observation in the subset and \hat{\mu}_{\lambda}^{(j)}(x_k) is the expected value generated by the approximation function. The prediction risk is then defined for each model(λ) by

\hat{P}(\lambda) = \frac{1}{v} \sum_{j=1}^{v} CV_j(\lambda).   (2)
As with the LR model, the output (or predicted dependent variable Ŷ) of the BPNN can be constrained to values from 0 to 1. The network is composed of an input layer, a hidden layer and an output layer of nodes (also known as neurons due to biological similarities). Information processing is performed through modification of the connection weights (W_λ) as normalized observation patterns are passed along connections from the input through the hidden to the output layer.

[Figure 1 here: schematic of the back-propagation neural network, with an input layer (C-ratio, Q-ratio, I-ratio), bias nodes, a hidden layer, and an output layer at which the error signal is formed.]

Fig. 1. Diagram of empirical neural network model.

This distinction between layers can be traced to Rosenblatt's [15] early work, which divided networks into sensory, associative and response units. In the BPNNs, the input nodes process the independent variables while the output node processes the dependent variable. The input layer distributes the patterns throughout the network and the output layer generates an appropriate response. The middle layer of nodes acts as a collection of associative feature detectors and is termed 'hidden' because it does not directly process information to or from the user.
[Figure 2 here: test set error plotted against training events (thousands).]

Fig. 2. Minimum error in test set vs. learning events.



The state of each node is determined by signals sent to it from all connected nodes. These signals are biased by the value of the connection weights W_λ between nodes. Appendix 1 contains a summary of the mathematical derivation of the BPNN signals between layers as developed in 1986 by Rumelhart et al. [16].

Figure 1 displays the configuration of the optimal BPNN(λ) developed using the commercial package NeuroShell 4.1. The BPNN model is a single hidden layer feed-forward network which implements an error back-propagation methodology. Back-propagation permits the connection weights W_λ between nodes to be modified in a supervised fashion using gradient descent to minimize the error function. In supervised learning models, known pattern pairs of target outputs and neural network outputs are repeatedly presented to the network to adjust the network W_λ.

Kolmogorov's Mapping Neural Network Existence Theorem [9] states that any continuous function can be implemented with the network structure described above using 2n + 1 hidden nodes, where n represents the number of input nodes. This is also the recommendation of Caudill [4]. However, in some cases this may lead to fitting (or memorizing) the training set too well, resulting in poor generalization capabilities. In practice, the number of hidden nodes for optimal generalization should be tested in a range from approximately 2√n + m to the value 2n + 1, where m represents the number of output nodes. Therefore, for the empirical estimations, five BPNN models were developed using three input nodes, each corresponding to an independent variable, one output node, representing the bankruptcy risk index, and hidden nodes ranging from three to seven, respectively. Although neural network research is still in its infancy, this approach appears to provide a principled mechanism for determining the optimal network architecture, as the following results will verify.
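A small sketch of this heuristic search range; the helper name is illustrative:

import math

def hidden_node_candidates(n_inputs, n_outputs):
    """Hidden-layer sizes from about 2*sqrt(n) + m up to 2n + 1."""
    low = round(2 * math.sqrt(n_inputs) + n_outputs)
    high = 2 * n_inputs + 1
    return list(range(low, high + 1))

# Three ratios in, one risk index out: candidates 4 through 7. The
# study also fit a 3-hidden-node model for comparison.
print(hidden_node_candidates(3, 1))  # [4, 5, 6, 7]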
Each of the BPNN(λ) and LR(λ) were then trained with α = 0 and η = 0.01. In NeuroShell, a real-time comparison is maintained between the minimum errors of the training sets and hold-out test sets. Figure 2 illustrates a segment of the BPNN learning process as the test set classification error decreases to a minimum error value for one of the test sets at approximately 737,000 learning events. After this optimal point, the classification errors of the test set increase as memorization of the training set begins, generalizing capability declines, and the BPNN is less able to estimate the dependent variable from previously "unseen" data. NeuroShell automatically retains the optimal network W_λ when the minimum mean square error in the test set has been reached.
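The retain-the-best-weights logic can be sketched generically. The `train_step` and `test_error` callables and the `net` object exposing its weights are assumptions of this illustration, not NeuroShell's interface:

import copy

def train_with_holdout(net, train_step, test_error, max_events=1_000_000):
    """Track hold-out error during training and keep the weights that
    minimized it, mirroring the stopping behavior shown in Fig. 2."""
    best_err = float("inf")
    best_weights = copy.deepcopy(net.weights)
    for _ in range(max_events):
        train_step(net)                 # one learning event
        err = test_error(net)           # mean squared error on the test set
        if err < best_err:              # still generalizing, not memorizing
            best_err = err
            best_weights = copy.deepcopy(net.weights)
    net.weights = best_weights          # roll back past the optimum
    return net, best_err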
Empirical results

The five BPNN(λ) were then compared to the output of the LR(λ) for each of the 108 test set observations in terms of training efficiency, percent correct per risk category, and model efficiency as determined by the estimated prediction risk (CV_v) and the variance of the errors. These results are presented in Table 2. Results for all of the test sets were obtained using an 80486 microprocessor running at approximately 25 megahertz.

Table 2
Model performances.

Category                                    LR     3 HN   4 HN   5 HN   6 HN   7 HN
Percent Correctly Predicted                 71.3   80.5   82.4   75.0   74.1   75.0
No. of coefficients (W_λ)                   4      11     16     21     26     31
Mean Training Time (minutes)                0.3    12.6   9.8    14.8   13.8   18.8
Mean Learning Events (thousands)            NA     626    354    467    483    597
Mean Learning Events per Minute (thousands) NA     46.9   35.9   31.6   34.8   31.4
Training Efficiency Indicator               NA     3.70   3.66   2.13   2.52   1.67
Variance of error                           0.04   0.02   0.02   0.02   0.019  0.02
Cross-Validation Prediction Risk CV_v       0.189  0.146  0.143  0.164  0.165  0.165

Mean training times and the number of learning events completed for the BPNN(λ) above 4HN generally increased with model complexity (W_λ), and training efficiency decreased. In all cases, training the BPNN(λ) represents a larger development cost than with the logit model. However, as indicated by the neural network training efficiencies (ratios of the average number of learning events per minute to the mean training time), the 3HN and 4HN models are the most efficient at approximately 3.7. In terms of neural network development cost, the BPNN(4HN) is the most desirable with the least training time.

The BPNN(4HN) model also most accurately predicted previously unseen observations from the test sets: the total is 82.4 percent at a risk cutoff value of 0.5. All of the models, including the LR(λ), predicted approximately 77 percent of the 54 unseen test set observations targeted as 1. However, of the 54 unseen test set observations targeted as 0, the BPNN(4HN) surpasses all other models with 89 percent accurately predicted.

The most statistically efficient model is determined from the variance of errors and the prediction risk CV_v. As can be seen from Table 2, the variance of the errors for all of the BPNN(λ) is less than for the LR(λ). This indicates that the logit model is a less efficient estimator. Furthermore, of the BPNN(λ), the model with 4HN has the least value for CV_v. As the number of hidden nodes increases above 4, abstract mapping capabilities appear to decrease and the CV_v for the BPNN(λ) approaches that of the LR(λ). The neural network model with 3HN does not appear to be able to extract as much information from the independent variables. In terms of prediction risk, model efficiency, and total percent correct, the BPNN(4HN) performed better than all other models and is selected as the optimal BPNN. Also, with respect to development cost, slow learning rates are critical to minimize the variance of the errors and therefore increase the efficiency.
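For reference, the logit benchmark, with its four coefficients (an intercept plus the three ratios), can be fitted with standard tools. A sketch using statsmodels, with random placeholder data standing in for the Table 1 values:

import numpy as np
import statsmodels.api as sm

# Placeholder data keeps the sketch runnable; the study's values come
# from Table 1 (36 firms, ratios CR, QR, IR, targets 0/1).
rng = np.random.default_rng(0)
ratios = rng.uniform(0.0, 3.0, size=(36, 3))
failed = rng.integers(0, 2, size=36)

X = sm.add_constant(ratios)          # intercept + CR + QR + IR = 4 coefficients
logit = sm.Logit(failed, X).fit(disp=0)
risk = logit.predict(X)              # estimated bankruptcy probabilities
print(np.round(logit.params, 3))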
The bar chart in Figure 3 illustrates the comparison of the percentages accurately predicted by the BPNN(4HN) versus the LR(λ) in a range of risk categories with cutoff values from 0.25 to 0.75. The BPNN(4HN) also predicts a higher percentage of non-failed firms at risk, virtually independent of risk category.

A family of risk curves (at the 0.5 cutoff value) was then generated from the optimal BPNN(4HN) to illustrate the potential use of neural networks in forecasting bankruptcy and to graphically assess the impact of each independent variable on Ŷ.

[Figure 3 here: bar chart of percent correctly predicted per risk category; legend: BPNN(4HN) vs. LOGIT.]

Fig. 3. Comparison of percent correct per risk category.



[Figure 4 here: critical curves plotted against the quick ratio (0 to 1) for current ratios of 1.61, 1.93, 2.25, 2.56, 2.82, and 3.20.]

Fig. 4. Critical curves for range of current ratio at 0.5 cutoff.

NeuroShell provides a function which calculates the contributions each of the independent variables makes to the modification of W_λ. The CR, IR, and QR contributed 43.3, 45.4, and 12.3 percent, respectively. This relationship can be readily seen in Figure 4. For a range of CR (1.61 to 3.20), the curves represent the estimated 0.5 cutoff value with respect to the IR and QR. Firms with low current ratios are less tolerant to changes in QR, and there is a significant change in the y-intercept as IR increases.
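Given any fitted risk estimator that is monotone in the income ratio, such critical curves can be traced numerically. In this sketch the bisection helper and the `model_risk(cr, qr, ir)` signature are assumptions of the illustration:

import numpy as np

def critical_ir(model_risk, cr, qr, lo=-1.5, hi=1.5, tol=1e-4, cutoff=0.5):
    """Bisect for the income ratio at which predicted risk equals the
    cutoff, assuming risk decreases as the income ratio rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if model_risk(cr, qr, mid) > cutoff:
            lo = mid            # risk still too high: raise the income ratio
        else:
            hi = mid
    return (lo + hi) / 2.0

def risk_curves(model_risk, current_ratios=(1.61, 1.93, 2.25, 2.56, 2.82, 3.20)):
    """One critical curve per current ratio, swept over the quick ratio."""
    qr_grid = np.linspace(0.0, 1.0, 11)
    return {cr: [critical_ir(model_risk, cr, qr) for qr in qr_grid]
            for cr in current_ratios}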
Summary

These results correspond closely to other studies which have likewise found neural networks better at extracting information from attributes for forecasting purposes. In a neural network application examining bond rating, Dutta and Shekhar [6] show how a neural network is able to forecast more accurately than standard regression. However, their approach did not use a comparison with logit regression as in this research.

As presented in Figure 3 and Table 2, the neural network outperforms the logit function, with the BPNN(4HN) selected as the most efficient predictor. In terms of forecasting capability, not only does this model more accurately predict a higher percentage of firms in the test sets, the BPNN(4HN) has less variance in the errors and lower prediction risk as determined by CV_v. This means that the BPNN(4HN) is more statistically efficient and will provide more accurate forecasts in the population. Also, the results indicate that 2√n + m hidden nodes is the most appropriate for this application.

Out of the thirty-six firms tested, four were not correctly predicted by any model. This suggests missing explanatory variables for the models(λ). For this reason the BPNN(4HN) is not regarded as a completed production model. Yet, this research begins to bridge the gap between pure statistical methods and neural networking. It shows neural networks to be a viable alternative to more traditional methods of estimating causal relationships in data and offers a new paradigm of computational capabilities to the business

practitioner. The pattern recognition and generalization capabilities of neural networks can enhance decision making in cases where the dependent variable is binary and available data is limited. With over 16,000 neural network systems purchased to date and growth expected to exceed 20% per year [2], business practitioners can expect to see intensified research efforts and benefit from the implementation of such applications as are presented in this paper.

References

[1] E. Altman, "Financial Ratios, Discriminant Analysis, and the Prediction of Corporate Bankruptcy," Journal of Finance, September (1968), pp. 589-609.
[2] D. Bailey and D. Thompson, "How to Develop Neural-Network Applications," AI Expert, June (1990), pp. 38-47.
[3] R. BarNiv and R. Hershbarger, "Classifying Financial Distress in the Life Insurance Industry," The Journal of Risk and Insurance, Spring (1990), pp. 110-135.
[4] M. Caudill, "Neural Network Training Tips and Techniques," AI Expert, January (1991).
[5] E. Collins, S. Ghosh and C. Scofield, "An Application of a Multiple Neural Network Learning System to Emulation of Mortgage Underwriting Judgements," Working Paper, Nestor, Inc., 1 Richmond Square, Providence, RI, 1989.
[6] S. Dutta and S. Shekhar, "Bond Rating: A Non-Conservative Application of Neural Networks," Proceedings of the IEEE International Conference on Neural Networks, Vol. II, San Diego, CA (1988), pp. 443-450.
[7] S. Geisser, "The Predictive Sample Reuse Method with Applications," Journal of the American Statistical Association, 70(350), June 1975.
[8] J. Gentry, P. Newbold and D. Whitford, "Classifying Bankrupt Firms with Funds Flow Components," Journal of Accounting Research, Vol. 23(1), Spring (1985), pp. 146-160.
[9] R. Hecht-Nielsen, Neurocomputing, Addison-Wesley Co., New York, 1989.
[10] D. Hillman, "Integrating Neural Nets and Expert Systems," AI Expert, June (1990), pp. 54-59.
[11] D. Hillman, "AUBREY: A Custom Expert System Environment in LISP," AI Expert, January (1990), pp. 34-39.
[12] H.G. Hunt and J.K. Ord, "Matched Pair Discrimination: Methodology and an Investigation of Corporate Accounting Policies," Decision Sciences, Vol. 19(2), Spring (1988), pp. 373-382.
[13] D. Levine, "The Third Wave in Neural Networks," AI Expert, December (1990), pp. 27-31.
[14] R. Marose, "A Financial Neural-Network Application," AI Expert, May (1990), pp. 50-53.
[15] F. Rosenblatt, Principles of Neurodynamics, Spartan Books, Washington, DC, 1962.
[16] D.E. Rumelhart, G.E. Hinton and R.J. Williams, "Learning Internal Representations by Error Propagation," in D.E. Rumelhart and J.L. McClelland (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, 1986, pp. 318-362.
[17] J. Utans and J. Moody, "Selecting Neural Network Architectures via the Prediction Risk: Application to Corporate Bond Rating Prediction," Proceedings: First International Conference on Artificial Intelligence Applications on Wall Street, IEEE Computer Society Press, Los Alamitos, CA, 1991.
[18] G. Wahba and S. Wold, "A Completely Automatic French Curve: Fitting Spline Functions by Cross-Validation," Communications in Statistics, 4(1): 1-17, 1975.
[19] H. White, "Neural Network Learning and Statistics," AI Expert, December (1989), pp. 48-52.
[20] B. Widrow and M.E. Hoff, "Adaptive Switching Circuits," IRE WESCON Convention Record, New York, 1960, pp. 96-104.
Appendix 1
The feed-forward process

In a back-propagation neural network, as developed by Rumelhart et al. [16], independent variable patterns, or values, are normalized between zero and one at the input layer to produce the signal O_i prior to presentation to the hidden node layer. Each connection between the input layer and a hidden node has an associated weight W_ij. The net signal Z_j to an individual hidden node is expressed as the sum of all connections between the input layer nodes and that particular hidden node, plus the connection value W_Bj from a bias node. This relationship may be expressed as:

Z_j = \sum_i W_{ij} O_i + W_{Bj}.   (3)

The signal from the hidden layer is then processed with a sigmoid function which again normalizes the values between 0 and 1 to produce O_j prior to being sent to the output layer. The normalization procedure is performed according to:

O_j = \frac{1}{1 + \exp(-Z_j)}.   (4)

The net signal to an output node, Z_k, is the sum of all connections between the hidden layer nodes and the respective output node, expressed as:

Z_k = \sum_j W_{jk} O_j + W_{Bk},   (5)

where W_Bk represents a single connection weight from a bias node with a value of 1 to the output layer.

The net signal is again normalized with the sigmoid function to produce the final output value O_k, where

O_k = \frac{1}{1 + \exp(-Z_k)}.   (6)

In terms of bankruptcy modeling, O_k represents the risk of bankruptcy of the individual firm.
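Equations (3) through (6) amount to the following forward pass. A minimal NumPy sketch in the appendix's notation; the array shapes are assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(o_i, W_ij, w_bj, W_jk, w_bk):
    """Map normalized inputs o_i (in [0, 1]) to a bankruptcy risk O_k.

    W_ij: (inputs, hidden) weights; w_bj: hidden-layer bias weights;
    W_jk: (hidden, outputs) weights; w_bk: output bias weight.
    """
    z_j = o_i @ W_ij + w_bj          # eq. (3): net signal to hidden nodes
    o_j = sigmoid(z_j)               # eq. (4): hidden-layer output
    z_k = o_j @ W_jk + w_bk          # eq. (5): net signal to the output node
    o_k = sigmoid(z_k)               # eq. (6): estimated risk of bankruptcy
    return o_k, o_j                  # o_j is kept for back-propagation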
The process of error back-propagation

At the output layer the net signal O_k (the estimated dependent variable) is compared to the actual value of the dependent variable, T_k, to produce an error signal which is propagated back through the network. NeuroShell 4.1 implements a variant of the Widrow/Hoff [20] or "least mean square" learning rule known as the Generalized Delta Rule, where output layer error signals are propagated back through the network to perform the appropriate weight adjustments after each pattern presentation [9]. Rumelhart et al. [16] describe the process of weight adjustment by:

\Delta W_{jk}(n+1) = \eta \delta_{pk} O_{pj} + \alpha \Delta W_{jk}(n),   (7)

where η is a learning coefficient and α is a "momentum" factor. The momentum factor determines the effect of past weight changes on the current direction of movement in weight space and proportions the amount of the last weight change to be added into the new weight change.

The error signal δ back-propagated to the connection weights between the hidden and output layers is defined from the difference between the target value T_pk for a particular input pattern p and the neural network's feed-forward calculation of the signal from the output layer, O_pk:

\delta_{pk} = (T_{pk} - O_{pk}) O_{pk} (1 - O_{pk}).   (8)

The connection weights between the input and hidden layers are then adjusted using the hidden-layer error signal:

\delta_{pj} = O_{pj} (1 - O_{pj}) \sum_k \delta_{pk} W_{jk}.   (9)

The data feed-forward and error back-propagation process is continued until a "stopping" point is reached. This point is determined by comparing the errors in the training set and the test set. This methodology prevents "overlearning," or fitting the training set too closely, with consequent large errors in the test set.
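Continuing the feed-forward sketch above (and reusing its `feed_forward`), one pattern presentation under equations (7) through (9) might look as follows; the `state` dictionary carrying the previous weight changes for the momentum term is an assumption of the sketch:

import numpy as np

def backprop_step(o_i, t_k, W_ij, w_bj, W_jk, w_bk, state, eta=0.01, alpha=0.0):
    """Apply eqs. (7)-(9) for one input pattern, updating weights in place.

    `state` maps each weight name to its previous change (arrays of the
    same shapes, initialized to zeros) for the momentum term of eq. (7).
    """
    o_k, o_j = feed_forward(o_i, W_ij, w_bj, W_jk, w_bk)
    d_k = (t_k - o_k) * o_k * (1.0 - o_k)        # eq. (8): output error signal
    d_j = o_j * (1.0 - o_j) * (W_jk @ d_k)       # eq. (9): hidden error signal
    grads = {"W_jk": np.outer(o_j, d_k), "w_bk": d_k,
             "W_ij": np.outer(o_i, d_j), "w_bj": d_j}
    for name, grad in grads.items():             # eq. (7): delta rule + momentum
        state[name] = eta * grad + alpha * state[name]
    W_jk += state["W_jk"]; w_bk += state["w_bk"]
    W_ij += state["W_ij"]; w_bj += state["w_bj"]
    return o_k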
