3 Genetic Algorithm and Artificial Neural Network
Optimization consists of studying different aspects of an initial idea and using the gained information to improve it. A computer is a perfect tool for optimization when the factors influencing the idea can be expressed in a format a computer can read. The term "best solution" in optimization implies that there is more than one solution and that the solutions are not of equal value.
A Genetic Algorithm (GA) is a high-level, search-based optimization technique inspired by genetics and natural selection; it belongs to the much larger branch of computation known as evolutionary algorithms [140]. The search principle of a GA is based on Darwin's theory of evolution [31].
A GA performs a randomized search in a complex landscape. One general principle when implementing the algorithm for a specific problem is to strike a proper balance between exploration and exploitation of the search space. To reach this aim, all operators of the GA should be examined carefully [101].
In a GA, there is a pool of candidate solutions (called individuals) to a given problem, which is evolved toward better solutions. The set of properties of each candidate solution is called a chromosome. A chromosome is composed of genes, whose values can be numerical, binary, symbolic or character-based, depending on the problem to be solved. The output is generated by a minimizing function from the set of properties of each candidate solution (a chromosome).
The fitness function can be an experimental result or a mathematical function. It calculates the difference between the desired and the calculated output. Therefore, determining a proper fitness function and recognizing the most important input variables is very important. In a GA, the aim is typically to minimize the output of the fitness function [90].
An attempt has to be made to select an optimal size for the initial population. A population that is too small will not allow sufficient room to explore the search space effectively, while a population that is too large increases the computational cost. Therefore, the population size should be selected based on the complexity of the fitness function, the computational cost, memory, and time.
As an example, the application of a genetic algorithm to vibration isolation with a single-wall trench is shown below. The aim is to find the parameters of a rectangular trench that give the highest efficiency. The parameters considered for optimization are the location (X), depth (D), width (W) and length (L) of the trench. Each parameter of the trench is defined as a gene, which is generated randomly within the ranges defined in Table 3.1. Each chromosome consists of 4 genes, which are the parameters of the trench, and the population is the set of all chromosomes.
Figure 3.1 Example of gene, chromosome and population in vibration isolation problem
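A minimal Python sketch of this representation is given below; the parameter ranges are placeholders, since the actual bounds are those of Table 3.1 and are not reproduced here.

```python
import random

# Placeholder ranges for the trench parameters X, D, W, L (the real bounds come from Table 3.1)
RANGES = {"X": (1.0, 10.0), "D": (0.5, 5.0), "W": (0.1, 1.0), "L": (2.0, 20.0)}

def random_chromosome():
    """A chromosome is a list of 4 genes, one per trench parameter."""
    return [random.uniform(lo, hi) for lo, hi in RANGES.values()]

def random_population(size=10):
    """The population is the set of all chromosomes (here a list of 10, cf. Fig. 3.4)."""
    return [random_chromosome() for _ in range(size)]

population = random_population()
```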
Encoding is the process of representing the individual genes. One of the most important decisions to make while implementing a genetic algorithm is how to represent the solutions. Encoding can be performed using binary or floating (real-valued) methods. In the binary encoding representation, which is illustrated in Fig. 3.2, each chromosome is encoded as a bit string. Each bit in the string can represent some characteristic of the solution, and the whole string represents a number. Every bit string is a solution, but not necessarily the best solution.
An example of a real-valued (floating) chromosome: 0.5 0.2 0.6 0.8 0.7 0.4 0.3 0.2 0.1 0.9
The encoding method is selected depending on the problem to be solved. In the vibration isolation problem, floating encoding is used since all the parameters are real values and can also be decimal numbers.
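The following sketch contrasts the two encodings for a single trench parameter; the 8-bit resolution and the example range are assumptions made only for illustration.

```python
N_BITS = 8  # assumed resolution of the binary encoding

def encode_binary(value, lo, hi, n_bits=N_BITS):
    """Map a real value in [lo, hi] to a bit string of length n_bits."""
    level = round((value - lo) / (hi - lo) * (2 ** n_bits - 1))
    return format(level, f"0{n_bits}b")

def decode_binary(bits, lo, hi):
    """Map a bit string back to a real value in [lo, hi]."""
    return lo + int(bits, 2) / (2 ** len(bits) - 1) * (hi - lo)

depth = 3.2                                   # floating encoding stores the value itself
bits = encode_binary(depth, lo=0.5, hi=5.0)   # binary encoding: '10011001'
print(bits, decode_binary(bits, 0.5, 5.0))    # the value is recovered up to the bit resolution
```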
As explained, generating an initial population is one of the first steps in developing a GA model. For this purpose, an initial population of 10 chromosomes for the vibration isolation problem is generated randomly and illustrated in Fig. 3.4.
Figure 3.4 Initial population for the GA, generated randomly within the defined ranges
The goodness of a chromosome as a solution to the problem is evaluated by the fitness function. In a genetic algorithm, the chromosome and its solution are referred to as the genotype and the phenotype, respectively. The fitness value is calculated repeatedly in a GA, and therefore its evaluation should be sufficiently fast. In most cases, the fitness function and the objective function are the same, as the objective is either to maximize or to minimize the given objective function. However, for more complex problems with multiple objectives and constraints, the algorithm designer may choose a different fitness function.
The evaluation of the goodness of each chromosome in the vibration isolation problem is carried out through a finite element model (Plaxis). This means that the dimensions of the trench encoded in a chromosome are entered into the Plaxis model, and the efficiency of the trench corresponding to that chromosome is calculated. This process amounts to evaluating the goodness of different parameter sets. Fig. 3.5 presents the calculated efficiency of each chromosome of the initially generated population. As can be seen, different efficiencies are obtained for different chromosomes.
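The loop below sketches this evaluation step; `trench_efficiency` is only a dummy stand-in for the Plaxis finite element computation, which cannot be reproduced here.

```python
def trench_efficiency(chromosome):
    """Dummy stand-in for the Plaxis model: in the real workflow the trench
    dimensions (X, D, W, L) are entered into the FE model and the screening
    efficiency is read back."""
    X, D, W, L = chromosome
    return D * L / (1.0 + X + W)   # arbitrary expression, NOT the real efficiency

population = [[2.0, 1.5, 0.5, 10.0],   # each row: X, D, W, L of one candidate trench
              [4.0, 3.0, 0.8, 12.0]]
fitness = [trench_efficiency(c) for c in population]   # cf. Fig. 3.5
```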
Genetic operators are the heart of a genetic algorithm for guiding the algorithm
towards a solution to a given problem. Operators create new and fitter chromosomes
[79]. The three main operators are as follows:
1. Selection
2. Crossover
3. Mutation
Genetic operators are used to select the best solutions, called parents, that contribute to the population of the next generation (selection), to combine the selected parent solutions into children for the next generation (crossover), and to create and maintain random diversity in the population (mutation); the latter two steps are often called recombination.
Selection
Selection is the process of choosing two parents from the population to create a new population. Following the steps of encoding and evaluating the chromosomes with the fitness function, the next step is to decide how to perform the selection. A selection
operator aims to emphasize the fitter chromosomes in the population. Parents are selected from the initial population. According to Darwin's theory of evolution, the best chromosomes survive to create new offspring [113]. Selection is thus a procedure for picking chromosomes from the population according to the fitness function evaluation; a chromosome with a higher fitness value has a greater chance of being selected [102].
The process of selecting two parents from the population to apply the crossover can be classified into fitness-based selection and rank-based selection.
The tournament selection method is used to select the fittest candidates from the current population in the vibration isolation problem.
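A minimal sketch of tournament selection is shown below; the tournament size k = 3 and the example fitness values are assumptions, and maximization of the trench efficiency is assumed.

```python
import random

def tournament_select(population, fitness, k=3):
    """Draw k chromosomes at random and return the fittest of them
    (maximization of trench efficiency is assumed)."""
    contestants = random.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]

population = [[2.0, 1.5, 0.5, 10.0], [4.0, 3.0, 0.8, 12.0],
              [6.0, 2.0, 0.3, 8.0], [1.0, 4.0, 0.9, 15.0]]
fitness = [0.45, 0.70, 0.30, 0.62]              # illustrative efficiencies
parent = tournament_select(population, fitness)
```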
Crossover
A crossover operator is responsible for taking two parent solutions and producing offspring for the next generation, in order to explore a much wider area of the solution space and find the globally optimal solution to the problem. A crossover selects two or more chromosomes of the population as parents and reproduces one or more offspring by choosing genes from either of the chosen parents or from a combination of both parents.
The duty of a crossover operator is to share information between individuals: the features of two parent chromosomes are combined to reproduce two, ideally better, children. The crossover probability is a parameter that determines how often the crossover is performed [80].
The process of the crossover operator can be summarized as selecting parents, exchanging or combining their genes, and inserting the resulting children into the next generation. Several crossover methods exist; the uniform and the arithmetic crossover are described below.
In the uniform crossover method, a randomly generated binary string, called a mask, provides uniformity by controlling the swapping of genes between the parents. The values of the children are copied from the parents according to the bits of the mask: where there is a 1 in the crossover mask, the gene is copied from the first parent, and where there is a 0, the gene is copied from the second parent. Fig. 3.11 shows an example of the uniform crossover method.
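A short sketch of this mask-based exchange, assuming real-valued chromosomes and a randomly drawn 0/1 mask, could look as follows.

```python
import random

def uniform_crossover(parent1, parent2):
    """Uniform crossover: a random binary mask decides, gene by gene,
    from which parent each child copies (1 -> first parent, 0 -> second)."""
    mask = [random.randint(0, 1) for _ in parent1]
    child1 = [a if m == 1 else b for m, a, b in zip(mask, parent1, parent2)]
    child2 = [b if m == 1 else a for m, a, b in zip(mask, parent1, parent2)]
    return child1, child2

c1, c2 = uniform_crossover([2.0, 1.5, 0.5, 10.0], [4.0, 3.0, 0.8, 12.0])
```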
In the arithmetic crossover method, the two children are weighted averages of the two parents:

Y1 = αi x1 + (1 − αi) x2
Y2 = αi x2 + (1 − αi) x1      (3.1)

where αi is a uniform random number in the range [0, 1], and Yi and xi are the new children and the selected parents, respectively.
The arithmetic crossover method is used in the vibration isolation problem to generate new children. As explained, two parents are selected randomly from the initial population for the crossover; here, chromosomes 1 and 4 are selected. Before applying the crossover, a random value of αi is generated in the interval [0, 1], which is equal to αi = 0.3 in this example. Substituting this value of αi into Eq. 3.1 results in Eq. 3.2:

Y1 = (0.3) x1 + (1 − 0.3) x2
Y2 = (0.3) x2 + (1 − 0.3) x1      (3.2)
Now the crossover is applied to the selected parents from the initial population based on Eq. 3.2. Fig. 3.12 illustrates the selected parents and the newly generated children after applying the crossover.
Figure 3.12 Selected parents and newly generated children after applying crossover
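The arithmetic crossover of Eq. 3.2 can be sketched as follows; the parent values are illustrative only, since the actual chromosomes 1 and 4 are those shown in Fig. 3.12.

```python
def arithmetic_crossover(x1, x2, alpha=0.3):
    """Arithmetic crossover, Eqs. (3.1)-(3.2): each child is a weighted
    average of the two parents with weight alpha."""
    y1 = [alpha * a + (1 - alpha) * b for a, b in zip(x1, x2)]
    y2 = [alpha * b + (1 - alpha) * a for a, b in zip(x1, x2)]
    return y1, y2

parent1 = [2.0, 1.5, 0.5, 10.0]   # illustrative values of X, D, W, L
parent2 = [4.0, 3.0, 0.8, 12.0]
child1, child2 = arithmetic_crossover(parent1, parent2, alpha=0.3)
```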
Mutation
The next step after the crossover is to prevent the algorithm from being trapped in a local minimum. This duty is performed by the mutation operator. A mutation operator acts as insurance for randomly distributing genetic information. If the crossover is considered to exploit the current solutions to find better ones, mutation is viewed as an operator that assists in the exploration of the whole search space. A mutation introduces new structures into the population by randomly changing some of its building blocks and helps the search escape from local minimum traps.
In addition, it tries to maintain diversity in the new population of the genetic algorithm. Mutation of variables means adding randomly created values to the variables with a mutation probability (Pm). The mutation probability determines how often a chromosome is mutated. Many different kinds of mutation operators exist for the different forms of representation, including the binary and the floating (real-valued) representations.
Returning to the vibration isolation example in Fig. 3.12, there are two selected parents (from the initial population) and two new children (generated by the crossover). All four of these chromosomes are transferred to a mating pool, from which one chromosome is selected randomly for mutation. Fig. 3.17 shows the mating pool, the chromosome selected for mutation and the mutated child.
Figure 3.17 Mating pool, selected chromosome for applying mutation and the mutated child
The mutated child then replaces the selected chromosome in the mating pool. Fig. 3.18 demonstrates the new mating pool with the new children.
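A possible sketch of this mutation step for a real-valued chromosome is given below; the mutation probability and the parameter ranges are assumed placeholders.

```python
import random

RANGES = [(1.0, 10.0), (0.5, 5.0), (0.1, 1.0), (2.0, 20.0)]   # placeholder bounds for X, D, W, L

def mutate(chromosome, pm=0.1):
    """Random-reset mutation: with probability pm each gene is replaced
    by a new random value drawn from its allowed range."""
    mutated = list(chromosome)
    for i, (lo, hi) in enumerate(RANGES):
        if random.random() < pm:
            mutated[i] = random.uniform(lo, hi)
    return mutated

mutated_child = mutate([2.6, 2.55, 0.71, 11.4])
```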
3.1.3 Replacement
The last step of the algorithm after recombination is replacement, which is the process of selecting chromosomes from the source population and substituting them to form the offspring of a new population. There is a possibility that the optimum solution is lost after recombination with crossover and mutation, since the selection of chromosomes is completely random. When two new children are produced from two parents in a fixed-size population, the main question is which of the newly generated children or the parents are allowed to move forward to the next generation, since two individuals must be replaced.
There are two kinds of replacement methods for creating a new population: steady-state and elitism replacement. In the steady-state method, a small fraction of the population is replaced in each iteration; new chromosomes are inserted into the population as soon as they are produced. Elitism replacement is an almost complete replacement, except that the best members of each generation are carried over to the next generation without modification. This method increases the efficiency of the algorithm since it prevents the loss of the best solutions.
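A minimal sketch of elitism replacement, assuming maximization of the fitness and an elite count of two, is shown below.

```python
def elitism_replace(population, fitness, offspring, n_elite=2):
    """Keep the n_elite fittest members of the old population unchanged and
    fill the remaining places of the new population with the offspring."""
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    elites = [population[i] for i in ranked[:n_elite]]
    return elites + offspring[:len(population) - n_elite]

old_pop = [[2.0, 1.5, 0.5, 10.0], [4.0, 3.0, 0.8, 12.0], [6.0, 2.0, 0.3, 8.0]]
new_pop = elitism_replace(old_pop, [0.45, 0.70, 0.30],
                          offspring=[[3.4, 2.6, 0.7, 11.4], [2.6, 1.9, 0.6, 10.6]])
```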
The elitism replacement method is selected to generate the new population in the vibration isolation problem. Fig. 3.19 presents the initial population and the newly generated population.
It can be seen from the figure that the parameters with low efficiencies are sub-
stituted with new parameters, which results in better efficiency.
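The individual steps described above can be combined into a single GA loop. The sketch below is only illustrative: the parameter ranges, the GA settings, and the `efficiency` function (a stand-in for the Plaxis evaluation) are all assumptions.

```python
import random

RANGES = [(1.0, 10.0), (0.5, 5.0), (0.1, 1.0), (2.0, 20.0)]   # placeholder bounds for X, D, W, L
POP_SIZE, N_GENERATIONS, PC, PM = 10, 50, 0.9, 0.1            # assumed GA settings

def efficiency(ch):                      # dummy stand-in for the Plaxis evaluation
    X, D, W, L = ch
    return D * L / (1.0 + X + W)

def tournament(pop, fit, k=3):
    i = max(random.sample(range(len(pop)), k), key=lambda j: fit[j])
    return pop[i]

def crossover(x1, x2, alpha=0.3):        # arithmetic crossover, Eq. (3.1)
    return ([alpha * a + (1 - alpha) * b for a, b in zip(x1, x2)],
            [alpha * b + (1 - alpha) * a for a, b in zip(x1, x2)])

def mutate(ch):                          # random-reset mutation
    return [random.uniform(lo, hi) if random.random() < PM else g
            for g, (lo, hi) in zip(ch, RANGES)]

population = [[random.uniform(lo, hi) for lo, hi in RANGES] for _ in range(POP_SIZE)]
for _ in range(N_GENERATIONS):
    fitness = [efficiency(c) for c in population]
    offspring = [max(population, key=efficiency)]          # elitism: carry over the best
    while len(offspring) < POP_SIZE:
        p1, p2 = tournament(population, fitness), tournament(population, fitness)
        c1, c2 = crossover(p1, p2) if random.random() < PC else (p1[:], p2[:])
        offspring += [mutate(c1), mutate(c2)]
    population = offspring[:POP_SIZE]

print("best trench parameters (X, D, W, L):", max(population, key=efficiency))
```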
Since modern computers are becoming more and more powerful, researchers try to use machines to perform the calculations of complicated models. Artificial Intelligence (AI) is known as the process of simulating human intelligence in machines so that they think like humans and mimic their actions [105]. Artificial intelligence is able to interpret and learn from external data and to use what it has learned to reach specific aims through flexible
adaptation. Deep learning is a subset of Machine Learning (ML) that uses a layered network to learn from data. Fig. 3.20 illustrates the relationship between these three approaches [72].
Machine learning is a technique for deriving a model from data. After the model has been developed, it is applied to real field data. Fig. 3.21 shows this process: the vertical flow indicates the learning process, while the horizontal flow represents the trained model [72].
Machine learning techniques are categorized into three groups based on how the model is trained [134]:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
In supervised learning, each training data set consists of an input and its ground truth. The main duty of the supervised learning technique is to produce the correct output from the input using the training data. Conversely, in unsupervised learning the training data contain inputs without ground truth.
Classification and regression are the two main types of application of supervised learning. Classification is the problem of identifying the classes to which data belong, whereas regression predicts a value. The vibration isolation problem is categorized as a regression problem [111].
Machine learning models can be used for many different tasks.
3.3.1 Neurons
The principal idea behind a neural network is to imitate the structure of the neurons and cells in the brain in order to perform tasks like recognizing patterns and making decisions.
Fig. 3.22 shows a biological neuron, which is the fundamental unit of the brain and the nervous system. The cell is responsible for receiving input from the external world via the dendrites, processing it through a function and giving the output through the axon. A neuron does not have any storage for saving data; it just transmits signals from one neuron to another [49].
Table 3.3 Analogy between the human brain and artificial neural network
Biological Neuron Artificial Neuron
Cell Node
Dendrites Input
Synapse Weights
Axon Output
[Figure: schematic of an artificial neuron, with inputs X1, X2, ..., Xn, weights w1, w2, ..., wn, a bias, the output Y, and the error used for feedback]
The nodes X1, X2, ..., Xn, which are called features, form the input, and w1, w2, ..., wn are the weights of the corresponding features. The weights express the strength of the features. Each feature is multiplied by its connection weight and passed through a summation function. Then a bias, which is a constant value, is added to the weighted sum in order to shift the result of the activation function.
The summation is passed through the activation function. The activation function introduces non-linearity into the neural network by converting the weighted sum of the features into the output signal.
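As a small sketch, a single neuron with a sigmoid activation (an assumed choice) can be written as follows.

```python
import math

def neuron_output(features, weights, bias):
    """Weighted sum of the features plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation

y = neuron_output(features=[0.2, 0.7, 0.1], weights=[0.4, -0.6, 0.9], bias=0.05)
```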
The activation function, which is attached to each neuron in the network, determines the output of the neural network. In addition, it helps to normalize the output of each neuron to a range of [0, 1] or [−1, 1]. A neural network without an activation function is simply a linear regression model. There are two kinds of activation functions, linear and non-linear; the non-linear activation functions are the ones most commonly used in neural networks. The most popular non-linear activation functions include:
• Sigmoid: σ(z) = 1 / (1 + e^(−z)), with output in the range (0, 1)
• ReLU: R(z) = max(0, z)
• Tanh: φ(z) = (1 − e^(−2z)) / (1 + e^(−2z)), with output in the range (−1, 1)
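These three functions can be implemented directly, for example with NumPy (a minimal sketch):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)); output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """R(z) = max(0, z)."""
    return np.maximum(0.0, z)

def tanh(z):
    """phi(z) = (1 - exp(-2z)) / (1 + exp(-2z)); output in (-1, 1)."""
    return (1.0 - np.exp(-2.0 * z)) / (1.0 + np.exp(-2.0 * z))

z = np.linspace(-10.0, 10.0, 5)
print(sigmoid(z), relu(z), tanh(z))
```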
In the last step, the information is transferred to the output. If the output predicted by the ANN is equal to the actual output, which is called the label, the algorithm is finished; otherwise, there is an error, and it is propagated back to the neurons to adjust the weights and biases. This process continues until the error is minimized.
3.4.1 Backpropagation
The most commonly used training algorithms for adjusting the weights are:
• Gradient Descent
• Newton Method
• Gauss-Newton Method
• Levenberg-Marquardt Algorithm
In the gradient descent method, the weights are updated in the direction of the negative gradient of the error function, w(i+1) = w(i) − η ∇E(w(i)), where η is the learning rate, which is one of the most important parameters of the gradient descent technique. The learning rate determines the speed of the training process of a neural network. A small learning rate leads to an optimal set of weights but may take a long time, whereas a large learning rate trains the model faster at the risk of missing the optimal weights. Step decay is a suitable method for finding a proper learning rate, in which the learning rate is reduced by some percentage after a set number of training epochs.
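A minimal sketch of the weight update with step decay is given below; the initial learning rate, the decay factor, and the number of epochs per drop are assumed values.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Reduce the learning rate by the factor `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

def gradient_descent_step(weights, gradients, lr):
    """One gradient descent update: w <- w - eta * dE/dw."""
    return [w - lr * g for w, g in zip(weights, gradients)]

weights = [0.4, -0.6, 0.9]
gradients = [0.10, -0.05, 0.20]          # illustrative values of dE/dw
for epoch in range(30):
    lr = step_decay(0.1, epoch)
    weights = gradient_descent_step(weights, gradients, lr)
```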
Newton's method is a second-order algorithm, in which the Hessian matrix is used instead of the Jacobian matrix. The goal of this algorithm is to find better training directions by using the second derivatives of the error function. The conjugate gradient algorithm can be regarded as a method between Newton's method and the gradient descent procedure. The search in this method is performed along conjugate directions, which can result in faster convergence than along gradient descent directions.
The Gauss-Newton method, which is used to solve non-linear least-squares problems, is a development of the Newton method. The assumption in this method is that the objective function is approximately quadratic in the parameters near the optimal solution. The Gauss-Newton method usually converges much faster than the gradient descent method on medium-sized problems.
The Levenberg-Marquardt (LM) algorithm is an iterative technique that is commonly used to solve non-linear least-squares problems. The LM method can be seen as a combination of the two previous approaches: it behaves like the gradient descent method when the parameters are far from their optimal values and like the Gauss-Newton method when the parameters are close to their optimal values.
The size of the database plays a significant role in training a neural network. A large database results in a more accurate model but requires a lot of computational time. On the other hand, a database that is too small results in a less accurate model and requires
less computational time. Therefore, the database should be big enough to produce an accurate model while keeping the computational time reasonable.
When the input data in the database have different ranges, the database needs to be normalized. Normalization is applied so that the neural network works with data values of the same magnitude. When the features differ in magnitude, the fluctuations of the parameters with larger ranges may suppress the influence of the parameters with smaller ranges, even though the features with smaller ranges may be more important for predicting the desired output. Therefore, all the data should be normalized to the same range to remove the influence of the differing ranges in the ANN. All features are normalized to the range [−1, 1] through Eq. (3.3):
(Xj)n = 2 (Xj − min Xj) / (max Xj − min Xj) − 1      (3.3)

where Xj is the feature, (Xj)n is the scaled input feature, and min Xj and max Xj are the lower and upper limits of the input feature, respectively.
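Eq. (3.3) can be applied per feature as in the following sketch (the raw values are illustrative only).

```python
import numpy as np

def normalize_feature(x):
    """Scale one feature column to [-1, 1] according to Eq. (3.3)."""
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

depth = [0.5, 1.0, 2.5, 5.0]        # illustrative raw values of one feature
print(normalize_feature(depth))     # -> values between -1 and 1
```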
Over the last few years, ANNs have been widely applied to several areas of geotechnical engineering and have demonstrated some degree of success [122]. The method is well-suited to modelling complex problems where the relationship between the model variables is unknown. ANNs have been used successfully in pile capacity prediction [33], site characterization [120], earth retaining structures [48], estimation of the bearing capacity of shallow foundations and settlement prediction [151], slope stability [24], design of tunnels and underground openings [77], liquefaction during earthquakes [94], and soil compaction and permeability [123].
Jayawardana et al. investigated the use of artificial neural networks (ANN) as a smart and efficient tool to predict the effectiveness of geofoam-filled trenches in mitigating ground vibrations. They used a multi-layer feedforward network with a backpropagation algorithm and two hidden layers [67].
A comprehensive parametric study was performed by [8], and the results were used to train an ANN model for predicting the efficiency of geofoam-filled trenches. The developed model exhibited a good generalization capacity beyond the training stage, as validated by new finite element results within the range of the training database.
In addition, soil and ground vibrations induced by moving trains have been predicted with an artificial neural network model [43]. The results show that the prediction
method keeps the maximum error below 6.41% and the average error below 2.29% when it is used to predict acceleration vibration levels.
Hung et al. focused on using multiple neural networks to estimate the screening effect of in-filled trenches on surface waves [129]. Three artificial neural networks, a backpropagation network (BPN), a generalized regression neural network (GRNN), and a radial basis function network (RBF), were used to evaluate the performance of a chosen physical model.
The neural network tool was used to analyze the parametric effects of vibrations
versus the surface layer’s depth [40]. Important conclusions were derived from the
analysis regarding the mechanical and geometrical properties of multiple layers and
their varying effects with regard to the distance from the source.
Evaluating the effects of multiple factors and their interactions on one or more response variables is a challenge for researchers in many fields [71]. In response surface problems, Y is the response variable of interest and X1, X2, ..., Xn are regarded as a set of predictors. For instance, in a vibration isolation system, Y is the efficiency of the trench and the Xs are governing factors like depth, location, etc.
In some systems, the nature of the relationship between Y and the Xs may be known exactly, for instance as a linear function; in that case a so-called mechanistic model, Eq. (3.4), can be suggested for fitting the system. In most cases, however, the relationship is not known exactly and an empirical model is used instead:

Y = f(X1, X2, ..., Xn) + ε      (3.5)

where the function f is usually a polynomial, typically of first to fourth order, and ε is an error term. This empirical model is called a response surface model.
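As an illustration only (the data below are invented for the sketch), a second-order response surface in two predictors can be fitted by ordinary least squares:

```python
import numpy as np

# Invented data: two predictors (e.g. trench depth and location) and a response (efficiency)
X1 = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.0, 2.0])
X2 = np.array([2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0])
Y  = np.array([0.31, 0.42, 0.55, 0.63, 0.66, 0.71, 0.48, 0.65])

# Second-order model: Y = b0 + b1*X1 + b2*X2 + b3*X1^2 + b4*X2^2 + b5*X1*X2 + error
A = np.column_stack([np.ones_like(X1), X1, X2, X1**2, X2**2, X1 * X2])
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)   # least-squares estimates of the b's
print(coeffs)
```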
Response Surface Methodology (RSM) is a statistical method for investigating the interactions and relationships between the independent variables and one or more responses [3]. RSM uses quantitative data from appropriate experiments to determine