Machine Learning (Unit-5)


Unit V
Multilayer Perceptron Networks and the Error Backpropagation Algorithm:
The goal of the training process is to find the set of weight values that will cause the
output from the neural network to match the actual target values as closely as possible.
There are several issues involved in designing and training a multilayer perceptron
network:

 Selecting how many hidden layers to use in the network.
 Deciding how many neurons to use in each hidden layer.
 Finding a globally optimal solution that avoids local minima.
 Converging to an optimal solution in a reasonable period of time.
 Validating the neural network to test for overfitting.

Error backpropagation (or simply backpropagation) is a common method of training
artificial neural networks, used in conjunction with an optimization method such as
gradient descent. The algorithm repeats a two-phase cycle: propagation and weight update.
When an input vector is presented to the network, it is propagated forward through the
network, layer by layer, until it reaches the output layer. The output of the network is
then compared to the desired output using a loss function, and an error value is calculated
for each of the neurons in the output layer. The error values are then propagated
backwards, starting from the output, until each neuron has an associated error value which
roughly represents its contribution to the output error.

Backpropagation uses these error values to calculate the gradient of the loss function with
respect to the weights in the network. In the second phase, this gradient is fed to the
optimization method, which in turn uses it to update the weights, in an attempt to
minimize the loss function. The importance of this process is that, as the network is
trained, the neurons in the intermediate layers organize themselves in such a way that the
different neurons learn to recognize different characteristics of the total input space.

Steps of the algorithm: After choosing the weights of the network randomly, the
backpropagation algorithm is used to compute the necessary corrections. The algorithm can
be decomposed into the following four steps:
i) Feed-forward computation
ii) Backpropagation to the output layer
iii) Backpropagation to the hidden layer
iv) Weight updates

The algorithm is stopped when the value of the error function has become
sufficiently small.
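
To make the four steps concrete, here is a minimal NumPy sketch (not part of the original
notes) of backpropagation for a network with one hidden layer, sigmoid activations and a
squared-error loss; the layer sizes, learning rate, stopping threshold and the toy XOR data
are arbitrary illustrative choices.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=4, lr=0.5, max_iter=10000, tol=1e-3):
    rng = np.random.default_rng(0)
    # Choose the weights of the network randomly
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    for _ in range(max_iter):
        # i) Feed-forward computation
        h = sigmoid(X @ W1)          # hidden layer activations
        o = sigmoid(h @ W2)          # network output
        error = o - y                # compare with the desired output
        # ii) Backpropagation to the output layer
        delta_out = error * o * (1 - o)
        # iii) Backpropagation to the hidden layer
        delta_hidden = (delta_out @ W2.T) * h * (1 - h)
        # iv) Weight updates (one gradient descent step)
        W2 -= lr * h.T @ delta_out
        W1 -= lr * X.T @ delta_hidden
        # Stop when the error function has become sufficiently small
        if np.mean(error ** 2) < tol:
            break
    return W1, W2

# Toy usage: learn the XOR mapping
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_mlp(X, y)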

Radial Basis Function Networks:

Radial basis function (RBF) networks are a commonly used type of artificial neural
network for function approximation problems. They are distinguished from other neural
networks by their universal approximation capability and faster learning speed. An RBF
network is a type of feedforward neural network composed of three layers, namely the
input layer, the hidden layer and the output layer. Each of these layers has a different
task. A typical training setup for an RBF network is described as follows.

An RBF network with a specific number of nodes in its hidden layer (e.g. 10) is chosen,
and a Gaussian function is used as the transfer function in the computational units.
Training of the RBF model is terminated once the calculated error reaches a desired value
(e.g. 0.01) or a maximum number of training iterations (e.g. 500) has been completed.
Depending on the case, it is typically observed that the RBF network requires less time to
reach the end of training compared to an MLP. The agreement between the model
predictions and the experimental observations is then investigated, the results of the
models are compared, and the final model is chosen based on the least computed error.

RBF networks have many applications such as function approximation, interpolation,
classification and time series prediction. These applications serve various industrial
interests such as stock price prediction, anomaly detection in data, fraud detection in
financial transactions, etc.

Architecture of RBF: An RBF network is an artificial neural network with an input layer,
a hidden layer, and an output layer. The hidden layer of an RBF network consists of hidden
neurons whose activation function is a Gaussian function. The hidden layer generates a
signal corresponding to an input vector presented at the input layer, and corresponding to
this signal, the network generates a response.

In the RBF architecture, the weights connecting the input vector to a hidden neuron
represent the center of that neuron. These weights are predetermined in such a way that
the entire input space is covered by the receptive fields of these neurons, whereas the
values of the weights connecting the hidden neurons to the output neurons are determined
by training the network.
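
As an illustrative sketch only, the following NumPy code implements the forward pass
described above, with Gaussian hidden units centred on a subset of the training points and
output weights fitted by linear least squares; the choice of centers, the width parameter
and the toy sine data are assumptions made for this example.

import numpy as np

def rbf_design_matrix(X, centers, width=1.0):
    # Gaussian activation of each hidden neuron for each input vector
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf(X, y, n_hidden=10, width=1.0):
    # Centers: here simply the first n_hidden training points (one common, simple choice)
    centers = X[:n_hidden]
    Phi = rbf_design_matrix(X, centers, width)
    # Output-layer weights solve a linear least-squares problem
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, w

def predict_rbf(X, centers, w, width=1.0):
    return rbf_design_matrix(X, centers, width) @ w

# Toy usage: approximate a sine curve
X = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
y = np.sin(X).ravel()
centers, w = fit_rbf(X, y, n_hidden=10, width=0.8)
y_hat = predict_rbf(X, centers, w, width=0.8)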

Radial basis function networks are a means of approximation by algorithms using linear
combinations of translates of a rotationally invariant function, called the radial basis
function. The coefficients of these approximations usually solve a minimization problem
and can also be computed by interpolation processes. The radial basis functions
constitute the so-called reproducing kernels on certain Hilbert spaces or, in a slightly
more general setting, semi-Hilbert spaces. In the latter case, the aforementioned
approximation also contains an element from the nullspace of the semi-norm of the
semi-Hilbert space. That is usually a polynomial space.

Decision Tree Learning :


 Introduction: Decision tree algorithms fall under the category of supervised learning.
They can be used to solve both regression and classification problems.
 A decision tree uses a tree representation in which each leaf node corresponds to a
class label and attributes are represented on the internal nodes of the tree.
 We can represent any boolean function on discrete attributes using a decision tree.

In decision trees the major challenge is the identification of the attribute for the root
node at each level. This process is known as attribute selection.

A decision tree is a tree-like graph with nodes representing the places where we pick an
attribute and ask a question; edges represent the answers to the question; and the
leaves represent the actual output or class label. Decision trees are used for non-linear
decision making with simple linear decision surfaces.
Decision trees classify the examples by sorting them down the tree from the root to some
leaf node, with the leaf node providing the classification to the example. Each node in the
tree acts as a test case for some attribute, and each edge descending from that node
corresponds to one of the possible answers to the test case. This process is recursive in
nature and is repeated for every subtree rooted at the new nodes.
Example of a classification decision tree: Decision Trees are a type of Supervised
Machine Learning (that is, you specify what the input is and what the corresponding
output is in the training data) where the data is continuously split according to a certain
parameter. The tree can be explained by two entities, namely decision nodes and leaves.
The leaves are the decisions or the final outcomes. And the decision nodes are where the
data is split.
There are two main types of Decision Trees:

1. Classification trees (Yes/No types)

What we’ve seen above is an example of a classification tree, where the outcome was a
variable like ‘fit’ or ‘unfit’. Here the decision variable is categorical.

2. Regression trees (Continuous data types)

Here the decision or the outcome variable is Continuous, e.g. a number like 123.
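
As a hedged illustration of the two types (not part of the original notes), scikit-learn
provides both kinds of trees; the tiny feature matrices and labels below are made-up
placeholder data.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# 1. Classification tree: categorical outcome such as 'fit' / 'unfit'
X_cls = [[22, 1], [45, 0], [30, 1], [60, 0]]   # e.g. [age, exercises_daily]
y_cls = ['fit', 'unfit', 'fit', 'unfit']
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[35, 1]]))

# 2. Regression tree: continuous outcome such as a selling price
X_reg = [[1200, 2], [1500, 3], [2000, 4], [850, 1]]  # e.g. [area, bedrooms]
y_reg = [200000, 260000, 340000, 150000]
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[1600, 3]]))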

Working
Now that we know what a Decision Tree is, we’ll see how it works internally. There are
many algorithms that construct Decision Trees, but one of the best known is the ID3
algorithm. ID3 stands for Iterative Dichotomiser 3.
Entropy
Entropy, also called Shannon entropy and denoted by H(S) for a finite set S, is a measure
of the amount of uncertainty or randomness in data. It is given by
H(S) = − ∑i pi log2(pi), where pi is the proportion of examples in S that belong to class i.

Information Gain
Information gain, sometimes described in terms of the Kullback-Leibler divergence and
denoted by IG(S, A) for a set S, is the effective change in entropy after deciding on a
particular attribute A. It measures the relative change in entropy with respect to the
independent variables, and can be written as
IG(S, A) = H(S) − ∑v (|Sv| / |S|) H(Sv), where the sum runs over the values v of attribute A.
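
A short Python sketch of both measures, assuming the class labels and the attribute's
values are given as parallel lists (the 'play'/'outlook' toy data is invented for
illustration):

import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum_i p_i * log2(p_i)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(labels, attribute_values):
    # IG(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)
    total = len(labels)
    remainder = 0.0
    for v in set(attribute_values):
        subset = [l for l, a in zip(labels, attribute_values) if a == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Example: does 'outlook' help predict 'play'?
play    = ['no', 'no', 'yes', 'yes', 'yes', 'no']
outlook = ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'rain']
print(information_gain(play, outlook))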

2. Gini Index:

o Gini index is a measure of impurity or purity used while creating a decision tree in
the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high
Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create
binary splits.
o Gini index can be calculated using the below formula:

Gini Index = 1 − ∑j (pj)²
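
A corresponding sketch of the Gini index for a list of class labels (illustrative only):

from collections import Counter

def gini_index(labels):
    # Gini = 1 - sum_j p_j^2
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

print(gini_index(['yes', 'yes', 'no', 'no']))   # 0.5: maximally impure for two classes
print(gini_index(['yes', 'yes', 'yes', 'yes'])) # 0.0: pure node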


Measures of impurity for evaluating splits in decision trees: An impurity measure is a
heuristic for selection of the splitting criterion that best separates a given dataset D of
class-labeled training tuples into individual classes. If we were to split D into smaller
partitions according to the outcomes of the splitting criterion, ideally each partition
would be pure (i.e., all of the tuples that fall into a given partition would belong to the
same class). Conceptually, the “best” splitting criterion is the one that most closely
results in such a scenario. Attribute selection measures are also known as splitting rules
because they determine how the tuples at a given node are to be split. The attribute
selection measure provides a ranking for each attribute describing the given training
tuples. The attribute having the best score for the measure is chosen as the splitting
attribute for the given tuples. If the splitting attribute is continuous-valued or if we are
restricted to binary trees then, respectively, either a split point or a splitting subset must
also be determined as part of the splitting criterion. The tree node created for partition D
is labeled with the splitting criterion, branches are grown for each outcome of the
criterion, and the tuples are partitioned accordingly.
Three popular attribute selection measures are information gain, gain ratio, and Gini index.
Information gain: ID3 uses information gain as its attribute selection measure.
Let node N represent or hold the tuples of partition D. The attribute with the highest
information gain is chosen as the splitting attribute for node N. This attribute minimizes
the information needed to classify the tuples in the resulting partitions and reflects the
least randomness or “impurity” in these partitions. Such an approach minimizes the
expected number of tests needed to classify a given tuple and guarantees that a simple
tree is found. The expected information needed to classify a tuple in D is given by
Info(D) = − ∑i pi log2(pi), where pi is the probability that a tuple in D belongs to class Ci.

Gain ratio: The information gain measure is biased toward tests with many outcomes.
That is, it prefers to select attributes having a large number of values. C4.5 therefore uses
the gain ratio, which normalizes the information gain by the split information of the test:
GainRatio(A) = Gain(A) / SplitInfoA(D), where SplitInfoA(D) = − ∑j (|Dj| / |D|) log2(|Dj| / |D|).
The attribute with the maximum gain ratio is selected as the splitting attribute. Note,
however, that as the split information approaches 0, the ratio becomes unstable.
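
A self-contained Python sketch of the gain ratio under the formula above (the helper names
and the list-based data representation are arbitrary choices for illustration):

import math
from collections import Counter

def _entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(labels, attribute_values):
    # GainRatio(A) = Gain(A) / SplitInfo_A(D), where SplitInfo penalises
    # attributes that split the data into many small partitions.
    total = len(labels)
    remainder, split_info = 0.0, 0.0
    for v in set(attribute_values):
        subset = [l for l, a in zip(labels, attribute_values) if a == v]
        p = len(subset) / total
        remainder += p * _entropy(subset)
        split_info -= p * math.log2(p)
    gain = _entropy(labels) - remainder
    # As the split information approaches 0 the ratio becomes unstable, so guard against it
    return gain / split_info if split_info > 0 else 0.0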

Gini index: The Gini index is used in CART. Using the notation described above, the
Gini index measures the impurity of D, a data partition or set of training tuples, as
Gini(D) = 1 − ∑i pi², where pi is the probability that a tuple in D belongs to class Ci.

ID3: ID3 (Iterative Dichotomiser 3) is an algorithm, invented by Ross Quinlan, that is used
to generate a decision tree. ID3 is the precursor to the C4.5 algorithm. Very simply, ID3
builds a decision tree from a fixed set of examples. The resulting tree is used to classify
future samples. The examples of the given ExampleSet have several attributes and every
example belongs to a class (like yes or no). The leaf nodes of the decision tree contain
the class name whereas a non-leaf node is a decision node. The decision node is an
attribute test with each branch (to another decision tree) being a possible value of the
attribute. ID3 uses a feature selection heuristic, typically information gain, to help it decide
which attribute goes into a decision node; in many implementations the heuristic can be
selected via a criterion parameter.
The ID3 algorithm can be summarized as follows: take all unused attributes and calculate
their selection criterion; choose the attribute for which the selection criterion has the best
value; make a node containing that attribute.

ID3 searches through the attributes of the training instances and extracts the attribute that
best separates the given examples. If the attribute perfectly classifies the training sets then
ID3 stops; otherwise it recursively operates on the n (where n = number of possible values
of an attribute) partitioned subsets to get their best attribute. The algorithm uses a greedy
search, meaning it picks the best attribute and never looks back to reconsider earlier
choices.
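
A compact, illustrative sketch of this recursion in Python, using information gain as the
selection criterion; the dict-based representation of examples and the toy weather data are
assumptions made for this example, not part of the original algorithm description.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    total = len(labels)
    rem = 0.0
    for v in set(r[attr] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[attr] == v]
        rem += (len(sub) / total) * entropy(sub)
    return entropy(labels) - rem

def id3(rows, labels, attributes):
    # Stop if the partition is pure or no attributes are left: return a leaf (class name)
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: choose the unused attribute with the best selection criterion
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        node[best][v] = id3(sub_rows, sub_labels,
                            [a for a in attributes if a != best])
    return node

# Toy usage
rows = [{'outlook': 'sunny', 'windy': 'no'},
        {'outlook': 'sunny', 'windy': 'yes'},
        {'outlook': 'rain',  'windy': 'yes'},
        {'outlook': 'rain',  'windy': 'no'}]
labels = ['no', 'no', 'no', 'yes']
print(id3(rows, labels, ['outlook', 'windy']))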

Some major benefits of ID3 are:

 Understandable prediction rules are created from the training data.
 Builds a short tree in relatively little time.
 It only needs to test enough attributes until all data is classified.
 Finding leaf nodes enables test data to be pruned, reducing the number of tests.
ID3 may have some disadvantages in some cases, e.g.:

 Data may be over-fitted or over-classified if a small sample is tested.
 Only one attribute at a time is tested for making a decision.

C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is
an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can
be used for classification, and for this reason, C4.5 is often referred to as a statistical
classifier. This algorithm uses gain ratio for feature selection and to construct the decision
tree. It handles both continuous and discrete features. The C4.5 algorithm is widely used
because of its quick classification and high precision.

Advantages of C4.5 over other Decision Tree systems:

1. The algorithm inherently employs a single-pass pruning process to mitigate overfitting.
2. It can work with both discrete and continuous data.
3. C4.5 can handle the issue of incomplete data very well.

C4.5 is the successor to ID3 and removed the restriction that features must be categorical
by dynamically defining a discrete attribute (based on numerical variables) that partitions
the continuous attribute value into a discrete set of intervals. C4.5 converts the trained trees
(i.e. the output of the ID3 algorithm) into sets of if-then rules. The accuracy of each rule is
then evaluated to determine the order in which they should be applied. Pruning is done by
removing a rule’s precondition if the accuracy of the rule improves without it.
C4.5 starts with large sets of cases belonging to known classes. The cases, described by
any mixture of nominal and numeric properties, are scrutinized for patterns that allow the
classes to be reliably discriminated. These patterns are then expressed as models, in the
form of decision trees or sets of if-then rules, that can be used to classify new cases, with
emphasis on making the models understandable as well as accurate.
CART decision trees: The Classification and Regression Tree methodology, also known as
CART, was introduced in 1984 by Leo Breiman and his colleagues. The CART methodology
refers to these two types of decision trees; a simple definition of each, with examples, is
given below.
(i) Classification Trees
A classification tree is an algorithm where the target variable is fixed or categorical. The
algorithm is then used to identify the “class” within which a target variable would most
likely fall.
(ii) Regression Trees
A regression tree refers to an algorithm where the target variable is continuous and the
algorithm is used to predict its value. As an example of a regression type problem, you may want to
predict the selling prices of a residential house, which is a continuous dependent variable.
Difference Between Classification and Regression Trees
Decision trees are easily understood, and there are many tutorials that make things even
simpler. However, it’s important to understand that there are some fundamental differences
between classification and regression trees.
Classification trees are used when the dataset needs to be split into classes which belong to
the response variable.

Regression trees, on the other hand, are used when the response variable is continuous.

CART Working
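
A minimal sketch, illustrating the split-selection step at the heart of CART: every
candidate (feature, threshold) pair is scored by the weighted Gini index of the two
partitions it creates, and the purest binary split is kept. The list-based data layout and
the toy data are illustrative assumptions.

from collections import Counter

def gini(labels):
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_binary_split(X, y):
    # X: list of feature vectors, y: class labels.
    # Score every (feature, threshold) pair by the weighted Gini index of the
    # two partitions it creates, and keep the split with the lowest impurity.
    n = len(y)
    best_feature, best_threshold, best_score = None, None, float('inf')
    for f in range(len(X[0])):
        for threshold in sorted(set(row[f] for row in X)):
            left  = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
            if score < best_score:
                best_feature, best_threshold, best_score = f, threshold, score
    return best_feature, best_threshold, best_score

# Toy usage: feature 0 separates the classes perfectly at threshold 2
X = [[1, 7], [2, 3], [5, 6], [6, 1]]
y = ['a', 'a', 'b', 'b']
print(best_binary_split(X, y))   # (0, 2, 0.0)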

Pruning the tree: Pruning means simplifying and optimizing a decision tree by removing
sections of the tree that are non-critical and redundant for classifying instances. The idea
of pruning originally stems from an attempt to prevent so-called overfitting in trees that
were created through machine learning. Overfitting describes the undesired induction of
noise into a tree.
Pruning processes can be divided into two types (pre- and post-pruning).
Pre-pruning procedures prevent a complete induction of the training set by applying a
stopping criterion in the induction algorithm (e.g. maximum tree depth, or information
gain(Attr) > minGain). Pre-pruning methods are considered to be more efficient because
they do not induce an entire set; rather, trees remain small from the start. Pre-pruning
methods share a common problem, the horizon effect.
Post-pruning is the most common way of simplifying trees. Here, nodes and subtrees are
replaced with leaves to reduce complexity. Pruning can not only significantly reduce the
size but also improve the classification accuracy on unseen objects. It may be the case that
the accuracy of the assignment on the training set deteriorates, but the classification
accuracy of the tree on unseen data increases overall.

Bottom-up pruning
These procedures start at the last node in the tree (the lowest point). Following recursively
upwards, they determine the relevance of each individual node. If the relevance for the
classification is not given, the node is dropped or replaced by a leaf. The advantage is that
no relevant sub-trees can be lost with this method. These methods include Reduced Error
Pruning (REP), Minimum Cost Complexity Pruning (MCCP), or Minimum Error Pruning
(MEP).
Top-Down Pruning
In contrast to the bottom-up method, this method starts at the root of the tree and moves
downwards through the structure. A relevance check is carried out which decides whether a node is
relevant for the classification of all n items or not. By pruning the tree at an inner node, it
can happen that an entire sub-tree (regardless of its relevance) is dropped. One of these
representatives is pessimistic error pruning (PEP), which brings quite good results with
unseen items.
Reduced error pruning
One of the simplest forms of pruning is reduced error pruning. Starting at the leaves, each
node is replaced with its most popular class. If the prediction accuracy is not affected then
the change is kept. While somewhat naive, reduced error pruning has the advantage
of simplicity and speed.
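
As a hedged example, scikit-learn exposes one of the post-pruning methods named above,
minimal cost-complexity pruning (MCCP), through the ccp_alpha parameter of its CART-style
trees; the dataset and the alpha value below are arbitrary choices for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unpruned tree: fits the training data almost perfectly but may overfit
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Post-pruned tree: subtrees whose cost-complexity improvement falls below
# ccp_alpha are replaced by leaves (the alpha value here is an arbitrary choice)
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

print(full_tree.get_n_leaves(), full_tree.score(X_test, y_test))
print(pruned_tree.get_n_leaves(), pruned_tree.score(X_test, y_test))
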
Strengths and weaknesses of the decision tree approach:

Decision Trees are used to solve both classification and regression problems. But the main
drawback of Decision Trees is that they generally lead to overfitting of the data. Let’s
discuss their advantages and disadvantages in detail.

Strengths of Decision Tree:

1. Clear visualization: The algorithm is simple to understand, interpret and visualize, as
the idea is mostly used in our daily lives. The output of a Decision Tree can be easily
interpreted by humans.

2. Simple and easy to understand: Decision Tree looks like simple if-else
statements which are very easy to understand.

3. Decision Tree can be used for both classification and regression problems.

4. Decision Tree can handle both continuous and categorical variables.

5. No feature scaling required: No feature scaling (standardization and normalization) is
required in the case of a Decision Tree, as it uses a rule-based approach instead of distance
calculation.

6. Handles non-linear parameters efficiently: Non-linear parameters don't affect the
performance of a Decision Tree, unlike curve-based algorithms. So, if there is high
non-linearity between the independent variables, Decision Trees may outperform other
curve-based algorithms.

7. Decision Tree can automatically handle missing values.

8. Decision Tree is usually robust to outliers and can handle them automatically.

9. Less training time: The training period is shorter compared to Random Forest because it
generates only one tree, unlike the forest of trees in a Random Forest.

Weakness of Decision Tree

1. Overfitting: This is the main problem of the Decision Tree. It generally leads to
overfitting of the data which ultimately leads to wrong predictions. In order to fit the data
(even noisy data), it keeps generating new nodes and ultimately the tree becomes too
complex to interpret. In this way, it loses its generalization capabilities. It performs very
well on the trained data but starts making a lot of mistakes on the unseen data.

2. High variance: As mentioned in point 1, a Decision Tree generally leads to the
overfitting of data. Due to the overfitting, there are very high chances of high variance in
the output, which leads to many errors in the final estimation and shows high inaccuracy in
the results. In trying to achieve zero bias (overfitting), the model ends up with high variance.

3. Unstable: Adding a new data point can lead to re-generation of the overall tree and all
nodes need to be recalculated and recreated.

4. Affected by noise: A little noise can make a Decision Tree unstable, which leads to wrong
predictions.

5. Not suitable for large datasets: If data size is large, then one single tree may grow
complex and lead to overfitting. So in this case, we should use Random Forest instead of a
single Decision Tree.
