Data Mining
Lecture Notes for Chapter 4
Artificial Neural Networks
Introduction to Data Mining, 2nd Edition
by
Tan, Steinbach, Karpatne, Kumar
Artificial Neural Networks (ANN)
Basic Idea: A complex non-linear function can be
learned as a composition of simple processing units
ANN is a collection of simple processing units
(nodes) that are connected by directed links (edges)
– Every node receives signals from incoming edges,
performs computations, and transmits signals to
outgoing edges
– Analogous to human brain where nodes are neurons
and signals are electrical impulses
– Weight of an edge determines the strength of
connection between the nodes
– Simplest ANN: Perceptron (single neuron)
Basic Architecture of Perceptron
[Figure: perceptron with input nodes, weighted links, and an output node that applies an activation function]
The perceptron computes ŷ = sign( Σ_j w_j x_j + b ), so it learns linear decision boundaries
Similar to logistic regression (activation function is sign
instead of sigmoid)
Perceptron Example
X1 X2 X3 Y
1 0 0 -1
1 0 1 1
1 1 0 1
1 1 1 1
0 0 1 -1
0 1 0 -1
0 1 1 1
0 0 0 -1
Output Y is 1 if at least two of the three inputs are equal to 1.
Perceptron Example
X1 X2 X3 Y
1 0 0 -1
1 0 1 1
1 1 0 1
1 1 1 1
0 0 1 -1
0 1 0 -1
0 1 1 1
0 0 0 -1
Y = sign( 0.3 X1 + 0.3 X2 + 0.3 X3 − 0.4 )
where sign(x) = +1 if x ≥ 0
               −1 if x < 0
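As a quick check, a minimal sketch (not from the slides) that evaluates this perceptron on the eight rows of the example table:

```python
import numpy as np

# Evaluate y = sign(0.3*x1 + 0.3*x2 + 0.3*x3 - 0.4) on the example data.
X = np.array([[1,0,0],[1,0,1],[1,1,0],[1,1,1],
              [0,0,1],[0,1,0],[0,1,1],[0,0,0]])
y = np.array([-1, 1, 1, 1, -1, -1, 1, -1])

scores = X @ np.array([0.3, 0.3, 0.3]) - 0.4
y_pred = np.where(scores >= 0, 1, -1)    # sign, with sign(0) = +1
print(y_pred)                            # matches y on every row
assert np.array_equal(y_pred, y)
```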
Perceptron Learning Rule
Initialize the weights (w0, w1, …, wd)
Repeat
– For each training example (x_i, y_i):
  Compute the predicted output:  ŷ_i^(k) = sign( Σ_j w_j^(k) x_ij )
  Update the weights:  w_j^(k+1) = w_j^(k) + λ ( y_i − ŷ_i^(k) ) x_ij
Until stopping condition is met
k: iteration number; λ: learning rate
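A minimal sketch of this learning rule in Python (the function name and the sign(0) = +1 convention are assumptions of the sketch, not from the slides):

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=10):
    """Perceptron learning rule sketch.
    X: (n, d) inputs, y: (n,) labels in {-1, +1}.
    Returns weights of length d+1, where w[0] is the bias weight."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = 1 for the bias
    w = np.zeros(Xb.shape[1])                   # initialize weights to 0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            y_hat = 1 if w @ xi >= 0 else -1    # sign(w . x), with sign(0) = +1
            w += lr * (yi - y_hat) * xi         # update only when prediction is wrong
    return w
```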
Perceptron Learning Rule
Weight update formula:  w_j^(k+1) = w_j^(k) + λ ( y_i − ŷ_i^(k) ) x_ij
Intuition:
– Update weight based on error: e = y − ŷ
– If y = ŷ, e = 0: no update needed
– If y > ŷ, e = 2: weight must be increased so
that w · x will increase
– If y < ŷ, e = −2: weight must be decreased so
that w · x will decrease
Example of Perceptron Learning
λ = 0.1

Training data:
X1  X2  X3   Y
1   0   0   -1
1   0   1    1
1   1   0    1
1   1   1    1
0   0   1   -1
0   1   0   -1
0   1   1    1
0   0   0   -1

Weight updates over first epoch (one update per training instance):
Iteration   w0     w1     w2     w3
0            0      0      0      0
1          -0.2   -0.2     0      0
2            0      0      0     0.2
3            0      0      0     0.2
4            0      0      0     0.2
5          -0.2     0      0      0
6          -0.2     0      0      0
7            0      0     0.2    0.2
8          -0.2     0     0.2    0.2

Weight updates over all epochs (weights at the end of each epoch):
Epoch       w0     w1     w2     w3
0            0      0      0      0
1          -0.2     0     0.2    0.2
2          -0.2     0     0.4    0.2
3          -0.4     0     0.4    0.2
4          -0.4    0.2    0.4    0.4
5          -0.6    0.2    0.4    0.2
6          -0.6    0.4    0.4    0.2
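As a check, reusing the perceptron_train and the X, y arrays from the sketches above (the epoch argument is the only new parameter), one pass over the data should reproduce the first-epoch weights in the table:

```python
# One epoch over the example data with learning rate 0.1.
w = perceptron_train(X, y, lr=0.1, epochs=1)
print(np.round(w, 2))   # expected: (w0, w1, w2, w3) = (-0.2, 0, 0.2, 0.2)
```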
Perceptron Learning
Since ŷ is based on a linear
combination of the input
variables, the decision
boundary is linear
For nonlinearly separable problems, the perceptron
learning algorithm will fail because no linear
hyperplane can separate the data perfectly
Nonlinearly Separable Data
XOR Data
y = x1 ⊕ x2 (XOR)

x1  x2   y
0   0   -1
1   0    1
0   1    1
1   1   -1
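Reusing the perceptron_train sketch from above (the epoch count here is arbitrary), training on the XOR data illustrates the failure: no weight vector can classify all four points correctly.

```python
import numpy as np

X_xor = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y_xor = np.array([-1, 1, 1, -1])

w = perceptron_train(X_xor, y_xor, lr=0.1, epochs=100)
scores = np.hstack([np.ones((4, 1)), X_xor]) @ w
y_pred = np.where(scores >= 0, 1, -1)
print(y_pred)   # at least one of the four points is always misclassified,
                # because no linear hyperplane separates the XOR classes
```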
Multi-layer Neural Network
More than one hidden layer of
computing nodes
Every node in a hidden layer
operates on activations from
preceding layer and transmits
activations forward to nodes of
next layer
Also referred to as
“feedforward neural networks”
Multi-layer Neural Network
Multi-layer neural networks with at least one
hidden layer can solve any type of classification
task involving nonlinear decision surfaces
[Figure: XOR data, which a multi-layer network can classify correctly]
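As a concrete illustration (a minimal sketch with hand-chosen weights, not from the slides), a network with one hidden layer of two units can represent XOR:

```python
import numpy as np

def step(z):
    """Threshold activation: +1 if z >= 0, else -1."""
    return np.where(z >= 0, 1, -1)

def xor_net(x1, x2):
    # Hidden layer: h1 fires when x1 OR x2, h2 fires when x1 AND x2
    h1 = step(x1 + x2 - 0.5)          # OR
    h2 = step(x1 + x2 - 1.5)          # AND
    # Output fires when OR is true but AND is false, i.e. XOR
    return step(h1 - h2 - 0.5)

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))   # -> -1, 1, 1, -1
```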
Why Multiple Hidden Layers?
Activations at hidden layers can be viewed as features
extracted as functions of inputs
Every hidden layer represents a level of abstraction
– Complex features are compositions of simpler features
Number of layers is known as depth of ANN
– Deeper networks express complex hierarchy of features
Multi-Layer Network Architecture
Activation value at node i at layer l:  a_i^l = f( z_i^l )
Linear predictor:  z_i^l = Σ_j w_ij^l · a_j^(l−1) + b_i^l
( f: activation function applied to the linear predictor )
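A minimal forward-pass sketch in this notation (the layer sizes, random weights, and the choice of sigmoid here are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases, f=sigmoid):
    """Compute activations layer by layer: z^l = W^l a^(l-1) + b^l, a^l = f(z^l)."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b     # linear predictor at layer l
        a = f(z)          # activation values at layer l
    return a

# Illustrative 2-3-1 network with random weights (shapes only, not learned values)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases  = [rng.normal(size=3), rng.normal(size=1)]
print(forward(np.array([0.5, -1.0]), weights, biases))
```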
Activation Functions
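The slide's figure (plots of common activation functions) does not survive in the text; as a sketch, the usual choices can be written as:

```python
import numpy as np

def sign(z):     return np.where(z >= 0, 1.0, -1.0)   # perceptron threshold
def sigmoid(z):  return 1.0 / (1.0 + np.exp(-z))      # squashes to (0, 1)
def tanh(z):     return np.tanh(z)                     # squashes to (-1, 1)
def relu(z):     return np.maximum(0.0, z)             # rectified linear unit
```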
Learning Multi-layer Neural Network
Can we apply perceptron learning rule to each
node, including hidden nodes?
– Perceptron learning rule computes error term
e = y − ŷ and updates weights accordingly
Problem: how to determine the true value of y for
hidden nodes?
– Approximate error in hidden nodes by error in
the output nodes
Problem:
– Not clear how adjustments in the hidden nodes affect the overall
error
– No guarantee of convergence to optimal solution
Gradient Descent
Loss function to measure errors across all training points:
E(w) = Σ_k Loss( y_k , ŷ_k )
Squared loss:  Loss( y_k , ŷ_k ) = ( y_k − ŷ_k )²
Gradient descent: update parameters in the direction of
"maximum descent" in the loss function across all points:
w_j ← w_j − λ ∂E(w)/∂w_j
λ: learning rate
Stochastic gradient descent (SGD): update the weights for every
instance (mini-batch SGD: update over mini-batches of instances)
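A minimal sketch of these updates for a linear predictor with squared loss (the model choice and function names are assumptions made for illustration; the slides' setting is a general network):

```python
import numpy as np

def squared_loss_grad(w, X, y):
    """Gradient of E(w) = sum_k (y_k - w.x_k)^2 for a linear predictor."""
    y_hat = X @ w
    return -2.0 * X.T @ (y - y_hat)

def gradient_descent(X, y, lr=0.01, epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * squared_loss_grad(w, X, y)      # one update over all points
    return w

def sgd(X, y, lr=0.01, epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xk, yk in zip(X, y):                  # one update per instance
            w -= lr * (-2.0) * (yk - w @ xk) * xk
    return w
```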
Computing Gradients
Loss on a single training instance:  Loss = ( y − ŷ )² = ( y − a^L )²
Using the chain rule of differentiation (on a single instance):
∂Loss/∂w_ij^l = ∂Loss/∂a_i^l × ∂a_i^l/∂z_i^l × ∂z_i^l/∂w_ij^l = δ_i^l · f′(z_i^l) · a_j^(l−1),
where δ_i^l = ∂Loss/∂a_i^l
For the sigmoid activation function:  f′(z_i^l) = a_i^l ( 1 − a_i^l )
How can we compute δ_i^l for every layer?
Backpropagation Algorithm
At the output layer L:
δ^L = ∂Loss/∂a^L = −2 ( y − a^L )
At a hidden layer l (using the chain rule):
δ_j^l = ∂Loss/∂a_j^l = Σ_i δ_i^(l+1) · f′(z_i^(l+1)) · w_ij^(l+1)
– Gradients at layer l can be computed using gradients at layer l + 1
– Start from layer L and "backpropagate" gradients to all previous
layers
Use gradient descent to update weights at every epoch
For the next epoch, use the updated weights to compute the loss function and its gradients
Iterate until convergence (loss does not change)
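A minimal backpropagation sketch for one hidden layer, sigmoid activations, and squared loss (the function name, layer sizes, and target encoding are illustrative assumptions, not the slides' notation made precise):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One gradient-descent update on a single instance.
    y: target(s) for the sigmoid output unit (values in [0, 1] in this sketch)."""
    # Forward pass: a^1 = f(z^1), a^2 = f(z^2)
    z1 = W1 @ x + b1;  a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

    # Output layer: delta^L = dLoss/da^L = -2 (y - a^L)
    delta2 = -2.0 * (y - a2)
    # Hidden layer: delta_j^l = sum_i delta_i^(l+1) f'(z_i^(l+1)) w_ij^(l+1)
    delta1 = W2.T @ (delta2 * a2 * (1 - a2))

    # Gradients: dLoss/dw_ij^l = delta_i^l * f'(z_i^l) * a_j^(l-1)
    gW2 = np.outer(delta2 * a2 * (1 - a2), a1)
    gb2 = delta2 * a2 * (1 - a2)
    gW1 = np.outer(delta1 * a1 * (1 - a1), x)
    gb1 = delta1 * a1 * (1 - a1)

    # Gradient-descent updates
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2
```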
Design Issues in ANN
Number of nodes in input layer
– One input node per binary/continuous attribute
– k or log2 k nodes for each categorical attribute with k
values
Number of nodes in output layer
– One output for binary class problem
– k or log2 k nodes for k-class problem
Number of hidden layers and nodes per layer
Initial weights and biases
Learning rate, max. number of epochs, mini-batch size for
mini-batch SGD, …
Characteristics of ANN
Multilayer ANNs are universal approximators but can
suffer from overfitting if the network is too large
Gradient descent may converge to local minimum
Model building can be very time consuming, but testing
can be very fast
Can handle redundant and irrelevant attributes because
weights are automatically learnt for all attributes
Sensitive to noise in training data
Difficult to handle missing attributes
Deep Learning Trends
Training deep neural networks (more than 5-10 layers)
has become possible only in recent times, with:
– Faster computing resources (GPU)
– Larger labeled training sets
– Algorithmic Improvements in Deep Learning
Recent Trends:
– Specialized ANN Architectures:
Convolutional Neural Networks (for image data)
Recurrent Neural Networks (for sequence data)
Residual Networks (with skip connections)
– Unsupervised Models: Autoencoders
– Generative Models: Generative Adversarial Networks
Vanishing Gradient Problem
Sigmoid activation function easily saturates (near-zero gradient
with respect to z) when z is too large or too small
This leads to small (or zero) gradients of the squared loss with respect to the weights,
especially at the hidden layers, leading to slow (or no) learning
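A quick numeric illustration (a sketch, not from the slides): the sigmoid derivative f'(z) = f(z)(1 - f(z)) collapses toward zero as |z| grows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(z)
    print(f"z = {z:5.1f}   sigmoid'(z) = {s * (1 - s):.6f}")
# z =   0.0   sigmoid'(z) = 0.250000
# z =   2.0   sigmoid'(z) = 0.104994
# z =   5.0   sigmoid'(z) = 0.006648
# z =  10.0   sigmoid'(z) = 0.000045
```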
Handling Vanishing Gradient Problem
Use of cross-entropy loss function:
Loss( y , ŷ ) = − y log( ŷ ) − ( 1 − y ) log( 1 − ŷ )   (for y ∈ {0, 1})
Use of Rectified Linear Unit (ReLU) activations:
ReLU( z ) = max( 0 , z )
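A short sketch contrasting the two activations (the test values are illustrative): ReLU's gradient stays at 1 for positive inputs, so it does not saturate the way the sigmoid does.

```python
def relu(z):
    return max(0.0, z)              # ReLU(z) = max(0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0    # derivative: 1 for z > 0, 0 for z < 0

for z in [0.5, 2.0, 5.0, 10.0]:
    print(f"z = {z:5.1f}   relu'(z) = {relu_grad(z):.1f}")
# the gradient stays at 1.0 no matter how large z gets,
# unlike sigmoid'(z) in the previous sketch
```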