
Course: Artificial Intelligence (COMP6065)

Non-official Slides

Learning from Examples II

Session 19

Revised by Williem, S. Kom., Ph.D.


1
Learning Outcomes
At the end of this session, students will be able to:

• LO 5: Apply various techniques to an agent acting under uncertainty

• LO 6: Apply techniques for processing natural language and other perceptual signals so that an agent can interact intelligently with the world

2
Outline
1. The Theory of Learning

2. Regression and Classification with Linear Models

3. Artificial Neural Networks

4. Practical Machine Learning

5. Summary

3
The Theory of Learning
• How can we be sure that our learning algorithm has produced a hypothesis that will predict the correct value for previously unseen inputs?

• In formal terms: how do we know that the hypothesis h is close to the target function f if we don't know what f is?

• How many examples do we need to get a good h?

• What hypothesis space should we use?

• If the hypothesis space is very complex, can we even find the best h?

4
The Theory of Learning
• How many examples are needed for learning?

– This question is addressed by computational learning theory

• Any hypothesis that is seriously wrong will almost certainly be "found out" with high probability after a small number of examples, because it will make an incorrect prediction.

• Thus, any hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be seriously wrong: that is, it must be probably approximately correct.

– This is the basis of PAC learning algorithms


5
The Theory of Learning
• X = the set of all possible examples

• D = the probability distribution from which examples are drawn, assumed the same for training and test sets

• H = the set of possible hypotheses

• m = the number of training examples


error(h) = P(h(x) ≠ f(x) | x drawn from D)

[Figure: the hypothesis space H, with the "seriously wrong" hypotheses Hbad (error greater than ε) separated from the hypotheses within ε of the target f.]

A hypothesis h is approximately correct if error(h) ≤ ε, for some small constant ε.
6
The Theory of Learning
• PAC learning example: learning decision lists

– A decision list consists of a series of tests

– Decision lists resemble decision trees, but their structure is simpler: they branch in only one direction

[Decision list: if Patrons(x, Some) then yes; else if Patrons(x, Full) ∧ Fri/Sat(x) then yes; else no]

We can measure the number of examples needed for PAC learning!


7
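As a concrete illustration (not from the original slides), here is a minimal Python sketch of the generic PAC sample-complexity bound from Russell and Norvig, m ≥ (1/ε)(ln(1/δ) + ln |H|): any hypothesis consistent with that many examples is, with probability at least 1 − δ, within error ε of the target.

```python
import math

def pac_sample_bound(epsilon, delta, h_size):
    """m >= (1/epsilon) * (ln(1/delta) + ln|H|): with probability at least
    1 - delta, a hypothesis consistent with m examples has error <= epsilon."""
    return math.ceil((1.0 / epsilon) * (math.log(1.0 / delta) + math.log(h_size)))

# Example: all boolean functions of n = 4 attributes, so |H| = 2^(2^4).
print(pac_sample_bound(epsilon=0.1, delta=0.05, h_size=2 ** 16))  # -> 141
```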
Regression and Classification
with Linear Models
• Learning a linear model: fitting a straight line

– This has been used for hundreds of years

• Cases:

– Univariate linear regression

– Linear classifiers with a threshold

8
Regression and Classification
with Linear Models
• Univariate linear regression

– The goal is to estimate a linear model that fits the examples:

hw(x) = w1·x + w0

• w0 and w1 are the weights (coefficients) to be learned

• hw(x) is the estimated output

– The task is then to minimize the empirical loss, e.g. the squared error Loss(hw) = Σj (yj − hw(xj))²

9
Regression and Classification
with Linear Models
• Univariate linear regression

– How do we minimize the loss?

• Analytically: set the partial derivatives of the loss to zero and solve

• Iteratively: gradient descent

10
Regression and Classification
with Linear Models
• Univariate linear regression

– Gradient descent: repeatedly update each weight in the direction that reduces the loss

wi ← wi − α · ∂Loss(w)/∂wi

• α is the learning rate (step size)

• Convergence: stop when the derivatives fall below a threshold

11
Regression and Classification
with Linear Models
• Univariate linear regression

– Gradient descent

• The derivatives of the squared loss yield the final update rules:

w0 ← w0 + α · Σj (yj − hw(xj))

w1 ← w1 + α · Σj (yj − hw(xj)) · xj

12
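Since the formulas on this slide were partly lost in conversion, here is a minimal sketch of these batch update rules in Python (a sketch only, not a production implementation):

```python
def fit_univariate(xs, ys, alpha=1e-5, epochs=10_000):
    """Batch gradient descent for h_w(x) = w1*x + w0, minimizing the
    squared loss sum_j (y_j - h_w(x_j))**2."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        errors = [y - (w1 * x + w0) for x, y in zip(xs, ys)]
        w0 += alpha * sum(errors)                              # dLoss/dw0 step
        w1 += alpha * sum(e * x for e, x in zip(errors, xs))   # dLoss/dw1 step
    return w0, w1

# Note: alpha must be small relative to the scale of x, or the updates
# diverge; rescaling the inputs speeds up convergence considerably.
```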
Study Case

• Test 2 Grade = w0 + w1 × (Test 1 Grade)

• From the data:

– Estimate w0

– Estimate w1

Student  Test 1  Test 2
1        50      32
2        51      33
3        52      34
4        53      35
5        54      36
6        55      37
7        56      39
8        57      40
9        58      41
10       59      42
11       60      43
12       61      44
13       62      46
14       63      47
15       64      48
16       65      49
17       66      50
18       67      51
19       68      53
20       69      54
21       70      55
22       71      56
23       72      57
13
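Setting the partial derivatives of the squared loss to zero gives the usual closed-form least-squares solution; here is a minimal sketch applied to the table above:

```python
xs = list(range(50, 73))                      # Test 1 grades of students 1..23
ys = [32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44,
      46, 47, 48, 49, 50, 51, 53, 54, 55, 56, 57]   # Test 2 grades

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
# Closed form: w1 = Sxy / Sxx, w0 = mean_y - w1 * mean_x
w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
w0 = mean_y - w1 * mean_x
print(w0, w1)   # roughly w0 = -26.3, w1 = 1.16
```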
Regression and Classification
with Linear Models
• Linear classifiers with a threshold

– We can use a linear function to do classification

– A linear decision boundary is a straight line (more generally, a hyperplane) that separates the two classes

[Figure: plot of two seismic data parameters: x1 = body wave magnitude, x2 = surface wave magnitude. White circles are earthquakes; black circles are nuclear explosions.]

14
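As a sketch of how such a classifier can be trained, here is the perceptron learning rule for a hard-threshold linear classifier, wi ← wi + α(y − hw(x))·xi. The four data points are invented, merely in the spirit of the seismic example (1 = explosion, 0 = earthquake):

```python
def threshold(z):
    return 1 if z >= 0 else 0

def perceptron_train(examples, alpha=0.1, epochs=100):
    """Perceptron rule: w_i <- w_i + alpha * (y - h_w(x)) * x_i.
    A dummy input x0 = 1 lets w[0] play the role of the intercept."""
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for x, y in examples:
            xv = [1.0] + list(x)                 # prepend the dummy input
            pred = threshold(sum(wi * xi for wi, xi in zip(w, xv)))
            w = [wi + alpha * (y - pred) * xi for wi, xi in zip(w, xv)]
    return w

# Hypothetical points: (body wave magnitude, surface wave magnitude)
examples = [([4.5, 3.0], 1), ([5.0, 3.2], 1), ([4.0, 4.5], 0), ([5.5, 6.0], 0)]
print(perceptron_train(examples))
```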
Artificial Neural Network

15
Artificial Neural Network
• An artificial neural network (ANN) imitates the neurons in our brain

– There is a hypothesis that mental activity consists of electrochemical activity in networks of brain cells

• Each node in an ANN fires when a linear combination of its inputs exceeds some threshold, i.e. each node acts as a linear classifier

16
Artificial Neural Network
• Neural networks are composed of nodes or units connected by directed links

• A link from unit i to unit j serves to propagate the activation ai

• Each link has a numeric weight wi,j associated with it

• Each unit first computes the weighted sum of its inputs, inj = Σi wi,j · ai, then applies an activation function g to derive its output (activation): aj = g(inj)

17
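In code, the computation of a single unit looks as follows (a minimal sketch; the sigmoid is one common choice of g):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(weights, activations, g=sigmoid):
    """in_j = sum_i w_ij * a_i, then a_j = g(in_j)."""
    in_j = sum(w * a for w, a in zip(weights, activations))
    return g(in_j)
```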
Artificial Neural Network
• The activation function g is typically
– a hard threshold (giving a perceptron), or
– a soft threshold (giving a sigmoid perceptron)
• There are two fundamentally distinct ways to connect the nodes
– Feed-forward network: connections run in one direction only
• Units are arranged in layers (with hidden units if there is more than one layer)
– Recurrent network: feeds its outputs back into its own inputs

18
Artificial Neural Network
• A neural network for a Quake bot

[Figure: inputs Enemy, Sound, Dead, Low Health feed a hidden layer, which feeds outputs Attack, Retreat, Wander, Chase, Spawn]

• Four input perceptrons
– One input for each condition (Enemy, Sound, Dead, Low Health)
• Four-perceptron hidden layer
– Fully connected
• Five output perceptrons
– One output for each action (Attack, Retreat, Wander, Chase, Spawn)
– Choose the action with the highest output
– Or use probabilistic action selection
19
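A minimal sketch of this 4-4-5 architecture in Python (the weights here are random and untrained, purely to show the shapes and the highest-output action selection):

```python
import math, random
random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """Each row of weights is [bias, w1, ..., wn] for one unit."""
    return [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
            for row in weights]

ACTIONS = ["Attack", "Retreat", "Wander", "Chase", "Spawn"]
hidden_w = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]  # 4 units
output_w = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]  # 5 units

state = [1, 0, 0, 1]                        # Enemy, Sound, Dead, Low Health
scores = layer(layer(state, hidden_w), output_w)
print(ACTIONS[scores.index(max(scores))])   # action with the highest output
```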
Artificial Neural Network
• Feed-forward neural networks

– Single-layer network (perceptron network)

• Works only for linearly separable problems

– Multi-layer network

• Works for non-linear problems too

20
Artificial Neural Network
• Consider the cases of AND, OR, and XOR for a single-layer perceptron

– AND and OR are linearly separable, but XOR is not. How can we solve XOR?

21
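One answer: add a hidden layer. A minimal sketch with hand-set weights, using the decomposition XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2):

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)     # hidden unit computing OR
    h_and = step(x1 + x2 - 1.5)    # hidden unit computing AND
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```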
Artificial Neural Network
• Multi-layer network

– Contains hidden units between the input and output layers

– The resulting activation function is complex

• How can we optimize the weights?

– Using back-propagation to implement gradient descent!

• We propagate the derivative values backwards through the network

22
Artificial Neural Network
• The back-propagation process can be summarized as follows:

– Compute the derivative values for the output units, using the observed error

– Starting with the output layer, repeat the following for each layer until the earliest hidden layer is reached:

• Propagate the derivative values back to the previous layer

• Update the weights between the two layers (see the sketch below)

23
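A minimal sketch of one back-propagation update for a network with one sigmoid hidden layer and one sigmoid output unit (weight vectors carry a leading bias weight; this mirrors the three steps above, not a production implementation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, w_h, w_o, alpha=0.5):
    # Forward pass
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_h]
    o = sigmoid(w_o[0] + sum(wi * hi for wi, hi in zip(w_o[1:], h)))
    # 1. Derivative value (delta) at the output unit, from the observed error
    delta_o = (y - o) * o * (1 - o)
    # 2. Propagate the delta back to the hidden layer
    delta_h = [hj * (1 - hj) * w_o[j + 1] * delta_o for j, hj in enumerate(h)]
    # 3. Update the weights between the layers
    w_o = ([w_o[0] + alpha * delta_o]
           + [w_o[j + 1] + alpha * delta_o * h[j] for j in range(len(h))])
    w_h = [[w[0] + alpha * delta_h[j]]
           + [w[i + 1] + alpha * delta_h[j] * x[i] for i in range(len(x))]
           for j, w in enumerate(w_h)]
    return w_h, w_o
```

Calling backprop_step repeatedly over a training set implements stochastic gradient descent on the squared error.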
Practical Machine Learning
• Handwritten digit recognition

– We demonstrate an application of a multilayer feed-forward network to printed character recognition.

– For simplicity, we limit the task to the recognition of the digits 0 to 9. Each digit is represented by a 5 × 9 bitmap.

24
Practical Machine Learning
The 5 × 9 bitmap, with pixels numbered 1 to 45:

 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45

25
Practical Machine Learning
• The number of neurons in the input layer is determined by the number of pixels in the bitmap. The bitmap in our example consists of 45 pixels, so we need 45 input neurons.

• The output layer has 10 neurons: one neuron for each digit to be recognized.

26
Practical Machine Learning
[Figure: the 45 binary pixel values feed the input neurons 1 to 45; the 10 output neurons correspond to the digits, and the output neuron for the presented digit produces 1 while the others produce 0.]

27
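A sketch of the corresponding network shapes, reusing backprop_step from the earlier sketch (the hidden-layer size of 10 is an assumption; the slides do not specify one):

```python
import random
random.seed(0)

N_PIXELS, N_HIDDEN, N_DIGITS = 45, 10, 10   # hidden size is an assumption

# One weight vector per unit, with a leading bias (as in the sketches above)
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(N_PIXELS + 1)]
            for _ in range(N_HIDDEN)]
w_output = [[random.uniform(-0.5, 0.5) for _ in range(N_HIDDEN + 1)]
            for _ in range(N_DIGITS)]

def one_hot(digit):
    """Training target: the output neuron for `digit` fires, the rest stay 0."""
    return [1 if i == digit else 0 for i in range(N_DIGITS)]

# Each training pair is (45 binary pixel values, one_hot(digit)).
print(one_hot(3))   # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```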
Summary
• Computational learning theory analyzes the sample complexity and computational complexity of inductive learning

• Linear regression is a widely used model; it can be solved by gradient descent search

• A linear classifier with a hard threshold can be trained to fit data that are linearly separable

• Neural networks represent complex nonlinear functions with a network of linear-threshold units

28
References
• Stuart Russell, Peter Norvig. 2010. Artificial Intelligence: A Modern Approach. Pearson Education, New Jersey. ISBN: 9780132071482

• http://aima.cs.berkeley.edu

29
