Recap
• We have been considering linear discriminant functions.
• Such a linear classifier is given by

      h(X) = 1  if  Σ_{i=1}^{d′} w_i φ_i(X) + w_0 > 0
           = 0  otherwise

  where the φ_i are fixed functions.
• We have been considering the case φ_i(X) = x_i for simplicity.
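As a concrete illustration, here is a minimal Python (NumPy) sketch of such a generalized linear classifier; the function name h, the example weights, and the basis functions below are my own assumptions, not part of the lecture.

    import numpy as np

    def h(X, w, w0, basis):
        # Sketch of the classifier above: returns 1 if sum_i w_i * phi_i(X) + w_0 > 0, else 0.
        s = sum(wi * phi(X) for wi, phi in zip(w, basis)) + w0
        return 1 if s > 0 else 0

    # With phi_i(X) = x_i (the case used in the lecture for simplicity):
    basis = [lambda X, i=i: X[i] for i in range(2)]
    print(h(np.array([1.0, -0.5]), w=[2.0, 1.0], w0=-1.0, basis=basis))   # prints 1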
Perceptron
• The Perceptron is the earliest such classifier.
• Assuming an augmented feature vector, h(X) = sgn(W^T X).
• ‘Find the weighted sum and threshold it.’
Perceptron Learning Algorithm
• A simple iterative algorithm.
• In each iteration, we locally try to correct errors.
  Let ΔW(k) = W(k+1) − W(k). Then

      ΔW(k) = 0       if W(k)^T X(k) > 0 and y(k) = 1, or
                         W(k)^T X(k) < 0 and y(k) = 0
            = X(k)    if W(k)^T X(k) ≤ 0 and y(k) = 1
            = −X(k)   if W(k)^T X(k) ≥ 0 and y(k) = 0
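A minimal Python (NumPy) sketch of this learning rule, assuming 0/1 labels and augmented feature vectors as in the slides; the function name, the zero initialization, and the epoch limit are assumptions of mine.

    import numpy as np

    def perceptron_train(X, y, max_epochs=100):
        # X: n x (d+1) array of augmented feature vectors (rows), y: n-vector of 0/1 labels.
        n, d1 = X.shape
        W = np.zeros(d1)                    # initial weight vector (an assumption)
        for _ in range(max_epochs):
            corrected = False
            for Xk, yk in zip(X, y):
                s = W @ Xk                  # W(k)^T X(k)
                if yk == 1 and s <= 0:      # misclassified positive pattern: W <- W + X(k)
                    W += Xk
                    corrected = True
                elif yk == 0 and s >= 0:    # misclassified negative pattern: W <- W - X(k)
                    W -= Xk
                    corrected = True
            if not corrected:               # no errors left: a separating hyperplane was found
                return W
        return W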
Perceptron: Geometric view
• The algorithm has a simple geometric view. Consider the data set shown in the figure.
• Suppose W(k) misclassifies a pattern.
• The correction made to W(k) can then be seen geometrically in the figure.
• We showed that if the training set is linearly separable, the algorithm finds a separating hyperplane in finitely many iterations.
• We also saw the ‘batch’ version of the algorithm, which can be shown to be gradient descent on a reasonable cost function.
Perceptron
• A simple ‘device’: weighted sum and threshold.
• A simple learning machine (a neuron model).
• The Perceptron is an interesting algorithm for learning linear classifiers.
• It works only when the data are linearly separable.
• In general, it is not possible to know beforehand whether the data are linearly separable.
• We next look at other linear methods for classification and regression.
Regression Problems
• Recall that the regression (or function learning) problem is closely related to learning classifiers.
• The training set is {(X_i, y_i), i = 1, ..., n} with X_i ∈ ℜ^d, y_i ∈ ℜ, ∀i.
• The main difference is that the ‘target’ or ‘output’ y_i is continuous-valued in a regression problem, while it can take only finitely many distinct values for a classifier.
• In a regression problem, the goal is to learn a function f : ℜ^d → ℜ that captures the relationship between X and y. We write ŷ = f(X).
• Note that any such function can also be viewed as a classifier: we can take h(X) = sgn(f(X)) as the classifier.
• We search over a suitably parameterized class of functions to find the best one.
• Once again, the problem is that of learning the best parameters.
Linear Regression
• We now consider learning a linear function

      f(X) = Σ_{i=1}^{d} w_i x_i + w_0

  where W = (w_1, ..., w_d)^T ∈ ℜ^d and w_0 ∈ ℜ are the parameters.
• Thus a linear model can be expressed as f(X) = W^T X + w_0.
• As earlier, by using an augmented vector X, we can write this as f(X) = W^T X.
• Now, to learn the ‘optimal’ W, we need a criterion function.
• The criterion function assigns a figure of merit, or a cost, to each W ∈ ℜ^{d+1}.
• The optimal W is then the one that optimizes the criterion function.
• The criterion function used most often is the sum of squared errors.
Linear Least Squares Regression
• We want to find a W such that ŷ(X) = f(X) = W^T X is a good fit to the training data.
• Consider the function J : ℜ^{d+1} → ℜ defined by

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)^2

• We take the ‘optimal’ W to be the minimizer of J(·).
• This is known as the linear least squares method.
• We want to find W to minimize

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)^2

• If we are learning a classifier, we can take y_i ∈ {−1, +1}.
• Note that we would finally use the sign of W^T X as the classifier output.
• Thus minimizing J is also a good way to learn linear discriminant functions.
• We want to find the minimizer of

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)^2

• This is a quadratic function, so we can find the minimizer analytically.
• For this we rewrite J(W) in a more convenient form.
• Recall that we take all vectors to be column vectors.
• Hence each training sample X_i is a (d+1) × 1 matrix.
• Let A be the matrix given by

      A = [X_1 · · · X_n]^T

• A is an n × (d+1) matrix whose ith row is X_i^T.
• Hence AW is an n × 1 vector whose ith element is X_i^T W.
• Let Y be the n × 1 vector whose ith element is y_i.
• Hence AW − Y is an n × 1 vector whose ith element is (X_i^T W − y_i).
• Hence we have

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)^2 = (1/2) (AW − Y)^T (AW − Y)

• To find the minimizer of J(·), we equate its gradient to zero.
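A quick numerical check of this identity in Python (NumPy); the random data below are purely hypothetical.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 6, 2
    A = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])   # rows are augmented X_i^T
    Y = rng.normal(size=n)
    W = rng.normal(size=d + 1)

    J_sum    = 0.5 * sum((A[i] @ W - Y[i]) ** 2 for i in range(n))
    J_matrix = 0.5 * (A @ W - Y) @ (A @ W - Y)
    print(np.isclose(J_sum, J_matrix))                          # True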
• We have

      ∇J(W) = A^T (AW − Y)

• Equating the gradient to zero, we get

      (A^T A) W = A^T Y

• The optimal W satisfies this system of linear equations (called the normal equations).
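A minimal sketch of forming A and Y and solving the normal equations with NumPy; the data, noise level, and true weights below are hypothetical, and np.linalg.lstsq would be the more numerically robust choice in practice.

    import numpy as np

    # Hypothetical training data: n samples in d dimensions with a known linear relation.
    rng = np.random.default_rng(0)
    n, d = 50, 3
    X = rng.normal(size=(n, d))
    Y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 + 0.1 * rng.normal(size=n)

    # Augmented design matrix A: ith row is X_i^T with a leading 1 for w_0.
    A = np.hstack([np.ones((n, 1)), X])

    # Solve the normal equations (A^T A) W = A^T Y.
    W = np.linalg.solve(A.T @ A, A.T @ Y)
    print(W)          # close to [0.3, 1.0, -2.0, 0.5]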
• A^T A is a (d+1) × (d+1) matrix.
• A^T A is invertible if A has linearly independent columns. (This is because the null space of A is the same as the null space of A^T A.)
• The rows of A are the training samples X_i.
• Hence the jth column of A gives the values of the jth feature across all the examples.
• Hence the columns of A are linearly independent if no feature can be obtained as a linear combination of the other features.
• If we assume the features are linearly independent, then A has linearly independent columns and hence A^T A is invertible.
• This is a reasonable assumption.
• The optimal W is a solution of (A^T A) W = A^T Y.
• When A^T A is invertible, we get the optimal W as

      W* = (A^T A)^{-1} A^T Y = A† Y

  where A† = (A^T A)^{-1} A^T is called the generalized inverse of A.
• This W* is the linear least squares solution to our regression (or classification) problem.
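A short check, on the same kind of hypothetical data as before, that the normal equation solution coincides with A† Y; here I use NumPy's pinv, which computes the Moore-Penrose pseudoinverse and agrees with (A^T A)^{-1} A^T when A has full column rank.

    import numpy as np

    rng = np.random.default_rng(2)
    A = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])   # full column rank (n >> d)
    Y = rng.normal(size=20)

    W_normal = np.linalg.solve(A.T @ A, A.T @ Y)   # solution of the normal equations
    W_geninv = np.linalg.pinv(A) @ Y               # W* = A† Y
    print(np.allclose(W_normal, W_geninv))         # True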
Geometry of Least Squares
• Our least squares method seeks a W that minimizes ||AW − Y||^2.
• A is an n × (d+1) matrix and normally n >> d.
• Consider the (over-determined) system of linear equations AW = Y.
• The system may or may not be consistent, but we seek the W* that minimizes the squared error.
• As we saw, the solution is W* = A† Y; hence the name generalized inverse for A†.
• The least squares method is trying to find a ‘best-fit’ W for the system AW = Y.
• Let C_0, C_1, ..., C_d be the columns of A.
• Then AW = w_0 C_0 + w_1 C_1 + · · · + w_d C_d.
• Thus, for any W, AW is a linear combination of the columns of A.
• Hence, if Y is in the space spanned by the columns of A, there is an exact solution.
• Otherwise, we want the projection of Y onto the column space of A.
• That is, we want to find the vector Z in the column space of A that is closest to Y.
• Any vector in the column space of A can be written as Z = AW for some W.
• Hence we want to find Z to minimize ||Z − Y||^2 subject to the constraint that Z = AW for some W.
• That is exactly the least squares solution.
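A small numerical illustration of this geometric picture, on hypothetical data: the fitted Z = AW* is the projection of Y onto the column space of A, so the residual Y − Z should be orthogonal to every column of A (which is just the normal equations again).

    import numpy as np

    rng = np.random.default_rng(3)
    A = np.hstack([np.ones((30, 1)), rng.normal(size=(30, 2))])
    Y = rng.normal(size=30)

    W_star = np.linalg.lstsq(A, Y, rcond=None)[0]     # least squares solution
    Z = A @ W_star                                    # projection of Y onto the column space of A
    residual = Y - Z
    print(np.allclose(A.T @ residual, 0, atol=1e-8))  # residual is orthogonal to the columns of A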
• Let us take the original (not augmented) data vectors and write our model as ŷ(X) = f(X) = W^T X + w_0, where now W ∈ ℜ^d.
• Now we have

      J(W) = (1/2) Σ_{i=1}^{n} (W^T X_i + w_0 − y_i)^2

• For any given W, we can find the best w_0 by equating the partial derivative with respect to w_0 to zero.
We have

      ∂J/∂w_0 = Σ_{i=1}^{n} (W^T X_i + w_0 − y_i)

Equating this partial derivative to zero, we get

      Σ_{i=1}^{n} (W^T X_i + w_0 − y_i) = 0
      ⇒  n w_0 + W^T Σ_{i=1}^{n} X_i = Σ_{i=1}^{n} y_i
This gives us

      w_0 = (1/n) Σ_{i=1}^{n} y_i − W^T ( (1/n) Σ_{i=1}^{n} X_i )

• Thus, w_0 accounts for the difference between the average of y and the average of W^T X.
• So, w_0 is often called the bias term.
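A short numerical check of this relation on hypothetical data: at the least squares optimum, the fitted bias equals the mean of y minus W^T applied to the mean of X.

    import numpy as np

    rng = np.random.default_rng(4)
    n, d = 40, 3
    X = rng.normal(size=(n, d)) + 2.0                       # original (non-augmented) data vectors
    y = X @ np.array([1.0, 0.5, -1.0]) + 3.0 + 0.05 * rng.normal(size=n)

    A = np.hstack([np.ones((n, 1)), X])                     # augmented design matrix
    w = np.linalg.lstsq(A, y, rcond=None)[0]
    w0, W = w[0], w[1:]

    # w_0 = mean(y) - W^T mean(X) at the optimum:
    print(np.isclose(w0, y.mean() - W @ X.mean(axis=0)))    # True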
• We have taken our linear model to be

      ŷ(X) = f(X) = Σ_{j=0}^{d} w_j x_j

• As mentioned earlier, we could instead choose any fixed set of basis functions φ_j.
• Then the model would be

      ŷ(X) = f(X) = Σ_{j=0}^{d′} w_j φ_j(X)
• We can use the same criterion of minimizing the sum of squared errors:

      J(W) = (1/2) Σ_{i=1}^{n} (W^T Φ(X_i) − y_i)^2

  where Φ(X_i) = (φ_0(X_i), ..., φ_{d′}(X_i))^T.
• We want the minimizer of J(·).
• We can learn W using the same method as earlier.
• Thus, we again have

      W* = (A^T A)^{-1} A^T Y

• The only difference is that now the ith row of the matrix A is

      [φ_0(X_i)  φ_1(X_i)  · · ·  φ_{d′}(X_i)]
• As an example, let d = 1 (so X_i, y_i ∈ ℜ).
• Take φ_j(X) = X^j, j = 0, 1, ..., m.
• Now the model is

      ŷ(X) = f(X) = w_0 + w_1 X + w_2 X^2 + · · · + w_m X^m

• That is, the model says y is an mth-degree polynomial in X.
• All such problems are tackled in a uniform fashion using the least squares method we have presented.
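A minimal sketch of this polynomial case in NumPy; the data, the noise level, the degree m = 3, and the use of np.vander to build the design matrix are my own illustrative choices.

    import numpy as np

    # Hypothetical 1-d data: y is roughly a cubic in X plus noise.
    rng = np.random.default_rng(5)
    X = rng.uniform(-1, 1, size=30)
    y = 1.0 - 2.0 * X + 0.5 * X**3 + 0.1 * rng.normal(size=30)

    m = 3                                           # degree of the polynomial model
    A = np.vander(X, N=m + 1, increasing=True)      # ith row: [1, X_i, X_i^2, X_i^3]
    W = np.linalg.lstsq(A, y, rcond=None)[0]        # same least squares machinery as before
    print(W)                                        # roughly [1.0, -2.0, 0.0, 0.5]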
LMS Algorithm
• We are finding the W* that minimizes

      J(W) = (1/2) Σ_{i=1}^{n} (X_i^T W − y_i)^2

• We could also have found the minimum through an iterative scheme using gradient descent.
• The gradient of J is given by

      ∇J(W) = Σ_{i=1}^{n} X_i (X_i^T W − y_i)
• The iterative gradient descent scheme would be

      W(k+1) = W(k) − η Σ_{i=1}^{n} X_i (X_i^T W(k) − y_i)

• In analogy with what we saw for the Perceptron algorithm, this can be viewed as a ‘batch’ version.
• We use the current W to find the errors on all the training data and then make all the ‘corrections’ together.
• We can instead have an incremental version of this algorithm.
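A minimal sketch of this batch iteration in NumPy; the function name, the zero initialization, the step size η, and the fixed iteration count are assumptions, and η must be small enough for the iteration to converge.

    import numpy as np

    def batch_least_squares_gd(A, Y, eta=1e-3, iters=10000):
        # Batch update: W(k+1) = W(k) - eta * sum_i X_i (X_i^T W(k) - y_i)
        #                      = W(k) - eta * A^T (A W(k) - Y).
        W = np.zeros(A.shape[1])
        for _ in range(iters):
            W = W - eta * A.T @ (A @ W - Y)     # all corrections applied together, once per pass
        return W

    # On hypothetical data this approaches the normal equation solution (A^T A)^{-1} A^T Y:
    rng = np.random.default_rng(6)
    A = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
    Y = rng.normal(size=50)
    print(np.allclose(batch_least_squares_gd(A, Y),
                      np.linalg.lstsq(A, Y, rcond=None)[0], atol=1e-4))   # True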
• For the incremental version, at each iteration we pick one of the training samples; call it X(k).
• The error on this sample is (1/2) (X(k)^T W(k) − y(k))^2.
• Using the gradient of only this term, we get the incremental version

      W(k+1) = W(k) − η X(k) (X(k)^T W(k) − y(k))

• This is called the LMS algorithm.
• In the LMS algorithm, we iteratively update W as

      W(k+1) = W(k) − η X(k) (X(k)^T W(k) − y(k))

• Here (X(k), y(k)) is the training example picked at iteration k and W(k) is the weight vector at iteration k.
• We do not need to have all the training examples with us at once; we can learn W from a stream of examples without needing to store them.
• If η is sufficiently small, this algorithm also converges to the minimizer of J(W).
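A minimal sketch of the LMS update processing a stream of examples one at a time; the helper name lms_update, the synthetic data generator, the step size, and the iteration count are all my own assumptions.

    import numpy as np

    def lms_update(W, Xk, yk, eta=0.01):
        # One LMS step: W(k+1) = W(k) - eta * X(k) (X(k)^T W(k) - y(k)).
        return W - eta * Xk * (Xk @ W - yk)

    # Learning from a stream of (hypothetical) examples without storing them:
    rng = np.random.default_rng(7)
    W_true = np.array([0.5, 1.0, -2.0])
    W = np.zeros(3)
    for k in range(20000):
        Xk = np.hstack([1.0, rng.normal(size=2)])       # augmented sample X(k)
        yk = W_true @ Xk + 0.01 * rng.normal()          # noisy target y(k)
        W = lms_update(W, Xk, yk)
    print(W)                                            # close to W_true for small eta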