
ENGR-E 511; ENGR-E 399

“Machine Learning for Signal Processing”


Module 03: Lecture 01:
Clustering
Minje Kim
Department of Intelligent Systems Engineering
Email: [email protected]
Website: http://minjekim.com
Research Group: http://saige.sice.indiana.edu
Meeting Request: http://doodle.com/minje
Motivating Problems
CD
How do we represent music in a CD?
What? What is a CD?
44.1 kHz, 16 bit, LPCM
What does it mean?

[Figure: a zoomed-in segment of the waveform with sample points marked every 1/44100 second; horizontal axis is time.]

We sample from the (continuous) waveform every 1/44100 second

Each sample is represented with one of 2^16 = 65,536 values


• e.g. 0000 0000 0000 1000 : -0.9998
• e.g. 1111 1111 1101 1100 : +0.9989
• e.g. 1000 0000 0000 0000 : 0

[Figure: distribution of the samples over the range -1 to 1.]

Can we do better?
• (Real-valued) samples in the same block are represented by the same value (and I don’t like it)

2
Motivating Problems
The Lloyd-Max algorithm
First of all, we need to save bits
What if there are not enough bits? [Examples: 8 bits vs. 4 bits]
• Some values are misrepresented

Assume that the raw audio samples are following a distribution like:
[Figure: pdf of the sample distribution over [-1, 1], with two candidate sets of quantization boundaries overlaid in red and blue.]

Which do you prefer, the red boundaries or the blue boundaries? (Both are for 2-bit encoding.)

3
Motivating Problems
The Lloyd-Max algorithm
You prefer the blue boundaries because they go well with the underlying structure of the sample distribution
Underlying structure?

[Figure: the pdf decomposed into four Gaussian components (μ = -0.6, -0.3, 0.25, 0.5; all with σ = 0.1) and their mixture.]
And they sound better!

[Audio examples: 4 bits (Lloyd), 4 bits (uniform), 8 bits (Lloyd), 8 bits (uniform), original]

4
k-Means Clustering
A scalar case
Where’s the distortion from?
Let’s start from the red boundaries we don’t like
The discrepancy between the representative and the actual samples
We want to find the representative that creates the least discrepancy
• For each quantization level

How do we measure the amount of discrepancy?

[Figure: the pdf with its four Gaussian components, and the representative value θ_1 marked inside the first quantization range.]

5
k-Means Clustering
A scalar case
So, the objective for the j-th range is to find the representative value θ_j that minimizes the error

\arg\min_{\theta_j} \sum_{i \in C_j} \|x_i - \theta_j\|^2

[Figure: the pdf and its four Gaussian components (μ = -0.6, -0.3, 0.25, 0.5; σ = 0.1), shown again.]

For all examples that belong to the j-th range
• C_j holds the indices for the j-th range

So, what's the solution? (why?)

\theta_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i

Eventually, we need to do this for all J ranges

\arg\min_{\theta_1, \theta_2, \cdots, \theta_J} \sum_{j=1}^{J} \sum_{i \in C_j} \|x_i - \theta_j\|^2

Let me put it in another way

\arg\min_{\theta_1, \theta_2, \cdots, \theta_J} \sum_{j=1}^{J} \sum_{i=1}^{N} u_{ij} \|x_i - \theta_j\|^2

where u_{ij} is a matrix that indicates the membership
• e.g. u_{ij} = 1 means i ∈ C_j
The membership matrix u_{ij} should also meet another constraint: \sum_j u_{ij} = 1

[Figure: an example N × J membership matrix, one row per sample (i = 1, ..., N) and one column per cluster (j = 1, ..., J); each row is one-hot, e.g. the 4th row is (0 1 0 0), indicating that the 4th sample belongs to the 2nd cluster.]
6
k-Means Clustering
A scalar case
Are we done?
No, we assumed that the boundaries are correct, but they aren’t

[Figure: the pdf and its four Gaussian components, with the current boundary estimates marked.]

In other words, we don't know if the membership matrix u_{ij} is correct

7
k-Means Clustering
A scalar case
We need to optimize w.r.t. the membership matrix as well
\arg\min_{\theta, U} \sum_{j=1}^{J} \sum_{i=1}^{N} u_{ij} \|x_i - \theta_j\|^2

The k-means clustering algorithm on scalar samples

Initialize the means \theta = \{\theta_1, \theta_2, \cdots, \theta_J\}^\top with random numbers
Update the membership matrix
• This is actually a complicated optimization problem (why?), but the solution is simple
• For a fixed set of means:

u_{ij} = \begin{cases} 1 & \text{if } j = \arg\min_{j'} \|x_i - \theta_{j'}\|^2 \\ 0 & \text{otherwise} \end{cases}

Update the means:

\theta_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i = \frac{1}{|C_j|} \sum_{i=1}^{N} u_{ij} x_i

8
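To make the iteration concrete, here is a minimal NumPy sketch of this scalar k-means quantizer (my own illustration, not the course code; the means are initialized from random data points, a common variant of the random initialization above, and the toy data mimics the slide's four-component mixture):

```python
import numpy as np

def kmeans_scalar(x, J, n_iter=100, seed=0):
    """Scalar k-means (Lloyd-Max style): returns J representative values (means)."""
    rng = np.random.default_rng(seed)
    theta = rng.choice(x, size=J, replace=False)          # initialize means from the data
    for _ in range(n_iter):
        # membership update: assign each sample to its nearest mean
        u = np.argmin((x[:, None] - theta[None, :]) ** 2, axis=1)
        # mean update: average of the samples assigned to each cluster
        for j in range(J):
            if np.any(u == j):
                theta[j] = x[u == j].mean()
    return theta, u

# toy data: samples from a 4-component Gaussian mixture like the slide's pdf
rng = np.random.default_rng(0)
mus = np.array([-0.6, -0.3, 0.25, 0.5])
x = rng.choice(mus, size=5000) + 0.1 * rng.standard_normal(5000)
theta, u = kmeans_scalar(x, J=4)          # 4 levels = 2-bit quantization
x_quantized = theta[u]                    # replace each sample by its representative
```

Alternating the two updates never increases the total squared error, which is why the iteration converges (to a local minimum).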
k-Means Clustering
A scalar case
Let’s get back to the CD encoding problem (Lloyd-Max algorithm)
Now instead of all the possible real values between -1 and +1
We replace the values within each range with their corresponding representatives, i.e. the means.
[Figure: the pdf with the learned boundaries and the per-range representative values (means) marked.]

[Audio examples: 4 bits (Lloyd), 4 bits (uniform), 8 bits (Lloyd), 8 bits (uniform), original]

9
Motivating Problems
Black cat, red wall, gray ground
Now let’s move on to the multi-dimensional case
In general, how do we quantize a vector?
First off, can you (verbally) describe this picture?

10
Motivating Problems
Black cat, red wall, gray ground
k-means with three clusters

11
Motivating Problems
Black cat, red wall, gray ground
k-means with 8 clusters and 16 clusters

Algorithm-wise everything is the same except for the fact that the input samples are 3D (RGB) vectors
\arg\min_{\theta, U} \sum_{j=1}^{J} \sum_{i=1}^{N} u_{ij} \|x_i - \theta_j\|^2
\qquad\longrightarrow\qquad
\arg\min_{\Theta, U} \sum_{j=1}^{J} \sum_{i=1}^{N} u_{ij} \|\mathbf{x}_i - \boldsymbol{\theta}_j\|^2

(scalar case on the left, vector case on the right)

12
Motivating Problems
Black cat, red wall, gray ground

[Figure: a sample x_i and the three cluster means θ_1, θ_2, θ_3 in the 3D RGB space.]

13
Vector Quantization
Clustering on multi-dimensional samples
What we did is something called Vector Quantization (VQ)
Do clustering
Replace vector samples with the mean of the cluster they belong to

In the image example, we had 3D vectors (RGB)

If we assume J clusters, we need log_2(J) bits to encode a pixel
• Rather than 8 bits per pixel

What we need:
A good clustering
• A small number of means that are representative enough
Dictionary (codebook)
• A codeword corresponds to one of the means
Index to the codebook
• Index of the pixel-wise membership

14
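For illustration only (not the lecture's code), here is a minimal sketch of this VQ pipeline using scikit-learn's KMeans; "cat.jpg" is a placeholder file name:

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

J = 8                                            # number of clusters
img = np.asarray(Image.open("cat.jpg"))          # placeholder file; shape (H, W, 3)
pixels = img.reshape(-1, 3).astype(float)        # N x 3 matrix of RGB samples

km = KMeans(n_clusters=J, n_init=10, random_state=0).fit(pixels)
codebook = km.cluster_centers_                   # J codewords (the means)
index = km.labels_                               # per-pixel index into the codebook

quantized = codebook[index].reshape(img.shape).astype(np.uint8)
```

With J = 8 codewords, the index map needs log_2(8) = 3 bits per pixel, plus the small codebook itself, matching the counting argument above.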
Gaussian Mixture Model
What’s wrong with k-means?
What I don’t like about k-means
Euclidean distance, hard decision, equiprobable clusters, diagonal cov…
[Figure: two toy 2D datasets, each shown twice: "Input data with GT means" and "k-means clustering results".]

15
Gaussian Mixture Model
An alternative: Mahalanobis distance
Cluster 1 or 2?
Let's tweak k-means
First, let's take variance into account for the distance metric
Mahalanobis distance (scalar case):

D_M(x_i \| \mu_j) = \sqrt{\frac{(x_i - \mu_j)^2}{\sigma_j^2}}

where \mu_j is the mean and \sigma_j the standard deviation of the j-th cluster.

[Figure: a sample x_i lying between two cluster means \mu_1 and \mu_2; which cluster should it join?]

Multi-dimensional cases, with covariance:

D_M(x_i \| \mu_j) = \sqrt{(x_i - \mu_j)^\top \Sigma^{-1} (x_i - \mu_j)}

For a 2D Gaussian with \Sigma = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}:

(1, 1)
• Euclidean: \sqrt{2}
• Mahalanobis: 1.0847

(1, -1)
• Euclidean: \sqrt{2}
• Mahalanobis: 2.5820

[Figure: contour plot of the 2D Gaussian with the points (1, 1) and (1, -1) marked.]

16
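A quick NumPy check of the two Mahalanobis distances quoted above (assuming the Gaussian is centered at the origin, which is what the quoted numbers imply):

```python
import numpy as np

Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def mahalanobis(x, mu, Sigma_inv):
    d = x - mu
    return np.sqrt(d @ Sigma_inv @ d)

mu = np.zeros(2)
for x in (np.array([1.0, 1.0]), np.array([1.0, -1.0])):
    print(x, "Euclidean:", np.linalg.norm(x - mu),
          "Mahalanobis:", mahalanobis(x, mu, Sigma_inv))
# (1, 1):  Euclidean ~1.4142, Mahalanobis ~1.0847
# (1, -1): Euclidean ~1.4142, Mahalanobis ~2.5820
```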
Gaussian Mixture Model
Maximum Likelihood
Mixture of Gaussians (MoG) or Gaussian Mixture Model (GMM)
A maximum likelihood problem
• Given the data, find the best fit among the family of prob. distributions with a certain parametric form

Which one is a better fit?

[Figure: two candidate mixture-of-Gaussians fits to the same 2D dataset, side by side.]

17
Gaussian Mixture Model
Maximum Likelihood
We know how to solve a maximum likelihood problem:

\arg\max_{\Theta} \prod_{i=1}^{N} p(x_i; \Theta)

For the GMM case, we can break down the likelihood as follows:

\mathcal{L} = \prod_{i=1}^{N} \sum_{j=1}^{J} P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)

Because

p(x_i; \Theta) = \sum_{j=1}^{J} P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)

Note that:

\Theta = \{P_1, \mu_1, \Sigma_1, P_2, \mu_2, \Sigma_2, \cdots, P_J, \mu_J, \Sigma_J\}

[Figure: the 2D dataset with the two fitted Gaussian components labeled j = 1 and j = 2.]

18
Gaussian Mixture Model
Maximum Likelihood
We also know the p.d.f. of a Gaussian:

\mathcal{N}(x_i; \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{D/2} |\Sigma_j|^{1/2}} \exp\!\Big(-\frac{1}{2}(x_i - \mu_j)^\top \Sigma_j^{-1} (x_i - \mu_j)\Big)

Therefore, the likelihood is

\mathcal{L} = \prod_{i=1}^{N} \sum_{j=1}^{J} P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)
            = \prod_{i=1}^{N} \sum_{j=1}^{J} P_j \frac{1}{(2\pi)^{D/2} |\Sigma_j|^{1/2}} \exp\!\Big(-\frac{1}{2}(x_i - \mu_j)^\top \Sigma_j^{-1} (x_i - \mu_j)\Big)

Then the log-likelihood is:

\mathcal{LL} = \sum_{i=1}^{N} \log\!\Big(\sum_{j=1}^{J} P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)\Big)
             = \sum_{i=1}^{N} \log\!\bigg(\sum_{j=1}^{J} P_j \frac{1}{(2\pi)^{D/2} |\Sigma_j|^{1/2}} \exp\!\Big(-\frac{1}{2}(x_i - \mu_j)^\top \Sigma_j^{-1} (x_i - \mu_j)\Big)\bigg)

19
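As a numerical aside (my sketch, not part of the slides), this log-likelihood is usually evaluated with a log-sum-exp over the components to avoid underflow; the parameter values below are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, P, mus, Sigmas):
    """LL = sum_i log sum_j P_j N(x_i; mu_j, Sigma_j), computed stably."""
    N, J = X.shape[0], len(P)
    log_terms = np.empty((N, J))
    for j in range(J):
        log_terms[:, j] = np.log(P[j]) + multivariate_normal.logpdf(X, mus[j], Sigmas[j])
    return logsumexp(log_terms, axis=1).sum()

# arbitrary 2-component example in 2D
X = np.random.default_rng(0).standard_normal((100, 2))
P = np.array([0.5, 0.5])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigmas = [np.eye(2), np.eye(2)]
print(gmm_log_likelihood(X, P, mus, Sigmas))
```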
Gaussian Mixture Model
Maximum Likelihood
The objective function:

\arg\max_{\Theta} \; \mathcal{LL} + \lambda\Big(\sum_{j=1}^{J} P_j - 1\Big)

= \arg\max_{\Theta} \; \sum_{i=1}^{N} \log\!\Big(\sum_{j=1}^{J} P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)\Big) + \lambda\Big(\sum_{j=1}^{J} P_j - 1\Big)

Differentiation is difficult
Why?
• Because of the summation inside the logarithm

What should we do?

Jensen's inequality:

\log \sum_{j=1}^{J} \frac{P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)\, \bar{U}_{ij}}{\bar{U}_{ij}} \;\ge\; \sum_{j=1}^{J} \bar{U}_{ij} \log\!\Big(\frac{P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}{\bar{U}_{ij}}\Big)

What? Why?

20
Gaussian Mixture Model
Jensen’s Inequality
For example:

f\Big(\frac{1}{3}x_1 + \frac{2}{3}x_2\Big) \;\ge\; \frac{1}{3}f(x_1) + \frac{2}{3}f(x_2)

[Figure: a concave function f; its value at the weighted average \frac{1}{3}x_1 + \frac{2}{3}x_2 lies above the chord joining f(x_1) and f(x_2).]

For a concave function f:

f\Big(\frac{\sum_i a_i x_i}{\sum_i a_i}\Big) \;\ge\; \frac{\sum_i a_i f(x_i)}{\sum_i a_i}

Or: f\Big(\sum_i a_i x_i\Big) \ge \sum_i a_i f(x_i) \quad \text{if } \sum_i a_i = 1 \text{ and } a_i \ge 0

Logarithmic functions are concave
Why?

\log'(x) = \frac{1}{x}, \qquad \log''(x) = -\frac{1}{x^2} < 0

[Figure: plot of \ln p_k, a concave function.]
21
Gaussian Mixture Model
Expectation Maximization (EM)
Let's get back to the ML problem for GMM:

\mathcal{LL} = \sum_{i=1}^{N} \log\!\Big(\sum_{j=1}^{J} \frac{P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)\, \bar{U}_{ij}}{\bar{U}_{ij}}\Big), \qquad \sum_j \bar{U}_{ij} = 1, \quad \bar{U}_{ij} \ge 0

\ge \sum_{i=1}^{N} \sum_{j=1}^{J} \bar{U}_{ij} \log\!\Big(\frac{P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}{\bar{U}_{ij}}\Big) \qquad \text{(Jensen's inequality)}

= \sum_{i=1}^{N} \sum_{j=1}^{J} \bar{U}_{ij} \log\!\Big(\frac{p(j|x_i)\, p(x_i)}{\bar{U}_{ij}}\Big) \qquad \because \; p(j|x_i) = \frac{p(x_i|j)\,p(j)}{\sum_j p(x_i|j)\,p(j)} = \frac{\mathcal{N}(x_i; \mu_j, \Sigma_j)\, P_j}{p(x_i)}

= \sum_{i=1}^{N} \Big(\sum_{j=1}^{J} \bar{U}_{ij} \log\frac{p(j|x_i)}{\bar{U}_{ij}} + \sum_{j=1}^{J} \bar{U}_{ij} \log p(x_i)\Big)

= \sum_{i=1}^{N} \Big(\sum_{j=1}^{J} \bar{U}_{ij} \log\frac{p(j|x_i)}{\bar{U}_{ij}} + \log p(x_i)\Big)
= \sum_{i=1}^{N} \Big(-D_{\mathrm{KL}}\big(\bar{U}_{ij} \,\|\, p(j|x_i)\big) + \log p(x_i)\Big)

What if we fix all the other parameters and maximize LL w.r.t. \bar{U}_{ij}?

When the KL divergence is minimal: \bar{U}_{ij} = p(j|x_i)

This procedure is the E-step of the EM algorithm for GMM

22
Gaussian Mixture Model
Expectation Maximization (EM)
M-step

Reminder: \mathcal{N}(x_i; \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{D/2} |\Sigma_j|^{1/2}} \exp\!\big(-\frac{1}{2}(x_i - \mu_j)^\top \Sigma_j^{-1} (x_i - \mu_j)\big)

We find \Theta that maximizes \mathcal{LL} + \lambda\big(\sum_{j=1}^{J} P_j - 1\big)

\mathcal{LL} \ge \sum_{i=1}^{N} \sum_{j=1}^{J} \bar{U}_{ij} \log\!\Big(\frac{P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}{\bar{U}_{ij}}\Big)
= \sum_{i=1}^{N} \sum_{j=1}^{J} \bar{U}_{ij} \log P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j) \;-\; \underbrace{\sum_{i=1}^{N} \sum_{j=1}^{J} \bar{U}_{ij} \log \bar{U}_{ij}}_{\text{constant}}

Therefore, the final objective function for the M-step is

\arg\max_{\Theta} \sum_{i=1}^{N} \sum_{j=1}^{J} \bar{U}_{ij} \log P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j) + \lambda\Big(\sum_j P_j - 1\Big)

More specifically (since we're going to do the partial differentiation):

\arg\max_{\mu_j} J_{\mu_j}, \quad J_{\mu_j} = \sum_{i=1}^{N} \bar{U}_{ij} \Big(-\frac{1}{2}(x_i - \mu_j)^\top \Sigma_j^{-1} (x_i - \mu_j)\Big) + \text{const.}

\arg\max_{P_j} J_{P_j}, \quad J_{P_j} = \sum_{i=1}^{N} \bar{U}_{ij} \log P_j + \lambda\Big(\sum_j P_j - 1\Big) + \text{const.}

\arg\max_{\Sigma_j} J_{\Sigma_j}, \quad J_{\Sigma_j} = \sum_{i=1}^{N} \bar{U}_{ij} \Big(-\frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(x_i - \mu_j)^\top \Sigma_j^{-1} (x_i - \mu_j)\Big) + \text{const.}

23
Gaussian Mixture Model
Expectation Maximization (EM)
M-step
Partial differentiation w.r.t. the parameters to find the local maxima
• For the means:

\frac{\partial J_{\mu_j}}{\partial \mu_j} = \sum_{i=1}^{N} \bar{U}_{ij} \, \Sigma_j^{-1}(x_i - \mu_j) = 0
\;\;\Rightarrow\;\; \mu_j = \frac{\sum_{i=1}^{N} \bar{U}_{ij}\, x_i}{\sum_{i=1}^{N} \bar{U}_{ij}}

• For the priors:

\frac{\partial J_{P_j}}{\partial \lambda} = \sum_j P_j - 1 = 0, \qquad
\frac{\partial J_{P_j}}{\partial P_j} = \frac{\sum_{i=1}^{N} \bar{U}_{ij}}{P_j} + \lambda = 0

\Leftrightarrow \sum_{j=1}^{J} \sum_{i=1}^{N} \bar{U}_{ij} = -\lambda \sum_{j=1}^{J} P_j
\Leftrightarrow -\lambda = \sum_{j=1}^{J} \sum_{i=1}^{N} \bar{U}_{ij} = N
\Leftrightarrow P_j = \frac{\sum_{i=1}^{N} \bar{U}_{ij}}{N}

• For the covariance (see the Matrix Cookbook, 2.1.2 and 2.2):

\Sigma_j = \frac{\sum_i \bar{U}_{ij}\, (x_i - \mu_j)(x_i - \mu_j)^\top}{\sum_i \bar{U}_{ij}}

24
Gaussian Mixture Model
Expectation Maximization (EM)
E-step: calculate posterior probabilities

\bar{U}_{ij} = p(j|x_i) = \frac{P_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}{\sum_{j'} P_{j'} \, \mathcal{N}(x_i; \mu_{j'}, \Sigma_{j'})}

M-step: update parameters

\mu_j = \frac{\sum_{i=1}^{N} \bar{U}_{ij}\, x_i}{\sum_{i=1}^{N} \bar{U}_{ij}}, \qquad
P_j = \frac{\sum_{i=1}^{N} \bar{U}_{ij}}{N}, \qquad
\Sigma_j = \frac{\sum_i \bar{U}_{ij}\, (x_i - \mu_j)(x_i - \mu_j)^\top}{\sum_i \bar{U}_{ij}}

25
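Putting the E-step and M-step together, a compact NumPy sketch (my own illustration; no convergence check or numerical safeguards):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, J, n_iter=50, seed=0):
    """EM for a GMM: E-step computes posteriors U, M-step updates P, mu, Sigma."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    P = np.full(J, 1.0 / J)                          # priors P_j
    mu = X[rng.choice(N, J, replace=False)]          # means initialized from data points
    Sigma = np.array([np.cov(X.T) for _ in range(J)])
    for _ in range(n_iter):
        # E-step: U_ij = P_j N(x_i; mu_j, Sigma_j) / sum_j' P_j' N(x_i; mu_j', Sigma_j')
        lik = np.column_stack([P[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
                               for j in range(J)])
        U = lik / lik.sum(axis=1, keepdims=True)
        # M-step
        Nj = U.sum(axis=0)                           # sum_i U_ij
        P = Nj / N
        mu = (U.T @ X) / Nj[:, None]
        for j in range(J):
            d = X - mu[j]
            Sigma[j] = (U[:, j, None] * d).T @ d / Nj[j]
    return P, mu, Sigma, U
```

In practice, the responsibilities are computed in the log domain (as in the earlier log-likelihood sketch) and a small value is added to the covariance diagonal for numerical stability.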
Gaussian Mixture Model
Too much math?

26
Mixture of Multinomial Distributions
Clustering Musical Notes
How many clusters? (samples are magnitude spectra)
I’m curious what kind of notes are there in the signal
[Figure: magnitude spectrogram of the signal; frequency (Hz) from 0 to 8000 on the vertical axis, time (sec) from 0.3 to 4.5 on the horizontal axis.]

27
Mixture of Multinomial Distributions
Clustering Musical Notes
For the i-th spectrum x_i:
Although the magnitudes are real numbers, we can scale them to convert them into integers:

[1.2, 1.35, 0.1, 5.525] = 0.025 × [48, 54, 4, 221]

Then, we can think of this as an observation from a multinomial distribution:

\mathcal{M}(x_i; \theta) = \frac{N!}{\prod_d x_{id}!} \prod_d \theta_d^{x_{id}}

EM for the mixture of multinomial distributions:
Initialize two mean spectra (random numbers) \theta_1, \theta_2 \in \mathbb{R}_+^D
Initialize two prior probabilities (random numbers that sum to one), P_1 + P_2 = 1
Calculate posterior probabilities (E-step):

p(j=1|x_i) = \frac{P_1 \mathcal{M}(x_i; \theta_1)}{P_1 \mathcal{M}(x_i; \theta_1) + P_2 \mathcal{M}(x_i; \theta_2)}
           = \frac{P_1 \frac{N!}{\prod_d x_{id}!} \prod_d \theta_{1d}^{x_{id}}}{P_1 \frac{N!}{\prod_d x_{id}!} \prod_d \theta_{1d}^{x_{id}} + P_2 \frac{N!}{\prod_d x_{id}!} \prod_d \theta_{2d}^{x_{id}}}
           = \frac{P_1 \prod_d \theta_{1d}^{x_{id}}}{P_1 \prod_d \theta_{1d}^{x_{id}} + P_2 \prod_d \theta_{2d}^{x_{id}}}

(Note: it's usually fine not to convert the spectra into integers, although it's not strictly correct. In this example I just normalized each spectrum.)

Update means (M-step)
28
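A small NumPy sketch of that E-step (my illustration), done in the log domain so the products of many \theta_d^{x_{id}} terms do not underflow; the M-step, which the slide does not spell out, would set each \theta_j proportional to \sum_i \bar{U}_{ij} x_i and renormalize it to sum to one:

```python
import numpy as np

def multinomial_mixture_estep(X, thetas, P):
    """E-step for a mixture of multinomials.
    X:      (N, D) non-negative spectra, treated as (soft) counts
    thetas: (J, D) mean spectra, each row sums to 1
    P:      (J,)   prior probabilities
    Returns U with U[i, j] = p(j | x_i)."""
    # log of P_j * prod_d theta_jd^{x_id}; the N!/prod_d x_id! factor cancels in the ratio
    log_post = X @ np.log(thetas + 1e-12).T + np.log(P)   # (N, J)
    log_post -= log_post.max(axis=1, keepdims=True)       # stabilize before exponentiating
    U = np.exp(log_post)
    return U / U.sum(axis=1, keepdims=True)
```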
Mixture of Multinomial Distributions
Clustering Musical Notes
It’s actually a difficult clustering task with a lot of spurious local minima
[Figure: two runs with different STFT settings (nfft=1024, hop=256 vs. nfft=4096, hop=512); each shows the spectrogram and, below it, the per-frame posterior probabilities p(1|x_i) and 1 - p(1|x_i).]

29
Locality Sensitive Hashing
From clustering to hashing
Hashing is a popular concept in databases
There is a query to the database
Instead of comparing the original representations, a hash function maps the query down to an integer (binary) address
The address is associated with a bucket, which can contain a few different database records
• We say that those records collide
Then, we refine the search inside the bucket
This is cheaper than scanning the entire database

Traditional challenges
Records are better off evenly distributed (for the speed)
Overflow

30
Locality Sensitive Hashing
From clustering to hashing
There’s another hashing concept in machine learning
Locality sensitive hashing or semantic hashing
For data points x_i and x_j in a D-dimensional space:
If they are close enough, D(x_i \| x_j) < \tau,
then the Hamming distance between them after hashing is zero, H(\phi(x_i) \| \phi(x_j)) = 0, with probability p
If they are far enough, D(x_i \| x_j) \ge c\tau,
then the Hamming distance between them after hashing is zero, H(\phi(x_i) \| \phi(x_j)) = 0, with probability q
p > q!
A hash function \phi(\cdot) that meets the above conditions is said to be in the locality sensitive hash function family
In other words…
Originally similar items collide in the same bucket
Share the same address
Quantized using the same binary string
Are in the same cluster

31
Locality Sensitive Hashing
From clustering to hashing
How do we find the hash function?
Well, it’s not easy

One way is to rely on a bunch of random projections


\begin{pmatrix} +1 \\ -1 \\ \vdots \\ -1 \end{pmatrix}
= \operatorname{sign}\!\left(
\begin{pmatrix} +1 & -1 & \cdots & -1 \\ +1 & -1 & \cdots & +1 \\ \vdots & \vdots & \ddots & \vdots \\ +1 & +1 & \cdots & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_D \end{pmatrix}
\right)

With K projections, we can represent D-dimensional data with K bits
• e.g. 513 magnitude Fourier coefficients (real) with K = 128 bits

Why does it work?


Johnson-Lindenstrauss Theorem

It’s also related to sparse coding and compressive sensing

32
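A minimal sketch of this sign-of-random-projection hashing and the resulting Hamming distances (my illustration; the projection matrix has random ±1 entries, as on the slide):

```python
import numpy as np

def lsh_hash(X, K, seed=0):
    """Hash D-dimensional rows of X into K-bit codes via sign(random projection)."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    A = rng.choice([-1.0, 1.0], size=(K, D))     # random +/-1 projection matrix
    return (X @ A.T > 0).astype(np.uint8)        # K bits per sample

def hamming(codes_a, codes_b):
    """Pairwise Hamming distances between two sets of binary codes."""
    return (codes_a[:, None, :] != codes_b[None, :, :]).sum(axis=2)

# toy usage: e.g. 513 magnitude Fourier coefficients hashed into 128 bits
X = np.random.default_rng(1).standard_normal((5, 513))
codes = lsh_hash(X, K=128)
print(hamming(codes, codes))
```

Points separated by a small angle tend to agree on more random sign bits than distant points do, which is the locality-sensitivity property defined on the previous slide.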
Locality Sensitive Hashing
From clustering to hashing

Let's see how the original spectra are similar to each other

[Figure: the magnitude spectrogram (0-8000 Hz, 0.3-4.5 sec) and three pairwise similarity matrices over its frames: Euclidean distance, inner product, and cosine distance.]

And in terms of Hamming distance after random projection

[Figure: the corresponding pairwise Hamming-distance matrices for K = 32, K = 128, and K = 512.]

33
Spectral Hashing
More machine learning involved
Just another hashing technique, but it tries to minimize the difference between the original pairwise similarity and the pairwise Hamming distance:

\min \sum_{ij} W_{ij}\, H(y_i \| y_j)

subject to:
y_i \in \{-1, +1\}^K
\sum_i y_i = 0
y_i^\top y_j = 0 \;\text{ if } i \ne j

See the paper for more optimization detail
But the basic idea is to see the problem as an eigendecomposition problem
That's why it's called spectral hashing

[Figure: the magnitude spectrogram and, below it, one resulting ±1 hash bit plotted over the frames.]
Weiss, Yair, Antonio Torralba, and Rob Fergus. "Spectral hashing." Advances in neural information processing systems. 2009.

34
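For intuition only, here is a rough in-sample sketch of the spectral relaxation behind that objective (threshold the low-order eigenvectors of a graph Laplacian built from the affinities W); this is not the paper's full method, which derives analytical eigenfunctions so that new points can also be hashed:

```python
import numpy as np

def spectral_hash_insample(X, K, sigma=1.0):
    """Binarize the K lowest non-trivial graph-Laplacian eigenvectors into +/-1 codes."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # squared Euclidean distances
    W = np.exp(-d2 / (2 * sigma ** 2))                        # pairwise affinity
    L = np.diag(W.sum(axis=1)) - W                            # graph Laplacian
    evals, evecs = np.linalg.eigh(L)                          # ascending eigenvalues
    Y = evecs[:, 1:K + 1]                                     # skip the constant eigenvector
    return np.where(Y > 0, 1, -1)

codes = spectral_hash_insample(np.random.default_rng(0).standard_normal((50, 8)), K=4)
```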
Locality Sensitive Hashing
Why is it useful?
For faster detection
Matching hash codes: the database items X_t are hashed to \tilde{x}_t and the query Q to \tilde{q}, so the search compares short codes instead of scanning a DB of millions of items

Keyword spotting demo: HMM versus Spatial-Temporal WTA Hashing

[Figure: hit rate (hits / total positives) vs. false positive rate (FP / min) for a clean query (Hash AUC = 0.91, HMM AUC = 0.71) and a noisy query (Hash AUC = 0.69, HMM AUC = 0.21).]

35
Reading
Textbook 6.8 – 6.16
Textbook 2.5.5
Bishop, “Pattern Recognition and Machine Learning” Chapter 9

36
Thank You!

37
