Lab 4 — Unsupervised Learning
INTRODUCTION
In this lab you will experiment with unsupervised clustering algorithms. You will use
a simple k-means algorithm, and Kohonen's algorithm for creating Self-Organizing
Feature Maps (SOFM). Both of these algorithms are used to find clusters in data.
The first simply finds the clusters; the second organizes them onto a one- or two-dimensional
grid such that nearby points on the grid correspond to nearby clusters.
To do this lab, we will use the SOMToolbox, available from http://www.cis.
hut.fi/projects/somtoolbox/ in Finland. Although the Matlab neural network
toolbox does have unsupervised clustering functions, past experience has shown that
they do not work very well. The SOMToolbox seems to be one which is actually used,
and it works very well. Unfortunately, its commands are complicated.
The toolbox is located in /opt/info/courses/NeuralNets/SOMToolbox, with
the .m files in the subdirectory mfiles and the documentation in the
subdirectory docs. The main index to the documentation is in the file somtoolbox.html.
One problem with the SOMToolbox is that it represents data in a way which is the
transpose of the neural network toolbox representation: the SOMToolbox represents
patterns as rows, whereas up to now we have represented patterns as columns. The
transpose operator in Matlab is the apostrophe ', so if data is a neural network toolbox
data matrix, data' would be the corresponding data matrix for the SOMToolbox.
TASKS
K-means clustering
The SOMToolbox does not contain competitive learning algorithms, as far as I can tell. It
does contain k-means algorithms. Competitive learning is an on-line implementation
of k-means clustering and can be used to find clusters in data. There are two versions
of k-means: kmeans takes the data and the number of clusters, and returns the cluster
centers and the cluster for each data point; kmeans_clusters tries a variety of different
numbers of clusters and returns a heuristic which suggests a particular number of
clusters to be used.
1. There are two two-dimensional datasets, called clusters1 and clusters2.
These are in the directory /opt/info/courses/NeuralNets/unsup. Train on one
of these and watch the cluster centers find the clusters.
2. Here is an example which shows how clustering can help you visualize properties
of data. You have to schedule eight courses. You have fewer than eight time-slots,
so there will be some clashes. You want to reduce the number of students
who cannot take all their courses due to clashes. Use the kmeans_clusters
algorithm to find clusters. How many time-slots will you need? Which courses
should not be on the same day? Here is the list of student preferences:
Course:   Neural Nets   Vision   Chaos   Genetic Alg   AI   Parallel Proc   Formal Methods   Management
Student
Alice y y y y
Adam y y y y
Carl y y y y
Ivan y y y y
Jacek y y y y
Jim y y y y
John y y y y
Maria y y y y
Mary y y y y
Richard y y y y
Roberto y y y y
3. Using the data from above, categorize student interest.
SOFM and Kohonen’s Algorithm
Apply Kohonen’s algorithm to the clustering data and observe the development of the
feature map.
Learning a Taxonomy: Kohonen's algorithm is used to find clusters in data. Often
you can see clusters on the 2-d grid which you could not see in the high-dimensional
data, simply because you cannot look at data in more than two dimensions
(of course, you may also see clusters which are spurious). To see how data can
be clustered, cluster the following animals: dove, hen, duck, goose, owl, hawk,
eagle, fox, dog, wolf, cat, tiger, lion, horse, zebra, and cow, based on the following
attributes: size (small, medium, large); characteristics (2 legs, 4 legs, hair,
hooves, mane, feathers); behaviour (hunt, swim, fly, run), using Kohonen's algorithm.
Does the algorithm organize the data in a useful way? A sketch of one
possible encoding is given below.
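Here is a minimal sketch of how the animal data might be encoded; the two rows
shown are illustrative, and you should complete and check the full table yourself.

% One row per animal, 13 binary attributes per row (SOMToolbox format):
% [small medium large 2legs 4legs hair hooves mane feathers hunt swim fly run]
A = [1 0 0 1 0 0 0 0 1 0 0 1 0;   % dove: small, 2 legs, feathers, flies
     1 0 0 1 0 0 0 0 1 0 0 0 0];  % hen:  small, 2 legs, feathers
% ... add the remaining fourteen animals in the same way.
sD = som_data_struct(A);
sD = som_label(sD,'add',1,'dove');
sD = som_label(sD,'add',2,'hen');
% ... label the rest, then train the map and view it:
sM = som_make(sD);
sM = som_autolabel(sM,sD,'vote');
som_show(sM,'empty','Labels');
som_show_add('label',sM);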
Travelling Salesman Problem: There are three data files, TSP1.dat, TSP2.dat, and
TSP3.dat. These consist of locations, each representing a city. The task is to
make a tour which visits every city once and returns to the start. Can you think
of a way of using Kohonen’s algorithm to find a good tour?
Resources and methods — Details of the SOMToolbox
K-means clustering
Task 1: Load the data using load clusters1;. You can plot the data using
plot(clusters1(1,:),clusters1(2,:),'.');.
You will clearly see that the data is clustered.
The kmeans function works like this
[c,p,err]=kmeans(’seq’,clusters1’,3);
The first input can be either ’seq’ or ’batch’ for sequential or batch updates,
the second input is the data (note the transpose operator) and the third is the
number of clusters. The outputs are the location of the cluster centers, the
cluster number for each data point, and the error (a measure of the distance
between the cluster centers and the data associated with each cluster). You
can overlay the centers on the data via,
hold on
plot(c(:,1),c(:,2),’r*’);
Another thing to do is to color the data by cluster. This can be done using
a function I created for AdaBoost visualization. Do the following,
close % close the current graph to draw a new graph.
draw_weighted_data(clusters1,ones(1,500),p);
The 500 is the number of data points. This function was written for supervised
learning and boosting. It uses a different symbol for each target, and
colors to denote the weight (or probability) of that data point. By calling
it with all 1’s as the targets, all data gets the same symbol, and the color
denotes the cluster (which is stored in p).
The kmeans_clusters algorithm works like this,
[c,p,err,ind]=kmeans_clusters(clusters1');
This tries 1 cluster, 2 clusters, and so on up to some large number of clusters.
It returns c and p, which are cell arrays of the cluster centers and the
assignments of data to clusters respectively. So c{4} gives the centers using
4 clusters, and p{4} is the assignment of the data to those 4 clusters. It
also returns err, which is the error of each clustering; this will decrease as
the number of clusters increases. The final quantity is ind. This is a heuristic
quantity which is a measure of cluster quality: the lower this value, the
better the clustering. Thus, a command such as
[temp,bestk]=min(ind);
will return in bestk the suggested number of clusters which gives the best
clustering.
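Putting this together with the earlier plotting commands (a minimal sketch; it
assumes the kmeans_clusters call above has been run and that clusters1 has
500 points, as in Task 1):

% Color the data by the suggested clustering and overlay its centers.
[temp,bestk]=min(ind);
draw_weighted_data(clusters1,ones(1,500),p{bestk});
hold on
plot(c{bestk}(:,1),c{bestk}(:,2),'k*');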
Task 2: No additional commands are required to do this. You have to create the data
yourself; a sketch of one way to do so follows.
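The matrix entries below are placeholders, to be filled in from the preference
table; note that the SOMToolbox expects patterns as rows.

% One row per course, one column per student:
% P(i,j) = 1 if student j wants course i, and 0 otherwise.
P = zeros(8,11);                   % 8 courses, 11 students
P(1,:) = [1 0 1 0 0 1 0 1 0 0 1];  % placeholder row -- use the real table
% ... fill in the remaining rows, then cluster the courses:
[c,p,err,ind] = kmeans_clusters(P);
[temp,bestk] = min(ind);
% For Task 3, cluster the students instead: kmeans_clusters(P').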
SOFM and Kohonen’s algorithm
Data representation: The data is usually represented as a structure. This is not necessary;
normal numerical matrices are accepted by most functions. The structure
can include names for the attributes and labels for the data points, although
these labels are optional. As an example, look at the file 'iris.data' in the
SOMToolbox/mfiles directory. It looks like this,
4
#n SepalL SepalW PetalL PetalW
5.1 3.5 1.4 0.2 Setosa
4.9 3.0 1.4 0.2 Setosa
4.7 3.2 1.3 0.2 Setosa
4.6 3.1 1.5 0.2 Setosa
5.0 3.6 1.4 0.2 Setosa
...
5.3 3.7 1.5 0.2 Setosa
5.0 3.3 1.4 0.2 Setosa
7.0 3.2 4.7 1.4 Versicolor
6.4 3.2 4.5 1.5 Versicolor
6.9 3.1 4.9 1.5 Versicolor
5.5 2.3 4.0 1.3 Versicolor
6.5 2.8 4.6 1.5 Versicolor
...
The number at the top gives the number of attributes, the next line gives the names
of the attributes, and following that there is one data point per line, each followed
by the name of that data point. This data describes three different types of iris
flowers (setosa, versicolor, and virginica) in terms of sepal length, sepal width,
petal length and petal width.
To read this in, use
sD=som_read_data(’iris.data’);
where sD is the name of the structure where the data is stored.
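Once the data is loaded you can inspect the structure directly; a quick sketch
(these are the standard fields of a SOMToolbox data struct):

sD                % print the structure and its fields
size(sD.data)     % the data matrix, one pattern per row
sD.comp_names     % the attribute names read from the file
sD.labels(1:3)    % the labels of the first three patterns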
You can also construct the data from within Matlab. Here is an example of that.
Load clusters2 into Matlab. This data has four clusters, where the first 25
points are in one cluster, the next 25 are in another, and so on. You can construct
a data structure as follows,
% Make data structure and add names of attributes.
sD=som_data_struct(clusters2’,’comp_names’,{’x’,’y’});
% Now add labels to the data.
sD=som_label(sD,’add’,[1:25],’LowerLeft’);
sD=som_label(sD,’add’,[26:50],’LowerRight’);
sD=som_label(sD,’add’,[51:75],’UpperRight’);
sD=som_label(sD,’add’,[76:100],’UpperLeft’);
I emphasize that the labels are optional, although you will find them useful for
visualization.
Creating a SOFM: The basic command is
sM=som_make(sD);
This creates and trains a self-organizing feature map which is stored in the structure
sM. The number of units and the learning parameters are chosen by the algorithm.
They can be controlled by giving more inputs to this function. An important
field of this structure is sM.codebook, which contains the weights; these are
the centers of the clusters represented by each node.
You can control the size of the map using a command of this form,
sM=som_make(sD,'msize',[3,4]);
to make a 3-by-4 SOFM. For a complete list of inputs, type help som_make. You
can make a map without training it like this,
sM=som_map_struct(2,'msize',[10 1],'rect','toroid');
which makes a SOM with 2 inputs and a 10-by-1 grid. The lattice is rectangular
and the global geometry is a toroid, which means that the grid is wrapped
around on itself like a doughnut. A map can be trained using
sM=som_seqtrain(sM,sD,’trainlen’,5);
which trains network sM on data sD for 5 training epochs.
I have also written a function to watch the SOM train. This only works with 2-d
data, because it plots the data and the weights of the SOM. The command is
sM=som_viz_train(10,sD,sM);
The first argument is the number of training epochs.
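For example, a minimal sketch combining the commands above (it assumes a 2-d
data struct sD, built as shown earlier):

% Build a small untrained toroidal map and watch it organize itself.
sM=som_map_struct(2,'msize',[10 1],'rect','toroid');
sM=som_viz_train(10,sD,sM);   % plots the data and the weights over 10 epochs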
SOM visualization: The SOMToolbox has myriad visualization functions. The basic
command is som_show. However, I do not find the default very useful. Here are
some useful commands.
To view the SOFM with labels showing which data is associated with each
node, do the following (after training, of course):
% This transfers the labels from sD onto the SOFM
sM=som_autolabel(sM,sD,’vote’);
% This draws the SOFM keeping the cells empty, and titles it 'Labels'
som_show(sM,’empty’,’Labels’)
% This puts the labels on.
som_show_add(’label’,sM)
The first command labels those nodes which are more like a data pattern
than any other node (so not all nodes will be labelled). The parameter ’vote’
means that if more than one pattern causes that node to fire, they will vote
on the label. Other possibilities are ’freq’ which takes the most frequent
one, and ’add’ which puts all labels on.
You might want to label each node with the pattern which is most like it.
This way, all nodes will be labelled, even those which are never the winner.
A command which is useful is som_bmus which finds the best matching
units for the patterns. It is called like this, units=som_bmus(sM,sD);.
To get the patterns which best match the neurons, reverse the order of the
arguments.
% One way to label the nodes
vec=som_bmus(sD,sM);
sM.labels=sD.labels(vec);
som_show_add(’label’,sM);
For 2-d data, it is useful to plot the grid. This can be done as follows,
som_grid(sM,'coord',sM.codebook); , or more prettily,
S=som_grid(sM,'coord',sM.codebook,'Label',sM.labels,'labelcolor','k');
Example: Here is an example illustrating the above using the clusters2 data.
load clusters2;
% Make data structure and add names of attributes.
sD=som_data_struct(clusters2’,’comp_names’,{’x’,’y’});
% Now add labels to the data.
sD=som_label(sD,’add’,[1:25],’LowerLeft’);
sD=som_label(sD,’add’,[26:50],’LowerRight’);
sD=som_label(sD,’add’,[51:75],’UpperRight’);
sD=som_label(sD,’add’,[76:100],’UpperLeft’);
% Let us watch a network train (first create an untrained map;
% the map size here is just an example).
sM=som_map_struct(2,'msize',[4 4]);
sM=som_viz_train(20,sD,sM);
% We can view the map
sM=som_autolabel(sM,sD,’vote’);
% This draws the SOFM keeping the cells empty, and titles it 'Labels'
som_show(sM,’empty’,’Labels’)
% This puts the labels on.
som_show_add(’label’,sM)
% Here is something very beautiful. God knows what it shows.
som_show(sM,'umati','all','compi','all','empty','Labels')
som_show_add('label',sM,'subplot',4)