Unsupervised ML: SOM
and how to choose a
Data Science method
Quick Review of the methods we learned
> Statistical analysis
> Supervised ML
– Linear regression
– NN
– KNN
– Decision Tree
– SVM
> Unsupervised ML
– K-Means Clustering
– PCA
– … Why another one?
Why another method: SOM?
> Demonstrate some data science methods that are not
widely used or well known, but can still be very useful
for materials informatics studies
> Introduce a method I have used and feel is well suited
to the unique character of many materials-study
applications
> Demonstrate how various data science methods can
be used together to drive improved results
> Demonstrate a few projects using the same methods
so that we can understand the methods from a user's
point of view
What is a Self-Organizing Map (SOM)?
> An Unsupervised ML method
> Dimensionality reduction, enabling powerful
visualizations of the data:
– K-Means does clustering, but performs neither dimensionality
reduction nor visualization
– PCA does dimensionality reduction, enabling visualization to a
certain degree (not applicable if the first 2-3 principal
components do not represent the data well); however, it
does not perform clustering, and the visualization does
not preserve the original topological information
> Gives some insight into how the data are clustered in high
dimensions
What is SOM
> You can think of SOM as an artificial neural network
with a single neuronal layer, whose neurons are
arranged in a two-dimensional matrix.
– The 2D matrix can be seen as a position map that
captures the characteristics of the data
> Merits of SOM
– Effective for training on big datasets
– Since the map is a 2D matrix, visualization of the resulting map
is possible
– Preserves the topology of the original data
– Makes it possible to present the Euclidean distances between
data points
Algorithm of SOM
– Normalization of the input data, so that all features contribute in a
more balanced way
– Initialization: each (x,y) position in the map is assigned a weight for each
input neuron, thus associating a weight vector with each map position.
– Iteration:
> Choose a sample from the dataset
> Calculate the Euclidean distance between that sample and each weight vector
> The (x,y) position "closest" to the sample is declared the Best Matching Unit (BMU)
> The weight vector of the BMU gets adjusted to more closely match the sample.
The amount of adjustment (learning) decreases as we go through the iterations
> The weight vectors of the BMU's neighbors also get adjusted, to a lesser extent.
How many neighbors are affected and how much they get adjusted also depend on
hyperparameters and the number of iterations.
– Convergence:
> Maximum number of iterations
> Monitoring of the topological error
(a minimal sketch of this training loop is given after this list)
– Reference: Kohonen (1982), https://link.springer.com/article/10.1007/BF00337288
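The following is a minimal NumPy sketch of the training loop described above; the grid size, linear decay schedules, and Gaussian neighborhood are illustrative assumptions, not the exact settings of the packages introduced later in this lecture.

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, n_iter=5000,
              lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM training loop following the steps listed above."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape

    # Normalization: zero mean, unit variance for every feature
    data = (data - data.mean(axis=0)) / (data.std(axis=0) + 1e-12)

    # Initialization: one weight vector per (x, y) map position
    weights = rng.random((grid_h, grid_w, n_features))
    yy, xx = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")

    for t in range(n_iter):
        # Learning rate and neighborhood radius shrink with the iterations
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 1e-3

        # Pick a sample and find the Best Matching Unit (BMU)
        x = data[rng.integers(n_samples)]
        dist = np.linalg.norm(weights - x, axis=2)          # Euclidean distance
        bmu = np.unravel_index(np.argmin(dist), dist.shape)

        # Gaussian neighborhood: the BMU moves most, its neighbors move less
        grid_dist2 = (yy - bmu[0]) ** 2 + (xx - bmu[1]) ** 2
        influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)

    return weights
```

Calling train_som on an (n_samples, n_features) array returns the trained weight grid that the later sketches refer to.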
Self-Organizing Map (SOM)
How does it work?
> Each input sample is a feature vector x_i = (a_i, b_i, c_i, d_i, e_i, f_i)
> Two-dimensional mesh structure
> Each connection can deform
[Figure: a sequence of animation frames showing the 2D mesh of nodes (1-12) being dragged toward the data points a-f as the iterations proceed]
Self-Organizing Map (SOM) Algorithm
> Dragging Nodes
> “Flattening a crumpled paper”
U-matrix and how to use it to get insights for
clustering
> After training, the nodes of the 2D map are not evenly
distributed in feature space: adjacent nodes might not be
similar to each other in the higher-dimensional space.
> The U-matrix uses the concept of a heatmap to illustrate the
distances between neighboring nodes in Euclidean space
(a sketch of how to compute it is given below).
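A minimal sketch of the U-matrix computation, assuming a trained weight grid of shape (grid_h, grid_w, n_features) such as the one returned by the earlier train_som sketch; MiniSom exposes the same idea as distance_map().

```python
import numpy as np
import matplotlib.pyplot as plt

def u_matrix(weights):
    """Average Euclidean distance from each node to its 4 grid neighbors."""
    h, w, _ = weights.shape
    u = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            dists = [np.linalg.norm(weights[i, j] - weights[ni, nj])
                     for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= ni < h and 0 <= nj < w]
            u[i, j] = np.mean(dists)
    return u

# Example: large values mark boundaries between clusters,
# small values mark nodes sitting inside a cluster.
# weights = train_som(my_data)   # from the earlier sketch, or som.get_weights() in MiniSom
# plt.imshow(u_matrix(weights), cmap="bone_r"); plt.colorbar(); plt.show()
```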
Using SOM in conjunction with other methods
> Since SOM is a dimensionality-reduction method, for smaller
datasets you can initialize your SOM map using the first two
principal components, essentially the 2D PCA map
> K-Means can also be run on the same dataset, and the
corresponding clusters can be visualized on the SOM map.
K-Means clustering and U-Matrix
They can be compared to validate the results!
> SOM can provide a means to visualize K-Means!
> If the cluster boundaries match well, the training is
successful (a sketch of PCA initialization and the K-Means
overlay is given below)
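A minimal sketch of both ideas using scikit-learn; spreading the grid along the first two principal components and the cluster count are illustrative assumptions, not the recipe used by the augmented SOMPY package.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def pca_initial_weights(data, grid_h=10, grid_w=10):
    """Spread the initial weight vectors over the plane of the first two PCs."""
    pca = PCA(n_components=2).fit(data)
    spans = np.sqrt(pca.explained_variance_)        # scale along each component
    weights = np.zeros((grid_h, grid_w, data.shape[1]))
    for i, a in enumerate(np.linspace(-1, 1, grid_h)):
        for j, b in enumerate(np.linspace(-1, 1, grid_w)):
            weights[i, j] = (data.mean(axis=0)
                             + a * spans[0] * pca.components_[0]
                             + b * spans[1] * pca.components_[1])
    return weights

def kmeans_on_map(data, weights, n_clusters=4):
    """K-Means label for each sample plus its BMU, ready to color the SOM map."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(data)
    bmus = [np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)),
                             weights.shape[:2]) for x in data]
    return labels, bmus
```

Coloring each BMU by its K-Means label and overlaying that on the U-matrix lets the two sets of cluster boundaries be compared directly.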
Different Implementations of SOM
> SOM is just an algorithm; there are many
packages you can use that implement it
> We will introduce
– An augmented version of SOMPY, a version our group has
contributed to
– MiniSOM
The uniqueness and functions of augmented
SOMPY
https://github.com/DataScienceUWMSE/SOM
> Utilizes PCA for initialization, and includes a K-Means
clustering overlay
> "Heat maps" provide a way to visualize each
feature after training
> The projection function helps users find additional
correlations or patterns among features,
including for categorical data
“heatmap” concept
> Map each node's weight for one input variable onto the 2D map
> The number of heat maps equals the number of input variables
(a plotting sketch is given below)
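A minimal matplotlib sketch of the heat-map idea, assuming a trained weight grid of shape (grid_h, grid_w, n_features) such as the one from the earlier train_som sketch or MiniSom's get_weights(); the feature names in the usage comment are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_heatmaps(weights, feature_names):
    """One heat map per input variable: the k-th weight of every map node."""
    n_features = weights.shape[2]
    fig, axes = plt.subplots(1, n_features, figsize=(3 * n_features, 3))
    for k, ax in enumerate(np.atleast_1d(axes)):
        im = ax.imshow(weights[:, :, k], cmap="viridis")
        ax.set_title(feature_names[k])
        fig.colorbar(im, ax=ax, shrink=0.8)
    plt.tight_layout()
    plt.show()

# Example (hypothetical feature names):
# plot_heatmaps(weights, ["density", "modulus", "hardness", "price", "Tmax"])
```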
Example of utilizing the
heatmap on materials research
Example 1 Granta Data Set: Experimental Commercial Materials
Property Dataset
> Training data set
contains 398 commercial
materials and 21
numerical properties
Example of utilizing the heatmap on materials
research
Example 1 Granta Data Set: Experimental Commercial Materials
Property Dataset (continued)
Projection function concept
> Overlay one specific data property onto the SOM; even
categorical values can be used
> Patterns are easy to identify (a sketch is given below)
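A minimal sketch of the projection idea: each sample is placed at its BMU and annotated with one chosen property value. The jitter and figure size are illustrative assumptions; MiniSom's winner() can be used the same way to locate BMUs.

```python
import numpy as np
import matplotlib.pyplot as plt

def project_property(data, weights, labels, rng=np.random.default_rng(0)):
    """Write each sample's property value (numeric or categorical) at its BMU."""
    plt.figure(figsize=(6, 6))
    for x, label in zip(data, labels):
        dist = np.linalg.norm(weights - x, axis=2)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        # small jitter keeps samples that share one node readable
        plt.text(j + 0.6 * rng.random(), i + 0.6 * rng.random(),
                 str(label), fontsize=8)
    plt.xlim(-1, weights.shape[1])
    plt.ylim(weights.shape[0], -1)   # match the usual row-major map orientation
    plt.show()
```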
Example of utilizing the projection function on
materials research
Example 1 Granta Data Set: Experimental Commercial Materials
Property Dataset (continued): finding what makes the outliers unique
Example of utilizing the projection function on
materials research
Example 2 OPV materials study using an experimental dataset
Reference: Y. Huang, J. Phys. Chem. C 2020, 124, 12871−12882
> The dataset includes 1203 donor polymers of Donor-Acceptor
pairs, with properties related to the efficiency of the
charge transfer.
Molecular Descriptors
Python Packages for Molecular Descriptors
> There are Python tools to extract molecular
structural or geometrical information from a
molecule's notation, such as SMILES (Simplified
Molecular-Input Line-Entry System)
> We will introduce Mordred (covered in the hands-
on session); a minimal usage sketch is given below
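A minimal Mordred sketch; the SMILES strings are placeholders, and the exact number of descriptor columns depends on the Mordred and RDKit versions installed.

```python
# pip install rdkit mordred
from rdkit import Chem
from mordred import Calculator, descriptors

smiles_list = ["c1ccccc1", "CCO"]                  # placeholders: benzene, ethanol
mols = [Chem.MolFromSmiles(s) for s in smiles_list]

# All 2D descriptors; ignore_3D avoids the need for 3D conformers
calc = Calculator(descriptors, ignore_3D=True)
df = calc.pandas(mols)                             # one row per molecule
print(df.shape)
```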
The advantage of using MiniSOM
> SOMPY is not as easy to use as the other packages
introduced in this class.
– The Augmented SOMPY has contributions from a few
Materials Science researchers in our group, including
your TA Jimin, Qian
> MiniSOM is relatively easy to use, well
documented and actively maintained, and
has a basic implementation of the SOM
algorithm
What MiniSOM provides
> It has:
– The core implementation of SOM
– Visualization
– The U-Matrix ("distance map" in MiniSOM)
– Projection of a chosen feature onto the SOM
> It doesn't have:
– PCA initialization
– Heat maps for each feature
– K-Means clustering
(a minimal usage sketch is given below)
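A minimal MiniSom usage sketch with placeholder data; the map size, sigma, learning rate, and iteration count are illustrative choices.

```python
# pip install minisom
import numpy as np
from minisom import MiniSom

data = np.random.rand(200, 5)              # placeholder: 200 samples, 5 features

som = MiniSom(10, 10, input_len=5, sigma=1.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, 5000)               # 5000 iterations with randomly picked samples

u = som.distance_map()                     # the U-matrix ("distance map")
bmu = som.winner(data[0])                  # (row, col) of the BMU for one sample
weights = som.get_weights()                # shape (10, 10, 5); can be sliced per feature
```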
Hyperparameters of SOM
> Length of input vectors (the number of properties)
> Map size, the most important one
> Map topology – rectangular or hexagonal
– Important in defining the notion of “neighbors”
> Sigma – spread of the neighborhood function
> Learning Rate – initial learning rate, decreases with the
number of iterations
> Decay function – defines how much learning rate and sigma
decrease with the number of iterations
> Neighborhood function – defines how much the neighbors of
the BMU get adjusted at each iteration (e.g., gaussian,
bubble, …)
> Activation distance function (e.g., Euclidean distance)
> Initialization method – random or PCA
(most of these map directly onto the MiniSom constructor, as sketched below)
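The sketch below maps the hyperparameters above onto the MiniSom constructor; the values are illustrative, and some keyword arguments (topology, activation_distance) may only be available in recent MiniSom releases.

```python
from minisom import MiniSom

som = MiniSom(
    x=15, y=15,                          # map size, often the most impactful choice
    input_len=21,                        # length of the input vectors (number of properties)
    sigma=2.0,                           # initial spread of the neighborhood function
    learning_rate=0.5,                   # initial learning rate, decays over the iterations
    neighborhood_function="gaussian",    # e.g. "gaussian" or "bubble"
    topology="rectangular",              # or "hexagonal"
    activation_distance="euclidean",
    random_seed=42,
)
# The decay function (how fast sigma and the learning rate shrink)
# defaults to MiniSom's asymptotic decay and can also be overridden.
```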
Hands-on session and HW for this week