Self-Organizing Map (SOM) - Theory
A Self-Organizing Map (SOM) is an unsupervised learning algorithm developed by Teuvo Kohonen.
It is used for clustering and visualization of high-dimensional data. The SOM maps input data into a
lower-dimensional (usually 2D) grid of neurons while preserving the topological properties of the
data.
Key Concepts:
1. Each neuron (node) has a weight vector of the same dimension as the input data.
2. The neurons are arranged in a 2D grid.
3. The training process adjusts the weights so that similar inputs activate nearby neurons.
Purpose:
- Data visualization
- Pattern recognition
- Dimensionality reduction
How SOM Works - Step-by-Step (Easy Explanation)
1. Initialization:
- Each neuron in the grid is assigned a random weight vector.
2. Training Phase (Iterative):
- Randomly pick an input vector from the dataset.
- Calculate the Euclidean distance between this input and all the neuron weights.
- The neuron with the smallest distance is called the Best Matching Unit (BMU).
- Determine the BMU's neighbors in the grid.
- Update the BMU and its neighbors' weights to move closer to the input vector.
3. Updating Weights:
- The learning rate and neighborhood radius shrink over time to fine-tune the map.
4. Repetition:
- The above process is repeated for many iterations to form an organized map.
Result:
- Similar input vectors are mapped to nearby neurons, forming clusters.
SOM Components and Terminologies
Input Vector (x):
- A data sample fed into the network for training.
Weight Vector (w_ij):
- A vector associated with each neuron, adjusted during training.
Best Matching Unit (BMU):
- The neuron whose weight is most similar to the input vector.
Neighborhood Function (h_ij,b):
- A function (often Gaussian) that defines how strongly the neighbors of the BMU are affected.
Learning Rate ():
- Controls how much the weights are adjusted in each step; it decreases over time.
Topological Distance:
- The distance between neurons on the grid.
Applications:
- Image compression
- Market segmentation
- Speech recognition
- Text mining
SOM helps to organize complex data into an understandable map without needing labels.