SVM – Notes
SVM = Support Vector Machine
used mainly for classification (support vector regression exists too, but is less common)
idea: find the line (or hyperplane, in higher dimensions) that separates the classes best
how it works:
finds the “best” boundary between classes
best = the one with maximum margin
→ margin = distance between boundary and nearest points from each class
→ these close points = support vectors
why margin matters:
larger margin = better generalization
less chance of overfitting
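
quick sketch of margin + support vectors (scikit-learn assumed, toy points made up):

```python
# Minimal linear SVM sketch: fit, read off the support vectors, and
# compute the margin width. Assumes scikit-learn; data is hypothetical.
import numpy as np
from sklearn.svm import SVC

# two linearly separable blobs (made-up points)
X = np.array([[1, 1], [2, 1], [1, 2],      # class 0
              [5, 5], [6, 5], [5, 6]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)           # the nearest points that define the margin
w = clf.coef_[0]                      # normal vector of the separating hyperplane
print(2 / np.linalg.norm(w))          # geometric margin width = 2 / ||w||
```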
linear SVM:
works when data is linearly separable (can be split by straight line)
if not, use the kernel trick to map the data into a higher dimension (numeric check below)
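
small numeric check of the kernel trick idea (pure NumPy, values only for illustration): the degree-2 polynomial kernel (x·z + 1)² equals an ordinary dot product after an explicit feature map, so the kernel gets the higher-dimensional answer without ever building those features:

```python
# Numeric check of the kernel trick: (x.z + 1)^2 computed in 2-D equals
# phi(x).phi(z) computed in the explicit 6-D feature space.
import numpy as np

def phi(v):
    # explicit map of a 2-D point into the 6-D space of the degree-2 kernel
    x1, x2 = v
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

kernel = (x @ z + 1) ** 2      # stays in the original 2-D space
explicit = phi(x) @ phi(z)     # goes through the mapped 6-D space
print(kernel, explicit)        # both 144.0
```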
kernels:
help to deal with non-linear data
common kernels:
→ linear
→ polynomial
→ RBF (radial basis function) = Gaussian
→ sigmoid
the kernel computes the higher-dimensional dot products implicitly, so we get a curved boundary without ever building the extra features
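
sketch comparing kernels on data a straight line can't split (scikit-learn assumed, concentric-circles toy data):

```python
# Compare kernels on non-linear data (concentric circles), where no
# straight line separates the classes. Assumes scikit-learn.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
# expect linear near 0.5 (chance), rbf near 1.0
```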
example:
data: height and weight
goal: classify male/female
→ linear SVM draws line that separates the groups
→ if the split isn't clean, an RBF kernel can bend the boundary into a curve
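
toy version of that example (all numbers synthetic and for illustration only; real populations overlap far more):

```python
# Toy height/weight classifier. Data is synthetic (made-up cluster
# means/spreads); in real use, scale features first (see reminders below).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# columns: [height cm, weight kg]; two made-up clusters
group_a = rng.normal([178, 80], [6, 8], size=(50, 2))
group_b = rng.normal([164, 62], [6, 8], size=(50, 2))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf").fit(X, y)     # RBF bends the boundary if needed
print(clf.predict([[170, 70]]))       # classify a new person
```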
C parameter:
controls the trade-off between margin size and classification error
→ high C = smaller margin, tries to classify everything correctly (risk of overfitting)
→ low C = wider margin, allows some misclassifications (generalizes better)
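
quick sketch of the C trade-off (scikit-learn assumed, synthetic noisy data):

```python
# Sweep C: small C widens the margin and tolerates errors; large C
# shrinks the margin to fit training points. Assumes scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in [0.01, 1, 100]:
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    # more support vectors usually means a wider, softer margin
    print(C, len(clf.support_vectors_),
          clf.score(X_train, y_train), clf.score(X_test, y_test))
```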
pros:
works well in high dimensions
effective when classes are clearly separated
uses only support vectors (not whole data)
cons:
slow to train on large datasets (kernel SVM training scales roughly between O(n²) and O(n³) in the number of samples)
hard to choose right kernel + parameters
doesn't output probabilities by default, just a class (scikit-learn can add them via Platt scaling with SVC(probability=True), at extra training cost)
used in:
image classification
bioinformatics
handwriting recognition
face detection
reminders:
always scale features before using SVM (e.g. standardize to zero mean / unit variance, or min-max to [0, 1]); SVM relies on distances, so features with big ranges dominate otherwise
not the best choice for big datasets, but strong when there are many features
try different kernels to see what works best
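
sketch putting the reminders together (scikit-learn assumed): scale inside a Pipeline so the scaler is fit only on training folds, and grid-search the kernel:

```python
# Scale features inside a Pipeline (scaling learned from training folds
# only) and try several kernels via grid search. Assumes scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # many features: a good SVM case
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
grid = GridSearchCV(pipe, {"svm__kernel": ["linear", "rbf", "poly"],
                           "svm__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.score(X_test, y_test))
```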
extra note:
if the data still isn't separable even with a kernel, soften the margin (lower C, which the C parameter above controls) or just switch to a
different model (like Random Forest or XGBoost)