L2 Regularization (Ridge Regression)
Numerical Example
Given Data
• Features (X): one row of two features per sample
• Sample 1: x1 = [1, 2], Sample 2: x2 = [2, 3]
• Targets (y): y = [3, 5]
• Model: ŷ = w1 * (first feature) + w2 * (second feature)
• Loss with L2: Loss = MSE + λ * sum(w_j²)
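As a sketch, the ridge loss above can be written directly with NumPy (the function and variable names here are illustrative, not from the original):

```python
import numpy as np

def ridge_loss(w, X, y, lam):
    """MSE plus L2 penalty: mean((y - X @ w)^2) + lam * sum(w_j^2)."""
    residuals = y - X @ w           # per-sample prediction errors
    mse = np.mean(residuals ** 2)   # mean squared error
    penalty = lam * np.sum(w ** 2)  # L2 (ridge) penalty on the weights
    return mse + penalty
```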
Step-by-Step Calculation
• Initial weights: w1 = 1.0, w2 = 1.0
• Regularization parameter: λ = 0.1
Step 1: Predictions
• ŷ1 = 1*1 + 1*2 = 3
• ŷ2 = 1*2 + 1*3 = 5
• MSE = ((3-3)² + (5-5)²)/2 = 0
Step 2: Ridge Loss
• Loss = MSE + λ(w1² + w2²)
• Loss = 0 + 0.1(1² + 1²) = 0.2
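Steps 1 and 2 can be checked numerically with a minimal sketch (array names are assumptions):

```python
import numpy as np

X = np.array([[1.0, 2.0],   # sample 1 features
              [2.0, 3.0]])  # sample 2 features
y = np.array([3.0, 5.0])
w = np.array([1.0, 1.0])    # initial weights
lam = 0.1

y_hat = X @ w                      # predictions: [3., 5.]
mse = np.mean((y - y_hat) ** 2)    # 0.0, the fit is exact
loss = mse + lam * np.sum(w ** 2)  # 0.0 + 0.1 * 2 = 0.2
print(y_hat, mse, loss)
```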
Step 3: Larger Weights
• w1 = 5, w2 = -3
• ŷ1 = 5*1 + (-3)*2 = -1
• ŷ2 = 5*2 + (-3)*3 = 1
• MSE = ((3-(-1))² + (5-1)²)/2 = 16
• Penalty = λ(w1² + w2²) = 0.1(25 + 9) = 3.4
• Total Loss = 16 + 3.4 = 19.4
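Repeating the computation for both weight settings shows that the larger weights lose on both terms, the fit and the penalty (a sketch, assuming the same data as above):

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0]])
y = np.array([3.0, 5.0])
lam = 0.1

def ridge_loss(w):
    # MSE plus L2 penalty, as defined above
    return np.mean((y - X @ w) ** 2) + lam * np.sum(w ** 2)

small = ridge_loss(np.array([1.0, 1.0]))   # 0 + 0.2 = 0.2
large = ridge_loss(np.array([5.0, -3.0]))  # 16 + 3.4 = 19.4
print(small, large)  # the loss strongly favors the smaller weights
```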
Key Insight
• Ridge Regression:
  - Keeps all features (coefficients shrink but are not driven exactly to zero)
  - Penalizes large coefficients
  - Improves generalization by shrinking weights
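The shrinking effect can be seen from the closed-form ridge solution w = (XᵀX + nλI)⁻¹ Xᵀy, where the nλ scaling matches the MSE-based loss used above (a sketch, not from the original; names are illustrative):

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0]])
y = np.array([3.0, 5.0])
n = len(y)

def ridge_solve(lam):
    # Minimizer of mean((y - X @ w)^2) + lam * ||w||^2
    return np.linalg.solve(X.T @ X + n * lam * np.eye(2), X.T @ y)

w_ols = ridge_solve(0.0)    # ordinary least squares: [1., 1.]
w_ridge = ridge_solve(0.1)  # same data, but with a smaller overall norm
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))  # True
```

Note that individual coefficients can still grow; it is the overall weight norm that ridge keeps small.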