Performance Metrics for Evaluation of Machine Learning Models
Introduction
Performance metrics are essential tools in machine learning for evaluating how effective and accurate a model is. Different metrics apply to different types of tasks, such as classification and regression.
I. Classification Metrics
Used when the output variable is categorical (e.g., Yes/No, Spam/Not Spam). A confusion matrix is the
foundation for many classification metrics.
Confusion Matrix:

|            | Predicted Yes       | Predicted No        |
|------------|---------------------|---------------------|
| Actual Yes | True Positive (TP)  | False Negative (FN) |
| Actual No  | False Positive (FP) | True Negative (TN)  |
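The four cells above can be tallied directly from paired true/predicted labels. A minimal sketch in pure Python (the label values and `confusion_counts` name are illustrative, not from a particular library):

```python
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Count TP, FN, FP, TN for a binary classification task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fn, fp, tn

y_true = ["Yes", "Yes", "No", "No", "Yes"]
y_pred = ["Yes", "No", "No", "Yes", "Yes"]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)
```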
1. Accuracy: (TP + TN) / (TP + TN + FP + FN)
- Measures overall correctness.
- Good for balanced datasets.
2. Precision: TP / (TP + FP)
- Measures how many selected items are relevant.
- Important in spam detection.
3. Recall (Sensitivity): TP / (TP + FN)
- Measures how many relevant items are selected.
- Important in medical diagnoses.
4. F1 Score: 2 * (Precision * Recall) / (Precision + Recall)
- Harmonic mean of precision and recall.
- Balances false positives and false negatives.
5. ROC-AUC: Area under the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds.
- AUC closer to 1 indicates a better model.
6. Specificity: TN / (TN + FP)
- Measures the proportion of actual negatives correctly identified.
7. Log Loss: Penalizes confident misclassifications.
- Lower values indicate better performance.
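The classification formulas above can be sketched directly from confusion-matrix counts, plus binary log loss from predicted probabilities. The example counts (TP=40, FN=10, FP=5, TN=45) and function names are illustrative assumptions:

```python
import math

def classification_metrics(tp, fn, fp, tn):
    """Compute the metrics defined above from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, f1, specificity

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

acc, prec, rec, f1, spec = classification_metrics(tp=40, fn=10, fp=5, tn=45)
print(round(acc, 3), round(prec, 3), round(rec, 3))  # 0.85 0.889 0.8
```

Note how a confident wrong prediction (e.g., probability 0.99 for a negative example) contributes far more to log loss than a hesitant one near 0.5.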
II. Regression Metrics
Used when the output variable is continuous (e.g., predicting price, temperature).
1. Mean Absolute Error (MAE): Average of absolute errors.
- Easy to interpret.
2. Mean Squared Error (MSE): Average of squared errors.
- Penalizes large errors.
3. Root Mean Squared Error (RMSE): Square root of MSE.
- Same units as target.
4. R² Score: 1 - (Residual Sum of Squares / Total Sum of Squares)
- Measures the proportion of the target's variance explained by the model.
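The four regression metrics above can be sketched in a few lines of pure Python; the sample values and `regression_metrics` name are illustrative:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² for continuous predictions."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
mae, mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, mse)  # 0.625 0.5625
```

Squaring before averaging is why MSE (and hence RMSE) punishes a single large miss more heavily than MAE does.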
III. Summary
Summary Table:
| Metric | Type | Good For |
|------------|----------------|----------------------------------|
| Accuracy | Classification | Balanced class distributions |
| Precision | Classification | Avoiding false positives |
| Recall | Classification | Avoiding false negatives |
| F1 Score | Classification | When precision & recall matter |
| ROC-AUC | Classification | Evaluating classifiers |
| MAE | Regression | Simple average error |
| MSE/RMSE | Regression | Penalizing larger errors |
| R² Score | Regression | Explained variance |