Handwritten Digit Classification Using
ML Models (UCI Dataset)
1. Introduction
This project explores the classification of handwritten digits using traditional machine
learning models. The dataset used is the UCI Digits dataset, which consists of 8x8 images of
digits. The goal is to classify the digits (0-9) from pixel-intensity features and to
compare the performance of several models.
2. Dataset Overview
The dataset comprises 1,797 samples with 64 features each, corresponding to 8x8 grayscale
images of handwritten digits. The task is to classify each image into one of the 10 digit
classes (0 to 9). The dataset was normalized and then split into training and testing sets.
• Dataset shape: (1797, 64)
• Number of classes: 10
• Training set: (1437, 64)
• Test set: (360, 64)
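The preprocessing above can be sketched as follows. This is a minimal sketch, not the report's exact code: it assumes scikit-learn's bundled copy of the UCI Digits dataset, min-max normalization, and an 80/20 stratified split with `random_state=42` (the split ratio reproduces the 1437/360 shapes; the scaler choice and random seed are assumptions).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load the UCI Digits dataset (1,797 samples of 8x8 images, flattened to 64 features)
X, y = load_digits(return_X_y=True)

# Normalize pixel intensities (originally 0-16) into the [0, 1] range
X = MinMaxScaler().fit_transform(X)

# 80/20 train/test split -> (1437, 64) and (360, 64)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```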
3. Models Evaluated
The following models were trained and evaluated:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Decision Tree
- Random Forest
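The five models above can be trained and scored with weighted metrics in a single loop. A minimal, self-contained sketch follows; the report does not state hyperparameters, so scikit-learn defaults are assumed (with `max_iter=1000` for Logistic Regression and fixed seeds for the tree-based models as convergence/reproducibility conveniences, not values from the report).

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same preprocessing as in the dataset overview (assumed)
X, y = load_digits(return_X_y=True)
X = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    # Weighted averaging accounts for (mild) class imbalance across the 10 digits
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted"
    )
    results[name] = (acc, prec, rec, f1)
    print(f"{name}: acc={acc:.4f} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f}")
```

Exact numbers depend on the split seed and hyperparameters, so they will not match the report's figures digit for digit, but the ranking of the models should be similar.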
4. Model Evaluation and Results
Logistic Regression
• Accuracy: 0.9361
• Weighted Precision: 0.9366
• Weighted Recall: 0.9361
• Weighted F1-score: 0.9353
• 5-fold Cross-Validation Accuracy: 0.9429 ± 0.0061
K-Nearest Neighbors
• Accuracy: 0.9861
• Weighted Precision: 0.9867
• Weighted Recall: 0.9861
• Weighted F1-score: 0.9861
• 5-fold Cross-Validation Accuracy: 0.9882 ± 0.0087
Support Vector Machine
• Accuracy: 0.9917
• Weighted Precision: 0.9920
• Weighted Recall: 0.9917
• Weighted F1-score: 0.9917
• 5-fold Cross-Validation Accuracy: 0.9882 ± 0.0052
Decision Tree
• Accuracy: 0.8333
• Weighted Precision: 0.8372
• Weighted Recall: 0.8333
• Weighted F1-score: 0.8335
• 5-fold Cross-Validation Accuracy: 0.8427 ± 0.0233
Random Forest
• Accuracy: 0.9611
• Weighted Precision: 0.9620
• Weighted Recall: 0.9611
• Weighted F1-score: 0.9609
• 5-fold Cross-Validation Accuracy: 0.9756 ± 0.0062
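The 5-fold cross-validation figures quoted above (mean ± standard deviation) can be computed as in this sketch, shown for the SVM; the same call applies to the other models. Whether the report ran CV on the full dataset or only the training portion is not stated, so the full dataset is assumed here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X = MinMaxScaler().fit_transform(X)

# 5-fold cross-validation accuracy, reported as mean ± std
scores = cross_val_score(SVC(), X, y, cv=5)
print(f"SVM 5-fold CV accuracy: {scores.mean():.4f} ± {scores.std():.4f}")
```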
5. Model Comparison Summary
Model                    Accuracy  Precision  Recall  F1-Score
Logistic Regression      0.9361    0.9366     0.9361  0.9353
K-Nearest Neighbors      0.9861    0.9867     0.9861  0.9861
Support Vector Machine   0.9917    0.9920     0.9917  0.9917
Decision Tree            0.8333    0.8372     0.8333  0.8335
Random Forest            0.9611    0.9620     0.9611  0.9609
6. Conclusion
Among the evaluated models, the Support Vector Machine (SVM) performed best in terms of
overall accuracy and weighted metrics, achieving an accuracy of 99.17%. K-Nearest
Neighbors also showed excellent performance, closely following the SVM. Logistic Regression
and Random Forest provided reliable results, while the Decision Tree underperformed
relative to the others. This project demonstrates that even without deep learning,
traditional machine learning models can achieve high accuracy on well-structured datasets
such as the UCI Digits dataset.