Student Performance Prediction - ML Project
Project Title
Student Performance Prediction using Machine Learning
Student Name
Akanksha Sharma
Enrollment Number
EN22CS301080
Introduction
Predicting student academic performance is essential for identifying students who may need additional
support. Using historical data such as attendance, study time, and previous grades, a machine learning model
can classify students as 'Pass' or 'Fail'. This project uses supervised learning, specifically Logistic
Regression, to build such a predictive model.
Objective
The main objective is to predict whether a student will pass or fail based on their performance-related
features using machine learning techniques.
Dataset Description
The dataset includes the following features:
- study_time: Hours spent studying per week
- absences: Number of school absences
- G1, G2: Previous period grades
- passed: Final result (yes/no), used as the target label to predict
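The report does not reproduce student_data.csv itself. The short sketch below builds a small synthetic table with the same columns, only to show the expected layout. Every value is invented, and the helper name make_sample_data is hypothetical rather than part of the project.

```python
import pandas as pd

def make_sample_data() -> pd.DataFrame:
    """Build a small, made-up table with the columns described above (hypothetical values)."""
    return pd.DataFrame({
        "study_time": [2, 10, 5, 1, 8, 3],      # hours spent studying per week
        "absences":   [12, 0, 4, 20, 2, 7],     # number of school absences
        "G1":         [8, 16, 12, 6, 15, 10],   # earlier period grade
        "G2":         [9, 17, 13, 5, 15, 11],   # later period grade
        "passed":     ["no", "yes", "yes", "no", "yes", "no"],  # target label
    })

sample = make_sample_data()
print(sample.dtypes)  # four numeric feature columns plus the yes/no target
```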
Machine Learning Process
1. Data Collection: Load student performance data
2. Data Preprocessing: Handle missing values, encode labels
3. Feature Selection: Select useful columns for training
4. Splitting Dataset: 80% train and 20% test
5. Model Training: Apply Logistic Regression
6. Evaluation: Use accuracy and confusion matrix
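Steps 2 to 4 can be sketched with pandas and scikit-learn as follows. This is a minimal illustration on a hypothetical eleven-row table. It assumes that missing values are handled by dropping incomplete rows (imputation would be an alternative) and that the target is stored as the strings "yes"/"no".

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data using the document's columns; the last row is missing G2
data = pd.DataFrame({
    "study_time": [2, 10, 5, 1, 8, 3, 6, 4, 9, 2, 7],
    "absences":   [12, 0, 4, 20, 2, 7, 3, 9, 1, 15, 5],
    "G1":         [8, 16, 12, 6, 15, 10, 13, 9, 17, 7, 14],
    "G2":         [9, 17, 13, 5, 15, 10, 14, 8, 18, 6, np.nan],
    "passed":     ["no", "yes", "yes", "no", "yes", "no",
                   "yes", "no", "yes", "no", "yes"],
})

# Step 2a: handle missing values by dropping incomplete rows
data = data.dropna()

# Step 3: select the feature columns used for training
features = ["study_time", "absences", "G1", "G2"]
X = data[features]

# Step 2b: encode the yes/no label as 1/0
y = data["passed"].map({"yes": 1, "no": 0})

# Step 4: 80/20 split; stratify keeps the pass/fail ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))  # 8 2
```

Passing stratify=y is an optional choice. It keeps the pass/fail proportions similar in the training and test sets, which matters most when one class is much rarer than the other.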
Algorithms Used
- Logistic Regression: A classification algorithm that applies the sigmoid (logistic) function to a weighted sum of the features to estimate the probability that a student passes.
- Metrics: Accuracy, Confusion Matrix, Precision, Recall
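The sketch below makes the sigmoid connection concrete. It fits scikit-learn's LogisticRegression on invented one-feature data (a hypothetical "study hours" column) and checks that the predicted pass probability equals sigmoid(w*x + b), where w and b are the fitted weight and intercept. The data and the sigmoid helper are illustrative assumptions, not part of the project code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real-valued score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: weekly study hours for 50 students and a noisy 0/1 pass label
rng = np.random.default_rng(0)
hours = rng.uniform(0, 12, size=(50, 1))
passed = (hours[:, 0] + rng.normal(0, 2, size=50) > 6).astype(int)

model = LogisticRegression().fit(hours, passed)

# Linear score z = w*x + b, built from the fitted weight and intercept
z = hours @ model.coef_.ravel() + model.intercept_[0]

# The probability of class 1 ("pass") is sigmoid(z)
p_pass = sigmoid(z)
print(np.allclose(p_pass, model.predict_proba(hours)[:, 1]))  # True

# predict() labels a student "pass" when z > 0, i.e. when sigmoid(z) > 0.5
```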
Python Code
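import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
# Load dataset and keep only the features described above
data = pd.read_csv("student_data.csv")
features = ["study_time", "absences", "G1", "G2"]
data = data[features + ["passed"]].dropna()  # handle missing values by dropping incomplete rows
X = data[features]
# Encode the target labels, assuming "passed" holds the strings "yes"/"no"
y = data["passed"].map({"yes": 1, "no": 0})
# Train-test split: 80% train, 20% test, keeping the pass/fail ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Train model (max_iter raised so the solver can converge on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["fail", "pass"]))  # precision and recall per class
# Plot the confusion matrix (rows = actual class, columns = predicted class)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=["fail", "pass"], yticklabels=["fail", "pass"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()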
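Precision and recall, listed among the metrics above, can be read directly off the confusion matrix. The sketch below uses hypothetical labels and predictions (not this project's results) and treats "pass" (1) as the positive class. It computes the metrics by hand and then cross-checks them against scikit-learn.

```python
import math
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical test labels and predictions (1 = pass, 0 = fail); not results from this project
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# With labels ordered [0, 1], the 2x2 matrix flattens to tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all students classified correctly
precision = tp / (tp + fp)  # of students predicted to pass, the share who actually passed
recall = tp / (tp + fn)     # of students who actually passed, the share the model caught

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")  # tn=2 fp=2 fn=1 tp=5
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.70 precision=0.71 recall=0.83

# Cross-check the hand-computed values against scikit-learn's metric functions
assert math.isclose(accuracy, accuracy_score(y_true, y_pred))
assert math.isclose(precision, precision_score(y_true, y_pred))
assert math.isclose(recall, recall_score(y_true, y_pred))
```

A low precision means that many students predicted to pass actually failed. A low recall means that the model misses many students who pass. Which one matters more depends on how the predictions are used.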
Flowchart of the Process
See the flowchart below.
Confusion Matrix Graph
See the confusion matrix graph below.
Result
The model achieved high accuracy on the 20% held-out test set, successfully classifying students as pass or
fail, and the confusion matrix showed good prediction capability.
Conclusion
Machine learning can provide early predictions of student performance, allowing educators to take timely
action. Logistic regression is a simple yet effective algorithm for such classification tasks. This project
demonstrates a complete machine learning workflow from data collection to prediction.