
Code Explanation

This document outlines a machine learning workflow using a Random Forest Classifier to predict students' intended programs based on their profiles. It includes steps for data import, preprocessing, encoding categorical variables, model training with hyperparameter tuning, evaluation of model accuracy, and visualization of feature importance. Additionally, it provides metrics such as precision, recall, and F1 score for a comprehensive model assessment.

1. Import Required Libraries


import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt

pandas: For loading and handling the tabular data.

sklearn: For model training, preprocessing, evaluation, and tuning.

matplotlib: For visualizing feature importance.

2. Upload CSV File


from google.colab import files
uploaded = files.upload()

Opens a file-upload widget in Colab so that student_profiles.csv can be uploaded into the notebook's working directory.
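
files.upload() returns a dict mapping each uploaded filename to its raw bytes; a quick optional sanity check (not part of the original script) confirms the expected file arrived:

print(list(uploaded.keys()))  # should contain 'student_profiles.csv'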

3. Load and Preprocess Data


df = pd.read_csv('student_profiles.csv')
df.ffill(inplace=True)

Loads the dataset into a DataFrame.

ffill(): forward-fills missing values, replacing each NaN with the most recent non-missing value above it in the same column.
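
A tiny toy example (not from the original script) shows the behavior:

import pandas as pd

toy = pd.DataFrame({'score': [80, None, 90]})
print(toy.ffill())
# score: 80.0, 80.0, 90.0 - the NaN takes the value of the row above it
# note: a NaN in the very first row would stay NaN, since there is nothing above to copy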

4. Convert Grades to Numerical Values


grade_mapping = {
    "Below 60": 55,
    "60-69": 64.5,
    "70-79": 74.5,
    "80-89": 84.5,
    "90-100": 95
}
for col in ['Math Grade', 'Science Grade', 'English Grade']:
    df[col] = df[col].map(grade_mapping)

Converts the categorical grade ranges into representative numeric scores (roughly the midpoint of each range) using a custom dictionary.
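
One caveat worth checking: Series.map() returns NaN for any value missing from the dictionary, so a typo in the CSV (e.g. '60 - 69' with extra spaces) would silently become a missing value. A small optional sanity check, assuming the column names above:

for col in ['Math Grade', 'Science Grade', 'English Grade']:
    n_unmapped = df[col].isna().sum()
    if n_unmapped:
        print(f"Warning: {n_unmapped} unmapped grade values in {col}")
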
5. Encode Categorical Columns


label_encoders = {}
categorical_cols = ['Career Interest', 'Learning Style', 'Work Preference', 'Intended Program']
for col in categorical_cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

Converts the categorical strings into numeric codes (label encoding) so the model can work with them; each fitted encoder is kept in label_encoders so the codes can be decoded later.
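
Storing the fitted encoders pays off later: each one can translate encoded integers back into the original strings. A minimal sketch:

le_program = label_encoders['Intended Program']
print(le_program.classes_)                   # original program names; index = encoded value
print(le_program.inverse_transform([0, 1]))  # decode example codes (assumes at least two classes)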

6. Split Data into Features and Target


X = df.drop('Intended Program', axis=1)
y = df['Intended Program']

X: Features (independent variables)

y: Target (dependent variable – the program students intend to take)
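
Before splitting, it can help to inspect the class distribution, since heavily imbalanced programs affect both the split and the weighted metrics computed later; a quick optional check:

print(y.value_counts())  # number of students per (encoded) program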

7. Train-Test Split


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

Splits the data into 80% training and 20% testing subsets.
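
For a multi-class target it is often worth stratifying the split so that each program appears in both subsets in roughly its original proportion. This is an optional variant, not part of the original code:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42,
    stratify=y)  # preserves class proportions; requires at least 2 samples per class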

8. Random Forest with Grid Search


rf = RandomForestClassifier(random_state=42)

param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=3,
                           n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)

Initializes a Random Forest.

Defines a grid of hyperparameters.

Runs GridSearchCV to find the best combination using 3-fold cross-validation.
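
After fitting, GridSearchCV also exposes the winning configuration and its mean cross-validated score, which are worth printing as a sanity check:

print(grid_search.best_params_)  # best hyperparameter combination found
print(f"Best CV accuracy: {grid_search.best_score_:.3f}")  # mean score over the 3 folds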

9. Evaluate Best Model


best_rf = grid_search.best_estimator_

y_pred = best_rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

Extracts the best model.

Makes predictions on test data.

Prints the accuracy.
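
Because 'Intended Program' was label-encoded in step 5, y_pred contains integers; the stored encoder can decode them back into program names (a small addition building on the label_encoders dict):

decoded = label_encoders['Intended Program'].inverse_transform(y_pred)
print(decoded[:5])  # first few predicted program names, as readable strings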

10. Feature Importance Visualization


importances = best_rf.feature_importances_
features = X.columns
importance_df = pd.DataFrame(
    {'Feature': features, 'Importance': importances}
).sort_values(by='Importance', ascending=False)

plt.figure(figsize=(10, 6))
plt.barh(importance_df['Feature'], importance_df['Importance'])
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importance')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

Extracts feature importance from the model.

Visualizes which features had the most influence on predictions.
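
Impurity-based importances can overstate features with many distinct values. Permutation importance on the test set is a common cross-check; a sketch using scikit-learn's inspection module (an addition, not in the original code):

from sklearn.inspection import permutation_importance

result = permutation_importance(best_rf, X_test, y_test,
                                n_repeats=10, random_state=42)
for name, score in sorted(zip(X.columns, result.importances_mean),
                          key=lambda t: t[1], reverse=True):
    print(f"{name}: {score:.4f}")  # drop in accuracy when this feature is shuffled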

11. Additional Metrics


print(f"Tuned Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print(f"Tuned Precision: {precision_score(y_test, y_pred, average='weighted') * 100:.2f}%")
print(f"Tuned Recall: {recall_score(y_test, y_pred, average='weighted') * 100:.2f}%")
print(f"Tuned F1 Score: {f1_score(y_test, y_pred, average='weighted') * 100:.2f}%")

Evaluates the model further using precision, recall, and F1 score with weighted averaging, which weights each class by its number of test samples in this multi-class setting.
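
Weighted averages can hide weak classes. classification_report prints per-class precision, recall, and F1, with the stored encoder supplying readable program names; a sketch building on the objects above:

from sklearn.metrics import classification_report

target_names = label_encoders['Intended Program'].classes_
# passing labels=... keeps the report aligned with target_names even if a class is absent from y_test
print(classification_report(y_test, y_pred,
                            labels=list(range(len(target_names))),
                            target_names=target_names))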
