1. Import Required Libraries
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
pandas: loads and manipulates the tabular data.
sklearn: provides preprocessing, model training, hyperparameter tuning, and evaluation metrics.
matplotlib: plots the feature-importance chart.
2. Upload CSV File
from google.colab import files
uploaded = files.upload()
Opens a file-upload widget in Colab so student_profiles.csv can be uploaded from your machine. The call returns a dict mapping each uploaded filename to its raw bytes.
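Because of that, the file can also be read straight from memory instead of from the working directory (a minimal alternative sketch; step 3 below uses the filename instead):
import io
# 'uploaded' maps filenames to bytes, so wrap the bytes for read_csv
df = pd.read_csv(io.BytesIO(uploaded['student_profiles.csv']))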
3. Load and Preprocess Data
df = pd.read_csv('student_profiles.csv')
df.ffill(inplace=True)
Loads the dataset into a DataFrame.
ffill(): forward-fills each missing value with the most recent non-missing value above it in the same column, as illustrated below.
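A quick illustration of the forward fill on made-up values (a standalone sketch; the column name is just for show):
demo = pd.DataFrame({'Math Grade': ['80-89', None, '70-79', None]})
print(demo.ffill())
# Rows 1 and 3 take the value of the row above them ('80-89', '70-79');
# a gap in the very first row would remain NaN, since nothing precedes it.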
4. Convert Grades to Numerical Values
grade_mapping = {
    "Below 60": 55,
    "60-69": 64.5,
    "70-79": 74.5,
    "80-89": 84.5,
    "90-100": 95
}
for col in ['Math Grade', 'Science Grade', 'English Grade']:
    df[col] = df[col].map(grade_mapping)
Converts the categorical grade ranges into numeric scores (roughly each range's midpoint) using a custom dictionary; a sanity check for unmatched labels is sketched below.
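One caveat: Series.map returns NaN for any value absent from the dictionary, so a typo such as "80 - 89" in the data would silently become NaN. A minimal check:
for col in ['Math Grade', 'Science Grade', 'English Grade']:
    unmapped = df[col].isna().sum()  # values map() could not match
    if unmapped:
        print(f"{col}: {unmapped} values did not match grade_mapping")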
5. Encode Categorical Columns
label_encoders = {}
categorical_cols = ['Career Interest', 'Learning Style', 'Work Preference', 'Intended Program']
for col in categorical_cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le
Converts categorical strings to integer codes (label encoding) for model compatibility. Each fitted encoder is stored in label_encoders so the codes can later be mapped back to the original labels, as sketched below.
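This is what keeping the encoders is for: inverse_transform recovers the original strings from the integer codes, and works the same way on model predictions later (a short sketch):
le = label_encoders['Intended Program']
print(le.inverse_transform(df['Intended Program'][:5]))  # back to program names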
6. Split Data into Features and Target
X = df.drop('Intended Program', axis=1)
y = df['Intended Program']
X: Features (independent variables)
y: Target (dependent variable – the program students intend to take)
7. Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Splits the data into 80% training and 20% testing subsets.
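If the programs are unevenly represented, a stratified split (a variant of the call above, not part of the original code) keeps the class proportions similar in both subsets:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)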
8. Random Forest with Grid Search
rf = RandomForestClassifier(random_state=42)
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid,
                           cv=3, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)
Initializes a Random Forest.
Defines a grid of hyperparameters.
Runs GridSearchCV to find the best combination using 3-fold cross-validation: 2 × 3 × 2 × 2 = 24 parameter combinations, so 72 fits in total.
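After fitting, the winning configuration and its mean cross-validated accuracy are exposed as attributes:
print("Best parameters:", grid_search.best_params_)
print(f"Best CV accuracy: {grid_search.best_score_:.3f}")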
9. Evaluate Best Model
best_rf = grid_search.best_estimator_
y_pred = best_rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
Extracts the best model.
Makes predictions on test data.
Prints the accuracy.
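For a per-class breakdown alongside the overall accuracy, classification_report pairs well with the class names stored by the encoder in step 5 (a sketch, not part of the original code):
from sklearn.metrics import classification_report
names = label_encoders['Intended Program'].classes_
print(classification_report(y_test, y_pred,
                            labels=list(range(len(names))),
                            target_names=names))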
10. Feature Importance Visualization
importances = best_rf.feature_importances_
features = X.columns
importance_df = pd.DataFrame(
    {'Feature': features, 'Importance': importances}
).sort_values(by='Importance', ascending=False)
plt.figure(figsize=(10, 6))
plt.barh(importance_df['Feature'], importance_df['Importance'])
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importance')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
Extracts feature importance from the model.
Visualizes which features had the most influence on predictions.
11. Additional Metrics
print(f"Tuned Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print(f"Tuned Precision: {precision_score(y_test, y_pred,
average='weighted') * 100:.2f}%")
print(f"Tuned Recall: {recall_score(y_test, y_pred,
average='weighted') * 100:.2f}%")
print(f"Tuned F1 Score: {f1_score(y_test, y_pred, average='weighted')
* 100:.2f}%")
Evaluates the model further using precision, recall, and F1-score. average='weighted' averages the per-class scores weighted by each class's support, which suits this multi-class problem; a macro-averaged comparison is sketched below.
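For comparison (an optional sketch, not in the original pipeline), macro averaging weights every class equally, so it drops noticeably when rarely chosen programs are predicted poorly:
print(f"Macro F1 Score: {f1_score(y_test, y_pred, average='macro') * 100:.2f}%")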