Student Performance Analysis

The document contains a detailed analysis of a dataset named 'StudentsPerformance.csv', which includes information about students' demographics and their scores in math, reading, and writing. It includes data loading, descriptive statistics, and various visualizations to explore relationships between scores and factors like gender, lunch type, and parental education. The dataset consists of 1000 entries with 8 attributes, and the analysis highlights trends and distributions in student performance.

Uploaded by

informativetutor66


import pandas as pd

# Load data
df = pd.read_csv("StudentsPerformance.csv")

# Preview
df.head()

   gender race/ethnicity parental level of education         lunch  \
0  female        group B           bachelor's degree      standard
1  female        group C                some college      standard
2  female        group B             master's degree      standard
3    male        group A          associate's degree  free/reduced
4    male        group C                some college      standard

  test preparation course  math score  reading score  writing score
0                    none          72             72             74
1               completed          69             90             88
2                    none          90             95             93
3                    none          47             57             44
4                    none          76             78             75

df.tail(10)

     gender race/ethnicity parental level of education         lunch  \
990    male        group E                 high school  free/reduced
991  female        group B            some high school      standard
992  female        group D          associate's degree  free/reduced
993  female        group D           bachelor's degree  free/reduced
994    male        group A                 high school      standard
995  female        group E             master's degree      standard
996    male        group C                 high school  free/reduced
997  female        group C                 high school  free/reduced
998  female        group D                some college      standard
999  female        group D                some college  free/reduced

    test preparation course  math score  reading score  writing score
990               completed          86             81             75
991               completed          65             82             78
992                    none          55             76             76
993                    none          62             72             74
994                    none          63             63             62
995               completed          88             99             95
996                    none          62             55             55
997               completed          59             71             65
998               completed          68             78             77
999                    none          77             86             86

df.shape

(1000, 8)

df.dtypes

gender object
race/ethnicity object
parental level of education object
lunch object
test preparation course object
math score int64
reading score int64
writing score int64
dtype: object

df['math score'].describe()

count 1000.00000
mean 66.08900
std 15.16308
min 0.00000
25% 57.00000
50% 66.00000
75% 77.00000
max 100.00000
Name: math score, dtype: float64

df['lunch'].describe()

count 1000
unique 2
top standard
freq 645
Name: lunch, dtype: object

df.describe()

math score reading score writing score

count 1000.00000 1000.000000 1000.000000
mean 66.08900 69.169000 68.054000
std 15.16308 14.600192 15.195657
min 0.00000 17.000000 10.000000
25% 57.00000 59.000000 57.750000
50% 66.00000 70.000000 69.000000
75% 77.00000 79.000000 79.000000
max 100.00000 100.000000 100.000000

df.iloc[100]

gender male
race/ethnicity group B
parental level of education some college
lunch standard
test preparation course none
math score 79
reading score 67
writing score 67
Name: 100, dtype: object

df.loc[:,"lunch"]

0 standard
1 standard
2 standard
3 free/reduced
4 standard
...
995 standard
996 free/reduced
997 free/reduced
998 standard
999 free/reduced
Name: lunch, Length: 1000, dtype: object

df.sort_values(by = "math score", ascending=False).head()

     gender race/ethnicity parental level of education         lunch  \
962  female        group E          associate's degree      standard
458  female        group E           bachelor's degree      standard
149    male        group E          associate's degree  free/reduced
625    male        group D                some college      standard
916    male        group E           bachelor's degree      standard

    test preparation course  math score  reading score  writing score
962                    none         100            100            100
458                    none         100            100            100
149               completed         100            100             93
625               completed         100             97             99
916               completed         100            100            100

df["lunch"].head(10)

0 standard
1 standard
2 standard
3 free/reduced
4 standard
5 standard
6 standard
7 free/reduced
8 free/reduced
9 free/reduced
Name: lunch, dtype: object

df[df["math score"]==99]

     gender race/ethnicity parental level of education     lunch  \
114  female        group E           bachelor's degree  standard
263  female        group E                 high school  standard
306    male        group E                some college  standard

    test preparation course  math score  reading score  writing score
114               completed          99            100            100
263                    none          99             93             90
306               completed          99             87             81

Q1 = df['math score'].quantile(0.25)
Q3 = df['math score'].quantile(0.75)

IQR = Q3-Q1

print("The interquartile range is: ", IQR)

The interquartile range is: 20.0
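The interquartile range is commonly used to flag unusually low or high values with Tukey's 1.5 × IQR rule. The following is a minimal sketch of that rule and is not part of the original notebook; the helper name `iqr_bounds` and the multiplier `k=1.5` are assumptions chosen for illustration. With the quartiles reported for math scores (Q1 = 57, Q3 = 77, IQR = 20), the fences would be 27 and 107, so only scores below 27 would be flagged, since the maximum score is 100.

```python
import pandas as pd


def iqr_bounds(scores: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) Tukey fences for a numeric Series."""
    q1 = scores.quantile(0.25)
    q3 = scores.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Tiny illustrative sample: the five math scores shown by df.head()
sample = pd.Series([72, 69, 90, 47, 76], name="math score")

lower, upper = iqr_bounds(sample)

# Keep only the values that fall outside the fences
outliers = sample[(sample < lower) | (sample > upper)]
print(lower, upper)
print(outliers.tolist())
```

On the full dataset the same helper could be called as `iqr_bounds(df['math score'])`.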

df.isnull()

     gender  race/ethnicity  parental level of education  lunch  \
0     False           False                        False  False
1     False           False                        False  False
2     False           False                        False  False
3     False           False                        False  False
4     False           False                        False  False
..      ...             ...                          ...    ...
995   False           False                        False  False
996   False           False                        False  False
997   False           False                        False  False
998   False           False                        False  False
999   False           False                        False  False

     test preparation course  math score  reading score  writing score
0                      False       False          False          False
1                      False       False          False          False
2                      False       False          False          False
3                      False       False          False          False
4                      False       False          False          False
..                       ...         ...            ...            ...
995                    False       False          False          False
996                    False       False          False          False
997                    False       False          False          False
998                    False       False          False          False
999                    False       False          False          False

[1000 rows x 8 columns]

df.isnull().sum()

gender 0
race/ethnicity 0
parental level of education 0
lunch 0
test preparation course 0
math score 0
reading score 0
writing score 0
dtype: int64

df.count()

gender 1000
race/ethnicity 1000
parental level of education 1000
lunch 1000
test preparation course 1000
math score 1000
reading score 1000
writing score 1000
dtype: int64

# Remove all rows that contain a missing value.
# This is a no-op here: df.isnull().sum() above reports zero missing values.
df.dropna(inplace=True)

# Students with high math scores (above 90)

df[df["math score"] > 90]

gender race/ethnicity parental level of education \

34 male group E some college
104 male group C some college
114 female group E bachelor's degree
121 male group B associate's degree
149 male group E associate's degree
165 female group C bachelor's degree
171 male group E some high school
179 female group D some high school
233 male group E some high school
263 female group E high school
286 male group E associate's degree
306 male group E some college
451 female group E some college
458 female group E bachelor's degree
469 male group C some college
501 female group B associate's degree
503 female group E associate's degree
521 female group C associate's degree
539 male group A associate's degree
546 female group A some high school
562 male group C bachelor's degree
566 female group E bachelor's degree
571 male group A bachelor's degree
594 female group C bachelor's degree
612 male group C bachelor's degree
618 male group D master's degree
623 male group A some college
625 male group D some college
685 female group E master's degree
689 male group E some college
710 male group C some college
712 female group D some college
717 female group C associate's degree
719 male group E associate's degree
736 male group C associate's degree
779 male group E associate's degree
784 male group C bachelor's degree
815 male group B some high school
846 male group C master's degree
855 female group B bachelor's degree
864 male group C associate's degree
886 female group E associate's degree
903 female group D bachelor's degree
916 male group E bachelor's degree
919 male group B some college
934 male group C associate's degree
950 male group E high school
957 female group D master's degree
962 female group E associate's degree
979 female group C associate's degree

    test preparation course  math score  reading score  writing score
34                     none          97             87             82
104               completed          98             86             90
114               completed          99            100            100
121               completed          91             89             92
149               completed         100            100             93
165               completed          96            100            100
171                    none          94             88             78
179               completed          97            100            100
233                    none          92             87             78
263                    none          99             93             90
286               completed          97             82             88
306               completed          99             87             81
451                    none         100             92             97
458                    none         100            100            100
469                    none          91             74             76
501               completed          94             87             92
503               completed          95             89             92
521                    none          91             86             84
539               completed          97             92             86
546               completed          92            100             97
562               completed          96             90             92
566               completed          92            100            100
571                    none          91             96             92
594               completed          92            100             99
612               completed          94             90             91
618                    none          95             81             84
623               completed         100             96             86
625               completed         100             97             99
685               completed          94             99            100
689                    none          93             90             83
710               completed          93             84             90
712                    none          98            100             99
717               completed          96             96             99
719               completed          91             73             80
736                    none          92             79             84
779               completed          94             85             82
784               completed          91             81             79
815               completed          94             86             87
846               completed          91             85             85
855                    none          97             97             96
864                    none          97             93             91
886               completed          93            100             95
903               completed          93            100            100
916               completed         100            100            100
919               completed          91             96             91
934               completed          98             87             90
950                    none          94             73             71
957                    none          92            100            100
962                    none         100            100            100
979                    none          91             95             94

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

%matplotlib inline
warnings.filterwarnings('ignore')
plt.rcParams["figure.figsize"] = [10, 5]

# Load data
df = pd.read_csv("StudentsPerformance.csv")

df.head()

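   gender race/ethnicity parental level of education         lunch  \
0  female        group B           bachelor's degree      standard
1  female        group C                some college      standard
2  female        group B             master's degree      standard
3    male        group A          associate's degree  free/reduced
4    male        group C                some college      standard

  test preparation course  math score  reading score  writing score
0                    none          72             72             74
1               completed          69             90             88
2                    none          90             95             93
3                    none          47             57             44
4                    none          76             78             75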

# Color palettes
sns.palplot(sns.color_palette("colorblind"))
plt.title("Color Palette: Colorblind")
plt.show()

sns.palplot(sns.color_palette("Reds"))
plt.title("Color Palette: Reds")
plt.show()

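# Histogram of math scores
sns.histplot(df['math score'], kde=False)
plt.title("Distribution of Math Scores")
plt.show()

# distplot is deprecated in recent seaborn releases; kdeplot draws the density curve
sns.kdeplot(df['reading score'])
plt.title("KDE of Reading Scores")
plt.show()

# Histogram with a KDE overlay on a density scale, as distplot drew by default
plt.figure(figsize=(8, 8))
sns.histplot(df['writing score'], kde=True, stat="density")
plt.title("Distribution of Writing Scores")
plt.show()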

# Scatter plot: math vs writing scores by gender
plt.figure(figsize=(8, 8))
sns.scatterplot(x="math score", y="writing score", hue="gender", data=df)
plt.title("Math vs Writing Scores by Gender")
plt.show()

# Bar plot: average writing score by gender and lunch type
plt.figure(figsize=(8, 8))
sns.barplot(x="gender", y="writing score", hue="lunch", data=df)
plt.title("Average Writing Score by Gender & Lunch")
plt.show()

# Relplot: math vs reading by gender
sns.relplot(x="math score", y="reading score", hue="gender",
            style="gender", kind="scatter", data=df)
plt.title("Math vs Reading by Gender")
plt.show()

# Lineplot: reading vs writing by gender
plt.figure(figsize=(7, 7))
sns.lineplot(x="reading score", y="writing score", hue="gender", data=df)
plt.title("Reading vs Writing by Gender (Lineplot)")
plt.show()

# Barplot: math score by test prep and gender
plt.figure(figsize=(7, 7))
sns.barplot(x="test preparation course", y="math score", hue="gender", data=df)
plt.title("Math Score by Test Prep and Gender")
plt.show()

# Boxplot: reading score by parental education
plt.figure(figsize=(12, 6))
sns.boxplot(x="parental level of education", y="reading score", data=df)
plt.title("Reading Score by Parental Education")
plt.xticks(rotation=45)
plt.show()

# Violin plot: writing score by lunch
plt.figure(figsize=(6, 6))
sns.violinplot(x="lunch", y="writing score", data=df)
plt.title("Writing Score by Lunch Type")
plt.show()

# Boxplot: writing score by gender
sns.boxplot(x="gender", y="writing score", data=df)
plt.title("Writing Score by Gender")
plt.show()

# Boxplot: math score by race/ethnicity group
plt.figure(figsize=(8, 6))
sns.boxplot(x="race/ethnicity", y="math score", data=df)
plt.title("Math Score by Race/Ethnicity Group")
plt.show()
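The group comparisons drawn in the bar plots can also be read off as numbers, because `sns.barplot` shows the mean of each group by default. The sketch below is an illustrative addition, not part of the original notebook: it builds a small frame from the five rows shown by `df.head()` and computes the per-group means with `groupby`. The variable names `sample` and `summary` are assumptions.

```python
import pandas as pd

# Illustrative frame built from the five rows shown by df.head()
sample = pd.DataFrame({
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "math score": [72, 69, 90, 47, 76],
    "writing score": [74, 88, 93, 44, 75],
})

# Mean score per lunch type: the quantity a default bar plot displays
summary = sample.groupby("lunch")[["math score", "writing score"]].mean()
print(summary)
```

On the full dataset the same call would be applied to `df` instead of `sample`, and any of the categorical columns could replace `"lunch"`.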
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

from sklearn.preprocessing import LabelEncoder

# Make a copy to avoid errors if run twice

df = df.copy()

# List of categorical columns
cat_cols = ['gender', 'race/ethnicity', 'parental level of education',
            'lunch', 'test preparation course']

# Apply Label Encoding to each categorical column
for col in cat_cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
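`LabelEncoder` assigns each category an integer code in sorted order, and a linear model such as logistic regression will treat those codes as an ordering (for example group A < group B < group C). scikit-learn documents `LabelEncoder` as intended for target labels rather than input features. As a hedged alternative, not what the notebook does, nominal columns can be one-hot encoded with `pd.get_dummies`. The sketch below uses a small hand-built frame with values that appear in `df.head()`; the choice of `drop_first=True` is an assumption made here to avoid redundant indicator columns.

```python
import pandas as pd

# Small illustrative frame using values that appear in df.head()
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "reading score": [72, 90, 95, 57, 78],
})

# One 0/1 indicator column per category; drop_first removes one level per column
encoded = pd.get_dummies(
    sample, columns=["gender", "lunch"], drop_first=True, dtype=int
)
print(encoded.columns.tolist())
```

Each remaining indicator column is 1 when the row belongs to that category and 0 otherwise, so no artificial order is implied between categories.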

# 🎯 Create target column: 1 if the math score is at least 50, else 0
df['pass_math'] = (df['math score'] >= 50).astype(int)

# 🎯 Define features and target
X = df.drop(['math score', 'pass_math'], axis=1)
y = df['pass_math']

# ✂️ Train-test split (80/20, fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 🧠 Train Logistic Regression
model = LogisticRegression()
model.fit(X_train, y_train)

LogisticRegression()

# ✅ Evaluate
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc * 100:.2f}%")
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Accuracy: 93.50%
Confusion Matrix:
[[ 20 7]
[ 6 167]]
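The reported accuracy can be checked by hand against the confusion matrix, and the same four counts give per-class metrics that accuracy alone hides. The sketch below is an illustrative addition; it assumes scikit-learn's convention that rows are true labels and columns are predicted labels, with class 0 (fail) listed first.

```python
# Counts copied from the confusion matrix printed above.
# Rows: true class (0 = fail, 1 = pass); columns: predicted class.
tn, fp = 20, 7
fn, tp = 6, 167

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

pass_precision = tp / (tp + fp)  # of predicted passes, share that truly passed
pass_recall = tp / (tp + fn)     # of true passes, share that were found
fail_recall = tn / (tn + fp)     # of true fails, share that were found

print(f"accuracy={accuracy:.3f}, fail recall={fail_recall:.3f}")
```

The counts sum to the 200 test rows and reproduce the 93.50% accuracy. Recall on the fail class is about 0.74, noticeably lower than the overall accuracy, which reflects how few failing students the test set contains (27 of 200).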
