Let's assume we're part of an educational data team aiming to assess student performance, learning trends, and institutional effectiveness across subjects and demographics. Our goal is to derive insights that can guide curriculum changes, targeted support, and performance optimization.
- Clean and preprocess real-world student data
- Explore performance trends across different demographics
- Visualize the effects of test prep and education level
- Derive actionable insights for educators and policymakers
- Python (Pandas, NumPy)
- SQL (SQLite / PostgreSQL / MySQL)
- Matplotlib & Seaborn
- Jupyter Notebook
- GitHub
- Removed missing values and duplicates
- Standardized column names
- Checked for outliers and invalid entries
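Here is a minimal pandas sketch of the three cleaning steps above, written under several assumptions. Raw column names are assumed to look like "Math Score" or "race/ethnicity". Any score outside 0 to 100 is treated as an invalid entry. IQR outliers are only counted, not dropped. The helper names `clean_student_data` and `iqr_outlier_counts` and the sample rows are hypothetical and do not come from the project notebook.

```python
import pandas as pd


def clean_student_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass: standardize names, drop missing/duplicate rows, drop invalid scores."""
    df = raw.copy()
    # Standardize column names, e.g. "Math Score" -> "math_score", "race/ethnicity" -> "race_ethnicity"
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    # Remove rows with any missing value, then exact duplicate rows
    df = df.dropna().drop_duplicates()
    # Assumption: valid scores lie in [0, 100]; anything else counts as an invalid entry
    score_cols = [c for c in df.columns if c.endswith("_score")]
    valid = df[score_cols].apply(lambda s: s.between(0, 100)).all(axis=1)
    return df[valid].reset_index(drop=True)


def iqr_outlier_counts(df: pd.DataFrame, cols: list[str], k: float = 1.5) -> pd.Series:
    """Count values per column outside [Q1 - k*IQR, Q3 + k*IQR]; rows are flagged, not removed."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    outside = (df[cols] < q1 - k * iqr) | (df[cols] > q3 + k * iqr)
    return outside.sum()


# Tiny made-up example (not project data): one duplicate, one missing value, one invalid score
raw = pd.DataFrame({
    "Gender": ["female", "male", "male", "female", None],
    "Math Score": [72, 69, 69, 150, 80],
    "Reading Score": [72, 90, 90, 95, 70],
})
cleaned = clean_student_data(raw)
```

In this sketch, outliers are only counted because an extreme but valid score, such as 0 or 100, is a real result. Whether to drop such rows is a judgment call.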
📁 File: student_performance_cleaned.csv
Performed exploratory data analysis (EDA) with Pandas and visualized the results with Matplotlib and Seaborn:
- Gender-based score comparison
- Score trends by parental education level
- Impact of lunch type on academic scores
- Test preparation course effectiveness
- Correlation analysis between math, reading, and writing scores
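Most of the comparisons listed above come down to group means and a correlation matrix. The sketch below uses made-up rows. It assumes the cleaned columns are named `gender`, `test_preparation_course`, `math_score`, `reading_score`, and `writing_score`. These names are hypothetical, so match them to the actual CSV.

```python
import pandas as pd

# Assumed column names after cleaning (hypothetical; adjust to the actual CSV)
SCORE_COLS = ["math_score", "reading_score", "writing_score"]


def mean_scores_by(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Mean of each subject score for every category of group_col."""
    return df.groupby(group_col)[SCORE_COLS].mean()


def score_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations between the three subject scores."""
    return df[SCORE_COLS].corr(method="pearson")


# Made-up rows for illustration only
sample = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "test_preparation_course": ["completed", "none", "completed", "none"],
    "math_score": [70, 60, 80, 66],
    "reading_score": [85, 75, 78, 64],
    "writing_score": [88, 74, 76, 60],
})

by_gender = mean_scores_by(sample, "gender")                 # gender-based comparison
by_prep = mean_scores_by(sample, "test_preparation_course")  # test prep effectiveness
corr = score_correlations(sample)                            # math/reading/writing correlation
```

The same `mean_scores_by` call also works for parental education or lunch type. For the charts, `by_gender.plot(kind="bar")` draws a grouped bar chart through Matplotlib. `seaborn.heatmap(corr, annot=True)` is a common way to display the correlation matrix.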
📁 File: student_perfrormace.ipynb
Using SQL queries on the cleaned data, we addressed the following:
- Ranking of all students by overall average score
- Each student's math score compared with the average math score for their gender
- Top-scoring student in each parental education group
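- Strongest subject per student, measured as the largest margin above the student's lowest subject score (each student has one score per subject, so this is not improvement over time)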
- Gender gap in average score for each subject
- Percentile rank of each student within each subject
- Average score for each combination of test preparation status and lunch type (cross-category comparison)
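This README does not show the contents of the .sql file, so the following is only a hypothetical reconstruction of three of the questions above: overall rank, math score against the gender average, and percentile rank. It runs through Python's built-in `sqlite3` on a minimal assumed schema. The `student_id` surrogate key, the table layout, and the sample rows are all made up. The window functions need SQLite 3.25 or newer. `RANK()`, `AVG() OVER`, and `PERCENT_RANK()` are also available in PostgreSQL and MySQL 8.0+.

```python
import sqlite3

# Hypothetical minimal schema; student_id is an assumed surrogate key
SCHEMA = """
CREATE TABLE students (
    student_id    INTEGER PRIMARY KEY,
    gender        TEXT NOT NULL,
    math_score    INTEGER NOT NULL,
    reading_score INTEGER NOT NULL,
    writing_score INTEGER NOT NULL
);
"""

# Rank students by overall average score (tied totals share a rank)
RANK_BY_AVERAGE = """
SELECT student_id,
       ROUND((math_score + reading_score + writing_score) / 3.0, 2) AS avg_score,
       RANK() OVER (ORDER BY math_score + reading_score + writing_score DESC) AS overall_rank
FROM students
ORDER BY overall_rank, student_id;
"""

# Each student's math score versus the mean math score of their gender
MATH_VS_GENDER_AVG = """
SELECT student_id,
       gender,
       math_score,
       AVG(math_score) OVER (PARTITION BY gender) AS gender_avg_math,
       math_score - AVG(math_score) OVER (PARTITION BY gender) AS diff_from_avg
FROM students
ORDER BY student_id;
"""

# Percentile rank in math: (rank - 1) / (rows - 1), so 0.0 = lowest and 1.0 = highest
MATH_PERCENT_RANK = """
SELECT student_id,
       PERCENT_RANK() OVER (ORDER BY math_score) AS math_pct_rank
FROM students
ORDER BY student_id;
"""


def run_queries(rows):
    """Load rows into an in-memory SQLite database and run the three queries."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?)", rows)
        return {
            "rank": conn.execute(RANK_BY_AVERAGE).fetchall(),
            "math_vs_gender": conn.execute(MATH_VS_GENDER_AVG).fetchall(),
            "math_pct_rank": conn.execute(MATH_PERCENT_RANK).fetchall(),
        }
    finally:
        conn.close()


# Made-up rows: (student_id, gender, math, reading, writing)
SAMPLE_ROWS = [
    (1, "female", 72, 72, 74),
    (2, "male", 69, 90, 88),
    (3, "female", 90, 95, 93),
    (4, "male", 47, 57, 44),
]
results = run_queries(SAMPLE_ROWS)
```

To get percentile ranks for reading and writing, change the column in the `ORDER BY` of the `PERCENT_RANK()` window.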
📁 File: studentd_performance_queries.sql
- Females outperform males in Reading and Writing.
- Males slightly outperform females in Math.
- Students who completed the test preparation course score significantly higher across all subjects.
- Among the demographic groups, Group E has the highest average scores across all subjects.
- Group A has the lowest average scores, which may point to educational inequality.
- Students whose parents hold a Master’s or Bachelor’s degree have higher average scores than students in the other parental education groups.
- Parental education appears to be strongly and positively associated with student performance.
- Students with standard lunch consistently score higher than those with free/reduced lunch, possibly indicating socio-economic influence.
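- Math scores by gender (t = 5.38, p < 0.001)
- Males scored significantly higher in math than females; p < 0.001 is strong statistical evidence that the mean math scores of the two groups differ.
- The t-statistic is positive because the male mean is the larger one. Its size (5.38) shows that the gap is large relative to its standard error, not that the gap in points is large. This is consistent with males only slightly outperforming females in math.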
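This README does not say which t-test variant produced t = 5.38. The sketch below assumes an independent two-sample Welch's t-test, which allows unequal variances, and computes it with the standard library only. Male scores are passed as the first group, so a positive t means a higher male mean. The scores are made up. The standard library has no t-distribution CDF, so this sketch stops at t and the degrees of freedom. `scipy.stats.ttest_ind(male, female, equal_var=False)` returns both t and the p-value.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t-statistic for mean(a) - mean(b) and its Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a), using the n - 1 sample variance
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up math scores, not the project's data
male_math = [70, 75, 80, 85]
female_math = [65, 70, 72, 75]
t_stat, dof = welch_t(male_math, female_math)
```

With these four-student toy groups, the means differ by 7 points, but t is only about 1.82. This is because small samples give a large standard error. The same logic runs in reverse for the project's t = 5.38: a modest point gap can produce a large t when the groups are big.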
- Data Cleaning & Preprocessing: Deepak
- SQL Queries & Business Insights: Niharika
- Visualization & EDA: Pranay
- Presentation, GitHub & Documentation: Niharika, Pranay, and Deepak
📽 Watch here: Video Presentation Link
For questions, contact any of the team members via email or raise an issue in this repository.