Student Score Visualisation
ABSTRACT
The "Student Performance Analysis and Visualization" project analyzes and visualizes the
academic performance of a group of students based on their math and science scores. The
project begins by generating simulated data for 100 students, consisting of a student name, a
math score, and a science score; scores are sampled without replacement so that they are unique
within each name group. Records with missing scores are then removed, and a total score is
computed for each student.
The analysis encompasses various aspects of the dataset, including descriptive statistics,
subject-wise score comparisons using bar graphs, a scatter plot with a linear regression line to
explore the relationship between math and science scores, and a multiple linear regression
model to predict total scores based on math and science scores.
In addition to these analyses, the project provides a set of visualizations: bar graphs of the
scores for each student name, a box plot of the score distributions, a correlation matrix heatmap
of the relationships between variables, and a residual plot for checking the assumptions of the
regression model. A histogram of total scores and pairwise scatterplots give further insight into
score distributions and relationships.
Furthermore, the project uses one-way ANOVA tests to check whether math and science scores
differ significantly by student name.
This project provides a broad exploration of student performance data, offering insights and
visualizations for educators and researchers interested in understanding academic achievement
patterns.
Dept. Of CSE, DSATM 2022-2023 1
INTRODUCTION
In today's educational landscape, understanding student performance and academic
achievement is crucial for educators, administrators, and policymakers. Data analysis and
visualization play a pivotal role in gaining insights into student scores and performance
trends. The "Student Performance Analysis and Visualization" project serves as a
comprehensive exploration of academic data, providing a valuable toolkit for examining and
interpreting student performance.
This project begins by simulating data for 100 students, each with a math score and a science
score, to mimic the range of academic profiles found in real-world educational settings. The
data are then cleaned by removing any records with missing scores, and a total score is computed
for each student, preparing the dataset for analysis.
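The simulation and cleaning steps described above can be sketched in base R as follows. This is a minimal illustration rather than the project's own code: it assumes base R's `ave()` in place of the dplyr `group_by()`/`mutate()` pipeline used in the program, so even with the same seed the generated scores will not match the program's output. The names `sim`, `clean`, and `repeats_within_group` are illustrative. The sketch checks that no score repeats within a name group, that no records are lost during cleaning, and that the total equals the sum of the two subject scores.

```r
# Minimal base-R sketch of the simulation and cleaning steps (illustrative only;
# values differ from the dplyr version because random numbers are drawn in a different order)
set.seed(123)
names_pool <- c("Rohan", "Aisha", "Sanya", "Vikram", "Raj")
sim <- data.frame(
  student_id = 1:100,
  name = sample(names_pool, 100, replace = TRUE)
)

# Draw scores without replacement inside each name group,
# so no two students sharing a name get the same score
sim$math_score <- ave(sim$student_id, sim$name,
                      FUN = function(ids) sample(40:95, length(ids)))
sim$science_score <- ave(sim$student_id, sim$name,
                         FUN = function(ids) sample(40:100, length(ids)))

# Cleaning: keep rows with both scores present, then add the total
clean <- sim[complete.cases(sim[, c("math_score", "science_score")]), ]
clean$total_score <- clean$math_score + clean$science_score

# TRUE for a name group if any math score repeats within it
repeats_within_group <- tapply(clean$math_score, clean$name, anyDuplicated) > 0
print(repeats_within_group)
```

Because the simulated data contain no missing values, the cleaning step keeps all 100 rows; the filter only matters if real data with gaps are substituted.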
PROGRAM CODE
# Load required packages
library(dplyr)
library(ggplot2)
library(gridExtra) # For arranging plots
library(tidyr)
# Simulated student data with unique math and science scores
set.seed(123) # For reproducibility
student_data <- data.frame(
student_id = 1:100,
name = sample(c("Rohan", "Aisha", "Sanya", "Vikram", "Raj"), 100, replace = TRUE)
)
# Generate math and science scores; sampling without replacement keeps the
# scores unique within each name group (not across all 100 students)
student_data <- student_data %>%
group_by(name) %>%
mutate(
math_score = sample(40:95, n(), replace = FALSE),
science_score = sample(40:100, n(), replace = FALSE)
) %>%
ungroup()
# Data Cleaning
cleaned_data <- student_data %>%
filter(!is.na(math_score) & !is.na(science_score)) %>%
mutate(total_score = math_score + science_score)
# Descriptive Statistics
summary_stats <- cleaned_data %>%
summarize(
avg_math = mean(math_score),
avg_science = mean(science_score),
avg_total = mean(total_score),
max_total = max(total_score)
)
print(summary_stats)
# Bar graph: Subject-wise scores comparison
bar_data <- cleaned_data %>%
gather(key = "subject", value = "score", math_score, science_score)
# stat = "summary" draws one bar per subject at the mean score,
# instead of 100 overlapping bars per subject
ggplot(bar_data, aes(x = subject, y = score, fill = subject)) +
geom_bar(stat = "summary", fun = "mean") +
labs(title = "Subject-wise Scores Comparison",
x = "Subject", y = "Average Score", fill = "Subject") +
theme_minimal()
# Scatter plot with linear regression line
ggplot(cleaned_data, aes(x = math_score, y = science_score)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
labs(title = "Math vs. Science Scores",
x = "Math Score", y = "Science Score")
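# Multiple Linear Regression
# Note: total_score is defined as math_score + science_score, so this model is an
# exact identity (intercept ~0, both slopes = 1, residuals ~0) and summary() may
# warn that the fit is essentially perfect
multi_reg <- lm(total_score ~ math_score + science_score, data = cleaned_data)
# Summary of the multiple linear regression model
summary(multi_reg)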
# Bar graphs of average scores for each student name
individual_plots <- list()
for (student_name in unique(cleaned_data$name)) {
student_scores <- cleaned_data %>%
filter(name == student_name) %>%
select(name, math_score, science_score) %>%
gather(key = "subject", value = "score", math_score, science_score)
individual_plots[[student_name]] <- ggplot(student_scores,
aes(x = subject, y = score, fill = subject)) +
geom_bar(stat = "summary", fun = "mean") +
labs(title = paste("Scores for", student_name),
x = "Subject", y = "Average Score", fill = "Subject") +
theme_minimal()
}
# Combine all plots in a two-column grid; grid.arrange() draws the result directly
grid.arrange(grobs = individual_plots, ncol = 2)
# Box plots for math and science scores
boxplot_data <- cleaned_data %>%
gather(key = "subject", value = "score", math_score, science_score)
ggplot(boxplot_data, aes(x = subject, y = score, fill = subject)) +
geom_boxplot() +
labs(title = "Box Plot of Math and Science Scores",
x = "Subject", y = "Score", fill = "Subject") +
theme_minimal()
# Correlation matrix heatmap
correlation_matrix <- cor(cleaned_data[, c("math_score", "science_score", "total_score")])
# Convert the matrix to long format (columns Var1, Var2, value) for geom_tile()
correlation_long <- as.data.frame(as.table(correlation_matrix))
names(correlation_long) <- c("Var1", "Var2", "value")
correlation_matrix_plot <- ggplot(data = correlation_long,
aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "blue") +
labs(title = "Correlation Matrix Heatmap",
x = "Variable 1", y = "Variable 2", fill = "Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Display the heatmap
print(correlation_matrix_plot)
# Residual plot for multiple linear regression
# Store residuals as a column so ggplot() reads them from the data
# (avoids overwriting the name of the residuals() function)
residual_data <- cleaned_data %>%
mutate(residual = residuals(multi_reg))
ggplot(data = residual_data, aes(x = total_score, y = residual)) +
geom_point() +
geom_hline(yintercept = 0, color = "red", linetype = "dashed") +
labs(title = "Residual Plot for Multiple Linear Regression",
x = "Total Score", y = "Residuals") +
theme_minimal()
# Histogram of total scores
ggplot(cleaned_data, aes(x = total_score)) +
geom_histogram(binwidth = 5, fill = "blue", color = "black") +
labs(title = "Histogram of Total Scores",
x = "Total Score", y = "Frequency") +
theme_minimal()
# Pairwise scatterplots
pairs(cleaned_data[, c("math_score", "science_score", "total_score")])
# ANOVA for math scores by student name
anova_math <- aov(math_score ~ name, data = cleaned_data)
summary(anova_math)
# ANOVA for science scores by student name
anova_science <- aov(science_score ~ name, data = cleaned_data)
summary(anova_science)
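A note on the multiple regression step in the program above: because `total_score` is computed as `math_score + science_score`, the model `total_score ~ math_score + science_score` is an exact identity rather than a prediction. The sketch below uses a small hypothetical dataset (`toy`, not the project's data) to show that `lm()` recovers an intercept of about 0 and slopes of 1, with residuals that are zero up to floating-point error.

```r
# Hypothetical scores for illustration only (not the project's data)
toy <- data.frame(
  math_score    = c(55, 72, 90, 61, 84, 47),
  science_score = c(63, 88, 79, 70, 95, 52)
)
# Total defined exactly as the sum, as in the project's cleaning step
toy$total_score <- toy$math_score + toy$science_score

fit <- lm(total_score ~ math_score + science_score, data = toy)
print(round(coef(fit), 6))        # intercept ~0, both slopes 1
print(max(abs(residuals(fit))))   # effectively 0 (floating-point noise)
```

For the model to predict anything, the outcome would need to be a variable that is not computed from its own predictors.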
Flowchart:
LIST OF FIGURES:
Fig 1: Residual plot for multiple linear regression
Fig 2: Box plot of math and science scores
Fig 3: Subject-wise score comparison
Fig 4: Student score visualisation
Fig 5: Math vs. science scores
OUTPUT
          Df Sum Sq Mean Sq F value Pr(>F)
name       4    427   106.9   0.342  0.849
Residuals 95  29690   312.5
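The F value in the ANOVA table above can be checked by hand: it is the ratio of the mean square for `name` to the residual mean square, and the p-value is the upper tail of an F distribution with 4 and 95 degrees of freedom. The short check below uses only numbers printed in the table; any small difference from the printed p-value comes from rounding in the displayed mean squares. A p-value this large (well above the usual 0.05 threshold) gives no evidence that scores differ by student name, which is expected here because names were assigned at random.

```r
# Values copied from the ANOVA table above
ms_name  <- 106.9   # mean square for `name` (Df = 4)
ms_resid <- 312.5   # residual mean square (Df = 95)

# F statistic: between-group mean square divided by within-group mean square
f_value <- ms_name / ms_resid
# p-value: probability of an F(4, 95) value at least this large
p_value <- pf(f_value, df1 = 4, df2 = 95, lower.tail = FALSE)

print(round(f_value, 3))  # 0.342
print(round(p_value, 3))
```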
OUTPUT SCREENSHOTS
Fig 1
Fig 2
Fig 3
Fig 4
Fig 5
CONCLUSION
In our project, we examined how students are performing in their studies. We started by
simulating a group of 100 students with math and science scores, and we cleaned the data so
that only complete records were analysed.
First, we computed descriptive statistics: the average math, science, and total scores, and the
highest total score.
Then, we used bar graphs, a box plot, a scatter plot, and a histogram to compare the math and
science scores. These showed which subject had higher scores and whether the two subjects were
related.
We also fitted a multiple linear regression model relating total scores to math and science
scores. Because the total score is the sum of the two subject scores, this model reproduces the
total exactly; a model meant to predict student performance would need an outcome that is not
computed from its predictors.
Beyond the overall numbers, we drew bar graphs for each student name to show how scores vary
between groups of students.
Lastly, we used ANOVA to test whether scores differ significantly by student name. The ANOVA
output reported in this document (F = 0.342, p = 0.849) shows no statistically significant
difference.
In summary, our project showed that data analysis and visualization can help us understand how
students are performing in school, which can help teachers and schools make better decisions to
support student success. There is more to explore, and this project is just the beginning.