Academic Performance Insights

The document describes a student score visualization project that aims to analyze and visualize academic performance data. It generates simulated student data with unique math and science scores. These scores are then cleaned and analyzed using descriptive statistics, graphs, regression models, and ANOVA tests to explore patterns and relationships in student performance. The project provides visualizations of individual and grouped student scores to offer insights for educators.

Uploaded by

Sajan Hegde
Student Score Visualisation

ABSTRACT
The "Student Performance Analysis and Visualization" project aims to analyze and visualize
the academic performance of a group of students based on their math and science scores. The
project begins by generating simulated student data, including student names, unique math
scores, and unique science scores. The data are then cleaned by removing records with missing
values, and a total score is computed for each student.

The analysis encompasses various aspects of the dataset, including descriptive statistics,
subject-wise score comparisons using bar graphs, a scatter plot with a linear regression line to
explore the relationship between math and science scores, and a multiple linear regression
model to predict total scores based on math and science scores.

In addition to these analyses, the project offers a comprehensive visualization suite. It
includes individual bar graphs for each student's scores, a box plot to visualize score
distributions, a correlation matrix heatmap to examine relationships between variables, and a
residual plot to assess the assumptions of the regression model. Histograms and pairwise
scatterplots provide further insights into score distributions and relationships.

Furthermore, the project conducts ANOVA tests to investigate potential differences in math
and science scores based on student names, shedding light on any statistically significant
variations among students.

This project provides a comprehensive exploration of student performance data, offering
valuable insights and visualizations for educators and researchers interested in understanding
academic achievement patterns.

Dept. Of CSE, DSATM 2022-2023 1



INTRODUCTION

In today's educational landscape, understanding student performance and academic
achievement is crucial for educators, administrators, and policymakers. Data analysis and
visualization play a pivotal role in gaining insights into student scores and performance
trends. The "Student Performance Analysis and Visualization" project serves as a
comprehensive exploration of academic data, providing a valuable toolkit for examining and
interpreting student performance.

This project begins by simulating student records, each with unique math and science scores,
mirroring the diversity of academic profiles encountered in real-world educational settings.
Through data cleaning and aggregation, we prepare the dataset for analysis, ensuring
accuracy and completeness.
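
The cleaning and aggregation step can be illustrated on a tiny hand-made data frame. The following is a minimal base-R sketch, not the project's own dplyr pipeline; the data frame `toy_scores` and its values are hypothetical and were chosen only so that some scores are missing.

```r
# Hypothetical toy data with two missing scores (not the project's data)
toy_scores <- data.frame(
  student_id    = 1:5,
  math_score    = c(72, NA, 88, 55, 91),
  science_score = c(65, 80, NA, 60, 97)
)

# Keep only rows where both scores are present
keep <- complete.cases(toy_scores[, c("math_score", "science_score")])
toy_clean <- toy_scores[keep, ]

# Aggregate the two subjects into a total score
toy_clean$total_score <- toy_clean$math_score + toy_clean$science_score

print(toy_clean)
```

Rows 2 and 3 are dropped because one score is missing in each, so only complete records contribute to the total.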

PROGRAM CODE


# Load required packages
library(dplyr)
library(ggplot2)
library(gridExtra) # For arranging plots
library(tidyr)

# Simulated student data: 100 students, each assigned one of five names
set.seed(123) # For reproducibility
student_data <- data.frame(
  student_id = 1:100,
  name = sample(c("Rohan", "Aisha", "Sanya", "Vikram", "Raj"), 100, replace = TRUE)
)

# Generate math and science scores, sampled without replacement
# within each name group so that scores are unique per group
student_data <- student_data %>%
  group_by(name) %>%
  mutate(
    math_score = sample(40:95, n(), replace = FALSE),
    science_score = sample(40:100, n(), replace = FALSE)
  ) %>%
  ungroup()

# Data Cleaning
cleaned_data <- student_data %>%
  filter(!is.na(math_score) & !is.na(science_score)) %>%
  mutate(total_score = math_score + science_score)

# Descriptive Statistics
summary_stats <- cleaned_data %>%
  summarize(
    avg_math = mean(math_score),
    avg_science = mean(science_score),
    avg_total = mean(total_score),
    max_total = max(total_score)
  )
print(summary_stats)

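# Bar graph: Subject-wise scores comparison
# Reshape to long format: one row per student and subject
bar_data <- cleaned_data %>%
  gather(key = "subject", value = "score", math_score, science_score)

# Note: stat = "identity" draws one bar per row, so the bars for all
# students in the same subject are drawn on top of each other
ggplot(bar_data, aes(x = subject, y = score, fill = subject)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Subject-wise Scores Comparison",
       x = "Subject", y = "Score", fill = "Subject") +
  theme_minimal()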

# Scatter plot with linear regression line
ggplot(cleaned_data, aes(x = math_score, y = science_score)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Math vs. Science Scores",
       x = "Math Score", y = "Science Score")

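# Multiple Linear Regression
# Note: total_score is defined as math_score + science_score, so this
# model fits the data exactly and summary() may warn about a perfect fit
multi_reg <- lm(total_score ~ math_score + science_score, data = cleaned_data)

# Summary of the multiple linear regression model
summary(multi_reg)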

# Individual bar graphs of the scores recorded under each student name
individual_plots <- list()
for (student_name in unique(student_data$name)) {
  student_scores <- student_data %>%
    filter(name == student_name) %>%
    select(name, math_score, science_score) %>%
    gather(key = "subject", value = "score", math_score, science_score)

  individual_plots[[student_name]] <-
    ggplot(student_scores, aes(x = subject, y = score, fill = subject)) +
    geom_bar(stat = "identity", position = "dodge") +
    labs(title = paste("Scores for", student_name),
         x = "Subject", y = "Score", fill = "Subject") +
    theme_minimal()
}

# Combine and display all plots; grid.arrange() draws the layout itself,
# so no separate print() call is needed
grid.arrange(grobs = individual_plots, ncol = 2)

# Box plots for math and science scores
boxplot_data <- cleaned_data %>%
  gather(key = "subject", value = "score", math_score, science_score)

ggplot(boxplot_data, aes(x = subject, y = score, fill = subject)) +
  geom_boxplot() +
  labs(title = "Box Plot of Math and Science Scores",
       x = "Subject", y = "Score", fill = "Subject") +
  theme_minimal()

# Correlation matrix heatmap
correlation_matrix <- cor(cleaned_data[, c("math_score", "science_score", "total_score")])

# Reshape the matrix to long format (one row per pair of variables),
# giving the Var1, Var2, and value columns that the plot expects
correlation_long <- as.data.frame(as.table(correlation_matrix))
names(correlation_long) <- c("Var1", "Var2", "value")

correlation_matrix_plot <- ggplot(data = correlation_long,
                                  aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Correlation Matrix Heatmap",
       x = "Variable 1", y = "Variable 2", fill = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Display the heatmap
print(correlation_matrix_plot)

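# Residual plot for multiple linear regression
# Store the residuals in a data frame rather than in a variable named
# "residuals", which would shadow the residuals() function
residual_data <- data.frame(
  total_score = cleaned_data$total_score,
  residual = residuals(multi_reg)
)

ggplot(data = residual_data, aes(x = total_score, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red", linetype = "dashed") +
  labs(title = "Residual Plot for Multiple Linear Regression",
       x = "Total Score", y = "Residuals") +
  theme_minimal()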

# Histogram of total scores
ggplot(cleaned_data, aes(x = total_score)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black") +
  labs(title = "Histogram of Total Scores",
       x = "Total Score", y = "Frequency") +
  theme_minimal()

# Pairwise scatterplots
pairs(cleaned_data[, c("math_score", "science_score", "total_score")])


# ANOVA for math scores by student name
anova_math <- aov(math_score ~ name, data = cleaned_data)
summary(anova_math)

# ANOVA for science scores by student name
anova_science <- aov(science_score ~ name, data = cleaned_data)
summary(anova_science)
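
To use ANOVA results programmatically instead of reading them off the printed summary, the F statistic and p-value can be extracted from the summary object. The following is a minimal sketch on hypothetical data: the data frame `demo`, its group labels, and the seed are assumptions made for illustration and are not part of the project code.

```r
set.seed(42)  # arbitrary seed, used only for this sketch

# Hypothetical data: random scores assigned to five name groups
demo <- data.frame(
  name  = factor(rep(c("A", "B", "C", "D", "E"), each = 20)),
  score = sample(40:100, 100, replace = TRUE)
)

fit_aov <- aov(score ~ name, data = demo)

# summary() returns a list whose first element is the ANOVA table;
# row 1 is the grouping factor, row 2 holds the residuals
anova_table <- summary(fit_aov)[[1]]
f_stat  <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

cat("F =", round(f_stat, 3), " p =", round(p_value, 3), "\n")
```

The rows are indexed by position because the row names of the printed table are padded with spaces.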

Flowchart:

LIST OF FIGURES:

Fig 1: Residual plot for multiple linear regression

Fig 2: Box plot of math and science scores

Fig 3: Subject versus score comparison

Fig 4: Student score visualisation

Fig 5: Math vs. science scores


OUTPUT
            Df Sum Sq Mean Sq F value Pr(>F)
name         4    427   106.9   0.342  0.849
Residuals   95  29690   312.5
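
The entries in this table are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, the F value is the ratio of the two mean squares, and the p-value is the upper tail of the F distribution. A short check in R, using the rounded values as printed (so the results agree with the table only up to rounding), might look like this:

```r
# Values copied from the printed ANOVA table (rounded as reported)
df_name  <- 4
df_resid <- 95
ss_name  <- 427
ss_resid <- 29690

# Mean square = sum of squares / degrees of freedom
ms_name  <- ss_name / df_name
ms_resid <- ss_resid / df_resid

# F statistic = between-group mean square / residual mean square
f_value <- ms_name / ms_resid

# p-value = upper-tail probability of the F(4, 95) distribution
p_value <- pf(f_value, df_name, df_resid, lower.tail = FALSE)

print(c(ms_name = ms_name, ms_resid = ms_resid, F = f_value, p = p_value))
```

A p-value this far above the usual 0.05 threshold gives no evidence that mean scores differ between the name groups.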


OUTPUT SCREENSHOTS

Fig 1


Fig 2

Fig 3


Fig 4

Fig 5


CONCLUSION

In our project, we looked at how students are doing in their studies. We started by creating a
simulated group of 100 students with different scores in math and science. We removed any
records with missing scores and computed a total score for each student so that we could
study the data properly.

First, we computed descriptive statistics for the scores: the average math, science, and total
scores, and the highest total score.

Then, we used bar graphs, box plots, and a scatter plot with a regression line to show the
scores in math and science. This helped us compare the two subjects and check whether the
scores are related.

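We also fitted a multiple linear regression model that predicts a student's total score from
the math and science scores. Because the total is simply the sum of the two scores, the fit is
exact, so the model serves as a demonstration of the workflow rather than a genuine prediction.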
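
A regression of a total on the two scores it is summed from fits exactly, which the following minimal sketch verifies. It assumes a hypothetical simulated data frame `sim` with an arbitrary seed and score ranges; it is not the project's data or code.

```r
set.seed(1)  # arbitrary seed, used only for this sketch

# Hypothetical simulated scores (names and ranges are assumptions)
sim <- data.frame(
  math_score    = sample(40:95, 30),
  science_score = sample(40:100, 30)
)
sim$total_score <- sim$math_score + sim$science_score

# Regress the total on both subjects
fit <- lm(total_score ~ math_score + science_score, data = sim)

# Coefficients are numerically 0 for the intercept and 1 for each subject
print(round(coef(fit), 6))

# Residuals are zero up to floating-point error
print(max(abs(residuals(fit))))
```

A model with an outcome that is not built from its own predictors, for example `science_score ~ math_score`, would test an actual relationship between the subjects.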

We didn't just look at summary numbers. We also plotted the scores recorded under each student
name separately, which makes differences between the groups easier to see.

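Lastly, we ran one-way ANOVA tests to check whether scores differ significantly by student
name. The ANOVA table in the OUTPUT section (F = 0.342, p = 0.849) shows no statistically
significant difference between the name groups.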

In summary, our project showed that descriptive statistics, graphs, and simple models can help
us understand how students are doing in school. This can help teachers and schools make better
decisions to help students succeed. We know there's more to explore, and this project is just
the beginning.


