Better visualizations
```r
library(tidyverse)  # provides read_csv(), the dplyr verbs and ggplot2

student <- read_csv("./student_final_data.csv")

# Mean total marks per ethnic group, with error bars of +/- 1 standard error
student %>%
  group_by(race_ethnicity_group) %>%
  summarize(freq = n(),
            mean = mean(total_marks),
            sd   = sd(total_marks),
            se   = sd / sqrt(freq)) %>%
  ggplot(aes(x = race_ethnicity_group,
             y = mean,
             color = race_ethnicity_group)) +
  geom_errorbar(aes(ymin = mean - se,
                    ymax = mean + se)) +
  geom_point() +
  labs(title = "Visualizing uncertainty around estimation of total marks by ethnic group",
       y = "mean of total marks")
```
Observation - Students in ethnic group E had the highest mean total marks of all the groups, performing better on average than students in the other groups.
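The error bars in this plot span one standard error (SE = sd / sqrt(n)) on each side of the group mean. A 95% confidence interval is roughly twice as wide, so non-overlapping ±1 SE bars are not, by themselves, evidence of a significant difference. A minimal base-R sketch of a formal check, assuming a data frame with the `race_ethnicity_group` and `total_marks` columns used above; the toy scores are invented for illustration and are not the course data:

```r
# Toy scores invented for illustration; NOT the student_final_data.csv values
toy <- data.frame(
  race_ethnicity_group = rep(c("group A", "group E"), each = 5),
  total_marks = c(180, 195, 210, 200, 190,
                  220, 235, 210, 240, 225)
)

# Per-group n, mean, sd and standard error, mirroring the summarize() step above
freq <- tapply(toy$total_marks, toy$race_ethnicity_group, length)
avg  <- tapply(toy$total_marks, toy$race_ethnicity_group, mean)
sdev <- tapply(toy$total_marks, toy$race_ethnicity_group, sd)
se   <- sdev / sqrt(freq)

# Welch two-sample t-test (the t.test() default): group E versus all other groups
is_e <- toy$race_ethnicity_group == "group E"
res  <- t.test(toy$total_marks[is_e], toy$total_marks[!is_e])

print(rbind(mean = avg, se = se))
print(res$p.value)
```

On the real data, the same comparison would run over all five groups, with `!is_e` covering groups A to D together.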
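```r
# Summarize first: geom_col() on raw per-student rows would draw one overlapping bar
# per student, so the visible height would be the maximum, not the average
student %>%
  group_by(lunch, race_ethnicity_group, test_preparation_course) %>%
  summarize(avg_marks = mean(mean_marks), .groups = "drop") %>%
  ggplot(aes(x = race_ethnicity_group, y = avg_marks, fill = test_preparation_course)) +
  geom_col(position = "dodge") +
  facet_wrap(~lunch) +
  labs(title = "Scores by Ethnic Background for Free/Reduced and Standard Lunch",
       x = "Ethnic Background",
       y = "Average Score") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
```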
Observation - In most groups, students who completed the test preparation course had a higher average score than those who did not. The exception is students with standard lunch in groups D and E, for whom the test preparation course made little visible difference.
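Reading differences off dodged bars is approximate. A minimal base-R sketch that tabulates the gap between course completers and non-completers in each lunch and ethnic-group cell, assuming the `lunch`, `race_ethnicity_group`, `test_preparation_course` and `mean_marks` columns from the plotting code. The level names "completed" and "none" and all toy values are assumptions for illustration:

```r
# Toy rows invented for illustration; "completed"/"none" are assumed level names
toy <- data.frame(
  lunch = rep(c("standard", "free/reduced"), each = 4),
  race_ethnicity_group = rep(c("group A", "group E"), times = 4),
  test_preparation_course = rep(c("completed", "completed", "none", "none"), times = 2),
  mean_marks = c(72, 80, 65, 79, 62, 70, 55, 64)
)

# Average marks per lunch x group cell, separately for each course status
done <- aggregate(mean_marks ~ lunch + race_ethnicity_group,
                  data = toy[toy$test_preparation_course == "completed", ], FUN = mean)
none <- aggregate(mean_marks ~ lunch + race_ethnicity_group,
                  data = toy[toy$test_preparation_course == "none", ], FUN = mean)

# Join the two tables; a positive gap means course takers scored higher
gaps <- merge(done, none, by = c("lunch", "race_ethnicity_group"),
              suffixes = c("_completed", "_none"))
gaps$gap <- gaps$mean_marks_completed - gaps$mean_marks_none
print(gaps)
```

A table like this makes "little visible difference" concrete: a cell whose gap is close to zero is one where the course did not separate the two groups of students.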
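```r
# Reshape to one row per student and subject, then compare genders within each subject
student %>%
  select(gender, math_marks, reading_marks, writing_marks) %>%
  pivot_longer(-gender, names_to = "subject", values_to = "marks") %>%
  ggplot(aes(x = gender, y = marks, fill = gender)) +
  geom_boxplot() +
  facet_grid(~subject) +
  labs(title = "Marks by Gender", x = "Gender", y = "Marks")
```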
Observation - Female students had higher median marks in reading and writing, whereas male students had higher median marks in math.
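The centre line of each box is the median, so the comparison above is a comparison of medians. A minimal base-R sketch that computes those medians directly, assuming the `gender`, `math_marks`, `reading_marks` and `writing_marks` columns selected above; the toy marks are invented for illustration:

```r
# Toy marks invented for illustration
toy <- data.frame(
  gender        = c("female", "female", "female", "male", "male", "male"),
  math_marks    = c(60, 65, 70, 68, 72, 75),
  reading_marks = c(75, 80, 78, 66, 70, 69),
  writing_marks = c(74, 79, 81, 62, 68, 65)
)

# Median of each subject by gender: rows are genders, columns are subjects
subjects <- c("math_marks", "reading_marks", "writing_marks")
medians  <- sapply(subjects, function(s) tapply(toy[[s]], toy$gender, median))
print(medians)
```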
```r
# One box per education level for each subject; coord_flip() keeps long labels readable
student %>%
  select(parent_highest_education, math_marks, reading_marks, writing_marks) %>%
  pivot_longer(-parent_highest_education, names_to = "subject", values_to = "marks") %>%
  ggplot(aes(x = parent_highest_education, y = marks, fill = parent_highest_education)) +
  geom_boxplot() +
  facet_grid(~subject) +
  labs(title = "Marks distribution as per parent highest education level",
       x = "Parents' highest education level", y = "Marks") +
  theme(panel.spacing = unit(1, "lines")) +
  coord_flip()
```
Observation - Students whose parents' highest education level is a master's degree had the highest median marks in reading, writing, and math.
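Because `parent_highest_education` is read as text, ggplot2 orders its levels alphabetically, which hides the ranking described above. One option is base R's `reorder()`, which sorts factor levels by a summary of another variable. The sketch below orders levels by median `total_marks` (a single ordering is needed because all facets share one axis). In the pipeline above this could be a `mutate(parent_highest_education = reorder(parent_highest_education, total_marks, FUN = median))` step before `select()`. That step is a suggestion, not part of the original analysis, and the toy values are invented:

```r
# Toy values invented for illustration
toy <- data.frame(
  parent_highest_education = c("high school", "high school", "master's degree",
                               "master's degree", "some college", "some college"),
  total_marks = c(175, 185, 225, 235, 195, 205)
)

# Sort levels by ascending median total_marks; with coord_flip() the first level
# is drawn at the bottom, so the highest-scoring group ends up on top
toy$parent_highest_education <- reorder(toy$parent_highest_education,
                                        toy$total_marks, FUN = median)
print(levels(toy$parent_highest_education))
```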
At this point, we have answered how parents' highest education level relates to their child's performance. We have also answered which gender performed better on average: the Marks by Gender plot shows that female students performed better in reading and writing, whereas male students performed better in math. Finally, we investigated the importance of the test preparation course and found that it helped students in ethnic groups A, B, and C the most.
First, I would like to analyze the grade distribution of students by gender. Second, is there a link between a student's lunch type and their average grade? Students who received free or reduced-price lunch were likely from low-income homes, so it would be interesting to compare them with students from higher-income families.
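A minimal base-R sketch of how both questions could be approached, assuming the `gender`, `lunch` and `mean_marks` columns used earlier and marks on a 0 to 100 scale. The letter-grade cut-offs are a hypothetical choice, not taken from the dataset, and the toy rows are invented:

```r
# Toy rows invented for illustration
toy <- data.frame(
  gender     = c("female", "female", "male", "male", "female", "male"),
  lunch      = c("standard", "free/reduced", "standard", "free/reduced",
                 "standard", "standard"),
  mean_marks = c(92, 68, 85, 55, 78, 71)
)

# Hypothetical letter-grade cut-offs; right = FALSE makes each bin [lower, upper)
toy$grade <- cut(toy$mean_marks, breaks = c(0, 60, 70, 80, 90, 101),
                 labels = c("F", "D", "C", "B", "A"), right = FALSE)

# Question 1: share of each grade within each gender (each row sums to 1)
grade_dist <- prop.table(table(toy$gender, toy$grade), margin = 1)
print(round(grade_dist, 2))

# Question 2: average marks by lunch type
lunch_avg <- tapply(toy$mean_marks, toy$lunch, mean)
print(lunch_avg)
```

Proportions rather than raw counts keep the gender comparison fair if the two groups differ in size.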
The graphs are designed so that the end user is not overwhelmed with too much information; overall, the visuals are user-friendly.
Would like to answer - I would like to analyze the number of hours students spent in the test preparation course, to examine the relationship between the hours students dedicated and the marks they received.
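The current data appears to record only whether the course was completed, so this question needs an hours variable. Assuming a hypothetical `prep_hours` column existed, a simple linear regression would be a natural first step. In the sketch below, both `prep_hours` and the toy values are invented for illustration:

```r
# Hypothetical prep_hours column and invented marks, for illustration only
toy <- data.frame(
  prep_hours = c(0, 2, 4, 6, 8, 10),
  mean_marks = c(61, 63, 69, 71, 77, 79)
)

# Ordinary least squares fit: mean_marks = intercept + slope * prep_hours
fit <- lm(mean_marks ~ prep_hours, data = toy)
print(coef(fit))

# Predicted marks for 5 hours of preparation (inside the observed 0-10 range)
print(predict(fit, newdata = data.frame(prep_hours = 5)))
```

Predictions are most trustworthy inside the range of hours actually observed; extending the fitted line far beyond it assumes the relationship stays linear, which the data could not confirm.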
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Dhal (2022, May 19). Data Analytics and Computational Social Science: HW5. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscompdhal27hw5/
BibTeX citation
@misc{dhal2022hw5, author = {Dhal, Pragyanta}, title = {Data Analytics and Computational Social Science: HW5}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscompdhal27hw5/}, year = {2022} }