Data Wrangling
spec_tbl_df [1,000 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ gender : chr [1:1000] "female" "female" "female" "male" ...
$ race/ethnicity : chr [1:1000] "group B" "group C" "group B" "group A" ...
$ parental level of education: chr [1:1000] "bachelor's degree" "some college" "master's degree" "associate's degree" ...
$ lunch : chr [1:1000] "standard" "standard" "standard" "free/reduced" ...
$ test preparation course : chr [1:1000] "none" "completed" "none" "none" ...
$ math score : num [1:1000] 72 69 90 47 76 71 88 40 64 38 ...
$ reading score : num [1:1000] 72 90 95 57 78 83 95 43 64 60 ...
$ writing score : num [1:1000] 74 88 93 44 75 78 92 39 67 50 ...
- attr(*, "spec")=
.. cols(
.. gender = col_character(),
.. `race/ethnicity` = col_character(),
.. `parental level of education` = col_character(),
.. lunch = col_character(),
.. `test preparation course` = col_character(),
.. `math score` = col_double(),
.. `reading score` = col_double(),
.. `writing score` = col_double()
.. )
- attr(*, "problems")=<externalptr>
                     gender              race/ethnicity 
                          0                           0 
parental level of education                       lunch 
                          0                           0 
    test preparation course                  math score 
                          0                           0 
              reading score               writing score 
                          0                           0 
The counts above are all zero, so the dataset has no missing values and is already clean.
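The call that produced the per-column counts is not shown in this section. A minimal sketch of one common way to get such counts, assuming they are missing-value counts computed with `colSums(is.na(...))`; the `toy` data frame and its values are made up for illustration and deliberately contain gaps so that the check has something to find:

```r
# Toy stand-in for `student`; values are invented for illustration only.
toy <- data.frame(
  gender = c("female", "male", NA),
  `math score` = c(72, NA, 90),
  check.names = FALSE  # keep the space in the column name
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# TRUE only when no column contains a missing value.
is_clean <- all(missing_per_column == 0)
print(is_clean)
```

On a data frame without gaps, every count is 0 and `is_clean` is `TRUE`.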
# Total and mean marks across the three subjects
student$total_marks <- student$math_marks + student$reading_marks + student$writing_marks
student$mean_marks <- round(student$total_marks / 3, 2)
# Assign a letter grade from the mean mark and store it as a factor
student <- student %>%
  mutate(
    grade = case_when(
      mean_marks >= 90 & mean_marks <= 100 ~ "A",
      mean_marks >= 80 & mean_marks < 90 ~ "B",
      mean_marks >= 70 & mean_marks < 80 ~ "C",
      mean_marks >= 60 & mean_marks < 70 ~ "D",
      mean_marks >= 50 & mean_marks < 60 ~ "E",
      mean_marks < 50 ~ "F"
    ) %>% as.factor()
  )
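The grade bands above are consecutive half-open intervals, so the same binning can also be sketched with base R's `cut()`. This is an alternative formulation, not the code used in the post; the `mean_marks` vector below is a standalone toy vector. One difference to keep in mind: with an upper break of `Inf`, a value above 100 would be graded "A" here, whereas the `case_when()` version would return `NA`.

```r
# Toy mean marks, including the boundary values 90 and 50.
mean_marks <- c(72.67, 82.33, 92.67, 49.33, 90, 50)

# right = FALSE makes each interval closed on the left: [50, 60), [60, 70), ...
grade <- cut(
  mean_marks,
  breaks = c(-Inf, 50, 60, 70, 80, 90, Inf),
  labels = c("F", "E", "D", "C", "B", "A"),
  right = FALSE
)

# Reorder the levels so that "A" comes first, as as.factor() would do alphabetically.
grade <- factor(grade, levels = c("A", "B", "C", "D", "E", "F"))
print(as.character(grade))
# "C" "B" "A" "F" "A" "E"
```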
Let's have a look at our data again:
str(student)
tibble [1,000 × 11] (S3: tbl_df/tbl/data.frame)
$ gender : Factor w/ 2 levels "female","male": 1 1 1 2 2 1 1 2 2 1 ...
$ race_ethnicity_group : Factor w/ 5 levels "group A","group B",..: 2 3 2 1 3 2 2 2 4 2 ...
$ parent_highest_education: Factor w/ 6 levels "some high school",..: 5 3 6 4 3 4 3 3 2 2 ...
$ lunch : Factor w/ 2 levels "free/reduced",..: 2 2 2 1 2 2 2 1 1 1 ...
$ test_preparation_course : Factor w/ 2 levels "completed","none": 2 1 2 2 2 2 1 2 1 2 ...
$ math_marks : num [1:1000] 72 69 90 47 76 71 88 40 64 38 ...
$ reading_marks : num [1:1000] 72 90 95 57 78 83 95 43 64 60 ...
$ writing_marks : num [1:1000] 74 88 93 44 75 78 92 39 67 50 ...
$ total_marks : num [1:1000] 218 247 278 148 229 232 275 122 195 148 ...
$ mean_marks : num [1:1000] 72.7 82.3 92.7 49.3 76.3 ...
$ grade : Factor w/ 6 levels "A","B","C","D",..: 3 2 1 6 3 3 1 6 4 6 ...
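The structure above shows renamed columns (for example `math_marks` instead of `math score`) and character columns converted to factors, but the step that did this is not shown in this section. A minimal base R sketch of one way it could be done: the three rows are the first values visible in the `str()` output, the level order for parental education is taken from the `summary()` output further down, and the name mapping in `new_names` is an assumption about how the renaming was expressed.

```r
# First three rows of three columns, as shown in the original str() output.
toy <- data.frame(
  `race/ethnicity` = c("group B", "group C", "group B"),
  `parental level of education` = c("bachelor's degree", "some college", "master's degree"),
  `math score` = c(72, 69, 90),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Map old column names to new ones (hypothetical mapping, inferred from the output).
new_names <- c(
  "race/ethnicity" = "race_ethnicity_group",
  "parental level of education" = "parent_highest_education",
  "math score" = "math_marks"
)
names(toy) <- unname(new_names[names(toy)])

# Explicit level order, lowest to highest education.
edu_levels <- c(
  "some high school", "high school", "some college",
  "associate's degree", "bachelor's degree", "master's degree"
)
toy$parent_highest_education <- factor(toy$parent_highest_education, levels = edu_levels)
toy$race_ethnicity_group <- as.factor(toy$race_ethnicity_group)

str(toy)
```

With this level order the integer codes of `parent_highest_education` for the three rows are 5, 3 and 6, which agrees with the codes printed in the structure above. The same steps could equally be written with dplyr's `rename()` and `mutate()`.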
summary(student)
    gender    race_ethnicity_group       parent_highest_education
 female:518   group A: 89          some high school  :179
 male  :482   group B:190          high school       :196
              group C:319          some college      :226
              group D:262          associate's degree:222
              group E:140          bachelor's degree :118
                                   master's degree   : 59
          lunch     test_preparation_course   math_marks
 free/reduced:355   completed:358           Min.   :  0.00
 standard    :645   none     :642           1st Qu.: 57.00
                                            Median : 66.00
                                            Mean   : 66.09
                                            3rd Qu.: 77.00
                                            Max.   :100.00
 reading_marks    writing_marks     total_marks      mean_marks
 Min.   : 17.00   Min.   : 10.00   Min.   : 27.0   Min.   :  9.00
 1st Qu.: 59.00   1st Qu.: 57.75   1st Qu.:175.0   1st Qu.: 58.33
 Median : 70.00   Median : 69.00   Median :205.0   Median : 68.33
 Mean   : 69.17   Mean   : 68.05   Mean   :203.3   Mean   : 67.77
 3rd Qu.: 79.00   3rd Qu.: 79.00   3rd Qu.:233.0   3rd Qu.: 77.67
 Max.   :100.00   Max.   :100.00   Max.   :300.0   Max.   :100.00
 grade
 A: 52
 B:146
 C:261
 D:256
 E:182
 F:103
write.csv(student, "./student_final_data.csv")
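Two defaults of `write.csv()` are worth keeping in mind when the file is read back later: row names are written as an extra unnamed first column unless `row.names = FALSE` is passed, and factors are stored as plain text, so any custom level order is lost. A small round-trip sketch on a made-up data frame, writing to a temporary file rather than to `./student_final_data.csv`:

```r
# Toy data with one factor column; values are invented for illustration.
toy <- data.frame(
  gender = factor(c("female", "male")),
  mean_marks = c(72.67, 49.33)
)

# Write without the row-name column, to a temporary path.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read back; stringsAsFactors = TRUE turns text columns into factors again,
# with levels in alphabetical order.
restored <- read.csv(path, stringsAsFactors = TRUE)
str(restored)
```

If a specific level order matters (as it does for parental education), it has to be reapplied with `factor(..., levels = ...)` after reading.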
Which gender performs better on average?
How do students who completed the test preparation course perform compared with those who did not?
How much does the parents’ highest level of education affect their child’s performance?
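Each of these questions can be approached, as a first step, by comparing group means of `mean_marks`. A minimal base R sketch using `aggregate()`; the `toy` data frame holds made-up numbers chosen only to make the arithmetic easy to follow, so its output says nothing about the real dataset:

```r
# Invented values for illustration; not taken from the student data.
toy <- data.frame(
  gender = c("female", "female", "male", "male"),
  test_preparation_course = c("completed", "none", "completed", "none"),
  mean_marks = c(80, 70, 75, 65)
)

# Mean of mean_marks within each gender.
by_gender <- aggregate(mean_marks ~ gender, data = toy, FUN = mean)
print(by_gender)

# Mean of mean_marks by preparation-course status.
by_prep <- aggregate(mean_marks ~ test_preparation_course, data = toy, FUN = mean)
print(by_prep)
```

The same pattern applies to `parent_highest_education`. A difference in group means is only descriptive; deciding whether it is meaningful would need a further look at spread and group sizes.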
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Dhal (2022, May 19). Data Analytics and Computational Social Science: HW3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscompdhal27hw3/
BibTeX citation
@misc{dhal2022hw3,
  author = {Dhal, Pragyanta},
  title = {Data Analytics and Computational Social Science: HW3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscompdhal27hw3/},
  year = {2022}
}