MADT_Homework 3

Identifying variables in WA school accountability data

Meredith Derian-Toth
2/21/2022

My research interest is in lower-performing schools and districts. I am interested in learning more about the learning environment and the community environment that may affect student learning and attendance. The variables in this dataset provide only some of this information: student performance data, student attendance data, and some demographic data.

library(readr)
WA_Edu_Improvment_2017_2019 <- read_csv("WA_School_Improvment_2017_-_2019_Runs.csv")
View(WA_Edu_Improvment_2017_2019)
dim(WA_Edu_Improvment_2017_2019)
[1] 72850    56
head(WA_Edu_Improvment_2017_2019)
# A tibble: 6 × 56
  `ESD Organization Id` `ESD Name`    `District Orga…` `District Code`
                  <dbl> <chr>                    <dbl>           <dbl>
1                100006 Puget Sound …           100229           17001
2                100003 Educational …           100278            6037
3                100009 Northwest Ed…           100142           31025
4                100009 Northwest Ed…           100159           31006
5                100009 Northwest Ed…           100142           31025
6                100007 Educational …           100195           11001
# … with 52 more variables: `District Name` <chr>,
#   `School Code` <dbl>, `School Name` <chr>,
#   `School Organization Id` <dbl>, `School Type` <chr>,
#   `Student Group` <chr>, `Proficiency ELA Numerator` <chr>,
#   `Proficiency ELA Denominator` <dbl>,
#   `Proficiency ELA Rate` <chr>, `Proficiency ELA Decile` <dbl>,
#   `Proficiency Math Numerator` <chr>, …

There are many variables in this dataset. A few that I will focus on are:

District Code and School Code are discrete integer identifiers (read_csv stores them as <dbl>).

Student Group is a categorical text variable that identifies students by race, ethnicity, low-income status, or status as an English language learner.

Proficiency ELA Rate, Proficiency Math Rate, Regular Attendance Rate, and Grade FourYear Rate are all continuous variables, although the preview above shows Proficiency ELA Rate read in as character (<chr>).
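
To check how columns like these are stored, one option is to inspect their classes and try a numeric conversion. The sketch below uses a small made-up tibble rather than the real file; the values and the "Suppressed" placeholder are assumptions for illustration only.

```r
library(tibble)

# Toy stand-in for the real data; all values here are hypothetical.
toy <- tibble(
  `School Code` = c(1001, 1002, 1003),
  `Student Group` = c("All Students", "Low Income", "All Students"),
  `Proficiency ELA Rate` = c("0.61", "Suppressed", "0.48")
)

# Class of each column: a rate column holding any text is stored as character.
col_classes <- sapply(toy, class)
print(col_classes)

# Converting to numeric turns non-numeric entries into NA (normally with a warning).
ela_rate <- suppressWarnings(as.numeric(toy$`Proficiency ELA Rate`))
print(ela_rate)
```

Running the same sapply() call on the full dataset would show which of the rate columns need converting before they can be treated as continuous.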

Potential research questions are:

Do schools with low attendance rates also have low 4-year graduation rates?

Do schools with low attendance rates also perform lower on ELA or Math proficiency assessments?

Is there a relationship between schools’ student group populations and their attendance rates and/or academic proficiency scores?
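
A first look at questions like these could be a simple correlation between two school-level rates. This is only a sketch: the two vectors below are made-up numbers on a 0 to 1 scale, not values from the dataset, and a real analysis would first need the rate columns in numeric form.

```r
# Hypothetical school-level rates; not real data.
attendance <- c(0.95, 0.88, 0.72, 0.91, NA, 0.80)
graduation <- c(0.93, 0.85, 0.70, 0.90, 0.75, NA)

# Pearson correlation using only schools where both values are present.
r <- cor(attendance, graduation, use = "complete.obs")
print(r)
```

The use = "complete.obs" argument drops any school with a missing value in either vector, which matters here because some rates in the real data may be unavailable.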

library("tidyverse")
WA_Edu_Improvment_2017_2019_Filtered2<-WA_Edu_Improvment_2017_2019 %>%
  filter(!grepl('Suppress | N',"Regular Attendance Rate"))%>%
ggplot(aes("Student Group", "Regular Attendance Rate")) + 
  geom_boxplot()

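The code above is my attempt to plot the attendance rate for the different student groups, but I can’t get it to work. I think this is probably because the Regular Attendance Rate column contains both numbers and text, and I have not managed to filter out the text. The quoted column names may also be part of the problem: in quotes, grepl() and aes() receive the literal strings "Regular Attendance Rate" and "Student Group" rather than the columns, which would need backticks instead. I will come back to this problem.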
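
One possible way forward is sketched here on a small made-up tibble rather than the real file. The values and the "Suppressed" label are assumptions; the actual text entries in the column may look different. The idea is to refer to columns with backticks, convert the rate to numeric so that text entries become NA, and drop those rows before plotting.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the real data; values and the "Suppressed" label are hypothetical.
toy <- tibble::tibble(
  `Student Group` = c("All Students", "All Students",
                      "Low Income", "Low Income", "Low Income"),
  `Regular Attendance Rate` = c("0.91", "0.87", "0.78", "Suppressed", "0.82")
)

plot_data <- toy %>%
  # Backticks refer to the column; quotes would pass a literal string.
  mutate(`Regular Attendance Rate` =
           suppressWarnings(as.numeric(`Regular Attendance Rate`))) %>%
  # Drop rows whose rate could not be read as a number.
  filter(!is.na(`Regular Attendance Rate`))

p <- ggplot(plot_data, aes(`Student Group`, `Regular Attendance Rate`)) +
  geom_boxplot()
print(p)
```

If the real column stores rates with a percent sign or other formatting, as.numeric() would return NA for every row, so the text would need cleaning before this approach could work.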


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Derian-Toth (2022, Feb. 23). Data Analytics and Computational Social Science: MADT_Homework 3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommderiantothdacss601hw2mdt/

BibTeX citation

@misc{derian-toth2022madt_homework,
  author = {Derian-Toth, Meredith},
  title = {Data Analytics and Computational Social Science: MADT_Homework 3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommderiantothdacss601hw2mdt/},
  year = {2022}
}