A Review of Massachusetts 12th Grade Dropout Rates v. Teacher & Guidance Salaries
The Massachusetts Department of Elementary and Secondary Education publishes data on high school dropout rates as well as teacher pay rates. These are made available as separate datasets (https://www.doe.mass.edu/SchDistrictData.html).
My goal was to examine the most recent available data on dropout rates by high school against the teacher and guidance department pay rates to determine if there was a correlation.
The dropout data was provided at two levels, district and individual school, and covered the years 2010 through 2019.
Given the variation in school experience within districts, the school level data was of more interest than the district level data. Ultimately, I was interested in the most recent school level data.
This reduces the dataset to five columns: district name, school name, the 2018-19 dropout rate, high school enrollment, and the total dropout count for the same period.
For reference, I also created a dataset of just district level dropout information.
# Verify library path
.libPaths()
[1] "C:/Users/theresa/Documents/R/win-library/4.1"
[2] "C:/Program Files/R/R-4.1.2/library"
# set the working directory to be the same as the library path
setwd("C:/Users/theresa/Documents/R/win-library/4.1")
# verify the working directory
getwd()
[1] "C:/Users/theresa/Documents/R/win-library/4.1"
# Install the tidyverse and readxl packages, explicitly specifying the CRAN repository URL. This works around a mirror error.
# install.packages("tidyverse", repos = "http://cran.us.r-project.org")
# install.packages("readxl", repos = "http://cran.us.r-project.org")
# load the necessary libraries for the processing
library(tidyverse)
library(dplyr)
library(readxl)
library(readr)
library(stringr)
library(ggplot2)
# Load in the files and display them for clarification.
mass_dropout_rates_gr12 <- read_csv("c:/users/theresa/Documents/DACSS Local/DataSets/DropOutRates2018-19.csv", skip = 1)
# To create easier field names, replace the spaces with a dash in the original column names.
names(mass_dropout_rates_gr12) <- str_replace_all(names(mass_dropout_rates_gr12), c(" " = "-"))
# Keep only the columns needed for 2018-19
mass_dropout_2018_2019 <- mass_dropout_rates_gr12 %>%
  select(`District-Name`, `School-Name`, `2018-19`, `High-School-Enrollment`, `Total-Dropout-Count`)
# School-level rows: everything that is not a district summary row
school_dropout_rates_2018_2019 <- mass_dropout_2018_2019 %>%
  filter(!grepl('District', `School-Name`)) %>%
  arrange(desc(`2018-19`))
# District-level rows: the summary rows whose School-Name contains 'District'
district_dropout_rates_2018_2019 <- mass_dropout_2018_2019 %>%
  filter(grepl('District', `School-Name`)) %>%
  select(`District-Name`, `2018-19`, `High-School-Enrollment`, `Total-Dropout-Count`) %>%
  arrange(desc(`High-School-Enrollment`))
head(mass_dropout_2018_2019)
# A tibble: 6 x 5
`District-Name` `School-Name` `2018-19` `High-School-Enrollment`
<chr> <chr> <dbl> <dbl>
1 Abington District Results 0.6 540
2 Abington Abington High 0.6 540
3 Agawam District Results 1.7 1104
4 Agawam Agawam High 1.7 1104
5 Amesbury District Results 2 611
6 Amesbury Amesbury High 1.4 560
# ... with 1 more variable: `Total-Dropout-Count` <dbl>
dim(school_dropout_rates_2018_2019)
[1] 405 5
head(school_dropout_rates_2018_2019)
# A tibble: 6 x 5
`District-Name` `School-Name` `2018-19` `High-School-E~`
<chr> <chr> <dbl> <dbl>
1 Springfield Liberty Prep~ 66.7 9
2 Springfield Springfield ~ 43.7 231
3 Brockton Edison Acade~ 38.7 238
4 Boston Day and Evening Aca~ Boston Day a~ 36.8 421
5 Lowell Middlesex Academy C~ Lowell Middl~ 36.6 82
6 Brockton Frederick Do~ 36.4 11
# ... with 1 more variable: `Total-Dropout-Count` <dbl>
dim(district_dropout_rates_2018_2019)
[1] 308 4
head(district_dropout_rates_2018_2019)
# A tibble: 6 x 4
`District-Name` `2018-19` `High-School-Enrollment` `Total-Dropout-~`
<chr> <dbl> <dbl> <dbl>
1 Boston 4.2 15035 631
2 Worcester 2.6 7158 184
3 Springfield 4.4 6955 309
4 Lynn 4.7 4559 216
5 Brockton 3.9 4419 171
6 Newton 0.3 4016 12
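As a quick sanity check on the district rows printed above, the reported `2018-19` rate appears to equal 100 times the dropout count divided by enrollment, rounded to one decimal. The sketch below re-enters the six printed rows by hand (the data frame and column names are my own, not from the source files) and only confirms the relationship for these rows; it is not taken from the department's documentation.

```r
# Six district rows copied from the head() output above
district_check <- data.frame(
  district      = c("Boston", "Worcester", "Springfield", "Lynn", "Brockton", "Newton"),
  reported_rate = c(4.2, 2.6, 4.4, 4.7, 3.9, 0.3),
  enrollment    = c(15035, 7158, 6955, 4559, 4419, 4016),
  dropouts      = c(631, 184, 309, 216, 171, 12)
)

# Recompute the rate as a percentage of enrollment, rounded to one decimal
district_check$computed_rate <- round(
  100 * district_check$dropouts / district_check$enrollment, 1
)

# TRUE when every recomputed rate matches the reported one
rates_match <- isTRUE(all.equal(district_check$computed_rate,
                                district_check$reported_rate))
print(district_check)
print(rates_match)
```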
The teacher data included the 2018-2019 salary information for all school levels, pre-K through 12. It also included many other operational expenditures for each school.
The financial data for this analysis needed to be reduced to only the relevant columns: district name, school name, teachers per 100 students, average teacher salary, and guidance department salary (the `Guidance & Psych` column).
preliminary_school_data_hs <- read_csv("c:/users/theresa/Documents/DACSS Local/DataSets/preliminary-school-ppx.csv", skip = 3)
# To create easier field names, replace the spaces with a dash in the original column names.
names(preliminary_school_data_hs) <- str_replace_all(names(preliminary_school_data_hs), c(" " = "-"))
# Reduce the data set to the relevant columns
preliminary_school_data_g12 <- preliminary_school_data_hs %>%
  filter(grepl('9-12', `Grade-Level`)) %>%
  select(District...2, `School-Name`, `Teachers-Per-100-Students`, `Avg-Teacher-Salary`, `Guidance-&-Psych...22`)
# Convert the salary numbers from character to numeric
preliminary_school_data_g12$`Avg-Teacher-Salary` <- parse_number(preliminary_school_data_g12$`Avg-Teacher-Salary`)
preliminary_school_data_g12$`Guidance-&-Psych...22` <- parse_number(preliminary_school_data_g12$`Guidance-&-Psych...22`)
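# If the salary wasn't provided (was non-numeric), replace it with the median of all the other salary data.
# The median has to be computed explicitly: passing the string 'median' to replace_na() inserts that literal text
# and turns the column into character, which is what the printed output reflects.
preliminary_school_data_g12 <- preliminary_school_data_g12 %>%
  mutate(`Avg-Teacher-Salary` = replace_na(`Avg-Teacher-Salary`, median(`Avg-Teacher-Salary`, na.rm = TRUE)))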
preliminary_school_data_g12
# A tibble: 268 x 5
District...2 `School-Name` `Teachers-Per-~` `Avg-Teacher-S~`
<chr> <chr> <chr> <chr>
1 Abington Abington High 6.6 88445
2 Agawam Agawam High 8.4 82804
3 Amesbury Amesbury High 9.0 78508
4 Amesbury Amesbury Innovation~ 8.3 85348
5 Andover Andover High 7.2 89805
6 Arlington Arlington High 7.5 72903
7 Ashland Ashland High 6.8 83551
8 Attleboro Attleboro High 6.5 90712
9 Attleboro Attleboro Community~ 2.5 median
10 Bedford Bedford High 9.6 98994
# ... with 258 more rows, and 1 more variable:
# `Guidance-&-Psych...22` <dbl>
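The median replacement described in the comment above can be illustrated on a small vector. This is a minimal sketch, assuming the tidyr package is installed; the three salaries are taken from the printed rows (Abington, Agawam, Attleboro High) and the `NA` stands in for a school with no reported salary. The vector names are hypothetical.

```r
library(tidyr)

# Three reported salaries plus one missing value
salaries <- c(88445, 82804, NA, 90712)

# Compute the median of the non-missing values first ...
salary_median <- median(salaries, na.rm = TRUE)

# ... then use it as the replacement value for the NA entries
filled <- replace_na(salaries, salary_median)

print(salary_median)
print(filled)
```

Because the replacement is a number rather than a string, the vector stays numeric, which matters for plotting it on a continuous axis later.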
# create a single dataset with dropout rates and salary data for 2018-2019.
dropout_data_with_salary_info <- merge(school_dropout_rates_2018_2019,preliminary_school_data_g12, by.x = 'School-Name', by.y ='School-Name')
head(dropout_data_with_salary_info)
School-Name District-Name 2018-19
1 Abington High Abington 0.6
2 Acton-Boxborough Regional High Acton-Boxborough 0.4
3 Agawam High Agawam 1.7
4 Algonquin Regional High Northboro-Southboro 0.3
5 Amesbury High Amesbury 1.4
6 Amesbury Innovation High School Amesbury 7.8
High-School-Enrollment Total-Dropout-Count District...2
1 540 3 Abington
2 1834 8 Acton-Boxborough
3 1104 19 Agawam
4 1440 4 Northboro-Southboro
5 560 8 Amesbury
6 51 4 Amesbury
Teachers-Per-100-Students Avg-Teacher-Salary Guidance-&-Psych...22
1 6.6 88445 540
2 6.9 91373 981
3 8.4 82804 716
4 8.3 91342 721
5 9.0 78508 884
6 8.3 85348 1719
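The merge above matches on school name alone. I have not checked whether any school name occurs in more than one district in these files, but if it did, a name-only merge would pair rows across districts. The sketch below uses invented toy data (all table and column names are hypothetical) to show the difference between joining on the school name only and joining on district plus school name, and uses `anti_join()` to list schools that have no salary record.

```r
library(dplyr)

# Toy data: "Central High" exists in two different districts
dropouts <- tibble(
  district = c("A", "B", "B"),
  school   = c("Central High", "Central High", "North High"),
  rate     = c(1.0, 5.0, 2.0)
)
salaries <- tibble(
  district = c("A", "B"),
  school   = c("Central High", "Central High"),
  salary   = c(80000, 70000)
)

# Name-only merge: each Central High row pairs with both salary rows
by_name_only <- merge(dropouts, salaries, by = "school")

# Joining on both keys keeps one row per real school
by_both <- inner_join(dropouts, salaries, by = c("district", "school"))

# Schools in the dropout data with no salary record
unmatched <- anti_join(dropouts, salaries, by = c("district", "school"))

print(nrow(by_name_only))
print(by_both)
print(unmatched)
```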
# This was a very frustrating attempt to define a new column that classified the range of dropouts.
#dropout_data_with_salary_info <- dropout_data_with_salary_info %>%
# mutate(dropout-level = if_else(.$`2018-19` >= 10, "HIGH", (if_else(.$`2018-19` >= 7, "MODERATE", #(if_else(.$`2018-19` >5, "LOW MODERATE", (if_else(.$`2018-19` >3, "LOW", "EXTREMELY LOW") ) )) ))))
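One reason the commented-out attempt does not parse is that a column name containing a dash, such as `dropout-level`, needs backticks. A sketch of one way to write the classification is shown below, using `dplyr::case_when()` in place of the nested `if_else()` calls. The thresholds are the ones from the commented-out code; the toy rates are invented for illustration.

```r
library(dplyr)

# Toy dropout rates, one per category
toy <- tibble(`2018-19` = c(0.6, 3.5, 5.5, 7.8, 12))

# case_when() checks the conditions in order and uses the first that is TRUE
toy <- toy %>%
  mutate(`dropout-level` = case_when(
    `2018-19` >= 10 ~ "HIGH",
    `2018-19` >= 7  ~ "MODERATE",
    `2018-19` > 5   ~ "LOW MODERATE",
    `2018-19` > 3   ~ "LOW",
    TRUE            ~ "EXTREMELY LOW"
  ))

print(toy)
```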
Is there a correlation between teacher salaries and dropout rate?
## Salary v. Dropout Rate
ggplot(dropout_data_with_salary_info,aes(`Avg-Teacher-Salary`,`2018-19` )) + geom_point()
Is there a correlation between guidance counselor salaries and dropout rate?
ggplot(dropout_data_with_salary_info,aes(`Guidance-&-Psych...22`,`2018-19` )) + geom_point()
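The scatterplots give a visual impression, but a correlation coefficient puts a number on it. The sketch below uses base R's `cor()` on an invented, deliberately perfectly linear toy data set (the column names and values are hypothetical and are not results from the Massachusetts data), so the coefficient comes out at exactly -1.

```r
# Toy data: dropout rate falls by one point for every 5,000 in salary
toy <- data.frame(
  salary       = c(70000, 75000, 80000, 85000, 90000),
  dropout_rate = c(5, 4, 3, 2, 1)
)

# Pearson correlation; use = "complete.obs" drops rows with missing values
r <- cor(toy$salary, toy$dropout_rate, use = "complete.obs")
print(r)
```

On the merged data the same call would take the `Avg-Teacher-Salary` (or `Guidance-&-Psych...22`) and `2018-19` columns, provided both are numeric; `cor.test()` could then be used to judge whether the coefficient differs from zero.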
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
TLamkin (2022, Feb. 20). Data Analytics and Computational Social Science: HW3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomtlamkin867082/
BibTeX citation
@misc{tlamkin2022hw3,
  author = {TLamkin, },
  title = {Data Analytics and Computational Social Science: HW3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomtlamkin867082/},
  year = {2022}
}