HW3

A Review of Massachusetts 12th Grade Dropout Rates v. Teacher & Guidance Salaries

TLamkin
2022-02-16

INTRODUCTION

The Massachusetts Department of Elementary and Secondary Education publishes data on high school dropout rates as well as teacher pay rates, in separate datasets. (https://www.doe.mass.edu/SchDistrictData.html)

My goal was to examine the most recent available school-level dropout rates against teacher and guidance department pay rates to determine whether the two are correlated.

Tidy-ing the school data:

The dropout data was presented at two levels, district and individual school, and covered the years 2010 through 2019.

Given the variation in school experience within districts, the school-level data was of more interest than the district-level data, and specifically the most recent school-level data.

This reduces the dataset to five columns: District Name, School Name, the 2018-19 dropout rate (in percent), High School Enrollment, and Total Dropout Count.
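The 2018-19 column appears to be the dropout rate as a percent of high school enrollment. As a quick check, the sketch below recomputes it from the two count columns for the six schools shown in the merged output later in this post, rounds to one decimal place, and compares the result with the published value. It assumes the rate is simply dropouts divided by enrollment; the Department's official definition may include details this ignores. The data frame and its column names (schools, rate_pub, rate_calc) are made up for the sketch.

```r
# Six schools copied from the merged 2018-19 output in this post
schools <- data.frame(
  school     = c("Abington High", "Acton-Boxborough Regional High", "Agawam High",
                 "Algonquin Regional High", "Amesbury High",
                 "Amesbury Innovation High School"),
  enrollment = c(540, 1834, 1104, 1440, 560, 51),
  dropouts   = c(3, 8, 19, 4, 8, 4),
  rate_pub   = c(0.6, 0.4, 1.7, 0.3, 1.4, 7.8)   # published `2018-19` value
)

# Assumption: rate = dropouts / enrollment, as a percent, rounded to one decimal
schools$rate_calc <- round(100 * schools$dropouts / schools$enrollment, 1)

# Compare with a small tolerance to sidestep floating-point representation
schools$matches <- abs(schools$rate_calc - schools$rate_pub) < 1e-9
schools[, c("school", "rate_pub", "rate_calc", "matches")]
```

All six rows match, which supports reading 2018-19 as a percentage rather than a count.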

For reference, I also created a separate dataset containing only the district-level dropout information.
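In the file, district summaries and individual schools are distinguished by the School-Name value: every summary row printed in this post is labeled District Results. The code in this post uses grepl('District', ...), which would also drop any school whose own name contains the word District. A minimal sketch of a stricter, exact-match split follows. It assumes every summary row uses exactly that label and uses the first six rows from the head() output further down. The names dropout, school_rows, and district_rows are introduced for the sketch.

```r
library(dplyr)

# First six rows of mass_dropout_2018_2019, copied from the head() output in this post
dropout <- tibble(
  `District-Name` = c("Abington", "Abington", "Agawam", "Agawam", "Amesbury", "Amesbury"),
  `School-Name`   = c("District Results", "Abington High", "District Results",
                      "Agawam High", "District Results", "Amesbury High"),
  `2018-19`       = c(0.6, 0.6, 1.7, 1.7, 2, 1.4),
  `High-School-Enrollment` = c(540, 540, 1104, 1104, 611, 560)
)

# School-level rows: everything except the district summary rows (exact match)
school_rows <- dropout %>% filter(`School-Name` != "District Results")

# District-level rows: only the summary rows
district_rows <- dropout %>% filter(`School-Name` == "District Results")

school_rows
district_rows
```

The Amesbury rows show why the two levels differ. The district's 611 students are Amesbury High's 560 plus Amesbury Innovation High School's 51, and its 2.0% rate reflects both schools (8 + 4 = 12 dropouts out of 611).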

# Verify library path

.libPaths()
[1] "C:/Users/theresa/Documents/R/win-library/4.1"
[2] "C:/Program Files/R/R-4.1.2/library"          
# set the working directory to be the same as the library path

setwd("C:/Users/theresa/Documents/R/win-library/4.1")

# verify the working directory

getwd()
[1] "C:/Users/theresa/Documents/R/win-library/4.1"
# Install the tidyverse and readxl packages, explicitly specifying the CRAN repository URL to work around a mirror error.

# install.packages("tidyverse", repos = "http://cran.us.r-project.org")
# install.packages("readxl", repos = "http://cran.us.r-project.org")
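Instead of passing repos to every install.packages() call, you can set a default CRAN mirror once per session with options(). This sketch assumes the https://cloud.r-project.org mirror; the http://cran.us.r-project.org URL used above works the same way. Putting the options() line in an .Rprofile file applies it to every session.

```r
# Set a default CRAN mirror for this session so install.packages()
# no longer needs a repos argument on each call
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm the setting
getOption("repos")

# install.packages(c("tidyverse", "readxl"))  # would now use the mirror above
```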

# load the necessary libraries for the processing

library(tidyverse)
library(dplyr)
library(readxl)
library(readr)
library(stringr)
library(ggplot2)

# Load in the files and display them for clarification. 

mass_dropout_rates_gr12 <- read_csv("c:/users/theresa/Documents/DACSS Local/DataSets/DropOutRates2018-19.csv", skip = 1)

# Replace the spaces in the column names with dashes
names(mass_dropout_rates_gr12) <- str_replace_all(names(mass_dropout_rates_gr12), c(" " = "-"))

# Keep the five columns of interest
mass_dropout_2018_2019 <- mass_dropout_rates_gr12 %>%
  select(`District-Name`, `School-Name`, `2018-19`, `High-School-Enrollment`, `Total-Dropout-Count`)

# School-level rows: drop the "District Results" summary rows, highest dropout rate first
school_dropout_rates_2018_2019 <- mass_dropout_2018_2019 %>%
  filter(!grepl('District', `School-Name`)) %>%
  arrange(desc(`2018-19`))

# District-level rows: keep only the "District Results" summary rows, largest enrollment first
district_dropout_rates_2018_2019 <- mass_dropout_2018_2019 %>%
  filter(grepl('District', `School-Name`)) %>%
  select(`District-Name`, `2018-19`, `High-School-Enrollment`, `Total-Dropout-Count`) %>%
  arrange(desc(`High-School-Enrollment`))

head(mass_dropout_2018_2019)
# A tibble: 6 x 5
  `District-Name` `School-Name`    `2018-19` `High-School-Enrollment`
  <chr>           <chr>                <dbl>                    <dbl>
1 Abington        District Results       0.6                      540
2 Abington        Abington High          0.6                      540
3 Agawam          District Results       1.7                     1104
4 Agawam          Agawam High            1.7                     1104
5 Amesbury        District Results       2                        611
6 Amesbury        Amesbury High          1.4                      560
# ... with 1 more variable: `Total-Dropout-Count` <dbl>
dim(school_dropout_rates_2018_2019)
[1] 405   5
head(school_dropout_rates_2018_2019)
# A tibble: 6 x 5
  `District-Name`             `School-Name` `2018-19` `High-School-E~`
  <chr>                       <chr>             <dbl>            <dbl>
1 Springfield                 Liberty Prep~      66.7                9
2 Springfield                 Springfield ~      43.7              231
3 Brockton                    Edison Acade~      38.7              238
4 Boston Day and Evening Aca~ Boston Day a~      36.8              421
5 Lowell Middlesex Academy C~ Lowell Middl~      36.6               82
6 Brockton                    Frederick Do~      36.4               11
# ... with 1 more variable: `Total-Dropout-Count` <dbl>
dim(district_dropout_rates_2018_2019)
[1] 308   4
head(district_dropout_rates_2018_2019)
# A tibble: 6 x 4
  `District-Name` `2018-19` `High-School-Enrollment` `Total-Dropout-~`
  <chr>               <dbl>                    <dbl>             <dbl>
1 Boston                4.2                    15035               631
2 Worcester             2.6                     7158               184
3 Springfield           4.4                     6955               309
4 Lynn                  4.7                     4559               216
5 Brockton              3.9                     4419               171
6 Newton                0.3                     4016                12

Tidy-ing the Salary data:

The salary data included 2018-2019 salary information for all school levels, pre-K through grade 12, along with many other operational expenditures for each school.

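For this analysis, the financial data needed to be reduced to the relevant columns: District Name, School Name, Teachers per 100 Students, Average Teacher Salary, and Guidance & Psych. The printed Guidance & Psych values (540 to 1,719) are far below any annual salary, so this column is more likely per-pupil spending on guidance and psychological services than counselor pay.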
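The 9-12 filter used below is a substring match on Grade-Level, so it helps to check which labels it keeps. The labels in this sketch are hypothetical, because the file's actual Grade-Level values are not shown in this post. If combined schools were coded as, say, 6-12 or 7-12, the filter would exclude them even if they appear in the dropout data.

```r
# Hypothetical Grade-Level labels; the real file's labels are not shown in this post
grade_levels <- c("9-12", "09-12", "6-12", "7-12", "PK-12", "K-8")

# Same test as the post: keep a row when its label contains the substring "9-12"
kept <- grepl("9-12", grade_levels)
data.frame(grade_levels, kept)
```

Only the first two labels pass. Tabulating the real column with table(preliminary_school_data_hs$`Grade-Level`) before filtering would show which schools the filter drops.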

preliminary_school_data_hs <- read_csv("c:/users/theresa/Documents/DACSS Local/DataSets/preliminary-school-ppx.csv", skip = 3)

# To create easier field names, replace the spaces with a dash in the original column names.

names(preliminary_school_data_hs) <- str_replace_all(names(preliminary_school_data_hs), c(" " = "-"))

# Keep only schools whose grade level includes 9-12, then reduce the data set to the relevant columns

preliminary_school_data_g12 <- preliminary_school_data_hs %>%
  filter(grepl('9-12', `Grade-Level`)) %>%
  select(District...2, `School-Name`, `Teachers-Per-100-Students`, `Avg-Teacher-Salary`, `Guidance-&-Psych...22`)

# Convert the salary numbers from character to numeric

preliminary_school_data_g12$`Avg-Teacher-Salary` <- parse_number(preliminary_school_data_g12$`Avg-Teacher-Salary`)
preliminary_school_data_g12$`Guidance-&-Psych...22` <- parse_number(preliminary_school_data_g12$`Guidance-&-Psych...22`)

# If the salary wasn't provided (a non-numeric value, which parse_number() turns into NA),
# replace it with the median of the reported salaries

preliminary_school_data_g12 <- preliminary_school_data_g12 %>%
  mutate(`Avg-Teacher-Salary` = replace_na(`Avg-Teacher-Salary`,
                                           median(`Avg-Teacher-Salary`, na.rm = TRUE)))


preliminary_school_data_g12
# A tibble: 268 x 5
   District...2 `School-Name`        `Teachers-Per-~` `Avg-Teacher-S~`
   <chr>        <chr>                <chr>            <chr>           
 1 Abington     Abington High        6.6              88445           
 2 Agawam       Agawam High          8.4              82804           
 3 Amesbury     Amesbury High        9.0              78508           
 4 Amesbury     Amesbury Innovation~ 8.3              85348           
 5 Andover      Andover High         7.2              89805           
 6 Arlington    Arlington High       7.5              72903           
 7 Ashland      Ashland High         6.8              83551           
 8 Attleboro    Attleboro High       6.5              90712           
 9 Attleboro    Attleboro Community~ 2.5              median          
10 Bedford      Bedford High         9.6              98994           
# ... with 258 more rows, and 1 more variable:
#   `Guidance-&-Psych...22` <dbl>
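Passing the string 'median' to replace_na() inserts that literal text instead of a computed median. That is why row 9 of the printed tibble shows median and the salary column is typed <chr>. The minimal sketch below computes the median first. It uses three salaries from the printout plus one missing value standing in for an unreported salary; the original non-numeric entry is not shown, so the NA is hypothetical.

```r
library(readr)
library(tidyr)

# Three salaries from the printed output plus one hypothetical missing entry
raw <- c("88445", "82804", "78508", NA)

salary <- parse_number(raw)                                 # character -> double
salary <- replace_na(salary, median(salary, na.rm = TRUE))  # fill NA with the median
salary
```

The missing value is filled with 82804, the median of the three reported salaries, and the vector stays numeric. As a result, ggplot() treats salary as a continuous axis.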
# Create a single dataset with dropout rates and salary data for 2018-2019.

dropout_data_with_salary_info <- merge(school_dropout_rates_2018_2019, preliminary_school_data_g12, by = "School-Name")
head(dropout_data_with_salary_info)
                      School-Name       District-Name 2018-19
1                   Abington High            Abington     0.6
2  Acton-Boxborough Regional High    Acton-Boxborough     0.4
3                     Agawam High              Agawam     1.7
4         Algonquin Regional High Northboro-Southboro     0.3
5                   Amesbury High            Amesbury     1.4
6 Amesbury Innovation High School            Amesbury     7.8
  High-School-Enrollment Total-Dropout-Count        District...2
1                    540                   3            Abington
2                   1834                   8    Acton-Boxborough
3                   1104                  19              Agawam
4                   1440                   4 Northboro-Southboro
5                    560                   8            Amesbury
6                     51                   4            Amesbury
  Teachers-Per-100-Students Avg-Teacher-Salary Guidance-&-Psych...22
1                       6.6              88445                   540
2                       6.9              91373                   981
3                       8.4              82804                   716
4                       8.3              91342                   721
5                       9.0              78508                   884
6                       8.3              85348                  1719
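The merge above matches on School-Name only. If two districts ever share a school name (an assumption; this post does not check for duplicates), each copy would be paired with every copy in the other table. Joining on district and school together avoids that. The data in this sketch is made up. For the real tables, the call would be something like merge(school_dropout_rates_2018_2019, preliminary_school_data_g12, by.x = c("District-Name", "School-Name"), by.y = c("District...2", "School-Name")), provided the district names are spelled the same in both files.

```r
# Hypothetical data: two districts that each have a school called "Central High"
dropout <- data.frame(
  district = c("Town A", "Town B"),
  school   = c("Central High", "Central High"),
  rate     = c(1.0, 5.0)
)
salary <- data.frame(
  district = c("Town A", "Town B"),
  school   = c("Central High", "Central High"),
  teacher_salary = c(80000, 95000)
)

# Joining on school name alone pairs every copy with every other copy: 2 x 2 = 4 rows
by_school <- merge(dropout, salary, by = "school")
nrow(by_school)

# Joining on district and school keeps the intended one-to-one pairs: 2 rows
by_both <- merge(dropout, salary, by = c("district", "school"))
nrow(by_both)
```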
# This was a very frustrating attempt to define a new column classifying schools by dropout-rate range.
# dropout_data_with_salary_info <- dropout_data_with_salary_info %>%
#   mutate(dropout-level = if_else(.$`2018-19` >= 10, "HIGH", (if_else(.$`2018-19` >= 7, "MODERATE",
#   (if_else(.$`2018-19` >5, "LOW MODERATE", (if_else(.$`2018-19` >3, "LOW", "EXTREMELY LOW") ) )) ))))
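One likely reason the attempt failed is the unquoted column name. R reads dropout-level as dropout minus level, so it cannot serve as an argument name without backticks. The sketch below uses dplyr's case_when() in place of the nested if_else() calls and an underscore name to avoid backticks. It keeps the same cutoffs as the attempt, including its mix of >= and >. The rates are the six schools from the merged output above. On the full data, the same mutate() would be applied to dropout_data_with_salary_info.

```r
library(dplyr)

# Dropout rates (%) for the six schools in the merged output
rates <- tibble(`2018-19` = c(0.6, 0.4, 1.7, 0.3, 1.4, 7.8))

# Same thresholds as the commented-out attempt; the first matching condition wins
rates <- rates %>%
  mutate(dropout_level = case_when(
    `2018-19` >= 10 ~ "HIGH",
    `2018-19` >= 7  ~ "MODERATE",
    `2018-19` > 5   ~ "LOW MODERATE",
    `2018-19` > 3   ~ "LOW",
    TRUE            ~ "EXTREMELY LOW"
  ))
rates
```

Because case_when() checks conditions in order, each line only needs its lower bound. This is what made the nested if_else() version hard to read.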

Teacher Salary v. Dropout Rate:

Is there a correlation between teacher salaries and dropout rates?

## Salary v. Dropout Rate

ggplot(dropout_data_with_salary_info, aes(`Avg-Teacher-Salary`, `2018-19`)) +
  geom_point() +
  labs(x = "Average teacher salary ($)", y = "2018-19 dropout rate (%)")
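A scatterplot shows the shape of the relationship, and a correlation coefficient puts a number on it. The sketch uses only the six schools from the merged output. That is far too few to answer the question, so it only demonstrates the calls. On the full merged table, the equivalent would be cor.test(dropout_data_with_salary_info$`Avg-Teacher-Salary`, dropout_data_with_salary_info$`2018-19`), which requires the salary column to be numeric rather than character.

```r
# Six schools from the merged output: average teacher salary and dropout rate (%)
salary <- c(88445, 91373, 82804, 91342, 78508, 85348)
rate   <- c(0.6, 0.4, 1.7, 0.3, 1.4, 7.8)

# Pearson measures linear association; Spearman uses ranks and is less
# sensitive to a single extreme value such as the 7.8% school
pearson  <- cor(salary, rate, method = "pearson")
spearman <- cor(salary, rate, method = "spearman")
c(pearson = pearson, spearman = spearman)

# cor.test() adds a confidence interval and p-value (Pearson by default)
cor.test(salary, rate)
```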

Guidance Salary v. Dropout Rate:

Is there a correlation between guidance counselor salaries and dropout rates?

ggplot(dropout_data_with_salary_info, aes(`Guidance-&-Psych...22`, `2018-19`)) +
  geom_point() +
  labs(x = "Guidance & Psych", y = "2018-19 dropout rate (%)")
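A straight-line fit is another way to summarize the guidance plot. Its slope states how much the dropout rate changes per unit of the Guidance & Psych figure. The sketch again uses only the six schools from the merged output, so it illustrates the call rather than answering the question. In those six rows, Amesbury Innovation High School (51 students, 1719 and 7.8%) sits far from the others and would dominate a fit this small. On the full data, checking influential points, for example with plot(fit), would be worthwhile.

```r
# Six schools from the merged output: Guidance-&-Psych value and dropout rate (%)
guidance <- c(540, 981, 716, 721, 884, 1719)
rate     <- c(0.6, 0.4, 1.7, 0.3, 1.4, 7.8)

# Simple linear fit: the slope is the change in dropout rate (percentage points)
# per one-unit increase in the guidance figure
fit <- lm(rate ~ guidance)
coef(fit)

# Rescaled to a change per 1,000 units, which is easier to read
slope_per_1000 <- coef(fit)[["guidance"]] * 1000
slope_per_1000
```

The slope always has the same sign as the Pearson correlation, because both share the covariance in their numerators.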

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

TLamkin (2022, Feb. 20). Data Analytics and Computational Social Science: HW3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomtlamkin867082/

BibTeX citation

@misc{tlamkin2022hw3,
  author = {TLamkin},
  title = {Data Analytics and Computational Social Science: HW3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomtlamkin867082/},
  year = {2022}
}