Shaye Hallee - DACSS 601 HW02

Reading in ACS 2019 disability population estimates.

Shaye Hallee
2022-02-12

Introducing the data

Here, we’ll look at data about disabled populations in US counties. Specifically, we’re using Subject Table S1810 from the 2019 1-year estimates of the American Community Survey (ACS), an ongoing demographic survey run by the U.S. Census Bureau. The table reports county populations and disabled populations broken down across different demographics. [1]

We’re going to answer the following questions:

  1. Which US county has the highest disabled population (by count)?
  2. Which county has the lowest disabled population (by count)?
  3. Which counties have a much higher than average disabled population (by percentage)?
  4. Which counties have a much lower than average disabled population (by percentage)?

Table S1810 is incredibly large, so we’ll pull out the following columns:

| Variable | Class | Description |
|---|---|---|
| County | char (text) | county name |
| State | char (text) | state name |
| cty_ni_pop | dbl (numerical) | total estimated 2019 county population of noninstitutionalized civilians |
| cty_ni_dis_pop | dbl (numerical) | estimated 2019 county population of disabled, noninstitutionalized civilians |
| cty_pct_disabled | dbl (numerical) | disabled population as a percentage of the total county population |

“Noninstitutionalized civilians” means people who aren’t in the armed forces and don’t live in institutions like prisons, hospitals, or nursing homes. [2] These two excluded groups usually rely on their respective institutions to meet their support and access needs, and they usually have higher rates of disability. Surveys like the ACS are mostly used to plan community resources, so they exclude these groups on the assumption that they won’t be interacting with the communities around them. [3]

I might have to find more thorough data if I plan to use demographics information in future projects.

Reading in the data

Let’s read the data in, free it from an unnecessary row, and put it all in a tibble.

library(tidyverse)
library(knitr)

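# Read the raw ACS export; every column arrives as text at this point.
data <- read.csv("ACS_ST_1Y_2019_Disability_County/data_all.csv", encoding = "UTF-8")
# Drop the first row, which is not a county observation.
data <- data[-1, ]
data <- as_tibble(data)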
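To see why dropping a row matters, here is a minimal sketch on a tiny in-memory CSV. It assumes the unnecessary row is a second header line of human-readable labels sitting directly under the column codes; the label text and the GEO_ID column are made up for illustration, and the two population values are taken from the tables later in this post.

```r
# Toy CSV: column codes, then a label row (assumed layout), then two counties.
csv_text <- "GEO_ID,NAME,S1810_C01_001E
id,Geographic Area Name,Total population (illustrative label)
0500000US01003,\"Baldwin County, Alabama\",220911
0500000US01015,\"Calhoun County, Alabama\",111075"

raw <- read.csv(text = csv_text, encoding = "UTF-8", stringsAsFactors = FALSE)

# Because of the label row, every column is read as character.
sapply(raw, class)

# Drop the label row, then convert the population column to numbers.
toy <- raw[-1, ]
toy$S1810_C01_001E <- as.numeric(toy$S1810_C01_001E)
str(toy)
```

This also hints at why the numeric columns need an explicit conversion later on: a single text value in a column is enough for read.csv to treat the whole column as character.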

Let’s make sure it’s a tibble of about the expected size.

class(data)
[1] "tbl_df"     "tbl"        "data.frame"
dim(data)
[1] 840 416

Done!

Cleaning up the data

Right now, our tibble has a lot of very cool data that we won’t be using, and the column names aren’t human-friendly.

Let’s extract the right columns and give them (marginally) friendlier names. We’ll use dplyr::select for that.

data <- select(data,
               NAME,
               cty_ni_pop = S1810_C01_001E,
               cty_ni_dis_pop = S1810_C02_001E,
               cty_pct_disabled = S1810_C03_001E)

For kicks, let’s separate the “NAME” column into “County” and “State.”

data <- separate(data, NAME, c("County", "State"), sep = ", ")
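As a quick check of how the split behaves on names that don’t end in “County”, here is a small sketch using three NAME values of the kind that appear in this table. The toy data frame is an illustration only, not the real data.

```r
library(tidyr)

# Toy input: one county, one independent city, one municipio.
toy <- data.frame(
  NAME = c("Baldwin County, Alabama",
           "Alexandria city, Virginia",
           "San Juan Municipio, Puerto Rico"),
  stringsAsFactors = FALSE
)

# Split on the literal ", " between the place name and the state.
toy_split <- separate(toy, NAME, into = c("County", "State"), sep = ", ")
toy_split
```

In tidyr 1.3.0 and later, separate() is marked as superseded by separate_wider_delim(); it still works, so there is no need to switch for a one-off script like this.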

Let’s turn the appropriate columns into numerical values. This is kind of sloppy, but it’s just three columns in a script we probably won’t use again. Famous last words, I know.

data$cty_ni_pop <- as.numeric(data$cty_ni_pop)
data$cty_ni_dis_pop <- as.numeric(data$cty_ni_dis_pop)
data$cty_pct_disabled <- as.numeric(data$cty_pct_disabled)
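If this script does get reused, one less repetitive option is dplyr’s across(), which applies the same function to several columns at once. A minimal sketch on a toy tibble, assuming the columns to convert are exactly the ones whose names start with "cty_":

```r
library(dplyr)

# Toy tibble with numbers stored as text, as they are after reading the CSV.
toy <- tibble(
  County = c("Baldwin County", "Calhoun County"),
  cty_ni_pop = c("220911", "111075"),
  cty_ni_dis_pop = c("31901", "22269"),
  cty_pct_disabled = c("14.4", "20.0")
)

# Convert every column whose name starts with "cty_" in one step.
toy <- mutate(toy, across(starts_with("cty_"), as.numeric))

sapply(toy, class)
```

Note that as.numeric() turns any entry it can’t parse into NA (with a warning), so it is worth checking the converted columns with anyNA() before computing summaries.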

Here’s what the data looks like now:

kable(head(data))

| County | State | cty_ni_pop | cty_ni_dis_pop | cty_pct_disabled |
|---|---|---|---|---|
| Baldwin County | Alabama | 220911 | 31901 | 14.4 |
| Calhoun County | Alabama | 111075 | 22269 | 20.0 |
| Cullman County | Alabama | 82841 | 14480 | 17.5 |
| DeKalb County | Alabama | 70392 | 7583 | 10.8 |
| Elmore County | Alabama | 75409 | 9707 | 12.9 |
| Etowah County | Alabama | 101470 | 15944 | 15.7 |

kable(tail(data))

| County | State | cty_ni_pop | cty_ni_dis_pop | cty_pct_disabled |
|---|---|---|---|---|
| Mayagüez Municipio | Puerto Rico | 71018 | 15705 | 22.1 |
| Ponce Municipio | Puerto Rico | 129198 | 28785 | 22.3 |
| San Juan Municipio | Puerto Rico | 313915 | 60014 | 19.1 |
| Toa Alta Municipio | Puerto Rico | 71897 | 6140 | 8.5 |
| Toa Baja Municipio | Puerto Rico | 73735 | 16284 | 22.1 |
| Trujillo Alto Municipio | Puerto Rico | 63312 | 15870 | 25.1 |

Finding cool things in the data

First, let’s find the mean county-level disability rate: the unweighted average of cty_pct_disabled across the 840 counties in our table, where every county counts equally regardless of its size.

mean_pct_disabled <- mean(data$cty_pct_disabled)
mean_pct_disabled
[1] 13.70964
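Averaging county percentages is not the same as computing the share of disabled people in the combined population, because large and small counties get equal weight. A minimal sketch of the difference, using only the six Alabama rows printed above as toy data (the names unweighted_mean and pooled_pct are introduced here for illustration):

```r
# First six rows of the cleaned table shown earlier in this post.
toy <- data.frame(
  County = c("Baldwin", "Calhoun", "Cullman", "DeKalb", "Elmore", "Etowah"),
  cty_ni_pop = c(220911, 111075, 82841, 70392, 75409, 101470),
  cty_ni_dis_pop = c(31901, 22269, 14480, 7583, 9707, 15944),
  cty_pct_disabled = c(14.4, 20.0, 17.5, 10.8, 12.9, 15.7)
)

# Unweighted: every county counts equally (what mean() does above).
unweighted_mean <- mean(toy$cty_pct_disabled)

# Pooled: total disabled population over total population, as a percentage.
pooled_pct <- 100 * sum(toy$cty_ni_dis_pop) / sum(toy$cty_ni_pop)

c(unweighted = unweighted_mean, pooled = pooled_pct)
```

The thresholds used below are built on the unweighted mean, so they describe how a county compares to a typical county in this table rather than to the combined population.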

We’ll use dplyr::filter to answer the questions from the intro.

  1. Which US county has the highest disabled population (by count)?

highest_disabled_pop <- filter(data, cty_ni_dis_pop == max(cty_ni_dis_pop))
kable(highest_disabled_pop)

| County | State | cty_ni_pop | cty_ni_dis_pop | cty_pct_disabled |
|---|---|---|---|---|
| Los Angeles County | California | 9964081 | 984931 | 9.9 |

  2. Which county has the lowest disabled population (by count)?

lowest_disabled_pop <- filter(data, cty_ni_dis_pop == min(cty_ni_dis_pop))
kable(lowest_disabled_pop)

| County | State | cty_ni_pop | cty_ni_dis_pop | cty_pct_disabled |
|---|---|---|---|---|
| Walker County | Texas | 61093 | 4947 | 8.1 |

  3. Which counties have a much higher than average disabled population (by percentage)? Let’s arbitrarily use 1.75 times the mean as our threshold.

hi_disability <- filter(data, cty_pct_disabled >= 1.75 * mean_pct_disabled)
kable(hi_disability)

| County | State | cty_ni_pop | cty_ni_dis_pop | cty_pct_disabled |
|---|---|---|---|---|
| Talladega County | Alabama | 76722 | 20102 | 26.2 |
| Walker County | Alabama | 62896 | 17381 | 27.6 |
| Charlotte County | Florida | 186002 | 45368 | 24.4 |
| Walker County | Georgia | 68199 | 16593 | 24.3 |
| Raleigh County | West Virginia | 70907 | 17130 | 24.2 |
| Bayamón Municipio | Puerto Rico | 164521 | 43015 | 26.1 |
| Caguas Municipio | Puerto Rico | 124149 | 32099 | 25.9 |
| Guaynabo Municipio | Puerto Rico | 83119 | 19948 | 24.0 |
| Trujillo Alto Municipio | Puerto Rico | 63312 | 15870 | 25.1 |

  4. Which counties have a much lower than average disabled population (by percentage)? Let’s (also arbitrarily) use 0.5 times the mean as our threshold.

lo_disability <- filter(data, cty_pct_disabled <= 0.5 * mean_pct_disabled)
kable(lo_disability)

| County | State | cty_ni_pop | cty_ni_dis_pop | cty_pct_disabled |
|---|---|---|---|---|
| Gwinnett County | Georgia | 930955 | 63740 | 6.8 |
| Carver County | Minnesota | 104708 | 6822 | 6.5 |
| Fort Bend County | Texas | 806384 | 53265 | 6.6 |
| Arlington County | Virginia | 231652 | 14506 | 6.3 |
| Loudoun County | Virginia | 411654 | 23713 | 5.8 |
| Alexandria city | Virginia | 155298 | 10181 | 6.6 |
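The four queries above can also be sketched with dplyr’s slice_max() and slice_min() for the extremes, alongside threshold filters written with bare column names. This is an alternative phrasing rather than the approach used in this post; the toy tibble holds four rows copied from the results above, and mean_pct_disabled is hard-coded to the value reported earlier instead of being recomputed from the toy rows.

```r
library(dplyr)

# Four rows copied from the result tables above (toy subset, not the full data).
toy <- tibble(
  County = c("Los Angeles County", "Walker County", "Loudoun County", "Walker County"),
  State = c("California", "Texas", "Virginia", "Alabama"),
  cty_ni_dis_pop = c(984931, 4947, 23713, 17381),
  cty_pct_disabled = c(9.9, 8.1, 5.8, 27.6)
)

# Assumption: reuse the mean from the full table rather than the toy rows.
mean_pct_disabled <- 13.70964

# Rows with the largest and smallest disabled population by count.
highest <- slice_max(toy, cty_ni_dis_pop, n = 1)
lowest <- slice_min(toy, cty_ni_dis_pop, n = 1)

# Rows far above or far below the mean percentage.
hi <- filter(toy, cty_pct_disabled >= 1.75 * mean_pct_disabled)
lo <- filter(toy, cty_pct_disabled <= 0.5 * mean_pct_disabled)

list(highest = highest, lowest = lowest, hi = hi, lo = lo)
```

By default slice_max() and slice_min() keep ties, so they return the same rows as filtering on max() or min() would.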

What’s next?

None of this tells us anything particularly interesting without looking at some complementary data. I’d be interested in looking at other data from Subject Table S1810 to see if there are correlations with race, overall population size, age, or type of disability. Other data sets cover poverty, food access, urbanization, and more, and it would be very cool to combine some of them with these estimates.


  1. U.S. Census Bureau, 2019 American Community Survey 1-Year Estimates, https://data.census.gov/cedsci/table?t=Disability&tid=ACSST1Y2019.S1810

  2. U.S. Census Bureau, American Community Survey and Puerto Rico Community Survey 2019 Code List, https://www2.census.gov/programs-surveys/acs/tech_docs/code_lists/2019_ACS_Code_Lists.pdf

  3. Brault, M. (2008). Disability Status and the Characteristics of People in Group Quarters: A Brief Analysis of Disability Prevalence Among the Civilian Noninstitutionalized and Total Populations in the American Community Survey. U.S. Census Bureau “Working Papers”. https://www.census.gov/library/working-papers/2008/demo/brault-01.html

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Hallee (2022, Feb. 13). Data Analytics and Computational Social Science: Shaye Hallee - DACSS 601 HW02. Retrieved from https://shayehallee.github.io/coursework/2022-02-12-dacss-601-hw02/

BibTeX citation

@misc{hallee2022shaye,
  author = {Hallee, Shaye},
  title = {Data Analytics and Computational Social Science: Shaye Hallee - DACSS 601 HW02},
  url = {https://shayehallee.github.io/coursework/2022-02-12-dacss-601-hw02/},
  year = {2022}
}