Final Part 3

finalpart3

caleb.hill

Author

Caleb Hill

Published

December 11, 2022

Introduction

With climate change upsetting normal weather patterns, there has been exciting research into how precipitation shocks (flooding and drought events) impact health outcomes, (Rizmie, d., Preuz, L, Miraldo, M. & Atun, R. (2022); Tapak, L., Maryanaji, Z., Hamidi, O., Abbasi, H. & Najafi-Vosough, R. (2018); Trinh, T., Feeny, S. & Posso, A. (2018)). Alongside this train of thought, multiple research reports state that there is a relationship between re-hospitalization rates and social characteristics, such as demographic and economic identifiers, (Barnett, Hsu & McWilliams, 2015; Murray, Allen, Clark, Daly & Jacobs, 2021). Specifically, racial characteristics play a large role in predicting re-hospitalization in a population (Li, Cai & Glance, 2015). While some articles examine economic and health factors contributing to these disparities, very few dig deep into environmental factors that influence this phenomenon, (Spatz, Bernheim, Horwitz & Herrin, 2020). With your zipcode affecting up to 60% of your health outcomes, this research is relevant to better improving one of our most costly health expenditures: hospitalization. Therefore, this research report aims to integrate these two concepts, testing if precipitation shocks have a positive relationship on not just hospital admissions but also re-admissions.

Re-hospitalization is a substantially costlier expenditure, as readmitting a patient further increases costs – especially if the diagnosis was untreated, poorly treated, or incorrectly treated. Most inpatient episodes characterized as a re-hospitalization when the patient is readmitted to the hospital 60 days after discharge. If the cause is different, sometimes that is counted as a re-hospitalization; other times, not so much, (Bhosale, K., Nath, R., Pandit, N., Agarwal, P., Khairnar, S., Yadav, B. & Chandrakar, S., 2020).

Research Question

This paper aims to explore how precipitation shocks impact re-hospitalization rates on a county-by-county level. We will control for socio-demographic and economic variables. We will also stratify by rural/urban classification, to determine if counties above or below 250,000 population experience differences in re-hospitalization rates, dependent upon these explanatory variables. This is purely because literature shows that rural communities face more acute and poorer health outcomes.

The data-set chosen for this analysis is taken from the Agency for Healthcare Research and Quality, Social Determinants of Health (SDOH) Database. This data-set has over 1,400 variables to explore each SDOH domain: social context, economic context, education, healthcare, and the environment. We shall pull data from three of these five domains: social, economic, and environmental.

How re-hospitalization is measured is not clarified per this data-set’s codebook. However, the Center for Medicare and Medicaid (CMS) 30-day Risk-Standardized Readmission Rate (RSRR) measures re-hospitalization as an unplanned readmission to inpatient services. It does stratify and specify based upon diagnosis. As the AHRQ is a federal agency alongside CMS, it is likely that they are pulling from CMS for this measure and aggregating various diagnoses into one county rate.

Hypothesis

The hypothesis for this research report is:

Precipitation has a positive impact on re-hospitalization rates within rural counties.

Therefore, the null hypothesis is:

Precipitation does not have a positive impact on re-hospitalization rates within rural counties.

Various regression models shall be employed to determine the relationship – or lack thereof – between these variables.

First I’ll import the relevant librarie and set up a UMass color scheme for crimson and black.

Then I’ll import the dataset and view the first six rows.

Code

df <- SDOH_2020_COUNTY_1_0 <- read_excel("_data/SDOH_2020_COUNTY_1_0.xlsx", 
                                         sheet = "Data")

Warning: Expecting logical in OA1673 / R1673C391: got '46123'

Warning: Expecting logical in OA1765 / R1765C391: got '32510'

Warning: Expecting logical in OB1765 / R1765C392: got '41025'

Warning: Expecting logical in OC1765 / R1765C393: got '41037'

Warning: Expecting logical in OA2799 / R2799C391: got '49017'

Warning: Expecting logical in OB2799 / R2799C392: got '49019'

Warning: Expecting logical in OC2799 / R2799C393: got '49025'

Warning: Expecting logical in OD2799 / R2799C394: got '49055'

Warning: Expecting logical in OA2844 / R2844C391: got '51760'

Code

head(df)

# A tibble: 6 × 685
   YEAR COUNTYFIPS STATEFIPS STATE COUNTY REGION TERRI…¹ ACS_T…² ACS_T…³ ACS_T…⁴
  <dbl> <chr>      <chr>     <chr> <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>
1  2020 01001      01        Alab… Autau… South        0   55639   54929   52404
2  2020 01003      01        Alab… Baldw… South        0  218289  216518  206329
3  2020 01005      01        Alab… Barbo… South        0   25026   24792   23694
4  2020 01007      01        Alab… Bibb … South        0   22374   22073   21121
5  2020 01009      01        Alab… Bloun… South        0   57755   57164   54250
6  2020 01011      01        Alab… Bullo… South        0   10173   10143    9579
# … with 675 more variables: ACS_TOT_POP_ABOVE15 <dbl>,
#   ACS_TOT_POP_ABOVE16 <dbl>, ACS_TOT_POP_16_19 <dbl>,
#   ACS_TOT_POP_ABOVE25 <dbl>, ACS_TOT_CIVIL_POP_ABOVE18 <dbl>,
#   ACS_TOT_CIVIL_VET_POP_ABOVE25 <dbl>, ACS_TOT_OWN_CHILD_BELOW17 <dbl>,
#   ACS_TOT_WORKER_NWFH <dbl>, ACS_TOT_WORKER_HH <dbl>,
#   ACS_TOT_CIVILIAN_LABOR <dbl>, ACS_TOT_CIVIL_EMPLOY_POP <dbl>,
#   ACS_TOT_POP_POV <dbl>, ACS_TOT_CIVIL_NONINST_POP_POV <dbl>, …

Next I want to verify the class is a dataframe. Otherwise, I’ll need to transform the data to make it easier to work with.

Code

class(df)

[1] "tbl_df"     "tbl"        "data.frame"

All good here.

Now on to data transformation. We will need to select only the relevant columns for this analysis.

Code

df_new <- df %>%
  select(COUNTYFIPS,
         STATE,
         COUNTY,
         AHRF_USDA_RUCC_2013,
         NOAAC_PRECIPITATION_AVG_YEARLY,
         ACS_PCT_MALE,
         ACS_PCT_FEMALE,
         ACS_PCT_AIAN,
         ACS_PCT_ASIAN,
         ACS_PCT_BLACK,
         ACS_PCT_HISPANIC,
         ACS_PCT_MULT_RACE,
         ACS_PCT_NHPI,
         ACS_PCT_OTHER_RACE,
         ACS_PCT_WHITE,
         SAIPE_MEDIAN_HH_INCOME,
         SAIPE_PCT_POV,
         ACS_PCT_COMMT_60MINUP,
         ACS_PCT_RENTER_HU_COST_50PCT,
         LTC_AVG_OBS_REHOSP_RATE) 
nrow(df_new)

[1] 3229

Code

head(df_new)

# A tibble: 6 × 20
  COUNTYF…¹ STATE COUNTY AHRF_…² NOAAC…³ ACS_P…⁴ ACS_P…⁵ ACS_P…⁶ ACS_P…⁷ ACS_P…⁸
  <chr>     <chr> <chr>  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 01001     Alab… Autau… 2          5.72    48.6    51.4    0.28    1.17   19.5 
2 01003     Alab… Baldw… 3          5.27    48.5    51.5    0.69    0.93    8.77
3 01005     Alab… Barbo… 6          5.75    52.6    47.4    0.35    0.49   47.7 
4 01007     Alab… Bibb … 1          5.61    53.7    46.3    0.05    0.25   22.6 
5 01009     Alab… Bloun… 1          5.96    49.6    50.4    0.1     0.41    1.4 
6 01011     Alab… Bullo… 6          5.40    54.8    45.2    0       1.35   68.6 
# … with 10 more variables: ACS_PCT_HISPANIC <dbl>, ACS_PCT_MULT_RACE <dbl>,
#   ACS_PCT_NHPI <dbl>, ACS_PCT_OTHER_RACE <dbl>, ACS_PCT_WHITE <dbl>,
#   SAIPE_MEDIAN_HH_INCOME <dbl>, SAIPE_PCT_POV <dbl>,
#   ACS_PCT_COMMT_60MINUP <dbl>, ACS_PCT_RENTER_HU_COST_50PCT <dbl>,
#   LTC_AVG_OBS_REHOSP_RATE <dbl>, and abbreviated variable names ¹COUNTYFIPS,
#   ²AHRF_USDA_RUCC_2013, ³NOAAC_PRECIPITATION_AVG_YEARLY, ⁴ACS_PCT_MALE,
#   ⁵ACS_PCT_FEMALE, ⁶ACS_PCT_AIAN, ⁷ACS_PCT_ASIAN, ⁸ACS_PCT_BLACK

Out of 1400+ variables, we’ve whittled them down to 20. Of those 20, we have four (4) that are unique identifiers (FIPS, State, County, and Rural-Urban Continuation Code), ten (10) socio-demographic, four (4) economic, one (1) for precipitation, and one (1) for re-hospitalization.

Before we launch into exploring these variables via descriptive statistics, first we need to determine where the NAs are and see if any of the variables will have a substantial amount of missing data.

Code

kable(colSums(is.na(df_new)))

	x
COUNTYFIPS	0
STATE	0
COUNTY	0
AHRF_USDA_RUCC_2013	9
NOAAC_PRECIPITATION_AVG_YEARLY	123
ACS_PCT_MALE	8
ACS_PCT_FEMALE	8
ACS_PCT_AIAN	8
ACS_PCT_ASIAN	8
ACS_PCT_BLACK	8
ACS_PCT_HISPANIC	8
ACS_PCT_MULT_RACE	8
ACS_PCT_NHPI	8
ACS_PCT_OTHER_RACE	8
ACS_PCT_WHITE	8
SAIPE_MEDIAN_HH_INCOME	87
SAIPE_PCT_POV	87
ACS_PCT_COMMT_60MINUP	8
ACS_PCT_RENTER_HU_COST_50PCT	8
LTC_AVG_OBS_REHOSP_RATE	410

Plenty of variables with missing data. The most concerning is – of course – our outcome variable, Re-Hospitalization Rates. This is not ideal. However, 410 / 3229 (12.6%) is not bad. That still leaves us with plenty of counties to review.

Code

df_new <- df_new %>%
  drop_na() %>%
  print(nrow(df_new))

# A tibble: 2,814 × 20
   COUNTYFIPS STATE   COUNTY          AHRF_USDA_RUCC_2013
   <chr>      <chr>   <chr>           <chr>              
 1 01001      Alabama Autauga County  2                  
 2 01003      Alabama Baldwin County  3                  
 3 01005      Alabama Barbour County  6                  
 4 01007      Alabama Bibb County     1                  
 5 01009      Alabama Blount County   1                  
 6 01011      Alabama Bullock County  6                  
 7 01013      Alabama Butler County   6                  
 8 01015      Alabama Calhoun County  3                  
 9 01017      Alabama Chambers County 6                  
10 01019      Alabama Cherokee County 6                  
   NOAAC_PRECIPITATION_AVG_YEARLY ACS_PCT_MALE ACS_PCT_FEMALE ACS_PCT_AIAN
                            <dbl>        <dbl>          <dbl>        <dbl>
 1                           5.72         48.6           51.4         0.28
 2                           5.27         48.5           51.5         0.69
 3                           5.75         52.6           47.4         0.35
 4                           5.61         53.7           46.3         0.05
 5                           5.96         49.6           50.4         0.1 
 6                           5.40         54.8           45.2         0   
 7                           5.37         45.9           54.1         0.33
 8                           5.74         48.0           52.0         0.31
 9                           5.93         47.6           52.4         0.24
10                           5.85         49.4           50.6         0.58
   ACS_PCT_ASIAN ACS_PCT_BLACK ACS_PCT_HISPANIC ACS_PCT_MULT_RACE ACS_PCT_NHPI
           <dbl>         <dbl>            <dbl>             <dbl>        <dbl>
 1          1.17         19.5              2.88              2.55         0.04
 2          0.93          8.77             4.56              2.6          0   
 3          0.49         47.7              4.44              2.09         0   
 4          0.25         22.6              2.68              0.51         0   
 5          0.41          1.4              9.28              2.24         0.1 
 6          1.35         68.6              8.1               0.74         0   
 7          1.32         44.6              1.47              1.64         0   
 8          0.81         21.2              3.85              2.49         0.29
 9          1.1          40                2.52              0.95         0.03
10          0.1           4.8              1.66              1.64         0   
   ACS_PCT_OTHER_RACE ACS_PCT_WHITE SAIPE_MEDIAN_HH_INCOME SAIPE_PCT_POV
                <dbl>         <dbl>                  <dbl>         <dbl>
 1               0.67          75.8                  67565          11.2
 2               1.56          85.4                  71135           8.9
 3               3.1           46.3                  38866          25.5
 4               0.04          76.6                  50907          17.8
 5               1.8           94.0                  55203          13.1
 6               3.13          26.2                  33124          30.8
 7               0.5           51.6                  42268          20.6
 8               1.64          73.3                  50259          14.5
 9               0.79          56.9                  39318          16.3
10               0.81          92.1                  50388          14.7
   ACS_PCT_COMMT_60MINUP ACS_PCT_RENTER_HU_COST_50PCT LTC_AVG_OBS_REHOSP_RATE
                   <dbl>                        <dbl>                   <dbl>
 1                  6.06                         26.6                    0.14
 2                  7.53                         20.8                    0.14
 3                 11.8                          22.4                    0.22
 4                 10.4                          27.4                    0.16
 5                 18.6                          22.6                    0.14
 6                 12.7                          34                      0.05
 7                  9.14                         29.5                    0.11
 8                  7.21                         20.5                    0.13
 9                  5.93                         13.0                    0.15
10                 10.2                          12.2                    0.19
# … with 2,804 more rows

2,814 x 20 is a good place to start.

This next step will conver the Rural-Urban Continuum Code to a binary Rural/Urban. If the county is coded as a four (4) or higher, then they are classified as rural, non-metro, having a population of less than 250,000 as defined by USDA as “Rural.”

Code

df_new <- df_new %>%
  mutate(Rurality = if_else(
    AHRF_USDA_RUCC_2013 >= 4, "Rural", "Urban")
    )
head(df_new)

# A tibble: 6 × 21
  COUNTYF…¹ STATE COUNTY AHRF_…² NOAAC…³ ACS_P…⁴ ACS_P…⁵ ACS_P…⁶ ACS_P…⁷ ACS_P…⁸
  <chr>     <chr> <chr>  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 01001     Alab… Autau… 2          5.72    48.6    51.4    0.28    1.17   19.5 
2 01003     Alab… Baldw… 3          5.27    48.5    51.5    0.69    0.93    8.77
3 01005     Alab… Barbo… 6          5.75    52.6    47.4    0.35    0.49   47.7 
4 01007     Alab… Bibb … 1          5.61    53.7    46.3    0.05    0.25   22.6 
5 01009     Alab… Bloun… 1          5.96    49.6    50.4    0.1     0.41    1.4 
6 01011     Alab… Bullo… 6          5.40    54.8    45.2    0       1.35   68.6 
# … with 11 more variables: ACS_PCT_HISPANIC <dbl>, ACS_PCT_MULT_RACE <dbl>,
#   ACS_PCT_NHPI <dbl>, ACS_PCT_OTHER_RACE <dbl>, ACS_PCT_WHITE <dbl>,
#   SAIPE_MEDIAN_HH_INCOME <dbl>, SAIPE_PCT_POV <dbl>,
#   ACS_PCT_COMMT_60MINUP <dbl>, ACS_PCT_RENTER_HU_COST_50PCT <dbl>,
#   LTC_AVG_OBS_REHOSP_RATE <dbl>, Rurality <chr>, and abbreviated variable
#   names ¹COUNTYFIPS, ²AHRF_USDA_RUCC_2013, ³NOAAC_PRECIPITATION_AVG_YEARLY,
#   ⁴ACS_PCT_MALE, ⁵ACS_PCT_FEMALE, ⁶ACS_PCT_AIAN, ⁷ACS_PCT_ASIAN, …

Descriptive Statistics

For our preliminary analysis, we’re going to provide summary statistics analyzing the variables relevant to our research question, excluding the socio-demographic control variables, and a visualization for each.

Code

kable(describe(df_new))

	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
COUNTYFIPS*	1	2814	1.407500e+03	8.124762e+02	1407.50000	1.407500e+03	1043.009100	1.000000e+00	2814.000	2.813000e+03	0.0000000	-1.2012794	15.3161135
STATE*	2	2814	2.442928e+01	1.359471e+01	23.00000	2.441874e+01	16.308600	1.000000e+00	48.000	4.700000e+01	0.0584273	-1.2662366	0.2562759
COUNTY*	3	2814	8.369058e+02	4.694846e+02	831.50000	8.339405e+02	584.885700	1.000000e+00	1674.000	1.673000e+03	0.0526568	-1.1127220	8.8503264
AHRF_USDA_RUCC_2013*	4	2814	4.772210e+00	2.619826e+00	6.00000	4.717140e+00	2.965200	1.000000e+00	9.000	8.000000e+00	0.0039932	-1.2985960	0.0493867
NOAAC_PRECIPITATION_AVG_YEARLY	5	2814	3.611196e+00	1.628908e+00	3.62375	3.638924e+00	1.887844	2.241667e-01	9.745	9.520833e+00	-0.0899446	-0.7490480	0.0307068
ACS_PCT_MALE	6	2814	4.994695e+01	2.242017e+00	49.57000	4.966734e+01	1.171254	4.319000e+01	69.540	2.635000e+01	2.8174177	14.3096504	0.0422646
ACS_PCT_FEMALE	7	2814	5.005306e+01	2.242020e+00	50.43000	5.033267e+01	1.171254	3.046000e+01	56.810	2.635000e+01	-2.8174100	14.3095899	0.0422647
ACS_PCT_AIAN	8	2814	1.396734e+00	5.115499e+00	0.32000	4.677753e-01	0.326172	0.000000e+00	76.740	7.674000e+01	9.0423583	99.6407741	0.0964331
ACS_PCT_ASIAN	9	2814	1.394538e+00	2.540860e+00	0.66000	8.754218e-01	0.652344	0.000000e+00	37.670	3.767000e+01	6.0286489	53.6120545	0.0478982
ACS_PCT_BLACK	10	2814	9.453362e+00	1.460088e+01	2.63000	6.009964e+00	3.350676	0.000000e+00	87.790	8.779000e+01	2.2444231	5.0398860	0.2752434
ACS_PCT_HISPANIC	11	2814	9.176947e+00	1.321884e+01	4.27500	6.063504e+00	3.669435	0.000000e+00	98.900	9.890000e+01	3.1611507	11.8393833	0.2491905
ACS_PCT_MULT_RACE	12	2814	3.413696e+00	2.268010e+00	2.85500	3.082020e+00	1.638273	0.000000e+00	18.860	1.886000e+01	1.7825475	4.7682557	0.0427546
ACS_PCT_NHPI	13	2814	7.894460e-02	1.933154e-01	0.02000	3.630110e-02	0.029652	0.000000e+00	3.820	3.820000e+00	7.0468064	84.3573375	0.0036442
ACS_PCT_OTHER_RACE	14	2814	2.127616e+00	3.404236e+00	1.05000	1.416137e+00	1.156428	0.000000e+00	43.910	4.391000e+01	4.7889038	34.3014627	0.0641738
ACS_PCT_WHITE	15	2814	8.213514e+01	1.606041e+01	88.06500	8.489697e+01	11.052783	1.114000e+01	99.150	8.801000e+01	-1.5233572	2.1283704	0.3027572
SAIPE_MEDIAN_HH_INCOME	16	2814	5.738628e+04	1.442985e+04	55107.00000	5.584910e+04	11735.520300	2.599700e+04	155362.000	1.293650e+05	1.4172637	3.6789671	272.0193680
SAIPE_PCT_POV	17	2814	1.371606e+01	5.320367e+00	12.80000	1.323637e+01	4.744320	3.000000e+00	39.600	3.660000e+01	1.0397109	1.6688697	0.1002951
ACS_PCT_COMMT_60MINUP	18	2814	8.155924e+00	4.864799e+00	6.85500	7.487016e+00	3.810282	0.000000e+00	35.910	3.591000e+01	1.5015546	3.0016425	0.0917071
ACS_PCT_RENTER_HU_COST_50PCT	19	2814	2.060974e+01	6.649144e+00	20.78000	2.057608e+01	6.093486	0.000000e+00	49.260	4.926000e+01	0.1136840	0.5492796	0.1253440
LTC_AVG_OBS_REHOSP_RATE	20	2814	1.449645e-01	8.432610e-02	0.14000	1.422425e-01	0.059304	0.000000e+00	1.000	1.000000e+00	1.6404370	12.5443704	0.0015896
Rurality*	21	2814	1.393390e+00	4.885890e-01	1.00000	1.366785e+00	0.000000	1.000000e+00	2.000	1.000000e+00	0.4362437	-1.8103344	0.0092105

Following this, we’re going to stratify each variable by rurality and determine either the difference in distribution (histogram) and/or central tendency (boxplot).

Average Yearly Precipitation

Code

ggplot(df_new, aes(x = NOAAC_PRECIPITATION_AVG_YEARLY,
                   fill = Rurality)) +
  scale_fill_manual(values = umass_colors) +
  geom_histogram() +
  xlab("Average Annual Precipitation") +
  ylab("Count") +
  theme_bw()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Code

ggplot(df_new, aes(x = NOAAC_PRECIPITATION_AVG_YEARLY,
                   fill = Rurality)) +
  scale_fill_manual(values = umass_colors) +
  geom_boxplot() +
  xlab("Average Annual Precipitation") +
  ylab("Count") +
  theme_bw()

Average precipitation each month is fairly uniform, with the mean at 3.49 inches of rain, on average, each month. The mean does differ from Urban to Rural, with Rural counties being drier than their Urban counterparts, but not by much. This variable will most likely provide less variation in the analysis compared to others.

Median Household Income

Code

ggplot(df_new, aes(x = SAIPE_MEDIAN_HH_INCOME,
                   fill = Rurality)) +
  scale_fill_manual(values = umass_colors) +
  geom_histogram() +
  xlab("Median Household Income") +
  ylab("Count") +
  theme_bw()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Code

ggplot(df_new, aes(x = SAIPE_MEDIAN_HH_INCOME,
                   fill = Rurality)) +
  scale_fill_manual(values = umass_colors) +
  geom_boxplot() +
  xlab("Median Household Income") +
  ylab("Count") +
  theme_bw()

Very close to a normal distribution, if barely left-skewed. A couple of high out-liers, hovering around $90,000+ in median household income, but the mean holds at $57,465.

Rural Counties have substantially lower median household income, by roughly $10,ooo to $20,000. Urban counties also make up the bulk of out-liers.

Percent in Poverty

Code

ggplot(df_new, aes(x = SAIPE_PCT_POV,
                     fill = Rurality)) +
  geom_histogram() +
  xlab("Percent Poverty") +
  ylab("Count") +
  scale_fill_manual(values = umass_colors) +
  theme_bw()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Code

ggplot(df_new, aes(x = SAIPE_PCT_POV,
                   fill = Rurality)) +
  scale_fill_manual(values = umass_colors) +
  geom_boxplot() +
  xlab("Percent Poverty") +
  ylab("Count") +
  theme_bw()

Another close to normal distribution. Most counties have poverty rates ranging from 10% to 20%. There are of course out-liers, especially a good number below 10%, but those are rare.

Urbanized counties have substantially less percentage poverty compared to their rural counterparts, and rural counties make up the bulk of outliers post-30%. Only one Urban county has a poverty rate higher than 30%.

Percent Commuting Alone, Over 60 Minutes

Code

ggplot(df_new, aes(x = ACS_PCT_COMMT_60MINUP,
                     fill = Rurality)) +
  geom_histogram() +
  xlab("Percent Commuting Alone, Over 60 Minutes") +
  ylab("Count") +
  scale_fill_manual(values = umass_colors) +
  theme_bw()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Code

ggplot(df_new, aes(x = ACS_PCT_COMMT_60MINUP,
                   fill = Rurality)) +
  scale_fill_manual(values = umass_colors) +
  geom_boxplot() +
  xlab("Percent Commuting Alone, Over 60 Minutes") +
  ylab("Count") +
  theme_bw()

The majority of counties fall below 10% of their population commuting up to and more than 60 minutes for work. Very little variation between urban and rural counties here, which is surprising considering population density would lead one to believe rural counties travel longer on average. Perhaps the inverse is correct, in that highly remote communities means commutes are more local.

Percent Renter Housing Costs Over 50 Percent of Income

Code

ggplot(df_new, aes(x = ACS_PCT_RENTER_HU_COST_50PCT,
                     fill = Rurality)) +
  geom_histogram() +
  xlab("Percent Renter Housing Costs Over 50 Percent of Income") +
  ylab("Count") +
  scale_fill_manual(values = umass_colors) +
  theme_bw()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Code

ggplot(df_new, aes(x = ACS_PCT_RENTER_HU_COST_50PCT,
                   fill = Rurality)) +
  scale_fill_manual(values = umass_colors) +
  geom_boxplot() +
  xlab("Percent Renter Housing Costs Over 50 Percent of Income") +
  ylab("Count") +
  theme_bw()

This is a startling figure. On average, 20% of counties have renters where 50% or more of their income goes toward housing costs. These leaves little to no room for other expenses and drives economic instability. The data is normally distributed and barely left-skewed – but still an item to consider with further analysis.

Urban counties also have a higher renter housing burden on average, with rural counties having more out-liers.

Re-hospitalization Rate

Code

ggplot(df_new, aes(x = LTC_AVG_OBS_REHOSP_RATE,
                   fill = Rurality)) +
  geom_histogram() +
  xlab("Re-Hospitalization Rate") +
  ylab("Count") +
  scale_fill_manual(values = umass_colors) +
  theme_bw()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Code

ggplot(df_new, aes(x = LTC_AVG_OBS_REHOSP_RATE,
                   fill = Rurality)) +
  scale_fill_manual(values = umass_colors) +
  geom_boxplot() +
  xlab("Re-Hospitalization Rate") +
  ylab("Count") +
  theme_bw()

Another right skewed variable. Lots of counties with 0.00 rates of re-hospitalization, and few, if any, above 0.50 per 100,000 people. The same can be said when comparing urban vs. rural, as there seems to be little variation. Rural counties do have more out-liers, as to be expected with low access to healthcare and quality healthcare services.From a health perspective, this is good news! From a research perspective, that’s going to make analysis a little trickier. However, the somewhat normal and/or bimodal distribution should be fairly easy to work with. While needing some transformation for a linear regression, we can test multiple models per each variable to determine which amendment provides the most robust inference.

Analysis

Hypothesis Testing

Remember that the hypothesis for this research report is:

Precipitation has a positive impact on re-hospitalization rates within rural counties.

We have fifteen (15) explanatory variables to work with, so we can run different regressions to determine what variables influence re-hospitalization rates the most – if at all – and how they interact with other variables.

Reminder that the fifteen (15) explanatory and control variables are broken down into three domains: the primary independent and dependent variables, socio-demographic variables, and economic variables.

Independent and Dependent variables:

Average Annual Precipitation
Re-Hospitalization Rates

Socio-demographic variables:

Percent Male
Percent Female
Percent American Indian
Percent Asian
Percent Black
Percent Hispanic
Percent Multiple Races
Percent Other Race
Percent White

Economic variables:

Median Household Income
Percent Poverty
Percent Rental Housing Cost, over 50%
Percent Commuting Alone, over 60 minutes

We will run four models to test the hypothesis. They shall examine each environmental variable’s impact on the dependent variable, re-hospitalization rates. The control variables will be the socio-demographic and economic variables, thirteen (13) in total.

Model 1 will take just the independent and dependent variables by themselves.

Model 2 will add socio-demographic variables as control.

Model 3 will add economic variables as control.

Model 4 will add both socio-demographic and economic variables as control.

Code

df_new <- df_new %>%
  filter(Rurality == "Rural") 

model1 <- lm(LTC_AVG_OBS_REHOSP_RATE ~ 
               NOAAC_PRECIPITATION_AVG_YEARLY,
             df_new)

model2 <- lm(LTC_AVG_OBS_REHOSP_RATE ~ 
               NOAAC_PRECIPITATION_AVG_YEARLY +
               ACS_PCT_MALE +
               ACS_PCT_FEMALE +
               ACS_PCT_AIAN +
               ACS_PCT_ASIAN +
               ACS_PCT_BLACK +
               ACS_PCT_HISPANIC +
               ACS_PCT_MULT_RACE +
               ACS_PCT_NHPI +
               ACS_PCT_OTHER_RACE +
               ACS_PCT_WHITE,
             df_new)
  
model3 <- lm(LTC_AVG_OBS_REHOSP_RATE ~ 
               NOAAC_PRECIPITATION_AVG_YEARLY +
               SAIPE_MEDIAN_HH_INCOME +
               SAIPE_PCT_POV +
               ACS_PCT_COMMT_60MINUP +
               ACS_PCT_RENTER_HU_COST_50PCT,
             df_new)
  
model4 <- lm(LTC_AVG_OBS_REHOSP_RATE ~ 
               NOAAC_PRECIPITATION_AVG_YEARLY +
               ACS_PCT_MALE +
               ACS_PCT_FEMALE +
               ACS_PCT_AIAN +
               ACS_PCT_ASIAN +
               ACS_PCT_BLACK +
               ACS_PCT_HISPANIC +
               ACS_PCT_MULT_RACE +
               ACS_PCT_NHPI +
               ACS_PCT_OTHER_RACE +
               ACS_PCT_WHITE + 
               SAIPE_MEDIAN_HH_INCOME +
               SAIPE_PCT_POV +
               ACS_PCT_COMMT_60MINUP +
               ACS_PCT_RENTER_HU_COST_50PCT,
             df_new)

Let’s plot this regression. We will employ log transformations for re-hospitalization rates, as it is skewed.

Code

ggplot(df_new, aes(NOAAC_PRECIPITATION_AVG_YEARLY,
                   log(LTC_AVG_OBS_REHOSP_RATE))) +
  geom_point(color = "#881c1c",
             alpha = 0.3) +
  geom_smooth(method = lm,
              se = FALSE,
              fullrange = TRUE,
              color= "#212721") +
  labs(title = "Simple Linear Regression Model, Precipitation Against Re-Hospitalization",
       x = "Average Annual Precipitation",
       y = "Re-Hospitalization Rates") +
  theme_bw()

`geom_smooth()` using formula 'y ~ x'

Warning: Removed 206 rows containing non-finite values (stat_smooth).

We will reject the null hypothesis. While it looks like, at first glance, that there is little positive relationship, we can at least note that there is some positive relationship. In the next two sections, we will dig deeper into each model, examining the p-value, R-Squared value, and PRESS statistic to see what level of relationship is present.

Model Comparisons

Now we will compare the four (4) models in more depth.

Code

stargazer(model1,
          model2,
          model3,
          model4,
          type = "pdf",
          title = "Regression Results, Models 1 - 4",
          out = "models.pdf")


% Error: 'style' must be either 'latex' (default), 'html' or 'text.'

Code

summary(model1)


Call:
lm(formula = LTC_AVG_OBS_REHOSP_RATE ~ NOAAC_PRECIPITATION_AVG_YEARLY, 
    data = df_new)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.18242 -0.05750 -0.00742  0.04596  0.88653 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    0.094490   0.005200   18.17   <2e-16 ***
NOAAC_PRECIPITATION_AVG_YEARLY 0.013643   0.001348   10.12   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09446 on 1705 degrees of freedom
Multiple R-squared:  0.05667,   Adjusted R-squared:  0.05612 
F-statistic: 102.4 on 1 and 1705 DF,  p-value: < 2.2e-16

Code

summary(model2)


Call:
lm(formula = LTC_AVG_OBS_REHOSP_RATE ~ NOAAC_PRECIPITATION_AVG_YEARLY + 
    ACS_PCT_MALE + ACS_PCT_FEMALE + ACS_PCT_AIAN + ACS_PCT_ASIAN + 
    ACS_PCT_BLACK + ACS_PCT_HISPANIC + ACS_PCT_MULT_RACE + ACS_PCT_NHPI + 
    ACS_PCT_OTHER_RACE + ACS_PCT_WHITE, data = df_new)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.19955 -0.05549 -0.00717  0.04516  0.86871 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    -6.269e+02  9.443e+02  -0.664  0.50686    
NOAAC_PRECIPITATION_AVG_YEARLY  1.336e-02  1.692e-03   7.893 5.27e-15 ***
ACS_PCT_MALE                    6.475e+00  9.448e+00   0.685  0.49324    
ACS_PCT_FEMALE                  6.476e+00  9.448e+00   0.685  0.49315    
ACS_PCT_AIAN                   -2.056e-01  3.140e-01  -0.655  0.51259    
ACS_PCT_ASIAN                  -2.062e-01  3.138e-01  -0.657  0.51126    
ACS_PCT_BLACK                  -2.056e-01  3.139e-01  -0.655  0.51254    
ACS_PCT_HISPANIC                7.863e-04  2.594e-04   3.031  0.00247 ** 
ACS_PCT_MULT_RACE              -2.057e-01  3.140e-01  -0.655  0.51244    
ACS_PCT_NHPI                   -2.157e-01  3.143e-01  -0.686  0.49253    
ACS_PCT_OTHER_RACE             -2.087e-01  3.140e-01  -0.665  0.50627    
ACS_PCT_WHITE                  -2.059e-01  3.139e-01  -0.656  0.51195    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09415 on 1695 degrees of freedom
Multiple R-squared:  0.06839,   Adjusted R-squared:  0.06234 
F-statistic: 11.31 on 11 and 1695 DF,  p-value: < 2.2e-16

Code

summary(model3)


Call:
lm(formula = LTC_AVG_OBS_REHOSP_RATE ~ NOAAC_PRECIPITATION_AVG_YEARLY + 
    SAIPE_MEDIAN_HH_INCOME + SAIPE_PCT_POV + ACS_PCT_COMMT_60MINUP + 
    ACS_PCT_RENTER_HU_COST_50PCT, data = df_new)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.24663 -0.05770 -0.00557  0.04577  0.86249 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     7.230e-02  3.187e-02   2.269  0.02342 *  
NOAAC_PRECIPITATION_AVG_YEARLY  1.045e-02  1.600e-03   6.531 8.62e-11 ***
SAIPE_MEDIAN_HH_INCOME         -5.292e-08  4.180e-07  -0.127  0.89929    
SAIPE_PCT_POV                   2.220e-03  7.671e-04   2.895  0.00384 ** 
ACS_PCT_COMMT_60MINUP          -1.130e-03  5.499e-04  -2.056  0.03996 *  
ACS_PCT_RENTER_HU_COST_50PCT    6.144e-04  3.443e-04   1.784  0.07456 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09366 on 1701 degrees of freedom
Multiple R-squared:  0.07482,   Adjusted R-squared:  0.0721 
F-statistic: 27.51 on 5 and 1701 DF,  p-value: < 2.2e-16

Code

summary(model4)


Call:
lm(formula = LTC_AVG_OBS_REHOSP_RATE ~ NOAAC_PRECIPITATION_AVG_YEARLY + 
    ACS_PCT_MALE + ACS_PCT_FEMALE + ACS_PCT_AIAN + ACS_PCT_ASIAN + 
    ACS_PCT_BLACK + ACS_PCT_HISPANIC + ACS_PCT_MULT_RACE + ACS_PCT_NHPI + 
    ACS_PCT_OTHER_RACE + ACS_PCT_WHITE + SAIPE_MEDIAN_HH_INCOME + 
    SAIPE_PCT_POV + ACS_PCT_COMMT_60MINUP + ACS_PCT_RENTER_HU_COST_50PCT, 
    data = df_new)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23229 -0.05497 -0.00634  0.04535  0.87521 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    -6.595e+02  9.379e+02  -0.703 0.482070    
NOAAC_PRECIPITATION_AVG_YEARLY  1.051e-02  1.923e-03   5.466  5.3e-08 ***
ACS_PCT_MALE                    6.871e+00  9.384e+00   0.732 0.464161    
ACS_PCT_FEMALE                  6.873e+00  9.384e+00   0.732 0.464015    
ACS_PCT_AIAN                   -2.773e-01  3.125e-01  -0.888 0.374901    
ACS_PCT_ASIAN                  -2.771e-01  3.123e-01  -0.887 0.375049    
ACS_PCT_BLACK                  -2.772e-01  3.124e-01  -0.887 0.375121    
ACS_PCT_HISPANIC                4.957e-04  2.666e-04   1.859 0.063214 .  
ACS_PCT_MULT_RACE              -2.770e-01  3.125e-01  -0.886 0.375500    
ACS_PCT_NHPI                   -2.857e-01  3.128e-01  -0.913 0.361214    
ACS_PCT_OTHER_RACE             -2.792e-01  3.124e-01  -0.894 0.371696    
ACS_PCT_WHITE                  -2.769e-01  3.124e-01  -0.886 0.375604    
SAIPE_MEDIAN_HH_INCOME          2.885e-07  4.462e-07   0.646 0.518089    
SAIPE_PCT_POV                   3.312e-03  9.805e-04   3.378 0.000747 ***
ACS_PCT_COMMT_60MINUP          -1.156e-03  5.607e-04  -2.062 0.039362 *  
ACS_PCT_RENTER_HU_COST_50PCT    5.089e-04  3.520e-04   1.446 0.148414    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09347 on 1691 degrees of freedom
Multiple R-squared:  0.08393,   Adjusted R-squared:  0.07581 
F-statistic: 10.33 on 15 and 1691 DF,  p-value: < 2.2e-16

We want to look for two items in particular: 1) P-values that reach significant code of *** and 2) The highest Adjusted R-squared value.

Model 1’s Precipitation variable has the highest P-value compared to the other models, and the lowest Adjusted R-squared value.

Model 2’s significant variables are Precipitation and Hispanic Rate. However, the Hispanic Rate does not meet the significant code of ***. The Adjusted R-squared value is only slightly higher than Model 1.

Model 3’s significant variables are Precipitation, Percent Poverty, and Commuting. However, neither Percent Poverty nor Commuting meet ***. The Adjusted R-squared value is close to Model 4, so it appears that the economic variables have a higher impact over socio-demographic measures.

Model 4’s significant variables are Precipitation, Percent Poverty, and Commuting. Precipitation has the lowest P-value of the four models here, and Percent Poverty meets the significance threshold of ***. The Adjusted R-squared value is the highest here, though only at 0.076.

Finally, we’re going to calculate the PRESS statistic (Predicted Residual Sum of Squares) to best determine which model can predict the response variable based upon the explanatory variables.

Code

PRESS <- function(model) {
  i <- residuals(model)/(1 - lm.influence(model)$hat)
  sum(i^2)
}

PRESS(model1)

[1] 15.25297

Code

PRESS(model2)

[1] Inf

Code

PRESS(model3)

[1] 15.05951

Code

PRESS(model4)

[1] Inf

Model 3 has the lowest PRESS score, though Model 4 is not being calculated for some reason. This is probably due to an error with a socio-demographic variable.

Due to a strong p-value, Adjusted R-squared value, and model fit compared to the other three models, Model 4 will be chosen as the final model for the diagnostic exploration.

Diagnostics

Finally, we’ll plot the diagnostics to best understand the model.

Code

par(mfrow = c(2,3));
plot(model4, 
     which = 1:6) +
  theme_bw()

Warning: not plotting observations with leverage one:
  44

Warning: not plotting observations with leverage one:
  44

NULL

Code

ggsave("diagnostics.pdf",
       device = "pdf")

Saving 7 x 5 in image

`geom_smooth()` using formula 'y ~ x'

Warning: Removed 206 rows containing non-finite values (stat_smooth).

Of the six plots, Cook’s distance is the most striking and relevant, as there are three out-liers: 683, 1345, and 1508.

Code

df_new %>%
  filter(COUNTYFIPS == "26097" | COUNTYFIPS == "46079" | COUNTYFIPS == "48385") %>%
  head()

# A tibble: 3 × 21
  COUNTYF…¹ STATE COUNTY AHRF_…² NOAAC…³ ACS_P…⁴ ACS_P…⁵ ACS_P…⁶ ACS_P…⁷ ACS_P…⁸
  <chr>     <chr> <chr>  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 26097     Mich… Macki… 7          3.13    51.2    48.8   16.4     0.73    3.46
2 46079     Sout… Lake … 6          1.46    52.0    48.0    2.12    0.05    0.39
3 48385     Texas Real … 9          1.81    44.9    55.1    0       0.06    1.17
# … with 11 more variables: ACS_PCT_HISPANIC <dbl>, ACS_PCT_MULT_RACE <dbl>,
#   ACS_PCT_NHPI <dbl>, ACS_PCT_OTHER_RACE <dbl>, ACS_PCT_WHITE <dbl>,
#   SAIPE_MEDIAN_HH_INCOME <dbl>, SAIPE_PCT_POV <dbl>,
#   ACS_PCT_COMMT_60MINUP <dbl>, ACS_PCT_RENTER_HU_COST_50PCT <dbl>,
#   LTC_AVG_OBS_REHOSP_RATE <dbl>, Rurality <chr>, and abbreviated variable
#   names ¹COUNTYFIPS, ²AHRF_USDA_RUCC_2013, ³NOAAC_PRECIPITATION_AVG_YEARLY,
#   ⁴ACS_PCT_MALE, ⁵ACS_PCT_FEMALE, ⁶ACS_PCT_AIAN, ⁷ACS_PCT_ASIAN, …

683 is Mackinac County, Michigan, one of the most remote areas in the Great Lakes, with a population of 10,906.

1345 is Mellette County, South Dakota, with a population of 1,908.

1508 is Real County, Texas, a county with only one incorporated area and a population of 2,826.

Otherwise, the plot looks good.

Normal Q-Q violates this test, as the points at the right tail of the plot do not generally fall along the line. This is very apparent for our three out-liers. Cook’s Distance vs. Leverage violates this test as well. The remaining plots do not violate their tested assumptions and further cement the model’s reliability.

Conclusion

This paper explored the relationship between re-hospitalization rates and precipitation, when controlling for common variables that regularly influence the dependent variable.

Four models were selected, one for each variable, to best determine which measure best impacted re-hospitalization rates. In all of the models, precipitation was statistically significant. While the adjusted R-squared value is negligible (0.076), there is a positive relationship that is statistically significant. Therein we see some form of an influence large amounts of annual precipitation has on re-hospitalization rates.

References

Barnett, M., Hsu, J. & McWilliams, M. (2015). “Patient Characteristics and Differences in Hospital Readmission Rates.” JAMA Intern Med., 175(11): 1803-1812.

Bhosale KH, Nath RK, Pandit N, Agarwal P, Khairnar S, Yadav B, & Chandrakar S. (2020). “Rate of Rehospitalization in 60 Days of Discharge and It’s Determinants in Patients with Heart Failure with Reduced Ejection Fraction in a Tertiary Care Centre in India.” Int J Heart Fail. 21;2(2):131-144.

Li, Y., Cai, X. & Glance, L. (2015). “Disparities in 30-day rehospitalization rates among Medicare skilled nursing facility residents by race and site of care.” Med Care, 53(12): 1058-1065.

Murray, F., Allen, M., Clark, C., Daly, C. & Jacobs, D. (2021). “Socio-demographic and -economic factors associated with 30-day readmission for conditions targeted by the hospital readmissions reduction program: a population-based study.” BMC Public Health, 21.

Rizmie, D., Preuz, L., Miraldo, M. & Atun, R. (2022). “Impact of extreme temperatures on emergency hospital admissions by age and socio-economic deprivation in England.” Social Science & Medicine, 308.

Tapak, L., Maryanaji, Z., Hamidi, O., Abbasi, H. & Najafi-Vosough, R. (2018). “Investigating the effect of climatic parameters on mental disorder admissions.” International Journal of Biometerology, 62, 2109-2118.

Trinh, T., Feeny, S. & Posso, A. (2018). “Rainfall shocks and child health: the role of parental mental health.” Climate and Development, 13(1), 34-48.

Spatz, E., Bernheim, S., Horwitz, L. & Herrin, J. (2020). Community factors and hospital wide readmission rates: Does context matter? PLoS One, 15(10).