DACSS 601: Data Science Fundamentals - FALL 2022

DACSS 601: Florida Homelessness




Quantitative Review by County

Categories: finalproject, shelton, homelessness

Author: Dane Shelton

Published: December 18, 2022

Homelessness in Florida

florida_1820.csv contains population figures, homelessness counts, poverty counts, and other demographic indicators [3] at the county level from 2018 to 2020. All 67 Florida counties have data for all three years, giving 201 observations of 15 variables. Each observation records the counts (or rates) of every variable for one county in one year. The variables closely mirror the stressors examined in the reference study (Zugazaga), supplemented by additional variables to capture the surrounding circumstances as fully as the data allow.

Homelessness is a complex living situation with several qualifying conditions; in its simplest form, the U.S. Department of Housing and Urban Development defines it as lacking a fixed, regular nighttime residence (not a shelter) or having a nighttime residence not designed for human accommodation [1].

On a single night in 2020, more than 500,000 people experienced homelessness in the United States [2]. Florida, the third largest state by population, had the fourth largest homeless population that year, with 27,487 people [2].

Florida counties represent a wide range of demographic profiles; the state is a hub for a variety of industries, including tourism, defense, agriculture, and information technology. Investigating homelessness across Florida counties with robust data can show who is being affected, where, and how state policy is failing (or aiding) groups within a diverse population.

Introduction

  • Research Question
  • Hypothesis
  • Data Read-in and Tidying

Carole Zugazaga’s 2004 study of 54 single homeless men, 54 single homeless women, and 54 homeless women with children in the Central Florida area investigated stressful life events common among homeless people. Interview responses revealed that women were more likely to have been sexually or physically assaulted, while men were more likely to have been incarcerated or to have abused drugs or alcohol. Homeless women with children were more likely to have been in foster care as youths.

More than a decade later, county-level data can be used to investigate the relationship between Zugazaga’s reported stressful life events (incarceration, drug arrests, poverty, forcible sex, foster care) [3] and homelessness rates.

Research Question

Do particular life stressors increase a population’s vulnerability to homelessness?

Homelessness is not a new issue in the United States, yet homeless policy targets elimination via criminalization rather than prevention. A 2019 article from Homeless Voice briefly describes homeless policy in major cities across Florida. Although state and federal governments have known for decades which circumstances increase vulnerability to homelessness, I anticipate that at least one of Zugazaga’s five stressors will remain significant in a model relating stressors to Florida homelessness rates from 2018 to 2020.

Research Hypothesis

H0: No stressor is significant in predicting homelessness rates: $\beta_i = 0$ for all $i = 1, 2, \ldots, n$

HA: At least one stressor is significant in predicting homelessness rates: $\beta_i \neq 0$ for at least one $i$
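Because H0 restricts all slopes at once, it corresponds to the overall F-test printed at the bottom of each lm summary (treating every predictor in the model as a stressor), rather than to the individual t-tests. With p predictors and n observations, the statistic can be written in terms of R²; plugging in Fit 2’s reported values (R² = 0.5964, p = 18, n = 201) reproduces its F-statistic:

$$
F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)}
  = \frac{0.5964 / 18}{0.4036 / 182}
  \approx \frac{0.03313}{0.002218}
  \approx 14.94
$$

Under H0 this statistic follows an F distribution with (p, n - p - 1) = (18, 182) degrees of freedom, so the reported p-value (< 2.2e-16) rejects H0 in favor of HA.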

Data Read-in and Tidying

The data were collected from the Florida Department of Health. Variable names [3] were used as search terms to produce counts for Florida counties. Unfortunately, the effect of COVID-19 cannot be analyzed accurately because 2021 data are incomplete for the majority of counties.

Reading in and tidying the data were done in a separate file. The process began laboriously but became much easier once I discovered the 10-year tables. Before that discovery, I would enter a variable name into the Florida Department of Health search bar, select 2018, and download the .xlsx file to my personal data folder. readxl was used to read the file, and mutate() added a year column filled with 2018. Once the data were clean, I saved the tibble in the environment as variable2018. This was repeated for 2019 and 2020. The three tibbles were then combined with full_join() into a dataset with 201 observations (67 counties × 3 years) and three variables: County, Year, and the measurement itself. This was written as a .csv back to the same personal data folder.
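The per-year workflow above can be sketched as follows, assuming each download reads in as a two-column table (County plus one measurement). The toy tibbles are hypothetical stand-ins for what readxl::read_excel() would return; the county names are real but the Poverty (Count) values are made up. Because the three tables share the same columns and differ only in Year, stacking them with bind_rows() gives the county-by-year layout directly.

```r
library(dplyr)

# Hypothetical stand-ins for readxl::read_excel() on each year's .xlsx download;
# county names are real, the counts are made up
raw <- list(
  `2018` = tibble(County = c("Alachua", "Baker"), `Poverty (Count)` = c(100, 20)),
  `2019` = tibble(County = c("Alachua", "Baker"), `Poverty (Count)` = c(110, 22)),
  `2020` = tibble(County = c("Alachua", "Baker"), `Poverty (Count)` = c(105, 21))
)

# mutate() adds the Year column to each table; bind_rows() stacks the years
poverty_long <- bind_rows(
  lapply(names(raw), function(yr) mutate(raw[[yr]], Year = as.integer(yr)))
) %>%
  relocate(County, Year)

poverty_long  # 6 rows: 2 counties x 3 years
```

With all 67 counties the same code yields 201 rows. full_join() gives the same result only when it matches on every shared column; joining by County alone would instead produce one row per county with suffixed duplicate columns (Year.x, Year.y, ...).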

Once I learned that I could draw 10-year tables from the website rather than downloading three individual .xlsx files, the process became smoother. I would rename the excess years as “delete” to keep measurements for only 2018, 2019, and 2020. pivot_longer() then moved the years into a Year column and the values into a column of my naming. The result was written as a .csv.
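A minimal sketch of the 10-year-table route, assuming a wide layout with one column per year; the toy table (values made up) stands in for a downloaded table. Here select() keeps the three study years, which is one alternative to the “delete” renaming step, and pivot_longer() moves the year headers into a Year column.

```r
library(dplyr)
library(tidyr)

# Toy 10-year table: one row per county, one column per year (values made up)
wide <- as_tibble(setNames(as.data.frame(matrix(1:20, nrow = 2)), 2011:2020)) %>%
  mutate(County = c("Alachua", "Baker"), .before = 1)

long <- wide %>%
  select(County, `2018`:`2020`) %>%            # keep only the study years
  pivot_longer(cols = -County,
               names_to = "Year",               # old column headers -> Year
               values_to = "Foster Care (Count)",
               names_transform = list(Year = as.integer))

long  # 6 rows: County, Year, Foster Care (Count)
```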

I then merged all the tables and wrote the result as a .csv, florida_full. As a sanity check, I counted distinct county names to confirm the 201 observations of 15 variables.
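The sanity check can also be written as assertions that fail loudly if a county is missing or duplicated. In this sketch florida_full is a hypothetical placeholder grid of 67 generic county names by 3 years; on the real file, a check such as ncol(florida_full) == 15 could be added too.

```r
library(dplyr)
library(tidyr)

# Placeholder grid standing in for florida_full (generic names, not real counties)
florida_full <- expand_grid(County = paste("County", 1:67), Year = 2018:2020)

years_per_county <- count(florida_full, County, name = "n_years")

stopifnot(
  n_distinct(florida_full$County) == 67,             # every county present
  nrow(florida_full) == 67 * 3,                      # 201 county-year rows
  all(years_per_county$n_years == 3),                # each county has 3 years
  !anyDuplicated(florida_full[c("County", "Year")])  # no repeated county-year
)
```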

Intro to Data Tables
    County               Year      Homeless (Count)   Population     
 Length:201         Min.   :2018   Min.   :   0.0   Min.   :   8367  
 Class :character   1st Qu.:2018   1st Qu.:  11.0   1st Qu.:  28089  
 Mode  :character   Median :2019   Median : 151.0   Median : 130642  
                    Mean   :2019   Mean   : 427.8   Mean   : 317746  
                    3rd Qu.:2020   3rd Qu.: 563.0   3rd Qu.: 367471  
                    Max.   :2020   Max.   :3516.0   Max.   :2864600  
                                                                     
 Unemployment Rate   Median Inc    Incarceration (Rateper1000) Poverty (Count) 
 Min.   : 2.100    Min.   :34583   Min.   : 0.60               Min.   :   906  
 1st Qu.: 3.400    1st Qu.:41401   1st Qu.: 2.50               1st Qu.:  4901  
 Median : 4.000    Median :50640   Median : 3.40               Median : 16210  
 Mean   : 4.697    Mean   :51116   Mean   : 3.84               Mean   : 42922  
 3rd Qu.: 5.600    3rd Qu.:58093   3rd Qu.: 4.50               3rd Qu.: 46034  
 Max.   :13.500    Max.   :83803   Max.   :18.60               Max.   :482656  
                                                                               
 Drug Arrests (Count) Relocated (Rate) Sub Abuse Enrollment (Count)
 Min.   :   13        Min.   : 4.689   Min.   :   5.0              
 1st Qu.:  225        1st Qu.:11.244   1st Qu.:  76.0              
 Median :  729        Median :12.700   Median : 250.0              
 Mean   : 1558        Mean   :13.288   Mean   : 877.6              
 3rd Qu.: 1903        3rd Qu.:14.544   3rd Qu.:1030.0              
 Max.   :13038        Max.   :22.553   Max.   :6272.0              
                                                                   
 Adult Psych Beds (Count) Severe Housing Problems (Rate) Forcible Sex (Count)
 Min.   :  0.00           Min.   : 9.6                   Min.   :   0.0      
 1st Qu.:  0.00           1st Qu.:13.3                   1st Qu.:  14.0      
 Median :  0.00           Median :15.4                   Median :  45.0      
 Mean   : 66.26           Mean   :15.8                   Mean   : 170.5      
 3rd Qu.: 84.00           3rd Qu.:17.3                   3rd Qu.: 225.0      
 Max.   :778.00           Max.   :29.8                   Max.   :1408.0      
                          NA's   :134                                        
 Foster Care (Count)
 Min.   :   3.0     
 1st Qu.:  33.0     
 Median : 153.0     
 Mean   : 326.1     
 3rd Qu.: 353.0     
 Max.   :2289.0     
                    

The summary above reports the minimum, quartiles, median, mean, and maximum of each numeric variable. The table below the summaries arranges basic parameters of interest by county. The data were nearly complete: every variable is fully observed except Severe Housing Problems (Rate), which was not recorded for 2019 and 2020 (134 NAs, i.e., 67 counties × 2 years). These gaps are later filled with each county’s 2018 value for the regression analysis.

Visualizations and Analysis

  • Visualization
  • Regression, Diagnostics, and Model Selection

ggplot2 is used to visualize key relationships between homelessness and Zugazaga’s stressors. The Florida counties have been grouped into 4 regions and 3 income levels:

Region
  • Northwest: Escambia County to Madison County; cities include Pensacola, Panama City Beach, and Tallahassee

  • North: Hamilton County to Marion County; cities include Jacksonville, Gainesville, Ocala, and St. Augustine

  • Central: Lake County to Okeechobee County; cities include Orlando, Kissimmee, Tampa, and St. Petersburg

  • South: Sarasota County to Miami-Dade County; cities include Ft. Lauderdale, Ft. Myers, Miami, Boca Raton, and West Palm Beach

Income Level
  • High: Median Income >= 60000

  • Medium: 40000 <= Median Income < 60000

  • Low: Median Income < 40000

Code

# Categorize counties into four regions; each lookup vector lists one region's counties
northwest <- c('Escambia', 'Santa Rosa', 'Okaloosa', 'Walton', 'Holmes',
               'Washington', 'Bay', 'Jackson', 'Calhoun', 'Gulf', 'Gadsden',
               'Liberty', 'Leon', 'Wakulla', 'Franklin', 'Jefferson',
               'Madison', 'Taylor')
north     <- c('Hamilton', 'Suwannee', 'Lafayette', 'Dixie', 'Gilchrist',
               'Union', 'Baker', 'Columbia', 'Nassau', 'Levy', 'Bradford',
               'Alachua', 'Duval', 'Putnam', 'Marion', 'Volusia', 'Flagler',
               'Citrus', 'Clay', 'St. Johns')
central   <- c('Lake', 'Sumter', 'Seminole', 'Orange', 'Hernando', 'Pasco',
               'Brevard', 'Indian River', 'Pinellas', 'Hillsborough', 'Polk',
               'Osceola', 'Hardee', 'Manatee', 'Okeechobee', 'Highlands')
south     <- c('St. Lucie', 'Sarasota', 'Martin', 'Palm Beach', 'Collier',
               'Broward', 'Lee', 'DeSoto', 'Charlotte', 'Hendry', 'Monroe',
               'Miami-Dade', 'Glades')

florida_og_plot <- florida_og_rates %>%
  mutate(Region = case_when(County %in% northwest ~ 'Northwest',
                            County %in% north     ~ 'North',
                            County %in% central   ~ 'Central',
                            County %in% south     ~ 'South'))

# Categorize by Median Income Level

florida_og_plot <- florida_og_plot %>%
                    mutate('Income Level' = case_when(
                    `Median Inc` >= 60000 ~ 'High',
                    `Median Inc` < 60000 &
                    `Median Inc` >= 40000 ~ 'Medium',
                    `Median Inc` < 40000 ~ 'Low'))
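Two optional guards for the categorization above, sketched on toy rows (county names real, incomes made up; Monroe’s Region is left NA on purpose). First, a county whose spelling in the data differs from the region lookup silently gets NA from case_when(), so listing unmatched rows is worthwhile. Second, cut() with right = FALSE reproduces the three income bins (below 40000, 40000 up to 60000, 60000 and above) and returns a factor already ordered Low < Medium < High.

```r
library(dplyr)

# Toy rows standing in for florida_og_plot (incomes are made up)
toy <- tibble(County = c("Leon", "Duval", "Orange", "Monroe"),
              Region = c("Northwest", "North", "Central", NA),
              `Median Inc` = c(39999, 40000, 59999, 60000))

# 1) Counties that no case_when() branch matched
unmatched <- toy$County[is.na(toy$Region)]

# 2) Income bins; right = FALSE closes each interval on the left, e.g. [40000, 60000)
toy <- toy %>%
  mutate(`Income Level` = cut(`Median Inc`,
                              breaks = c(-Inf, 40000, 60000, Inf),
                              labels = c("Low", "Medium", "High"),
                              right = FALSE))
```

On the real data, stopifnot(!anyNA(florida_og_plot$Region)) after the region step would enforce the first check.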
a) Interactive Summary
Code
# Plot 1
# Average homeless count and population per county across 2018-2020
florida_int <- florida_og_plot %>%
  group_by(County, Region) %>%
  summarize(Homeless = mean(`Homeless (Count)`),
            Population = mean(Population),
            .groups = 'drop') %>%
  # Arrange before applying factor levels to retain ordering in plot
  arrange(desc(Population)) %>%
  mutate(County = factor(County, levels = County),
         # Tooltip text shown on hover
         text = paste0("County: ", County, "\nRegion: ", Region,
                       "\nPopulation: ", round(Population),
                       "\nHomeless: ", round(Homeless)))

# Bubble chart: point size also encodes population
florida_interactive <- ggplot(florida_int,
                              aes(x = Population, y = Homeless, size = Population,
                                  color = Region, text = text)) +
  geom_point(alpha = 0.7) +
  scale_size(range = c(1, 10), name = "")

# Convert to an interactive plotly figure with custom tooltips
florida_interactive_full <- plotly::ggplotly(florida_interactive, tooltip = "text")

florida_interactive_full
  • Use the interactive plot to gain an idea of the size of counties in Florida and their homeless counts.

  • Hover over a point to see the details, and double-click an item in the legend to isolate the region.

b) Homeless Rate by Region
Code
# Plot 2
florida_box <- florida_og_plot %>%
  # Convert proportion to percent and order regions from Northwest to South
  mutate(`Homeless (Rate)` = `Homeless (Rate)` * 100,
         Region = fct_relevel(Region, "Northwest", "North", "Central", "South")) %>%
  ggplot(aes(x = Region, y = `Homeless (Rate)`, fill = Region)) +
  geom_boxplot(alpha = 0.7) +
  scale_y_continuous(breaks = seq(0, 1.5, by = 0.25)) +
  # Horizontal boxes; a second coord_*() call would replace the first,
  # so the 0-1.5% zoom is passed to coord_flip() directly
  coord_flip(ylim = c(0, 1.5)) +
  scale_fill_brewer(palette = 'Set2') +
  theme_grey() +
  theme(legend.position = "none") +
  labs(title = "Florida Homeless Rates",
       subtitle = "2018-2020",
       x = " ",
       y = "Homeless Rate (%)",
       caption = "Visualized by Region")

florida_box

  • The distributions of homeless rates across Florida counties show where the state’s highest rates occur.

  • The state is fairly uniform, with the bulk of each region’s distribution below 0.25% of county population.

    • The largest difference is between Northwest Florida and South Florida, which likely reflects the stark contrast in where people in these two regions live.

    • South Florida is the most urbanized region in the state, with millions living in and around Miami, Ft. Lauderdale, and West Palm Beach; Northwest Florida is the opposite, with small coastal towns and rural inland towns reminiscent of southern Alabama or Georgia.
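The claim that most of each region’s distribution sits below 0.25% can be checked numerically. A minimal sketch, assuming the Region and Homeless (Rate) columns created above, with the rate stored as a proportion; region_summary() is a hypothetical helper, and the toy rows are made up to show the shape of the output.

```r
library(dplyr)

# Per-region median and third quartile of the homeless rate, in percent
region_summary <- function(df) {
  df %>%
    group_by(Region) %>%
    summarize(median_pct = median(`Homeless (Rate)`) * 100,
              q3_pct     = unname(quantile(`Homeless (Rate)`, 0.75)) * 100,
              n_rows     = n(),
              .groups    = "drop")
}

# Toy rows (made-up rates) standing in for florida_og_plot
toy <- tibble(Region = rep(c("Northwest", "South"), each = 4),
              `Homeless (Rate)` = c(0.0005, 0.0010, 0.0015, 0.0020,
                                    0.0010, 0.0020, 0.0030, 0.0040))
res <- region_summary(toy)
res
```

On the real data, region_summary(florida_og_plot) would give one row per region.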

c) Homeless Rate by Income Level
Code

# Plot 3
# Relevel Income, convert rate to percent
florida_income <- florida_og_plot %>%
  mutate(`Homeless (Rate)` = `Homeless (Rate)` * 100,
         `Income Level` = fct_relevel(`Income Level`, "Low", "Medium", "High")) %>%
  ggplot(aes(x = Region, y = `Homeless (Rate)`, fill = `Income Level`)) +
  # One bar per county-year: bars sharing a Region/Income Level slot overlap,
  # so each visible bar shows the largest rate in that slot
  geom_col(position = 'dodge') +
  scale_fill_brewer(palette = 'Set2') +
  theme_grey() +
  labs(title = "Barplot: Income Level x Homeless Rate",
       subtitle = "2018-2020",
       x = "Region",
       y = "Homeless Rate (%)",
       caption = "Visualized by Income Level")

florida_income

  • The barplot adds detail to the differences seen in Plot 2. In the less urbanized regions, Low Income counties show higher homeless rates, as one might expect.

  • In South Florida, wealth disparities in urbanized areas break this trend: High Income counties there report high homeless rates.
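A caveat on Plot 3: with identity bars, each county-year is drawn as its own bar, and bars that share a Region and Income Level overlap, so the visible height is the largest single value rather than a typical one. A sketch that averages first, assuming the same florida_og_plot columns; the toy rows and their rates are made up so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Toy rows standing in for florida_og_plot; rates are made up
toy <- tibble(Region = c("North", "North", "North", "South", "South"),
              `Income Level` = c("Low", "Low", "Medium", "High", "High"),
              `Homeless (Rate)` = c(0.002, 0.004, 0.001, 0.003, 0.005))

# One row per Region x Income Level: the mean rate, in percent
income_means <- toy %>%
  group_by(Region, `Income Level`) %>%
  summarize(mean_pct = mean(`Homeless (Rate)`) * 100, .groups = "drop")

# Dodged bars, one per Region x Income Level, each showing the mean
income_plot <- ggplot(income_means,
                      aes(x = Region, y = mean_pct, fill = `Income Level`)) +
  geom_col(position = position_dodge(preserve = "single")) +
  scale_fill_brewer(palette = "Set2") +
  labs(y = "Mean Homeless Rate (%)")
```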

d) Homeless Rate x Incarceration Rate
Code
# Plot 4
florida_incarc <- florida_og_plot %>%
  # Express both rates per 100 residents: proportion * 100, per-1,000 rate / 10
  mutate(`Homeless (Rate)` = `Homeless (Rate)` * 100,
         `Incarceration (Rateper1000)` = `Incarceration (Rateper1000)` / 10) %>%
  ggplot(aes(x = `Incarceration (Rateper1000)`, y = `Homeless (Rate)`,
             color = Region, label = County)) +
  geom_point(alpha = 0.7) +
  # County labels, repelled to limit overlap
  ggrepel::geom_text_repel(show.legend = FALSE, max.overlaps = 15,
                           alpha = 0.7, size = 2.5,
                           nudge_x = -0.05, nudge_y = 0.05) +
  # color (not fill) is the mapped aesthetic, so use a color scale
  scale_color_brewer(palette = 'Set2') +
  theme_grey() +
  labs(title = "Scatterplot: Incarceration x Homeless Rate",
       subtitle = "2018-2020",
       x = "Incarceration Rate (per 100)",
       y = "Homeless Rate (%)",
       caption = "Visualized by Region")

florida_incarc

  • Incarceration Rate, one of Zugazaga’s stressors for men, is plotted against Homeless Rate; the plot suggests a positive association between incarceration rates and homelessness rates across Florida counties.

    • Many state correctional facilities are located in rural counties, which may explain the large influence of the Baker and Monroe observations on the plot.

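To put a number on the positive trend, a rank correlation is one option that is less sensitive to a few extreme counties (such as Baker and Monroe) than Pearson’s r. A sketch with a hypothetical helper, trend_check(), on made-up vectors; on the real data the arguments would be the Incarceration (Rateper1000) and Homeless (Rate) columns. Because each county appears three times in the 201 rows, the test’s p-value treats repeated county measurements as independent and so overstates the evidence.

```r
# Spearman's rank correlation between a stressor and the homeless rate
trend_check <- function(stressor, homeless_rate) {
  cor.test(stressor, homeless_rate, method = "spearman", exact = FALSE)
}

# Made-up vectors that increase together, so the ranks agree perfectly (rho = 1)
x <- c(1.2, 2.5, 3.1, 4.0, 18.6)
y <- c(0.0004, 0.0009, 0.0011, 0.0015, 0.0060)
res <- trend_check(x, y)
res$estimate  # Spearman's rho
```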
e) Homeless Rate x Drug Arrest Rate
Code
# Plot 5
florida_drug <- florida_og_plot %>%
  # Express both rates as percentages
  mutate(`Homeless (Rate)` = `Homeless (Rate)` * 100,
         `Drug Arrests (Rate)` = `Drug Arrests (Rate)` * 100) %>%
  ggplot(aes(x = `Drug Arrests (Rate)`, y = `Homeless (Rate)`,
             color = Region, label = County)) +
  geom_point(alpha = 0.7) +
  # County labels, repelled to limit overlap
  ggrepel::geom_text_repel(show.legend = FALSE, max.overlaps = 15,
                           alpha = 0.7, size = 2.5,
                           nudge_x = -0.05, nudge_y = 0.05) +
  # color (not fill) is the mapped aesthetic, so use a color scale
  scale_color_brewer(palette = 'Set2') +
  theme_grey() +
  labs(title = "Scatterplot: Drug Arrest Rate x Homeless Rate",
       subtitle = "2018-2020",
       x = "Drug Arrest Rate (%)",
       y = "Homeless Rate (%)",
       caption = "Visualized by Region")

florida_drug

  • A similar positive association appears between a county’s homeless rate and its drug arrest rate.

    • The high influence of Northwest Florida may reflect stricter drug enforcement by police in less urbanized areas.

Assumption of Validity

Although more than 10 variables are used to predict Homeless (Rate) across Florida counties, there are still limits to what the models can say about the magnitude of any individual stressor. Stressors influence homelessness by driving people in severe situations out of their homes or away from their places of origin. Homeless (Rate) is therefore not an ideal measure of magnitude: if homeless people migrate to escape or avoid certain stressors, counties with low stressor values would end up with larger homeless populations, an effect the following models leave unexplained.

  • The variable Relocated (Rate) is included as an attempt to control for new movement; however, it does not fully capture county-to-county migration.

  • FL Charts records Population Who Lived in a Different County One Year Earlier, but that data spans 2009-2014, and using values recorded at least 4 years before our study period isn’t desirable either.

  • The most appropriate data for capturing county-to-county migration come from the US Census Bureau. The -In, -Out, -Net... spreadsheet provides, for each county in the United States, totals and movement to all other US counties; unfortunately, this data is too complex to wrangle into the simple data set florida_1820.csv.

Assumption of Linearity

Code
# Scatterplot matrix: check whether predictors not covered by the literature
# relate roughly linearly to the response, Homeless (Rate).
# pairs() draws the plot and returns NULL, so its result is not stored.
florida_og_rates %>%
  select(-c(contains("Count"),   # case-insensitive, so County is dropped too
            'Year',
            'Poverty (Rate)',
            'Severe Housing Problems (Rate)',
            'Incarceration (Rateper1000)',
            'Sub Abuse Enrollment (Rate)',
            'Drug Arrests (Rate)',
            'Adult Psych Beds (Rate)',
            'Foster Care (Rate)',
            'Forcible Sex (Rate)')) %>%
  pairs()

The scatterplot matrix shows predictors whose relationship to homelessness was not covered in Zugazaga’s study, or that needed further investigation, to check that each relates roughly linearly to the response, Homeless (Rate). In the bottom row, the associations are weak, but a linear approximation appears reasonable.
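The visual check can be complemented by the Pearson correlation of each predictor with the response; Pearson’s r measures only linear association, which is exactly what the scatterplot matrix is meant to assess. A sketch on simulated data: the column names mirror the document’s, but every value is randomly generated.

```r
set.seed(601)
n <- 50
toy <- data.frame(Population = runif(n, 8e3, 3e6),
                  `Unemployment Rate` = runif(n, 2, 13.5),
                  check.names = FALSE)
# Simulated response with a weak linear dependence on unemployment
toy$`Homeless (Rate)` <- 0.001 + 0.0001 * toy$`Unemployment Rate` + rnorm(n, sd = 5e-4)

# Correlation of every column with the response (what the response's row shows visually)
r_with_response <- cor(toy, use = "pairwise.complete.obs")[, "Homeless (Rate)"]
round(r_with_response, 2)
```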

Linear Regression Models

Fit 1: All Variables (No Transformations)
Code
# Linear relationship appears appropriate for all, possibly attempt log transformation on UE Rate?

# Creating A Linear Model with all variables included: No Transformations

# County Removed as too many levels; improvement: NWFL, NFL, CFL, SWFL, SOFLO categories?

# Fit 1 - OLS with all variables predicting homeless rate
fit1 <- florida_og_rates %>% 
          select(-'County')%>%
            
            lm(formula=`Homeless (Rate)` ~.)

summary(fit1)

Call:
lm(formula = `Homeless (Rate)` ~ ., data = .)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0024414 -0.0004897 -0.0001599  0.0004384  0.0034222 

Coefficients: (1 not defined because of singularities)
                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      -4.128e-03  2.556e-03  -1.615 0.112968    
Year                                     NA         NA      NA       NA    
`Homeless (Count)`                5.430e-06  8.739e-07   6.214 1.28e-07 ***
Population                       -1.913e-09  3.885e-09  -0.492 0.624751    
`Unemployment Rate`              -6.668e-05  2.953e-04  -0.226 0.822334    
`Median Inc`                      3.952e-08  2.799e-08   1.412 0.164626    
`Incarceration (Rateper1000)`    -2.753e-05  7.492e-05  -0.367 0.714928    
`Poverty (Count)`                 2.761e-08  1.873e-08   1.474 0.147123    
`Drug Arrests (Count)`           -6.460e-08  2.561e-07  -0.252 0.801977    
`Relocated (Rate)`               -7.993e-05  5.790e-05  -1.380 0.174016    
`Sub Abuse Enrollment (Count)`    9.462e-08  3.900e-07   0.243 0.809368    
`Adult Psych Beds (Count)`       -2.652e-05  7.325e-06  -3.620 0.000719 ***
`Severe Housing Problems (Rate)`  1.813e-04  7.569e-05   2.395 0.020672 *  
`Forcible Sex (Count)`           -1.431e-06  2.445e-06  -0.585 0.561202    
`Foster Care (Count)`            -3.979e-06  1.511e-06  -2.633 0.011413 *  
`Poverty (Rate)`                 -2.976e-04  6.024e-03  -0.049 0.960809    
`Drug Arrests (Rate)`             7.704e-02  6.911e-02   1.115 0.270682    
`Sub Abuse Enrollment (Rate)`     6.409e-02  1.431e-01   0.448 0.656276    
`Adult Psych Beds (Rate)`         4.202e+00  2.208e+00   1.903 0.063122 .  
`Forcible Sex (Rate)`             3.869e-01  9.245e-01   0.418 0.677501    
`Foster Care (Rate)`              6.961e-01  2.996e-01   2.323 0.024549 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.001111 on 47 degrees of freedom
  (134 observations deleted due to missingness)
Multiple R-squared:  0.7193,    Adjusted R-squared:  0.6058 
F-statistic: 6.338 on 19 and 47 DF,  p-value: 1.329e-07
Code
rss1 <- deviance(fit1)
print(c('RSS Fit 1', rss1))
[1] "RSS Fit 1"            "5.80154282170776e-05"
  • The first model predicts Homeless (Rate) using all variables, without any transformations or interactions. This causes 134 observations to be removed because they are missing Severe Housing Problems (Rate). Only the 2018 rows remain, so Year is constant and its coefficient is NA (“not defined because of singularities”).

  • At alpha = 0.05, five predictors are significant: Homeless (Count), Adult Psych Beds (Count), Severe Housing Problems (Rate), Foster Care (Count), and Foster Care (Rate). Since Homeless (Rate) is presumably derived from Homeless (Count), that term’s significance is expected rather than informative. Predictors without a star (see output) are not significant at that level in this model.

  • The estimated effect of Relocated (Rate) is negative (though not significant here), consistent with migration ‘helping reduce’ homelessness by county, as anticipated in Assumption of Validity above.

  • The signs of the non-significant estimates seem plausible: Drug Arrests (Rate) and Sub Abuse Enrollment (Rate) have positive coefficients, so higher values are associated with a higher predicted Homeless (Rate).

    • Sub Abuse Enrollment (Rate) can be read here as an indication of how many people in the area are struggling with addiction or substance abuse, rather than a suggestion that substance abuse programs increase homelessness.
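Reading stars off a long coefficient table is error-prone, so significant terms can also be listed programmatically. A sketch using the built-in mtcars data in place of florida_og_rates; with the real model, fit1 would replace fit. Aliased coefficients (such as Year in Fit 1) are omitted from coef(summary()), so they never appear in the list.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

coefs <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
alpha <- 0.05

# Predictors (excluding the intercept) whose p-value is below alpha
significant <- setdiff(rownames(coefs)[coefs[, "Pr(>|t|)"] < alpha], "(Intercept)")
significant  # "wt" "hp"
```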
Fit 1: Diagnostics

  • Fit 1 does a poor job of satisfying the linear regression assumptions about residuals.

  • Residuals vs Fitted shows a downward trend as the fitted values increase, violating the linearity assumption.

    • The Scale-Location plot shows the standardized residuals growing in magnitude as the fitted values increase, which indicates non-constant variance (heteroscedasticity).
  • The Q-Q plot deviates from the diagonal, violating the assumption that residuals are approximately normally distributed.

  • Several points could be considered outliers or influential observations because of their large residuals or high leverage.

    • Monroe County (130), Hardee County (73), and Columbia County (34) all have large positive residuals, indicating that the model greatly underestimated the homeless rate in these counties.

    • Baker County (4) has very high leverage: its predictor values are far from those of the other counties, giving it a strong pull on the fitted model.

    • All of these outliers are sparsely populated, rural counties outside Florida’s urbanized areas; with small populations, a few cases can produce extreme rates, and such extreme stressor values carry great influence on the model.
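The residual, leverage, and influence measures behind these diagnostic plots can also be extracted as numbers. A sketch on the built-in mtcars data so it runs anywhere; with the real data, fit1 would replace fit. The cutoffs (2p/n for leverage, 4/n for Cook’s distance, 2 for standardized residuals) are common rules of thumb, not hard thresholds.

```r
# Built-in mtcars stands in for florida_og_rates
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nobs(fit)
p <- fit$rank                  # estimated coefficients (aliased ones excluded)

lev   <- hatvalues(fit)        # leverage: how unusual a row's predictor values are
cooks <- cooks.distance(fit)   # influence: how much the fit changes without the row
std_r <- rstandard(fit)        # standardized residuals

# Rows flagged by each rule of thumb
high_leverage  <- names(lev)[lev > 2 * p / n]
high_influence <- names(cooks)[cooks > 4 / n]
large_residual <- names(std_r)[abs(std_r) > 2]

# plot(fit) draws Residuals vs Fitted, Q-Q, Scale-Location,
# and Residuals vs Leverage, the panels discussed above
```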

Fit 2: All Variables, All Observations (Fill Severe Housing Rate)
Code
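# Fit 2
fit2 <- florida_og_rates %>%
          # Carry each county's 2018 Severe Housing Problems (Rate) down to its
          # 2019 and 2020 rows; grouping stops values leaking between counties
          # (assumes rows within a county are in Year order)
          group_by(County) %>%
            fill(`Severe Housing Problems (Rate)`, .direction = "down") %>%
              ungroup() %>%
                select(-c('County', 'Year')) %>%
                  lm(formula = `Homeless (Rate)` ~ . - `Median Inc`)
summary(fit2)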

Call:
lm(formula = `Homeless (Rate)` ~ . - `Median Inc`, data = .)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0023982 -0.0005760 -0.0001193  0.0003527  0.0054020 

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      -1.170e-03  6.202e-04  -1.887  0.06079 .  
`Homeless (Count)`                4.028e-06  4.080e-07   9.875  < 2e-16 ***
Population                       -3.923e-09  1.516e-09  -2.589  0.01041 *  
`Unemployment Rate`               2.761e-05  4.930e-05   0.560  0.57608    
`Incarceration (Rateper1000)`    -1.655e-05  3.571e-05  -0.463  0.64364    
`Poverty (Count)`                 2.058e-08  8.721e-09   2.360  0.01931 *  
`Drug Arrests (Count)`           -1.844e-07  1.044e-07  -1.766  0.07902 .  
`Relocated (Rate)`               -6.520e-05  2.649e-05  -2.461  0.01477 *  
`Sub Abuse Enrollment (Count)`    1.853e-07  1.731e-07   1.070  0.28588    
`Adult Psych Beds (Count)`       -1.895e-05  3.380e-06  -5.606 7.58e-08 ***
`Severe Housing Problems (Rate)`  1.822e-04  3.177e-05   5.733 4.03e-08 ***
`Forcible Sex (Count)`            1.474e-06  1.257e-06   1.172  0.24257    
`Foster Care (Count)`            -1.477e-06  5.711e-07  -2.586  0.01049 *  
`Poverty (Rate)`                 -5.887e-03  2.224e-03  -2.647  0.00884 ** 
`Drug Arrests (Rate)`             1.065e-01  2.599e-02   4.099 6.24e-05 ***
`Sub Abuse Enrollment (Rate)`    -5.298e-02  6.067e-02  -0.873  0.38366    
`Adult Psych Beds (Rate)`         4.170e+00  9.278e-01   4.494 1.24e-05 ***
`Forcible Sex (Rate)`            -1.535e-01  3.972e-01  -0.386  0.69967    
`Foster Care (Rate)`              4.293e-01  1.371e-01   3.131  0.00203 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0009618 on 182 degrees of freedom
Multiple R-squared:  0.5964,    Adjusted R-squared:  0.5565 
F-statistic: 14.94 on 18 and 182 DF,  p-value: < 2.2e-16
Code
bic2 <- BIC(fit2)
print(c('BIC Fit 2:', bic2))
[1] "BIC Fit 2:"        "-2136.04637375734"
Code
rss2 <- deviance(fit2)
print(c('RSS Fit 2', rss2))
[1] "RSS Fit 2"            "0.000168367650150255"
  • In Fit 2, missing Severe Housing Problems (Rate) values were filled down from 2018 to restore all observations for use in the model; Year and Median Inc are excluded from this fit.

    • Example: Alachua County has the same Severe Housing Problems (Rate) for 2018-2020
  • Several key stressors are significant, with Adult Psych Beds (Rate) having the largest coefficient (4.17), although coefficients on differently scaled predictors are not directly comparable.

    • Stressors from Zugazaga’s study found significant by this model include Drug Arrests (Rate) and Foster Care (Rate); Poverty (Rate) is also significant (p = 0.009), but with a negative sign.
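Why grouping matters for the fill step, in a minimal sketch: if rows were ever sorted by Year rather than County, an ungrouped fill() would copy one county’s 2018 value into another county’s 2019 row. The toy values are made up, and shp is a hypothetical short name for Severe Housing Problems (Rate); on the real data, the same group_by(County) pattern would apply to florida_og_rates.

```r
library(dplyr)
library(tidyr)

# Toy panel with made-up values, deliberately sorted by Year instead of County
toy <- tibble(County = rep(c("Alachua", "Baker"), times = 3),
              Year   = rep(2018:2020, each = 2),
              shp    = c(17.1, 12.4, NA, NA, NA, NA))

# Ungrouped fill copies the previous row's value, whichever county it belongs to
naive <- fill(toy, shp, .direction = "down")

# Grouped fill carries each county's 2018 value forward within that county only
by_county <- toy %>%
  group_by(County) %>%
  arrange(Year, .by_group = TRUE) %>%
  fill(shp, .direction = "down") %>%
  ungroup()
```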
Fit 2: Diagnostics

  • Using all observations in Fit 2 has not improved the diagnostic plots, but including the 2019 and 2020 values has revealed new outliers.

  • In 2020, Liberty County (observation 117), a small, rural county in the Florida Panhandle, appears as an outlier.

    • In late 2018, Hurricane Michael devastated the area. The influence of this observation likely reflects 2019 measurements that were greatly altered, or not collected at all, while the population was “in a transitional state”

    • Liberty’s 2020 records differ sharply from its incomplete 2019 measures
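Diagnostic plots like these label the rows with the most extreme residuals or leverage. A complementary numeric check is Cook's distance, which measures how much the fitted values change when a single row is dropped. The following minimal base-R sketch uses simulated data, not the Florida data. The 4/n cutoff is a common rule of thumb and an assumption here, not part of the original analysis.

```r
# Simulated data: a clean linear trend with one planted outlier at row 17
set.seed(601)
x <- 1:20
y <- 2 * x + rnorm(20)
y[17] <- y[17] + 50                     # the planted outlier

fit <- lm(y ~ x)
cd <- cooks.distance(fit)               # one value per observation

# Rule of thumb (assumption): flag rows with Cook's distance above 4/n
flagged <- which(cd > 4 / length(cd))
most_influential <- which.max(cd)
most_influential
```

Applied to the Fit 2 model object, the same two calls would give each county-year's influence, indexed by row number.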

Fit 3: Random Effects Model - Controlling for County over Time
Code
#transform to panel data
florida_panel <-  pdata.frame(florida_og_rates, index=c('County','Year'))

#random effects model
#Exclude Median Inc because its scale differs greatly from the rate variables
fit3 <- florida_panel %>%
          plm(formula = Homeless..Rate. ~ 
         Unemployment.Rate +             
         Incarceration..Rateper1000. +
         Relocated..Rate. +           
         Poverty..Rate. +        
         Drug.Arrests..Rate. +          
         Sub.Abuse.Enrollment..Rate. +  
         Adult.Psych.Beds..Rate. +      
         Forcible.Sex..Rate. +           
         Foster.Care..Rate., 
            model= 'random')
                   
summary(fit3)
Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = Homeless..Rate. ~ Unemployment.Rate + Incarceration..Rateper1000. + 
    Relocated..Rate. + Poverty..Rate. + Drug.Arrests..Rate. + 
    Sub.Abuse.Enrollment..Rate. + Adult.Psych.Beds..Rate. + Forcible.Sex..Rate. + 
    Foster.Care..Rate., data = ., model = "random")

Balanced Panel: n = 67, T = 3, N = 201

Effects:
                    var   std.dev share
idiosyncratic 4.793e-07 6.923e-04 0.246
individual    1.467e-06 1.211e-03 0.754
theta: 0.6866

Residuals:
       Min.     1st Qu.      Median     3rd Qu.        Max. 
-0.00193525 -0.00026946 -0.00010248  0.00017024  0.00612304 

Coefficients:
                               Estimate  Std. Error z-value Pr(>|z|)   
(Intercept)                  4.2668e-04  8.3580e-04  0.5105 0.609699   
Unemployment.Rate            9.5178e-06  4.1582e-05  0.2289 0.818953   
Incarceration..Rateper1000.  2.1187e-05  5.9881e-05  0.3538 0.723478   
Relocated..Rate.            -2.3556e-05  4.4214e-05 -0.5328 0.594191   
Poverty..Rate.               1.1404e-03  3.2398e-03  0.3520 0.724835   
Drug.Arrests..Rate.          7.2622e-02  2.7264e-02  2.6637 0.007729 **
Sub.Abuse.Enrollment..Rate. -1.4647e-02  5.4401e-02 -0.2693 0.787736   
Adult.Psych.Beds..Rate.      2.7492e+00  9.9533e-01  2.7621 0.005743 **
Forcible.Sex..Rate.         -3.1645e-01  4.0769e-01 -0.7762 0.437628   
Foster.Care..Rate.           2.3889e-01  1.7728e-01  1.3475 0.177804   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    0.00010218
Residual Sum of Squares: 9.3309e-05
R-Squared:      0.086778
Adj. R-Squared: 0.043746
Chisq: 18.1495 on 9 DF, p-value: 0.033478
Code
#bic3 <- BIC(fit3)
#print(c('BIC Fit 3:', bic3))

rss3 <- deviance(fit3)
print(c('RSS Fit 3', rss3))
[1] "RSS Fit 3"            "9.33086018067315e-05"
  • A random effects model lets us control for unobserved, time-invariant differences between counties, provided those differences are uncorrelated with the predictors.

    • Each county receives its own intercept: the overall intercept plus a county-specific deviation drawn from a common distribution
  • Only Drug Arrests (Rate) and Adult Psych Beds (Rate) retained their significance

    • Both coefficients shrank relative to Fit 2: Drug Arrests (Rate) went from 0.107 to 0.073, and Adult Psych Beds (Rate) went from 4.170 to 2.749.

    • Both are positive, indicating that higher rates of either are associated with a higher Homeless Rate
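The `theta` printed in the plm summaries shows how far the random effects model sits between two simpler estimators. Each variable is quasi-demeaned as y_it − θ·ȳ_i. A value of θ = 0 gives pooled OLS, and θ = 1 gives the fixed effects (within) estimator. For a balanced panel with T periods, θ = 1 − sqrt(σ²_e / (σ²_e + T·σ²_u)). The following sketch plugs in the printed variance components. The helper name `re_theta()` is hypothetical.

```r
# Hypothetical helper: random effects quasi-demeaning weight for a balanced panel
re_theta <- function(var_idiosyncratic, var_individual, n_periods) {
  1 - sqrt(var_idiosyncratic / (var_idiosyncratic + n_periods * var_individual))
}

# Variance components copied from the plm summaries
theta_fit3 <- re_theta(4.793e-07, 1.467e-06, n_periods = 3)   # Fit 3: T = 3
theta_fit4 <- re_theta(1.848e-07, 1.350e-06, n_periods = 2)   # Fit 4: T = 2 (lag drops 2018)

# Share of the error variance that is between counties ("individual" share)
share_individual <- 1.467e-06 / (4.793e-07 + 1.467e-06)

round(c(theta_fit3 = theta_fit3, theta_fit4 = theta_fit4,
        individual_share = share_individual), 4)
```

Both thetas are near 0.7, and about three quarters of Fit 3's error variance is between counties. The estimates therefore lean toward within-county variation, which helps explain why the coefficients shrank relative to pooled OLS.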

Fit 4: Random Effects Model - Zugazaga’s variables
Code
# fit 4 - random effects 2                   
fit4 <- plm(formula = Homeless..Rate. ~ 
              Drug.Arrests..Rate. + 
              lag(Sub.Abuse.Enrollment..Rate.,1) +
              Forcible.Sex..Rate. +
              Foster.Care..Rate.,   
            data = florida_panel,
            model= 'random')
                   
summary(fit4)
Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = Homeless..Rate. ~ Drug.Arrests..Rate. + lag(Sub.Abuse.Enrollment..Rate., 
    1) + Forcible.Sex..Rate. + Foster.Care..Rate., data = florida_panel, 
    model = "random")

Balanced Panel: n = 67, T = 2, N = 134

Effects:
                    var   std.dev share
idiosyncratic 1.848e-07 4.298e-04  0.12
individual    1.350e-06 1.162e-03  0.88
theta: 0.7469

Residuals:
       Min.     1st Qu.      Median     3rd Qu.        Max. 
-0.00193151 -0.00022543 -0.00002260  0.00019457  0.00192708 

Coefficients:
                                       Estimate  Std. Error z-value Pr(>|z|)    
(Intercept)                          0.00089264  0.00037360  2.3893  0.01688 *  
Drug.Arrests..Rate.                  0.09055443  0.02068601  4.3776  1.2e-05 ***
lag(Sub.Abuse.Enrollment..Rate., 1) -0.08876300  0.06165290 -1.4397  0.14995    
Forcible.Sex..Rate.                 -0.32816661  0.33634980 -0.9757  0.32923    
Foster.Care..Rate.                   0.25476909  0.15898288  1.6025  0.10905    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    2.8828e-05
Residual Sum of Squares: 2.4526e-05
R-Squared:      0.14921
Adj. R-Squared: 0.12283
Chisq: 22.6247 on 4 DF, p-value: 0.00015047
Code
#bic4 <- BIC(fit4)
#print(c('BIC Fit 4:', bic4))

rss4 <- deviance(fit4)
print(c('RSS Fit 4', rss4))
[1] "RSS Fit 4"            "2.45264002390625e-05"
  • Again using a random effects model with only stressors mentioned in Zugazaga’s study, we see Drug Arrests (Rate) as a significant predictor of homelessness at the alpha = 0.05 level.

  • Forcible Sex may not capture the circumstances suggested in Zugazaga’s study, as an arrest rate for forcible sex crimes captures the perpetrators within a county, rather than the victims.

    • Many sex crimes go unreported or unpunished, due to familial or power relationships between the victim and offender
  • Foster Care (Rate) and Sub Abuse Enrollment (Rate) have signs that correspond well with Zugazaga’s results, although neither is significant at the 0.05 level

    • The negative coefficient on lagged Sub Abuse Enrollment (Rate) suggests that higher enrollment in substance abuse programs may be associated with lower homelessness the following year, though the estimate is not statistically significant (p ≈ 0.15)
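`plm`'s `lag()` on a `pdata.frame` shifts values within each county. Every county's 2018 row therefore has no lagged value and is dropped, which is why Fit 4 reports N = 134 (67 counties × 2 years) instead of 201. The following minimal dplyr sketch reproduces the same idea on a hypothetical three-county panel. The column name `enroll_rate` and its values are made up.

```r
library(dplyr)

# Hypothetical panel: 3 counties observed 2018-2020
panel <- tibble(
  County = rep(c("A", "B", "C"), each = 3),
  Year = rep(2018:2020, times = 3),
  enroll_rate = c(0.010, 0.012, 0.011, 0.020, 0.018, 0.019, 0.005, 0.006, 0.007)
)

lagged <- panel %>%
  arrange(County, Year) %>%                  # lag() relies on row order
  group_by(County) %>%                       # lag within county, never across counties
  mutate(enroll_rate_lag1 = dplyr::lag(enroll_rate, n = 1)) %>%
  ungroup()

# Rows usable in a model with the lag: 3 counties x 2 years = 6
n_usable <- sum(!is.na(lagged$enroll_rate_lag1))
n_usable
```

Without `group_by()`, a plain `dplyr::lag()` would hand County A's 2020 value to County B's 2018 row. The panel-aware `lag()` avoids that automatically.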

Model Selection

Stargazer Table
## 
## Homelessness in Florida
## =======================================================================================================
##                                                             Dependent variable:                        
##                                     -------------------------------------------------------------------
##                                                      Homeless Rate                    Homeless..Rate.  
##                                                           OLS                              panel       
##                                                                                           linear       
##                                              (1)                     (2)               (3)       (4)   
## -------------------------------------------------------------------------------------------------------
## Year                                                                                                   
##                                                                                                        
##                                                                                                        
## `Homeless (Count)`                        0.00001***              0.00000***                           
##                                           (0.00000)               (0.00000)                            
##                                                                                                        
## Population                                  -0.000                 -0.000**                            
##                                            (0.000)                 (0.000)                             
##                                                                                                        
## `Unemployment Rate`                        -0.0001                 0.00003                             
##                                            (0.0003)               (0.00005)                            
##                                                                                                        
## `Median Inc`                               0.00000                                                     
##                                           (0.00000)                                                    
##                                                                                                        
## `Incarceration (Rateper1000)`              -0.00003                -0.00002                            
##                                            (0.0001)               (0.00004)                            
##                                                                                                        
## `Poverty (Count)`                          0.00000                0.00000**                            
##                                           (0.00000)                (0.000)                             
##                                                                                                        
## `Drug Arrests (Count)`                     -0.00000               -0.00000*                            
##                                           (0.00000)               (0.00000)                            
##                                                                                                        
## `Relocated (Rate)`                         -0.0001                -0.0001**                            
##                                            (0.0001)               (0.00003)                            
##                                                                                                        
## `Sub Abuse Enrollment (Count)`             0.00000                 0.00000                             
##                                           (0.00000)               (0.00000)                            
##                                                                                                        
## `Adult Psych Beds (Count)`               -0.00003***             -0.00002***                           
##                                           (0.00001)               (0.00000)                            
##                                                                                                        
## `Severe Housing Problems (Rate)`           0.0002**               0.0002***                            
##                                            (0.0001)               (0.00003)                            
##                                                                                                        
## `Forcible Sex (Count)`                     -0.00000                0.00000                             
##                                           (0.00000)               (0.00000)                            
##                                                                                                        
## `Foster Care (Count)`                     -0.00000**              -0.00000**                           
##                                           (0.00000)               (0.00000)                            
##                                                                                                        
## `Poverty (Rate)`                           -0.0003                -0.006***                            
##                                            (0.006)                 (0.002)                             
##                                                                                                        
## `Drug Arrests (Rate)`                       0.077                  0.107***                            
##                                            (0.069)                 (0.026)                             
##                                                                                                        
## `Sub Abuse Enrollment (Rate)`               0.064                   -0.053                             
##                                            (0.143)                 (0.061)                             
##                                                                                                        
## `Adult Psych Beds (Rate)`                   4.202*                 4.170***                            
##                                            (2.208)                 (0.928)                             
##                                                                                                        
## `Forcible Sex (Rate)`                       0.387                   -0.153                             
##                                            (0.924)                 (0.397)                             
##                                                                                                        
## `Foster Care (Rate)`                       0.696**                 0.429***                            
##                                            (0.300)                 (0.137)                             
##                                                                                                        
## Unemployment.Rate                                                                    0.00001           
##                                                                                     (0.00004)          
##                                                                                                        
## Incarceration..Rateper1000.                                                          0.00002           
##                                                                                     (0.0001)           
##                                                                                                        
## Relocated..Rate.                                                                    -0.00002           
##                                                                                     (0.00004)          
##                                                                                                        
## Poverty..Rate.                                                                        0.001            
##                                                                                      (0.003)           
##                                                                                                        
## Drug.Arrests..Rate.                                                                 0.073***  0.091*** 
##                                                                                      (0.027)   (0.021) 
##                                                                                                        
## Sub.Abuse.Enrollment..Rate.                                                          -0.015            
##                                                                                      (0.054)           
##                                                                                                        
## Adult.Psych.Beds..Rate.                                                             2.749***           
##                                                                                      (0.995)           
##                                                                                                        
## lag(Sub.Abuse.Enrollment..Rate., 1)                                                            -0.089  
##                                                                                                (0.062) 
##                                                                                                        
## Forcible.Sex..Rate.                                                                  -0.316    -0.328  
##                                                                                      (0.408)   (0.336) 
##                                                                                                        
## Foster.Care..Rate.                                                                    0.239     0.255  
##                                                                                      (0.177)   (0.159) 
##                                                                                                        
## Constant                                    -0.004                 -0.001*           0.0004    0.001** 
##                                            (0.003)                 (0.001)           (0.001)  (0.0004) 
##                                                                                                        
## -------------------------------------------------------------------------------------------------------
## Observations                                  67                     201               201       134   
## R2                                          0.719                   0.596             0.087     0.149  
## Adjusted R2                                 0.606                   0.556             0.044     0.123  
## Residual Std. Error                    0.001 (df = 47)         0.001 (df = 182)                        
## F Statistic                         6.338*** (df = 19; 47) 14.940*** (df = 18; 182) 18.150**  22.625***
## =======================================================================================================
## Note:                                                                       *p<0.1; **p<0.05; ***p<0.01
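  • Comparing the residual sum of squares (RSS) and R² of Fits 2, 3, and 4, I would select Fit 3 for inference. Fit 4 (with the lag) has a lower RSS, but it is estimated on 134 rather than 201 observations, so its RSS is not directly comparable. I also value the completeness of Fit 3 and believe its extra variables give a better picture of how stressors affect the homeless population in Florida.

Treating the data as a panel respects its structure: rather than 201 independent observations, the model sees 67 counties, each observed over 3 years. Compared with Fit 1 (67 observations), using all three years yields smaller standard errors. Compared with pooled OLS (Fit 2), which treats the 201 rows as independent and so likely understates its standard errors, the random effects model's standard errors are slightly larger (e.g., 0.027 vs. 0.026 for Drug Arrests (Rate)). R² is only considered in passing, as the goal of the study is inference, not prediction.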
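The R² values plm prints for Fits 3 and 4 are 1 − RSS/TSS on the quasi-demeaned data. The adjusted version applies the usual (n − 1)/(n − k − 1) penalty. The following sketch recomputes both from the printed sums of squares, with small differences coming from rounding in the printed TSS. It also divides RSS by the number of observations, since the two fits use different sample sizes.

```r
# Sums of squares copied from the plm summaries (transformed data)
tss <- c(fit3 = 0.00010218, fit4 = 2.8828e-05)
rss <- c(fit3 = 9.3309e-05, fit4 = 2.4526e-05)
n   <- c(fit3 = 201, fit4 = 134)
k   <- c(fit3 = 9, fit4 = 4)          # regressors, excluding the intercept

r2          <- 1 - rss / tss
adj_r2      <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
rss_per_obs <- rss / n                # puts RSS on a per-observation scale

signif(rbind(r2, adj_r2, rss_per_obs), 4)
```

Even per observation, Fit 4's RSS is lower. However, it is computed on a different set of rows (2019 and 2020 only), so it is not a like-for-like comparison with Fit 3.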

Research Question:

  • All of Zugazaga’s stressors had plausible signs in Fit 3, but only Drug Arrests (Rate) was significant at the 0.05 level, consistent with the hypothesis that at least one stressor would be. Statistical significance reflects how precisely this model estimates each effect, not the real-world importance of the stressors, all of which are circumstances that can contribute to homelessness.

  • The positive slope for Drug Arrests (Rate) indicates that higher rates of arrests for drug possession or sale in a county are associated with a higher homeless rate in that county.

    • This may reflect the availability of drugs in Florida counties, and how insufficient addiction treatment can contribute to other socioeconomic issues in a community

    • Criminalization does not solve the problem; it relocates it. Many returning citizens are likely to be caught in a cycle of drug abuse, incarceration, and homelessness.

Reflection

  • Conclusions
  • Improvements
On the Project Itself

R-Programming

  • Much of the tedious cleaning and read-in work could be streamlined with loops and functions

    • A fun winter project could be to “optimize” the file that created florida_1820.csv
  • Becoming an efficient coder requires not only a solid understanding of syntax but also the ability to troubleshoot errors using online resources

    • Familiarity comes with frequent use and searching for solutions via online forums, books, blogs…
  • It is important (and difficult) to write informative, readable code

    • Comment as though you’re guiding a stranger through the script.
  • Technical Skills: GitHub, R-Programming
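The read-in section describes downloading 10-year tables, keeping 2018 to 2020, and reshaping them with `pivot_longer`. Assuming those tables are wide, with a `County` column and one column per year, a small helper could replace the repeated per-variable steps. The helper `tidy_indicator()`, the table names, and the values below are all hypothetical. In practice, each wide table would come from `readxl::read_excel()`.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: keep 2018-2020 from a wide 10-year table and reshape it long
tidy_indicator <- function(wide_tbl, value_name, years = c("2018", "2019", "2020")) {
  wide_tbl %>%
    select(County, all_of(years)) %>%                      # drops the excess years
    pivot_longer(cols = all_of(years), names_to = "Year",
                 values_to = value_name) %>%
    mutate(Year = as.integer(Year))
}

# Hypothetical wide tables standing in for two downloaded indicators
poverty_wide <- tibble(County = c("A", "B"), `2017` = c(9, 8), `2018` = c(10, 7),
                       `2019` = c(11, 6), `2020` = c(12, 5))
arrests_wide <- tibble(County = c("A", "B"), `2017` = c(1, 2), `2018` = c(3, 4),
                       `2019` = c(5, 6), `2020` = c(7, 8))

# One call per indicator instead of one hand-edited script per variable
long_tables <- map2(list(poverty_wide, arrests_wide),
                    c("Poverty (Count)", "Drug Arrests (Count)"),
                    tidy_indicator)

# Merge every indicator on County and Year
florida_sketch <- reduce(long_tables, full_join, by = c("County", "Year"))
florida_sketch
```

Adding an indicator then means adding one table and one name to the two lists, rather than repeating the rename, reshape, and write steps by hand.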

Research

  • It was a very simple regression analysis; however, I feel I painted too broad a scope with my research question.

    • It was difficult to feel satisfied with the study, as the ambiguous question left much to be desired in the regression results.
  • Be thorough in explaining methods and assumptions when conducting research, and report the details

Was the Research Question Answered?
  • As hypothesized, the models found several stressors to be significant in predicting Homeless Rates across Florida

    • Drug Arrests (Rate) is significant in Fits 2, 3, and 4, which allows us to reject H0: No stressors are significant in predicting Homeless (Rate).
  • Unfortunately, the study cannot say much about which stressors most increase vulnerability to homelessness, because that would require comparing the magnitudes of their effects. To do this, deeper demographic variables would need to be included, along with better controls for stressors acting as a push factor in homeless migration.

    • We would also need to incorporate variables that aren’t strictly associated with homelessness (loss of a loved one, severe bodily injury, chronic illness…) for an improved comparison.
  • Because of limitations in the data and the broad scope of the research question, the study cannot make new claims about the status of homelessness in Florida; instead, it supports the continued relevance of Zugazaga’s stressors to homeless life nearly two decades later.
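The rejection of H0 rests on two kinds of tests reported by plm. The first is a z test for each coefficient (estimate divided by standard error). The second is a joint Wald chi-square test that all nine slopes are zero; those slopes include non-Zugazaga controls such as Adult Psych Beds (Rate). The following minimal base-R check uses numbers copied from the Fit 3 summary.

```r
# Individual z test for Drug Arrests (Rate) in Fit 3
est <- 7.2622e-02
se  <- 2.7264e-02
z   <- est / se
p_individual <- 2 * pnorm(-abs(z))          # two-sided p-value

# Joint Wald chi-square test of H0: all 9 slopes in Fit 3 are zero
p_joint <- pchisq(18.1495, df = 9, lower.tail = FALSE)

round(c(z = z, p_individual = p_individual, p_joint = p_joint), 4)
```

Both p-values fall below 0.05. Only the individual test singles out a particular stressor, however. The joint test says only that at least one slope is non-zero.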

Prediction vs Inference
  • The goal of this brief study was to make inferences regarding stressors’ impact on Homelessness (Rate) in Florida.

  • If prediction were our focus, I would use new 2021 data from FL Charts, withholding the Homeless (Rate) column, to test the efficacy of Fit 2 as a predictive tool.
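That check would amount to fitting on the old years, calling `predict()` on the new rows, and scoring the predictions against the withheld rates. The following sketch uses simulated stand-in data with a single hypothetical predictor `x`, not Fit 2's variables. The helper `make_data()` and its parameters are made up for illustration.

```r
# Simulated stand-in data: "train" plays 2018-2020, "test" plays new 2021 rows
set.seed(601)
make_data <- function(n) {                 # hypothetical data generator
  x <- runif(n, 0, 0.01)
  data.frame(x = x, y = 0.001 + 0.1 * x + rnorm(n, sd = 0.0005))
}
train <- make_data(201)
test  <- make_data(67)

fit <- lm(y ~ x, data = train)

# Predict the withheld outcome, then compare with what was actually observed
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))      # root mean squared prediction error
rmse
```

The RMSE is measured on rows the model never saw, so unlike in-sample R², it reflects predictive performance directly.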

While the data give a quick illustration of homelessness in Florida by county, improvements could be made to both the data collection and the research question itself to further the study.

Data
  • Unfortunately, FL Health Charts did not provide a demographic breakdown of the homeless population (Age, Sex, Race). Such a breakdown would drastically widen the scope of the analysis and lead to far more interesting conclusions.

  • We only have data for a three-year period, which is too short to make a strong statement about the impact of homeless policy on Florida counties or about how the relevance of certain stressors has changed over time. For a more in-depth study, I would begin with a 10-year range.

Research Question
  • Demographic breakdown of stressors’ impact (Age, Sex, Race)

  • Modernize Zugazaga’s interviews to adjust variables for homelessness in 2022

    • conduct interviews with groups of single men, single women, and women with children
  • Reduce noise: after controlling for county, focus on one or two stressors and their accompanying variables to view their impact on a population, instead of viewing a broad range of stressors as a whole

  • Include life stressors not associated with homelessness for comparison

  • Extend the question to the entire country, providing a breakdown by state

  • Compare to foreign countries to contrast governments’ approaches to homelessness and leading causes of homelessness around the world.

References

Chang, W. (2022). R Graphics Cookbook, 2nd Edition. O’Reilly Media.

Grolemund, G., & Wickham, H. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media.

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org

Wickham, H. (2019). Advanced R (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781351201315

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org

Zugazaga, C. (2004). Stressful life event experiences of homeless adults: A comparison of single men, single women, and women with children. J. Community Psychol., 32: 643-654. https://doi.org/10.1002/jcop.20025

Codebook

  • County - Florida county (67 total), divided into Region - Northwest Florida, Northeast Florida, Central Florida, and South Florida for visualizations

  • Population - Yearly population count for county, used as the denominator of all rate variables unless specified.

  • Year - Years 2018, 2019, 2020 included in this study

  • Homeless (Rate) - Yearly homeless count of a county divided by county population

  • Unemployment Rate - The ratio of unemployed to the civilian labor force, expressed as a percent

  • Median Inc - Median household income is the amount which divides the income distribution into two equal groups

  • Incarceration Rate per 1000 - Number of incarcerated people per 1000 (within county)

  • Poverty Rate - Number of people living below poverty line divided by population

  • Drug Arrests (Rate) - Arrests attributed to possession or sale of illegal drugs divided by population

  • Relocated (Rate) - The number of people over age 1 who lived in a different county the previous year

  • Sub Abuse Enrollment (Rate) - The number of beds, indicating the number of adults (age 18 and over) who may receive substance abuse treatment on an in-patient basis, divided by population

  • Adult Psych Beds (Rate) - When adults in psychiatric distress are uninsured, charged with crimes, or meet state criteria for civil commitment because they are violent/dangerous to themselves or others, psychiatric beds are where they are admitted for treatment. The number of beds indicates the number of people who may potentially receive adult (age 18 and over) psychiatric care on an in-patient basis. Divided by population

  • Severe Housing Problems (Rate) - The percentage of households with at least one of the following housing problems: lack of kitchen facilities; lack of plumbing facilities; more than 1.5 persons per room; severe cost burden (monthly housing costs, including utilities, exceed 50% of monthly income).

  • Forcible Sex (Rate) - Any sexual act or attempt involving force is classified as a forcible sex offense regardless of the age of the victim or the relationship of the victim to the offender, divided by population

  • Foster Care (Rate) - Foster care provides a safe and stable environment for children when they cannot be with their parents for some reason, divided by population
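Because Population is the denominator of every rate, the count-to-rate step can be written once instead of once per variable. The following sketch shows one alternative using `dplyr::across()` and `rename_with()`. The rows below are hypothetical, and only the column naming scheme matches `florida_og`. It is not the code used to build `florida_og_rates`.

```r
library(dplyr)

# Hypothetical rows with the same "(Count)" naming scheme as florida_og
counts <- tibble(
  County = c("A", "B"),
  Population = c(1000, 5000),
  `Homeless (Count)` = c(5, 10),
  `Poverty (Count)` = c(150, 400)
)

count_cols <- c("Homeless (Count)", "Poverty (Count)")

rates <- counts %>%
  # divide every count column by Population, writing temporary "__rate" columns
  mutate(across(all_of(count_cols), ~ .x / Population, .names = "{.col}__rate")) %>%
  # turn "Homeless (Count)__rate" into "Homeless (Rate)", and so on
  rename_with(~ sub(" (Count)__rate", " (Rate)", .x, fixed = TRUE), ends_with("__rate"))

rates
```

Adding a new rate then only requires adding its count column to `count_cols`.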

Footnotes

1.) Homeless Definition

2.) US Interagency Council on Homelessness

3.) Explanation of variables and collection method in Codebook tab

Source Code
---
title: "DACSS 601: Florida Homelessness"
subtitle: "Quantitative Review by County"
author: "Dane Shelton"
description: "Data Exploration, Visualizations, Analysis"
date: "12/18/2022"
format:
  html:
    callout-appearance: "simple"
    callout-icon: FALSE
    df-print: paged
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - finalproject
  - shelton
  - homelessness
---

```{r}
#| label: setup
#| include: false
#| warning: false

library(tidyverse)
library(stargazer)
library(ggrepel)
library(plotly)
library(ggplot2)
library(GGally)
library(ggfortify)
library(flexmix)
library(plm)

knitr::opts_chunk$set(echo = FALSE, warning=FALSE, message = FALSE)
```

## Homelessness in Florida

`florida_1820.csv` contains population figures, homelessness counts, poverty counts and other demographic indicators^3^ at the county level from 2018 to 2020. All 67 Florida counties have data for the 3 years, resulting in 201 observations of 15 variables. Each observation provides a count (or rate) for all variables from a single county for a year. The variables selected closely mirror the stressors mentioned in the reference study (Zugazaga), along with supplementary variables in an attempt to completely capture the circumstances.

Homelessness is a complex living situation with several qualifying conditions; at its simplest state, the U.S Dept. of Housing and Urban Development defines it as **lacking a fixed, regular nighttime residence (not a shelter) or having a nighttime residence not designed for human accommodation**^1^.

On a single night in 2020, over 500,000^2^ people experienced homelessness in the United States. Florida - the third largest state by population - had the fourth largest homeless population of 2020 with 27,487^2^.

Florida counties represent a wide range of demographic profiles; the state is a hub to a variety of industries including tourism, defense, agriculture, and information technology. Investigating homelessness in Florida counties with robust data can lead to several conclusions about *who* is being impacted *where*, and how state policy is failing (or aiding) groups of a diverse population.

## Introduction

::: panel-tabset
## Research Question

Carole Zugazaga's 2004 study of 54 single homeless men, 54 single homeless women, and 54 homeless women with children in the Central Florida area investigated stressful life events common among homeless people. Interview responses revealed that women were more likely to have been sexually or physically assaulted, while men were more likely to have been incarcerated or to abuse drugs/alcohol. Homeless women with children were more likely to have been in foster care as youths.

More than a decade later, county-level data can be used to investigate the relationship between Zugazaga's reported stressful life events (incarceration, drug arrests, poverty, forcible sex, foster care)^3^ and homelessness rates.

::: callout-note
## Research Question

Do particular life stressors increase a population's vulnerability to homelessness?
:::

## Hypothesis

Homelessness is not a new issue in the United States, yet homeless policy targets elimination via criminalization rather than prevention. A 2019 article from [Homeless Voice](https://homelessvoice.org/the-policies-and-laws-of-florida-cities/) provides a brief description of homeless policy in major cities across Florida. Although state and federal governments have been aware for decades of the circumstances that increase vulnerability to homelessness, I anticipate that at least one of Zugazaga's five stressors will remain significant in a model relating stressors to Florida homelessness counts from 2018 to 2020.

::: callout-note
## Research Hypothesis

**H~0~:** All stressors are insignificant in predicting homelessness counts **(** B~i~ = 0 for i = 1, 2, ..., n **)**

**H~A~:** At least one stressor **B~i~** is significant in predicting homelessness counts
:::

```{r}
#| label: loading florida_1820
#| include: false

# This data was cleaned and formatted to a tidy .csv in another .qmd file, the manipulations were messy and probably inefficient (brute force); can upload if needed

florida_og <- readr::read_csv('_data/florida_1820.csv', show_col_types = FALSE)%>%
                      rename('Adult Psych Beds (Count)' = 'Adult Pysch Beds (Count)')

```

## Data Read-in and Tidying

The data were collected from the [Florida Department of Health](https://www.flhealthcharts.gov/charts/default.aspx). Variable names^3^ were used as search indicators to produce counts for Florida counties. Unfortunately, we cannot accurately analyze the effect of COVID-19 as data is incomplete for the majority of counties in 2021.

Read-in and tidying were done in a separate file. The process began as laborious but was lightened by the discovery of 10-year tables. Before this discovery, I would enter a variable name into the search bar of [Florida Department of Health](https://www.flhealthcharts.gov/charts/default.aspx), select 2018, and download the .xlsx file to my personal data folder. `readxl` was used to bring in the file, and `mutate` created a year column filled with 2018. Once the data was appropriate, I saved it in the environment under `variable2018`. This was repeated for years 2019 and 2020. Then all three tibbles were combined with `full_join` by `County` to provide a dataset with 201 observations (67 counties by 3 years) and three variables: `County`, `Year`, and the variable measurement itself. This was written as a .csv back to the same personal data folder.

![](images/screenshot2-01.png)
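Because each yearly tibble has the same three columns (`County`, `Year`, and the measurement), stacking them row-wise is one way to reach the 201-row layout. A minimal sketch with dplyr's `bind_rows()`, assuming hypothetical tibbles `poverty2018`, `poverty2019`, and `poverty2020` with made-up values for two counties (this is not the author's code, which is not shown):

```r
library(dplyr)

# Hypothetical yearly tibbles for two counties (values are made up);
# each has the same three columns: County, Year, and the measurement
poverty2018 <- tibble(County = c("Alachua", "Baker"), Year = 2018, Poverty = c(100, 20))
poverty2019 <- tibble(County = c("Alachua", "Baker"), Year = 2019, Poverty = c(110, 22))
poverty2020 <- tibble(County = c("Alachua", "Baker"), Year = 2020, Poverty = c(105, 21))

# Stack the years row-wise: 2 counties x 3 years = 6 rows, same 3 columns
poverty_long <- bind_rows(poverty2018, poverty2019, poverty2020) %>%
  arrange(County, Year)

poverty_long
```

With all 67 counties, the same call yields 67 × 3 = 201 rows. By contrast, `full_join()` of two such tibbles with `by = "County"` alone matches each county's rows across years and places them side by side, with the duplicated columns suffixed `.x` and `.y`.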

After I learned that I could download 10-year tables from the website instead of three individual .xlsx files, the process became smoother. I would then rename the unneeded year columns to "delete" and drop them, leaving measurements for only 2018, 2019, and 2020. `pivot_longer` then moved the year names into a `Year` column and the values into a column I named after the variable. This was written as a .csv.

![](images/screenshot3.png)
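The reshaping step can be sketched with tidyr's `pivot_longer()`. This is a minimal illustration on a hypothetical wide table with made-up values; it uses `select()` to keep the study years rather than the renaming approach described above:

```r
library(dplyr)
library(tidyr)

# Hypothetical 10-year-style table: one row per county, one column per year
# (only five year columns shown; values are made up)
wide <- tibble(
  County = c("Alachua", "Baker"),
  `2016` = c(1, 2), `2017` = c(3, 4),
  `2018` = c(5, 6), `2019` = c(7, 8), `2020` = c(9, 10)
)

long <- wide %>%
  # Keep only the study years
  select(County, `2018`, `2019`, `2020`) %>%
  # Move year names into a Year column and values into a named measurement column
  pivot_longer(cols = -County,
               names_to = "Year",
               values_to = "Unemployment Rate",
               names_transform = list(Year = as.integer))

long
```

`names_transform` converts the year labels from character to integer so that `Year` sorts and joins numerically.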

I then merged all the tables and wrote the result to a .csv, `florida_full`. As a sanity check, I counted distinct county names to confirm the expected 201 observations of 15 variables.

![](images/screenshot5.png)

![](images/screenshot6-01.png)
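Merging many per-variable tables and checking the result can also be done in one pass. A minimal sketch using `purrr::reduce()` with `full_join()` on both keys, assuming hypothetical long tables for two counties with made-up values:

```r
library(dplyr)
library(purrr)

# Hypothetical long tables, one per variable, keyed by County and Year (values made up)
keys <- tibble(County = rep(c("Alachua", "Baker"), each = 3),
               Year = rep(2018:2020, times = 2))
pop      <- mutate(keys, Population = c(270000, 272000, 275000, 28000, 28000, 29000))
poverty  <- mutate(keys, `Poverty (Count)` = c(60000, 58000, 57000, 5000, 5000, 4000))
homeless <- mutate(keys, `Homeless (Count)` = c(800, 700, 900, 10, 0, 10))

# full_join every table on the shared keys, folding left to right
florida_full <- reduce(list(pop, poverty, homeless), full_join, by = c("County", "Year"))

# Sanity checks: one row per county-year, expected columns, no unmatched (NA) cells
n_counties <- n_distinct(florida_full$County)
stopifnot(nrow(florida_full) == n_counties * 3,
          ncol(florida_full) == 5,
          !anyNA(florida_full))

florida_full
```

For the real data, the same checks would expect 67 counties, 201 rows, and 15 columns.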

::: {.callout-note collapse="true"}
## Intro to Data **Tables**

```{r}
#| label: EDA
#| output: TRUE

head(florida_og)

summary(florida_og)

# Convert counts to per-capita rates (count / Population); Population itself stays a count
#Surely there's a better way to do this!

florida_og_rates <- florida_og %>%
                      mutate('Homeless (Rate)' = `Homeless (Count)`/`Population`,
                             'Poverty (Rate)' = `Poverty (Count)`/`Population`,
                             'Drug Arrests (Rate)' = `Drug Arrests (Count)`/`Population`,
                             'Sub Abuse Enrollment (Rate)' = 
                               `Sub Abuse Enrollment (Count)`/`Population`,
                             'Adult Psych Beds (Rate)' = 
                               `Adult Psych Beds (Count)`/`Population`,
                             'Forcible Sex (Rate)' =  `Forcible Sex (Count)`/`Population`,
                             'Foster Care (Rate)' =  `Foster Care (Count)`/`Population`)


                      
                              
florida_county <- florida_og %>%
  group_by(County)

# County-level averages over 2018-2020, largest counties first
florida_county %>%
  summarize('Mean Population' = mean(Population),
            'Mean Homeless' = mean(`Homeless (Count)`),
            'Avg Homeless Rate' = mean(`Homeless (Count)`) / mean(Population),
            'Avg Median Income' = mean(`Median Inc`),
            'Mean Poverty' = mean(`Poverty (Count)`),
            'Avg Poverty Rate' = mean(`Poverty (Count)`) / mean(Population),
            'Avg Incarceration Rate (per 1000)' = mean(`Incarceration (Rateper1000)`)) %>%
  arrange(desc(`Mean Population`), desc(`Mean Homeless`), desc(`Avg Median Income`)) %>%
  # Round the population, homeless, income, and poverty means (columns 2-3 and 5-6)
  mutate(across(c(2:3, 5:6), ~ round(.x, 0)))

```
:::

Expanding **Intro to Data** shows summary statistics (minimum, quartiles, median, mean, and maximum) for all 15 variables. The table below the summaries lists basic parameters of interest averaged by county and sorted by population. The data were nearly tidy, with complete observations for every variable except one: `Severe Housing Problems (Rate)`, which was not recorded for 2019 and 2020. These gaps are later filled with each county's 2018 value for the regression analysis.
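The count-to-rate conversion can be written once instead of once per variable. A minimal sketch with dplyr's `across()` on a hypothetical two-county tibble with made-up values; it assumes every count column's name ends in "(Count)", and builds each new name by swapping "(Count)" for "(Rate)":

```r
library(dplyr)

# Hypothetical county-year rows with a population and two raw counts (values made up)
counts <- tibble(
  County = c("Alachua", "Baker"),
  Population = c(200000, 25000),
  `Homeless (Count)` = c(800, 50),
  `Poverty (Count)` = c(40000, 5000)
)

# Divide every "(Count)" column by Population; .names is a glue spec,
# so the new name is computed from the original column name (.col)
rates <- counts %>%
  mutate(across(ends_with("(Count)"),
                ~ .x / Population,
                .names = "{sub('(Count)', '(Rate)', .col, fixed = TRUE)}"))

rates
```

`Population` does not end in "(Count)", so it is left unchanged, and the original count columns are kept alongside the new rates.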
:::

## Visualizations and Analysis

::: panel-tabset
## Visualization 

`ggplot2` is used to visualize important relationships between homelessness and Zugazaga's stressors. The Florida counties have been categorized into four regions (`Region`) and three income levels (`Income Level`):

::: callout-note
## `Region`

-   **Northwest**: Escambia County to Madison County; cities include Pensacola, Panama City Beach, and Tallahassee

-   **North**: Hamilton County to Marion County; cities include Jacksonville, Gainesville, Ocala, and St. Augustine

-   **Central**: Lake County to Okeechobee County; cities include Orlando, Kissimmee, Tampa, and St. Petersburg

-   **South**: Sarasota County to Miami-Dade County; cities include Ft. Lauderdale, Ft. Myers, Miami, Boca Raton, and West Palm Beach
:::

![](images/Florida%20Map%20Chart.png)

::: callout-note
## `Income Level`

-   **High**: Median Income \>= 60000

-   **Medium**: 40000 \<= Median Income \< 60000

-   **Low**: Median Income \< 40000
:::
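The same three bands can be produced with base R's `cut()`, which returns a factor whose levels are already ordered Low, Medium, High. A minimal sketch on made-up incomes, including values on the band edges (the document itself uses `case_when()`):

```r
# Hypothetical median incomes (made up), including values on the band edges
median_inc <- c(35000, 40000, 59999, 60000, 72000)

# right = FALSE closes each band on the left: [40000, 60000) is "Medium"
income_level <- cut(median_inc,
                    breaks = c(-Inf, 40000, 60000, Inf),
                    labels = c("Low", "Medium", "High"),
                    right = FALSE)

income_level
```

Because the factor levels come out in band order, no separate releveling step is needed before plotting.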

```{r}
#| label: categorize into regions
#| echo: true
#| collapse: true


# Assign each county to one of four regions (67 counties in total).
# `%in%` replaces long chains of `County == '...' |` comparisons;
# a county missing from every vector (e.g., a spelling mismatch) gets NA.
northwest <- c('Escambia', 'Santa Rosa', 'Okaloosa', 'Walton', 'Holmes',
               'Washington', 'Bay', 'Jackson', 'Calhoun', 'Gulf', 'Gadsden',
               'Liberty', 'Leon', 'Wakulla', 'Franklin', 'Jefferson',
               'Madison', 'Taylor')
north     <- c('Hamilton', 'Suwannee', 'Lafayette', 'Dixie', 'Gilchrist',
               'Union', 'Baker', 'Columbia', 'Nassau', 'Levy', 'Bradford',
               'Alachua', 'Duval', 'Putnam', 'Marion', 'Volusia', 'Flagler',
               'Citrus', 'Clay', 'St. Johns')
central   <- c('Lake', 'Sumter', 'Seminole', 'Orange', 'Hernando', 'Pasco',
               'Brevard', 'Indian River', 'Pinellas', 'Hillsborough', 'Polk',
               'Osceola', 'Hardee', 'Manatee', 'Okeechobee', 'Highlands')
south     <- c('St. Lucie', 'Sarasota', 'Martin', 'Palm Beach', 'Collier',
               'Broward', 'Lee', 'DeSoto', 'Charlotte', 'Hendry', 'Monroe',
               'Miami-Dade', 'Glades')

florida_og_plot <- florida_og_rates %>%
  mutate('Region' = case_when(County %in% northwest ~ 'Northwest',
                              County %in% north     ~ 'North',
                              County %in% central   ~ 'Central',
                              County %in% south     ~ 'South'))

# Categorize by median income (case_when checks the conditions in order)
florida_og_plot <- florida_og_plot %>%
  mutate('Income Level' = case_when(`Median Inc` >= 60000 ~ 'High',
                                    `Median Inc` >= 40000 ~ 'Medium',
                                    `Median Inc` <  40000 ~ 'Low'))

```

::: {.callout-note collapse="true"}
## a) Interactive Summary

```{r}
#| label: plot1 - Interactive
#| echo: true
#| collapse: true
#| output: true

# Plot 1
# Average population and homeless count per county over 2018-2020
florida_int <- florida_og_plot %>%
  group_by(County, Region) %>%
  summarize('Homeless' = mean(`Homeless (Count)`),
            'Population' = mean(Population),
            .groups = 'drop') %>%
  # Sort before setting factor levels so the plot keeps this ordering;
  # the data must be ungrouped here, otherwise factor() runs once per county
  arrange(desc(Population)) %>%
  mutate(County = factor(County, levels = County)) %>%
  # Text shown in the hover tooltip
  mutate(text = paste0("County: ", County,
                       "\nRegion: ", Region,
                       "\nPopulation: ", round(Population),
                       "\nHomeless: ", round(Homeless)))

# Bubble chart: point size and x position both encode population
florida_interactive <- ggplot(data = florida_int,
                              aes(x = Population, y = Homeless, size = Population,
                                  color = Region, text = text)) +
  geom_point(alpha = 0.7) +
  scale_size(range = c(1.0, 10.0), name = "")

# Convert to an interactive plotly widget that shows only the custom tooltip
florida_interactive_full <- ggplotly(florida_interactive, tooltip = "text")

florida_interactive_full
```

- Use the interactive plot to gain an idea of the size of counties in Florida and their homeless counts.

- Hover over a point to see the details, and double-click an item in the legend to isolate the region.

:::

::: {.callout-note collapse="true"}
## b) Homeless Rate by Region

```{r}
#| label: plot2 - boxplot by region
#| echo: true
#| collapse: true
#| output: true

# Plot 2
florida_box <- florida_og_plot %>%
  mutate(`Homeless (Rate)` = `Homeless (Rate)` * 100) %>%
  mutate(`Region` = fct_relevel(`Region`, "Northwest", "North", "Central", "South")) %>%
  ggplot(aes(y = `Homeless (Rate)`, x = `Region`, fill = `Region`)) +
  geom_boxplot(alpha = 0.7) +
  # Breaks on the homeless-rate axis
  scale_y_continuous(breaks = seq(0, 1.5, by = 0.25)) +
  # Flip to horizontal boxes and zoom to 0-1.5%; a separate coord_cartesian()
  # would be replaced by coord_flip(), so the limits go inside coord_flip()
  coord_flip(ylim = c(0, 1.5)) +
  scale_fill_brewer(palette = 'Set2') +
  theme_grey() +
  theme(legend.position = "none") +
  labs(title = "Florida Homeless Rates",
       subtitle = "2018-2020",
       x = " ",
       y = "Homeless Rate (%)",
       caption = "Visualized by Region")

florida_box
```

-   Comparing the distributions of homeless rates across Florida counties shows *where* the state's highest rates occur.

-   The state is generally uniform, with the bulk of each region's distribution sitting below 0.25% of county populations.

    -   The largest difference is between the distributions of Northwest Florida and South Florida, which likely reflects the stark contrast in where residents of these regions live.

    -   South Florida is the most urbanized region in the state, with millions living in Miami, Ft. Lauderdale, and West Palm Beach; Northwest Florida is quite the opposite, with small coastal towns and rural inland towns reminiscent of southern Alabama or Georgia.
:::

::: {.callout-note collapse="true"}
## c) Homeless Rate by Income Level

```{r}
#| label: plot3 - homeless x income
#| echo: true
#| collapse: true
#| output: true


# Plot 3
# Relevel income, convert the response to a percentage
florida_income <- florida_og_plot %>%
  mutate(`Homeless (Rate)` = `Homeless (Rate)` * 100) %>%
  mutate(`Income Level` = fct_relevel(`Income Level`, "Low", "Medium", "High")) %>%
  ggplot(aes(y = `Homeless (Rate)`, x = `Region`, fill = `Income Level`)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  scale_fill_brewer(palette = 'Set2') +
  theme_grey() +
  labs(title = "Barplot: Income Level x Homeless Rate",
       subtitle = "2018-2020",
       x = "Region",
       y = "Homeless Rate (%)",
       caption = "Visualized by Income Level")

florida_income

```

-   The barplot adds detail to the differences seen in `Plot 2`. In less urbanized regions, Low Income counties show higher homeless rates, as one might expect.

-   In South Florida, wealth disparities in urbanized areas appear to break this trend: High Income counties also report high homeless rates.
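With `stat = 'identity'`, `geom_bar()` draws one bar per county-year, and bars that share a Region and Income Level are dodged to the same position and drawn on top of one another, so each visible bar shows the largest value in its group. A minimal sketch that summarizes first and plots the group mean instead, using made-up rates (the mean is a choice made here for illustration; a median would work the same way):

```r
library(dplyr)
library(ggplot2)

# Hypothetical county-year homeless rates in percent (values are made up)
rates <- tibble(
  Region = c("North", "North", "North", "South", "South", "South"),
  `Income Level` = c("Low", "Low", "High", "Low", "High", "High"),
  `Homeless (Rate)` = c(0.10, 0.30, 0.05, 0.08, 0.20, 0.40)
)

# Collapse to one value per Region x Income Level before plotting
rate_means <- rates %>%
  group_by(Region, `Income Level`) %>%
  summarize(`Mean Homeless (Rate)` = mean(`Homeless (Rate)`), .groups = "drop")

# geom_col() draws exactly one bar per row, dodged by income level
p <- ggplot(rate_means,
            aes(x = Region, y = `Mean Homeless (Rate)`, fill = `Income Level`)) +
  geom_col(position = position_dodge(preserve = "single")) +
  scale_fill_brewer(palette = "Set2") +
  labs(y = "Mean Homeless Rate (%)")

p
```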
:::

::: {.callout-note collapse="true"}
## d) Homeless Rate x Incarceration Rate per1000

```{r}
#| label: plot4 - scatter incarceration x homeless rate
#| echo: true
#| collapse: true
#| output: true

# Plot 4
florida_incarc <- florida_og_plot %>%
  # Convert to percentages: rate x 100, and a per-1000 rate / 10 = per-100 rate
  mutate(`Homeless (Rate)` = `Homeless (Rate)` * 100,
         `Incarceration (Rateper1000)` = `Incarceration (Rateper1000)` / 10) %>%
  ggplot(aes(y = `Homeless (Rate)`,
             x = `Incarceration (Rateper1000)`,
             color = `Region`,
             label = `County`)) +
  geom_point(alpha = 0.7) +
  # Label points with county names; ggrepel moves labels to limit overlap
  ggrepel::geom_text_repel(show.legend = FALSE,
                           max.overlaps = 15,
                           alpha = 0.7,
                           size = 2.5,
                           nudge_x = -.05,
                           nudge_y = .05) +
  # Points use the color aesthetic, so the palette must be a color scale
  scale_color_brewer(palette = 'Set2') +
  theme_grey() +
  # Titles, axes, caption
  labs(title = "Scatterplot: Incarceration x Homeless Rate",
       subtitle = "2018-2020",
       x = "Incarceration Rate (per 100)",
       y = "Homeless Rate (%)",
       caption = "Visualized by Region")

florida_incarc

```

-   Incarceration rate, one of Zugazaga's stressors for men, is plotted against the homeless rate; counties with higher incarceration rates tend to have higher homeless rates.

    -   Many state correctional facilities are located in rural counties, which may explain the large influence of the **Baker** and **Monroe** observations on the plot.
:::

::: {.callout-note collapse="true"}
## e) Homeless Rate x Drug Arrest Rate

```{r}
#| label: plot5 - drug arrest rate x homeless rate
#| echo: true
#| collapse: true
#| output: true

# Plot 5
florida_drug <- florida_og_plot %>%
  # Convert both rates to percentages
  mutate(`Homeless (Rate)` = `Homeless (Rate)` * 100,
         `Drug Arrests (Rate)` = `Drug Arrests (Rate)` * 100) %>%
  ggplot(aes(y = `Homeless (Rate)`,
             x = `Drug Arrests (Rate)`,
             color = `Region`,
             label = `County`)) +
  geom_point(alpha = 0.7) +
  # Label points with county names; ggrepel moves labels to limit overlap
  ggrepel::geom_text_repel(show.legend = FALSE,
                           max.overlaps = 15,
                           alpha = 0.7,
                           size = 2.5,
                           nudge_x = -.05,
                           nudge_y = .05) +
  # Points use the color aesthetic, so the palette must be a color scale
  scale_color_brewer(palette = 'Set2') +
  theme_grey() +
  labs(title = "Scatterplot: Drug Arrest Rate x Homeless Rate",
       subtitle = "2018-2020",
       x = "Drug Arrest Rate (%)",
       y = "Homeless Rate (%)",
       caption = "Visualized by Region")

florida_drug

```

-   A similar positive association appears between a county's homeless rate and its `Drug Arrests (Rate)`.

    -   The influence of Northwest Florida counties may reflect stricter drug enforcement by police in less urbanized areas.
:::

## Regression, Diagnostics, and Model Selection

### on Assumption of Validity

Although more than 10 variables are used to predict `Homeless (Rate)` across Florida counties, there are still limits on what can be said about the magnitude of any individual stressor. Stressors influence homelessness by driving people in severe situations *out* of their homes or *away* from their places of origin. `Homeless (Rate)` is therefore an imperfect measure of magnitude: homeless people who migrate to escape or avoid certain stressors would raise the homeless population of counties with *low* stressor values, and the following models do not account for this effect.

-   The variable `Relocated (Rate)` is included in an attempt to control for new movement; however, it does not fully capture county-to-county migration.

-   FL Charts has data on the [Population Who Lived in a Different County One Year Earlier](https://www.flhealthcharts.gov/ChartsDashboards/rdPage.aspx?rdReport=NonVitalIndRateOnly.TenYrsRpt&cid=9759); however, that series spans only 2009-2014, and values recorded at least four years before the study period are not suitable either.

-   The most appropriate source for county-to-county migration is the US Census Bureau's [county-to-county migration flows](https://www.census.gov/data/tables/2019/demo/geographic-mobility/county-to-county-migration-2015-2019.html). The `-In, -Out, -Net...` spreadsheet gives, for every US county, totals of movement to and from all other US counties; unfortunately, this data is too complex to wrangle into the simple `florida_1820.csv` data set.

### on Assumption of Linearity

```{r}
#| label: fit1 scatter
#| output: true
#| echo: true

# Scatterplot matrix of the response against predictors not covered by the
# literature, to check that a linear relationship is reasonable.
# Note: contains("Count") also drops `County`, since its name contains "Count".
florida_og_rates %>%
  select(-c(contains("Count"),
            'Year',
            'Poverty (Rate)',
            'Severe Housing Problems (Rate)',
            'Incarceration (Rateper1000)',
            'Sub Abuse Enrollment (Rate)',
            'Drug Arrests (Rate)',
            'Adult Psych Beds (Rate)',
            'Foster Care (Rate)',
            'Forcible Sex (Rate)')) %>%
  pairs()
```

This scatterplot matrix covers predictors whose relationship to homelessness was not examined in Zugazaga's study, or that needed further investigation, to check for linearity with the response, `Homeless (Rate)`. In the bottom row, the associations are weak, but a linear approximation appears reasonable.
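Pearson correlations put a number on "weak" in the bottom row. A minimal sketch with base R's `cor()` on simulated data: the column names mirror the document's, but the values are random and are not Florida data.

```r
# Simulated stand-ins for columns kept in the pairs() plot (not real data)
set.seed(601)
n <- 60
toy <- data.frame(
  `Unemployment Rate` = runif(n, 2, 8),
  `Relocated (Rate)`  = runif(n, 0.01, 0.05),
  check.names = FALSE
)
toy$`Homeless (Rate)` <- 0.0005 * toy$`Unemployment Rate` + rnorm(n, sd = 0.002)

# Pearson correlation of every column with the response (the last column);
# pairwise.complete.obs tolerates missing values such as unrecorded rates
cor_with_response <- cor(toy, use = "pairwise.complete.obs")[, "Homeless (Rate)"]
round(cor_with_response, 2)
```

Correlation only measures the strength of a *linear* association, so it complements rather than replaces the visual check for curvature in `pairs()`.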

### Linear Regression Models

::: {.callout-note collapse="true"}
## `Fit 1`: All Variables (No Transformations)

```{r}
#| label: fit1 - all variables
#| echo: true
#| output: true

# A linear relationship looks reasonable for all predictors;
# a log transformation of `Unemployment Rate` could be worth trying

# Fit 1: OLS predicting homeless rate from all other variables, no transformations.
# County is dropped because, as a factor, it would add 66 dummy variables;
# grouping counties into regions (NWFL, NFL, CFL, SWFL, SOFLO) is a possible improvement
fit1 <- florida_og_rates %>%
  select(-'County') %>%
  lm(formula = `Homeless (Rate)` ~ .)

summary(fit1)

# Residual sum of squares
rss1 <- deviance(fit1)
print(c('RSS Fit 1', rss1))
```

-   The first model predicts `Homeless (Rate)` from all variables, without any transformations or interactions. Because `lm()` drops rows with missing values, the 134 observations lacking `Severe Housing Problems (Rate)` (all 2019 and 2020 rows) are removed, leaving the 67 observations from 2018.

-   Only one variable, `Severe Housing Problems (Rate)`, is significant at `alpha = 0.05`; variables without a star in the output are not statistically significant predictors of `Homeless (Rate)` in this model.

-   The coefficient on `Relocated (Rate)` is negative, suggesting that migration may 'help reduce' a county's homeless rate, as anticipated in *on Assumption of Validity* (above).

-   The signs and magnitudes of the estimated (non-significant) coefficients seem plausible: higher values of variables like `Drug Arrests (Rate)` or `Sub Abuse Enrollment (Rate)` are associated with a substantially higher `Homeless (Rate)`.

    -   `Sub Abuse Enrollment (Rate)` can be interpreted here as an indication of how many people in the area are struggling with addiction or substance abuse, rather than as a suggestion that substance abuse programs increase homelessness.
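Dropped rows are easy to miss because `lm()` removes them silently. A minimal sketch on made-up data showing how to confirm how many rows a fit actually used, relying on the default `na.action` (`na.omit`):

```r
# Hypothetical data: one predictor is missing for two of six rows (values made up)
toy <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(0.5, NA, 1.4, NA, 2.6, 3.1)
)

# With the default na.action (na.omit), incomplete rows are dropped before fitting
fit <- lm(y ~ x1 + x2, data = toy)

nobs(fit)                  # rows actually used: 4
sum(complete.cases(toy))   # rows with no missing values: 4
nrow(toy) - nobs(fit)      # rows silently removed: 2
```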
:::

::: {.callout-note collapse="true"}
## `Fit 1`: Diagnostics

```{r}
#| label: diagnostics fit 1
#| output: true

# Six lm diagnostic plots via autoplot() (which = 1:6), in a 3-column grid
diag1 <- autoplot(fit1, 1:6, ncol = 3)
diag1

# Observations to check: 34 - Columbia, 130 - Monroe, 73 - Hardee, 4 - Baker
```

-   `Fit 1` does a poor job of satisfying the assumptions about the residuals of linear regression.

-   `Residuals vs Fitted` shows a negative trend as the fitted values increase, suggesting the linearity assumption is violated.

    -   `Scale-Location` shows the standardized residuals growing in magnitude as the fitted values increase, indicating non-constant variance (heteroscedasticity).

-   The `Normal Q-Q` plot shows deviations from the diagonal, violating the assumption that the residuals are approximately Normally distributed.

-   Several points could be considered outliers or influential points because of large residuals or high leverage (unusual explanatory values that give an observation outsized pull on the fitted model).

    -   **Monroe County** (130), **Hardee County** (73), and **Columbia County** (34) all have large positive residuals, meaning the model greatly underestimated the homeless rate in these counties.

    -   **Baker County** (4) has worryingly high leverage; its explanatory values strongly influence the fitted model.

    -   All of these outliers are sparsely populated, rural counties, typically outside of more urbanized areas; with small populations, large stressor values can exert great influence on the model.
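Leverage and influence can also be checked numerically rather than only read off the diagnostic panels. A minimal sketch with base R's `hatvalues()` and `cooks.distance()` on made-up data; the 2p/n and 4/n cutoffs are common rules of thumb assumed here, not thresholds used in the original analysis.

```r
# Hypothetical data (made up): row 10 has an extreme x value, i.e. high leverage
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30),
  y = c(1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0, 9.1, 12.0)
)
fit <- lm(y ~ x, data = toy)

lev  <- hatvalues(fit)        # leverage: how unusual each row's x values are
cook <- cooks.distance(fit)   # influence: how much the fit changes without that row

p <- length(coef(fit))        # number of estimated coefficients (2)
n <- nobs(fit)                # number of observations used (10)

# Rule-of-thumb flags: leverage > 2p/n, Cook's distance > 4/n
which(lev > 2 * p / n)
which(cook > 4 / n)
```

High leverage alone does not make a point influential; Cook's distance combines leverage with the size of the residual.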
:::

::: {.callout-note collapse="true"}
## `Fit 2`: All Variables, All Observations (Fill Severe Housing Rate)

```{r}
#| label: fit2 - all observations
#| echo: true
#| output: true

# fit 2
fit2 <- florida_og_rates %>% 
          select(-c('County','Year'))%>%
            #mutate(`Unemployment Rate` = log(`Unemployment Rate`))%>%
              fill('Severe Housing Problems (Rate)', .direction="down")%>%
                 lm(formula=`Homeless (Rate)` ~ . - `Median Inc`)
summary(fit2)

bic2 <- BIC(fit2)
print(c('BIC Fit 2:', bic2))

rss2 <- deviance(fit2)
print(c('RSS Fit 2', rss2))

```

-   In `Fit 2`, the 2018 values of `Severe Housing Problems (Rate)` were filled down to 2019 and 2020, restoring all 201 observations for use in the model.

    -   Example: Alachua County has the same `Severe Housing Problems (Rate)` for 2018-2020

-   Several key stressors are statistically significant, with `Adult Psych Beds (Rate)` having the largest estimated magnitude.

    -   Stressors from Zugazaga's study that this model finds significant include `Drug Arrests (Rate)` and `Foster Care (Rate)`.
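`fill()` carries values down in row order, so the result depends on the rows being sorted by county and year. A minimal sketch of a grouped carry-forward with tidyr's `fill()`, using made-up values; grouping by county ensures that a county with a missing 2018 value cannot inherit the previous county's rate:

```r
library(dplyr)
library(tidyr)

# Hypothetical rows: the rate was recorded only in 2018 (values made up)
housing <- tibble(
  County = rep(c("Alachua", "Baker"), each = 3),
  Year   = rep(2018:2020, times = 2),
  `Severe Housing Problems (Rate)` = c(0.21, NA, NA, 0.12, NA, NA)
)

housing_filled <- housing %>%
  arrange(County, Year) %>%        # 2018 first within each county
  group_by(County) %>%             # never fill across county boundaries
  fill(`Severe Housing Problems (Rate)`, .direction = "down") %>%
  ungroup()

housing_filled
```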
:::

::: {.callout-note collapse="true"}
## `Fit 2`: Diagnostics

```{r}
#| label: diagnostics2
#| output: true

diag2 <- autoplot(fit2,1:6,ncol=3)
diag2
```

-   Using all observations in `Fit 2` has not improved the diagnostic plots, and including the 2019 and 2020 values has revealed new outliers.

-   **Liberty County** (117), a small, rural county in the Florida Panhandle, appears as an outlier in 2020.

    -   Hurricane Michael devastated the area in late 2018; this observation's influence likely reflects measurements that were greatly altered, or missing entirely, during 2019 while the population was "in a transitional state".

    -   Liberty's 2020 records are therefore likely to differ sharply from its incomplete 2019 measurements.
:::

::: {.callout-note collapse="true"}
## `Fit 3`: Random Effects Model - Controlling for County over Time

```{r}
#| label: fit3 - panel data
#| echo: true
#| output: true

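# Transform to panel data indexed by county (individual) and year (time);
# column names become syntactic R names, e.g. `Homeless (Rate)` -> Homeless..Rate.
florida_panel <- pdata.frame(florida_og_rates, index = c('County', 'Year'))

# Random effects model
# Median Inc is excluded because its scale is far larger than that of the rates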
fit3 <- florida_panel %>%
          plm(formula = Homeless..Rate. ~ 
         Unemployment.Rate +             
         Incarceration..Rateper1000. +
         Relocated..Rate. +           
         Poverty..Rate. +        
         Drug.Arrests..Rate. +          
         Sub.Abuse.Enrollment..Rate. +  
         Adult.Psych.Beds..Rate. +      
         Forcible.Sex..Rate. +           
         Foster.Care..Rate., 
            model= 'random')
                   
summary(fit3)

#bic3 <- BIC(fit3)
#print(c('BIC Fit 3:', bic3))

rss3 <- deviance(fit3)
print(c('RSS Fit 3', rss3))

```

-   Fitting a random effects model lets us account for unmeasured differences between counties that persist over time.

    -   Each county receives its own intercept, treated as a random draw from a common distribution rather than estimated as a separate parameter.

-   Only `Drug Arrests (Rate)` and `Adult Psych Beds (Rate)` remain significant.

    -   The `Drug Arrests (Rate)` coefficient decreases slightly in magnitude, whereas `Adult Psych Beds (Rate)` increases.

    -   Both are positive, indicating that counties with higher values of either rate tend to have a higher `Homeless (Rate)`.
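In equation form, a minimal sketch of the random effects specification (notation introduced here: u~c~ is the county effect, ε~ct~ the idiosyncratic error, and T = 3 years per county):

$$
\text{Homeless Rate}_{ct} = B_0 + \sum_{i=1}^{n} B_i X_{i,ct} + u_c + \varepsilon_{ct}, \qquad \operatorname{Var}(u_c) = \sigma_u^2, \quad \operatorname{Var}(\varepsilon_{ct}) = \sigma_\varepsilon^2
$$

Writing y~ct~ for the homeless rate, estimation works on quasi-demeaned data, subtracting a fraction θ of each county's 3-year mean:

$$
y_{ct} - \theta \bar{y}_c = (1 - \theta) B_0 + \sum_{i=1}^{n} B_i \left( X_{i,ct} - \theta \bar{X}_{i,c} \right) + \text{error}_{ct}, \qquad \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \sigma_u^2}}
$$

θ = 0 (no county variance) gives pooled OLS, and θ approaches 1, the fixed effects (within) estimator, as the county variance dominates. The model assumes u~c~ is uncorrelated with the stressors; `plm` estimates σ~u~² and σ~ε~² from the data (its default `random.method` is `"swar"`, Swamy-Arora).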
:::

::: {.callout-note collapse="true"}
## `Fit 4`: Random Effects Model - Zugazaga's variables

```{r}
#| label: fit4 - panel data - Zugazaga's variables
#| echo: true
#| output: true

# Fit 4: random effects with only Zugazaga's stressors; enrollment lagged one year within each county
fit4 <- plm(formula = Homeless..Rate. ~ 
              Drug.Arrests..Rate. + 
              lag(Sub.Abuse.Enrollment..Rate.,1) +
              Forcible.Sex..Rate. +
              Foster.Care..Rate.,   
            data = florida_panel,
            model= 'random')
                   
summary(fit4)

#bic4 <- BIC(fit4)
#print(c('BIC Fit 4:', bic4))

rss4 <- deviance(fit4)
print(c('RSS Fit 4', rss4))

```

-   Using a random effects model with only the stressors mentioned in Zugazaga's study, `Drug Arrests (Rate)` is again a significant predictor of homelessness at the `alpha = 0.05` level.

-   `Forcible Sex (Rate)` may not capture the circumstances described in Zugazaga's study, because an arrest rate for forcible sex crimes counts the perpetrators within a county rather than the victims.

    -   Many sex crimes go unreported or unpunished because of familial or power relationships between victim and offender.

-   `Foster Care (Rate)` and `Sub Abuse Enrollment (Rate)` have signs and values that correspond well with Zugazaga's results.

    -   The lagged `Sub Abuse Enrollment (Rate)` suggests that an increase in substance abuse program enrollment may be followed by a decrease in homelessness the next year.
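A one-year lag has to be taken within each county, so that one county's 2020 value never becomes the next county's 2018 lag. `plm` does this inside the `Fit 4` formula because `florida_panel` is indexed by county and year; a minimal dplyr sketch of the same idea on made-up data:

```r
library(dplyr)

# Hypothetical enrollment rates for two counties over three years (values made up)
enroll <- tibble(
  County = rep(c("Alachua", "Baker"), each = 3),
  Year   = rep(2018:2020, times = 2),
  Enrollment = c(0.010, 0.012, 0.011, 0.020, 0.018, 0.019)
)

enroll_lagged <- enroll %>%
  group_by(County) %>%
  # Previous year's value within the same county; 2018 has no prior year, so NA
  mutate(Enrollment_lag1 = lag(Enrollment, n = 1, order_by = Year)) %>%
  ungroup()

enroll_lagged
```

The 2018 rows get `NA`, so any model that uses the lag effectively drops them.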
:::

### Model Selection

::: {.callout-note collapse="true"}
## Stargazer Table

```{r}
#| label: stargazer plot
#| collapse: true
#| output: true
#| echo: false

# Collect the four fits and print them side by side as a text regression table
models <- list(fit1, fit2, fit3, fit4)

stargazer(models, type = "text", title = "Homelessness in Florida", dep.var.labels = "Homeless Rate")


```
:::

-   Comparing the residual sum of squares (RSS) and R^2^ of `Fit 2`, `Fit 3`, and `Fit 4`, I would select **Fit 3** for inference. `Fit 4` (with the lag) has a lower RSS, but the lag drops the 2018 observations, so its RSS is summed over 134 rather than 201 rows and is not directly comparable. I also value the completeness of `Fit 3` and believe its extra variables give a better picture of how stressors affect the homeless population in Florida.
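RSS is a sum over observations, so it only compares fairly between fits on the same rows. A minimal sketch on simulated data (not the Florida data) showing that dropping rows alone can only lower RSS, and that dividing by the residual degrees of freedom gives a per-observation scale:

```r
# Simulated data: the same model fit to all rows and to a subset
set.seed(2022)
toy <- data.frame(x = runif(201, 0, 10))
toy$y <- 1 + 0.5 * toy$x + rnorm(201)

fit_all    <- lm(y ~ x, data = toy)              # 201 rows
fit_subset <- lm(y ~ x, data = toy[68:201, ])    # 134 rows, e.g. after a lag drops a year

# OLS on a subset minimizes squared error over fewer rows, so its RSS
# can never exceed the RSS of the full fit
rss_all    <- deviance(fit_all)
rss_subset <- deviance(fit_subset)

# Residual variance (RSS / residual df) is comparable across different n
c(all    = rss_all / df.residual(fit_all),
  subset = rss_subset / df.residual(fit_subset))
```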

Treating the data as a panel recognizes that the 201 rows are not independent observations but 3 years of data for each of 67 counties. Under the random effects assumptions, this generally yields more efficient estimates and more reliable standard errors than pooled OLS. R^2^ is only considered in passing, as the goal of the study is inference rather than prediction.

**Research Question:**

-   All of Zugazaga's stressors had plausible signs in `Fit 3`, but only `Drug Arrests (Rate)` was significant at the `0.05` level, as hypothesized. Statistical significance here is a statement about the model's estimates rather than about the real-world effect of the stressors, all of which are situations that can contribute to homelessness.

-   The positive coefficient on `Drug Arrests (Rate)` indicates that counties with higher rates of arrests for drug possession or abuse tend to have higher homeless rates.

    -   This may reflect the availability of drugs in Florida counties, and how insufficient addiction treatment can contribute to other socioeconomic issues in a community.

    -   Criminalization does not solve the problem; it relocates it. Many returning citizens are likely caught in a cycle of drug abuse, incarceration, and homelessness.
:::

## Reflection

::: panel-tabset
## Conclusions

::: callout-note
## On the Project Itself

**R-Programming**

-   Much of the tedious cleaning and read-in work could easily be cleaned up with applications of loops and functions

    -   A fun winter project could be to "optimize" the file that created `florida_1820.csv`

-   A key to becoming an efficient coder is not only a solid understanding of syntax, but being able to troubleshoot errors using online resources

    -   Familiarity comes with frequent use and searching for solutions via online forums, books, blogs...

-   It is important (and difficult) to create informative, readable code

    -   Comment as though you're guiding a stranger through the script.

-   Technical skills: GitHub, R programming

**Research**

-   The regression analysis was simple, but I feel I painted too broad a scope with my research question.

    -   It was difficult to feel satisfied with the study, as the ambiguous question left much to be desired when interpreting the regression results.

-   Be thorough in explaining methods and stating assumptions when conducting research, and report details.
:::

::: callout-note
## Was the Research Question Answered?

-   As hypothesized, the models found several stressors to be significant in predicting homeless rates across Florida.

    -   The significance of `Drug Arrests (Rate)` in the three models fit to the 2018-2020 panel (`Fit 2`, `Fit 3`, and `Fit 4`) allows us to reject **H~0~:** No stressor is significant in predicting `Homeless (Rate)`.

-   Unfortunately, the study cannot say much about *which* stressors most increase vulnerability to homelessness, since that requires comparing their magnitudes. Doing so would require deeper demographic variables, as well as better control for stressors acting as a *push* factor in homeless migration.

    -   We would also need to incorporate variables that aren't strictly associated with homelessness (loss of a loved one, severe bodily injury, chronic illness...) for an improved comparison.

-   Because of limitations in the data and the broad scope of the research question, the study cannot make new claims about homelessness in Florida; instead, it confirms the relevance of Zugazaga's stressors to homeless life two decades later.
:::

::: callout-note
## Prediction vs Inference

-   The goal of this brief study was to make inferences regarding stressors' impact on `Homelessness (Rate)` in Florida.

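-   If prediction were our focus, I would test the efficacy of `Fit 2` as a predictive tool on new 2021 data from FL Health Charts: withhold the 2021 `Homeless (Rate)` column, predict it from the other variables, and compare the predictions with the observed values.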
:::
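The out-of-sample check described above could look roughly like the following base-R sketch. It is an illustration under stated assumptions, not the study's code: the data are simulated, and the predictors (`drug_arrests`, `poverty`), their coefficients, and the data frame names are hypothetical placeholders rather than the actual specification of `Fit 2`. The 2021 outcome is withheld from `predict()` and used only to score the predictions with root-mean-square error (RMSE).

```r
# Hypothetical sketch: out-of-sample check of a fitted linear model.
# `train` stands in for the 2018-2020 county data and `test_2021` for a
# 2021 extract. Both are SIMULATED here; the predictors are placeholders,
# not the real specification of Fit 2.
set.seed(601)

simulate_counties <- function(n) {
  drug_arrests <- runif(n, 0, 0.01)
  poverty      <- runif(n, 0.05, 0.30)
  data.frame(
    drug_arrests = drug_arrests,
    poverty      = poverty,
    # made-up linear relationship plus a little noise
    homeless     = 0.0005 + 0.05 * drug_arrests + 0.002 * poverty +
                   rnorm(n, 0, 1e-4)
  )
}

train     <- simulate_counties(201)  # 67 counties x 3 years
test_2021 <- simulate_counties(67)   # one new year of counties

fit <- lm(homeless ~ drug_arrests + poverty, data = train)

# Predict 2021 rates from the predictors only (outcome column withheld) ...
pred <- predict(fit, newdata = test_2021[, c("drug_arrests", "poverty")])

# ... then score the predictions against the observed 2021 values
rmse <- sqrt(mean((test_2021$homeless - pred)^2))
rmse
```

A lower RMSE on the held-out year would suggest the model generalizes beyond the years it was fit on; with only one new year of 67 counties, the estimate itself would be fairly noisy.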

## Improvements

While the data offer a quick illustration of homelessness in Florida by county, improvements to both the data collection and the research question itself could further the study.

::: callout-note
## Data

-   Unfortunately, [FL Health Charts](https://www.flhealthcharts.gov/charts/default.aspx) did not provide a demographic breakdown of the homeless population (age, sex, race); such a breakdown would widen the scope of the analysis considerably and could lead to far more interesting conclusions.

-   Data are only available for a three-year period; this range is too short to make a strong statement about the impact of homelessness policy on Florida counties or about how the relevance of certain stressors has *changed* over time. For a more in-depth study, I would begin with a 10-year range.
:::

::: callout-note
## Research Question

-   Demographic breakdown of stressors' impact (Age, Sex, Race)

-   Modernize Zugazaga's interviews to adjust variables for homelessness in 2022

    -   Conduct interviews with groups of single men, single women, and women with children

-   Reduce noise: after controlling for county, focus on one or two specific stressors and their accompanying variables to view their impact on a population, instead of viewing a broad range of stressors as a whole

-   Include life stressors not associated with homelessness for comparison

-   Extend the question to the entire country, providing a breakdown by state

-   Compare to foreign countries to contrast governments' approaches to homelessness and leading causes of homelessness around the world.
:::
:::

## References

**Chang, W. (2022).** R Graphics Cookbook (2nd ed.). O'Reilly Media.

**Wickham, H., & Grolemund, G. (2016).** R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.

**R Core Team (2020).** R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org

**Wickham, H. (2019).** Advanced R (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781351201315

**Wickham, H. (2016).** ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org

**Zugazaga, C. (2004).** Stressful life event experiences of homeless adults: A comparison of single men, single women, and women with children. Journal of Community Psychology, 32, 643-654. https://doi.org/10.1002/jcop.20025

## Codebook

-   `County` - Florida county (67 total), grouped into `Region` (Northwest Florida, Northeast Florida, Central Florida, and South Florida) for visualizations

-   `Population` - Yearly population count for the county, used as the denominator of all `rate` variables unless otherwise specified.

-   `Year` - Years 2018, 2019, 2020 included in this study

-   `Homeless (Rate)` - Yearly homeless count of a county divided by county population

-   `Unemployment Rate` - The ratio of unemployed to the civilian labor force, expressed as a percent

-   `Median Inc` - Median household income: the income that divides the household income distribution into two equal groups

-   `Incarceration Rate per 1000` - Number of incarcerated people per 1000 (within county)

-   `Poverty Rate` - Number of people living below poverty line divided by population

-   `Drug Arrests (Rate)` - Arrests attributed to possession or sale of illegal drugs divided by population

-   `Relocated (Rate)` - The number of people over age 1 who lived in a different county the previous year, divided by population

-   `Sub Abuse Enrollment (Rate)` - The number of in-patient substance abuse treatment beds, which indicates the number of adults (age 18 and over) who may receive substance abuse treatment on an in-patient basis, divided by population

-   `Adult Psych Beds (Rate)` - When adults in psychiatric distress are uninsured, are charged with crimes, or meet state criteria for civil commitment because they are violent or dangerous to themselves or others, psychiatric beds are where they are admitted for treatment. The number of beds indicates the number of people who may potentially receive adult (age 18 and over) psychiatric care on an in-patient basis. Divided by population

-   `Severe Housing Problems (Rate)` - The percentage of households with at least one of the following housing problems: lack of kitchen facilities, lack of plumbing facilities, more than 1.5 persons per room, or severe cost burden (monthly housing costs, including utilities, exceed 50% of monthly income).

-   `Forcible Sex (Rate)` - Any sexual act or attempt involving force is classified as a forcible sex offense regardless of the age of the victim or the relationship of the victim to the offender, divided by population

-   `Foster Care (Rate)` - Number of children in foster care, which provides a safe and stable environment for children when they cannot be with their parents, divided by population

:::
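The rate definitions above reduce to simple arithmetic on county counts. The following base-R sketch uses two made-up counties to show the default count-over-population rate alongside the per-1,000 scaling used for incarceration; the county names, counts, and column names are hypothetical and are not FL Health Charts values.

```r
# Toy illustration of how the codebook's rate variables relate counts to
# population. All names and numbers below are HYPOTHETICAL.
counties <- data.frame(
  County       = c("County A", "County B"),
  Year         = c(2020, 2020),
  Population   = c(100000, 250000),
  Homeless     = c(150, 500),
  Incarcerated = c(300, 600)
)

# Default rate: yearly count divided by county population
counties$homeless_rate <- counties$Homeless / counties$Population

# Incarceration is reported per 1,000 residents instead of as a raw proportion
counties$incarceration_per_1000 <-
  counties$Incarcerated / counties$Population * 1000

counties[, c("County", "homeless_rate", "incarceration_per_1000")]
```

Here County A's homeless rate is 150 / 100,000 = 0.0015 and its incarceration rate is 3 per 1,000. Dividing by population makes large and small counties comparable, which is why the models use rates rather than raw counts.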

##### Footnotes

1.  [Homeless Definition](https://www.law.cornell.edu/uscode/text/42/11302)

2.  [US Interagency Council on Homelessness](https://www.usich.gov/tools-for-action/2020-point-in-time-count/)

3.  Explanation of variables and collection method in the Codebook tab