Final Project Assignment#1: Project & Data Description

final_Project_assignment_1
final_project_data_description
Project & Data Description
Author

Rahul Somu

Published

April 18, 2023

library(tidyverse)
library(dplyr)
library(ggplot2)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Overview of the Final Project

Introduction:

Housing affordability is one of the major problems in the United States. With many households struggling to afford the rising cost of housing, it is critical to analyze the data trends over the country. According to statistics from the U.S. Census Bureau on median household income by city and region from 1996 to the present, Housing affordability has changed significantly over the time.

Although there has been general increase in median household income over the previous few decades, it has not been uniform across all the regions. Some regions have seen greater growth than others and similar swings have been observed in the median home prices, with some regions seeing more significant price increases than others.

The widening disparity between median household incomes and median property prices is one of the most obvious patterns in the statistics which is becoming more challenging for many Americans to afford homeownership as median property prices in many locations have been rising more quickly than median household incomes. The affordability issue has also been made worse by the fact that, despite the median household income rising, it has not kept up with the cost of living.

Overall, according to data from the U.S. Census Bureau, the affordability of housing has become a major worry over time, with many Americans finding it difficult to keep up with the rising cost of housing. The data emphasize the necessity of making policy changes to address the housing affordability crisis and guarantee that all Americans have access to safe, reasonably priced housing.

As part of this project I’m trying to focus on below: Visualize the trends in home values, the average household income, and inflation rates through time, broken down by area and city size. Create graphs showing the evolution of housing affordability over time in the country’s largest cities and how it differs by area and city size.

Data Sources:

As part of this project I found below resources that will help me understand the issue. Zillow Home Value Index (ZHVI) - a dataset from Zillow with the median home value for each region and city in the United States from 1996 to the present. https://www.kaggle.com/datasets/robikscube/zillow-home-value-index?resource=download

U.S. Census Bureau data on median household income by city and region from 1996 to the present. https://www.census.gov/data/tables/time-series/demo/income-poverty/historical-income-households.html

U.S. Bureau of Labor Statistics data on inflation rates by region and year from 1996 to the present. https://www.bls.gov/cpi/regional-resources.htm

  1. Util to Create a look up for states to region
state_to_region <- function(state) {
  region_lookup <- list(
    Northeast = c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "New Jersey", "New York", "Pennsylvania", "Rhode Island", "Vermont"),
    South = c("Delaware", "District of Columbia", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "West Virginia", "Alabama", "Kentucky", "Mississippi", "Tennessee", "Arkansas", "Louisiana", "Oklahoma", "Texas"),
    Midwest = c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin", "Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"),
    West = c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming", "Alaska", "California", "Hawaii", "Oregon", "Washington")
  )
  region <- NA
  for (r in names(region_lookup)) {
    if (state %in% region_lookup[[r]]) {
      region <- r
      break
    }
  }
  return(region)
}
  1. Read The Datasets
income_data <- read_excel("Rahul_Final_projectData/income_state.xlsx")

df_inflation <- read_excel("Rahul_Final_projectData/USA_inflation_data.xlsx")

df_rentalindex <- read.csv("Rahul_Final_projectData/ZHVI.csv")
  1. Create functions for getting data for each state
#Function to get income data by state and year

get_median_se <- function(income_data, year, state){
  rownum <- which(income_data$State == state)
  colnum <- which(colnames(income_data) == year)
  median_income <- as.numeric(income_data[rownum, colnum])
  se <- as.numeric(income_data[rownum, colnum+1])
  #diff <- median_income - se
  return(c(as.character(median_income), as.character(se)))
}

get_median_se(income_data, "2021", "Alabama")
[1] "56929" "2294" 
#Function to get inflation data

avg_by_region_and_year <- function(df, state, year) {
  region <- state_to_region(state)
  subset_df <- subset(df, Region == region & Year == year)
  row_avgs <- apply(subset_df[, 4:ncol(subset_df)], 1, mean, na.rm = TRUE)
  overall_avg <- mean(row_avgs, na.rm = TRUE)
  return(overall_avg)
}

# example usage
avg_by_region_and_year(df_inflation, "Texas", 2000)
[1] 167.4143
df_rentalindex$Date <- as.Date(df_rentalindex$Date, format = "%Y-%m-%d")

# function to calculate average rental index for a state and year
get_state_rentalindex <- function(df,state, year) {
  df_filtered <- subset(df, format(as.Date(Date), "%Y") == year)
  state_avg <- mean(df_filtered[[state]])
  return(state_avg)
}
# example usage
get_state_rentalindex(df_rentalindex,"California", "2000")
[1] 192840.2
  1. present the descriptive information of the dataset(s) using the functions in Challenges 1, 2, and 3;
dim(income_data)
[1] 53 79
dim(df_inflation)
[1] 96 17
dim(df_rentalindex)
[1] 278  52
head(income_data)
  1. conduct summary statistics of the dataset(s); especially show the basic statistics (min, max, mean, median, etc.) for the variables you are interested in.
summary(df_rentalindex)
      Date               Virginia        California        Florida      
 Min.   :2000-01-01   Min.   :114448   Min.   :183263   Min.   :102232  
 1st Qu.:2005-10-08   1st Qu.:197891   1st Qu.:292524   1st Qu.:136109  
 Median :2011-07-16   Median :219631   Median :388784   Median :173778  
 Mean   :2011-07-17   Mean   :219795   Mean   :400685   Mean   :188378  
 3rd Qu.:2017-04-23   3rd Qu.:245837   3rd Qu.:471709   3rd Qu.:230129  
 Max.   :2023-02-01   Max.   :352788   Max.   :758837   Max.   :382968  
                                                                        
     Texas            Nevada          New.York        New.Jersey    
 Min.   :103491   Min.   :126840   Min.   :122871   Min.   :151148  
 1st Qu.:123530   1st Qu.:167170   1st Qu.:204061   1st Qu.:247960  
 Median :130655   Median :220389   Median :219991   Median :279515  
 Mean   :151983   Mean   :237218   Mean   :232584   Mean   :281958  
 3rd Qu.:177563   3rd Qu.:303531   3rd Qu.:253952   3rd Qu.:317507  
 Max.   :295310   Max.   :440144   Max.   :410896   Max.   :440368  
                                                                    
    Arizona         Tennessee      West.Virginia       Georgia      
 Min.   :127957   Min.   : 92784   Min.   : 58201   Min.   :106693  
 1st Qu.:151316   1st Qu.:111305   1st Qu.: 78732   1st Qu.:126001  
 Median :189856   Median :120526   Median : 83140   Median :140309  
 Mean   :210120   Mean   :137849   Mean   : 88808   Mean   :153769  
 3rd Qu.:252483   3rd Qu.:150538   3rd Qu.:100642   3rd Qu.:159109  
 Max.   :437047   Max.   :285261   Max.   :143983   Max.   :303796  
 NA's   :1                         NA's   :1                        
 North.Carolina   Massachusetts        Hawaii          Colorado     
 Min.   :116314   Min.   :183093   Min.   :176017   Min.   :169818  
 1st Qu.:135109   1st Qu.:285744   1st Qu.:369539   1st Qu.:214326  
 Median :145608   Median :317222   Median :419144   Median :228125  
 Mean   :159663   Mean   :333554   Mean   :438470   Mean   :278369  
 3rd Qu.:165305   3rd Qu.:369005   3rd Qu.:554296   3rd Qu.:339056  
 Max.   :303166   Max.   :557412   Max.   :861627   Max.   :556750  
                                                                    
    Michigan      South.Carolina      Vermont         Minnesota     
 Min.   : 79031   Min.   :100082   Min.   : 88760   Min.   :112685  
 1st Qu.:100723   1st Qu.:122438   1st Qu.:150899   1st Qu.:156275  
 Median :116302   Median :134929   Median :168068   Median :177589  
 Mean   :121370   Mean   :145298   Mean   :174899   Mean   :186909  
 3rd Qu.:127914   3rd Qu.:154536   3rd Qu.:197476   3rd Qu.:205064  
 Max.   :211833   Max.   :267835   Max.   :319746   Max.   :314021  
                                                                    
    Indiana         Washington        Illinois          Oregon      
 Min.   : 90242   Min.   :166298   Min.   :108150   Min.   :142976  
 1st Qu.:103664   1st Qu.:216225   1st Qu.:132659   1st Qu.:188999  
 Median :109106   Median :259786   Median :150891   Median :229912  
 Mean   :121298   Mean   :289661   Mean   :154393   Mean   :254140  
 3rd Qu.:127750   3rd Qu.:326684   3rd Qu.:173109   3rd Qu.:306743  
 Max.   :217822   Max.   :588018   Max.   :229898   Max.   :498337  
                                                                    
  Connecticut          Utah         North.Dakota       Montana      
 Min.   :156392   Min.   :161827   Min.   :130966   Min.   :144789  
 1st Qu.:227901   1st Qu.:178131   1st Qu.:139510   1st Qu.:167533  
 Median :246940   Median :212460   Median :191458   Median :182144  
 Mean   :248507   Mean   :245073   Mean   :177292   Mean   :216615  
 3rd Qu.:272082   3rd Qu.:274825   3rd Qu.:196953   3rd Qu.:242717  
 Max.   :349739   Max.   :529454   Max.   :235975   Max.   :434557  
                                   NA's   :108      NA's   :61      
 New.Hampshire         Ohio           Missouri         Maryland     
 Min.   :125228   Min.   : 94815   Min.   : 84986   Min.   :138999  
 1st Qu.:197957   1st Qu.:104970   1st Qu.:108511   1st Qu.:223950  
 Median :219170   Median :112999   Median :118146   Median :255487  
 Mean   :232154   Mean   :120513   Mean   :126813   Mean   :255882  
 3rd Qu.:245854   3rd Qu.:123063   3rd Qu.:134601   3rd Qu.:296146  
 Max.   :421924   Max.   :199518   Max.   :221340   Max.   :381710  
                                                                    
     Idaho          Wisconsin          Kansas        Pennsylvania   
 Min.   :115460   Min.   :102168   Min.   : 72593   Min.   : 87544  
 1st Qu.:136507   1st Qu.:128707   1st Qu.: 92543   1st Qu.:135443  
 Median :170850   Median :142384   Median : 98642   Median :143779  
 Mean   :197510   Mean   :150836   Mean   :109245   Mean   :146101  
 3rd Qu.:210888   3rd Qu.:158847   3rd Qu.:119907   3rd Qu.:154896  
 Max.   :470980   Max.   :250226   Max.   :205073   Max.   :237774  
 NA's   :1                         NA's   :2                        
    Oklahoma      District.of.Columbia    Nebraska          Alaska      
 Min.   : 70112   Min.   :147817       Min.   : 99521   Min.   :125485  
 1st Qu.: 87542   1st Qu.:326460       1st Qu.:115634   1st Qu.:199817  
 Median : 94748   Median :358798       Median :120794   Median :229922  
 Mean   :103287   Mean   :389287       Mean   :135590   Mean   :232338  
 3rd Qu.:112466   3rd Qu.:501651       3rd Qu.:148736   3rd Qu.:263731  
 Max.   :185171   Max.   :654749       Max.   :236836   Max.   :343188  
                                                        NA's   :2       
   New.Mexico        Wyoming          Delaware       Rhode.Island   
 Min.   :123686   Min.   :130764   Min.   :132902   Min.   :125136  
 1st Qu.:152562   1st Qu.:181361   1st Qu.:193728   1st Qu.:209214  
 Median :164576   Median :190339   Median :220943   Median :234743  
 Mean   :170653   Mean   :202147   Mean   :221938   Mean   :243631  
 3rd Qu.:178281   3rd Qu.:222466   3rd Qu.:247511   3rd Qu.:278837  
 Max.   :277310   Max.   :319529   Max.   :351269   Max.   :404953  
 NA's   :27       NA's   :27                                        
     Maine          Louisiana         Arkansas         Alabama      
 Min.   : 92612   Min.   : 83285   Min.   : 69180   Min.   : 76906  
 1st Qu.:153086   1st Qu.:110090   1st Qu.: 91542   1st Qu.: 92004  
 Median :166698   Median :124074   Median : 97689   Median :103510  
 Mean   :177752   Mean   :126170   Mean   :103042   Mean   :111737  
 3rd Qu.:189259   3rd Qu.:140977   3rd Qu.:113094   3rd Qu.:122264  
 Max.   :344147   Max.   :180360   Max.   :173423   Max.   :199357  
                                                    NA's   :1       
  Mississippi      South.Dakota         Iowa           Kentucky     
 Min.   : 66101   Min.   : 90148   Min.   : 75996   Min.   : 78366  
 1st Qu.: 83544   1st Qu.:114269   1st Qu.: 97729   1st Qu.: 96390  
 Median : 88905   Median :124725   Median :101922   Median :101124  
 Mean   : 96463   Mean   :144248   Mean   :113321   Mean   :109261  
 3rd Qu.:111732   3rd Qu.:169450   3rd Qu.:126571   3rd Qu.:116845  
 Max.   :156641   Max.   :275894   Max.   :194074   Max.   :184542  
                  NA's   :1                                         

Visualisation:

To visualize the affordability, I’m planning the states into three categories : “Highly Affordable and Healthy”, “Moderately Affordable and Healthy”, and “Unaffordable and Unhealthy”. To categorize the states, a composite index that takes into account of major factors as Zillow Home Value Index (ZHVI) , median household income by state, inflation rates by region.

Variables: Zillow Home Value Index (ZHVI) - Measure of median home value in each state, Median household income by state - average income earned by households in each state, Inflation rates by region - Measure of change in prices of goods and services over time in each region.

Index on healthy housing affordability could be done using below formula: Affordability Score = (ZHVI / Income) * (1 + Inflation)

Based on the Affordability Score, we could categorize states into different tiers of affordability, such as: Affordable: States with a score below 1.0, indicating that housing is relatively affordable compared to income levels. Moderately Affordable: States with a score between 1.0 and 1.5, indicating that housing is somewhat affordable but may be becoming less so over time. Unaffordable: States with a score above 1.5, indicating that housing is significantly unaffordable relative to income levels and inflation rates.

Using a heat map, where each state is colored according to how affordable it is, with a legend showing the range of scores could offer a quick and simple way to discover states that have affordability scores that are consistently high or low over time, and how these scores fluctuate in relation to one another over time.

Did real incomes rise? With CPI prices for the states available between 2000 and 2023 and then compare median real incomes in these cities for the respective periods. First, we convert 2011 median incomes grouped by the city into 2018 dollars by using CPI’s of both time periods.