library(tidyverse)
library(dplyr)
library(ggplot2)
library(readxl)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Final Project Assignment#1: Project & Data Description
Overview of the Final Project
Introduction:
Housing affordability is one of the major problems in the United States. With many households struggling to afford the rising cost of housing, it is critical to analyze the data trends over the country. According to statistics from the U.S. Census Bureau on median household income by city and region from 1996 to the present, Housing affordability has changed significantly over the time.
Although there has been general increase in median household income over the previous few decades, it has not been uniform across all the regions. Some regions have seen greater growth than others and similar swings have been observed in the median home prices, with some regions seeing more significant price increases than others.
The widening disparity between median household incomes and median property prices is one of the most obvious patterns in the statistics which is becoming more challenging for many Americans to afford homeownership as median property prices in many locations have been rising more quickly than median household incomes. The affordability issue has also been made worse by the fact that, despite the median household income rising, it has not kept up with the cost of living.
Overall, according to data from the U.S. Census Bureau, the affordability of housing has become a major worry over time, with many Americans finding it difficult to keep up with the rising cost of housing. The data emphasize the necessity of making policy changes to address the housing affordability crisis and guarantee that all Americans have access to safe, reasonably priced housing.
As part of this project I’m trying to focus on below: Visualize the trends in home values, the average household income, and inflation rates through time, broken down by area and city size. Create graphs showing the evolution of housing affordability over time in the country’s largest cities and how it differs by area and city size.
Data Sources:
As part of this project I found below resources that will help me understand the issue. Zillow Home Value Index (ZHVI) - a dataset from Zillow with the median home value for each region and city in the United States from 1996 to the present. https://www.kaggle.com/datasets/robikscube/zillow-home-value-index?resource=download
U.S. Census Bureau data on median household income by city and region from 1996 to the present. https://www.census.gov/data/tables/time-series/demo/income-poverty/historical-income-households.html
U.S. Bureau of Labor Statistics data on inflation rates by region and year from 1996 to the present. https://www.bls.gov/cpi/regional-resources.htm
- Util to Create a look up for states to region
<- function(state) {
state_to_region <- list(
region_lookup Northeast = c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "New Jersey", "New York", "Pennsylvania", "Rhode Island", "Vermont"),
South = c("Delaware", "District of Columbia", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "West Virginia", "Alabama", "Kentucky", "Mississippi", "Tennessee", "Arkansas", "Louisiana", "Oklahoma", "Texas"),
Midwest = c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin", "Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"),
West = c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming", "Alaska", "California", "Hawaii", "Oregon", "Washington")
)<- NA
region for (r in names(region_lookup)) {
if (state %in% region_lookup[[r]]) {
<- r
region break
}
}return(region)
}
- Read The Datasets
<- read_excel("Rahul_Final_projectData/income_state.xlsx")
income_data
<- read_excel("Rahul_Final_projectData/USA_inflation_data.xlsx")
df_inflation
<- read.csv("Rahul_Final_projectData/ZHVI.csv") df_rentalindex
- Create functions for getting data for each state
#Function to get income data by state and year
<- function(income_data, year, state){
get_median_se <- which(income_data$State == state)
rownum <- which(colnames(income_data) == year)
colnum <- as.numeric(income_data[rownum, colnum])
median_income <- as.numeric(income_data[rownum, colnum+1])
se #diff <- median_income - se
return(c(as.character(median_income), as.character(se)))
}
get_median_se(income_data, "2021", "Alabama")
[1] "56929" "2294"
#Function to get inflation data
<- function(df, state, year) {
avg_by_region_and_year <- state_to_region(state)
region <- subset(df, Region == region & Year == year)
subset_df <- apply(subset_df[, 4:ncol(subset_df)], 1, mean, na.rm = TRUE)
row_avgs <- mean(row_avgs, na.rm = TRUE)
overall_avg return(overall_avg)
}
# example usage
avg_by_region_and_year(df_inflation, "Texas", 2000)
[1] 167.4143
$Date <- as.Date(df_rentalindex$Date, format = "%Y-%m-%d")
df_rentalindex
# function to calculate average rental index for a state and year
<- function(df,state, year) {
get_state_rentalindex <- subset(df, format(as.Date(Date), "%Y") == year)
df_filtered <- mean(df_filtered[[state]])
state_avg return(state_avg)
}# example usage
get_state_rentalindex(df_rentalindex,"California", "2000")
[1] 192840.2
- present the descriptive information of the dataset(s) using the functions in Challenges 1, 2, and 3;
dim(income_data)
[1] 53 79
dim(df_inflation)
[1] 96 17
dim(df_rentalindex)
[1] 278 52
head(income_data)
- conduct summary statistics of the dataset(s); especially show the basic statistics (min, max, mean, median, etc.) for the variables you are interested in.
summary(df_rentalindex)
Date Virginia California Florida
Min. :2000-01-01 Min. :114448 Min. :183263 Min. :102232
1st Qu.:2005-10-08 1st Qu.:197891 1st Qu.:292524 1st Qu.:136109
Median :2011-07-16 Median :219631 Median :388784 Median :173778
Mean :2011-07-17 Mean :219795 Mean :400685 Mean :188378
3rd Qu.:2017-04-23 3rd Qu.:245837 3rd Qu.:471709 3rd Qu.:230129
Max. :2023-02-01 Max. :352788 Max. :758837 Max. :382968
Texas Nevada New.York New.Jersey
Min. :103491 Min. :126840 Min. :122871 Min. :151148
1st Qu.:123530 1st Qu.:167170 1st Qu.:204061 1st Qu.:247960
Median :130655 Median :220389 Median :219991 Median :279515
Mean :151983 Mean :237218 Mean :232584 Mean :281958
3rd Qu.:177563 3rd Qu.:303531 3rd Qu.:253952 3rd Qu.:317507
Max. :295310 Max. :440144 Max. :410896 Max. :440368
Arizona Tennessee West.Virginia Georgia
Min. :127957 Min. : 92784 Min. : 58201 Min. :106693
1st Qu.:151316 1st Qu.:111305 1st Qu.: 78732 1st Qu.:126001
Median :189856 Median :120526 Median : 83140 Median :140309
Mean :210120 Mean :137849 Mean : 88808 Mean :153769
3rd Qu.:252483 3rd Qu.:150538 3rd Qu.:100642 3rd Qu.:159109
Max. :437047 Max. :285261 Max. :143983 Max. :303796
NA's :1 NA's :1
North.Carolina Massachusetts Hawaii Colorado
Min. :116314 Min. :183093 Min. :176017 Min. :169818
1st Qu.:135109 1st Qu.:285744 1st Qu.:369539 1st Qu.:214326
Median :145608 Median :317222 Median :419144 Median :228125
Mean :159663 Mean :333554 Mean :438470 Mean :278369
3rd Qu.:165305 3rd Qu.:369005 3rd Qu.:554296 3rd Qu.:339056
Max. :303166 Max. :557412 Max. :861627 Max. :556750
Michigan South.Carolina Vermont Minnesota
Min. : 79031 Min. :100082 Min. : 88760 Min. :112685
1st Qu.:100723 1st Qu.:122438 1st Qu.:150899 1st Qu.:156275
Median :116302 Median :134929 Median :168068 Median :177589
Mean :121370 Mean :145298 Mean :174899 Mean :186909
3rd Qu.:127914 3rd Qu.:154536 3rd Qu.:197476 3rd Qu.:205064
Max. :211833 Max. :267835 Max. :319746 Max. :314021
Indiana Washington Illinois Oregon
Min. : 90242 Min. :166298 Min. :108150 Min. :142976
1st Qu.:103664 1st Qu.:216225 1st Qu.:132659 1st Qu.:188999
Median :109106 Median :259786 Median :150891 Median :229912
Mean :121298 Mean :289661 Mean :154393 Mean :254140
3rd Qu.:127750 3rd Qu.:326684 3rd Qu.:173109 3rd Qu.:306743
Max. :217822 Max. :588018 Max. :229898 Max. :498337
Connecticut Utah North.Dakota Montana
Min. :156392 Min. :161827 Min. :130966 Min. :144789
1st Qu.:227901 1st Qu.:178131 1st Qu.:139510 1st Qu.:167533
Median :246940 Median :212460 Median :191458 Median :182144
Mean :248507 Mean :245073 Mean :177292 Mean :216615
3rd Qu.:272082 3rd Qu.:274825 3rd Qu.:196953 3rd Qu.:242717
Max. :349739 Max. :529454 Max. :235975 Max. :434557
NA's :108 NA's :61
New.Hampshire Ohio Missouri Maryland
Min. :125228 Min. : 94815 Min. : 84986 Min. :138999
1st Qu.:197957 1st Qu.:104970 1st Qu.:108511 1st Qu.:223950
Median :219170 Median :112999 Median :118146 Median :255487
Mean :232154 Mean :120513 Mean :126813 Mean :255882
3rd Qu.:245854 3rd Qu.:123063 3rd Qu.:134601 3rd Qu.:296146
Max. :421924 Max. :199518 Max. :221340 Max. :381710
Idaho Wisconsin Kansas Pennsylvania
Min. :115460 Min. :102168 Min. : 72593 Min. : 87544
1st Qu.:136507 1st Qu.:128707 1st Qu.: 92543 1st Qu.:135443
Median :170850 Median :142384 Median : 98642 Median :143779
Mean :197510 Mean :150836 Mean :109245 Mean :146101
3rd Qu.:210888 3rd Qu.:158847 3rd Qu.:119907 3rd Qu.:154896
Max. :470980 Max. :250226 Max. :205073 Max. :237774
NA's :1 NA's :2
Oklahoma District.of.Columbia Nebraska Alaska
Min. : 70112 Min. :147817 Min. : 99521 Min. :125485
1st Qu.: 87542 1st Qu.:326460 1st Qu.:115634 1st Qu.:199817
Median : 94748 Median :358798 Median :120794 Median :229922
Mean :103287 Mean :389287 Mean :135590 Mean :232338
3rd Qu.:112466 3rd Qu.:501651 3rd Qu.:148736 3rd Qu.:263731
Max. :185171 Max. :654749 Max. :236836 Max. :343188
NA's :2
New.Mexico Wyoming Delaware Rhode.Island
Min. :123686 Min. :130764 Min. :132902 Min. :125136
1st Qu.:152562 1st Qu.:181361 1st Qu.:193728 1st Qu.:209214
Median :164576 Median :190339 Median :220943 Median :234743
Mean :170653 Mean :202147 Mean :221938 Mean :243631
3rd Qu.:178281 3rd Qu.:222466 3rd Qu.:247511 3rd Qu.:278837
Max. :277310 Max. :319529 Max. :351269 Max. :404953
NA's :27 NA's :27
Maine Louisiana Arkansas Alabama
Min. : 92612 Min. : 83285 Min. : 69180 Min. : 76906
1st Qu.:153086 1st Qu.:110090 1st Qu.: 91542 1st Qu.: 92004
Median :166698 Median :124074 Median : 97689 Median :103510
Mean :177752 Mean :126170 Mean :103042 Mean :111737
3rd Qu.:189259 3rd Qu.:140977 3rd Qu.:113094 3rd Qu.:122264
Max. :344147 Max. :180360 Max. :173423 Max. :199357
NA's :1
Mississippi South.Dakota Iowa Kentucky
Min. : 66101 Min. : 90148 Min. : 75996 Min. : 78366
1st Qu.: 83544 1st Qu.:114269 1st Qu.: 97729 1st Qu.: 96390
Median : 88905 Median :124725 Median :101922 Median :101124
Mean : 96463 Mean :144248 Mean :113321 Mean :109261
3rd Qu.:111732 3rd Qu.:169450 3rd Qu.:126571 3rd Qu.:116845
Max. :156641 Max. :275894 Max. :194074 Max. :184542
NA's :1
Visualisation:
To visualize the affordability, I’m planning the states into three categories : “Highly Affordable and Healthy”, “Moderately Affordable and Healthy”, and “Unaffordable and Unhealthy”. To categorize the states, a composite index that takes into account of major factors as Zillow Home Value Index (ZHVI) , median household income by state, inflation rates by region.
Variables: Zillow Home Value Index (ZHVI) - Measure of median home value in each state, Median household income by state - average income earned by households in each state, Inflation rates by region - Measure of change in prices of goods and services over time in each region.
Index on healthy housing affordability could be done using below formula: Affordability Score = (ZHVI / Income) * (1 + Inflation)
Based on the Affordability Score, we could categorize states into different tiers of affordability, such as: Affordable: States with a score below 1.0, indicating that housing is relatively affordable compared to income levels. Moderately Affordable: States with a score between 1.0 and 1.5, indicating that housing is somewhat affordable but may be becoming less so over time. Unaffordable: States with a score above 1.5, indicating that housing is significantly unaffordable relative to income levels and inflation rates.
Using a heat map, where each state is colored according to how affordable it is, with a legend showing the range of scores could offer a quick and simple way to discover states that have affordability scores that are consistently high or low over time, and how these scores fluctuate in relation to one another over time.
Did real incomes rise? With CPI prices for the states available between 2000 and 2023 and then compare median real incomes in these cities for the respective periods. First, we convert 2011 median incomes grouped by the city into 2018 dollars by using CPI’s of both time periods.