Data Analytics and Computational Social Science: HNguyen HW 3

Henry Nguyen

Introduction

This is homework number three. As a result of highly effective, tolerable antiretroviral therapy and effective prevention strategies, incident HIV infections and deaths have decreased over the years. Unfortunately, some states still experience more incident infections than others. This project looks at two data sets: the first is state-level HIV incidence data and the second is state-level health care GDP.

Load Packages

library(tidyverse)
library(dplyr)

Import HIV Incidence Data

The first data set looks at state-level annual incident HIV infections¹. The variables in this data set:

Cases - This is numerical data. It is the number of incident cases of HIV in each observation year.
Year - This is numerical data. It is the year in which each observation occurred.
Geography - This is character data. It is the state in which the observation occurred.
Rate per 100000 - This is character data, because missing values are recorded as "Data not available". It is the rate of incident cases per 100,000 people.

# Data set #1
# Data obtained from: https://gis.cdc.gov/grasp/nchhstpatlas/tables.html

HIV.State <- read_csv("HIV.by.State.CSV", skip = 10)

HIV.State <- select(HIV.State, "Year", "Geography", "Cases", "Rate per 100000")

Tidy HIV Incidence Data

str(HIV.State)

tibble [714 × 4] (S3: tbl_df/tbl/data.frame)
 $ Year           : num [1:714] 2021 2020 2019 2018 2017 ...
 $ Geography      : chr [1:714] "Alabama" "Alabama" "Alabama" "Alabama" ...
 $ Cases          : num [1:714] 314 586 638 607 650 653 663 664 630 663 ...
 $ Rate per 100000: chr [1:714] "Data not available" "Data not available" "15.5" "14.8" ...

head(HIV.State)

# A tibble: 6 × 4
   Year Geography Cases `Rate per 100000` 
  <dbl> <chr>     <dbl> <chr>             
1  2021 Alabama     314 Data not available
2  2020 Alabama     586 Data not available
3  2019 Alabama     638 15.5              
4  2018 Alabama     607 14.8              
5  2017 Alabama     650 15.9              
6  2016 Alabama     653 16.0
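
The output above shows that `Rate per 100000` is read as character because of the "Data not available" entries, and that the state column is named Geography. One possible next tidying step is sketched below. The toy rows are copied from the head() output, and the names `hiv_demo`, `hiv_clean`, `State`, and `Rate` are illustrative assumptions rather than part of the original analysis.

```r
library(dplyr)
library(tibble)

# Toy rows copied from the head() output above; the real HIV.State has 714 rows.
hiv_demo <- tibble(
  Year = c(2021, 2020, 2019, 2018),
  Geography = "Alabama",
  Cases = c(314, 586, 638, 607),
  `Rate per 100000` = c("Data not available", "Data not available", "15.5", "14.8")
)

hiv_clean <- hiv_demo %>%
  # Shorter names (assumed here) are easier to use in later joins and plots
  rename(State = Geography, Rate = `Rate per 100000`) %>%
  # Turn the text placeholder into NA first, then convert to numeric
  mutate(Rate = as.numeric(na_if(Rate, "Data not available")))

str(hiv_clean)
```

Replacing the placeholder with NA before calling as.numeric() avoids the coercion warning that would otherwise appear.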

Import State GDP Data

The second data set looks at annual state-level GDP².

The variables in this data set:

GeoName - This is character data that describes the state of each observation.
Description - This is character data that describes the GDP sector of each observation.
Year - This is character data after pivoting. It is the year of each observation.
GDP - This is numerical data that describes the GDP in millions of current dollars.

# This is dataset #2
# This data was obtained from https://apps.bea.gov/itable/iTable.cfm?ReqID=70&step=1

GDP.State <- read_csv("StateGDP.csv", skip = 4)

Tidy State GDP data

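# Keep GeoName, Description, and the yearly columns
GDP.State <- GDP.State %>%
  select(2, 4:17)

# Reshape the yearly columns (2008 to 2020) into long format
GDP.State <- GDP.State %>%
  pivot_longer(
    `2008`:`2020`,
    names_to = "Year",
    values_to = "GDP"
  )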

str(GDP.State)

tibble [2,405 × 4] (S3: tbl_df/tbl/data.frame)
 $ GeoName    : chr [1:2405] "United States *" "United States *" "United States *" "United States *" ...
 $ Description: chr [1:2405] "Health care and social assistance" "Health care and social assistance" "Health care and social assistance" "Health care and social assistance" ...
 $ Year       : chr [1:2405] "2008" "2009" "2010" "2011" ...
 $ GDP        : num [1:2405] 1017197 1078771 1112327 1149944 1195074 ...

head(GDP.State)

# A tibble: 6 × 4
  GeoName         Description                       Year      GDP
  <chr>           <chr>                             <chr>   <dbl>
1 United States * Health care and social assistance 2008  1017197
2 United States * Health care and social assistance 2009  1078771
3 United States * Health care and social assistance 2010  1112327
4 United States * Health care and social assistance 2011  1149944
5 United States * Health care and social assistance 2012  1195074
6 United States * Health care and social assistance 2013  1230767
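
Two details in this output matter if the GDP data is later compared with the HIV data: Year is character here but numeric in the HIV table, and the table includes national "United States *" rows alongside the states. A minimal sketch of one way to handle both follows. The names `gdp_demo` and `gdp_clean` are illustrative, and the Alabama row is a hypothetical placeholder, not real BEA data.

```r
library(dplyr)
library(tibble)

# Demo rows: the two national rows are copied from the head() output above;
# the Alabama row is a hypothetical placeholder (its GDP value is NOT real data).
gdp_demo <- tibble(
  GeoName = c("United States *", "United States *", "Alabama"),
  Description = "Health care and social assistance",
  Year = c("2008", "2009", "2008"),
  GDP = c(1017197, 1078771, 1)
)

gdp_clean <- gdp_demo %>%
  filter(GeoName != "United States *") %>%  # drop the national aggregate rows
  mutate(Year = as.integer(Year))           # "2008" becomes 2008L

str(gdp_clean)
```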

Research Questions

What were the trends in incident HIV infections from 2008-2021 across states?
What were the trends in healthcare GDP by state from 2008-2020?
Are there any associations between healthcare GDP by state and incident infections?
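
As a rough sketch of how the third question might be approached, the two tables could be joined on state and year and a within-state correlation computed. This is only one possible approach under stated assumptions: both tables are assumed to share `State` and numeric `Year` columns, the Cases values are the Alabama figures shown earlier, and the GDP values are made-up placeholders, so the resulting correlation has no substantive meaning.

```r
library(dplyr)
library(tibble)

# Cases copied from the Alabama rows shown earlier in this post
hiv_demo <- tibble(
  State = "Alabama",
  Year = c(2016, 2017, 2018, 2019),
  Cases = c(653, 650, 607, 638)
)

# GDP values are hypothetical placeholders, NOT real data
gdp_demo <- tibble(
  State = "Alabama",
  Year = c(2016, 2017, 2018, 2019),
  GDP = c(100, 110, 120, 130)
)

# Keep only state-year pairs present in both tables
joined <- inner_join(hiv_demo, gdp_demo, by = c("State", "Year"))

# Pearson correlation between healthcare GDP and incident cases, per state
assoc <- joined %>%
  group_by(State) %>%
  summarise(n_years = n(), r = cor(GDP, Cases))

assoc
```

An inner join keeps only the years covered by both sources, which for the data described here would be 2008-2020.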
