HW4

Identifying data for Final Project, and Generating Visualuals

Ari Markowitz
2022-02-22

Overview:

This analysis will combine country level data to see how cultural indicators drive COVID-19 vaccination rates around the world, and how different health systems, and strength of those impact that relationship. I will combine three different datasets at the country level:

1: Cultural indicators from Geert Hofstede’s Cultural Dimensions Theory Data https://data.world/adamhelsinger/geerthofstedeculturaldimension

2: National Vaccination Rates from Open-Source Github Repo: https://github.com/owid/covid-19-data

3: National Health System Classification Data - TBD

4: Global Country Indicators (Income, Income Per Capita, Government Type (indexes?), Population etc…) - TBD

Read In Geert Hofstede’s Cultural Dimensions Theory Data From Data.World

culture_indices <- read.csv2("https://query.data.world/s/c4dbzx65xq3bnkf65gnutajtb2566a", header=TRUE, stringsAsFactors=FALSE) %>%
      na_if("#NULL!") %>%
      mutate(country  = replace(country, country == "U.S.A.", "United States"))
head(df)
                                              
1 function (x, df1, df2, ncp, log = FALSE)    
2 {                                           
3     if (missing(ncp))                       
4         .Call(C_df, x, df1, df2, log)       
5     else .Call(C_dnf, x, df1, df2, ncp, log)
6 }                                           

Read in National Vaccination Rates from Open Source Github Repo

"https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations/country_data" %>%
            read_html() %>%
            html_nodes(xpath = '//*[@role="rowheader"]') %>%
            html_nodes('span a') %>%
            html_attr('href') %>%
            sub('blob/', '', .) %>%
            paste0('https://raw.githubusercontent.com', .) %>%
            purrr::map_df(read.csv)  ->  vaccine_data

head(vaccine_data)
     location       date
1 Afghanistan 2021-02-22
2 Afghanistan 2021-02-28
3 Afghanistan 2021-03-16
4 Afghanistan 2021-04-07
5 Afghanistan 2021-04-22
6 Afghanistan 2021-05-11
                                                 vaccine
1                                     Oxford/AstraZeneca
2                                     Oxford/AstraZeneca
3                                     Oxford/AstraZeneca
4                                     Oxford/AstraZeneca
5                                     Oxford/AstraZeneca
6 Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing
                                                                                                    source_url
1                                                                 https://tolonews.com/index.php/health-170225
2                                                                 https://tolonews.com/index.php/health-170355
3                                      http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
4                                      http://www.xinhuanet.com/english/asiapacific/2021-04/07/c_139864755.htm
5 https://reliefweb.int/report/afghanistan/afghanistan-strategic-situation-report-covid-19-no-95-22-april-2021
6                                                                                     https://covid19.who.int/
  total_vaccinations people_vaccinated people_fully_vaccinated
1                  0                 0                      NA
2               8200              8200                      NA
3              54000             54000                      NA
4             120000            120000                      NA
5             240000            240000                      NA
6             504502            448878                   55624
  total_boosters people_partly_vaccinated
1             NA                       NA
2             NA                       NA
3             NA                       NA
4             NA                       NA
5             NA                       NA
6             NA                       NA

Read in Country Level Data from the world bank

Used “wbstat” package for ease of access to World Bank API - https://cran.r-project.org/web/packages/wbstats/vignettes/wbstats.html

my_indicators = c("pop" = "SP.POP.TOTL",
                  "gdp" = "NY.GDP.MKTP.CD")

pop_gdp <- wb_data(my_indicators, start_date = 2020, end_date = 2022) %>% mutate(gdp_per_capita = gdp/pop)

Healthcare System Data

GHED_PVT-DCHE_SHA2011 - Domestic private health expenditure (PVT-D) as percentage of current health expenditure (CHE) (%)

GHED_GGHE-DCHE_SHA2011 - Domestic general government health expenditure (GGHE-D) as percentage of current health expenditure (CHE) (%)

GHED_GGHE-DGGE_SHA2011 - Domestic general government health expenditure (GGHE-D) as percentage of general government expenditure (GGE) (%)

GHED_PVT-D_pc_PPP_SHA2011 - Domestic private health expenditure (PVT-D) per capita in PPP int$

GHED_GGHE-D_pc_PPP_SHA2011 - Domestic general government health expenditure (GGHE-D) per capita in PPP int$

GHED_CHEGDP_SHA2011 - Current health expenditure (CHE) as percentage of gross domestic product (GDP) (%)

GHED_EXT_pc_PPP_SHA2011 - External health expenditure (EXT) per capita in PPP int$

GDO_q9x5_1 - Density of outpatient health centres (per 100,000 population)

DEVICES02 - Total density per 100 000 population: Health centres

who_health_indicators<-fromJSON("https://ghoapi.azureedge.net/api/Indicator?$filter=contains(IndicatorName,%27health%27)")$value

Health_System_Data <- c('GHED_PVT-DCHE_SHA2011','GHED_GGHE-DCHE_SHA2011','GHED_GGHE-DGGE_SHA2011','GHED_PVT-D_pc_PPP_SHA2011','GHED_GGHE-D_pc_PPP_SHA2011','GHED_CHEGDP_SHA2011','GHED_EXT_pc_PPP_SHA2011','GDO_q9x5_1','DEVICES02') %>%     
            paste0('https://ghoapi.azureedge.net/api/', .) %>%
            purrr::map_df({function(x) fromJSON(x)$value}) %>%
            left_join(who_health_indicators %>% 
                                select(IndicatorCode,IndicatorName), by = 'IndicatorCode') %>%
            left_join(fromJSON('https://ghoapi.azureedge.net/api/DIMENSION/COUNTRY/DimensionValues')$value[c("Code","Title")], by = c('SpatialDim'='Code')) %>%
            rename(country = Title, year = TimeDim, who_indicator = IndicatorName, who_indicator_code = IndicatorCode, who_indicator_value = Value ) %>%
            arrange(desc(year))  %>%
            distinct(country, who_indicator, who_indicator_code,.keep_all = TRUE) %>% 
            select(country, who_indicator, who_indicator_code,who_indicator_value)

Join Datasets

df <- vaccine_data %>% 
            select(location, date, people_vaccinated, people_fully_vaccinated, people_partly_vaccinated) %>%
            rename(country = location) %>%
            # arrange(desc(date)) %>%
            # distinct(country, .keep_all= TRUE) %>%
            full_join(culture_indices %>%
                                select(country, pdi, idv, mas, uai, ltowvs, ivr),
                      by='country') %>%
            full_join(pop_gdp %>%
                            select(country, gdp, pop, gdp_per_capita),
                      by = 'country') %>%
            mutate(pop_vax_rate = people_vaccinated/pop, pop_part_vax_rate = people_partly_vaccinated/pop, pop_full_vax_rate = people_fully_vaccinated/pop, date = as.Date(date), year = as.integer(format(as.Date(date), format = "%Y")),gdp_quartile = ntile(gdp_per_capita, 4))  %>%
            left_join(Health_System_Data, by = 'country') %>%
            pivot_longer(c(pdi, idv, mas, uai, ltowvs, ivr),names_to = 'culture_index', values_to = "culture_index_value") %>%
            mutate(culture_index_value = as.numeric(culture_index_value))
          
print(head(df))
# A tibble: 6 × 18
  country     date       people_vaccinated people_fully_vaccinated
  <chr>       <date>                 <int>                   <int>
1 Afghanistan 2021-02-22                 0                      NA
2 Afghanistan 2021-02-22                 0                      NA
3 Afghanistan 2021-02-22                 0                      NA
4 Afghanistan 2021-02-22                 0                      NA
5 Afghanistan 2021-02-22                 0                      NA
6 Afghanistan 2021-02-22                 0                      NA
# … with 14 more variables: people_partly_vaccinated <int>,
#   gdp <dbl>, pop <dbl>, gdp_per_capita <dbl>, pop_vax_rate <dbl>,
#   pop_part_vax_rate <dbl>, pop_full_vax_rate <dbl>, year <int>,
#   gdp_quartile <int>, who_indicator <chr>,
#   who_indicator_code <chr>, who_indicator_value <chr>,
#   culture_index <chr>, culture_index_value <dbl>

Descriptive Statistics of data from most recent date

df %>% 
              arrange(desc(date))  %>%
              distinct(country, .keep_all = TRUE) %>%
              summary()
   country               date            people_vaccinated  
 Length:278         Min.   :2021-02-01   Min.   :       47  
 Class :character   1st Qu.:2022-02-17   1st Qu.:   265860  
 Mode  :character   Median :2022-02-19   Median :  2270897  
                    Mean   :2022-02-09   Mean   : 18811059  
                    3rd Qu.:2022-02-21   3rd Qu.:  9643354  
                    Max.   :2022-02-21   Max.   :962967313  
                    NA's   :54           NA's   :89         
 people_fully_vaccinated people_partly_vaccinated      gdp           
 Min.   :       47       Min.   : NA              Min.   :4.886e+07  
 1st Qu.:   234910       1st Qu.: NA              1st Qu.:7.780e+09  
 Median :  2038507       Median : NA              Median :3.119e+10  
 Mean   : 16083651       Mean   :NaN              Mean   :4.310e+11  
 3rd Qu.:  8042658       3rd Qu.: NA              3rd Qu.:2.031e+11  
 Max.   :774808452       Max.   : NA              Max.   :2.095e+13  
 NA's   :85              NA's   :278              NA's   :84         
      pop            gdp_per_capita    pop_vax_rate    
 Min.   :1.083e+04   Min.   :   239   Min.   :0.00075  
 1st Qu.:7.828e+05   1st Qu.:  2170   1st Qu.:0.44518  
 Median :6.725e+06   Median :  5467   Median :0.67689  
 Mean   :3.581e+07   Mean   : 15312   Mean   :0.59411  
 3rd Qu.:2.571e+07   3rd Qu.: 17700   3rd Qu.:0.77034  
 Max.   :1.411e+09   Max.   :173688   Max.   :1.23787  
 NA's   :62          NA's   :84       NA's   :128      
 pop_part_vax_rate pop_full_vax_rate      year       gdp_quartile  
 Min.   : NA       Min.   :0.00071   Min.   :2021   Min.   :1.000  
 1st Qu.: NA       1st Qu.:0.38049   1st Qu.:2022   1st Qu.:1.000  
 Median : NA       Median :0.58409   Median :2022   Median :2.000  
 Mean   :NaN       Mean   :0.52879   Mean   :2022   Mean   :2.113  
 3rd Qu.: NA       3rd Qu.:0.71747   3rd Qu.:2022   3rd Qu.:3.000  
 Max.   : NA       Max.   :1.21219   Max.   :2022   Max.   :4.000  
 NA's   :278       NA's   :124       NA's   :54     NA's   :84     
 who_indicator      who_indicator_code who_indicator_value
 Length:278         Length:278         Length:278         
 Class :character   Class :character   Class :character   
 Mode  :character   Mode  :character   Mode  :character   
                                                          
                                                          
                                                          
                                                          
 culture_index      culture_index_value
 Length:278         Min.   : 11.00     
 Class :character   1st Qu.: 42.50     
 Mode  :character   Median : 62.00     
                    Mean   : 59.33     
                    3rd Qu.: 72.50     
                    Max.   :104.00     
                    NA's   :200        

Graph Vaccinations Over Time by Country Wealth Quartile

# Keep only 3 names
plot_df <- df %>%
  arrange(gdp_per_capita) %>%
  filter(!is.na(culture_index_value) & !is.na(pop_vax_rate)) %>%
  select(date,pop_vax_rate, country,gdp_quartile) %>%
  distinct()

p <- vector('list', 4)
i <- 1
quartiles <- c('1st Quartile (Poorest)','2nd Quartile','3rd Quartile','4th Quartile (Wealthiest)')
colors <- c('YlOrRd','BuGn','BuPu','PuRd')
for (var in unique(plot_df$gdp_quartile)) {
    dev.new()
    p[[i]] <- plot_df %>% 
              filter(gdp_quartile == var) %>% 
              ggplot(
                aes(x=date, y=pop_vax_rate, group=country, color=country))+
                ggtitle(quartiles[i]) +
                theme(legend.position = "none") +
                geom_line() +
                scale_colour_brewer(palette = colors[i] ) +
                geom_hline(yintercept = .7) +
                ylim(0,1) +
                xlim(as.Date(c(min(plot_df$date), max(plot_df$date)), format="%d/%m/%Y") )
    i <- i+1
}

grid.arrange(p[[1]], p[[2]],p[[3]],p[[4]], nrow = 2,top = textGrob("Vaccination Progress by Country Wealth",gp=gpar(fontsize=20,font=3)))

GRID NOT PRINTING???

This graph shows the relationship between vaccination rate progress and national wealth, measured by GDP per capita. A reference line is included here to denote a 70% vaccination rate (“heard immunity”, more or less(. We can clearly see that the wealthiest nations achieved heard immunity much earlier than all other groups, while many 1st, 2nd and 3rd quartile countries still have not achieved heard immunity. This graph is unable to capture the impact of other confounding factors that may be impacting a nations population vaccination rate progress. Furthermore, we are unable to say if the differences we we see between income groups in vaccination progress is statistically significant.

Scatterplot between Current Vaccination Rates and National Wealth.

corr_df <- df %>% 
              arrange(desc(date))  %>%
              distinct(country, .keep_all = TRUE) %>%
              select(country, pop_vax_rate, gdp_per_capita) %>% 
              na.omit()


p <- corr_df %>% 
     ggplot() +
    ggtitle("National Vaccination Rates by GDP Per Capita") +
     geom_point(aes(x = gdp_per_capita, y = pop_vax_rate, color = country, text = paste("country:", country))) + 
     stat_smooth(aes(x = gdp_per_capita, y = pop_vax_rate), method = "lm", col = "red", se = FALSE) +
     scale_x_log10()


fig <- ggplotly(p)

fig

Here we see a positive relationship between GDP per Capita and national vaccination rates. This implies (as would be expected) that wealthier nations have an easier time getting their population vaccinated, or are able to get their population vaccinated more quickly, given that many vaccination campaigns are still in progress. This visualization does not provide us with any statistical verification of the relationship that it presented visually.

Scatterplot between Current Vaccination Rates and Colectivist/Individualist Index.

corr_df <- df %>% 
              arrange(desc(date))  %>%
              filter(culture_index == 'idv') %>%
              distinct(country, .keep_all = TRUE) %>%
              select(country, pop_vax_rate, culture_index_value) %>% 
              na.omit()


p <- corr_df %>% 
     ggplot() +
    ggtitle("National Vaccination Rates by Collectivist/Individualist Index") +
     geom_point(aes(x = culture_index_value, y = pop_vax_rate, color = country, text = paste("country:", country))) + 
     stat_smooth(aes(x = culture_index_value, y = pop_vax_rate), method = "lm", col = "red", se = FALSE) 


fig <- ggplotly(p)

fig

This scatterplot shows that as countries become more individualistic, and less collectivist, they appear to have more success vaccinating their population. This relationship does not appear to be that strong, and we are unable to verify if it is statistically significant or not. For my final project, I hope to include a button in this graph that allows the user to rapidly switch between the 6 cultural indices.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Markowitz (2022, Feb. 23). Data Analytics and Computational Social Science: HW4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomamarkowitzhw4/

BibTeX citation

@misc{markowitz2022hw4,
  author = {Markowitz, Ari},
  title = {Data Analytics and Computational Social Science: HW4},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomamarkowitzhw4/},
  year = {2022}
}