Final Project Check-in 1

Author

Rahul Somu

Published

March 24, 2023

# Load packages; tidyverse already attaches dplyr and ggplot2
library(tidyverse)
library(readxl)
# Echo code chunks in the rendered output
knitr::opts_chunk$set(echo = TRUE)

Overview

The COVID-19 pandemic has shown how devastating a highly contagious disease can be. Millions of people have died as a consequence of the pandemic, and it has affected the lives of billions more around the world. This has highlighted the need for research on the factors and strategies that can help combat future pandemics effectively.

As the pandemic continues, it is critical to understand the variables that affect COVID-19 mortality. The goal of this study is to examine the relationships between a nation’s COVID-19 mortality rate and its population density, median age, GDP per capita, diabetes prevalence, hospital beds per thousand people, and human development index.

In this project, I aim to investigate to what extent these socioeconomic factors contribute to the variation in the COVID-19 mortality rate across the world, and to characterize the relationship between the mortality rate and population density, median age, GDP per capita, diabetes prevalence, hospital beds per thousand people, and human development index.

Dataset

The dataset contains time series data for 193 countries around the world, with 84,772 records in total across the observation period. Eleven variables are selected for this study.

Data source: https://www.kaggle.com/datasets/fedesoriano/coronavirus-covid19-vaccinations-data

# Load the raw data and keep only the variables used in this study
df <- read_excel("_data/COVID_Data.xlsx")
df_selected <- df[, c("iso_code", "continent", "location", "date", "total_cases_per_million",
                      "population_density", "median_age", "gdp_per_capita",
                      "diabetes_prevalence", "hospital_beds_per_thousand", "human_development_index")]
# Dimensions of the selected data: rows, columns
dataset_dim <- dim(df_selected)
dataset_dim
[1] 84772    11
# Number of distinct countries in the data
countries_count <- length(unique(df_selected$location))
countries_count
[1] 193
# Full list of country names (stored, not printed)
countries_list <- unique(df_selected$location)

head(df_selected)
# A tibble: 6 × 11
  iso_code conti…¹ locat…² date  total…³ popul…⁴ media…⁵ gdp_p…⁶ diabe…⁷ hospi…⁸
  <chr>    <chr>   <chr>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 AFG      Asia    Afghan… 2020…   0.873    54.4    18.6   1804.    9.59     0.5
2 AFG      Asia    Afghan… 2020…   1.05     54.4    18.6   1804.    9.59     0.5
3 AFG      Asia    Afghan… 2020…   1.10     54.4    18.6   1804.    9.59     0.5
4 AFG      Asia    Afghan… 2020…   1.95     54.4    18.6   1804.    9.59     0.5
5 AFG      Asia    Afghan… 2020…   2.06     54.4    18.6   1804.    9.59     0.5
6 AFG      Asia    Afghan… 2020…   2.34     54.4    18.6   1804.    9.59     0.5
# … with 1 more variable: human_development_index <dbl>, and abbreviated
#   variable names ¹​continent, ²​location, ³​total_cases_per_million,
#   ⁴​population_density, ⁵​median_age, ⁶​gdp_per_capita, ⁷​diabetes_prevalence,
#   ⁸​hospital_beds_per_thousand
# Summary statistics and missing-value counts for each selected variable
summary(df_selected)
   iso_code          continent           location             date          
 Length:84772       Length:84772       Length:84772       Length:84772      
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
 total_cases_per_million population_density   median_age    gdp_per_capita    
 Min.   :     0.02       Min.   :    1.98   Min.   :15.10   Min.   :   661.2  
 1st Qu.:   454.56       1st Qu.:   36.25   1st Qu.:22.20   1st Qu.:  4466.5  
 Median :  2579.10       Median :   82.60   Median :30.60   Median : 13367.6  
 Mean   : 14350.40       Mean   :  361.01   Mean   :30.76   Mean   : 19633.0  
 3rd Qu.: 16110.12       3rd Qu.:  205.86   3rd Qu.:39.60   3rd Qu.: 27936.9  
 Max.   :179667.38       Max.   :19347.50   Max.   :48.20   Max.   :116935.6  
 NA's   :1               NA's   :4819       NA's   :5783    NA's   :6696      
 diabetes_prevalence hospital_beds_per_thousand human_development_index
 Min.   : 0.990      Min.   : 0.100             Min.   :0.394          
 1st Qu.: 5.290      1st Qu.: 1.300             1st Qu.:0.602          
 Median : 7.110      Median : 2.400             Median :0.756          
 Mean   : 7.651      Mean   : 3.047             Mean   :0.731          
 3rd Qu.: 9.740      3rd Qu.: 4.200             3rd Qu.:0.852          
 Max.   :22.020      Max.   :13.800             Max.   :0.957          
 NA's   :4872        NA's   :12687              NA's   :5802           

Methodology:

Multiple linear regression models will be used to carry out the analysis. The socioeconomic factors will be the independent variables, whereas the COVID-19 mortality rate will be the dependent variable.
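As a rough illustration of this approach, the sketch below fits a multiple linear regression with base R's `lm()`. It runs on simulated data, not on the project data: the data frame `sim`, the coefficients used to generate it, and the response column `mortality_rate` are all assumptions made for the example (the columns selected above do not yet include a mortality variable).

```r
# Minimal sketch of a multiple linear regression with lm().
# ASSUMPTION: the data are simulated; `mortality_rate` is a hypothetical
# response column and the generating coefficients are arbitrary.
set.seed(42)
n <- 150  # hypothetical number of countries

sim <- data.frame(
  population_density         = runif(n, 2, 1000),
  median_age                 = runif(n, 15, 48),
  gdp_per_capita             = runif(n, 700, 100000),
  diabetes_prevalence        = runif(n, 1, 22),
  hospital_beds_per_thousand = runif(n, 0.1, 13),
  human_development_index    = runif(n, 0.4, 0.95)
)

# Simulated response: driven by median age and diabetes prevalence plus noise.
sim$mortality_rate <- 0.5 + 0.04 * sim$median_age +
  0.03 * sim$diabetes_prevalence + rnorm(n, sd = 0.3)

# Dependent variable on the left, socioeconomic predictors on the right.
fit <- lm(
  mortality_rate ~ population_density + median_age + gdp_per_capita +
    diabetes_prevalence + hospital_beds_per_thousand + human_development_index,
  data = sim
)

summary(fit)  # coefficient estimates, standard errors, R-squared
```

With the real data, note that `lm()` by default drops every row that has a missing value in any model variable, so the NA counts reported by `summary(df_selected)` determine how many observations actually enter the fit.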

Expected Results:

The findings of this investigation will aid in understanding the variables that affect the COVID-19 mortality rate. Population density, median age, diabetes prevalence, and hospital beds per thousand people are expected to correlate positively with the COVID-19 mortality rate, whereas GDP per capita and the human development index are expected to correlate negatively. The findings should help in planning public health policies and interventions to lessen the effects of COVID-19.
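One way to check the expected signs before fitting a model is a pairwise correlation matrix. The sketch below uses base R's `cor()` on simulated data; the data frame `toy`, the response column `mortality_rate`, and the generating coefficients are assumptions for illustration only and say nothing about the real results.

```r
# Sketch: pairwise correlations between a response and two predictors.
# ASSUMPTION: simulated data; `mortality_rate` is a hypothetical column.
set.seed(1)
n <- 100

toy <- data.frame(
  median_age     = runif(n, 15, 48),
  gdp_per_capita = runif(n, 700, 100000)
)

# Simulated response: rises with median age, falls with GDP per capita.
toy$mortality_rate <- 0.05 * toy$median_age -
  0.00001 * toy$gdp_per_capita + rnorm(n, sd = 0.2)

# Introduce a few missing values, since the real data also contain NAs.
toy$gdp_per_capita[c(3, 17, 40)] <- NA

# "pairwise.complete.obs" uses, for each pair of columns, all rows where
# both values are present, instead of returning NA for the whole pair.
cors <- cor(toy, use = "pairwise.complete.obs")

# Correlation of each variable with the response.
round(cors["mortality_rate", ], 2)
```

A correlation matrix only captures pairwise linear association; the regression coefficients can differ in sign or size once the other predictors are held fixed.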

Conclusion:

The goal of this study is to understand the relationship between socioeconomic factors and the COVID-19 mortality rate. The results will be helpful in planning public health policy and initiatives and will shed light on the factors that affect the COVID-19 death rate.