Author

Linda Humphrey

Published

May 22, 2023

Code
library(tidyverse)
library(ggplot2)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Introduction

In sub-Saharan Africa, malaria is one of the leading causes of illness and death. The disease affects millions of people each year, particularly young children and pregnant women.

Reading dataset

The number of malaria cases in sub-Saharan Africa has been decreasing over the past decade. This is partly due to increased efforts to control and prevent the disease, including the distribution of insecticide-treated bed nets and the use of antimalarial drugs. However, the fight against malaria is far from over. According to the WHO, sub-Saharan Africa still accounted for 94% of all malaria cases and deaths in 2019. The data read in below cover estimated deaths due to malaria in sub-Saharan Africa from 2000 up to 2020:

Code
library(tidyverse)
library(tidyr)

# Read in the csv file
MALARIA_EST_DEATHS <- read.csv("~/Desktop/601_Spring_2023/posts/LindaHumphrey_FinalProjectDataFolder/MALARIA_EST_DEATHS.csv")

# Preview the first rows (View() only works in an interactive session)
head(MALARIA_EST_DEATHS)

Tidy the MALARIA_IMPORTED data

Code
library(tidyverse)
library(tidyr)

# Read in the csv file
MALARIA_IMPORTED <- read.csv("~/Desktop/601_Spring_2023/posts/LindaHumphrey_FinalProjectDataFolder/MALARIA_IMPORTED.csv")

MALARIA_IMPORTED <- MALARIA_IMPORTED %>%
  select(X, Number.of.imported.malaria.cases, Number.of.imported.malaria.cases.1, Number.of.imported.malaria.cases.2, Number.of.imported.malaria.cases.3) %>%
  # rename() expects new_name = old_name; the year labels come from the first data row
  rename(Country = X,
         `2020` = Number.of.imported.malaria.cases,
         `2019` = Number.of.imported.malaria.cases.1,
         `2018` = Number.of.imported.malaria.cases.2,
         `2017` = Number.of.imported.malaria.cases.3) %>%
  # Drop the first row, which repeats the header ("Country", "2020", ...)
  slice(-1)
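
A table in this wide layout, with the years sitting in the first data row and the counts stored as text, can also be reshaped into one row per country and year. The following is a minimal sketch on a toy data frame, not the author's pipeline: the names `raw` and `tidy` are hypothetical, and the three rows mirror the first lines of the imported file.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the raw import: the first row carries the years,
# counts are text, and blanks mean "not reported".
raw <- data.frame(
  X = c("Country", "Algeria", "Armenia"),
  Number.of.imported.malaria.cases   = c(" 2020", "2 725", "3"),
  Number.of.imported.malaria.cases.1 = c(" 2019", "1 014", ""),
  stringsAsFactors = FALSE
)

# Promote the first row to column names (keeping digits only), then drop it
years <- gsub("[^0-9]", "", unlist(raw[1, -1], use.names = FALSE))
names(raw) <- c("Country", years)

tidy <- raw[-1, ] %>%
  pivot_longer(-Country, names_to = "Year", values_to = "Cases") %>%
  mutate(
    Year  = as.integer(Year),
    # Strip the thousands separator; empty strings become NA
    Cases = as.numeric(gsub("[^0-9]", "", Cases))
  )

print(tidy)
```

The long layout has explicit `Year` and `Cases` columns, which is the shape that `ggplot()` aesthetics such as `aes(x = Year, y = Cases)` expect.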

Explore data

Data visualizations can help us understand the extent of the problem. For instance, the World Health Organization (WHO) provides data on the number of confirmed malaria cases in different regions of the world. The visualization below shows the number of imported malaria cases reported by each country over time:

Code
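library(tidyverse)

# Read in the csv file
MALARIA_IMPORTED <- read.csv("~/Desktop/601_Spring_2023/posts/LindaHumphrey_FinalProjectDataFolder/MALARIA_IMPORTED.csv")

# The first row holds the years: use it for the column names, then reshape to long format
years <- gsub("[^0-9]", "", unlist(MALARIA_IMPORTED[1, -1], use.names = FALSE))
names(MALARIA_IMPORTED) <- c("Country", years)
malaria_long <- MALARIA_IMPORTED[-1, ] %>%
  pivot_longer(-Country, names_to = "Year", values_to = "Cases") %>%
  # Counts are stored as text such as "2 725"; blanks become NA
  mutate(Year = as.integer(Year), Cases = as.numeric(gsub("[^0-9]", "", Cases)))

ggplot(data = malaria_long, aes(x = Year, y = Cases, group = Country)) +
  geom_line() +
  labs(title = "Imported Malaria Cases by Country",
       x = "Year",
       y = "Number of Cases")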

Exploratory Data Analysis

Although the number of deaths due to malaria has decreased over the past decade, the disease remains a significant public health issue in the region. One of the challenges in fighting malaria in sub-Saharan Africa is the high incidence of drug-resistant strains of the malaria parasite. The visualization below shows the percentage of confirmed malaria cases in sub-Saharan Africa that were resistant to antimalarial drugs:

Code
# Load required packages
library(tidyverse)
library(ggplot2)

# Read the imported malaria cases file
MALARIA_IMPORTED <- read.csv("~/Desktop/601_Spring_2023/posts/LindaHumphrey_FinalProjectDataFolder/MALARIA_IMPORTED.csv")

# View data using head() function to display first few rows
head(MALARIA_IMPORTED)
           X Number.of.imported.malaria.cases
1    Country                             2020
2    Algeria                            2 725
3  Argentina                                 
4    Armenia                                3
5 Azerbaijan                                 
6 Bangladesh                                2
  Number.of.imported.malaria.cases.1 Number.of.imported.malaria.cases.2
1                               2019                               2018
2                              1 014                              1 241
3                                                                    23
4                                                                     6
5                                  0                                  2
6                                  6                                 41
  Number.of.imported.malaria.cases.3 Number.of.imported.malaria.cases.4
1                               2017                               2016
2                                446                                420
3                                 18                                  7
4                                  2                                  2
5                                  1                                  1
6                                 19                                109
  Number.of.imported.malaria.cases.5 Number.of.imported.malaria.cases.6
1                               2015                               2014
2                                727                                260
3                                 11                                 15
4                                  2                                  1
5                                  1                                  2
6                                129                                   
  Number.of.imported.malaria.cases.7 Number.of.imported.malaria.cases.8
1                               2013                               2012
2                                587                                828
3                                 11                                 16
4                                  0                                  4
5                                  4                                  1
6                                                                      
  Number.of.imported.malaria.cases.9 Number.of.imported.malaria.cases.10
1                               2011                                2010
2                                187                                 396
3                                 28                                  55
4                                  0                                   1
5                                  4                                   2
6                                                                       
Code
# Check the structure of the data using the str() function.
str(MALARIA_IMPORTED)
'data.frame':   59 obs. of  12 variables:
 $ X                                  : chr  "Country" "Algeria" "Argentina" "Armenia" ...
 $ Number.of.imported.malaria.cases   : chr  " 2020" "2 725" "" "3" ...
 $ Number.of.imported.malaria.cases.1 : chr  " 2019" "1 014" "" "" ...
 $ Number.of.imported.malaria.cases.2 : chr  " 2018" "1 241" "23" "6" ...
 $ Number.of.imported.malaria.cases.3 : chr  " 2017" "446" "18" "2" ...
 $ Number.of.imported.malaria.cases.4 : chr  " 2016" "420" "7" "2" ...
 $ Number.of.imported.malaria.cases.5 : chr  " 2015" "727" "11" "2" ...
 $ Number.of.imported.malaria.cases.6 : chr  " 2014" "260" "15" "1" ...
 $ Number.of.imported.malaria.cases.7 : chr  " 2013" "587" "11" "0" ...
 $ Number.of.imported.malaria.cases.8 : chr  " 2012" "828" "16" "4" ...
 $ Number.of.imported.malaria.cases.9 : chr  " 2011" "187" "28" "0" ...
 $ Number.of.imported.malaria.cases.10: chr  " 2010" "396" "55" "1" ...
Code
# Summarize each column with summary(). Every column was read as character, so blank entries are not reported as missing here.
summary(MALARIA_IMPORTED)
      X             Number.of.imported.malaria.cases
 Length:59          Length:59                       
 Class :character   Class :character                
 Mode  :character   Mode  :character                
 Number.of.imported.malaria.cases.1 Number.of.imported.malaria.cases.2
 Length:59                          Length:59                         
 Class :character                   Class :character                  
 Mode  :character                   Mode  :character                  
 Number.of.imported.malaria.cases.3 Number.of.imported.malaria.cases.4
 Length:59                          Length:59                         
 Class :character                   Class :character                  
 Mode  :character                   Mode  :character                  
 Number.of.imported.malaria.cases.5 Number.of.imported.malaria.cases.6
 Length:59                          Length:59                         
 Class :character                   Class :character                  
 Mode  :character                   Mode  :character                  
 Number.of.imported.malaria.cases.7 Number.of.imported.malaria.cases.8
 Length:59                          Length:59                         
 Class :character                   Class :character                  
 Mode  :character                   Mode  :character                  
 Number.of.imported.malaria.cases.9 Number.of.imported.malaria.cases.10
 Length:59                          Length:59                          
 Class :character                   Class :character                   
 Mode  :character                   Mode  :character                   
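
Because `summary()` only reports length, class and mode for character columns, blank entries have to be counted explicitly. The following is a minimal sketch on a toy data frame that mirrors the first rows of the import; the names `imported`, `is_blank` and `missing_per_column` are hypothetical.

```r
# Toy stand-in with the same quirk as the import: blanks instead of NA
imported <- data.frame(
  Country = c("Algeria", "Argentina", "Armenia"),
  `2020`  = c("2 725", "", "3"),
  `2019`  = c("1 014", "", ""),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Treat NA, empty, or whitespace-only strings as missing
is_blank <- function(x) is.na(x) | trimws(x) == ""

# Number of missing entries per column
missing_per_column <- colSums(sapply(imported, is_blank))
print(missing_per_column)
```

Converting the blanks to `NA` before any numeric conversion keeps later summaries such as `mean(..., na.rm = TRUE)` honest about how many values were actually observed.
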
Code
# The remaining steps use Year, Entity and incidence columns, which MALARIA_IMPORTED does not contain.
# Assumption: incidence-of-malaria.csv sits in the same data folder as the other files.
malaria_incidence <- read.csv("~/Desktop/601_Spring_2023/posts/LindaHumphrey_FinalProjectDataFolder/incidence-of-malaria.csv")

# Plot the distribution of the Incidence variable using a histogram with the ggplot() function.
ggplot(malaria_incidence, aes(x = Incidence.of.malaria..per.1.000.population.at.risk.)) +
  geom_histogram()
Code
# Plot the incidence of malaria over time using a line chart with the ggplot() function.
ggplot(malaria_incidence, aes(x = Year, y = Incidence.of.malaria..per.1.000.population.at.risk., color = Entity)) +
  geom_line()
Code
# Calculate the mean and standard deviation of the incidence of malaria for each country using the group_by() and summarize() functions from the dplyr package, which is part of the tidyverse.
incidence_of_malaria_summary <- malaria_incidence %>%
  group_by(Entity) %>%
  summarize(mean_Incidence.of.malaria..per.1.000.population.at.risk. = mean(Incidence.of.malaria..per.1.000.population.at.risk., na.rm = TRUE),
            sd_Incidence.of.malaria..per.1.000.population.at.risk. = sd(Incidence.of.malaria..per.1.000.population.at.risk., na.rm = TRUE))
Code
# Plot the mean incidence of malaria for each country using a bar chart with error bars showing the standard deviation
ggplot(incidence_of_malaria_summary, aes(x = Entity, y = mean_Incidence.of.malaria..per.1.000.population.at.risk., fill = Entity)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = mean_Incidence.of.malaria..per.1.000.population.at.risk. - sd_Incidence.of.malaria..per.1.000.population.at.risk., ymax = mean_Incidence.of.malaria..per.1.000.population.at.risk. + sd_Incidence.of.malaria..per.1.000.population.at.risk.), width = 0.4, position = position_dodge(width = 0.9)) +
  coord_flip() +
  labs(x = "", y = "Mean Incidence of Malaria", title = "Mean Incidence of malaria by Entity")

Graph of Malaria drug resistance.

As we can see from the graph, drug resistance is a significant problem in many countries in the region. This highlights the need for ongoing research and development of new antimalarial drugs that can effectively treat drug-resistant strains of the malaria parasite.

In conclusion, while progress has been made in the fight against malaria in sub-Saharan Africa, there is still much work to be done. Data visualizations can help us understand the extent of the problem and identify areas where additional resources and interventions are needed. With continued efforts and investment, we can hope to see further reductions in the burden of malaria in the region.

Code
library(ggplot2)

# Read data file incidence-of-malaria
# Assumption: incidence-of-malaria.csv sits in the same data folder as the other files.
malaria_incidence <- read.csv("~/Desktop/601_Spring_2023/posts/LindaHumphrey_FinalProjectDataFolder/incidence-of-malaria.csv", stringsAsFactors = FALSE)

# Histogram of the incidence values, matching the axis label and title
ggplot(malaria_incidence, aes(x = Incidence.of.malaria..per.1.000.population.at.risk.)) +
  geom_histogram(binwidth = 5, color = "black", fill = "steelblue") +
  labs(x = "Incidence", y = "Count",
       title = "Histogram of Malaria Incidence")

Line Graph of Malaria Prevalence

To create a line graph of malaria prevalence over time, we need to calculate the prevalence of malaria for each survey year. In this case, we only have data for 2017-2018, so we can just calculate the prevalence for those two years. We can use the aggregate() function to calculate the mean prevalence of malaria for each survey year:

Code
library(ggplot2)
library(hrbrthemes)

# Read data file incidence-of-malaria
# Assumption: incidence-of-malaria.csv sits in the same data folder as the other files.
malaria_incidence <- read.csv("~/Desktop/601_Spring_2023/posts/LindaHumphrey_FinalProjectDataFolder/incidence-of-malaria.csv", stringsAsFactors = FALSE)

# A line graph of the incidence of malaria over time for each country
# (color = Entity draws one line per country and gives the viridis scale something to map)
ggplot(malaria_incidence, aes(x = Year, y = Incidence.of.malaria..per.1.000.population.at.risk., color = Entity)) +
  geom_line() +
  geom_point() +
  viridis::scale_color_viridis(discrete = TRUE) +
  labs(title = "Incidence of Malaria Over Time",
       x = "Year",
       y = "Incidence")
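
The per-year averaging with `aggregate()` described in this section can be sketched as follows. This is only an illustration under stated assumptions: the data frame `prevalence`, its column names and its values are hypothetical and do not come from the survey data.

```r
# Hypothetical values, for illustration only
prevalence <- data.frame(
  Year       = c(2017, 2017, 2018, 2018),
  Entity     = c("A", "B", "A", "B"),
  Prevalence = c(10, 20, 8, 16)
)

# Mean prevalence for each survey year
mean_by_year <- aggregate(Prevalence ~ Year, data = prevalence, FUN = mean)
print(mean_by_year)
```

The result has one row per year, so it can be passed directly to `ggplot()` with `aes(x = Year, y = Prevalence)` and `geom_line()`.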

Data modeling

This code reads the incidence-of-malaria.csv file, fits a linear regression model to the data, and obtains a summary of the model, including coefficients and p-values. Interpret the results with caution.

Code
# Read data file incidence-of-malaria
# Assumption: incidence-of-malaria.csv sits in the same data folder as the other files.
malaria_incidence <- read.csv("~/Desktop/601_Spring_2023/posts/LindaHumphrey_FinalProjectDataFolder/incidence-of-malaria.csv", stringsAsFactors = FALSE)

# Linear regression of incidence on year, with a separate intercept for each country
model <- lm(Incidence.of.malaria..per.1.000.population.at.risk. ~ Year + Entity, data = malaria_incidence)
Code
summary(model)
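
To show what such a model and its summary contain, here is a minimal sketch on a small made-up data set. The data frame `toy`, the short column name `Incidence` and all values are hypothetical; only the model formula follows the one used in this section.

```r
# Hypothetical incidence values for two entities over five years
toy <- data.frame(
  Year      = rep(2010:2014, times = 2),
  Entity    = rep(c("A", "B"), each = 5),
  Incidence = c(50, 48, 47, 44, 42,
                60, 59, 56, 55, 52)
)

# Incidence modeled as a common yearly trend plus an entity-specific offset
model <- lm(Incidence ~ Year + Entity, data = toy)

# Coefficient table: estimates, standard errors, t values and p-values
coef_table <- coef(summary(model))
print(coef_table)
```

The `Year` coefficient is the estimated change in incidence per year, and `EntityB` is the estimated offset of entity B relative to the baseline entity A. With many entities and few years per entity, the p-values deserve the caution the text recommends.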

Evaluation and Visualization

Finally, we can evaluate the performance of our model and visualize the results, for example with a confusion matrix and an ROC curve.
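
A confusion matrix and an ROC curve apply to a binary outcome rather than to the continuous incidence predicted by a linear model, so the following sketch assumes a hypothetical yes/no label and a predicted score. All names and values are made up for illustration, and only base R is used.

```r
# Hypothetical labels (1 = high burden) and model scores, for illustration only
actual <- c(0, 0, 0, 1, 0, 1, 1, 1)
score  <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)

# Confusion matrix at a 0.5 cutoff
predicted <- as.integer(score >= 0.5)
confusion <- table(Actual = actual, Predicted = predicted)
print(confusion)

# ROC points: true and false positive rates at each candidate cutoff
cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(cutoffs, function(k) mean(score[actual == 1] >= k))
fpr <- sapply(cutoffs, function(k) mean(score[actual == 0] >= k))

# Area under the curve by the trapezoid rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)

# ROC curve with the chance diagonal for reference
plot(fpr, tpr, type = "b",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve")
abline(0, 1, lty = 2)
```

For a regression model like the one fitted here, error measures such as the residual standard error reported by `summary(model)` are the more direct way to judge fit.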

Data Visualization

A bar chart of malaria prevalence in Tanzania, created with the ggplot2 package.

Data Variations