Kimble HW 3

This is my HW 3 for DACSS 601.

Karen Kimble
2022-04-07
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readxl)
library(dplyr)

The dataset I have chosen for the final project is the Social Progress Index report, which contains data from 2011 to 2021. The mission of the Social Progress Index is to measure whether people have what they need to adequately support their well-being and flourish in society. It looks at whether people have their basic needs met, are well-nourished, feel safe, face discrimination, and so on. The dataset contains many variables, all part of three overarching categories: Basic Human Needs, Foundations of Wellbeing, and Opportunity. Each category's score is the average of its components, and the overall Social Progress score for each country is the average of the three category scores.
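The averaging described above can be sketched in a few lines of R. The component values are made-up numbers for a hypothetical country, not figures from the report, and the simple unweighted mean follows the description in this post rather than the official methodology, which may differ in detail.

```r
# Hypothetical component scores (0-100) for one illustrative country.
basic_human_needs <- c(nutrition_care = 90, water_sanitation = 80,
                       shelter = 85, personal_safety = 65)
foundations_of_wellbeing <- c(basic_knowledge = 88, info_comm = 72,
                              health_wellness = 70, environment = 62)
opportunity <- c(personal_rights = 75, freedom_choice = 70,
                 inclusiveness = 55, advanced_education = 60)

# Each category score is the mean of its components.
category_scores <- c(
  needs       = mean(basic_human_needs),
  wellbeing   = mean(foundations_of_wellbeing),
  opportunity = mean(opportunity)
)

# The overall score is the mean of the three category scores.
spi_score <- mean(category_scores)

print(category_scores)
print(spi_score)
```

With these illustrative inputs the three category scores are 80, 73, and 65, so the overall score is their mean.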

Basic Human Needs

Foundations of Wellbeing

Opportunity

As you can see, there are a large number of variables covering different indicators of society. For the purposes of my final paper, I will primarily focus on the main components of each category: Nutrition and Basic Medical Care, Water and Sanitation, Shelter, Personal Safety, Access to Basic Knowledge, Access to Information and Communications, Health and Wellness, Environmental Quality, Personal Rights, Personal Freedom and Choice, Inclusiveness, and Access to Advanced Education.

SPI <- read_excel("Social Progress Index.xlsx", sheet = "2011-2021 data")
head(SPI)
# A tibble: 6 × 76
  `SPI Rank` Country `SPI country code` `SPI \r\nyear` Status
       <dbl> <chr>   <chr>                       <dbl> <chr> 
1         NA World   WWW                          2021 <NA>  
2         NA World   WWW                          2020 <NA>  
3         NA World   WWW                          2019 <NA>  
4         NA World   WWW                          2018 <NA>  
5         NA World   WWW                          2017 <NA>  
6         NA World   WWW                          2016 <NA>  
# … with 71 more variables: `Social Progress Index` <dbl>,
#   `Basic Human Needs` <dbl>, `Foundations of Wellbeing` <dbl>,
#   Opportunity <dbl>, ...10 <lgl>,
#   `Nutrition and Basic Medical Care` <dbl>,
#   `Water and Sanitation` <dbl>, Shelter <dbl>,
#   `Personal Safety` <dbl>, `Access to Basic Knowledge` <dbl>,
#   `Access to Information and Communications` <dbl>, …

Cleaning Data

There is not much that needs cleaning in this dataset, as the Social Progress Index website publishes it in a relatively clean state (especially for an Excel sheet). All I need to do is remove two blank columns and give the variables shorter names.
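As an aside, both cleaning steps can also be written without relying on column positions. The sketch below uses a small toy tibble (the `spacer` column and the `lookup` vector are hypothetical names for illustration) to drop every column that is entirely `NA` and then rename by an explicit old-to-new mapping. It is one possible approach, not the one used in this post.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the spreadsheet: real columns plus one blank spacer column.
toy <- tibble(
  `SPI Rank`          = c(56, 88),
  Country             = c("Albania", "Algeria"),
  spacer              = c(NA, NA),
  `Basic Human Needs` = c(82.8, 80.5)
)

# Mapping written as new_name = "old name"; columns not listed keep their names.
lookup <- c(Rank = "SPI Rank", Needs = "Basic Human Needs")

cleaned <- toy %>%
  # Keep only columns that contain at least one non-missing value.
  select(where(function(x) !all(is.na(x)))) %>%
  # Rename by name rather than by position.
  rename(all_of(lookup))

names(cleaned)
```

One caveat: a real variable that happens to be missing for every row would be dropped as well, so it is worth comparing the column count before and after.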

SPI$...10 <- NULL
SPI$...23 <- NULL

colnames(SPI) <- c("Rank",
                   "Country",
                   "Country code",
                   "Year",
                   "Status",
                   "SPI",
                   "Needs",
                   "Wellbeing",
                   "Opportunity",
                   "Nutrition and care",
                   "Sanitation",
                   "Shelter",
                   "Safety",
                   "Access knowledge",
                   "Info and comm",
                   "Health",
                   "Environment",
                   "Rights",
                   "Choice",
                   "Inclusiveness",
                   "Advanced ed",
                   "Infectious",
                   "Child mortality",
                   "Stunting",
                   "Maternal mortality",
                   "Undernourishment",
                   "Improved sanitation",
                   "Improved water",
                   "Hygiene deaths",
                   "Pollution deaths",
                   "Housing",
                   "Electricity",
                   "Clean fuels",
                   "Personal violence deaths",
                   "Transport deaths",
                   "Criminality",
                   "Political torture killings",
                   "Women no education",
                   "Equal education access",
                   "Primary school enrollment",
                   "Secondary attainment",
                   "Gender gap secondary",
                   "Online governance",
                   "Internet users",
                   "Media",
                   "Cellphone",
                   "Life expectancy",
                   "Premature deaths",
                   "Healthcare",
                   "Essential services",
                   "Outdoor pollution",
                   "Lead exposure",
                   "Particulate",
                   "Species",
                   "Justice",
                   "Expression",
                   "Religion",
                   "Political rights",
                   "Property",
                   "Contraception",
                   "Corruption",
                   "Early marriage",
                   "Youth nonemployed",
                   "Vulnerable",
                   "Equal power gender",
                   "Equal power social",
                   "Equal power socioeconomic",
                   "Discrimination violence",
                   "LGBT",
                   "Citable docs",
                   "Academic",
                   "Women advanced ed",
                   "Tertiary",
                   "Quality unis")

# I also want to only look at countries that have an official ranking.

SPI <- SPI %>%
  filter(`Status` == "Ranked")

# For readability, I want to arrange the data by year, then by country.
# The result is assigned back to SPI so that the ordering is kept.
SPI <- SPI %>%
  arrange(`Year`, `Country`)
SPI
# A tibble: 1,848 × 74
    Rank Country    `Country code`  Year Status   SPI Needs Wellbeing
   <dbl> <chr>      <chr>          <dbl> <chr>  <dbl> <dbl>     <dbl>
 1    56 Albania    ALB             2011 Ranked  69.4  82.8      69.1
 2    88 Algeria    DZA             2011 Ranked  62.2  80.5      55.5
 3   155 Angola     AGO             2011 Ranked  39.7  39.6      41.3
 4    43 Argentina  ARG             2011 Ranked  76.0  84.4      71.2
 5    67 Armenia    ARM             2011 Ranked  67.2  85.1      62.2
 6     5 Australia  AUS             2011 Ranked  89.6  94.9      89.0
 7    18 Austria    AUT             2011 Ranked  86.1  95.1      84.4
 8   102 Azerbaijan AZE             2011 Ranked  58.2  81.1      54.7
 9    80 Bahrain    BHR             2011 Ranked  64.3  85.1      62.5
10   122 Bangladesh BGD             2011 Ranked  50.3  63        42.5
# … with 1,838 more rows, and 66 more variables: Opportunity <dbl>,
#   `Nutrition and care` <dbl>, Sanitation <dbl>, Shelter <dbl>,
#   Safety <dbl>, `Access knowledge` <dbl>, `Info and comm` <dbl>,
#   Health <dbl>, Environment <dbl>, Rights <dbl>, Choice <dbl>,
#   Inclusiveness <dbl>, `Advanced ed` <dbl>, Infectious <dbl>,
#   `Child mortality` <dbl>, Stunting <dbl>,
#   `Maternal mortality` <dbl>, Undernourishment <dbl>, …
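A quick way to sanity-check the filtered and sorted data is to count the rows per year, since each ranked country should appear once in every year. The sketch below runs on a small toy table rather than the real file; the rows are invented for illustration, and only the column names `Country`, `Year`, and `Status` match the cleaned data.

```r
library(dplyr)
library(tibble)

# Toy panel: two ranked countries plus the unranked "World" aggregate.
toy <- tribble(
  ~Country,  ~Year, ~Status,
  "Algeria", 2012,  "Ranked",
  "Albania", 2011,  "Ranked",
  "World",   2011,  NA_character_,
  "Albania", 2012,  "Ranked",
  "Algeria", 2011,  "Ranked",
  "World",   2012,  NA_character_
)

# filter() drops rows where the condition is FALSE or NA,
# so the unranked aggregate rows are removed here.
ranked <- toy %>%
  filter(Status == "Ranked") %>%
  arrange(Year, Country)

# One row per year, with n giving the number of ranked countries.
per_year <- ranked %>%
  count(Year)

print(per_year)
```

If the counts differ from year to year in the real data, some countries are ranked in only part of the 2011-2021 period.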

Potential Research Questions

This dataset is longitudinal and contains a wide variety of information about the countries of the world. Some potential questions are:

Data taken from: https://www.socialprogress.org

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Kimble (2022, April 11). Data Analytics and Computational Social Science: Kimble HW 3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkkimble887511/

BibTeX citation

@misc{kimble2022kimble,
  author = {Kimble, Karen},
  title = {Data Analytics and Computational Social Science: Kimble HW 3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkkimble887511/},
  year = {2022}
}