Data Analytics and Computational Social Science: Kimble HW 3

Karen Kimble

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readxl)
library(dplyr)

The dataset I have chosen for the final project is the Social Progress Index report containing data from 2011-2021. The mission of the Social Progress Index is to measure whether people have what they need to adequately support their well-being and flourish in society. It looks at whether people have their basic needs met, are well-nourished, feel safe, face discrimination, etc. There are a lot of variables within this dataset, all part of three overarching categories: Basic Human Needs, Foundations of Wellbeing, and Opportunity. Each category's score is the average of its components, and the overall Social Progress score for each country is the average of the three category scores.
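The averaging described above can be sketched on made-up numbers. This is a minimal illustration that takes the simple-average description at face value; the country name, the column names, and every score below are hypothetical and are not taken from the SPI file, and the published methodology may rescale or weight indicators in ways not shown here.

```r
library(dplyr)
library(tibble)

# Hypothetical component scores for one invented country.
toy <- tibble(
  country = "Exampleland",
  # Basic Human Needs components
  nutrition = 80, water = 70, shelter = 75, safety = 65,
  # Foundations of Wellbeing components
  knowledge = 85, info = 60, health = 70, environment = 65,
  # Opportunity components
  rights = 90, freedom = 60, inclusiveness = 50, advanced_ed = 40
)

toy_scores <- toy %>%
  mutate(
    # Each category score is the mean of its four components.
    needs       = (nutrition + water + shelter + safety) / 4,
    wellbeing   = (knowledge + info + health + environment) / 4,
    opportunity = (rights + freedom + inclusiveness + advanced_ed) / 4,
    # The overall score is the mean of the three category scores.
    spi         = (needs + wellbeing + opportunity) / 3
  )

toy_scores %>% select(country, needs, wellbeing, opportunity, spi)
```

With these invented inputs the three category scores are 72.5, 70, and 60, so the overall score is 67.5.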

Basic Human Needs:

Nutrition and Basic Medical Care (numeric): Average of its components.
- Undernourishment (numeric): The probability that a random individual from the population consumes an amount of calories that is insufficient for a healthy life.
- Maternal mortality rate (numeric): Maternal deaths per 100,000 live births in women aged 10-54.
- Child mortality rate (numeric): Probability of dying between birth and 5 years old, per 1,000 live births.
- Child stunting (numeric): Prevalence of stunting in children under 5, as measured by the summary exposure value for child stunting, on a scale from 0% to 100%.
- Deaths from infectious diseases (numeric): Age-standardized mortality rate from deaths from infectious diseases per 100,000 people.
Water and Sanitation (numeric): Average of its components.
- Unsafe water, sanitation, and hygiene attributable deaths (numeric): Age-standardized death rate attributable to these factors per 100,000 people.
- Access to improved water source (numeric): Proportion of population with access to improved water sources.
- Access to improved sanitation (numeric): Proportion of population with access to improved toilet types.
Shelter (numeric): Average of its components.
- Access to electricity (numeric): Percentage of population with access to electricity.
- Household air pollution attributable deaths (numeric): Age-standardized deaths from household air pollution from solid fuels per 100,000.
- Usage of clean fuels and technology for cooking (numeric): The proportion of population primarily using clean cooking fuels and technologies for cooking.
- Dissatisfaction with housing affordability (numeric): Percentage of respondents that answered “no” to the question, “In the city or area where you live, are you satisfied or dissatisfied with the availability of good, affordable housing?”.
Personal safety (numeric): Average of its components.
- Deaths from interpersonal violence (numeric): Age-standardized death rate (per 100,000 people) from interpersonal violence, defined as death or disability from intentional use of physical force or power, threatened or actual, from another civilian person or group.
- Political killings and torture (ordinal): Physical violence index scaled 0 to 1 that is based on indicators that reflect violence committed by government agents and that are not directly referring to elections.
- Transportation related fatalities (numeric): Age-standardized rate of deaths per 100,000 people due to injuries related to transportation.
- Perceived criminality (ordinal): Measured on a scale of 1 (majority of other citizens can be trusted; very low levels of domestic security) to 5 (very high level of distrust; people are extremely cautious in their dealings with others; large number of gated communities, high prevalence of security guards).

Foundations of Wellbeing

Access to Basic Knowledge (numeric): Average of its components.
- Women with no schooling (numeric): Proportion of women (age-standardized) with no schooling.
- Primary school enrollment (numeric): Percentage of the total population of official primary school age that are actually enrolled in any level of education.
- Secondary school attainment (numeric): Percent of the population ages 25 and older with at least some secondary education.
- Gender parity in secondary attainment (numeric): The absolute deviation from parity (=1) in secondary education attainment of women and men.
- Equal access to quality education (ordinal): Country experts’ aggregated evaluation of the question, “To what extent is high quality basic education guaranteed to all, sufficient to enable them to exercise their basic rights as adult citizens?” measured on a scale of 0 (Unequal) to 4 (Equal).
Access to Information and Communications (numeric): Average of its components.
- Mobile telephone subscriptions (numeric): The number of mobile telephone subscriptions per 100 inhabitants.
- Internet users (numeric): The estimated number of Internet users out of the total population.
- Access to online governance (numeric): The availability of e-participation tools on national government portals.
- Media censorship (ordinal): Country experts’ aggregated evaluation of the question, “Does the government directly or indirectly attempt to censor the print or broadcast media?” measured on a scale of 0 (direct and routine attempts) to 4 (attempts are rare).
Health and Wellness (numeric): Average of its components.
- Life expectancy at 60 (numeric): The average number of years that a person of 60 to 64 years old could expect to live.
- Premature deaths from non-communicable diseases (numeric): Mortality rate among people aged 30-70 from non-communicable diseases.
- Access to essential services (numeric): The universal health coverage (UHC) index measures the coverage of 9 tracer interventions and risk-standardized death rates from 32 causes amenable to personal healthcare.
- Equal access to quality healthcare (ordinal): Country experts’ aggregated evaluation of the question, “To what extent is high quality basic healthcare guaranteed to all, sufficient to enable them to exercise their basic political rights as adult citizens?” measured on a scale of 0 (Extreme) to 4 (Equal).
Environmental Quality (numeric): Average of all its components.
- Outdoor air pollution attributable deaths (numeric): The number of deaths resulting from ambient particulate matter pollution per 100,000 people, age-adjusted.
- Deaths from lead exposure (numeric): Age-standardized death rate from lead exposure (per 100,000 people).
- Particulate matter pollution (numeric): Population-weighted mean levels of annual exposure to suspended particles.
- Species protection (ordinal): An index of how well a country’s terrestrial protected areas overlap with the ranges of its vertebrate, invertebrate, and plant species. A score of 100 indicates full coverage of all species’ ranges by a country’s protected areas, and a score of 0 indicates no overlap.

Opportunity

Personal Rights (numeric): Average of all its components.
- Political rights (ordinal): An evaluation of three subcategories of political rights: electoral process, political pluralism and participation, and functioning of government on a scale from 0 (no political rights) to 40 (full political rights).
- Freedom of expression (ordinal): Country experts’ aggregated evaluation of the question, “To what extent does government respect press & media freedom, the freedom of ordinary people to discuss political matters at home and in the public sphere, as well as the freedom of academic and cultural expression?” on a scale of 0 (no freedom) to 4 (full freedom).
- Freedom of religion (ordinal): Country experts’ aggregated evaluation of the question, “Is there freedom of religion?” measured on a scale of 0 (hardly any) to 4 (full freedom).
- Access to justice (ordinal): Country experts’ aggregated evaluation of the question, “Do citizens enjoy secure and effective access to justice?” converted to a scale of 0 (access is nonexistent) to 1 (access is almost always observed).
- Property rights for women (ordinal): Country experts’ aggregated evaluation of the question, “Do women enjoy the right to private property?” measured on a scale of 0 (not at all) to 5 (yes).
Personal Freedom and Choice (numeric): Average of all its components.
- Vulnerable employment (numeric): Contributing family workers and own-account workers as a percentage of total employment.
- Early marriage (numeric): The percentage of women aged 15-19 years who are married or in-union.
- Satisfied demand for contraception (numeric): The percentage of total demand for family planning among married or in-union women aged 15 to 49.
- Perception of corruption (ordinal): The perceived level of public sector corruption based on expert opinion, measured on a scale from 0 (highly corrupt) to 100 (very clean).
- Young people not in education, employment, or training (numeric): The proportion of youth (15-24) who are not in employment and not in education or training.
Inclusiveness (numeric): Average of all its components.
- Acceptance of gays and lesbians (numeric): The percentage of respondents answering yes to the question, “Is the city or area where you live a good place or not a good place to live for gay or lesbian people?”
- Discrimination and violence against minorities (ordinal): Discrimination, powerlessness, ethnic violence, communal violence, sectarian violence, and religious violence, measured on a scale of 0 (low pressures) to 10 (very high pressures).
- Equality of political power by gender (ordinal): Country experts’ aggregated evaluation of the question, “Is political power distributed according to gender?” measured on a scale of 0 (men have monopoly) to 4 (roughly equal).
- Equality of political power by socioeconomic position (ordinal): Country experts’ aggregated evaluation of the question, “Is political power distributed according to socioeconomic position?” measured on a scale of 0 (wealthy monopoly) to 4 (roughly equal).
- Equality of political power by social group (ordinal): Country experts’ aggregated evaluation of the question, “Is political power distributed according to social groups (defined by caste, ethnicity, language, race, religion or some combination thereof)?” measured on a scale of 0 (monopolized by a social group that’s a minority of the pop.) to 4 (roughly equal).
Access to Advanced Education (numeric): Average of its components.
- Quality weighted universities (numeric): The number of universities in a country weighted by the quality of universities, measured by university rankings.
- Expected years of tertiary schooling (numeric): Number of years a person of tertiary school entrance age can expect to spend within tertiary education.
- Women with advanced education (numeric): Proportion of females (age-standardized) with 12–18 years of education.
- Citable documents (numeric): Citable documents - articles, reviews and conference papers - per 1,000 population.
- Academic freedom (ordinal): Aggregated evaluation of the question, “To what extent is academic freedom respected?”, measured on a scale of 0 to 1.

As you can see, there are a large number of variables with different indicators for society. For the purposes of my final paper, I will primarily be focusing on the main indicators of each section: Nutrition and Basic Medical Care, Water and Sanitation, Shelter, Personal Safety, Access to Basic Knowledge, Access to Information and Communications, Health and Wellness, Environmental Quality, Personal Rights, Personal Freedom and Choice, Inclusiveness, and Access to Advanced Education.

SPI <- read_excel("Social Progress Index.xlsx", sheet = "2011-2021 data")
head(SPI)

# A tibble: 6 × 76
  `SPI Rank` Country `SPI country code` `SPI \r\nyear` Status
       <dbl> <chr>   <chr>                       <dbl> <chr> 
1         NA World   WWW                          2021 <NA>  
2         NA World   WWW                          2020 <NA>  
3         NA World   WWW                          2019 <NA>  
4         NA World   WWW                          2018 <NA>  
5         NA World   WWW                          2017 <NA>  
6         NA World   WWW                          2016 <NA>  
# … with 71 more variables: `Social Progress Index` <dbl>,
#   `Basic Human Needs` <dbl>, `Foundations of Wellbeing` <dbl>,
#   Opportunity <dbl>, ...10 <lgl>,
#   `Nutrition and Basic Medical Care` <dbl>,
#   `Water and Sanitation` <dbl>, Shelter <dbl>,
#   `Personal Safety` <dbl>, `Access to Basic Knowledge` <dbl>,
#   `Access to Information and Communications` <dbl>, …

Cleaning Data

There is not much that needs cleaning within this dataset as the Social Progress Index website publishes it relatively clean (especially for an Excel sheet). All I need to do is take out two blank columns and name the variables.
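Before the actual cleaning code, here is a minimal sketch of the same two steps in a more defensive form, run on a toy tibble rather than the real sheet. The toy data and the name `blank_col` are hypothetical (readxl names unnamed columns `...10`, `...23`, and so on). Note that dropping every all-missing column is an assumption that only suits data where no real variable is entirely `NA`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the raw sheet: two real columns plus one empty one.
raw <- tibble(
  `SPI Rank` = c(1, 2),
  Country    = c("A", "B"),
  blank_col  = c(NA, NA)
)

cleaned <- raw %>%
  # Drop every column that contains only missing values.
  select(where(~ !all(is.na(.x)))) %>%
  # Rename by old name (new = old), so the labels do not depend on column order.
  rename(Rank = `SPI Rank`)

names(cleaned)
```

Renaming by name takes more typing than assigning a full vector to `colnames()`, but a column that shifts position in a later release of the file cannot silently receive the wrong label.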

SPI$...10 <- NULL
SPI$...23 <- NULL

colnames(SPI) <- c("Rank",
                   "Country",
                   "Country code",
                   "Year",
                   "Status",
                   "SPI",
                   "Needs",
                   "Wellbeing",
                   "Opportunity",
                   "Nutrition and care",
                   "Sanitation",
                   "Shelter",
                   "Safety",
                   "Access knowledge",
                   "Info and comm",
                   "Health",
                   "Environment",
                   "Rights",
                   "Choice",
                   "Inclusiveness",
                   "Advanced ed",
                   "Infectious",
                   "Child mortality",
                   "Stunting",
                   "Maternal mortality",
                   "Undernourishment",
                   "Improved sanitation",
                   "Improved water",
                   "Hygeine deaths",
                   "Pollution deaths",
                   "Housing",
                   "Electricity",
                   "Clean fuels",
                   "Personal violence deaths",
                   "Transport deaths",
                   "Criminality",
                   "Political torture killings",
                   "Women no education",
                   "Equal education access",
                   "Primary school enrollment",
                   "Secondary attainment",
                   "Gender gap secondary",
                   "Online governance",
                   "Internet users",
                   "Media",
                   "Cellphone",
                   "Life expectancy",
                   "Premature deaths",
                   "Healthcare",
                   "Essential services",
                   "Outdoor pollution",
                   "Lead exposure",
                   "Particulate",
                   "Species",
                   "Justice",
                   "Expression",
                   "Religion",
                   "Political rights",
                   "Property",
                   "Contraception",
                   "Corruption",
                   "Early marriage",
                   "Youth nonemployed",
                   "Vulnerable",
                   "Equal power gender",
                   "Equal power social",
                   "Equal power socioeconomic",
                   "Discrimination violence",
                   "LGBT",
                   "Citable docs",
                   "Academic",
                   "Women advanced ed",
                   "Tertiary",
                   "Quality unis")

# I also want to only look at countries that have an official ranking.

SPI <- SPI %>%
  filter(`Status` == "Ranked")

# For aesthetic purposes, I want to arrange the data by year, then by country.
SPI %>%
  arrange(`Year`, `Country`)

# A tibble: 1,848 × 74
    Rank Country    `Country code`  Year Status   SPI Needs Wellbeing
   <dbl> <chr>      <chr>          <dbl> <chr>  <dbl> <dbl>     <dbl>
 1    56 Albania    ALB             2011 Ranked  69.4  82.8      69.1
 2    88 Algeria    DZA             2011 Ranked  62.2  80.5      55.5
 3   155 Angola     AGO             2011 Ranked  39.7  39.6      41.3
 4    43 Argentina  ARG             2011 Ranked  76.0  84.4      71.2
 5    67 Armenia    ARM             2011 Ranked  67.2  85.1      62.2
 6     5 Australia  AUS             2011 Ranked  89.6  94.9      89.0
 7    18 Austria    AUT             2011 Ranked  86.1  95.1      84.4
 8   102 Azerbaijan AZE             2011 Ranked  58.2  81.1      54.7
 9    80 Bahrain    BHR             2011 Ranked  64.3  85.1      62.5
10   122 Bangladesh BGD             2011 Ranked  50.3  63        42.5
# … with 1,838 more rows, and 66 more variables: Opportunity <dbl>,
#   `Nutrition and care` <dbl>, Sanitation <dbl>, Shelter <dbl>,
#   Safety <dbl>, `Access knowledge` <dbl>, `Info and comm` <dbl>,
#   Health <dbl>, Environment <dbl>, Rights <dbl>, Choice <dbl>,
#   Inclusiveness <dbl>, `Advanced ed` <dbl>, Infectious <dbl>,
#   `Child mortality` <dbl>, Stunting <dbl>,
#   `Maternal mortality` <dbl>, Undernourishment <dbl>, …
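One detail worth keeping in mind: `arrange()` returns a new, sorted tibble and leaves its input untouched, so the ordering only persists if the result is assigned. A minimal sketch on hypothetical toy data (the object names `toy` and `toy_sorted` are made up for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, deliberately out of order.
toy <- tibble(
  Country = c("B", "A", "B", "A"),
  Year    = c(2012, 2012, 2011, 2011)
)

# Sort by year, then by country, and store the result.
toy_sorted <- toy %>%
  arrange(Year, Country)

toy_sorted   # sorted copy
toy          # original order is unchanged
```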

Potential Research Questions

This dataset is longitudinal and contains a wide variety of information about the countries of the world. Some potential questions are:

Which countries have had the most improvement in gender equality since the beginning of the SPI’s data (2011)?
Are countries with better-ranked gender equality more or less likely to have better-ranked LGBT acceptance? What about rankings related to marginalized communities?
Are Western countries systematically ranked higher than others?
Are there any correlations between health indicators and social indicators?
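As a rough idea of how the first question could be approached, the sketch below computes each country's change in a score between its earliest and latest year and sorts by that change. The data and the column name `score` are hypothetical; in the real analysis the score would presumably be a column such as `Equal power gender` from the cleaned data.

```r
library(dplyr)
library(tibble)

# Hypothetical scores for two invented countries.
toy <- tribble(
  ~Country, ~Year, ~score,
  "A",       2011,  1.0,
  "A",       2021,  2.5,
  "B",       2011,  3.0,
  "B",       2021,  3.2
)

improvement <- toy %>%
  group_by(Country) %>%
  summarise(
    # Latest score minus earliest score within each country.
    change = score[Year == max(Year)] - score[Year == min(Year)],
    .groups = "drop"
  ) %>%
  # Largest improvement first.
  arrange(desc(change))

improvement
```

This assumes one row per country and year. A country missing its first or last year would be compared over a shorter window, so the real data should be checked for gaps first.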

Data taken from: https://www.socialprogress.org
