DACSS 601 Final

This is my final project for DACSS 601

Karen Kimble
2022-05-05

Introduction

The dataset I have chosen for the final project is the Social Progress Index report, containing data from 2011 to 2021. The mission of the Social Progress Index is to measure whether people have what they need to adequately support their well-being and flourish in society. It looks at whether people have their basic needs met, are well-nourished, feel safe, face discrimination, etc. There are a lot of variables within this dataset, all part of three overarching categories: Basic Human Needs, Foundations of Wellbeing, and Opportunity. Each category’s score is the average of its components, and the overall Social Progress score for each country is the average of the three category scores. I think it’s important to look at what each variable means to get a sense of what calculations and factors are being entered into the dataset.

The Dataset

Basic Human Needs:

Foundations of Wellbeing

Opportunity

Reading in the Dataset & Cleaning

library(readxl)  # provides read_excel()
library(dplyr)   # provides %>% and filter(), used below

SPI <- read_excel("/Users/karenkimble/Documents/R Practice/Social Progress Index.xlsx", sheet = "2011-2021 data")

SPI$...10 <- NULL
SPI$...23 <- NULL

colnames(SPI) <- c("Rank",
                   "Country",
                   "Code",
                   "Year",
                   "Status",
                   "SPI",
                   "Needs",
                   "Wellbeing",
                   "Opportunity",
                   "Nutrition/care",
                   "Sanitation",
                   "Shelter",
                   "Safety",
                   "Access-knowledge",
                   "Info-comm",
                   "Health",
                   "Environment",
                   "Rights",
                   "Choice",
                   "Inclusiveness",
                   "Advanced-ed",
                   "Infectious",
                   "Child mortality",
                   "Stunting",
                   "Maternal-mortality",
                   "Undernourishment",
                   "Improved-sanitation",
                   "Improved-water",
                   "Hygiene-deaths",
                   "Pollution-deaths",
                   "Housing",
                   "Electricity",
                   "Clean-fuels",
                   "Personal-violence-deaths",
                   "Transport",
                   "Criminality",
                   "Political-killings",
                   "Women-no-education",
                   "Education-access",
                   "Primary-enrollment",
                   "Secondary-attainment",
                   "Gender-gap-secondary",
                   "Online-governance",
                   "Internet-users",
                   "Media",
                   "Cellphone",
                   "Life-expectancy",
                   "Premature-deaths",
                   "Healthcare",
                   "Essential-services",
                   "Pollution",
                   "Lead",
                   "Particulate",
                   "Species",
                   "Justice",
                   "Expression",
                   "Religion",
                   "Political-rights",
                   "Property",
                   "Contraception",
                   "Corruption",
                   "Early-marriage",
                   "Youth-nonemployed",
                   "Vulnerable",
                   "Equal-gender",
                   "Equal-social",
                   "Equal-socioeconomic",
                   "Discrimination-violence",
                   "LGBT",
                   "Citable-docs",
                   "Academic",
                   "Women-advanced",
                   "Tertiary",
                   "Quality-unis")
head(SPI)
# A tibble: 6 × 74
   Rank Country Code   Year Status   SPI Needs Wellbeing Opportunity
  <dbl> <chr>   <chr> <dbl> <chr>  <dbl> <dbl>     <dbl>       <dbl>
1    NA World   WWW    2021 <NA>    65.1  74.2      64.4        56.5
2    NA World   WWW    2020 <NA>    64.7  73.8      64.5        55.8
3    NA World   WWW    2019 <NA>    64.7  73.3      64.5        56.1
4    NA World   WWW    2018 <NA>    64.0  73.0      63.2        55.8
5    NA World   WWW    2017 <NA>    63.7  72.6      62.4        55.9
6    NA World   WWW    2016 <NA>    63.1  72.1      61.5        55.8
# … with 65 more variables: `Nutrition/care` <dbl>, Sanitation <dbl>,
#   Shelter <dbl>, Safety <dbl>, `Access-knowledge` <dbl>,
#   `Info-comm` <dbl>, Health <dbl>, Environment <dbl>, Rights <dbl>,
#   Choice <dbl>, Inclusiveness <dbl>, `Advanced-ed` <dbl>,
#   Infectious <dbl>, `Child mortality` <dbl>, Stunting <dbl>,
#   `Maternal-mortality` <dbl>, Undernourishment <dbl>,
#   `Improved-sanitation` <dbl>, `Improved-water` <dbl>, …
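
The introduction states that the overall score is the average of the three category scores. As a rough check, the sketch below recomputes that average for the World row in 2021 using the rounded values printed above. The variable names are my own, and because the inputs are rounded to one decimal place, a small gap from the reported value of 65.1 is expected.

```r
# Category scores for the World row in 2021, copied from the printed output.
world_2021 <- c(Needs = 74.2, Wellbeing = 64.4, Opportunity = 56.5)

# Unweighted mean of the three category scores.
spi_from_categories <- mean(world_2021)
round(spi_from_categories, 2)

# Gap between the reported SPI value and the recomputed one.
reported_spi <- 65.1
reported_spi - spi_from_categories
```

The recomputed value lands within a tenth of a point of the reported score, which is consistent with a simple average of rounded inputs.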

Research Question

As you can see, there are a large number of variables with different indicators for society. For the purposes of my final paper, I will primarily be focusing on the main indicators of each section: Nutrition and Basic Medical Care, Water and Sanitation, Shelter, Personal Safety, Access to Knowledge, Access to Info/Communications, Health and Wellness, Environmental Quality, Personal Rights, Personal Freedom/Choice, Inclusiveness, and Access to Advanced Education.

Because of the sheer number of variables in this dataset, I will focus on only one of the SPI’s three major categories: Foundations of Wellbeing. The other two categories, Basic Needs and Opportunity, are still important and should be analyzed. However, I am primarily interested in the Foundations of Wellbeing category, which includes indicators related to access to knowledge and infrastructure as well as health, because it may be interesting to see whether countries generally viewed as more “free” and democratic (such as the United States or some European Union countries) do well in those categories. There are still a lot of variables condensed into the Foundations of Wellbeing category, so I will analyze the main variables that are computed from their sub-categories. Those variables are Access to Basic Knowledge, Access to Information and Communications, Health and Wellness, and Environmental Quality.

Access to Basic Knowledge, as shown above, is made up of many variables related to the quality of education, educational attainment, and equal access to education. Access to Information and Communications covers the online governance, internet users, media, and cellphone indicators. The Health and Wellness category consists of life expectancy, death rate, and access to healthcare or other services. Lastly, Environmental Quality is based on pollution levels, species protection, and lead exposure deaths.

  1. Have the average worldwide scores for the Foundations of Wellbeing categories improved over time? What categories have improved the most or the least? What about overall Wellbeing?

  2. How do the largest countries from each continent compare when it comes to Wellbeing?

  3. Do countries that have higher Foundations of Wellbeing scores have higher scores in the other major categories? How do those scores relate to rank?
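
One way the third question could be approached is with pairwise correlations between the three category scores. The sketch below is only an illustration: `category_correlations` is a hypothetical helper of my own, it assumes the column names set earlier (Needs, Wellbeing, Opportunity), and the toy values are invented rather than taken from the SPI data.

```r
# Pairwise correlations between the three category scores.
# `data` is assumed to have numeric columns Needs, Wellbeing and Opportunity.
category_correlations <- function(data) {
  cor(data[, c("Needs", "Wellbeing", "Opportunity")],
      use = "pairwise.complete.obs")
}

# Toy values invented purely to show the call; they are not SPI data.
toy <- data.frame(
  Needs       = c(50, 60, 70, 80, NA),
  Wellbeing   = c(40, 55, 65, 85, 90),
  Opportunity = c(30, 35, 50, 60, 75)
)
category_correlations(toy)
```

Using `pairwise.complete.obs` keeps rows where only one of the scores is missing, which matters here because the summaries later in this report show some NA values in each year.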

Dataset Characteristics

I want to look at the difference between average scores for the categories in 2011 and 2021 to see if there are any changes over that period.

# Filtering the data

SPI_2011 <- SPI %>%
  filter(Year == 2011)  # Year is numeric, so compare against a number

SPI_2021 <- SPI %>%
  filter(Year == 2021)
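
The sections that follow repeat the same `summary()` and `sd()` calls for each score and year. As a possible shortcut, the sketch below wraps that pattern in a small base R helper. `describe_score` is a hypothetical name of my own, it assumes a numeric Year column, and the toy values are invented to show the call.

```r
# Summarise one score column for a set of years.
# `data` is assumed to have a numeric Year column; `column` is the score name.
describe_score <- function(data, column, years = c(2011, 2021)) {
  rows <- lapply(years, function(y) {
    x <- data[[column]][data$Year == y]
    data.frame(
      Year    = y,
      Median  = median(x, na.rm = TRUE),
      Mean    = mean(x, na.rm = TRUE),
      SD      = sd(x, na.rm = TRUE),
      Missing = sum(is.na(x))
    )
  })
  do.call(rbind, rows)
}

# Toy values invented to show the call; they are not SPI data.
toy <- data.frame(
  Year      = c(2011, 2011, 2011, 2021, 2021, 2021),
  Wellbeing = c(40, 60, NA, 50, 70, 90)
)
describe_score(toy, "Wellbeing")
```

With the real data the call would look like `describe_score(SPI, "Access-knowledge")`, which puts both years side by side in one table.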

Wellbeing

#2011

summary(SPI_2011$Wellbeing)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  31.25   46.41   59.12   59.78   71.14   89.06      34 
sd(SPI_2011$Wellbeing, na.rm=TRUE)
[1] 16.17586
#2021

summary(SPI_2021$Wellbeing)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  34.17   55.56   67.45   67.83   78.70   93.80      33 
sd(SPI_2021$Wellbeing, na.rm=TRUE)
[1] 15.31456

In both 2011 and 2021, the median and mean Wellbeing scores are nearly the same (59.12 vs. 59.78 in 2011 and 67.45 vs. 67.83 in 2021), showing that the data is not skewed very much. There was an overall improvement in Wellbeing, and the slightly lower standard deviation in 2021 (15.31 compared to 16.18) suggests that countries’ scores moved a little closer together. The improvement was not even across the distribution, though: the minimum score rose only about 3 points and the maximum about 5, compared to the median’s improvement of about 8.

Access to Basic Knowledge

#2011

summary(SPI_2011$'Access-knowledge')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  15.27   54.26   75.25   70.97   89.86   98.93      34 
sd(SPI_2011$'Access-knowledge', na.rm=TRUE)
[1] 22.02836
#2021

summary(SPI_2021$'Access-knowledge')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  23.14   61.66   79.32   74.93   91.14   99.51      33 
sd(SPI_2021$'Access-knowledge', na.rm=TRUE)
[1] 19.34884

The summary statistics show that Access to Basic Knowledge was more varied in 2011 than in 2021, because the standard deviation was higher in 2011 (22.03 compared to 19.35). Additionally, the minimum score increased by about 8 points and the first quartile by about 7 between 2011 and 2021, meaning that the lower end of the distribution still improved.

Access to Information and Communications

#2011
summary(SPI_2011$'Info-comm')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   0.28   24.45   43.27   44.09   60.05   90.30      33 
sd(SPI_2011$'Info-comm', na.rm=TRUE)
[1] 22.40024
#2021
summary(SPI_2021$'Info-comm')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   5.67   51.07   70.28   66.49   82.52   98.18      32 
sd(SPI_2021$'Info-comm', na.rm=TRUE)
[1] 20.74732

In the Access to Information and Communications category, scores also improved overall worldwide between 2011 and 2021. The median score increased dramatically, from 43 in 2011 to 70 in 2021, and the mean rose from 44 to 66. However, the minimum score did not increase by the same amount (from 0.28 in 2011 to 5.67 in 2021), showing that at least one country was lagging behind. It might be an outlier, since the first quartile in 2021 is 51. The standard deviation decreased slightly (from 22.40 to 20.75), showing that overall countries were somewhat closer together in scores in 2021 than in 2011.
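
Whether the 2021 minimum counts as an outlier depends on the rule used. The sketch below applies one common convention, the 1.5 × IQR lower fence, to the rounded quartiles printed above. The choice of that rule is my assumption and not something the report specifies.

```r
# Quartiles and minimum for Info-comm in 2021, copied from the summary() output.
q1 <- 51.07
q3 <- 82.52
minimum <- 5.67

# Lower fence under the common 1.5 * IQR convention.
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr

lower_fence
minimum < lower_fence  # TRUE would flag the minimum as a low outlier
```

With these values the fence sits just under 4, so the minimum of 5.67 is not flagged by this particular rule, although it is still far below the first quartile.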

Health and Wellness

#2011

summary(SPI_2011$Health)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  15.19   45.06   59.26   58.81   70.03   89.20      32 
sd(SPI_2011$Health, na.rm=TRUE)
[1] 16.7191
#2021

summary(SPI_2021$Health)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  21.03   49.54   62.61   62.28   73.29   92.10      31 
sd(SPI_2021$Health, na.rm=TRUE)
[1] 16.01574

For the Health and Wellness score, the median increased from 59 in 2011 to 63 in 2021. The mean increased by about the same amount, from 59 to 62. The minimum and maximum scores also increased, showing that countries improved worldwide. The standard deviation barely changed (16.72 compared to 16.02), showing that the spread of countries’ scores stayed about the same.

Environmental Quality

#2011

summary(SPI_2011$Environment)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  22.97   56.66   66.74   65.14   73.70   93.42      30 
sd(SPI_2011$Environment, na.rm=TRUE)
[1] 13.6848
#2021

summary(SPI_2021$Environment)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  23.95   58.66   67.83   67.40   77.78   95.15      29 
sd(SPI_2021$Environment, na.rm=TRUE)
[1] 14.16101

In the Environmental Quality category, countries overall improved the least. The median stayed about the same, only improving about a point (66.74 to 67.83). The mean rose about 2 points (65.14 to 67.40) and was close to the median in both years. The minimum and maximum each changed by less than 2 points, and the standard deviation rose slightly (13.68 to 14.16). Out of all the categories, the environment saw the least improvement.
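
To compare the categories directly, the medians reported in the summaries above can be collected into one table and sorted by their change. This is a small sketch using only the printed values; the data frame and its column names are my own.

```r
# Medians copied from the summary() output in this section.
medians <- data.frame(
  Category    = c("Wellbeing", "Access-knowledge", "Info-comm",
                  "Health", "Environment"),
  Median_2011 = c(59.12, 75.25, 43.27, 59.26, 66.74),
  Median_2021 = c(67.45, 79.32, 70.28, 62.61, 67.83)
)

# Change over the decade, sorted from largest to smallest improvement.
medians$Change <- medians$Median_2021 - medians$Median_2011
medians[order(-medians$Change), ]
```

Sorted this way, Access to Information and Communications shows the largest rise in the median and Environmental Quality the smallest.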

Visualizations

Average Scores Worldwide by Year

By looking at how the world is doing as a whole from 2011 to 2021, we can get an idea of what the overall improvement has been like and compare that to individual countries’ progress. Each graph shows the yearly average with standard error bars.

Worldwide Average of Access to Basic Knowledge 2011-2021

avgAK <- summarySE(SPI, measurevar="Access-knowledge", groupvars=c("Year"),
                   na.rm=TRUE)

avgAK %>%
ggplot(aes(x=Year, y=`Access-knowledge`)) +
  geom_errorbar(aes(ymin=`Access-knowledge`-se, ymax=`Access-knowledge`+se),
                width=.1, color="blue") +
  geom_line(color="dark blue") +
  geom_point(color="dark blue") +
  labs(y="Avg Access to Knowledge")
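
`summarySE()` is not defined anywhere in this report. A function with that name and these arguments is provided by the Rmisc package, and I am assuming that is what was loaded. For readers without it, the sketch below computes the same two quantities the plots rely on, the yearly mean and its standard error, in base R. `mean_se_by_year` is a hypothetical helper and the toy values are invented.

```r
# Mean and standard error of one score per year, ignoring missing values.
# `data` is assumed to have a Year column; `column` names the score.
mean_se_by_year <- function(data, column) {
  groups <- split(data[[column]], data$Year)
  rows <- lapply(names(groups), function(year) {
    x <- groups[[year]]
    x <- x[!is.na(x)]
    data.frame(
      Year = as.numeric(year),
      N    = length(x),
      mean = mean(x),
      se   = sd(x) / sqrt(length(x))  # standard error of the mean
    )
  })
  do.call(rbind, rows)
}

# Toy values invented to show the call; they are not SPI data.
toy <- data.frame(
  Year  = c(2011, 2011, 2011, 2012, 2012, 2012),
  Score = c(10, 20, NA, 30, 40, 50)
)
mean_se_by_year(toy, "Score")
```

The `se` column is what the error bars would be drawn from: the mean minus and plus one standard error.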

Worldwide Average of Access to Information and Communications 2011-2021

avgIC <- summarySE(SPI, measurevar="Info-comm", groupvars=c("Year"), na.rm=TRUE)

avgIC %>%
ggplot(aes(x=Year, y=`Info-comm`)) +
  geom_errorbar(aes(ymin=`Info-comm`-se, ymax=`Info-comm`+se), width=.1,
                color="red") +
  geom_line(color="dark red") +
  geom_point(color="dark red") +
  labs(y="Avg Info and Communications")

Worldwide Average of Health and Wellness 2011-2021

avgHW <- summarySE(SPI, measurevar="Health", groupvars=c("Year"), na.rm=TRUE)

avgHW %>%
ggplot(aes(x=Year, y=`Health`)) +
  geom_errorbar(aes(ymin=`Health`-se, ymax=`Health`+se), width=.1,
                color="#b4a7d6") +
  geom_line(color="#4d1c7c") +
  geom_point(color="#4d1c7c") +
  labs(y="Avg Health and Wellness")

Worldwide Average Environmental Quality 2011-2021

avgEQ <- summarySE(SPI, measurevar="Environment", groupvars=c("Year"),
                   na.rm=TRUE)

avgEQ %>%
ggplot(aes(x=Year, y=`Environment`)) +
  geom_errorbar(aes(ymin=`Environment`-se, ymax=`Environment`+se), width=.1,
                color="#93c47d") +
  geom_line(color="#274e13") +
  geom_point(color="#274e13") +
  labs(y="Avg Environmental Quality")

All of these plots show that there has been improvement across all categories, but the improvement has not been consistent in every category, nor has it been exponential in all of them. Something left out is how each country has improved over the years. I also could have chosen a different metric, such as the median, which can give a different type of insight since means may be skewed by outliers. Additionally, the dataset covers only eleven years (2011 to 2021), so more historical data could be valuable.
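
To illustrate why the median could tell a different story from the mean, the sketch below computes both per year on a toy dataset with one very low score in the second year. The values are invented for illustration and are not SPI data.

```r
# Toy scores invented for illustration: one very low value in 2012.
toy <- data.frame(
  Year  = c(2011, 2011, 2011, 2012, 2012, 2012),
  Score = c(60, 62, 64, 5, 66, 68)
)

# Yearly mean and median side by side.
yearly <- data.frame(
  Year   = sort(unique(toy$Year)),
  Mean   = as.vector(tapply(toy$Score, toy$Year, mean, na.rm = TRUE)),
  Median = as.vector(tapply(toy$Score, toy$Year, median, na.rm = TRUE))
)
yearly
```

In this toy case the mean falls sharply in 2012 because of the single low score, while the median rises, which is the kind of difference an outlier can create.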

Variation by Country

Since there are a great many countries in the dataset and I don’t want there to be an overcrowded graph, I will select a few countries to look at. I’ll base my selection on the largest countries by population in their respective continent so there is some similarity between them: China, Russia, the United States, Brazil, Nigeria, and Australia.

SPI_Large <- SPI %>%
  filter(`Country` %in% c("China", "Russia", "Brazil", "Nigeria", "Australia",
                          "United States"))

head(SPI_Large)
# A tibble: 6 × 74
   Rank Country   Code   Year Status   SPI Needs Wellbeing Opportunity
  <dbl> <chr>     <chr> <dbl> <chr>  <dbl> <dbl>     <dbl>       <dbl>
1    11 Australia AUS    2021 Ranked  90.3  95.1      90.4        85.3
2    10 Australia AUS    2020 Ranked  90.1  95.1      90.5        84.8
3    12 Australia AUS    2019 Ranked  90    95        90.2        84.8
4    11 Australia AUS    2018 Ranked  89.9  94.6      90.6        84.5
5    10 Australia AUS    2017 Ranked  90.0  95.1      90.2        84.6
6    10 Australia AUS    2016 Ranked  89.8  95.2      89.9        84.5
# … with 65 more variables: `Nutrition/care` <dbl>, Sanitation <dbl>,
#   Shelter <dbl>, Safety <dbl>, `Access-knowledge` <dbl>,
#   `Info-comm` <dbl>, Health <dbl>, Environment <dbl>, Rights <dbl>,
#   Choice <dbl>, Inclusiveness <dbl>, `Advanced-ed` <dbl>,
#   Infectious <dbl>, `Child mortality` <dbl>, Stunting <dbl>,
#   `Maternal-mortality` <dbl>, Undernourishment <dbl>,
#   `Improved-sanitation` <dbl>, `Improved-water` <dbl>, …
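
Because `head()` only shows the Australia rows, it does not confirm that all six names matched the spelling used in the dataset. A quick check along the following lines could catch a mismatch. `missing_countries` is a hypothetical helper, and the toy country vector (including the spelling "Russian Federation") is invented to show what a mismatch would look like.

```r
# Countries requested in the filter above.
wanted <- c("China", "Russia", "Brazil", "Nigeria", "Australia",
            "United States")

# Report any requested names that are absent from a vector of country names.
missing_countries <- function(wanted, available) {
  setdiff(wanted, unique(available))
}

# Toy country column invented for illustration; with the real data this
# would be SPI$Country.
toy_countries <- c("Australia", "Brazil", "China", "Nigeria",
                   "Russian Federation", "United States")
missing_countries(wanted, toy_countries)
```

An empty result from `missing_countries(wanted, SPI$Country)` would mean every requested country was found.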

Looking at overall rankings over time gives a good general idea of how these countries have done in comparison to the others across all indicators, not just a few.

Overall Rankings 2011-2021

It is important to note that a lower rank number means the country is doing better than the others, and a higher number means it is doing worse.

ggplot(data = SPI_Large, mapping=aes(x = `Year`, y = `Rank`, color = `Country`)) +
  geom_line() +
  facet_wrap(facets = vars(`Country`))
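
Since a lower rank number is better, one optional variation is to flip the y axis so that better ranks appear higher on the plot. The sketch below shows this with `scale_y_reverse()` from ggplot2 on a toy data frame. The values and country labels are invented, and this is an alternative presentation, not the plot used in this report.

```r
library(ggplot2)

# Toy ranks invented for illustration; they are not SPI data.
toy <- data.frame(
  Year    = rep(2011:2013, times = 2),
  Rank    = c(10, 11, 12, 100, 98, 95),
  Country = rep(c("A", "B"), each = 3)
)

# Same layout as the plot above, with the y axis flipped so rank 1 is at the top.
rank_plot <- ggplot(toy, aes(x = Year, y = Rank, color = Country)) +
  geom_line() +
  scale_y_reverse() +
  facet_wrap(facets = vars(Country))
rank_plot
```

With the axis reversed, an upward-sloping line reads as improvement, which matches the direction of the score plots.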

From the above, we can see that Nigeria has consistently ranked very poorly with very little improvement. Brazil had a slightly better-than-middle ranking, but then was suddenly ranked worse in 2017 and has continued to trend worse every year since. China and Russia, on the other hand, seem pretty stagnant with consistent rankings throughout the years, with China doing worse than Russia. Australia has the best consistent rankings out of all the countries, while the US was a close second but has been ranked worse from around 2015 onward. I think it’s interesting to look at these comparisons when thinking about overall rankings because it makes me wonder what is dragging down or boosting up scores for each country. Left unanswered are how other countries in the same continents rank, what caused these rankings to drop, and which categories some countries do better in than others. A general view is helpful but does not tell everything.

Wellbeing Scores for Large Countries 2011-2021

ggplot(data = SPI_Large, mapping=aes(x = `Year`, y = `Wellbeing`, color = `Country`)) +
  geom_line() +
  facet_wrap(facets = vars(`Country`))

The country that has consistently done the best in Wellbeing is Australia, with scores near 90 for the entire duration. Nigeria, on the other hand, has done worse than the other countries but has improved since the beginning of the SPI data. Russia and Brazil have stayed towards the middle of the scores, though Russia stagnated towards the end and Brazil had a slight decrease. China has improved and risen from the bottom scores to the middle. The United States has also had slight improvement while staying towards high scores, though Australia has done better overall.

So now I ask whether large countries with higher Wellbeing scores also tend to have higher scores in the other major categories.

Basic Human Needs Scores for Large Countries 2011-2021

ggplot(data = SPI_Large, mapping=aes(x = `Year`, y = `Needs`, color = `Country`)) +
  geom_line() +
  facet_wrap(facets = vars(`Country`))

This graph of the Basic Needs category almost mirrors the graph of the Wellbeing category, with some exceptions. Russia is much more stagnant, with slightly higher scores than in Wellbeing. China also started out much higher than it did for Wellbeing and saw less improvement over time, probably because it had a good starting point. Nigeria’s Basic Needs scores mirrored its Wellbeing scores, staying towards the low end without much improvement. Brazil’s graph was very different because its Basic Needs score stayed stagnant while its Wellbeing score fluctuated more throughout the period. Australia was much the same in both categories, staying stable with high scores. The United States’ Basic Needs scores were similar to its Wellbeing scores, but had a drop between 2016 and 2019 not present in Wellbeing.

Opportunity Scores for Large Countries 2011-2021

ggplot(data = SPI_Large, mapping=aes(x = `Year`, y = `Opportunity`, color = `Country`)) +
  geom_line() +
  facet_wrap(facets = vars(`Country`))

The graph for Opportunity is different from the Wellbeing and Basic Needs graphs. Australia, unlike the other countries, had roughly the same scores in all three categories, with consistently high scores from 2011 to 2021. Nigeria’s score was also somewhat the same, remaining low throughout 2011 to 2021, but it did not increase as much as its Wellbeing or Basic Needs scores. Russia had consistently low Opportunity scores, unlike its Wellbeing scores (which increased) and its Basic Needs scores (which were stagnant but higher). China also had very consistently low Opportunity scores, while its Wellbeing scores started low but increased dramatically and its Basic Needs scores were relatively high. Brazil’s Opportunity graph was very different from its other two. For Wellbeing, Brazil increased, and for Basic Needs, the country was stagnant but relatively high. Its Opportunity scores, however, stayed stagnant from 2011 to 2016 and then had a sudden dramatic drop. Lastly, the United States had pretty similar scores to its other two graphs, staying high with a slight drop at the end.

Reflection

This class was not my first time using R, but it was my first time using the software so in-depth since in my previous class it was not the main focus. I decided to focus on Wellbeing specifically because it is not a category that seems to be a priority for the United States (as well as many other countries). I really wanted to see if Wellbeing mattered in overall rankings and whether there has been any improvement in the U.S. and worldwide in that area.

I think this dataset in general was very challenging because of its size and how many variables were included. There are so many aspects of the SPI dataset that I simply could not look at or analyze; if given more time and no homework in any other class I would gladly do that. Another huge challenge was making sure that graphs were readable and that I had picked the right wording and format for each variable when writing the code. Learning ggplot2 was somewhat difficult, but once I passed a certain point of understanding it came more easily.

Something that I would like to do next with this project if I were to continue would be to perform significance tests on whether or not Wellbeing influences country rankings. Also, I would like to further analyze the variables related to gender equality and see how those relate to country rankings.
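
As a rough idea of what such a test could look like, the sketch below runs a Spearman rank correlation between Wellbeing and Rank with base R's `cor.test()`. The toy values are invented, and the choice of Spearman is my assumption. Two caveats apply: Wellbeing is one of the three components of the overall score that the ranking is based on, so some association is built in, and a correlation test describes association, not influence.

```r
# Toy values invented for illustration; they are not SPI data.
toy <- data.frame(
  Wellbeing = c(90, 85, 70, 65, 50, 40, 35),
  Rank      = c(5, 12, 40, 38, 90, 120, 130)
)

# Spearman rank correlation between Wellbeing score and overall rank.
result <- cor.test(toy$Wellbeing, toy$Rank, method = "spearman")
result$estimate
result$p.value
```

A strongly negative estimate would mean that higher Wellbeing scores go with better (numerically lower) ranks.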

Conclusion

Comparing the graphs of Basic Needs, Wellbeing, and Opportunity to the original graph of Rankings from 2011 to 2021 for Large Countries can show some interesting results. Australia, with its consistent high scores in all three categories, also had consistently high rankings.

Brazil’s rankings started off stagnant in the high-middle, then dropped dramatically after 2016. This seems to be mostly caused by the sharp drop in its Opportunity scores, as well as a less sharp drop in its Wellbeing score after 2016. Its Basic Needs score stayed the same throughout the period, but this clearly did not impact the rankings as much as the other scores.

China’s rankings, on the other hand, seem to be most informed by its Basic Needs and Opportunity scores, since all three stayed pretty stagnant. However, while Basic Needs did not fluctuate much within the high-middle, Opportunity scores stayed about the same at the very low end. China’s stagnant low rankings seem to be informed by this, rather than by its Wellbeing score, which started low but had a dramatic increase towards the middle after 2013.

Nigeria had consistently low rankings yet had increases in both Wellbeing and Basic Needs. However, its Opportunity score stayed very low and had a slight decrease after 2019. The progress Nigeria made in the other categories must not have been enough to outweigh the low Opportunity score, or other countries may have made greater progress that kept its rankings low.

Russia’s rankings fluctuated slightly from 2011 to 2021, but remained in the upper-middle. This is probably due to its Basic Needs and Opportunity scores. Its Basic Needs scores stayed in the upper-middle while the Opportunity scores stayed in the lower middle, somewhat canceling each other out. Its Wellbeing scores, on the other hand, saw a pretty dramatic rise after 2013 but this was clearly not enough to improve its overall rankings.

The United States’ rankings can also be seen in its scores. For Wellbeing, the U.S. had pretty consistent high scores. For Basic Human Needs and Opportunity, however, the country’s scores decreased after 2016. The rankings reflect this, showing still high but relatively lower rankings after 2016 than in 2011.

Overall, it seems that Wellbeing scores are not the best indicator of what a country’s SPI ranking would be. Some countries made a relatively large amount of progress in that category, yet still had low rankings. Additionally, average worldwide scores for the Wellbeing categories increased, though Environmental Quality had the least progress out of them. However, it is unclear if these conclusions can be generalized to all other countries instead of just large countries. It’s also unclear what smaller countries’ rankings and scores look like, or how results vary within each continent and across socioeconomic class or GDP ranking.

Source

Data taken from: https://www.socialprogress.org

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Kimble (2022, May 11). Data Analytics and Computational Social Science: DACSS 601 Final. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkkimble898672/

BibTeX citation

@misc{kimble2022dacss,
  author = {Kimble, Karen},
  title = {Data Analytics and Computational Social Science: DACSS 601 Final},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkkimble898672/},
  year = {2022}
}