Code
library(tidyverse)
library(readxl)
::opts_chunk$set(echo = TRUE) knitr
Aleacia Messiah
October 13, 2022
# A tibble: 191 × 75
Country Notes `1949` `1950` `1951` `1952` `1953` `1954` `1955` `1956` `1957`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Africa <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 North A… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Algeria §4 xxx xxx xxx xxx xxx xxx xxx xxx xxx
4 Libya द16 xxx xxx ... ... ... ... ... ... ...
5 Morocco §17 xxx xxx xxx xxx xxx xxx xxx 23.71… 35.40…
6 Tunisia <NA> xxx xxx xxx xxx xxx xxx xxx 3.714… 6.411…
7 sub-Sah… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 Angola §‖1 xxx xxx xxx xxx xxx xxx xxx xxx xxx
9 Benin § xxx xxx xxx xxx xxx xxx xxx xxx xxx
10 Botswana <NA> xxx xxx xxx xxx xxx xxx xxx xxx xxx
# … with 181 more rows, and 64 more variables: `1958` <chr>, `1959` <chr>,
# `1960` <chr>, `1961` <chr>, `1962` <chr>, `1963` <chr>, `1964` <chr>,
# `1965` <chr>, `1966` <chr>, `1967` <chr>, `1968` <chr>, `1969` <chr>,
# `1970` <chr>, `1971` <chr>, `1972` <chr>, `1973` <chr>, `1974` <chr>,
# `1975` <chr>, `1976` <chr>, `1977` <chr>, `1978` <chr>, `1979` <chr>,
# `1980` <chr>, `1981` <chr>, `1982` <chr>, `1983` <chr>, `1984` <chr>,
# `1985` <chr>, `1986` <chr>, `1987` <chr>, `1988` <chr>, `1989` <chr>, …
milex_clean <- milex %>%
# remove Notes column
select(!contains("Notes")) %>%
# create a column with region names
mutate(Region = case_when(
str_ends(Country, "Africa") ~ "Africa",
str_detect(Country, "America") ~ "Americas",
str_detect(Country, "Asia & Oceania") ~ "Asia & Oceania",
str_ends(Country, "Europe") ~ "Europe",
str_ends(Country, "East") ~ "Middle East",
TRUE ~ NA_character_
)) %>%
# replace NAs in Region column
fill(Region, .direction = "down") %>%
# create a column with sub region names
mutate(Sub_Region = case_when(
str_detect(Country, "North Africa") ~ "North Africa",
str_detect(Country, "sub-Saharan Africa") ~ "sub-Saharan Africa",
str_detect(Country, "Central America and the Caribbean") ~ "Central America and the Caribbean",
str_detect(Country, "North America") ~ "North America",
str_detect(Country, "South America") ~ "South America",
str_detect(Country, "Oceania") ~ "Oceania",
str_detect(Country, "South Asia") ~ "South Asia",
str_detect(Country, "East Asia") ~ "East Asia",
str_detect(Country, "South East Asia") ~ "South East Asia",
str_detect(Country, "Central Asia") ~ "Central Asia",
str_detect(Country, "Central Europe") ~ "Central Europe",
str_detect(Country, "Eastern Europe") ~ "Eastern Europe",
str_detect(Country, "Western Europe") ~ "Western Europe",
str_ends(Country, "East") ~ "Middle East",
TRUE ~ NA_character_
)) %>%
# replace NAs in Sub Region column
fill(Sub_Region, .direction = "down") %>%
# remove rows with region and sub-region names
slice(-c(1:2, 7, 55:56, 70, 73, 85:86, 91, 98, 105, 117, 123:124, 145, 154, 175)) %>%
# replace "xxx" and "..." with NAs
na_if("xxx") %>%
na_if("...") %>%
# combine Region and Sub Region into one column
unite(Region:Sub_Region, col = "Region & Sub Region", sep = "/" ) %>%
# combine year columns into one column
pivot_longer('1949':'2021', names_to = "Year", values_to = "Military Expenditure in Millions of US$")
# view cleaned dataset
milex_clean
# A tibble: 12,629 × 4
Country `Region & Sub Region` Year `Military Expenditure in Millions of US$`
<chr> <chr> <chr> <chr>
1 Algeria Africa/North Africa 1949 <NA>
2 Algeria Africa/North Africa 1950 <NA>
3 Algeria Africa/North Africa 1951 <NA>
4 Algeria Africa/North Africa 1952 <NA>
5 Algeria Africa/North Africa 1953 <NA>
6 Algeria Africa/North Africa 1954 <NA>
7 Algeria Africa/North Africa 1955 <NA>
8 Algeria Africa/North Africa 1956 <NA>
9 Algeria Africa/North Africa 1957 <NA>
10 Algeria Africa/North Africa 1958 <NA>
# … with 12,619 more rows
[1] 173
# A tibble: 173 × 1
Country
<chr>
1 Algeria
2 Libya
3 Morocco
4 Tunisia
5 Angola
6 Benin
7 Botswana
8 Burkina Faso
9 Burundi
10 Cameroon
# … with 163 more rows
# A tibble: 13 × 1
`Region & Sub Region`
<chr>
1 Africa/North Africa
2 Africa/sub-Saharan Africa
3 Americas/Central America and the Caribbean
4 Americas/North America
5 Americas/South America
6 Asia & Oceania/Oceania
7 Asia & Oceania/South Asia
8 Asia & Oceania/East Asia
9 Asia & Oceania/Central Asia
10 Europe/Central Europe
11 Europe/Eastern Europe
12 Europe/Western Europe
13 Middle East/Middle East
[1] 73
[1] "1949" "2021"
The Current USD data from the SIPRI Milex dataset details the military expenditures in millions of US dollars for 173 countries for the period 1949-2021. There are 191 observations (173 actual observations) and 75 variables (Country
, Notes
, and Years 1949
through 2021
). Observations (countries) were grouped by region and sub-region. There are 5 regions (Africa, Americas, Asia & Oceania, Europe, and Middle East) and 14 sub-regions (North Africa, Sub-Saharan Africa, Central America and the Caribbean, North America, South America, Oceania, South Asia, East Asia, South East Asia, Central Asia, Central Europe, Eastern Europe, Western Europe, and Middle East). Some of the data is missing for several countries in various years due to the data not being available or the country did not exist or was not independent at the time the data was collected, so those values are replaced with NAs.
After cleaning the data, the new dataset has 12,629 observations and 4 variables (Country
, Region & Sub Region
, Year
, Military Expenditure in Millions of US$
). The regions and sub-regions were extracted from the Country
column and combined together into one column to illustrate the various combinations of region and sub-region. The years were also condensed into one column.
Since this data was taken over a large period of time, I can research whether using time series methods would be appropriate to visualize and analyze the trends in the data.
1. What trends are present in the data (e.g. are there certain patterns of military spending over the years)?
2. What are the top spending countries for each year?
3. What is the predicted amount of military expenditures in dollars for year 2022 for each country?
---
title: "Homework 2"
author: "Aleacia Messiah"
description: "Read, Clean, Describe, and Formulate Research Questions from Data"
date: "10/13/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- hw2
- Aleacia Messiah
- SIPRI_Milex
- tidyverse
- readxl
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE)
```
## Read in Data
```{r}
# read in the Current USD sheet of the Milex dataset, skipping the first 6 rows and renaming the columns
milex <- read_excel("_data/SIPRI-Milex-data-1949-2021.xlsx", sheet = "Current USD", skip = 6, col_names = c("Country", "Notes", 1949:2021))
# view milex data
milex
```
## Clean Data
```{r}
milex_clean <- milex %>%
# remove Notes column
select(!contains("Notes")) %>%
# create a column with region names
mutate(Region = case_when(
str_ends(Country, "Africa") ~ "Africa",
str_detect(Country, "America") ~ "Americas",
str_detect(Country, "Asia & Oceania") ~ "Asia & Oceania",
str_ends(Country, "Europe") ~ "Europe",
str_ends(Country, "East") ~ "Middle East",
TRUE ~ NA_character_
)) %>%
# replace NAs in Region column
fill(Region, .direction = "down") %>%
# create a column with sub region names
mutate(Sub_Region = case_when(
str_detect(Country, "North Africa") ~ "North Africa",
str_detect(Country, "sub-Saharan Africa") ~ "sub-Saharan Africa",
str_detect(Country, "Central America and the Caribbean") ~ "Central America and the Caribbean",
str_detect(Country, "North America") ~ "North America",
str_detect(Country, "South America") ~ "South America",
str_detect(Country, "Oceania") ~ "Oceania",
str_detect(Country, "South Asia") ~ "South Asia",
str_detect(Country, "East Asia") ~ "East Asia",
str_detect(Country, "South East Asia") ~ "South East Asia",
str_detect(Country, "Central Asia") ~ "Central Asia",
str_detect(Country, "Central Europe") ~ "Central Europe",
str_detect(Country, "Eastern Europe") ~ "Eastern Europe",
str_detect(Country, "Western Europe") ~ "Western Europe",
str_ends(Country, "East") ~ "Middle East",
TRUE ~ NA_character_
)) %>%
# replace NAs in Sub Region column
fill(Sub_Region, .direction = "down") %>%
# remove rows with region and sub-region names
slice(-c(1:2, 7, 55:56, 70, 73, 85:86, 91, 98, 105, 117, 123:124, 145, 154, 175)) %>%
# replace "xxx" and "..." with NAs
na_if("xxx") %>%
na_if("...") %>%
# combine Region and Sub Region into one column
unite(Region:Sub_Region, col = "Region & Sub Region", sep = "/" ) %>%
# combine year columns into one column
pivot_longer('1949':'2021', names_to = "Year", values_to = "Military Expenditure in Millions of US$")
# view cleaned dataset
milex_clean
# number of distinct countries
n_distinct(milex_clean$Country)
# list distinct countries
distinct(milex_clean[,1])
# list distinct region and sub-region combinations
distinct(milex_clean[,2])
# number of years
n_distinct(milex_clean$Year)
# range of years
range(milex_clean$Year)
```
## Narrative
The Current USD data from the SIPRI Milex dataset details the military expenditures in millions of US dollars for 173 countries for the period 1949-2021. There are 191 observations (173 actual observations) and 75 variables (`Country`, `Notes`, and Years `1949` through `2021`). Observations (countries) were grouped by region and sub-region. There are 5 regions (Africa, Americas, Asia & Oceania, Europe, and Middle East) and 14 sub-regions (North Africa, Sub-Saharan Africa, Central America and the Caribbean, North America, South America, Oceania, South Asia, East Asia, South East Asia, Central Asia, Central Europe, Eastern Europe, Western Europe, and Middle East). Some of the data is missing for several countries in various years due to the data not being available or the country did not exist or was not independent at the time the data was collected, so those values are replaced with NAs.
After cleaning the data, the new dataset has 12,629 observations and 4 variables (`Country`, `Region & Sub Region`, `Year`, `Military Expenditure in Millions of US$`). The regions and sub-regions were extracted from the `Country` column and combined together into one column to illustrate the various combinations of region and sub-region. The years were also condensed into one column.
## Potential Research Questions
Since this data was taken over a large period of time, I can research whether using time series methods would be appropriate to visualize and analyze the trends in the data.
**1. What trends are present in the data (e.g. are there certain patterns of military spending over the years)?**
**2. What are the top spending countries for each year?**
**3. What is the predicted amount of military expenditures in dollars for year 2022 for each country?**