Class Project Assignment 1

hw1

challenge1

my name

dataset

ggplot2

Author

Matt Eckstein

Published

July 10, 2023

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Loading packages and reading in data

Code

library(dplyr)
library(readxl)

data <- read_csv("Eckstein_data/Orthodoxy_and_US_leadership_cleaned.csv")

Rows: 70 Columns: 15
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (5): Country, Estimated 2010
Catholic Population, Estimated 2010
Protest...
dbl (8): bordru, bordua, bordruua, pctox, approve, disapprove, dkref, netapp...
num (2): Estimated 2010
Orthodox Population, totalpop

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

data$pctox <- data$pctox * 100

Plotting probability and cumulative density functions for variables

Code

#Probability distribution of percent Orthodox by country

probability <- data$pctox

barplot(probability, 
        xlab = "Countries by Orthodox Christian Share of Population", 
        col = "steelblue", 
        space = 0, 
        main = "Orthodoxy by Country")

Code

#Probability distribution of net approval by country

probability2 <- data$netapprove

barplot(probability2, 
        xlab = "Net Approval of US World Leadership by Country", 
        col = "steelblue", 
        space = 0, 
        main = "Approval of US Leadership, by Country")

Code

#Cumulative distribution of the percentage of Orthodox Christians by country

plot(ecdf(data$pctox))

Code

#Cumulative distribution of the net approval rating by country of US leadership in the world
plot(ecdf(data$netapprove))

The probability density plot of countries by the shares of their populations that identify as Orthodox Christian is notably right-skewed (and the CDF plot of countries by percent Orthodox Christian has many points clustered at the left edge, near the zero value of the horizontal axis) because the distribution of Orthodoxy throughout the world is highly uneven. Only a few countries, mostly in Eastern Europe, have high percentages of their populations that identify with Orthodoxy, while considerably more countries have small minorities consisting of less than two percent of their populations that identify with Orthodoxy.

On the other hand, the plots of net approval by country of US leadership indicate a more even distribution, with a handful of countries where US leadership is extremely popular, a handful where it is extremely unpopular, and many others in between where public opinion towards US leadership is mixed.

Creating buckets by which to group data, to facilitate the data grouping step

Code

# Create a vector to store the oxbucket values
oxbucket <- character(length(data$pctox))

# Assign values to oxbucket based on pctox
oxbucket[data$pctox < 2] <- "<2%"
oxbucket[data$pctox >= 2 & data$pctox < 5] <- "2-5%"
oxbucket[data$pctox >= 5 & data$pctox < 10] <- "5-10%"
oxbucket[data$pctox >= 10 & data$pctox < 25] <- "10-25%"
oxbucket[data$pctox >= 25 & data$pctox < 50] <- "25-50%"
oxbucket[data$pctox >= 50 & data$pctox < 75] <- "50-75%"
oxbucket[data$pctox >= 75 & data$pctox < 100] <- "75-100%"

# Add the oxbucket variable to the data frame
data$oxbucket <- oxbucket

Grouping data by variables of interest, obtaining mean and standard deviation

Code

avgs <- data %>%
  group_by(bordruua, oxbucket) %>%
  summarize(mean(netapprove),
  sd(netapprove),
  n())

`summarise()` has grouped output by 'bordruua'. You can override using the
`.groups` argument.

Splitting variables by category

Code

border <- avgs %>% dplyr::filter(bordruua == 1)
noborder <- avgs %>% dplyr::filter(bordruua == 0)

Renaming columns

Code

colnames(border) <- c("Border_war", "Percent_orthodox", "mean_net_approval", "SD_net_approval", "number")
colnames(noborder) <- c("Border_war", "Percent_orthodox", "mean_net_approval", "SD_net_approval", "number")

Estimating gaps

Code

gap <- border$mean_net_approval - noborder$mean_net_approval

gap_se <- sqrt(border$SD_net_approval^2 / border$number + noborder$SD_net_approval^2 / noborder$number)

gap_ci_l <- gap - 1.96 * gap_se

gap_ci_u <- gap + 1.96 * gap_se

result <- cbind(border[,-1], noborder[-(1:2)], gap, gap_se, gap_ci_l, gap_ci_u)


print(result, digits = 3)

  Percent_orthodox mean_net_approval SD_net_approval number mean_net_approval
1           10-25%              1.67            11.0      3             -15.4
2             2-5%              3.00              NA      1             -12.0
3            5-10%             20.00              NA      1              23.0
4           50-75%            -73.00              NA      1             -20.0
5          75-100%             15.25            25.8      4             -23.0
6              <2%              1.50            38.8      8               2.5
  SD_net_approval number   gap gap_se gap_ci_l gap_ci_u
1           16.02      5  17.1   9.58    -1.71     35.8
2           12.73      2  15.0     NA       NA       NA
3           54.17      4  -3.0     NA       NA       NA
4            5.57      3 -53.0     NA       NA       NA
5           37.82      4  38.2  22.88    -6.59     83.1
6           34.66     34  -1.0  14.95   -30.30     28.3

Plot

Code

plot(data$pctox,
     data$netapprove,
     type = "p",
     main = "Countries: Percent Orthodox Christian vs. Net Approval of US Leadership",
     cex.main = 1,
     xlab = "Percent Orthodox",
     ylab = "Net Approval of US Leadership",
     col = "steelblue",
     pch = 19)

Regression

Code

model <- lm(data$netapprove ~ data$pctox)
plot2 <- plot(data$netapprove ~ data$pctox)

abline(a = 2.438959, b = -0.1798092)

The plot of countries by the percentages of their populations that identify as Orthodox Christian vs. their levels of net approval of US leadership has a regression line with a slope of -0.1798, demonstrating a slight negative relationship between the shares of countries’ populations that identify as Orthodox and their levels of net approval of US leadership.