library(tidyverse)
library(ggplot2)
library(plotly)
library(gapminder)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Final
Introduction:
ILGA World State-Sponsored Homophobia 2020: Global Legislation Overview Update
About ILGA World
Established in 1978, ILGA World is the International Lesbian, Gay, Bisexual, Trans and Intersex Association. As an international federation of member organisations from over 160 countries, ILGA World campaigns for lesbian, gay, bisexual, trans and intersex human rights.
Through advocacy, research, training and convenings, and global communications, ILGA World envisions a world of assured and established global justice and equity regardless of SOGIESC: Sexual Orientations, Gender Identities, Gender Expressions and Sex Characteristics.
Since 2006, ILGA World has published “State-Sponsored Homophobia”, a global legislation overview that reports a world survey of sexual orientation laws.
The report focuses almost exclusively on law and does not comment on issues regarding gender identity, gender expression, or sex characteristics.
Guiding Ideas
As a member of the LGBTQIA+ community interested in pursuing a career in law, I want to learn more about the relationship between legislation and sexual orientation.
What kinds of sexual orientation laws are there?
What is the relationship between legislation, protections, and recognitions of SOGIESC?
Which countries are the most/least protected and recognized for SOGIESC legislation? Why?
Read in Data: State-Sponsored Homophobia 2020: Global Legislation Overview Update
The ILGA_State_Sponsored_Homophobia_2020 data set originally contained 242 observations and 16 variables. Its first row held labels that obscured the variable names, so I skipped that row when reading in the data.
The modified data set has 241 observations and 16 variables. There are three broad categories of variables, Criminalisation, Protection, and Recognition, related to sexual orientation legal issues. Each observation records whether a country has legislation on each topic, coded as “Yes”, “No”, or “Limited”.
For my analysis, I decided to focus on the broad categories of Protection and Recognition.
#Original Data
ilga_base <- read_excel("_data/ILGA_State_Sponsored_Homophobia_2020_dataset.xlsx", skip = 1)
head(ilga_base,6)
# A tibble: 6 × 16
N CN COUNTRY CSSSA…¹ DATE …² MAX P…³ CONST. BROAD…⁴ EMPLOY. HATE …⁵
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1 Algeria NO - 2 NO NO NO NO
2 2 2 Angola YES 2021 - NO YES YES YES
3 3 3 Benin YES NEVER … - NO NO NO NO
4 4 4 Botswana YES 2019 - NO NO YES NO
5 5 5 Burkina Fa… YES NEVER … - NO NO NO NO
6 6 6 Burundi NO - 2 NO NO NO NO
# … with 6 more variables: INCITEMENT <chr>, `BAN CONV. THERAPIES` <chr>,
# `SAME SEX MARRIAGE` <chr>, `CIVIL UNIONS` <chr>, `JOINT ADOPTION` <chr>,
# `SECOND PARENT ADOPTION` <chr>, and abbreviated variable names
# ¹`CSSSA LEGAL?`, ²`DATE OF DECRIM`, ³`MAX PENALTY`, ⁴`BROAD PROT.`,
# ⁵`HATE CRIME`
# ℹ Use `colnames()` to see all variable names
Dimensions
dim(ilga_base)
[1] 241 16
Column Names
colnames(ilga_base)
[1] "N" "CN" "COUNTRY"
[4] "CSSSA LEGAL?" "DATE OF DECRIM" "MAX PENALTY"
[7] "CONST." "BROAD PROT." "EMPLOY."
[10] "HATE CRIME" "INCITEMENT" "BAN CONV. THERAPIES"
[13] "SAME SEX MARRIAGE" "CIVIL UNIONS" "JOINT ADOPTION"
[16] "SECOND PARENT ADOPTION"
Data set Specific to Protection and Recognition
Variables Explained (According to ILGA World)
I renamed variables related to Protection and Recognition to better identify the legislation category.
A “Yes”, “No”, or “Limited” value indicates if there is/is not legislation under each variable.
P is added in front of variables pertaining to Protection and R is added to variables pertaining to Recognition.
Protection:
P_Const = Constitutional Protection
- Observes constitutional protections in terms of discrimination against individuals on the basis of their sexual orientation.
P_BroadProt = Broad Protection
- Observes provisions for the penalizing of discrimination based on sexual orientation in employment, health, education, housing and provision of goods and services.
P_Employ = Protection in Employment
- Observes jurisdictions of employment protection for laws that explicitly prohibit employment discrimination on the basis of sexual orientation. Observes topics of unfair dismissal, social security, benefits, and so on.
P_HateCrime = Criminal Liability(Hate Crime Laws)
- Observes legislation that explicitly prohibits hate crimes on the basis of sexual orientation.
P_Incitement = Prohibition of Incitement to Hatred, Violence or Discrimination
- Observes the legislative prohibition of incitement to hatred, violence or discrimination on the basis of sexual orientation.
P_BanConvTherapies = Bans on “Conversion Therapy”
- Observes legislative developments to the legal bans on “conversion therapies”.
Recognition:
R_SameSexMarriage = Same-Sex Marriage
- Observes the legislative recognitions and progress in areas of same-sex marriage equality.
R_CivilUnion = Partnership Recognition for Same-Sex Couples
- Observes legislative recognition for same-sex partners who do not wish to enter the institution of marriage.
R_JointAdoptions = Adoption by Same-Sex Couple (Joint Adoption)
- Observes the legislative recognitions for same-sex couples’ adoption rights
R_SecondParentAdoption = Adoption by Same-Sex Couple (Second Parent Adoption)
- Observes the legislative recognitions for same-sex couples’ adoption rights
#Data set specific to Protection and Recognition
ilga_PR <- ilga_base %>%
  select(N:COUNTRY, "CONST.":"SECOND PARENT ADOPTION") %>%
  rename(P_Const = CONST.,
         P_BroadProt = "BROAD PROT.",
         P_Employ = EMPLOY.,
         P_HateCrime = "HATE CRIME",
         P_Incitement = INCITEMENT,
         P_BanConvTherapies = "BAN CONV. THERAPIES",
         R_SameSexMarriage = "SAME SEX MARRIAGE",
         R_CivilUnion = "CIVIL UNIONS",
         R_JointAdoptions = "JOINT ADOPTION",
         R_SecondParentAdoption = "SECOND PARENT ADOPTION")
#Sanity Check
ilga_PR
# A tibble: 241 × 13
N CN COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵ R_Sam…⁶
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1 Algeria NO NO NO NO NO NO NO
2 2 2 Angola NO YES YES YES YES NO NO
3 3 3 Benin NO NO NO NO NO NO NO
4 4 4 Botswana NO NO YES NO NO NO NO
5 5 5 Burkina … NO NO NO NO NO NO NO
6 6 6 Burundi NO NO NO NO NO NO NO
7 7 7 Cameroon NO NO NO NO NO NO NO
8 8 8 Cabo Ver… NO NO YES YES NO NO NO
9 9 9 Central … NO NO NO NO NO NO NO
10 10 10 Chad NO NO NO YES NO NO NO
# … with 231 more rows, 3 more variables: R_CivilUnion <chr>,
# R_JointAdoptions <chr>, R_SecondParentAdoption <chr>, and abbreviated
# variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime, ⁴P_Incitement,
# ⁵P_BanConvTherapies, ⁶R_SameSexMarriage
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Mutating Data set to Contain Continents
The original data gives the continent name only as a large label that groups all of its countries, not as a value for each country.
I wanted to be able to group the data by continent, so I made a Continent column that specifies the continent for each country.
ILGA World specified 6 Continent labels: Africa, Latin America/The Caribbean, North America, Asia, Europe, and Oceania.
#Specify Continent
ilga_PR_Cont <- ilga_PR %>%
  mutate(Continent = case_when(
    N >= 1 & N <= 54 ~ "Africa",
    N >= 55 & N <= 87 ~ "Latin America/The Caribbean",
    N >= 88 & N <= 89 ~ "North America",
    N >= 90 & N <= 131 ~ "Asia",
    N >= 132 & N <= 179 ~ "Europe",
    N >= 180 & N <= 193 ~ "Oceania")
  )
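As a possible alternative to spelling out each range, the same N-to-Continent mapping can be sketched with base R's cut(). The break points below are taken from the ranges above; the toy N values and the names continent_breaks, continent_labels, and toy_cont are made up for illustration, so treat this as a sketch rather than the method used in this project.

```r
library(dplyr)

# Upper bound of N for each continent block, taken from the ranges above
continent_breaks <- c(0, 54, 87, 89, 131, 179, 193)
continent_labels <- c("Africa", "Latin America/The Caribbean", "North America",
                      "Asia", "Europe", "Oceania")

# Toy rows (hypothetical): a few boundary values of N plus one NA "territory"
toy <- tibble(N = c(1, 54, 55, 88, 90, 132, 193, NA))

# cut() uses right-closed intervals by default: (0,54], (54,87], ...
toy_cont <- toy %>%
  mutate(Continent = as.character(cut(N, breaks = continent_breaks,
                                      labels = continent_labels)))

toy_cont
```

Rows with a missing N come out with a missing Continent, which matches what case_when() does when no condition is met.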
Bring Continent Column to Front of Data set
After adding a Continent column, I wanted to bring the column to the front of the data set.
Note: N is the order of the country in the scheme of all countries. CN is the country number within the Continent in which the country falls.
#Brings the newly created Continent column to the front
ilga_PR_Cont <- ilga_PR_Cont %>%
  select(N, CN, Continent, everything())
#Sanity Check
ilga_PR_Cont
# A tibble: 241 × 14
N CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1 Africa Algeria NO NO NO NO NO NO
2 2 2 Africa Angola NO YES YES YES YES NO
3 3 3 Africa Benin NO NO NO NO NO NO
4 4 4 Africa Botswa… NO NO YES NO NO NO
5 5 5 Africa Burkin… NO NO NO NO NO NO
6 6 6 Africa Burundi NO NO NO NO NO NO
7 7 7 Africa Camero… NO NO NO NO NO NO
8 8 8 Africa Cabo V… NO NO YES YES NO NO
9 9 9 Africa Centra… NO NO NO NO NO NO
10 10 10 Africa Chad NO NO NO YES NO NO
# … with 231 more rows, 4 more variables: R_SameSexMarriage <chr>,
# R_CivilUnion <chr>, R_JointAdoptions <chr>, R_SecondParentAdoption <chr>,
# and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
# ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Treating NA
The data set contained NA values in N and CN for areas that were broadly presented as a “territory”.
For my analysis, I filtered out the NA values and focused on the countries assigned an N and CN value.
ilga_PR_Cont <- ilga_PR_Cont %>%
  filter(!is.na(N))
ilga_PR_Cont
# A tibble: 193 × 14
N CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1 Africa Algeria NO NO NO NO NO NO
2 2 2 Africa Angola NO YES YES YES YES NO
3 3 3 Africa Benin NO NO NO NO NO NO
4 4 4 Africa Botswa… NO NO YES NO NO NO
5 5 5 Africa Burkin… NO NO NO NO NO NO
6 6 6 Africa Burundi NO NO NO NO NO NO
7 7 7 Africa Camero… NO NO NO NO NO NO
8 8 8 Africa Cabo V… NO NO YES YES NO NO
9 9 9 Africa Centra… NO NO NO NO NO NO
10 10 10 Africa Chad NO NO NO YES NO NO
# … with 183 more rows, 4 more variables: R_SameSexMarriage <chr>,
# R_CivilUnion <chr>, R_JointAdoptions <chr>, R_SecondParentAdoption <chr>,
# and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
# ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
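Before dropping rows it can help to count how many will be lost. In this project the filter took the data from 241 rows to 193, so 48 rows were removed. The sketch below shows that check on invented toy rows; the names toy, n_dropped, and toy_countries are hypothetical.

```r
library(dplyr)

# Toy rows (invented): countries carry an N, territories do not
toy <- tibble(
  N = c(1, 2, NA, 3, NA),
  COUNTRY = c("Country A", "Country B", "Territory X", "Country C", "Territory Y")
)

# Count what the filter is about to drop, then apply it
n_dropped <- sum(is.na(toy$N))
toy_countries <- toy %>% filter(!is.na(N))

n_dropped            # 2
nrow(toy_countries)  # 3
```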
Pivoting Data
To better analyze my data, I decided to pivot the data frame. Before, each type of measure (Protection and Recognition) was written out as its own variable (the P_ and R_ columns), with a “Yes”, “No”, or “Limited” as its value.
After I pivoted my data longer, the P_ and R_ column names fall under the variable Type, and the “Yes”, “No”, and “Limited” values fall under the variable Yes_No_Limited.
#ilga_PR_Cont Pivot longer
ilga_PIV <- pivot_longer(ilga_PR_Cont, P_Const:R_SecondParentAdoption, names_to = "Type", values_to = "Yes_No_Limited")
#Sanity Check
ilga_PIV
# A tibble: 1,930 × 6
N CN Continent COUNTRY Type Yes_No_Limited
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 1 Africa Algeria P_Const NO
2 1 1 Africa Algeria P_BroadProt NO
3 1 1 Africa Algeria P_Employ NO
4 1 1 Africa Algeria P_HateCrime NO
5 1 1 Africa Algeria P_Incitement NO
6 1 1 Africa Algeria P_BanConvTherapies NO
7 1 1 Africa Algeria R_SameSexMarriage NO
8 1 1 Africa Algeria R_CivilUnion NO
9 1 1 Africa Algeria R_JointAdoptions NO
10 1 1 Africa Algeria R_SecondParentAdoption NO
# … with 1,920 more rows
# ℹ Use `print(n = ...)` to see more rows
Visualizing Data
Number of Yes, No, Limited for Each Continent
I wanted to visualize the number of “Yes”, “No”, and “Limited” values for each Type by Continent. I thought that looking at these counts would help identify where countries did or did not have protections and recognitions in legislation with regard to state-sponsored homophobia.
I decided to make a bar graph that is facet wrapped by Continent, with the x variable as Type and the fill as Yes_No_Limited. This gives a graph that shows the total makeup of “Yes”, “No”, and “Limited” for each Type in each Continent.
Use of ggplotly: Looking at the R Graph Gallery, I found that the graph could be made interactive with the function ggplotly(). To use this function, I installed the packages “gapminder” and “plotly” (ggplotly() itself comes from plotly). ggplotly() served as a useful way to check my data, since the tooltip gives a quick summary of each data point on my graph. I could confirm whether the data presented the way I thought it would.
Use of Hex codes: I wanted to choose the colors in my visualizations, and I learned that I could use the thousands of unique hex codes to specify the colors I wanted.
#pal1 <- c("#5b7f95","#c69214", "#b3b995", "#aca39a", "#41273b", "#881c1c")
# #CD6155 (red), #EB984E (orange), #F4D03F (yellow) #52BE80 (green), #7FB3D5 (blue), #BB8FCE (purple),
#Hex code palette
pal1 <- c("#5b7f95","#8A1515", "#478335")
All_Continent <- ggplot(data = ilga_PIV) +
  geom_bar(mapping = aes(x = Type, fill = `Yes_No_Limited`)) +
  facet_wrap(~ Continent, nrow = 2) +
  coord_flip() +
  scale_fill_manual(values = pal1)
ggplotly(All_Continent)
My visualization shows a clear representation of the makeup of Yes_No_Limited for each Type under each Continent. The colors make a quick and easy distinction that could otherwise be difficult to grasp by just looking at the data frame.
However, I did not account for the fact that the total count of Yes_No_Limited, though useful for comparing the makeup of each Type within a Continent, is not the best standard for comparing across Continents. Each Continent presented by ILGA World has a different number of COUNTRYs.
Looking only at the counts can therefore present a skewed view of the data. For example, “North America” has far fewer “Yes”, “No”, and “Limited” values than “Europe”, but that does not mean its proportion of “Yes”, “No”, and “Limited” is lower than that of “Europe”. The counts are driven by the number of COUNTRYs in each Continent.
So for my next visualizations, I aimed to look at percentages, or some other better-normalized numeric measure, of “Yes”, “No”, and “Limited” for each Continent.
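One way to put Continents of different sizes on the same footing is to turn the counts into within-Continent shares. The sketch below does that on a tiny invented data frame that mimics the pivoted layout (Continent, Type, Yes_No_Limited); the values and the names toy_long and toy_share are hypothetical and are not the ILGA figures.

```r
library(dplyr)

# Toy long-format data (invented values): 2 countries vs. 4 countries
toy_long <- tibble(
  Continent = c(rep("North America", 2), rep("Europe", 4)),
  Type = "P_Employ",
  Yes_No_Limited = c("YES", "YES", "YES", "YES", "NO", "NO")
)

# Count each answer, then divide by the total for that Continent and Type
toy_share <- toy_long %>%
  count(Continent, Type, Yes_No_Limited) %>%
  group_by(Continent, Type) %>%
  mutate(Share = n / sum(n)) %>%
  ungroup()

toy_share
```

In this toy case both Continents have two “YES” values, but the share is 1 for the two-country Continent and 0.5 for the four-country one. For a plot, geom_bar(position = "fill") rescales each bar to proportions in the same spirit.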
Treating YES, NO, and LIMITED as Numbers
To compare the percentages of “Yes”, “No”, and “Limited” under Yes_No_Limited for each Continent, I first converted each “Yes”, “No”, and “Limited” to a numeric value.
I gave the values NO = 0, YES = 1, LIMITED = 0.5. I did this because I wanted to create a numeric metric to compare the number of Protections and Recognitions for Continents and Countries in the data frame.
First I recoded just for the P_Type
#recoding variables for Protection ("P")
ilga_PR_Cont_NumP <- ilga_PR_Cont %>%
  mutate(across(starts_with("P"),
                ~recode(., "NO" = 0, "YES" = 1, "LIMITED" = 0.5)
  ))
#Sanity Check
ilga_PR_Cont_NumP
# A tibble: 193 × 14
N CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
<dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 Africa Algeria 0 0 0 0 0 0
2 2 2 Africa Angola 0 1 1 1 1 0
3 3 3 Africa Benin 0 0 0 0 0 0
4 4 4 Africa Botswa… 0 0 1 0 0 0
5 5 5 Africa Burkin… 0 0 0 0 0 0
6 6 6 Africa Burundi 0 0 0 0 0 0
7 7 7 Africa Camero… 0 0 0 0 0 0
8 8 8 Africa Cabo V… 0 0 1 1 0 0
9 9 9 Africa Centra… 0 0 0 0 0 0
10 10 10 Africa Chad 0 0 0 1 0 0
# … with 183 more rows, 4 more variables: R_SameSexMarriage <chr>,
# R_CivilUnion <chr>, R_JointAdoptions <chr>, R_SecondParentAdoption <chr>,
# and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
# ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
I did the same for the Recognition
#recoding variables for Recognition ("R")
ilga_PR_Cont_NumPR <- ilga_PR_Cont_NumP %>%
  mutate(across(starts_with("R"),
                ~recode(., "NO" = 0, "YES" = 1, "LIMITED" = 0.5)
  ))
#sanity check
ilga_PR_Cont_NumPR
# A tibble: 193 × 14
N CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
<dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 Africa Algeria 0 0 0 0 0 0
2 2 2 Africa Angola 0 1 1 1 1 0
3 3 3 Africa Benin 0 0 0 0 0 0
4 4 4 Africa Botswa… 0 0 1 0 0 0
5 5 5 Africa Burkin… 0 0 0 0 0 0
6 6 6 Africa Burundi 0 0 0 0 0 0
7 7 7 Africa Camero… 0 0 0 0 0 0
8 8 8 Africa Cabo V… 0 0 1 1 0 0
9 9 9 Africa Centra… 0 0 0 0 0 0
10 10 10 Africa Chad 0 0 0 1 0 0
# … with 183 more rows, 4 more variables: R_SameSexMarriage <dbl>,
# R_CivilUnion <dbl>, R_JointAdoptions <dbl>, R_SecondParentAdoption <dbl>,
# and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
# ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
The final data set ilga_PR_Cont_NumPR represents each “Yes”, “No”, and “Limited” as a number. Since the values are now a double type, I can better make numeric calculations and operations with my data.
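The NO = 0, LIMITED = 0.5, YES = 1 mapping can also be written as a named lookup vector, which makes unexpected codes easy to spot because they come back as NA. This is only a sketch in base R: score_key, answers, and scores are hypothetical names, and "N/A" is an invented stand-in for a value that is not one of the three expected codes.

```r
# Named lookup: the names are the codes in the data, the values are the scores
score_key <- c("NO" = 0, "LIMITED" = 0.5, "YES" = 1)

# Toy answers (invented), including one code the key does not know
answers <- c("NO", "YES", "LIMITED", "YES", "N/A")

# Indexing by name translates each answer; unknown codes become NA
scores <- unname(score_key[answers])

scores              # 0.0 1.0 0.5 1.0  NA
sum(is.na(scores))  # 1 unexpected code
```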
Freedom Score For Each Country
With each “Yes”, “No”, and “Limited” as a number, I created a column called Freedom_Score.
The Freedom_Score is the aggregate of the P_Type and R_Type scores: the numbers across P_Const:R_SecondParentAdoption are summed into a single number under the variable Freedom_Score.
#Freedom Score Variable
ilga_PR_Cont_NumPR_FS <- ilga_PR_Cont_NumPR %>%
  rowwise() %>%
  mutate(Freedom_Score = sum(c_across(P_Const:R_SecondParentAdoption)))
#Sanity Check
ilga_PR_Cont_NumPR_FS
# A tibble: 193 × 15
# Rowwise:
N CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
<dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 Africa Algeria 0 0 0 0 0 0
2 2 2 Africa Angola 0 1 1 1 1 0
3 3 3 Africa Benin 0 0 0 0 0 0
4 4 4 Africa Botswa… 0 0 1 0 0 0
5 5 5 Africa Burkin… 0 0 0 0 0 0
6 6 6 Africa Burundi 0 0 0 0 0 0
7 7 7 Africa Camero… 0 0 0 0 0 0
8 8 8 Africa Cabo V… 0 0 1 1 0 0
9 9 9 Africa Centra… 0 0 0 0 0 0
10 10 10 Africa Chad 0 0 0 1 0 0
# … with 183 more rows, 5 more variables: R_SameSexMarriage <dbl>,
# R_CivilUnion <dbl>, R_JointAdoptions <dbl>, R_SecondParentAdoption <dbl>,
# Freedom_Score <dbl>, and abbreviated variable names ¹P_BroadProt,
# ²P_Employ, ³P_HateCrime, ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
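A row total like Freedom_Score can also be computed without rowwise(), by handing the selected columns to rowSums(). The sketch below shows this on a small invented table with only four of the ten score columns; the country names and values are placeholders, and the result is an ordinary ungrouped tibble.

```r
library(dplyr)

# Toy numeric scores (invented); only four of the ten columns for brevity
toy_scores <- tibble(
  COUNTRY = c("A", "B", "C"),   # placeholder names
  P_Const = c(0, 1, 0),
  P_Employ = c(0, 1, 0.5),
  R_SameSexMarriage = c(0, 1, 0),
  R_CivilUnion = c(0, 0.5, 1)
)

# across() gathers the score columns; rowSums() adds them row by row
toy_scores <- toy_scores %>%
  mutate(Freedom_Score = rowSums(across(P_Const:R_CivilUnion)))

toy_scores
```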
Adding the Freedom Score of Each Country for Each Continent
With the Freedom_Score calculated for each COUNTRY, I wanted to create a data set that observed the Freedom_Score for each Continent. I saved this calculation as a Total_Freedom_Score.
Total_FS <- ilga_PR_Cont_NumPR_FS %>%
  select(Continent, COUNTRY, Freedom_Score) %>%
  group_by(Continent) %>%
  summarise(Total_Freedom_Score = sum(Freedom_Score))
#Sanity Check
Total_FS
# A tibble: 6 × 2
Continent Total_Freedom_Score
<chr> <dbl>
1 Africa 23.5
2 Asia 17.5
3 Europe 222.
4 Latin America/The Caribbean 85.5
5 North America 15
6 Oceania 29
colnames(Total_FS)
[1] "Continent" "Total_Freedom_Score"
Visualizing Total Freedom Score
#Palette for Total_Freedom_Score
# #CD6155 (red), #EB984E (orange), #F4D03F (yellow) #52BE80 (green), #7FB3D5 (blue), #BB8FCE (purple)
pal2 <- c("#CD6155","#EB984E", "#F4D03F", "#52BE80", "#7FB3D5", "#BB8FCE")
##ggplot calculations
All_Cont_FS <- ggplot(data = Total_FS) +
  geom_bar(mapping = aes(x = Continent, y = `Total_Freedom_Score`, fill = Continent), stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = pal2) +
  theme(legend.position = "none")
Total_Freedom_Score (Aggregate Freedom_Score by COUNTRY for Continent)
ggplotly(All_Cont_FS)
This visualization shows the Total_Freedom_Score, the sum of the Freedom_Score of every COUNTRY in each Continent.
Continuing Numeric Calculations
With the Total_Freedom_Score values, I went on to calculate each Total_Freedom_Score as a percentage of the total points the COUNTRYs in a Continent could theoretically have gained, stored under the variable Real_Freedom_Score.
In theory, each COUNTRY can have a maximum Freedom_Score of 10 (6 “Yes” in P_Type + 4 “Yes” in R_Type = 10).
If each COUNTRY has a maximum of 10 under Freedom_Score, then the Total_Possible_Freedom_Score for each Continent is the number of COUNTRYs multiplied by 10.
I do the calculations as follows:
Finding Number of Country in Each Continent for Reported Data
Continent_FS <- ilga_PR_Cont_NumPR_FS %>%
  select(Continent, COUNTRY, Freedom_Score) %>%
  group_by(Continent) %>%
  count()
Continent_FS <- rename(Continent_FS, Number_Of_Country = n)
#Sanity Check
Continent_FS
# A tibble: 6 × 2
# Groups: Continent [6]
Continent Number_Of_Country
<chr> <int>
1 Africa 54
2 Asia 42
3 Europe 48
4 Latin America/The Caribbean 33
5 North America 2
6 Oceania 14
Joining Number_Of_Country Data set and Total_Freedom_Score to Create Real_Freedom_Score
I joined the Continent_FS data set and the Total_FS data set into joined_Cont_Total_FS, so that the data set would contain both Number_Of_Country and Total_Freedom_Score.
I made a Real_Freedom_Score column as (Total_Freedom_Score / (Number_Of_Country * 10)) * 100.
joined_Cont_Total_FS <- full_join(Continent_FS, Total_FS, by = "Continent")
joined_Cont_Total_FS <- joined_Cont_Total_FS %>%
  select(Number_Of_Country, Total_Freedom_Score) %>%
  mutate(
    Total_Possible_Freedom_Score = Number_Of_Country * 10,
    Real_Freedom_Score = (Total_Freedom_Score / (Number_Of_Country * 10)) * 100
  )
Real_Freedom_Score Observation
Now all the potential types of scores are stored in joined_Cont_Total_FS
joined_Cont_Total_FS
# A tibble: 6 × 5
# Groups: Continent [6]
Continent Number_Of_Country Total_Freedom_…¹ Total…² Real_…³
<chr> <int> <dbl> <dbl> <dbl>
1 Africa 54 23.5 540 4.35
2 Asia 42 17.5 420 4.17
3 Europe 48 222. 480 46.1
4 Latin America/The Caribbean 33 85.5 330 25.9
5 North America 2 15 20 75
6 Oceania 14 29 140 20.7
# … with abbreviated variable names ¹Total_Freedom_Score,
# ²Total_Possible_Freedom_Score, ³Real_Freedom_Score
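As a quick check on the table above, the Real_Freedom_Score formula can be worked by hand from the printed totals and country counts. The sketch below uses base R only. Europe is left out because its printed total (222.) is rounded for display, and the vector names are hypothetical.

```r
# Printed values from the table above (Europe omitted: its total is shown rounded)
total_freedom_score <- c(Africa = 23.5, Asia = 17.5, `North America` = 15, Oceania = 29)
number_of_country   <- c(Africa = 54,   Asia = 42,   `North America` = 2,  Oceania = 14)

# Each country can score at most 10 (6 protection + 4 recognition measures)
max_per_country <- 10

# Percentage of the possible points each Continent actually reached
real_freedom_score <- total_freedom_score / (number_of_country * max_per_country) * 100

round(real_freedom_score, 2)
# Africa 4.35, Asia 4.17, North America 75, Oceania 20.71
```

For North America, for example, this is 15 / (2 * 10) * 100 = 75.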
Pivot Data in joined_Cont_Total_FS
Total_Real_Score <- joined_Cont_Total_FS %>%
  group_by(Continent) %>%
  select(Continent, Real_Freedom_Score, Total_Freedom_Score) %>%
  pivot_longer(Total_Freedom_Score:Real_Freedom_Score, names_to = "Score_Type", values_to = "Score")
I pivoted the Total_Freedom_Score and Real_Freedom_Score columns into Score_Type, with their values under Score, and saved the result as the data set Total_Real_Score.
Total_Real_Score
Total_Real_Score
# A tibble: 12 × 3
# Groups: Continent [6]
Continent Score_Type Score
<chr> <chr> <dbl>
1 Africa Total_Freedom_Score 23.5
2 Africa Real_Freedom_Score 4.35
3 Asia Total_Freedom_Score 17.5
4 Asia Real_Freedom_Score 4.17
5 Europe Total_Freedom_Score 222.
6 Europe Real_Freedom_Score 46.1
7 Latin America/The Caribbean Total_Freedom_Score 85.5
8 Latin America/The Caribbean Real_Freedom_Score 25.9
9 North America Total_Freedom_Score 15
10 North America Real_Freedom_Score 75
11 Oceania Total_Freedom_Score 29
12 Oceania Real_Freedom_Score 20.7
Comparing Real_Freedom_Score to Total_Freedom_Score
#Palette for Score_Type
# #CD6155 (red), #EB984E (orange), #F4D03F (yellow) #52BE80 (green), #7FB3D5 (blue), #BB8FCE (purple)
pal3 <- c("#7FB3D5","#CD6155")
Score_All <- ggplot(data = Total_Real_Score) +
  geom_bar(mapping = aes(x = Score_Type, y = Score, fill = Score_Type), stat = "identity") +
  facet_wrap(~ Continent, nrow = 2) +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  scale_fill_manual(values = pal3) +
  theme(axis.title.x = element_text(margin = margin(t = 45)))
ggplotly(Score_All)
This visualization compares the Real_Freedom_Score to the Total_Freedom_Score of each Continent. Looking at just the Total_Freedom_Score, “Europe” seems to heavily outweigh the other Continents. However, when looking at Real_Freedom_Score, we see that “North America” (75) outscores “Europe” (46.1).
These observations show what percent of its Total_Possible_Freedom_Score a Continent has achieved. However, it is also important to note that these proportions and calculations are based on ILGA World’s categorization of countries and continents. If a country were allocated to a different continent, the proportions would change.
In this data set, 2 COUNTRYs are allocated to “North America” and 48 COUNTRYs are allocated to “Europe”. It is important to look at the proportion, but also to recognize the distribution and categorization of countries by ILGA World.
Processing Data for Asian COUNTRYs
I repeated the same processing and visualizations that I did for the Continents for the COUNTRYs in “Asia”.
Dataframes Specific to Asia with “Yes”, “No”, and “Limited”.
#Dataframe with Yes and NO
Asia_YN <- ilga_PR_Cont %>%
  filter(Continent == "Asia")
Asia_YN
# A tibble: 42 × 14
N CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 90 1 Asia Afghan… NO NO NO NO NO NO
2 91 2 Asia Bahrain NO NO NO NO NO NO
3 92 3 Asia Bangla… NO NO NO NO NO NO
4 93 4 Asia Bhutan NO NO NO NO NO NO
5 94 5 Asia Brunei… NO NO NO NO NO NO
6 95 6 Asia Cambod… NO NO NO NO NO NO
7 96 7 Asia China NO NO NO NO NO NO
8 97 8 Asia East T… NO NO NO YES NO NO
9 98 9 Asia India NO NO NO NO NO NO
10 99 10 Asia Indone… NO NO NO NO NO NO
# … with 32 more rows, 4 more variables: R_SameSexMarriage <chr>,
# R_CivilUnion <chr>, R_JointAdoptions <chr>, R_SecondParentAdoption <chr>,
# and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
# ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Asia Data frame with “Yes”, “No”, and “Limited” as 1, 0, 0.5 and Freedom Score
#Dataframe with Numbers and Freedom Score
Asia_FS <- ilga_PR_Cont_NumPR_FS %>%
  filter(Continent == "Asia")
Asia_FS
# A tibble: 42 × 15
# Rowwise:
N CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
<dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 90 1 Asia Afghan… 0 0 0 0 0 0
2 91 2 Asia Bahrain 0 0 0 0 0 0
3 92 3 Asia Bangla… 0 0 0 0 0 0
4 93 4 Asia Bhutan 0 0 0 0 0 0
5 94 5 Asia Brunei… 0 0 0 0 0 0
6 95 6 Asia Cambod… 0 0 0 0 0 0
7 96 7 Asia China 0 0 0 0 0 0
8 97 8 Asia East T… 0 0 0 1 0 0
9 98 9 Asia India 0 0 0 0 0 0
10 99 10 Asia Indone… 0 0 0 0 0 0
# … with 32 more rows, 5 more variables: R_SameSexMarriage <dbl>,
# R_CivilUnion <dbl>, R_JointAdoptions <dbl>, R_SecondParentAdoption <dbl>,
# Freedom_Score <dbl>, and abbreviated variable names ¹P_BroadProt,
# ²P_Employ, ³P_HateCrime, ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Pivot Asia Dataframe
Asia_Piv <- Asia_YN %>%
  pivot_longer(P_Const:R_SecondParentAdoption, names_to = "Type", values_to = "Yes_No_Limited")
Asia_Piv
# A tibble: 420 × 6
N CN Continent COUNTRY Type Yes_No_Limited
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 90 1 Asia Afghanistan P_Const NO
2 90 1 Asia Afghanistan P_BroadProt NO
3 90 1 Asia Afghanistan P_Employ NO
4 90 1 Asia Afghanistan P_HateCrime NO
5 90 1 Asia Afghanistan P_Incitement NO
6 90 1 Asia Afghanistan P_BanConvTherapies NO
7 90 1 Asia Afghanistan R_SameSexMarriage NO
8 90 1 Asia Afghanistan R_CivilUnion NO
9 90 1 Asia Afghanistan R_JointAdoptions NO
10 90 1 Asia Afghanistan R_SecondParentAdoption NO
# … with 410 more rows
# ℹ Use `print(n = ...)` to see more rows
Observing Count of “Yes”, “No”, “Limited” for each Country in Asia
# #CD6155 (red), #EB984E (orange), #F4D03F (yellow) #52BE80 (green), #7FB3D5 (blue), #BB8FCE (purple),
pal4 <- c("#7FB3D5", "#CD6155", "#52BE80")
Asia_Country <- ggplot(data = Asia_Piv) +
  geom_bar(mapping = aes(x = COUNTRY, fill = `Yes_No_Limited`)) +
  scale_fill_manual(values = pal4) +
  coord_flip()
ggplotly(Asia_Country)
Observing Freedom_Score for each Country in Asia
pal4 <- c("#52BE80")
Asia_FreedomScore <- ggplot(Asia_FS, aes(x = COUNTRY, y = Freedom_Score, width = .5)) +
  geom_bar(position = "dodge", stat = "identity") +
  coord_flip()
ggplotly(Asia_FreedomScore)
Conclusion/Reflections
Furthering project:
Overall, my project gave insight into how representing data in different structures can vastly affect its interpretation. As my data did not track information over time, but rather gave a count, the way the count was represented greatly changed how the data presented.
Years of Decriminalisation to Number of Protection/Recognition
I did not observe the criminalisation sections of the original data frame. The criminalisation data records whether same-sex acts are legal, the date of decriminalisation for same-sex acts, and the max penalty for same-sex acts.
One interesting follow-up would be to see whether there is a relation between how long ago a country decriminalized same-sex acts and the number of protections and recognitions it has. I would hypothesize a positive correlation between the years since decriminalization and the number of protections and recognitions. However, for specific categories, I believe there could be stagnation, with governments growing complacent after appeasing people through surface-level legislation. Observing this relationship could inform a potential prediction model relating the number of years to the number of protections and recognitions.
Between Protection and Recognition
Something I wanted to achieve in my analysis was to compare protections with recognitions, rather than aggregate them under one Type. However, I could not code an apt data frame to use in a ggplot. In my basic schema, I would want protection values on the x axis and recognition values on the y axis for each country/continent. Observing the relationship between protections and recognitions could give better insight into which areas of legislation ought to be focused on and where countries worldwide lack legislative measures.
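A data frame with one protection total and one recognition total per country could be built from the long format roughly as follows. This is a sketch on invented toy rows, not a tested extension of the project: Category, Protection_Score, Recognition_Score, score_key, and the toy values are all hypothetical names and numbers, and it assumes the P_ and R_ prefixes used throughout this post.

```r
library(dplyr)
library(tidyr)

# Toy long-format data (invented): two countries, two P_ and two R_ measures
toy_long <- tibble(
  COUNTRY = rep(c("A", "B"), each = 4),
  Type = rep(c("P_Const", "P_Employ", "R_SameSexMarriage", "R_CivilUnion"), times = 2),
  Yes_No_Limited = c("NO", "YES", "NO", "NO", "YES", "YES", "LIMITED", "YES")
)

# Same scoring as in the post: NO = 0, LIMITED = 0.5, YES = 1
score_key <- c("NO" = 0, "LIMITED" = 0.5, "YES" = 1)

toy_wide <- toy_long %>%
  mutate(
    # Label each measure by its prefix
    Category = if_else(startsWith(Type, "P_"), "Protection_Score", "Recognition_Score"),
    Score = unname(score_key[Yes_No_Limited])
  ) %>%
  group_by(COUNTRY, Category) %>%
  summarise(Score = sum(Score), .groups = "drop") %>%
  # One row per country, one column per category
  pivot_wider(names_from = Category, values_from = Score)

toy_wide
```

With a frame shaped like toy_wide, a scatter plot could map Protection_Score to x and Recognition_Score to y with geom_point().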
Map Data based on Protections and Recognitions
Additionally, creating a map of the different protections and recognitions could give better visual information about which areas of the world have what kinds of protections and recognitions. This kind of representation would turn abstract-sounding data into a quickly absorbable image. For example, depending on whether a country had more or fewer protections and recognitions, it could be shaded darker, lighter, or not at all.
Thoughts:
Throughout this project, the greatest data lesson I learned is flexibility in approaches.
As a near first-time coder, the task of taking on a project seemed very daunting. I was very careful to make sure that all my lines of code “made sense” and that there were no errors. There was a lot of “noise” and confusion for the seemingly endless combinations of approaching a task with a common end.
However, once I decided to experiment somewhat haphazardly with my data sets and code, challenges arose that ultimately helped me get past being “too scared” to code. I practiced tracing back my errors and researching solutions.
As someone who studied linguistics and political science during undergrad, I categorized coding as a topic that was unapproachable and far from my studies. However, this project gave insight into the several intersections between computational and social science studies. I better understand how visualization can be a powerful tool to represent information that is difficult to express in writing.
Planning:
Before I started any code, I physically drew out the types of visualizations I wanted. I had a basic schema of what my data frames would look like and the types of visualizations I wanted. Though I made some adjustments to the visualizations as I completed my project, having a physical planned drawing definitely guided me toward what I wanted to achieve with code.
Use Country Code
For my data, I manually entered the continents for the Countries, based off the original data frame. However, there is a way to link countries to continents based on certain packages in R, which could produce more accurate results. This could also help account for the NAs in the data set, rather than having to remove them or manually enter each one into the data frame.
Pivoting Data:
One big issue that I had was understanding how to approach my data. I was used to pivoting my data frames for my projects, so I assumed that my data had to be pivoted from the beginning. However, thinking that I had to pivot my data right away created confusion in the way I approached my data and it made it difficult to understand what the pivoted data meant.
After thinking through my process with tutors and classmates, I realized that I did not have to necessarily pivot my data for some of the calculations and observations I wanted to make.
ggplot:
I spent two days trying to figure out my All_Continent plot. The ggplot code made sense to me, but the visualization being produced did not make sense at all.
All_Continent <- ggplot(data = ilga_PIV) +
  geom_bar(mapping = aes(x = Type, fill = Yes_No_Limited)) +
  facet_wrap(~ Continent, nrow = 2) +
  coord_flip() +
scale_fill_manual(values = pal1)
It turns out that the issue was that I used quotation marks (“ ”) around Yes_No_Limited instead of backticks (` `). Something so seemingly small created a days-long problem. After learning this fix in a discussion with a classmate, I remembered that notes in the early tutorials mentioned similar points about the importance of syntax. Though I still have much to learn about the deeper mechanics of R, I better understand how topics build upon each other after completing my project.
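The difference can be seen on a toy example: inside aes(), a quoted string is treated as one constant value, while a backticked (or bare) name refers to the column. The sketch below uses invented data and compares how many distinct fill colours each version produces, using ggplot2's layer_data(). The object names are hypothetical.

```r
library(ggplot2)

# Toy long-format data (invented values)
toy_long <- data.frame(
  Type = c("P_Const", "P_Const", "P_Employ", "P_Employ"),
  Yes_No_Limited = c("YES", "NO", "NO", "LIMITED")
)

# Quoted: the fill is the constant string "Yes_No_Limited" for every row
p_quoted <- ggplot(toy_long) +
  geom_bar(aes(x = Type, fill = "Yes_No_Limited"))

# Backticked: the fill follows the values in the Yes_No_Limited column
p_column <- ggplot(toy_long) +
  geom_bar(aes(x = Type, fill = `Yes_No_Limited`))

# Number of distinct fill colours actually drawn by each plot
n_fills_quoted <- length(unique(layer_data(p_quoted)$fill))
n_fills_column <- length(unique(layer_data(p_column)$fill))

n_fills_quoted  # 1
n_fills_column  # 3
```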
Discuss, Discuss, and Discuss
One of the greatest helps in this project was the discussions I had with classmates. Taking the time to verbalize exactly what I wanted helped me better understand the minutiae of what I wanted to achieve with my code and helped me organize my thoughts. While helping classmates with their code, I saw how I could apply some of the solutions for their code to my own project.
Bibliography
1. ILGA World: Lucas Ramon Mendos, Kellyn Botha, Rafael Carrano Lelis, Enrique López de la Peña, Ilia Savelev and Daron Tan, State-Sponsored Homophobia 2020: Global Legislation Overview Update (Geneva: ILGA, December 2020).
2. R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.