Final

final

ilga wolrd

lgbtqia+ politics

SOGIESC

ggplotly()

Final Project Work in Progress

Author

Roy Yoon

Published

August 27, 2022

library(tidyverse)
library(ggplot2)
library(plotly)
library(gapminder)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Introduction:

ILGA World State-Sponsored Homophobia 2020: Global Legislation Overview Update

About ILGA World

Established in 1978, ILGA World is the International Lesbian, Gay, Bisexual, Trans and Intersex Association. As an international federation with member organisations from over 160 countries, ILGA World campaigns for lesbian, gay, bisexual, trans and intersex human rights.

Through advocacy, research, training and convenings, and global communications, ILGA World envisions a world of assured and established global justice and equity regardless of SOGIESC: Sexual Orientations, Gender Identities, Gender Expressions and Sex Characteristics.

Since 2006, ILGA World has published “State-Sponsored Homophobia”, a global legislation overview that surveys sexual orientation laws around the world.

The report focuses almost exclusively on law and does not comment on issues regarding gender identity, gender expression, or sex characteristics.

Guiding Ideas

As a member of the LGBTQIA+ community interested in pursuing a career in law, I want to learn more about the relationship between legislation and sexual orientation.

What kinds of sexual orientation laws are there?

What is the relationship between legislation, protections, and recognitions of SOGIESC?

Which countries offer the most and the least protection and recognition in their legislation? Why?

Read in Data: State-Sponsored Homophobia 2020: Global Legislation Overview Update

The ILGA_State_Sponsored_Homophobia_2020 data set originally contained 242 observations and 16 variables. Its first row held labels that got in the way of the actual variable names, so I skipped that row when reading in the file.

The modified data set has 241 observations and 16 variables. The variables fall into three broad categories related to sexual orientation law: Criminalisation, Protection, and Recognition. Each observation records whether a country has legislation on each topic, coded as “Yes”, “No”, or “Limited”.

For my analysis, I decided to focus on the broad categories of Protection and Recognition.

#Original Data
ilga_base <- read_excel("_data/ILGA_State_Sponsored_Homophobia_2020_dataset.xlsx", skip = 1)

head(ilga_base,6)

# A tibble: 6 × 16
      N    CN COUNTRY     CSSSA…¹ DATE …² MAX P…³ CONST. BROAD…⁴ EMPLOY. HATE …⁵
  <dbl> <dbl> <chr>       <chr>   <chr>   <chr>   <chr>  <chr>   <chr>   <chr>  
1     1     1 Algeria     NO      -       2       NO     NO      NO      NO     
2     2     2 Angola      YES     2021    -       NO     YES     YES     YES    
3     3     3 Benin       YES     NEVER … -       NO     NO      NO      NO     
4     4     4 Botswana    YES     2019    -       NO     NO      YES     NO     
5     5     5 Burkina Fa… YES     NEVER … -       NO     NO      NO      NO     
6     6     6 Burundi     NO      -       2       NO     NO      NO      NO     
# … with 6 more variables: INCITEMENT <chr>, `BAN CONV. THERAPIES` <chr>,
#   `SAME SEX MARRIAGE` <chr>, `CIVIL UNIONS` <chr>, `JOINT ADOPTION` <chr>,
#   `SECOND PARENT ADOPTION` <chr>, and abbreviated variable names
#   ¹`CSSSA LEGAL?`, ²`DATE OF DECRIM`, ³`MAX PENALTY`, ⁴`BROAD PROT.`,
#   ⁵`HATE CRIME`
# ℹ Use `colnames()` to see all variable names

Dimensions

dim(ilga_base)

[1] 241  16

Column Names

colnames(ilga_base)

 [1] "N"                      "CN"                     "COUNTRY"               
 [4] "CSSSA LEGAL?"           "DATE OF DECRIM"         "MAX PENALTY"           
 [7] "CONST."                 "BROAD PROT."            "EMPLOY."               
[10] "HATE CRIME"             "INCITEMENT"             "BAN CONV. THERAPIES"   
[13] "SAME SEX MARRIAGE"      "CIVIL UNIONS"           "JOINT ADOPTION"        
[16] "SECOND PARENT ADOPTION"

Data set Specific to Protection and Recognition

Variables Explained (According to ILGA World)

I renamed the variables related to Protection and Recognition to make the legislation category easier to identify.

A “Yes”, “No”, or “Limited” value indicates whether legislation exists under each variable.

A P_ prefix marks variables pertaining to Protection and an R_ prefix marks variables pertaining to Recognition.

Protection:

P_Const = Constitutional Protection
- Observes constitutional protections against discrimination on the basis of sexual orientation.
P_BroadProt = Broad Protection
- Observes provisions that penalize discrimination based on sexual orientation in employment, health, education, housing, and the provision of goods and services.
P_Employ = Protection in Employment
- Observes laws that explicitly prohibit employment discrimination on the basis of sexual orientation, covering topics such as unfair dismissal, social security, and benefits.
P_HateCrime = Criminal Liability (Hate Crime Laws)
- Observes legislation that explicitly prohibits hate crimes on the basis of sexual orientation.
P_Incitement = Prohibition of Incitement to Hatred, Violence or Discrimination
- Observes the legislative prohibition of incitement to hatred, violence or discrimination on the basis of sexual orientation.
P_BanConvTherapies = Bans on “Conversion Therapy”
- Observes legislative developments toward legal bans on “conversion therapies”.

Recognition:

R_SameSexMarriage = Same-Sex Marriage
- Observes legislative recognition of, and progress toward, same-sex marriage equality.
R_CivilUnion = Partnership Recognition for Same-Sex Couples
- Observes legislative recognition of partnerships for same-sex couples who do not wish to enter the institution of marriage.
R_JointAdoptions = Adoption by Same-Sex Couple (Joint Adoption)
- Observes legislative recognition of same-sex couples’ right to adopt a child jointly.
R_SecondParentAdoption = Adoption by Same-Sex Couple (Second Parent Adoption)
- Observes legislative recognition of a person’s right to adopt their same-sex partner’s child.

#Data set specific to Protection and Recognition
ilga_PR <- ilga_base%>%
  select(N:COUNTRY, "CONST.":"SECOND PARENT ADOPTION")%>%
  rename(P_Const = CONST.,
         P_BroadProt = "BROAD PROT.",
         P_Employ = EMPLOY.,
         P_HateCrime = "HATE CRIME",
         P_Incitement = INCITEMENT,
         P_BanConvTherapies = "BAN CONV. THERAPIES",
         R_SameSexMarriage = "SAME SEX MARRIAGE",
         R_CivilUnion = "CIVIL UNIONS",
         R_JointAdoptions = "JOINT ADOPTION",
         R_SecondParentAdoption = "SECOND PARENT ADOPTION" )

#Sanity Check
ilga_PR

# A tibble: 241 × 13
       N    CN COUNTRY   P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵ R_Sam…⁶
   <dbl> <dbl> <chr>     <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
 1     1     1 Algeria   NO      NO      NO      NO      NO      NO      NO     
 2     2     2 Angola    NO      YES     YES     YES     YES     NO      NO     
 3     3     3 Benin     NO      NO      NO      NO      NO      NO      NO     
 4     4     4 Botswana  NO      NO      YES     NO      NO      NO      NO     
 5     5     5 Burkina … NO      NO      NO      NO      NO      NO      NO     
 6     6     6 Burundi   NO      NO      NO      NO      NO      NO      NO     
 7     7     7 Cameroon  NO      NO      NO      NO      NO      NO      NO     
 8     8     8 Cabo Ver… NO      NO      YES     YES     NO      NO      NO     
 9     9     9 Central … NO      NO      NO      NO      NO      NO      NO     
10    10    10 Chad      NO      NO      NO      YES     NO      NO      NO     
# … with 231 more rows, 3 more variables: R_CivilUnion <chr>,
#   R_JointAdoptions <chr>, R_SecondParentAdoption <chr>, and abbreviated
#   variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime, ⁴P_Incitement,
#   ⁵P_BanConvTherapies, ⁶R_SameSexMarriage
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Mutating Data set to Contain Continents

The original data gives the continent name only as a large heading, with all of that continent’s countries listed beneath it.

I wanted to be able to group the data by continent, so I made a Continent column that specifies the continent for each country.

ILGA World uses 6 Continent labels: Africa, Latin America/The Caribbean, North America, Asia, Europe, and Oceania.

#Specify Continent
ilga_PR_Cont<-ilga_PR%>%
  mutate(Continent = case_when(
         N >= 1 & N <= 54 ~ "Africa",
         N >= 55 & N <= 87 ~ "Latin America/The Caribbean",
         N >= 88 & N <= 89 ~ "North America",
         N >= 90 & N <= 131 ~ "Asia",
         N >= 132 & N <= 179 ~ "Europe",
         N >= 180 & N <= 193 ~ "Oceania")
  )
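The N ranges above can also be kept in one place as a vector of upper bounds. This is only a minimal sketch, assuming the same cut points as the case_when() call; continent_from_n is a hypothetical helper name and is not part of the original analysis.

```r
# Hypothetical helper: map ILGA's running country number N to a continent label.
# The break points mirror the case_when() ranges used in this post.
continent_from_n <- function(n) {
  upper_bounds <- c(0, 54, 87, 89, 131, 179, 193)
  labels <- c("Africa", "Latin America/The Caribbean", "North America",
              "Asia", "Europe", "Oceania")
  # cut() uses right-closed intervals by default, e.g. (0, 54] is Africa.
  # Values outside the breaks, and NA, come back as NA.
  as.character(cut(n, breaks = upper_bounds, labels = labels))
}

continent_from_n(c(1, 54, 55, 88, 193, NA))
```

Either form gives rows without an N value an NA Continent, which matters for the NA handling later on.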

Bring Continent Column to Front of Data set

After adding the Continent column, I wanted to move it toward the front of the data set.

Note: N is the country’s position in the overall list of countries. CN is the country’s number within its Continent.

#Brings the newly created Continent column to the front 
ilga_PR_Cont <- ilga_PR_Cont%>%
  select(N, CN, Continent, everything())

#Sanity Check
ilga_PR_Cont

# A tibble: 241 × 14
       N    CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
   <dbl> <dbl> <chr>     <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
 1     1     1 Africa    Algeria NO      NO      NO      NO      NO      NO     
 2     2     2 Africa    Angola  NO      YES     YES     YES     YES     NO     
 3     3     3 Africa    Benin   NO      NO      NO      NO      NO      NO     
 4     4     4 Africa    Botswa… NO      NO      YES     NO      NO      NO     
 5     5     5 Africa    Burkin… NO      NO      NO      NO      NO      NO     
 6     6     6 Africa    Burundi NO      NO      NO      NO      NO      NO     
 7     7     7 Africa    Camero… NO      NO      NO      NO      NO      NO     
 8     8     8 Africa    Cabo V… NO      NO      YES     YES     NO      NO     
 9     9     9 Africa    Centra… NO      NO      NO      NO      NO      NO     
10    10    10 Africa    Chad    NO      NO      NO      YES     NO      NO     
# … with 231 more rows, 4 more variables: R_SameSexMarriage <chr>,
#   R_CivilUnion <chr>, R_JointAdoptions <chr>, R_SecondParentAdoption <chr>,
#   and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
#   ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Treating NA

The data set contained NA values in N and CN for entries that were listed broadly as a “territory” rather than as a country.

For my analysis, I filtered out these NA rows and focused on the countries that were assigned an N and CN value.

ilga_PR_Cont <- ilga_PR_Cont %>%
  filter(!is.na(N))

ilga_PR_Cont

# A tibble: 193 × 14
       N    CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
   <dbl> <dbl> <chr>     <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
 1     1     1 Africa    Algeria NO      NO      NO      NO      NO      NO     
 2     2     2 Africa    Angola  NO      YES     YES     YES     YES     NO     
 3     3     3 Africa    Benin   NO      NO      NO      NO      NO      NO     
 4     4     4 Africa    Botswa… NO      NO      YES     NO      NO      NO     
 5     5     5 Africa    Burkin… NO      NO      NO      NO      NO      NO     
 6     6     6 Africa    Burundi NO      NO      NO      NO      NO      NO     
 7     7     7 Africa    Camero… NO      NO      NO      NO      NO      NO     
 8     8     8 Africa    Cabo V… NO      NO      YES     YES     NO      NO     
 9     9     9 Africa    Centra… NO      NO      NO      NO      NO      NO     
10    10    10 Africa    Chad    NO      NO      NO      YES     NO      NO     
# … with 183 more rows, 4 more variables: R_SameSexMarriage <chr>,
#   R_CivilUnion <chr>, R_JointAdoptions <chr>, R_SecondParentAdoption <chr>,
#   and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
#   ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
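The drop from 241 to 193 rows can be sanity-checked by counting the missing N values before filtering. The sketch below uses a small made-up tibble (the rows are illustrative, not taken from the ILGA file); on the real data the same check should account for the 241 - 193 = 48 removed rows.

```r
library(dplyr)

# Made-up rows: two numbered countries and one unnumbered territory.
toy <- tibble(
  N = c(1, 2, NA),
  COUNTRY = c("Country A", "Country B", "Territory C")
)

n_missing <- sum(is.na(toy$N))            # rows that filter(!is.na(N)) will drop
toy_countries <- filter(toy, !is.na(N))   # same filter as in this post

# The number of rows removed should equal the number of missing N values.
nrow(toy) - nrow(toy_countries) == n_missing
```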

Pivoting Data

To better analyze my data, I decided to pivot the data frame. Before pivoting, each type of measure (the Protection variables, P_Type, and the Recognition variables, R_Type) was written out as its own variable, with “Yes”, “No”, or “Limited” as its values.

After I pivoted the data longer, the P_Type and R_Type names fall under the variable Type, and the “Yes”, “No”, and “Limited” values fall under the variable Yes_No_Limited.

#ilga_PR_Cont Pivot longer
ilga_PIV <- pivot_longer(ilga_PR_Cont, P_Const:R_SecondParentAdoption, names_to = "Type", values_to = "Yes_No_Limited")

#Sanity Check
ilga_PIV

# A tibble: 1,930 × 6
       N    CN Continent COUNTRY Type                   Yes_No_Limited
   <dbl> <dbl> <chr>     <chr>   <chr>                  <chr>         
 1     1     1 Africa    Algeria P_Const                NO            
 2     1     1 Africa    Algeria P_BroadProt            NO            
 3     1     1 Africa    Algeria P_Employ               NO            
 4     1     1 Africa    Algeria P_HateCrime            NO            
 5     1     1 Africa    Algeria P_Incitement           NO            
 6     1     1 Africa    Algeria P_BanConvTherapies     NO            
 7     1     1 Africa    Algeria R_SameSexMarriage      NO            
 8     1     1 Africa    Algeria R_CivilUnion           NO            
 9     1     1 Africa    Algeria R_JointAdoptions       NO            
10     1     1 Africa    Algeria R_SecondParentAdoption NO            
# … with 1,920 more rows
# ℹ Use `print(n = ...)` to see more rows
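The 1,930 rows in the pivoted output are what one would expect from 193 countries times 10 measure columns. A small sketch of that row-count check, using a made-up wide table (the column subset and values are illustrative only):

```r
library(tidyr)
library(tibble)

# Made-up wide table: 2 countries x 3 measure columns.
toy_wide <- tibble(
  COUNTRY = c("Country A", "Country B"),
  P_Const = c("NO", "YES"),
  P_Employ = c("YES", "LIMITED"),
  R_CivilUnion = c("NO", "NO")
)

toy_long <- pivot_longer(
  toy_wide,
  P_Const:R_CivilUnion,
  names_to = "Type",
  values_to = "Yes_No_Limited"
)

# Each country contributes one row per measure column.
nrow(toy_long) == nrow(toy_wide) * 3
```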

Visualizing Data

Number of Yes, No, Limited for Each Continent

I wanted to visualize the number of “Yes”, “No”, and “Limited” values for each Type by Continent. I thought that looking at these counts would help identify which Continents did or did not have protections and recognitions in legislation with regard to state-sponsored homophobia.

I decided to make a bar graph that is facet wrapped by Continent, with Type as the x variable and Yes_No_Limited as the fill. This produces a graph showing the makeup of “Yes”, “No”, and “Limited” for each Type within each Continent.

Use of ggplotly: Looking at the r-graph gallery, I found that the graph could be made interactive with the function ggplotly(). To use this function, I installed the packages “gapminder” and “plotly” (ggplotly() itself comes from plotly). ggplotly() served as a useful way to check my data, since the tooltip gives a quick summary of each data point on the graph. I could confirm whether or not the data presented the way I expected.

Use of Hex codes: I wanted to choose the colors in my visualizations, and I learned that I could use hex color codes to specify exactly the colors I wanted.

#pal1 <- c("#5b7f95","#c69214", "#b3b995", "#aca39a", "#41273b", "#881c1c")

# #CD6155 (red), #EB984E (orange), #F4D03F (yellow), #52BE80 (green), #7FB3D5 (blue), #BB8FCE (purple)

#Hex code palette 
pal1 <- c("#5b7f95","#8A1515", "#478335")

All_Continent <- ggplot(data = ilga_PIV) + 
  geom_bar(mapping = aes(x = Type, fill = `Yes_No_Limited`)) +
  facet_wrap(~ Continent, nrow = 2) +
  coord_flip() +
  scale_fill_manual(values = pal1)

 
ggplotly(All_Continent)
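With an unnamed palette like pal1, scale_fill_manual() hands out colors in the order of the fill values, which for character data is alphabetical (LIMITED, NO, YES). The following is a sketch of a named palette that pins each color to its value explicitly. pal_named and the toy data are illustrative names, and the pairing of colors to values assumes that alphabetical order.

```r
library(ggplot2)

# Named palette: each hex code is tied to one Yes_No_Limited value.
pal_named <- c(LIMITED = "#5b7f95", NO = "#8A1515", YES = "#478335")

# Made-up long-format rows standing in for ilga_PIV.
toy_long <- data.frame(
  Type = c("P_Const", "P_Const", "P_Employ", "P_Employ"),
  Yes_No_Limited = c("NO", "YES", "LIMITED", "YES")
)

p <- ggplot(toy_long) +
  geom_bar(aes(x = Type, fill = Yes_No_Limited)) +
  coord_flip() +
  scale_fill_manual(values = pal_named)
```

A named palette keeps the colors stable even if one of the values happens to be absent from a subset of the data.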

The visualization gives a clear picture of the makeup of Yes_No_Limited for each Type within each Continent. The colors make the distinction quick and easy to see, where it would be difficult to grasp from the data frame alone.

However, I did not account for the fact that the raw count of Yes_No_Limited, though useful for comparing Types within a Continent, is not a good basis for comparing across Continents, because ILGA World assigns a different number of COUNTRYs to each Continent.

Comparing raw counts across Continents can therefore present a skewed view of the data. For example, “North America” has far fewer “Yes”, “No”, and “Limited” values than “Europe”, but that does not mean its proportion of “Yes” values is lower. The counts are driven by the number of COUNTRYs in each Continent.

So for my next visualization(s), I aimed to look at percentages, or some other better-normalized numeric measure, of “Yes”, “No”, and “Limited” for each Continent.
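One simple way to put Continents of different sizes on the same footing is to turn the counts into within-Continent shares. This is a sketch on made-up rows and is not the approach the rest of this post takes; toy_long, toy_share, and Share are illustrative names.

```r
library(dplyr)

# Made-up long-format rows: continent "A" has 4 answers, "B" has 2.
toy_long <- tibble(
  Continent = c("A", "A", "A", "A", "B", "B"),
  Yes_No_Limited = c("YES", "NO", "NO", "LIMITED", "YES", "NO")
)

toy_share <- toy_long %>%
  count(Continent, Yes_No_Limited) %>%   # answers per continent and value
  group_by(Continent) %>%
  mutate(Share = n / sum(n)) %>%         # share within each continent
  ungroup()

toy_share
```

In ggplot2, geom_bar(position = "fill") draws the same kind of within-bar shares directly from the long data.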

Treating YES, NO, and LIMITED as Numbers

To compare the percentages of “Yes”, “No”, and “Limited” under Yes_No_Limited for each Continent, I first converted each “Yes”, “No”, and “Limited” to a numeric value.

I assigned the values NO = 0, YES = 1, and LIMITED = 0.5. I did this to create a numeric metric for comparing the number of Protections and Recognitions across the Continents and Countries in the data frame.

First, I recoded only the P_Type (Protection) variables.

#recoding variables for Protection ("P")
ilga_PR_Cont_NumP <- ilga_PR_Cont %>%
  mutate(across(starts_with("P"),
                ~recode(., "NO" = 0,"YES" = 1, "LIMITED" = 0.5),
  ))

#Sanity Check
ilga_PR_Cont_NumP

# A tibble: 193 × 14
       N    CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
   <dbl> <dbl> <chr>     <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1     1     1 Africa    Algeria       0       0       0       0       0       0
 2     2     2 Africa    Angola        0       1       1       1       1       0
 3     3     3 Africa    Benin         0       0       0       0       0       0
 4     4     4 Africa    Botswa…       0       0       1       0       0       0
 5     5     5 Africa    Burkin…       0       0       0       0       0       0
 6     6     6 Africa    Burundi       0       0       0       0       0       0
 7     7     7 Africa    Camero…       0       0       0       0       0       0
 8     8     8 Africa    Cabo V…       0       0       1       1       0       0
 9     9     9 Africa    Centra…       0       0       0       0       0       0
10    10    10 Africa    Chad          0       0       0       1       0       0
# … with 183 more rows, 4 more variables: R_SameSexMarriage <chr>,
#   R_CivilUnion <chr>, R_JointAdoptions <chr>, R_SecondParentAdoption <chr>,
#   and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
#   ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

I did the same for the R_Type (Recognition) variables.

#recoding variables for Recognition ("R"); Protection ("P") was already recoded above
ilga_PR_Cont_NumPR <- ilga_PR_Cont_NumP %>%
  mutate(across(starts_with("R"),
                ~recode(., "NO" = 0,"YES" = 1, "LIMITED" = 0.5),
  ))

#sanity check
ilga_PR_Cont_NumPR

# A tibble: 193 × 14
       N    CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
   <dbl> <dbl> <chr>     <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1     1     1 Africa    Algeria       0       0       0       0       0       0
 2     2     2 Africa    Angola        0       1       1       1       1       0
 3     3     3 Africa    Benin         0       0       0       0       0       0
 4     4     4 Africa    Botswa…       0       0       1       0       0       0
 5     5     5 Africa    Burkin…       0       0       0       0       0       0
 6     6     6 Africa    Burundi       0       0       0       0       0       0
 7     7     7 Africa    Camero…       0       0       0       0       0       0
 8     8     8 Africa    Cabo V…       0       0       1       1       0       0
 9     9     9 Africa    Centra…       0       0       0       0       0       0
10    10    10 Africa    Chad          0       0       0       1       0       0
# … with 183 more rows, 4 more variables: R_SameSexMarriage <dbl>,
#   R_CivilUnion <dbl>, R_JointAdoptions <dbl>, R_SecondParentAdoption <dbl>,
#   and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
#   ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

The final data set ilga_PR_Cont_NumPR represents each “Yes”, “No”, and “Limited” as a number. Since the values are now a double type, I can better make numeric calculations and operations with my data.
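The two recoding steps could also be done in a single pass with a named lookup vector. The sketch below runs on made-up rows; score_lookup, toy, and toy_num are illustrative names, and it assumes the values are spelled exactly "NO", "YES", and "LIMITED".

```r
library(dplyr)

# Lookup table for the scoring used in this post.
score_lookup <- c(NO = 0, YES = 1, LIMITED = 0.5)

# Made-up rows standing in for ilga_PR_Cont.
toy <- tibble(
  COUNTRY = c("Country A", "Country B"),
  P_Const = c("NO", "YES"),
  R_CivilUnion = c("LIMITED", "NO")
)

toy_num <- toy %>%
  mutate(across(
    starts_with(c("P_", "R_")),   # Protection and Recognition columns together
    ~ unname(score_lookup[.x])    # any unexpected spelling becomes NA
  ))

toy_num
```

Because unmatched values turn into NA rather than being silently kept, a quick check for NA after recoding can reveal unexpected codes in the source data.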

Freedom Score For Each Country

With each “Yes”, “No”, and “Limited” as a number, I decided to create a column called Freedom_Score.

The Freedom_Score is the sum of a country’s P_Type and R_Type values. Each number across P_Const:R_SecondParentAdoption is added into a single number under the variable Freedom_Score.

#Freedom Score Variable 
ilga_PR_Cont_NumPR_FS <- ilga_PR_Cont_NumPR %>% 
  rowwise() %>%
  mutate(Freedom_Score = sum(c_across(P_Const:R_SecondParentAdoption)))

#Sanity Check
ilga_PR_Cont_NumPR_FS

# A tibble: 193 × 15
# Rowwise: 
       N    CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
   <dbl> <dbl> <chr>     <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1     1     1 Africa    Algeria       0       0       0       0       0       0
 2     2     2 Africa    Angola        0       1       1       1       1       0
 3     3     3 Africa    Benin         0       0       0       0       0       0
 4     4     4 Africa    Botswa…       0       0       1       0       0       0
 5     5     5 Africa    Burkin…       0       0       0       0       0       0
 6     6     6 Africa    Burundi       0       0       0       0       0       0
 7     7     7 Africa    Camero…       0       0       0       0       0       0
 8     8     8 Africa    Cabo V…       0       0       1       1       0       0
 9     9     9 Africa    Centra…       0       0       0       0       0       0
10    10    10 Africa    Chad          0       0       0       1       0       0
# … with 183 more rows, 5 more variables: R_SameSexMarriage <dbl>,
#   R_CivilUnion <dbl>, R_JointAdoptions <dbl>, R_SecondParentAdoption <dbl>,
#   Freedom_Score <dbl>, and abbreviated variable names ¹P_BroadProt,
#   ²P_Employ, ³P_HateCrime, ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
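The “# Rowwise:” line in the output shows that the tibble stays row-wise after rowwise() is used. A possible alternative, sketched here on made-up numbers, is to sum across the columns with rowSums(), which gives the same per-row totals without changing how the tibble is grouped. toy_num and toy_fs are illustrative names.

```r
library(dplyr)

# Made-up numeric rows standing in for ilga_PR_Cont_NumPR.
toy_num <- tibble(
  COUNTRY = c("Country A", "Country B"),
  P_Const = c(0, 1),
  P_Employ = c(1, 0.5),
  R_CivilUnion = c(0, 1)
)

toy_fs <- toy_num %>%
  # across() returns the selected columns as a data frame; rowSums() adds them per row.
  mutate(Freedom_Score = rowSums(across(P_Const:R_CivilUnion)))

toy_fs
```

If the rowwise() version is kept, calling ungroup() afterwards returns the result to an ordinary tibble.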

Adding the Freedom Score of Each Country for Each Continent

With the Freedom_Score calculated for each COUNTRY, I wanted a data set with the summed Freedom_Score for each Continent. I saved this sum as Total_Freedom_Score.

Total_FS <- ilga_PR_Cont_NumPR_FS %>% 
  select(Continent, COUNTRY, Freedom_Score) %>%
  group_by(Continent) %>%
  summarise(Total_Freedom_Score = sum(Freedom_Score))

#Sanity Check
Total_FS

# A tibble: 6 × 2
  Continent                   Total_Freedom_Score
  <chr>                                     <dbl>
1 Africa                                     23.5
2 Asia                                       17.5
3 Europe                                    222. 
4 Latin America/The Caribbean                85.5
5 North America                              15  
6 Oceania                                    29

colnames(Total_FS)

[1] "Continent"           "Total_Freedom_Score"

Visualizing Total Freedom Score

#Palette for Total_Freedom_Score
# #CD6155 (red), #EB984E (orange), #F4D03F (yellow), #52BE80 (green), #7FB3D5 (blue), #BB8FCE (purple)

pal2 <- c("#CD6155","#EB984E", "#F4D03F", "#52BE80", "#7FB3D5", "#BB8FCE")

##ggplot calculations
All_Cont_FS<- ggplot(data = Total_FS) +
  geom_bar(mapping = aes(x = Continent, y = `Total_Freedom_Score`, fill = Continent), stat = "identity") +
  coord_flip()+
  scale_fill_manual(values = pal2) +
  theme(legend.position="none")

Total_Freedom_Score (Aggregate Freedom_Score by COUNTRY for Continent)

ggplotly(All_Cont_FS)

This visualization shows the Total_Freedom_Score, the sum of the Freedom_Score values of all COUNTRYs in each Continent.

Continuing Numeric Calculations

With the Total_Freedom_Score values in hand, I next calculated each Continent’s Total_Freedom_Score as a percentage of the total points its COUNTRYs could theoretically have earned, and stored it in the variable Real_Freedom_Score.
In theory, each COUNTRY can have a maximum Freedom_Score of 10 (6 “Yes” in P_Type + 4 “Yes” in R_Type = 10).
The Total_Possible_Freedom_Score for each Continent is therefore the number of COUNTRYs multiplied by 10.

I do the calculations as follows:

Finding the Number of Countries in Each Continent for Reported Data

 Continent_FS <- ilga_PR_Cont_NumPR_FS %>% 
  select(Continent, COUNTRY, Freedom_Score) %>%
  group_by(Continent) %>%
  count()

Continent_FS <-rename(Continent_FS, Number_Of_Country = n)

#Sanity Check
Continent_FS

# A tibble: 6 × 2
# Groups:   Continent [6]
  Continent                   Number_Of_Country
  <chr>                                   <int>
1 Africa                                     54
2 Asia                                       42
3 Europe                                     48
4 Latin America/The Caribbean                33
5 North America                               2
6 Oceania                                    14

Joining Number_Of_Country Data set and Total_Freedom_Score to Create Real_Freedom_Score

I joined the Continent_FS and Total_FS data sets into joined_Cont_Total_FS, so that the data set would contain both Number_Of_Country and Total_Freedom_Score.

I then created a Real_Freedom_Score column as (Total_Freedom_Score/(Number_Of_Country * 10)) * 100

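#Join on the shared Continent column (stated explicitly to avoid the join message)
joined_Cont_Total_FS <- full_join(Continent_FS, Total_FS, by = "Continent")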


joined_Cont_Total_FS <- joined_Cont_Total_FS %>%
  select(Number_Of_Country, Total_Freedom_Score) %>%
  mutate(
    Total_Possible_Freedom_Score = Number_Of_Country * 10,
    Real_Freedom_Score = (Total_Freedom_Score/(Number_Of_Country * 10)) * 100
    )

Real_Freedom_Score Observation

All of the score variables are now stored in joined_Cont_Total_FS.

 joined_Cont_Total_FS

# A tibble: 6 × 5
# Groups:   Continent [6]
  Continent                   Number_Of_Country Total_Freedom_…¹ Total…² Real_…³
  <chr>                                   <int>            <dbl>   <dbl>   <dbl>
1 Africa                                     54             23.5     540    4.35
2 Asia                                       42             17.5     420    4.17
3 Europe                                     48            222.      480   46.1 
4 Latin America/The Caribbean                33             85.5     330   25.9 
5 North America                               2             15        20   75   
6 Oceania                                    14             29       140   20.7 
# … with abbreviated variable names ¹Total_Freedom_Score,
#   ²Total_Possible_Freedom_Score, ³Real_Freedom_Score
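As a check on the table above, two of its rows can be reproduced by hand. The sketch below copies the Africa and North America values from the printed output and reapplies the formula; the vector names are illustrative.

```r
# Values copied from the joined_Cont_Total_FS output in this post.
number_of_country <- c(Africa = 54, `North America` = 2)
total_freedom_score <- c(Africa = 23.5, `North America` = 15)

max_per_country <- 10  # 6 Protection + 4 Recognition measures, 1 point each

total_possible <- number_of_country * max_per_country
real_freedom_score <- total_freedom_score / total_possible * 100

round(real_freedom_score, 2)
```

Put differently, the Real_Freedom_Score is a Continent’s average country score expressed as a percentage of the 10-point maximum.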

Pivot Data in joined_Cont_Total_FS

Total_Real_Score <- joined_Cont_Total_FS %>%
  group_by(Continent)%>%
  select(Continent, Real_Freedom_Score,Total_Freedom_Score) %>%
  pivot_longer(Total_Freedom_Score:Real_Freedom_Score, names_to = "Score_Type", values_to = "Score")

I pivoted Total_Freedom_Score and Real_Freedom_Score into a Score_Type column, with their values in a Score column, and saved the result as the data set Total_Real_Score.

Total_Real_Score

# A tibble: 12 × 3
# Groups:   Continent [6]
   Continent                   Score_Type           Score
   <chr>                       <chr>                <dbl>
 1 Africa                      Total_Freedom_Score  23.5 
 2 Africa                      Real_Freedom_Score    4.35
 3 Asia                        Total_Freedom_Score  17.5 
 4 Asia                        Real_Freedom_Score    4.17
 5 Europe                      Total_Freedom_Score 222.  
 6 Europe                      Real_Freedom_Score   46.1 
 7 Latin America/The Caribbean Total_Freedom_Score  85.5 
 8 Latin America/The Caribbean Real_Freedom_Score   25.9 
 9 North America               Total_Freedom_Score  15   
10 North America               Real_Freedom_Score   75   
11 Oceania                     Total_Freedom_Score  29   
12 Oceania                     Real_Freedom_Score   20.7

Comparing Real_Freedom_Score to Total_Freedom_Score

#Palette for Score_Type (Real_Freedom_Score and Total_Freedom_Score)
# #CD6155 (red), #EB984E (orange), #F4D03F (yellow), #52BE80 (green), #7FB3D5 (blue), #BB8FCE (purple)

pal3 <- c("#7FB3D5","#CD6155")

Score_All <- ggplot(data = Total_Real_Score) + 
  geom_bar(mapping = aes(x = Score_Type, y = Score, fill = Score_Type),stat = "identity") +
  facet_wrap(~ Continent, nrow = 2)+
  theme(axis.text.x=element_text(angle=25,hjust=1)) +
  scale_fill_manual(values = pal3)+
  theme(axis.title.x = element_text(margin = margin(t = 45))
  )
  

ggplotly(Score_All)

This visualization compares the Real_Freedom_Score to the Total_Freedom_Score of each Continent. Looking at just the Total_Freedom_Score, “Europe” seems to heavily outweigh the other Continents. However, the Real_Freedom_Score shows that “North America” (75) outscores “Europe” (46.1).

The Real_Freedom_Score shows what percent of its Total_Possible_Freedom_Score a Continent has achieved. However, it is also important to note that these proportions are based on ILGA World’s categorization of countries and continents. If a country were allocated to a different continent, the proportions would change.

In this data set, 2 COUNTRYs are allocated to “North America” and 48 COUNTRYs are allocated to “Europe”. It is important to look at the proportion, but also to recognize how ILGA World distributes and categorizes countries.

Processing Data for Asian COUNTRYs

I repeated the same processes and visualizations that I used for the Continents for the COUNTRYs in “Asia”.

Data frames Specific to Asia with “Yes”, “No”, and “Limited”.

#Data frame with YES, NO, and LIMITED

Asia_YN <- ilga_PR_Cont %>%
  filter( Continent == "Asia")

Asia_YN

# A tibble: 42 × 14
       N    CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
   <dbl> <dbl> <chr>     <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
 1    90     1 Asia      Afghan… NO      NO      NO      NO      NO      NO     
 2    91     2 Asia      Bahrain NO      NO      NO      NO      NO      NO     
 3    92     3 Asia      Bangla… NO      NO      NO      NO      NO      NO     
 4    93     4 Asia      Bhutan  NO      NO      NO      NO      NO      NO     
 5    94     5 Asia      Brunei… NO      NO      NO      NO      NO      NO     
 6    95     6 Asia      Cambod… NO      NO      NO      NO      NO      NO     
 7    96     7 Asia      China   NO      NO      NO      NO      NO      NO     
 8    97     8 Asia      East T… NO      NO      NO      YES     NO      NO     
 9    98     9 Asia      India   NO      NO      NO      NO      NO      NO     
10    99    10 Asia      Indone… NO      NO      NO      NO      NO      NO     
# … with 32 more rows, 4 more variables: R_SameSexMarriage <chr>,
#   R_CivilUnion <chr>, R_JointAdoptions <chr>, R_SecondParentAdoption <chr>,
#   and abbreviated variable names ¹P_BroadProt, ²P_Employ, ³P_HateCrime,
#   ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Asia Data frame with “Yes”, “No”, and “Limited” as 1, 0, 0.5 and Freedom Score

#Dataframe with Numbers and Freedom Score
Asia_FS <- ilga_PR_Cont_NumPR_FS %>%
  filter( Continent == "Asia")

Asia_FS

# A tibble: 42 × 15
# Rowwise: 
       N    CN Continent COUNTRY P_Const P_Bro…¹ P_Emp…² P_Hat…³ P_Inc…⁴ P_Ban…⁵
   <dbl> <dbl> <chr>     <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1    90     1 Asia      Afghan…       0       0       0       0       0       0
 2    91     2 Asia      Bahrain       0       0       0       0       0       0
 3    92     3 Asia      Bangla…       0       0       0       0       0       0
 4    93     4 Asia      Bhutan        0       0       0       0       0       0
 5    94     5 Asia      Brunei…       0       0       0       0       0       0
 6    95     6 Asia      Cambod…       0       0       0       0       0       0
 7    96     7 Asia      China         0       0       0       0       0       0
 8    97     8 Asia      East T…       0       0       0       1       0       0
 9    98     9 Asia      India         0       0       0       0       0       0
10    99    10 Asia      Indone…       0       0       0       0       0       0
# … with 32 more rows, 5 more variables: R_SameSexMarriage <dbl>,
#   R_CivilUnion <dbl>, R_JointAdoptions <dbl>, R_SecondParentAdoption <dbl>,
#   Freedom_Score <dbl>, and abbreviated variable names ¹P_BroadProt,
#   ²P_Employ, ³P_HateCrime, ⁴P_Incitement, ⁵P_BanConvTherapies
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Pivot Asia Dataframe

Asia_Piv <- Asia_YN %>%
  pivot_longer(P_Const:R_SecondParentAdoption, names_to = "Type", values_to = "Yes_No_Limited")

Asia_Piv

# A tibble: 420 × 6
       N    CN Continent COUNTRY     Type                   Yes_No_Limited
   <dbl> <dbl> <chr>     <chr>       <chr>                  <chr>         
 1    90     1 Asia      Afghanistan P_Const                NO            
 2    90     1 Asia      Afghanistan P_BroadProt            NO            
 3    90     1 Asia      Afghanistan P_Employ               NO            
 4    90     1 Asia      Afghanistan P_HateCrime            NO            
 5    90     1 Asia      Afghanistan P_Incitement           NO            
 6    90     1 Asia      Afghanistan P_BanConvTherapies     NO            
 7    90     1 Asia      Afghanistan R_SameSexMarriage      NO            
 8    90     1 Asia      Afghanistan R_CivilUnion           NO            
 9    90     1 Asia      Afghanistan R_JointAdoptions       NO            
10    90     1 Asia      Afghanistan R_SecondParentAdoption NO            
# … with 410 more rows
# ℹ Use `print(n = ...)` to see more rows

Observing Count of “Yes”, “No”, “Limited” for each Country in Asia

# #CD6155 (red), #EB984E (orange), #F4D03F (yellow), #52BE80 (green), #7FB3D5 (blue), #BB8FCE (purple)
 
pal4 <- c("#7FB3D5", "#CD6155", "#52BE80" )

Asia_Country <- ggplot(data = Asia_Piv) + 
  geom_bar(mapping = aes(x = COUNTRY, fill = `Yes_No_Limited`)) +
  scale_fill_manual(values = pal4) +
  coord_flip() 

ggplotly(Asia_Country)

Observing Freedom_Score for each Country in Asia

pal4 <- c("#52BE80")

# Fill the bars with the palette colour defined above
Asia_FreedomScore <- ggplot(Asia_FS, aes(x = COUNTRY, y = Freedom_Score)) +
  geom_col(width = 0.5, fill = pal4) +
  coord_flip()

ggplotly(Asia_FreedomScore)

Conclusion/Reflections

Furthering project:

Overall, my project gave insight into how representing data in different structures can vastly affect its interpretation. Because my data did not track information over time but instead gave counts, the way those counts were represented greatly changed how the data came across.

Years of Decriminalisation to Number of Protection/Recognition

I did not examine the criminalisation sections of the original data frame. The criminalisation data records whether same-sex acts are legal, the date same-sex acts were decriminalised, and the maximum penalty for same-sex acts.

It would be interesting to see whether there is a relationship between how long ago a country decriminalised same-sex acts and the number of protections and recognitions it has. I would hypothesize a positive correlation between the years since decriminalisation and the number of protections and recognitions. However, for specific categories, I believe there could be stagnation, as governments may grow complacent and pass surface-level legislation to generally appease people. Observing this relationship could inform a potential prediction model relating the number of years to the number of protections and recognitions.
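
A minimal sketch of how this comparison could start, assuming a data frame with one row per country. The column names Decrim_Year and PR_Total are hypothetical, and the values are made up for illustration only; they are not taken from the ILGA report.

```r
library(dplyr)

# Made-up illustrative values, NOT taken from the ILGA report.
# Decrim_Year and PR_Total are hypothetical column names.
toy_decrim <- tibble(
  COUNTRY     = c("Country A", "Country B", "Country C", "Country D"),
  Decrim_Year = c(1950, 1980, 2005, 2018),
  PR_Total    = c(8, 5, 2, 0)
)

report_year <- 2020  # year of the report the data comes from

# Years between decriminalisation and the report
toy_years <- toy_decrim %>%
  mutate(Years_Since_Decrim = report_year - Decrim_Year)

# Pearson correlation between years since decriminalisation and the total
decrim_cor <- cor(toy_years$Years_Since_Decrim, toy_years$PR_Total)
decrim_cor
```

A correlation on its own would not show the stagnation I expect in specific categories, so a scatter plot of the same two columns would probably be a useful next step.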

Between Protection and Recognition

Something I wanted to achieve in my analysis was to compare protections to recognitions, rather than aggregate them under one Type. However, I could not code an apt data frame from which to make a ggplot. In my basic schema, I would want protection values on the x axis and recognition values on the y axis for each country/continent. Observing the relationship between protections and recognitions could give better insight into which areas of legislation ought to be focused on and where countries worldwide lack legislative measures.
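
One possible way to build such a data frame, sketched under the assumption that the numeric (1 / 0.5 / 0) columns keep the P_ and R_ prefixes seen in the printed tibbles. The toy values and the names Protection_Total and Recognition_Total are made up for illustration and are not real ILGA data.

```r
library(dplyr)
library(ggplot2)

# Made-up values in the 1 / 0.5 / 0 coding; not real ILGA data.
toy_num <- tibble(
  COUNTRY           = c("Country A", "Country B", "Country C"),
  Continent         = c("Asia", "Asia", "Europe"),
  P_Const           = c(0, 1, 1),
  P_Employ          = c(0.5, 1, 1),
  R_SameSexMarriage = c(0, 0, 1),
  R_CivilUnion      = c(0, 0.5, 1)
)

# Sum protections and recognitions separately, without pivoting
prot_vs_recog <- toy_num %>%
  mutate(
    Protection_Total  = rowSums(across(starts_with("P_"))),
    Recognition_Total = rowSums(across(starts_with("R_")))
  ) %>%
  select(COUNTRY, Continent, Protection_Total, Recognition_Total)

# Protections on the x axis, recognitions on the y axis
prot_recog_plot <- ggplot(
  prot_vs_recog,
  aes(x = Protection_Total, y = Recognition_Total, colour = Continent)
) +
  geom_point() +
  geom_text(aes(label = COUNTRY), vjust = -1, show.legend = FALSE)
```

Printing prot_recog_plot draws the scatter plot. On the real data the same mutate() step should apply to the numeric data frame, since it only relies on the column prefixes.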

Map Data based on Protections and Recognitions

Additionally, creating a map of the different protections and recognitions could give better visual information about which areas of the world have which kinds of protections and recognitions. This kind of representation would turn abstract-sounding data into a quickly absorbable image. For example, depending on whether a country had more or fewer protections and recognitions, it could be shaded darker, lighter, or not at all.
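
A rough sketch of such a map, assuming the maps package is installed (map_data() from ggplot2 depends on it). The Placeholder_Score values below are placeholders invented for illustration, not scores from the report, and the country names in the real data would likely need harmonising with the region names in the map data before joining.

```r
library(dplyr)
library(ggplot2)

# World polygons; requires the 'maps' package to be installed
world <- map_data("world")

# Placeholder values for illustration only, NOT scores from the ILGA report
toy_scores <- tibble(
  region            = c("Japan", "India", "Mongolia"),
  Placeholder_Score = c(1, 2, 3)
)

# Countries without a score keep NA and are drawn in grey
world_scored <- world %>%
  left_join(toy_scores, by = "region")

score_map <- ggplot(
  world_scored,
  aes(x = long, y = lat, group = group, fill = Placeholder_Score)
) +
  geom_polygon(colour = "white") +
  scale_fill_gradient(low = "#D5F5E3", high = "#1E8449", na.value = "grey90") +
  coord_quickmap() +
  theme_void()
```

Printing score_map shades scored countries from light to dark green and leaves the rest grey, which matches the darker, lighter, or not at all idea described above.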

Thoughts:

Throughout this project, the greatest data lesson I learned is flexibility in approaches.

As a near first-time coder, taking on a project seemed very daunting. I was very careful to make sure that all my lines of code “made sense” and that there were no errors. There was a lot of “noise” and confusion from the seemingly endless combinations of ways to approach a task with a common end.

However, once I decided to experiment somewhat haphazardly with my data sets and code, challenges arose that ultimately helped me get over being “too scared” to code. I practiced tracing back my errors and researching solutions.

As someone who studied linguistics and political science during undergrad, I categorized coding as a topic that was unapproachable and far from my studies. However, this project gave insight into the several intersections between computational and social science studies. I better understand how visualization can be a powerful tool to represent information that is difficult to express in writing.

Planning:

Before I started any code, I physically drew out the types of visualizations I wanted and sketched a basic schema of what my data frames would look like. Though I made some adjustments to the visualizations as I completed my project, having a physical drawing of the plan definitely guided me toward what I wanted to achieve with code.

Use Country Code

For my data, I manually entered the continents for the countries, based on the original data frame. However, certain R packages can link countries to continents, which could produce more accurate results. This could also help account for the NAs in the data set, rather than having to remove them or manually enter each one into the data frame.
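
As one possibility, the countrycode package can look up a continent from a country name. This is a sketch under the assumption that the package is installed; I did not use it in this project, and its continent labels may not match the groupings used in the original data frame.

```r
library(dplyr)
library(countrycode)  # assumed to be installed; not used in the original project

# A few country names taken from the Asia rows shown earlier
countries <- tibble(
  COUNTRY = c("Afghanistan", "Bahrain", "Bhutan", "China", "India")
)

# Look up the continent for each country name
countries_cont <- countries %>%
  mutate(
    Continent = countrycode(
      COUNTRY,
      origin = "country.name",
      destination = "continent"
    )
  )

countries_cont
```

Names that the package cannot match come back as NA with a warning, so those rows would still need to be checked by hand.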

Pivoting Data:

One big issue that I had was understanding how to approach my data. I was used to pivoting my data frames for my projects, so I assumed that my data had to be pivoted from the beginning. However, thinking that I had to pivot my data right away created confusion in the way I approached my data and it made it difficult to understand what the pivoted data meant.

After thinking through my process with tutors and classmates, I realized that I did not necessarily have to pivot my data for some of the calculations and observations I wanted to make.

ggplot:

I spent two days trying to figure out my All_Continent data. The ggplot I was making made sense to me, but the visualization that was being produced did not make sense at all.

All_Continent <- ggplot(data = ilga_PIV) +
  geom_bar(mapping = aes(x = Type, fill = Yes_No_Limited)) +
  facet_wrap(~ Continent, nrow = 2) +
  coord_flip() +
  scale_fill_manual(values = pal1)

It turns out that the issue was that I used quotation marks (" ") around Yes_No_Limited instead of backticks (` `). Something so seemingly small created days-long problems. After learning this fix in a discussion with a classmate, I remembered that notes in the early tutorials mentioned similar points about the importance of syntax. Though I still have much to learn about the deeper mechanics of R, I do better understand how topics build upon each other after completing my project.
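
The difference can be seen in a small example. This is a sketch with made-up toy data, and n_fills() is a hypothetical helper written just for this illustration. Inside aes(), a quoted name is treated as a constant string, while a bare or backticked name refers to the column.

```r
library(ggplot2)

# Made-up toy data for illustration only
toy_piv <- data.frame(
  Type           = c("P_Const", "P_Const", "R_CivilUnion", "R_CivilUnion"),
  Yes_No_Limited = c("YES", "NO", "NO", "LIMITED")
)

# Quoted: "Yes_No_Limited" is a constant string, so every bar gets the same fill
p_quoted <- ggplot(toy_piv) +
  geom_bar(aes(x = Type, fill = "Yes_No_Limited"))

# Backticked (or bare): refers to the column, so the fill varies by value
p_column <- ggplot(toy_piv) +
  geom_bar(aes(x = Type, fill = `Yes_No_Limited`))

# Hypothetical helper: number of distinct fill colours in the first layer
n_fills <- function(p) length(unique(layer_data(p, 1)$fill))

n_fills(p_quoted)  # one fill colour for the whole plot
n_fills(p_column)  # one fill colour per value of Yes_No_Limited
```

Both versions run without an error, which is part of why the mistake was so hard to spot.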

Discuss, Discuss, and Discuss

One of the greatest helps in this project was the discussions I had with classmates. Taking the time to verbalize exactly what I wanted helped me better understand the minutiae of what I wanted to achieve with my code and helped me better organize my thoughts. While helping classmates with their code, I saw how I could apply some of the solutions for their code to my own project.

Bibliography

ILGA World: Lucas Ramon Mendos, Kellyn Botha, Rafael Carrano Lelis, Enrique López de la Peña, Ilia Savelev and Daron Tan, State-Sponsored Homophobia 2020: Global Legislation Overview Update (Geneva: ILGA, December 2020).

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.