Final Project Check in 2

finalpart1
kristinabijaoude
bullying
Author

Kristin Abijaoude

Published

April 21, 2023

Code
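# load packages ("cars" is not an installable package, hence the warning and FALSE below; nothing later depends on it)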
packages <- c("readr", "readxl", "summarytools", "tidyverse", "dplyr", "cars")
lapply(packages, require, character.only = TRUE)
Loading required package: readr
Loading required package: readxl
Loading required package: summarytools
Loading required package: tidyverse
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ dplyr   1.1.0
✔ tibble  3.1.8     ✔ stringr 1.5.0
✔ tidyr   1.3.0     ✔ forcats 0.5.2
✔ purrr   1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ tibble::view()  masks summarytools::view()
Loading required package: cars
Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
logical.return = TRUE, : there is no package called 'cars'
[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] TRUE

[[4]]
[1] TRUE

[[5]]
[1] TRUE

[[6]]
[1] FALSE
Code
knitr::opts_chunk$set(echo = TRUE)

Overview

Bullying continues to be a persistent problem in schools.

Types of bullying faced by those affected include physical fights, exclusion, rumors, snarky “jokes”, and name-calling. Many bullied students dread going to school because they have to face their bullies, who will find any reason, or no reason at all, to target them. Bullying can also happen outside of school, especially with today’s near-universal access to the Internet. While students are always encouraged to tell a trusted adult, such as a teacher, adults in authority have a spotty record when it comes to tackling this epidemic.

In the US alone, one in every five students reports being bullied on school grounds, including name-calling (13% among those who reported bullying), being pushed or shoved (5%), or having property destroyed on purpose (1%). 15% of students who reported bullying were cyberbullied.1 Globally, one in three students reports bullying, from as low as 7% in the Central Asian country of Tajikistan to as high as 74% in Samoa.2

The negative effects of bullying include low self-esteem, feeling angry or isolated, and distress, as well as physical effects like loss of sleep, headaches, and disordered eating. Bullying can be so detrimental to the victim that they take their own life to escape the pain.3

When discussing ways to combat bullying, it’s too simplistic to say that “kids are just cruel”. My purpose is to find out why some students are more vulnerable to being targets of bullying, and how we can use those factors to create solutions to end bullying once and for all.

Hypotheses and Proposed Models

For each hypothesis, I will specify which model I use to test it. In this project, I explore the relationship between the variables below and bullying.

  • Ha: Students who report loneliness and fewer friends are more vulnerable to being targets of bullying.

  • Ha: Male students are more likely than female students to face physical abuse by bullies.

  • Ha: More female students who report bullying are targeted for being underweight, while male students who report bullying are targeted for being overweight.

  • Ha: Students who face more physical attacks on school grounds are more likely to miss school.

  • Ha: Students in primary school tend to be more engaged in some form of physical attacks than students in secondary school.

Data Summary

Code
bullying <- read_xlsx("_data/Bullying.xlsx",
                   range = cell_rows(2:56982))
bullying
# A tibble: 56,980 × 18
   record Bullie…¹ Bulli…² Cyber…³ Custo…⁴ Sex   Physi…⁵ Physi…⁶ Felt_…⁷ Close…⁸
    <dbl> <chr>    <chr>   <chr>   <chr>   <chr> <chr>   <chr>   <chr>   <chr>  
 1      1 Yes      Yes     <NA>    13 yea… Fema… 0 times 0 times Always  2      
 2      2 No       No      No      13 yea… Fema… 0 times 0 times Never   3 or m…
 3      3 No       No      No      14 yea… Male  0 times 0 times Never   3 or m…
 4      4 No       No      No      16 yea… Male  0 times 2 or 3… Never   3 or m…
 5      5 No       No      No      13 yea… Fema… 0 times 0 times Rarely  3 or m…
 6      6 No       No      No      13 yea… Male  0 times 1 time  Never   3 or m…
 7      7 No       No      No      14 yea… Fema… 1 time  0 times Someti… 3 or m…
 8      8 No       No      No      12 yea… Fema… 0 times 0 times Rarely  3 or m…
 9      9 No       No      No      13 yea… Male  1 time  2 or 3… Never   3 or m…
10     10 Yes      No      No      14 yea… Fema… 0 times 0 times Always  0      
# … with 56,970 more rows, 8 more variables: Miss_school_no_permission <chr>,
#   Other_students_kind_and_helpful <chr>, Parents_understand_problems <chr>,
#   Most_of_the_time_or_always_felt_lonely <chr>,
#   Missed_classes_or_school_without_permission <chr>, Were_underweight <chr>,
#   Were_overweight <chr>, Were_obese <chr>, and abbreviated variable names
#   ¹​Bullied_on_school_property_in_past_12_months,
#   ²​Bullied_not_on_school_property_in_past_12_months, …

This 2018 study was conducted as part of the Global School-Based Student Health Survey (GSHS), in which 56,981 students from Argentina filled out a questionnaire regarding their mental health and behavior.4 The range read in above yields 56,980 rows.

Code
dim(bullying) # 56980 rows and 18 columns
[1] 56980    18
Code
print(dfSummary(bullying,
                        varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

bullying

Dimensions: 56980 x 18
Duplicates: 0
Each entry lists the variable, its stats or values with frequencies (% of valid), and the missing count.

record [numeric]
  Mean (sd): 28534.9 (16479.7)
  min ≤ med ≤ max: 1 ≤ 28521.5 ≤ 57094
  IQR (CV): 28540.5 (0.6)
  56980 distinct values
  Missing: 0 (0.0%)

Bullied_on_school_property_in_past_12_months [character]
  1. No: 43838 (78.6%)
  2. Yes: 11903 (21.4%)
  Missing: 1239 (2.2%)

Bullied_not_on_school_property_in_past_12_months [character]
  1. No: 44263 (78.4%)
  2. Yes: 12228 (21.6%)
  Missing: 489 (0.9%)

Cyber_bullied_in_past_12_months [character]
  1. No: 44213 (78.4%)
  2. Yes: 12196 (21.6%)
  Missing: 571 (1.0%)

Custom_Age [character]
  1. 11 years old or younger: 48 (0.1%)
  2. 12 years old: 145 (0.3%)
  3. 13 years old: 10574 (18.6%)
  4. 14 years old: 12946 (22.8%)
  5. 15 years old: 12812 (22.5%)
  6. 16 years old: 11737 (20.6%)
  7. 17 years old: 8227 (14.5%)
  8. 18 years old or older: 383 (0.7%)
  Missing: 108 (0.2%)

Sex [character]
  1. Female: 29361 (52.0%)
  2. Male: 27083 (48.0%)
  Missing: 536 (0.9%)

Physically_attacked [character]
  1. 0 times: 46996 (82.8%)
  2. 1 time: 5248 (9.2%)
  3. 10 or 11 times: 115 (0.2%)
  4. 12 or more times: 790 (1.4%)
  5. 2 or 3 times: 2405 (4.2%)
  6. 4 or 5 times: 695 (1.2%)
  7. 6 or 7 times: 302 (0.5%)
  8. 8 or 9 times: 189 (0.3%)
  Missing: 240 (0.4%)

Physical_fighting [character]
  1. 0 times: 43245 (76.3%)
  2. 1 time: 6932 (12.2%)
  3. 10 or 11 times: 165 (0.3%)
  4. 12 or more times: 939 (1.7%)
  5. 2 or 3 times: 3650 (6.4%)
  6. 4 or 5 times: 1028 (1.8%)
  7. 6 or 7 times: 489 (0.9%)
  8. 8 or 9 times: 264 (0.5%)
  Missing: 268 (0.5%)

Felt_lonely [character]
  1. Always: 3120 (5.5%)
  2. Most of the time: 6422 (11.3%)
  3. Never: 17931 (31.7%)
  4. Rarely: 14427 (25.5%)
  5. Sometimes: 14714 (26.0%)
  Missing: 366 (0.6%)

Close_friends [character]
  1. 0: 3331 (6.0%)
  2. 1: 4732 (8.5%)
  3. 2: 9110 (16.3%)
  4. 3 or more: 38731 (69.3%)
  Missing: 1076 (1.9%)

Miss_school_no_permission [character]
  1. 0 days: 38654 (70.1%)
  2. 1 or 2 days: 9738 (17.7%)
  3. 10 or more days: 1468 (2.7%)
  4. 3 to 5 days: 3925 (7.1%)
  5. 6 to 9 days: 1331 (2.4%)
  Missing: 1864 (3.3%)

Other_students_kind_and_helpful [character]
  1. Always: 9710 (17.5%)
  2. Most of the time: 15820 (28.5%)
  3. Never: 4775 (8.6%)
  4. Rarely: 10966 (19.8%)
  5. Sometimes: 14150 (25.5%)
  Missing: 1559 (2.7%)

Parents_understand_problems [character]
  1. Always: 13072 (23.9%)
  2. Most of the time: 9570 (17.5%)
  3. Never: 11964 (21.9%)
  4. Rarely: 10459 (19.2%)
  5. Sometimes: 9542 (17.5%)
  Missing: 2373 (4.2%)

Most_of_the_time_or_always_felt_lonely [character]
  1. No: 47072 (83.1%)
  2. Yes: 9542 (16.9%)
  Missing: 366 (0.6%)

Missed_classes_or_school_without_permission [character]
  1. No: 38654 (70.1%)
  2. Yes: 16462 (29.9%)
  Missing: 1864 (3.3%)

Were_underweight [character]
  1. No: 35318 (98.0%)
  2. Yes: 733 (2.0%)
  Missing: 20929 (36.7%)

Were_overweight [character]
  1. No: 25376 (70.4%)
  2. Yes: 10675 (29.6%)
  Missing: 20929 (36.7%)

Were_obese [character]
  1. No: 33396 (92.6%)
  2. Yes: 2655 (7.4%)
  Missing: 20929 (36.7%)

Generated by summarytools 1.0.1 (R version 4.2.2)
2023-04-23

Tidying Dataset

Code
# shortening variable names
bully <- bullying %>%
  rename("bullied_at_school" = Bullied_on_school_property_in_past_12_months,
         "bullied_outside_school" = Bullied_not_on_school_property_in_past_12_months,
         "cyberbullied" = Cyber_bullied_in_past_12_months,
         "grade" = Custom_Age,
         "missed_school" = Miss_school_no_permission,
         "help_from_peers" = Other_students_kind_and_helpful,
         "parents_help" = Parents_understand_problems,
         "underweight" = Were_underweight,
         "overweight" = Were_overweight) # for the purpose of this project, I will conflate overweight with obese

# removing repetitive and unneeded variables
bully <- bully %>%
  select(-c("Were_obese", "Missed_classes_or_school_without_permission", "Most_of_the_time_or_always_felt_lonely"))

There is a lot of missing data in the dataset, with some variables missing about 37% of their values. For easier management, I will convert the binary variables into dummy variables, with NAs being treated as no or 0.

Code
# replace NAs with 0
bully$bullied_at_school[is.na(bully$bullied_at_school)] <- 0
bully$bullied_outside_school[is.na(bully$bullied_outside_school)] <- 0
bully$cyberbullied[is.na(bully$cyberbullied)] <- 0
bully$grade[is.na(bully$grade)] <- 0
bully$Sex[is.na(bully$Sex)] <- 0
bully$Physically_attacked[is.na(bully$Physically_attacked)] <- 0
bully$Physical_fighting[is.na(bully$Physical_fighting)] <- 0
bully$Felt_lonely[is.na(bully$Felt_lonely)] <- 0
bully$Close_friends[is.na(bully$Close_friends)] <- 0
bully$missed_school[is.na(bully$missed_school)] <- 0
bully$help_from_peers[is.na(bully$help_from_peers)] <- 0
bully$parents_help[is.na(bully$parents_help)] <- 0
bully$underweight[is.na(bully$underweight)] <- 0
bully$overweight[is.na(bully$overweight)] <- 0

# let's count the amount of missing data by variable
colSums(is.na(bully))
                record      bullied_at_school bullied_outside_school 
                     0                      0                      0 
          cyberbullied                  grade                    Sex 
                     0                      0                      0 
   Physically_attacked      Physical_fighting            Felt_lonely 
                     0                      0                      0 
         Close_friends          missed_school        help_from_peers 
                     0                      0                      0 
          parents_help            underweight             overweight 
                     0                      0                      0 
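
The effect of this step on a character column can be seen on a small made-up vector (the values below are illustrative, not taken from the dataset). Assigning 0 into a character vector stores the string "0", which a later `== "Yes"` comparison then maps to 0.

```r
# Toy character column with one missing response (illustrative values only)
x <- c("Yes", "No", NA, "Yes")

# Assigning 0 into a character vector coerces it to the string "0"
x[is.na(x)] <- 0
x
# [1] "Yes" "No"  "0"   "Yes"

# A dummy coding based on == "Yes" then treats the former NA like a "No"
dummy <- ifelse(x == "Yes", 1, 0)
dummy
# [1] 1 0 0 1
```

A consequence worth keeping in mind is that students who did not answer are counted together with students who answered "No".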

For the purpose of this project, 1 means Yes or more than 0, and 0 means No or 0.

Code
# Were you bullied on school grounds at one point in the past 12 months?
bully$bullied_at_school <- ifelse(bully$bullied_at_school == "Yes", 1, 0)

# Were you bullied outside of school at one point in the last 12 months?
bully$bullied_outside_school <- ifelse(bully$bullied_outside_school == "Yes", 1, 0)

# Were you cyberbullied at one point in the past 12 months?
bully$cyberbullied <- ifelse(bully$cyberbullied == "Yes", 1,0)

# Are you underweight?
bully$underweight <- ifelse(bully$underweight == "Yes", 1,0)

# Are you overweight or obese?
bully$overweight <- ifelse(bully$overweight == "Yes", 1,0)

# Are you Male or Female?
bully$Sex <- ifelse(bully$Sex == "Male", 1,0) # Male is 1, female is 0

Not all variables have binary responses. For the same reason I converted the binary variables into dummy variables, I will code the ordinal responses as integers, with NAs again treated as 0.

Code
# How often are fellow students helpful towards you?
bully <- bully %>% 
       mutate(help_from_peers = case_when(
         help_from_peers == "Never" ~ 0,
         help_from_peers == "Rarely" ~ 1,
         help_from_peers == "Sometimes" ~ 2,
         help_from_peers == "Most of the time" ~ 3,
         help_from_peers == "Always" ~ 4,
         TRUE ~ 0)) 

# How often have you felt lonely?
bully <- bully %>% 
       mutate(Felt_lonely = case_when(
         Felt_lonely == "Never" ~ 0,
         Felt_lonely == "Rarely" ~ 1,
         Felt_lonely == "Sometimes" ~ 2,
         Felt_lonely == "Most of the time" ~ 3,
         Felt_lonely == "Always" ~ 4,
         TRUE ~ 0)) 

#   How helpful and understanding are your parents?
bully <- bully %>% 
       mutate(parents_help = case_when(
         parents_help == "Never" ~ 0,
         parents_help == "Rarely" ~ 1,
         parents_help == "Sometimes" ~ 2,
         parents_help == "Most of the time" ~ 3,
         parents_help == "Always" ~ 4,
         TRUE ~ 0)) 

# How many times were you physically attacked?
bully <- bully %>% 
       mutate(Physically_attacked = case_when(
         Physically_attacked == "0 times" ~ 0,
         Physically_attacked == "1 time" ~ 1,
         Physically_attacked == "2 or 3 times" ~ 2,
         Physically_attacked == "4 or 5 times" ~ 3,
         Physically_attacked == "6 or 7 times" ~ 4,
         Physically_attacked == "8 or 9 times" ~ 5,
         Physically_attacked == "10 or 11 times" ~ 6,
         Physically_attacked == "12 or more times" ~ 7,
         TRUE ~ 0)) 

# How many times were you involved in some form of physical fighting?
bully <- bully %>% 
       mutate(Physical_fighting = case_when(
         Physical_fighting == "0 times" ~ 0,
         Physical_fighting == "1 time" ~ 1,
         Physical_fighting == "2 or 3 times" ~ 2,
         Physical_fighting == "4 or 5 times" ~ 3,
         Physical_fighting == "6 or 7 times" ~ 4,
         Physical_fighting == "8 or 9 times" ~ 5,
         Physical_fighting == "10 or 11 times" ~ 6,
         Physical_fighting == "12 or more times" ~ 7,
         TRUE ~ 0)) 

# How many close friends do you have?
bully <- bully %>% 
       mutate(Close_friends = case_when(
         Close_friends == "0" ~ 0,
         Close_friends == "1" ~ 1,
         Close_friends == "2" ~ 2,
         Close_friends == "3 or more" ~ 3,
         TRUE ~ 0)) 

# How many days have you missed school?
bully <- bully %>% 
       mutate(missed_school = case_when(
         missed_school == "0 days" ~ 0,
         missed_school == "1 or 2 days" ~ 1,
         missed_school == "3 to 5 days" ~ 2,
         missed_school == "6 to 9 days" ~ 3,
         missed_school == "10 or more days" ~ 4,
         TRUE ~ 0)) 
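
The recoding blocks above all follow the same pattern: an ordered set of labels is mapped to 0, 1, 2, and so on, and anything else becomes 0. A minimal base R sketch of that pattern is below. `code_ordinal` is a hypothetical helper, not part of the original analysis, and the sample responses are made up.

```r
# Hypothetical helper: map ordered labels to 0, 1, 2, ...;
# unmatched values (including NA) become 0, as in the case_when() blocks
code_ordinal <- function(x, levels) {
  codes <- match(x, levels) - 1
  codes[is.na(codes)] <- 0
  codes
}

frequency_levels <- c("Never", "Rarely", "Sometimes", "Most of the time", "Always")

# Made-up responses for illustration
responses <- c("Never", "Always", NA, "Sometimes")
coded <- code_ordinal(responses, frequency_levels)
coded
# [1] 0 4 0 2
```

Keeping the label order in one vector makes it easier to spot a label that does not match the data, such as "0" versus "0 days".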

For this project, I decided to turn the custom age variable into a dummy variable for whether a student is in primary or secondary school. In Argentina, students aged 6 to 14 attend primary school, while older students attend secondary school and beyond.5 Here, 1 means primary school and 0 means secondary school (students with a missing age are also coded 0).

Code
# create the primary_school dummy and drop the original age variable
bully <- bully %>%
  mutate(primary_school = as.integer(grade %in%
                                  c("11 years old or younger", "12 years old", "13 years old", "14 years old"))) %>%
  select(-grade)

colnames(bully)
 [1] "record"                 "bullied_at_school"      "bullied_outside_school"
 [4] "cyberbullied"           "Sex"                    "Physically_attacked"   
 [7] "Physical_fighting"      "Felt_lonely"            "Close_friends"         
[10] "missed_school"          "help_from_peers"        "parents_help"          
[13] "underweight"            "overweight"             "primary_school"        

Above, we have all the relevant variables on hand, which is the next step towards effectively testing my hypotheses.

Variables

The dependent variables will be the following:

  1. bullied_at_school: Were you bullied on school grounds at one point in the past 12 months?
  2. bullied_outside_school: Were you bullied outside of school at one point in the last 12 months?
  3. Physically_attacked: How many times were you physically attacked?
  4. Physical_fighting: How many times were you involved in some form of physical fighting?
  5. missed_school: How many days have you missed school?
  6. cyberbullied: Were you cyberbullied at one point in the past 12 months?

The independent variables are the following:

  1. Felt_lonely: How often have you felt lonely?
  2. Close_friends: How many close friends do you have?
  3. help_from_peers: How often are fellow students helpful towards you?
  4. parents_help: How helpful and understanding are your parents?
  5. underweight: Are you underweight?
  6. overweight: Are you overweight or obese?

The control variables are the following:

  1. Sex: Male or Female
  2. primary_school: Are you in primary or secondary school?

Visualizations

Before I test the hypotheses, I thought I would visualize the data distribution for each variable.

Code
# Visualize the dataset
# trim dataset first
bully_set <- bully[,2:14]

# loop over every column (starting at the first, so bullied_at_school is not skipped)
for (i in seq_along(bully_set)){
  hist(bully_set[[i]], main = names(bully_set)[i], xlab = names(bully_set)[i], col = 'lightblue')
  box(lty = "solid")
}

Testing My Hypotheses

Hypothesis #1 Loneliness and Bullying

Ha: Students who report loneliness and fewer friends are more vulnerable to being targets of bullying.

Since I am dealing with more than one independent variable, I’m going to use multiple linear regression with Felt_lonely and Close_friends as predictors. I also compute Pearson’s correlation between the fitted and observed values to see how closely the model tracks the outcome.

Bullying on School Grounds

Code
# Multiple Regression
lonely_fit <- lm(bullied_at_school ~ Felt_lonely + Close_friends, data = bully)
summary(lonely_fit)

Call:
lm(formula = bullied_at_school ~ Felt_lonely + Close_friends, 
    data = bully)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4141 -0.2466 -0.1807 -0.1149  0.8851 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.150674   0.005337  28.232  < 2e-16 ***
Felt_lonely    0.065849   0.001421  46.335  < 2e-16 ***
Close_friends -0.011935   0.001806  -6.608 3.93e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3984 on 56977 degrees of freedom
Multiple R-squared:  0.03963,   Adjusted R-squared:  0.03959 
F-statistic:  1175 on 2 and 56977 DF,  p-value: < 2.2e-16
Code
# Pearson's Correlation
y_hat <- lonely_fit$fitted.values
y <- lonely_fit$model$bullied_at_school
print(cor.test(y_hat, y)$estimate)
      cor 
0.1990651 

Both coefficients are significant at the 0.001 level (three asterisks): loneliness is positively associated with being bullied on school grounds, and the number of close friends is negatively associated with it. The p-value for Felt_lonely is so small that R reports it only as < 2e-16. The model explains little of the variation, however, with an R-squared of about 0.04.
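
For a least-squares model with an intercept, the Pearson correlation between fitted and observed values equals the square root of R-squared. Here 0.1990651 squared is about 0.0396, which matches the Multiple R-squared reported above. The sketch below checks this identity on simulated data. The variable names and the simulated coefficients are assumptions for illustration only.

```r
set.seed(1)

# Simulated stand-ins for the coded survey variables (not the real data)
n <- 500
lonely  <- sample(0:4, n, replace = TRUE)
friends <- sample(0:3, n, replace = TRUE)
bullied <- rbinom(n, 1, plogis(-2 + 0.4 * lonely - 0.1 * friends))

fit <- lm(bullied ~ lonely + friends)

r_fitted  <- cor(fitted(fit), bullied)  # correlation between fitted and observed values
r_squared <- summary(fit)$r.squared     # R-squared reported by summary()

# The squared correlation and R-squared agree up to floating-point error
all.equal(r_fitted^2, r_squared)
```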

Code
plot(lonely_fit)

Outside of School

Code
# Multiple Regression
lonely_fit2 <- lm(bullied_outside_school ~ Felt_lonely + Close_friends, data = bully)
summary(lonely_fit2)

Call:
lm(formula = bullied_outside_school ~ Felt_lonely + Close_friends, 
    data = bully)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4115 -0.2579 -0.1891 -0.1204  0.8796 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.136588   0.005387  25.355   <2e-16 ***
Felt_lonely    0.068724   0.001434  47.908   <2e-16 ***
Close_friends -0.005392   0.001823  -2.958   0.0031 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4021 on 56977 degrees of freedom
Multiple R-squared:  0.04063,   Adjusted R-squared:  0.04059 
F-statistic:  1206 on 2 and 56977 DF,  p-value: < 2.2e-16
Code
# Pearson's Correlation
y_hat2 <- lonely_fit2$fitted.values
y2 <- lonely_fit2$model$bullied_outside_school
print(cor.test(y_hat2, y2)$estimate)
    cor 
0.20156 
Code
plot(lonely_fit2)

Cyberbullying

Code
# Multiple Regression
lonely_fit3 <- lm(cyberbullied ~ Felt_lonely + Close_friends, data = bully)
summary(lonely_fit3)

Call:
lm(formula = cyberbullied ~ Felt_lonely + Close_friends, data = bully)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4346 -0.2639 -0.1858 -0.1078  0.8922 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.122406   0.005349  22.883  < 2e-16 ***
Felt_lonely    0.078040   0.001424  54.786  < 2e-16 ***
Close_friends -0.004877   0.001810  -2.694  0.00706 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3993 on 56977 degrees of freedom
Multiple R-squared:  0.05221,   Adjusted R-squared:  0.05217 
F-statistic:  1569 on 2 and 56977 DF,  p-value: < 2.2e-16
Code
# Pearson's Correlation
y_hat3 <- lonely_fit3$fitted.values
y3 <- lonely_fit3$model$cyberbullied
print(cor.test(y_hat3, y3)$estimate)
    cor 
0.22849 
Code
plot(lonely_fit3)
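
Because the three outcomes in this section are 0/1 indicators, a logistic regression is a common alternative to the linear models used above. The following is only a sketch of that swapped-in technique on simulated data (the coefficients and variable names are made up), not a model this analysis reports.

```r
set.seed(2)

# Simulated stand-ins for the coded survey variables (not the real data)
n <- 1000
lonely  <- sample(0:4, n, replace = TRUE)
friends <- sample(0:3, n, replace = TRUE)
bullied <- rbinom(n, 1, plogis(-2 + 0.4 * lonely - 0.1 * friends))

# Logistic regression: models the log-odds of the 0/1 outcome
logit_fit <- glm(bullied ~ lonely + friends, family = binomial)

# Exponentiated coefficients are odds ratios per one-step increase in a predictor
odds_ratios <- exp(coef(logit_fit))
odds_ratios

# Predicted probabilities stay between 0 and 1
prob_range <- range(fitted(logit_fit))
prob_range
```

Unlike the linear fits, whose residuals above run from about -0.4 to 0.9, the logistic model keeps every prediction inside the valid probability range.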

Hypothesis #2 Sex and Physical Bullying

Ha: Male students are more likely than female students to face physical abuse by bullies.

Code
bully$Sex <- as.character(bully$Sex)
bully$Physically_attacked <- as.character(bully$Physically_attacked)

abuse <- table(bully$Sex, bully$Physically_attacked)

chisq.test(abuse)

    Pearson's Chi-squared test

data:  abuse
X-squared = 125.09, df = 7, p-value < 2.2e-16
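
The chi-squared test only indicates that the distribution of attack counts differs by sex. It does not say in which direction. A minimal sketch of how row proportions and expected counts can be inspected follows, using a small made-up 2 x 3 table rather than the survey data.

```r
# Made-up counts for illustration (rows: sex, columns: attack frequency)
toy <- matrix(c(80, 15,  5,
                70, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Female", "Male"),
                              Attacked = c("0 times", "1 time", "2 or more")))

test <- chisq.test(toy)

# Share of each response within each sex (each row sums to 1)
row_shares <- prop.table(toy, margin = 1)
row_shares

# Counts expected if sex and attack frequency were independent
test$expected
```

Comparing `row_shares` across the two rows shows which group reports more attacks, which is the part of the hypothesis the test statistic alone does not answer.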
Code
# named abuse_counts so the object does not mask base::table
abuse_counts <- data.frame(with(bully, table(Sex, Physically_attacked)))

ggplot(abuse_counts, aes(x = Sex, y = Freq, fill = Physically_attacked)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_discrete(name = "Physically_attacked", labels = c('0 times', '1 time', "2 or 3 times", "4 or 5 times", "6 or 7 times", "8 or 9 times", "10 or 11 times", "12 or more times")) +
  xlab("Sex (0 - Female and 1 - Male)") +
  ylab("Number of student responses") +
  ggtitle("How often were you physically attacked?")

Hypothesis #3 Size and Bullying

Ha: More female students who report bullying are targeted for being underweight, while male students who report bullying are targeted for being overweight.

Bullying on School Grounds

Code
bullyANOVA <- aov(bullied_at_school ~ overweight + underweight + Sex + overweight:underweight, data = bully)
print(summary(bullyANOVA))
               Df Sum Sq Mean Sq F value   Pr(>F)    
overweight      1      2   1.919   11.65 0.000642 ***
underweight     1      1   0.550    3.34 0.067614 .  
Sex             1     31  31.100  188.85  < 2e-16 ***
Residuals   56976   9383   0.165                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
plot(bullyANOVA)

Outside of School

Code
bullyANOVA2 <- aov(bullied_outside_school ~ overweight + underweight + Sex + overweight:underweight, data = bully)
print(summary(bullyANOVA2))
               Df Sum Sq Mean Sq F value Pr(>F)    
overweight      1      0   0.176   1.050  0.306    
underweight     1      0   0.120   0.716  0.398    
Sex             1     25  25.473 151.530 <2e-16 ***
Residuals   56976   9578   0.168                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
plot(bullyANOVA2)

Cyberbullying

Code
bullyANOVA3 <- aov(cyberbullied ~ overweight + underweight + Sex + overweight:underweight, data = bully)
print(summary(bullyANOVA3))
               Df Sum Sq Mean Sq  F value Pr(>F)    
overweight      1      0    0.03    0.182  0.670    
underweight     1      0    0.00    0.022  0.881    
Sex             1    180  179.63 1088.110 <2e-16 ***
Residuals   56976   9406    0.17                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
plot(bullyANOVA3)
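
The hypothesis is about weight status mattering differently for female and male students, which in a model formula corresponds to an interaction between sex and a weight indicator. The sketch below shows that syntax on simulated data. The effect sizes are invented and the model is an illustration, not a result from this analysis.

```r
set.seed(3)

# Simulated stand-ins for the coded survey variables (not the real data)
n <- 2000
sex        <- rbinom(n, 1, 0.5)   # 1 = male, 0 = female
overweight <- rbinom(n, 1, 0.3)
bullied    <- rbinom(n, 1, 0.15 + 0.05 * overweight * sex)

# sex * overweight expands to sex + overweight + sex:overweight
interaction_fit <- aov(bullied ~ sex * overweight)
summary(interaction_fit)

# The sex:overweight row tests whether the overweight effect differs by sex
term_names <- trimws(rownames(summary(interaction_fit)[[1]]))
term_names
```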

Hypothesis #4 Physical Attacks and Missing School

Ha: Students who face more physical attacks on school grounds are more likely to miss school.

Code
attacks <- select(bully, c(Physically_attacked))
school <-  select(bully, c(missed_school))

attacks <- as.numeric(unlist(attacks))
school <- as.numeric(unlist(school))

missed <- t.test(school, attacks, var.equal = FALSE, alternative = "two.sided")
print(missed)

    Welch Two Sample t-test

data:  school and attacks
t = 20.562, df = 111036, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1101395 0.1333495
sample estimates:
mean of x mean of y 
0.4818006 0.3600562 
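
The t-test above compares the average coded values of the two variables. To ask whether students with more attacks also tend to miss more school, a rank correlation is one option. The sketch below uses Spearman's method on simulated ordinal codes. The data are made up and the technique is swapped in for illustration, so it is not a result of this analysis.

```r
set.seed(4)

# Simulated ordinal codes (not the real data)
n <- 1000
attacks_sim <- sample(0:7, n, replace = TRUE,
                      prob = c(0.80, 0.10, 0.04, 0.02, 0.01, 0.01, 0.01, 0.01))
missed_sim  <- pmin(4, rpois(n, 0.3 + 0.2 * attacks_sim))

# Spearman's rank correlation suits ordinal codes;
# exact = FALSE avoids the warning about ties
rank_test <- cor.test(attacks_sim, missed_sim, method = "spearman", exact = FALSE)
rank_test$estimate
rank_test$p.value
```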
Code
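# Physically_attacked was converted to character above, so convert it back for the histogram
ggplot(bully, aes(x = as.numeric(Physically_attacked), fill = factor(missed_school))) +
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 1) +
  geom_vline(xintercept = mean(attacks), linetype = "dashed") +  # mean of the coded attack variable
  xlab("Physically attacked (coded 0 to 7)") +
  ylab("Number of student responses") +
  labs(fill = "Missed school (coded 0 to 4)")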

Footnotes

  1. https://www.stopbullying.gov/resources/facts

  2. http://uis.unesco.org/en/news/new-sdg-4-data-bullying

  3. https://www.ncbi.nlm.nih.gov/books/NBK390414/

  4. https://www.kaggle.com/datasets/leomartinelli/bullying-in-schools?datasetId=2952457

  5. https://en.wikipedia.org/wiki/Education_in_Argentina