hw1
Paritosh G
Template of course blog qmd file
Author

Paritosh G

Published

April 5, 2023

Research Question

Identifying the factors that make students most vulnerable to becoming victims of bullying.

Loading the libraries

Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.1     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(stringr)
library(rmarkdown)
library(knitr)

Reading the data

Code
data <- read.csv("_data/Bullying_2018.csv", sep = ";")

Replacing cells that contain only a single space with “NA”

Code
data[data == " "] <- NA
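As a minimal sketch of what this replacement does and does not catch, the toy data frame below (made-up values, not the survey data) compares the exact-match approach with a broader variant that trims each cell first. The names `toy`, `exact` and `broad` are only for illustration.

```r
# Toy data frame standing in for the survey data (values are made up).
toy <- data.frame(
  age = c("13 years old", " ", "15 years old"),
  sex = c("Male", "", "  "),
  stringsAsFactors = FALSE
)

# The approach used above: only cells that are exactly one space become NA.
exact <- toy
exact[exact == " "] <- NA

# A broader variant: trim every column, then treat empty strings as NA.
broad <- toy
broad[] <- lapply(broad, function(col) {
  col <- trimws(col)
  col[col == ""] <- NA
  col
})

# Count the missing values each approach produces, column by column.
colSums(is.na(exact))
colSums(is.na(broad))
```

Whether the broader variant is needed depends on how blanks are encoded in the raw file, which is worth checking before choosing one.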

Stripping the trailing text from the responses so that only the leading number remains: “years old” from the column “Custom_Age”; “time”, “times”, “or…” from the columns “Physically_attacked”, “Physical_fighting”, “Close_friends”; and “day”, “days”, “to…”, “or…” from the column “Miss_school_no_permission”.

Using both str_sub and sub

Code
data$Custom_Age <- str_sub(data$Custom_Age,1,2)
data$Physically_attacked <- sub(" .*", "", data$Physically_attacked)
data$Physical_fighting <- sub(" .*","",data$Physical_fighting)
data$Close_friends <- str_sub(data$Close_friends,1,2)
data$Miss_school_no_permission <- sub(" .*","",data$Miss_school_no_permission)
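To see how the two approaches behave, here is a small base-R sketch on made-up response strings (the real survey wording may differ). `substr()` is used as the base-R counterpart of `str_sub()` so that the example runs without extra packages.

```r
# Made-up response strings in the style described above.
age     <- c("13 years old", "18 years old or older")
attacks <- c("0 times", "2 or 3 times", "12 or more times")

# Fixed-position extraction: keep characters 1 to 2, like str_sub(x, 1, 2).
age_num <- substr(age, 1, 2)

# Pattern-based extraction: drop everything from the first space onward,
# which also copes with numbers of differing width.
attacks_num <- sub(" .*", "", attacks)

# An alternative that captures the leading digits explicitly.
attacks_digits <- sub("^([0-9]+).*", "\\1", attacks)

age_num
attacks_num
attacks_digits
```

The fixed-position version can leave a trailing space behind when the number has only one digit, while the pattern-based version does not.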

Columns such as “Felt_lonely”, “Other_students_kind_and_helpful”, and “Parents_understand_problems” have five levels of text responses, which are replaced with numbers as follows:

1) “Never” <- 1,

2) “Rarely” <- 2,

3) “Sometimes” <- 3,

4) “Most of the time” <- 4,

5) “Always” <- 5.

Code
data[data == "Never"] <- 1
data[data == "Rarely"] <- 2
data[data == "Sometimes"] <- 3
data[data == "Most of the time"] <- 4
data[data == "Always"] <- 5
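The same mapping can also be written as a lookup table and applied to one column at a time. This is only a sketch of an alternative, using a made-up vector `felt_lonely` rather than the real column.

```r
# Lookup table taken from the five-level mapping listed above.
likert <- c("Never" = 1L, "Rarely" = 2L, "Sometimes" = 3L,
            "Most of the time" = 4L, "Always" = 5L)

# Made-up responses for illustration, including one missing value.
felt_lonely <- c("Never", "Always", "Sometimes", NA, "Most of the time")

# Index the lookup by label; unname() drops the labels from the result.
felt_lonely_num <- unname(likert[felt_lonely])
felt_lonely_num
```

Working column by column keeps the recode away from other columns that might happen to contain the same words.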

Replacing “Yes” with 1 and “No” with 0 in the columns “Bullied_on_school_property_in_past_12_months”, “Bullied_not_on_school_property_in_past_12_months”, “Cyber_bullied_in_past_12_months”, “Most_of_the_time_or_always_felt_lonely”, “Missed_classes_or_school_without_permission”.

Code
data[data == "Yes"] <- 1
data[data == "No"] <- 0
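A column-restricted version of this recode might look like the sketch below. The toy data frame and the vector `yes_no_cols` are assumptions for illustration; only the column names come from the text.

```r
# Toy data frame with one yes/no column and one unrelated column.
toy <- data.frame(
  Cyber_bullied_in_past_12_months = c("Yes", "No", NA),
  Sex = c("Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# Columns to recode (hypothetical selection for this example).
yes_no_cols <- "Cyber_bullied_in_past_12_months"

# Recode only the listed columns; missing values stay missing.
toy[yes_no_cols] <- lapply(toy[yes_no_cols], function(col) {
  ifelse(col == "Yes", 1L, 0L)
})

toy
```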

Converting the categorical columns to factors, as required by the model

Code
data$Bullied_on_school_property_in_past_12_months <- as.factor(data$Bullied_on_school_property_in_past_12_months)
data$Bullied_not_on_school_property_in_past_12_months <- as.factor(data$Bullied_not_on_school_property_in_past_12_months)
data$Cyber_bullied_in_past_12_months <- as.factor(data$Cyber_bullied_in_past_12_months)
data$Sex <- as.factor(data$Sex)
data$Felt_lonely <- as.factor(data$Felt_lonely)
data$Other_students_kind_and_helpful <- as.factor(data$Other_students_kind_and_helpful)
data$Parents_understand_problems <- as.factor(data$Parents_understand_problems)
data$Most_of_the_time_or_always_felt_lonely <- as.factor(data$Most_of_the_time_or_always_felt_lonely)
data$Missed_classes_or_school_without_permission <- as.factor(data$Missed_classes_or_school_without_permission)
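The repeated assignments can be collapsed into one step by listing the columns and applying `as.factor()` to each. The sketch below uses a made-up data frame with three of the column names from the text. Under R's default treatment contrasts the first factor level acts as the baseline, which is why the summaries further down show rows such as Felt_lonely2 to Felt_lonely5 and SexMale but no row for the first level.

```r
# Toy data frame (values are made up).
toy <- data.frame(
  Sex = c("Male", "Female", "Female"),
  Felt_lonely = c("1", "3", "5"),
  Custom_Age = c("13", "14", "15"),
  stringsAsFactors = FALSE
)

# Columns to treat as categorical.
factor_cols <- c("Sex", "Felt_lonely")

# Convert all listed columns in one pass; other columns are untouched.
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)

str(toy)
levels(toy$Felt_lonely)  # the first level is the reference category in glm()
```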

Converting the count columns to integers, as required by the model.

Code
data$Custom_Age <- as.integer(data$Custom_Age)
data$Physically_attacked <- as.integer(data$Physically_attacked)
data$Physical_fighting <- as.integer(data$Physical_fighting)
data$Close_friends <- as.integer(data$Close_friends)
data$Miss_school_no_permission <- as.integer(data$Miss_school_no_permission)
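It can help to check how `as.integer()` treats leftovers from the string clean-up. The strings below are made up: surrounding whitespace is tolerated, while empty or non-numeric strings turn into NA (the latter with a warning, suppressed here).

```r
# Made-up strings: a clean number, a number with a trailing space,
# an empty string, and a non-numeric string.
x <- c("13", "3 ", "", "abc")

# Surrounding whitespace is ignored; "" and "abc" become NA.
x_int <- suppressWarnings(as.integer(x))
x_int

# Counting the NAs created by the conversion is a quick sanity check.
sum(is.na(x_int))
```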

Deleting the columns that seem irrelevant to the model or that have a high number of missing values

Code
data <- data[, -c(16:18)]
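Dropping by position works, but it silently removes the wrong columns if the column order ever changes. A sketch of the name-based alternative, on a toy data frame with hypothetical column names:

```r
# Toy data frame; the column names are hypothetical.
toy <- data.frame(id = 1:3, a = 4:6, b = 7:9, c = 10:12)

# Dropping by position, as in the code above.
by_index <- toy[, -c(3:4)]

# Dropping by name: the result does not depend on column order.
drop_cols <- c("b", "c")
by_name <- toy[, !(names(toy) %in% drop_cols)]

identical(by_index, by_name)
```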

Removing the rows that contain “NA” in columns 2 to 15 (complete.cases() checks every remaining column)

Code
data <- data[complete.cases(data),]
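Before dropping rows it is worth looking at how many missing values each column has, since `complete.cases()` removes a row if any single column is NA. A small sketch on made-up data:

```r
# Toy data frame with missing values in two different columns.
toy <- data.frame(
  id  = 1:4,
  age = c(13, NA, 15, 16),
  sex = c("Male", "Female", NA, "Male")
)

# Missing values per column, to see which columns drive the row loss.
colSums(is.na(toy))

# Keep only rows with no missing value in any column.
clean <- toy[complete.cases(toy), ]
nrow(clean)
```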

Coding the Logistic Regression Model

Null Hypothesis :- None of the following factors affects whether a student was bullied on school property in the past 12 months:

1) Custom_Age

2) Sex

3) Degree to which the student felt lonely (5 levels in the dataset)

4) Number of close friends the student has

5) Number of days the student missed school without permission (5 levels in the dataset)

6) Degree to which other students are kind and helpful

7) Degree to which parents understand their problems

8) Whether they felt lonely most of the time or always.

Code
logistic <- glm(Bullied_on_school_property_in_past_12_months ~ Custom_Age + Sex + Felt_lonely + Close_friends + Miss_school_no_permission + Other_students_kind_and_helpful + Parents_understand_problems + Most_of_the_time_or_always_felt_lonely, family = binomial, data = data)
summary(logistic)

Call:
glm(formula = Bullied_on_school_property_in_past_12_months ~ 
    Custom_Age + Sex + Felt_lonely + Close_friends + Miss_school_no_permission + 
        Other_students_kind_and_helpful + Parents_understand_problems + 
        Most_of_the_time_or_always_felt_lonely, family = binomial, 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3730  -0.7089  -0.5708  -0.4277   2.3740  

Coefficients: (1 not defined because of singularities)
                                         Estimate Std. Error z value Pr(>|z|)
(Intercept)                              0.473860   0.135779   3.490 0.000483
Custom_Age                              -0.134201   0.008496 -15.796  < 2e-16
SexMale                                 -0.018558   0.023605  -0.786 0.431765
Felt_lonely2                             0.356116   0.034118  10.438  < 2e-16
Felt_lonely3                             0.784856   0.033078  23.728  < 2e-16
Felt_lonely4                             1.226552   0.039477  31.070  < 2e-16
Felt_lonely5                             1.427658   0.048959  29.160  < 2e-16
Close_friends                           -0.048507   0.012657  -3.832 0.000127
Miss_school_no_permission                0.035849   0.005614   6.385 1.71e-10
Other_students_kind_and_helpful2        -0.055115   0.043260  -1.274 0.202643
Other_students_kind_and_helpful3        -0.267550   0.042871  -6.241 4.35e-10
Other_students_kind_and_helpful4        -0.544506   0.043769 -12.440  < 2e-16
Other_students_kind_and_helpful5        -0.707908   0.048992 -14.450  < 2e-16
Parents_understand_problems2             0.057328   0.034318   1.670 0.094825
Parents_understand_problems3             0.115074   0.035805   3.214 0.001310
Parents_understand_problems4             0.084336   0.037303   2.261 0.023770
Parents_understand_problems5             0.046950   0.035758   1.313 0.189177
Most_of_the_time_or_always_felt_lonely1        NA         NA      NA       NA
                                           
(Intercept)                             ***
Custom_Age                              ***
SexMale                                    
Felt_lonely2                            ***
Felt_lonely3                            ***
Felt_lonely4                            ***
Felt_lonely5                            ***
Close_friends                           ***
Miss_school_no_permission               ***
Other_students_kind_and_helpful2           
Other_students_kind_and_helpful3        ***
Other_students_kind_and_helpful4        ***
Other_students_kind_and_helpful5        ***
Parents_understand_problems2            .  
Parents_understand_problems3            ** 
Parents_understand_problems4            *  
Parents_understand_problems5               
Most_of_the_time_or_always_felt_lonely1    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 52207  on 50753  degrees of freedom
Residual deviance: 49317  on 50737  degrees of freedom
AIC: 49351

Number of Fisher Scoring iterations: 4
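The summary reports one coefficient as “not defined because of singularities”, and the row for Most_of_the_time_or_always_felt_lonely1 is NA. A likely reason is that this indicator is fully determined by Felt_lonely (levels 4 and 5), so it adds no new information. The simulation below is a sketch under that assumption, with made-up data and hypothetical variable names, and reproduces the same pattern.

```r
set.seed(1)
n <- 500

# Five-level loneliness score and an indicator derived from it.
lonely <- factor(sample(1:5, n, replace = TRUE))
often_lonely <- factor(as.integer(lonely %in% c("4", "5")))

# Simulated binary outcome that depends on the loneliness score.
y <- rbinom(n, 1, plogis(-1 + 0.3 * as.integer(lonely)))

# The indicator equals the sum of the dummies for levels 4 and 5,
# so glm() cannot estimate a separate coefficient for it.
fit <- glm(y ~ lonely + often_lonely, family = binomial)
coef(fit)
```

Dropping the derived indicator from the formula should leave the other estimates unchanged, because the aliased column contributes nothing to the fit.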

Model-2: dropping the variable Sex from the previous model

Code
logistic_2 <- glm(Bullied_on_school_property_in_past_12_months ~ Custom_Age  + Felt_lonely + Close_friends + Miss_school_no_permission + Other_students_kind_and_helpful + Parents_understand_problems + Most_of_the_time_or_always_felt_lonely, family = binomial, data = data)
summary(logistic_2)

Call:
glm(formula = Bullied_on_school_property_in_past_12_months ~ 
    Custom_Age + Felt_lonely + Close_friends + Miss_school_no_permission + 
        Other_students_kind_and_helpful + Parents_understand_problems + 
        Most_of_the_time_or_always_felt_lonely, family = binomial, 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3705  -0.7091  -0.5708  -0.4275   2.3716  

Coefficients: (1 not defined because of singularities)
                                         Estimate Std. Error z value Pr(>|z|)
(Intercept)                              0.464436   0.135246   3.434 0.000595
Custom_Age                              -0.134425   0.008491 -15.831  < 2e-16
Felt_lonely2                             0.358841   0.033941  10.572  < 2e-16
Felt_lonely3                             0.790375   0.032326  24.450  < 2e-16
Felt_lonely4                             1.233638   0.038438  32.094  < 2e-16
Felt_lonely5                             1.434696   0.048137  29.804  < 2e-16
Close_friends                           -0.048548   0.012656  -3.836 0.000125
Miss_school_no_permission                0.035705   0.005612   6.363 1.98e-10
Other_students_kind_and_helpful2        -0.054129   0.043241  -1.252 0.210650
Other_students_kind_and_helpful3        -0.267190   0.042868  -6.233 4.58e-10
Other_students_kind_and_helpful4        -0.544039   0.043765 -12.431  < 2e-16
Other_students_kind_and_helpful5        -0.706932   0.048975 -14.434  < 2e-16
Parents_understand_problems2             0.057627   0.034315   1.679 0.093084
Parents_understand_problems3             0.115107   0.035805   3.215 0.001305
Parents_understand_problems4             0.084713   0.037299   2.271 0.023136
Parents_understand_problems5             0.047617   0.035746   1.332 0.182829
Most_of_the_time_or_always_felt_lonely1        NA         NA      NA       NA
                                           
(Intercept)                             ***
Custom_Age                              ***
Felt_lonely2                            ***
Felt_lonely3                            ***
Felt_lonely4                            ***
Felt_lonely5                            ***
Close_friends                           ***
Miss_school_no_permission               ***
Other_students_kind_and_helpful2           
Other_students_kind_and_helpful3        ***
Other_students_kind_and_helpful4        ***
Other_students_kind_and_helpful5        ***
Parents_understand_problems2            .  
Parents_understand_problems3            ** 
Parents_understand_problems4            *  
Parents_understand_problems5               
Most_of_the_time_or_always_felt_lonely1    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 52207  on 50753  degrees of freedom
Residual deviance: 49318  on 50738  degrees of freedom
AIC: 49350

Number of Fisher Scoring iterations: 4
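The two printed AIC values (49351 with Sex, 49350 without) are almost identical, and the coefficient for SexMale in the first model has a p-value of about 0.43. A more formal way to compare nested models is a likelihood-ratio test with `anova()`. The sketch below runs on simulated data with hypothetical variable names, where sex has no effect by construction. It is not the survey data.

```r
set.seed(42)
n <- 1000

# Simulated predictors (made up for illustration).
age <- sample(11:18, n, replace = TRUE)
sex <- factor(sample(c("Female", "Male"), n, replace = TRUE))

# The outcome depends on age only.
y <- rbinom(n, 1, plogis(1 - 0.13 * age))

# Nested models: with and without sex.
full    <- glm(y ~ age + sex, family = binomial)
reduced <- glm(y ~ age, family = binomial)

# Likelihood-ratio test of the extra term, plus an AIC comparison.
lrt <- anova(reduced, full, test = "Chisq")
lrt
AIC(reduced, full)
```

Applied to the models in this post, the equivalent call would be `anova(logistic_2, logistic, test = "Chisq")`.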

Model-3: fitting a model to learn about the factors affecting cyberbullying

Code
logistic3 <- glm(Cyber_bullied_in_past_12_months ~ Custom_Age + Sex + Felt_lonely + Close_friends + Miss_school_no_permission + Other_students_kind_and_helpful + Parents_understand_problems + Most_of_the_time_or_always_felt_lonely, family = binomial, data = data)
summary(logistic3)

Call:
glm(formula = Cyber_bullied_in_past_12_months ~ Custom_Age + 
    Sex + Felt_lonely + Close_friends + Miss_school_no_permission + 
    Other_students_kind_and_helpful + Parents_understand_problems + 
    Most_of_the_time_or_always_felt_lonely, family = binomial, 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4350  -0.7279  -0.5546  -0.4073   2.3254  

Coefficients: (1 not defined because of singularities)
                                         Estimate Std. Error z value Pr(>|z|)
(Intercept)                             -2.098113   0.136725 -15.345  < 2e-16
Custom_Age                               0.030872   0.008399   3.675 0.000237
SexMale                                 -0.461395   0.023888 -19.315  < 2e-16
Felt_lonely2                             0.455650   0.034863  13.070  < 2e-16
Felt_lonely3                             0.832488   0.033747  24.668  < 2e-16
Felt_lonely4                             1.264661   0.039855  31.732  < 2e-16
Felt_lonely5                             1.466739   0.049311  29.745  < 2e-16
Close_friends                            0.004034   0.012881   0.313 0.754136
Miss_school_no_permission                0.059858   0.005422  11.040  < 2e-16
Other_students_kind_and_helpful2        -0.052590   0.045070  -1.167 0.243270
Other_students_kind_and_helpful3        -0.167107   0.044545  -3.751 0.000176
Other_students_kind_and_helpful4        -0.205658   0.044745  -4.596 4.30e-06
Other_students_kind_and_helpful5        -0.361692   0.049508  -7.306 2.76e-13
Parents_understand_problems2             0.083632   0.033838   2.472 0.013455
Parents_understand_problems3             0.101362   0.035462   2.858 0.004259
Parents_understand_problems4            -0.029512   0.037313  -0.791 0.428983
Parents_understand_problems5            -0.126788   0.036233  -3.499 0.000467
Most_of_the_time_or_always_felt_lonely1        NA         NA      NA       NA
                                           
(Intercept)                             ***
Custom_Age                              ***
SexMale                                 ***
Felt_lonely2                            ***
Felt_lonely3                            ***
Felt_lonely4                            ***
Felt_lonely5                            ***
Close_friends                              
Miss_school_no_permission               ***
Other_students_kind_and_helpful2           
Other_students_kind_and_helpful3        ***
Other_students_kind_and_helpful4        ***
Other_students_kind_and_helpful5        ***
Parents_understand_problems2            *  
Parents_understand_problems3            ** 
Parents_understand_problems4               
Parents_understand_problems5            ***
Most_of_the_time_or_always_felt_lonely1    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 52664  on 50753  degrees of freedom
Residual deviance: 49297  on 50737  degrees of freedom
AIC: 49331

Number of Fisher Scoring iterations: 4
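The estimates above are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below copies three estimates and standard errors from the Model 3 summary and adds approximate 95% Wald intervals (estimate ± 1.96 × standard error). The interval method is a choice made for this sketch, not something the fitted model reports.

```r
# Estimates and standard errors copied from the Model 3 summary above.
est <- c(Custom_Age = 0.030872, SexMale = -0.461395, Felt_lonely5 = 1.466739)
se  <- c(Custom_Age = 0.008399, SexMale = 0.023888, Felt_lonely5 = 0.049311)

# Odds ratios and approximate 95% Wald confidence limits.
odds_ratio <- exp(est)
ci_low     <- exp(est - 1.96 * se)
ci_high    <- exp(est + 1.96 * se)

round(cbind(odds_ratio, ci_low, ci_high), 3)
```

For example, the SexMale estimate of -0.461 corresponds to an odds ratio of about 0.63, meaning lower odds of reported cyberbullying for male students than for the reference group, holding the other variables fixed. With the fitted object available, `exp(coef(logistic3))` gives the same odds ratios for all terms.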
Code
#render("Check_in_1&2.qmd", output_file = "my_document.html", output_format = "html_document")