My final project extends the investigation of digital devices in schools that I submitted as the final project for DACSS 601. I continue to use data from the 2018 "Programme for International Student Assessment" (PISA) survey. In this assignment, I propose my hypotheses and present the descriptive statistics, with minor changes based on my previous project.
Code
library(tidyverse)
library(ggplot2)
library(dbplyr)
pisa <- read_csv('_data/CY07_MSU_SCH_QQQ.csv')
Attaching package: 'dbplyr'
The following objects are masked from 'package:dplyr':
ident, sql
New names:
• `` -> `...1`
Rows: 21903 Columns: 198
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): CNT, CYC, NatCen, STRATUM, SUBNATIO, SC053D11TA, PRIVATESCH, VER_DAT
dbl (189): ...1, CNTRYID, CNTSCHID, Region, OECD, ADMINMODE, LANGTEST, SC001...
lgl (1): BOOKID
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Research Questions
My final project examines which factors contribute to the accessibility of digital devices in schools and to the human resource support available for them. Additionally, I will explore whether there is a correlation between career guidance and digital devices. I will conduct this research using the "Programme for International Student Assessment" (PISA) data collected by the Organisation for Economic Co-operation and Development (OECD) in 2018.
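As a rough illustration of the second research question, the sketch below computes a rank correlation between a career guidance score and an accessibility score. The data frame `toy` and its values are hypothetical stand-ins, not PISA data, and the choice of Spearman's method is an assumption made here because the scores are built from ordinal survey items.

```r
# Hypothetical toy data standing in for the cleaned school-level scores
toy <- data.frame(
  Career_Guidance = c(0, 1, 1, 2, 3, 4, NA, 2),
  Accessibility   = c(1.50, 2.00, 2.25, 2.75, 3.00, 3.75, 2.50, NA)
)

# Spearman rank correlation, using only schools where both scores are present
rho <- cor(toy$Career_Guidance, toy$Accessibility,
           method = "spearman", use = "complete.obs")
rho
```

With the real data, `cor.test()` with the same two columns would additionally report a p-value.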
Hypothesis
I propose that the size of the urban community in which a school is located is the primary factor in the condition of its digital devices. OECD membership ("OECD or Non-OECD") and school type ("public or private schools") may be two confounders, which should be incorporated into the regression analysis. I also hypothesize that the higher the score a school reports for career guidance, the higher the score it reports for digital devices.
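One possible way to set up the regression described above is sketched here on simulated data. The data frame `toy`, the simulated effect sizes, and the decision to treat `Urban` as a numeric score while treating `OECD` and `Public_or_Private` as factors are all assumptions for illustration, not the final model.

```r
set.seed(1)
n <- 200

# Simulated schools (hypothetical values, not PISA data)
toy <- data.frame(
  Urban             = sample(1:5, n, replace = TRUE),
  OECD              = sample(0:1, n, replace = TRUE),
  Public_or_Private = sample(1:2, n, replace = TRUE)
)

# Simulated outcome: accessibility rises with community size and OECD status
toy$Accessibility <- 1.5 + 0.25 * toy$Urban + 0.3 * toy$OECD +
  rnorm(n, sd = 0.4)

# Linear model with the two proposed confounders as categorical controls
fit <- lm(Accessibility ~ Urban + factor(OECD) + factor(Public_or_Private),
          data = toy)
coef(fit)
```

Including the confounders as covariates means the `Urban` coefficient is read as the association with accessibility among schools that share the same OECD status and school type.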
Descriptive Statistics
The original OECD PISA 2018 School Questionnaire dataset is the part of the PISA 2018 data that focuses on schools. It covers 80 countries and regions around the world and documents 21,903 schools' responses to 187 questions. After cleaning, the dataset includes 8 variables: CNT identifies the country. STRATUM records the sampling stratum to which a school belongs. OECD indicates whether a school is located in an OECD country. Urban describes the size of the community in which a school is located. Public_or_Private indicates whether a school is public or private. Career_Guidance is the score a school reports for career guidance. Accessibility is the score a school reports for accessibility to digital devices. Human_Resource_Support is the score a school reports for human resource support for digital devices.
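The composite scores described above are row-wise summaries of questionnaire items. The sketch below shows, on three hypothetical schools, how `rowMeans()` behaves when an item is missing. The item names follow the SC155 columns used in this project, but the values are invented.

```r
# Three hypothetical schools answering four SC155 items (values are made up)
items <- data.frame(
  SC155Q01HA = c(1, 3, 4),
  SC155Q02HA = c(2, 3, NA),
  SC155Q03HA = c(2, 4, 4),
  SC155Q04HA = c(3, 2, 4)
)

# Default behaviour: a school with any missing item gets NA
strict <- rowMeans(items)

# Alternative: average over whichever items the school answered
lenient <- rowMeans(items, na.rm = TRUE)

data.frame(strict, lenient)
```

The default propagates missing items into the composite, which is one reason the composite scores can show many NA values. Whether averaging over the remaining items is appropriate is a separate analytic decision.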
The summary function and a visualization provide the descriptive statistics below. The large number of NA values stands out in every numeric variable except OECD, and I still need to decide how to handle them properly.
Code
summary(pisa_SC155)
     CNT              STRATUM               OECD            Urban
 Length:21903       Length:21903       Min.   :0.0000   Min.   :1.000
 Class :character   Class :character   1st Qu.:0.0000   1st Qu.:2.000
 Mode  :character   Mode  :character   Median :1.0000   Median :3.000
                                       Mean   :0.5171   Mean   :3.007
                                       3rd Qu.:1.0000   3rd Qu.:4.000
                                       Max.   :1.0000   Max.   :5.000
                                                        NA's   :1363

 Public_or_Private  Career_Guidance  Accessibility   Human_Resource_Support
 Min.   :1.00       Min.   :0.000    Min.   :1.000   Min.   :1.000
 1st Qu.:1.00       1st Qu.:1.000    1st Qu.:2.000   1st Qu.:2.286
 Median :1.00       Median :1.000    Median :2.750   Median :2.714
 Mean   :1.19       Mean   :1.518    Mean   :2.674   Mean   :2.658
 3rd Qu.:1.00       3rd Qu.:2.000    3rd Qu.:3.250   3rd Qu.:3.000
 Max.   :2.00       Max.   :4.000    Max.   :4.000   Max.   :4.000
 NA's   :2092       NA's   :1499     NA's   :1185    NA's   :1236
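Before deciding how to treat the missing values reported by `summary()`, it may help to tally them per variable and see how many schools would survive a complete-case filter. The sketch below does this on a small hypothetical data frame `toy`. The column names mirror the cleaned dataset, but the values are invented, and listwise deletion is shown only as one option.

```r
# Hypothetical miniature of the cleaned data (values are made up)
toy <- data.frame(
  Urban           = c(1, NA, 3, 5),
  Career_Guidance = c(0, 2, NA, 4),
  Accessibility   = c(2.5, 3, NA, 1)
)

# Number and share of missing values in each column
na_counts <- colSums(is.na(toy))
na_share  <- round(na_counts / nrow(toy), 2)

# Listwise deletion: keep only schools with no missing value in any column
complete <- toy[complete.cases(toy), ]

na_counts
na_share
nrow(complete)
```

Comparing `nrow(complete)` with the original row count shows how much of the sample listwise deletion would discard, which can inform whether another approach such as imputation is worth considering.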
---
title: "Final Project Check 1"
author: "Guanhua Tan"
description: "Final Project Check 1"
date: "March 4 2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - fpc1
  - research question
  - descriptive statistics
---

My final project extends the investigation of digital devices in schools that I submitted as the final project for DACSS 601. I continue to use data from the 2018 "Programme for International Student Assessment" (PISA) survey. In this assignment, I propose my hypotheses and present the descriptive statistics, with minor changes based on my previous project.

```{r, echo=T}
library(tidyverse)
library(ggplot2)
library(dbplyr)
pisa <- read_csv('_data/CY07_MSU_SCH_QQQ.csv')
```

# Research Questions

My final project examines which factors contribute to the accessibility of digital devices in schools and to the human resource support available for them. Additionally, I will explore whether there is a correlation between career guidance and digital devices. I will conduct this research using the "Programme for International Student Assessment" (PISA) data collected by the Organisation for Economic Co-operation and Development (OECD) in 2018.

# Hypothesis

I propose that the size of the urban community in which a school is located is the primary factor in the condition of its digital devices. OECD membership ("OECD or Non-OECD") and school type ("public or private schools") may be two confounders, which should be incorporated into the regression analysis. I also hypothesize that the higher the score a school reports for career guidance, the higher the score it reports for digital devices.

```{r, echo=TRUE, results='hide'}
# create a data frame
# view(pisa)

# select the related variables and keep the first 12 identifier columns
pisa_selected <- select(pisa, starts_with(c("SC001", "SC013", "SC016", "SC161", "SC155")))
pisa2018_joint <- cbind(pisa[, 1:12], pisa_selected)

# pisa_SC155: composite scores built from the SC155 and SC161 items
pisa2018_joint$Accessibility <- rowMeans(pisa2018_joint[, c("SC155Q01HA", "SC155Q02HA", "SC155Q03HA", "SC155Q04HA")])
pisa2018_joint$Human_Resource_Support <- rowMeans(pisa2018_joint[, c("SC155Q05HA", "SC155Q06HA", "SC155Q07HA", "SC155Q08HA", "SC155Q09HA", "SC155Q10HA", "SC155Q11HA")])
# NOTE: SC161Q04SA is listed twice below, so it is counted twice in the sum;
# this may be unintended, but it is kept so the results match the printed summary
pisa2018_joint$Career_Guidance <- rowSums(pisa2018_joint[, c("SC161Q02SA", "SC161Q03SA", "SC161Q04SA", "SC161Q04SA")])

# rename the community and school-type items and order the columns
pisa_SC155 <- pisa2018_joint %>%
  select(CNT, STRATUM, OECD, Career_Guidance, Accessibility, Human_Resource_Support, SC001Q01TA, SC013Q01TA) %>%
  mutate(Urban = SC001Q01TA, Public_or_Private = SC013Q01TA) %>%
  select(-c(SC001Q01TA, SC013Q01TA)) %>%
  select(c(CNT, STRATUM, OECD, Urban, Public_or_Private, Career_Guidance, Accessibility, Human_Resource_Support))
pisa_SC155
```

# Descriptive Statistics

The original OECD PISA 2018 School Questionnaire dataset is the part of the PISA 2018 data that focuses on schools. It covers 80 countries and regions around the world and documents 21,903 schools' responses to 187 questions. After cleaning, the dataset includes 8 variables: CNT identifies the country. STRATUM records the sampling stratum to which a school belongs. OECD indicates whether a school is located in an OECD country. Urban describes the size of the community in which a school is located. Public_or_Private indicates whether a school is public or private. Career_Guidance is the score a school reports for career guidance. Accessibility is the score a school reports for accessibility to digital devices. Human_Resource_Support is the score a school reports for human resource support for digital devices.

The summary function and a visualization provide the descriptive statistics below. The large number of NA values stands out in every numeric variable except OECD, and I still need to decide how to handle them properly.

```{r, echo=TRUE}
summary(pisa_SC155)

# reshape the three scores into long format for a faceted boxplot
pisa_SC155_boxplot <- pisa_SC155 %>%
  select(STRATUM, Career_Guidance, Accessibility, Human_Resource_Support) %>%
  pivot_longer(cols = c(Career_Guidance, Accessibility, Human_Resource_Support),
               names_to = "Group", values_to = "Evaluation")

ggplot(pisa_SC155_boxplot, aes(Evaluation, fill = Group)) +
  stat_boxplot(geom = "errorbar", # Error bars
               width = 0.2) +
  geom_boxplot() +
  facet_wrap(~Group) +
  labs(title = "Pisa2018 Evaluation") +
  coord_flip()
```