KMuhammad_HW4

DACSS 601 - HW4: Descriptive Statistics and Visualization

Kalimah Muhammad
2022-04-27

Data Set

For this project, I changed my data set from the DOD Active Military Marital Data to a new source on school closures during the COVID-19 pandemic. The data was procured from the UNESCO Institute for Statistics (UIS) data on COVID-19 Education Response (sourced below).

Context
On March 11, 2020, the World Health Organization declared the novel coronavirus disease, COVID-19, a pandemic. Within 2 days, then US President Donald Trump declared a national emergency, and by the end of the month populations across the United States and the globe began entering lockdown and practicing social distancing. During the first few months of the pandemic, many businesses, institutions, and organizations across the globe reduced occupancy or closed their doors in an attempt to control the spread of the virus. Schools across the world had to decide whether to continue in-person learning or adopt distance learning practices. This project will review the status of school closures by country during the 2020-2021 pandemic years and what characteristics distinguished the countries that adopted closures from those that did not.

Content and Research Questions
The data set contains the daily school closure status for 210 countries/territories from 2/16/2020 to 3/31/2022. This results in 162,750 observations (210 countries/territories over 775 days). The data also provides static information such as the approximate number of enrolled students and teachers for pre-primary to secondary school, as well as regional location, country economic level, and access to distance learning technologies.
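As a quick check on the counts above, the following base R sketch recomputes the number of days in the reference period and the resulting number of rows. It assumes one row per country/territory per day, which is what the figures in the text imply; the variable names are illustrative only.

```r
# Inclusive number of days in the reference period
start_date <- as.Date("2020-02-16")
end_date   <- as.Date("2022-03-31")
n_days <- as.integer(end_date - start_date) + 1L

# Assumption: one row per country/territory per day
n_countries <- 210L
n_obs <- n_countries * n_days

n_days
n_obs
```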

My research questions include:
* How did the practice of school closures and re-openings unfold over the pandemic years of 2020 - 2021?
* What characteristics, if any, by geographic location, country income level, student population size, or access to distance learning modalities could be predictors of adopting similar measures for similar events in the future?

Data Set Variables
Categorical variables:
- Country ID = Country ISO Alpha-3 code
- Country = Country name (English)
- Income Level = World Bank country income groups (i.e. high income, upper middle income, lower middle income, and low income)
- Regional Name = Sustainable Development Goals regional groups (i.e. Africa (Sub-Saharan); Asia (Central and Southern); Asia (Eastern and South-eastern); Latin America and Caribbean; Northern America and Europe; Oceania; and Western Asia and Northern Africa)
- School Status = status of school at time of data collection (i.e. Academic break; Closed due to COVID-19; Fully open; Seasonal school closures; and Partially open)
- Distance learning modalities (TV) = Existence of distance learning modalities (TV) in the country
- Distance learning modalities (Radio) = Existence of distance learning modalities (Radio) in the country
- Distance learning modalities (Online) = Existence of distance learning modalities (Online) in the country
- Distance learning modalities (Global) = Existence of distance learning modalities (combination of TV+Radio+Online) in the country

Numeric variables:
- Date = Reference date
- Enrolment (Pre-Primary to Upper Secondary) = number of enrolled students in Pre-Primary to Upper Secondary school levels
- Teachers (Pre-Primary to Upper Secondary) = number of teachers for Pre-Primary to Upper Secondary school levels
- School Age Population (Pre-Primary to Upper Secondary) = school-age population at Pre-Primary to Upper Secondary school levels
- Weeks partially open = total number of weeks partially open
- Weeks fully closed = total number of weeks fully closed

Exploratory Analysis and Descriptive Statistics

Numeric Variables
To begin, I created a summary table of all variables in the data set; the numeric variables and the date are summarized by their minimum, quartiles, mean, and maximum.

summary(unesco_fin)
      Date             Country ID          Country         
 Min.   :2020-02-16   Length:162750      Length:162750     
 1st Qu.:2020-08-27   Class :character   Class :character  
 Median :2021-03-09   Mode  :character   Mode  :character  
 Mean   :2021-03-09                                        
 3rd Qu.:2021-09-19                                        
 Max.   :2022-03-31                                        
 Regional Name      Income Level          Status         
 Length:162750      Length:162750      Length:162750     
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
 Enrolment (Pre-Primary to Upper Secondary)
 Min.   :        0                         
 1st Qu.:   208294                         
 Median :  1406005                         
 Mean   :  7340691                         
 3rd Qu.:  5360166                         
 Max.   :294893120                         
 Teachers (Pre-Primary to Upper Secondary)
 Min.   :       0                         
 1st Qu.:   10713                         
 Median :   80550                         
 Mean   :  370162                         
 3rd Qu.:  241546                         
 Max.   :15625021                         
 School Age Population (Pre-Primary to Upper Secondary)
 Min.   :        0                                     
 1st Qu.:   204277                                     
 Median :  1594522                                     
 Mean   :  8869652                                     
 3rd Qu.:  7045479                                     
 Max.   :368816440                                     
 Distance learning modalities (TV)
 Length:162750                    
 Class :character                 
 Mode  :character                 
                                  
                                  
                                  
 Distance learning modalities (Radio)
 Length:162750                       
 Class :character                    
 Mode  :character                    
                                     
                                     
                                     
 Distance learning modalities (Global)
 Length:162750                        
 Class :character                     
 Mode  :character                     
                                      
                                      
                                      
 Distance learning modalities (Online) Weeks fully closed
 Length:162750                         Min.   : 0.00     
 Class :character                      1st Qu.:10.00     
 Mode  :character                      Median :16.00     
                                       Mean   :19.74     
                                       3rd Qu.:27.00     
                                       Max.   :75.00     
 Weeks partially open
 Min.   : 0.00       
 1st Qu.: 6.00       
 Median :18.50       
 Mean   :20.85       
 3rd Qu.:30.00       
 Max.   :77.00       

Starting with the Date, we see the data was collected between 2/16/2020 and 3/31/2022, with the mid-point at approximately 3/09/2021. Next, we observe a wide range of values for the total enrolled students and teacher population size. For the total number of enrolled students, observations range from 0 in Svalbard, the Faroe Islands, and Greenland, to a mean of 7.3 million (comparable to enrollment in Cameroon and Uzbekistan), to a maximum of 294.9 million students in India.

For teachers, observations range from 0 in the three territories mentioned earlier, to an average of approximately 370,000 (similar to that of Nepal), to a maximum of 15.6 million teachers in China. From here we can also estimate the student-to-teacher ratio.

#add ratio of enrolled students to teachers
#(rows with 0 students and 0 teachers give 0/0 = NaN, which summary() counts under NA's)
unesco_fin$Enrol_Teacher_Ratio <- unesco_fin$`Enrolment (Pre-Primary to Upper Secondary)` / unesco_fin$`Teachers (Pre-Primary to Upper Secondary)`

#summary of enrollment to teacher ratio
summary(unesco_fin$Enrol_Teacher_Ratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  6.437  12.895  16.951  20.043  25.268  62.971    2325 

The above summary of the enrolled-student-to-teacher ratio shows an average of about 20 students per teacher across the globe. Note that the 2,325 NAs (3 territories over 775 days) belong to Svalbard, the Faroe Islands, and Greenland, where enrollment and teacher counts are both recorded as 0, so the ratio is undefined. We will come back to this ratio to determine whether it reflects the ability to socially distance and thus to remain open or return to opening.
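The undefined ratios can also be handled explicitly rather than left to the division. The sketch below uses a small toy data frame (the country labels and counts are hypothetical, not taken from the UNESCO file) to show that 0/0 evaluates to NaN and how the ratio could be set to NA whenever no teachers are recorded.

```r
# Toy rows; the labels and counts are hypothetical, not from the UNESCO data
toy <- data.frame(
  Country   = c("A", "B", "C"),
  Enrolment = c(1200, 0, 450),
  Teachers  = c(60, 0, 30)
)

# Plain division: 0 / 0 evaluates to NaN, which summary() reports under NA's
toy$ratio_raw <- toy$Enrolment / toy$Teachers

# Explicit alternative: mark the ratio as missing when no teachers are recorded
toy$ratio <- ifelse(toy$Teachers > 0, toy$Enrolment / toy$Teachers, NA_real_)

toy
mean(toy$ratio, na.rm = TRUE)
```

Making the missing case explicit also avoids an infinite ratio if a row ever had students but zero teachers.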

Finally, reviewing the results for Weeks fully closed and Weeks partially open, we see similar averages (19.74 and 20.85 weeks, respectively). To compare the two further, I looked at the mean, median, IQR, standard deviation, and variance for the number of weeks schools were fully closed and partially open.

#Descriptive stats

#descriptive stats of weeks fully closed
unesco_fin %>%
  summarise(
    mean.closed = mean(`Weeks fully closed`, na.rm = TRUE),
    median.closed = median(`Weeks fully closed`, na.rm = TRUE),
    IQR.closed = IQR(`Weeks fully closed`, na.rm = TRUE),
    sd.closed = sd(`Weeks fully closed`, na.rm = TRUE),
    var.closed = var(`Weeks fully closed`, na.rm = TRUE))
# A tibble: 1 x 5
  mean.closed median.closed IQR.closed sd.closed var.closed
        <dbl>         <dbl>      <dbl>     <dbl>      <dbl>
1        19.7            16         17      14.6       214.
#descriptive stats of weeks partially open
unesco_fin %>%
  summarise(
    mean.partopen = mean(`Weeks partially open`, na.rm = TRUE),
    median.partopen = median(`Weeks partially open`, na.rm = TRUE),
    IQR.partopen = IQR(`Weeks partially open`, na.rm = TRUE),
    sd.partopen = sd(`Weeks partially open`, na.rm = TRUE),
    var.partopen = var(`Weeks partially open`, na.rm = TRUE))
# A tibble: 1 x 5
  mean.partopen median.partopen IQR.partopen sd.partopen var.partopen
          <dbl>           <dbl>        <dbl>       <dbl>        <dbl>
1          20.8            18.5           24        17.3         300.

The first table shows the average number of weeks closed (19.7), middle value (16), interquartile range (17) as well as standard deviation (14.6) and variance (214).

The second table shows the average number of weeks partially open (20.8), middle value (18.5), interquartile range (24) as well as standard deviation (17.3) and variance (300).

At a glance, we see a higher standard deviation and variance for time spent partially open than for time fully closed, which suggests more variability across countries in partial openings. Note that the 775 days reviewed amount to 110 weeks and 5 days. Thus an average of about 20 weeks fully closed or partially open equates to about 18% of the roughly 2 years observed.
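The week count and the share of the observation window can be reproduced with a few lines of base R. The two means are copied from the summary output above; everything else is plain arithmetic, and the variable names are illustrative.

```r
n_days <- 775

n_days %/% 7           # whole weeks in the observation window
n_days %% 7            # leftover days
total_weeks <- n_days / 7

# Means taken from the summary() output above
mean_weeks <- c(fully_closed = 19.74, partially_open = 20.85)

# Share of the observation window, in percent
share_pct <- round(100 * mean_weeks / total_weeks, 1)
share_pct
```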

Categorical Variables
Next we will look at regional and income level variable frequency.

The table below shows the distribution of countries by the Sustainable Development Goals regional groups.

#table of countries by regions
table(unesco_short$`Regional Name`)

            Africa (Sub-Saharan)      Asia (Central and Southern) 
                              48                               14 
Asia (Eastern and South-eastern)  Latin America and the Caribbean 
                              16                               41 
     Northern America and Europe                          Oceania 
                              50                               17 
Western Asia and Northern Africa 
                              24 

The next table shows the distribution of countries by World Bank country income group. Note there are 6 countries/territories for which no income level was captured: Anguilla, Cook Islands, Montserrat, Niue, Svalbard, and Tokelau.

#table of countries by income level
table(unesco_short$`Income Level`)

                            High income          Low income 
                  6                  71                  29 
Lower middle income Upper middle income 
                 50                  54 

Combining the previous two tables, below is a cross-tabulation of country counts by regional name and income level.

#cross-tabulation of region by country income level
xtabs(~`Regional Name` + `Income Level`, unesco_short)
                                  Income Level
Regional Name                         High income Low income
  Africa (Sub-Saharan)              0           2         22
  Asia (Central and Southern)       0           0          2
  Asia (Eastern and South-eastern)  0           4          1
  Latin America and the Caribbean   2          14          1
  Northern America and Europe       1          39          0
  Oceania                           3           4          0
  Western Asia and Northern Africa  0           8          3
                                  Income Level
Regional Name                      Lower middle income
  Africa (Sub-Saharan)                              19
  Asia (Central and Southern)                        8
  Asia (Eastern and South-eastern)                   7
  Latin America and the Caribbean                    4
  Northern America and Europe                        2
  Oceania                                            5
  Western Asia and Northern Africa                   5
                                  Income Level
Regional Name                      Upper middle income
  Africa (Sub-Saharan)                               5
  Asia (Central and Southern)                        4
  Asia (Eastern and South-eastern)                   4
  Latin America and the Caribbean                   20
  Northern America and Europe                        8
  Oceania                                            5
  Western Asia and Northern Africa                   8

I will come back to this table when we review how, or if, income level and region are potential indicators of adopting school closure practices in future public health crises.
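Because the regions differ in size, row-wise proportions can make the cross-tabulation easier to compare. The sketch below applies `prop.table()` to an `xtabs()` result built from a small toy data frame; the region and income labels are shortened, and the rows are hypothetical rather than drawn from `unesco_short`.

```r
# Toy country-level rows; the values are hypothetical
toy <- data.frame(
  region = c("Oceania", "Oceania", "Africa", "Africa", "Africa", "Europe"),
  income = c("High", "Lower middle", "Low", "Low", "Lower middle", "High")
)

# Counts of countries by region and income level
counts <- xtabs(~ region + income, data = toy)

# Row-wise shares: each region's row sums to 1
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Using `margin = 2` instead would give the regional make-up of each income group.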

Average number of weeks partially open/ fully closed by region

#average number of weeks fully closed and partially open by region
unesco_short %>%
  group_by(`Regional Name`) %>%
  summarise(across(starts_with("Weeks"), ~ mean(.x, na.rm = TRUE)))
# A tibble: 7 x 3
  `Regional Name`                  `Weeks fully clo~` `Weeks partial~`
  <chr>                                         <dbl>            <dbl>
1 Africa (Sub-Saharan)                          18.1             13.3 
2 Asia (Central and Southern)                   24.4             27.8 
3 Asia (Eastern and South-eastern)              24.4             30.6 
4 Latin America and the Caribbean               29.6             32.3 
5 Northern America and Europe                   12.4             18   
6 Oceania                                        7.12             6.24
7 Western Asia and Northern Africa              24.6             22.0 

Interestingly, the regions with the longest time fully closed as well as partially open are Asia (Central and Southern), Asia (Eastern and South-eastern), Latin America and the Caribbean, and Western Asia and Northern Africa. Oceania experienced the least school disruption, which may be due to geographic isolation limiting the spread of the virus.

Visualizations

Total number of weeks fully closed by region and country

This series of graphics provides a detailed visual representation of the distribution of weeks closed by country and region. The aim is to quickly identify outliers and trends within each region.

unesco_short %>%
  filter(`Regional Name` == "Africa (Sub-Saharan)") %>%
  ggplot(aes(`Weeks fully closed`, Country)) +
  geom_col() +
  labs(title = "Region: Africa (Sub-Saharan)")

unesco_short %>%
  filter(`Regional Name` == "Asia (Central and Southern)") %>%
  ggplot(aes(`Weeks fully closed`, Country)) +
  geom_col() +
  labs(title = "Region: Asia (Central and Southern)")

unesco_short %>%
  filter(`Regional Name` == "Asia (Eastern and South-eastern)") %>%
  ggplot(aes(`Weeks fully closed`, Country)) +
  geom_col() +
  labs(title = "Region: Asia (Eastern and South-eastern)")

unesco_short %>%
  filter(`Regional Name` == "Latin America and the Caribbean") %>%
  ggplot(aes(`Weeks fully closed`, Country)) +
  geom_col() +
  labs(title = "Region: Latin America and the Caribbean")

unesco_short %>%
  filter(`Regional Name` == "Northern America and Europe") %>%
  ggplot(aes(`Weeks fully closed`, Country)) +
  geom_col() +
  labs(title = "Region: Northern America and Europe")

unesco_short %>%
  filter(`Regional Name` == "Oceania") %>%
  ggplot(aes(`Weeks fully closed`, Country)) +
  geom_col() +
  labs(title = "Region: Oceania")

unesco_short %>%
  filter(`Regional Name` == "Western Asia and Northern Africa") %>%
  ggplot(aes(`Weeks fully closed`, Country)) +
  geom_col() +
  labs(title = "Region: Western Asia and Northern Africa")

Generally, we can see that countries within a region responded with similar lengths of closure, with the exception of an outlier or two per group. This is evident in Fiji in Oceania, Uganda in Sub-Saharan Africa, and Bangladesh in Central/Southern Asia, among others. There are also several observations where the weeks fully closed total zero or have no data (e.g. the United States, Sweden, Tajikistan, Nicaragua, and Burundi).

Limitations/Next Steps: I’d like to see all the countries on one plot in descending order of weeks fully closed, with the bar colors distinguished by region. I think this will show a full-scale comparison of school closures. I also need to review the specifics of the zero/no data locations.
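One possible way to build the combined plot described in this next step is to reorder the country axis by weeks closed and map region to the fill color. The sketch below is only an illustration on toy data: the column names `Country`, `Region`, and `WeeksClosed` and all values are hypothetical stand-ins for the columns in `unesco_short`.

```r
library(ggplot2)

# Toy data; names and values are hypothetical
toy <- data.frame(
  Country     = c("A", "B", "C", "D", "E"),
  Region      = c("R1", "R1", "R2", "R2", "R3"),
  WeeksClosed = c(12, 30, 0, 45, 7)
)

# reorder() sorts the country levels by weeks closed, so the longest
# closure is drawn at the top of the y axis and the shortest at the bottom
p <- ggplot(toy, aes(x = WeeksClosed,
                     y = reorder(Country, WeeksClosed),
                     fill = Region)) +
  geom_col() +
  labs(x = "Weeks fully closed", y = "Country",
       title = "Weeks Fully Closed by Country and Region")
p
```

With roughly 210 countries, the y axis labels would likely need a taller figure or smaller text to stay readable.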

Access to Distance Learning

This section of plots summarizes the students’ access to distance learning modalities by country income level and number of weeks fully closed.

Summary of distance learning technology access by country.

The plot shows the country income level distribution by types of distance learning modes. The aim is to uncover if there are trends in income and the type and number of modalities available.

#bar chart of countries by distance learning mods
unesco_short %>%
  ggplot(aes(`Distance learning modalities (Global)`, fill= `Income Level`))+
  geom_bar(position = "stack")+
  scale_fill_brewer(palette = "BuPu")+
  labs(y= "No. of Countries", title = "Count of Countries by Distance Learning Modalities")+
   guides(x = guide_axis(n.dodge = 2))
#cross-tabulation of learning mods by country income level
xtabs(~`Distance learning modalities (Global)` + `Income Level`, unesco_short)
                                     Income Level
Distance learning modalities (Global)    High income Low income
                  None                 6          14          8
                  Online               0          17          0
                  Online + Radio       0           0          1
                  Online + TV          0          33          3
                  Radio                0           0          5
                  TV                   0           1          1
                  TV + Online + Radio  0           6          9
                  TV + Radio           0           0          2
                                     Income Level
Distance learning modalities (Global) Lower middle income
                  None                                  2
                  Online                                2
                  Online + Radio                        3
                  Online + TV                          16
                  Radio                                 0
                  TV                                    2
                  TV + Online + Radio                  20
                  TV + Radio                            5
                                     Income Level
Distance learning modalities (Global) Upper middle income
                  None                                  4
                  Online                                4
                  Online + Radio                        2
                  Online + TV                          23
                  Radio                                 0
                  TV                                    1
                  TV + Online + Radio                  20
                  TV + Radio                            0

Here we see lower-middle and upper-middle income countries gravitating towards the Online + TV and TV + Online + Radio combinations. These two modes appear to be the most popular among countries. Interestingly, high income countries have the highest representation among those using no modality or online only, at 19.7% and 23.9% of the high income total, respectively. Radio appears to be a tool used more in low and lower-middle income countries.
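The high income shares can be checked directly from the high income column of the cross-tabulation above. The sketch below types those counts in by hand; the vector name is illustrative.

```r
# High income column copied from the cross-tabulation above
high_income <- c(
  "None" = 14, "Online" = 17, "Online + Radio" = 0, "Online + TV" = 33,
  "Radio" = 0, "TV" = 1, "TV + Online + Radio" = 6, "TV + Radio" = 0
)

# Total number of high income countries
sum(high_income)

# Share of each modality within the high income group, in percent
share_pct <- round(100 * high_income / sum(high_income), 1)
share_pct[c("None", "Online")]
```

On the full data, `prop.table()` with `margin = 2` applied to the `xtabs()` result would give the same within-income-group shares for every column at once.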

Limitations/Next Steps: I expected to see rising incomes correspond to more access; however, I will need to investigate the countries reporting no modality. Appropriate levels of distance learning access could affect a community's ability to use this option when in-person learning is halted.

Distribution of Distance Learning Access by Weeks Fully Closed

The goal here is to see if there is a relationship between the types of access and the number of weeks a school system is closed.

#Weeks fully closed by distance learning modality
unesco_short %>%
  ggplot(aes(`Distance learning modalities (Global)`, `Weeks fully closed`))+
  geom_boxplot()+
  labs(title = "No. of Weeks Fully Closed by Distance Learning Modalities")+
   guides(x = guide_axis(n.dodge = 2))

Overall, we see the majority of the distribution falls between 10 and 40 weeks regardless of the technology available. Locations with no modality had the fewest weeks fully closed, followed by those with Radio only. By contrast, those with TV + Online + Radio saw longer school closures (especially among the outliers).

Limitations/ Next Steps: I think adding the mean could be helpful to the reader.
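A group mean can be overlaid on a box plot with `stat_summary()`. The sketch below shows the idea on toy data; the column names and values are hypothetical, and the `fun` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Toy data; modality labels are borrowed from the text, values are hypothetical
toy <- data.frame(
  modality     = rep(c("None", "Online", "TV + Online + Radio"), each = 4),
  weeks_closed = c(0, 5, 8, 11, 10, 14, 20, 36, 15, 22, 30, 61)
)

# Box plot with the group mean drawn as a red diamond on each box
p <- ggplot(toy, aes(modality, weeks_closed)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3, colour = "red") +
  labs(x = "Distance learning modalities", y = "Weeks fully closed")
p

# The same means as a table, for reference
group_means <- tapply(toy$weeks_closed, toy$modality, mean)
group_means
```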

Finally, for Blog #5, I will add the time element to review changes over the course of the 2 years.


Source: UNESCO map on school closures [https://en.unesco.org/covid19/educationresponse] and UIS, March 2022 [http://data.uis.unesco.org]

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Muhammad (2022, April 27). Data Analytics and Computational Social Science: KMuhammad_HW4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkmuhamma895068/

BibTeX citation

@misc{muhammad2022kmuhammad_hw4,
  author = {Muhammad, Kalimah},
  title = {Data Analytics and Computational Social Science: KMuhammad_HW4},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkmuhamma895068/},
  year = {2022}
}