HW-4 | Descriptive Statistics and a Few Graphs

Statistical Analysis of Indian Education System

Niharika Pola
2022-05-04

This project analyzes seven dataframes covering 2013-2016 on the Indian education system, extracted from the Indian Government’s data portal, data.gov.in. The main objective is to study how school facilities (access to computers, sanitation and electricity) relate to the dropout ratio across the levels of schooling (primary, upper primary, secondary and higher secondary) in all states of the country. I also aim to analyze the correlation between the dropout ratio and each of the other dataframes.

Loading the packages
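
The package-loading chunk is not shown in the rendered output. Judging from the functions called later (read_csv, filter, select, pivot_longer, ggplot), a setup along these lines was presumably run; the exact list is an assumption.

```r
# Assumed setup: the original chunk is not visible in the rendered output.
library(readr)    # read_csv()
library(dplyr)    # filter(), select(), arrange(), group_by(), summarise()
library(tidyr)    # pivot_longer()
library(ggplot2)  # ggplot(), geom_bar(), labs()
```

ggmap, janitor and cowplot are loaded or namespaced where they are first used.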

Reading Dataframe-1 | Gross Enrollment Ratio from 2013-2016 across all Indian States

gross_enrollment_ratio <- read_csv("601 Major Project/gross-enrollment-ratio.csv")
dim(gross_enrollment_ratio)
[1] 110  14
View(gross_enrollment_ratio)
head(gross_enrollment_ratio)
# A tibble: 6 x 14
  State_UT              Year  Primary_Boys Primary_Girls Primary_Total
  <chr>                 <chr>        <dbl>         <dbl>         <dbl>
1 Andaman & Nicobar Is~ 2013~         95.9          92.0          93.9
2 Andhra Pradesh        2013~         96.6          96.9          96.7
3 Arunachal Pradesh     2013~        129.          128.          128. 
4 Assam                 2013~        112.          115.          113. 
5 Bihar                 2013~         95.0         101.           98.0
6 Chandigarh            2013~         88.4          96.1          91.8
# ... with 9 more variables: Upper_Primary_Boys <dbl>,
#   Upper_Primary_Girls <dbl>, Upper_Primary_Total <dbl>,
#   Secondary_Boys <dbl>, Secondary_Girls <dbl>,
#   Secondary_Total <dbl>, Higher_Secondary_Boys <chr>,
#   Higher_Secondary_Girls <chr>, Higher_Secondary_Total <chr>
colnames(gross_enrollment_ratio)
 [1] "State_UT"               "Year"                  
 [3] "Primary_Boys"           "Primary_Girls"         
 [5] "Primary_Total"          "Upper_Primary_Boys"    
 [7] "Upper_Primary_Girls"    "Upper_Primary_Total"   
 [9] "Secondary_Boys"         "Secondary_Girls"       
[11] "Secondary_Total"        "Higher_Secondary_Boys" 
[13] "Higher_Secondary_Girls" "Higher_Secondary_Total"

Datatypes of each column

str(gross_enrollment_ratio)
spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT              : chr [1:110] "Andaman & Nicobar Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" ...
 $ Year                  : chr [1:110] "2013-14" "2013-14" "2013-14" "2013-14" ...
 $ Primary_Boys          : num [1:110] 95.9 96.6 129.1 111.8 95 ...
 $ Primary_Girls         : num [1:110] 92 96.9 127.8 115.2 101.2 ...
 $ Primary_Total         : num [1:110] 93.9 96.7 128.5 113.4 98 ...
 $ Upper_Primary_Boys    : num [1:110] 94.7 82.8 112.6 87.8 80.6 ...
 $ Upper_Primary_Girls   : num [1:110] 89 84.4 115.3 98.7 94.9 ...
 $ Upper_Primary_Total   : num [1:110] 91.8 83.6 113.9 93.1 87.2 ...
 $ Secondary_Boys        : num [1:110] 102.9 73.8 88.4 65.6 57.7 ...
 $ Secondary_Girls       : num [1:110] 97.4 76.8 84.9 77.2 63 ...
 $ Secondary_Total       : num [1:110] 100.2 75.2 86.7 71.2 60.1 ...
 $ Higher_Secondary_Boys : chr [1:110] "105.4" "59.83" "65.16" "31.78" ...
 $ Higher_Secondary_Girls: chr [1:110] "96.61" "60.83" "65.38" "34.27" ...
 $ Higher_Secondary_Total: chr [1:110] "101.28" "60.3" "65.27" "32.94" ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   Year = col_character(),
  ..   Primary_Boys = col_double(),
  ..   Primary_Girls = col_double(),
  ..   Primary_Total = col_double(),
  ..   Upper_Primary_Boys = col_double(),
  ..   Upper_Primary_Girls = col_double(),
  ..   Upper_Primary_Total = col_double(),
  ..   Secondary_Boys = col_double(),
  ..   Secondary_Girls = col_double(),
  ..   Secondary_Total = col_double(),
  ..   Higher_Secondary_Boys = col_character(),
  ..   Higher_Secondary_Girls = col_character(),
  ..   Higher_Secondary_Total = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

Tidying the data

# "NR" and "@" are treated as placeholders for missing values
gross_enrollment_ratio[ gross_enrollment_ratio == "NR" ] <- NA
gross_enrollment_ratio[ gross_enrollment_ratio == "@" ] <- NA
ger1 <- data.frame(gross_enrollment_ratio)
# Drop the rows that contain a missing value
ger <- na.exclude(ger1)
# The Higher_Secondary columns were read as character; convert them to numeric
ger$Higher_Secondary_Boys <- as.numeric(ger$Higher_Secondary_Boys)
ger$Higher_Secondary_Girls <- as.numeric(ger$Higher_Secondary_Girls)
ger$Higher_Secondary_Total <- as.numeric(ger$Higher_Secondary_Total)

str(ger)
'data.frame':   108 obs. of  14 variables:
 $ State_UT              : chr  "Andaman & Nicobar Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" ...
 $ Year                  : chr  "2013-14" "2013-14" "2013-14" "2013-14" ...
 $ Primary_Boys          : num  95.9 96.6 129.1 111.8 95 ...
 $ Primary_Girls         : num  92 96.9 127.8 115.2 101.2 ...
 $ Primary_Total         : num  93.9 96.7 128.5 113.4 98 ...
 $ Upper_Primary_Boys    : num  94.7 82.8 112.6 87.8 80.6 ...
 $ Upper_Primary_Girls   : num  89 84.4 115.3 98.7 94.9 ...
 $ Upper_Primary_Total   : num  91.8 83.6 113.9 93.1 87.2 ...
 $ Secondary_Boys        : num  102.9 73.8 88.4 65.6 57.7 ...
 $ Secondary_Girls       : num  97.4 76.8 84.9 77.2 63 ...
 $ Secondary_Total       : num  100.2 75.2 86.7 71.2 60.1 ...
 $ Higher_Secondary_Boys : num  105.4 59.8 65.2 31.8 23.3 ...
 $ Higher_Secondary_Girls: num  96.6 60.8 65.4 34.3 24.2 ...
 $ Higher_Secondary_Total: num  101.3 60.3 65.3 32.9 23.7 ...
 - attr(*, "na.action")= 'exclude' Named int [1:2] 26 99
  ..- attr(*, "names")= chr [1:2] "26" "99"
all_india_ger <- filter(ger,  State_UT=="All India") %>% 
  arrange(Year)
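
The Higher_Secondary columns were parsed as character because the file contains the strings "NR" and "@". As a sketch of an alternative, read_csv() can be told at read time which strings mean missing, so the columns arrive as numeric. The toy file and its values below are made up for illustration.

```r
library(readr)

# Toy CSV standing in for gross-enrollment-ratio.csv (values are made up).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "State_UT,Year,Higher_Secondary_Total",
  "State A,2013-14,60.3",
  "State B,2013-14,NR",
  "State C,2013-14,@"
), csv_path)

# Declaring the placeholders as missing lets readr parse the column as double.
toy <- read_csv(csv_path, na = c("", "NA", "NR", "@"), show_col_types = FALSE)

is.numeric(toy$Higher_Secondary_Total)
sum(is.na(toy$Higher_Secondary_Total))
```

With this approach the separate as.numeric() conversions would no longer be needed, although rows with missing values would still have to be handled explicitly.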

Plotting All India girls enrollment ratio

all_india_ger_girls <- select(all_india_ger,Year, ends_with("girls")) 
head(all_india_ger_girls)
     Year Primary_Girls Upper_Primary_Girls Secondary_Girls
1 2013-14        102.65               92.75           76.47
2 2014-15        101.43               95.29           78.94
3 2015-16        100.69               97.57           80.97
  Higher_Secondary_Girls
1                  51.58
2                  53.81
3                  56.41
fig1 <- pivot_longer(all_india_ger_girls, c(Primary_Girls, Upper_Primary_Girls, Secondary_Girls, Higher_Secondary_Girls), names_to = "Education_Level", values_to = "GER")
ggplot(fig1, aes(x = Year, y = GER, fill = Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(title = "fig1: Gross Enrollment Ratio of Girls in India") +
  geom_text(aes(label = GER), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black")

The GER of girls in primary school exceeds 100, which means that enrollment includes students outside the official age group: those repeating a grade, those who enrolled late and are older than their classmates, and those who advanced quickly and are younger than their classmates. This is a positive sign in a developing country like India. However, the primary ratio declined each year, from 102.65 in 2013-14 to 100.69 in 2015-16.
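
To see how a GER can exceed 100, recall that it divides total enrollment at a level, regardless of age, by the population of the official age group for that level. The counts below are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical counts for one level of education (not taken from the dataset).
official_age_population <- 1000  # children in the official primary age group
enrolled_official_age   <- 950   # enrolled pupils within that age group
enrolled_over_under_age <- 80    # repeaters, late entrants, early entrants

# GER counts every enrolled pupil, whatever their age.
ger_value <- 100 * (enrolled_official_age + enrolled_over_under_age) / official_age_population
ger_value  # 103
```

Here 95% of the official-age children are enrolled, yet the GER is 103 because the over-age and under-age pupils are counted in the numerator only.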

Plotting All India boys enrollment ratio
all_india_ger_boys <- select(all_india_ger, Year, ends_with("boys"))
head(all_india_ger_boys)
     Year Primary_Boys Upper_Primary_Boys Secondary_Boys
1 2013-14       100.20              86.31          76.80
2 2014-15        98.85              87.71          78.13
3 2015-16        97.87              88.72          79.16
  Higher_Secondary_Boys
1                 52.77
2                 54.57
3                 55.95
fig2 <- pivot_longer(all_india_ger_boys, c(Primary_Boys, Upper_Primary_Boys, Secondary_Boys, Higher_Secondary_Boys), names_to = "Education_Level", values_to = "GER")
ggplot(fig2, aes(x = Year, y = GER, fill = Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(title = "fig2: Gross Enrollment Ratio of Boys in India") +
  geom_text(aes(label = GER), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black")

The trend for primary schools is the same as for girls: the GER declines each year. More detailed data on the GER would help explain why it is decreasing for both girls and boys. Compared to girls, the GER of boys in upper primary is noticeably lower (86.31 to 88.72, against 92.75 to 97.57 for girls). This is a negative sign because it might affect boys' enrollment in secondary and higher secondary schools.

Plotting All India total enrollment ratio
all_india_ger_total <- select(all_india_ger, Year, ends_with("Total"))
head(all_india_ger_total)
     Year Primary_Total Upper_Primary_Total Secondary_Total
1 2013-14        101.36               89.33           76.64
2 2014-15        100.08               91.24           78.51
3 2015-16         99.21               92.81           80.01
  Higher_Secondary_Total
1                  52.21
2                  54.21
3                  56.16
fig3 <- pivot_longer(all_india_ger_total, c(Primary_Total, Upper_Primary_Total, Secondary_Total, Higher_Secondary_Total), names_to = "Education_Level", values_to = "GER")
ggplot(fig3, aes(x = Year, y = GER, fill = Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(title = "fig3: Total Gross Enrollment Ratio in India") +
  geom_text(aes(label = GER), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black")
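
The girls, boys and total plots follow the same select, pivot and plot steps, so they could be wrapped in one function. The helper name plot_ger and its arguments are hypothetical; this is a sketch of the pattern, not the code used for the figures. It uses geom_col(), which is shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: plots the GER columns whose names end in `suffix`.
plot_ger <- function(data, suffix, title) {
  long <- data %>%
    select(Year, ends_with(suffix)) %>%
    pivot_longer(-Year, names_to = "Education_Level", values_to = "GER")

  ggplot(long, aes(x = Year, y = GER, fill = Education_Level)) +
    geom_col(position = "dodge") +
    geom_text(aes(label = GER), size = 4,
              position = position_dodge(width = 0.9), vjust = 0) +
    labs(title = title)
}

# Toy input using two years of the All India primary values printed above.
toy <- data.frame(
  Year = c("2013-14", "2014-15"),
  Primary_Girls = c(102.65, 101.43),
  Primary_Boys = c(100.20, 98.85)
)
p <- plot_ger(toy, "Girls", "Gross Enrollment Ratio of Girls in India")
```

Calling plot_ger(all_india_ger, "Boys", ...) or plot_ger(all_india_ger, "Total", ...) would then produce the other two figures from the same code path.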

State-level GER data
states_ger <- filter(ger,  State_UT != "All India") 
head(states_ger)
                   State_UT    Year Primary_Boys Primary_Girls
1 Andaman & Nicobar Islands 2013-14        95.88         91.97
2            Andhra Pradesh 2013-14        96.62         96.87
3         Arunachal Pradesh 2013-14       129.12        127.77
4                     Assam 2013-14       111.77        115.16
5                     Bihar 2013-14        95.03        101.15
6                Chandigarh 2013-14        88.42         96.09
  Primary_Total Upper_Primary_Boys Upper_Primary_Girls
1         93.93              94.70               88.98
2         96.74              82.81               84.38
3        128.46             112.64              115.27
4        113.43              87.85               98.69
5         97.96              80.60               94.92
6         91.85              99.93              103.02
  Upper_Primary_Total Secondary_Boys Secondary_Girls Secondary_Total
1               91.83         102.89           97.36          100.16
2               83.57          73.76           76.77           75.20
3              113.94          88.37           84.89           86.65
4               93.13          65.60           77.20           71.21
5               87.24          57.66           62.96           60.08
6              101.27          92.08           92.16           92.11
  Higher_Secondary_Boys Higher_Secondary_Girls Higher_Secondary_Total
1                105.40                  96.61                 101.28
2                 59.83                  60.83                  60.30
3                 65.16                  65.38                  65.27
4                 31.78                  34.27                  32.94
5                 23.33                  24.17                  23.70
6                 90.50                  92.88                  91.49
library(ggmap)
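# Read the API key from an environment variable instead of hard-coding it (the variable name is an assumption)
register_google(key = Sys.getenv("GOOGLE_MAPS_API_KEY"))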
coordinates <- geocode(states_ger$State_UT)
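# geocode() returns one lon/lat row per input, in input order, so bind column-wise;
# merge() without a shared column would return every combination of rows
plot <- cbind(states_ger, coordinates)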
map <- get_map(location = 'India', zoom = 4, maptype= 'terrain', scale = "auto")
ggmap(map, fullpage= TRUE)
map
1280x1280 terrain map image from Google Maps. 
See ?ggmap to plot it.
# Each state is a single geocoded point, so draw points coloured by GER rather than polygons
a <- ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, colour = Primary_Boys))
b <- ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, colour = Upper_Primary_Boys))
c <- ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, colour = Secondary_Boys))
d <- ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, colour = Higher_Secondary_Boys))
cowplot::plot_grid(a, b, c, d)

# Each state is a single geocoded point, so draw points coloured by GER rather than polygons
a <- ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, colour = Primary_Girls))
b <- ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, colour = Upper_Primary_Girls))
c <- ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, colour = Secondary_Girls))
d <- ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, colour = Higher_Secondary_Girls))
cowplot::plot_grid(a, b, c, d)
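
When coordinates are attached to the state table, it helps to join on an explicit key. The sketch below uses toy values (the coordinates and ratios are invented) to show what base merge() does with and without a shared column.

```r
# Toy stand-ins; all values are invented for illustration.
states <- data.frame(State_UT = c("Assam", "Bihar", "Assam"),
                     Year = c("2013-14", "2013-14", "2014-15"),
                     Primary_Boys = c(110, 95, 109))
coords <- data.frame(lon = c(92.9, 85.3), lat = c(26.2, 25.1))

# No shared column: merge() returns every combination of rows.
crossed <- merge(states, coords)
nrow(crossed)  # 3 * 2 = 6 rows

# Adding the state name as a key gives one coordinate pair per state row.
coords$State_UT <- c("Assam", "Bihar")
keyed <- merge(states, coords, by = "State_UT")
nrow(keyed)  # 3 rows
```

Keying on the state name also means each state only needs to be geocoded once, rather than once per year.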

states_ger %>% 
  select(Year, State_UT, Primary_Boys,Upper_Primary_Boys, Secondary_Boys, Higher_Secondary_Boys) %>% 
  group_by(Year) %>% 
  summarise(avg_pb=mean(Primary_Boys), avg_upb=mean(Upper_Primary_Boys), avg_sb=mean(Secondary_Boys), avg_hsb=mean(Higher_Secondary_Boys))
# A tibble: 3 x 5
  Year    avg_pb avg_upb avg_sb avg_hsb
  <chr>    <dbl>   <dbl>  <dbl>   <dbl>
1 2013-14   105.    96.9   87.2    60.0
2 2014-15   102.    97.0   88.0    60.4
3 2015-16   100.    98.1   86.9    58.2
states_ger %>% 
  select(Year, Primary_Girls,Upper_Primary_Girls, Secondary_Girls, Higher_Secondary_Girls) %>% 
  group_by(Year) %>% 
  summarise(avg_pg=mean(Primary_Girls), avg_upg=mean(Upper_Primary_Girls), avg_sg=mean(Secondary_Girls), avg_hsg=mean(Higher_Secondary_Girls))
# A tibble: 3 x 5
  Year    avg_pg avg_upg avg_sg avg_hsg
  <chr>    <dbl>   <dbl>  <dbl>   <dbl>
1 2013-14   106.    99.8   88.0    60.5
2 2014-15   103.   102.    89.6    62.2
3 2015-16   101.   104.    89.4    61.8
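
The per-year averages above name each column by hand. A shorter variant, sketched here on a toy frame with invented values, uses dplyr's across() to average every column with a given suffix.

```r
library(dplyr)

# Toy data; the numbers are invented for illustration.
toy <- data.frame(
  Year = c("2013-14", "2013-14", "2014-15", "2014-15"),
  Primary_Boys = c(100, 110, 98, 106),
  Secondary_Boys = c(80, 90, 82, 92),
  Primary_Girls = c(101, 111, 99, 107)
)

# Average every column ending in "Boys" within each year.
boys_avg <- toy %>%
  group_by(Year) %>%
  summarise(across(ends_with("Boys"), mean, .names = "avg_{.col}"))
boys_avg
```

Swapping "Boys" for "Girls" gives the girls' averages without repeating the column list.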

Reading Dataframe-2 | Dropout Ratio across all Indian States from 2013-2016

dropout_ratio <- read_csv("601 Major Project/dropout-ratio.csv")
View(dropout_ratio)
head(dropout_ratio)
# A tibble: 6 x 14
  State_UT       year    Primary_Boys Primary_Girls Primary_Total
  <chr>          <chr>   <chr>        <chr>         <chr>        
1 A & N Islands  2012-13 0.83         0.51          0.68         
2 A & N Islands  2013-14 1.35         1.06          1.21         
3 A & N Islands  2014-15 0.47         0.55          0.51         
4 Andhra Pradesh 2012-13 3.3          3.05          3.18         
5 Andhra Pradesh 2013-14 4.31         4.39          4.35         
6 Andhra Pradesh 2014-15 6.57         6.89          6.72         
# ... with 9 more variables: `Upper Primary_Boys` <chr>,
#   `Upper Primary_Girls` <chr>, `Upper Primary_Total` <chr>,
#   `Secondary _Boys` <chr>, `Secondary _Girls` <chr>,
#   `Secondary _Total` <chr>, HrSecondary_Boys <chr>,
#   HrSecondary_Girls <chr>, HrSecondary_Total <chr>
colnames(dropout_ratio)
 [1] "State_UT"            "year"                "Primary_Boys"       
 [4] "Primary_Girls"       "Primary_Total"       "Upper Primary_Boys" 
 [7] "Upper Primary_Girls" "Upper Primary_Total" "Secondary _Boys"    
[10] "Secondary _Girls"    "Secondary _Total"    "HrSecondary_Boys"   
[13] "HrSecondary_Girls"   "HrSecondary_Total"  

Datatype of each column

str(dropout_ratio)
spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT           : chr [1:110] "A & N Islands" "A & N Islands" "A & N Islands" "Andhra Pradesh" ...
 $ year               : chr [1:110] "2012-13" "2013-14" "2014-15" "2012-13" ...
 $ Primary_Boys       : chr [1:110] "0.83" "1.35" "0.47" "3.3" ...
 $ Primary_Girls      : chr [1:110] "0.51" "1.06" "0.55" "3.05" ...
 $ Primary_Total      : chr [1:110] "0.68" "1.21" "0.51" "3.18" ...
 $ Upper Primary_Boys : chr [1:110] "Uppe_r_Primary" "NR" "1.44" "3.21" ...
 $ Upper Primary_Girls: chr [1:110] "1.09" "1.54" "1.95" "3.51" ...
 $ Upper Primary_Total: chr [1:110] "1.23" "0.51" "1.69" "3.36" ...
 $ Secondary _Boys    : chr [1:110] "5.57" "8.36" "11.47" "12.21" ...
 $ Secondary _Girls   : chr [1:110] "5.55" "5.98" "8.16" "13.25" ...
 $ Secondary _Total   : chr [1:110] "5.56" "7.2" "9.87" "12.72" ...
 $ HrSecondary_Boys   : chr [1:110] "17.66" "18.94" "21.05" "2.66" ...
 $ HrSecondary_Girls  : chr [1:110] "10.15" "12.2" "12.21" "NR" ...
 $ HrSecondary_Total  : chr [1:110] "14.14" "15.87" "16.93" "0.35" ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Boys = col_character(),
  ..   Primary_Girls = col_character(),
  ..   Primary_Total = col_character(),
  ..   `Upper Primary_Boys` = col_character(),
  ..   `Upper Primary_Girls` = col_character(),
  ..   `Upper Primary_Total` = col_character(),
  ..   `Secondary _Boys` = col_character(),
  ..   `Secondary _Girls` = col_character(),
  ..   `Secondary _Total` = col_character(),
  ..   HrSecondary_Boys = col_character(),
  ..   HrSecondary_Girls = col_character(),
  ..   HrSecondary_Total = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 
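
Every ratio column here is character because entries such as "NR" and "Uppe_r_Primary" are mixed in with the numbers. Before dropping rows, it can be useful to count how many values each column would lose. The following is a minimal base-R sketch on a toy frame; the values are illustrative, not the real data.

```r
# Toy frame mimicking the character columns above (values are illustrative).
toy <- data.frame(
  state_ut = c("A", "B", "C", "D"),
  primary_boys = c("0.83", "1.35", "NR", "3.3"),
  upper_primary_boys = c("Uppe_r_Primary", "NR", "1.44", "3.21"),
  stringsAsFactors = FALSE
)

value_cols <- setdiff(names(toy), "state_ut")

# as.numeric() turns anything non-numeric into NA (with a warning, silenced here).
toy[value_cols] <- lapply(toy[value_cols], function(x) suppressWarnings(as.numeric(x)))

# Missing values per column, and rows that would survive na.exclude().
na_counts <- colSums(is.na(toy[value_cols]))
complete_rows <- sum(complete.cases(toy))
na_counts
complete_rows
```

A per-column count like this shows whether a few sparse columns are responsible for most of the dropped rows.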

Tidying the data

library(janitor)
dropout_ratio <- clean_names(dropout_ratio)
View(dropout_ratio)
dim(dropout_ratio)
[1] 110  14
# "NR" and the stray text "Uppe_r_Primary" are treated as missing values
dropout_ratio[ dropout_ratio == "NR" ] <- NA
dropout_ratio[ dropout_ratio == "Uppe_r_Primary" ] <- NA
dropout_ratio <- data.frame(dropout_ratio)
dropout_ratio <- na.exclude(dropout_ratio)
View(dropout_ratio)
dim(dropout_ratio)
[1] 55 14
# Convert the twelve ratio columns from character to numeric
ratio_cols <- setdiff(names(dropout_ratio), c("state_ut", "year"))
dropout_ratio[ratio_cols] <- lapply(dropout_ratio[ratio_cols], as.numeric)
View(dropout_ratio)
str(dropout_ratio)
'data.frame':   55 obs. of  14 variables:
 $ state_ut           : chr  "A & N Islands" "Andhra Pradesh" "Arunachal  Pradesh" "Arunachal Pradesh" ...
 $ year               : chr  "2014-15" "2013-14" "2013-14" "2012-13" ...
 $ primary_boys       : num  0.47 4.31 11.54 15.84 11.51 ...
 $ primary_girls      : num  0.55 4.39 10.22 14.44 10.09 ...
 $ primary_total      : num  0.51 4.35 10.89 15.16 10.82 ...
 $ upper_primary_boys : num  1.44 3.46 4.44 5.86 5.31 7.89 7.6 6.47 3.31 3.7 ...
 $ upper_primary_girls: num  1.95 4.12 6.74 9.06 8.08 6.55 6.54 5.22 5.09 4.4 ...
 $ upper_primary_total: num  1.69 3.78 5.59 7.47 6.71 7.2 7.05 5.85 4.13 4.02 ...
 $ secondary_boys     : num  11.5 11.9 16.1 14 18.3 ...
 $ secondary_girls    : num  8.16 13.37 12.75 11.77 15.81 ...
 $ secondary_total    : num  9.87 12.65 14.49 12.93 17.11 ...
 $ hr_secondary_boys  : num  21.05 12.65 18.57 7.85 19.37 ...
 $ hr_secondary_girls : num  12.21 10.85 15.49 2.14 17.44 ...
 $ hr_secondary_total : num  16.93 11.79 17.07 5.11 18.42 ...
 - attr(*, "na.action")= 'exclude' Named int [1:55] 1 2 4 6 12 13 14 15 16 17 ...
  ..- attr(*, "names")= chr [1:55] "1" "2" "4" "6" ...
all_india_drop <- filter(dropout_ratio, state_ut=="All India") 
dim(all_india_drop)
[1]  1 14
fig4 <- pivot_longer(all_india_drop, c(primary_boys, primary_girls, primary_total, upper_primary_boys, upper_primary_girls, upper_primary_total, secondary_boys, secondary_girls, secondary_total, hr_secondary_girls, hr_secondary_boys, hr_secondary_total), names_to = "EducationLevel")
# all_india_drop has a single row, so each box collapses to a line at that level's value
ggplot(fig4, aes(x = value, y = EducationLevel, fill = value)) +
  geom_boxplot(color = "red") +
  geom_text(aes(label = value), size = 5) +
  labs(title = "fig4: All India Dropout Ratio - Education Wise")
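
The str() output above lists both "Arunachal  Pradesh" (two spaces) and "Arunachal Pradesh", and this dataframe uses "A & N Islands" where the enrollment data uses "Andaman & Nicobar Islands". If the dataframes are later joined by state, the names need to agree. A base-R sketch of one way to normalise them follows; the padded " Assam " entry and the aliases lookup are assumptions added for illustration.

```r
# Names as they appear in the outputs above, plus one hypothetical padded entry.
state_names <- c("Arunachal  Pradesh", "Arunachal Pradesh", " Assam ", "A & N Islands")

# Collapse runs of whitespace and trim both ends.
cleaned <- trimws(gsub("\\s+", " ", state_names))

# Map known abbreviations to the spelling used in the other dataframes.
aliases <- c("A & N Islands" = "Andaman & Nicobar Islands")
cleaned <- ifelse(cleaned %in% names(aliases), aliases[cleaned], cleaned)

unique(cleaned)
```

After this step the two Arunachal Pradesh spellings collapse into one value.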

Reading Dataframe-3 | Percentage of Schools with access to computers

percentage_of_schools_with_comps <- read_csv("601 Major Project/percentage-of-schools-with-comps.csv")
View(percentage_of_schools_with_comps)
head(percentage_of_schools_with_comps)
# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         30.4             73.7             89.7
2 Andaman & Nico~ 2014~         30.9             76.5             92.1
3 Andaman & Nico~ 2015~         28.4             78.6             92.5
4 Andhra Pradesh  2013~         12.7             42.7             87.0
5 Andhra Pradesh  2014~         10.3             44.2             88.5
6 Andhra Pradesh  2015~         11.5             44.8             89.5
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_comps)
 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(percentage_of_schools_with_comps)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 30.4 30.9 28.4 12.7 10.3 ...
 $ Primary_with_U_Primary          : num [1:110] 73.7 76.5 78.6 42.7 44.1 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 89.7 92.1 92.5 87 88.5 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 45.5 50 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 94.7 94.7 17.1 62.2 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 97.9 100 100 68.2 68.4 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 73.2 76.6 ...
 $ Sec_Only                        : num [1:110] 0 0 0 60 71 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 33.3 66.7 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 19.3 41.6 ...
 $ All Schools                     : num [1:110] 53.1 57.2 57 29.6 28.1 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Reading Dataframe-4 | Percentage of Schools with Electricity

percentage_of_schools_with_electricity <- read_csv("601 Major Project/percentage-of-schools-with-electricity.csv")
View(percentage_of_schools_with_electricity)
head(percentage_of_schools_with_electricity)
# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         82.4             96.0            100  
2 Andaman & Nico~ 2014~         80.7             96.3            100  
3 Andaman & Nico~ 2015~         82.1             97.6            100  
4 Andhra Pradesh  2013~         87.7             93.6             99.3
5 Andhra Pradesh  2014~         91.1             94.7            100  
6 Andhra Pradesh  2015~         91.6             95.6            100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_electricity)
 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(percentage_of_schools_with_electricity)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 82.4 80.7 82.1 87.7 91.1 ...
 $ Primary_with_U_Primary          : num [1:110] 96 96.3 97.6 93.6 94.7 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.3 100 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 100 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 67.5 86.1 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 96.2 97.6 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 96.2 97.1 ...
 $ Sec_Only                        : num [1:110] 0 0 0 97.5 93.5 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 100 83.3 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 91.3 93.2 ...
 $ All Schools                     : num [1:110] 88.9 88.9 90.1 90.3 92.8 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
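
The stated goal is to relate facility coverage, such as the electricity percentages above, to the dropout ratio. One way to do that is to join the two tables on state and year and compute a correlation. The sketch below uses invented numbers and lower-case column names as an assumption; it shows the mechanics only and says nothing about the real relationship.

```r
# Toy state-year tables; every number here is invented for illustration.
facility <- data.frame(
  state_ut = c("A", "B", "C", "D"),
  year = "2014-15",
  all_schools = c(90, 60, 75, 40)      # % of schools with electricity
)
dropout <- data.frame(
  state_ut = c("A", "B", "C", "D"),
  year = "2014-15",
  secondary_total = c(10, 18, 14, 25)  # secondary dropout ratio
)

# Join on the shared keys, then correlate the two measures.
joined <- merge(facility, dropout, by = c("state_ut", "year"))
r <- cor(joined$all_schools, joined$secondary_total)
r
```

A negative r in the real data would be consistent with better-equipped states having lower dropout, though a correlation alone would not establish cause.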

Reading Dataframe-5 | Percentage of Schools with water facility

percentage_of_schools_with_water_facility <- read_csv("601 Major Project/percentage-of-schools-with-water-facility.csv")
View(percentage_of_schools_with_water_facility)
head(percentage_of_schools_with_water_facility)
# A tibble: 6 x 13
  `State/UT`      Year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         98.2             98.7            100  
2 Andaman & Nico~ 2014~         99.6             98.8            100  
3 Andaman & Nico~ 2015~        100              100              100  
4 Andhra Pradesh  2013~         86.9             94.5             99.7
5 Andhra Pradesh  2014~         91.8             96.1            100  
6 Andhra Pradesh  2015~         93.9             97.0            100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_water_facility)
 [1] "State/UT"                        
 [2] "Year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(percentage_of_schools_with_water_facility)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State/UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ Year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 98.2 99.5 100 86.9 91.8 ...
 $ Primary_with_U_Primary          : num [1:110] 98.7 98.8 100 94.5 96.1 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.7 100 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 90.9 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 87.3 90 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 98.8 99.6 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 96 97.5 ...
 $ Sec_Only                        : num [1:110] 0 0 0 97.5 100 100 0 0 0 88.3 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 100 100 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 97.5 98.4 ...
 $ All Schools                     : num [1:110] 98.7 99.5 100 90.3 93.7 ...
 - attr(*, "spec")=
  .. cols(
  ..   `State/UT` = col_character(),
  ..   Year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
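
This file names its key columns `State/UT` and `Year`, while the computer and electricity files use `State_UT` and `year`. The names would need to match before the dataframes can be joined. janitor's clean_names(), used earlier on the dropout data, is one option; the base-R helper below (harmonise is a hypothetical name) sketches the same idea.

```r
# Column names copied from the water-facility and computer-access outputs.
water_names <- c("State/UT", "Year", "Primary_Only", "All Schools")
comps_names <- c("State_UT", "year", "Primary_Only", "All Schools")

# Hypothetical helper: lower-case and replace non-alphanumerics with "_".
harmonise <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_|_$", "", x)
}

harmonise(water_names)
identical(harmonise(water_names), harmonise(comps_names))
```

Applying the same renaming to every dataframe gives a common state_ut and year pair to join on.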

Reading Dataframe-6 | Percentage of Schools with boys' toilets

schools_with_boys_toilet <- read_csv("601 Major Project/schools-with-boys-toilet.csv")
View(schools_with_boys_toilet)
head(schools_with_boys_toilet)
# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         91.6             97.4            100  
2 Andaman & Nico~ 2014~        100              100              100  
3 Andaman & Nico~ 2015~        100              100              100  
4 Andhra Pradesh  2013~         53.0             62.6             82.0
5 Andhra Pradesh  2014~         57.9             76.5             96  
6 Andhra Pradesh  2015~         99.6             99.9             99.0
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_boys_toilet)
 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(schools_with_boys_toilet)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 91.6 100 100 53 57.9 ...
 $ Primary_with_U_Primary          : num [1:110] 97.4 100 100 62.6 76.5 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 82 96 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 45.5 75 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 64.1 93.3 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 76.2 91.4 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 60.6 78 ...
 $ Sec_Only                        : num [1:110] 0 0 0 59.3 80.7 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 85.7 60 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 73.4 86.5 ...
 $ All Schools                     : num [1:110] 94.5 100 100 56.9 65.3 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
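
Two of the column names printed above are awkward to use in later code: `Sec_with_HrSec.` ends in a stray dot and `All Schools` contains a space, so it has to be wrapped in backticks. A minimal sketch of one way to tidy them with `dplyr::rename()` follows. The object names `boys_toilet_sample` and `boys_toilet_clean` and the new column names are assumptions made for illustration; the two rows are copied from the output above.

```r
library(dplyr)
library(tibble)

# Two rows (2013-14) copied from the str() output above; four columns kept.
boys_toilet_sample <- tibble(
  State_UT = c("Andaman & Nicobar Islands", "Andhra Pradesh"),
  year = c("2013-14", "2013-14"),
  `Sec_with_HrSec.` = c(100, 85.7),
  `All Schools` = c(94.5, 56.9)
)

# rename(new = old): drop the trailing dot and replace the space.
# The new names are hypothetical, chosen only to be syntactic.
boys_toilet_clean <- boys_toilet_sample %>%
  rename(
    Sec_with_HrSec = `Sec_with_HrSec.`,
    All_Schools = `All Schools`
  )

colnames(boys_toilet_clean)
```

The same call could be applied to the full `schools_with_boys_toilet` tibble, since it only touches the two named columns.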

Reading Dataframe-7 | Percentage of Schools with girls toilet

schools_with_girls_toilet <- read_csv("601 Major Project/schools-with-girls-toilet.csv")
View(schools_with_girls_toilet)
head(schools_with_girls_toilet)
# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 All India       2013~         88.7             96.0             98.8
2 All India       2014~         91.2             96.9             99.5
3 All India       2015~         97.0             99.0             99.7
4 Andaman & Nico~ 2013~         89.7             97.4            100  
5 Andaman & Nico~ 2014~        100              100              100  
6 Andaman & Nico~ 2015~        100              100              100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_girls_toilet)
 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(schools_with_girls_toilet)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "All India" "All India" "All India" "Andaman & Nicobar Islands" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 88.7 91.2 97 89.7 100 ...
 $ Primary_with_U_Primary          : num [1:110] 96 96.9 99 97.4 100 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 98.8 99.5 99.7 100 100 ...
 $ U_Primary_Only                  : num [1:110] 91.4 91.4 96.3 0 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 98.2 99.2 99.6 100 100 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 97.3 98.2 99.3 100 100 ...
 $ U_Primary_With_Sec              : num [1:110] 94.4 96.6 98.8 0 0 ...
 $ Sec_Only                        : num [1:110] 99.1 90.3 95.2 0 0 ...
 $ Sec_with_HrSec.                 : num [1:110] 98.4 94 98.3 100 100 ...
 $ HrSec_Only                      : num [1:110] 76.1 90.9 96.2 0 0 ...
 $ All Schools                     : num [1:110] 91.2 93.1 97.5 93.4 100 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
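
The printed head of the girls' table starts with "All India" rows, while the head of the boys' table starts with Andaman & Nicobar Islands, so the two tables should probably be matched by key rather than by row position. The sketch below joins on `State_UT` and `year` and computes a boys-minus-girls gap for `Primary_Only`. The object names (`boys_sample`, `girls_sample`, `toilet_gap`) and the `gap` column are assumptions for illustration; the values are copied from the output above.

```r
library(dplyr)
library(tibble)

# Primary_Only values copied from the head() output of each table.
boys_sample <- tibble(
  State_UT = rep("Andaman & Nicobar Islands", 3),
  year = c("2013-14", "2014-15", "2015-16"),
  Primary_Only = c(91.6, 100, 100)
)

girls_sample <- tibble(
  State_UT = c(rep("All India", 3), rep("Andaman & Nicobar Islands", 3)),
  year = rep(c("2013-14", "2014-15", "2015-16"), 2),
  Primary_Only = c(88.7, 91.2, 97.0, 89.7, 100, 100)
)

# Keep only state-year pairs present in both samples, then compute
# the difference in percentage points (boys minus girls).
toilet_gap <- inner_join(
  boys_sample, girls_sample,
  by = c("State_UT", "year"),
  suffix = c("_boys", "_girls")
) %>%
  mutate(gap = Primary_Only_boys - Primary_Only_girls)

toilet_gap
```

An `inner_join()` silently drops unmatched rows (here the "All India" rows of the girls' sample); `full_join()` would keep them with `NA` on the missing side, which may be preferable when checking coverage.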

Preliminary Research Questions

1. Do the available facilities affect the school dropout ratio of girls and boys across Indian states from 2013 to 2016?
2. How are the Gross Enrollment Ratio (GER) and the dropout ratio correlated?
3. How do the 7 dataframes compare state-wise, year-wise, and education-level-wise?
4. What suggestions can be proposed to the Indian Government based on the analysis?
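
For the state-wise, year-wise, and level-wise comparison in question 3, a long layout with one row per state, year, and school type may be easier to group and plot than the wide tables read above. The following is a minimal sketch using `tidyr::pivot_longer()` on the Andhra Pradesh rows shown earlier for the boys' toilet table. The names `boys_wide`, `boys_long`, `by_year`, `school_type`, and `pct_with_boys_toilet` are assumptions for illustration, and the per-year mean is unweighted across school types.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Andhra Pradesh rows copied from the head() output of the boys' toilet table.
boys_wide <- tibble(
  State_UT = rep("Andhra Pradesh", 3),
  year = c("2013-14", "2014-15", "2015-16"),
  Primary_Only = c(53.0, 57.9, 99.6),
  Primary_with_U_Primary = c(62.6, 76.5, 99.9)
)

# One row per state, year, and school type.
boys_long <- boys_wide %>%
  pivot_longer(
    cols = -c(State_UT, year),
    names_to = "school_type",
    values_to = "pct_with_boys_toilet"
  )

# Unweighted mean across the two school types for each year.
by_year <- boys_long %>%
  group_by(year) %>%
  summarise(mean_pct = mean(pct_with_boys_toilet), .groups = "drop")

by_year
```

Swapping `group_by(year)` for `group_by(State_UT)` or `group_by(school_type)` would give the state-wise or level-wise view from the same long table.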

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Pola (2022, May 4). Data Analytics and Computational Social Science: HW-4 | Descriptive Statistics and Few graphs. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomniharika898262/

BibTeX citation

@misc{pola2022hw-4,
  author = {Pola, Niharika},
  title = {Data Analytics and Computational Social Science: HW-4 | Descriptive Statistics and Few graphs},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomniharika898262/},
  year = {2022}
}