HW-3 | Major Project Dataset and Preliminary Research Questions

Statistical Analysis of Indian Education System

Niharika Pola
2022-05-11

This project analyzes 7 dataframes covering 2013-2016 on the Indian education system, extracted from the Indian Government’s data management website, data.gov.in. The main objective is to study how school facilities (access to computers, sanitation, and electricity) affect the dropout ratio across education levels (primary, upper primary, secondary, and higher secondary) in all states of the country. I also aim to analyze how each of the other dataframes correlates with the dropout ratio.

Loading the packages
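
The rendered page does not show the code for this step. A minimal sketch, assuming the readr and dplyr packages, since the code below calls `read_csv()` and `select()`; the original chunk may have loaded the whole tidyverse instead.

```r
# Assumed package set: readr supplies read_csv(), dplyr supplies select().
library(readr)
library(dplyr)
```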

Reading Dataframe-1 | Gross Enrollment Ratio from 2013-2016 across all Indian States

gross_enrollment_ratio <- read_csv("601 Major Project/gross-enrollment-ratio.csv")
View(gross_enrollment_ratio)
head(gross_enrollment_ratio)
# A tibble: 6 x 14
  State_UT              Year  Primary_Boys Primary_Girls Primary_Total
  <chr>                 <chr>        <dbl>         <dbl>         <dbl>
1 Andaman & Nicobar Is~ 2013~         95.9          92.0          93.9
2 Andhra Pradesh        2013~         96.6          96.9          96.7
3 Arunachal Pradesh     2013~        129.          128.          128. 
4 Assam                 2013~        112.          115.          113. 
5 Bihar                 2013~         95.0         101.           98.0
6 Chandigarh            2013~         88.4          96.1          91.8
# ... with 9 more variables: Upper_Primary_Boys <dbl>,
#   Upper_Primary_Girls <dbl>, Upper_Primary_Total <dbl>,
#   Secondary_Boys <dbl>, Secondary_Girls <dbl>,
#   Secondary_Total <dbl>, Higher_Secondary_Boys <chr>,
#   Higher_Secondary_Girls <chr>, Higher_Secondary_Total <chr>
colnames(gross_enrollment_ratio)
 [1] "State_UT"               "Year"                  
 [3] "Primary_Boys"           "Primary_Girls"         
 [5] "Primary_Total"          "Upper_Primary_Boys"    
 [7] "Upper_Primary_Girls"    "Upper_Primary_Total"   
 [9] "Secondary_Boys"         "Secondary_Girls"       
[11] "Secondary_Total"        "Higher_Secondary_Boys" 
[13] "Higher_Secondary_Girls" "Higher_Secondary_Total"

Datatypes of each column

str(gross_enrollment_ratio)
spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT              : chr [1:110] "Andaman & Nicobar Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" ...
 $ Year                  : chr [1:110] "2013-14" "2013-14" "2013-14" "2013-14" ...
 $ Primary_Boys          : num [1:110] 95.9 96.6 129.1 111.8 95 ...
 $ Primary_Girls         : num [1:110] 92 96.9 127.8 115.2 101.2 ...
 $ Primary_Total         : num [1:110] 93.9 96.7 128.5 113.4 98 ...
 $ Upper_Primary_Boys    : num [1:110] 94.7 82.8 112.6 87.8 80.6 ...
 $ Upper_Primary_Girls   : num [1:110] 89 84.4 115.3 98.7 94.9 ...
 $ Upper_Primary_Total   : num [1:110] 91.8 83.6 113.9 93.1 87.2 ...
 $ Secondary_Boys        : num [1:110] 102.9 73.8 88.4 65.6 57.7 ...
 $ Secondary_Girls       : num [1:110] 97.4 76.8 84.9 77.2 63 ...
 $ Secondary_Total       : num [1:110] 100.2 75.2 86.7 71.2 60.1 ...
 $ Higher_Secondary_Boys : chr [1:110] "105.4" "59.83" "65.16" "31.78" ...
 $ Higher_Secondary_Girls: chr [1:110] "96.61" "60.83" "65.38" "34.27" ...
 $ Higher_Secondary_Total: chr [1:110] "101.28" "60.3" "65.27" "32.94" ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   Year = col_character(),
  ..   Primary_Boys = col_double(),
  ..   Primary_Girls = col_double(),
  ..   Primary_Total = col_double(),
  ..   Upper_Primary_Boys = col_double(),
  ..   Upper_Primary_Girls = col_double(),
  ..   Upper_Primary_Total = col_double(),
  ..   Secondary_Boys = col_double(),
  ..   Secondary_Girls = col_double(),
  ..   Secondary_Total = col_double(),
  ..   Higher_Secondary_Boys = col_character(),
  ..   Higher_Secondary_Girls = col_character(),
  ..   Higher_Secondary_Total = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

Tidying the data

gross_enrollment_ratio[ gross_enrollment_ratio == "NR" ] <- NA
gross_enrollment_ratio[ gross_enrollment_ratio == "@" ] <- NA
select(gross_enrollment_ratio, 'Higher_Secondary_Boys', 'Higher_Secondary_Girls', 'Higher_Secondary_Total')
# A tibble: 110 x 3
   Higher_Secondary_Boys Higher_Secondary_Girls Higher_Secondary_Total
   <chr>                 <chr>                  <chr>                 
 1 105.4                 96.61                  101.28                
 2 59.83                 60.83                  60.3                  
 3 65.16                 65.38                  65.27                 
 4 31.78                 34.27                  32.94                 
 5 23.33                 24.17                  23.7                  
 6 90.5                  92.88                  91.49                 
 7 58.27                 56.16                  57.23                 
 8 37.77                 41.99                  39.64                 
 9 34.37                 64.55                  44.36                 
10 98.88                 102.3                  100.42                
# ... with 100 more rows
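
The output above shows that the three Higher_Secondary columns are still of type `<chr>` after the markers are replaced with NA. A possible next step, sketched here on a small toy tibble rather than the real file, is to convert them to numeric. The object names `toy_ger` and `toy_ger_num` and the "Example State" row are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-in for gross_enrollment_ratio; "NR" and "@" mimic the markers handled above.
toy_ger <- tibble(
  State_UT               = c("Andaman & Nicobar Islands", "Andhra Pradesh", "Example State"),
  Higher_Secondary_Boys  = c("105.4", "59.83", "NR"),
  Higher_Secondary_Girls = c("96.61", "60.83", "@"),
  Higher_Secondary_Total = c("101.28", "60.3", "NR")
)

toy_ger_num <- toy_ger %>%
  # Turn the placeholder strings into NA first, so as.numeric() raises no warnings.
  mutate(across(starts_with("Higher_Secondary"), ~ na_if(.x, "NR"))) %>%
  mutate(across(starts_with("Higher_Secondary"), ~ na_if(.x, "@"))) %>%
  # Then convert the cleaned character columns to double.
  mutate(across(starts_with("Higher_Secondary"), as.numeric))
```

The same pattern should apply to the full dataframe by replacing `toy_ger` with `gross_enrollment_ratio`.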

Reading Dataframe-2 | Dropout Ratio across all Indian States from 2013-2016

dropout_ratio <- read_csv("601 Major Project/dropout-ratio.csv")
View(dropout_ratio)
head(dropout_ratio)
# A tibble: 6 x 14
  State_UT       year    Primary_Boys Primary_Girls Primary_Total
  <chr>          <chr>   <chr>        <chr>         <chr>        
1 A & N Islands  2012-13 0.83         0.51          0.68         
2 A & N Islands  2013-14 1.35         1.06          1.21         
3 A & N Islands  2014-15 0.47         0.55          0.51         
4 Andhra Pradesh 2012-13 3.3          3.05          3.18         
5 Andhra Pradesh 2013-14 4.31         4.39          4.35         
6 Andhra Pradesh 2014-15 6.57         6.89          6.72         
# ... with 9 more variables: `Upper Primary_Boys` <chr>,
#   `Upper Primary_Girls` <chr>, `Upper Primary_Total` <chr>,
#   `Secondary _Boys` <chr>, `Secondary _Girls` <chr>,
#   `Secondary _Total` <chr>, HrSecondary_Boys <chr>,
#   HrSecondary_Girls <chr>, HrSecondary_Total <chr>
colnames(dropout_ratio)
 [1] "State_UT"            "year"                "Primary_Boys"       
 [4] "Primary_Girls"       "Primary_Total"       "Upper Primary_Boys" 
 [7] "Upper Primary_Girls" "Upper Primary_Total" "Secondary _Boys"    
[10] "Secondary _Girls"    "Secondary _Total"    "HrSecondary_Boys"   
[13] "HrSecondary_Girls"   "HrSecondary_Total"  

Datatype of each column

str(dropout_ratio)
spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT           : chr [1:110] "A & N Islands" "A & N Islands" "A & N Islands" "Andhra Pradesh" ...
 $ year               : chr [1:110] "2012-13" "2013-14" "2014-15" "2012-13" ...
 $ Primary_Boys       : chr [1:110] "0.83" "1.35" "0.47" "3.3" ...
 $ Primary_Girls      : chr [1:110] "0.51" "1.06" "0.55" "3.05" ...
 $ Primary_Total      : chr [1:110] "0.68" "1.21" "0.51" "3.18" ...
 $ Upper Primary_Boys : chr [1:110] "Uppe_r_Primary" "NR" "1.44" "3.21" ...
 $ Upper Primary_Girls: chr [1:110] "1.09" "1.54" "1.95" "3.51" ...
 $ Upper Primary_Total: chr [1:110] "1.23" "0.51" "1.69" "3.36" ...
 $ Secondary _Boys    : chr [1:110] "5.57" "8.36" "11.47" "12.21" ...
 $ Secondary _Girls   : chr [1:110] "5.55" "5.98" "8.16" "13.25" ...
 $ Secondary _Total   : chr [1:110] "5.56" "7.2" "9.87" "12.72" ...
 $ HrSecondary_Boys   : chr [1:110] "17.66" "18.94" "21.05" "2.66" ...
 $ HrSecondary_Girls  : chr [1:110] "10.15" "12.2" "12.21" "NR" ...
 $ HrSecondary_Total  : chr [1:110] "14.14" "15.87" "16.93" "0.35" ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Boys = col_character(),
  ..   Primary_Girls = col_character(),
  ..   Primary_Total = col_character(),
  ..   `Upper Primary_Boys` = col_character(),
  ..   `Upper Primary_Girls` = col_character(),
  ..   `Upper Primary_Total` = col_character(),
  ..   `Secondary _Boys` = col_character(),
  ..   `Secondary _Girls` = col_character(),
  ..   `Secondary _Total` = col_character(),
  ..   HrSecondary_Boys = col_character(),
  ..   HrSecondary_Girls = col_character(),
  ..   HrSecondary_Total = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

Tidying the data

dropout_ratio[ dropout_ratio == "NR" ] <- NA
dropout_ratio[ dropout_ratio == "Upper Primary_Boys" ] <- NA
dropout_ratio[ dropout_ratio == "Uppe_r_Primary" ] <- NA
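
Two issues remain after this step: several column names contain spaces (for example `Upper Primary_Boys` and `Secondary _Boys`), and every measure is still stored as character. A minimal sketch of one way to handle both, on a toy tibble; `clean_names()` is a hypothetical helper written here, not a function from any package.

```r
library(dplyr)

# Toy stand-in for dropout_ratio, using the column names and a few values shown above.
toy_dropout <- tibble(
  State_UT             = rep("A & N Islands", 3),
  year                 = c("2012-13", "2013-14", "2014-15"),
  Primary_Boys         = c("0.83", "1.35", "0.47"),
  `Upper Primary_Boys` = c("Uppe_r_Primary", "NR", "1.44"),
  `Secondary _Boys`    = c("5.57", "8.36", "11.47")
)

# Hypothetical helper: drop whitespace before "_" and turn remaining whitespace into "_".
clean_names <- function(x) gsub("\\s+", "_", gsub("\\s+_", "_", x))

toy_dropout_clean <- toy_dropout %>%
  rename_with(clean_names) %>%
  # Any non-numeric string (e.g. "NR", "Uppe_r_Primary") becomes NA on conversion.
  mutate(across(-c(State_UT, year), ~ suppressWarnings(as.numeric(.x))))
```

Because the conversion turns every non-numeric string into NA, it is worth checking the NA counts afterwards to confirm that only the known markers were lost.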

Reading Dataframe-3 | Percentage of Schools with access to computers

percentage_of_schools_with_comps <- read_csv("601 Major Project/percentage-of-schools-with-comps.csv")
View(percentage_of_schools_with_comps)
head(percentage_of_schools_with_comps)
# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         30.4             73.7             89.7
2 Andaman & Nico~ 2014~         30.9             76.5             92.1
3 Andaman & Nico~ 2015~         28.4             78.6             92.5
4 Andhra Pradesh  2013~         12.7             42.7             87.0
5 Andhra Pradesh  2014~         10.3             44.2             88.5
6 Andhra Pradesh  2015~         11.5             44.8             89.5
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_comps)
 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(percentage_of_schools_with_comps)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 30.4 30.9 28.4 12.7 10.3 ...
 $ Primary_with_U_Primary          : num [1:110] 73.7 76.5 78.6 42.7 44.1 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 89.7 92.1 92.5 87 88.5 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 45.5 50 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 94.7 94.7 17.1 62.2 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 97.9 100 100 68.2 68.4 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 73.2 76.6 ...
 $ Sec_Only                        : num [1:110] 0 0 0 60 71 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 33.3 66.7 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 19.3 41.6 ...
 $ All Schools                     : num [1:110] 53.1 57.2 57 29.6 28.1 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Reading Dataframe-4 | Percentage of Schools with Electricity

percentage_of_schools_with_electricity <- read_csv("601 Major Project/percentage-of-schools-with-electricity.csv")
View(percentage_of_schools_with_electricity)
head(percentage_of_schools_with_electricity)
# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         82.4             96.0            100  
2 Andaman & Nico~ 2014~         80.7             96.3            100  
3 Andaman & Nico~ 2015~         82.1             97.6            100  
4 Andhra Pradesh  2013~         87.7             93.6             99.3
5 Andhra Pradesh  2014~         91.1             94.7            100  
6 Andhra Pradesh  2015~         91.6             95.6            100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_electricity)
 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(percentage_of_schools_with_electricity)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 82.4 80.7 82.1 87.7 91.1 ...
 $ Primary_with_U_Primary          : num [1:110] 96 96.3 97.6 93.6 94.7 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.3 100 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 100 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 67.5 86.1 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 96.2 97.6 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 96.2 97.1 ...
 $ Sec_Only                        : num [1:110] 0 0 0 97.5 93.5 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 100 83.3 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 91.3 93.2 ...
 $ All Schools                     : num [1:110] 88.9 88.9 90.1 90.3 92.8 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Reading Dataframe-5 | Percentage of Schools with water facility

percentage_of_schools_with_water_facility <- read_csv("601 Major Project/percentage-of-schools-with-water-facility.csv")
View(percentage_of_schools_with_water_facility)
head(percentage_of_schools_with_water_facility)
# A tibble: 6 x 13
  `State/UT`      Year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         98.2             98.7            100  
2 Andaman & Nico~ 2014~         99.6             98.8            100  
3 Andaman & Nico~ 2015~        100              100              100  
4 Andhra Pradesh  2013~         86.9             94.5             99.7
5 Andhra Pradesh  2014~         91.8             96.1            100  
6 Andhra Pradesh  2015~         93.9             97.0            100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_water_facility)
 [1] "State/UT"                        
 [2] "Year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(percentage_of_schools_with_water_facility)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State/UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ Year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 98.2 99.5 100 86.9 91.8 ...
 $ Primary_with_U_Primary          : num [1:110] 98.7 98.8 100 94.5 96.1 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.7 100 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 90.9 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 87.3 90 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 98.8 99.6 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 96 97.5 ...
 $ Sec_Only                        : num [1:110] 0 0 0 97.5 100 100 0 0 0 88.3 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 100 100 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 97.5 98.4 ...
 $ All Schools                     : num [1:110] 98.7 99.5 100 90.3 93.7 ...
 - attr(*, "spec")=
  .. cols(
  ..   `State/UT` = col_character(),
  ..   Year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
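
Unlike the computers and electricity dataframes, this one names its key columns `State/UT` and `Year` rather than `State_UT` and `year` (the gross enrollment dataframe also uses `Year`). A small sketch of aligning the names before any comparison across dataframes, shown on a toy tibble; `toy_water` and `toy_water_renamed` are illustrative names.

```r
library(dplyr)

# Toy stand-in for percentage_of_schools_with_water_facility.
toy_water <- tibble(
  `State/UT`   = c("Andaman & Nicobar Islands", "Andhra Pradesh"),
  Year         = c("2013-14", "2013-14"),
  Primary_Only = c(98.2, 86.9)
)

# rename() takes new_name = old_name; backticks are needed for the "/" in the old name.
toy_water_renamed <- toy_water %>%
  rename(State_UT = `State/UT`, year = Year)
```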

Reading Dataframe-6 | Percentage of Schools with boys' toilets

schools_with_boys_toilet <- read_csv("601 Major Project/schools-with-boys-toilet.csv")
View(schools_with_boys_toilet)
head(schools_with_boys_toilet)
# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         91.6             97.4            100  
2 Andaman & Nico~ 2014~        100              100              100  
3 Andaman & Nico~ 2015~        100              100              100  
4 Andhra Pradesh  2013~         53.0             62.6             82.0
5 Andhra Pradesh  2014~         57.9             76.5             96  
6 Andhra Pradesh  2015~         99.6             99.9             99.0
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_boys_toilet)
 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(schools_with_boys_toilet)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 91.6 100 100 53 57.9 ...
 $ Primary_with_U_Primary          : num [1:110] 97.4 100 100 62.6 76.5 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 82 96 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 45.5 75 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 64.1 93.3 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 76.2 91.4 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 60.6 78 ...
 $ Sec_Only                        : num [1:110] 0 0 0 59.3 80.7 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 85.7 60 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 73.4 86.5 ...
 $ All Schools                     : num [1:110] 94.5 100 100 56.9 65.3 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Reading Dataframe-7 | Percentage of Schools with girls' toilets

schools_with_girls_toilet <- read_csv("601 Major Project/schools-with-girls-toilet.csv")
View(schools_with_girls_toilet)
head(schools_with_girls_toilet)
# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 All India       2013~         88.7             96.0             98.8
2 All India       2014~         91.2             96.9             99.5
3 All India       2015~         97.0             99.0             99.7
4 Andaman & Nico~ 2013~         89.7             97.4            100  
5 Andaman & Nico~ 2014~        100              100              100  
6 Andaman & Nico~ 2015~        100              100              100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_girls_toilet)
 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"                     

Datatype of each column

str(schools_with_girls_toilet)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "All India" "All India" "All India" "Andaman & Nicobar Islands" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 88.7 91.2 97 89.7 100 ...
 $ Primary_with_U_Primary          : num [1:110] 96 96.9 99 97.4 100 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 98.8 99.5 99.7 100 100 ...
 $ U_Primary_Only                  : num [1:110] 91.4 91.4 96.3 0 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 98.2 99.2 99.6 100 100 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 97.3 98.2 99.3 100 100 ...
 $ U_Primary_With_Sec              : num [1:110] 94.4 96.6 98.8 0 0 ...
 $ Sec_Only                        : num [1:110] 99.1 90.3 95.2 0 0 ...
 $ Sec_with_HrSec.                 : num [1:110] 98.4 94 98.3 100 100 ...
 $ HrSec_Only                      : num [1:110] 76.1 90.9 96.2 0 0 ...
 $ All Schools                     : num [1:110] 91.2 93.1 97.5 93.4 100 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
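
Two details in the outputs above matter when linking these dataframes to the dropout ratio: this dataframe starts with aggregate "All India" rows, and the dropout dataframe writes "A & N Islands" where the facility dataframes write "Andaman & Nicobar Islands". The sketch below shows one way to handle both before joining, on toy tibbles. It assumes the two spellings refer to the same territory and that other states may need similar mappings.

```r
library(dplyr)

# Toy stand-ins built from a few values shown in the outputs above.
toy_dropout <- tibble(
  State_UT      = c("A & N Islands", "Andhra Pradesh"),
  year          = c("2013-14", "2013-14"),
  Primary_Total = c(1.21, 4.35)
)

toy_girls_toilet <- tibble(
  State_UT     = c("All India", "Andaman & Nicobar Islands"),
  year         = c("2013-14", "2013-14"),
  Primary_Only = c(88.7, 89.7)
)

joined <- toy_dropout %>%
  # Assumed mapping: harmonize the abbreviated spelling to the long form.
  mutate(State_UT = if_else(State_UT == "A & N Islands",
                            "Andaman & Nicobar Islands", State_UT)) %>%
  # Drop the national aggregate so only state-level rows are matched.
  inner_join(filter(toy_girls_toilet, State_UT != "All India"),
             by = c("State_UT", "year"))
```

With `inner_join()`, a state that appears in only one table drops out silently (Andhra Pradesh in this toy example), so comparing row counts before and after the join is a useful check.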

Preliminary Research Questions:

  1. Which Indian states have the highest dropout percentages at each education level from 2013-2016, and do the facilities available in those states affect the number of dropouts?
  2. How does the gross enrollment ratio in India look from 2013-2016?
  3. How do all 7 dataframes compare state-wise, year-wise, and by education level?
  4. What suggestions can be proposed to the Indian Government based on the analysis?
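
As a starting point for the first question, the sketch below reshapes a toy dropout table to long format and picks the state with the highest dropout ratio for each year and level. It assumes the columns have already been cleaned and converted to numeric; `toy_dropout` and `top_dropout` are illustrative names, and the tidyr package is an assumption beyond the packages used above.

```r
library(dplyr)
library(tidyr)

# Toy stand-in with numeric, cleanly named columns (values taken from the outputs above).
toy_dropout <- tibble(
  State_UT        = c("A & N Islands", "Andhra Pradesh"),
  year            = c("2012-13", "2012-13"),
  Primary_Total   = c(0.68, 3.18),
  Secondary_Total = c(5.56, 12.72)
)

top_dropout <- toy_dropout %>%
  # One row per state, year, and education level.
  pivot_longer(ends_with("_Total"), names_to = "level", values_to = "dropout") %>%
  mutate(level = sub("_Total$", "", level)) %>%
  group_by(year, level) %>%
  # Keep the state with the largest dropout ratio in each group.
  slice_max(dropout, n = 1) %>%
  ungroup()
```

The long format also suits the third question, since state, year, and level become ordinary columns that can be grouped or plotted.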

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Pola (2022, May 19). Data Analytics and Computational Social Science: HW-3 | Major Project Dataset and Preliminary Research Questions. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomniharika896286/

BibTeX citation

@misc{pola2022hw-3,
  author = {Pola, Niharika},
  title = {Data Analytics and Computational Social Science: HW-3 | Major Project Dataset and Preliminary Research Questions},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomniharika896286/},
  year = {2022}
}