Data Analytics and Computational Social Science: HW-3 | Major Project Dataset and Preliminary Research Questions

Niharika Pola

This project analyzes 7 dataframes on the Indian education system, covering 2012-13 to 2015-16 (the exact years vary by dataframe). They were extracted from the Government of India's open data portal, data.gov.in. The main objective of the project is to study the impact of school facilities (access to computers, drinking water, sanitation and electricity) on the dropout ratio at each education level (primary, upper primary, secondary and higher secondary) in all states of the country. I also aim to analyze the correlation between the dropout ratio and each of the other dataframes.

Loading the packages

library(tidyr)
library(dplyr)
library(readr)
library(stringr)

Reading Dataframe-1 | Gross Enrollment Ratio from 2013-2016 across all Indian States

gross_enrollment_ratio <- read_csv("601 Major Project/gross-enrollment-ratio.csv")
View(gross_enrollment_ratio)
head(gross_enrollment_ratio)

# A tibble: 6 x 14
  State_UT              Year  Primary_Boys Primary_Girls Primary_Total
  <chr>                 <chr>        <dbl>         <dbl>         <dbl>
1 Andaman & Nicobar Is~ 2013~         95.9          92.0          93.9
2 Andhra Pradesh        2013~         96.6          96.9          96.7
3 Arunachal Pradesh     2013~        129.          128.          128. 
4 Assam                 2013~        112.          115.          113. 
5 Bihar                 2013~         95.0         101.           98.0
6 Chandigarh            2013~         88.4          96.1          91.8
# ... with 9 more variables: Upper_Primary_Boys <dbl>,
#   Upper_Primary_Girls <dbl>, Upper_Primary_Total <dbl>,
#   Secondary_Boys <dbl>, Secondary_Girls <dbl>,
#   Secondary_Total <dbl>, Higher_Secondary_Boys <chr>,
#   Higher_Secondary_Girls <chr>, Higher_Secondary_Total <chr>

colnames(gross_enrollment_ratio)

 [1] "State_UT"               "Year"                  
 [3] "Primary_Boys"           "Primary_Girls"         
 [5] "Primary_Total"          "Upper_Primary_Boys"    
 [7] "Upper_Primary_Girls"    "Upper_Primary_Total"   
 [9] "Secondary_Boys"         "Secondary_Girls"       
[11] "Secondary_Total"        "Higher_Secondary_Boys" 
[13] "Higher_Secondary_Girls" "Higher_Secondary_Total"

Datatypes of each column

str(gross_enrollment_ratio)

spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT              : chr [1:110] "Andaman & Nicobar Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" ...
 $ Year                  : chr [1:110] "2013-14" "2013-14" "2013-14" "2013-14" ...
 $ Primary_Boys          : num [1:110] 95.9 96.6 129.1 111.8 95 ...
 $ Primary_Girls         : num [1:110] 92 96.9 127.8 115.2 101.2 ...
 $ Primary_Total         : num [1:110] 93.9 96.7 128.5 113.4 98 ...
 $ Upper_Primary_Boys    : num [1:110] 94.7 82.8 112.6 87.8 80.6 ...
 $ Upper_Primary_Girls   : num [1:110] 89 84.4 115.3 98.7 94.9 ...
 $ Upper_Primary_Total   : num [1:110] 91.8 83.6 113.9 93.1 87.2 ...
 $ Secondary_Boys        : num [1:110] 102.9 73.8 88.4 65.6 57.7 ...
 $ Secondary_Girls       : num [1:110] 97.4 76.8 84.9 77.2 63 ...
 $ Secondary_Total       : num [1:110] 100.2 75.2 86.7 71.2 60.1 ...
 $ Higher_Secondary_Boys : chr [1:110] "105.4" "59.83" "65.16" "31.78" ...
 $ Higher_Secondary_Girls: chr [1:110] "96.61" "60.83" "65.38" "34.27" ...
 $ Higher_Secondary_Total: chr [1:110] "101.28" "60.3" "65.27" "32.94" ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   Year = col_character(),
  ..   Primary_Boys = col_double(),
  ..   Primary_Girls = col_double(),
  ..   Primary_Total = col_double(),
  ..   Upper_Primary_Boys = col_double(),
  ..   Upper_Primary_Girls = col_double(),
  ..   Upper_Primary_Total = col_double(),
  ..   Secondary_Boys = col_double(),
  ..   Secondary_Girls = col_double(),
  ..   Secondary_Total = col_double(),
  ..   Higher_Secondary_Boys = col_character(),
  ..   Higher_Secondary_Girls = col_character(),
  ..   Higher_Secondary_Total = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

Tidying the data

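# Treat the placeholder codes "NR" and "@" as missing values
gross_enrollment_ratio[ gross_enrollment_ratio == "NR" ] <- NA
gross_enrollment_ratio[ gross_enrollment_ratio == "@" ] <- NA
# Inspect the higher secondary columns, which were read as character
select(gross_enrollment_ratio, 'Higher_Secondary_Boys', 'Higher_Secondary_Girls', 'Higher_Secondary_Total')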

# A tibble: 110 x 3
   Higher_Secondary_Boys Higher_Secondary_Girls Higher_Secondary_Total
   <chr>                 <chr>                  <chr>                 
 1 105.4                 96.61                  101.28                
 2 59.83                 60.83                  60.3                  
 3 65.16                 65.38                  65.27                 
 4 31.78                 34.27                  32.94                 
 5 23.33                 24.17                  23.7                  
 6 90.5                  92.88                  91.49                 
 7 58.27                 56.16                  57.23                 
 8 37.77                 41.99                  39.64                 
 9 34.37                 64.55                  44.36                 
10 98.88                 102.3                  100.42                
# ... with 100 more rows
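After the placeholder codes are replaced, the three Higher_Secondary columns are still stored as text (`<chr>` in the output above), so sums, means and correlations on them would fail. A minimal base-R sketch of the missing conversion step follows. The stand-in data frame uses two rows printed above; its third row ("Hypothetical State") is made up purely to carry the "NR" and "@" codes.

```r
# Stand-in for gross_enrollment_ratio: rows 1-2 use values printed above,
# row 3 is hypothetical and only carries the placeholder codes.
ger <- data.frame(
  State_UT = c("Andaman & Nicobar Islands", "Andhra Pradesh", "Hypothetical State"),
  Higher_Secondary_Boys  = c("105.4", "59.83", "NR"),
  Higher_Secondary_Girls = c("96.61", "60.83", "@"),
  Higher_Secondary_Total = c("101.28", "60.3", "NR"),
  stringsAsFactors = FALSE
)

# Placeholder codes become NA (the same idea as the tidying step above).
ger[ger == "NR" | ger == "@"] <- NA

# Convert the text columns to numbers; NA stays NA, so no coercion warning.
hs_cols <- c("Higher_Secondary_Boys", "Higher_Secondary_Girls", "Higher_Secondary_Total")
ger[hs_cols] <- lapply(ger[hs_cols], as.numeric)

str(ger)
```

With dplyr loaded, `mutate(gross_enrollment_ratio, across(starts_with("Higher_Secondary"), as.numeric))` should perform the same conversion on the real tibble.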

Reading Dataframe-2 | Dropout Ratio across all Indian States from 2012-13 to 2014-15

dropout_ratio <- read_csv("601 Major Project/dropout-ratio.csv")
View(dropout_ratio)
head(dropout_ratio)

# A tibble: 6 x 14
  State_UT       year    Primary_Boys Primary_Girls Primary_Total
  <chr>          <chr>   <chr>        <chr>         <chr>        
1 A & N Islands  2012-13 0.83         0.51          0.68         
2 A & N Islands  2013-14 1.35         1.06          1.21         
3 A & N Islands  2014-15 0.47         0.55          0.51         
4 Andhra Pradesh 2012-13 3.3          3.05          3.18         
5 Andhra Pradesh 2013-14 4.31         4.39          4.35         
6 Andhra Pradesh 2014-15 6.57         6.89          6.72         
# ... with 9 more variables: `Upper Primary_Boys` <chr>,
#   `Upper Primary_Girls` <chr>, `Upper Primary_Total` <chr>,
#   `Secondary _Boys` <chr>, `Secondary _Girls` <chr>,
#   `Secondary _Total` <chr>, HrSecondary_Boys <chr>,
#   HrSecondary_Girls <chr>, HrSecondary_Total <chr>

colnames(dropout_ratio)

 [1] "State_UT"            "year"                "Primary_Boys"       
 [4] "Primary_Girls"       "Primary_Total"       "Upper Primary_Boys" 
 [7] "Upper Primary_Girls" "Upper Primary_Total" "Secondary _Boys"    
[10] "Secondary _Girls"    "Secondary _Total"    "HrSecondary_Boys"   
[13] "HrSecondary_Girls"   "HrSecondary_Total"

Datatype of each column

str(dropout_ratio)

spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT           : chr [1:110] "A & N Islands" "A & N Islands" "A & N Islands" "Andhra Pradesh" ...
 $ year               : chr [1:110] "2012-13" "2013-14" "2014-15" "2012-13" ...
 $ Primary_Boys       : chr [1:110] "0.83" "1.35" "0.47" "3.3" ...
 $ Primary_Girls      : chr [1:110] "0.51" "1.06" "0.55" "3.05" ...
 $ Primary_Total      : chr [1:110] "0.68" "1.21" "0.51" "3.18" ...
 $ Upper Primary_Boys : chr [1:110] "Uppe_r_Primary" "NR" "1.44" "3.21" ...
 $ Upper Primary_Girls: chr [1:110] "1.09" "1.54" "1.95" "3.51" ...
 $ Upper Primary_Total: chr [1:110] "1.23" "0.51" "1.69" "3.36" ...
 $ Secondary _Boys    : chr [1:110] "5.57" "8.36" "11.47" "12.21" ...
 $ Secondary _Girls   : chr [1:110] "5.55" "5.98" "8.16" "13.25" ...
 $ Secondary _Total   : chr [1:110] "5.56" "7.2" "9.87" "12.72" ...
 $ HrSecondary_Boys   : chr [1:110] "17.66" "18.94" "21.05" "2.66" ...
 $ HrSecondary_Girls  : chr [1:110] "10.15" "12.2" "12.21" "NR" ...
 $ HrSecondary_Total  : chr [1:110] "14.14" "15.87" "16.93" "0.35" ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Boys = col_character(),
  ..   Primary_Girls = col_character(),
  ..   Primary_Total = col_character(),
  ..   `Upper Primary_Boys` = col_character(),
  ..   `Upper Primary_Girls` = col_character(),
  ..   `Upper Primary_Total` = col_character(),
  ..   `Secondary _Boys` = col_character(),
  ..   `Secondary _Girls` = col_character(),
  ..   `Secondary _Total` = col_character(),
  ..   HrSecondary_Boys = col_character(),
  ..   HrSecondary_Girls = col_character(),
  ..   HrSecondary_Total = col_character()
  .. )
 - attr(*, "problems")=<externalptr>
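Every numeric column in dropout_ratio was read as character because placeholder strings such as "NR" and the stray "Uppe_r_Primary" cell sit in the same columns as the numbers. An alternative to cleaning afterwards is to declare those strings missing at import time, so the columns are parsed as numbers directly. The sketch below uses base R's `read.csv()` on a few rows copied from the output above; `readr::read_csv()` supports the same idea through its `na` argument, e.g. `na = c("", "NA", "NR", "Uppe_r_Primary")`.

```r
# A few rows of dropout-ratio.csv, copied from the printed output above.
csv_text <- "State_UT,year,Primary_Boys,Upper Primary_Boys,HrSecondary_Girls
A & N Islands,2012-13,0.83,Uppe_r_Primary,10.15
A & N Islands,2013-14,1.35,NR,12.2
Andhra Pradesh,2012-13,3.3,3.21,NR"

# Strings listed in na.strings are read as NA, so the remaining values in
# each column are all numeric and read.csv() stores the column as numeric.
dr <- read.csv(text = csv_text,
               na.strings = c("", "NA", "NR", "Uppe_r_Primary"))

# read.csv() also repairs names by default: "Upper Primary_Boys"
# becomes "Upper.Primary_Boys".
str(dr)
```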

Tidying the data

# Replace "NR" and stray header fragments found in the data cells with NA
dropout_ratio[ dropout_ratio == "NR" ] <- NA
dropout_ratio[ dropout_ratio == "Upper Primary_Boys" ] <- NA
dropout_ratio[ dropout_ratio == "Uppe_r_Primary" ] <- NA
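The dropout file's column names also differ from the gross enrollment file: they contain spaces ("Upper Primary_Boys", "Secondary _Boys") and use the prefix "HrSecondary" instead of "Higher_Secondary". Matching names make side-by-side comparisons and joins simpler. A small sketch, assuming the gross_enrollment_ratio naming is the target; the helper `harmonize_level_names()` is hypothetical.

```r
# Hypothetical helper: rewrites dropout_ratio-style column names into the
# style used by gross_enrollment_ratio.
harmonize_level_names <- function(x) {
  x <- gsub("\\s+_", "_", x)                       # "Secondary _Boys"    -> "Secondary_Boys"
  x <- gsub("\\s+", "_", x)                        # "Upper Primary_Boys" -> "Upper_Primary_Boys"
  x <- sub("^HrSecondary", "Higher_Secondary", x)  # "HrSecondary_Total"  -> "Higher_Secondary_Total"
  x
}

harmonize_level_names(c("State_UT", "year", "Upper Primary_Boys",
                        "Secondary _Girls", "HrSecondary_Total"))
```

On the real data, `names(dropout_ratio) <- harmonize_level_names(names(dropout_ratio))` applies it; with dplyr, `rename_with(dropout_ratio, harmonize_level_names)` should be equivalent.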

Reading Dataframe-3 | Percentage of Schools with access to computers

percentage_of_schools_with_comps <- read_csv("601 Major Project/percentage-of-schools-with-comps.csv")
View(percentage_of_schools_with_comps)
head(percentage_of_schools_with_comps)

# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         30.4             73.7             89.7
2 Andaman & Nico~ 2014~         30.9             76.5             92.1
3 Andaman & Nico~ 2015~         28.4             78.6             92.5
4 Andhra Pradesh  2013~         12.7             42.7             87.0
5 Andhra Pradesh  2014~         10.3             44.2             88.5
6 Andhra Pradesh  2015~         11.5             44.8             89.5
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>

colnames(percentage_of_schools_with_comps)

 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"

Datatype of each column

str(percentage_of_schools_with_comps)

spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 30.4 30.9 28.4 12.7 10.3 ...
 $ Primary_with_U_Primary          : num [1:110] 73.7 76.5 78.6 42.7 44.1 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 89.7 92.1 92.5 87 88.5 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 45.5 50 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 94.7 94.7 17.1 62.2 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 97.9 100 100 68.2 68.4 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 73.2 76.6 ...
 $ Sec_Only                        : num [1:110] 0 0 0 60 71 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 33.3 66.7 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 19.3 41.6 ...
 $ All Schools                     : num [1:110] 53.1 57.2 57 29.6 28.1 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr>
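The facility tables are wide: each school category (Primary_Only, Sec_Only, ..., All Schools) is its own column. For the state-wise, year-wise and level-wise comparisons planned here, a long layout with one row per state, year and category is usually easier to filter, group and plot. A base-R sketch on two rows copied from the computer-access output above, keeping only two of the category columns:

```r
# Two rows of percentage_of_schools_with_comps (values as printed above,
# rounded), keeping only two of the category columns.
comps <- data.frame(
  State_UT = c("Andaman & Nicobar Islands", "Andhra Pradesh"),
  year = c("2013-14", "2013-14"),
  Primary_Only = c(30.4, 12.7),
  Sec_Only = c(0, 60)
)

# Reshape wide -> long: one row per state, year and school category.
# unlist() stacks the columns one after another, so the keys are repeated
# once per column and each category label is repeated once per row.
category_cols <- c("Primary_Only", "Sec_Only")
comps_long <- data.frame(
  State_UT = rep(comps$State_UT, times = length(category_cols)),
  year     = rep(comps$year, times = length(category_cols)),
  category = rep(category_cols, each = nrow(comps)),
  pct      = unlist(comps[category_cols], use.names = FALSE)
)

comps_long
```

With tidyr (loaded above), `pivot_longer(percentage_of_schools_with_comps, cols = -c(State_UT, year), names_to = "category", values_to = "pct")` should give the same layout for all eleven category columns.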

Reading Dataframe-4 | Percentage of Schools with Electricity

percentage_of_schools_with_electricity <- read_csv("601 Major Project/percentage-of-schools-with-electricity.csv")
View(percentage_of_schools_with_electricity)
head(percentage_of_schools_with_electricity)

# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         82.4             96.0            100  
2 Andaman & Nico~ 2014~         80.7             96.3            100  
3 Andaman & Nico~ 2015~         82.1             97.6            100  
4 Andhra Pradesh  2013~         87.7             93.6             99.3
5 Andhra Pradesh  2014~         91.1             94.7            100  
6 Andhra Pradesh  2015~         91.6             95.6            100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>

colnames(percentage_of_schools_with_electricity)

 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"

Datatype of each column

str(percentage_of_schools_with_electricity)

spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 82.4 80.7 82.1 87.7 91.1 ...
 $ Primary_with_U_Primary          : num [1:110] 96 96.3 97.6 93.6 94.7 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.3 100 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 100 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 67.5 86.1 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 96.2 97.6 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 96.2 97.1 ...
 $ Sec_Only                        : num [1:110] 0 0 0 97.5 93.5 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 100 83.3 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 91.3 93.2 ...
 $ All Schools                     : num [1:110] 88.9 88.9 90.1 90.3 92.8 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr>
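Some category columns jump between 0 and 100 for the same state. For Andaman & Nicobar Islands, U_Primary_Only reads 0, 100, 0 across the three years, and Sec_Only and HrSec_Only stay at 0 even though roughly 89-90% of all its schools have electricity. One plausible reading, not confirmed by the source files, is that 0 marks a category with no schools rather than 0% coverage. If that assumption holds, those zeros would distort averages, so the sketch below recodes them as NA. The helper `zero_to_na()` is hypothetical and the rows are copied (rounded) from the output above.

```r
# Rows of percentage_of_schools_with_electricity as printed above (rounded);
# only three of the value columns are kept.
elec <- data.frame(
  State_UT = c(rep("Andaman & Nicobar Islands", 3), "Andhra Pradesh"),
  year = c("2013-14", "2014-15", "2015-16", "2013-14"),
  U_Primary_Only = c(0, 100, 0, 100),
  Sec_Only = c(0, 0, 0, 97.5),
  All_Schools = c(88.9, 88.9, 90.1, 90.3)
)

# Hypothetical helper. ASSUMPTION: an exact 0 in a school-category column
# means "no schools in this category", so it is recoded as NA.
zero_to_na <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) replace(x, which(x == 0), NA))
  df
}

# The All_Schools summary column is left untouched.
elec_clean <- zero_to_na(elec, c("U_Primary_Only", "Sec_Only"))
elec_clean
```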

Reading Dataframe-5 | Percentage of Schools with water facility

percentage_of_schools_with_water_facility <- read_csv("601 Major Project/percentage-of-schools-with-water-facility.csv")
View(percentage_of_schools_with_water_facility)
head(percentage_of_schools_with_water_facility)

# A tibble: 6 x 13
  `State/UT`      Year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         98.2             98.7            100  
2 Andaman & Nico~ 2014~         99.6             98.8            100  
3 Andaman & Nico~ 2015~        100              100              100  
4 Andhra Pradesh  2013~         86.9             94.5             99.7
5 Andhra Pradesh  2014~         91.8             96.1            100  
6 Andhra Pradesh  2015~         93.9             97.0            100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>

colnames(percentage_of_schools_with_water_facility)

 [1] "State/UT"                        
 [2] "Year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"

Datatype of each column

str(percentage_of_schools_with_water_facility)

spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State/UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ Year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 98.2 99.5 100 86.9 91.8 ...
 $ Primary_with_U_Primary          : num [1:110] 98.7 98.8 100 94.5 96.1 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.7 100 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 90.9 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 87.3 90 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 98.8 99.6 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 96 97.5 ...
 $ Sec_Only                        : num [1:110] 0 0 0 97.5 100 100 0 0 0 88.3 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 100 100 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 97.5 98.4 ...
 $ All Schools                     : num [1:110] 98.7 99.5 100 90.3 93.7 ...
 - attr(*, "spec")=
  .. cols(
  ..   `State/UT` = col_character(),
  ..   Year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr>
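The water-facility file names its key columns `State/UT` and `Year`, while most of the other files use `State_UT` and `year` (gross_enrollment_ratio also uses `Year`). A merge on mismatched key names would fail, so it helps to standardize the keys right after reading each file. A sketch with a hypothetical helper `standardize_keys()`, using the first row printed above:

```r
# Hypothetical helper: gives every dataframe the same key column names,
# State_UT and year, before any join.
standardize_keys <- function(df) {
  nm <- names(df)
  nm[nm == "State/UT"] <- "State_UT"
  nm[nm == "Year"] <- "year"
  names(df) <- nm
  df
}

# First row of percentage_of_schools_with_water_facility as printed above.
water <- data.frame(
  "State/UT" = "Andaman & Nicobar Islands",
  Year = "2013-14",
  Primary_Only = 98.2,
  check.names = FALSE
)

names(standardize_keys(water))
```

The same call should work on the real tibbles, e.g. `gross_enrollment_ratio <- standardize_keys(gross_enrollment_ratio)`.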

Reading Dataframe-6 | Percentage of Schools with boys' toilets

schools_with_boys_toilet <- read_csv("601 Major Project/schools-with-boys-toilet.csv")
View(schools_with_boys_toilet)
head(schools_with_boys_toilet)

# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 Andaman & Nico~ 2013~         91.6             97.4            100  
2 Andaman & Nico~ 2014~        100              100              100  
3 Andaman & Nico~ 2015~        100              100              100  
4 Andhra Pradesh  2013~         53.0             62.6             82.0
5 Andhra Pradesh  2014~         57.9             76.5             96  
6 Andhra Pradesh  2015~         99.6             99.9             99.0
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>

colnames(schools_with_boys_toilet)

 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"

Datatype of each column

str(schools_with_boys_toilet)

spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 91.6 100 100 53 57.9 ...
 $ Primary_with_U_Primary          : num [1:110] 97.4 100 100 62.6 76.5 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 82 96 ...
 $ U_Primary_Only                  : num [1:110] 0 100 0 45.5 75 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 100 100 100 64.1 93.3 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 100 100 100 76.2 91.4 ...
 $ U_Primary_With_Sec              : num [1:110] 0 0 0 60.6 78 ...
 $ Sec_Only                        : num [1:110] 0 0 0 59.3 80.7 ...
 $ Sec_with_HrSec.                 : num [1:110] 100 100 100 85.7 60 ...
 $ HrSec_Only                      : num [1:110] 0 0 0 73.4 86.5 ...
 $ All Schools                     : num [1:110] 94.5 100 100 56.9 65.3 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr>

Reading Dataframe-7 | Percentage of Schools with girls' toilets

schools_with_girls_toilet <- read_csv("601 Major Project/schools-with-girls-toilet.csv")
View(schools_with_girls_toilet)
head(schools_with_girls_toilet)

# A tibble: 6 x 13
  State_UT        year  Primary_Only Primary_with_U_~ Primary_with_U_~
  <chr>           <chr>        <dbl>            <dbl>            <dbl>
1 All India       2013~         88.7             96.0             98.8
2 All India       2014~         91.2             96.9             99.5
3 All India       2015~         97.0             99.0             99.7
4 Andaman & Nico~ 2013~         89.7             97.4            100  
5 Andaman & Nico~ 2014~        100              100              100  
6 Andaman & Nico~ 2015~        100              100              100  
# ... with 8 more variables: U_Primary_Only <dbl>,
#   U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
#   U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
#   HrSec_Only <dbl>, `All Schools` <dbl>

colnames(schools_with_girls_toilet)

 [1] "State_UT"                        
 [2] "year"                            
 [3] "Primary_Only"                    
 [4] "Primary_with_U_Primary"          
 [5] "Primary_with_U_Primary_Sec_HrSec"
 [6] "U_Primary_Only"                  
 [7] "U_Primary_With_Sec_HrSec"        
 [8] "Primary_with_U_Primary_Sec"      
 [9] "U_Primary_With_Sec"              
[10] "Sec_Only"                        
[11] "Sec_with_HrSec."                 
[12] "HrSec_Only"                      
[13] "All Schools"

Datatype of each column

str(schools_with_girls_toilet)

spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ State_UT                        : chr [1:110] "All India" "All India" "All India" "Andaman & Nicobar Islands" ...
 $ year                            : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
 $ Primary_Only                    : num [1:110] 88.7 91.2 97 89.7 100 ...
 $ Primary_with_U_Primary          : num [1:110] 96 96.9 99 97.4 100 ...
 $ Primary_with_U_Primary_Sec_HrSec: num [1:110] 98.8 99.5 99.7 100 100 ...
 $ U_Primary_Only                  : num [1:110] 91.4 91.4 96.3 0 100 ...
 $ U_Primary_With_Sec_HrSec        : num [1:110] 98.2 99.2 99.6 100 100 ...
 $ Primary_with_U_Primary_Sec      : num [1:110] 97.3 98.2 99.3 100 100 ...
 $ U_Primary_With_Sec              : num [1:110] 94.4 96.6 98.8 0 0 ...
 $ Sec_Only                        : num [1:110] 99.1 90.3 95.2 0 0 ...
 $ Sec_with_HrSec.                 : num [1:110] 98.4 94 98.3 100 100 ...
 $ HrSec_Only                      : num [1:110] 76.1 90.9 96.2 0 0 ...
 $ All Schools                     : num [1:110] 91.2 93.1 97.5 93.4 100 ...
 - attr(*, "spec")=
  .. cols(
  ..   State_UT = col_character(),
  ..   year = col_character(),
  ..   Primary_Only = col_double(),
  ..   Primary_with_U_Primary = col_double(),
  ..   Primary_with_U_Primary_Sec_HrSec = col_double(),
  ..   U_Primary_Only = col_double(),
  ..   U_Primary_With_Sec_HrSec = col_double(),
  ..   Primary_with_U_Primary_Sec = col_double(),
  ..   U_Primary_With_Sec = col_double(),
  ..   Sec_Only = col_double(),
  ..   Sec_with_HrSec. = col_double(),
  ..   HrSec_Only = col_double(),
  ..   `All Schools` = col_double()
  .. )
 - attr(*, "problems")=<externalptr>
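Two details matter before the dataframes can be compared state by state. First, the girls' toilet file starts with "All India" rows, a national aggregate that should not be ranked alongside individual states (other files may contain it too). Second, the dropout file abbreviates Andaman & Nicobar Islands as "A & N Islands" while the facility files spell it out, so a join on State_UT would silently drop that territory. A sketch follows; the lookup holds only the one mismatch visible above and is an assumption to extend after checking `unique(df$State_UT)` in each file. The names `state_lookup` and `harmonize_states()` are hypothetical.

```r
# Hypothetical lookup from spellings in dropout-ratio.csv to the spellings
# used in the facility files. Only the mismatch visible above is included.
state_lookup <- c("A & N Islands" = "Andaman & Nicobar Islands")

# Hypothetical helper: harmonizes state names and drops the national aggregate.
harmonize_states <- function(df, state_col = "State_UT") {
  s <- trimws(df[[state_col]])
  hit <- s %in% names(state_lookup)
  s[hit] <- unname(state_lookup[s[hit]])
  df[[state_col]] <- s
  df[!(s %in% "All India"), , drop = FALSE]
}

# Small extracts built from rows printed above.
girls_toilet <- data.frame(
  State_UT = c("All India", "All India", "Andaman & Nicobar Islands"),
  year = c("2013-14", "2014-15", "2013-14"),
  All_Schools = c(91.2, 93.1, 93.4)
)
dropout <- data.frame(
  State_UT = c("A & N Islands", "Andhra Pradesh"),
  year = c("2013-14", "2013-14"),
  Primary_Total = c(1.21, 4.35)
)

harmonize_states(girls_toilet)
harmonize_states(dropout)
```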

Preliminary Research Questions:

Which Indian states have the highest dropout ratios at each education level over the years covered by the data, and are the facilities available in those states affecting the number of dropouts?
How did the gross enrollment ratio in India change from 2013 to 2016?
What does a state-wise, year-wise and education-level-wise analysis of all 7 dataframes show?
What suggestions can be proposed to the Indian Government based on the analysis?
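For the first question, dropout ratios have to be lined up with a facility indicator for the same state and year. The years only partly overlap: the dropout file covers 2012-13 to 2014-15 and the facility files cover 2013-14 to 2015-16, so an inner join keeps 2013-14 and 2014-15. The sketch below pairs primary-level dropout with the `All Schools` computer-access figure (renamed `All_Schools` here) for two states, using values as printed (rounded) above and assuming state names are already harmonized. Four points are far too few for inference, and a correlation alone would not show that facilities cause changes in dropout; the code only illustrates the mechanics.

```r
# Extracts built from values printed above (rounded). State names are assumed
# to be harmonized already ("A & N Islands" renamed to the full name).
dropout <- data.frame(
  State_UT = rep(c("Andaman & Nicobar Islands", "Andhra Pradesh"), each = 3),
  year = rep(c("2012-13", "2013-14", "2014-15"), times = 2),
  Primary_Total = c(0.68, 1.21, 0.51, 3.18, 4.35, 6.72)
)
computers <- data.frame(
  State_UT = c(rep("Andaman & Nicobar Islands", 3), rep("Andhra Pradesh", 2)),
  year = c("2013-14", "2014-15", "2015-16", "2013-14", "2014-15"),
  All_Schools = c(53.1, 57.2, 57.0, 29.6, 28.1)
)

# Inner join on both keys: only state-years present in both tables remain.
joined <- merge(dropout, computers, by = c("State_UT", "year"))

# Pearson correlation, skipping any rows where either value is missing.
r <- cor(joined$Primary_Total, joined$All_Schools, use = "complete.obs")

# State-years ordered from highest to lowest primary dropout.
joined[order(joined$Primary_Total, decreasing = TRUE), ]
r
```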
