Statistical Analysis of Indian Education System
This project analyzes seven dataframes on the Indian education system, covering 2012-13 to 2015-16, extracted from the Indian Government's open data platform, data.gov.in. The main objective of the project is to study the impact of school facilities (access to computers, sanitation and electricity) on the dropout ratio across education levels (primary, upper primary, secondary and higher secondary) in all states of the country. I also aim to analyze how each of the other dataframes correlates with the dropout ratio.
Loading the packages
Reading Dataframe-1 | Gross Enrollment Ratio from 2013-2016 across all Indian States
gross_enrollment_ratio <- read_csv("601 Major Project/gross-enrollment-ratio.csv")
dim(gross_enrollment_ratio)
[1] 110 14
head(gross_enrollment_ratio)
# A tibble: 6 x 14
State_UT Year Primary_Boys Primary_Girls Primary_Total
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nicobar Is~ 2013~ 95.9 92.0 93.9
2 Andhra Pradesh 2013~ 96.6 96.9 96.7
3 Arunachal Pradesh 2013~ 129. 128. 128.
4 Assam 2013~ 112. 115. 113.
5 Bihar 2013~ 95.0 101. 98.0
6 Chandigarh 2013~ 88.4 96.1 91.8
# ... with 9 more variables: Upper_Primary_Boys <dbl>,
# Upper_Primary_Girls <dbl>, Upper_Primary_Total <dbl>,
# Secondary_Boys <dbl>, Secondary_Girls <dbl>,
# Secondary_Total <dbl>, Higher_Secondary_Boys <chr>,
# Higher_Secondary_Girls <chr>, Higher_Secondary_Total <chr>
colnames(gross_enrollment_ratio)
[1] "State_UT" "Year"
[3] "Primary_Boys" "Primary_Girls"
[5] "Primary_Total" "Upper_Primary_Boys"
[7] "Upper_Primary_Girls" "Upper_Primary_Total"
[9] "Secondary_Boys" "Secondary_Girls"
[11] "Secondary_Total" "Higher_Secondary_Boys"
[13] "Higher_Secondary_Girls" "Higher_Secondary_Total"
Datatypes of each column
str(gross_enrollment_ratio)
spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "Andaman & Nicobar Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" ...
$ Year : chr [1:110] "2013-14" "2013-14" "2013-14" "2013-14" ...
$ Primary_Boys : num [1:110] 95.9 96.6 129.1 111.8 95 ...
$ Primary_Girls : num [1:110] 92 96.9 127.8 115.2 101.2 ...
$ Primary_Total : num [1:110] 93.9 96.7 128.5 113.4 98 ...
$ Upper_Primary_Boys : num [1:110] 94.7 82.8 112.6 87.8 80.6 ...
$ Upper_Primary_Girls : num [1:110] 89 84.4 115.3 98.7 94.9 ...
$ Upper_Primary_Total : num [1:110] 91.8 83.6 113.9 93.1 87.2 ...
$ Secondary_Boys : num [1:110] 102.9 73.8 88.4 65.6 57.7 ...
$ Secondary_Girls : num [1:110] 97.4 76.8 84.9 77.2 63 ...
$ Secondary_Total : num [1:110] 100.2 75.2 86.7 71.2 60.1 ...
$ Higher_Secondary_Boys : chr [1:110] "105.4" "59.83" "65.16" "31.78" ...
$ Higher_Secondary_Girls: chr [1:110] "96.61" "60.83" "65.38" "34.27" ...
$ Higher_Secondary_Total: chr [1:110] "101.28" "60.3" "65.27" "32.94" ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. Year = col_character(),
.. Primary_Boys = col_double(),
.. Primary_Girls = col_double(),
.. Primary_Total = col_double(),
.. Upper_Primary_Boys = col_double(),
.. Upper_Primary_Girls = col_double(),
.. Upper_Primary_Total = col_double(),
.. Secondary_Boys = col_double(),
.. Secondary_Girls = col_double(),
.. Secondary_Total = col_double(),
.. Higher_Secondary_Boys = col_character(),
.. Higher_Secondary_Girls = col_character(),
.. Higher_Secondary_Total = col_character()
.. )
- attr(*, "problems")=<externalptr>
Tidying the data
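# "NR" and "@" mark missing values in the raw file; recode them as NA
gross_enrollment_ratio[ gross_enrollment_ratio == "NR" ] <- NA
gross_enrollment_ratio[ gross_enrollment_ratio == "@" ] <- NA
# convert to a plain data frame and drop every row that has a missing value
ger1 <- data.frame(gross_enrollment_ratio)
ger <- na.exclude(ger1)
# the higher-secondary columns were read as text, so convert them to numbers
ger$Higher_Secondary_Boys <- as.numeric(ger$Higher_Secondary_Boys)
ger$Higher_Secondary_Girls <- as.numeric(ger$Higher_Secondary_Girls)
ger$Higher_Secondary_Total <- as.numeric(ger$Higher_Secondary_Total)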
str(ger)
'data.frame': 108 obs. of 14 variables:
$ State_UT : chr "Andaman & Nicobar Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" ...
$ Year : chr "2013-14" "2013-14" "2013-14" "2013-14" ...
$ Primary_Boys : num 95.9 96.6 129.1 111.8 95 ...
$ Primary_Girls : num 92 96.9 127.8 115.2 101.2 ...
$ Primary_Total : num 93.9 96.7 128.5 113.4 98 ...
$ Upper_Primary_Boys : num 94.7 82.8 112.6 87.8 80.6 ...
$ Upper_Primary_Girls : num 89 84.4 115.3 98.7 94.9 ...
$ Upper_Primary_Total : num 91.8 83.6 113.9 93.1 87.2 ...
$ Secondary_Boys : num 102.9 73.8 88.4 65.6 57.7 ...
$ Secondary_Girls : num 97.4 76.8 84.9 77.2 63 ...
$ Secondary_Total : num 100.2 75.2 86.7 71.2 60.1 ...
$ Higher_Secondary_Boys : num 105.4 59.8 65.2 31.8 23.3 ...
$ Higher_Secondary_Girls: num 96.6 60.8 65.4 34.3 24.2 ...
$ Higher_Secondary_Total: num 101.3 60.3 65.3 32.9 23.7 ...
- attr(*, "na.action")= 'exclude' Named int [1:2] 26 99
..- attr(*, "names")= chr [1:2] "26" "99"
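The three higher-secondary columns were read as text because the raw file marks missing values with "NR" and "@". One alternative, sketched here under assumptions, is to declare those codes when the file is read so the columns parse as numbers straight away. The sketch uses base R's read.csv() on a small inline sample rather than the real CSV; the numbers come from the printout above, while the "NR" and "@" cells stand in for missing entries purely for illustration. With readr, passing na = c("", "NA", "NR", "@") to read_csv() should have the same effect.

```r
# Inline sample standing in for gross-enrollment-ratio.csv (hypothetical layout:
# only four of the fourteen columns). "NR" and "@" mark missing values here.
raw_csv <- "State_UT,Year,Higher_Secondary_Boys,Higher_Secondary_Girls
Andaman & Nicobar Islands,2013-14,105.4,96.61
Andhra Pradesh,2013-14,59.83,NR
Arunachal Pradesh,2013-14,@,65.38"

# Declaring the missing-value codes at read time lets both columns parse as numeric.
ger_sample <- read.csv(text = raw_csv, na.strings = c("", "NA", "NR", "@"),
                       stringsAsFactors = FALSE)
str(ger_sample)

# Rows with any missing value can then be dropped, as in the tidying step above.
ger_sample_complete <- na.omit(ger_sample)
```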
Plotting All India girls' enrollment ratio
     Year Primary_Girls Upper_Primary_Girls Secondary_Girls Higher_Secondary_Girls
1 2013-14        102.65               92.75           76.47                  51.58
2 2014-15        101.43               95.29           78.94                  53.81
3 2015-16        100.69               97.57           80.97                  56.41
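The code that builds all_india_ger_girls (and the matching boys and total tables) is not shown. Assuming the cleaned ger data frame keeps one "All India" row per year, a base-R sketch of the split could look like this; ger_demo is a hypothetical two-row stand-in whose values are copied from the printouts in this post. The same pattern would, presumably, give a state-level table like the post's states_ger by keeping the other rows. In dplyr the subset would be filter(State_UT == "All India") followed by select(Year, ends_with("_Girls")).

```r
# Hypothetical miniature of the cleaned `ger` data frame; the values are copied
# from this post, and the "All India" label is an assumption about the raw file.
ger_demo <- data.frame(
  State_UT = c("All India", "Andhra Pradesh"),
  Year = c("2013-14", "2013-14"),
  Primary_Girls = c(102.65, 96.87),
  Upper_Primary_Girls = c(92.75, 84.38),
  Secondary_Girls = c(76.47, 76.77),
  Higher_Secondary_Girls = c(51.58, 60.83),
  stringsAsFactors = FALSE
)

# National rows, girls' columns only (analogous to all_india_ger_girls).
girls_cols <- c("Year", "Primary_Girls", "Upper_Primary_Girls",
                "Secondary_Girls", "Higher_Secondary_Girls")
all_india_girls_demo <- ger_demo[ger_demo$State_UT == "All India", girls_cols]

# Everything else is state-level data (analogous to states_ger).
states_demo <- ger_demo[ger_demo$State_UT != "All India", ]
print(all_india_girls_demo)
```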
fig1 <- pivot_longer(all_india_ger_girls,
                     c(Primary_Girls, Upper_Primary_Girls, Secondary_Girls, Higher_Secondary_Girls),
                     names_to = "Education_Level", values_to = "GER")
ggplot(fig1, aes(x = Year, y = GER, fill = Education_Level)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = GER), size = 4, position = position_dodge(width = 0.9), vjust = 0, color = "black") +
  labs(title = "fig1: Gross Enrollment Ratio of Girls in India")
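Girls' primary-school GER exceeds 100 because the ratio counts every enrolled pupil, whatever their age, against the population of official primary-school age. The enrolled count therefore includes pupils who are repeating a grade, pupils who enrolled late and are older than their classmates, and pupils who advanced quickly and are younger than their classmates. I read this as a positive sign of broad access to primary schooling in a developing country like India. However, girls' primary GER has declined slightly every year (102.65, 101.43, 100.69), even as GER at the upper-primary, secondary and higher-secondary levels has risen.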
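GER divides everyone enrolled at a level, whatever their age, by the number of children in the official age group for that level, which is why over-age and under-age pupils can push it above 100. A minimal sketch with hypothetical counts; compute_ger() is an illustrative helper, not part of any package.

```r
# Hypothetical helper: Gross Enrollment Ratio = pupils enrolled at a level
# (any age) per 100 children of the official age group for that level.
compute_ger <- function(enrolled_any_age, official_age_population) {
  100 * enrolled_any_age / official_age_population
}

# Hypothetical counts: 1,000 children of official primary age, 960 of whom are
# enrolled, plus 67 older or younger pupils who also sit in primary classes.
official_age_pop <- 1000
enrolled_official_age <- 960
enrolled_outside_age <- 67

compute_ger(enrolled_official_age + enrolled_outside_age, official_age_pop)  # above 100

# The net enrollment ratio counts official-age pupils only, so it cannot exceed 100.
net_enrollment <- 100 * enrolled_official_age / official_age_pop
```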
Plotting All India boys' enrollment ratio
     Year Primary_Boys Upper_Primary_Boys Secondary_Boys Higher_Secondary_Boys
1 2013-14       100.20              86.31          76.80                 52.77
2 2014-15        98.85              87.71          78.13                 54.57
3 2015-16        97.87              88.72          79.16                 55.95
fig2 <- pivot_longer(all_india_ger_boys,
                     c(Primary_Boys, Upper_Primary_Boys, Secondary_Boys, Higher_Secondary_Boys),
                     names_to = "Education_Level", values_to = "GER")
ggplot(fig2, aes(x = Year, y = GER, fill = Education_Level)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = GER), size = 4, position = position_dodge(width = 0.9), vjust = 0, color = "black") +
  labs(title = "fig2: Gross Enrollment Ratio of Boys in India")
The primary-level trend for boys matches that for girls: GER fell every year, from 100.20 to 97.87. More detailed enrollment data would be needed to explain why primary GER is falling for both girls and boys. Boys' upper-primary GER (86.31 to 88.72) is noticeably lower than girls' (92.75 to 97.57). This is a concerning sign, because weak upper-primary enrollment could in turn reduce secondary and higher-secondary enrollment among boys.
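The girl-boy gap at the upper-primary level can be read directly from the two All India tables. A small sketch that transcribes those values and computes girls minus boys for each year:

```r
# All India upper-primary GER, transcribed from the girls' and boys' tables above.
upper_primary <- data.frame(
  Year  = c("2013-14", "2014-15", "2015-16"),
  Girls = c(92.75, 95.29, 97.57),
  Boys  = c(86.31, 87.71, 88.72),
  stringsAsFactors = FALSE
)

# Positive values mean girls' GER is higher than boys'.
upper_primary$Gap <- upper_primary$Girls - upper_primary$Boys
print(upper_primary)
```

With these figures the gap widens every year, from 6.44 to 7.58 to 8.85 points.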
Plotting All India total enrollment ratio
     Year Primary_Total Upper_Primary_Total Secondary_Total Higher_Secondary_Total
1 2013-14        101.36               89.33           76.64                  52.21
2 2014-15        100.08               91.24           78.51                  54.21
3 2015-16         99.21               92.81           80.01                  56.16
fig3 <- pivot_longer(all_india_ger_total,
                     c(Primary_Total, Upper_Primary_Total, Secondary_Total, Higher_Secondary_Total),
                     names_to = "Education_Level", values_to = "GER")
ggplot(fig3, aes(x = Year, y = GER, fill = Education_Level)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = GER), size = 4, position = position_dodge(width = 0.9), vjust = 0, color = "black") +
  labs(title = "fig3: Gross Enrollment Ratio total in India")
State_UT Year Primary_Boys Primary_Girls
1 Andaman & Nicobar Islands 2013-14 95.88 91.97
2 Andhra Pradesh 2013-14 96.62 96.87
3 Arunachal Pradesh 2013-14 129.12 127.77
4 Assam 2013-14 111.77 115.16
5 Bihar 2013-14 95.03 101.15
6 Chandigarh 2013-14 88.42 96.09
Primary_Total Upper_Primary_Boys Upper_Primary_Girls
1 93.93 94.70 88.98
2 96.74 82.81 84.38
3 128.46 112.64 115.27
4 113.43 87.85 98.69
5 97.96 80.60 94.92
6 91.85 99.93 103.02
Upper_Primary_Total Secondary_Boys Secondary_Girls Secondary_Total
1 91.83 102.89 97.36 100.16
2 83.57 73.76 76.77 75.20
3 113.94 88.37 84.89 86.65
4 93.13 65.60 77.20 71.21
5 87.24 57.66 62.96 60.08
6 101.27 92.08 92.16 92.11
Higher_Secondary_Boys Higher_Secondary_Girls Higher_Secondary_Total
1 105.40 96.61 101.28
2 59.83 60.83 60.30
3 65.16 65.38 65.27
4 31.78 34.27 32.94
5 23.33 24.17 23.70
6 90.50 92.88 91.49
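library(ggmap)
# read the Google Maps API key from an environment variable instead of publishing it
register_google(key = Sys.getenv("GOOGLE_MAPS_API_KEY"))
# geocode each state; appending ", India" helps avoid matches in other countries
coordinates <- geocode(paste0(states_ger$State_UT, ", India"))
# geocode() returns one row per input, in order, so bind the columns side by side
# (merge() on two tables with no shared column returns every row combination)
ger_geo <- cbind(states_ger, coordinates)
map <- get_map(location = 'India', zoom = 4, maptype = 'terrain', scale = "auto")
ggmap(map, extent = "device")
map
1280x1280 terrain map image from Google Maps.
See ?ggmap to plot it.
# each state has a single geocoded point, so draw coloured points rather than polygons
# (the three years for a state share the same point and overlap)
p1 <- ggmap(map) + geom_point(data = ger_geo, aes(x = lon, y = lat, colour = Primary_Boys), size = 3)
p2 <- ggmap(map) + geom_point(data = ger_geo, aes(x = lon, y = lat, colour = Upper_Primary_Boys), size = 3)
p3 <- ggmap(map) + geom_point(data = ger_geo, aes(x = lon, y = lat, colour = Secondary_Boys), size = 3)
p4 <- ggmap(map) + geom_point(data = ger_geo, aes(x = lon, y = lat, colour = Higher_Secondary_Boys), size = 3)
cowplot::plot_grid(p1, p2, p3, p4)
p1 <- ggmap(map) + geom_point(data = ger_geo, aes(x = lon, y = lat, colour = Primary_Girls), size = 3)
p2 <- ggmap(map) + geom_point(data = ger_geo, aes(x = lon, y = lat, colour = Upper_Primary_Girls), size = 3)
p3 <- ggmap(map) + geom_point(data = ger_geo, aes(x = lon, y = lat, colour = Secondary_Girls), size = 3)
p4 <- ggmap(map) + geom_point(data = ger_geo, aes(x = lon, y = lat, colour = Higher_Secondary_Girls), size = 3)
cowplot::plot_grid(p1, p2, p3, p4)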
states_ger %>%
select(Year, State_UT, Primary_Boys,Upper_Primary_Boys, Secondary_Boys, Higher_Secondary_Boys) %>%
group_by(Year) %>%
summarise(avg_pb=mean(Primary_Boys), avg_upb=mean(Upper_Primary_Boys), avg_sb=mean(Secondary_Boys), avg_hsb=mean(Higher_Secondary_Boys))
# A tibble: 3 x 5
Year avg_pb avg_upb avg_sb avg_hsb
<chr> <dbl> <dbl> <dbl> <dbl>
1 2013-14 105. 96.9 87.2 60.0
2 2014-15 102. 97.0 88.0 60.4
3 2015-16 100. 98.1 86.9 58.2
states_ger %>%
select(Year, Primary_Girls,Upper_Primary_Girls, Secondary_Girls, Higher_Secondary_Girls) %>%
group_by(Year) %>%
summarise(avg_pg=mean(Primary_Girls), avg_upg=mean(Upper_Primary_Girls), avg_sg=mean(Secondary_Girls), avg_hsg=mean(Higher_Secondary_Girls))
# A tibble: 3 x 5
Year avg_pg avg_upg avg_sg avg_hsg
<chr> <dbl> <dbl> <dbl> <dbl>
1 2013-14 106. 99.8 88.0 60.5
2 2014-15 103. 102. 89.6 62.2
3 2015-16 101. 104. 89.4 61.8
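These yearly averages are unweighted means across states and union territories, so a small UT counts as much as a populous state. That is likely one reason they differ from the All India figures plotted earlier (for example, about 105 against 100.20 for primary boys in 2013-14); rows removed during cleaning may also contribute. Because a national GER is total enrollment over total official-age population, it equals a population-weighted mean of state GERs. A sketch with made-up numbers (all values below are hypothetical):

```r
# Hypothetical state GERs and official-age populations (made-up numbers).
state_ger <- c(130, 95, 100)
state_pop <- c(1e5, 5e6, 2e7)

# Unweighted: every state counts equally, as in the summarise() calls above.
unweighted <- mean(state_ger)

# Population-weighted mean of the state ratios.
weighted <- weighted.mean(state_ger, w = state_pop)

# Same thing computed as total enrolled / total population * 100.
enrolled <- state_ger / 100 * state_pop
national_ger <- 100 * sum(enrolled) / sum(state_pop)
```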
Reading Dataframe-2 | Dropout Ratio across all Indian States, 2012-13 to 2014-15
dropout_ratio <- read_csv("601 Major Project/dropout-ratio.csv")
View(dropout_ratio)
head(dropout_ratio)
# A tibble: 6 x 14
State_UT year Primary_Boys Primary_Girls Primary_Total
<chr> <chr> <chr> <chr> <chr>
1 A & N Islands 2012-13 0.83 0.51 0.68
2 A & N Islands 2013-14 1.35 1.06 1.21
3 A & N Islands 2014-15 0.47 0.55 0.51
4 Andhra Pradesh 2012-13 3.3 3.05 3.18
5 Andhra Pradesh 2013-14 4.31 4.39 4.35
6 Andhra Pradesh 2014-15 6.57 6.89 6.72
# ... with 9 more variables: `Upper Primary_Boys` <chr>,
# `Upper Primary_Girls` <chr>, `Upper Primary_Total` <chr>,
# `Secondary _Boys` <chr>, `Secondary _Girls` <chr>,
# `Secondary _Total` <chr>, HrSecondary_Boys <chr>,
# HrSecondary_Girls <chr>, HrSecondary_Total <chr>
colnames(dropout_ratio)
[1] "State_UT" "year" "Primary_Boys"
[4] "Primary_Girls" "Primary_Total" "Upper Primary_Boys"
[7] "Upper Primary_Girls" "Upper Primary_Total" "Secondary _Boys"
[10] "Secondary _Girls" "Secondary _Total" "HrSecondary_Boys"
[13] "HrSecondary_Girls" "HrSecondary_Total"
Datatype of each column
str(dropout_ratio)
spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "A & N Islands" "A & N Islands" "A & N Islands" "Andhra Pradesh" ...
$ year : chr [1:110] "2012-13" "2013-14" "2014-15" "2012-13" ...
$ Primary_Boys : chr [1:110] "0.83" "1.35" "0.47" "3.3" ...
$ Primary_Girls : chr [1:110] "0.51" "1.06" "0.55" "3.05" ...
$ Primary_Total : chr [1:110] "0.68" "1.21" "0.51" "3.18" ...
$ Upper Primary_Boys : chr [1:110] "Uppe_r_Primary" "NR" "1.44" "3.21" ...
$ Upper Primary_Girls: chr [1:110] "1.09" "1.54" "1.95" "3.51" ...
$ Upper Primary_Total: chr [1:110] "1.23" "0.51" "1.69" "3.36" ...
$ Secondary _Boys : chr [1:110] "5.57" "8.36" "11.47" "12.21" ...
$ Secondary _Girls : chr [1:110] "5.55" "5.98" "8.16" "13.25" ...
$ Secondary _Total : chr [1:110] "5.56" "7.2" "9.87" "12.72" ...
$ HrSecondary_Boys : chr [1:110] "17.66" "18.94" "21.05" "2.66" ...
$ HrSecondary_Girls : chr [1:110] "10.15" "12.2" "12.21" "NR" ...
$ HrSecondary_Total : chr [1:110] "14.14" "15.87" "16.93" "0.35" ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Boys = col_character(),
.. Primary_Girls = col_character(),
.. Primary_Total = col_character(),
.. `Upper Primary_Boys` = col_character(),
.. `Upper Primary_Girls` = col_character(),
.. `Upper Primary_Total` = col_character(),
.. `Secondary _Boys` = col_character(),
.. `Secondary _Girls` = col_character(),
.. `Secondary _Total` = col_character(),
.. HrSecondary_Boys = col_character(),
.. HrSecondary_Girls = col_character(),
.. HrSecondary_Total = col_character()
.. )
- attr(*, "problems")=<externalptr>
Tidying the data
library(janitor)
dropout_ratio <- clean_names(dropout_ratio)
View(dropout_ratio)
dim(dropout_ratio)
[1] 110 14
# recode the placeholders used for missing values
dropout_ratio[ dropout_ratio == "NR" ] <- NA
dropout_ratio[ dropout_ratio == "Uppe_r_Primary" ] <- NA
dropout_ratio <- data.frame(dropout_ratio)
# drops every row with at least one missing value (110 rows become 55)
dropout_ratio <- na.exclude(dropout_ratio)
View(dropout_ratio)
dim(dropout_ratio)
[1] 55 14
# every column except state_ut and year holds numbers stored as text
num_cols <- setdiff(names(dropout_ratio), c("state_ut", "year"))
dropout_ratio[num_cols] <- lapply(dropout_ratio[num_cols], as.numeric)
str(dropout_ratio)
'data.frame': 55 obs. of 14 variables:
$ state_ut : chr "A & N Islands" "Andhra Pradesh" "Arunachal Pradesh" "Arunachal Pradesh" ...
$ year : chr "2014-15" "2013-14" "2013-14" "2012-13" ...
$ primary_boys : num 0.47 4.31 11.54 15.84 11.51 ...
$ primary_girls : num 0.55 4.39 10.22 14.44 10.09 ...
$ primary_total : num 0.51 4.35 10.89 15.16 10.82 ...
$ upper_primary_boys : num 1.44 3.46 4.44 5.86 5.31 7.89 7.6 6.47 3.31 3.7 ...
$ upper_primary_girls: num 1.95 4.12 6.74 9.06 8.08 6.55 6.54 5.22 5.09 4.4 ...
$ upper_primary_total: num 1.69 3.78 5.59 7.47 6.71 7.2 7.05 5.85 4.13 4.02 ...
$ secondary_boys : num 11.5 11.9 16.1 14 18.3 ...
$ secondary_girls : num 8.16 13.37 12.75 11.77 15.81 ...
$ secondary_total : num 9.87 12.65 14.49 12.93 17.11 ...
$ hr_secondary_boys : num 21.05 12.65 18.57 7.85 19.37 ...
$ hr_secondary_girls : num 12.21 10.85 15.49 2.14 17.44 ...
$ hr_secondary_total : num 16.93 11.79 17.07 5.11 18.42 ...
- attr(*, "na.action")= 'exclude' Named int [1:55] 1 2 4 6 12 13 14 15 16 17 ...
..- attr(*, "names")= chr [1:55] "1" "2" "4" "6" ...
[1] 1 14
fig4 <- pivot_longer(all_india_drop, c(primary_boys, primary_girls, primary_total, upper_primary_boys, upper_primary_girls, upper_primary_total, secondary_boys, secondary_girls, secondary_total, hr_secondary_boys, hr_secondary_girls, hr_secondary_total), names_to = "EducationLevel")
# one value per education level (all_india_drop appears to be 1 x 14), so bars suit it better than boxplots
ggplot(fig4, aes(x = value, y = EducationLevel)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = value), hjust = -0.1, size = 5) +
  labs(title = "fig4: All India Dropout Ratio - Education Wise")
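na.exclude() drops a whole row whenever any column is missing, which is why the cleaned dropout table shrinks from 110 to 55 rows. As an alternative to that step (not what this post does), a sketch on four rows copied from the raw printout: count the gaps per column, see which rows would survive, and keep every row by skipping NAs per statistic instead.

```r
# Four rows transcribed from the raw dropout printout above (subset of columns).
drop_demo <- data.frame(
  state_ut = c("A & N Islands", "A & N Islands", "A & N Islands", "Andhra Pradesh"),
  year = c("2012-13", "2013-14", "2014-15", "2012-13"),
  primary_total = c("0.68", "1.21", "0.51", "3.18"),
  upper_primary_boys = c("Uppe_r_Primary", "NR", "1.44", "3.21"),
  hr_secondary_girls = c("10.15", "12.2", "12.21", "NR"),
  stringsAsFactors = FALSE
)

# Same recoding as above: placeholders become NA.
drop_demo[drop_demo == "NR" | drop_demo == "Uppe_r_Primary"] <- NA

# Missing values per column, and which rows na.exclude() would keep.
print(colSums(is.na(drop_demo)))
print(complete.cases(drop_demo))

# Convert per column and keep all rows; skip NAs only where a statistic needs it.
num_cols <- c("primary_total", "upper_primary_boys", "hr_secondary_girls")
drop_demo[num_cols] <- lapply(drop_demo[num_cols], as.numeric)
mean(drop_demo$primary_total)                     # uses all four rows
mean(drop_demo$upper_primary_boys, na.rm = TRUE)  # uses the two non-missing rows
```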
Reading Dataframe-3 | Percentage of Schools with access to computers
percentage_of_schools_with_comps <- read_csv("601 Major Project/percentage-of-schools-with-comps.csv")
View(percentage_of_schools_with_comps)
head(percentage_of_schools_with_comps)
# A tibble: 6 x 13
State_UT year Primary_Only Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nico~ 2013~ 30.4 73.7 89.7
2 Andaman & Nico~ 2014~ 30.9 76.5 92.1
3 Andaman & Nico~ 2015~ 28.4 78.6 92.5
4 Andhra Pradesh 2013~ 12.7 42.7 87.0
5 Andhra Pradesh 2014~ 10.3 44.2 88.5
6 Andhra Pradesh 2015~ 11.5 44.8 89.5
# ... with 8 more variables: U_Primary_Only <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
# HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_comps)
[1] "State_UT"
[2] "year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
Datatype of each column
str(percentage_of_schools_with_comps)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
$ year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ Primary_Only : num [1:110] 30.4 30.9 28.4 12.7 10.3 ...
$ Primary_with_U_Primary : num [1:110] 73.7 76.5 78.6 42.7 44.1 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 89.7 92.1 92.5 87 88.5 ...
$ U_Primary_Only : num [1:110] 0 100 0 45.5 50 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 100 94.7 94.7 17.1 62.2 ...
$ Primary_with_U_Primary_Sec : num [1:110] 97.9 100 100 68.2 68.4 ...
$ U_Primary_With_Sec : num [1:110] 0 0 0 73.2 76.6 ...
$ Sec_Only : num [1:110] 0 0 0 60 71 ...
$ Sec_with_HrSec. : num [1:110] 100 100 100 33.3 66.7 ...
$ HrSec_Only : num [1:110] 0 0 0 19.3 41.6 ...
$ All Schools : num [1:110] 53.1 57.2 57 29.6 28.1 ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
Reading Dataframe-4 | Percentage of Schools with Electricity
percentage_of_schools_with_electricity <- read_csv("601 Major Project/percentage-of-schools-with-electricity.csv")
View(percentage_of_schools_with_electricity)
head(percentage_of_schools_with_electricity)
# A tibble: 6 x 13
State_UT year Primary_Only Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nico~ 2013~ 82.4 96.0 100
2 Andaman & Nico~ 2014~ 80.7 96.3 100
3 Andaman & Nico~ 2015~ 82.1 97.6 100
4 Andhra Pradesh 2013~ 87.7 93.6 99.3
5 Andhra Pradesh 2014~ 91.1 94.7 100
6 Andhra Pradesh 2015~ 91.6 95.6 100
# ... with 8 more variables: U_Primary_Only <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
# HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_electricity)
[1] "State_UT"
[2] "year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
Datatype of each column
str(percentage_of_schools_with_electricity)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
$ year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ Primary_Only : num [1:110] 82.4 80.7 82.1 87.7 91.1 ...
$ Primary_with_U_Primary : num [1:110] 96 96.3 97.6 93.6 94.7 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.3 100 ...
$ U_Primary_Only : num [1:110] 0 100 0 100 100 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 100 100 100 67.5 86.1 ...
$ Primary_with_U_Primary_Sec : num [1:110] 100 100 100 96.2 97.6 ...
$ U_Primary_With_Sec : num [1:110] 0 0 0 96.2 97.1 ...
$ Sec_Only : num [1:110] 0 0 0 97.5 93.5 ...
$ Sec_with_HrSec. : num [1:110] 100 100 100 100 83.3 ...
$ HrSec_Only : num [1:110] 0 0 0 91.3 93.2 ...
$ All Schools : num [1:110] 88.9 88.9 90.1 90.3 92.8 ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
Reading Dataframe-5 | Percentage of Schools with water facility
percentage_of_schools_with_water_facility <- read_csv("601 Major Project/percentage-of-schools-with-water-facility.csv")
View(percentage_of_schools_with_water_facility)
head(percentage_of_schools_with_water_facility)
# A tibble: 6 x 13
`State/UT` Year Primary_Only Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nico~ 2013~ 98.2 98.7 100
2 Andaman & Nico~ 2014~ 99.6 98.8 100
3 Andaman & Nico~ 2015~ 100 100 100
4 Andhra Pradesh 2013~ 86.9 94.5 99.7
5 Andhra Pradesh 2014~ 91.8 96.1 100
6 Andhra Pradesh 2015~ 93.9 97.0 100
# ... with 8 more variables: U_Primary_Only <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
# HrSec_Only <dbl>, `All Schools` <dbl>
colnames(percentage_of_schools_with_water_facility)
[1] "State/UT"
[2] "Year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
Datatype of each column
str(percentage_of_schools_with_water_facility)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State/UT : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
$ Year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ Primary_Only : num [1:110] 98.2 99.5 100 86.9 91.8 ...
$ Primary_with_U_Primary : num [1:110] 98.7 98.8 100 94.5 96.1 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.7 100 ...
$ U_Primary_Only : num [1:110] 0 100 0 90.9 100 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 100 100 100 87.3 90 ...
$ Primary_with_U_Primary_Sec : num [1:110] 100 100 100 98.8 99.6 ...
$ U_Primary_With_Sec : num [1:110] 0 0 0 96 97.5 ...
$ Sec_Only : num [1:110] 0 0 0 97.5 100 100 0 0 0 88.3 ...
$ Sec_with_HrSec. : num [1:110] 100 100 100 100 100 ...
$ HrSec_Only : num [1:110] 0 0 0 97.5 98.4 ...
$ All Schools : num [1:110] 98.7 99.5 100 90.3 93.7 ...
- attr(*, "spec")=
.. cols(
.. `State/UT` = col_character(),
.. Year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
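The water-facility file names its key columns "State/UT" and "Year", while the other facility files use "State_UT" and "year", so the tables need matching column names before they can be joined. A base-R sketch of a simple name normaliser; harmonise_names() is a hypothetical helper that roughly approximates janitor::clean_names(), which the post uses on the dropout data. The two can differ: clean_names() also splits camelCase, as the post's own output shows when HrSecondary_Boys becomes hr_secondary_boys.

```r
# Hypothetical helper: lower-case names and collapse any run of
# non-alphanumeric characters into a single underscore.
harmonise_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)  # drop leading/trailing underscores
}

water_cols <- c("State/UT", "Year", "Sec_with_HrSec.", "All Schools")
comps_cols <- c("State_UT", "year", "Sec_with_HrSec.", "All Schools")

harmonise_names(water_cols)
identical(harmonise_names(water_cols), harmonise_names(comps_cols))
```

Applied to a loaded table, names(percentage_of_schools_with_water_facility) <- harmonise_names(names(percentage_of_schools_with_water_facility)) would give it the same keys as the other facility tables, provided they are normalised the same way.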
Reading Dataframe-6 | Percentage of Schools with boys' toilets
schools_with_boys_toilet <- read_csv("601 Major Project/schools-with-boys-toilet.csv")
View(schools_with_boys_toilet)
head(schools_with_boys_toilet)
# A tibble: 6 x 13
State_UT year Primary_Only Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nico~ 2013~ 91.6 97.4 100
2 Andaman & Nico~ 2014~ 100 100 100
3 Andaman & Nico~ 2015~ 100 100 100
4 Andhra Pradesh 2013~ 53.0 62.6 82.0
5 Andhra Pradesh 2014~ 57.9 76.5 96
6 Andhra Pradesh 2015~ 99.6 99.9 99.0
# ... with 8 more variables: U_Primary_Only <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
# HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_boys_toilet)
[1] "State_UT"
[2] "year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
Datatype of each column
str(schools_with_boys_toilet)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
$ year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ Primary_Only : num [1:110] 91.6 100 100 53 57.9 ...
$ Primary_with_U_Primary : num [1:110] 97.4 100 100 62.6 76.5 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 82 96 ...
$ U_Primary_Only : num [1:110] 0 100 0 45.5 75 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 100 100 100 64.1 93.3 ...
$ Primary_with_U_Primary_Sec : num [1:110] 100 100 100 76.2 91.4 ...
$ U_Primary_With_Sec : num [1:110] 0 0 0 60.6 78 ...
$ Sec_Only : num [1:110] 0 0 0 59.3 80.7 ...
$ Sec_with_HrSec. : num [1:110] 100 100 100 85.7 60 ...
$ HrSec_Only : num [1:110] 0 0 0 73.4 86.5 ...
$ All Schools : num [1:110] 94.5 100 100 56.9 65.3 ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
Reading Dataframe-7 | Percentage of Schools with girls' toilets
schools_with_girls_toilet <- read_csv("601 Major Project/schools-with-girls-toilet.csv")
View(schools_with_girls_toilet)
head(schools_with_girls_toilet)
# A tibble: 6 x 13
State_UT year Primary_Only Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 All India 2013~ 88.7 96.0 98.8
2 All India 2014~ 91.2 96.9 99.5
3 All India 2015~ 97.0 99.0 99.7
4 Andaman & Nico~ 2013~ 89.7 97.4 100
5 Andaman & Nico~ 2014~ 100 100 100
6 Andaman & Nico~ 2015~ 100 100 100
# ... with 8 more variables: U_Primary_Only <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
# HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_girls_toilet)
[1] "State_UT"
[2] "year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
Datatype of each column
str(schools_with_girls_toilet)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "All India" "All India" "All India" "Andaman & Nicobar Islands" ...
$ year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ Primary_Only : num [1:110] 88.7 91.2 97 89.7 100 ...
$ Primary_with_U_Primary : num [1:110] 96 96.9 99 97.4 100 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 98.8 99.5 99.7 100 100 ...
$ U_Primary_Only : num [1:110] 91.4 91.4 96.3 0 100 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 98.2 99.2 99.6 100 100 ...
$ Primary_with_U_Primary_Sec : num [1:110] 97.3 98.2 99.3 100 100 ...
$ U_Primary_With_Sec : num [1:110] 94.4 96.6 98.8 0 0 ...
$ Sec_Only : num [1:110] 99.1 90.3 95.2 0 0 ...
$ Sec_with_HrSec. : num [1:110] 98.4 94 98.3 100 100 ...
$ HrSec_Only : num [1:110] 76.1 90.9 96.2 0 0 ...
$ All Schools : num [1:110] 91.2 93.1 97.5 93.4 100 ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
Preliminary Research Questions
1. Do the facilities available in schools affect the dropout ratio of girls and boys across Indian states from 2013 to 2016?
2. How are GER and the dropout ratio correlated?
3. State-wise, year-wise and education-level-wise analysis of all seven dataframes.
4. What suggestions can be proposed to the Indian Government based on the analysis?
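A sketch of the mechanics behind question 2: join GER and dropout on state and year, then correlate. It uses three 2013-14 primary-level values transcribed from the printouts above purely to show the steps; with three points the coefficient says nothing statistically, and a real analysis would use every matched state and year. The snake_case column names and the state-name lookup are assumptions, labelled in the code.

```r
# 2013-14 primary-level figures transcribed from the printouts above
# (column names are illustrative snake_case, not the files' own).
ger_2013 <- data.frame(
  state_ut = c("Andaman & Nicobar Islands", "Andhra Pradesh", "Arunachal Pradesh"),
  year = "2013-14",
  primary_ger = c(93.93, 96.74, 128.46),
  stringsAsFactors = FALSE
)
dropout_2013 <- data.frame(
  state_ut = c("A & N Islands", "Andhra Pradesh", "Arunachal Pradesh"),
  year = "2013-14",
  primary_dropout = c(1.21, 4.35, 10.89),
  stringsAsFactors = FALSE
)

# The files spell some states differently. This lookup is hypothetical and would
# need an entry for every mismatch in the full data.
name_fix <- c("A & N Islands" = "Andaman & Nicobar Islands")
needs_fix <- dropout_2013$state_ut %in% names(name_fix)
dropout_2013$state_ut[needs_fix] <- unname(name_fix[dropout_2013$state_ut[needs_fix]])

# Join on state and year, then correlate the two indicators (Pearson by default).
joined <- merge(ger_2013, dropout_2013, by = c("state_ut", "year"))
r <- cor(joined$primary_ger, joined$primary_dropout)
```

The A & N 2013-14 dropout row was removed by na.exclude() during tidying; its raw value is used here only for illustration.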
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Pola (2022, May 4). Data Analytics and Computational Social Science: HW-4 | Descriptive Statistics and Few graphs. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomniharika898262/
BibTeX citation
@misc{pola2022hw-4, author = {Pola, Niharika}, title = {Data Analytics and Computational Social Science: HW-4 | Descriptive Statistics and Few graphs}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomniharika898262/}, year = {2022} }