Analysis of Indian Education System
INTRODUCTION
In this project I have worked on 7 data sets related to the Indian Education System from 2013-2016. The first two data sets cover the Gross Enrollment Ratio and the Dropout Ratio; the remaining 5 cover the availability of basic facilities (Water, Electricity, Boys' and Girls' Toilets, and Computers) in schools. Every data set has State, Year, and Percentage data across the levels of education: Primary, Upper Primary, Secondary, and Higher Secondary.
* Lower Primary/Primary - Nursery to Class 1st
* Upper Primary - Class 1st to 5th
* Secondary - Class 6th to 8th
* Higher Secondary - Class 9th and 10th
The aim of this project is to perform Exploratory Data Analysis (EDA) of the 7 data sets in order to:
* Analyze the trends in the available-facilities data sets across India.
* Find out the impact of the non-availability of these facilities on the dropout ratio.
* Compare the states with the lowest dropout ratio against the available-facilities data sets.
* Provide a few recommendations to the Indian Government based on the analysis.
Dataset-1 | Gross Enrollment Ratio from 2013-2016 across all Indian States
Gross Enrollment Ratio (GER), or Gross Enrollment Index (GEI), is a statistical measure used in the education sector to describe enrollment at different levels of schooling (such as elementary, middle school, and high school). It is the ratio of the number of students enrolled at a given level, regardless of age, to the population of the official age group for that level.
The GER can be over 100% as it includes students who may be older or younger than the official age group.
The GER counts students who are repeating a grade, those who enrolled late and are older than their classmates, and those who have advanced quickly and are younger than their classmates. This allows total enrollment to exceed the population that corresponds to that level of education. In India, for instance, the GER improved from 25.8 to 26.3.
Calculation Method
a = number of students enrolled in a given level of education
b = population of the age group in India that corresponds to the given level of education
GER = (a / b) × 100
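As a quick worked example of the formula, the sketch below computes the GER for made-up counts. The numbers and the function name `ger` are illustrative assumptions, not values from the data sets. It also shows how the ratio exceeds 100 when enrolled students outnumber the official age group.

```r
# Gross Enrollment Ratio: enrolled students (any age) divided by the
# population of the official age group, expressed as a percentage.
ger <- function(enrolled, age_group_population) {
  enrolled / age_group_population * 100
}

# Hypothetical counts, for illustration only
ger_below <- ger(enrolled = 9500, age_group_population = 10000)
ger_above <- ger(enrolled = 11200, age_group_population = 10000)

print(c(below = ger_below, above = ger_above))
```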
Reading Dataset-1
library(tidyverse)  # provides read_csv(), the dplyr verbs, pivot_longer() and ggplot()
gross_enrollment_ratio <- read_csv("601 Major Project/gross-enrollment-ratio.csv")
dim(gross_enrollment_ratio)
[1] 110 14
head(gross_enrollment_ratio)
# A tibble: 6 x 14
State_UT Year Primary_Boys Primary_Girls Primary_Total
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nicobar Is~ 2013~ 95.9 92.0 93.9
2 Andhra Pradesh 2013~ 96.6 96.9 96.7
3 Arunachal Pradesh 2013~ 129. 128. 128.
4 Assam 2013~ 112. 115. 113.
5 Bihar 2013~ 95.0 101. 98.0
6 Chandigarh 2013~ 88.4 96.1 91.8
# ... with 9 more variables: Upper_Primary_Boys <dbl>,
# Upper_Primary_Girls <dbl>, Upper_Primary_Total <dbl>,
# Secondary_Boys <dbl>, Secondary_Girls <dbl>,
# Secondary_Total <dbl>, Higher_Secondary_Boys <chr>,
# Higher_Secondary_Girls <chr>, Higher_Secondary_Total <chr>
colnames(gross_enrollment_ratio)
[1] "State_UT" "Year"
[3] "Primary_Boys" "Primary_Girls"
[5] "Primary_Total" "Upper_Primary_Boys"
[7] "Upper_Primary_Girls" "Upper_Primary_Total"
[9] "Secondary_Boys" "Secondary_Girls"
[11] "Secondary_Total" "Higher_Secondary_Boys"
[13] "Higher_Secondary_Girls" "Higher_Secondary_Total"
Datatypes of each column
str(gross_enrollment_ratio)
spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "Andaman & Nicobar Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" ...
$ Year : chr [1:110] "2013-14" "2013-14" "2013-14" "2013-14" ...
$ Primary_Boys : num [1:110] 95.9 96.6 129.1 111.8 95 ...
$ Primary_Girls : num [1:110] 92 96.9 127.8 115.2 101.2 ...
$ Primary_Total : num [1:110] 93.9 96.7 128.5 113.4 98 ...
$ Upper_Primary_Boys : num [1:110] 94.7 82.8 112.6 87.8 80.6 ...
$ Upper_Primary_Girls : num [1:110] 89 84.4 115.3 98.7 94.9 ...
$ Upper_Primary_Total : num [1:110] 91.8 83.6 113.9 93.1 87.2 ...
$ Secondary_Boys : num [1:110] 102.9 73.8 88.4 65.6 57.7 ...
$ Secondary_Girls : num [1:110] 97.4 76.8 84.9 77.2 63 ...
$ Secondary_Total : num [1:110] 100.2 75.2 86.7 71.2 60.1 ...
$ Higher_Secondary_Boys : chr [1:110] "105.4" "59.83" "65.16" "31.78" ...
$ Higher_Secondary_Girls: chr [1:110] "96.61" "60.83" "65.38" "34.27" ...
$ Higher_Secondary_Total: chr [1:110] "101.28" "60.3" "65.27" "32.94" ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. Year = col_character(),
.. Primary_Boys = col_double(),
.. Primary_Girls = col_double(),
.. Primary_Total = col_double(),
.. Upper_Primary_Boys = col_double(),
.. Upper_Primary_Girls = col_double(),
.. Upper_Primary_Total = col_double(),
.. Secondary_Boys = col_double(),
.. Secondary_Girls = col_double(),
.. Secondary_Total = col_double(),
.. Higher_Secondary_Boys = col_character(),
.. Higher_Secondary_Girls = col_character(),
.. Higher_Secondary_Total = col_character()
.. )
- attr(*, "problems")=<externalptr>
As the output shows, 3 columns (Higher_Secondary_Boys, Higher_Secondary_Girls, Higher_Secondary_Total) were read as character instead of double because some observations contain the placeholders NR and @. The data needs to be cleaned.
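One way to find such placeholders is to attempt the numeric conversion and look at the entries that fail. The sketch below uses a toy character vector (the values are illustrative stand-ins for a column like Higher_Secondary_Boys) and base R only.

```r
# Toy character column; "NR" and "@" are the placeholders named above,
# the numbers are illustrative.
x <- c("105.4", "59.83", "NR", "31.78", "@")

# as.numeric() turns anything it cannot parse into NA (with a warning),
# so the new NA positions point at the offending entries.
parsed <- suppressWarnings(as.numeric(x))
bad_values <- unique(x[is.na(parsed) & !is.na(x)])

print(bad_values)
```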
Tidying the data
# Replace the placeholder strings with NA
gross_enrollment_ratio[ gross_enrollment_ratio == "NR" ] <- NA
gross_enrollment_ratio[ gross_enrollment_ratio == "@" ] <- NA
ger1 <- data.frame(gross_enrollment_ratio)
# Drop every row that contains at least one NA
ger <- na.exclude(ger1)
# Convert the three character columns to numeric
ger$Higher_Secondary_Boys <- as.numeric(ger$Higher_Secondary_Boys)
ger$Higher_Secondary_Girls <- as.numeric(ger$Higher_Secondary_Girls)
ger$Higher_Secondary_Total <- as.numeric(ger$Higher_Secondary_Total)
str(ger)
'data.frame': 108 obs. of 14 variables:
$ State_UT : chr "Andaman & Nicobar Islands" "Andhra Pradesh" "Arunachal Pradesh" "Assam" ...
$ Year : chr "2013-14" "2013-14" "2013-14" "2013-14" ...
$ Primary_Boys : num 95.9 96.6 129.1 111.8 95 ...
$ Primary_Girls : num 92 96.9 127.8 115.2 101.2 ...
$ Primary_Total : num 93.9 96.7 128.5 113.4 98 ...
$ Upper_Primary_Boys : num 94.7 82.8 112.6 87.8 80.6 ...
$ Upper_Primary_Girls : num 89 84.4 115.3 98.7 94.9 ...
$ Upper_Primary_Total : num 91.8 83.6 113.9 93.1 87.2 ...
$ Secondary_Boys : num 102.9 73.8 88.4 65.6 57.7 ...
$ Secondary_Girls : num 97.4 76.8 84.9 77.2 63 ...
$ Secondary_Total : num 100.2 75.2 86.7 71.2 60.1 ...
$ Higher_Secondary_Boys : num 105.4 59.8 65.2 31.8 23.3 ...
$ Higher_Secondary_Girls: num 96.6 60.8 65.4 34.3 24.2 ...
$ Higher_Secondary_Total: num 101.3 60.3 65.3 32.9 23.7 ...
- attr(*, "na.action")= 'exclude' Named int [1:2] 26 99
..- attr(*, "names")= chr [1:2] "26" "99"
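An alternative to cleaning after the fact is to declare the placeholders as missing values when the file is read, so the columns come in as numeric from the start. The sketch below uses base R's `read.csv()` with its `na.strings` argument on a small inline CSV; the state names and values are made up. `read_csv()` from readr has an `na` argument that serves the same purpose.

```r
# A small inline CSV standing in for the real file (hypothetical values)
csv_text <- "State_UT,Year,Higher_Secondary_Total
State A,2013-14,32.5
State B,2013-14,NR
State C,2013-14,@
State D,2013-14,90.1"

# na.strings lists the tokens that mean "missing", so the column is
# parsed as numeric straight away.
toy <- read.csv(text = csv_text, na.strings = c("NR", "@", "NA"),
                stringsAsFactors = FALSE)

str(toy)

# Keep only the rows without any missing value
complete_rows <- toy[complete.cases(toy), ]
print(complete_rows)
```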
Plotting All India girls enrollment ratio
Year Primary_Girls Upper_Primary_Girls Secondary_Girls
1 2013-14 102.65 92.75 76.47
2 2014-15 101.43 95.29 78.94
3 2015-16 100.69 97.57 80.97
Higher_Secondary_Girls
1 51.58
2 53.81
3 56.41
fig1 <- pivot_longer(all_india_ger_girls, c(Primary_Girls, Upper_Primary_Girls, Secondary_Girls, Higher_Secondary_Girls), names_to = "Education_Level", values_to = "GER")
ggplot(fig1, aes(x=Year, y=GER, fill=Education_Level)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig1: Gross Enrollment Ratio of Girls in India") + geom_text(aes(label=GER), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") + theme_classic()
Findings from Fig-1:
* The GER for girls in Primary school exceeds 100, which means enrollment includes students who are repeating a grade, those who enrolled late and are older than their classmates, and those who advanced quickly and are younger than their classmates. This is a positive sign in a developing country like India.
* The GER for girls rose steadily from 2013 to 2016 at the Upper Primary, Secondary, and Higher Secondary levels. This is again a very positive sign.
* Higher Secondary has the lowest enrollment for girls, at 51.58, 53.81, and 56.41. The Indian Government should take measures to increase this.
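To put numbers on these trends, the sketch below rebuilds the All India girls table from the values printed above and computes the change, in percentage points, between 2013-14 and 2015-16. Only base R is used; the intermediate names `levels_only` and `change` are illustrative.

```r
# All India girls' GER, copied from the table printed above
all_india_ger_girls <- data.frame(
  Year = c("2013-14", "2014-15", "2015-16"),
  Primary_Girls = c(102.65, 101.43, 100.69),
  Upper_Primary_Girls = c(92.75, 95.29, 97.57),
  Secondary_Girls = c(76.47, 78.94, 80.97),
  Higher_Secondary_Girls = c(51.58, 53.81, 56.41)
)

# Change in percentage points from the first year to the last year
levels_only <- all_india_ger_girls[, -1]
change <- unlist(levels_only[3, ]) - unlist(levels_only[1, ])

print(round(change, 2))
```

With these values, Primary falls by 1.96 points, while Upper Primary, Secondary, and Higher Secondary rise by 4.82, 4.50, and 4.83 points respectively.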
Plotting All India boys enrollment ratio
Year Primary_Boys Upper_Primary_Boys Secondary_Boys
1 2013-14 100.20 86.31 76.80
2 2014-15 98.85 87.71 78.13
3 2015-16 97.87 88.72 79.16
Higher_Secondary_Boys
1 52.77
2 54.57
3 55.95
fig2 <- pivot_longer(all_india_ger_boys, c(Primary_Boys, Upper_Primary_Boys, Secondary_Boys, Higher_Secondary_Boys), names_to = "Education_Level", values_to = "GER")
ggplot(fig2, aes(x=Year, y=GER, fill=Education_Level)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig2: Gross Enrollment Ratio of boys in India") + geom_text(aes(label=GER), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") + theme_classic()
Findings from Fig-2:
fig3 <- pivot_longer(all_india_ger, c(Primary_Boys, Primary_Girls, Primary_Total, Upper_Primary_Boys, Upper_Primary_Girls, Upper_Primary_Total, Secondary_Boys, Secondary_Girls, Secondary_Total, Higher_Secondary_Girls, Higher_Secondary_Boys, Higher_Secondary_Total), names_to = "Education_Level")
ggplot(fig3, aes(x=value, y=Education_Level)) + geom_boxplot(color="red") + geom_text(aes(label=value), size=4, vjust=0) + labs(title = "Fig-3: All India Enrollment Ratio - Education Wise ", x="Enrollment Percentage", y="Education Level") + facet_wrap(~Year)
Findings from Fig-3:
State_UT Year Primary_Boys Primary_Girls
1 Andaman & Nicobar Islands 2013-14 95.88 91.97
2 Andhra Pradesh 2013-14 96.62 96.87
3 Arunachal Pradesh 2013-14 129.12 127.77
4 Assam 2013-14 111.77 115.16
5 Bihar 2013-14 95.03 101.15
6 Chandigarh 2013-14 88.42 96.09
Primary_Total Upper_Primary_Boys Upper_Primary_Girls
1 93.93 94.70 88.98
2 96.74 82.81 84.38
3 128.46 112.64 115.27
4 113.43 87.85 98.69
5 97.96 80.60 94.92
6 91.85 99.93 103.02
Upper_Primary_Total Secondary_Boys Secondary_Girls Secondary_Total
1 91.83 102.89 97.36 100.16
2 83.57 73.76 76.77 75.20
3 113.94 88.37 84.89 86.65
4 93.13 65.60 77.20 71.21
5 87.24 57.66 62.96 60.08
6 101.27 92.08 92.16 92.11
Higher_Secondary_Boys Higher_Secondary_Girls Higher_Secondary_Total
1 105.40 96.61 101.28
2 59.83 60.83 60.30
3 65.16 65.38 65.27
4 31.78 34.27 32.94
5 23.33 24.17 23.70
6 90.50 92.88 91.49
library(ggmap)
register_google(key = Sys.getenv("GGMAP_GOOGLE_API_KEY"))  # read the API key from the environment instead of hard-coding it
coordinates <- geocode(states_ger$State_UT)
plot <- merge(states_ger,coordinates)
head(plot)
State_UT Year Primary_Boys Primary_Girls
1 Andaman & Nicobar Islands 2013-14 95.88 91.97
2 Andhra Pradesh 2013-14 96.62 96.87
3 Arunachal Pradesh 2013-14 129.12 127.77
4 Assam 2013-14 111.77 115.16
5 Bihar 2013-14 95.03 101.15
6 Chandigarh 2013-14 88.42 96.09
Primary_Total Upper_Primary_Boys Upper_Primary_Girls
1 93.93 94.70 88.98
2 96.74 82.81 84.38
3 128.46 112.64 115.27
4 113.43 87.85 98.69
5 97.96 80.60 94.92
6 91.85 99.93 103.02
Upper_Primary_Total Secondary_Boys Secondary_Girls Secondary_Total
1 91.83 102.89 97.36 100.16
2 83.57 73.76 76.77 75.20
3 113.94 88.37 84.89 86.65
4 93.13 65.60 77.20 71.21
5 87.24 57.66 62.96 60.08
6 101.27 92.08 92.16 92.11
Higher_Secondary_Boys Higher_Secondary_Girls Higher_Secondary_Total
1 105.40 96.61 101.28
2 59.83 60.83 60.30
3 65.16 65.38 65.27
4 31.78 34.27 32.94
5 23.33 24.17 23.70
6 90.50 92.88 91.49
lon lat
1 92.65864 11.74009
2 92.65864 11.74009
3 92.65864 11.74009
4 92.65864 11.74009
5 92.65864 11.74009
6 92.65864 11.74009
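In the output above, every row carries the same lon/lat pair. One likely cause is that `merge()` was called on two data frames with no column name in common, in which case base R returns their Cartesian product rather than a row-by-row match. The sketch below illustrates this with toy data (the state labels and coordinates are placeholders) and assumes the coordinate lookup returns one row per state in the same order; under that assumption, `cbind()` keeps each state with its own coordinates.

```r
# Toy stand-ins: three states and one coordinate row per state, in the
# same order (an assumption about how the lookup result is laid out).
states <- data.frame(State_UT = c("A", "B", "C"),
                     Primary_Boys = c(95, 96, 129))
coords <- data.frame(lon = c(92.7, 79.7, 94.7),
                     lat = c(11.7, 15.9, 28.2))

# No shared column name: merge() pairs every state with every
# coordinate row (3 x 3 = 9 rows).
crossed <- merge(states, coords)

# Column-binding keeps exactly one coordinate row per state.
bound <- cbind(states, coords)

print(dim(crossed))
print(bound)
```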
The map below is a terrain-style map of India. I wanted to integrate my data with a choropleth map; however, as far as I could find, ready-made choropleth maps in R exist for the world and the USA but not for other countries, and ggmap supports only a few map types (“terrain”, “satellite”, “hybrid”, and “roadmap”), none of which is a choropleth. I feel this is a drawback of R as well as ggmap.
map <- get_map(location = 'India', zoom = 5, maptype= 'terrain', scale = "auto")
# alpha is a fixed value, so it is set outside aes() to keep it out of the legend
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Primary_Boys, colour = Primary_Boys), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Upper_Primary_Boys, colour = Upper_Primary_Boys), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Secondary_Boys, colour = Secondary_Boys), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Higher_Secondary_Boys, colour = Higher_Secondary_Boys), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Primary_Girls, colour = Primary_Girls), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Upper_Primary_Girls, colour = Upper_Primary_Girls), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Secondary_Girls, colour = Secondary_Girls), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Higher_Secondary_Girls, colour = Higher_Secondary_Girls), alpha = 0.5)
Findings from the maps:
The analysis would have been much clearer with a choropleth map; I will take this as scope for improvement in my next projects.
states_ger %>%
select(Year, State_UT, Primary_Boys,Upper_Primary_Boys, Secondary_Boys, Higher_Secondary_Boys) %>%
group_by(Year) %>%
summarise(avg_pb=mean(Primary_Boys), avg_upb=mean(Upper_Primary_Boys), avg_sb=mean(Secondary_Boys), avg_hsb=mean(Higher_Secondary_Boys))
# A tibble: 3 x 5
Year avg_pb avg_upb avg_sb avg_hsb
<chr> <dbl> <dbl> <dbl> <dbl>
1 2013-14 105. 96.9 87.2 60.0
2 2014-15 102. 97.0 88.0 60.4
3 2015-16 100. 98.1 86.9 58.2
states_ger %>%
select(Year, Primary_Girls,Upper_Primary_Girls, Secondary_Girls, Higher_Secondary_Girls) %>%
group_by(Year) %>%
summarise(avg_pb=mean(Primary_Girls), avg_upb=mean(Upper_Primary_Girls), avg_sb=mean(Secondary_Girls), avg_hsb=mean(Higher_Secondary_Girls))
# A tibble: 3 x 5
Year avg_pb avg_upb avg_sb avg_hsb
<chr> <dbl> <dbl> <dbl> <dbl>
1 2013-14 106. 99.8 88.0 60.5
2 2014-15 103. 102. 89.6 62.2
3 2015-16 101. 104. 89.4 61.8
Findings:
* The overall mean enrollment percentage at the Primary and Upper Primary levels is close to or above 100%, which is a very good sign.
* However, the overall mean percentage at the Secondary and Higher Secondary levels is almost unchanged across the three years; this is where the government has to step in and take adequate measures.
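The averages in the two tables above are unweighted means across states, so every state counts equally regardless of its size. That is one reason they need not match the All India figures shown earlier. The sketch below contrasts an unweighted mean with a population-weighted mean; the state labels, GER values, and populations are hypothetical.

```r
# Hypothetical GER and age-group population for three states
toy <- data.frame(
  state = c("Small", "Medium", "Large"),
  ger = c(130, 110, 95),
  population = c(1e5, 1e6, 1e7)
)

# Each state counts equally
unweighted <- mean(toy$ger)

# Larger states count more
weighted <- weighted.mean(toy$ger, toy$population)

print(c(unweighted = unweighted, weighted = weighted))
```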
My further analysis will focus on analyzing the dropout percentage and finding if we can get any correlation between the Gross Enrollment and Dropout.
Dataset-2 | Dropout Ratio/Percentage across all Indian States from 2013-2016
There are varying definitions of Dropout Ratio on the web. I will keep it simple here: a dropout is any student who leaves school for any reason before graduation or completion of a program of studies without transferring to another school, and the Dropout Ratio is the percentage of students at a given level who drop out.
Reading Dataset-2
# A tibble: 6 x 14
State_UT year Primary_Boys Primary_Girls Primary_Total
<chr> <chr> <chr> <chr> <chr>
1 A & N Islands 2012-13 0.83 0.51 0.68
2 A & N Islands 2013-14 1.35 1.06 1.21
3 A & N Islands 2014-15 0.47 0.55 0.51
4 Andhra Pradesh 2012-13 3.3 3.05 3.18
5 Andhra Pradesh 2013-14 4.31 4.39 4.35
6 Andhra Pradesh 2014-15 6.57 6.89 6.72
# ... with 9 more variables: `Upper Primary_Boys` <chr>,
# `Upper Primary_Girls` <chr>, `Upper Primary_Total` <chr>,
# `Secondary _Boys` <chr>, `Secondary _Girls` <chr>,
# `Secondary _Total` <chr>, HrSecondary_Boys <chr>,
# HrSecondary_Girls <chr>, HrSecondary_Total <chr>
colnames(dropout_ratio)
[1] "State_UT" "year" "Primary_Boys"
[4] "Primary_Girls" "Primary_Total" "Upper Primary_Boys"
[7] "Upper Primary_Girls" "Upper Primary_Total" "Secondary _Boys"
[10] "Secondary _Girls" "Secondary _Total" "HrSecondary_Boys"
[13] "HrSecondary_Girls" "HrSecondary_Total"
Datatype of each column
str(dropout_ratio)
spec_tbl_df [110 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "A & N Islands" "A & N Islands" "A & N Islands" "Andhra Pradesh" ...
$ year : chr [1:110] "2012-13" "2013-14" "2014-15" "2012-13" ...
$ Primary_Boys : chr [1:110] "0.83" "1.35" "0.47" "3.3" ...
$ Primary_Girls : chr [1:110] "0.51" "1.06" "0.55" "3.05" ...
$ Primary_Total : chr [1:110] "0.68" "1.21" "0.51" "3.18" ...
$ Upper Primary_Boys : chr [1:110] "Uppe_r_Primary" "NR" "1.44" "3.21" ...
$ Upper Primary_Girls: chr [1:110] "1.09" "1.54" "1.95" "3.51" ...
$ Upper Primary_Total: chr [1:110] "1.23" "0.51" "1.69" "3.36" ...
$ Secondary _Boys : chr [1:110] "5.57" "8.36" "11.47" "12.21" ...
$ Secondary _Girls : chr [1:110] "5.55" "5.98" "8.16" "13.25" ...
$ Secondary _Total : chr [1:110] "5.56" "7.2" "9.87" "12.72" ...
$ HrSecondary_Boys : chr [1:110] "17.66" "18.94" "21.05" "2.66" ...
$ HrSecondary_Girls : chr [1:110] "10.15" "12.2" "12.21" "NR" ...
$ HrSecondary_Total : chr [1:110] "14.14" "15.87" "16.93" "0.35" ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Boys = col_character(),
.. Primary_Girls = col_character(),
.. Primary_Total = col_character(),
.. `Upper Primary_Boys` = col_character(),
.. `Upper Primary_Girls` = col_character(),
.. `Upper Primary_Total` = col_character(),
.. `Secondary _Boys` = col_character(),
.. `Secondary _Girls` = col_character(),
.. `Secondary _Total` = col_character(),
.. HrSecondary_Boys = col_character(),
.. HrSecondary_Girls = col_character(),
.. HrSecondary_Total = col_character()
.. )
- attr(*, "problems")=<externalptr>
Tidying the data
library(janitor)
dropout_ratio <- clean_names(dropout_ratio)
dim(dropout_ratio)
[1] 110 14
dropout_ratio[ dropout_ratio == "NR" ] <- NA
#dropout_ratio[ dropout_ratio == "upper_primary_boys" ] <- NA
dropout_ratio[ dropout_ratio == "Uppe_r_Primary" ] <- NA
dropout_ratio <- data.frame(dropout_ratio)
dropout_ratio <- na.exclude(dropout_ratio)
dim(dropout_ratio)
[1] 55 14
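Dropping incomplete rows halves the data set here (from 110 rows to 55), so it can be worth checking where the missing values sit before removing anything. The sketch below counts missing values per column and the number of complete rows on a toy data frame; the values are illustrative only.

```r
# Toy frame with scattered missing values (illustrative only)
toy <- data.frame(
  state_ut = c("A", "B", "C", "D"),
  primary_total = c("0.68", NA, "3.18", "4.35"),
  hr_secondary_total = c(NA, "15.87", NA, "11.79"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Number of rows that na.exclude() would keep
rows_kept <- sum(complete.cases(toy))

print(na_per_column)
print(rows_kept)
```

If most of the gaps fall in one or two columns, dropping NAs only for the columns used in a given plot would retain more rows than dropping every incomplete row up front.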
# Convert every percentage column (all columns except state_ut and year) to numeric
num_cols <- setdiff(names(dropout_ratio), c("state_ut", "year"))
dropout_ratio[num_cols] <- lapply(dropout_ratio[num_cols], as.numeric)
str(dropout_ratio)
'data.frame': 55 obs. of 14 variables:
$ state_ut : chr "A & N Islands" "Andhra Pradesh" "Arunachal Pradesh" "Arunachal Pradesh" ...
$ year : chr "2014-15" "2013-14" "2013-14" "2012-13" ...
$ primary_boys : num 0.47 4.31 11.54 15.84 11.51 ...
$ primary_girls : num 0.55 4.39 10.22 14.44 10.09 ...
$ primary_total : num 0.51 4.35 10.89 15.16 10.82 ...
$ upper_primary_boys : num 1.44 3.46 4.44 5.86 5.31 7.89 7.6 6.47 3.31 3.7 ...
$ upper_primary_girls: num 1.95 4.12 6.74 9.06 8.08 6.55 6.54 5.22 5.09 4.4 ...
$ upper_primary_total: num 1.69 3.78 5.59 7.47 6.71 7.2 7.05 5.85 4.13 4.02 ...
$ secondary_boys : num 11.5 11.9 16.1 14 18.3 ...
$ secondary_girls : num 8.16 13.37 12.75 11.77 15.81 ...
$ secondary_total : num 9.87 12.65 14.49 12.93 17.11 ...
$ hr_secondary_boys : num 21.05 12.65 18.57 7.85 19.37 ...
$ hr_secondary_girls : num 12.21 10.85 15.49 2.14 17.44 ...
$ hr_secondary_total : num 16.93 11.79 17.07 5.11 18.42 ...
- attr(*, "na.action")= 'exclude' Named int [1:55] 1 2 4 6 12 13 14 15 16 17 ...
..- attr(*, "names")= chr [1:55] "1" "2" "4" "6" ...
[1] 1 14
fig4 <- pivot_longer(all_india_drop, c(primary_boys, primary_girls, primary_total, upper_primary_boys, upper_primary_girls, upper_primary_total, secondary_boys, secondary_girls, secondary_total, hr_secondary_girls, hr_secondary_boys, hr_secondary_total), names_to = "EducationLevel")
ggplot(fig4, aes(x=value, y=EducationLevel)) + geom_boxplot(color="red") + geom_text(aes(label=value), size=4) + labs(title = "Fig-4: All India Dropout Ratio - Education Wise ", x="Dropout Percentage", y="Education Level") + facet_wrap(~year)
Findings:
Correlation between Gross Enrollment Ratio and Dropout Ratio:
* Comparing Fig-3 and Fig-4, the Higher Secondary level has the lowest enrollment and the Secondary level has the highest dropout.
* This suggests that the high dropout at the Secondary level contributes to the low enrollment in Higher Secondary schools. The Indian Government needs to take measures, implement schemes, or improve facilities at these two levels.
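The relationship described above could also be checked numerically by joining the two data sets on state and year and computing a correlation coefficient. The sketch below does this on hypothetical values with placeholder column names. With the real data, the keys would first need harmonizing, since the two files label states differently ("Andaman & Nicobar Islands" versus "A & N Islands") and cover different year ranges (2013-14 to 2015-16 versus 2012-13 to 2014-15).

```r
# Hypothetical state-level values; column names are placeholders
ger_toy <- data.frame(
  state = c("A", "B", "C", "D", "E"),
  year = "2013-14",
  hs_ger = c(32.9, 60.3, 65.3, 23.7, 91.5)
)
drop_toy <- data.frame(
  state = c("A", "B", "C", "D", "E"),
  year = "2013-14",
  sec_dropout = c(26.8, 12.7, 14.5, 25.0, 8.0)
)

# Join on the shared keys, then measure the linear association
joined <- merge(ger_toy, drop_toy, by = c("state", "year"))
r <- cor(joined$hs_ger, joined$sec_dropout)

print(r)
```

A negative coefficient would be consistent with the finding, although a correlation alone does not establish that dropout causes the lower enrollment.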
state_ut year primary_boys primary_girls primary_total
1 A & N Islands 2014-15 0.47 0.55 0.51
2 Andhra Pradesh 2013-14 4.31 4.39 4.35
3 Arunachal Pradesh 2013-14 11.54 10.22 10.89
4 Arunachal Pradesh 2012-13 15.84 14.44 15.16
5 Arunachal Pradesh 2014-15 11.51 10.09 10.82
6 Assam 2012-13 7.02 5.46 6.24
upper_primary_boys upper_primary_girls upper_primary_total
1 1.44 1.95 1.69
2 3.46 4.12 3.78
3 4.44 6.74 5.59
4 5.86 9.06 7.47
5 5.31 8.08 6.71
6 7.89 6.55 7.20
secondary_boys secondary_girls secondary_total hr_secondary_boys
1 11.47 8.16 9.87 21.05
2 11.95 13.37 12.65 12.65
3 16.08 12.75 14.49 18.57
4 13.99 11.77 12.93 7.85
5 18.33 15.81 17.11 19.37
6 25.65 27.79 26.77 4.87
hr_secondary_girls hr_secondary_total
1 12.21 16.93
2 10.85 11.79
3 15.49 17.07
4 2.14 5.11
5 17.44 18.42
6 4.50 4.69
primary_boys_drop <- states_drop[c("state_ut", "year", "primary_boys")]
slice_min(primary_boys_drop, primary_boys)
state_ut year primary_boys
1 Gujarat 2012-13 0.21
top10 <- arrange(primary_boys_drop, desc(primary_boys))
top10 <- slice_head(top10, n=10)
top10
state_ut year primary_boys
1 Nagaland 2013-14 19.09
2 Manipur 2013-14 17.27
3 Arunachal Pradesh 2012-13 15.84
4 Arunachal Pradesh 2013-14 11.54
5 Arunachal Pradesh 2014-15 11.51
6 Manipur 2012-13 10.24
7 Mizoram 2014-15 10.17
8 Madhya Pradesh 2013-14 9.91
9 Uttar Pradesh 2014-15 9.08
10 Assam 2013-14 8.19
ggplot(top10, aes(x=year, y=primary_boys, fill=year)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-5: top-10 states with highest dropout rate of boys in India ", subtitle = "Education Level - Primary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=primary_boys), size = 4, position = position_dodge(width = .9), vjust = 0, color = "white") + theme_dark() + facet_wrap(~state_ut)
primary_girls_drop <- states_drop[c("state_ut", "year", "primary_girls")]
slice_min(primary_girls_drop, primary_girls)
state_ut year primary_girls
1 Daman & Diu 2014-15 0.29
top10 <- arrange(primary_girls_drop, desc(primary_girls))
top10 <- slice_head(top10, n=10)
top10
state_ut year primary_girls
1 Nagaland 2013-14 19.74
2 Manipur 2013-14 18.74
3 Arunachal Pradesh 2012-13 14.44
4 Madhya Pradesh 2013-14 10.40
5 Arunachal Pradesh 2013-14 10.22
6 Arunachal Pradesh 2014-15 10.09
7 Mizoram 2014-15 10.03
8 Manipur 2012-13 9.48
9 Uttar Pradesh 2014-15 8.04
10 Nagaland 2012-13 7.03
ggplot(top10, aes(x=year, y=primary_girls, fill=year)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-6: top-10 states with highest dropout rate of girls in India ", subtitle = "Education Level - Primary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=primary_girls), size = 4, position = position_dodge(width = .9), vjust = 0, color = "white") + theme_dark() + facet_wrap(~state_ut)
Analysis of dropout ratio of Upper Primary Boys
upper_primary_boys_drop <- states_drop[c("state_ut", "year", "upper_primary_boys")]
slice_min(upper_primary_boys_drop, upper_primary_boys)
state_ut year upper_primary_boys
1 Puducherry 2012-13 0.33
top10 <- arrange(upper_primary_boys_drop, desc(upper_primary_boys))
top10 <- slice_head(top10, n=10)
top10
state_ut year upper_primary_boys
1 Nagaland 2013-14 18.08
2 Nagaland 2012-13 10.15
3 Madhya Pradesh 2013-14 9.88
4 Jharkhand 2014-15 9.01
5 Assam 2012-13 7.89
6 Nagaland 2014-15 7.87
7 Assam 2013-14 7.60
8 Manipur 2013-14 7.48
9 Chhattisgarh 2014-15 6.47
10 Sikkim 2013-14 6.35
ggplot(top10, aes(x=year, y=upper_primary_boys, fill=year)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-7: top-10 states with highest dropout rate of boys in India ", subtitle = "Education Level - Upper Primary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=upper_primary_boys), size = 4, position = position_dodge(width = .9), vjust = 0, color = "white") + theme_dark() + facet_wrap(~state_ut)
Analysis of dropout ratio of upper primary girls
upper_primary_girls_drop <- states_drop[c("state_ut", "year", "upper_primary_girls")]
slice_min(upper_primary_girls_drop, upper_primary_girls)
state_ut year upper_primary_girls
1 Himachal Pradesh 2012-13 0.49
top10 <- arrange(upper_primary_girls_drop, desc(upper_primary_girls))
top10 <- slice_head(top10, n=10)
top10
state_ut year upper_primary_girls
1 Nagaland 2013-14 17.63
2 Madhya Pradesh 2013-14 13.57
3 Nagaland 2012-13 9.51
4 Arunachal Pradesh 2012-13 9.06
5 Jharkhand 2014-15 8.96
6 Gujarat 2014-15 8.54
7 Gujarat 2012-13 8.19
8 Arunachal Pradesh 2014-15 8.08
9 Gujarat 2013-14 8.04
10 Nagaland 2014-15 7.97
ggplot(top10, aes(x=year, y=upper_primary_girls, fill=year)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-8: top-10 states with highest dropout rate of girls in India ", subtitle = "Education Level - Upper Primary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=upper_primary_girls), size = 4, position = position_dodge(width = .9), vjust = 0, color = "white") + theme_dark() + facet_wrap(~state_ut)
secondary_boys_drop <- states_drop[c("state_ut", "year", "secondary_boys")]
slice_min(secondary_boys_drop, secondary_boys)
state_ut year secondary_boys
1 Himachal Pradesh 2014-15 6.31
top10 <- arrange(secondary_boys_drop, desc(secondary_boys))
top10 <- slice_head(top10, n=10)
top10
state_ut year secondary_boys
1 Karnataka 2012-13 40.70
2 Daman & Diu 2014-15 34.45
3 Nagaland 2013-14 34.14
4 Dadra & Nagar Haveli 2013-14 30.02
5 Assam 2013-14 28.59
6 Tripura 2014-15 28.03
7 Nagaland 2012-13 26.70
8 Gujarat 2014-15 26.29
9 Assam 2012-13 25.65
10 Madhya Pradesh 2013-14 25.21
ggplot(top10, aes(x=year, y=secondary_boys, fill=year)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-10: top-10 states with highest dropout rate of boys in India ", subtitle = "Education Level - secondary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=secondary_boys), size = 4, position = position_dodge(width = .9), vjust = 0, color = "white") + theme_dark() + facet_wrap(~state_ut)
secondary_girls_drop <- states_drop[c("state_ut", "year", "secondary_girls")]
slice_min(secondary_girls_drop, secondary_girls)
state_ut year secondary_girls
1 Himachal Pradesh 2014-15 5.8
top10 <- arrange(secondary_girls_drop, desc(secondary_girls))
top10 <- slice_head(top10, n=10)
top10
state_ut year secondary_girls
1 Karnataka 2012-13 39.07
2 Nagaland 2013-14 36.08
3 Assam 2013-14 32.10
4 Daman & Diu 2014-15 29.73
5 Tripura 2014-15 28.83
6 Madhya Pradesh 2013-14 27.91
7 Assam 2012-13 27.79
8 Tripura 2012-13 26.99
9 Dadra & Nagar Haveli 2013-14 26.83
10 Nagaland 2012-13 26.33
ggplot(top10, aes(x=year, y=secondary_girls, fill=year)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-11: top-10 states with highest dropout rate of girls in India ", subtitle = "Education Level - secondary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=secondary_girls), size = 4, position = position_dodge(width = .9), vjust = 0, color = "white") + theme_dark() + facet_wrap(~state_ut)
hr_secondary_boys_drop <- states_drop[c("state_ut", "year", "hr_secondary_boys")]
slice_min(hr_secondary_boys_drop, hr_secondary_boys)
state_ut year hr_secondary_boys
1 Madhya Pradesh 2013-14 0.52
top10 <- arrange(hr_secondary_boys_drop, desc(hr_secondary_boys))
top10 <- slice_head(top10, n=10)
top10
state_ut year hr_secondary_boys
1 Daman & Diu 2014-15 44.38
2 A & N Islands 2014-15 21.05
3 Karnataka 2012-13 19.47
4 Arunachal Pradesh 2014-15 19.37
5 Nagaland 2012-13 18.67
6 Arunachal Pradesh 2013-14 18.57
7 Nagaland 2013-14 15.36
8 Daman & Diu 2013-14 14.48
9 Sikkim 2013-14 14.11
10 Jammu & Kashmir 2014-15 13.85
ggplot(top10, aes(x=year, y=hr_secondary_boys, fill=year)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-12: top-10 states with highest dropout rate of boys in India ", subtitle = "Education Level - Higher secondary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=hr_secondary_boys), size = 4, position = position_dodge(width = .9), vjust = 0, color = "white") + theme_dark() + facet_wrap(~state_ut)
hr_secondary_girls_drop <- states_drop[c("state_ut", "year", "hr_secondary_girls")]
slice_min(hr_secondary_girls_drop, hr_secondary_girls)
state_ut year hr_secondary_girls
1 Gujarat 2012-13 0.3
top10 <- arrange(hr_secondary_girls_drop, desc(hr_secondary_girls))
top10 <- slice_head(top10, n=10)
top10
state_ut year hr_secondary_girls
1 Daman & Diu 2014-15 36.05
2 Nagaland 2012-13 17.87
3 Arunachal Pradesh 2014-15 17.44
4 Arunachal Pradesh 2013-14 15.49
5 Telangana 2013-14 13.20
6 Nagaland 2013-14 12.96
7 A & N Islands 2014-15 12.21
8 Sikkim 2013-14 11.92
9 Karnataka 2012-13 11.26
10 Jammu & Kashmir 2014-15 11.20
ggplot(top10, aes(x=year, y=hr_secondary_girls, fill=year)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-13: top-10 states with highest dropout rate of girls in India ", subtitle = "Education Level - Higher secondary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=hr_secondary_girls), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut, scales = "free_y")
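The eight blocks above repeat the same select, sort, and slice steps for each column. A small helper could capture that pattern once. The sketch below is one possible version in base R; the function name `top_n_rows` is hypothetical, and the toy data reuses a few rows from the primary_boys output above.

```r
# A few rows copied from the primary_boys output above, deliberately unsorted
states_drop_toy <- data.frame(
  state_ut = c("Gujarat", "Manipur", "Nagaland", "Arunachal Pradesh"),
  year = c("2012-13", "2013-14", "2013-14", "2012-13"),
  primary_boys = c(0.21, 17.27, 19.09, 15.84)
)

# Return the n rows with the highest value in `column`
top_n_rows <- function(data, column, n = 10) {
  ordered <- data[order(data[[column]], decreasing = TRUE), ]
  head(ordered[, c("state_ut", "year", column)], n)
}

top2 <- top_n_rows(states_drop_toy, "primary_boys", n = 2)
print(top2)
```

The same call with a different column name, for example `"secondary_girls"`, would replace each repeated block, provided the data frame has that column.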
Reading Dataset-3 | Percentage of Schools with access to computers
schools_with_comps <- read_csv("601 Major Project/percentage-of-schools-with-comps.csv")
colnames(schools_with_comps)
[1] "State_UT"
[2] "year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
schools_with_comps <- rename(schools_with_comps, primary="Primary_Only", upper_primary="U_Primary_Only", secondary="Sec_Only", hr_secondary="HrSec_Only")
All_India <- filter(schools_with_comps, State_UT=="All India") %>%
select(year, primary, upper_primary, secondary, hr_secondary)
All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage")
ggplot(All_India, aes(x=year, y=Percentage, fill=Education_Level)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Percentage of Schools with access to Computer facility all over india") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") + theme_classic() + facet_wrap(~Education_Level)
library(tidytext)
primary_wise<- select(schools_with_comps, State_UT, year, primary) %>%
filter(State_UT != "All India")
primary_wise <- arrange(primary_wise, primary)
primary_wise <- slice_head(primary_wise, n=50)
ggplot(primary_wise, aes(x=primary, y=State_UT, fill=State_UT))+geom_bar(stat="identity")+facet_wrap(~year) + labs(title = "States with lowest percentage of Computer Facility", subtitle = "Education Level - Primary", x="percentage", y="State name") + geom_text(aes(label=primary), size = 3, position = position_dodge(width = .9), vjust = 0, color = "black")
library(kableExtra)
upper_primary_wise<- select(schools_with_comps, State_UT, year, upper_primary) %>%
filter(State_UT != "All India") %>%
arrange(upper_primary) %>%
slice(1:20)
kable(upper_primary_wise, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Percentage"), caption = "Table1 : State-wise Percentage of Upper Primary Schools having lowest access to computers") %>%
kable_styling(font_size = 15) %>%
row_spec(c(1,1,1))
State/Union Territory | Year | Percentage |
---|---|---|
Andaman & Nicobar Islands | 2013-14 | 0.00 |
Andaman & Nicobar Islands | 2015-16 | 0.00 |
Chandigarh | 2013-14 | 0.00 |
Chandigarh | 2014-15 | 0.00 |
Chandigarh | 2015-16 | 0.00 |
Puducherry | 2013-14 | 0.00 |
Sikkim | 2015-16 | 0.00 |
Telangana | 2015-16 | 0.00 |
West Bengal | 2013-14 | 7.93 |
Odisha | 2013-14 | 8.24 |
Jammu And Kashmir | 2013-14 | 8.82 |
Odisha | 2014-15 | 9.19 |
Odisha | 2015-16 | 9.28 |
West Bengal | 2014-15 | 9.32 |
West Bengal | 2015-16 | 9.97 |
Bihar | 2013-14 | 10.79 |
Jammu And Kashmir | 2014-15 | 11.19 |
Jammu And Kashmir | 2015-16 | 11.28 |
Bihar | 2015-16 | 11.64 |
Bihar | 2014-15 | 11.65 |
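The `arrange()` followed by `slice(1:20)` pattern used for Table 1 can also be written as a single `slice_min()` call. The sketch below is only illustrative: it uses four rows copied from the table above rather than the full data set, and the name `upper_primary_demo` is made up for the example.

```r
library(dplyr)

# Four rows copied from Table 1 above (not the full data set)
upper_primary_demo <- tibble(
  State_UT = c("Bihar", "Odisha", "West Bengal", "Chandigarh"),
  year = c("2013-14", "2013-14", "2013-14", "2013-14"),
  upper_primary = c(10.79, 8.24, 7.93, 0.00)
)

# Keep the n rows with the smallest percentage; with_ties = FALSE
# returns exactly n rows even when several values are tied
lowest <- slice_min(upper_primary_demo, upper_primary, n = 2, with_ties = FALSE)
lowest
```

`slice_max()` works the same way for the "highest percentage" rankings.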
nagaland <- filter(schools_with_comps, State_UT=="Nagaland")
nagaland <- select(nagaland, State_UT, year, primary, upper_primary)
nagaland
# A tibble: 3 x 4
State_UT year primary upper_primary
<chr> <chr> <dbl> <dbl>
1 Nagaland 2013-14 4.98 59.1
2 Nagaland 2014-15 4.97 68.8
3 Nagaland 2015-16 5.53 76.9
ggplot() + geom_line(data=nagaland, mapping=aes(x=year, y=primary, group=State_UT), size=1, color="red") + geom_point(data=nagaland, mapping=aes(x=year, y=primary, group=State_UT), color="black") +
geom_line(data=nagaland, mapping=aes(x=year, y=upper_primary, group=State_UT), color="blue", size=1) + geom_point(data=nagaland, mapping=aes(x=year, y=upper_primary, group=State_UT), color="black") + labs(title = "Percentage of Schools with access to Computers in Nagaland State", subtitle = "Education Level - Primary and Upper Primary", x="year", y="Percentage")
secondary_wise<- select(schools_with_comps, State_UT, year, secondary) %>%
filter(State_UT != "All India")
secondary_wise <- arrange(secondary_wise, desc(secondary))
secondary_wise
# A tibble: 107 x 3
State_UT year secondary
<chr> <chr> <dbl>
1 Daman & Diu 2013-14 100
2 Himachal Pradesh 2013-14 100
3 Kerala 2014-15 100
4 Kerala 2015-16 100
5 Nagaland 2015-16 100
6 Punjab 2013-14 100
7 Daman & Diu 2014-15 92.3
8 Daman & Diu 2015-16 92.3
9 Maharashtra 2015-16 90.5
10 Maharashtra 2014-15 88.4
# ... with 97 more rows
secondary_wise <- slice_head(secondary_wise, n=40)
ggplot(secondary_wise, aes(x=secondary, y=State_UT, fill=State_UT))+geom_bar(stat="identity")+facet_wrap(~year) + labs(title = "States with highest percentage of Computer Facility", subtitle = "Education Level - secondary", x="percentage", y="State name") + geom_text(aes(label=secondary), size = 3, position = position_dodge(width = .1), vjust = 0, color = "black")
karnataka <- filter(schools_with_comps, State_UT=="Karnataka")
karnataka <- select(karnataka, State_UT, year, secondary)
karnataka
# A tibble: 3 x 3
State_UT year secondary
<chr> <chr> <dbl>
1 Karnataka 2013-14 67.0
2 Karnataka 2014-15 69.9
3 Karnataka 2015-16 69.3
ggplot(karnataka, aes(x=year, y=secondary, group=State_UT)) + geom_line(size=1, color="purple") + geom_point() + geom_text(aes(label=secondary), size = 5) + labs(title = "Percentage of Schools with access to Computers in the state of Karnataka", subtitle = "Education Level - Secondary", x="year", y="Percentage")
hr_secondary_wise<- select(schools_with_comps, State_UT, year, hr_secondary) %>%
filter(State_UT != "All India") %>%
arrange(hr_secondary) %>%
slice(1:20)
kable(hr_secondary_wise, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Percentage"), caption = "Table2 : State-wise Percentage of Higher Secondary Schools having zero access to computers") %>%
kable_styling(font_size = 15) %>%
row_spec(c(1,1,1))
State/Union Territory | Year | Percentage |
---|---|---|
Andaman & Nicobar Islands | 2013-14 | 0 |
Andaman & Nicobar Islands | 2014-15 | 0 |
Andaman & Nicobar Islands | 2015-16 | 0 |
Arunachal Pradesh | 2014-15 | 0 |
Arunachal Pradesh | 2015-16 | 0 |
Chandigarh | 2013-14 | 0 |
Chandigarh | 2014-15 | 0 |
Chandigarh | 2015-16 | 0 |
Chhattisgarh | 2013-14 | 0 |
Dadra & Nagar Haveli | 2013-14 | 0 |
Dadra & Nagar Haveli | 2014-15 | 0 |
Dadra & Nagar Haveli | 2015-16 | 0 |
Delhi | 2013-14 | 0 |
Delhi | 2014-15 | 0 |
Haryana | 2013-14 | 0 |
Lakshadweep | 2013-14 | 0 |
Lakshadweep | 2014-15 | 0 |
Lakshadweep | 2015-16 | 0 |
Odisha | 2013-14 | 0 |
Odisha | 2014-15 | 0 |
Reading Dataframe-4 | Percentage of Schools with Electricity
schools_with_electricity <- read_csv("601 Major Project/percentage-of-schools-with-electricity.csv")
head(schools_with_electricity)
# A tibble: 6 x 13
State_UT year Primary_Only Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nico~ 2013~ 82.4 96.0 100
2 Andaman & Nico~ 2014~ 80.7 96.3 100
3 Andaman & Nico~ 2015~ 82.1 97.6 100
4 Andhra Pradesh 2013~ 87.7 93.6 99.3
5 Andhra Pradesh 2014~ 91.1 94.7 100
6 Andhra Pradesh 2015~ 91.6 95.6 100
# ... with 8 more variables: U_Primary_Only <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
# HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_electricity)
[1] "State_UT"
[2] "year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
schools_with_electricity <- rename(schools_with_electricity, primary="Primary_Only", upper_primary="U_Primary_Only", secondary="Sec_Only", hr_secondary="HrSec_Only")
Datatype of each column
str(schools_with_electricity)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
$ year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ primary : num [1:110] 82.4 80.7 82.1 87.7 91.1 ...
$ Primary_with_U_Primary : num [1:110] 96 96.3 97.6 93.6 94.7 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.3 100 ...
$ upper_primary : num [1:110] 0 100 0 100 100 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 100 100 100 67.5 86.1 ...
$ Primary_with_U_Primary_Sec : num [1:110] 100 100 100 96.2 97.6 ...
$ U_Primary_With_Sec : num [1:110] 0 0 0 96.2 97.1 ...
$ secondary : num [1:110] 0 0 0 97.5 93.5 ...
$ Sec_with_HrSec. : num [1:110] 100 100 100 100 83.3 ...
$ hr_secondary : num [1:110] 0 0 0 91.3 93.2 ...
$ All Schools : num [1:110] 88.9 88.9 90.1 90.3 92.8 ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
All_India <- filter(schools_with_electricity, State_UT=="All India") %>%
select(year, primary, upper_primary, secondary, hr_secondary)
All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage")
ggplot(All_India, aes(x=year, y=Percentage, fill=Education_Level)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Percentage of Schools with access to Electricity all over India") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") + theme_classic() + facet_wrap(~Education_Level)
states_electricity <- filter(schools_with_electricity, State_UT != "All India")
nagaland <- filter(states_electricity, State_UT == "Nagaland") %>%
select(year, primary, upper_primary )
nagaland <- pivot_longer(nagaland, c(primary,upper_primary), names_to = "Education_Level", values_to = "Percentage")
ggplot(nagaland, aes(x=year, y=Percentage, fill=Education_Level)) + geom_bar(position="dodge", stat="identity") + coord_polar() + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") + labs(title = "Percentage of Schools with access to Electricity in the state of Nagaland", subtitle = "Education Level - Primary and Upper Primary" )
karnataka <- filter(states_electricity, State_UT == "Karnataka") %>%
select(year, secondary, State_UT )
ggplot(karnataka, aes(x=year, y=secondary, group=State_UT)) + geom_line(color="red", size=1) + geom_point() + geom_text(aes(label=secondary), size = 4, position = position_dodge(width = .6), vjust = 0, color = "black") + labs(title = "Percentage of Schools with access to Electricity in the state of Karnataka", subtitle = "Education Level - Secondary" )
states_electricity
# A tibble: 107 x 13
State_UT year primary Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nicobar ~ 2013~ 82.4 96.0 100
2 Andaman & Nicobar ~ 2014~ 80.7 96.3 100
3 Andaman & Nicobar ~ 2015~ 82.1 97.6 100
4 Andhra Pradesh 2013~ 87.7 93.6 99.3
5 Andhra Pradesh 2014~ 91.1 94.7 100
6 Andhra Pradesh 2015~ 91.6 95.6 100
7 Arunachal Pradesh 2013~ 19.7 53.6 92.2
8 Arunachal Pradesh 2014~ 21.5 55.0 96.8
9 Arunachal Pradesh 2015~ 22.6 53.9 95.5
10 Assam 2013~ 9.51 51.1 81.2
# ... with 97 more rows, and 8 more variables: upper_primary <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, secondary <dbl>, Sec_with_HrSec. <dbl>,
# hr_secondary <dbl>, `All Schools` <dbl>
Daman_Diu <- filter(states_electricity, State_UT == "Daman & Diu") %>%
select(State_UT, year, hr_secondary )
kable(Daman_Diu, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Percentage"), caption = "Table3 : Daman & Diu Percentage of Higher Secondary Schools with Electricity") %>%
kable_styling(font_size = 15) %>%
row_spec(c(1,1,1))
State/Union Territory | Year | Percentage |
---|---|---|
Daman & Diu | 2013-14 | 100 |
Daman & Diu | 2014-15 | 100 |
Daman & Diu | 2015-16 | 100 |
Reading Dataframe-5 | Percentage of Schools with water facility
schools_with_water <- read_csv("601 Major Project/percentage-of-schools-with-water-facility.csv")
head(schools_with_water)
# A tibble: 6 x 13
`State/UT` Year Primary_Only Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nico~ 2013~ 98.2 98.7 100
2 Andaman & Nico~ 2014~ 99.6 98.8 100
3 Andaman & Nico~ 2015~ 100 100 100
4 Andhra Pradesh 2013~ 86.9 94.5 99.7
5 Andhra Pradesh 2014~ 91.8 96.1 100
6 Andhra Pradesh 2015~ 93.9 97.0 100
# ... with 8 more variables: U_Primary_Only <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
# HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_water)
[1] "State/UT"
[2] "Year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
schools_with_water <- rename(schools_with_water, State_UT="State/UT", primary="Primary_Only", upper_primary="U_Primary_Only", secondary="Sec_Only", hr_secondary="HrSec_Only" )
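The water file uses `State/UT` and `Year`, while the other facility files use `State_UT` and `year`, so each data set needs a slightly different `rename()` call. One way to avoid that is a single lookup-based helper. The sketch below is an assumption-laden illustration, not part of the original analysis: `standardise_cols()` and `water_demo` are hypothetical names, and the helper also lower-cases `Year`, which the code in this section does not do.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): map the raw
# column names of any facility file onto one shared set of names
standardise_cols <- function(df) {
  lookup <- c(
    State_UT = "State/UT",
    year = "Year",
    primary = "Primary_Only",
    upper_primary = "U_Primary_Only",
    secondary = "Sec_Only",
    hr_secondary = "HrSec_Only"
  )
  # any_of() silently skips lookup entries that are absent from df
  rename(df, any_of(lookup))
}

# Stand-in for the water file: header names as printed by colnames(),
# first row as printed by head() and str()
water_demo <- tibble(
  `State/UT` = "Andaman & Nicobar Islands",
  Year = "2013-14",
  Primary_Only = 98.2,
  U_Primary_Only = 0,
  Sec_Only = 0,
  HrSec_Only = 0
)

names(standardise_cols(water_demo))
```

Because `any_of()` ignores names that are not present, the same helper can be applied to files that already use `State_UT` and `year`.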
Datatype of each column
str(schools_with_water)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
$ Year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ primary : num [1:110] 98.2 99.5 100 86.9 91.8 ...
$ Primary_with_U_Primary : num [1:110] 98.7 98.8 100 94.5 96.1 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 99.7 100 ...
$ upper_primary : num [1:110] 0 100 0 90.9 100 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 100 100 100 87.3 90 ...
$ Primary_with_U_Primary_Sec : num [1:110] 100 100 100 98.8 99.6 ...
$ U_Primary_With_Sec : num [1:110] 0 0 0 96 97.5 ...
$ secondary : num [1:110] 0 0 0 97.5 100 100 0 0 0 88.3 ...
$ Sec_with_HrSec. : num [1:110] 100 100 100 100 100 ...
$ hr_secondary : num [1:110] 0 0 0 97.5 98.4 ...
$ All Schools : num [1:110] 98.7 99.5 100 90.3 93.7 ...
- attr(*, "spec")=
.. cols(
.. `State/UT` = col_character(),
.. Year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
All_India <- filter(schools_with_water, State_UT=="All India") %>%
select(Year, primary, upper_primary, secondary, hr_secondary)
All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage")
ggplot(All_India, aes(x=Year, y=Percentage, fill=Education_Level)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Percentage of Schools with Water Facility all over India") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .98), vjust = 0, color = "black") + theme_classic() + facet_wrap(~Education_Level)
states_water <- filter(schools_with_water, State_UT != "All India")
nagaland <- filter(states_water, State_UT == "Nagaland") %>%
select(Year, primary, upper_primary )
nagaland <- pivot_longer(nagaland, c(primary,upper_primary), names_to = "Education_Level", values_to = "Percentage")
ggplot(nagaland, aes(x=Year, y=Percentage, fill=Education_Level)) + geom_bar(position="dodge", stat="identity") + coord_polar() + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") + labs(title = "Percentage of Schools with access to Drinking Water in the state of Nagaland", subtitle = "Education Level - Primary and Upper Primary" )
karnataka <- filter(schools_with_water, State_UT == "Karnataka") %>%
select(Year, secondary, State_UT )
ggplot(karnataka, aes(x=Year, y=secondary, group=State_UT)) + geom_line(color="red", size=1) + geom_point() + geom_text(aes(label=secondary), size = 4, position = position_dodge(width = .6), vjust = 0, color = "black") + labs(title = "Percentage of Schools with access to Drinking Water in the state of Karnataka", subtitle = "Education Level - Secondary" )
Daman_Diu <- filter(schools_with_water, State_UT == "Daman & Diu") %>%
select(State_UT, Year, hr_secondary )
kable(Daman_Diu, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Percentage"), caption = "Table4 : Daman & Diu Percentage of Schools with Drinking Water Facility") %>%
kable_styling(font_size = 15) %>%
row_spec(c(1,1,1))
State/Union Territory | Year | Percentage |
---|---|---|
Daman & Diu | 2013-14 | 100 |
Daman & Diu | 2014-15 | 100 |
Daman & Diu | 2015-16 | 100 |
Reading Dataframe-6 | Percentage of Schools with Boys' Toilets
schools_with_boys_toilet <- read_csv("601 Major Project/schools-with-boys-toilet.csv")
head(schools_with_boys_toilet)
# A tibble: 6 x 13
State_UT year Primary_Only Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 Andaman & Nico~ 2013~ 91.6 97.4 100
2 Andaman & Nico~ 2014~ 100 100 100
3 Andaman & Nico~ 2015~ 100 100 100
4 Andhra Pradesh 2013~ 53.0 62.6 82.0
5 Andhra Pradesh 2014~ 57.9 76.5 96
6 Andhra Pradesh 2015~ 99.6 99.9 99.0
# ... with 8 more variables: U_Primary_Only <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, Sec_Only <dbl>, Sec_with_HrSec. <dbl>,
# HrSec_Only <dbl>, `All Schools` <dbl>
colnames(schools_with_boys_toilet)
[1] "State_UT"
[2] "year"
[3] "Primary_Only"
[4] "Primary_with_U_Primary"
[5] "Primary_with_U_Primary_Sec_HrSec"
[6] "U_Primary_Only"
[7] "U_Primary_With_Sec_HrSec"
[8] "Primary_with_U_Primary_Sec"
[9] "U_Primary_With_Sec"
[10] "Sec_Only"
[11] "Sec_with_HrSec."
[12] "HrSec_Only"
[13] "All Schools"
schools_with_boys_toilet <- rename(schools_with_boys_toilet, primary="Primary_Only", upper_primary="U_Primary_Only", secondary="Sec_Only", hr_secondary="HrSec_Only")
Datatype of each column
str(schools_with_boys_toilet)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andaman & Nicobar Islands" "Andhra Pradesh" ...
$ year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ primary : num [1:110] 91.6 100 100 53 57.9 ...
$ Primary_with_U_Primary : num [1:110] 97.4 100 100 62.6 76.5 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 100 100 100 82 96 ...
$ upper_primary : num [1:110] 0 100 0 45.5 75 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 100 100 100 64.1 93.3 ...
$ Primary_with_U_Primary_Sec : num [1:110] 100 100 100 76.2 91.4 ...
$ U_Primary_With_Sec : num [1:110] 0 0 0 60.6 78 ...
$ secondary : num [1:110] 0 0 0 59.3 80.7 ...
$ Sec_with_HrSec. : num [1:110] 100 100 100 85.7 60 ...
$ hr_secondary : num [1:110] 0 0 0 73.4 86.5 ...
$ All Schools : num [1:110] 94.5 100 100 56.9 65.3 ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
All_India <- filter(schools_with_boys_toilet, State_UT=="All India") %>%
select(year, primary, upper_primary, secondary, hr_secondary)
All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage")
ggplot(All_India, aes(x=year, y=Percentage, fill=Education_Level)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Percentage of Schools with Boys toilet all over India") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .98), vjust = 1, color = "black") + theme_classic() + facet_wrap(~Education_Level)
states <- c("Nagaland", "Karnataka", "Daman & Diu")
states_with_boys_toilet <- filter(schools_with_boys_toilet, State_UT %in% states) %>%
pivot_longer(c(primary,upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") %>%
select(State_UT, year, Education_Level, Percentage)
ggplot(states_with_boys_toilet, aes(x=year, y=Percentage, fill=Education_Level)) + geom_bar(position = "dodge", stat = "identity") + facet_wrap(~State_UT) + theme_dark() + labs(title = "Percentage of Schools with Boys toilet", subtitle = "Daman & Diu, Karnataka, Nagaland")
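When filtering on several state names at once, the choice of operator matters: `==` recycles the vector of names along the rows and compares element by element, whereas `%in%` tests membership for every row. A small sketch with a made-up six-row frame (the frame `demo` is illustrative only and does not come from the data set):

```r
library(dplyr)

# Toy frame: two states, three years each (illustrative, not from the data set)
demo <- tibble(
  State_UT = rep(c("Karnataka", "Nagaland"), each = 3),
  year = rep(c("2013-14", "2014-15", "2015-16"), times = 2)
)
states <- c("Nagaland", "Karnataka")

# `==` recycles `states` along the rows: row 1 is compared with "Nagaland",
# row 2 with "Karnataka", row 3 with "Nagaland", and so on
by_equals <- filter(demo, State_UT == states)

# `%in%` asks, for every row, whether State_UT is any of `states`
by_in <- filter(demo, State_UT %in% states)

nrow(by_equals)  # 2 rows: most matching rows are silently dropped
nrow(by_in)      # 6 rows: every row for both states is kept
```

With `==`, rows are dropped without any error, so a facet plot can end up with missing years for some states.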
states_with_no_boys_toilet <- filter(schools_with_boys_toilet, State_UT != "All India", upper_primary==0, secondary==0, hr_secondary==0) %>%
pivot_longer(c(upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") %>%
select(State_UT, year, Education_Level, Percentage)
kable(states_with_no_boys_toilet, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Education Level", "Percentage"), caption = "Table5 : States/UTs reporting 0% of Upper Primary, Secondary and Higher Secondary schools with boys' toilets") %>%
kable_styling(font_size = 15) %>%
row_spec(c(1,1,1))
State/Union Territory | Year | Education Level | Percentage |
---|---|---|---|
Andaman & Nicobar Islands | 2013-14 | upper_primary | 0 |
Andaman & Nicobar Islands | 2013-14 | secondary | 0 |
Andaman & Nicobar Islands | 2013-14 | hr_secondary | 0 |
Andaman & Nicobar Islands | 2015-16 | upper_primary | 0 |
Andaman & Nicobar Islands | 2015-16 | secondary | 0 |
Andaman & Nicobar Islands | 2015-16 | hr_secondary | 0 |
Arunachal Pradesh | 2013-14 | upper_primary | 0 |
Arunachal Pradesh | 2013-14 | secondary | 0 |
Arunachal Pradesh | 2013-14 | hr_secondary | 0 |
Chandigarh | 2013-14 | upper_primary | 0 |
Chandigarh | 2013-14 | secondary | 0 |
Chandigarh | 2013-14 | hr_secondary | 0 |
Chandigarh | 2014-15 | upper_primary | 0 |
Chandigarh | 2014-15 | secondary | 0 |
Chandigarh | 2014-15 | hr_secondary | 0 |
Chandigarh | 2015-16 | upper_primary | 0 |
Chandigarh | 2015-16 | secondary | 0 |
Chandigarh | 2015-16 | hr_secondary | 0 |
Dadra & Nagar Haveli | 2013-14 | upper_primary | 0 |
Dadra & Nagar Haveli | 2013-14 | secondary | 0 |
Dadra & Nagar Haveli | 2013-14 | hr_secondary | 0 |
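One thing worth checking before reading these zeros literally: in the `str()` output above, the Andaman & Nicobar Islands rows show `upper_primary` as 0, 100, 0 and `secondary` as 0, 0, 0 in the electricity, water and boys' toilet files alike. That pattern may mean that a 0 stands for "no schools of this type reported" rather than "no school has the facility". This is an assumption about the source data, not something the data set documents. Under that assumption, the zeros could be recoded to `NA` before ranking or averaging, roughly as follows:

```r
library(dplyr)

# Andaman & Nicobar Islands rows as shown in str(schools_with_boys_toilet)
an_islands <- tibble(
  State_UT = "Andaman & Nicobar Islands",
  year = c("2013-14", "2014-15", "2015-16"),
  primary = c(91.6, 100, 100),
  upper_primary = c(0, 100, 0),
  secondary = c(0, 0, 0),
  hr_secondary = c(0, 0, 0)
)

# ASSUMPTION: an exact 0 means "no schools reported in this category",
# so it is recoded to NA instead of being read as "0% have a toilet"
an_islands_na <- an_islands %>%
  mutate(across(c(upper_primary, secondary, hr_secondary), ~ na_if(.x, 0)))

# Count how many values per level were recoded
summarise(an_islands_na, across(upper_primary:hr_secondary, ~ sum(is.na(.x))))
```

If the assumption does not hold for a given state, the original zeros should of course be kept.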
Reading Dataframe-7 | Percentage of Schools with Girls' Toilets
schools_with_girls_toilet <- read_csv("601 Major Project/schools-with-girls-toilet.csv")
schools_with_girls_toilet <- rename(schools_with_girls_toilet, primary="Primary_Only", upper_primary="U_Primary_Only", secondary="Sec_Only", hr_secondary="HrSec_Only")
head(schools_with_girls_toilet)
# A tibble: 6 x 13
State_UT year primary Primary_with_U_~ Primary_with_U_~
<chr> <chr> <dbl> <dbl> <dbl>
1 All India 2013~ 88.7 96.0 98.8
2 All India 2014~ 91.2 96.9 99.5
3 All India 2015~ 97.0 99.0 99.7
4 Andaman & Nicobar I~ 2013~ 89.7 97.4 100
5 Andaman & Nicobar I~ 2014~ 100 100 100
6 Andaman & Nicobar I~ 2015~ 100 100 100
# ... with 8 more variables: upper_primary <dbl>,
# U_Primary_With_Sec_HrSec <dbl>, Primary_with_U_Primary_Sec <dbl>,
# U_Primary_With_Sec <dbl>, secondary <dbl>, Sec_with_HrSec. <dbl>,
# hr_secondary <dbl>, `All Schools` <dbl>
Datatype of each column
str(schools_with_girls_toilet)
spec_tbl_df [110 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ State_UT : chr [1:110] "All India" "All India" "All India" "Andaman & Nicobar Islands" ...
$ year : chr [1:110] "2013-14" "2014-15" "2015-16" "2013-14" ...
$ primary : num [1:110] 88.7 91.2 97 89.7 100 ...
$ Primary_with_U_Primary : num [1:110] 96 96.9 99 97.4 100 ...
$ Primary_with_U_Primary_Sec_HrSec: num [1:110] 98.8 99.5 99.7 100 100 ...
$ upper_primary : num [1:110] 91.4 91.4 96.3 0 100 ...
$ U_Primary_With_Sec_HrSec : num [1:110] 98.2 99.2 99.6 100 100 ...
$ Primary_with_U_Primary_Sec : num [1:110] 97.3 98.2 99.3 100 100 ...
$ U_Primary_With_Sec : num [1:110] 94.4 96.6 98.8 0 0 ...
$ secondary : num [1:110] 99.1 90.3 95.2 0 0 ...
$ Sec_with_HrSec. : num [1:110] 98.4 94 98.3 100 100 ...
$ hr_secondary : num [1:110] 76.1 90.9 96.2 0 0 ...
$ All Schools : num [1:110] 91.2 93.1 97.5 93.4 100 ...
- attr(*, "spec")=
.. cols(
.. State_UT = col_character(),
.. year = col_character(),
.. Primary_Only = col_double(),
.. Primary_with_U_Primary = col_double(),
.. Primary_with_U_Primary_Sec_HrSec = col_double(),
.. U_Primary_Only = col_double(),
.. U_Primary_With_Sec_HrSec = col_double(),
.. Primary_with_U_Primary_Sec = col_double(),
.. U_Primary_With_Sec = col_double(),
.. Sec_Only = col_double(),
.. Sec_with_HrSec. = col_double(),
.. HrSec_Only = col_double(),
.. `All Schools` = col_double()
.. )
- attr(*, "problems")=<externalptr>
All_India <- filter(schools_with_girls_toilet, State_UT=="All India") %>%
select(year, primary, upper_primary, secondary, hr_secondary)
All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage")
ggplot(All_India, aes(x=year, y=Percentage, fill=Education_Level)) +
geom_bar(position = "dodge", stat = "identity") + labs(title = "Percentage of Schools with Girls toilet all over India") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .98), vjust = 1, color = "black") + theme_classic() + facet_wrap(~Education_Level)
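The same filter, reshape and plot steps recur for every facility data set, so they could be factored into one function. The sketch below is a hypothetical refactoring, not the author's code: `all_india_long()` and `girls_demo` are made-up names, and the helper assumes the columns have already been renamed to `State_UT`, `year`, `primary`, `upper_primary`, `secondary` and `hr_secondary`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: expects the renamed columns State_UT, year, primary,
# upper_primary, secondary and hr_secondary
all_india_long <- function(df) {
  df %>%
    filter(State_UT == "All India") %>%
    select(year, primary, upper_primary, secondary, hr_secondary) %>%
    pivot_longer(-year, names_to = "Education_Level", values_to = "Percentage")
}

# All India rows as printed by str(schools_with_girls_toilet) above
girls_demo <- tibble(
  State_UT = "All India",
  year = c("2013-14", "2014-15", "2015-16"),
  primary = c(88.7, 91.2, 97.0),
  upper_primary = c(91.4, 91.4, 96.3),
  secondary = c(99.1, 90.3, 95.2),
  hr_secondary = c(76.1, 90.9, 96.2)
)

long <- all_india_long(girls_demo)

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(long, aes(x = year, y = Percentage, fill = Education_Level)) +
  geom_col() +
  facet_wrap(~Education_Level) +
  labs(title = "Percentage of Schools with Girls' Toilets, All India")
```

Calling `all_india_long()` on each renamed data set would give the five long tables used for the All India bar charts.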
states <- c("Nagaland", "Karnataka", "Daman & Diu")
states_with_girls_toilet <- filter(schools_with_girls_toilet, State_UT %in% states) %>%
pivot_longer(c(primary,upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") %>%
select(State_UT, year, Education_Level, Percentage)
ggplot(states_with_girls_toilet, aes(x=year, y=Percentage, fill=Education_Level)) + geom_bar(position = "dodge", stat = "identity") + facet_wrap(~State_UT) + theme_dark() + labs(title = "Percentage of Schools with Girls toilet", subtitle = "Daman & Diu, Karnataka, Nagaland")
states_with_no_girls_toilet <- filter(schools_with_girls_toilet, State_UT != "All India", upper_primary==0, secondary==0, hr_secondary==0) %>%
pivot_longer(c(upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") %>%
select(State_UT, year, Education_Level, Percentage)
kable(states_with_no_girls_toilet, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Education Level", "Percentage"), caption = "Table6 : States/UTs reporting 0% of Upper Primary, Secondary and Higher Secondary schools with girls' toilets") %>%
kable_styling(font_size = 15) %>%
row_spec(c(1,1,1))
State/Union Territory | Year | Education Level | Percentage |
---|---|---|---|
Andaman & Nicobar Islands | 2013-14 | upper_primary | 0 |
Andaman & Nicobar Islands | 2013-14 | secondary | 0 |
Andaman & Nicobar Islands | 2013-14 | hr_secondary | 0 |
Andaman & Nicobar Islands | 2015-16 | upper_primary | 0 |
Andaman & Nicobar Islands | 2015-16 | secondary | 0 |
Andaman & Nicobar Islands | 2015-16 | hr_secondary | 0 |
Chandigarh | 2013-14 | upper_primary | 0 |
Chandigarh | 2013-14 | secondary | 0 |
Chandigarh | 2013-14 | hr_secondary | 0 |
Chandigarh | 2014-15 | upper_primary | 0 |
Chandigarh | 2014-15 | secondary | 0 |
Chandigarh | 2014-15 | hr_secondary | 0 |
Chandigarh | 2015-16 | upper_primary | 0 |
Chandigarh | 2015-16 | secondary | 0 |
Chandigarh | 2015-16 | hr_secondary | 0 |
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Pola (2022, May 19). Data Analytics and Computational Social Science: HW-5. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomniharika901537/
BibTeX citation
@misc{pola2022hw-5,
  author = {Pola, Niharika},
  title = {Data Analytics and Computational Social Science: HW-5},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomniharika901537/},
  year = {2022}
}