Impact on the Indian Education System | A deep dive data analysis on the Enrollment and Dropout student ratio

Data Science Fundamental - Final Paper | Exploratory Data Analysis of the Indian Education System with special focus on the available facilities

Niharika Pola


India is a country of 28 states and 7 Union Territories (UTs). Universal and compulsory education is a cherished dream of the Republic of India. Primary education is a basic right of every citizen, but a recent study on school dropouts stated that 62% of students drop out every year, which is raising concern in the country. Education enriches people’s understanding of their own country and the world. It improves the quality of life and promotes creativity, productivity and entrepreneurship, leading to the economic development of the country. As the nation with the second-largest population in the world, India needs to give its citizens access to quality education.

This motivated me to analyze the Indian Education System, specifically its enrollment and dropout data. I found 7 data sets on the official data website run by the Indian government. As this is my first project in the field of data analysis, I wanted to work on a topic I have always been passionate about: Education. I am very proud of and happy with the findings of my study.

In this project I worked with 7 data sets related to the Indian Education System from 2013-2016. The first two data sets cover the Gross Enrollment Ratio and Dropout Ratio; the remaining 5 cover the availability of basic facilities (Water, Electricity, Boys' & Girls' Toilets and Computers) in schools. Every data set has State/Union Territory, Year and Percentage data across the levels of education: Primary, Upper Primary, Secondary and Higher Secondary.

Lower Primary/Primary - Nursery to Class 1st

Upper Primary - Class 1st to 5th

Secondary - Class 6th to 8th

Higher Secondary - Class 9th and 10th

The aim of this project is to perform Exploratory Data Analysis (EDA) of the 7 data sets to:

  1. Analyze the Gross Enrollment Ratio and Dropout Ratio for the above-mentioned classes, all over India and across states.

  2. Compare the states with the lowest dropout ratio against the available-facilities data sets.

  3. Find out the impact of non-availability of these facilities on the dropout ratio.

  4. Analyze the trends in the available-facilities data sets across India,

and provide a few recommendations to the Indian Government based on the analysis.

Loading the packages
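
The package list itself is not shown here, so the following is only a sketch inferred from the functions called later in the analysis (`read_csv`, `pivot_longer`, `clean_names`, `geocode`, `kable`, `kable_styling` and so on). The exact set of packages is an assumption; the check simply reports anything that still needs installing before attaching the rest.

```r
# Packages inferred from the functions used in this analysis (an assumption,
# since the original package list is not shown).
pkgs <- c("dplyr", "tidyr", "ggplot2", "readr",  # wrangling, plotting, read_csv()
          "janitor",                             # clean_names()
          "ggmap",                               # register_google(), geocode(), get_map()
          "knitr", "kableExtra")                 # kable(), kable_styling()

# Check which packages are installed, without attaching anything yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}

# Attach only the packages that are available.
for (p in pkgs[installed]) library(p, character.only = TRUE)
```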

Dataset-1 | Gross Enrollment Ratio from 2013-2016 across all Indian States

Gross Enrollment Ratio (GER), or Gross Enrollment Index (GEI), is a statistical measure used in the education sector. It expresses the number of students enrolled at a given level of education (such as elementary, middle or high school), regardless of their age, as a percentage of the population in the official age group for that level.

The GER can be over 100% as it includes students who may be older or younger than the official age group.

The GER includes students who are repeating a grade, those who enrolled late and are older than their classmates, and those who have advanced quickly and are younger than their classmates. This allows the total enrollment to exceed the population that corresponds to that level of education. The ratio also changes over time; in India, for instance, it improved from 25.8 to 26.3.

Calculation Method

$$\text{GER} = \frac{a}{b} \times 100$$

where a = number of students enrolled in a given level of education, and b = population of the age group that corresponds to that level of education.
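
As a small worked example of the calculation, the sketch below computes the percentage from the two quantities a and b. The function name `ger_percent` and the counts are hypothetical and chosen only to show how over-age and under-age pupils can push the ratio above 100%.

```r
# Gross Enrollment Ratio as a percentage.
# enrolled:   students enrolled at a level, regardless of age (a)
# population: population of the official age group for that level (b)
ger_percent <- function(enrolled, population) {
  stopifnot(all(population > 0))
  enrolled / population * 100
}

# Hypothetical counts: 1,050 enrolled pupils against 1,000 children of the
# official primary age, for example because of repeaters and late entrants.
example_ger <- ger_percent(enrolled = 1050, population = 1000)
example_ger  # 105
```

Because the numerator counts every enrolled student while the denominator counts only the official age group, a value above 100 does not mean that every child of that age is in school.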


Reading Dataset-1

As you can see, 3 columns (Higher_Secondary_Boys, Higher_Secondary_Girls, Higher_Secondary_Total) are character instead of double because they contain the placeholder values "NR" and "@". The data needs to be cleaned.

Data Wrangling

# Treat the placeholder strings "NR" and "@" as missing values
gross_enrollment_ratio[gross_enrollment_ratio == "NR"] <- NA
gross_enrollment_ratio[gross_enrollment_ratio == "@"] <- NA
ger1 <- data.frame(gross_enrollment_ratio)
# Drop every row that has a missing value in any column
ger <- na.exclude(ger1)
# Convert the three character columns to numeric
ger$Higher_Secondary_Boys <- as.numeric(ger$Higher_Secondary_Boys)
ger$Higher_Secondary_Girls <- as.numeric(ger$Higher_Secondary_Girls)
ger$Higher_Secondary_Total <- as.numeric(ger$Higher_Secondary_Total)

# Keep only the national aggregate rows
all_india_ger <- filter(ger, State_UT == "All India")
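
The same cleanup can be sketched in a column-agnostic way, which helps if placeholder strings turn up in other columns too. This is only an illustration on a toy data frame with hypothetical values; the helper `to_numeric_clean` and the identifier column names are assumptions, not part of the original analysis.

```r
# Toy data frame with hypothetical values; the real file has more columns.
raw <- data.frame(
  State_UT = c("A", "B", "C"),
  Higher_Secondary_Boys = c("59.83", "NR", "@"),
  stringsAsFactors = FALSE
)

# Replace placeholder strings with NA, then convert the column to numeric.
to_numeric_clean <- function(x, placeholders = c("NR", "@")) {
  x[x %in% placeholders] <- NA
  as.numeric(x)
}

# Apply the helper to every column except the identifier columns (assumed names).
id_cols <- c("State_UT", "Year")
value_cols <- setdiff(names(raw), id_cols)
clean <- raw
clean[value_cols] <- lapply(raw[value_cols], to_numeric_clean)

str(clean)
```

Setting the placeholders to NA before calling `as.numeric()` avoids the coercion warning that R raises when it meets non-numeric text.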

Plotting All India girls enrollment ratio:

all_india_ger_girls <- select(all_india_ger,Year, ends_with("girls")) 
     Year Primary_Girls Upper_Primary_Girls Secondary_Girls Higher_Secondary_Girls
1 2013-14        102.65               92.75           76.47                  51.58
2 2014-15        101.43               95.29           78.94                  53.81
3 2015-16        100.69               97.57           80.97                  56.41
fig1 <- pivot_longer(all_india_ger_girls, c(Primary_Girls, Upper_Primary_Girls, Secondary_Girls, Higher_Secondary_Girls), names_to = "Education_Level", values_to = "GER")
ggplot(fig1, aes(x = Year, y = GER, fill = Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(title = "Fig-1: Gross Enrollment Ratio of Girls in India") +
  geom_text(aes(label = GER), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") +
  theme_classic()


Plotting All India boys enrollment ratio:

all_india_ger_boys <- select(all_india_ger, Year, ends_with("boys"))
     Year Primary_Boys Upper_Primary_Boys Secondary_Boys Higher_Secondary_Boys
1 2013-14       100.20              86.31          76.80                 52.77
2 2014-15        98.85              87.71          78.13                 54.57
3 2015-16        97.87              88.72          79.16                 55.95
fig2 <- pivot_longer(all_india_ger_boys, c(Primary_Boys, Upper_Primary_Boys, Secondary_Boys, Higher_Secondary_Boys), names_to = "Education_Level", values_to = "GER")
ggplot(fig2, aes(x = Year, y = GER, fill = Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(title = "Fig-2: Gross Enrollment Ratio of Boys in India") +
  geom_text(aes(label = GER), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") +
  theme_classic()


Plotting All India total enrollment ratio:
fig3 <- pivot_longer(all_india_ger, c(Primary_Boys, Primary_Girls, Primary_Total, Upper_Primary_Boys, Upper_Primary_Girls, Upper_Primary_Total, Secondary_Boys, Secondary_Girls, Secondary_Total, Higher_Secondary_Girls, Higher_Secondary_Boys, Higher_Secondary_Total), names_to = "Education_Level")
ggplot(fig3, aes(x=value, y=Education_Level)) + geom_boxplot(color="red") + geom_text(aes(label=value), size=4, vjust=0) + labs(title = "Fig-3: All India Enrollment Ratio - Education Wise ", x="Enrollment Percentage", y="Education Level") + facet_wrap(~Year) 


State-Wise Gross Enrollment Analysis

states_ger <- filter(ger,  State_UT != "All India") 
                   State_UT    Year Primary_Boys Primary_Girls
1 Andaman & Nicobar Islands 2013-14        95.88         91.97
2            Andhra Pradesh 2013-14        96.62         96.87
3         Arunachal Pradesh 2013-14       129.12        127.77
4                     Assam 2013-14       111.77        115.16
5                     Bihar 2013-14        95.03        101.15
6                Chandigarh 2013-14        88.42         96.09
  Primary_Total Upper_Primary_Boys Upper_Primary_Girls
1         93.93              94.70               88.98
2         96.74              82.81               84.38
3        128.46             112.64              115.27
4        113.43              87.85               98.69
5         97.96              80.60               94.92
6         91.85              99.93              103.02
  Upper_Primary_Total Secondary_Boys Secondary_Girls Secondary_Total
1               91.83         102.89           97.36          100.16
2               83.57          73.76           76.77           75.20
3              113.94          88.37           84.89           86.65
4               93.13          65.60           77.20           71.21
5               87.24          57.66           62.96           60.08
6              101.27          92.08           92.16           92.11
  Higher_Secondary_Boys Higher_Secondary_Girls Higher_Secondary_Total
1                105.40                  96.61                 101.28
2                 59.83                  60.83                  60.30
3                 65.16                  65.38                  65.27
4                 31.78                  34.27                  32.94
5                 23.33                  24.17                  23.70
6                 90.50                  92.88                  91.49

I used a Google Maps API key to get the map of India and the latitude and longitude coordinates of the states. I then combined the coordinates with my existing dataset.

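# Read the API key from an environment variable instead of hard-coding it
# (GOOGLE_MAPS_API_KEY is an assumed variable name)
register_google(key = Sys.getenv("GOOGLE_MAPS_API_KEY"))
coordinates <- geocode(states_ger$State_UT)
# geocode() returns one row per input, in order, so bind the columns side by side;
# merge() without a shared key column would return every combination of rows
plot <- cbind(states_ger, coordinates)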
                   State_UT    Year Primary_Boys Primary_Girls
1 Andaman & Nicobar Islands 2013-14        95.88         91.97
2            Andhra Pradesh 2013-14        96.62         96.87
3         Arunachal Pradesh 2013-14       129.12        127.77
4                     Assam 2013-14       111.77        115.16
5                     Bihar 2013-14        95.03        101.15
6                Chandigarh 2013-14        88.42         96.09
  Primary_Total Upper_Primary_Boys Upper_Primary_Girls
1         93.93              94.70               88.98
2         96.74              82.81               84.38
3        128.46             112.64              115.27
4        113.43              87.85               98.69
5         97.96              80.60               94.92
6         91.85              99.93              103.02
  Upper_Primary_Total Secondary_Boys Secondary_Girls Secondary_Total
1               91.83         102.89           97.36          100.16
2               83.57          73.76           76.77           75.20
3              113.94          88.37           84.89           86.65
4               93.13          65.60           77.20           71.21
5               87.24          57.66           62.96           60.08
6              101.27          92.08           92.16           92.11
  Higher_Secondary_Boys Higher_Secondary_Girls Higher_Secondary_Total
1                105.40                  96.61                 101.28
2                 59.83                  60.83                  60.30
3                 65.16                  65.38                  65.27
4                 31.78                  34.27                  32.94
5                 23.33                  24.17                  23.70
6                 90.50                  92.88                  91.49
       lon      lat
1 92.65864 11.74009
2 92.65864 11.74009
3 92.65864 11.74009
4 92.65864 11.74009
5 92.65864 11.74009
6 92.65864 11.74009
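
When coordinates are attached to the enrollment rows, it helps to join on an explicit key so that each state keeps its own latitude and longitude. The sketch below uses toy stand-ins with hypothetical state names and coordinates, not geocoded results, and contrasts a keyed `merge()` with a `merge()` that has no column in common.

```r
# Toy stand-ins (hypothetical values): three rows of GER data and a lookup
# table with one coordinate pair per state, as geocoding unique names would give.
ger_rows <- data.frame(
  State_UT = c("State A", "State B", "State A"),
  Year = c("2013-14", "2013-14", "2014-15"),
  Primary_Total = c(93.9, 96.7, 92.1),
  stringsAsFactors = FALSE
)
coords <- data.frame(
  State_UT = c("State A", "State B"),
  lon = c(92.7, 79.7),
  lat = c(11.7, 15.9),
  stringsAsFactors = FALSE
)

# Joining on the shared key keeps one row per observation.
with_coords <- merge(ger_rows, coords, by = "State_UT", all.x = TRUE)

# Without a shared column, merge() returns every combination of rows.
no_key <- merge(ger_rows[, c("Year", "Primary_Total")], coords[, c("lon", "lat")])

nrow(with_coords)  # 3
nrow(no_key)       # 6
```

Geocoding only the unique state names and joining them back also avoids repeating the same lookup for every year.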

The map below is a terrain-style map of India. I wanted to show my data on a choropleth map; however, I found that RStudio has pre-existing choropleth maps for the world and the USA but not for other countries, and that ggmap supports only a few map types ("terrain", "satellite", "hybrid" and "roadmap"), none of which is a choropleth. I feel this is a drawback of RStudio as well as ggmap.

map <- get_map(location = 'India', zoom = 5, maptype= 'terrain', scale = "auto")

Plotting the Education Level - wise GER data in the map

# alpha is a fixed setting, so it sits outside aes(); size and colour are mapped to the GER value
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Primary_Boys, colour = Primary_Boys), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Upper_Primary_Boys, colour = Upper_Primary_Boys), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Secondary_Boys, colour = Secondary_Boys), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Higher_Secondary_Boys, colour = Higher_Secondary_Boys), alpha = 0.5)

ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Primary_Girls, colour = Primary_Girls), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Upper_Primary_Girls, colour = Upper_Primary_Girls), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Secondary_Girls, colour = Secondary_Girls), alpha = 0.5)
ggmap(map) + geom_point(data = plot, aes(x = lon, y = lat, size = Higher_Secondary_Girls, colour = Higher_Secondary_Girls), alpha = 0.5)

Findings from the maps:

The analysis would have been much clearer with a choropleth map; I will treat this as scope for improvement in my next projects.

states_ger %>% 
  select(Year, State_UT, Primary_Boys,Upper_Primary_Boys, Secondary_Boys, Higher_Secondary_Boys) %>% 
  group_by(Year) %>% 
  summarise(avg_pb=mean(Primary_Boys), avg_upb=mean(Upper_Primary_Boys), avg_sb=mean(Secondary_Boys), avg_hsb=mean(Higher_Secondary_Boys)) 
My further analysis will focus on analyzing the dropout percentage and finding if we can get any correlation between the Gross Enrollment and Dropout.

Data Set-2 | Dropout Ratio/Percentage across all Indian States from 2013-2016

There are varying definitions of Dropout Ratio on the web; I will keep it simple here. A dropout is any student who leaves school for any reason before graduation or completion of a program of studies without transferring to another school, and the Dropout Ratio is the percentage of students who do so.

Reading Dataset-2

dropout_ratio <- read_csv("601 Major Project/dropout-ratio.csv")
Datatype of each column

Data Wrangling

dropout_ratio <- clean_names(dropout_ratio)
Plotting Education-Wise All India dropout percentage:

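all_india_drop <- filter(dropout_ratio, state_ut=="All India") 
# Assumption: fig4 is the long form of all_india_drop, built the same way as fig3
fig4 <- pivot_longer(all_india_drop, -c(state_ut, year), names_to = "EducationLevel")
ggplot(fig4, aes(x=value, y=EducationLevel)) + geom_boxplot(color="red") + geom_text(aes(label=value), size=4) + labs(title = "Fig-4: All India Dropout Ratio - Education Wise ", x="Dropout Percentage", y="Education Level") + facet_wrap(~year) 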


Correlation between Gross Enrollment Ratio and Dropout Ratios:
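
One way to quantify the relationship is to join the two tables on state and year and compute a Pearson correlation for a given education level. The sketch below uses made-up numbers purely for illustration, so its result says nothing about the real data; the data frame names are hypothetical, and the key columns are assumed to have been harmonised first (the GER table uses `State_UT`/`Year`, the cleaned dropout table `state_ut`/`year`).

```r
# Made-up numbers for illustration only; they are not taken from the data sets.
ger_toy <- data.frame(
  state_ut = c("A", "B", "C", "D"),
  year = "2015-16",
  primary_total = c(95, 102, 110, 99),
  stringsAsFactors = FALSE
)
drop_toy <- data.frame(
  state_ut = c("A", "B", "C", "D"),
  year = "2015-16",
  primary_total = c(6.1, 4.0, 2.2, 5.3),
  stringsAsFactors = FALSE
)

# Join on state and year so each row pairs a GER value with a dropout value.
both <- merge(ger_toy, drop_toy, by = c("state_ut", "year"),
              suffixes = c("_ger", "_drop"))

# Pearson correlation; use = "complete.obs" skips rows with missing values.
r <- cor(both$primary_total_ger, both$primary_total_drop, use = "complete.obs")
r
```

With only three years of data per state, any such coefficient should be read as descriptive rather than as evidence of cause and effect.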

Filtering out the State-Wise Dropouts data:

states_drop <- filter(dropout_ratio,  state_ut != "All India") 
Analysis of the dropout ratio of Primary Boys:

primary_boys_drop <- states_drop[c("state_ut", "year", "primary_boys")] 
top10 <- slice_max(primary_boys_drop, primary_boys, n = 10)  # ten highest dropout values
ggplot(top10, aes(x=year, y=primary_boys, fill=year)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-5: top-10 states with highest dropout rate of boys in India ", subtitle = "Education Level - Primary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=primary_boys), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut)

Analysis of the dropout ratio of Primary Girls:

primary_girls_drop <- states_drop[c("state_ut", "year", "primary_girls")] 
top10 <- slice_max(primary_girls_drop, primary_girls, n = 10)  # ten highest dropout values
ggplot(top10, aes(x=year, y=primary_girls, fill=year)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-6: top-10 states with highest dropout rate of girls in India ", subtitle = "Education Level - Primary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=primary_girls), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut)

Analysis of dropout ratio of Upper Primary Boys:

upper_primary_boys_drop <- states_drop[c("state_ut", "year", "upper_primary_boys")] 
top10 <- slice_max(upper_primary_boys_drop, upper_primary_boys, n = 10)  # ten highest dropout values
ggplot(top10, aes(x=year, y=upper_primary_boys, fill=year)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-7: top-10 states with highest dropout rate of boys in India ", subtitle = "Education Level - Upper Primary ",  y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=upper_primary_boys), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut)

Analysis of dropout ratio of upper primary girls:

upper_primary_girls_drop <- states_drop[c("state_ut", "year", "upper_primary_girls")] 
top10 <- slice_max(upper_primary_girls_drop, upper_primary_girls, n = 10)  # ten highest dropout values
ggplot(top10, aes(x=year, y=upper_primary_girls, fill=year)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-8: top-10 states with highest dropout rate of girls in India ", subtitle = "Education Level - Upper Primary ", y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=upper_primary_girls), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut)

Analysis of dropout ratio of Secondary boys:

secondary_boys_drop <- states_drop[c("state_ut", "year", "secondary_boys")] 
top10 <- slice_max(secondary_boys_drop, secondary_boys, n = 10)  # ten highest dropout values
ggplot(top10, aes(x=year, y=secondary_boys, fill=year)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-9: top-10 states with highest dropout rate of boys in India ", subtitle = "Education Level - secondary ",  y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=secondary_boys), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut)

Analysis of dropout ratio of Secondary Girls:

secondary_girls_drop <- states_drop[c("state_ut", "year", "secondary_girls")] 
top10 <- slice_max(secondary_girls_drop, secondary_girls, n = 10)  # ten highest dropout values
ggplot(top10, aes(x=year, y=secondary_girls, fill=year)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-10: top-10 states with highest dropout rate of girls in India ", subtitle = "Education Level - secondary ",  y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=secondary_girls), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut)

Analysis of dropout ratio of Higher-Secondary boys:

hr_secondary_boys_drop <- states_drop[c("state_ut", "year", "hr_secondary_boys")] 
top10 <- slice_max(hr_secondary_boys_drop, hr_secondary_boys, n = 10)  # ten highest dropout values
ggplot(top10, aes(x=year, y=hr_secondary_boys, fill=year)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-11: top-10 states with highest dropout rate of boys in India ", subtitle = "Education Level - Higher secondary ",  y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=hr_secondary_boys), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut)

Analysis of dropout ratio of Higher Secondary Girls:

hr_secondary_girls_drop <- states_drop[c("state_ut", "year", "hr_secondary_girls")] 
top10 <- slice_max(hr_secondary_girls_drop, hr_secondary_girls, n = 10)  # ten highest dropout values
ggplot(top10, aes(x=year, y=hr_secondary_girls, fill=year)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-12: top-10 states with highest dropout rate of girls in India ", subtitle = "Education Level - Higher secondary ",  y="dropout percentage", caption="data from the Government of India") + geom_text(aes(label=hr_secondary_girls), size = 4, position = position_dodge(width = .9), vjust = 1, color = "white") + theme_dark() + facet_wrap(~state_ut, scales = "free_y")
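
The eight blocks above repeat the same select-and-rank step for different columns. As a rough sketch, that step could be wrapped in one small helper; `top_n_rows` is a hypothetical name, the toy values are invented, and unlike dplyr's `slice_max()` this base R version does not keep tied rows beyond `n`.

```r
# Return the n rows with the largest values in `column`.
top_n_rows <- function(df, column, n = 10) {
  # order() with na.last = NA drops rows whose value is missing
  ord <- order(df[[column]], decreasing = TRUE, na.last = NA)
  df[head(ord, n), , drop = FALSE]
}

# Toy data (hypothetical values, not from the dropout file).
toy <- data.frame(
  state_ut = c("A", "B", "C", "D"),
  year = "2015-16",
  primary_boys = c(3.2, 9.8, NA, 5.1),
  stringsAsFactors = FALSE
)

top2 <- top_n_rows(toy, "primary_boys", n = 2)
top2$state_ut  # "B" "D"
```

Calling the helper once per column name, for example inside a loop over the eight dropout columns, keeps the ranking rule identical across all figures.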


My key findings from the analysis of dropout percentage of boys and girls across all the 4 education levels are as follows:



My analysis shows that Nagaland, Karnataka and Daman & Diu have the highest dropout rates for boys and girls at the primary & upper primary, secondary and higher secondary levels respectively.

Gujarat is doing well in terms of dropout percentage and, as with the Gross Enrollment Ratio, the dropout rates are better for girls than for boys.

My further analysis will focus on the states of Nagaland, Karnataka and Daman & Diu to find out whether the availability of facilities in schools is affecting the dropout percentage. The study will also focus on states with few or no facilities at all.

Dataset - 3 | Percentage of Schools with access to computers in India

Reading Dataset-3

schools_with_comps <- read_csv("601 Major Project/percentage-of-schools-with-comps.csv")
schools_with_comps <- rename(schools_with_comps, primary="Primary_Only", upper_primary="U_Primary_Only", secondary="Sec_Only", hr_secondary="HrSec_Only")

Plotting Education-Level wise percentage of All India access to computers:

All_India <- filter(schools_with_comps, State_UT=="All India") %>% 
  select(year, primary, upper_primary, secondary, hr_secondary)
All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") 
ggplot(All_India, aes(x=year, y=Percentage, fill=Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-13: Percentage of Schools with access to Computer facility all over india") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .9), vjust = 1, color = "black") + theme_classic() + facet_wrap(~Education_Level)


Plotting State-Wise Primary level percentage with access to computers:

primary_wise<- select(schools_with_comps, State_UT, year, primary) %>% 
  filter(State_UT != "All India")

primary_wise <- arrange(primary_wise, primary)
primary_wise <- slice_head(primary_wise, n=50)
ggplot(primary_wise, aes(x=primary, y=State_UT, fill=State_UT))+geom_bar(stat="identity")+facet_wrap(~year) + labs(title = "Fig-14: States with lowest percentage of Computer Facility", subtitle = "Education Level - Primary",  x="percentage", y="State name") + geom_text(aes(label=primary), size = 3, position = position_dodge(width = .9), vjust = 0, color = "black")


Nagaland is among these states, which may be related to the high percentage of dropouts in that area. Limited access to computers might be one of the reasons for its high dropout percentage in primary schools.

upper_primary_wise <- select(schools_with_comps, State_UT, year, upper_primary) %>% 
  filter(State_UT != "All India") %>% 
  arrange(upper_primary)

kable(upper_primary_wise, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Percentage"), caption = "State-wise Percentage of Upper Primary Schools having lowest access to computers", color="black") %>%
  kable_styling(font_size = 15)
Table 1: State-wise Percentage of Upper Primary Schools having lowest access to computers
Analyzing Primary and Upper-Primary data of Nagaland

nagaland <- filter(schools_with_comps, State_UT=="Nagaland")
nagaland <- select(nagaland, State_UT, year, primary, upper_primary)
Plotting States with high access to computer facility:

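# Assumption: secondary_wise follows the primary_wise pattern, sorted from highest to lowest
secondary_wise <- select(schools_with_comps, State_UT, year, secondary) %>% filter(State_UT != "All India") %>% arrange(desc(secondary))
secondary_wise <- slice_head(secondary_wise, n=40)
ggplot(secondary_wise, aes(x=secondary, y=State_UT, fill=State_UT))+geom_bar(stat="identity")+facet_wrap(~year) + labs(title = "Fig-16: States with highest percentage of Computer Facility", subtitle = "Education Level - secondary",  x="percentage", y="State name") + geom_text(aes(label=secondary), size = 3, position = position_dodge(width = .1), vjust = 0, color = "black")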

Plotting Secondary level access to computers in the state of Karnataka

karnataka <-  filter(schools_with_comps, State_UT=="Karnataka")
karnataka <- select(karnataka, State_UT, year, secondary)
ggplot(karnataka, aes(x=year, y=secondary, group=State_UT)) + geom_line(size=1, color="purple") + geom_point() + geom_text(aes(label=secondary), size = 5) + labs(title = "Fig-17: Percentage of Schools with access to Computers in the state of Karnataka", subtitle = "Education Level - Secondary",  x="year", y="Percentage")

Plotting States with no access to computers in the Higher Secondary Schools

hr_secondary_wise <- select(schools_with_comps, State_UT, year, hr_secondary) %>% 
  filter(State_UT != "All India") %>% 
  arrange(hr_secondary)

Dataset-4 | Percentage of Schools with access to Electricity in India

Reading Dataset-4

schools_with_electricity <- read_csv("601 Major Project/percentage-of-schools-with-electricity.csv")
Datatype of each column

Plotting Education Level wise percentage with access to Electricity all over India

All_India <- filter(schools_with_electricity, State_UT=="All India") %>% 
  select(year, primary, upper_primary, secondary, hr_secondary)
All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") 
ggplot(All_India, aes(x=year, y=Percentage, fill=Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-18: Percentage of Schools with access to Electricity all over india") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .9), vjust = 1, color = "black") + theme_classic() + facet_wrap(~Education_Level)


states_electricity <- filter(schools_with_electricity, State_UT != "All India")

nagaland <- filter(states_electricity, State_UT == "Nagaland") %>% 
  select(year, primary, upper_primary )

nagaland <- pivot_longer(nagaland, c(primary,upper_primary), names_to = "Education_Level", values_to = "Percentage")
ggplot(nagaland, aes(x=year, y=Percentage, fill=Education_Level)) + geom_bar(position="dodge", stat="identity") + coord_polar() + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") + labs(title = "Fig-19: Percentage of Schools with access to Electricity in the state of Nagaland", subtitle = "Education Level - Primary and Upper Primary" )

karnataka <- filter(states_electricity, State_UT == "Karnataka") %>% 
  select(year, secondary, State_UT )

ggplot(karnataka, aes(x=year, y=secondary, group=State_UT)) + geom_line(color="red", size=1) + geom_point() + geom_text(aes(label=secondary), size = 4, position = position_dodge(width = .6), vjust = 0, color = "black") + labs(title = "Fig-20: Percentage of Schools with access to Electricity in the state of Karnataka", subtitle = "Education Level - Secondary" )

Daman_Diu <- filter(states_electricity, State_UT == "Daman & Diu") %>% 
  select(State_UT, year, hr_secondary )

kable(Daman_Diu, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Percentage"), caption = "Daman & Diu Percentage of Schools with Electricity") %>%
  kable_styling(font_size = 15)
Table 2: Daman & Diu Percentage of Schools with Electricity
State/Union Territory   Year      Percentage
Daman & Diu             2013-14   100
Daman & Diu             2014-15   100
Daman & Diu             2015-16   100


Dataset-5 | Percentage of Schools with water facility in India

Reading Dataset-5

schools_with_water <- read_csv("601 Major Project/percentage-of-schools-with-water-facility.csv")
Datatype of each column

Plotting Education-Wise percentage of schools with water facility in India

All_India <- filter(schools_with_water, State_UT=="All India") %>% 
  select(Year, primary, upper_primary, secondary, hr_secondary)

All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") 
ggplot(All_India, aes(x=Year, y=Percentage, fill=Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-21: Percentage of Schools Water Facility all over india") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .98), vjust = 1, color = "black") + theme_classic() + facet_wrap(~Education_Level)

Finding: It is encouraging that schools at every education level have more than 90% access to a water facility, but the figure is still not 100%.

states_water <- filter(schools_with_water, State_UT != "All India")

nagaland <- filter(states_water, State_UT == "Nagaland") %>% 
  select(Year, primary, upper_primary )

nagaland <- pivot_longer(nagaland, c(primary,upper_primary), names_to = "Education_Level", values_to = "Percentage")
ggplot(nagaland, aes(x=Year, y=Percentage, fill=Education_Level)) + geom_bar(position="dodge", stat="identity") + coord_polar() + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .9), vjust = 0, color = "black") + labs(title = "Fig-22: Percentage of Schools with access to Drinking Water in the state of Nagaland", subtitle = "Education Level - Primary and Upper Primary" )

karnataka <- filter(schools_with_water, State_UT == "Karnataka") %>% 
  select(Year, secondary, State_UT )

ggplot(karnataka, aes(x=Year, y=secondary, group=State_UT)) + geom_line(color="red", size=1) + geom_point() + geom_text(aes(label=secondary), size = 4, position = position_dodge(width = .6), vjust = 0, color = "black") + labs(title = "Fig-23: Percentage of Schools with access to Drinking Water in the state of Karnataka", subtitle = "Education Level - Secondary" )

Daman_Diu <- filter(schools_with_water, State_UT == "Daman & Diu") %>% 
  select(State_UT, Year, hr_secondary )

kable(Daman_Diu, digits = 4, align = "ccccccc", col.names = c("State/Union Territory", "Year", "Percentage"), caption = "Daman & Diu Percentage of Schools with Drinking Water Facility") %>%
  kable_styling(font_size = 15)
Table 3: Daman & Diu Percentage of Schools with Drinking Water Facility
State/Union Territory   Year      Percentage
Daman & Diu             2013-14   100
Daman & Diu             2014-15   100
Daman & Diu             2015-16   100

Dataset-6 | Percentage of Schools with boys toilet all over India

Reading Dataset-6

schools_with_boys_toilet <- read_csv("601 Major Project/schools-with-boys-toilet.csv")
Datatype of each column

Plotting Education Level wise percentage of schools with boys toilet in India

All_India <- filter(schools_with_boys_toilet, State_UT=="All India") %>% 
  select(year, primary, upper_primary, secondary, hr_secondary)

All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage")
ggplot(All_India, aes(x=year, y=Percentage, fill=Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-24: Percentage of Schools with Boys toilet all over india") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .98), vjust = 1, color = "black") + theme_classic() + facet_wrap(~Education_Level)


states <- c("Nagaland", "Karnataka", "Daman & Diu")
states_with_boys_toilet <- filter(schools_with_boys_toilet, State_UT %in% states) %>% 
  pivot_longer(c(primary,upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") %>% 
  select(State_UT, year, Education_Level, Percentage)
ggplot(states_with_boys_toilet, aes(x=year, y=Percentage, fill=Education_Level)) + geom_bar(position = "dodge", stat = "identity") + facet_wrap(~State_UT) + theme_dark() + labs(title = "Fig-25: Percentage of Schools with Boys toilet", subtitle = "Daman & Diu, Karnataka, Nagaland")
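
When rows for several states are selected at once, the choice between `==` and `%in%` matters. The short sketch below, on a toy vector of state names arranged to expose the difference, shows that `==` recycles the vector of names position by position, while `%in%` tests membership for every row.

```r
states <- c("Nagaland", "Karnataka", "Daman & Diu")

# Toy column of state names (order chosen to expose the problem).
State_UT <- c("Karnataka", "Nagaland", "Daman & Diu",
              "Nagaland", "Karnataka", "Daman & Diu")

# `==` compares element by element, recycling `states` along the column,
# so a row only matches if it happens to line up with the right position.
by_equals <- State_UT == states

# `%in%` asks, for each row, whether its value appears anywhere in `states`.
by_in <- State_UT %in% states

sum(by_equals)  # 4
sum(by_in)      # 6
```

With `==`, two of the six rows are silently dropped here even though they belong to the requested states, which is why membership tests are the safer choice inside `filter()`.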

States with no boys toilets in India

states_with_no_boys_toilet <- filter(schools_with_boys_toilet, State_UT != "All India", upper_primary==0, secondary==0, hr_secondary==0) %>%
  pivot_longer(c(upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") %>% 
  select(State_UT, year, Education_Level, Percentage)

Dataset-7 | Percentage of Schools with girls toilets in India

Reading Dataset-7

schools_with_girls_toilet <- read_csv("601 Major Project/schools-with-girls-toilet.csv")
schools_with_girls_toilet <- rename(schools_with_girls_toilet, primary="Primary_Only", upper_primary="U_Primary_Only", secondary="Sec_Only", hr_secondary="HrSec_Only")
Datatype of each column

Plotting Education-Wise percentage of schools with girls toilets in India

All_India <- filter(schools_with_girls_toilet, State_UT=="All India") %>% 
  select(year, primary, upper_primary, secondary, hr_secondary)

All_India <- pivot_longer(All_India, c(primary, upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage")
ggplot(All_India, aes(x=year, y=Percentage, fill=Education_Level)) +
  geom_bar(position = "dodge", stat = "identity") + labs(title = "Fig-26: Percentage of Schools with Girls toilet all over india") + geom_text(aes(label=Percentage), size = 4, position = position_dodge(width = .98), vjust = 1, color = "black") + theme_classic() + facet_wrap(~Education_Level)

states <- c("Nagaland", "Karnataka", "Daman & Diu")
states_with_girls_toilet <- filter(schools_with_girls_toilet, State_UT %in% states) %>% 
  pivot_longer(c(primary,upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") %>% 
  select(State_UT, year, Education_Level, Percentage)
ggplot(states_with_girls_toilet, aes(x=year, y=Percentage, fill=Education_Level)) + geom_bar(position = "dodge", stat = "identity") + facet_wrap(~State_UT) + theme_dark() + labs(title = "Fig-27: Percentage of Schools with Girls toilet", subtitle = "Daman & Diu, Karnataka, Nagaland")

States with no girls toilets in India

states_with_no_girls_toilet <- filter(schools_with_girls_toilet, State_UT != "All India", upper_primary==0, secondary==0, hr_secondary==0) %>%
  pivot_longer(c(upper_primary, secondary, hr_secondary), names_to = "Education_Level", values_to = "Percentage") %>% 
  select(State_UT, year, Education_Level, Percentage)

Summary of my key Findings:

Gross Enrollment Ratio:

Dropout Ratio:



My analysis shows that Nagaland, Karnataka and Daman & Diu have the highest dropout rates for boys and girls at the primary & upper primary, secondary and higher secondary levels respectively.

Gujarat is doing well in terms of dropout percentage and, as with the Gross Enrollment Ratio, the dropout rates are better for girls than for boys.

Access to basic facilities in schools (Electricity, Water, Toilets and Computers):



