Advanced visualisations
I have chosen the Emergency - 911 Calls dataset from Kaggle (https://www.kaggle.com/mchirico/montcoalert/version/32) for my final project. The dataset contains emergency 911 calls made in Montgomery County, Pennsylvania, from 2015 to 2020. Below is a preview of the first six rows of the data.
       lat       lng
1 40.29788 -75.58129
2 40.25806 -75.26468
3 40.12118 -75.35198
4 40.11615 -75.34351
5 40.25149 -75.60335
6 40.25347 -75.28324

  desc
1 REINDEER CT & DEAD END; NEW HANOVER; Station 332; 2015-12-10 @ 17:10:52;
2 BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP; Station 345; 2015-12-10 @ 17:29:21;
3 HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-Station:STA27;
4 AIRY ST & SWEDE ST; NORRISTOWN; Station 308A; 2015-12-10 @ 16:47:36;
5 CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; Station 329; 2015-12-10 @ 16:56:52;
6 CANNON AVE & W 9TH ST; LANSDALE; Station 345; 2015-12-10 @ 15:39:04;

    zip  title                    timeStamp            twp
1 19525  EMS: BACK PAINS/INJURY   2015-12-10 17:10:52  NEW HANOVER
2 19446  EMS: DIABETIC EMERGENCY  2015-12-10 17:29:21  HATFIELD TOWNSHIP
3 19401  Fire: GAS-ODOR/LEAK      2015-12-10 14:39:21  NORRISTOWN
4 19401  EMS: CARDIAC EMERGENCY   2015-12-10 16:47:36  NORRISTOWN
5    NA  EMS: DIZZINESS           2015-12-10 16:56:52  LOWER POTTSGROVE
6 19446  EMS: HEAD INJURY         2015-12-10 15:39:04  LANSDALE

  addr                        e
1 REINDEER CT & DEAD END      1
2 BRIAR PATH & WHITEMARSH LN  1
3 HAWS AVE                    1
4 AIRY ST & SWEDE ST          1
5 CHERRYWOOD CT & DEAD END    1
6 CANNON AVE & W 9TH ST       1
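The code that produced this preview is not shown in the text. A minimal sketch of how the data could be read and previewed in R follows. It assumes the Kaggle CSV has been downloaded and saved as 911.csv in the working directory; that file name and the helper read_calls() are hypothetical, and only emergency_calls_data is a name used elsewhere in this post.

```r
# Hypothetical helper: read the 911 calls CSV into a data frame.
read_calls <- function(path) {
  # Keep text columns (desc, title, twp, addr) as character, not factor
  read.csv(path, stringsAsFactors = FALSE)
}

# Assumption: the Kaggle download was saved under this (hypothetical) name.
calls_path <- "911.csv"

# Only read and preview the data when the file is actually present
if (file.exists(calls_path)) {
  emergency_calls_data <- read_calls(calls_path)
  print(head(emergency_calls_data))  # first six rows
}
```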
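Below is a plot of mean longitude by township (twp), with error bars spanning one standard deviation on either side of the mean. Because the longitudes in this region are negative, the bars extend downward from zero and the error bars appear at the bottom. The cyan outline is there only to separate adjacent bars.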
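library(dplyr)    # provides %>%, group_by() and summarise()
library(ggplot2)

emergency_calls_data %>%
  group_by(twp) %>%
  # Mean and standard deviation of longitude for each township
  summarise(mean_longitude = mean(lng), sd = sd(lng)) %>%
  ggplot(aes(x = twp, y = mean_longitude)) +
  geom_bar(stat = "identity", colour = "cyan") +
  theme(axis.text.x = element_text(angle = 90, size = 3)) +
  # Error bars span one standard deviation on either side of the mean
  geom_errorbar(aes(ymin = mean_longitude - sd, ymax = mean_longitude + sd), width = .2)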
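To make the summary behind this plot concrete, the sketch below repeats the mean and standard deviation calculation on the six rows shown in the data preview. It uses base R rather than dplyr so that it runs without extra packages; sample_calls and lng_summary are illustrative names, not objects from the original analysis.

```r
# Township and longitude values copied from the six-row data preview
sample_calls <- data.frame(
  twp = c("NEW HANOVER", "HATFIELD TOWNSHIP", "NORRISTOWN",
          "NORRISTOWN", "LOWER POTTSGROVE", "LANSDALE"),
  lng = c(-75.58129, -75.26468, -75.35198, -75.34351, -75.60335, -75.28324),
  stringsAsFactors = FALSE
)

# Per-township mean and standard deviation of longitude
means <- tapply(sample_calls$lng, sample_calls$twp, mean)
sds   <- tapply(sample_calls$lng, sample_calls$twp, sd)

lng_summary <- data.frame(
  twp = names(means),
  mean_longitude = as.vector(means),
  sd = as.vector(sds),
  stringsAsFactors = FALSE
)

# Lower and upper ends of the error bars, as in geom_errorbar()
lng_summary$ymin <- lng_summary$mean_longitude - lng_summary$sd
lng_summary$ymax <- lng_summary$mean_longitude + lng_summary$sd

print(lng_summary)
```

In this small sample only Norristown has two calls, so every other township gets a standard deviation of NA and no error bar could be drawn for it. The same would hold in the full dataset for any township with a single recorded call.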
Below is a plot of call counts by township for each broad emergency category. The dataset has three major predefined categories: EMS, Traffic and Fire. EMS includes serious illnesses or injuries such as weakness, head injuries and seizures. Traffic covers vehicle accidents, disabled vehicles and similar incidents. Fire includes accidents resulting from any kind of fire, whether inside a building or outdoors.
library(stringr)  # provides word() and fixed()

emergency_calls_data %>%
  # The category is the text before the first ":" in title, e.g. "EMS"
  mutate(emergency_category = word(title, sep = fixed(":"))) %>%
  ggplot(aes(x = twp)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, size = 2.25), legend.text = element_text(size = 3)) +
  # One panel per emergency category
  facet_wrap(vars(emergency_category)) +
  labs(x = "Township", y = "Emergency call count") +
  ggtitle("Plot of call count vs township for each emergency category")
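The category extraction in the mutate() step can be checked on a few titles. The sketch below applies a base R stand-in for word(title, sep = fixed(":")) to the six titles from the data preview and then counts calls per category and per township. extract_category() is a hypothetical helper name, and sub() is swapped in for the stringr call so that the example has no package dependencies.

```r
# Titles and townships copied from the six-row data preview
titles <- c("EMS: BACK PAINS/INJURY", "EMS: DIABETIC EMERGENCY",
            "Fire: GAS-ODOR/LEAK", "EMS: CARDIAC EMERGENCY",
            "EMS: DIZZINESS", "EMS: HEAD INJURY")
twp <- c("NEW HANOVER", "HATFIELD TOWNSHIP", "NORRISTOWN",
         "NORRISTOWN", "LOWER POTTSGROVE", "LANSDALE")

# Hypothetical helper: keep only the text before the first ":"
extract_category <- function(title) {
  sub(":.*$", "", title)
}

emergency_category <- extract_category(titles)

# Calls per category (what each facet panel totals to)
category_counts <- table(emergency_category)

# Calls per township and category (the bar heights within each panel)
by_township <- table(twp, emergency_category)

print(category_counts)
print(by_township)
```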
Below is a stacked bar plot of emergency call counts by township, with each bar split by broad emergency category (EMS, Fire and Traffic).
emergency_calls_data %>%
  mutate(emergency_category = word(title, sep = fixed(":"))) %>%
  ggplot(aes(x = twp, fill = emergency_category)) +
  geom_bar(position = "stack") +
  theme(axis.text.x = element_text(angle = 90, size = 3), legend.text = element_text(size = 5)) +
  labs(x = "Township", y = "Emergency call count") +
  ggtitle("Plot of call count vs township grouped by emergency category")
These visualizations do not aid in comparing the trends in emergency calls across years or months.
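One way to address this gap would be to derive the year and month from timeStamp and count calls per period. The sketch below does so for the six timestamps in the data preview. Parsing with tz = "UTC" is an assumption made only to keep the example deterministic, and call_year, call_month and calls_per_month are illustrative names.

```r
# Timestamps copied from the six-row data preview
time_stamp <- c("2015-12-10 17:10:52", "2015-12-10 17:29:21",
                "2015-12-10 14:39:21", "2015-12-10 16:47:36",
                "2015-12-10 16:56:52", "2015-12-10 15:39:04")

# Assumption: a fixed time zone is used so parsing does not depend on locale
parsed <- as.POSIXct(time_stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Year and month as character labels, e.g. "2015" and "12"
call_year  <- format(parsed, "%Y")
call_month <- format(parsed, "%m")

# Number of calls in each year-month combination
calls_per_month <- table(call_year, call_month)
print(calls_per_month)
```

On the full dataset, counts like these could serve as the basis for a plot with year or month on the x-axis.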
The visualizations help us understand which kinds of emergencies occur most frequently in Montgomery County as a whole and in each township. We can compare the number and type of emergencies across townships. With this knowledge, the emergency response team can prepare better and focus its efforts on townships that have higher counts of serious emergencies.
We can conclude that vehicle accidents contribute the most to emergency calls in Montgomery County.
Lower Merion accounts for the most emergency calls in the county, followed by Abington and Norristown.
Fire-related emergencies are less frequent than EMS or Traffic emergencies.
A naive reader would need to know concepts such as central tendency, standard deviation (used here for the error bars), the different kinds of plots, and how categorical variables add more granular information to the same plot.
A visualization that shows the broad categories together with the sub-categories under them would make the plots easier to understand.
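A possible first step towards such a visualization is to split title into its category and sub-category. The sketch below does this with base R on titles from the data preview. split_title() is a hypothetical helper, and it assumes every title follows the "CATEGORY: SUB-CATEGORY" pattern seen in the preview.

```r
# Hypothetical helper: split "CATEGORY: SUB-CATEGORY" titles into two columns
split_title <- function(title) {
  data.frame(
    # Text before the first ":"
    category = sub(":.*$", "", title),
    # Text after the first ":", with surrounding spaces removed
    sub_category = trimws(sub("^[^:]*:", "", title)),
    stringsAsFactors = FALSE
  )
}

# Titles copied from the six-row data preview
titles <- c("EMS: BACK PAINS/INJURY", "EMS: DIABETIC EMERGENCY",
            "Fire: GAS-ODOR/LEAK", "EMS: CARDIAC EMERGENCY",
            "EMS: DIZZINESS", "EMS: HEAD INJURY")

parts <- split_title(titles)

# Counts of each sub-category within each category
print(table(parts$category, parts$sub_category))
```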
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Murulidhara (2022, Jan. 14). Data Analytics and Computational Social Science: Brinda Murulidhara HW5. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscombrinda854205/
BibTeX citation
@misc{murulidhara2022brinda,
  author = {Murulidhara, Brinda},
  title = {Data Analytics and Computational Social Science: Brinda Murulidhara HW5},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscombrinda854205/},
  year = {2022}
}