HW5

DACSS 601 Data Science Fundamentals - Homework 5

Apoorva Hungund
2022-05-04

In the previous homework, I tried three types of data visualization. Based on the data and those earlier visualizations, a box plot or bar plot may work best.
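As a rough illustration of the box plot option, the sketch below compares monthly driver deaths with the law inactive versus active. It reads the series from `datasets::Seatbelts`; the names `sb_box` and `p_box` and the level labels are placeholders chosen here, not names from the homework.

```r
library(ggplot2)

# Read the original time series straight from the datasets package
sb_box <- as.data.frame(datasets::Seatbelts)
sb_box$law <- factor(sb_box$law, levels = c(0, 1),
                     labels = c("Inactive", "Active"))

# One box per law condition, showing the spread of monthly driver deaths
p_box <- ggplot(sb_box, aes(x = law, y = DriversKilled, fill = law)) +
  geom_boxplot() +
  xlab("Seatbelt Law") +
  ylab("Number of Drivers Killed") +
  theme(legend.position = "none")
print(p_box)
```

A box plot shows the spread within each condition, which a bar of means would hide.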

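library(plyr)        # ddply()
library(ggplot2)     # plotting
library(wesanderson) # wes_palette()
library(ggpubr)      # ggarrange()

data(Seatbelts)
# Add year and month columns; as.numeric() drops the time-series class from years
Seatbelts <- data.frame(years = as.numeric(floor(time(Seatbelts))), months = factor(cycle(Seatbelts), labels = month.abb), Seatbelts)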

Seatbelts$law<-as.factor(Seatbelts$law)
Seatbelts$DriversKilled<-as.numeric(Seatbelts$DriversKilled)
Seatbelts$VanKilled<-as.numeric(Seatbelts$VanKilled)

law_means_DK <- ddply(Seatbelts, "law", summarise, mean_DK = mean(DriversKilled))
ggplot(Seatbelts, aes(x=years, y=DriversKilled, color=law)) +
  geom_point()+
  stat_smooth(method = 'lm')+
  geom_hline(data=law_means_DK, aes(yintercept=mean_DK, color=law), 
             linetype="dashed")+
  scale_color_manual(values=wes_palette("Darjeeling2"))+
  xlab("Years") +
  ylab("Number of Drivers Killed")+
  scale_y_continuous(limits = c(0,200), breaks = seq(0, 200, by = 20))+
  scale_x_continuous(limits = c(1969,1984), breaks = 1969:1984)+
  theme(legend.position = "right")
law_means_VK <- ddply(Seatbelts, "law", summarise, mean_VK = mean(VanKilled))
ggplot(Seatbelts, aes(x=years, y=VanKilled, color=law)) +
  geom_point()+
  stat_smooth(method = 'lm')+
  geom_hline(data=law_means_VK, aes(yintercept=mean_VK, color=law), 
             linetype="dashed")+
  scale_color_manual(values=wes_palette("GrandBudapest1"))+
  xlab("Years") +
  ylab("Number of Van Drivers Killed")+
  scale_y_continuous(limits = c(0,20), breaks = seq(0, 20, by = 5))+
  scale_x_continuous(limits = c(1969,1984), breaks = 1969:1984)+
  theme(legend.position = "right")

# The file is comma-separated, so read.csv() (sep = ",", dec = ".") matches it directly
data<-read.csv(file = "Data_Variables.csv")

data$CONDITION<-as.factor(data$CONDITION)
data$VALUE<-as.numeric(data$VALUE)
data$YEAR<-as.factor(data$YEAR)
data$VARIABLE <- as.factor(data$VARIABLE)

L_AC<-subset(data,CONDITION=="ACTIVE", select = c("YEAR","VARIABLE", "VALUE"))
L_INAC<-subset(data,CONDITION=="INACTIVE", select = c("YEAR","VARIABLE", "VALUE"))

a<-ggplot(data=L_AC, aes(x=YEAR, y=VALUE, fill=YEAR)) +
  geom_bar(stat="identity", position=position_dodge())+
  facet_wrap(~VARIABLE,nrow = 1, ncol = 2)+
  ggtitle("Seatbelt Law - Active")+
  scale_fill_manual(values = wes_palette("Chevalier1"))+
  scale_x_discrete(name ="Years")+
  scale_y_continuous(name = "Values", limits = c(0,110))+
  theme(legend.position = "none")

b<-ggplot(data=L_INAC, aes(x=YEAR, y=VALUE, fill=YEAR)) +
  geom_bar(stat="identity", position=position_dodge())+
  facet_wrap(~VARIABLE,nrow = 1, ncol = 2)+
  ggtitle("Seatbelt Law - Inactive")+
  scale_x_discrete(name ="Years")+
  scale_y_continuous(name = "Values", limits = c(0,150))+
  theme(legend.position = "none")

ggarrange(a,b,
          ncol = 1, nrow = 2)

DK<-subset(data,VARIABLE=="DRIVERS_KILLED", select = c("YEAR","CONDITION", "VALUE"))
VK<-subset(data,VARIABLE=="VAN_KILLED", select = c("YEAR","CONDITION", "VALUE"))

c<-ggplot(data=DK, aes(x=YEAR, y=VALUE, fill=CONDITION)) +
   geom_bar(stat="identity", position=position_dodge())+
  facet_wrap(~CONDITION,nrow = 1, ncol = 2)+
  ggtitle("Drivers Killed")+
  scale_fill_manual(values = wes_palette("Chevalier1"))+
   scale_x_discrete(name ="Years")+
   scale_y_continuous(name = "Values", limits = c(0,150))+
  theme(legend.position = "none")

d<-ggplot(data=VK, aes(x=YEAR, y=VALUE, fill=CONDITION)) +
   geom_bar(stat="identity", position=position_dodge())+
  facet_wrap(~CONDITION,nrow = 1, ncol = 2)+
  ggtitle("Van Drivers Killed")+
  scale_fill_manual(values = wes_palette("Chevalier1"))+
   scale_x_discrete(name ="Years")+
   scale_y_continuous(name = "Values", limits = c(0,15))+
  theme(legend.position = "none")

ggarrange(c,d,
          ncol = 1, nrow = 2)

# The file is comma-separated, so read.csv() (sep = ",", dec = ".") matches it directly
data_v2<-read.csv(file = "All_Variables.csv")

data_v2$CONDITION<-as.factor(data_v2$CONDITION)
data_v2$VALUE<-as.numeric(data_v2$VALUE)
data_v2$VARIABLE <- as.factor(data_v2$VARIABLE)

ggplot(data=data_v2, aes(x=CONDITION, y=VALUE, fill=VARIABLE)) +
   geom_bar(stat="identity", position=position_dodge())+
  facet_wrap(~VARIABLE,nrow = 1, ncol = 2)+
  ggtitle("Variables by Law Condition")+
  scale_fill_manual(values = wes_palette("Chevalier1"))+
   scale_x_discrete(name ="Law Condition")+
   scale_y_continuous(name = "Values", limits = c(0,150))+
  theme(legend.position = "none")

Based on the data exploration and visualizations so far, fatalities decrease when the law is introduced, but over the following 12 months they begin to increase again. The next step is significance testing: either t-tests to compare the means, or ANOVAs to also check for interactions between the two variables.
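A minimal sketch of what those tests might look like, assuming the built-in `Seatbelts` series is used directly rather than the CSV summaries. The object names (`sb`, `long`, `tt_dk`, `fit`) and the `type`/`killed` columns are hypothetical, and treating fatality type and law condition as "the two variables" is one possible reading, not necessarily the intended design.

```r
# Use the original time series from the datasets package, since the
# name `Seatbelts` may have been reassigned to a data frame earlier.
sb <- as.data.frame(datasets::Seatbelts)
sb$law <- factor(sb$law, levels = c(0, 1), labels = c("inactive", "active"))

# Welch two-sample t-test: mean monthly driver deaths, law inactive vs. active
tt_dk <- t.test(DriversKilled ~ law, data = sb)
print(tt_dk)

# Long format: one row per month and fatality type (hypothetical column names)
long <- data.frame(
  law    = rep(sb$law, times = 2),
  type   = factor(rep(c("drivers", "van"), each = nrow(sb))),
  killed = c(sb$DriversKilled, sb$VanKilled)
)

# Two-way ANOVA with a law-by-type interaction term
fit <- aov(killed ~ law * type, data = long)
print(summary(fit))
```

Monthly observations from a time series are likely autocorrelated, so the independence assumption behind both tests is questionable here, and the results would be a first look rather than a final answer.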

A naive reader should be able to follow this conclusion from the plots. The first plot shows that fatalities decrease once the law is introduced, and the second and third plots support this. Taken together, the first and second plots also suggest that, after the sharp decline, fatalities slowly increase again once drivers have been driving under the law for some time.

I would have liked to include injuries in my analysis as well. It would have been interesting to check the interaction between injuries and fatalities. Logically, they should follow the same trend, and showing this would have made the conclusion more robust.
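One possible starting point, sketched here under the assumption that the `drivers`, `front`, and `rear` columns of the built-in `Seatbelts` data are used. These columns count people killed or seriously injured, so they are not pure injury counts. The names `sb_inj`, `ksi_means`, and `r_dk` are placeholders.

```r
# Original series from the datasets package, as a plain data frame
sb_inj <- as.data.frame(datasets::Seatbelts)
sb_inj$law <- factor(sb_inj$law, levels = c(0, 1),
                     labels = c("inactive", "active"))

# Mean monthly killed-or-seriously-injured counts by law condition
ksi_means <- aggregate(cbind(drivers, front, rear) ~ law,
                       data = sb_inj, FUN = mean)
print(ksi_means)

# Correlation between driver deaths and drivers killed or seriously injured
r_dk <- cor(sb_inj$DriversKilled, sb_inj$drivers)
print(r_dk)
```

Because `drivers` already includes the deaths counted in `DriversKilled`, some positive correlation between the two is built in. A cleaner comparison would need a serious-injury count with deaths removed.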

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Hungund (2022, May 4). Data Analytics and Computational Social Science: HW5. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw5/

BibTeX citation

@misc{hungund2022hw5,
  author = {Hungund, Apoorva},
  title = {Data Analytics and Computational Social Science: HW5},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw5/},
  year = {2022}
}