DACSS 601 Data Science Fundamentals - Homework 6
Use of automobiles for transport has become a major part of everyday life. Despite the many advancements and safety laws initiated, vehicle crashes are one of the biggest causes of fatalities and injuries worldwide. Previous research on this topic has found that around 65% of vehicle-related fatalities happen to in-vehicle occupants (WHO, 2009). This same report also estimated that by 2030, road traffic injuries would be the 5th leading cause of death around the world. It is also essential to consider the financial losses caused due to vehicle fatalities. A report by USDoT stated that in 2010, over 800 billion dollars were spent on injuries and fatalities caused due to vehicle crashes (U.S. Department of Transportation, 2015).
There is extensive research being conducted in this field and one of the most efficient ways is to introduce new laws and policies aimed at reduced crash rates and injuries. To actually understand the effect and importance of such laws, we will be exploring one of the initial datasets collected on automobile crashes. The dataset, Seatbelts, explores information about automobile fatalities before and after a seatbelt law was enforced. The dataset includes entries from 1969 to 1984, with number of car and van fatalities listed for each month. We will be visualizing analyzing two main parts of the data - DriversKilled and VanKilled.
My analysis will work on understanding if:
Is there a change in car driving deaths once the legislation was established?
Is there a change in van driving once the legislation was established?
Hypothesis 1: There is a reduction in car driving deaths after the legislation was established.
Hypothesis 2: There is a eduction in car driving deaths after the legislation was established.
First, we’ll set up packages and libraries need for this project.
Next, we’ll load the dataset and convert the time series into a more readable and understandable format.
years months DriversKilled drivers front rear kms PetrolPrice
1 1969 Jan 107 1687 867 269 9059 0.1029718
2 1969 Feb 97 1508 825 265 7685 0.1023630
3 1969 Mar 102 1507 806 319 9963 0.1020625
4 1969 Apr 87 1385 814 407 10955 0.1008733
5 1969 May 119 1632 991 454 11823 0.1010197
6 1969 Jun 106 1511 945 427 12391 0.1005812
VanKilled law
1 12 0
2 6 0
3 12 0
4 8 0
5 10 0
6 13 0
Next, we’ll work on seperating the dataset into two major ones based on when the law was active vs inactive.
We’ll now explore the datasets and calculate some basic statistics for both the dataframes.
summary(Law_Inactive)
years months DriversKilled drivers
Min. :1969 Jan :15 Min. : 79.0 Min. :1309
1st Qu.:1972 Feb :14 1st Qu.:108.0 1st Qu.:1511
Median :1976 Mar :14 Median :121.0 Median :1653
Mean :1976 Apr :14 Mean :125.9 Mean :1718
3rd Qu.:1979 May :14 3rd Qu.:140.0 3rd Qu.:1926
Max. :1983 Jun :14 Max. :198.0 Max. :2654
(Other):84
front rear kms PetrolPrice
Min. : 567.0 Min. :224.0 Min. : 7685 Min. :0.08118
1st Qu.: 767.0 1st Qu.:344.0 1st Qu.:12387 1st Qu.:0.09078
Median : 860.0 Median :401.0 Median :14455 Median :0.10273
Mean : 873.5 Mean :400.3 Mean :14463 Mean :0.10187
3rd Qu.: 986.0 3rd Qu.:454.0 3rd Qu.:16585 3rd Qu.:0.11132
Max. :1299.0 Max. :646.0 Max. :21040 Max. :0.13303
VanKilled law
Min. : 2.000 Min. :0
1st Qu.: 7.000 1st Qu.:0
Median :10.000 Median :0
Mean : 9.586 Mean :0
3rd Qu.:13.000 3rd Qu.:0
Max. :17.000 Max. :0
dim(Law_Inactive)
[1] 169 10
summary(Law_Active)
years months DriversKilled drivers
Min. :1983 Feb : 2 Min. : 60.0 Min. :1057
1st Qu.:1983 Mar : 2 1st Qu.: 85.0 1st Qu.:1171
Median :1984 Apr : 2 Median : 92.0 Median :1282
Mean :1984 May : 2 Mean :100.3 Mean :1322
3rd Qu.:1984 Jun : 2 3rd Qu.:119.0 3rd Qu.:1464
Max. :1984 Jul : 2 Max. :154.0 Max. :1763
(Other):11
front rear kms PetrolPrice
Min. :426.0 Min. :296.0 Min. :15511 Min. :0.1131
1st Qu.:516.0 1st Qu.:347.0 1st Qu.:17971 1st Qu.:0.1148
Median :585.0 Median :408.0 Median :19162 Median :0.1161
Mean :571.0 Mean :407.7 Mean :18890 Mean :0.1165
3rd Qu.:629.5 3rd Qu.:471.5 3rd Qu.:19952 3rd Qu.:0.1180
Max. :721.0 Max. :521.0 Max. :21626 Max. :0.1201
VanKilled law
Min. :2.000 Min. :1
1st Qu.:3.500 1st Qu.:1
Median :5.000 Median :1
Mean :5.174 Mean :1
3rd Qu.:7.000 3rd Qu.:1
Max. :8.000 Max. :1
dim(Law_Active)
[1] 23 10
We’ll also remove any variables we don’t need.
Law_Inactive <- Law_Inactive[ -c(7,8) ]
paged_table(Law_Inactive)
Law_Inactive <- Law_Inactive[ -c(7,8) ]
paged_table(Law_Inactive)
Seatbelts$law<-as.factor(Seatbelts$law)
Seatbelts$DriversKilled<-as.numeric(Seatbelts$DriversKilled)
Seatbelts$VanKilled<-as.numeric(Seatbelts$VanKilled)
Seatbelts<-Seatbelts %>%
mutate_at("law", str_replace, "0", "Inactive")
Seatbelts<-Seatbelts %>%
mutate_at("law", str_replace, "1", "Active")
stats_law_DK<-Seatbelts %>%
group_by(law) %>%
dplyr::summarize(min = min(DriversKilled),
median = median(DriversKilled),
mean = mean(DriversKilled),
sd = sd(DriversKilled),
max = max(DriversKilled))
paged_table(stats_law_DK)
##Grouping by years and calculating average accidents
ggplot(Seatbelts, aes(x=factor(law), y =DriversKilled, fill = law)) +
geom_boxplot() +
scale_fill_manual(values = wes_palette("Darjeeling2"))+
ylab ("Car Drivers Killed")+
xlab("Law Status")
ggplot(Seatbelts, aes(x=factor(law), y =VanKilled, fill = law)) +
geom_boxplot() +
scale_fill_manual(values = wes_palette("Cavalcanti1"))+
ylab ("Van Drivers Killed")+
xlab("Law Status")
Avg_DK<- Seatbelts %>%
group_by(law) %>%
dplyr::summarise(mean = mean(DriversKilled))
ggplot(Avg_DK, aes(x = law, y = mean, fill=law)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_manual(values = wes_palette("Darjeeling2"))+
xlab("Status of Law") +
ylab("Average Number of Drivers Killed")+
scale_y_continuous(limits = c(0,130), breaks = c(0,20,40,60,80,100,120))+
theme(legend.position = "none")
Avg_VK<- Seatbelts %>%
group_by(law) %>%
dplyr::summarise(mean = mean(VanKilled))
ggplot(Avg_VK, aes(x = law, y = mean, fill=law)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_manual(values = wes_palette("Cavalcanti1"))+
xlab("Status of Law") +
ylab("Average Number of Van Drivers Killed")+
scale_y_continuous(limits = c(0,20), breaks = c(0,5,10,15,20))+
theme(legend.position = "none")
law_means_DK <- ddply(Seatbelts, "law", summarise, mean_DK = mean(DriversKilled))
ggplot(Seatbelts, aes(x=years, y=DriversKilled, color=law)) +
geom_point()+
stat_smooth(method = 'lm')+
geom_hline(data=law_means_DK, aes(yintercept=mean_DK, color=law),
linetype="dashed")+
scale_color_manual(values=wes_palette("Darjeeling2"))+
xlab("Years") +
ylab("Number of Drivers Killed")+
scale_y_continuous(limits = c(0,200), breaks = c(0,20,40,60,80,100,120,140,160,180,200))+
scale_x_continuous(limits = c(1969,1984), breaks = c(1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984))+
theme(legend.position = "right")
law_means_VK <- ddply(Seatbelts, "law", summarise, mean_VK = mean(VanKilled))
ggplot(Seatbelts, aes(x=years, y=VanKilled, color=law)) +
geom_point()+
stat_smooth(method = 'lm')+
geom_hline(data=law_means_VK, aes(yintercept=mean_VK, color=law),
linetype="dashed")+
scale_color_manual(values=wes_palette("GrandBudapest1"))+
xlab("Years") +
ylab("Number of Van Drivers Killed")+
scale_y_continuous(limits = c(0,20), breaks = c(0,5,10,15,20))+
scale_x_continuous(limits = c(1969,1984), breaks = c(1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984))+
theme(legend.position = "right")
data<-read_csv("All_Data.csv")
data$CONDITION<-as.factor(data$CONDITION)
data$VALUE<-as.numeric(data$VALUE)
data$YEAR<-as.factor(data$YEAR)
data$VARIABLE <- as.factor(data$VARIABLE)
L_AC<-subset(data,CONDITION=="ACTIVE", select = c("YEAR","VARIABLE", "VALUE"))
L_INAC<-subset(data,CONDITION=="INACTIVE", select = c("YEAR","VARIABLE", "VALUE"))
a<-ggplot(data=L_AC, aes(x=YEAR, y=VALUE, fill=YEAR)) +
geom_bar(stat="identity", position=position_dodge())+
facet_wrap(~VARIABLE,nrow = 1, ncol = 2)+
ggtitle("Seatbelt Law - Active")+
scale_fill_manual(values = wes_palette("Chevalier1"))+
scale_x_discrete(name ="Years")+
scale_y_continuous(name = "Values", limits = c(0,110))+
theme(legend.position = "none")
b<-ggplot(data=L_INAC, aes(x=YEAR, y=VALUE, fill=YEAR)) +
geom_bar(stat="identity", position=position_dodge())+
facet_wrap(~VARIABLE,nrow = 1, ncol = 2)+
ggtitle("Seatbelt Law - Inactive")+
scale_x_discrete(name ="Years")+
scale_y_continuous(name = "Values", limits = c(0,150))+
theme(legend.position = "none")
ggpubr::ggarrange(a,b,
ncol = 1, nrow = 2)
We will now move on to statistical testing to text the significance of the differences in Cars and Van Drivers Killed before and after the law was introduced.
data_ttest<-read.csv2(file = "ttest_data.csv", sep = ",")
data_ttest$ACTIVE<-as.numeric(data_ttest$ACTIVE)
data_ttest$INACTIVE<-as.numeric(data_ttest$INACTIVE)
cd<-subset(data_ttest, VARIABLE=="CAR_DRIVERS", select = c("YEAR", "ACTIVE", "INACTIVE"))
t.test(cd$ACTIVE, cd$INACTIVE, na.rm = TRUE)
Welch Two Sample t-test
data: cd$ACTIVE and cd$INACTIVE
t = 7.1257, df = 6.0308, p-value = 0.0003757
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
16.65575 34.04414
sample estimates:
mean of x mean of y
125.5164 100.1665
Welch Two Sample t-test
data: vd$ACTIVE and vd$INACTIVE
t = 8.3989, df = 14.66, p-value = 5.557e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.223193 5.421456
sample estimates:
mean of x mean of y
9.488824 5.166500
The t-tests state that there is a difference in the fatalities before and after the law was established.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hungund (2022, May 4). Data Analytics and Computational Social Science: HW6. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw6/
BibTeX citation
@misc{hungund2022hw6, author = {Hungund, Apoorva}, title = {Data Analytics and Computational Social Science: HW6}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw6/}, year = {2022} }