DACSS 601 Data Science Fundamentals - Homework 4
#1) Reading in the Dataset
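# Packages inferred from the functions called in this post
library(plyr)         # ddply(); loaded before dplyr to limit masking
library(tidyverse)    # dplyr, stringr, ggplot2
library(mosaic)       # favstats()
library(wesanderson)  # wes_palette()

data(Seatbelts)
# Convert the monthly time series into a data frame with year and month columns
Seatbelts <- data.frame(years = floor(time(Seatbelts)),
                        months = factor(cycle(Seatbelts), labels = month.abb),
                        Seatbelts)
Seatbelts$DriversKilled <- as.numeric(Seatbelts$DriversKilled)
Seatbelts$VanKilled <- as.numeric(Seatbelts$VanKilled)
# Recode the law indicator: 0 = "Inactive", 1 = "Active"
Seatbelts$law <- ifelse(Seatbelts$law == 1, "Active", "Inactive")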
#1)Descriptive Statistics
favstats(Seatbelts$DriversKilled)
 min     Q1 median  Q3 max     mean       sd   n missing
  60 104.75  118.5 138 198 122.8021 25.37989 192       0
stats_law_DK <- Seatbelts %>%
  group_by(law) %>%
  dplyr::summarize(min = min(DriversKilled),
                   median = median(DriversKilled),
                   mean = mean(DriversKilled),
                   sd = sd(DriversKilled),
                   max = max(DriversKilled))
stats_law_DK
# A tibble: 2 × 6
  law        min median  mean    sd   max
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Active      60     92  100.  22.2   154
2 Inactive    79    121  126.  24.3   198
stats_law_VK <- Seatbelts %>%
  group_by(law) %>%
  dplyr::summarize(min = min(VanKilled),
                   median = median(VanKilled),
                   mean = mean(VanKilled),
                   sd = sd(VanKilled),
                   max = max(VanKilled))
stats_law_VK
# A tibble: 2 × 6
  law        min median  mean    sd   max
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Active       2      5  5.17  1.83     8
2 Inactive     2     10  9.59  3.50    17
stats_years_DK <- Seatbelts %>%
  group_by(years) %>%
  dplyr::summarize(min = min(DriversKilled),
                   median = median(DriversKilled),
                   mean = mean(DriversKilled),
                   sd = sd(DriversKilled),
                   max = max(DriversKilled))
stats_years_DK
# A tibble: 16 × 6
   years   min median  mean    sd   max
   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1  1969    87    107  117.  25.5   180
 2  1970   102   124.  133.  30.6   190
 3  1971   104    138  138.  22.5   187
 4  1972   114   150.  147.  25.5   198
 5  1973   109    145  144.  15.2   168
 6  1974   100    131  129.  21.9   163
 7  1975    92    111  118.  23.4   161
 8  1976    97   112.  120.  20.4   162
 9  1977    79    111  119.  31.2   183
10  1978   100   120.  127.  22.5   178
11  1979    94   116.  123.  20.7   171
12  1980    92    109  112.  16.3   137
13  1981    84   108.  112.  19.5   153
14  1982   103    122  123.  14.9   152
15  1983    60   97.5  99.8  20.2   126
16  1984    79     91  102.  24.7   154
stats_years_VK <- Seatbelts %>%
  group_by(years) %>%
  dplyr::summarize(min = min(VanKilled),
                   median = median(VanKilled),
                   mean = mean(VanKilled),
                   sd = sd(VanKilled),
                   max = max(VanKilled))
stats_years_VK
# A tibble: 16 × 6
   years   min median  mean    sd   max
   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1  1969     6   11.5  10.9  3.09    16
 2  1970     6     12  11.5  3.12    16
 3  1971     6   13.5  13.2  3.07    17
 4  1972     3     12  10.8  4.29    17
 5  1973     5     11  10.6  3.15    15
 6  1974     4   10.5  10.2  3.76    15
 7  1975     6     11  10.7  3.14    16
 8  1976     4     10  9.75  2.96    14
 9  1977     5    7.5  8.33  3.03    15
10  1978     3      9  8.33  3.26    14
11  1979     5    8.5     9  2.30    13
12  1980     4      8  8.08  2.68    14
13  1981     4      7  7.25  2.38    12
14  1982     2    5.5  5.83  2.76    12
15  1983     2    5.5  5.25  2.22     8
16  1984     3    5.5  5.33  1.56     7
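The four summaries above repeat the same five statistics with a different grouping column and a different outcome each time. As a minimal sketch, assuming dplyr 1.0 or later, the pattern could be wrapped in a helper. The name `summarise_by` and the data frame `sb` are hypothetical and not part of the original analysis; `sb` is rebuilt from the built-in dataset only so the block runs on its own.

```r
library(dplyr)

# Rebuild a plain data frame from the built-in time series (hypothetical name: sb)
sb <- data.frame(years = as.numeric(floor(time(datasets::Seatbelts))),
                 as.data.frame(datasets::Seatbelts))
sb$law <- ifelse(sb$law == 1, "Active", "Inactive")

# Hypothetical helper: five-number-style summary of `value` within each `group`
summarise_by <- function(data, group, value) {
  data %>%
    group_by({{ group }}) %>%
    dplyr::summarise(min = min({{ value }}),
                     median = median({{ value }}),
                     mean = mean({{ value }}),
                     sd = sd({{ value }}),
                     max = max({{ value }}),
                     .groups = "drop")
}

stats_law_DK   <- summarise_by(sb, law, DriversKilled)
stats_law_VK   <- summarise_by(sb, law, VanKilled)
stats_years_DK <- summarise_by(sb, years, DriversKilled)
stats_years_VK <- summarise_by(sb, years, VanKilled)
```

The `{{ }}` (embrace) operator passes the unquoted column names through to dplyr, so the call sites stay as readable as the original pipelines.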
#2)Visualizations
##Grouping by law status and calculating average fatalities
Avg_DK <- Seatbelts %>%
  group_by(law) %>%
  dplyr::summarise(mean = mean(DriversKilled))
ggplot(Avg_DK, aes(x = law, y = mean, fill = law)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = wes_palette("Darjeeling2")) +
  xlab("Status of Law") +
  ylab("Average Number of Drivers Killed") +
  scale_y_continuous(limits = c(0, 130), breaks = seq(0, 120, 20)) +
  theme(legend.position = "none")
Avg_VK <- Seatbelts %>%
  group_by(law) %>%
  dplyr::summarise(mean = mean(VanKilled))
ggplot(Avg_VK, aes(x = law, y = mean, fill = law)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_fill_manual(values = wes_palette("Cavalcanti1")) +
  xlab("Status of Law") +
  ylab("Average Number of Van Drivers Killed") +
  scale_y_continuous(limits = c(0, 20), breaks = seq(0, 20, 5)) +
  theme(legend.position = "none")
This is a very basic plot, but it gives a good idea of the average number of drivers and van drivers killed while the law was inactive versus active. Average fatalities are clearly lower when the law is active, which is to be expected. In other words, once the law took effect, the average number of drivers and van drivers killed fell compared with the period when the seatbelt law was inactive. However, these are only averages; they do not show the trend in fatalities over the years.
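To put a number on the gap between the bars, the averages can be turned into a percentage drop. This is only a descriptive sketch, not a test of the law's effect; the names `sb`, `means`, and `pct_drop` are illustrative, and the block reads the built-in dataset directly, where `law` is coded 0 (inactive) and 1 (active).

```r
# Self-contained: use the built-in dataset, law coded 0 (inactive) / 1 (active)
sb <- as.data.frame(datasets::Seatbelts)

# Mean monthly fatalities per law status for both series
means <- aggregate(cbind(DriversKilled, VanKilled) ~ law, data = sb, FUN = mean)

inactive <- unlist(means[means$law == 0, c("DriversKilled", "VanKilled")])
active   <- unlist(means[means$law == 1, c("DriversKilled", "VanKilled")])

# Percentage drop in the monthly average once the law is active
pct_drop <- 100 * (inactive - active) / inactive
round(pct_drop, 1)
```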
#2)Visualizations Continued
law_means_DK <- ddply(Seatbelts, "law", summarise, mean_DK = mean(DriversKilled))
ggplot(Seatbelts, aes(x = years, y = DriversKilled, color = law)) +
  geom_point() +
  stat_smooth(method = 'lm') +
  geom_hline(data = law_means_DK, aes(yintercept = mean_DK, color = law),
             linetype = "dashed") +
  scale_color_manual(values = wes_palette("Darjeeling2")) +
  xlab("Years") +
  ylab("Number of Drivers Killed") +
  scale_y_continuous(limits = c(0, 200), breaks = seq(0, 200, 20)) +
  scale_x_continuous(limits = c(1969, 1984), breaks = 1969:1984) +
  theme(legend.position = "right")
law_means_VK <- ddply(Seatbelts, "law", summarise, mean_VK = mean(VanKilled))
ggplot(Seatbelts, aes(x = years, y = VanKilled, color = law)) +
  geom_point() +
  stat_smooth(method = 'lm') +
  geom_hline(data = law_means_VK, aes(yintercept = mean_VK, color = law),
             linetype = "dashed") +
  scale_color_manual(values = wes_palette("GrandBudapest1")) +
  xlab("Years") +
  ylab("Number of Van Drivers Killed") +
  scale_y_continuous(limits = c(0, 20), breaks = seq(0, 20, 5)) +
  scale_x_continuous(limits = c(1969, 1984), breaks = 1969:1984) +
  theme(legend.position = "right")
Based on the two plots above, and on the visualization alone, we can conclude that fatalities among both car and van drivers show a decreasing trend up until the law is introduced, at which point there is a definite drop in fatalities. However, it should also be observed that even after the law is introduced in 1983, there is a slight increase from 1983 to 1984 for both car and van drivers. These plots, however, do not give us an idea of the average fatalities by year. Instead of a scatter plot, a box plot might be a better visualization technique here.
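The box plot suggested here could look roughly like the following. This is a sketch rather than part of the original analysis: it assumes ggplot2 3.3.0 or later (for the `fun` argument of `stat_summary()`), rebuilds the data frame under the hypothetical name `sb` so it runs on its own, and marks each year's mean with a cross so the yearly averages are visible next to the medians.

```r
library(ggplot2)

# Rebuild a plain data frame from the built-in time series (hypothetical name: sb)
sb <- data.frame(years = as.numeric(floor(time(datasets::Seatbelts))),
                 as.data.frame(datasets::Seatbelts))

# One box per year (12 monthly values each); the cross marks the yearly mean
p_box <- ggplot(sb, aes(x = factor(years), y = DriversKilled)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 4, size = 2) +
  xlab("Years") +
  ylab("Number of Drivers Killed")
p_box
```

Swapping `DriversKilled` for `VanKilled` in `aes()` gives the corresponding plot for van drivers.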
#2)Visualizations Continued
ggplot(Seatbelts, aes(x = DriversKilled, y = VanKilled, color = law)) +
  geom_point() +
  stat_smooth(method = 'lm') +
  scale_color_manual(values = wes_palette("BottleRocket1")) +
  xlab("Number of Drivers Killed") +
  ylab("Number of Van Drivers Killed") +
  theme(legend.position = "right")
This graph depicts the association between drivers killed and van drivers killed when the law is active versus inactive. In both periods the fitted line slopes upward: months with more drivers killed also tend to have more van drivers killed. When the law is inactive, the increase is sharper than when it is active. However, this does not help us understand the trend in fatalities by year. A good idea would be to have years on the y-axis and both VanKilled and DriversKilled as paired box plots.
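A possible version of that paired box plot is sketched below, assuming tidyr 1.0 or later and ggplot2 3.3.0 or later (which lets a discrete y-axis produce horizontal boxes). The names `sb`, `long`, `series`, and `killed` are illustrative. Because car-driver counts are roughly ten times larger than van-driver counts, the two series are placed in side-by-side panels with free x-scales rather than on one shared axis; that is a design choice of this sketch, not something the original specifies.

```r
library(ggplot2)
library(tidyr)

# Rebuild a plain data frame from the built-in time series (hypothetical name: sb)
sb <- data.frame(years = as.numeric(floor(time(datasets::Seatbelts))),
                 as.data.frame(datasets::Seatbelts))

# Long format: one row per month and series
long <- pivot_longer(sb[, c("years", "DriversKilled", "VanKilled")],
                     cols = c(DriversKilled, VanKilled),
                     names_to = "series", values_to = "killed")

# Years on the y-axis, one panel per series so both scales stay readable
p_pair <- ggplot(long, aes(x = killed, y = factor(years), fill = series)) +
  geom_boxplot() +
  facet_wrap(~ series, scales = "free_x") +
  xlab("Number Killed per Month") +
  ylab("Years") +
  theme(legend.position = "none")
p_pair
```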
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hungund (2022, May 4). Data Analytics and Computational Social Science: HW4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw4/
BibTeX citation
@misc{hungund2022hw4, author = {Hungund, Apoorva}, title = {Data Analytics and Computational Social Science: HW4}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw4/}, year = {2022} }