HW6

DACSS 601 Data Science Fundamentals - Homework 6

Apoorva Hungund
2022-05-04

Introduction

Use of automobiles for transport has become a major part of everyday life. Despite the many advancements and safety laws initiated, vehicle crashes are one of the biggest causes of fatalities and injuries worldwide. Previous research on this topic has found that around 65% of vehicle-related fatalities happen to in-vehicle occupants (WHO, 2009). This same report also estimated that by 2030, road traffic injuries would be the 5th leading cause of death around the world. It is also essential to consider the financial losses caused due to vehicle fatalities. A report by USDoT stated that in 2010, over 800 billion dollars were spent on injuries and fatalities caused due to vehicle crashes (U.S. Department of Transportation, 2015).

There is extensive research being conducted in this field and one of the most efficient ways is to introduce new laws and policies aimed at reduced crash rates and injuries. To actually understand the effect and importance of such laws, we will be exploring one of the initial datasets collected on automobile crashes. The dataset, Seatbelts, explores information about automobile fatalities before and after a seatbelt law was enforced. The dataset includes entries from 1969 to 1984, with number of car and van fatalities listed for each month. We will be visualizing analyzing two main parts of the data - DriversKilled and VanKilled.

Research Questions

My analysis will work on understanding if:

  1. Is there a change in car driving deaths once the legislation was established?

  2. Is there a change in van driving once the legislation was established?

Hypothesis

Hypothesis 1: There is a reduction in car driving deaths after the legislation was established.

Hypothesis 2: There is a eduction in car driving deaths after the legislation was established.

Data

Data Selection

First, we’ll set up packages and libraries need for this project.

Show code

Next, we’ll load the dataset and convert the time series into a more readable and understandable format.

Show code
data(Seatbelts)
Seatbelts <- data.frame(years=floor(time(Seatbelts)),months=factor(cycle(Seatbelts),labels=month.abb), Seatbelts)
head(Seatbelts)
  years months DriversKilled drivers front rear   kms PetrolPrice
1  1969    Jan           107    1687   867  269  9059   0.1029718
2  1969    Feb            97    1508   825  265  7685   0.1023630
3  1969    Mar           102    1507   806  319  9963   0.1020625
4  1969    Apr            87    1385   814  407 10955   0.1008733
5  1969    May           119    1632   991  454 11823   0.1010197
6  1969    Jun           106    1511   945  427 12391   0.1005812
  VanKilled law
1        12   0
2         6   0
3        12   0
4         8   0
5        10   0
6        13   0

Next, we’ll work on seperating the dataset into two major ones based on when the law was active vs inactive.

Show code
Law_Active<-subset(Seatbelts, law == 1, select = c(years:law))
Law_Inactive<-subset(Seatbelts, law == 0, select = c(years:law))

Data Tidying and Exploration

We’ll now explore the datasets and calculate some basic statistics for both the dataframes.

Show code
summary(Law_Inactive)
     years          months   DriversKilled      drivers    
 Min.   :1969   Jan    :15   Min.   : 79.0   Min.   :1309  
 1st Qu.:1972   Feb    :14   1st Qu.:108.0   1st Qu.:1511  
 Median :1976   Mar    :14   Median :121.0   Median :1653  
 Mean   :1976   Apr    :14   Mean   :125.9   Mean   :1718  
 3rd Qu.:1979   May    :14   3rd Qu.:140.0   3rd Qu.:1926  
 Max.   :1983   Jun    :14   Max.   :198.0   Max.   :2654  
                (Other):84                                 
     front             rear            kms         PetrolPrice     
 Min.   : 567.0   Min.   :224.0   Min.   : 7685   Min.   :0.08118  
 1st Qu.: 767.0   1st Qu.:344.0   1st Qu.:12387   1st Qu.:0.09078  
 Median : 860.0   Median :401.0   Median :14455   Median :0.10273  
 Mean   : 873.5   Mean   :400.3   Mean   :14463   Mean   :0.10187  
 3rd Qu.: 986.0   3rd Qu.:454.0   3rd Qu.:16585   3rd Qu.:0.11132  
 Max.   :1299.0   Max.   :646.0   Max.   :21040   Max.   :0.13303  
                                                                   
   VanKilled           law   
 Min.   : 2.000   Min.   :0  
 1st Qu.: 7.000   1st Qu.:0  
 Median :10.000   Median :0  
 Mean   : 9.586   Mean   :0  
 3rd Qu.:13.000   3rd Qu.:0  
 Max.   :17.000   Max.   :0  
                             
Show code
dim(Law_Inactive)
[1] 169  10
Show code
summary(Law_Active)
     years          months   DriversKilled      drivers    
 Min.   :1983   Feb    : 2   Min.   : 60.0   Min.   :1057  
 1st Qu.:1983   Mar    : 2   1st Qu.: 85.0   1st Qu.:1171  
 Median :1984   Apr    : 2   Median : 92.0   Median :1282  
 Mean   :1984   May    : 2   Mean   :100.3   Mean   :1322  
 3rd Qu.:1984   Jun    : 2   3rd Qu.:119.0   3rd Qu.:1464  
 Max.   :1984   Jul    : 2   Max.   :154.0   Max.   :1763  
                (Other):11                                 
     front            rear            kms         PetrolPrice    
 Min.   :426.0   Min.   :296.0   Min.   :15511   Min.   :0.1131  
 1st Qu.:516.0   1st Qu.:347.0   1st Qu.:17971   1st Qu.:0.1148  
 Median :585.0   Median :408.0   Median :19162   Median :0.1161  
 Mean   :571.0   Mean   :407.7   Mean   :18890   Mean   :0.1165  
 3rd Qu.:629.5   3rd Qu.:471.5   3rd Qu.:19952   3rd Qu.:0.1180  
 Max.   :721.0   Max.   :521.0   Max.   :21626   Max.   :0.1201  
                                                                 
   VanKilled          law   
 Min.   :2.000   Min.   :1  
 1st Qu.:3.500   1st Qu.:1  
 Median :5.000   Median :1  
 Mean   :5.174   Mean   :1  
 3rd Qu.:7.000   3rd Qu.:1  
 Max.   :8.000   Max.   :1  
                            
Show code
dim(Law_Active)
[1] 23 10

We’ll also remove any variables we don’t need.

Show code
Law_Inactive <- Law_Inactive[ -c(7,8) ]
paged_table(Law_Inactive)
Show code
Law_Inactive <- Law_Inactive[ -c(7,8) ]
paged_table(Law_Inactive)

Visualizations

Show code
Seatbelts$law<-as.factor(Seatbelts$law)
Seatbelts$DriversKilled<-as.numeric(Seatbelts$DriversKilled)
Seatbelts$VanKilled<-as.numeric(Seatbelts$VanKilled)

Seatbelts<-Seatbelts %>% 
  mutate_at("law", str_replace, "0", "Inactive")

Seatbelts<-Seatbelts %>% 
  mutate_at("law", str_replace, "1", "Active")

stats_law_DK<-Seatbelts %>%
  group_by(law) %>%
  dplyr::summarize(min = min(DriversKilled),
            median = median(DriversKilled),
            mean = mean(DriversKilled),
            sd = sd(DriversKilled),
            max = max(DriversKilled))
paged_table(stats_law_DK)
Show code
stats_law_VK<-Seatbelts %>%
  group_by(law) %>%
  dplyr::summarize(min = min(VanKilled),
            median = median(VanKilled),
            mean = mean(VanKilled),
            sd = sd(VanKilled),
            max = max(VanKilled))
paged_table(stats_law_VK)
Show code
##Grouping by years and calculating average accidents

ggplot(Seatbelts, aes(x=factor(law), y =DriversKilled, fill = law)) +
  geom_boxplot() +
  scale_fill_manual(values = wes_palette("Darjeeling2"))+
  ylab ("Car Drivers Killed")+
  xlab("Law Status")
Show code
ggplot(Seatbelts, aes(x=factor(law), y =VanKilled, fill = law)) +
  geom_boxplot() +
  scale_fill_manual(values = wes_palette("Cavalcanti1"))+
  ylab ("Van Drivers Killed")+
  xlab("Law Status")
Show code
Avg_DK<- Seatbelts %>%
  group_by(law) %>%
  dplyr::summarise(mean = mean(DriversKilled))

ggplot(Avg_DK, aes(x = law, y = mean, fill=law)) +
  geom_bar(stat="identity", position=position_dodge())+
  scale_fill_manual(values = wes_palette("Darjeeling2"))+
  xlab("Status of Law") +
  ylab("Average Number of Drivers Killed")+
  scale_y_continuous(limits = c(0,130), breaks = c(0,20,40,60,80,100,120))+
  theme(legend.position = "none")
Show code
Avg_VK<- Seatbelts %>%
  group_by(law) %>%
  dplyr::summarise(mean = mean(VanKilled))

ggplot(Avg_VK, aes(x = law, y = mean, fill=law)) +
  geom_bar(stat="identity", position=position_dodge())+
  scale_fill_manual(values = wes_palette("Cavalcanti1"))+
  xlab("Status of Law") +
  ylab("Average Number of Van Drivers Killed")+
  scale_y_continuous(limits = c(0,20), breaks = c(0,5,10,15,20))+
  theme(legend.position = "none")
Show code
law_means_DK <- ddply(Seatbelts, "law", summarise, mean_DK = mean(DriversKilled))
ggplot(Seatbelts, aes(x=years, y=DriversKilled, color=law)) +
  geom_point()+
  stat_smooth(method = 'lm')+
  geom_hline(data=law_means_DK, aes(yintercept=mean_DK, color=law), 
             linetype="dashed")+
  scale_color_manual(values=wes_palette("Darjeeling2"))+
  xlab("Years") +
  ylab("Number of Drivers Killed")+
  scale_y_continuous(limits = c(0,200), breaks = c(0,20,40,60,80,100,120,140,160,180,200))+
  scale_x_continuous(limits = c(1969,1984), breaks = c(1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984))+
  theme(legend.position = "right")
Show code
law_means_VK <- ddply(Seatbelts, "law", summarise, mean_VK = mean(VanKilled))
ggplot(Seatbelts, aes(x=years, y=VanKilled, color=law)) +
  geom_point()+
  stat_smooth(method = 'lm')+
  geom_hline(data=law_means_VK, aes(yintercept=mean_VK, color=law), 
             linetype="dashed")+
  scale_color_manual(values=wes_palette("GrandBudapest1"))+
  xlab("Years") +
  ylab("Number of Van Drivers Killed")+
  scale_y_continuous(limits = c(0,20), breaks = c(0,5,10,15,20))+
  scale_x_continuous(limits = c(1969,1984), breaks = c(1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984))+
  theme(legend.position = "right")

Show code
data<-read_csv("All_Data.csv")

data$CONDITION<-as.factor(data$CONDITION)
data$VALUE<-as.numeric(data$VALUE)
data$YEAR<-as.factor(data$YEAR)
data$VARIABLE <- as.factor(data$VARIABLE)

L_AC<-subset(data,CONDITION=="ACTIVE", select = c("YEAR","VARIABLE", "VALUE"))
L_INAC<-subset(data,CONDITION=="INACTIVE", select = c("YEAR","VARIABLE", "VALUE"))

a<-ggplot(data=L_AC, aes(x=YEAR, y=VALUE, fill=YEAR)) +
  geom_bar(stat="identity", position=position_dodge())+
  facet_wrap(~VARIABLE,nrow = 1, ncol = 2)+
  ggtitle("Seatbelt Law - Active")+
  scale_fill_manual(values = wes_palette("Chevalier1"))+
  scale_x_discrete(name ="Years")+
  scale_y_continuous(name = "Values", limits = c(0,110))+
  theme(legend.position = "none")

b<-ggplot(data=L_INAC, aes(x=YEAR, y=VALUE, fill=YEAR)) +
  geom_bar(stat="identity", position=position_dodge())+
  facet_wrap(~VARIABLE,nrow = 1, ncol = 2)+
  ggtitle("Seatbelt Law - Inactive")+
  scale_x_discrete(name ="Years")+
  scale_y_continuous(name = "Values", limits = c(0,150))+
  theme(legend.position = "none")

ggpubr::ggarrange(a,b,
          ncol = 1, nrow = 2)

We will now move on to statistical testing to text the significance of the differences in Cars and Van Drivers Killed before and after the law was introduced.

Show code
data_ttest<-read.csv2(file = "ttest_data.csv", sep = ",")

data_ttest$ACTIVE<-as.numeric(data_ttest$ACTIVE)
data_ttest$INACTIVE<-as.numeric(data_ttest$INACTIVE)

cd<-subset(data_ttest, VARIABLE=="CAR_DRIVERS", select = c("YEAR", "ACTIVE", "INACTIVE"))
t.test(cd$ACTIVE, cd$INACTIVE, na.rm = TRUE)

    Welch Two Sample t-test

data:  cd$ACTIVE and cd$INACTIVE
t = 7.1257, df = 6.0308, p-value = 0.0003757
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 16.65575 34.04414
sample estimates:
mean of x mean of y 
 125.5164  100.1665 
Show code
vd<-subset(data_ttest, VARIABLE=="VAN_DRIVERS", select = c("YEAR", "ACTIVE", "INACTIVE"))
t.test(vd$ACTIVE, vd$INACTIVE, na.rm = TRUE)

    Welch Two Sample t-test

data:  vd$ACTIVE and vd$INACTIVE
t = 8.3989, df = 14.66, p-value = 5.557e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 3.223193 5.421456
sample estimates:
mean of x mean of y 
 9.488824  5.166500 

The t-tests state that there is a difference in the fatalities before and after the law was established.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Hungund (2022, May 4). Data Analytics and Computational Social Science: HW6. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw6/

BibTeX citation

@misc{hungund2022hw6,
  author = {Hungund, Apoorva},
  title = {Data Analytics and Computational Social Science: HW6},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw6/},
  year = {2022}
}