<-read_excel("_data/StateCounty2012.xls", skip = 4, col_names= c("state", "delete", "county", "delete", "employees"))%>%
railroadxlsselect(!contains("delete"))%>%
filter(!str_detect(state, "Total"))
<-head(railroadxls, -2)%>%
railroadxlsmutate(county = ifelse(state=="CANADA", "CANADA", county))
railroadxls
Challenge 10 Solution
The challenge
For my final project, I need to collect the range of groups of values. Therefore, for this challenge, I will practice creating min and max columns and then using map_dbl() to calculate the range between those values. I’ll use the 2012 Railroad data as my dataset, and calculate, for each U.S. State/District, the range in number of employees per county.
First I will read in the data, skip the 4 rows of metadata, label the empty columns for deletion, and remove the rows which contain summative (“Total”) values. I’ll then remove the row for “Canada”, since its values can not be compared with the US State values, and therefore can not be analyzed together.
I now have a tidier dataset that can be grouped by State. I will create a tibble with two columns: one with the number of employees in the county of most employees, and another with the number of employees in the county of least employees. (In other words, employee counts for counties that engage the most and the least railroad personnel.)
<-railroadxls%>%
MaxMin_Employeesgroup_by(state)%>%
summarise(max_employees = max(employees), min_employees=min(employees))%>%
arrange(desc(max_employees))
MaxMin_Employees
I can now use map2_dbl() to calculate the range across the max and min columns by subtracting the min from the max.
map2_dbl(MaxMin_Employees$max_employees, MaxMin_Employees$min_employees, `-`)
[1] 8206 4234 3796 3680 3248 3072 2887 2054 1996 1646 1535 1285 1213 1078 1038
[16] 989 971 877 848 839 808 746 734 629 0 650 620 608 579 552
[31] 545 537 524 482 465 463 429 406 405 376 340 321 310 0 268
[46] 219 166 144 115 87 80 0 1 0