This post collects some of the weekly challenges that I had not attempted until now.
Challenges:
I have now created a personal blog for DACSS 601 on GitHub and will keep all posts here!
library(tidyverse)  # dplyr, tidyr, stringr, ggplot2
library(readxl)     # read_excel()

rr_county <- read.csv("_data/Clean/railroad_2012_clean_county_tidy.csv")
rr_state <- read.csv("_data/Clean/railroad_2012_clean_state.csv")
# Read one sheet, keep the needed columns, and encode two variables in each column name
aussie_marriage <- read_excel("_data/Unclean/australian_marriage_law_postal_survey_2017_-_response_final.xls", sheet = "Table 2", na = "", skip = 6) %>%
  select(1:2, 4, 9, 11, 13) %>%
  rename(town = 1, "response_clear:yes" = 2, "response_clear:no" = 3, "eligible:yes" = 4, "eligible:no" = 5, "eligible:noresponse" = 6) %>%
  # Drop division headers and total rows
  filter(str_detect(town, "Divisions", negate = TRUE), str_detect(town, "(Total)", negate = TRUE)) %>%
  # Go long, then split the encoded column name into question (q) and response (r)
  pivot_longer(cols = 2:6, names_to = "temp") %>%
  separate(temp, into = c("q", "r"), sep = ":")
head(rr_county)
state county total_employees
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
head(rr_state)
state total_employees
1 AE 2
2 AK 103
3 AL 4257
4 AP 1
5 AR 3871
6 AZ 3153
head(aussie_marriage)
# A tibble: 6 x 4
town q r value
<chr> <chr> <chr> <dbl>
1 Banks response_clear yes 37736
2 Banks response_clear no 46343
3 Banks eligible yes 84079
4 Banks eligible no 247
5 Banks eligible noresponse 20928
6 Barton response_clear yes 37153
aussie_marriage %>%
  filter(q == "response_clear") %>%
  drop_na() %>%
  group_by(r) %>%
  # One row per response; no need to repeat r or call distinct()
  summarize(total = sum(value)) %>%
  ggplot(aes(x = r, y = total)) +
  geom_point()
I used pivot_longer() in the previous challenge and in HW2.
ActiveDutyMilitary <- read_excel("_data/Unclean/ActiveDuty_MaritalStatus.xls", sheet = "TotalDoD", skip = 8) %>%
  select(2:4, 6, 7, 9, 10, 12, 13) %>%
  # Encode married:spouse:children:gender in each column name; empty slots stay blank
  rename(paygrade = 1,
         "single::yes:male" = 2, "single::yes:female" = 3,
         "single::no:male" = 4, "single::no:female" = 5,
         "married:military::male" = 6, "married:military::female" = 7,
         "married:civilian::male" = 8, "married:civilian::female" = 9) %>%
  filter(str_detect(paygrade, "TOTAL", negate = TRUE)) %>%
  pivot_longer(cols = 2:9, names_to = "temp") %>%
  separate(paygrade, into = c("Class", "level"), sep = "-") %>%
  separate(temp, into = c("married", "spouse", "children", "gender"), sep = ":")
head(ActiveDutyMilitary)
# A tibble: 6 x 7
Class level married spouse children gender value
<chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 E 1 single "" "yes" male 31229
2 E 1 single "" "yes" female 5717
3 E 1 single "" "no" male 563
4 E 1 single "" "no" female 122
5 E 1 married "military" "" male 139
6 E 1 married "military" "" female 141
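The pattern used for both spreadsheets (pack several variables into each column name, pivot longer, then split the name) can be tried on a small made-up table. This is only a sketch: the data frame `toy`, its column names, and its values are invented for illustration and do not come from either dataset.

```r
library(dplyr)  # for the %>% pipe
library(tidyr)  # pivot_longer(), separate()

# Hypothetical wide table: each column name packs two variables as "status:gender"
toy <- data.frame(
  paygrade = c("E-1", "E-2"),
  "single:male" = c(10, 20),
  "single:female" = c(3, 4),
  "married:male" = c(5, 6),
  "married:female" = c(1, 2),
  check.names = FALSE  # keep the colons in the column names
)

toy_long <- toy %>%
  # One row per paygrade and original column
  pivot_longer(cols = -paygrade, names_to = "temp") %>%
  # Split the packed column name into its two variables
  separate(temp, into = c("status", "gender"), sep = ":") %>%
  # Split "E-1" into a class and a level
  separate(paygrade, into = c("class", "level"), sep = "-")

toy_long
```

When every column name has the same number of pieces, `pivot_longer()` can also do the split itself through its `names_to` and `names_sep` arguments, which would replace the first `separate()` call.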
Try an earlier challenge; try using transactional data, grouping, and then visualizing grouped statistics; (intermediate) try graphing data with time on the x-axis; (advanced) try reading in the active duty military dataset (one sheet only).
Try an earlier challenge; try using lubridate to transform non-date variables to dates (e.g., in the eggs dataset); try using facet_wrap or facet_grid; try reading in a single sheet of active duty military data.
Try an earlier challenge; try to figure out the messy data at https://docs.google.com/spreadsheets/d/1N3FpC8k_0jNiA6uYkFP0JITzxDfh83oGalkRVo9ZBgc/edit#gid=2009914420
We will work on dates using lubridate in the next session. One challenge would be to try to create a minimal working example to post in a help request.
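As one possible starting point for that challenge, here is a sketch of a minimal working example that also previews building dates with lubridate. The data frame `eggs_mini` and its values are made up to mimic a table with separate month and year columns; they are not taken from the course eggs file.

```r
library(lubridate)

# Made-up data: small enough to paste into a help request
eggs_mini <- data.frame(
  month = c("January", "February", "March"),
  year  = c(2004, 2004, 2004),
  price = c(126, 128, 131)
)

# Convert month names to month numbers with base R's month.name constant,
# then build a Date for the first day of each month
eggs_mini$date <- make_date(
  year  = eggs_mini$year,
  month = match(eggs_mini$month, month.name),
  day   = 1
)

str(eggs_mini)
```

Pasting the output of `dput(eggs_mini)` into the request lets others recreate the exact object, and the reprex package is another option for packaging the code together with its output.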
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
O'Connell (2022, March 23). Data Analytics and Computational Social Science: Challenges. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsnorthonumgithubiodacss601-blogposts2022-03-16-challenges/
BibTeX citation
@misc{o'connell2022challenges,
  author = {O'Connell, Jason},
  title = {Data Analytics and Computational Social Science: Challenges},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsnorthonumgithubiodacss601-blogposts2022-03-16-challenges/},
  year = {2022}
}