Below is my attempt to organize a dataset on employment rates across different states. I split the data into two sets: one that isolates the totals rows, and one with the totals rows removed. I was able to do some calculations but ran into errors with others.
#State Data looks at the overall employment rates across different counties in different U.S. States.
…2 …6
2 NA NA
3 STATE TOTAL
4 AE 2.0
5 AE Total1 2
6 AK 7.0
7 AK 2.0
8 AK 3.0
9 AK 2.0
10 AK 1.0
# … with 2,983 more rows
#The table itself includes empty columns and rows which need to be removed.
state<-drop_na(state)
…2 …6
3 AE Total1 2
4 AK 7.0
5 AK 2.0
6 AK 3.0
7 AK 2.0
8 AK 1.0
9 AK 88.0
10 AK Total 103
# … with 2,976 more rows
colnames(state)
colnames(state)<-c("State", "Total")
state<-state[-c(1),]
# A tibble: 2,985 × 2
State Total
2 AE Total1 2
3 AK 7.0
4 AK 2.0
5 AK 3.0
6 AK 2.0
7 AK 1.0
8 AK 88.0
9 AK Total 103
10 AL 102.0
# … with 2,975 more rows
> state_totals<- state[c(2, 9, 77, 79, 152, 168, 224, 282, 291, 293, 297, 365, 518, 522, 622, 659, 763, 856, 952, 1072, 1136, 1148, 1174, 1191, 1270, 1357, 1473, 1552, 1606, 1701, 1751, 1841, 1852, 1874, 1904, 1917, 1979, 2068, 2142, 2176, 2242, 2248, 2295, 2348, 2440, 2662, 2688, 2781, 2796, 2836, 2906, 2960, 2983),]
State Total
2 AK Total 103
3 AL Total 4257
4 AP Total1 1
5 AR Total 3871
6 AZ Total 3153
7 CA Total 13137
8 CO Total 3650
9 CT Total 2592
10 DC Total 279
# … with 43 more rows
state_numbers<- state[-c(2, 9, 77, 79, 152, 168, 224, 282, 291, 293, 297, 365, 518, 522, 622, 659, 763, 856, 952, 1072, 1136, 1148, 1174, 1191, 1270, 1357, 1473, 1552, 1606, 1701, 1751, 1841, 1852, 1874, 1904, 1917, 1979, 2068, 2142, 2176, 2242, 2248, 2295, 2348, 2440, 2662, 2688, 2781, 2796, 2836, 2906, 2960, 2983),]
> state_numbers
# A tibble: 2,932 × 2
State Total
2 AK 7.0
3 AK 2.0
4 AK 3.0
5 AK 2.0
6 AK 1.0
7 AK 88.0
8 AL 102.0
9 AL 143.0
10 AL 1.0
# … with 2,922 more rows
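The split into totals and non-totals above relies on hard-coded row positions, which would need to be recounted if the source file changed. A possible alternative, sketched here on a small made-up tibble (`state_demo` and the `_demo` names are hypothetical, with values copied from the printed rows above), is to select rows by whether the `State` label contains the word "Total". This assumes every totals row is labelled that way, as the printed output suggests.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the cleaned `state` tibble; not the real data.
state_demo <- tibble(
  State = c("AE", "AE Total1", "AK", "AK", "AK Total", "AL"),
  Total = c("2.0", "2", "7.0", "2.0", "103", "102.0")
)

# Rows whose State label contains "Total" are treated as summary rows.
state_totals_demo <- state_demo %>%
  filter(str_detect(State, "Total"))

# Everything else is treated as an individual observation.
state_numbers_demo <- state_demo %>%
  filter(!str_detect(State, "Total"))
```

Because the pattern also matches labels such as "AE Total1", the variant spellings seen in the output would be caught as well.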
# not sure why this is happening, sadly cannot move forward with group_by() and summarise() functions for either data set.
state_numbers %>%
+ group_by(State) %>%
+ summarize(Ave_Total=mean(Total))
# A tibble: 56 × 2
State Ave_Total
<chr> <dbl>
1 AE NA
2 AK NA
3 AL NA
4 AP NA
5 AR NA
6 AZ NA
7 CA NA
8 CANADA NA
9 CO NA
10 CT NA
# … with 46 more rows
There were 50 or more warnings (use warnings() to see the first 50)
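A likely cause of the `NA` values is that `Total` is stored as text: `state_numbers` is a subset of the same `state` tibble whose `Total` column prints as `<chr>` elsewhere in this post, and `mean()` on a character vector returns `NA` with a warning. The sketch below uses a small made-up tibble (`state_numbers_demo` is hypothetical, with values copied from the printed rows) and converts the column with `as.numeric()` before summarising.

```r
library(dplyr)

# Toy stand-in for `state_numbers`; Total is deliberately character.
state_numbers_demo <- tibble(
  State = c("AK", "AK", "AK", "AL", "AL"),
  Total = c("7.0", "2.0", "3.0", "102.0", "143.0")
)

# Convert Total to numeric first, then average within each state.
avg_demo <- state_numbers_demo %>%
  mutate(Total = as.numeric(Total)) %>%
  group_by(State) %>%
  summarize(Ave_Total = mean(Total, na.rm = TRUE))
```

If some entries in the real column are not parseable as numbers, `as.numeric()` turns them into `NA` with a warning, which is why `na.rm = TRUE` is included here.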
state_totals %>%
+ + group_by("State") %>%
+ + summarize(Ave_Total=mean("Total"))
Error in UseMethod("group_by") :
no applicable method for 'group_by' applied to an object of class "character"
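The error message says `group_by()` received a character vector where it expected a data frame. That is consistent with the extra `+` copied from the console continuation prompt: unary `+` binds more tightly than `%>%`, so `group_by("State")` appears to be evaluated on its own with `"State"` as its data argument. Separately, quoting the column names would not do what is intended even without the stray `+`, since `mean("Total")` takes the mean of a literal string. A minimal sketch of the call without either problem, on a hypothetical `state_totals_demo` tibble with values copied from the printed totals:

```r
library(dplyr)

# Toy stand-in for `state_totals`; Total is character, as in the real data.
state_totals_demo <- tibble(
  State = c("AK Total", "AL Total", "AR Total"),
  Total = c("103", "4257", "3871")
)

# Unquoted column names, no stray "+", numeric Total.
per_state_demo <- state_totals_demo %>%
  mutate(Total = as.numeric(Total)) %>%
  group_by(State) %>%
  summarize(Ave_Total = mean(Total))

# Each label occurs once here, so an overall average may be more informative.
overall_demo <- state_totals_demo %>%
  mutate(Total = as.numeric(Total)) %>%
  summarize(Ave_Total = mean(Total))
```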
glimpse(state_totals)
Rows: 53
Columns: 2
$ State <chr> "AE Total1", "AK Total", "AL Total", "AP Total1", "AR Total", "AZ…
$ Total <chr> "2", "103", "4257", "1", "3871", "3153", "13137", "3650", "2592",…
ggplot(data = state_totals) + geom_bar(mapping = aes(x = State, y = Total), stat = "identity")
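Since `glimpse()` shows `Total` as `<chr>`, the plot above would treat each total as a category rather than a bar height. A sketch of the same chart with a numeric `Total`, again on a hypothetical `state_totals_demo` tibble; `geom_col()` is used as the equivalent of `geom_bar(stat = "identity")`.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for `state_totals`; not the real data.
state_totals_demo <- tibble(
  State = c("AK Total", "AL Total", "AR Total"),
  Total = c("103", "4257", "3871")
)

# Convert Total so that bar heights are proportional to the counts.
plot_data_demo <- state_totals_demo %>%
  mutate(Total = as.numeric(Total))

p <- ggplot(data = plot_data_demo) +
  geom_col(mapping = aes(x = State, y = Total))
```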
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Dodds (2022, Feb. 23). Data Analytics and Computational Social Science: Wrangling data. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomaristotle1869077/
BibTeX citation
@misc{dodds2022wrangling,
  author = {Dodds, Eris},
  title = {Data Analytics and Computational Social Science: Wrangling data},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomaristotle1869077/},
  year = {2022}
}