Data Analytics and Computational Social Science: Wrangling data

Eris Dodds

# The state data set looks at overall employment totals across the counties of different U.S. states.

# A tibble: 2,993 × 2
   ...2      ...6
 1 NA        NA
 2 NA        NA
 3 STATE     TOTAL
 4 AE        2.0
 5 AE Total1 2
 6 AK        7.0
 7 AK        2.0
 8 AK        3.0
 9 AK        2.0
10 AK        1.0
# … with 2,983 more rows

# The table itself includes empty rows and columns, which need to be removed.

state<-drop_na(state)

# A tibble: 2,986 × 2
   ...2      ...6
 1 STATE     TOTAL
 2 AE        2.0
 3 AE Total1 2
 4 AK        7.0
 5 AK        2.0
 6 AK        3.0
 7 AK        2.0
 8 AK        1.0
 9 AK        88.0
10 AK Total  103
# … with 2,976 more rows
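To check what `drop_na()` does in this step, the sketch below runs it on a small made-up tibble. The values are hypothetical and do not come from the state data. When called with no column arguments, `drop_na()` drops every row that has an `NA` in any column and never removes columns. That matches the row count falling from 2,993 to 2,986 while the column count stays at 2.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data shaped like the raw import:
# two blank rows followed by ordinary state/total rows
toy <- tibble(
  state = c(NA, NA, "AE", "AK"),
  total = c(NA, NA, "2.0", "7.0")
)

# With no column arguments, drop_na() removes any row containing an NA
# in any column; the number of columns is unchanged
toy_clean <- drop_na(toy)
toy_clean
```

To drop only the rows where a particular column is missing, name that column, e.g. `drop_na(toy, state)`.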

colnames(state)
colnames(state) <- c("State", "Total")
# Drop the first row, which still holds the original header text ("STATE", "TOTAL")
state <- state[-c(1), ]
state
# A tibble: 2,985 × 2
   State     Total
 1 AE        2.0
 2 AE Total1 2
 3 AK        7.0
 4 AK        2.0
 5 AK        3.0
 6 AK        2.0
 7 AK        1.0
 8 AK        88.0
 9 AK Total  103
10 AL        102.0
# … with 2,975 more rows
state_totals <- state[c(2, 9, 77, 79, 152, 168, 224, 282, 291, 293, 297, 365, 518, 522, 622, 659, 763, 856, 952, 1072, 1136, 1148, 1174, 1191, 1270, 1357, 1473, 1552, 1606, 1701, 1751, 1841, 1852, 1874, 1904, 1917, 1979, 2068, 2142, 2176, 2242, 2248, 2295, 2348, 2440, 2662, 2688, 2781, 2796, 2836, 2906, 2960, 2983), ]

Isolating the totaled state data from the individual county data.

# A tibble: 53 × 2
   State     Total
 1 AE Total1 2
 2 AK Total  103
 3 AL Total  4257
 4 AP Total1 1
 5 AR Total  3871
 6 AZ Total  3153
 7 CA Total  13137
 8 CO Total  3650
 9 CT Total  2592
10 DC Total  279
# … with 43 more rows

state_numbers <- state[-c(2, 9, 77, 79, 152, 168, 224, 282, 291, 293, 297, 365, 518, 522, 622, 659, 763, 856, 952, 1072, 1136, 1148, 1174, 1191, 1270, 1357, 1473, 1552, 1606, 1701, 1751, 1841, 1852, 1874, 1904, 1917, 1979, 2068, 2142, 2176, 2242, 2248, 2295, 2348, 2440, 2662, 2688, 2781, 2796, 2836, 2906, 2960, 2983), ]
state_numbers
# A tibble: 2,932 × 2
   State Total
 1 AE    2.0
 2 AK    7.0
 3 AK    2.0
 4 AK    3.0
 5 AK    2.0
 6 AK    1.0
 7 AK    88.0
 8 AL    102.0
 9 AL    143.0
10 AL    1.0
# … with 2,922 more rows
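The hard-coded row numbers above work for this particular file, but they stop working as soon as a single row is added or removed. The sketch below shows a pattern-based alternative. It assumes, without checking against the full 2,985 rows, that a row is a state subtotal exactly when its `State` value contains the word "Total", as in "AK Total" and "AE Total1". The sample rows are copied from the printed output above, and the `*_sample` names are hypothetical.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical sample in the same shape as `state` (rows from the output above)
state_sample <- tibble(
  State = c("AE", "AE Total1", "AK", "AK", "AK Total", "AL"),
  Total = c("2.0", "2", "7.0", "88.0", "103", "102.0")
)

# Assumption: subtotal rows, and only those, contain "Total" in State
totals_sample  <- filter(state_sample, str_detect(State, "Total"))
numbers_sample <- filter(state_sample, !str_detect(State, "Total"))
```

On the real data, you can compare `nrow()` of the filtered result with the 53 rows selected by index to check the assumption.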

# Not sure why this is happening; sadly, I cannot move forward with the group_by() and summarise() functions for either data set.

state_numbers %>%
  group_by(State) %>%
  summarize(Ave_Total = mean(Total))
# A tibble: 56 × 2
   State  Ave_Total
   <chr>      <dbl>
 1 AE            NA
 2 AK            NA
 3 AL            NA
 4 AP            NA
 5 AR            NA
 6 AZ            NA
 7 CA            NA
 8 CANADA        NA
 9 CO            NA
10 CT            NA
# … with 46 more rows
There were 50 or more warnings (use warnings() to see the first 50)

state_totals %>%
+     + group_by("State") %>%
+     + summarize(Ave_Total=mean("Total"))
Error in UseMethod("group_by") : 
  no applicable method for 'group_by' applied to an object of class "character"
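Two things in the call above probably contribute to this error. First, each continued line begins with a `+` that looks like the console's continuation prompt pasted back in with the code. R reads that `+` as a unary plus, so the pipe most likely evaluates `` `+`(state_totals, group_by("State")) ``. In that form `group_by()` is called on the string "State" alone, which matches the `"character"` in the message. Second, dplyr verbs treat a quoted name such as "State" as a constant string, not as a reference to a column. The sketch below illustrates only the second point, on hypothetical toy data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data shaped like state_totals (Total stored as text)
toy <- tibble(
  State = c("AK Total", "AL Total"),
  Total = c("103", "4257")
)

# Quoted: "State" is evaluated as a constant, so group_by() adds a new
# column holding the text "State" and every row falls into one group
quoted <- group_by(toy, "State")
n_groups(quoted)  # 1

# Bare column name: groups by the values stored in the State column
bare <- group_by(toy, State)
n_groups(bare)    # 2
```

The same applies to `mean("Total")`. It averages the literal string "Total" rather than the column, so it returns `NA` with a warning.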
  
glimpse(state_totals)
Rows: 53
Columns: 2
$ State <chr> "AE Total1", "AK Total", "AL Total", "AP Total1", "AR Total", "AZ…
$ Total <chr> "2", "103", "4257", "1", "3871", "3153", "13137", "3650", "2592",…
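The `glimpse()` output points to a likely cause of the `NA` averages: `Total` is stored as `<chr>`. When `mean()` receives a character vector, it returns `NA` and warns "argument is not numeric or logical: returning NA" once per group. With 56 groups, that is consistent with the "50 or more warnings" message above. `state_numbers` comes from the same `state` tibble, so its `Total` column should be character too. The sketch below converts the column with `as.numeric()` before grouping. It uses only the first rows printed above, so its averages are illustrative and are not the real per-state means.

```r
library(dplyr)
library(tibble)

# Sample rows copied from the state_numbers output (Total is character)
state_numbers_sample <- tibble(
  State = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  Total = c("2.0", "7.0", "2.0", "3.0", "2.0", "1.0", "88.0",
            "102.0", "143.0", "1.0")
)

state_means <- state_numbers_sample %>%
  # Convert text to numbers first; any non-numeric entry becomes NA (with a warning)
  mutate(Total = as.numeric(Total)) %>%
  group_by(State) %>%
  # na.rm = TRUE skips entries that failed to convert
  summarise(Ave_Total = mean(Total, na.rm = TRUE))

state_means
```

Using `na.rm = TRUE` is a design choice. It keeps the summary going if a few values fail to convert, but it can also hide them, so check for new `NA`s after `as.numeric()`.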


ggplot(data = state_totals) + geom_bar(mapping = aes(x = State, y = Total), stat = "identity")
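Because `Total` is character, `ggplot()` will likely treat it as a discrete variable. The y axis would then show text labels sorted alphabetically, not a numeric scale. The sketch below converts `Total` first. It uses `geom_col()`, which is ggplot2's shorthand for `geom_bar(stat = "identity")`, orders the bars by value, and flips the axes so the state labels remain readable. The sample rows are copied from the `state_totals` output above, and `state_totals_sample` and `plot_data` are hypothetical names.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Sample rows from the state_totals output (Total still character)
state_totals_sample <- tibble(
  State = c("AE Total1", "AK Total", "AL Total", "AP Total1", "AR Total"),
  Total = c("2", "103", "4257", "1", "3871")
)

# Convert to numbers so the bar heights reflect the values
plot_data <- state_totals_sample %>%
  mutate(Total = as.numeric(Total))

p <- ggplot(plot_data, aes(x = reorder(State, Total), y = Total)) +
  geom_col() +
  coord_flip() +  # horizontal bars keep many state labels legible
  labs(x = "State", y = "Total")

p
```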









Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Dodds (2022, Feb. 23). Data Analytics and Computational Social Science: Wrangling data. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomaristotle1869077/

BibTeX citation

@misc{dodds2022wrangling,
  author = {Dodds, Eris},
  title = {Data Analytics and Computational Social Science: Wrangling data},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomaristotle1869077/},
  year = {2022}
}