1987 USDA Ag Census –trying to make it useable by creating search functions
The underlying data files are read-in and tidied in another script.
usable output of that script includes
tidy_MJ_Counties[includes 1987 crime data, USDA county typology, USDA rural county econ codes; census regions; eliminates counties with 0 population and 0 MJ manufacture arrests; selects only MJ crimes; filters only counties with code >3 (eg no major metro areas)]
MJ_Counties_joinedUSDA [includes everything in tidy_MJ_Counties + all 1987 Ag Census data, minus the columns called flag]
USDA35206_labels [codebook for Ag Census data]
There are a couple of notable features of the data as I’ve tidied it. MJ_Counties_joinedUSDA has 1186 observations of 2637 variables, so many that it’s not super useable. The variables are identified with codes, and I have a lookup table USDA35206_labels. I first tried manually breaking the variables into useful, thematic subsets.
#sample keyword searches and colnameslookup
farmvalue <- MJ_Counties_joinedUSDA %>%
select(FIPS5, ITEM01019:ITEM01029)
head(farmvalue)
# A tibble: 6 x 12
FIPS5 ITEM01019 ITEM01020 ITEM01021 ITEM01022 ITEM01023 ITEM01024
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01005 19702 39562 10891 8811 138 57
2 01017 4389 12024 529 3860 157 60
3 01019 26593 54606 16356 10237 164 84
4 01021 9454 13429 5528 3927 250 171
5 01023 4146 13865 349 3797 126 77
6 01031 73123 86844 15076 58047 185 98
# ... with 5 more variables: ITEM01025 <dbl>, ITEM01026 <dbl>,
# ITEM01027 <dbl>, ITEM01028 <dbl>, ITEM01029 <dbl>
I wanted to rename those variables from the lookup table. I wrote a function that would look those up. I will admit that I don’t really understand why it works, but it does.
#create function to lookup variable names, to be used on the results of a search
USDA_names <- function(x){
names(x)[names(x) %in% USDA35206_labels$code] = USDA35206_labels$name[match(names(x)[names(x) %in% USDA35206_labels$code], USDA35206_labels$code)]
}
farmvalue <- MJ_Counties_joinedUSDA %>%
select(FIPS5, ITEM01019:ITEM01029)
colnames(farmvalue) <- USDA_names(farmvalue)
head(farmvalue)
# A tibble: 6 x 12
FIPS5 `Market value of ag~ `Market value of ag ~ `Crops,incl nurser~
<chr> <dbl> <dbl> <dbl>
1 01005 19702 39562 10891
2 01017 4389 12024 529
3 01019 26593 54606 16356
4 01021 9454 13429 5528
5 01023 4146 13865 349
6 01031 73123 86844 15076
# ... with 8 more variables:
# Livestock, poultry&their prod ($1,000), 1987 <dbl>,
# Farms by value of sales: Less than $2,500, 1987 <dbl>,
# Farms by value of sales: $2,500 to $4,999, 1987 <dbl>,
# Farms by value of sales: $5,000 to $9,999, 1987 <dbl>,
# Farms by value of sales: $10,000 to $24,999, 1987 <dbl>,
# Farms by value of sales: $25,000 to $49,999, 1987 <dbl>,
# Farms by value of sales: $50,000 to $99,999, 1987 <dbl>,
# Farms by value of sales: $100,000 or more, 1987 <dbl>
That is better. I wanted to write a function that would enable me to do a search for a keyword in the lookup table and get a subset of variables that matched those keywords. The first step was to write out the steps. The first example here is the result, with the keyword “operator.” The result is a table called operator_chars that has all the variables that contain the word operator (USDA-speak for farmer).
#filter variables for those containing keywords, then select from MJ_Counties_joinedUSDA and rename
operator_filter <- USDA35206_labels %>%
filter(grepl("operator", name, ignore.case = TRUE))
operator_chars <- MJ_Counties_joinedUSDA %>%
select(FIPS5, all_of(operator_filter$code))
colnames(operator_chars) <- USDA_names(operator_chars)
Since that worked, I created a function to do that.
#create function to systematize breaking up the data
#input must be in quotes
searchUSDA <- function(x){
y <- USDA35206_labels %>%
filter(grepl(x, name, ignore.case = TRUE)) %>%
add_row(MJcolnames)
result <- MJ_Counties_joinedUSDA[, y$code, with = TRUE]
colnames(result) <- USDA_names(result)
view(result)
}
#to call the function
keyword <- searchUSDA("keyword")
That works really well for keyword searching. There are a couple of problems. The biggest problem is that I have not figured out how to name the output. I would like the user input (x) to become the name of the output. As is, the name is result until/unless I rename it in calling the function. I have tried many things to make the user input be used as the name of an object, but R never likes those attempts.
The reason it matters is because I want to iterate searchUSDA over a vector of search terms. Ideally the result would be a series of tables, each named after the search term used to create it.
It is easy enough to iterate the function over a string of search terms.
The output of this is unsatisfying. The object name “result” gets replaced with each iteration, so the only one that actually remains at the end of process is the last item in the list (in this case, the search for “exp”). It’s clear in the console that it has in fact iterated through each search term, but it discards the results with the next iteration because it writes over the object “result.” I have been trying to figure out how to get it to rename the output with each iteration, ideally by naming it whatever the search term was. I haven’t figured out how to give each iteration a separate name, though.
I have tried assigning the result to x (the user input), which doesn’t work. I have tried {{x}}. I have tried various other things.
Or perhaps there is a way to create a temporary name that gets replaced.
#throws error: Error in colnames("x") <- USDA_names("x") : target of assignment expands to non-language object
searchUSDA0 <- function(x){
y <- USDA35206_labels %>%
filter(grepl(x, name, ignore.case = TRUE)) %>%
add_row(MJcolnames)
"x" <- MJ_Counties_joinedUSDA[, y$code, with = TRUE]
colnames("x") <- USDA_names("x")
}
#throws error: "Error in { : could not find function "{<-""
searchUSDA1 <- function(x){
y <- USDA35206_labels %>%
filter(grepl(x, name, ignore.case = TRUE)) %>%
add_row(MJcolnames)
{{x}} <- MJ_Counties_joinedUSDA[, y$code, with = TRUE]
colnames({{x}}) <- USDA_names({{x}})
}
#doesn't throw an error, but only returns the variable names (eg it doesn't actually do the second step)
searchUSDA2 <- function(x){
y <- USDA35206_labels %>%
filter(grepl(x, name, ignore.case = TRUE)) %>%
add_row(MJcolnames)
x <- MJ_Counties_joinedUSDA[, y$code, with = TRUE]
colnames(x) <- USDA_names(x)
}
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Merleaux (2022, April 3). Data Analytics and Computational Social Science: Functions to help organize unwieldy data. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomamerleaux885445/
BibTeX citation
@misc{merleaux2022functions, author = {Merleaux, April}, title = {Data Analytics and Computational Social Science: Functions to help organize unwieldy data}, url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomamerleaux885445/}, year = {2022} }