This is Homework 2: pulling JSON data from the USDA FoodData Central API.
The API returns at most 200 items per call, so iterate 50 times over pageNumber to build a dataset of 10,000 items:
library(httr)
library(jsonlite)
library(tidyverse)

# API key
api_key <- "CUz3HomE1xtius6ZtxNx4MnWny9tgvYT0mWS4HDb"
foodlist <- list()
for (i in 1:50) {
  # Build the request URL, passing the loop counter as pageNumber
  url <- paste0("https://api.nal.usda.gov/fdc/v1/foods/list?api_key=", api_key,
                "&pageNumber=", i, "&pageSize=200")
  page <- fromJSON(rawToChar(GET(url)$content))
  # The first page initializes the dataset; later pages are appended to it
  if (i == 1) {
    foodlist <- page
  } else {
    foodlist <- rbind(foodlist, page)
  }
}
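The pagination arithmetic and the request itself can also be sketched a little more defensively. This is a minimal sketch, not the code used for this homework: `pages_needed` and `fetch_food_page` are hypothetical helper names. It passes the parameters through httr's `query` argument and calls `stop_for_status()` so that a failed request raises an error instead of returning a malformed page.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: number of pages needed for a target item count
pages_needed <- function(n_items, page_size = 200) {
  ceiling(n_items / page_size)
}

# Hypothetical helper: fetch one page of the foods list as a data frame
fetch_food_page <- function(page, api_key, page_size = 200) {
  resp <- GET(
    "https://api.nal.usda.gov/fdc/v1/foods/list",
    query = list(api_key = api_key, pageNumber = page, pageSize = page_size)
  )
  stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx responses
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}

# 10,000 items at 200 items per page
n_pages <- pages_needed(10000)
```

With a valid key stored in a variable such as `api_key`, the pages could then be collected with `lapply(seq_len(n_pages), fetch_food_page, api_key = api_key)` and combined afterwards.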
The data contains nested tables in the foodNutrients column. Unnest them and rename the nutrient columns:
foodlist <- foodlist %>% unnest(cols = c(foodNutrients)) %>% rename(measureID = number, measureName = name, value = amount )
print(head(foodlist))
# A tibble: 6 × 12
    fdcId description   dataType    publicationDate foodCode measureID
    <int> <chr>         <chr>       <chr>           <chr>    <chr>
1 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 203
2 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 204
3 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 205
4 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 208
5 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 221
6 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 255
# … with 6 more variables: measureName <chr>, value <dbl>,
#   unitName <chr>, derivationCode <chr>,
#   derivationDescription <chr>, ndbNumber <chr>
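To make the unnest-and-rename step concrete, here is a minimal sketch on a toy table. The toy input is an illustration built from the first two nutrient rows shown for "100 GRAND Bar"; `toy` and `toy_long` are hypothetical names, and the real API response has more columns than this.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy input: one food row whose foodNutrients cell holds a nested table
toy <- tibble(
  fdcId = 1104067L,
  description = "100 GRAND Bar",
  foodNutrients = list(tibble(
    number = c("203", "204"),
    name   = c("Protein", "Total lipid (fat)"),
    amount = c(2.5, 19.3)
  ))
)

# unnest() expands the nested table into one row per nutrient and repeats
# the food-level columns; rename() then gives the nutrient columns clearer names
toy_long <- toy %>%
  unnest(cols = c(foodNutrients)) %>%
  rename(measureID = number, measureName = name, value = amount)
```

The single food row becomes two rows, one per nutrient, which is why the unnested dataset has far more rows than the number of foods requested.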
Describe the data:
str(foodlist)
tibble [600,800 × 12] (S3: tbl_df/tbl/data.frame)
$ fdcId : int [1:600800] 1104067 1104067 1104067 1104067 1104067 1104067 1104067 1104067 1104067 1104067 ...
$ description : chr [1:600800] "100 GRAND Bar" "100 GRAND Bar" "100 GRAND Bar" "100 GRAND Bar" ...
$ dataType : chr [1:600800] "Survey (FNDDS)" "Survey (FNDDS)" "Survey (FNDDS)" "Survey (FNDDS)" ...
$ publicationDate : chr [1:600800] "2020-10-30" "2020-10-30" "2020-10-30" "2020-10-30" ...
$ foodCode : chr [1:600800] "91715300" "91715300" "91715300" "91715300" ...
$ measureID : chr [1:600800] "203" "204" "205" "208" ...
$ measureName : chr [1:600800] "Protein" "Total lipid (fat)" "Carbohydrate, by difference" "Energy" ...
$ value : num [1:600800] 2.5 19.3 71 468 0 6.1 8 55 51.9 1 ...
$ unitName : chr [1:600800] "G" "G" "G" "KCAL" ...
$ derivationCode : chr [1:600800] NA NA NA NA ...
$ derivationDescription: chr [1:600800] NA NA NA NA ...
$ ndbNumber : chr [1:600800] NA NA NA NA ...
Sort data by Measure ID:
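The code for this step is not shown here. A minimal sketch, assuming the sort was done with dplyr's `arrange()` on measureID (an assumption, though it is consistent with the per-dataType output, where the first rows all share one measureID), illustrated on a toy table with hypothetical names `toy` and `toy_sorted`:

```r
library(dplyr)
library(tibble)

# Toy stand-in for foodlist; measureID is a character column, as in the str() output
toy <- tibble(
  description = c("100 GRAND Bar", "100 GRAND Bar", "Abiyuch, raw"),
  measureID   = c("204", "203", "203")
)

# arrange() sorts rows by measureID; because the column is character,
# the ordering is lexicographic rather than numeric
toy_sorted <- toy %>% arrange(measureID)
```

Applied to the full dataset, the equivalent call would be `foodlist <- foodlist %>% arrange(measureID)`.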
Create a list of datasets, one for each dataType:
dataTypes <- distinct(foodlist,dataType)
foodlist_by_dataType <- list()
for (i in seq_along(dataTypes[[1]])){
foodlist_by_dataType[[i]] <- foodlist %>% filter(dataType == dataTypes[[1]][i])
}
head(foodlist_by_dataType)
[[1]]
# A tibble: 13,950 × 12
     fdcId description     dataType publicationDate foodCode measureID
     <int> <chr>           <chr>    <chr>           <chr>    <chr>
 1 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
 2 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
 3 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
 4 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
 5 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
 6 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
 7 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
 8 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
 9 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
10 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""
# … with 13,940 more rows, and 6 more variables: measureName <chr>,
#   value <dbl>, unitName <chr>, derivationCode <chr>,
#   derivationDescription <chr>, ndbNumber <chr>
[[2]]
# A tibble: 256,750 × 12
     fdcId description     dataType publicationDate foodCode measureID
     <int> <chr>           <chr>    <chr>           <chr>    <chr>
 1 1104067 100 GRAND Bar   Survey … 2020-10-30      91715300 203
 2 1104086 3 MUSKETEERS B… Survey … 2020-10-30      91726420 203
 3 1104087 3 Musketeers T… Survey … 2020-10-30      91726425 203
 4 1099098 Abalone, cooke… Survey … 2020-10-30      26301110 203
 5 1099099 Abalone, flour… Survey … 2020-10-30      26301140 203
 6 1099100 Abalone, steam… Survey … 2020-10-30      26301160 203
 7 1102193 Adobo, with no… Survey … 2020-10-30      58137300 203
 8 1102343 Adobo, with ri… Survey … 2020-10-30      58150530 203
 9 1103957 Agave liquid s… Survey … 2020-10-30      91302020 203
10 1101164 Air filled fri… Survey … 2020-10-30      53420300 203
# … with 256,740 more rows, and 6 more variables: measureName <chr>,
#   value <dbl>, unitName <chr>, derivationCode <chr>,
#   derivationDescription <chr>, ndbNumber <chr>
[[3]]
# A tibble: 330,100 × 12
    fdcId description      dataType publicationDate foodCode measureID
    <int> <chr>            <chr>    <chr>           <chr>    <chr>
 1 167782 Abiyuch, raw     SR Lega… 2019-04-01      <NA>     203
 2 171687 Acerola juice, … SR Lega… 2019-04-01      <NA>     203
 3 171686 Acerola, (west … SR Lega… 2019-04-01      <NA>     203
 4 168061 Acorn stew (Apa… SR Lega… 2019-04-01      <NA>     203
 5 168992 Agave, cooked (… SR Lega… 2019-04-01      <NA>     203
 6 168993 Agave, dried (S… SR Lega… 2019-04-01      <NA>     203
 7 169814 Agave, raw (Sou… SR Lega… 2019-04-01      <NA>     203
 8 169823 Agutuk, fish wi… SR Lega… 2019-04-01      <NA>     203
 9 168976 Agutuk, fish/be… SR Lega… 2019-04-01      <NA>     203
10 168977 Agutuk, meat-ca… SR Lega… 2019-04-01      <NA>     203
# … with 330,090 more rows, and 6 more variables: measureName <chr>,
#   value <dbl>, unitName <chr>, derivationCode <chr>,
#   derivationDescription <chr>, ndbNumber <chr>
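The per-dataType list built by the loop above can also be produced in one call. A minimal sketch using base R's `split()`, shown on a toy table: `toy` and `toy_by_type` are hypothetical names, and "Other" is a placeholder label rather than a real dataType value.

```r
library(tibble)

# Toy stand-in for foodlist with two dataType values
toy <- tibble(
  description = c("100 GRAND Bar", "Abiyuch, raw", "100 GRAND Bar"),
  dataType    = c("Survey (FNDDS)", "Other", "Survey (FNDDS)")  # "Other" is a placeholder
)

# split() returns a named list with one data frame per distinct dataType value
toy_by_type <- split(toy, toy$dataType)
names(toy_by_type)
```

Unlike the loop, which indexes the list by position, `split()` names each element after its dataType, so a subset can be retrieved with `toy_by_type[["Survey (FNDDS)"]]`.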
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Markowitz (2022, Feb. 9). Data Analytics and Computational Social Science: HW2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomamarkowitzhw2/
BibTeX citation
@misc{markowitz2022hw2,
  author = {Markowitz, Ari},
  title = {Data Analytics and Computational Social Science: HW2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomamarkowitzhw2/},
  year = {2022}
}