HW2

Homework 2: pulling JSON data from the USDA FoodData Central API.

Ari Markowitz
2022-02-08

Setup

knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
library(tidyverse)
library(readxl)
library(httr)
library(jsonlite)
knitr::opts_knit$set(root.dir = "/Users/Lion_1/Desktop/DACSS/601/Datasets")

Read Food Data from the API

The API returns at most 200 items per call, so iterate over pageNumber 50 times to build a dataset of up to 10,000 items (50 × 200):

# API key: CUz3HomE1xtius6ZtxNx4MnWny9tgvYT0mWS4HDb
api_key <- "CUz3HomE1xtius6ZtxNx4MnWny9tgvYT0mWS4HDb"
base_url <- "https://api.nal.usda.gov/fdc/v1/foods/list"
foodlist <- list()
# Request pages 1 through 50 (200 items each) and stack the results
for (i in 1:50) {
  # Use the loop counter as the page number so each call fetches a new page
  url <- paste0(base_url, "?api_key=", api_key, "&pageNumber=", i, "&pageSize=200")
  page <- fromJSON(rawToChar(GET(url)$content))
  foodlist <- if (i == 1) page else rbind(foodlist, page)
}
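The paging arithmetic and URL construction can be checked without touching the network. The sketch below is only an illustration: `build_page_url` is a hypothetical helper, `"DEMO_KEY"` is a placeholder rather than a working key, and the parameter spelling `pageSize` is my assumption about what the API expects.

```r
# Hypothetical helper: build the request URL for one page of results.
# Endpoint as used in this post; the api_key value below is a placeholder.
build_page_url <- function(page, api_key, page_size = 200) {
  paste0(
    "https://api.nal.usda.gov/fdc/v1/foods/list",
    "?api_key=", api_key,
    "&pageNumber=", page,
    "&pageSize=", page_size
  )
}

n_pages <- 50
page_size <- 200

# One URL per page; each should differ only in its pageNumber
urls <- vapply(seq_len(n_pages), build_page_url, character(1),
               api_key = "DEMO_KEY", page_size = page_size)

# Upper bound on items retrieved: pages times items per page
max_items <- n_pages * page_size
```

Inspecting `urls` before making any requests is a cheap way to confirm that the page number really changes from call to call.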

Clean Up Data

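Each food's nutrients arrive as a nested table in the foodNutrients column. Unnest them so there is one row per food and nutrient, and rename the nutrient columns: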

foodlist <- foodlist %>% unnest(cols = c(foodNutrients)) %>% rename(measureID = number, measureName = name, value = amount )
print(head(foodlist))
# A tibble: 6 × 12
    fdcId description   dataType    publicationDate foodCode measureID
    <int> <chr>         <chr>       <chr>           <chr>    <chr>    
1 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 203      
2 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 204      
3 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 205      
4 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 208      
5 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 221      
6 1104067 100 GRAND Bar Survey (FN… 2020-10-30      91715300 255      
# … with 6 more variables: measureName <chr>, value <dbl>,
#   unitName <chr>, derivationCode <chr>,
#   derivationDescription <chr>, ndbNumber <chr>
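To see what the unnest step does on a small scale, here is a minimal sketch on made-up data. The `toy` tibble and its values are invented for illustration; only the column names `foodNutrients`, `number`, `name`, and `amount` are taken from the call above.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the API result: one row per food,
# with that food's nutrients nested in a list-column
toy <- tibble(
  fdcId = c(1L, 2L),
  description = c("Food A", "Food B"),
  foodNutrients = list(
    tibble(number = c("203", "204"),
           name = c("Protein", "Total lipid (fat)"),
           amount = c(2.5, 19.3)),
    tibble(number = "203", name = "Protein", amount = 1.0)
  )
)

# Expand to one row per food-nutrient pair, then rename as in the post
toy_long <- toy %>%
  unnest(cols = c(foodNutrients)) %>%
  rename(measureID = number, measureName = name, value = amount)
```

The food-level columns (`fdcId`, `description`) are repeated on every nutrient row, which is why the real dataset grows from thousands of foods to hundreds of thousands of rows.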

Describe the data:

str(foodlist)
tibble [600,800 × 12] (S3: tbl_df/tbl/data.frame)
 $ fdcId                : int [1:600800] 1104067 1104067 1104067 1104067 1104067 1104067 1104067 1104067 1104067 1104067 ...
 $ description          : chr [1:600800] "100 GRAND Bar" "100 GRAND Bar" "100 GRAND Bar" "100 GRAND Bar" ...
 $ dataType             : chr [1:600800] "Survey (FNDDS)" "Survey (FNDDS)" "Survey (FNDDS)" "Survey (FNDDS)" ...
 $ publicationDate      : chr [1:600800] "2020-10-30" "2020-10-30" "2020-10-30" "2020-10-30" ...
 $ foodCode             : chr [1:600800] "91715300" "91715300" "91715300" "91715300" ...
 $ measureID            : chr [1:600800] "203" "204" "205" "208" ...
 $ measureName          : chr [1:600800] "Protein" "Total lipid (fat)" "Carbohydrate, by difference" "Energy" ...
 $ value                : num [1:600800] 2.5 19.3 71 468 0 6.1 8 55 51.9 1 ...
 $ unitName             : chr [1:600800] "G" "G" "G" "KCAL" ...
 $ derivationCode       : chr [1:600800] NA NA NA NA ...
 $ derivationDescription: chr [1:600800] NA NA NA NA ...
 $ ndbNumber            : chr [1:600800] NA NA NA NA ...

Sort the data by measureID. This is a character column, so the order is lexical rather than numeric:

foodlist <- foodlist %>% arrange(measureID)
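Because `measureID` is stored as character (see the `str()` output above), sorting on it compares strings rather than numbers. The sketch below uses a few made-up IDs to show the difference; converting to integer is one possible alternative, not something the original analysis does.

```r
# Made-up IDs, stored as character like measureID
ids <- c("203", "1003", "")

# Character sort: the empty string comes first, and "1003" precedes "203"
lexical <- sort(ids)

# Numeric sort: convert first; "" becomes NA and order() places it last
numeric_order <- ids[order(as.integer(ids))]
```

This also explains why rows with an empty `measureID` appear at the top of a sorted result.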

Create a list of datasets, one for each dataType:

dataTypes <- distinct(foodlist,dataType)
foodlist_by_dataType <- list()
for (i in seq_along(dataTypes[[1]])){
  foodlist_by_dataType[[i]] <- foodlist %>% filter(dataType == dataTypes[[1]][i])
}
head(foodlist_by_dataType)
[[1]]
# A tibble: 13,950 × 12
     fdcId description     dataType publicationDate foodCode measureID
     <int> <chr>           <chr>    <chr>           <chr>    <chr>    
 1 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
 2 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
 3 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
 4 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
 5 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
 6 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
 7 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
 8 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
 9 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
10 1999631 Almond milk, u… Foundat… 2021-10-28      <NA>     ""       
# … with 13,940 more rows, and 6 more variables: measureName <chr>,
#   value <dbl>, unitName <chr>, derivationCode <chr>,
#   derivationDescription <chr>, ndbNumber <chr>

[[2]]
# A tibble: 256,750 × 12
     fdcId description     dataType publicationDate foodCode measureID
     <int> <chr>           <chr>    <chr>           <chr>    <chr>    
 1 1104067 100 GRAND Bar   Survey … 2020-10-30      91715300 203      
 2 1104086 3 MUSKETEERS B… Survey … 2020-10-30      91726420 203      
 3 1104087 3 Musketeers T… Survey … 2020-10-30      91726425 203      
 4 1099098 Abalone, cooke… Survey … 2020-10-30      26301110 203      
 5 1099099 Abalone, flour… Survey … 2020-10-30      26301140 203      
 6 1099100 Abalone, steam… Survey … 2020-10-30      26301160 203      
 7 1102193 Adobo, with no… Survey … 2020-10-30      58137300 203      
 8 1102343 Adobo, with ri… Survey … 2020-10-30      58150530 203      
 9 1103957 Agave liquid s… Survey … 2020-10-30      91302020 203      
10 1101164 Air filled fri… Survey … 2020-10-30      53420300 203      
# … with 256,740 more rows, and 6 more variables: measureName <chr>,
#   value <dbl>, unitName <chr>, derivationCode <chr>,
#   derivationDescription <chr>, ndbNumber <chr>

[[3]]
# A tibble: 330,100 × 12
    fdcId description      dataType publicationDate foodCode measureID
    <int> <chr>            <chr>    <chr>           <chr>    <chr>    
 1 167782 Abiyuch, raw     SR Lega… 2019-04-01      <NA>     203      
 2 171687 Acerola juice, … SR Lega… 2019-04-01      <NA>     203      
 3 171686 Acerola, (west … SR Lega… 2019-04-01      <NA>     203      
 4 168061 Acorn stew (Apa… SR Lega… 2019-04-01      <NA>     203      
 5 168992 Agave, cooked (… SR Lega… 2019-04-01      <NA>     203      
 6 168993 Agave, dried (S… SR Lega… 2019-04-01      <NA>     203      
 7 169814 Agave, raw (Sou… SR Lega… 2019-04-01      <NA>     203      
 8 169823 Agutuk, fish wi… SR Lega… 2019-04-01      <NA>     203      
 9 168976 Agutuk, fish/be… SR Lega… 2019-04-01      <NA>     203      
10 168977 Agutuk, meat-ca… SR Lega… 2019-04-01      <NA>     203      
# … with 330,090 more rows, and 6 more variables: measureName <chr>,
#   value <dbl>, unitName <chr>, derivationCode <chr>,
#   derivationDescription <chr>, ndbNumber <chr>
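An alternative to the loop, sketched here on made-up data, is base R's `split()`, which returns the same kind of list but names each element after its dataType value. The `toy` data frame is hypothetical and only stands in for `foodlist`.

```r
# Toy stand-in for the unnested data (values are made up for illustration)
toy <- data.frame(
  fdcId = c(1L, 2L, 3L, 4L),
  dataType = c("Foundation", "Survey (FNDDS)", "SR Legacy", "SR Legacy"),
  stringsAsFactors = FALSE
)

# split() returns a list of data frames, one per distinct dataType
toy_by_dataType <- split(toy, toy$dataType)

# Elements can then be looked up by name instead of by position
sr_legacy <- toy_by_dataType[["SR Legacy"]]
```

Named elements make it harder to confuse which position in the list holds which dataType.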

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Markowitz (2022, Feb. 9). Data Analytics and Computational Social Science: HW2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomamarkowitzhw2/

BibTeX citation

@misc{markowitz2022hw2,
  author = {Markowitz, Ari},
  title = {Data Analytics and Computational Social Science: HW2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomamarkowitzhw2/},
  year = {2022}
}