Challenge 1

challenge_1

railroads

faostat

wildbirds

Author

Matthew O’Neill

Published

October 5, 2022

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

read in a dataset, and
describe the dataset using both words and any supporting information (e.g., tables, etc)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

railroad_2012_clean_county.csv ⭐
birds.csv ⭐⭐
FAOstat*.csv ⭐⭐
wild_bird_data.xlsx ⭐⭐⭐
StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

Code

data <- read_csv("../posts/_data/FAOSTAT_cattle_dairy.csv")

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

Code

head(data)

# A tibble: 6 × 14
  Domai…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year Unit 
  <chr>   <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl> <chr>
1 QL      Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961 Head 
2 QL      Lives…       2 Afgh…    5420 Yield       882 Milk…    1961  1961 hg/An
3 QL      Lives…       2 Afgh…    5510 Produc…     882 Milk…    1961  1961 tonn…
4 QL      Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1962  1962 Head 
5 QL      Lives…       2 Afgh…    5420 Yield       882 Milk…    1962  1962 hg/An
6 QL      Lives…       2 Afgh…    5510 Produc…     882 Milk…    1962  1962 tonn…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
#   and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
#   ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`

Code

dim(data)

[1] 36449    14

Code

colnames(data)

 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"

To begin, I’ve printed the first few rows of the dataset to get a sense of its structure. Based on the output of head() (along with the file name, which refers to cattle and dairy), we can assume this dataset covers dairy production in various countries over many years. Some rows have a “Flag Description” of “FAO estimate”, which leads me to believe much of this data was collected by the Food and Agriculture Organization of the United Nations (FAO).

The column names aren’t very descriptive for this dataset: columns such as “Domain”, “Item”, and “Unit” are vague on their own. To get a better idea of what’s going on, we can look at each column in more detail.
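One quick way to see which columns carry information is to count the distinct values in each one. The sketch below uses base R on a small toy data frame copied from the Afghanistan 1961 rows of the head() output (only columns whose values are printed in full are included, with “tonnes” taken from the Unit table). The names `toy` and `n_distinct_per_column` are introduced here for illustration only.

```r
# Toy rows copied from the head() output (Afghanistan, 1961).
# check.names = FALSE keeps the original column names with spaces.
toy <- data.frame(
  `Domain Code`  = c("QL", "QL", "QL"),
  `Area Code`    = c(2, 2, 2),
  `Element Code` = c(5318, 5420, 5510),
  `Item Code`    = c(882, 882, 882),
  Year           = c(1961, 1961, 1961),
  Unit           = c("Head", "hg/An", "tonnes"),
  check.names = FALSE
)

# Hypothetical helper: number of distinct values in each column.
n_distinct_per_column <- function(df) {
  vapply(df, function(col) length(unique(col)), integer(1))
}

distinct_counts <- n_distinct_per_column(toy)
print(distinct_counts)
```

On the full dataset the equivalent call would be `n_distinct_per_column(data)`. Columns with a count of 1 are constant and can be described once rather than explored further.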

Code

domain <- select(data, "Domain")
table(domain)

Domain
Livestock Primary 
            36449

First, we can see that there appears to be only one domain in this dataset, Livestock Primary. This column and its code are likely useful mainly if the dataset is joined with another one that has the same column.
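If a code column is going to be used as a join key, it helps to confirm that each code corresponds to exactly one label. The following is a minimal base R sketch of such a check, run on toy rows built from values shown in the output above; `toy` and `is_one_to_one` are hypothetical names, not part of the original analysis.

```r
# Toy rows using the Domain and Item values shown in the output above.
toy <- data.frame(
  `Domain Code` = c("QL", "QL", "QL"),
  Domain        = rep("Livestock Primary", 3),
  `Item Code`   = c(882, 882, 882),
  Item          = rep("Milk, whole fresh cow", 3),
  check.names = FALSE
)

# Hypothetical helper: TRUE when every code has exactly one label
# and every label has exactly one code.
is_one_to_one <- function(df, code_col, label_col) {
  pairs <- unique(df[, c(code_col, label_col)])
  !any(duplicated(pairs[[code_col]])) && !any(duplicated(pairs[[label_col]]))
}

domain_ok <- is_one_to_one(toy, "Domain Code", "Domain")
item_ok   <- is_one_to_one(toy, "Item Code", "Item")
print(c(domain = domain_ok, item = item_ok))
```

The same helper could be pointed at the real `data` for the Area, Element, and Year pairs as well.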

Code

item <- select(data, "Item")
table(item)

Item
Milk, whole fresh cow 
                36449

Code

prop.table(table(item))

Item
Milk, whole fresh cow 
                    1

Once again, there is only one value: every row refers to “Milk, whole fresh cow”, which makes sense given the file name.

Code

unit <- select(data, "Unit")
table(unit)

Unit
  Head  hg/An tonnes 
 12158  12121  12170

Code

prop.table(table(unit))

Unit
     Head     hg/An    tonnes 
0.3335620 0.3325468 0.3338912

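The “Unit” column does not hold three ways of measuring the same thing; each unit goes with a different “Element”. In the head() output, rows with element code 5318 (label truncated to “Milk A…”) are measured in Head, which is presumably a count of milk-producing animals; “Yield” rows (5420) are in hg/An, which most likely means hectograms per animal; and rows with element code 5510 (label truncated to “Produc…”) are in tonnes. The three units appear in nearly equal proportions, consistent with most area/year combinations having one row of each kind.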
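One way to check how the units relate to the rest of the table is to cross-tabulate Unit against Element Code. The sketch below does this in base R on the six rows shown in the head() output; the object names (`toy`, `element_by_unit`, `units_per_element`) are made up for this example.

```r
# The six rows from the head() output, reduced to the relevant columns.
toy <- data.frame(
  `Element Code` = c(5318, 5420, 5510, 5318, 5420, 5510),
  Year           = c(1961, 1961, 1961, 1962, 1962, 1962),
  Unit           = c("Head", "hg/An", "tonnes", "Head", "hg/An", "tonnes"),
  check.names = FALSE
)

# Cross-tabulate element codes (rows) against units (columns).
element_by_unit <- table(
  toy[["Element Code"]], toy[["Unit"]],
  dnn = c("Element Code", "Unit")
)
print(element_by_unit)

# Number of distinct units used by each element code.
units_per_element <- rowSums(element_by_unit > 0)
print(units_per_element)
```

If every element code uses exactly one unit in the full dataset as well, then Unit is determined by Element rather than being an independent variable.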

Code

years <- select(data, "Year")
table(years)

Year
1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 
 594  594  594  594  594  594  594  594  594  594  594  594  594  594  594  594 
1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 
 594  594  594  594  594  594  594  594  594  594  594  594  594  594  600  657 
1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 
 663  664  664  664  664  664  665  666  666  666  666  666  666  669  671  669 
2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 
 671  669  669  672  672  674  674  674  672  672

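Overall, this dataset appears to be a record of dairy cattle data across many countries and years, from 1961 to 2018. For each country/year combination there are up to three entries: an animal count, a milk yield, and a production weight. The number of rows per year rises from 594 in 1961 to 672 in 2018, which suggests that more areas are covered in later years.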
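The number of entries per country/year combination can be checked directly by counting rows for each pair of Area Code and Year. Here is a rough base R sketch on the six rows from the head() output; `toy` and `entries_per_area_year` are illustrative names, and the helper assumes the column names shown by colnames(data).

```r
# The six rows from the head() output (Area Code 2, years 1961 and 1962).
toy <- data.frame(
  `Area Code`    = c(2, 2, 2, 2, 2, 2),
  `Element Code` = c(5318, 5420, 5510, 5318, 5420, 5510),
  Year           = c(1961, 1961, 1961, 1962, 1962, 1962),
  check.names = FALSE
)

# Hypothetical helper: one row per area/year pair with its row count.
entries_per_area_year <- function(df) {
  counts <- aggregate(
    df[["Element Code"]],
    by = list(area_code = df[["Area Code"]], year = df[["Year"]]),
    FUN = length
  )
  names(counts) <- c("area_code", "year", "n_entries")
  counts
}

result <- entries_per_area_year(toy)
print(result)
```

On the full dataset, `table(entries_per_area_year(data)$n_entries)` would show how many combinations have fewer than three entries. Some must, since the three units do not occur equally often (12158, 12121, and 12170 rows).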