Code
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)Matthew O’Neill
October 5, 2022
Today’s challenge is to
read in a dataset, and
describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
# A tibble: 6 × 14
  Domai…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year Unit 
  <chr>   <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl> <chr>
1 QL      Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961 Head 
2 QL      Lives…       2 Afgh…    5420 Yield       882 Milk…    1961  1961 hg/An
3 QL      Lives…       2 Afgh…    5510 Produc…     882 Milk…    1961  1961 tonn…
4 QL      Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1962  1962 Head 
5 QL      Lives…       2 Afgh…    5420 Yield       882 Milk…    1962  1962 hg/An
6 QL      Lives…       2 Afgh…    5510 Produc…     882 Milk…    1962  1962 tonn…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
#   and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
#   ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`[1] 36449    14 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"To begin, I’ve output the top few rows of data from our dataset to help visualize what is going on. Based on the data we see in our header(along with the name of the file alluding to working with cattle and dairy products), we can assume this dataset includes data on dairy production from various countries over many years. Some rows have a “Flag Description” of “FAO estimate”, which leads me to believe much of this data was collected by the Food and Agriculture Organization in the United States.
The column names aren’t very descriptive for this dataset, as columns such as “domain”, “item”, and “unit” are very vague. To get a better idea of what’s going on, we can dive into each column a bit more.
First, we can see that there appears to only be one domain in this dataset, Livestock Primary. This column and it’s code are likely mainly useful if the dataset is joined with another one which has the same column.
Item
Milk, whole fresh cow 
                36449 Item
Milk, whole fresh cow 
                    1 Once again, it appears all cows are being used for their milk production, which makes sense given the context of the table.
Unit
  Head  hg/An tonnes 
 12158  12121  12170 Unit
     Head     hg/An    tonnes 
0.3335620 0.3325468 0.3338912 The “unit” column appears to be three different ways to weigh a given cow. While “tonnes” is obvious, there unfortunately isn’t too much context as to what the other two are, but it could be that “head” would be a measure of how many cows a given farm has.
Year
1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 
 594  594  594  594  594  594  594  594  594  594  594  594  594  594  594  594 
1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 
 594  594  594  594  594  594  594  594  594  594  594  594  594  594  600  657 
1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 
 663  664  664  664  664  664  665  666  666  666  666  666  666  669  671  669 
2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 
 671  669  669  672  672  674  674  674  672  672 Overall, it appears that this dataset is a record of cattle/dairy data across many different countries over many different years. For each country/year combination, there are three entries for animal count, meat yield, and production weight.
---
title: "Challenge 1"
author: "Matthew O'Neill"
desription: "Reading in data and creating a post"
date: "10/05/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1)  read in a dataset, and
2)  describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
-   railroad_2012_clean_county.csv ⭐
-   birds.csv ⭐⭐
-   FAOstat\*.csv ⭐⭐
-   wild_bird_data.xlsx ⭐⭐⭐
-   StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
```{r}
data <- read_csv("../posts/_data/FAOSTAT_cattle_dairy.csv")
```
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
```{r}
#| label: summary
head(data)
dim(data)
colnames(data)
```
To begin, I've output the top few rows of data from our dataset to help visualize what is going on. Based on the data we see in our header(along with the name of the file alluding to working with cattle and dairy products), we can assume this dataset includes data on dairy production from various countries over many years. Some rows have a "Flag Description" of "FAO estimate", which leads me to believe much of this data was collected by the Food and Agriculture Organization in the United States.
The column names aren't very descriptive for this dataset, as columns such as "domain", "item", and "unit" are very vague. To get a better idea of what's going on, we can dive into each column a bit more.
```{r}
domain <- select(data, "Domain")
table(domain)
```
First, we can see that there appears to only be one domain in this dataset, Livestock Primary. This column and it's code are likely mainly useful if the dataset is joined with another one which has the same column.
```{r}
item <- select(data, "Item")
table(item)
prop.table(table(item))
```
Once again, it appears all cows are being used for their milk production, which makes sense given the context of the table.
```{r}
unit <- select(data, "Unit")
table(unit)
prop.table(table(unit))
```
The "unit" column appears to be three different ways to weigh a given cow. While "tonnes" is obvious, there unfortunately isn't too much context as to what the other two are, but it could be that "head" would be a measure of how many cows a given farm has. 
```{r}
years <- select(data, "Year")
table(years)
```
Overall, it appears that this dataset is a record of cattle/dairy data across many different countries over many different years. For each country/year combination, there are three entries for animal count, meat yield, and production weight.