library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Matthew O’Neill
October 5, 2022
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961 Head
2 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1961 1961 hg/An
3 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1961 1961 tonn…
4 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1962 1962 Head
5 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1962 1962 hg/An
6 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1962 1962 tonn…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
[1] 36449 14
[1] "Domain Code" "Domain" "Area Code" "Area"
[5] "Element Code" "Element" "Item Code" "Item"
[9] "Year Code" "Year" "Unit" "Value"
[13] "Flag" "Flag Description"
To begin, I’ve printed the first few rows of the dataset, along with its dimensions and column names, to get a sense of its structure. Based on the output of head() (and the file name, which refers to cattle and dairy), this dataset appears to contain dairy production figures for many countries over many years. It has 36,449 rows and 14 columns. Some rows have a “Flag Description” of “FAO estimate”, which suggests that the data was compiled by the Food and Agriculture Organization (FAO) of the United Nations, with some values estimated rather than officially reported.
The column names aren’t very descriptive for this dataset, as columns such as “domain”, “item”, and “unit” are very vague. To get a better idea of what’s going on, we can dive into each column a bit more.
First, there appears to be only one domain in this dataset, Livestock Primary. This column and its code are probably most useful when the dataset is joined with another one that has the same column.
Item
Milk, whole fresh cow
36449
Item
Milk, whole fresh cow
1
Likewise, every row has the same item, “Milk, whole fresh cow”, so the whole dataset concerns fresh cow’s milk, which makes sense given the file name.
Unit
Head hg/An tonnes
12158 12121 12170
Unit
Head hg/An tonnes
0.3335620 0.3325468 0.3338912
The “Unit” column takes three values, each accounting for roughly a third of the rows and each paired with one of the three elements seen in the first few rows. “Head” is a count of milk-producing animals, “hg/An” is the yield in hectograms per animal, and “tonnes” is the total weight of milk produced.
Year
1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976
594 594 594 594 594 594 594 594 594 594 594 594 594 594 594 594
1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992
594 594 594 594 594 594 594 594 594 594 594 594 594 594 600 657
1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
663 664 664 664 664 664 665 666 666 666 666 666 666 669 671 669
2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
671 669 669 672 672 674 674 674 672 672
Overall, this dataset appears to be a record of dairy production across many countries from 1961 to 2018. For each country/year combination there are typically three entries: the number of milk animals, the milk yield per animal, and the total production weight.
---
title: "Challenge 1"
author: "Matthew O'Neill"
description: "Reading in data and creating a post"
date: "10/05/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_1
- railroads
- faostat
- wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
```{r}
data <- read_csv("../posts/_data/FAOSTAT_cattle_dairy.csv")
```
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
```{r}
#| label: summary
head(data)
dim(data)
colnames(data)
```
To begin, I've printed the first few rows of the dataset, along with its dimensions and column names, to get a sense of its structure. Based on the output of `head()` (and the file name, which refers to cattle and dairy), this dataset appears to contain dairy production figures for many countries over many years. It has 36,449 rows and 14 columns. Some rows have a "Flag Description" of "FAO estimate", which suggests that the data was compiled by the Food and Agriculture Organization (FAO) of the United Nations, with some values estimated rather than officially reported.
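A quick tally of the flag descriptions would show how much of the data is estimated. The sketch below uses a small stand-in tibble called `toy` whose rows and second flag label are invented for illustration; in this post the same `count()` call could be applied to `data`.

```{r}
library(dplyr)
library(tibble)

# Toy stand-in for `data`: rows and the second flag label are hypothetical.
toy <- tibble(
  Area = c("Area X", "Area X", "Area Y", "Area Y"),
  `Flag Description` = c(
    "FAO estimate", "Other flag (hypothetical)",
    "FAO estimate", "FAO estimate"
  )
)

# Count rows per flag description, most frequent first.
flag_counts <- toy %>%
  count(`Flag Description`, sort = TRUE)

flag_counts
```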
The column names aren't very descriptive for this dataset, as columns such as "domain", "item", and "unit" are very vague. To get a better idea of what's going on, we can dive into each column a bit more.
```{r}
domain <- select(data, "Domain")
table(domain)
```
First, there appears to be only one domain in this dataset, Livestock Primary. This column and its code are probably most useful when the dataset is joined with another one that has the same column.
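Several columns come in code/label pairs ("Domain Code" and "Domain", "Area Code" and "Area", and so on). One way to confirm that each code maps to exactly one label is to list the distinct pairs. This is a minimal sketch on a made-up tibble `toy`; the area names are placeholders, not values from the file.

```{r}
library(dplyr)
library(tibble)

# Toy stand-in for `data`: area labels are hypothetical placeholders.
toy <- tibble(
  `Domain Code` = c("QL", "QL", "QL"),
  Domain = rep("Livestock Primary", 3),
  `Area Code` = c(2, 2, 3),
  Area = c("Area X", "Area X", "Area Y")
)

# Distinct code/label pairs for each column pair.
domain_pairs <- toy %>% distinct(`Domain Code`, Domain)
area_pairs <- toy %>% distinct(`Area Code`, Area)

# TRUE when no code appears with more than one label.
codes_unique <- n_distinct(area_pairs$`Area Code`) == nrow(area_pairs)
codes_unique
```

If the check returns `TRUE` for a pair, the code column carries no information beyond its label and mainly serves as a join key.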
```{r}
item <- select(data, "Item")
table(item)
prop.table(table(item))
```
Likewise, every row has the same item, "Milk, whole fresh cow", so the whole dataset concerns fresh cow's milk, which makes sense given the file name.
```{r}
unit <- select(data, "Unit")
table(unit)
prop.table(table(unit))
```
The "Unit" column takes three values, each accounting for roughly a third of the rows and each paired with one of the three elements seen in the first few rows. "Head" is a count of milk-producing animals, "hg/An" is the yield in hectograms per animal, and "tonnes" is the total weight of milk produced.
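To see which unit belongs to which measurement, the units can be cross-tabulated against the element codes shown in the `head()` output (5318, 5420, 5510). The tibble `toy` below is an invented two-year stand-in for `data`; with the real data, adding the `Element` column to the `count()` call would also show the element names.

```{r}
library(dplyr)
library(tibble)

# Toy stand-in for `data`: two hypothetical years for a single area.
toy <- tibble(
  `Element Code` = rep(c(5318, 5420, 5510), times = 2),
  Unit = rep(c("Head", "hg/An", "tonnes"), times = 2),
  Year = rep(c(1961, 1962), each = 3)
)

# Each element code should appear with exactly one unit.
element_units <- toy %>%
  count(`Element Code`, Unit)

element_units
```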
```{r}
years <- select(data, "Year")
table(years)
```
Overall, this dataset appears to be a record of dairy production across many countries from 1961 to 2018. For each country/year combination there are typically three entries: the number of milk animals, the milk yield per animal, and the total production weight.
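The country/year structure is easier to see if the three entries are spread into columns, giving one row per case. The following is a sketch only: `toy` and its values are invented for illustration, and it assumes each area/year has at most one row per unit.

```{r}
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for `data`: the area name and all values are hypothetical.
toy <- tibble(
  Area = rep("Area X", 6),
  Year = rep(c(1961, 1962), each = 3),
  Unit = rep(c("Head", "hg/An", "tonnes"), times = 2),
  Value = c(100, 5000, 50, 110, 5100, 56.1)
)

# One row per area/year, one column per unit.
wide <- toy %>%
  pivot_wider(
    id_cols = c(Area, Year),
    names_from = Unit,
    values_from = Value
  )

wide
```

In this layout a missing entry for an area/year shows up as `NA`, which makes gaps in the reporting easy to spot.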