Code
library(tidyverse)
library(here)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Daniel Manning
January 7, 2023
First I loaded the “FAOSTAT_cattle_dairy.csv dataset. This dataset was already largely tidy, but I finished the process by removing variables that were redundant or only had one case, leaving each column with its own variable and each row as an occurrence.
# A tibble: 36,449 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
2 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1961 1961
3 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1961 1961
4 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1962 1962
5 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1962 1962
6 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1962 1962
7 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1963 1963
8 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1963 1963
9 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1963 1963
10 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1964 1964
# … with 36,439 more rows, 4 more variables: Unit <chr>, Value <dbl>,
# Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
# ¹`Domain Code`, ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
# A tibble: 36,449 × 5
Area Item Year Value `Flag Description`
<chr> <chr> <dbl> <dbl> <chr>
1 Afghanistan Milk, whole fresh cow 1961 700000 FAO estimate
2 Afghanistan Milk, whole fresh cow 1961 5000 Calculated data
3 Afghanistan Milk, whole fresh cow 1961 350000 FAO estimate
4 Afghanistan Milk, whole fresh cow 1962 700000 FAO estimate
5 Afghanistan Milk, whole fresh cow 1962 5000 Calculated data
6 Afghanistan Milk, whole fresh cow 1962 350000 FAO estimate
7 Afghanistan Milk, whole fresh cow 1963 780000 FAO estimate
8 Afghanistan Milk, whole fresh cow 1963 5128 Calculated data
9 Afghanistan Milk, whole fresh cow 1963 400000 FAO estimate
10 Afghanistan Milk, whole fresh cow 1964 780000 FAO estimate
# … with 36,439 more rows
This dataset consists of five variables with the following types: Area: string Item: string Year: double Value: double Flag Description: string
The overall dataset represents values for whole fresh cow milk in different countries across different years, as well as whether the data was estiamted/calculated.
One potential research question could investigate the difference in values with respect to the area. For example: Which area produced the most whole fresh cow milk? What country produced the least whole fresh cow milk? What was the distribution of average whole fresh cow milk production for each area?
Another research question could investigate the change in milk production with respect to year. For example: Did the production of whole fresh cow milk increase from 1961 to 2018? During which year was the most cow milk produced and during which year was the least produced?
A third research question could investigate the relationship between the method of estimating/calculating values and the values themselves. For example: Is there a difference between the average of values described as “Calculated data” and “FAO estimates”.
Lastly, this dataset could be used to investigate relationships between multiple variables, such as year and area. For example: Did some areas see an increase in production from 1961 to 2018 while others saw a decrease? Was the magnitude of change in production different across areas?
---
title: "HW 2"
author: "Daniel Manning"
description: "Reading in Data"
date: "1/7/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_1
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(here)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Tidying of Dataset
First I loaded the "FAOSTAT_cattle_dairy.csv dataset. This dataset was already largely tidy, but I finished the process by removing variables that were redundant or only had one case, leaving each column with its own variable and each row as an occurrence.
```{r}
cattle_dairy <- here("posts","_data","FAOSTAT_cattle_dairy.csv")%>%
read_csv()
cattle_dairy
cattle_dairy_new <- cattle_dairy %>%
select(-c('Domain', 'Domain Code', 'Area Code', 'Element', 'Element Code', 'Item Code', 'Year Code', 'Unit', 'Flag'))
cattle_dairy_new
```
## Narrative, Variables, and Research Question
This dataset consists of five variables with the following types:
Area: string
Item: string
Year: double
Value: double
Flag Description: string
The overall dataset represents values for whole fresh cow milk in different countries across different years, as well as whether the data was estiamted/calculated.
One potential research question could investigate the difference in values with respect to the area. For example: Which area produced the most whole fresh cow milk? What country produced the least whole fresh cow milk? What was the distribution of average whole fresh cow milk production for each area?
Another research question could investigate the change in milk production with respect to year. For example: Did the production of whole fresh cow milk increase from 1961 to 2018? During which year was the most cow milk produced and during which year was the least produced?
A third research question could investigate the relationship between the method of estimating/calculating values and the values themselves. For example: Is there a difference between the average of values described as "Calculated data" and "FAO estimates".
Lastly, this dataset could be used to investigate relationships between multiple variables, such as year and area. For example: Did some areas see an increase in production from 1961 to 2018 while others saw a decrease? Was the magnitude of change in production different across areas?