Challenge8_MarcelaRobinson

Author

Marcela Robinson

library(here)
here() starts at /home/runner/work/601_Winter_2022-2023/601_Winter_2022-2023
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2
──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
#Read the first dataset
cattle<- here("posts", "_data", "FAOSTAT_cattle_dairy.csv")%>%
read_csv()
Rows: 36449 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Read second dataset 
chicken<-here("posts", "_data", "FAOSTAT_egg_chicken.csv")%>%
read_csv()
Rows: 38170 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

For this challenge, I’ll be joining two sets of data. Both sets of data are provided by the Food and Agriculture Organization of the United Nations. The first set, cattle, represent the vaues of livestock (Milk Animals, Yield, or Production) in different areas of the world over the years. Other variables include unit, flag, flag description, and different codes.

The second set of data that I’ll be joining, chicken, contains values of chickens (Laying, Yield, or Production) in different areas of the world over the years.

#Join both datasets 
joined<-bind_rows(cattle, chicken)
print(joined)
# A tibble: 74,619 × 14
   Domain Cod…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year
   <chr>        <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl>
 1 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 2 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1961  1961
 3 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1961  1961
 4 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1962  1962
 5 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1962  1962
 6 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1962  1962
 7 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1963  1963
 8 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1963  1963
 9 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1963  1963
10 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1964  1964
# … with 74,609 more rows, 4 more variables: Unit <chr>, Value <dbl>,
#   Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
#   ¹​`Domain Code`, ²​`Area Code`, ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`

I’m using the bind_rows function since both datasets have the same variables. As I result, my new dataset called joined has 74619 observations and 14 variables.

Clean the data

#Remove unnecessary columns 
joined%>%
  select(-c("Domain Code", "Domain", "Area Code","Element Code", "Flag", "Flag Description", "Year Code", "Item Code"))
# A tibble: 74,619 × 6
   Area        Element      Item                   Year Unit    Value
   <chr>       <chr>        <chr>                 <dbl> <chr>   <dbl>
 1 Afghanistan Milk Animals Milk, whole fresh cow  1961 Head   700000
 2 Afghanistan Yield        Milk, whole fresh cow  1961 hg/An    5000
 3 Afghanistan Production   Milk, whole fresh cow  1961 tonnes 350000
 4 Afghanistan Milk Animals Milk, whole fresh cow  1962 Head   700000
 5 Afghanistan Yield        Milk, whole fresh cow  1962 hg/An    5000
 6 Afghanistan Production   Milk, whole fresh cow  1962 tonnes 350000
 7 Afghanistan Milk Animals Milk, whole fresh cow  1963 Head   780000
 8 Afghanistan Yield        Milk, whole fresh cow  1963 hg/An    5128
 9 Afghanistan Production   Milk, whole fresh cow  1963 tonnes 400000
10 Afghanistan Milk Animals Milk, whole fresh cow  1964 Head   780000
# … with 74,609 more rows
print(joined)
# A tibble: 74,619 × 14
   Domain Cod…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year
   <chr>        <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl>
 1 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 2 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1961  1961
 3 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1961  1961
 4 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1962  1962
 5 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1962  1962
 6 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1962  1962
 7 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1963  1963
 8 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1963  1963
 9 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1963  1963
10 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1964  1964
# … with 74,609 more rows, 4 more variables: Unit <chr>, Value <dbl>,
#   Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
#   ¹​`Domain Code`, ²​`Area Code`, ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`