Challenge 1 Quinn He

challenge_1
Author

Quinn He

Published

August 15, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xlsx ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

Code
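# Read the birds data; the path is relative to the posts folder
birds <- read_csv("_data/birds.csv")

# Open the spreadsheet-style viewer (only useful in an interactive session)
view(birds)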

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
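One way to check that a read_csv() call parsed a file as intended is to look at the guessed column types and any parsing problems. The sketch below is only an illustration: it writes a tiny temporary file with made-up values as a stand-in for _data/birds.csv, and the names tmp and toy are hypothetical.

```r
library(readr)

# Hypothetical stand-in for _data/birds.csv: a tiny file with invented values
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Area,Item,Year,Value",
  "Country A,Chickens,1961,10",
  "Country A,Chickens,1962,"
), tmp)

# show_col_types = FALSE silences the column-type message on read
toy <- read_csv(tmp, show_col_types = FALSE)

# spec() reports the column types readr guessed
spec(toy)

# problems() lists rows that failed to parse (zero rows means a clean read)
problems(toy)
```

An empty field, like the missing Value in the last row, is read as NA rather than reported as a problem.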

Describe the data

Using a combination of words and results of R commands, can you provide a high-level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

The birds data set contains a wide range of entries. The function below lists all the column names. For a few of them it is hard to tell exactly what they represent and how important they are.

Code
colnames(birds)
 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"

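It appears the data set was taken from a farm organization. The Domain Code, Element Code, and Flag columns resemble a FAOSTAT-style export, though the file itself does not name its source. The data is a little messy, but the layout makes sense on the data entry side. Each country has descending rows of chickens, ducks, and other fowl from 1961 to 2018. Some columns look redundant: in the summary below, Year Code has the same distribution as Year, and Element Code is 5112 in every row. The Value column tracks the stock of each bird type (the Element column reads "Stocks") over this 58-year window, and 1,036 of its entries are missing. The Item Code summary shows at least four distinct codes between 1057 and 1083, so there may be more than three bird types. There is also a possibility this data set came from a larger set with other types of animals because the “Domain” column lists ‘Livestock’ throughout the entire data set.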
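A quick way to check claims like these is to count rows per bird type and summarise the year range and missing values. The sketch below runs on a hypothetical miniature table, toy_birds, with invented values. The same calls should work on birds, assuming it has the Item, Year, and Value columns listed above.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature of `birds`; all values are invented for illustration
toy_birds <- tibble(
  Area  = c("Country A", "Country A", "Country A", "Country B"),
  Item  = c("Chickens", "Chickens", "Ducks", "Chickens"),
  Year  = c(1961, 1962, 1961, 1961),
  Value = c(10, 12, 3, NA)
)

# Number of rows for each bird type
item_counts <- toy_birds %>%
  count(Item)

# First and last year covered, plus the number of missing values
overview <- toy_birds %>%
  summarise(
    first_year     = min(Year),
    last_year      = max(Year),
    missing_values = sum(is.na(Value))
  )

item_counts
overview
```

On the full data, count(Item) would show directly how many bird types there are, without relying on the Item Code summary.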

Code
summary(birds)
 Domain Code           Domain            Area Code        Area          
 Length:30977       Length:30977       Min.   :   1   Length:30977      
 Class :character   Class :character   1st Qu.:  79   Class :character  
 Mode  :character   Mode  :character   Median : 156   Mode  :character  
                                       Mean   :1202                     
                                       3rd Qu.: 231                     
                                       Max.   :5504                     
                                                                        
  Element Code    Element            Item Code        Item          
 Min.   :5112   Length:30977       Min.   :1057   Length:30977      
 1st Qu.:5112   Class :character   1st Qu.:1057   Class :character  
 Median :5112   Mode  :character   Median :1068   Mode  :character  
 Mean   :5112                      Mean   :1066                     
 3rd Qu.:5112                      3rd Qu.:1072                     
 Max.   :5112                      Max.   :1083                     
                                                                    
   Year Code         Year          Unit               Value         
 Min.   :1961   Min.   :1961   Length:30977       Min.   :       0  
 1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:     171  
 Median :1992   Median :1992   Mode  :character   Median :    1800  
 Mean   :1991   Mean   :1991                      Mean   :   99411  
 3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   15404  
 Max.   :2018   Max.   :2018                      Max.   :23707134  
                                                  NA's   :1036      
     Flag           Flag Description  
 Length:30977       Length:30977      
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
                                      
Code
dim(birds)
[1] 30977    14
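The summary output suggests that some columns carry no extra information, for example Element Code, whose minimum and maximum are both 5112. One possible check, sketched here on a hypothetical toy_birds table with invented values, is to count the distinct values in every column and to compare Year Code with Year directly.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature of `birds`; all values are invented for illustration
toy_birds <- tibble(
  `Element Code` = c(5112, 5112, 5112),
  `Year Code`    = c(1961, 1962, 1963),
  Year           = c(1961, 1962, 1963),
  Value          = c(10, NA, 12)
)

# Distinct values per column; a count of 1 marks a constant column
# (n_distinct() counts NA as a value of its own)
distinct_counts <- toy_birds %>%
  summarise(across(everything(), n_distinct))

# TRUE when Year Code is an exact copy of Year
year_code_is_duplicate <- identical(toy_birds$`Year Code`, toy_birds$Year)

distinct_counts
year_code_is_duplicate
```

Column names that contain spaces, such as Year Code, need backticks when they are referenced in code.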
Code
head(birds)
# A tibble: 6 × 14
  Domai…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year Unit 
  <chr>   <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl> <chr>
1 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1961  1961 1000…
2 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1962  1962 1000…
3 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1963  1963 1000…
4 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1964  1964 1000…
5 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1965  1965 1000…
6 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1966  1966 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
#   and abbreviated variable names ¹​`Domain Code`, ²​`Area Code`,
#   ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`
# ℹ Use `colnames()` to see all variable names
Code
tail(birds)
# A tibble: 6 × 14
  Domai…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year Unit 
  <chr>   <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl> <chr>
1 QA      Live …    5504 Poly…    5112 Stocks     1068 Ducks    2013  2013 1000…
2 QA      Live …    5504 Poly…    5112 Stocks     1068 Ducks    2014  2014 1000…
3 QA      Live …    5504 Poly…    5112 Stocks     1068 Ducks    2015  2015 1000…
4 QA      Live …    5504 Poly…    5112 Stocks     1068 Ducks    2016  2016 1000…
5 QA      Live …    5504 Poly…    5112 Stocks     1068 Ducks    2017  2017 1000…
6 QA      Live …    5504 Poly…    5112 Stocks     1068 Ducks    2018  2018 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
#   and abbreviated variable names ¹​`Domain Code`, ²​`Area Code`,
#   ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`
# ℹ Use `colnames()` to see all variable names
Code
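# table(birds) is commented out because it errors when the document is rendered
#table(birds)

# Use unquoted column names in aes(); quoted names are treated as constant strings.
# A geom layer is also needed, otherwise the plot is drawn empty.
ggplot(birds, mapping = aes(x = Year, y = Value)) +
  geom_point()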

I’m wondering if I should use the %>% pipe operator here. I’m also having an issue with my functions because they don’t run correctly. Is it from the “delimiter” error above? It may also be an issue with my working directory.
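On the pipe question, a minimal sketch may help. The %>% operator passes the left-hand side as the first argument of the next function, so it can feed data into ggplot(), while plot layers are still added with +. The toy_birds table and its values are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Hypothetical miniature of `birds`; all values are invented for illustration
toy_birds <- tibble(
  Item  = c("Chickens", "Chickens", "Ducks", "Ducks"),
  Year  = c(1961, 1962, 1961, 1962),
  Value = c(10, 12, 3, NA)
)

# x %>% f(y) is the same call as f(x, y)
piped  <- toy_birds %>% select(Item)
direct <- select(toy_birds, Item)

# Drop missing values, pipe the result into ggplot(), then add layers with `+`
plot_data <- toy_birds %>%
  filter(!is.na(Value))

p <- plot_data %>%
  ggplot(aes(x = Year, y = Value, colour = Item)) +
  geom_point()

p
```

Whether to pipe into ggplot() or pass the data as its first argument is a matter of style, since both forms build the same plot.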

I commented out ‘table(birds)’ because it gave me an error when I rendered the document.
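A likely cause of that error is that table() on a whole data frame tries to cross-tabulate every column at once, which gets very large with 14 columns. Tabulating one or two columns is usually what is wanted. The sketch below uses a hypothetical toy_birds data frame with invented values.

```r
# Hypothetical miniature of `birds`; all values are invented for illustration
toy_birds <- data.frame(
  Item = c("Chickens", "Chickens", "Ducks"),
  Year = c(1961, 1962, 1961)
)

# Frequency of each bird type
item_table <- table(toy_birds$Item)

# Two-way table: bird type by year
item_by_year <- table(toy_birds$Item, toy_birds$Year)

item_table
item_by_year
```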

Code
birds %>%
  select(Item)
# A tibble: 30,977 × 1
   Item    
   <chr>   
 1 Chickens
 2 Chickens
 3 Chickens
 4 Chickens
 5 Chickens
 6 Chickens
 7 Chickens
 8 Chickens
 9 Chickens
10 Chickens
# … with 30,967 more rows
# ℹ Use `print(n = ...)` to see more rows