challenge_1
Matthew_Weiner
birds.csv
Author

Matthew Weiner

Published

March 6, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Introduction

For this challenge, I chose to analyze the dataset birds.csv.

First steps

When first examining this data, I wanted to print the first few rows of the CSV file and view the column names to get a general idea of the file's format.

Code
library(readr)
birds <- read_csv("_data/birds.csv")
head(birds)
# A tibble: 6 × 14
  Domai…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year Unit 
  <chr>   <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl> <chr>
1 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1961  1961 1000…
2 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1962  1962 1000…
3 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1963  1963 1000…
4 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1964  1964 1000…
5 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1965  1965 1000…
6 QA      Live …       2 Afgh…    5112 Stocks     1057 Chic…    1966  1966 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
#   and abbreviated variable names ¹​`Domain Code`, ²​`Area Code`,
#   ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`
Code
colnames(birds)
 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"

The column names include “Year”, “Item”, “Value”, and “Area”, so at this point I suspect that the data has to do with the sale of some kind of item, most likely some sort of bird (based on the file name).
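One way to test a hunch like this is to list the distinct values of a descriptive column such as Element. The following is a minimal sketch that uses a small made-up stand-in for birds (the `toy` tibble and its rows are illustrative assumptions, not taken from the file); the same `count()` call could be run on the real data.

```r
library(dplyr)

# Toy stand-in for `birds`; these rows are made up for illustration only.
toy <- tibble(
  Area    = c("Afghanistan", "Afghanistan", "Albania"),
  Element = c("Stocks", "Stocks", "Stocks"),
  Item    = c("Chickens", "Chickens", "Ducks"),
  Year    = c(1961, 1962, 1961)
)

# One row per distinct Element value, with the number of rows that carry it.
element_counts <- count(toy, Element)
print(element_counts)
```

If a column holds only one value, it describes the whole dataset rather than distinguishing individual rows.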

We can also use the following command to view the dimensions of the data:

Code
dim(birds)
[1] 30977    14

This shows us that the data has 30977 rows and 14 columns.

Investigating Deeper

The following command generates a table showing the distribution of entries by area. The output shows that the file contains records from multiple countries, along with aggregate regions such as Africa and Americas, which indicates that the data is international in scope.

Code
head(table(select(birds,Area)))
Area
   Afghanistan         Africa        Albania        Algeria American Samoa 
            58            290            232            232             58 
      Americas 
           232 
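As an alternative to `table()`, dplyr's `count()` returns the same tallies as a tibble and can sort them by frequency. The following is a minimal sketch on a made-up `toy` tibble (the rows are illustrative assumptions, not the real counts).

```r
library(dplyr)

# Toy stand-in for the Area column of `birds`; values are made up.
toy <- tibble(
  Area = c("Afghanistan", "Africa", "Africa", "Albania", "Africa", "Albania")
)

# Tally rows per Area and put the most frequent areas first.
area_counts <- count(toy, Area, sort = TRUE)
print(area_counts)
```

Because the result is a tibble, it can be passed on to further steps such as `filter()` or `head()`.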

Likewise, we can use a very similar command to view the distribution of items.

Code
table(select(birds,Item))
Item
              Chickens                  Ducks Geese and guinea fowls 
                 13074                   6909                   4136 
  Pigeons, other birds                Turkeys 
                  1165                   5693 

The results of this command show us that the data covers five categories of birds: chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds.
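To make the relative sizes of these categories easier to compare, the counts can be turned into shares. The sketch below copies the five counts from the `table()` output above into a tibble (the names `item_counts` and `item_shares` are introduced here for illustration) and computes each item's share of all rows.

```r
library(dplyr)

# Item counts copied from the table() output above.
item_counts <- tibble(
  Item = c("Chickens", "Ducks", "Geese and guinea fowls",
           "Pigeons, other birds", "Turkeys"),
  n    = c(13074, 6909, 4136, 1165, 5693)
)

# Add each item's share of all rows and sort from largest to smallest.
item_shares <- item_counts %>%
  mutate(share = n / sum(n)) %>%
  arrange(desc(share))
print(item_shares)
```

The five counts sum to 30977, the number of rows reported by `dim(birds)`, and chickens account for roughly 42% of them.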

Finally, we can view the range of years that the data covers by using the following code block.

Code
year <- select(birds,Year)
min(year)
[1] 1961
Code
max(year)
[1] 2018

This shows us that the records run from 1961 to 2018.
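The first and last year can also be computed in a single step with `summarise()`, together with the number of distinct years present. The following is a minimal sketch on a made-up `toy` tibble (the column names `first_year`, `last_year`, and `n_years` are introduced here for illustration).

```r
library(dplyr)

# Toy stand-in for the Year column of `birds`; values are made up.
toy <- tibble(Year = c(1961, 1962, 1962, 1963, 2018))

# Collapse the column into its first year, last year, and distinct-year count.
year_span <- summarise(
  toy,
  first_year = min(Year),
  last_year  = max(Year),
  n_years    = n_distinct(Year)
)
print(year_span)
```

Comparing `n_years` with the length of the span is a quick way to spot years that are missing from the data.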

Results

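Using a handful of R commands, I found that this dataset contains 30977 records on five categories of birds across many countries and regions from 1961 to 2018. Each row is a record rather than a single bird, and the Element column reads “Stocks” in the rows printed above, which suggests that the values may describe bird stocks rather than sales.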