Reading in data and creating a post
Author: Noah Dixon
Published: June 2, 2023

Setup

Code
# tidyverse attaches dplyr and readr, so a separate library(dplyr) call is not needed
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Part 1: Read in the Data

Using the read_csv function from readr, we can load birds.csv into a tibble.

Code
birds_from_csv <- read_csv("_data/birds.csv")
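
By default read_csv guesses each column's type from the data. When the types are known ahead of time, they can be declared with the col_types argument instead. The following is a minimal sketch only: it writes a small made-up CSV to a temporary file (the rows and the names tmp and toy are hypothetical, not taken from birds.csv) so that it runs on its own.

```r
library(tidyverse)

# Write a tiny made-up CSV to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("Area,Year,Value",
    "Afghanistan,1961,10",
    "Albania,1961,20"),
  tmp
)

# Declare the column types explicitly instead of letting readr guess them.
toy <- read_csv(
  tmp,
  col_types = cols(
    Area  = col_character(),
    Year  = col_integer(),
    Value = col_double()
  )
)

toy
```

Declaring types this way also silences the column specification message that read_csv prints when it has to guess.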

Part 2: Describe the data

Using the dim function, we can see the dimensions of the data, reported as rows followed by columns.

Code
dim(birds_from_csv)
[1] 30977    14

The data set has 30,977 rows and 14 columns. Next, the colnames function lists the names of the 14 columns, and the spec function shows the data type that read_csv assigned to each of them.

Code
colnames(birds_from_csv)
 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"
Code
spec(birds_from_csv)
cols(
  `Domain Code` = col_character(),
  Domain = col_character(),
  `Area Code` = col_double(),
  Area = col_character(),
  `Element Code` = col_double(),
  Element = col_character(),
  `Item Code` = col_double(),
  Item = col_character(),
  `Year Code` = col_double(),
  Year = col_double(),
  Unit = col_character(),
  Value = col_double(),
  Flag = col_character(),
  `Flag Description` = col_character()
)
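
As an alternative to calling dim, colnames, and spec separately, the glimpse function prints the row count, the column count, and each column's type and leading values in a single call. A minimal sketch, assuming a small made-up tibble (toy_birds, with invented values) stands in for birds_from_csv:

```r
library(tidyverse)

# Toy stand-in for birds_from_csv; the values are made up for illustration.
toy_birds <- tibble(
  Area  = c("Afghanistan", "Albania", "Algeria"),
  Item  = c("Chickens", "Ducks", "Turkeys"),
  Year  = c(1961, 1961, 1962),
  Value = c(10, 20, 30)
)

# One call shows dimensions, column names, types, and the first values.
glimpse(toy_birds)
```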

To get a better sense of what the data in these columns look like, we can print the first 6 rows using the head function.

Code
head(birds_from_csv)
# A tibble: 6 × 14
  `Domain Code` Domain      `Area Code` Area  `Element Code` Element `Item Code`
  <chr>         <chr>             <dbl> <chr>          <dbl> <chr>         <dbl>
1 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
2 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
3 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
4 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
5 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
6 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>

All of the first 6 rows come from the Area Afghanistan. Using the distinct and select functions, let's see the full list of Areas in the data.

Code
distinct(select(birds_from_csv, "Area"))
# A tibble: 248 × 1
   Area               
   <chr>              
 1 Afghanistan        
 2 Albania            
 3 Algeria            
 4 American Samoa     
 5 Angola             
 6 Antigua and Barbuda
 7 Argentina          
 8 Armenia            
 9 Aruba              
10 Australia          
# ℹ 238 more rows
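
The same result can be written without select, because distinct accepts column names directly. When only the number of unique values matters, n_distinct returns it as a single number. A minimal sketch on a made-up tibble (toy_birds and its rows are hypothetical):

```r
library(tidyverse)

# Toy stand-in for birds_from_csv; the rows are made up for illustration.
toy_birds <- tibble(
  Area = c("Afghanistan", "Afghanistan", "Albania"),
  Year = c(1961, 1962, 1961)
)

# distinct() takes the column name itself, so select() is not required.
areas <- toy_birds %>% distinct(Area)

# n_distinct() returns only the count of unique values.
n_areas <- n_distinct(toy_birds$Area)

areas
n_areas
```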

The list of Areas is extensive (248 distinct values), which suggests that the data was collected from all around the world. Let's run distinct on a few more columns to get a better understanding of the data.

Code
distinct(select(birds_from_csv, "Item"))
# A tibble: 5 × 1
  Item                  
  <chr>                 
1 Chickens              
2 Ducks                 
3 Geese and guinea fowls
4 Turkeys               
5 Pigeons, other birds  
Code
distinct(select(birds_from_csv, "Year"))
# A tibble: 58 × 1
    Year
   <dbl>
 1  1961
 2  1962
 3  1963
 4  1964
 5  1965
 6  1966
 7  1967
 8  1968
 9  1969
10  1970
# ℹ 48 more rows
Code
distinct(select(birds_from_csv, "Element"))
# A tibble: 1 × 1
  Element
  <chr>  
1 Stocks 
Code
distinct(select(birds_from_csv, "Unit"))
# A tibble: 1 × 1
  Unit     
  <chr>    
1 1000 Head
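
The separate distinct calls above can also be condensed into one summary. The sketch below counts the unique values in several columns at once with summarise and across, and uses range to find the first and last year. It runs on a made-up tibble (toy_birds and its rows are hypothetical), so the numbers it produces are not those of the real data.

```r
library(tidyverse)

# Toy stand-in for birds_from_csv; the rows are made up for illustration.
toy_birds <- tibble(
  Area = c("Afghanistan", "Afghanistan", "Albania"),
  Item = c("Chickens", "Ducks", "Chickens"),
  Year = c(1961, 1962, 1961)
)

# Count the unique values in each listed column in a single call.
unique_counts <- toy_birds %>%
  summarise(across(c(Area, Item, Year), n_distinct))

# Earliest and latest year present in the data.
year_range <- range(toy_birds$Year)

unique_counts
year_range
```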

From these results we can see that the data set records bird stocks, in units of 1,000 head, for five categories of birds (chickens; ducks; geese and guinea fowls; turkeys; and pigeons and other birds) in areas all around the world from 1961 to 2018. Each record gives the stock for one bird type in one area in one year.
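
One way to check that each record really corresponds to a single combination of area, bird type, and year is to count the rows per combination and look for any count above one. This is a sketch of that check on a made-up tibble (toy_birds and its rows are hypothetical); on the real data the same pipeline would be applied to birds_from_csv.

```r
library(tidyverse)

# Toy stand-in for birds_from_csv; the rows are made up for illustration.
toy_birds <- tibble(
  Area  = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  Item  = c("Chickens", "Chickens", "Chickens", "Ducks"),
  Year  = c(1961, 1962, 1961, 1961),
  Value = c(10, 11, 20, 5)
)

# Count rows per (Area, Item, Year) and keep combinations that repeat.
duplicates <- toy_birds %>%
  count(Area, Item, Year) %>%
  filter(n > 1)

# Zero rows here means every combination appears exactly once.
nrow(duplicates)
```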