Exploring and Analysing the Birds Dataset

challenge_1
birds
hw1
wildbirds
shantanu patil
Author

Shantanu Patil

Published

February 23, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and supporting information (e.g., tables)

Reading in the bird data

I load the readr package with library(readr) and read the file with base R's read.csv function.

The birds.csv file has 30,977 rows and 14 columns. The head function displays the column headers and the first 3 rows.

Code
library(readr)
birds_data <- read.csv(file = "_data/birds.csv")

head(birds_data, 3)
  Domain.Code       Domain Area.Code        Area Element.Code Element Item.Code
1          QA Live Animals         2 Afghanistan         5112  Stocks      1057
2          QA Live Animals         2 Afghanistan         5112  Stocks      1057
3          QA Live Animals         2 Afghanistan         5112  Stocks      1057
      Item Year.Code Year      Unit Value Flag Flag.Description
1 Chickens      1961 1961 1000 Head  4700    F     FAO estimate
2 Chickens      1962 1962 1000 Head  4900    F     FAO estimate
3 Chickens      1963 1963 1000 Head  5000    F     FAO estimate
Code
# skip = 1 skips the header line, so read_csv uses the first data row as column names
bird_data2 <- read_csv(file = "_data/birds.csv", skip=1)
head(bird_data2, 1)
# A tibble: 1 × 14
  QA    Live Animal…¹   `2` Afgha…² `5112` Stocks `1057` Chick…³ 1961.…⁴ 1961.…⁵
  <chr> <chr>         <dbl> <chr>    <dbl> <chr>   <dbl> <chr>     <dbl>   <dbl>
1 QA    Live Animals      2 Afghan…   5112 Stocks   1057 Chicke…    1962    1962
# … with 4 more variables: `1000 Head` <chr>, `4700` <dbl>, F <chr>,
#   `FAO estimate` <chr>, and abbreviated variable names ¹​`Live Animals`,
#   ²​Afghanistan, ³​Chickens, ⁴​`1961...9`, ⁵​`1961...10`
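
The output above shows the side effect of skip = 1: the header line is dropped and the first data row is promoted to column names. If the goal is a data set containing only the numeric columns, one option is to keep the header and select columns by type. The sketch below is a minimal illustration under stated assumptions: it writes a cut-down stand-in file (four columns and the three rows printed earlier) to a temporary path so that it runs without _data/birds.csv, and the names tmp, skipped, full and numeric_only are hypothetical.

```r
library(readr)
library(dplyr)

# Cut-down stand-in for birds.csv: four of the columns and the first
# three rows shown earlier, written to a temporary file (an assumption
# made so the example runs without the real data).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Area,Item,Year,Value",
  "Afghanistan,Chickens,1961,4700",
  "Afghanistan,Chickens,1962,4900",
  "Afghanistan,Chickens,1963,5000"
), tmp)

# skip = 1 drops the header line, so the first data row becomes the
# column names and only two data rows remain.
skipped <- read_csv(tmp, skip = 1, show_col_types = FALSE)
names(skipped)

# Reading without skip keeps the real header; numeric columns can then
# be selected by type.
full <- read_csv(tmp, show_col_types = FALSE)
numeric_only <- select(full, where(is.numeric))
names(numeric_only)
```

On the stand-in file, numeric_only keeps Year and Value; applied to the full file, the same select call would presumably keep the code columns as well.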

Describe the data

The str output shows that the bird data has 14 columns: 8 are of character type and the remaining 6 are of integer type. The colnames function lists the column names. The data records Domain, Area, Element, Item, Year, Unit, Value, Flag and Flag.Description, with accompanying code columns for Domain, Area, Element, Item and Year.

Code
str(birds_data)
'data.frame':   30977 obs. of  14 variables:
 $ Domain.Code     : chr  "QA" "QA" "QA" "QA" ...
 $ Domain          : chr  "Live Animals" "Live Animals" "Live Animals" "Live Animals" ...
 $ Area.Code       : int  2 2 2 2 2 2 2 2 2 2 ...
 $ Area            : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ Element.Code    : int  5112 5112 5112 5112 5112 5112 5112 5112 5112 5112 ...
 $ Element         : chr  "Stocks" "Stocks" "Stocks" "Stocks" ...
 $ Item.Code       : int  1057 1057 1057 1057 1057 1057 1057 1057 1057 1057 ...
 $ Item            : chr  "Chickens" "Chickens" "Chickens" "Chickens" ...
 $ Year.Code       : int  1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
 $ Year            : int  1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
 $ Unit            : chr  "1000 Head" "1000 Head" "1000 Head" "1000 Head" ...
 $ Value           : int  4700 4900 5000 5300 5500 5800 6600 6290 6300 6000 ...
 $ Flag            : chr  "F" "F" "F" "F" ...
 $ Flag.Description: chr  "FAO estimate" "FAO estimate" "FAO estimate" "FAO estimate" ...
Code
colnames(birds_data)
 [1] "Domain.Code"      "Domain"           "Area.Code"        "Area"            
 [5] "Element.Code"     "Element"          "Item.Code"        "Item"            
 [9] "Year.Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag.Description"
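
The split between character and integer columns can also be tallied in code rather than counted by eye from the str output. The sketch below is a minimal illustration: first_rows is a hypothetical stand-in rebuilt by hand from the three rows printed by head earlier, so the example runs without the CSV file. The same sapply and table calls should work unchanged on birds_data.

```r
# Stand-in for birds_data: the three rows printed by head() earlier,
# typed in by hand (an assumption so the example needs no CSV file).
first_rows <- data.frame(
  Domain.Code = "QA",
  Domain = "Live Animals",
  Area.Code = 2L,
  Area = "Afghanistan",
  Element.Code = 5112L,
  Element = "Stocks",
  Item.Code = 1057L,
  Item = "Chickens",
  Year.Code = 1961:1963,
  Year = 1961:1963,
  Unit = "1000 Head",
  Value = c(4700L, 4900L, 5000L),
  Flag = "F",
  Flag.Description = "FAO estimate",
  stringsAsFactors = FALSE
)

# Class of each column, then a count of columns per class.
col_types <- sapply(first_rows, class)
type_counts <- table(col_types)
type_counts

# Names of the integer columns only.
int_cols <- names(col_types)[col_types == "integer"]
int_cols
```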

Finding the start and end years of the data

The data covers the years 1961 to 2018.

Code
max(birds_data$Year)
[1] 2018
Code
min(birds_data$Year)
[1] 1961
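
Both endpoints can be obtained in a single call with range, which returns the minimum followed by the maximum. The sketch below uses a hypothetical years vector holding only the three years from the first rows shown earlier, so it runs on its own. With birds_data$Year in its place, the result should match the 1961 and 2018 reported above.

```r
# Stand-in for birds_data$Year: the three years shown in the first rows.
years <- c(1961L, 1962L, 1963L)

# range() returns the minimum first, then the maximum.
year_range <- range(years)
year_range

# Number of calendar years covered, counting both endpoints.
n_years <- diff(year_range) + 1L
n_years
```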