challenge_1
Jaswanth Reddy Kommuru
birds
Reading in data and creating a post
Author

Jaswanth Reddy Kommuru

Published

May 8, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

Code
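 # read_csv() comes from readr (loaded with tidyverse); readxl is only needed for the Excel files
 library(readxl)
 birds <- read_csv("~/Documents/601/601_Spring_2023/posts/_data/birds.csv")

 # Second read: skip the first 4 lines of the file before parsing
 birds_skip <- read_csv("~/Documents/601/601_Spring_2023/posts/_data/birds.csv", skip = 4)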

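I read the CSV file twice: first in full, and then again with skip = 4, which skips the first four lines of the file. Because read_csv() skips lines before it looks for a header, the second read drops the original column names and treats the fifth line as the header. The rest of this post therefore uses birds, the full read.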
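To make the effect of skip concrete, here is a minimal sketch on a small made-up CSV written to a temporary file. The file contents, the tmp path, and the lower-case column names are stand-ins for illustration, not the real birds.csv.

```r
library(readr)

# Made-up stand-in for birds.csv: one header line plus four data lines
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Area,Year,Value",
  "Afghanistan,1961,4700",
  "Afghanistan,1962,4900",
  "Afghanistan,1963,5000",
  "Afghanistan,1964,5300"
), tmp)

# Full read: the first line supplies the column names
full <- read_csv(tmp, show_col_types = FALSE)

# skip = 2 drops the header and the first data line,
# so the 1962 line is promoted to column names
skipped <- read_csv(tmp, skip = 2, show_col_types = FALSE)

# Skipping only the header and supplying names keeps every data row
renamed <- read_csv(tmp, skip = 1,
                    col_names = c("area", "year", "value"),
                    show_col_types = FALSE)

names(full)
names(skipped)
dim(renamed)
```

If the goal of skipping is to replace the header, the last pattern (skip = 1 together with col_names) is usually the safer choice, since no data rows are lost.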

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

The data describe live animal stocks, recorded by area, item, and year. Each row appears to be one observation for a given area, item (for example, chickens), and year, with the count in Value and a flag describing how the figure was obtained. Flag descriptions such as FAO estimate and Official data suggest that the figures come from the Food and Agriculture Organization of the United Nations, which combines official national reports with its own estimates.

Code
 head(birds)
# A tibble: 6 × 14
  `Domain Code` Domain      `Area Code` Area  `Element Code` Element `Item Code`
  <chr>         <chr>             <dbl> <chr>          <dbl> <chr>         <dbl>
1 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
2 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
3 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
4 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
5 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
6 QA            Live Anima…           2 Afgh…           5112 Stocks         1057
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>

The first six rows give a sense of what kind of data is available.
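Because head() truncates wide tibbles (seven of the 14 columns are hidden in the output above), glimpse() can be a useful complement: it prints one line per column, so every variable and its type is visible. The sketch below uses a small made-up tibble with only a few of the columns, not the real data.

```r
library(dplyr)
library(tibble)

# Made-up two-row tibble mimicking a few columns of birds
toy <- tibble(
  `Domain Code` = "QA",
  Domain        = "Live Animals",
  `Area Code`   = 2,
  Area          = "Afghanistan",
  Year          = c(1961, 1962),
  Value         = c(4700, 4900)
)

# glimpse() prints every column with its type and first values,
# and returns its input invisibly
shown <- glimpse(toy)
```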

Code
 dim(birds)
[1] 30977    14

The dataset has 30,977 rows and 14 columns.

Code
 colnames(birds)
 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"

The column names of the dataset.

Code
 filter(birds, `Year`==1968 & `Area`=="Africa")
# A tibble: 5 × 14
  `Domain Code` Domain      `Area Code` Area  `Element Code` Element `Item Code`
  <chr>         <chr>             <dbl> <chr>          <dbl> <chr>         <dbl>
1 QA            Live Anima…        5100 Afri…           5112 Stocks         1057
2 QA            Live Anima…        5100 Afri…           5112 Stocks         1068
3 QA            Live Anima…        5100 Afri…           5112 Stocks         1072
4 QA            Live Anima…        5100 Afri…           5112 Stocks         1083
5 QA            Live Anima…        5100 Afri…           5112 Stocks         1079
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>

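Rows for the year 1968 where the area is Africa. Africa is a continent rather than a country, so the Area column mixes individual countries with regional aggregates. The five rows correspond to five different item codes.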
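The same filter can also be written with comma-separated conditions, which filter() combines with AND. The sketch below runs on a small made-up tibble rather than the real data. The cutoff of 5000 on Area Code for spotting regional aggregates is only an assumption, based on Africa having code 5100 in the output above, and should be checked against the full list of areas before it is relied on.

```r
library(dplyr)
library(tibble)

# Made-up rows mimicking the structure of birds
toy <- tribble(
  ~`Area Code`, ~Area,         ~Year, ~Value,
  2,            "Afghanistan", 1968,  6290,
  5100,         "Africa",      1968,  100,
  5100,         "Africa",      1969,  110,
  3,            "Albania",     1968,  1700
)

# Comma-separated conditions are combined with AND
africa_1968 <- filter(toy, Year == 1968, Area == "Africa")

# Assumed heuristic: codes of 5000 and above mark regional aggregates
countries_only <- filter(toy, `Area Code` < 5000)

africa_1968
countries_only
```

Separating aggregates from countries matters for any later totals, since summing over all rows would count the same animals more than once.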

Code
 birds%>%
  select("Area") %>%
  distinct(.)
# A tibble: 248 × 1
   Area               
   <chr>              
 1 Afghanistan        
 2 Albania            
 3 Algeria            
 4 American Samoa     
 5 Angola             
 6 Antigua and Barbuda
 7 Argentina          
 8 Armenia            
 9 Aruba              
10 Australia          
# ℹ 238 more rows

The distinct area names in the dataset. There are 248 of them, including both countries and regional aggregates.

Code
 birds%>%
  select("Area") %>%
  n_distinct(.)
[1] 248

The number of distinct area names.
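As a shorter alternative, n_distinct() can be applied to the column directly, and count() shows how many rows each area contributes. This is a minimal sketch on a made-up tibble, not the real data.

```r
library(dplyr)
library(tibble)

# Made-up rows: three areas, two of them observed in two years
toy <- tibble(
  Area = c("Afghanistan", "Afghanistan", "Albania", "Algeria", "Algeria"),
  Year = c(1961, 1962, 1961, 1961, 1962)
)

# Number of distinct areas, computed directly on the column
n_areas <- n_distinct(toy$Area)

# Rows per area, largest groups first
rows_per_area <- count(toy, Area, sort = TRUE)

n_areas
rows_per_area
```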

Code
 birds.sm<-birds%>%
  select(-contains("Code"))
 birds.sm
# A tibble: 30,977 × 9
   Domain       Area    Element Item   Year Unit  Value Flag  `Flag Description`
   <chr>        <chr>   <chr>   <chr> <dbl> <chr> <dbl> <chr> <chr>             
 1 Live Animals Afghan… Stocks  Chic…  1961 1000…  4700 F     FAO estimate      
 2 Live Animals Afghan… Stocks  Chic…  1962 1000…  4900 F     FAO estimate      
 3 Live Animals Afghan… Stocks  Chic…  1963 1000…  5000 F     FAO estimate      
 4 Live Animals Afghan… Stocks  Chic…  1964 1000…  5300 F     FAO estimate      
 5 Live Animals Afghan… Stocks  Chic…  1965 1000…  5500 F     FAO estimate      
 6 Live Animals Afghan… Stocks  Chic…  1966 1000…  5800 F     FAO estimate      
 7 Live Animals Afghan… Stocks  Chic…  1967 1000…  6600 F     FAO estimate      
 8 Live Animals Afghan… Stocks  Chic…  1968 1000…  6290 <NA>  Official data     
 9 Live Animals Afghan… Stocks  Chic…  1969 1000…  6300 F     FAO estimate      
10 Live Animals Afghan… Stocks  Chic…  1970 1000…  6000 F     FAO estimate      
# ℹ 30,967 more rows

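The columns that do not have the word “Code” in their name. The code columns appear to duplicate the information in their text counterparts, so the remaining nine columns are easier to read and to use for grouping.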
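As an illustration of the grouping this makes possible, the sketch below drops the code columns from a small made-up tibble and then summarises Value per item. The rows, the item name "Ducks", and the summary column names are assumptions for the example, not taken from the real data.

```r
library(dplyr)
library(tibble)

# Made-up rows mimicking a few columns of birds
toy <- tribble(
  ~`Item Code`, ~Item,      ~Year, ~Value,
  1057,         "Chickens", 1961,  4700,
  1057,         "Chickens", 1962,  4900,
  1068,         "Ducks",    1961,  NA,
  1068,         "Ducks",    1962,  30
)

# Drop every column whose name contains "Code"
toy_sm <- select(toy, -contains("Code"))

# One row per item: row count, missing values, and mean of Value
by_item <- toy_sm %>%
  group_by(Item) %>%
  summarise(
    n_rows     = n(),
    n_missing  = sum(is.na(Value)),
    mean_value = mean(Value, na.rm = TRUE)
  )

by_item
```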

Code
 column_with_na<-birds %>%
  select_if(~ any(is.na(.))) %>%
  names()
column_with_na
[1] "Value" "Flag" 

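These are the columns that contain at least one NA value. In Flag, an NA does not necessarily mean that information is missing: in the output above, the 1968 row for Afghanistan has an NA flag with the description Official data.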
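The select_if() call above still works, but dplyr now marks the scoped verbs as superseded in favour of where() and across(). The sketch below shows the equivalent selection and also counts the missing values per column, using a made-up tibble rather than the real data.

```r
library(dplyr)
library(tibble)

# Made-up rows with missing entries in Value and Flag
toy <- tibble(
  Area  = c("Afghanistan", "Albania", "Algeria"),
  Value = c(4700, NA, 5000),
  Flag  = c("F", NA, NA)
)

# where() selects columns for which the predicate returns TRUE
cols_with_na <- toy %>%
  select(where(~ any(is.na(.x)))) %>%
  names()

# Number of missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

cols_with_na
na_counts
```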

Code
 summary(birds)
 Domain Code           Domain            Area Code        Area          
 Length:30977       Length:30977       Min.   :   1   Length:30977      
 Class :character   Class :character   1st Qu.:  79   Class :character  
 Mode  :character   Mode  :character   Median : 156   Mode  :character  
                                       Mean   :1202                     
                                       3rd Qu.: 231                     
                                       Max.   :5504                     
                                                                        
  Element Code    Element            Item Code        Item          
 Min.   :5112   Length:30977       Min.   :1057   Length:30977      
 1st Qu.:5112   Class :character   1st Qu.:1057   Class :character  
 Median :5112   Mode  :character   Median :1068   Mode  :character  
 Mean   :5112                      Mean   :1066                     
 3rd Qu.:5112                      3rd Qu.:1072                     
 Max.   :5112                      Max.   :1083                     
                                                                    
   Year Code         Year          Unit               Value         
 Min.   :1961   Min.   :1961   Length:30977       Min.   :       0  
 1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:     171  
 Median :1992   Median :1992   Mode  :character   Median :    1800  
 Mean   :1991   Mean   :1991                      Mean   :   99411  
 3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   15404  
 Max.   :2018   Max.   :2018                      Max.   :23707134  
                                                  NA's   :1036      
     Flag           Flag Description  
 Length:30977       Length:30977      
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
                                      

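A brief summary of the dataset. The years run from 1961 to 2018, Element Code is always 5112, and Value has 1,036 missing entries. Value is strongly right-skewed: the median is 1,800, while the mean is 99,411 and the maximum is 23,707,134, probably because aggregate areas such as Africa sit alongside individual countries.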
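For character columns, summary() reports only length and class, so tabulating the values with count() is usually more informative. The sketch below does this for Flag Description and computes a few numeric summaries by hand, on a made-up tibble; the summary column names are arbitrary.

```r
library(dplyr)
library(tibble)

# Made-up rows mimicking a few columns of birds
toy <- tibble(
  Year  = c(1961, 1962, 1968, 1969),
  Value = c(4700, 4900, 6290, NA),
  `Flag Description` = c("FAO estimate", "FAO estimate",
                         "Official data", "FAO estimate")
)

# Frequency table for a categorical column, most common value first
flag_counts <- count(toy, `Flag Description`, sort = TRUE)

# Year range and median of Value, ignoring missing values
value_summary <- toy %>%
  summarise(
    first_year   = min(Year),
    last_year    = max(Year),
    median_value = median(Value, na.rm = TRUE)
  )

flag_counts
value_summary
```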