challenge_1
Author

Lai Wei

Published

August 22, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xlsx ⭐⭐⭐⭐

Find the _data folder inside the posts folder, then read in the data using either one of the standard readr read commands or a specialized package such as readxl.

Code
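# readxl is only needed for the .xlsx files; read_csv() comes from readr,
# which is already attached by library(tidyverse)
library(readxl)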
FAOstat <- read_csv("_data/FAOSTAT_country_groups.csv")
FAOstat
# A tibble: 1,943 × 7
   `Country Group Code` `Country Group` Countr…¹ Country M49 C…² ISO2 …³ ISO3 …⁴
                  <dbl> <chr>              <dbl> <chr>   <chr>   <chr>   <chr>  
 1                 5100 Africa                 4 Algeria 012     DZ      DZA    
 2                 5100 Africa                 7 Angola  024     AO      AGO    
 3                 5100 Africa                53 Benin   204     BJ      BEN    
 4                 5100 Africa                20 Botswa… 072     BW      BWA    
 5                 5100 Africa               233 Burkin… 854     BF      BFA    
 6                 5100 Africa                29 Burundi 108     BI      BDI    
 7                 5100 Africa                35 Cabo V… 132     CV      CPV    
 8                 5100 Africa                32 Camero… 120     CM      CMR    
 9                 5100 Africa                37 Centra… 140     CF      CAF    
10                 5100 Africa                39 Chad    148     TD      TCD    
# … with 1,933 more rows, and abbreviated variable names ¹​`Country Code`,
#   ²​`M49 Code`, ³​`ISO2 Code`, ⁴​`ISO3 Code`
# ℹ Use `print(n = ...)` to see more rows
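
The printed column types show `M49 Code` read as character, which is what keeps leading zeros such as `012`. The following is a minimal sketch, not part of the original post, of pinning the column types explicitly with `col_types` so the result does not depend on readr's type guessing. The inline CSV text and the name `faostat_sample` are assumptions for illustration: the three rows are copied from the output above so the sketch runs without the _data folder.

```r
library(readr)

# Three-row excerpt of the printed output above, inlined as literal CSV text.
# The real file has 1,943 rows; this sample is for illustration only.
csv_text <- paste(
  "Country Group Code,Country Group,Country Code,Country,M49 Code,ISO2 Code,ISO3 Code",
  "5100,Africa,4,Algeria,012,DZ,DZA",
  "5100,Africa,7,Angola,024,AO,AGO",
  "5100,Africa,53,Benin,204,BJ,BEN",
  sep = "\n"
)

# I() marks the string as literal data rather than a file path.
# The two numeric code columns are declared as doubles; every other column,
# including `M49 Code`, is read as character so leading zeros survive.
faostat_sample <- read_csv(
  I(csv_text),
  col_types = cols(
    `Country Group Code` = col_double(),
    `Country Code` = col_double(),
    .default = col_character()
  )
)

faostat_sample$`M49 Code`
```

The same `col_types` argument could be passed to the `read_csv()` call on the full file, assuming its header matches the column names printed above.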

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.

Describe the data

Using a combination of words and results of R commands, can you provide a high-level description of the data? Describe as efficiently as possible where and how the data was (likely) gathered, and indicate the cases and variables (both their interpretation and any details you deem useful for the reader to fully understand your chosen data).

Code
# read_csv() already returns a tibble, so as_tibble() leaves FAOstat unchanged
as_tibble(FAOstat)
# A tibble: 1,943 × 7
   `Country Group Code` `Country Group` Countr…¹ Country M49 C…² ISO2 …³ ISO3 …⁴
                  <dbl> <chr>              <dbl> <chr>   <chr>   <chr>   <chr>  
 1                 5100 Africa                 4 Algeria 012     DZ      DZA    
 2                 5100 Africa                 7 Angola  024     AO      AGO    
 3                 5100 Africa                53 Benin   204     BJ      BEN    
 4                 5100 Africa                20 Botswa… 072     BW      BWA    
 5                 5100 Africa               233 Burkin… 854     BF      BFA    
 6                 5100 Africa                29 Burundi 108     BI      BDI    
 7                 5100 Africa                35 Cabo V… 132     CV      CPV    
 8                 5100 Africa                32 Camero… 120     CM      CMR    
 9                 5100 Africa                37 Centra… 140     CF      CAF    
10                 5100 Africa                39 Chad    148     TD      TCD    
# … with 1,933 more rows, and abbreviated variable names ¹​`Country Code`,
#   ²​`M49 Code`, ³​`ISO2 Code`, ⁴​`ISO3 Code`
# ℹ Use `print(n = ...)` to see more rows
Code
# display the 5th row
FAOstat[5, ]
# A tibble: 1 × 7
  `Country Group Code` `Country Group` Country…¹ Country M49 C…² ISO2 …³ ISO3 …⁴
                 <dbl> <chr>               <dbl> <chr>   <chr>   <chr>   <chr>  
1                 5100 Africa                233 Burkin… 854     BF      BFA    
# … with abbreviated variable names ¹​`Country Code`, ²​`M49 Code`, ³​`ISO2 Code`,
#   ⁴​`ISO3 Code`
Code
#preview the first six rows of data
head(FAOstat)
# A tibble: 6 × 7
  `Country Group Code` `Country Group` Country…¹ Country M49 C…² ISO2 …³ ISO3 …⁴
                 <dbl> <chr>               <dbl> <chr>   <chr>   <chr>   <chr>  
1                 5100 Africa                  4 Algeria 012     DZ      DZA    
2                 5100 Africa                  7 Angola  024     AO      AGO    
3                 5100 Africa                 53 Benin   204     BJ      BEN    
4                 5100 Africa                 20 Botswa… 072     BW      BWA    
5                 5100 Africa                233 Burkin… 854     BF      BFA    
6                 5100 Africa                 29 Burundi 108     BI      BDI    
# … with abbreviated variable names ¹​`Country Code`, ²​`M49 Code`, ³​`ISO2 Code`,
#   ⁴​`ISO3 Code`
Code
# get the dimensions (rows, columns) of FAOstat
dim(FAOstat)
[1] 1943    7
Code
# get the column names of FAOstat
colnames(FAOstat)
[1] "Country Group Code" "Country Group"      "Country Code"      
[4] "Country"            "M49 Code"           "ISO2 Code"         
[7] "ISO3 Code"         
Code
# select the Country column from FAOstat
select(FAOstat, Country)
# A tibble: 1,943 × 1
   Country                 
   <chr>                   
 1 Algeria                 
 2 Angola                  
 3 Benin                   
 4 Botswana                
 5 Burkina Faso            
 6 Burundi                 
 7 Cabo Verde              
 8 Cameroon                
 9 Central African Republic
10 Chad                    
# … with 1,933 more rows
# ℹ Use `print(n = ...)` to see more rows
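
The commands above show the shape and column names of the data, but not how the rows are distributed. The following is a rough sketch, not part of the original post, of two summaries that could support a written description: the number of rows per country group and the number of distinct countries. It assumes each row pairs one country with one group it belongs to, which is an inference from the column names and the row count rather than something the output states. The tibble `faostat_sample` is a hypothetical four-row excerpt copied from the printed rows so the sketch runs on its own.

```r
library(dplyr)
library(tibble)

# Four rows copied from the printed output above (illustration only).
faostat_sample <- tribble(
  ~`Country Group Code`, ~`Country Group`, ~`Country Code`, ~Country,   ~`M49 Code`, ~`ISO2 Code`, ~`ISO3 Code`,
  5100,                  "Africa",         4,               "Algeria",  "012",       "DZ",         "DZA",
  5100,                  "Africa",         7,               "Angola",   "024",       "AO",         "AGO",
  5100,                  "Africa",         53,              "Benin",    "204",       "BJ",         "BEN",
  5100,                  "Africa",         20,              "Botswana", "072",       "BW",         "BWA"
)

# Rows per country group, largest group first.
group_sizes <- faostat_sample %>%
  count(`Country Group`, name = "n_countries", sort = TRUE)

# Number of distinct countries across all groups.
n_unique_countries <- n_distinct(faostat_sample$Country)

group_sizes
n_unique_countries
```

Replacing `faostat_sample` with `FAOstat` would run the same summaries on the full data set; comparing the distinct-country count with the 1,943 rows would indicate whether countries appear in more than one group.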