Challenge 2

challenge_2

railroads

faostat

hotel_bookings

Author

Jack Sniezek

Published

November 29, 2022

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set and describe the data using both words and supporting information (e.g., tables)
provide summary statistics for different interesting groups within the data, and interpret those statistics

Read in the Data

Read in one (or more) of the following data sets, available in the posts/_data folder, using the correct R package and command.

railroad*.csv or StateCounty2012.xls ⭐
FAOstat*.csv or birds.csv ⭐⭐⭐
hotel_bookings.csv ⭐⭐⭐⭐

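# Read the FAO birds data and drop columns that add no information:
# every "Code" column plus Element, Domain, and Unit.
birds <- read_csv("_data/birds.csv") %>%
  select(-c(contains("Code"), Element, Domain, Unit))
birds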

# A tibble: 30,977 × 6
   Area        Item      Year Value Flag  `Flag Description`
   <chr>       <chr>    <dbl> <dbl> <chr> <chr>             
 1 Afghanistan Chickens  1961  4700 F     FAO estimate      
 2 Afghanistan Chickens  1962  4900 F     FAO estimate      
 3 Afghanistan Chickens  1963  5000 F     FAO estimate      
 4 Afghanistan Chickens  1964  5300 F     FAO estimate      
 5 Afghanistan Chickens  1965  5500 F     FAO estimate      
 6 Afghanistan Chickens  1966  5800 F     FAO estimate      
 7 Afghanistan Chickens  1967  6600 F     FAO estimate      
 8 Afghanistan Chickens  1968  6290 <NA>  Official data     
 9 Afghanistan Chickens  1969  6300 F     FAO estimate      
10 Afghanistan Chickens  1970  6000 F     FAO estimate      
# … with 30,967 more rows

summary(birds)

     Area               Item                Year          Value         
 Length:30977       Length:30977       Min.   :1961   Min.   :       0  
 Class :character   Class :character   1st Qu.:1976   1st Qu.:     171  
 Mode  :character   Mode  :character   Median :1992   Median :    1800  
                                       Mean   :1991   Mean   :   99411  
                                       3rd Qu.:2005   3rd Qu.:   15404  
                                       Max.   :2018   Max.   :23707134  
                                                      NA's   :1036      
     Flag           Flag Description  
 Length:30977       Length:30977      
 Class :character   Class :character  
 Mode  :character   Mode  :character

Describe the data

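The raw birds dataset contains 14 variables: 8 character and 6 numeric. After dropping the columns described below, 6 remain. The data comes from the Food and Agriculture Organization of the United Nations (FAO). It features stock estimates for five types of bird (Chickens, Ducks, Geese and guinea fowls, Turkeys, and Pigeons/other birds) across 248 areas, which include both countries and larger regions such as the Americas. The data covers the years 1961 to 2018, and each value carries a flag describing its source, such as an FAO estimate or official data.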

When reading in the data, I chose to omit Element, Domain, and Unit because they are the same for every data point. I also dropped all of the “Code” variables, as they are either redundant or not useful to work with.
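To double-check that choice, here is a minimal sketch of how one could confirm which columns are constant and which ones `contains("Code")` matches. It runs on a small toy tibble rather than the real file; the column names `Domain Code` and `Area Code` and all cell values are assumptions made up for illustration.

```r
library(dplyr)

# Toy stand-in for the raw file. Column names and values are assumed
# placeholders for illustration, not copied from birds.csv.
raw_toy <- tibble(
  `Domain Code` = rep("QA", 4),
  Domain        = rep("Live Animals", 4),
  `Area Code`   = c(2, 2, 3, 3),
  Area          = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  Element       = rep("Stocks", 4),
  Item          = c("Chickens", "Ducks", "Chickens", "Ducks"),
  Year          = c(1961, 1962, 1961, 1962),
  Unit          = rep("1000 Head", 4),
  Value         = c(4700, 4900, 1580, NA)
)

# Count distinct values per column; a count of 1 marks a constant column.
distinct_counts <- raw_toy %>%
  summarise(across(everything(), n_distinct))
constant_cols <- names(distinct_counts)[unlist(distinct_counts) == 1]

# Columns that the contains("Code") helper would match.
code_cols <- names(select(raw_toy, contains("Code")))

constant_cols
code_cols
```

Running the same two checks on the full raw file would show whether Element, Domain, and Unit really hold a single value each before they are dropped.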

# Distinct Area values (one row per area)
areas <- distinct(birds, Area)
areas

# A tibble: 248 × 1
   Area               
   <chr>              
 1 Afghanistan        
 2 Albania            
 3 Algeria            
 4 American Samoa     
 5 Angola             
 6 Antigua and Barbuda
 7 Argentina          
 8 Armenia            
 9 Aruba              
10 Australia          
# … with 238 more rows

# Distinct Item values (one row per bird type)
items <- distinct(birds, Item)
items

# A tibble: 5 × 1
  Item                  
  <chr>                 
1 Chickens              
2 Ducks                 
3 Geese and guinea fowls
4 Turkeys               
5 Pigeons, other birds
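The counts quoted in the description (number of areas, number of bird types, and the year range) can also be computed in one step instead of being read off printed tibbles. A small sketch on toy rows, with made-up values; pointing it at `birds` instead of `toy` should give the real counts.

```r
library(dplyr)

# Toy rows for illustration only (not the real data).
toy <- tibble(
  Area = c("Afghanistan", "Afghanistan", "Albania", "Albania", "Albania"),
  Item = c("Chickens", "Chickens", "Chickens", "Ducks", "Ducks"),
  Year = c(1961, 1962, 1961, 1961, 1962)
)

# One-row overview: distinct areas, distinct items, and the year range.
overview <- toy %>%
  summarise(
    n_areas    = n_distinct(Area),
    n_items    = n_distinct(Item),
    first_year = min(Year),
    last_year  = max(Year)
  )
overview
```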

Provide Grouped Summary Statistics

I started my analysis of the birds dataset by taking a look at the average and median stock values by year.

birds %>%
  group_by(Year) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE),
            med_stocks = median(Value, na.rm = TRUE))

# A tibble: 58 × 3
    Year avg_stocks med_stocks
   <dbl>      <dbl>      <dbl>
 1  1961     36752.      1033 
 2  1962     37787.      1014 
 3  1963     38736.      1106 
 4  1964     39325.      1103 
 5  1965     40334.      1104 
 6  1966     41229.      1088.
 7  1967     43240.      1193 
 8  1968     44420.      1252.
 9  1969     45607.      1267 
10  1970     47706.      1259 
# … with 48 more rows
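In the yearly summary the mean sits far above the median (36,752 versus 1,033 in 1961), and `na.rm = TRUE` drops missing values without saying how many. The sketch below uses invented toy values to show both effects: one very large value pulling the mean up, and an extra `n_missing` column (a name I made up) that counts the dropped rows.

```r
library(dplyr)

# Toy values (not from birds.csv): one very large entry per year
# and one missing Value in 1962.
toy <- tibble(
  Year  = c(1961, 1961, 1961, 1961, 1962, 1962, 1962, 1962),
  Value = c(100, 200, 300, 100000, 150, NA, 350, 120000)
)

yearly <- toy %>%
  group_by(Year) %>%
  summarise(
    avg_stocks = mean(Value, na.rm = TRUE),
    med_stocks = median(Value, na.rm = TRUE),
    n_missing  = sum(is.na(Value))  # rows that na.rm = TRUE leaves out
  )
yearly
```

In the toy data the single large value lifts the 1961 mean to 25,150 while the median stays at 250, which is the same pattern the yearly summary shows.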

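While the yearly summary was helpful in showing a general trend over the 58 years, it was very basic. The mean is also far larger than the median in every year shown (for example, 36,752 versus 1,033 in 1961), which suggests that a small number of very large values pull the average up. The next step I took was to compute the average for each Item (type of bird) in each year. I dropped the median because I felt that focusing on the average would provide more information.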

# Mean stock per Item and Year, reshaped to one column per Year
t1 <- birds %>%
  group_by(Item, Year) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE)) %>%
  pivot_wider(names_from = Year, values_from = avg_stocks)
t1

# A tibble: 5 × 59
# Groups:   Item [5]
  Item     `1961` `1962` `1963` `1964` `1965` `1966` `1967` `1968` `1969` `1970`
  <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Chickens 74060. 76753. 78922. 80213. 82458. 83880. 88047. 91003. 94121. 98297.
2 Ducks     7232.  7520.  7861.  8082.  8329.  8592.  8861.  9166.  9257.  9493.
3 Geese a…  2364.  2435.  2483.  2641.  2808.  2870.  3099.  3245.  3331.  3465.
4 Pigeons…  3307.  3771.  4004.  4227.  4440.  4630.  4673.  2840.  2978.  3110.
5 Turkeys  10610.  9043   8377.  7987.  7938.  8546.  8931.  7959.  7998.  9062.
# … with 48 more variables: `1971` <dbl>, `1972` <dbl>, `1973` <dbl>,
#   `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>,
#   `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>,
#   `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>,
#   `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>,
#   `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>,
#   `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, …
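The table above still prints `# Groups: Item [5]` because `summarise()` only peels off the last grouping variable. As a hedged sketch of the same group, summarise, and pivot pattern, here it is on a toy tibble (made-up values, with Areas named `A` and `B` purely for illustration), using `.groups = "drop"` so the result comes back ungrouped.

```r
library(dplyr)
library(tidyr)

# Toy data: two areas, two items, two years, one missing Value.
toy <- tibble(
  Area  = rep(c("A", "B"), each = 4),
  Item  = rep(c("Chickens", "Chickens", "Ducks", "Ducks"), times = 2),
  Year  = rep(c(1961, 1962), times = 4),
  Value = c(10, 20, 1, 3, 30, NA, 5, 7)
)

wide <- toy %>%
  group_by(Item, Year) %>%
  # Average across areas; .groups = "drop" removes the leftover Item grouping.
  summarise(avg_stocks = mean(Value, na.rm = TRUE), .groups = "drop") %>%
  # One row per Item, one column per Year.
  pivot_wider(names_from = Year, values_from = avg_stocks)
wide
```

Whether to drop the grouping is a matter of taste here; it mainly matters if more verbs are chained after the pivot.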

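Finally, I wanted to focus on a single Area, so I filtered the data to ‘Americas’. It has some of the largest values in the dataset, so the table prints in scientific notation, which is harder to read. However, its series is very complete, so it works well as a focused example. Because the filter leaves a single Area, each Item and Year group should contain only one row, so the mean here is presumably just the reported value.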

# Same summary as t1, restricted to the Americas
t2 <- birds %>%
  filter(Area == "Americas") %>%
  group_by(Item, Year) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE)) %>%
  pivot_wider(names_from = Year, values_from = avg_stocks)
t2

# A tibble: 4 × 59
# Groups:   Item [4]
  Item     `1961` `1962` `1963` `1964` `1965` `1966` `1967` `1968` `1969` `1970`
  <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Chickens 1.19e6 1.22e6 1.24e6 1.29e6 1.33e6 1.37e6 1.43e6 1.47e6 1.53e6 1.56e6
2 Ducks    9.64e3 9.99e3 1.07e4 1.10e4 1.13e4 1.19e4 1.18e4 1.20e4 1.20e4 1.21e4
3 Geese a… 5.53e2 5.61e2 5.95e2 6.07e2 6.18e2 6.43e2 5.95e2 6.23e2 6.59e2 6.65e2
4 Turkeys  1.19e5 1.03e5 1.05e5 1.13e5 1.18e5 1.30e5 1.39e5 1.20e5 1.20e5 1.31e5
# … with 48 more variables: `1971` <dbl>, `1972` <dbl>, `1973` <dbl>,
#   `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>,
#   `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>,
#   `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>,
#   `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>,
#   `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>,
#   `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, …
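The scientific notation in this printout is only a display choice, so one option is to format the numbers as text before showing them. A minimal sketch, assuming a display-only copy is acceptable; the values are rounded from the first two year columns printed above, and `t2_display` is a name I made up.

```r
library(dplyr)

# Values rounded from the printed Americas table (1961 and 1962 columns).
t2_toy <- tibble(
  Item   = c("Chickens", "Ducks"),
  `1961` = c(1190000, 9640),
  `1962` = c(1220000, 9990)
)

# Turn numeric columns into comma-separated strings, for display only.
t2_display <- t2_toy %>%
  mutate(across(
    where(is.numeric),
    ~ format(.x, big.mark = ",", scientific = FALSE, trim = TRUE)
  ))
t2_display
```

The formatted columns are character vectors, so this belongs at the very end of a pipeline, after all calculations are done.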

Explain and Interpret

Looking at my initial analysis of average stock values by year, I can see that stocks increase over time. When I split the averages by bird type, I could see that Chickens, Ducks, and Geese increased steadily almost every year before plateauing in the 2010s. Pigeons peaked in the 1990s and have leveled off since. Turkeys have hovered around the same level since 1980. When I narrowed the data down to just the Americas, I noticed that there are no rows for pigeons. Chickens grew steadily each year. Ducks and Turkeys plateaued around 1990. Geese peaked in 1988-1989, dropped significantly, and then leveled off.
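Observations like a peak year or a bird type missing from one Area can be checked in code rather than by scanning the wide tables. A minimal sketch on invented toy data; `Other area` is a hypothetical Area name and none of the values come from the real file.

```r
library(dplyr)

# Toy long-format series (values invented for illustration).
toy <- tibble(
  Area  = c(rep("Americas", 6), rep("Other area", 3)),
  Item  = c(rep("Ducks", 3), rep("Turkeys", 3),
            rep("Pigeons, other birds", 3)),
  Year  = rep(c(1988, 1989, 1990), times = 3),
  Value = c(10, 14, 12, 50, 40, 45, 7, 9, 8)
)

# Year with the largest Value for each Item.
peaks <- toy %>%
  group_by(Item) %>%
  slice_max(Value, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Item, Year, Value)

# Items that appear somewhere in the data but never in the Americas.
missing_in_americas <- setdiff(
  unique(toy$Item),
  unique(toy$Item[toy$Area == "Americas"])
)

peaks
missing_in_americas
```

On the real data the peak check would run on the per-Item yearly averages (before pivoting wider), since `slice_max()` works on rows rather than on year columns.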