Challenge 2 by Jinxia Niu

Jinxia Niu

March 14, 2023

Challenge Overview

Today’s challenge is to

  1. read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
  2. provide summary statistics for different interesting groups within the data, and interpret those statistics

Read in the Data

Read in one (or more) of the following data sets, available in the posts/_data folder, using the correct R package and command.

  • railroad*.csv or StateCounty2012.xls ⭐
  • FAOstat*.csv or birds.csv ⭐⭐⭐
  • hotel_bookings.csv ⭐⭐⭐⭐
Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.1.8
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
birds <- read_csv("_data/birds.csv")
Rows: 30977 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Describe the data

birds.csv includes estimates of the stock of five different types of poultry (Chickens, Ducks, Geese and guinea fowls, Turkeys, and Pigeons/Others) for 248 areas over the 58 years from 1961 to 2018. Estimated stocks are given in units of 1000 head.
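
Counts like these can be checked directly from the data. The following is a minimal sketch rather than the code used for this post: it builds a small made-up tibble called `toy` (a hypothetical stand-in for `birds`; only the column names `Area`, `Item`, `Year`, and `Value` come from the real file) and counts the distinct areas, poultry types, and years.

```r
library(dplyr)

# Hypothetical stand-in for `birds`; the values are invented for illustration.
toy <- tibble(
  Area  = c("A", "A", "B", "B", "B"),
  Item  = c("Chickens", "Ducks", "Chickens", "Chickens", "Turkeys"),
  Year  = c(1961, 1961, 1961, 1962, 1962),
  Value = c(10, 2, 30, NA, 1)
)

# Count distinct areas, poultry types, and years, plus the year range.
overview <- toy %>%
  summarise(
    n_areas    = n_distinct(Area),
    n_items    = n_distinct(Item),
    n_years    = n_distinct(Year),
    first_year = min(Year),
    last_year  = max(Year)
  )

overview
```

Replacing `toy` with `birds` should give the corresponding counts for the real file.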

Provide Grouped Summary Statistics

Conduct some exploratory data analysis, using dplyr commands such as group_by(), select(), filter(), and summarise(). Find the central tendency (mean, median, mode) and dispersion (standard deviation, min/max/quantile) for different subgroups within the data set.

birds %>%
  group_by(Item) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE))
# A tibble: 5 × 2
  Item                   avg_stocks
  <chr>                       <dbl>
1 Chickens                   58443.
2 Ducks                       9856.
3 Geese and guinea fowls      6484.
4 Pigeons, other birds        3124.
5 Turkeys                     1924.
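
The table above reports only the mean. As a sketch of how the other statistics named in the prompt (median, standard deviation, min/max, quantile) could be added to the same grouped summary, the example below uses a small made-up tibble `toy` in place of `birds`. The values and the output column names (`med_stocks`, `sd_stocks`, and so on) are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for `birds`; the values are invented for illustration.
toy <- tibble(
  Item  = c("Chickens", "Chickens", "Chickens", "Ducks", "Ducks", "Ducks"),
  Value = c(10, 20, 60, 1, 3, NA)
)

# Central tendency and dispersion per poultry type, ignoring missing values.
stats_by_item <- toy %>%
  group_by(Item) %>%
  summarise(
    avg_stocks = mean(Value, na.rm = TRUE),
    med_stocks = median(Value, na.rm = TRUE),
    sd_stocks  = sd(Value, na.rm = TRUE),
    min_stocks = min(Value, na.rm = TRUE),
    max_stocks = max(Value, na.rm = TRUE),
    q75_stocks = quantile(Value, 0.75, na.rm = TRUE),
    n_missing  = sum(is.na(Value))  # how many values were dropped
  )

stats_by_item
```

Comparing the mean with the median is a quick check for skew: a mean far above the median suggests that a few very large values are pulling the average up.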

Explain and Interpret

The goal is to understand the stock sizes of each of the five types of poultry in the dataset, and how many birds each area or country held over this period. On average, areas hold far more chickens (about 58.4 million head) than any other type of poultry.

birds %>%
  group_by(Area) %>%
  summarize(avg_stocks = mean(Value, na.rm = TRUE))
# A tibble: 209 × 2
   Area                avg_stocks
   <chr>                    <dbl>
 1 Afghanistan             6527. 
 2 Albania                 1064. 
 3 Algeria                15656. 
 4 American Samoa            41.9
 5 Angola                  8357. 
 6 Antigua and Barbuda       98.0
 7 Argentina              24878. 
 8 Armenia                 1341. 
 9 Aruba                    NaN  
10 Australia              10085. 
# … with 199 more rows
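
The `NaN` for Aruba is what `mean()` returns when `na.rm = TRUE` removes every value in a group. That would happen if all of Aruba's `Value` entries are missing, which is an assumption here and not something the output confirms. The sketch below reproduces the behaviour on a made-up tibble `toy` (hypothetical data, not the real file) and shows one way to keep only areas with at least one observation.

```r
library(dplyr)

# Hypothetical data: area "Y" has no non-missing values at all.
toy <- tibble(
  Area  = c("X", "X", "Y", "Y"),
  Value = c(5, 15, NA, NA)
)

# Mean per area, plus a count of the non-missing observations behind it.
by_area <- toy %>%
  group_by(Area) %>%
  summarise(
    avg_stocks = mean(Value, na.rm = TRUE),  # NaN when nothing is left to average
    n_obs      = sum(!is.na(Value))
  )

# Keep only areas whose mean is based on at least one observation.
by_area_complete <- by_area %>%
  filter(n_obs > 0)

by_area_complete
```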