knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Theresa Szczepanski
September 19, 2022
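Today’s challenge is to read in a data set and describe it using both words and supporting information such as tables, then provide and interpret summary statistics for interesting groups within the data.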
Data Source
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1961 1961 1000…
2 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1962 1962 1000…
3 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1963 1963 1000…
4 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1964 1964 1000…
5 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1965 1965 1000…
6 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1966 1966 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
Domain Code Domain Area Code Area
Length:30977 Length:30977 Min. : 1 Length:30977
Class :character Class :character 1st Qu.: 79 Class :character
Mode :character Mode :character Median : 156 Mode :character
Mean :1202
3rd Qu.: 231
Max. :5504
Element Code Element Item Code Item
Min. :5112 Length:30977 Min. :1057 Length:30977
1st Qu.:5112 Class :character 1st Qu.:1057 Class :character
Median :5112 Mode :character Median :1068 Mode :character
Mean :5112 Mean :1066
3rd Qu.:5112 3rd Qu.:1072
Max. :5112 Max. :1083
Year Code Year Unit Value
Min. :1961 Min. :1961 Length:30977 Min. : 0
1st Qu.:1976 1st Qu.:1976 Class :character 1st Qu.: 171
Median :1992 Median :1992 Mode :character Median : 1800
Mean :1991 Mean :1991 Mean : 99411
3rd Qu.:2005 3rd Qu.:2005 3rd Qu.: 15404
Max. :2018 Max. :2018 Max. :23707134
NA's :1036
Flag Flag Description
Length:30977 Length:30977
Class :character Class :character
Mode :character Mode :character
The `birds` data set consists of 8 variables of character type and 6 variables of double type. It seems to describe the population of `stock` or domesticated fowl in regions of the world for given years between 1961 and 2018. Several variables come in code/label pairs: `Area`, `Element`, and `Item` each have an associated numeric code variable, while `Domain Code` is itself a character code.
`Domain`, `Element`: For this data set all of the cases have the same `Domain` and `Domain Code` representing "live animals" and the same `Element` and `Element Code` representing `stocks`. The term `stock` seems to indicate that the animals represent domesticated stock rather than wild fowl.
# A tibble: 1 × 2
`Domain Code` Domain
<chr> <chr>
1 QA Live Animals
# A tibble: 1 × 2
`Element Code` Element
<dbl> <chr>
1 5112 Stocks
`Item`: For this data set, all of the observations are of items of type chicken, duck, geese and guinea fowls, turkeys, or pigeons/other birds.
# A tibble: 5 × 2
`Item Code` Item
<dbl> <chr>
1 1057 Chickens
2 1068 Ducks
3 1072 Geese and guinea fowls
4 1079 Turkeys
5 1083 Pigeons, other birds
`Area` consists of 248 entries. Notably, the entries with codes less than 5000 represent countries of the world. These codes roughly follow the alphabetical order of the country names, with some exceptions (for example, Armenia has code 1 and Aruba has code 22). The remaining 28 codes, 5000 and above, correspond to regions of the world rather than a specific country. In these cases, regions with numbers closer in value seem to have closer geographic proximity. Note that there is an entry for Europe as well as entries for Eastern Europe and Western Europe, so some areas are counted in more than one of these aggregate entries.
# A tibble: 248 × 2
`Area Code` Area
<dbl> <chr>
1 2 Afghanistan
2 3 Albania
3 4 Algeria
4 5 American Samoa
5 7 Angola
6 8 Antigua and Barbuda
7 9 Argentina
8 1 Armenia
9 22 Aruba
10 10 Australia
# … with 238 more rows
# A tibble: 248 × 2
`Area Code` Area
<dbl> <chr>
1 1 Armenia
2 2 Afghanistan
3 3 Albania
4 4 Algeria
5 5 American Samoa
6 7 Angola
7 8 Antigua and Barbuda
8 9 Argentina
9 10 Australia
10 11 Austria
# … with 238 more rows
# A tibble: 248 × 2
`Area Code` Area
<dbl> <chr>
1 5504 Polynesia
2 5503 Micronesia
3 5502 Melanesia
4 5501 Australia and New Zealand
5 5500 Oceania
6 5404 Western Europe
7 5403 Southern Europe
8 5402 Northern Europe
9 5401 Eastern Europe
10 5400 Europe
# … with 238 more rows
# A tibble: 28 × 2
`Area Code` Area
<dbl> <chr>
1 5000 World
2 5100 Africa
3 5101 Eastern Africa
4 5102 Middle Africa
5 5103 Northern Africa
6 5104 Southern Africa
7 5105 Western Africa
8 5200 Americas
9 5203 Northern America
10 5204 Central America
# … with 18 more rows
`Unit`, `Value`: For a given observation, there is the year the observation was made (between 1961 and 2018) and the number of `stock` counted as a `Value` with a `Unit` of `1000 Head`. For example, a `Value` of 4700 represents 4,700,000 head of the given type of bird. `Value` is missing for 1,036 observations.
# A tibble: 1 × 1
Unit
<chr>
1 1000 Head
Unit Value
Length:30977 Min. : 0
Class :character 1st Qu.: 171
Mode :character Median : 1800
Mean : 99411
3rd Qu.: 15404
Max. :23707134
NA's :1036
# A tibble: 30,977 × 2
Unit Value
<chr> <dbl>
1 1000 Head 4700
2 1000 Head 4900
3 1000 Head 5000
4 1000 Head 5300
5 1000 Head 5500
6 1000 Head 5800
7 1000 Head 6600
8 1000 Head 6290
9 1000 Head 6300
10 1000 Head 6000
# … with 30,967 more rows
`Flag` consists of 6 values describing the methodology by which the data was collected.
# A tibble: 6 × 2
Flag `Flag Description`
<chr> <chr>
1 F FAO estimate
2 <NA> Official data
3 Im FAO data based on imputation methodology
4 M Data not available
5 * Unofficial figure
6 A Aggregate, may include official, semi-official, estimated or calculated…
* A F Im M
1494 6488 10007 1213 1002
For the `Birds` data set, each case provides an estimate for the population of domesticated fowl for a given type of bird, in a given region of the world, for a given year.
When the data are filtered to cases with `Area` = `World`, there is global aggregate data for each type of bird per year. The measures of central tendency by item show that chickens are the dominant domesticated fowl globally. The measures of dispersion indicate that the rise of the domesticated chicken population since 1961 is much more extreme than that of the other domesticated fowl.
# Keep only the global aggregate rows (Area Code 5000 = World).
World_Data <- filter(Birds, `Area Code` == 5000)
# Summarise central tendency and dispersion of Value for each bird type.
World_Data %>%
  group_by(Item) %>%
  summarise(
    mean = mean(Value, na.rm = TRUE),
    median = median(Value, na.rm = TRUE),
    sd = sd(Value, na.rm = TRUE),
    max = max(Value, na.rm = TRUE),
    min = min(Value, na.rm = TRUE),
    range = max - min,
    var = var(Value, na.rm = TRUE)
  )
# A tibble: 5 × 8
Item mean median sd max min range var
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Chickens 11624407. 10436552. 6.13e6 2.37e7 3.91e6 1.98e7 3.76e13
2 Ducks 645609. 544471 3.58e5 1.20e6 1.93e5 1.00e6 1.28e11
3 Geese and guinea fowls 177314. 124515 1.24e5 3.91e5 3.66e4 3.54e5 1.55e10
4 Pigeons, other birds 29409. 32222 1.15e4 5.79e4 1.21e4 4.58e4 1.32e 8
5 Turkeys 352802. 421909 1.16e5 4.74e5 1.54e5 3.20e5 1.35e10
# A tibble: 58 × 6
Year Chickens Ducks `Geese and guinea fowls` `Pigeons, other birds` Turkeys
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1961 3906690 193452 36640 14055 204241
2 1962 4048728 201167 37737 16026 174077
3 1963 4163131 210275 38489 17018 161262
4 1964 4231221 216183 40928 17963 153758
5 1965 4349674 222799 43523 18869 154790
6 1966 4445629 229837 44491 19676 166655
7 1967 4666511 237028 48028 19860 174158
8 1968 4823170 245201 50302 12068 155205
9 1969 4988438 249936 52459 12656 157950
10 1970 5209733 256318 54578 13219 178971
# … with 48 more rows
When considering the change in value of each item over time (this would best be visualized with line plots of item values on the y-axis and year on the x-axis):
# A tibble: 58 × 6
Year Chickens Ducks `Geese and guinea fowls` `Pigeons, other birds` Turkeys
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1961 3906690 193452 36640 14055 204241
2 1962 4048728 201167 37737 16026 174077
3 1963 4163131 210275 38489 17018 161262
4 1964 4231221 216183 40928 17963 153758
5 1965 4349674 222799 43523 18869 154790
6 1966 4445629 229837 44491 19676 166655
7 1967 4666511 237028 48028 19860 174158
8 1968 4823170 245201 50302 12068 155205
9 1969 4988438 249936 52459 12656 157950
10 1970 5209733 256318 54578 13219 178971
# … with 48 more rows
Global domesticated production and consumption of chickens, turkeys, ducks, and geese steadily increased from 1961 to 1990; however, pigeons and other birds do not show the same pattern. Perhaps technological innovation during this period allowed a large-scale increase in the capacity of farms to support this growth. Perhaps the increase was also necessitated by general population growth and the globalization of farming in this period. Global production of chickens saw the most extreme growth. It would be worthwhile to explore item preferences and the growth of `Value` by region of the world.
## Further Challenge to attempt later
- hotel_bookings.csv ⭐⭐⭐⭐
title: "Challenge 2"
author: "Theresa Szczepanski"
description: "Data wrangling: using group_by() and summarise()"
date: "09/19/2022"
toc: true
code-fold: true
code-copy: true
code-tools: true
- challenge_2
- birds
- Theresa_Szczepanski
#| label: setup
#| warning: false
#| message: false
library(tidyverse)  # provides read_csv(), the dplyr verbs, and pivot_wider() used below
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
## Challenge Overview
Today's challenge is to
1) read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
2) provide summary statistics for different interesting groups within the data, and interpret those statistics
## Read in the Data Birds
Data Source
- birds.csv ⭐⭐⭐
Birds <- read_csv("_data/birds.csv")
#| label: birds read in/summary
## Describe the data
The `birds` data set consists of 8 variables of character type and 6 variables of double type. It seems to describe the population of `stock` or domesticated fowl in regions of the world for given years between 1961 and 2018. Several variables come in code/label pairs: `Area`, `Element`, and `Item` each have an associated numeric code variable, while `Domain Code` is itself a character code.
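
The column-type tally above can be checked programmatically. The following is a minimal sketch on a hypothetical three-row stand-in (`toy_birds`, with an illustrative value for Albania), not the real `birds.csv`; the same `vapply()` call would apply to `Birds`.

```r
library(tibble)

# Hypothetical stand-in for a few columns of birds.csv; values are illustrative.
toy_birds <- tibble(
  `Domain Code` = c("QA", "QA", "QA"),
  `Area Code`   = c(2, 2, 3),
  Area          = c("Afghanistan", "Afghanistan", "Albania"),
  Year          = c(1961, 1962, 1961),
  Value         = c(4700, 4900, 1000)
)

# Tally the storage type of every column.
type_counts <- table(vapply(toy_birds, typeof, character(1)))
type_counts
```

Run against the full data, a tally like this should reproduce the 8 character / 6 double split described above.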
`Domain`, `Element`: For this data set all of the cases have the same `Domain` and `Domain Code` representing "live animals" and the same `Element` and `Element Code` representing `stocks`. The term `stock` seems to indicate that the animals represent domesticated stock rather than wild fowl.
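# Confirm that Domain and Element each take a single (code, label) value.
Domains <- select(Birds, `Domain Code`, Domain)
Num_Domains <- unique(Domains)
Num_Domains
Elements <- select(Birds, `Element Code`, Element)
Num_Elements <- unique(Elements)
Num_Elements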
#| label: Domains /Elements info
`Item`: For this data set, all of the observations are of items of type chicken, duck, geese and guinea fowls, turkeys, or pigeons/other birds.
# List the distinct bird types and their codes.
Items <- select(Birds, `Item Code`, Item)
Num_Items <- unique(Items)
Num_Items
#| label: Items info
`Area` consists of 248 entries. Notably, the entries with codes less than 5000 represent countries of the world. These codes roughly follow the alphabetical order of the country names, with some exceptions (for example, Armenia has code 1 and Aruba has code 22). The remaining 28 codes, 5000 and above, correspond to regions of the world rather than a specific country. In these cases, regions with numbers closer in value seem to have closer geographic proximity. Note that there is an entry for Europe as well as entries for Eastern Europe and Western Europe, so some areas are counted in more than one of these aggregate entries.
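# Unique (code, name) pairs for every area in the data.
Areas <- select(Birds, `Area Code`, Area)
Num_Areas <- unique(Areas)
Num_Areas
# Sort by code, ascending and then descending, to inspect both ends of the range.
arrange(Num_Areas, `Area Code`)
arrange(Num_Areas, desc(`Area Code`))
# Codes of 5000 and above are aggregates (World, continents, sub-regions).
World_Region <- filter(Num_Areas, `Area Code` >= 5000)
arrange(World_Region, `Area Code`)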
#| label: Area info
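
Because country rows and aggregate rows sit side by side in `Area`, it may help to label them before summarising so that regional totals are not double counted. This is a sketch under the assumption, taken from the description above, that codes of 5000 and above are aggregates; `toy_areas` and `area_type` are hypothetical names, and the five codes are ones shown in the output.

```r
library(dplyr)
library(tibble)

# A few (code, name) pairs copied from the Area listing.
toy_areas <- tibble(
  `Area Code` = c(2, 3, 5000, 5100, 5400),
  Area        = c("Afghanistan", "Albania", "World", "Africa", "Europe")
)

# Label each area as a country or an aggregate region based on its code.
toy_areas <- toy_areas %>%
  mutate(area_type = if_else(`Area Code` >= 5000, "region", "country"))

count(toy_areas, area_type)
```

Filtering on `area_type == "country"` before summing would then avoid counting the same birds once per country and again per region.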
`Unit`, `Value`: For a given observation, there is the year the observation was made (between 1961 and 2018) and the number of `stock` counted as a `Value` with a `Unit` of `1000 Head`. For example, a `Value` of 4700 represents 4,700,000 head of the given type of bird. `Value` is missing for 1,036 observations.
Birds_Values <- select(Birds, Unit, Value)
Units <- select(Birds, Unit)
Num_Units <- unique(Units)
Num_Units  # a single unit: "1000 Head"
summary(Birds_Values)
Birds_Values
#| label: Years info
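
To make the unit conversion concrete, the sketch below scales `Value` by 1000 to get a head count. The column name `head_count` and the `toy_values` tibble are hypothetical; the two non-missing values are the first two shown in the output, and the `NA` row is included only to show that missing values stay missing.

```r
library(dplyr)
library(tibble)

toy_values <- tibble(
  Unit  = c("1000 Head", "1000 Head", "1000 Head"),
  Value = c(4700, 4900, NA)
)

# Value is recorded in thousands of head, so multiply by 1000.
toy_values <- toy_values %>%
  mutate(head_count = Value * 1000)

toy_values
```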
`Flag` consists of 6 values describing the methodology by which the data was collected.
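# Unique (flag, description) pairs, then a frequency table of the flags.
Flag_Descriptions <- select(Birds, Flag, `Flag Description`)
Num_Flag_Descriptions <- unique(Flag_Descriptions)
Num_Flag_Descriptions
Flags <- select(Birds, Flag)
table(Flags)  # note: NA (official data) is omitted by default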
#| label: Flags info
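
One caveat when tabulating `Flag`: base R's `table()` drops `NA` by default, and here `NA` is the flag for "Official data". The five counts shown in the rendered flag table sum to 20,204 of the 30,977 rows, so the remaining 10,773 rows are presumably the official-data cases. The sketch below, on a hypothetical `toy_flags` vector, shows two ways to keep `NA` in the tally.

```r
library(dplyr)
library(tibble)

# Hypothetical flag column with two missing flags.
toy_flags <- tibble(Flag = c("F", NA, "Im", NA, "A", "F"))

# Default table() silently drops the NA category.
default_tab <- table(toy_flags$Flag)

# useNA = "ifany" adds an NA category when missing values are present.
full_tab <- table(toy_flags$Flag, useNA = "ifany")

# dplyr::count() keeps NA as its own group.
flag_counts <- count(toy_flags, Flag)
flag_counts
```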
For the `Birds` data set, each case provides an estimate for the population of domesticated fowl for a given type of bird, in a given region of the world, for a given year.
## Provide Grouped Summary Statistics
When the data are filtered to cases with `Area` = `World`, there is global aggregate data for each type of bird per year. The measures of central tendency by item show that chickens are the dominant domesticated fowl globally. The measures of dispersion indicate that the rise of the domesticated chicken population since 1961 is much more extreme than that of the other domesticated fowl.
# Keep only the global aggregate rows (Area Code 5000 = World).
World_Data <- filter(Birds, `Area Code` == 5000)
# Summarise central tendency and dispersion of Value for each bird type.
World_Data %>%
  group_by(Item) %>%
  summarise(
    mean = mean(Value, na.rm = TRUE),
    median = median(Value, na.rm = TRUE),
    sd = sd(Value, na.rm = TRUE),
    max = max(Value, na.rm = TRUE),
    min = min(Value, na.rm = TRUE),
    range = max - min,
    var = var(Value, na.rm = TRUE)
  )
# Keep only the columns needed to reshape the world totals.
World_Data <- select(World_Data, Item, Year, Value)
# Reshape to one row per year and one column per bird type.
World_Data_by_Item <- pivot_wider(World_Data, names_from = Item, values_from = Value)
World_Data_by_Item
#global summary statistics by item.
When considering the change in value of each item over time (this would best be visualized with line plots of item values on the y-axis and year on the x-axis):
- The world turkey population seems to have generally increased from 1961 to 1990, despite a dip in the early 1960s. From 1990 to 2018 the turkey population is consistently larger than in the previous 30 years, but it has not grown steadily from year to year.
- The world chicken population seems to have consistently increased year to year from 1961-2018.
- The world duck population seems to have consistently increased year-to-year until 2004.
- The world geese and guinea fowl population seems to have consistently increased year-to-year until 1993.
- The world pigeon and other bird population shows much more variation in its year-to-year changes. This suggests that trends in the global production, domestication, and consumption/use of chickens, ducks, turkeys, and geese over the last 60 years are quite different from those of pigeons and other birds.
arrange(World_Data_by_Item, `Year`)
#arrange(World_Data_by_Item, `Turkeys`)
#arrange(World_Data_by_Item, `Chickens`)
#arrange(World_Data_by_Item, `Ducks`)
#arrange(World_Data_by_Item, `Geese and guinea fowls`)
#arrange(World_Data_by_Item, `Pigeons, other birds`)
#perform global analysis by item of value over time
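
The parenthetical suggestion above, line plots of item values against year, could be sketched as follows. This is one possible approach using `ggplot2`, not something the original analysis ran; `toy_wide` holds only the first three years for three items copied from the table, and the log-scaled y-axis is an assumption made so that chickens do not flatten the other series.

```r
library(tidyr)
library(ggplot2)
library(tibble)

# First three rows of the wide world table, for three of the five items.
toy_wide <- tibble(
  Year     = c(1961, 1962, 1963),
  Chickens = c(3906690, 4048728, 4163131),
  Ducks    = c(193452, 201167, 210275),
  Turkeys  = c(204241, 174077, 161262)
)

# Back to long form: one row per (Year, Item) pair, as ggplot2 expects.
toy_long <- pivot_longer(toy_wide, cols = -Year,
                         names_to = "Item", values_to = "Value")

# One line per bird type.
trend_plot <- ggplot(toy_long, aes(x = Year, y = Value, colour = Item)) +
  geom_line() +
  scale_y_log10() +
  labs(y = "Stock (1000 head, log scale)")

trend_plot
```

With the full `World_Data` (already in long form before `pivot_wider()`), the reshaping step would not be needed.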
### Explain and Interpret
Global domesticated production and consumption of chickens, turkeys, ducks, and geese steadily increased from 1961 to 1990; however, pigeons and other birds do not show the same pattern. Perhaps technological innovation during this period allowed a large-scale increase in the capacity of farms to support this growth. Perhaps the increase was also necessitated by general population growth and the globalization of farming in this period. Global production of chickens saw the most extreme growth. It would be worthwhile to explore item preferences and the growth of `Value` by region of the world.
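
One way to compare growth across items (and, with an extra grouping variable, across regions) is a simple end-to-start ratio. The sketch below is an assumption about how such a follow-up might look, using only the 1961 and 1970 world values shown in the table; `toy_world`, `start_value`, `end_value`, and `growth_ratio` are hypothetical names.

```r
library(dplyr)
library(tibble)

# World stock (1000 head) in 1961 and 1970 for two items, copied from the table.
toy_world <- tibble(
  Item  = c("Chickens", "Chickens",
            "Pigeons, other birds", "Pigeons, other birds"),
  Year  = c(1961, 1970, 1961, 1970),
  Value = c(3906690, 5209733, 14055, 13219)
)

# Ratio of the last observed value to the first, per item.
growth <- toy_world %>%
  group_by(Item) %>%
  summarise(
    start_value  = Value[which.min(Year)],
    end_value    = Value[which.max(Year)],
    growth_ratio = end_value / start_value,
    .groups = "drop"
  )

growth
```

A ratio above 1 indicates growth over the window and a ratio below 1 indicates decline; grouping by `Area` and `Item` on the full data would extend the same comparison to regions.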
## Further Challenge to attempt later
- hotel_bookings.csv ⭐⭐⭐⭐