library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Said Arslan
September 25, 2022
I will analyze the birds.csv dataset in this challenge.
The birds variable holds the dataset in memory so that I can analyze and manipulate it.
[1] 30977 14
Domain.Code Domain Area.Code Area Element.Code Element Item.Code
1 QA Live Animals 2 Afghanistan 5112 Stocks 1057
2 QA Live Animals 2 Afghanistan 5112 Stocks 1057
3 QA Live Animals 2 Afghanistan 5112 Stocks 1057
4 QA Live Animals 2 Afghanistan 5112 Stocks 1057
5 QA Live Animals 2 Afghanistan 5112 Stocks 1057
6 QA Live Animals 2 Afghanistan 5112 Stocks 1057
Item Year.Code Year Unit Value Flag Flag.Description
1 Chickens 1961 1961 1000 Head 4700 F FAO estimate
2 Chickens 1962 1962 1000 Head 4900 F FAO estimate
3 Chickens 1963 1963 1000 Head 5000 F FAO estimate
4 Chickens 1964 1964 1000 Head 5300 F FAO estimate
5 Chickens 1965 1965 1000 Head 5500 F FAO estimate
6 Chickens 1966 1966 1000 Head 5800 F FAO estimate
Domain.Code Domain Area.Code Area Element.Code Element
30972 QA Live Animals 5504 Polynesia 5112 Stocks
30973 QA Live Animals 5504 Polynesia 5112 Stocks
30974 QA Live Animals 5504 Polynesia 5112 Stocks
30975 QA Live Animals 5504 Polynesia 5112 Stocks
30976 QA Live Animals 5504 Polynesia 5112 Stocks
30977 QA Live Animals 5504 Polynesia 5112 Stocks
Item.Code Item Year.Code Year Unit Value Flag
30972 1068 Ducks 2013 2013 1000 Head 30 A
30973 1068 Ducks 2014 2014 1000 Head 30 A
30974 1068 Ducks 2015 2015 1000 Head 30 A
30975 1068 Ducks 2016 2016 1000 Head 31 A
30976 1068 Ducks 2017 2017 1000 Head 30 A
30977 1068 Ducks 2018 2018 1000 Head 30 A
Flag.Description
30972 Aggregate, may include official, semi-official, estimated or calculated data
30973 Aggregate, may include official, semi-official, estimated or calculated data
30974 Aggregate, may include official, semi-official, estimated or calculated data
30975 Aggregate, may include official, semi-official, estimated or calculated data
30976 Aggregate, may include official, semi-official, estimated or calculated data
30977 Aggregate, may include official, semi-official, estimated or calculated data
The dataset has 14 variables (columns) and 30977 observations (rows). Judging from the head and tail tables, each observation records the stock level of one type of bird in one country or region in one year.
Domain.Code Domain Area.Code Area
1 1 248 248
Element.Code Element Item.Code Item
1 1 5 5
Year.Code Year Unit Value
58 58 1 11496
Flag Flag.Description
6 6
Chickens Ducks Geese and guinea fowls
13074 6909 4136
Pigeons, other birds Turkeys
1165 5693
* A F Im M
10773 1494 6488 10007 1213 1002
[1] "Afghanistan"
[2] "Albania"
[3] "Algeria"
[4] "American Samoa"
[5] "Angola"
[6] "Antigua and Barbuda"
[7] "Argentina"
[8] "Armenia"
[9] "Aruba"
[10] "Australia"
[11] "Austria"
[12] "Azerbaijan"
[13] "Bahamas"
[14] "Bahrain"
[15] "Bangladesh"
[16] "Barbados"
[17] "Belarus"
[18] "Belgium"
[19] "Belgium-Luxembourg"
[20] "Belize"
[21] "Benin"
[22] "Bermuda"
[23] "Bhutan"
[24] "Bolivia (Plurinational State of)"
[25] "Bosnia and Herzegovina"
[26] "Botswana"
[27] "Brazil"
[28] "Brunei Darussalam"
[29] "Bulgaria"
[30] "Burkina Faso"
[31] "Burundi"
[32] "Cabo Verde"
[33] "Cambodia"
[34] "Cameroon"
[35] "Canada"
[36] "Cayman Islands"
[37] "Central African Republic"
[38] "Chad"
[39] "Chile"
[40] "China, Hong Kong SAR"
[41] "China, Macao SAR"
[42] "China, mainland"
[43] "China, Taiwan Province of"
[44] "Colombia"
[45] "Comoros"
[46] "Congo"
[47] "Cook Islands"
[48] "Costa Rica"
[49] "Côte d'Ivoire"
[50] "Croatia"
[51] "Cuba"
[52] "Cyprus"
[53] "Czechia"
[54] "Czechoslovakia"
[55] "Democratic People's Republic of Korea"
[56] "Democratic Republic of the Congo"
[57] "Denmark"
[58] "Dominica"
[59] "Dominican Republic"
[60] "Ecuador"
[61] "Egypt"
[62] "El Salvador"
[63] "Equatorial Guinea"
[64] "Eritrea"
[65] "Estonia"
[66] "Eswatini"
[67] "Ethiopia"
[68] "Ethiopia PDR"
[69] "Falkland Islands (Malvinas)"
[70] "Fiji"
[71] "Finland"
[72] "France"
[73] "French Guyana"
[74] "French Polynesia"
[75] "Gabon"
[76] "Gambia"
[77] "Georgia"
[78] "Germany"
[79] "Ghana"
[80] "Greece"
[81] "Grenada"
[82] "Guadeloupe"
[83] "Guam"
[84] "Guatemala"
[85] "Guinea"
[86] "Guinea-Bissau"
[87] "Guyana"
[88] "Haiti"
[89] "Honduras"
[90] "Hungary"
[91] "Iceland"
[92] "India"
[93] "Indonesia"
[94] "Iran (Islamic Republic of)"
[95] "Iraq"
[96] "Ireland"
[97] "Israel"
[98] "Italy"
[99] "Jamaica"
[100] "Japan"
[101] "Jordan"
[102] "Kazakhstan"
[103] "Kenya"
[104] "Kiribati"
[105] "Kuwait"
[106] "Kyrgyzstan"
[107] "Lao People's Democratic Republic"
[108] "Latvia"
[109] "Lebanon"
[110] "Lesotho"
[111] "Liberia"
[112] "Libya"
[113] "Liechtenstein"
[114] "Lithuania"
[115] "Luxembourg"
[116] "Madagascar"
[117] "Malawi"
[118] "Malaysia"
[119] "Mali"
[120] "Malta"
[121] "Martinique"
[122] "Mauritania"
[123] "Mauritius"
[124] "Mexico"
[125] "Micronesia (Federated States of)"
[126] "Mongolia"
[127] "Montenegro"
[128] "Montserrat"
[129] "Morocco"
[130] "Mozambique"
[131] "Myanmar"
[132] "Namibia"
[133] "Nauru"
[134] "Nepal"
[135] "Netherlands"
[136] "Netherlands Antilles (former)"
[137] "New Caledonia"
[138] "New Zealand"
[139] "Nicaragua"
[140] "Niger"
[141] "Nigeria"
[142] "Niue"
[143] "North Macedonia"
[144] "Norway"
[145] "Oman"
[146] "Pacific Islands Trust Territory"
[147] "Pakistan"
[148] "Palestine"
[149] "Panama"
[150] "Papua New Guinea"
[151] "Paraguay"
[152] "Peru"
[153] "Philippines"
[154] "Poland"
[155] "Portugal"
[156] "Puerto Rico"
[157] "Qatar"
[158] "Republic of Korea"
[159] "Republic of Moldova"
[160] "Réunion"
[161] "Romania"
[162] "Russian Federation"
[163] "Rwanda"
[164] "Saint Helena, Ascension and Tristan da Cunha"
[165] "Saint Kitts and Nevis"
[166] "Saint Lucia"
[167] "Saint Pierre and Miquelon"
[168] "Saint Vincent and the Grenadines"
[169] "Samoa"
[170] "Sao Tome and Principe"
[171] "Saudi Arabia"
[172] "Senegal"
[173] "Serbia"
[174] "Serbia and Montenegro"
[175] "Seychelles"
[176] "Sierra Leone"
[177] "Singapore"
[178] "Slovakia"
[179] "Slovenia"
[180] "Solomon Islands"
[181] "Somalia"
[182] "South Africa"
[183] "South Sudan"
[184] "Spain"
[185] "Sri Lanka"
[186] "Sudan"
[187] "Sudan (former)"
[188] "Suriname"
[189] "Sweden"
[190] "Switzerland"
[191] "Syrian Arab Republic"
[192] "Tajikistan"
[193] "Thailand"
[194] "Timor-Leste"
[195] "Togo"
[196] "Tokelau"
[197] "Tonga"
[198] "Trinidad and Tobago"
[199] "Tunisia"
[200] "Turkey"
[201] "Turkmenistan"
[202] "Tuvalu"
[203] "Uganda"
[204] "Ukraine"
[205] "United Arab Emirates"
[206] "United Kingdom of Great Britain and Northern Ireland"
[207] "United Republic of Tanzania"
[208] "United States of America"
[209] "United States Virgin Islands"
[210] "Uruguay"
[211] "USSR"
[212] "Uzbekistan"
[213] "Vanuatu"
[214] "Venezuela (Bolivarian Republic of)"
[215] "Viet Nam"
[216] "Wallis and Futuna Islands"
[217] "Yemen"
[218] "Yugoslav SFR"
[219] "Zambia"
[220] "Zimbabwe"
[221] "World"
[222] "Africa"
[223] "Eastern Africa"
[224] "Middle Africa"
[225] "Northern Africa"
[226] "Southern Africa"
[227] "Western Africa"
[228] "Americas"
[229] "Northern America"
[230] "Central America"
[231] "Caribbean"
[232] "South America"
[233] "Asia"
[234] "Central Asia"
[235] "Eastern Asia"
[236] "Southern Asia"
[237] "South-eastern Asia"
[238] "Western Asia"
[239] "Europe"
[240] "Eastern Europe"
[241] "Northern Europe"
[242] "Southern Europe"
[243] "Western Europe"
[244] "Oceania"
[245] "Australia and New Zealand"
[246] "Melanesia"
[247] "Micronesia"
[248] "Polynesia"
[1] 1961 2018
Looking at the number of unique values in each column alongside the variable names, this dataset appears to be an extract from a larger stock dataset: Domain contains only live animals, and Item contains only 5 types of domestic fowl (chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds). In addition, only one type of element (stocks) is included in the data.
The observations come from 248 different countries and regions and span the years 1961 to 2018. The dataset covers almost every country and region of the world.
The core of each observation is the Value variable, which should be the stock level in thousands of head, given that Unit reads "1000 Head". All other variables carry descriptive information about the observation.
The data may have been collected from poultry farms in each country between 1961 and 2018. It also includes aggregated data for regions.
It looks like 1036 entries in Value are missing (not observed) in the dataset.
# A tibble: 248 × 2
Area Total
<chr> <int>
1 World 744113376
2 Asia 361468192
3 Americas 198674579
4 Eastern Asia 197028391
5 China, mainland 170767908
6 Europe 122216094
7 Northern America 100752122
8 United States of America 93432842
9 South-eastern Asia 79669999
10 South America 68503385
# … with 238 more rows
The summary table shows the total stock level over the 58 years for each country and predefined region. Among countries, mainland China had the largest poultry stock over that period.
# A tibble: 5 × 3
Item Total Percentage
<chr> <dbl> <dbl>
1 Chickens 2696862583 90.6
2 Ducks 149781301 5
3 Turkeys 81850064 2.7
4 Geese and guinea fowls 41136874 1.4
5 Pigeons, other birds 6822856 0.2
By bird type, chickens made up more than 90% of poultry stocks, followed by ducks at 5%, turkeys at 2.7%, geese and guinea fowls at 1.4%, and pigeons and other birds at 0.2%.
---
title: "Challenge-1"
author: "Said Arslan"
description: "Reading in data and creating a post"
date: "09/25/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- birds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
I will analyze the `birds.csv` dataset in this challenge.
## Read Data
```{r read_data}
birds <- read.csv("_data/birds.csv")
```
The `birds` variable holds the dataset in memory so that I can analyze and manipulate it.
## Describe the data
```{r about-data-1}
dim(birds)
head(birds)
tail(birds)
```
The dataset has 14 variables (columns) and 30977 observations (rows). Judging from the head and tail tables, each observation records the stock level of one type of bird in one country or region in one year.
```{r about-data-2}
sapply(birds, function(x) n_distinct(x)) # number of unique values under each column
table(birds$Item) # types of birds
table(birds$Flag) # flag types
country.names <- unique(birds$Area) # countries included in the dataset
print(country.names)
years.of.observations <- range(birds$Year) # range of data in terms of years
years.of.observations
```
Looking at the number of unique values in each column alongside the variable names, this dataset appears to be an extract from a larger stock dataset: `Domain` contains only live animals, and `Item` contains only 5 types of domestic fowl (chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds). In addition, only one type of element (stocks) is included in the data.
The observations come from 248 different countries and regions and span the years 1961 to 2018. The dataset covers almost every country and region of the world.
The core of each observation is the `Value` variable, which should be the stock level in thousands of head, given that `Unit` reads "1000 Head". All other variables carry descriptive information about the observation.
The data may have been collected from poultry farms in each country between 1961 and 2018. It also includes aggregated data for regions.
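The per-column distinct counts can also be computed with dplyr's `across()`. The following is only a minimal sketch on a made-up toy data frame (the rows and the names `toy` and `distinct_counts` are hypothetical, not taken from `birds.csv`):

```r
library(dplyr)

# Hypothetical toy data that mimics three columns of birds.csv;
# the values are invented for illustration only.
toy <- data.frame(
  Area = c("X-land", "X-land", "Y-land"),
  Item = c("Chickens", "Ducks", "Chickens"),
  Year = c(1961L, 1961L, 1962L),
  stringsAsFactors = FALSE
)

# One-row data frame holding the number of distinct values per column.
distinct_counts <- toy %>%
  summarize(across(everything(), n_distinct))

distinct_counts
```

The result stays a data frame, so it can be piped into further dplyr steps, whereas `sapply()` returns a named vector.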
## Missing Data
```{r missing-data}
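sum(is.na(birds$Value))   # number of missing entries in Value
sum(is.null(birds$Value)) # is.null() tests the whole object, not each element, so this is 0 whenever the column exists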
```
It looks like 1036 entries in `Value` are missing (not observed) in the dataset.
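To see where the missing values sit, it can help to count them per column and to cross them with `Flag`. This is a rough sketch on a made-up toy data frame. The rows, and the names `toy`, `na_per_column`, and `na_by_flag`, are hypothetical, and the pairing of flags with missing values is invented for illustration, not taken from the real data:

```r
library(dplyr)

# Hypothetical toy data that mimics three columns of birds.csv;
# the values are invented for illustration only.
toy <- data.frame(
  Area  = c("X-land", "X-land", "Y-land", "Y-land", "Z-land"),
  Value = c(10, NA, 5, NA, 7),
  Flag  = c("F", "M", "", "M", "Im"),
  stringsAsFactors = FALSE
)

# Number of missing values in every column at once.
na_per_column <- colSums(is.na(toy))

# Rows and missing Value entries per flag.
na_by_flag <- toy %>%
  group_by(Flag) %>%
  summarize(n_rows = n(), n_missing = sum(is.na(Value)), .groups = "drop")

na_per_column
na_by_flag
```

Applied to `birds`, the same two steps would show whether the missing entries are concentrated under particular flags.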
## Stock levels by country and by bird type
```{r stock-levels-by-country-and-region}
birds %>%
  select(Area, Value) %>%
  group_by(Area) %>%
  # na.rm = TRUE keeps areas with missing values from getting an NA total
  summarize(Total = sum(Value, na.rm = TRUE)) %>%
  arrange(desc(Total))
```
The summary table shows the total stock level over the 58 years for each country and predefined region. Among countries, mainland China had the largest poultry stock over that period.
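Because the summary mixes countries with regional aggregates such as World and Asia, a country-only ranking needs the aggregate rows filtered out first. This sketch assumes that every aggregate row can be recognized by a `Flag.Description` starting with "Aggregate", as in the tail of the data. That assumption has not been checked against the full file, and the toy rows and the names `toy` and `country_totals` are hypothetical:

```r
library(dplyr)

# Text of the aggregate flag description as it appears in the tail of birds.csv.
agg <- "Aggregate, may include official, semi-official, estimated or calculated data"

# Hypothetical toy data; the areas and values are invented for illustration only.
toy <- data.frame(
  Area  = c("World", "Asia", "X-land", "X-land", "Y-land"),
  Value = c(100, 60, 30, 10, NA),
  Flag.Description = c(agg, agg, "FAO estimate", "FAO estimate", "FAO estimate"),
  stringsAsFactors = FALSE
)

# Drop aggregate rows, then total the remaining (country-level) rows per area.
country_totals <- toy %>%
  filter(!startsWith(Flag.Description, "Aggregate")) %>%
  group_by(Area) %>%
  summarize(Total = sum(Value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Total))

country_totals
```

Filtering before grouping keeps the regional totals from crowding out the individual countries at the top of the table.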
```{r stock-levels-for-each-type-of-birds}
stock.levels <- birds %>%
  select(Item, Value) %>%
  group_by(Item) %>%
  summarize(Total = sum(Value, na.rm = TRUE)) %>%
  arrange(desc(Total)) %>%
  mutate(Percentage = 100*round(proportions(Total), digits=3))
stock.levels
```
By bird type, chickens made up more than 90% of poultry stocks, followed by ducks at 5%, turkeys at 2.7%, geese and guinea fowls at 1.4%, and pigeons and other birds at 0.2%.
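The share calculation can be sketched on a small made-up example. The rows and the names `toy` and `shares` are hypothetical. Scaling to percent before rounding is a small variation on the chunk above, chosen here because it avoids the floating-point artifacts that can appear when an already rounded proportion is multiplied by 100:

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Item  = c("Chickens", "Chickens", "Ducks", "Turkeys"),
  Value = c(60, 30, 8, 2),
  stringsAsFactors = FALSE
)

# Total per bird type, then each type's share of the grand total in percent.
shares <- toy %>%
  group_by(Item) %>%
  summarize(Total = sum(Value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Total)) %>%
  mutate(Percentage = round(100 * proportions(Total), digits = 1))

shares
```

Rounded shares do not always add up to exactly 100; the percentages in the table above, for instance, sum to 99.9.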