DACSS 601: Data Science Fundamentals - FALL 2022

Challenge-1

birds
Author

Said Arslan

Published

September 25, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

I will analyze the birds.csv dataset in this challenge.

Read Data

Code
birds <- read.csv("_data/birds.csv")

The birds variable stores the dataset as a data frame so that I can analyze and manipulate it.
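One detail of read.csv() is worth knowing here: by default (check.names = TRUE) it rewrites column names into syntactically valid R names. If the raw header of birds.csv uses spaces, which is an assumption and not something shown in this post, that would explain dotted names such as Domain.Code in the output further down. A minimal sketch on a tiny, made-up CSV written to a temporary file:

```r
# Write a tiny, made-up CSV whose header contains a space (hypothetical data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Area Code,Item", "2,Chickens", "3,Ducks"), tmp)

# Default behaviour: check.names = TRUE turns "Area Code" into "Area.Code".
default_names <- names(read.csv(tmp))

# check.names = FALSE keeps the header exactly as written in the file.
raw_names <- names(read.csv(tmp, check.names = FALSE))

print(default_names)
print(raw_names)
```

Names that contain spaces have to be quoted with backticks later on, so keeping the default is usually the more convenient choice.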

Describe the data

Code
dim(birds)
[1] 30977    14
Code
head(birds)
  Domain.Code       Domain Area.Code        Area Element.Code Element Item.Code
1          QA Live Animals         2 Afghanistan         5112  Stocks      1057
2          QA Live Animals         2 Afghanistan         5112  Stocks      1057
3          QA Live Animals         2 Afghanistan         5112  Stocks      1057
4          QA Live Animals         2 Afghanistan         5112  Stocks      1057
5          QA Live Animals         2 Afghanistan         5112  Stocks      1057
6          QA Live Animals         2 Afghanistan         5112  Stocks      1057
      Item Year.Code Year      Unit Value Flag Flag.Description
1 Chickens      1961 1961 1000 Head  4700    F     FAO estimate
2 Chickens      1962 1962 1000 Head  4900    F     FAO estimate
3 Chickens      1963 1963 1000 Head  5000    F     FAO estimate
4 Chickens      1964 1964 1000 Head  5300    F     FAO estimate
5 Chickens      1965 1965 1000 Head  5500    F     FAO estimate
6 Chickens      1966 1966 1000 Head  5800    F     FAO estimate
Code
tail(birds)
      Domain.Code       Domain Area.Code      Area Element.Code Element
30972          QA Live Animals      5504 Polynesia         5112  Stocks
30973          QA Live Animals      5504 Polynesia         5112  Stocks
30974          QA Live Animals      5504 Polynesia         5112  Stocks
30975          QA Live Animals      5504 Polynesia         5112  Stocks
30976          QA Live Animals      5504 Polynesia         5112  Stocks
30977          QA Live Animals      5504 Polynesia         5112  Stocks
      Item.Code  Item Year.Code Year      Unit Value Flag
30972      1068 Ducks      2013 2013 1000 Head    30    A
30973      1068 Ducks      2014 2014 1000 Head    30    A
30974      1068 Ducks      2015 2015 1000 Head    30    A
30975      1068 Ducks      2016 2016 1000 Head    31    A
30976      1068 Ducks      2017 2017 1000 Head    30    A
30977      1068 Ducks      2018 2018 1000 Head    30    A
                                                                  Flag.Description
30972 Aggregate, may include official, semi-official, estimated or calculated data
30973 Aggregate, may include official, semi-official, estimated or calculated data
30974 Aggregate, may include official, semi-official, estimated or calculated data
30975 Aggregate, may include official, semi-official, estimated or calculated data
30976 Aggregate, may include official, semi-official, estimated or calculated data
30977 Aggregate, may include official, semi-official, estimated or calculated data

The dataset has 14 variables (columns) and 30,977 observations (rows). Judging from the head and tail tables, each observation records the stock level of one type of bird in one country or region in one year.
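The reading that each row is one area, one bird type and one year can be checked rather than eyeballed. The sketch below does this on a small made-up table (the toy data frame and the key_cols name are illustrative assumptions); the same two lines should work on birds, but that has not been run here.

```r
# Toy data with the same key columns as birds (values are made up).
toy <- data.frame(
  Area  = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  Item  = c("Chickens", "Chickens", "Chickens", "Ducks"),
  Year  = c(1961, 1962, 1961, 1961),
  Value = c(4700, 4900, 1580, 310)
)

# duplicated() flags rows whose (Area, Item, Year) combination already appeared.
key_cols <- c("Area", "Item", "Year")
n_duplicate_keys <- sum(duplicated(toy[key_cols]))

# Zero duplicates means exactly one row per area, bird type and year.
print(n_duplicate_keys)
```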

Code
sapply(birds, n_distinct) # number of unique values in each column
     Domain.Code           Domain        Area.Code             Area 
               1                1              248              248 
    Element.Code          Element        Item.Code             Item 
               1                1                5                5 
       Year.Code             Year             Unit            Value 
              58               58                1            11496 
            Flag Flag.Description 
               6                6 
Code
table(birds$Item) # types of birds

              Chickens                  Ducks Geese and guinea fowls 
                 13074                   6909                   4136 
  Pigeons, other birds                Turkeys 
                  1165                   5693 
Code
table(birds$Flag) # flag types

          *     A     F    Im     M 
10773  1494  6488 10007  1213  1002 
Code
country.names <- unique(birds$Area) # countries and regions included in the dataset
print(country.names)
  [1] "Afghanistan"                                         
  [2] "Albania"                                             
  [3] "Algeria"                                             
  [4] "American Samoa"                                      
  [5] "Angola"                                              
  [6] "Antigua and Barbuda"                                 
  [7] "Argentina"                                           
  [8] "Armenia"                                             
  [9] "Aruba"                                               
 [10] "Australia"                                           
 [11] "Austria"                                             
 [12] "Azerbaijan"                                          
 [13] "Bahamas"                                             
 [14] "Bahrain"                                             
 [15] "Bangladesh"                                          
 [16] "Barbados"                                            
 [17] "Belarus"                                             
 [18] "Belgium"                                             
 [19] "Belgium-Luxembourg"                                  
 [20] "Belize"                                              
 [21] "Benin"                                               
 [22] "Bermuda"                                             
 [23] "Bhutan"                                              
 [24] "Bolivia (Plurinational State of)"                    
 [25] "Bosnia and Herzegovina"                              
 [26] "Botswana"                                            
 [27] "Brazil"                                              
 [28] "Brunei Darussalam"                                   
 [29] "Bulgaria"                                            
 [30] "Burkina Faso"                                        
 [31] "Burundi"                                             
 [32] "Cabo Verde"                                          
 [33] "Cambodia"                                            
 [34] "Cameroon"                                            
 [35] "Canada"                                              
 [36] "Cayman Islands"                                      
 [37] "Central African Republic"                            
 [38] "Chad"                                                
 [39] "Chile"                                               
 [40] "China, Hong Kong SAR"                                
 [41] "China, Macao SAR"                                    
 [42] "China, mainland"                                     
 [43] "China, Taiwan Province of"                           
 [44] "Colombia"                                            
 [45] "Comoros"                                             
 [46] "Congo"                                               
 [47] "Cook Islands"                                        
 [48] "Costa Rica"                                          
 [49] "Côte d'Ivoire"                                       
 [50] "Croatia"                                             
 [51] "Cuba"                                                
 [52] "Cyprus"                                              
 [53] "Czechia"                                             
 [54] "Czechoslovakia"                                      
 [55] "Democratic People's Republic of Korea"               
 [56] "Democratic Republic of the Congo"                    
 [57] "Denmark"                                             
 [58] "Dominica"                                            
 [59] "Dominican Republic"                                  
 [60] "Ecuador"                                             
 [61] "Egypt"                                               
 [62] "El Salvador"                                         
 [63] "Equatorial Guinea"                                   
 [64] "Eritrea"                                             
 [65] "Estonia"                                             
 [66] "Eswatini"                                            
 [67] "Ethiopia"                                            
 [68] "Ethiopia PDR"                                        
 [69] "Falkland Islands (Malvinas)"                         
 [70] "Fiji"                                                
 [71] "Finland"                                             
 [72] "France"                                              
 [73] "French Guyana"                                       
 [74] "French Polynesia"                                    
 [75] "Gabon"                                               
 [76] "Gambia"                                              
 [77] "Georgia"                                             
 [78] "Germany"                                             
 [79] "Ghana"                                               
 [80] "Greece"                                              
 [81] "Grenada"                                             
 [82] "Guadeloupe"                                          
 [83] "Guam"                                                
 [84] "Guatemala"                                           
 [85] "Guinea"                                              
 [86] "Guinea-Bissau"                                       
 [87] "Guyana"                                              
 [88] "Haiti"                                               
 [89] "Honduras"                                            
 [90] "Hungary"                                             
 [91] "Iceland"                                             
 [92] "India"                                               
 [93] "Indonesia"                                           
 [94] "Iran (Islamic Republic of)"                          
 [95] "Iraq"                                                
 [96] "Ireland"                                             
 [97] "Israel"                                              
 [98] "Italy"                                               
 [99] "Jamaica"                                             
[100] "Japan"                                               
[101] "Jordan"                                              
[102] "Kazakhstan"                                          
[103] "Kenya"                                               
[104] "Kiribati"                                            
[105] "Kuwait"                                              
[106] "Kyrgyzstan"                                          
[107] "Lao People's Democratic Republic"                    
[108] "Latvia"                                              
[109] "Lebanon"                                             
[110] "Lesotho"                                             
[111] "Liberia"                                             
[112] "Libya"                                               
[113] "Liechtenstein"                                       
[114] "Lithuania"                                           
[115] "Luxembourg"                                          
[116] "Madagascar"                                          
[117] "Malawi"                                              
[118] "Malaysia"                                            
[119] "Mali"                                                
[120] "Malta"                                               
[121] "Martinique"                                          
[122] "Mauritania"                                          
[123] "Mauritius"                                           
[124] "Mexico"                                              
[125] "Micronesia (Federated States of)"                    
[126] "Mongolia"                                            
[127] "Montenegro"                                          
[128] "Montserrat"                                          
[129] "Morocco"                                             
[130] "Mozambique"                                          
[131] "Myanmar"                                             
[132] "Namibia"                                             
[133] "Nauru"                                               
[134] "Nepal"                                               
[135] "Netherlands"                                         
[136] "Netherlands Antilles (former)"                       
[137] "New Caledonia"                                       
[138] "New Zealand"                                         
[139] "Nicaragua"                                           
[140] "Niger"                                               
[141] "Nigeria"                                             
[142] "Niue"                                                
[143] "North Macedonia"                                     
[144] "Norway"                                              
[145] "Oman"                                                
[146] "Pacific Islands Trust Territory"                     
[147] "Pakistan"                                            
[148] "Palestine"                                           
[149] "Panama"                                              
[150] "Papua New Guinea"                                    
[151] "Paraguay"                                            
[152] "Peru"                                                
[153] "Philippines"                                         
[154] "Poland"                                              
[155] "Portugal"                                            
[156] "Puerto Rico"                                         
[157] "Qatar"                                               
[158] "Republic of Korea"                                   
[159] "Republic of Moldova"                                 
[160] "Réunion"                                             
[161] "Romania"                                             
[162] "Russian Federation"                                  
[163] "Rwanda"                                              
[164] "Saint Helena, Ascension and Tristan da Cunha"        
[165] "Saint Kitts and Nevis"                               
[166] "Saint Lucia"                                         
[167] "Saint Pierre and Miquelon"                           
[168] "Saint Vincent and the Grenadines"                    
[169] "Samoa"                                               
[170] "Sao Tome and Principe"                               
[171] "Saudi Arabia"                                        
[172] "Senegal"                                             
[173] "Serbia"                                              
[174] "Serbia and Montenegro"                               
[175] "Seychelles"                                          
[176] "Sierra Leone"                                        
[177] "Singapore"                                           
[178] "Slovakia"                                            
[179] "Slovenia"                                            
[180] "Solomon Islands"                                     
[181] "Somalia"                                             
[182] "South Africa"                                        
[183] "South Sudan"                                         
[184] "Spain"                                               
[185] "Sri Lanka"                                           
[186] "Sudan"                                               
[187] "Sudan (former)"                                      
[188] "Suriname"                                            
[189] "Sweden"                                              
[190] "Switzerland"                                         
[191] "Syrian Arab Republic"                                
[192] "Tajikistan"                                          
[193] "Thailand"                                            
[194] "Timor-Leste"                                         
[195] "Togo"                                                
[196] "Tokelau"                                             
[197] "Tonga"                                               
[198] "Trinidad and Tobago"                                 
[199] "Tunisia"                                             
[200] "Turkey"                                              
[201] "Turkmenistan"                                        
[202] "Tuvalu"                                              
[203] "Uganda"                                              
[204] "Ukraine"                                             
[205] "United Arab Emirates"                                
[206] "United Kingdom of Great Britain and Northern Ireland"
[207] "United Republic of Tanzania"                         
[208] "United States of America"                            
[209] "United States Virgin Islands"                        
[210] "Uruguay"                                             
[211] "USSR"                                                
[212] "Uzbekistan"                                          
[213] "Vanuatu"                                             
[214] "Venezuela (Bolivarian Republic of)"                  
[215] "Viet Nam"                                            
[216] "Wallis and Futuna Islands"                           
[217] "Yemen"                                               
[218] "Yugoslav SFR"                                        
[219] "Zambia"                                              
[220] "Zimbabwe"                                            
[221] "World"                                               
[222] "Africa"                                              
[223] "Eastern Africa"                                      
[224] "Middle Africa"                                       
[225] "Northern Africa"                                     
[226] "Southern Africa"                                     
[227] "Western Africa"                                      
[228] "Americas"                                            
[229] "Northern America"                                    
[230] "Central America"                                     
[231] "Caribbean"                                           
[232] "South America"                                       
[233] "Asia"                                                
[234] "Central Asia"                                        
[235] "Eastern Asia"                                        
[236] "Southern Asia"                                       
[237] "South-eastern Asia"                                  
[238] "Western Asia"                                        
[239] "Europe"                                              
[240] "Eastern Europe"                                      
[241] "Northern Europe"                                     
[242] "Southern Europe"                                     
[243] "Western Europe"                                      
[244] "Oceania"                                             
[245] "Australia and New Zealand"                           
[246] "Melanesia"                                           
[247] "Micronesia"                                          
[248] "Polynesia"                                           
Code
years.of.observations <- range(birds$Year) # range of data in terms of years
years.of.observations
[1] 1961 2018

Comparing the number of unique values in each column with the variable names suggests that this dataset is an extract from a larger livestock dataset: Domain contains only Live Animals, Element contains only Stocks, and Item contains only 5 categories of domestic fowl (chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds).
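Columns with a single distinct value can also be picked out programmatically instead of being read off the printed counts. This is a minimal base R sketch on made-up data; the toy data frame and the constant_cols name are assumptions for illustration.

```r
# Toy extract: Domain and Element never change, Item and Year do (made-up rows).
toy <- data.frame(
  Domain  = c("Live Animals", "Live Animals", "Live Animals"),
  Element = c("Stocks", "Stocks", "Stocks"),
  Item    = c("Chickens", "Ducks", "Chickens"),
  Year    = c(1961, 1961, 1962)
)

# Count distinct values per column; vapply() enforces one integer per column.
n_unique <- vapply(toy, function(x) length(unique(x)), integer(1))

# Columns with one distinct value carry no information within this extract.
constant_cols <- names(n_unique)[n_unique == 1]
print(constant_cols)
```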

The observations come from 248 different areas, which include both countries and regional aggregates such as World, Asia, and Polynesia, and they span the years 1961 to 2018. The dataset covers almost every country and region of the world.

The core of each observation is the Value variable. Since Unit is always 1000 Head, Value is the number of birds in thousands. The other variables contain descriptive information about each observation.
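To make the unit concrete, Value can be converted to a plain head count. The sketch below uses two values that appear in the head and tail output above (4700 and 30); the Head column name is an illustrative assumption.

```r
# Two rows with values taken from the head() and tail() output above.
toy <- data.frame(
  Item  = c("Chickens", "Ducks"),
  Unit  = c("1000 Head", "1000 Head"),
  Value = c(4700, 30)
)

# Guard: the conversion only makes sense if every row uses the same unit.
stopifnot(all(toy$Unit == "1000 Head"))

# Value is in thousands of birds, so multiply by 1000 to get individual birds.
toy$Head <- toy$Value * 1000
print(toy)
```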

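The data may originally come from poultry farms in each country, but the Flag.Description values (for example, FAO estimate and Aggregate, may include official, semi-official, estimated or calculated data) show that many values are estimates or aggregates rather than direct counts.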
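A compact way to see what each flag means is to list the distinct pairs of Flag and Flag.Description. The sketch below uses a made-up table containing only the two pairs visible in the head and tail output above; flag_lookup is an illustrative name.

```r
# Toy rows using the two flag/description pairs shown in the output above.
toy <- data.frame(
  Flag = c("F", "F", "A", "A"),
  Flag.Description = c(
    "FAO estimate",
    "FAO estimate",
    "Aggregate, may include official, semi-official, estimated or calculated data",
    "Aggregate, may include official, semi-official, estimated or calculated data"
  )
)

# unique() on the two columns gives one row per flag: a small lookup table.
flag_lookup <- unique(toy[c("Flag", "Flag.Description")])
rownames(flag_lookup) <- NULL

# How often each flag occurs.
flag_counts <- table(toy$Flag)

print(flag_lookup)
print(flag_counts)
```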

Missing Data

Code
sum(is.na(birds$Value))
[1] 1036
Code
sum(is.null(birds$Value)) # is.null() tests the whole column, not each element, so this is 0 for any existing column
[1] 0

It looks like 1,036 values of the Value variable are missing (NA) in the dataset.
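The two checks above behave differently: is.na() works element by element, while is.null() asks whether the object as a whole is NULL. The sketch below shows the difference on made-up data and adds a per-column count and a cross-tabulation against Flag. The pairing of flags with missing values in this toy table is invented and says nothing about the real data.

```r
# Toy data with two missing values (flags and values are made up).
toy <- data.frame(
  Item  = c("Chickens", "Ducks", "Turkeys", "Ducks"),
  Value = c(4700, NA, 12, NA),
  Flag  = c("F", "M", "", "M")
)

# is.na() is element-wise, so summing it counts the missing entries.
n_missing <- sum(is.na(toy$Value))

# is.null() returns a single FALSE because the column itself exists.
null_check <- is.null(toy$Value)

# Missing values in every column at once.
missing_per_column <- colSums(is.na(toy))

# Cross-tabulate missingness against Flag to see which flags go with NA.
missing_by_flag <- table(Flag = toy$Flag, Missing = is.na(toy$Value))

print(n_missing)
print(null_check)
print(missing_per_column)
print(missing_by_flag)
```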

Stock levels by country and by bird type

Code
birds %>% 
  select(Area, Value) %>% 
  group_by(Area) %>% 
  summarize(Total = sum(Value)) %>% 
  arrange(desc(Total))
# A tibble: 248 × 2
   Area                         Total
   <chr>                        <int>
 1 World                    744113376
 2 Asia                     361468192
 3 Americas                 198674579
 4 Eastern Asia             197028391
 5 China, mainland          170767908
 6 Europe                   122216094
 7 Northern America         100752122
 8 United States of America  93432842
 9 South-eastern Asia        79669999
10 South America             68503385
# … with 238 more rows

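The summary table shows the total stock level over 58 years for each country and predefined region. Among individual countries, mainland China had the largest poultry stock over this period. Because sum() is called without na.rm = TRUE here, any area with a missing Value gets a total of NA, and arrange() places those rows at the end of the table.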
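Two things affect a ranking like this: missing values, and the mix of countries with regional aggregates in Area. The base R sketch below shows both on made-up data. It assumes that aggregate rows carry Flag "A", as the Polynesia rows in the tail output do; whether every aggregate row in birds is flagged this way has not been verified here.

```r
# Toy data: two countries plus a World aggregate (all values are made up).
toy <- data.frame(
  Area  = c("Albania", "Albania", "Belize", "Belize", "World", "World"),
  Flag  = c("", "", "F", "M", "A", "A"),
  Value = c(10L, 20L, 5L, NA, 40L, 45L)
)

# Without na.rm, a single NA turns the whole group total into NA.
totals_with_na <- tapply(toy$Value, toy$Area, sum)

# With na.rm = TRUE, missing values are skipped instead.
totals <- tapply(toy$Value, toy$Area, sum, na.rm = TRUE)

# Assumption: aggregate rows are marked with Flag "A"; drop them to rank countries only.
countries <- toy[toy$Flag != "A", ]
country_totals <- tapply(countries$Value, countries$Area, sum, na.rm = TRUE)
country_totals <- country_totals[order(country_totals, decreasing = TRUE)]

print(totals_with_na)
print(totals)
print(country_totals)
```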

Code
stock.levels <-  birds %>% 
  select(Item, Value) %>% 
  group_by(Item) %>% 
  summarize(Total = sum(Value, na.rm = TRUE)) %>% 
  arrange(desc(Total)) %>% 
  mutate(Percentage = 100*round(proportions(Total), digits=3))
  

stock.levels
# A tibble: 5 × 3
  Item                        Total Percentage
  <chr>                       <dbl>      <dbl>
1 Chickens               2696862583       90.6
2 Ducks                   149781301        5  
3 Turkeys                  81850064        2.7
4 Geese and guinea fowls   41136874        1.4
5 Pigeons, other birds      6822856        0.2

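By bird type, chickens account for more than 90% of the summed poultry stocks, ducks for 5%, turkeys for 2.7%, geese and guinea fowls for 1.4%, and pigeons and other birds for 0.2%. These sums run over countries, regional aggregates, and the World row together, so the absolute totals appear to count the same birds more than once: the five totals add up to 2,976,453,678, roughly four times the World total of 744,113,376 in the previous table.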
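Because Area mixes countries with regional aggregates and a World row, summing over every row counts the same birds several times. One way to avoid that is to compute the shares from the World rows only. This is a minimal sketch on made-up numbers, not a rerun of the analysis; the toy data frame and the world_share name are illustrative assumptions.

```r
# Toy data: India is part of Asia, and Asia is part of World (made-up values).
toy <- data.frame(
  Area  = c("World", "World", "Asia", "Asia", "India", "India"),
  Item  = c("Chickens", "Ducks", "Chickens", "Ducks", "Chickens", "Ducks"),
  Value = c(900, 100, 500, 80, 200, 20)
)

# Summing all rows mixes three nested levels and overstates the total.
all_rows_total <- sum(toy$Value)

# Restricting to the World rows counts each bird once.
world <- toy[toy$Area == "World", ]
world_totals <- tapply(world$Value, world$Item, sum, na.rm = TRUE)

# proportions() divides each total by the grand total; scale to percent.
world_share <- round(100 * proportions(world_totals), 1)

print(all_rows_total)
print(world_share)
```

In this toy table the all-rows total (1800) is well above the World total (1000), which is the same kind of inflation that summing every row of a mixed country-and-region table produces.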
