challenge_1
Aritra Basu
wildbirds
Author

Aritra Basu

Published

February 28, 2023

Code
library(tidyverse)
library(dplyr)
library(descr)
library(summarytools)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Reading the Data

I have decided to work on the dataset birds:

Code
library(readr)
birds <- read.csv("_data/birds.csv")

Describing the data

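Using descr(), it can be seen that the dataset covers the years 1961 to 2018 (1976 is the first quartile of Year, not its minimum). The output shown here, with its MAD, IQR and CV rows, is in the format of summarytools::descr(), which masks the function of the same name from the descr package because summarytools is loaded later. Most of the numeric columns are identifier codes (Area.Code, Element.Code, Item.Code, Year.Code), so their summary statistics are not very useful here. Value is the only real measurement; it is strongly right-skewed and 96.66% of its entries are valid.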

Code
#install.packages('descr')
#library(descr)
descr(birds)
Descriptive Statistics  
birds  
N: 30977  

                    Area.Code   Element.Code   Item.Code         Value       Year   Year.Code
----------------- ----------- -------------- ----------- ------------- ---------- -----------
             Mean     1201.73        5112.00     1066.48      99410.63    1990.58     1990.58
          Std.Dev     2099.42           0.00        9.03     720611.42      16.73       16.73
              Min        1.00        5112.00     1057.00          0.00    1961.00     1961.00
               Q1       79.00        5112.00     1057.00        171.00    1976.00     1976.00
           Median      156.00        5112.00     1068.00       1800.00    1992.00     1992.00
               Q3      231.00        5112.00     1072.00      15404.00    2005.00     2005.00
              Max     5504.00        5112.00     1083.00   23707134.00    2018.00     2018.00
              MAD      114.16           0.00       16.31       2647.92      20.76       20.76
              IQR      152.00           0.00       15.00      15233.00      29.00       29.00
               CV        1.75           0.00        0.01          7.25       0.01        0.01
         Skewness        1.43            NaN        0.26         18.18      -0.09       -0.09
      SE.Skewness        0.01           0.01        0.01          0.01       0.01        0.01
         Kurtosis        0.05            NaN       -1.37        424.79      -1.19       -1.19
          N.Valid    30977.00       30977.00    30977.00      29941.00   30977.00    30977.00
        Pct.Valid      100.00         100.00      100.00         96.66     100.00      100.00

Using n_distinct, it can be observed that there are 248 distinct area codes.

Code
birds%>%
  select(Area.Code)%>%
  n_distinct(.)
[1] 248
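
The same count can be reached in a few equivalent ways. The following is a minimal sketch on a toy data frame: the name `toy` and its values are made up for illustration, and only the column name `Area.Code` is taken from the dataset.

```r
library(dplyr)

# Toy stand-in for birds; the values are illustrative, not from the real file
toy <- data.frame(
  Area.Code = c(2, 2, 3, 4, 4, 4),
  Value     = c(10, 12, 5, NA, 7, 8)
)

# n_distinct() applied directly to the column vector
n_codes_vec <- n_distinct(toy$Area.Code)

# distinct() keeps one row per code; nrow() then counts those rows
n_codes_rows <- toy %>% distinct(Area.Code) %>% nrow()

# summarise() form, convenient when several columns are counted at once
n_codes_tbl <- toy %>% summarise(n_areas = n_distinct(Area.Code))
```

All three return the number of unique codes; the `summarise()` form is probably the easiest to extend, for example by adding `n_items = n_distinct(Item)` on the real data.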

Next, we can look at a cross-section of the data by restricting it to a single year. For example, we filter for chickens in 1962 and keep the area, item, value and year columns.

Code
birds%>%
  select(Area, Item, Value, Year)%>%
  filter(Year == 1962, Item == "Chickens")  # Year is an integer column
                                                    Area     Item   Value Year
1                                            Afghanistan Chickens    4900 1962
2                                                Albania Chickens    1677 1962
3                                                Algeria Chickens    8600 1962
4                                         American Samoa Chickens      80 1962
5                                                 Angola Chickens    3500 1962
6                                    Antigua and Barbuda Chickens      44 1962
7                                              Argentina Chickens   38000 1962
8                                              Australia Chickens   20290 1962
9                                                Austria Chickens    9943 1962
10                                               Bahamas Chickens     580 1962
11                                               Bahrain Chickens      80 1962
12                                            Bangladesh Chickens   20000 1962
13                                              Barbados Chickens     319 1962
14                                    Belgium-Luxembourg Chickens   31177 1962
15                                                Belize Chickens     150 1962
16                                                 Benin Chickens    5000 1962
17                                               Bermuda Chickens      56 1962
18                                                Bhutan Chickens      75 1962
19                      Bolivia (Plurinational State of) Chickens    5200 1962
20                                              Botswana Chickens     244 1962
21                                                Brazil Chickens  137430 1962
22                                     Brunei Darussalam Chickens     160 1962
23                                              Bulgaria Chickens   21574 1962
24                                          Burkina Faso Chickens    8900 1962
25                                               Burundi Chickens    1200 1962
26                                            Cabo Verde Chickens      42 1962
27                                              Cambodia Chickens    2315 1962
28                                              Cameroon Chickens    3000 1962
29                                                Canada Chickens   65031 1962
30                              Central African Republic Chickens     800 1962
31                                                  Chad Chickens    2500 1962
32                                                 Chile Chickens    9630 1962
33                                  China, Hong Kong SAR Chickens    3251 1962
34                                      China, Macao SAR Chickens     230 1962
35                                       China, mainland Chickens  543000 1962
36                             China, Taiwan Province of Chickens    7942 1962
37                                              Colombia Chickens   19750 1962
38                                               Comoros Chickens     190 1962
39                                                 Congo Chickens     580 1962
40                                          Cook Islands Chickens      50 1962
41                                            Costa Rica Chickens    2361 1962
42                                         Côte d'Ivoire Chickens    4380 1962
43                                                  Cuba Chickens    7500 1962
44                                                Cyprus Chickens    1500 1962
45                                        Czechoslovakia Chickens   28507 1962
46                 Democratic People's Republic of Korea Chickens   11500 1962
47                      Democratic Republic of the Congo Chickens    4400 1962
48                                               Denmark Chickens   29047 1962
49                                              Dominica Chickens      54 1962
50                                    Dominican Republic Chickens    6400 1962
51                                               Ecuador Chickens    4380 1962
52                                                 Egypt Chickens   22096 1962
53                                           El Salvador Chickens    1878 1962
54                                     Equatorial Guinea Chickens      77 1962
55                                              Eswatini Chickens     334 1962
56                                          Ethiopia PDR Chickens   41000 1962
57                           Falkland Islands (Malvinas) Chickens       1 1962
58                                                  Fiji Chickens     200 1962
59                                               Finland Chickens    6509 1962
60                                                France Chickens  170000 1962
61                                         French Guyana Chickens      35 1962
62                                      French Polynesia Chickens      10 1962
63                                                 Gabon Chickens     270 1962
64                                                Gambia Chickens     180 1962
65                                               Germany Chickens  101716 1962
66                                                 Ghana Chickens    6005 1962
67                                                Greece Chickens   15030 1962
68                                               Grenada Chickens     151 1962
69                                            Guadeloupe Chickens     160 1962
70                                                  Guam Chickens      90 1962
71                                             Guatemala Chickens    7330 1962
72                                                Guinea Chickens    3200 1962
73                                         Guinea-Bissau Chickens     200 1962
74                                                Guyana Chickens    2100 1962
75                                                 Haiti Chickens    2600 1962
76                                              Honduras Chickens    2367 1962
77                                               Hungary Chickens   26245 1962
78                                               Iceland Chickens      95 1962
79                                                 India Chickens  107200 1962
80                                             Indonesia Chickens   60500 1962
81                            Iran (Islamic Republic of) Chickens   24382 1962
82                                                  Iraq Chickens    7100 1962
83                                               Ireland Chickens    6954 1962
84                                                Israel Chickens   11500 1962
85                                                 Italy Chickens   93000 1962
86                                               Jamaica Chickens    1899 1962
87                                                 Japan Chickens   90669 1962
88                                                Jordan Chickens    1500 1962
89                                                 Kenya Chickens    8300 1962
90                                              Kiribati Chickens     103 1962
91                                                Kuwait Chickens    2000 1962
92                      Lao People's Democratic Republic Chickens    6000 1962
93                                               Lebanon Chickens    3900 1962
94                                               Lesotho Chickens     657 1962
95                                               Liberia Chickens    1250 1962
96                                                 Libya Chickens     439 1962
97                                         Liechtenstein Chickens      NA 1962
98                                            Madagascar Chickens   10200 1962
99                                                Malawi Chickens    3000 1962
100                                             Malaysia Chickens   22500 1962
101                                                 Mali Chickens   10545 1962
102                                                Malta Chickens     526 1962
103                                           Martinique Chickens     300 1962
104                                           Mauritania Chickens    1850 1962
105                                            Mauritius Chickens     400 1962
106                                               Mexico Chickens   64172 1962
107                                             Mongolia Chickens     139 1962
108                                           Montserrat Chickens      24 1962
109                                              Morocco Chickens   13170 1962
110                                           Mozambique Chickens    4300 1962
111                                              Myanmar Chickens    7756 1962
112                                              Namibia Chickens     234 1962
113                                                Nauru Chickens       2 1962
114                                                Nepal Chickens    4300 1962
115                                          Netherlands Chickens   45890 1962
116                        Netherlands Antilles (former) Chickens      56 1962
117                                        New Caledonia Chickens     139 1962
118                                          New Zealand Chickens    4500 1962
119                                            Nicaragua Chickens    2062 1962
120                                                Niger Chickens    4850 1962
121                                              Nigeria Chickens   38710 1962
122                                                 Niue Chickens      11 1962
123                                               Norway Chickens    4745 1962
124                                                 Oman Chickens     315 1962
125                      Pacific Islands Trust Territory Chickens     106 1962
126                                             Pakistan Chickens   12000 1962
127                                            Palestine Chickens      NA 1962
128                                               Panama Chickens    2410 1962
129                                     Papua New Guinea Chickens     746 1962
130                                             Paraguay Chickens    5384 1962
131                                                 Peru Chickens   21000 1962
132                                          Philippines Chickens   51354 1962
133                                               Poland Chickens   69525 1962
134                                             Portugal Chickens    8900 1962
135                                          Puerto Rico Chickens    3532 1962
136                                                Qatar Chickens      22 1962
137                                    Republic of Korea Chickens   13047 1962
138                                              Réunion Chickens    1500 1962
139                                              Romania Chickens   38192 1962
140                                               Rwanda Chickens     400 1962
141         Saint Helena, Ascension and Tristan da Cunha Chickens      10 1962
142                                Saint Kitts and Nevis Chickens      48 1962
143                                          Saint Lucia Chickens      45 1962
144                     Saint Vincent and the Grenadines Chickens      30 1962
145                                                Samoa Chickens     450 1962
146                                Sao Tome and Principe Chickens      36 1962
147                                         Saudi Arabia Chickens    1550 1962
148                                              Senegal Chickens    2900 1962
149                                           Seychelles Chickens      45 1962
150                                         Sierra Leone Chickens    2000 1962
151                                            Singapore Chickens    2000 1962
152                                      Solomon Islands Chickens     100 1962
153                                              Somalia Chickens    1400 1962
154                                         South Africa Chickens   19000 1962
155                                                Spain Chickens   36248 1962
156                                            Sri Lanka Chickens    3765 1962
157                                       Sudan (former) Chickens   10000 1962
158                                             Suriname Chickens    3000 1962
159                                               Sweden Chickens   10061 1962
160                                          Switzerland Chickens    5880 1962
161                                 Syrian Arab Republic Chickens    3356 1962
162                                             Thailand Chickens   41192 1962
163                                          Timor-Leste Chickens     530 1962
164                                                 Togo Chickens    1028 1962
165                                              Tokelau Chickens       2 1962
166                                                Tonga Chickens      55 1962
167                                  Trinidad and Tobago Chickens    2800 1962
168                                              Tunisia Chickens    5200 1962
169                                               Turkey Chickens   26116 1962
170                                               Tuvalu Chickens      10 1962
171                                               Uganda Chickens    6700 1962
172                                 United Arab Emirates Chickens      40 1962
173 United Kingdom of Great Britain and Northern Ireland Chickens  104288 1962
174                          United Republic of Tanzania Chickens    7200 1962
175                             United States of America Chickens  782000 1962
176                         United States Virgin Islands Chickens      18 1962
177                                              Uruguay Chickens    4800 1962
178                                                 USSR Chickens  488300 1962
179                                              Vanuatu Chickens      95 1962
180                   Venezuela (Bolivarian Republic of) Chickens   13696 1962
181                                             Viet Nam Chickens   37100 1962
182                            Wallis and Futuna Islands Chickens      20 1962
183                                                Yemen Chickens    2340 1962
184                                         Yugoslav SFR Chickens   24908 1962
185                                               Zambia Chickens    4200 1962
186                                             Zimbabwe Chickens    6600 1962
187                                                World Chickens 4048728 1962
188                                               Africa Chickens  282821 1962
189                                       Eastern Africa Chickens   96635 1962
190                                        Middle Africa Chickens   15163 1962
191                                      Northern Africa Chickens   59505 1962
192                                      Southern Africa Chickens   20469 1962
193                                       Western Africa Chickens   91049 1962
194                                             Americas Chickens 1220784 1962
195                                     Northern America Chickens  847087 1962
196                                      Central America Chickens   82730 1962
197                                            Caribbean Chickens   26560 1962
198                                        South America Chickens  264407 1962
199                                                 Asia Chickens 1139125 1962
200                                         Eastern Asia Chickens  669777 1962
201                                        Southern Asia Chickens  176622 1962
202                                   South-eastern Asia Chickens  231407 1962
203                                         Western Asia Chickens   61319 1962
204                                               Europe Chickens 1378938 1962
205                                       Eastern Europe Chickens  672343 1962
206                                      Northern Europe Chickens  161699 1962
207                                      Southern Europe Chickens  180289 1962
208                                       Western Europe Chickens  364607 1962
209                                              Oceania Chickens   27059 1962
210                            Australia and New Zealand Chickens   24790 1962
211                                            Melanesia Chickens    1280 1962
212                                           Micronesia Chickens     301 1962
213                                            Polynesia Chickens     688 1962

Next, I identify the missing values for 1962.

Code
birds%>%
  select(Area, Element, Item, Value, Year)%>%
  filter(Year == 1962)%>%
  filter(is.na(Value))
                                    Area Element                   Item Value Year
1                                Albania  Stocks                  Ducks    NA 1962
2                                Albania  Stocks Geese and guinea fowls    NA 1962
3                                Albania  Stocks                Turkeys    NA 1962
4                               Barbados  Stocks                Turkeys    NA 1962
5                                  Chile  Stocks                Turkeys    NA 1962
6  Democratic People's Republic of Korea  Stocks                  Ducks    NA 1962
7                                Finland  Stocks                Turkeys    NA 1962
8                          Liechtenstein  Stocks               Chickens    NA 1962
9                                Morocco  Stocks                Turkeys    NA 1962
10                            Mozambique  Stocks Geese and guinea fowls    NA 1962
11                            Mozambique  Stocks                Turkeys    NA 1962
12                               Namibia  Stocks   Pigeons, other birds    NA 1962
13                                Norway  Stocks                  Ducks    NA 1962
14                                Norway  Stocks                Turkeys    NA 1962
15                             Palestine  Stocks               Chickens    NA 1962
16             Saint Pierre and Miquelon  Stocks                  Ducks    NA 1962
17                           Switzerland  Stocks Geese and guinea fowls    NA 1962
18                           Switzerland  Stocks                Turkeys    NA 1962
19                       Southern Africa  Stocks   Pigeons, other birds    NA 1962
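
Listing the rows works for a single year, but a count per item is often easier to read. The following is a hedged sketch of that idea on a toy data frame: `toy` and its values are invented for illustration, and only the column names follow the dataset.

```r
library(dplyr)

# Toy stand-in for birds; the values are illustrative only
toy <- data.frame(
  Item  = c("Chickens", "Ducks", "Ducks", "Turkeys", "Turkeys", "Turkeys"),
  Year  = c(1962L, 1962L, 1962L, 1962L, 1962L, 1963L),
  Value = c(NA, NA, 120, NA, NA, NA)
)

# For one year, count missing values and total rows per item
na_by_item <- toy %>%
  filter(Year == 1962) %>%
  group_by(Item) %>%
  summarise(
    n_missing = sum(is.na(Value)),  # TRUE counts as 1
    n_rows    = n(),
    .groups   = "drop"
  ) %>%
  arrange(desc(n_missing))
```

Replacing `group_by(Item)` with `group_by(Year)` and dropping the `filter()` step would give the number of missing values per year instead.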

Trying another function: dfSummary(). This function is provided by the summarytools package.

Code
#install.packages('summarytools')
#library(summarytools)
dfSummary(birds, style="grid")
Data Frame Summary  
birds  
Dimensions: 30977 x 14  
Duplicates: 0  

+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| No | Variable         | Stats / Values                 | Freqs (% of Valid)    | Graph                | Valid    | Missing |
+====+==================+================================+=======================+======================+==========+=========+
| 1  | Domain.Code      | 1. QA                          | 30977 (100.0%)        | IIIIIIIIIIIIIIIIIIII | 30977    | 0       |
|    | [character]      |                                |                       |                      | (100.0%) | (0.0%)  |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 2  | Domain           | 1. Live Animals                | 30977 (100.0%)        | IIIIIIIIIIIIIIIIIIII | 30977    | 0       |
|    | [character]      |                                |                       |                      | (100.0%) | (0.0%)  |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 3  | Area.Code        | Mean (sd) : 1201.7 (2099.4)    | 248 distinct values   | :                    | 30977    | 0       |
|    | [integer]        | min < med < max:               |                       | :                    | (100.0%) | (0.0%)  |
|    |                  | 1 < 156 < 5504                 |                       | :                    |          |         |
|    |                  | IQR (CV) : 152 (1.7)           |                       | :                 .  |          |         |
|    |                  |                                |                       | :                 :  |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 4  | Area             | 1. Africa                      |   290 ( 0.9%)         |                      | 30977    | 0       |
|    | [character]      | 2. Asia                        |   290 ( 0.9%)         |                      | (100.0%) | (0.0%)  |
|    |                  | 3. Eastern Asia                |   290 ( 0.9%)         |                      |          |         |
|    |                  | 4. Egypt                       |   290 ( 0.9%)         |                      |          |         |
|    |                  | 5. Europe                      |   290 ( 0.9%)         |                      |          |         |
|    |                  | 6. France                      |   290 ( 0.9%)         |                      |          |         |
|    |                  | 7. Greece                      |   290 ( 0.9%)         |                      |          |         |
|    |                  | 8. Myanmar                     |   290 ( 0.9%)         |                      |          |         |
|    |                  | 9. Northern Africa             |   290 ( 0.9%)         |                      |          |         |
|    |                  | 10. South-eastern Asia         |   290 ( 0.9%)         |                      |          |         |
|    |                  | [ 238 others ]                 | 28077 (90.6%)         | IIIIIIIIIIIIIIIIII   |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 5  | Element.Code     | 1 distinct value               | 5112 : 30977 (100.0%) | IIIIIIIIIIIIIIIIIIII | 30977    | 0       |
|    | [integer]        |                                |                       |                      | (100.0%) | (0.0%)  |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 6  | Element          | 1. Stocks                      | 30977 (100.0%)        | IIIIIIIIIIIIIIIIIIII | 30977    | 0       |
|    | [character]      |                                |                       |                      | (100.0%) | (0.0%)  |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 7  | Item.Code        | Mean (sd) : 1066.5 (9)         | 1057 : 13074 (42.2%)  | IIIIIIII             | 30977    | 0       |
|    | [integer]        | min < med < max:               | 1068 :  6909 (22.3%)  | IIII                 | (100.0%) | (0.0%)  |
|    |                  | 1057 < 1068 < 1083             | 1072 :  4136 (13.4%)  | II                   |          |         |
|    |                  | IQR (CV) : 15 (0)              | 1079 :  5693 (18.4%)  | III                  |          |         |
|    |                  |                                | 1083 :  1165 ( 3.8%)  |                      |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 8  | Item             | 1. Chickens                    | 13074 (42.2%)         | IIIIIIII             | 30977    | 0       |
|    | [character]      | 2. Ducks                       |  6909 (22.3%)         | IIII                 | (100.0%) | (0.0%)  |
|    |                  | 3. Geese and guinea fowls      |  4136 (13.4%)         | II                   |          |         |
|    |                  | 4. Pigeons, other birds        |  1165 ( 3.8%)         |                      |          |         |
|    |                  | 5. Turkeys                     |  5693 (18.4%)         | III                  |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 9  | Year.Code        | Mean (sd) : 1990.6 (16.7)      | 58 distinct values    | . . .   . :   : : :  | 30977    | 0       |
|    | [integer]        | min < med < max:               |                       | : : : . : : : : : :  | (100.0%) | (0.0%)  |
|    |                  | 1961 < 1992 < 2018             |                       | : : : : : : : : : :  |          |         |
|    |                  | IQR (CV) : 29 (0)              |                       | : : : : : : : : : :  |          |         |
|    |                  |                                |                       | : : : : : : : : : :  |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 10 | Year             | Mean (sd) : 1990.6 (16.7)      | 58 distinct values    | . . .   . :   : : :  | 30977    | 0       |
|    | [integer]        | min < med < max:               |                       | : : : . : : : : : :  | (100.0%) | (0.0%)  |
|    |                  | 1961 < 1992 < 2018             |                       | : : : : : : : : : :  |          |         |
|    |                  | IQR (CV) : 29 (0)              |                       | : : : : : : : : : :  |          |         |
|    |                  |                                |                       | : : : : : : : : : :  |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 11 | Unit             | 1. 1000 Head                   | 30977 (100.0%)        | IIIIIIIIIIIIIIIIIIII | 30977    | 0       |
|    | [character]      |                                |                       |                      | (100.0%) | (0.0%)  |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 12 | Value            | Mean (sd) : 99410.6 (720611.4) | 11495 distinct values | :                    | 29941    | 1036    |
|    | [integer]        | min < med < max:               |                       | :                    | (96.7%)  | (3.3%)  |
|    |                  | 0 < 1800 < 23707134            |                       | :                    |          |         |
|    |                  | IQR (CV) : 15233 (7.2)         |                       | :                    |          |         |
|    |                  |                                |                       | :                    |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 13 | Flag             | 1. (Empty string)              | 10773 (34.8%)         | IIIIII               | 30977    | 0       |
|    | [character]      | 2. *                           |  1494 ( 4.8%)         |                      | (100.0%) | (0.0%)  |
|    |                  | 3. A                           |  6488 (20.9%)         | IIII                 |          |         |
|    |                  | 4. F                           | 10007 (32.3%)         | IIIIII               |          |         |
|    |                  | 5. Im                          |  1213 ( 3.9%)         |                      |          |         |
|    |                  | 6. M                           |  1002 ( 3.2%)         |                      |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 14 | Flag.Description | 1. Aggregate, may include of   |  6488 (20.9%)         | IIII                 | 30977    | 0       |
|    | [character]      | 2. Data not available          |  1002 ( 3.2%)         |                      | (100.0%) | (0.0%)  |
|    |                  | 3. FAO data based on imputat   |  1213 ( 3.9%)         |                      |          |         |
|    |                  | 4. FAO estimate                | 10007 (32.3%)         | IIIIII               |          |         |
|    |                  | 5. Official data               | 10773 (34.8%)         | IIIIII               |          |         |
|    |                  | 6. Unofficial figure           |  1494 ( 4.8%)         |                      |          |         |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+

Here, I have used dfSummary() on the entire dataset. It provides a clear description of the data. We can see that there are 30977 rows and 14 columns. As observed before, there are 248 distinct area codes. By item, chickens account for 42.2% of the records, followed by ducks (22.3%), turkeys (18.4%), geese and guinea fowls (13.4%), and pigeons and other birds (3.8%). These are shares of rows in the dataset, not shares of the bird population. From the flag description, 34.8% of the records come from official data, 32.3% are FAO estimates, and 20.9% are aggregates.
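
The item percentages reported by dfSummary() are shares of rows, and they can be reproduced with `count()`. The following is a minimal sketch on a toy data frame: `toy` and its ten rows are made up, so the resulting percentages are not those of the real dataset.

```r
library(dplyr)

# Toy stand-in for the Item column; row counts are illustrative only
toy <- data.frame(
  Item = c(rep("Chickens", 5), rep("Ducks", 3), rep("Turkeys", 2))
)

# Count rows per item, largest first, and convert to a percentage of all rows
item_share <- toy %>%
  count(Item, sort = TRUE) %>%
  mutate(pct = round(100 * n / sum(n), 1))
```

On the real data the same pipeline applied to `birds` should match the "Freqs (% of Valid)" column for Item, since that column has no missing values.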