library(tidyverse)
library(dplyr)
library(descr)
library(summarytools)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Aritra Basu
February 28, 2023
I have decided to work on the birds dataset.
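Using the descr() function, it can be seen that the dataset contains observations on birds from 1961 to 2018. The descr() used here comes from summarytools: that package is loaded after the descr package, so its descr() takes precedence. Because most numeric variables are codes (Area.Code, Element.Code, Item.Code and Year.Code), their summary statistics are not very meaningful. The only genuine quantity is Value (bird stocks, in 1000 head). It is also the only variable with missing data: 29941 of 30977 values are valid (96.66%).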
Descriptive Statistics
birds
N: 30977
               Area.Code Element.Code    Item.Code        Value         Year    Year.Code
----------- ------------ ------------ ------------ ------------ ------------ ------------
Mean             1201.73      5112.00      1066.48     99410.63      1990.58      1990.58
Std.Dev          2099.42         0.00         9.03    720611.42        16.73        16.73
Min                 1.00      5112.00      1057.00         0.00      1961.00      1961.00
Q1                 79.00      5112.00      1057.00       171.00      1976.00      1976.00
Median            156.00      5112.00      1068.00      1800.00      1992.00      1992.00
Q3                231.00      5112.00      1072.00     15404.00      2005.00      2005.00
Max              5504.00      5112.00      1083.00  23707134.00      2018.00      2018.00
MAD               114.16         0.00        16.31      2647.92        20.76        20.76
IQR               152.00         0.00        15.00     15233.00        29.00        29.00
CV                  1.75         0.00         0.01         7.25         0.01         0.01
Skewness            1.43          NaN         0.26        18.18        -0.09        -0.09
SE.Skewness         0.01         0.01         0.01         0.01         0.01         0.01
Kurtosis            0.05          NaN        -1.37       424.79        -1.19        -1.19
N.Valid         30977.00     30977.00     30977.00     29941.00     30977.00     30977.00
Pct.Valid         100.00       100.00       100.00        96.66       100.00       100.00
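This table hints at two redundant columns. First, `Element.Code` has a standard deviation of 0, meaning it holds a single value (5112). Second, `Year` and `Year.Code` have identical statistics. Below is a minimal base-R sketch of how one might check for both patterns. It runs on a small hypothetical toy data frame (`birds_toy`, with made-up values), not on the real data.

```r
# Hypothetical toy stand-in for `birds`: the values are made up, only the
# column names and types mirror the descr() output above.
birds_toy <- data.frame(
  Area.Code    = c(2L, 3L, 2L, 3L),
  Element.Code = c(5112L, 5112L, 5112L, 5112L),
  Item.Code    = c(1057L, 1057L, 1068L, 1068L),
  Year         = c(1961L, 1961L, 2018L, 2018L),
  Year.Code    = c(1961L, 1961L, 2018L, 2018L),
  Value        = c(4700L, 1200L, NA, 300L)
)

# A column with a single distinct value (sd = 0) carries no information.
is_constant <- vapply(birds_toy, function(col) length(unique(col)) == 1L, logical(1))
constant_cols <- names(birds_toy)[is_constant]
constant_cols  # "Element.Code"

# Year.Code repeats Year, so one of the two can be dropped.
identical(birds_toy$Year, birds_toy$Year.Code)  # TRUE
```

On the real data, the same lines would be run on `birds` instead of `birds_toy`.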
Using n_distinct, it can be observed that there are 248 distinct area codes.
Next, we can take a cross-sectional view by restricting the dataset to a single year. For example, we can filter the chicken stocks for 1962 and look at Area, Item and Value.
Area Item Value Year
1 Afghanistan Chickens 4900 1962
2 Albania Chickens 1677 1962
3 Algeria Chickens 8600 1962
4 American Samoa Chickens 80 1962
5 Angola Chickens 3500 1962
6 Antigua and Barbuda Chickens 44 1962
7 Argentina Chickens 38000 1962
8 Australia Chickens 20290 1962
9 Austria Chickens 9943 1962
10 Bahamas Chickens 580 1962
11 Bahrain Chickens 80 1962
12 Bangladesh Chickens 20000 1962
13 Barbados Chickens 319 1962
14 Belgium-Luxembourg Chickens 31177 1962
15 Belize Chickens 150 1962
16 Benin Chickens 5000 1962
17 Bermuda Chickens 56 1962
18 Bhutan Chickens 75 1962
19 Bolivia (Plurinational State of) Chickens 5200 1962
20 Botswana Chickens 244 1962
21 Brazil Chickens 137430 1962
22 Brunei Darussalam Chickens 160 1962
23 Bulgaria Chickens 21574 1962
24 Burkina Faso Chickens 8900 1962
25 Burundi Chickens 1200 1962
26 Cabo Verde Chickens 42 1962
27 Cambodia Chickens 2315 1962
28 Cameroon Chickens 3000 1962
29 Canada Chickens 65031 1962
30 Central African Republic Chickens 800 1962
31 Chad Chickens 2500 1962
32 Chile Chickens 9630 1962
33 China, Hong Kong SAR Chickens 3251 1962
34 China, Macao SAR Chickens 230 1962
35 China, mainland Chickens 543000 1962
36 China, Taiwan Province of Chickens 7942 1962
37 Colombia Chickens 19750 1962
38 Comoros Chickens 190 1962
39 Congo Chickens 580 1962
40 Cook Islands Chickens 50 1962
41 Costa Rica Chickens 2361 1962
42 Côte d'Ivoire Chickens 4380 1962
43 Cuba Chickens 7500 1962
44 Cyprus Chickens 1500 1962
45 Czechoslovakia Chickens 28507 1962
46 Democratic People's Republic of Korea Chickens 11500 1962
47 Democratic Republic of the Congo Chickens 4400 1962
48 Denmark Chickens 29047 1962
49 Dominica Chickens 54 1962
50 Dominican Republic Chickens 6400 1962
51 Ecuador Chickens 4380 1962
52 Egypt Chickens 22096 1962
53 El Salvador Chickens 1878 1962
54 Equatorial Guinea Chickens 77 1962
55 Eswatini Chickens 334 1962
56 Ethiopia PDR Chickens 41000 1962
57 Falkland Islands (Malvinas) Chickens 1 1962
58 Fiji Chickens 200 1962
59 Finland Chickens 6509 1962
60 France Chickens 170000 1962
61 French Guyana Chickens 35 1962
62 French Polynesia Chickens 10 1962
63 Gabon Chickens 270 1962
64 Gambia Chickens 180 1962
65 Germany Chickens 101716 1962
66 Ghana Chickens 6005 1962
67 Greece Chickens 15030 1962
68 Grenada Chickens 151 1962
69 Guadeloupe Chickens 160 1962
70 Guam Chickens 90 1962
71 Guatemala Chickens 7330 1962
72 Guinea Chickens 3200 1962
73 Guinea-Bissau Chickens 200 1962
74 Guyana Chickens 2100 1962
75 Haiti Chickens 2600 1962
76 Honduras Chickens 2367 1962
77 Hungary Chickens 26245 1962
78 Iceland Chickens 95 1962
79 India Chickens 107200 1962
80 Indonesia Chickens 60500 1962
81 Iran (Islamic Republic of) Chickens 24382 1962
82 Iraq Chickens 7100 1962
83 Ireland Chickens 6954 1962
84 Israel Chickens 11500 1962
85 Italy Chickens 93000 1962
86 Jamaica Chickens 1899 1962
87 Japan Chickens 90669 1962
88 Jordan Chickens 1500 1962
89 Kenya Chickens 8300 1962
90 Kiribati Chickens 103 1962
91 Kuwait Chickens 2000 1962
92 Lao People's Democratic Republic Chickens 6000 1962
93 Lebanon Chickens 3900 1962
94 Lesotho Chickens 657 1962
95 Liberia Chickens 1250 1962
96 Libya Chickens 439 1962
97 Liechtenstein Chickens NA 1962
98 Madagascar Chickens 10200 1962
99 Malawi Chickens 3000 1962
100 Malaysia Chickens 22500 1962
101 Mali Chickens 10545 1962
102 Malta Chickens 526 1962
103 Martinique Chickens 300 1962
104 Mauritania Chickens 1850 1962
105 Mauritius Chickens 400 1962
106 Mexico Chickens 64172 1962
107 Mongolia Chickens 139 1962
108 Montserrat Chickens 24 1962
109 Morocco Chickens 13170 1962
110 Mozambique Chickens 4300 1962
111 Myanmar Chickens 7756 1962
112 Namibia Chickens 234 1962
113 Nauru Chickens 2 1962
114 Nepal Chickens 4300 1962
115 Netherlands Chickens 45890 1962
116 Netherlands Antilles (former) Chickens 56 1962
117 New Caledonia Chickens 139 1962
118 New Zealand Chickens 4500 1962
119 Nicaragua Chickens 2062 1962
120 Niger Chickens 4850 1962
121 Nigeria Chickens 38710 1962
122 Niue Chickens 11 1962
123 Norway Chickens 4745 1962
124 Oman Chickens 315 1962
125 Pacific Islands Trust Territory Chickens 106 1962
126 Pakistan Chickens 12000 1962
127 Palestine Chickens NA 1962
128 Panama Chickens 2410 1962
129 Papua New Guinea Chickens 746 1962
130 Paraguay Chickens 5384 1962
131 Peru Chickens 21000 1962
132 Philippines Chickens 51354 1962
133 Poland Chickens 69525 1962
134 Portugal Chickens 8900 1962
135 Puerto Rico Chickens 3532 1962
136 Qatar Chickens 22 1962
137 Republic of Korea Chickens 13047 1962
138 Réunion Chickens 1500 1962
139 Romania Chickens 38192 1962
140 Rwanda Chickens 400 1962
141 Saint Helena, Ascension and Tristan da Cunha Chickens 10 1962
142 Saint Kitts and Nevis Chickens 48 1962
143 Saint Lucia Chickens 45 1962
144 Saint Vincent and the Grenadines Chickens 30 1962
145 Samoa Chickens 450 1962
146 Sao Tome and Principe Chickens 36 1962
147 Saudi Arabia Chickens 1550 1962
148 Senegal Chickens 2900 1962
149 Seychelles Chickens 45 1962
150 Sierra Leone Chickens 2000 1962
151 Singapore Chickens 2000 1962
152 Solomon Islands Chickens 100 1962
153 Somalia Chickens 1400 1962
154 South Africa Chickens 19000 1962
155 Spain Chickens 36248 1962
156 Sri Lanka Chickens 3765 1962
157 Sudan (former) Chickens 10000 1962
158 Suriname Chickens 3000 1962
159 Sweden Chickens 10061 1962
160 Switzerland Chickens 5880 1962
161 Syrian Arab Republic Chickens 3356 1962
162 Thailand Chickens 41192 1962
163 Timor-Leste Chickens 530 1962
164 Togo Chickens 1028 1962
165 Tokelau Chickens 2 1962
166 Tonga Chickens 55 1962
167 Trinidad and Tobago Chickens 2800 1962
168 Tunisia Chickens 5200 1962
169 Turkey Chickens 26116 1962
170 Tuvalu Chickens 10 1962
171 Uganda Chickens 6700 1962
172 United Arab Emirates Chickens 40 1962
173 United Kingdom of Great Britain and Northern Ireland Chickens 104288 1962
174 United Republic of Tanzania Chickens 7200 1962
175 United States of America Chickens 782000 1962
176 United States Virgin Islands Chickens 18 1962
177 Uruguay Chickens 4800 1962
178 USSR Chickens 488300 1962
179 Vanuatu Chickens 95 1962
180 Venezuela (Bolivarian Republic of) Chickens 13696 1962
181 Viet Nam Chickens 37100 1962
182 Wallis and Futuna Islands Chickens 20 1962
183 Yemen Chickens 2340 1962
184 Yugoslav SFR Chickens 24908 1962
185 Zambia Chickens 4200 1962
186 Zimbabwe Chickens 6600 1962
187 World Chickens 4048728 1962
188 Africa Chickens 282821 1962
189 Eastern Africa Chickens 96635 1962
190 Middle Africa Chickens 15163 1962
191 Northern Africa Chickens 59505 1962
192 Southern Africa Chickens 20469 1962
193 Western Africa Chickens 91049 1962
194 Americas Chickens 1220784 1962
195 Northern America Chickens 847087 1962
196 Central America Chickens 82730 1962
197 Caribbean Chickens 26560 1962
198 South America Chickens 264407 1962
199 Asia Chickens 1139125 1962
200 Eastern Asia Chickens 669777 1962
201 Southern Asia Chickens 176622 1962
202 South-eastern Asia Chickens 231407 1962
203 Western Asia Chickens 61319 1962
204 Europe Chickens 1378938 1962
205 Eastern Europe Chickens 672343 1962
206 Northern Europe Chickens 161699 1962
207 Southern Europe Chickens 180289 1962
208 Western Europe Chickens 364607 1962
209 Oceania Chickens 27059 1962
210 Australia and New Zealand Chickens 24790 1962
211 Melanesia Chickens 1280 1962
212 Micronesia Chickens 301 1962
213 Polynesia Chickens 688 1962
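The Area column mixes individual countries with regional aggregates (World, Africa, Eastern Africa and so on, at the bottom of the table). Summing Value over all rows would therefore count the same birds several times. Below is a minimal base-R sketch that drops the aggregates before summing, using a few 1962 values from the table. It assumes a hand-made list of aggregate names copied from this table, which may not cover every aggregate that appears in other years or items.

```r
# A few 1962 chicken stocks (1000 head) copied from the table above.
chickens_1962 <- data.frame(
  Area  = c("Afghanistan", "Albania", "Algeria", "World", "Africa", "Northern Africa"),
  Value = c(4900, 1677, 8600, 4048728, 282821, 59505)
)

# Hand-made list of the regional aggregates listed at the end of the 1962
# table; other years or items may contain aggregates not listed here.
aggregates <- c(
  "World", "Africa", "Eastern Africa", "Middle Africa", "Northern Africa",
  "Southern Africa", "Western Africa", "Americas", "Northern America",
  "Central America", "Caribbean", "South America", "Asia", "Eastern Asia",
  "Southern Asia", "South-eastern Asia", "Western Asia", "Europe",
  "Eastern Europe", "Northern Europe", "Southern Europe", "Western Europe",
  "Oceania", "Australia and New Zealand", "Melanesia", "Micronesia", "Polynesia"
)

# Keep only rows that are not aggregates before adding values up.
countries_1962 <- chickens_1962[!chickens_1962$Area %in% aggregates, ]
sum(countries_1962$Value)  # 15177 (Afghanistan + Albania + Algeria)
```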
Next, I identify the rows whose Value is missing in 1962.
| | Area | Element | Item | Value | Year |
|---|---|---|---|---|---|
| 1 | Albania | Stocks | Ducks | NA | 1962 |
| 2 | Albania | Stocks | Geese and guinea fowls | NA | 1962 |
| 3 | Albania | Stocks | Turkeys | NA | 1962 |
| 4 | Barbados | Stocks | Turkeys | NA | 1962 |
| 5 | Chile | Stocks | Turkeys | NA | 1962 |
| 6 | Democratic People's Republic of Korea | Stocks | Ducks | NA | 1962 |
| 7 | Finland | Stocks | Turkeys | NA | 1962 |
| 8 | Liechtenstein | Stocks | Chickens | NA | 1962 |
| 9 | Morocco | Stocks | Turkeys | NA | 1962 |
| 10 | Mozambique | Stocks | Geese and guinea fowls | NA | 1962 |
| 11 | Mozambique | Stocks | Turkeys | NA | 1962 |
| 12 | Namibia | Stocks | Pigeons, other birds | NA | 1962 |
| 13 | Norway | Stocks | Ducks | NA | 1962 |
| 14 | Norway | Stocks | Turkeys | NA | 1962 |
| 15 | Palestine | Stocks | Chickens | NA | 1962 |
| 16 | Saint Pierre and Miquelon | Stocks | Ducks | NA | 1962 |
| 17 | Switzerland | Stocks | Geese and guinea fowls | NA | 1962 |
| 18 | Switzerland | Stocks | Turkeys | NA | 1962 |
| 19 | Southern Africa | Stocks | Pigeons, other birds | NA | 1962 |
Trying another function: dfSummary(), which is provided by the summarytools package.
Data Frame Summary
birds
Dimensions: 30977 x 14
Duplicates: 0
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Valid | Missing |
+====+==================+================================+=======================+======================+==========+=========+
| 1 | Domain.Code | 1. QA | 30977 (100.0%) | IIIIIIIIIIIIIIIIIIII | 30977 | 0 |
| | [character] | | | | (100.0%) | (0.0%) |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 2 | Domain | 1. Live Animals | 30977 (100.0%) | IIIIIIIIIIIIIIIIIIII | 30977 | 0 |
| | [character] | | | | (100.0%) | (0.0%) |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 3 | Area.Code | Mean (sd) : 1201.7 (2099.4) | 248 distinct values | : | 30977 | 0 |
| | [integer] | min < med < max: | | : | (100.0%) | (0.0%) |
| | | 1 < 156 < 5504 | | : | | |
| | | IQR (CV) : 152 (1.7) | | : . | | |
| | | | | : : | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 4 | Area | 1. Africa | 290 ( 0.9%) | | 30977 | 0 |
| | [character] | 2. Asia | 290 ( 0.9%) | | (100.0%) | (0.0%) |
| | | 3. Eastern Asia | 290 ( 0.9%) | | | |
| | | 4. Egypt | 290 ( 0.9%) | | | |
| | | 5. Europe | 290 ( 0.9%) | | | |
| | | 6. France | 290 ( 0.9%) | | | |
| | | 7. Greece | 290 ( 0.9%) | | | |
| | | 8. Myanmar | 290 ( 0.9%) | | | |
| | | 9. Northern Africa | 290 ( 0.9%) | | | |
| | | 10. South-eastern Asia | 290 ( 0.9%) | | | |
| | | [ 238 others ] | 28077 (90.6%) | IIIIIIIIIIIIIIIIII | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 5 | Element.Code | 1 distinct value | 5112 : 30977 (100.0%) | IIIIIIIIIIIIIIIIIIII | 30977 | 0 |
| | [integer] | | | | (100.0%) | (0.0%) |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 6 | Element | 1. Stocks | 30977 (100.0%) | IIIIIIIIIIIIIIIIIIII | 30977 | 0 |
| | [character] | | | | (100.0%) | (0.0%) |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 7 | Item.Code | Mean (sd) : 1066.5 (9) | 1057 : 13074 (42.2%) | IIIIIIII | 30977 | 0 |
| | [integer] | min < med < max: | 1068 : 6909 (22.3%) | IIII | (100.0%) | (0.0%) |
| | | 1057 < 1068 < 1083 | 1072 : 4136 (13.4%) | II | | |
| | | IQR (CV) : 15 (0) | 1079 : 5693 (18.4%) | III | | |
| | | | 1083 : 1165 ( 3.8%) | | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 8 | Item | 1. Chickens | 13074 (42.2%) | IIIIIIII | 30977 | 0 |
| | [character] | 2. Ducks | 6909 (22.3%) | IIII | (100.0%) | (0.0%) |
| | | 3. Geese and guinea fowls | 4136 (13.4%) | II | | |
| | | 4. Pigeons, other birds | 1165 ( 3.8%) | | | |
| | | 5. Turkeys | 5693 (18.4%) | III | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 9 | Year.Code | Mean (sd) : 1990.6 (16.7) | 58 distinct values | . . . . : : : : | 30977 | 0 |
| | [integer] | min < med < max: | | : : : . : : : : : : | (100.0%) | (0.0%) |
| | | 1961 < 1992 < 2018 | | : : : : : : : : : : | | |
| | | IQR (CV) : 29 (0) | | : : : : : : : : : : | | |
| | | | | : : : : : : : : : : | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 10 | Year | Mean (sd) : 1990.6 (16.7) | 58 distinct values | . . . . : : : : | 30977 | 0 |
| | [integer] | min < med < max: | | : : : . : : : : : : | (100.0%) | (0.0%) |
| | | 1961 < 1992 < 2018 | | : : : : : : : : : : | | |
| | | IQR (CV) : 29 (0) | | : : : : : : : : : : | | |
| | | | | : : : : : : : : : : | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 11 | Unit | 1. 1000 Head | 30977 (100.0%) | IIIIIIIIIIIIIIIIIIII | 30977 | 0 |
| | [character] | | | | (100.0%) | (0.0%) |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 12 | Value | Mean (sd) : 99410.6 (720611.4) | 11495 distinct values | : | 29941 | 1036 |
| | [integer] | min < med < max: | | : | (96.7%) | (3.3%) |
| | | 0 < 1800 < 23707134 | | : | | |
| | | IQR (CV) : 15233 (7.2) | | : | | |
| | | | | : | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 13 | Flag | 1. (Empty string) | 10773 (34.8%) | IIIIII | 30977 | 0 |
| | [character] | 2. * | 1494 ( 4.8%) | | (100.0%) | (0.0%) |
| | | 3. A | 6488 (20.9%) | IIII | | |
| | | 4. F | 10007 (32.3%) | IIIIII | | |
| | | 5. Im | 1213 ( 3.9%) | | | |
| | | 6. M | 1002 ( 3.2%) | | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
| 14 | Flag.Description | 1. Aggregate, may include of | 6488 (20.9%) | IIII | 30977 | 0 |
| | [character] | 2. Data not available | 1002 ( 3.2%) | | (100.0%) | (0.0%) |
| | | 3. FAO data based on imputat | 1213 ( 3.9%) | | | |
| | | 4. FAO estimate | 10007 (32.3%) | IIIIII | | |
| | | 5. Official data | 10773 (34.8%) | IIIIII | | |
| | | 6. Unofficial figure | 1494 ( 4.8%) | | | |
+----+------------------+--------------------------------+-----------------------+----------------------+----------+---------+
Here, I have used dfSummary() on the entire dataset, and it gives a clear overview of the data. There are 30977 rows and 14 columns and, as observed before, 248 distinct area codes. By type of bird, 42.2% of the rows refer to chickens. They are followed by ducks (22.3%), turkeys (18.4%), geese and guinea fowls (13.4%), and pigeons and other birds (3.8%). These are shares of rows, not of the number of birds. From the flag description, 34.8% of the values come from official sources, 32.3% are FAO estimates, and 20.9% are aggregates.
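dfSummary reports 1036 missing entries in Value but only 1002 rows flagged "Data not available", so the two counts do not line up exactly. A cross-tabulation of Flag.Description against `is.na(Value)` would show which flags the missing values carry. Below is a minimal base-R sketch on hypothetical toy rows (`flags_toy`, made up for illustration).

```r
# Hypothetical toy rows using Flag.Description labels shown by dfSummary.
flags_toy <- data.frame(
  Flag.Description = c("Official data", "FAO estimate", "Data not available",
                       "Data not available", "Official data"),
  Value = c(1500, 4380, NA, NA, NA)
)

# Rows: flag; columns: whether Value is missing (FALSE / TRUE).
flag_by_missing <- table(flag = flags_toy$Flag.Description,
                         missing = is.na(flags_toy$Value))
flag_by_missing
```

On the real data, `table(birds$Flag.Description, is.na(birds$Value))` gives the full cross-tabulation.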
---
title: "Challenge 1"
author: "Aritra Basu"
description: "Reading in data and creating a post"
date: "02/28/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_1
- Aritra Basu
- wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(dplyr)
library(descr)
library(summarytools)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Reading the Data
I have decided to work on the `birds` dataset:
```{r}
# Base R read.csv(): it makes column names syntactic (e.g. Area.Code),
# which the code below relies on
birds <- read.csv("_data/birds.csv")
```
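A quick illustration of why the column names in this post contain dots. `read.csv()` runs the header through `make.names()` (because `check.names = TRUE` by default), which turns a name such as `Area Code` into `Area.Code`. Whether the raw `_data/birds.csv` header actually uses spaces is an assumption here; the two-line CSV below is hypothetical.

```r
# Hypothetical two-line CSV whose header contains a space.
csv_text <- "Area Code,Area,Value\n2,Afghanistan,4900"

# check.names = TRUE (the default) applies make.names() to the header,
# so "Area Code" becomes the syntactic name Area.Code.
toy <- read.csv(text = csv_text)
names(toy)  # "Area.Code" "Area" "Value"
```

By contrast, `readr::read_csv()` keeps non-syntactic names as they are (e.g. `Area Code`, which would then need backticks). Swapping it in would therefore require renaming the columns used below.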
## Describing the data
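Using `descr()` from the summarytools package, it can be seen that the dataset contains observations on birds from 1961 to 2018. Because most numeric variables are codes (`Area.Code`, `Element.Code`, `Item.Code` and `Year.Code`), their summary statistics are not very meaningful. The only genuine quantity is `Value` (bird stocks, in 1000 head). It is also the only variable with missing data: 29941 of 30977 values are valid (96.66%).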
```{r}
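#install.packages('summarytools')
# summarytools is attached after descr in the setup chunk, so plain descr()
# resolves to summarytools::descr(); the prefix makes that explicit
summarytools::descr(birds)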
```
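The covered period is given by the minimum and maximum of `Year`, not by its quartiles. The code columns are labels rather than quantities, so one option is to convert them to factors before summarising. Below is a minimal base-R sketch on a hypothetical toy data frame (`birds_toy`, with made-up values).

```r
# Hypothetical toy data with the same kinds of columns as `birds`.
birds_toy <- data.frame(
  Area.Code = c(2L, 2L, 3L, 3L),
  Item.Code = c(1057L, 1068L, 1057L, 1068L),
  Year      = c(1961L, 1976L, 2005L, 2018L),
  Value     = c(4700L, 60L, NA, 300L)
)

# The period covered is the min..max of Year.
range(birds_toy$Year)  # 1961 2018

# Codes are labels: as factors, summary() counts each code
# instead of reporting a mean and quartiles for it.
birds_toy$Area.Code <- factor(birds_toy$Area.Code)
birds_toy$Item.Code <- factor(birds_toy$Item.Code)
summary(birds_toy)
```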
Using n_distinct, it can be observed that there are 248 distinct area codes.
```{r}
birds %>%
  select(Area.Code) %>%
  n_distinct()
```
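`n_distinct()` counts unique values; for a single column, the base-R equivalent is `length(unique(x))`. dfSummary reports 248 distinct values for both `Area.Code` and `Area`, so a natural follow-up is to check that each code maps to exactly one area name. Below is a minimal sketch on a hypothetical toy data frame (`areas_toy`; the codes are made up).

```r
# Hypothetical toy data: codes are illustrative, not the real FAO codes.
areas_toy <- data.frame(
  Area.Code = c(2L, 2L, 3L, 900L, 900L),
  Area      = c("Afghanistan", "Afghanistan", "Albania", "World", "World")
)

# Base R equivalent of dplyr::n_distinct() on one column.
n_codes <- length(unique(areas_toy$Area.Code))
n_codes  # 3

# If every code maps to exactly one name, the number of distinct
# (code, name) pairs equals the number of distinct codes.
n_pairs <- nrow(unique(areas_toy[, c("Area.Code", "Area")]))
n_pairs == n_codes  # TRUE
```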
Next, we can take a cross-sectional view by restricting the dataset to a single year. For example, we can filter the chicken stocks for 1962 and look at `Area`, `Item` and `Value`.
```{r}
birds %>%
  select(Area, Item, Value, Year) %>%
  filter(Year == 1962, Item == "Chickens")  # Year is an integer column
```
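`Year` is stored as an integer, so comparing it with the number `1962` is the most direct filter. A comparison with the string `"1962"` also works, because R then coerces the number to character before comparing. Below is a minimal base-R equivalent using `subset()`, on hypothetical toy rows (`birds_toy`). The 1962 values are taken from the rendered output; the 1963 row is made up.

```r
# Hypothetical toy rows; the 1963 value is made up.
birds_toy <- data.frame(
  Area  = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  Item  = c("Chickens", "Ducks", "Chickens", "Chickens"),
  Value = c(4900L, NA, 1677L, 1700L),
  Year  = c(1962L, 1962L, 1962L, 1963L)
)

# Rows for chickens in 1962, keeping the same four columns as the dplyr code.
chickens_1962 <- subset(birds_toy, Year == 1962 & Item == "Chickens",
                        select = c(Area, Item, Value, Year))
chickens_1962
```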
Next, I identify the rows whose `Value` is missing in 1962.
```{r}
birds %>%
  select(Area, Element, Item, Value, Year) %>%
  filter(Year == 1962, is.na(Value))  # 1962 rows with no recorded Value
```
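To summarise the missing values rather than list them, one can count them per item. In the 1962 output, turkeys account for 8 of the 19 rows with a missing `Value`. Below is a minimal base-R sketch using `tapply()` on a small hypothetical subset of those 1962 rows (`birds_toy`).

```r
# A few 1962 rows (Albania, Norway, Chile) as they appear in the output.
birds_toy <- data.frame(
  Area  = c("Albania", "Albania", "Albania", "Norway", "Norway", "Chile"),
  Item  = c("Ducks", "Turkeys", "Chickens", "Ducks", "Turkeys", "Turkeys"),
  Value = c(NA, NA, 1677L, NA, NA, NA),
  Year  = 1962L
)

# Number of missing Value entries for each Item.
missing_by_item <- tapply(is.na(birds_toy$Value), birds_toy$Item, sum)
missing_by_item  # Chickens 0, Ducks 2, Turkeys 3
```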
Trying another function: `dfSummary()`, which is provided by the summarytools package.
```{r}
#install.packages('summarytools')
#library(summarytools)
dfSummary(birds, style="grid")
```
Here, I have used `dfSummary()` on the entire dataset, and it gives a clear overview of the data. There are 30977 rows and 14 columns and, as observed before, 248 distinct area codes. By type of bird, 42.2% of the rows refer to chickens. They are followed by ducks (22.3%), turkeys (18.4%), geese and guinea fowls (13.4%), and pigeons and other birds (3.8%). These are shares of rows, not of the number of birds. From `Flag.Description`, 34.8% of the values come from official sources, 32.3% are FAO estimates, and 20.9% are aggregates.
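The shares quoted above can be recomputed from the dfSummary counts, and sorting them makes the ranking explicit. Below is a minimal base-R sketch with the counts copied from the output. On the real data, `round(100 * prop.table(table(birds$Item)), 1)` should give the same numbers, since `Item` has no missing values.

```r
# Row counts per Item, copied from the dfSummary output.
# These are counts of rows (observations), not numbers of birds.
item_counts <- c(Chickens = 13074, Ducks = 6909,
                 `Geese and guinea fowls` = 4136,
                 `Pigeons, other birds` = 1165, Turkeys = 5693)

# Percentage share of rows, largest first.
item_share <- sort(round(100 * item_counts / sum(item_counts), 1),
                   decreasing = TRUE)
item_share
```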