library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Vinitha Maheswaran
October 11, 2022
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc.)
For this challenge I am reading the “wild_bird_data” data set. Because wild_bird_data.xlsx is an Excel file, I use the ‘readxl’ package to read it and store the result in a data frame called “bird_data”. On a first read, the first row of the data frame holds what look like column names rather than numerical values like the remaining rows. I resolve this by skipping the first row of the sheet when reading the file. Now, when I print the data frame, both variables contain only numerical values in double-precision floating-point format.
# A tibble: 146 × 2
`Wet body weight [g]` `Population size`
<dbl> <dbl>
1 5.46 532194.
2 7.76 3165107.
3 8.64 2592997.
4 10.7 3524193.
5 7.42 389806.
6 9.12 604766.
7 8.04 192361.
8 8.70 250452.
9 8.89 16997.
10 9.52 595.
# … with 136 more rows
tibble [146 × 2] (S3: tbl_df/tbl/data.frame)
$ Wet body weight [g]: num [1:146] 5.46 7.76 8.64 10.69 7.42 ...
$ Population size : num [1:146] 532194 3165107 2592997 3524193 389806 ...
Wet body weight [g] Population size
Min. : 5.459 Min. : 5
1st Qu.: 18.620 1st Qu.: 1821
Median : 69.232 Median : 24353
Mean : 363.694 Mean : 382874
3rd Qu.: 309.826 3rd Qu.: 198515
Max. :9639.845 Max. :5093378
[1] 0
[1] 0
# A tibble: 146 × 2
`Wet body weight [g]` `Population size`
<dbl> <dbl>
1 9640. 3417.
2 4451. 4789.
3 4224. 433.
4 2320. 151.
5 2054. 20661.
6 1368. 9797.
7 1138. 53.9
8 1106. 3975.
9 1064. 107.
10 1042. 1759.
# … with 136 more rows
5.45887180052624 7.41722577905587 7.76456810683605 8.03684333000353
1 1 1 1
8.63858738018464 8.70473119796067 8.89032317828959 9.1169347252776
1 1 1 1
9.51590845877281 10.0837373583453 10.4227948279533 10.6897349302105
1 1 1 1
10.9430490453536 11.0657951888437 11.3325639394677 11.7501765156051
1 1 1 1
11.8338885705264 11.911899237195 12.073867120209 12.9107556641051
1 1 1 1
13.4190149139047 13.7066181013302 14.5348410462378 14.9327931173702
1 1 1 1
15.250338580451 15.4756001614342 15.5437260853068 16.173314711739
1 1 1 1
16.3353638573666 16.713625723242 16.7374033065426 16.8480457069965
1 1 1 1
16.9541655530351 17.7715823185348 18.4587461659311 18.4717033015697
1 1 1 1
18.5902522391181 18.7105850428847 18.7218421193228 19.1013268966518
1 1 1 1
19.3930121740443 19.492233639612 20.4175008347481 20.556216729787
1 1 1 1
21.2700694334402 22.2677519174593 22.3057316395665 22.4321177480063
1 1 1 1
22.689316708056 22.7960529394349 23.9762697808197 24.4422047589553
1 1 1 1
25.2824793404366 27.1737255667256 27.6790862634083 27.8958488030849
1 1 1 1
28.6687569833571 32.2163205913993 33.6261656866144 35.1758377124708
1 1 1 1
35.4203082644676 36.9646386454904 37.5910580962782 39.0641452782309
1 1 1 1
42.8545510609481 43.1315511306876 44.8797938229394 47.9477535298889
1 1 1 1
49.9139486401791 52.312838051431 64.74167603758 66.4645646879698
1 1 1 1
67.0724165442755 71.3921665737811 72.4618812159043 79.3893544290883
1 1 1 1
82.8877457373263 95.6625004989711 102.577037919272 103.351145634621
1 1 1 1
105.251145466066 105.27800534013 110.993242131064 114.163158354383
1 1 1 1
116.289475709079 128.395335803439 128.575653541364 135.486491343145
1 1 1 1
138.513955740119 163.276223677476 173.454892979421 175.840624652506
1 1 1 1
194.303351756182 212.147017049613 221.693442317827 226.218006871375
1 1 1 1
232.254993447017 240.78960679855 251.746484961854 251.762544720258
1 1 1 1
255.963705031502 263.89604533577 265.463867650509 275.131029910782
1 1 1 1
282.288057014247 287.527400278505 293.799587810237 301.332380779176
1 1 1 1
303.545552195872 311.918910823616 345.112605430166 352.281226176494
1 1 1 1
371.889868196688 380.296824961942 393.854923117015 394.224786249175
1 1 1 1
428.549354432903 462.782070038066 479.722220970486 480.965400920035
1 1 1 1
486.856962904678 527.513423559442 530.357342971806 603.577402390915
1 1 1 1
645.449083461034 672.284528540897 684.506509216141 757.205197645473
1 1 1 1
765.921951081732 798.020716987072 820.520151625127 887.34848570896
1 1 1 1
923.172177028805 954.839393219695 1003.03939853867 1008.19886351951
1 1 1 1
1042.06074444654 1064.32682601983 1106.07510035687 1137.96479906865
1 1 1 1
1368.36501582366 2053.74863827143 2320.09569921356 4223.72945322751
1 1 1 1
4450.50815600577 9639.84540069595
1 1
4.91634835467142 6.70868864239466 8.73586159709388 9.31902770511663
1 1 1 1
15.2869871955882 15.4655672951415 22.3522084608936 30.127933260934
1 1 1 1
53.9362035620238 59.0620904204364 59.6194023648399 60.4991412161533
1 1 1 1
72.5770178814215 90.9054052719037 106.902249798449 149.890268699652
1 1 1 1
151.097182999144 208.885343731786 253.181014123717 281.556308988712
1 1 1 1
291.178964028914 385.727617599983 397.264219914397 433.148027881688
1 1 1 1
452.038679090149 495.841172633925 544.171979070565 592.145764114067
1 1 1 1
592.580758810679 595.09393677964 612.028409808944 628.023224840693
1 1 1 1
792.971565300218 864.665387886239 907.937967876158 1162.33256269623
1 1 1 1
1758.76989729032 2007.50380975505 2237.33222320424 3069.67346119425
1 1 1 1
3417.07357811317 3974.59483110465 4110.35378650503 4111.78379795295
1 1 1 1
4165.60888930286 4203.61578270669 4380.25359020738 4596.79968789988
1 1 1 1
4775.8178179398 4788.66353161336 5210.62705096944 5352.59147293765
1 1 1 1
7080.77132481909 8449.49796848251 8621.86411771869 9240.89055513834
1 1 1 1
9360.77252455304 9797.18806953683 10196.0549227289 10942.0508967534
1 1 1 1
12423.3328081174 12525.5435856356 13390.6423878795 15151.7816516081
1 1 1 1
15173.7578211317 16997.4156415239 17889.1504645705 20287.3606591725
1 1 1 1
20660.849081157 21007.4914234587 21110.8595100455 21761.5934166234
1 1 1 1
23056.4777079282 25649.9515609322 25686.6579025093 25700.0638322942
1 1 1 1
30378.3213430714 30656.0846698027 30829.5599923666 32206.460433963
1 1 1 1
34547.4925608766 36108.5800741424 38141.260546798 40057.7978966742
1 1 1 1
40510.0866666354 48588.014888741 49948.4692178362 54219.7070863559
1 1 1 1
57460.3862091927 62266.8051981741 70377.2193951022 73944.9926361407
1 1 1 1
74386.4256983317 76836.5900893805 77389.4460799703 81067.9439550841
1 1 1 1
86286.0274600145 90663.8017792314 98289.3823961815 102770.468623667
1 1 1 1
131929.739792131 143593.859644388 143943.923703849 153007.935788715
1 1 1 1
164390.10921714 192360.511579436 194871.065036599 195278.198371782
1 1 1 1
195591.665828951 199489.109295523 250452.449623033 259613.414523361
1 1 1 1
272074.336929843 279260.298751568 282801.151970821 283748.186271429
1 1 1 1
323395.683066309 331552.480261483 369717.294242925 386610.335143534
1 1 1 1
389806.168891807 405284.31744811 413176.502290094 471555.281780475
1 1 1 1
472595.269474694 532194.395145161 604765.97978904 642164.626619563
1 1 1 1
801278.748155215 815305.139691342 850796.834759033 907294.021838339
1 1 1 1
1217093.59903949 1533110.9951272 1546053.00700726 2020904.68541013
1 1 1 1
2055446.33281584 2503916.1239288 2592996.86778979 3165107.44544653
1 1 1 1
3507479.42637697 3524193.2266336 4131320.19597057 4262042.03813322
1 1 1 1
4812996.90520353 5093377.94709106
1 1
The “wild_bird_data.xlsx” data set was likely taken from the first figure of a research paper by Nee et al., as noted in the first row of the Excel sheet (the row skipped during import). It contains 146 observations (rows) and 2 variables (columns): Wet body weight [g] and Population size. Both variables hold only numerical values in double-precision floating-point format. The data set represents the average wet body weight in grams and the population size for different species of wild birds, and it could be used to estimate the biomass of wild birds. There are no missing or null values. The minimum, maximum, mean, and median of Wet body weight are 5.459 g, 9639.845 g, 363.694 g, and 69.232 g, respectively. The minimum, maximum, mean, and median of Population size are 5, 5093378, 382874, and 24353, respectively.
---
title: "Challenge 1 Solutions"
author: "Vinitha Maheswaran"
description: "Reading in data and creating a post"
date: "10/11/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_1
- railroads
- faostat
- wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
For this challenge I am reading the "wild_bird_data" data set. Because wild_bird_data.xlsx is an Excel file, I use the 'readxl' package to read it and store the result in a data frame called "bird_data". On a first read, the first row of the data frame holds what look like column names rather than numerical values like the remaining rows. I resolve this by skipping the first row of the sheet when reading the file. Now, when I print the data frame, both variables contain only numerical values in double-precision floating-point format.
```{r}
#install.packages('readxl')
library(readxl)
# Reading the wild_bird_data.xlsx data set and storing in a data frame
bird_data <- read_excel("_data/wild_bird_data.xlsx", skip=1)
print(bird_data)
```
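To illustrate what `skip = 1` does without needing the Excel file, here is a minimal sketch that swaps in `readr::read_csv()` on an in-memory string in place of `read_excel()`. The note row and the three data rows are placeholders made up for illustration (the numbers are rounded from the first rows of the data); only the two column names come from the data set.

```{r}
library(readr)

# Hypothetical stand-in for the sheet: a note row, the real header, then data
raw_text <- paste(
  "Reference,Source note (placeholder)",
  "Wet body weight [g],Population size",
  "5.46,532194",
  "7.76,3165107",
  "8.64,2592997",
  sep = "\n"
)

# Without skip, the note row becomes the header and the real header
# lands in the data, which forces both columns to character
no_skip <- read_csv(I(raw_text), show_col_types = FALSE)

# With skip = 1, the real header is used and both columns parse as numbers
with_skip <- read_csv(I(raw_text), skip = 1, show_col_types = FALSE)

sapply(no_skip, class)
sapply(with_skip, class)
```

The same reasoning applies to `read_excel(..., skip = 1)`: the rows skipped are dropped before the header is read.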
## Describe the data
```{r}
#Finding dimension of the data set
dim(bird_data)
```
```{r}
#Finding column names
colnames(bird_data)
```
```{r}
#Structure of bird_data
str(bird_data)
```
```{r}
#Summary of bird_data
summary(bird_data)
```
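The statistics that `summary()` prints can also be computed explicitly, which makes it easier to reuse them later. The sketch below uses a small illustrative tibble, `bird_sample`, with a few rounded rows rather than the full 146-row data set, so its results will not match the full summary. The names `bird_sample` and `weight_stats` are assumptions for this example.

```{r}
library(dplyr)

# Illustrative rows only (rounded); the real data set has 146 rows
bird_sample <- tibble(
  `Wet body weight [g]` = c(5.46, 7.76, 8.64, 10.7, 7.42),
  `Population size` = c(532194, 3165107, 2592997, 3524193, 389806)
)

# Minimum, maximum, mean and median of wet body weight in one row
weight_stats <- bird_sample %>%
  summarise(
    weight_min = min(`Wet body weight [g]`),
    weight_max = max(`Wet body weight [g]`),
    weight_mean = mean(`Wet body weight [g]`),
    weight_median = median(`Wet body weight [g]`)
  )
weight_stats
```

Replacing `bird_sample` with `bird_data` would apply the same call to the full data set.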
```{r}
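#Count missing (NA) values across all cells of bird_data
sum(is.na(bird_data))
#is.null() tests whether the object itself is NULL, not its cells,
#so this is 0 for any data frame that exists
sum(is.null(bird_data))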
```
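A single total hides which column a missing value comes from. As a hedged sketch, the example below counts missing values per column on a hypothetical toy tibble (`toy`) that contains one deliberately missing weight; the real data set has none.

```{r}
library(dplyr)

# Hypothetical toy data with one deliberately missing value
toy <- tibble(
  `Wet body weight [g]` = c(5.46, NA, 8.64),
  `Population size` = c(532194, 3165107, 2592997)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(toy))
complete_rows
```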
```{r}
#Arranging the bird_data in descending order of wet body weight
arrange(bird_data,desc(`Wet body weight [g]`))
```
```{r}
#Frequency table for the wet body weight variable
table(bird_data$`Wet body weight [g]`)
```
```{r}
#Frequency table for the population size variable
table(bird_data$`Population size`)
```
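Because both variables are continuous, `table()` on the raw values returns a count of 1 for nearly every value. One possible alternative, sketched here on illustrative values (rounded from the quartiles of wet body weight), is to confirm that the values are all distinct and then count them in bins. The break points and labels are assumptions chosen for this example.

```{r}
library(dplyr)

# Illustrative weights only, rounded from the summary quartiles
bird_sample <- tibble(
  `Wet body weight [g]` = c(5.46, 18.6, 69.2, 310, 9640)
)

# TRUE when every value occurs exactly once
all_unique <- n_distinct(bird_sample$`Wet body weight [g]`) == nrow(bird_sample)
all_unique

# Group the weights into order-of-magnitude bins (right-closed intervals)
weight_bins <- cut(
  bird_sample$`Wet body weight [g]`,
  breaks = c(1, 10, 100, 1000, 10000),
  labels = c("1-10 g", "10-100 g", "100-1000 g", "1000-10000 g")
)
bin_counts <- table(weight_bins)
bin_counts
```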
The "wild_bird_data.xlsx" data set was likely taken from the first figure of a research paper by Nee et al., as noted in the first row of the Excel sheet (the row skipped during import). It contains 146 observations (rows) and 2 variables (columns): Wet body weight [g] and Population size. Both variables hold only numerical values in double-precision floating-point format. The data set represents the average wet body weight in grams and the population size for different species of wild birds, and it could be used to estimate the biomass of wild birds. There are no missing or null values. The minimum, maximum, mean, and median of Wet body weight are 5.459 g, 9639.845 g, 363.694 g, and 69.232 g, respectively. The minimum, maximum, mean, and median of Population size are 5, 5093378, 382874, and 24353, respectively.
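As a rough sketch of the biomass idea, one could multiply each species' average wet body weight by its population size and sum the results. This formula is an assumption made for illustration, not a method stated in the data source, and the three rows below are rounded sample values rather than the full data set.

```{r}
library(dplyr)

# Illustrative rows only (rounded); the real data set has 146 rows
bird_sample <- tibble(
  `Wet body weight [g]` = c(5.46, 7.76, 8.64),
  `Population size` = c(532194, 3165107, 2592997)
)

# Assumed estimate: biomass per species = average weight x population size
biomass <- bird_sample %>%
  mutate(biomass_kg = `Wet body weight [g]` * `Population size` / 1000)

# Total across the sampled species, in kilograms
total_biomass_kg <- sum(biomass$biomass_kg)
total_biomass_kg
```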