library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Vishnupriya Varadharaju
October 12, 2022
Today’s challenge is to
read in a dataset, and
describe the dataset using both words and any supporting information (e.g., tables, etc)
# A tibble: 6 × 2
body_weight pop_size
<dbl> <dbl>
1 5.46 532194.
2 7.76 3165107.
3 8.64 2592997.
4 10.7 3524193.
5 7.42 389806.
6 9.12 604766.
[1] 146 2
[1] "body_weight" "pop_size"
The data was read from the Excel file and stored in a variable named wild_bird. It has 146 rows and 2 columns. Each observation appears to correspond to one bird species: the first column gives the body weight in grams, and the second gives the population size of that species.
# A tibble: 6 × 2
body_weight pop_size
<dbl> <dbl>
1 5.46 532194.
2 7.42 389806.
3 7.76 3165107.
4 8.04 192361.
5 8.64 2592997.
6 8.70 250452.
[1] FALSE
tibble [146 × 2] (S3: tbl_df/tbl/data.frame)
$ body_weight: num [1:146] 5.46 7.42 7.76 8.04 8.64 ...
$ pop_size : num [1:146] 532194 389806 3165107 192361 2592997 ...
# A tibble: 1 × 14
body_weight_…¹ pop_s…² body_…³ pop_s…⁴ body_…⁵ pop_s…⁶ body_…⁷ pop_s…⁸ body_…⁹
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 364. 382874. 69.2 24353. 5.46 4.92 9640. 5.09e6 984.
# … with 5 more variables: pop_size_sd <dbl>, body_weight_var <dbl>,
# pop_size_var <dbl>, body_weight_IQR <dbl>, pop_size_IQR <dbl>, and
# abbreviated variable names ¹body_weight_mean, ²pop_size_mean,
# ³body_weight_median, ⁴pop_size_median, ⁵body_weight_min, ⁶pop_size_min,
# ⁷body_weight_max, ⁸pop_size_max, ⁹body_weight_sd
The wild birds data consists of the body weight and population size of different species. The dataset was most likely collected by scientists for research purposes, and it could include species from different habitats such as marshlands, the tropics, and deserts. Population size may indicate whether a species is endangered, vulnerable, or threatened. Body weight, in turn, says something about the build of each species and the quantity of food it might need to survive. This all-numerical dataset has no missing values; otherwise, statistics such as the mean would come back as NA. The descriptive statistics (mean, median, minimum, maximum, standard deviation, variance, and interquartile range) are shown above.
---
title: "Challenge 1 Solutions"
author: "Vishnupriya Varadharaju"
description: "Reading in data and creating a post"
date: "10/12/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
### Working with the Wild Birds Dataset
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## 1. Read in the Data
```{r}
library("readxl")
# Read the data set, skipping the first two rows of the sheet (skip = 2)
# and supplying new column names
wild_bird <- read_excel("_data/wild_bird_data.xlsx", skip=2, col_names=c('body_weight','pop_size'))
head(wild_bird)
```
```{r}
# To show the columns and the dimensions of the data
dim(wild_bird)
colnames(wild_bird)
```
The data was read from the Excel file and stored in a variable named `wild_bird`. It has 146 rows and 2 columns. Each observation appears to correspond to one bird species: the first column gives the body weight in grams, and the second gives the population size of that species.
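
Because `col_names` supplies the names directly, every row that is not skipped is treated as data, so `skip` has to cover any note or header rows above the values. Below is a minimal sketch of that interaction. It assumes `readr::read_csv()` as a stand-in, since it takes the same `skip` and `col_names` arguments and accepts inline text. The toy rows are invented and are not the contents of the Excel file.

```r
library(readr)

# Invented toy text: two non-data lines (a note and a header) above the values.
toy_text <- I("note,source text\nweight,population\n5.5,500\n7.8,3000\n")
new_names <- c("body_weight", "pop_size")

# skip = 2 drops both non-data lines, so only numeric rows remain.
toy_ok <- read_csv(toy_text, skip = 2, col_names = new_names,
                   show_col_types = FALSE)

# skip = 1 leaves the old header behind as a data row,
# which forces both columns to be read as character.
toy_bad <- read_csv(toy_text, skip = 1, col_names = new_names,
                    show_col_types = FALSE)

str(toy_ok$body_weight)
str(toy_bad$body_weight)
```

If a numeric column unexpectedly shows up as character after reading, a leftover header row is one likely cause worth checking.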
## 2. Describe the data
```{r}
#| label: summary
# Arrange the data in ascending order of body_weight
wild_bird <- arrange(wild_bird, body_weight)
head(wild_bird)
# Check for missing (NA) values; is.null() would only test whether the
# object itself is NULL
anyNA(wild_bird)
# Check the data type of the two columns
str(wild_bird)
# As both columns are numeric, summarize_all gives high-level
# descriptive statistics of the data
summarize_all(wild_bird, list(mean=mean, median=median, min=min, max=max, sd=sd, var=var, IQR=IQR))
```
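
The `_all` helpers in dplyr are superseded by `across()`, so the same kind of summary can also be written as below. This is a sketch on invented toy values, not the bird data. Note that `across()` groups the output columns by variable (all statistics for one column, then the next).

```r
library(dplyr)
library(tibble)

# Invented toy values standing in for the two numeric columns.
toy <- tibble(
  body_weight = c(5, 10, 30),
  pop_size    = c(100, 400, 1000)
)

# Apply each named function to every column; results are named <column>_<function>.
toy_stats <- summarise(
  toy,
  across(everything(), list(mean = mean, median = median, sd = sd))
)

names(toy_stats)
```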
The wild birds data consists of the body weight and population size of different species. The dataset was most likely collected by scientists for research purposes, and it could include species from different habitats such as marshlands, the tropics, and deserts. Population size may indicate whether a species is endangered, vulnerable, or threatened. Body weight, in turn, says something about the build of each species and the quantity of food it might need to survive.
This all-numerical dataset has no missing values; otherwise, statistics such as the mean would come back as `NA`. The descriptive statistics (mean, median, minimum, maximum, standard deviation, variance, and interquartile range) are shown above.
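
A check for missing values means looking for `NA` entries, which is different from asking whether the object itself is `NULL`. A minimal sketch of the distinction, using invented toy values:

```r
library(tibble)

# Invented toy values with one NA in each column.
toy <- tibble(
  body_weight = c(5.46, NA, 8.64),
  pop_size    = c(532194, 3165107, NA)
)

null_check <- is.null(toy)         # is the object itself NULL?
na_check   <- anyNA(toy)           # does any cell hold NA?
na_counts  <- colSums(is.na(toy))  # number of NA cells per column

null_check
na_check
na_counts
```

Here `is.null()` reports `FALSE` even though the data contains missing cells, so it cannot stand in for an `NA` check.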