library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Paritosh Gandhi
March 28, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
The wild bird data contains 2 columns and 146 observations once the extra first row of the sheet is skipped (147 rows otherwise). The first column holds the wet body weight of the birds in grams, and the second column holds the population size. The dataset is unusual in that the reference column, which holds the wet body weight in grams, is stored as character and has to be converted to numeric with the as.numeric function. The minimum wet body weight is 5.45 g, the mean is 363.74 g, and the maximum is 9639.84 g.
# A tibble: 3 × 2
  `Wet body weight [g]` `Population size`
                  <dbl>             <dbl>
1                 9640.             3417.
2                 4451.             4789.
3                 4224.              433.
Data Frame Summary
df
Dimensions: 146 x 2
Duplicates: 0
No   Variable              Stats / Values                   Freqs (% of Valid)    Valid      Missing
---- --------------------- -------------------------------- --------------------- ---------- ---------
1    Wet body weight [g]   Mean (sd) : 363.7 (983.5)        146 distinct values   146        0
     [numeric]             min < med < max:                                       (100.0%)   (0.0%)
                           5.5 < 69.2 < 9639.8
                           IQR (CV) : 291.2 (2.7)

2    Population size       Mean (sd) : 382874 (951938.7)    146 distinct values   146        0
     [numeric]             min < med < max:                                       (100.0%)   (0.0%)
                           4.9 < 24353.2 < 5093378
                           IQR (CV) : 196693.8 (2.5)
---
title: "Challenge 1 Paritosh"
author: "Paritosh Gandhi"
description: "Challenge_1_Final"
date: "03/28/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐

Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
------------------------------------------------------------------------
```{r}
library(readxl)
library(tidyverse)
library(summarytools)
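# skip = 1 drops the extra first row so the real header row supplies the column names
df <- read_excel("_data/wild_bird_data.xlsx", skip = 1)
## instead of using skip, we could also read the whole sheet and drop its first row with the line below
#df <- df[2:147,]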
```
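The chunk above mentions dropping the first row as an alternative to `skip`. A minimal sketch of that route follows, assuming the sheet was read without `skip` so that the real header ends up in row 1. The data frame `raw`, its column names `col_a` and `col_b`, and all values are hypothetical stand-ins, not the actual file contents.

```r
# Toy stand-in for the sheet read WITHOUT skip: the real header sits in row 1,
# so every column is character. Column names and values are hypothetical.
raw <- data.frame(
  col_a = c("Wet body weight [g]", "10", "20", "60"),
  col_b = c("Population size", "500", "300", "100"),
  stringsAsFactors = FALSE
)

# Promote row 1 to the column names, then drop it from the data.
toy <- raw[-1, ]
names(toy) <- unname(unlist(raw[1, ]))
rownames(toy) <- NULL

# The remaining values are still character, so convert each column.
toy[] <- lapply(toy, as.numeric)

str(toy)
```

Using `skip = 1` at read time avoids both the renaming and the type conversion, which is probably why it is the simpler choice here.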
## Describe the data
The wild bird data contains 2 columns and 146 observations once the extra first row of the sheet is skipped (147 rows otherwise). The first column holds the wet body weight of the birds in grams, and the second column holds the population size. The dataset is unusual in that the reference column, which holds the wet body weight in grams, is stored as character and has to be converted to numeric with the `as.numeric()` function. The minimum wet body weight is 5.45 g, the mean is 363.74 g, and the maximum is 9639.84 g.
## ***Using as.numeric(), min(), max(), mean()***
```{r}
#| label: summary
# as.numeric() guards against the weight column being stored as character
min(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)
mean(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)
max(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)
```
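The three calls above each repeat the conversion. As a hedged alternative, the sketch below converts once with `mutate()` and collects all three statistics in a single `summarise()`. The tibble `toy` and its values are made up for illustration, including one non-numeric entry to show how `na.rm = TRUE` handles it.

```r
library(dplyr)

# Hypothetical character column, including one entry that cannot be parsed.
toy <- tibble(`Wet body weight [g]` = c("10", "20", "not recorded", "60"))

stats <- toy %>%
  # Convert once; unparseable entries become NA (the warning is silenced here).
  mutate(weight = suppressWarnings(as.numeric(`Wet body weight [g]`))) %>%
  summarise(
    min  = min(weight, na.rm = TRUE),
    mean = mean(weight, na.rm = TRUE),
    max  = max(weight, na.rm = TRUE)
  )

stats
```

The result is a one-row tibble, which is convenient to print as a table in the rendered post.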
## ***Using select()***
```{r}
df %>%
  select(`Wet body weight [g]`) %>%
  n_distinct()
```
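A distinct count is easier to interpret next to the row count, since the two differ only when values repeat. The sketch below shows one way to get both, assuming a toy tibble with a hypothetical `weight` column containing a duplicate.

```r
library(dplyr)

# Hypothetical weights with one repeated value.
toy <- tibble(weight = c(10, 20, 20, 60))

counts <- toy %>%
  summarise(
    n_rows   = n(),                 # total number of rows
    n_unique = n_distinct(weight)   # number of distinct weights
  )

counts
```

When `n_rows` and `n_unique` match, every value in the column is unique.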
## ***Using filter()***
```{r}
df %>% filter(`Wet body weight [g]` > 3000)
```
## ***Using dfSummary()***
```{r}
dfSummary(df)
```
------------------------------------------------------------------------