library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
Lai Wei
August 20, 2022
Homework 2: Reading in Data

- Read in a dataset from the _data folder in the course blog repository, or choose your own data. If you decide to use one of the datasets we have provided, please use a challenging dataset - check with us if you are not sure.
- Clean the data as needed using dplyr and related tidyverse packages.
- Provide a narrative about the data set (look it up if you aren't sure what you have got) and the variables in your dataset, including what type of data each variable is. The goal of this step is to communicate in a visually appealing way to non-experts - not to replicate R code.
- Identify potential research questions that your dataset can help answer.
# A tibble: 146 × 2
Body_Weight Population_Size
<dbl> <dbl>
1 5.46 532194.
2 7.76 3165107.
3 8.64 2592997.
4 10.7 3524193.
5 7.42 389806.
6 9.12 604766.
7 8.04 192361.
8 8.70 250452.
9 8.89 16997.
10 9.52 595.
# … with 136 more rows
# ℹ Use `print(n = ...)` to see more rows
[1] "Body_Weight" "Population_Size"
[1] 146 2
# A tibble: 10 × 2
Body_Weight Population_Size
<dbl> <dbl>
1 4451. 4789.
2 4224. 433.
3 2320. 151.
4 1064. 107.
5 1138. 53.9
6 1003. 22.4
7 1042. 1759.
8 1106. 3975.
9 1368. 9797.
10 2054. 20661.
Using the filter() function to keep only the rows where Body_Weight is greater than 25.
# A tibble: 94 × 2
Body_Weight Population_Size
<dbl> <dbl>
1 27.9 4262042.
2 33.6 2055446.
3 27.2 1546053.
4 28.7 815305.
5 25.3 98289.
6 32.2 209.
7 37.0 70377.
8 71.4 1533111.
9 79.4 4131320.
10 95.7 4812997.
# … with 84 more rows
# ℹ Use `print(n = ...)` to see more rows
Sorting by Body_Weight from highest to lowest, then showing the 10 rows with the lowest values.
# A tibble: 10 × 2
Body_Weight Population_Size
<dbl> <dbl>
1 10.1 74386.
2 9.52 595.
3 9.12 604766.
4 8.89 16997.
5 8.70 250452.
6 8.64 2592997.
7 8.04 192361.
8 7.76 3165107.
9 7.42 389806.
10 5.46 532194.
---
title: "hw 2"
author: "Lai Wei"
description: "Homework 2"
date: "08/20/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw2
  - Lai Wei
  - dataset
  - ggplot2
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
```
## Instructions
Homework 2: Reading in Data

- Read in a dataset from the _data folder in the course blog repository, or choose your own data. If you decide to use one of the datasets we have provided, please use a challenging dataset - check with us if you are not sure.
- Clean the data as needed using dplyr and related tidyverse packages.
- Provide a narrative about the data set (look it up if you aren't sure what you have got) and the variables in your dataset, including what type of data each variable is. The goal of this step is to communicate in a visually appealing way to non-experts - not to replicate R code.
- Identify potential research questions that your dataset can help answer.

```{r}
# Import the wild bird data from the Excel file
library(readxl)
# Skip the first two rows of the sheet and supply our own column names
wild_bird_data <- read_excel("_data/wild_bird_data.xlsx",
                             skip = 2,
                             col_names = c("Body_Weight", "Population_Size"))
wild_bird_data
```
## Describe the Data
```{r}
# Show the column names of wild_bird_data
colnames(wild_bird_data)
# Get the dimensions (rows, columns) of wild_bird_data
dim(wild_bird_data)
# Get the last 10 rows
tail(wild_bird_data, 10)
```
## Select the Data
Using the `filter()` function to keep only the rows where `Body_Weight` is greater than 25.
```{r}
# Keep the rows where Body_Weight is greater than 25
filter(wild_bird_data, Body_Weight > 25)
```
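
The same filter can also be written with the pipe, which makes it easy to chain further steps. The chunk below is only a minimal sketch, not part of the original analysis: `bird_sample` and `heavy_birds` are hypothetical names, and `bird_sample` is a ten-row tibble typed in from rows printed in this post's outputs (values rounded as displayed), so the chunk runs without the Excel file.

```{r}
library(dplyr)

# Hypothetical sample: ten rows copied from the printed outputs (rounded as displayed)
bird_sample <- tibble(
  Body_Weight     = c(5.46, 7.76, 8.64, 10.7, 7.42, 27.9, 33.6, 27.2, 28.7, 25.3),
  Population_Size = c(532194, 3165107, 2592997, 3524193, 389806,
                      4262042, 2055446, 1546053, 815305, 98289)
)

# Keep the rows with Body_Weight above 25, then order them by Population_Size
heavy_birds <- bird_sample %>%
  filter(Body_Weight > 25) %>%
  arrange(desc(Population_Size))

heavy_birds
```

On the full data, replacing `bird_sample` with `wild_bird_data` should give the same 94 rows as the call above, only reordered.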
## Arrange the Data
Sorting by `Body_Weight` from highest to lowest, then showing the 10 rows with the lowest values.
```{r}
# Sort the rows by Body_Weight from highest to lowest
Table_1 <- arrange(wild_bird_data, desc(Body_Weight))
# Get the last 10 rows, i.e. the 10 lowest Body_Weight values
tail(Table_1, 10)
```
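
As an alternative to sorting and then taking the tail, dplyr's `slice_min()` and `slice_max()` pick the lowest or highest rows in one step. This is a sketch under assumptions rather than the approach used above: `bird_sample`, `lightest`, and `heaviest` are hypothetical names, and the sample is the first ten rows as printed earlier in this post (rounded as displayed).

```{r}
library(dplyr)

# Hypothetical sample: the first ten rows as printed in the import output
bird_sample <- tibble(
  Body_Weight     = c(5.46, 7.76, 8.64, 10.7, 7.42, 9.12, 8.04, 8.70, 8.89, 9.52),
  Population_Size = c(532194, 3165107, 2592997, 3524193, 389806,
                      604766, 192361, 250452, 16997, 595)
)

# Three rows with the lowest Body_Weight, lightest first
lightest <- slice_min(bird_sample, Body_Weight, n = 3)

# Three rows with the highest Body_Weight, heaviest first
heaviest <- slice_max(bird_sample, Body_Weight, n = 3)

lightest
heaviest
```

Note that `slice_min()` returns the rows in ascending order, whereas `tail()` on a descending sort lists the same rows with the smallest value last.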
## Summarize the Data
```{r}
# Get the average and median of the Body_Weight variable
wild_bird_data %>%
  summarise(avg_weight = mean(Body_Weight, na.rm = TRUE),
            med_weight = median(Body_Weight, na.rm = TRUE))
```
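
The same idea extends to both variables at once with `across()`. The chunk below is a hedged sketch, not the original analysis: `bird_sample` and `bird_summary` are hypothetical names, and the sample is again the first ten rows as printed in this post (rounded), so the numbers describe the sample only, not the full dataset.

```{r}
library(dplyr)

# Hypothetical sample: the first ten rows as printed in the import output
bird_sample <- tibble(
  Body_Weight     = c(5.46, 7.76, 8.64, 10.7, 7.42, 9.12, 8.04, 8.70, 8.89, 9.52),
  Population_Size = c(532194, 3165107, 2592997, 3524193, 389806,
                      604766, 192361, 250452, 16997, 595)
)

# Minimum, median, and maximum of each column; result columns are named
# <column>_<statistic>, e.g. Body_Weight_median
bird_summary <- bird_sample %>%
  summarise(across(
    c(Body_Weight, Population_Size),
    list(
      min    = ~ min(.x, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE),
      max    = ~ max(.x, na.rm = TRUE)
    )
  ))

bird_summary
```

Reporting the minimum and maximum alongside the median gives a quick sense of how widely each variable ranges.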