hw2
Lai Wei
dataset
ggplot2
Author

Lai Wei

Published

August 20, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Instructions

Homework 2: Reading in Data

- Read in a dataset from the _data folder in the course blog repository, or choose your own data. If you decide to use one of the datasets we have provided, please use a challenging dataset; check with us if you are not sure.
- Clean the data as needed using dplyr and related tidyverse packages.
- Provide a narrative about the dataset (look it up if you aren't sure what you have got) and the variables in your dataset, including what type of data each variable is. The goal of this step is to communicate in a visually appealing way to non-experts, not to replicate R code.
- Identify potential research questions that your dataset can help answer.

Code
# Import the wild bird data from the _data folder: skip the first two rows of the sheet and supply our own column names
library(readxl)
wild_bird_data <- read_excel("_data/wild_bird_data.xlsx",
                             skip = 2,
                             col_names = c("Body_Weight", "Population_Size"))
wild_bird_data
# A tibble: 146 × 2
   Body_Weight Population_Size
         <dbl>           <dbl>
 1        5.46         532194.
 2        7.76        3165107.
 3        8.64        2592997.
 4       10.7         3524193.
 5        7.42         389806.
 6        9.12         604766.
 7        8.04         192361.
 8        8.70         250452.
 9        8.89          16997.
10        9.52            595.
# … with 136 more rows
# ℹ Use `print(n = ...)` to see more rows
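Because `skip = 2` and `col_names` replace the sheet's own header, it is worth checking that both columns were read in as numbers and contain no missing values. The sketch below is a minimal check that is not part of the original analysis. `birds` is a hypothetical stand-in tibble built from the first five rows printed above (values as displayed, so rounded), which lets it run without the Excel file. On the real data, the same two checks can be run on `wild_bird_data`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for wild_bird_data: first five rows printed above (rounded as displayed)
birds <- tibble(
  Body_Weight     = c(5.46, 7.76, 8.64, 10.7, 7.42),
  Population_Size = c(532194, 3165107, 2592997, 3524193, 389806)
)

# Both columns should be numeric after read_excel() with custom col_names
col_types <- sapply(birds, class)
print(col_types)

# Count missing values per column; an NA here could point to a non-numeric cell in the sheet
na_counts <- colSums(is.na(birds))
print(na_counts)
```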

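Describe the Data

The dataset has 146 rows and 2 columns. Both variables, Body_Weight (the birds' body weight) and Population_Size (the population size), are stored as numeric doubles (<dbl>).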

Code
# Show the column names of wild_bird_data
colnames(wild_bird_data)
[1] "Body_Weight"     "Population_Size"
Code
# Get the dimensions (rows, columns) of wild_bird_data
dim(wild_bird_data)
[1] 146   2
Code
#Get the last 10 rows
tail(wild_bird_data, 10)
# A tibble: 10 × 2
   Body_Weight Population_Size
         <dbl>           <dbl>
 1       4451.          4789. 
 2       4224.           433. 
 3       2320.           151. 
 4       1064.           107. 
 5       1138.            53.9
 6       1003.            22.4
 7       1042.          1759. 
 8       1106.          3975. 
 9       1368.          9797. 
10       2054.         20661. 
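colnames(), dim() and tail() each show one aspect of the data. dplyr's glimpse() reports the number of rows and columns together with each column's type, which covers the "what type of data each variable is" part of the assignment in one call, and summary() adds the minimum, quartiles, mean and maximum. A minimal sketch, assuming a hypothetical stand-in `birds` built from the last five rows printed above; on the real data, pass `wild_bird_data` instead.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in: the last five rows printed above (rounded as displayed)
birds <- tibble(
  Body_Weight     = c(1003, 1042, 1106, 1368, 2054),
  Population_Size = c(22.4, 1759, 3975, 9797, 20661)
)

# One line per column: name, type (<dbl>) and the first few values
glimpse(birds)

# Min, quartiles, mean and max for every column
summary(birds)
```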

Select the Data

Use the filter() function to keep only the cases (rows) that meet a condition, here a body weight greater than 25.

Code
# Keep the rows where Body_Weight is greater than 25
filter(wild_bird_data, Body_Weight > 25)
# A tibble: 94 × 2
   Body_Weight Population_Size
         <dbl>           <dbl>
 1        27.9        4262042.
 2        33.6        2055446.
 3        27.2        1546053.
 4        28.7         815305.
 5        25.3          98289.
 6        32.2            209.
 7        37.0          70377.
 8        71.4        1533111.
 9        79.4        4131320.
10        95.7        4812997.
# … with 84 more rows
# ℹ Use `print(n = ...)` to see more rows
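filter() can also take several conditions at once; conditions separated by commas must all be true (a logical AND). The sketch below assumes a hypothetical stand-in `birds` built from eight rows printed above, and the population threshold of one million is an arbitrary example, not part of the original analysis. nrow() then gives the number of matching cases.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in: eight rows printed above (rounded as displayed)
birds <- tibble(
  Body_Weight     = c(5.46, 7.76, 8.64, 27.9, 33.6, 27.2, 28.7, 25.3),
  Population_Size = c(532194, 3165107, 2592997, 4262042, 2055446, 1546053, 815305, 98289)
)

# Same condition as above: body weight greater than 25
heavy <- birds %>% filter(Body_Weight > 25)

# Two conditions combined with AND: heavier than 25 and population above one million
heavy_large <- birds %>% filter(Body_Weight > 25, Population_Size > 1e6)

# Number of matching cases for each filter
nrow(heavy)
nrow(heavy_large)
```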

Arrange the Data

Sort the rows by Body_Weight in descending order (highest to lowest), then look at the last 10 rows, which hold the 10 lowest body weights.

Code
# Sort the rows by Body_Weight from highest to lowest
Table_1 <- arrange(wild_bird_data, desc(Body_Weight))
# Show the last 10 rows, i.e. the 10 lowest body weights
tail(Table_1, 10)
# A tibble: 10 × 2
   Body_Weight Population_Size
         <dbl>           <dbl>
 1       10.1           74386.
 2        9.52            595.
 3        9.12         604766.
 4        8.89          16997.
 5        8.70         250452.
 6        8.64        2592997.
 7        8.04         192361.
 8        7.76        3165107.
 9        7.42         389806.
10        5.46         532194.
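arrange() followed by tail() works, but dplyr (version 1.0.0 or later) also provides slice_min(), which returns the rows with the smallest values directly. A minimal sketch on a hypothetical stand-in `birds` made of six rows printed in this post. Note that slice_min() returns the rows in ascending order, whereas tail() of the descending table lists them from highest to lowest; both pick the same rows.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in: six rows printed in this post (rounded as displayed)
birds <- tibble(
  Body_Weight     = c(10.1, 9.52, 9.12, 27.9, 5.46, 33.6),
  Population_Size = c(74386, 595, 604766, 4262042, 532194, 2055446)
)

# Approach used above: sort from highest to lowest, then take the last 3 rows
via_arrange <- birds %>% arrange(desc(Body_Weight)) %>% tail(3)

# Alternative: the 3 lowest body weights, returned in ascending order
lightest <- birds %>% slice_min(Body_Weight, n = 3)

print(via_arrange)
print(lightest)
```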

Summarize the Data

Code
# Get the average and median of the Body_Weight variable
wild_bird_data %>%
  summarise(avg_weight = mean(Body_Weight, na.rm = TRUE),
            med_weight = median(Body_Weight, na.rm = TRUE))
# A tibble: 1 × 2
  avg_weight med_weight
       <dbl>      <dbl>
1       364.       69.2
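The mean (364) is more than five times the median (69.2). That gap suggests Body_Weight is strongly right-skewed: a small number of very heavy birds pull the mean up, so the median is likely the more representative "typical" value. Adding the count, standard deviation and quartiles to the same summarise() call makes the spread visible. The sketch below runs on a hypothetical stand-in `birds` built from seven body weights printed in this post; on the real data, `birds` would be replaced by `wild_bird_data`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in: seven body weights printed in this post (rounded as displayed)
birds <- tibble(
  Body_Weight = c(5.46, 8.64, 27.9, 71.4, 95.7, 1064, 4451)
)

# Centre and spread of Body_Weight in one summary row
weight_stats <- birds %>%
  summarise(
    n          = n(),
    avg_weight = mean(Body_Weight, na.rm = TRUE),
    med_weight = median(Body_Weight, na.rm = TRUE),
    sd_weight  = sd(Body_Weight, na.rm = TRUE),
    q25        = quantile(Body_Weight, 0.25, na.rm = TRUE),
    q75        = quantile(Body_Weight, 0.75, na.rm = TRUE)
  )

print(weight_stats)
```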