DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 1

challenge_1
railroads
faostat
wildbirds
Author

Saisrinivas Ambatipudi

Published

October 13, 2022

Code
library(tidyverse)
library('readxl')
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

Code
wildbird <- read_excel("_data/wild_bird_data.xlsx")
wildbird
# A tibble: 147 × 2
   Reference           `Taken from Figure 1 of Nee et al.`
   <chr>               <chr>                              
 1 Wet body weight [g] Population size                    
 2 5.45887180052624    532194.395145161                   
 3 7.76456810683605    3165107.44544653                   
 4 8.63858738018464    2592996.86778979                   
 5 10.6897349302105    3524193.2266336                    
 6 7.41722577905587    389806.168891807                   
 7 9.1169347252776     604765.97978904                    
 8 8.03684333000353    192360.511579436                   
 9 8.70473119796067    250452.449623033                   
10 8.89032317828959    16997.4156415239                   
# … with 137 more rows

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
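
The printed tibble shows why both columns arrive as <chr>: the sheet's first row is a citation note, which read_excel uses as column names, and the second row holds the real labels, which end up as data. One way to handle this, sketched here under the assumption that the sheet has exactly two header rows, is to skip both and supply column names by hand. The function name read_wildbird and the column names wet_body_weight_g and population_size are made up for illustration.

```r
library(readxl)

# Hypothetical helper: read the sheet while skipping its two header rows.
# Assumes row 1 is the citation note and row 2 holds the column labels.
read_wildbird <- function(path) {
  read_excel(
    path,
    skip = 2,  # drop the citation row and the label row
    col_names = c("wet_body_weight_g", "population_size")  # illustrative names
  )
}

# Usage, with the same path as in the chunk above:
# wildbird_clean <- read_wildbird("_data/wild_bird_data.xlsx")
```

If the assumption holds, both columns should come back as numeric, so no as.numeric conversion is needed afterwards.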

Describe the data

The data were extracted from Figure 1 of a paper by Nee et al. The tibble has 147 rows and 2 columns, but its first row holds the column labels ("Wet body weight [g]" and "Population size"), which leaves 146 observations. The first column is the wet body weight in grams and the second column is the population size of the birds in the wild. Both columns hold floating point values, although read_excel imports them as character because of the label row. The data set has no missing or null values; the two NA's reported by the summary function appear because as.numeric cannot convert the two text labels in the first row. With both columns pooled by unlist, the minimum value is 5 and the maximum value is 5093378. The mean and median are 191619 and 491, respectively.

Code
# as.numeric() turns the two text labels in the first row into NA;
# suppressWarnings() hides the resulting coercion warning.
x <- suppressWarnings(as.numeric(unlist(wildbird)))
summary(x)
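
Because unlist pools both columns into a single vector, the summary mixes body weights with population sizes. A minimal sketch of summarizing each column separately follows. To stay self-contained, it uses a small stand-in built from the label row and the first three data rows of the printed tibble above. The names wildbird_sample, birds, wet_body_weight_g and population_size are illustrative and not part of the original post.

```r
# Stand-in for `wildbird`: the label row plus the first three data rows
# copied from the printed tibble (the real object has 147 rows).
wildbird_sample <- data.frame(
  Reference = c("Wet body weight [g]", "5.45887180052624",
                "7.76456810683605", "8.63858738018464"),
  `Taken from Figure 1 of Nee et al.` = c("Population size", "532194.395145161",
                                          "3165107.44544653", "2592996.86778979"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop the label row (first row), then convert each column on its own.
birds <- data.frame(
  wet_body_weight_g = as.numeric(wildbird_sample[[1]][-1]),
  population_size   = as.numeric(wildbird_sample[[2]][-1])
)

summary(birds)          # one summary per column
colSums(is.na(birds))   # count of missing values per column
```

With the full data, replacing wildbird_sample with wildbird should give per-column summaries for all 146 observations.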
Source Code
---
title: "Challenge 1"
author: "Saisrinivas Ambatipudi"
description: "Reading in data and creating a post"
date: "10/13/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)
library('readxl')
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to

1)  read in a dataset, and

2)  describe the dataset using both words and any supporting information (e.g., tables)

## Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

-   railroad_2012_clean_county.csv ⭐
-   birds.csv ⭐⭐
-   FAOstat\*.csv ⭐⭐
-   wild_bird_data.xlsx ⭐⭐⭐
-   StateCounty2012.xls ⭐⭐⭐⭐

Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.

```{r}
wildbird <- read_excel("_data/wild_bird_data.xlsx")
wildbird
```

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.

## Describe the data

The data were extracted from Figure 1 of a paper by Nee et al.
The tibble has 147 rows and 2 columns, but its first row holds the column labels ("Wet body weight [g]" and "Population size"), which leaves 146 observations.
The first column is the wet body weight in grams and the second column is the population size of the birds in the wild.
Both columns hold floating point values, although `read_excel` imports them as character because of the label row.
The data set has no missing or null values; the two NA's reported by the summary function appear because
`as.numeric` cannot convert the two text labels in the first row.
With both columns pooled by `unlist`, the minimum value is 5 and the maximum value is 5093378.
The mean and median are 191619 and 491, respectively.

```{r}
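# as.numeric() turns the two text labels in the first row into NA;
# suppressWarnings() hides the resulting coercion warning.
x <- suppressWarnings(as.numeric(unlist(wildbird)))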
summary(x)
```