DACSS 601: Data Science Fundamentals - FALL 2022
  • Fall 2022 Posts
  • Contributors
  • DACSS

Challenge 1

  • Course information
    • Overview
    • Instructional Team
    • Course Schedule
  • Weekly materials
    • Fall 2022 posts
    • final posts

On this page

  • Challenge Overview
  • Read in the Data
  • Describe the data

Challenge 1

  • Show All Code
  • Hide All Code

  • View Source
challenge_1
railroads
faostat
wildbirds
Author

Siddharth Nammara Kalyana Raman

Published

November 27, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables, etc)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

Code
library(readxl)
wild_birdData<-read_excel("_data/wild_bird_data.xlsx",skip=1)
wild_birdData
# A tibble: 146 × 2
   `Wet body weight [g]` `Population size`
                   <dbl>             <dbl>
 1                  5.46           532194.
 2                  7.76          3165107.
 3                  8.64          2592997.
 4                 10.7           3524193.
 5                  7.42           389806.
 6                  9.12           604766.
 7                  8.04           192361.
 8                  8.70           250452.
 9                  8.89            16997.
10                  9.52              595.
# … with 136 more rows

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

##Dimensions and Data Extraction

Code
dim(wild_birdData)
[1] 146   2

The given data is in the form of excel and is read using the readxl library. The data is stored in wild_birdData. The first row of the actual dataset shows reference and from where the data is being extracted which is not relevant for our study. Hence, the first row is skipped. The dataset consists of two columns ‘bodyweight’ and ‘population_size’. The dimensions of the dataset is [146,2].

#null check

Code
is.null(wild_birdData)
[1] FALSE
Code
summary(wild_birdData)
 Wet body weight [g] Population size  
 Min.   :   5.459    Min.   :      5  
 1st Qu.:  18.620    1st Qu.:   1821  
 Median :  69.232    Median :  24353  
 Mean   : 363.694    Mean   : 382874  
 3rd Qu.: 309.826    3rd Qu.: 198515  
 Max.   :9639.845    Max.   :5093378  

The wild birds dataset consists of two columns or attributes ‘Wet body Weight’ and ‘Population Size’. We have checked that there are no null values in the dataset. If we have null values in the data set we need to preprocess it before going to further evaluations. Moreover, the birds with less body weight are low in number and the birds that are heavier are greater in number. From this we can infer that the population of the birds are directly proportional to their body weight. All the other statistical information about the dataset is shown above.

Source Code
---
title: "Challenge 1"
author: "Siddharth Nammara Kalyana Raman"
desription: "Reading in data and creating a post"
date: "11/27/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to

1)  read in a dataset, and

2)  describe the dataset using both words and any supporting information (e.g., tables, etc)

## Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

-   railroad_2012_clean_county.csv ⭐
-   birds.csv ⭐⭐
-   FAOstat\*.csv ⭐⭐
-   wild_bird_data.xlsx ⭐⭐⭐
-   StateCounty2012.xls ⭐⭐⭐⭐

Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.

```{r}
library(readxl)
wild_birdData<-read_excel("_data/wild_bird_data.xlsx",skip=1)
wild_birdData
```

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.




## Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

##Dimensions and Data Extraction

```{r}
dim(wild_birdData)
```

The given data is in the form of excel and is read using the readxl library. The data is stored in wild_birdData. The first row of the actual dataset shows reference and from where the data is being extracted which is not relevant for our study. Hence, the first row is skipped. The dataset consists of two columns 'bodyweight' and 'population_size'. The dimensions of the dataset is [146,2].

#null check

```{r}
is.null(wild_birdData)
```

```{r}
#| label: summary
summary(wild_birdData)

```

The wild birds dataset consists of two columns or attributes 'Wet body Weight' and 'Population Size'. We have checked that there are no null values in the dataset. If we have null values in the data set we need to preprocess it before going to further evaluations. Moreover, the birds with less body weight are low in number and the birds that are heavier are greater in number. From this we can infer that the population of the birds are directly proportional to their body weight. All the other statistical information about the dataset is shown above.