DACSS 601: Data Science Fundamentals - FALL 2022
Challenge 1

challenge_1
railroads
faostat
wildbirds
Author

Shriya Sehgal

Published

November 1, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Code
library(readxl)
# skip = 1 drops the descriptive first row, so the next row supplies the column names
dataset <- read_excel("_data/wild_bird_data.xlsx", skip = 1)
dataset
# A tibble: 146 × 2
   `Wet body weight [g]` `Population size`
                   <dbl>             <dbl>
 1                  5.46           532194.
 2                  7.76          3165107.
 3                  8.64          2592997.
 4                 10.7           3524193.
 5                  7.42           389806.
 6                  9.12           604766.
 7                  8.04           192361.
 8                  8.70           250452.
 9                  8.89            16997.
10                  9.52              595.
# … with 136 more rows

Display the dimensions of the data

Code
dim(dataset)
[1] 146   2

The data are read from an Excel file and stored in the variable dataset. The first row of the sheet holds a description of the dataset rather than data, so it is skipped with skip=1. The result has 146 rows and 2 columns, ‘Wet body weight [g]’ and ‘Population size’, which is what dim(dataset) reports as [146, 2].
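Because the column names contain spaces and brackets, they have to be wrapped in backticks whenever they are used in code. A minimal sketch of renaming them is shown here; it uses the first three rows of the printed tibble (values as displayed, so rounded) as stand-in data, and the short names body_weight and population_size are an illustrative choice, not names present in the file.

```r
# Stand-in data: the first three rows of the printed tibble (rounded as displayed).
birds <- data.frame(
  `Wet body weight [g]` = c(5.46, 7.76, 8.64),
  `Population size` = c(532194, 3165107, 2592997),
  check.names = FALSE  # keep the original names, including spaces and brackets
)

# Hypothetical snake_case names, chosen only for illustration.
names(birds) <- c("body_weight", "population_size")

names(birds)
```

The same assignment to names() should work on the full dataset object, since a tibble is also a data frame.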

Describe the data

Code
# Check for missing (NA) values; is.null() would only test whether the object itself is NULL.
any(is.na(dataset))
[1] FALSE
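To illustrate the difference between testing whether an object is NULL and testing whether it contains missing values, here is a small sketch on a hypothetical toy data frame (the toy object and its columns are made up for this example and are not part of the bird data):

```r
# Hypothetical toy data frame with one missing value in the first column.
toy <- data.frame(weight = c(5.5, NA, 8.6), population = c(100, 200, 300))

object_is_null <- is.null(toy)         # is the object itself NULL?
has_missing    <- any(is.na(toy))      # does any cell hold NA?
missing_by_col <- colSums(is.na(toy))  # count of NA values per column

object_is_null
has_missing
missing_by_col
```

is.null() returns FALSE for any existing data frame, however many NA values it holds, so a per-cell check such as any(is.na()) or colSums(is.na()) is the one that answers the missing-data question.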
Code
#Summary of the dataset
summary(dataset)
 Wet body weight [g] Population size  
 Min.   :   5.459    Min.   :      5  
 1st Qu.:  18.620    1st Qu.:   1821  
 Median :  69.232    Median :  24353  
 Mean   : 363.694    Mean   : 382874  
 3rd Qu.: 309.826    3rd Qu.: 198515  
 Max.   :9639.845    Max.   :5093378  

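The dataset consists of 2 numeric columns, ‘Wet body weight [g]’ and ‘Population size’. Before any analysis we need to make sure that no values are missing; the summary output lists no NA counts for either column, so none are. summary(dataset) reports the minimum, quartiles, median, mean and maximum of each column. In both columns the mean lies far above the median (363.7 g versus 69.2 g for body weight, and 382,874 versus 24,353 for population size), so both distributions are strongly right-skewed, with a few very heavy and a few very abundant entries. Because summary() describes each column separately, it cannot show whether heavier birds are more or less numerous, and the data contain no information about survival rates. Answering that question requires looking at the two columns together, for example with a scatterplot or a correlation.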
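One way to examine the two columns together is a correlation on a log scale, which suits heavily skewed data. The sketch below is only a demonstration of the mechanics: it uses the ten rows shown in the printed tibble (values as displayed) as stand-in data, with hypothetical column names body_weight and population_size, so its result says nothing about the full 146-row dataset.

```r
# Stand-in data: the ten rows shown in the printed tibble (rounded as displayed).
birds <- data.frame(
  body_weight = c(5.46, 7.76, 8.64, 10.7, 7.42, 9.12, 8.04, 8.70, 8.89, 9.52),
  population_size = c(532194, 3165107, 2592997, 3524193, 389806,
                      604766, 192361, 250452, 16997, 595)
)

# Both variables are right-skewed, so compare them on a log10 scale.
log_cor <- cor(log10(birds$body_weight), log10(birds$population_size))
log_cor

# A scatterplot of the same log-transformed values shows the pattern visually.
plot(log10(birds$body_weight), log10(birds$population_size),
     xlab = "log10 body weight (g)", ylab = "log10 population size")
```

To run this on the real data, the two columns of dataset would be passed in place of the stand-in vectors. A positive value would indicate that heavier birds tend to be more numerous and a negative value the opposite; neither would by itself say anything about survival.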
