DACSS 601: Data Science Fundamentals - FALL 2022
  • Fall 2022 Posts
  • Contributors
  • DACSS

Challenge 1 Instructions

  • Course information
    • Overview
    • Instructional Team
    • Course Schedule
  • Weekly materials
    • Fall 2022 posts
    • final posts

On this page

  • Challenge Overview
  • Read in the Data
  • Display the dimensions of the data
  • Describe the data

Challenge 1 Instructions

  • Show All Code
  • Hide All Code

  • View Source
challenge_1
railroads
faostat
wildbirds
Author

Neeharika Karanam

Published

September 24, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables, etc)

Read in the Data

Code
library(readxl)
wild_birds_dataset<-read_excel("_data/wild_bird_data.xlsx", skip=1)
wild_birds_dataset
# A tibble: 146 × 2
   `Wet body weight [g]` `Population size`
                   <dbl>             <dbl>
 1                  5.46           532194.
 2                  7.76          3165107.
 3                  8.64          2592997.
 4                 10.7           3524193.
 5                  7.42           389806.
 6                  9.12           604766.
 7                  8.04           192361.
 8                  8.70           250452.
 9                  8.89            16997.
10                  9.52              595.
# … with 136 more rows

Display the dimensions of the data

Code
dim(wild_birds_dataset)
[1] 146   2

The data present in the form of an excel has been read and stored into the variable(wild_birds_dataset). The first line of the dataset consists of the description of where it has been taken from which can be ignored for our analysis. Therefore, I have skipped the first row of the dataset and then read the excel and displayed the details. The dataset consists of 2 columns the “body weight” and the “population_size”. The dimensions of the dataset is displayed i.e [146, 2].

Describe the data

Code
#Checking if there are any null values
is.null(wild_birds_dataset)
[1] FALSE
Code
#Summary of the dataset
summary(wild_birds_dataset)
 Wet body weight [g] Population size  
 Min.   :   5.459    Min.   :      5  
 1st Qu.:  18.620    1st Qu.:   1821  
 Median :  69.232    Median :  24353  
 Mean   : 363.694    Mean   : 382874  
 3rd Qu.: 309.826    3rd Qu.: 198515  
 Max.   :9639.845    Max.   :5093378  

The wild birds dataset consists of only 2 columns body_weight and population_size which are both numerical values. Before we actually perform any analysis on the data and try to understand the distribution of the values in the dataset we need to make sure that there are no NULL values present in the data as it might effect our data adversely. We Could clearly see that there are no NULL values in the dataset we can summarize the data which will provide us the statistical data of the dataset. The birds which have a very low body weight are very less in number whereas the birds which are heavier and have high body weight are high in number. This clearly says that the surival rate or the population size is directly proportional to the body weigt of the bird. The mean, median, min, max and other statistical information is displayed above.

Source Code
---
title: "Challenge 1 Instructions"
author: "Neeharika Karanam"
desription: "Reading in data and creating a post"
date: "09/24/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to

1)  read in a dataset, and

2)  describe the dataset using both words and any supporting information (e.g., tables, etc)

## Read in the Data

```{r}
library(readxl)
wild_birds_dataset<-read_excel("_data/wild_bird_data.xlsx", skip=1)
wild_birds_dataset
```
## Display the dimensions of the data

```{r}
dim(wild_birds_dataset)
```

The data present in the form of an excel has been read and stored into the variable(wild_birds_dataset). The first line of the dataset consists of the description of where it has been taken from which can be ignored for our analysis. Therefore, I have skipped the first row of the dataset and then read the excel and displayed the details. The dataset consists of 2 columns the "body weight" and the "population_size". The dimensions of the dataset is displayed i.e [146, 2].

## Describe the data

```{r}
#Checking if there are any null values
is.null(wild_birds_dataset)
```

```{r}
#Summary of the dataset
summary(wild_birds_dataset)
```

The wild birds dataset consists of only 2 columns body_weight and population_size which are both numerical values. Before we actually perform any analysis on the data and try to understand the distribution of the values in the dataset we need to make sure that there are no NULL values present in the data as it might effect our data adversely. We Could clearly see that there are no NULL values in the dataset we can summarize the data which will provide us  the statistical data of the dataset. The birds which have a very low body weight are very less in number whereas the birds which are heavier and have high body weight are high in number. This clearly says that the surival rate or the population size is directly proportional to the body weigt of the bird. The mean, median, min, max and other statistical information is displayed above.