library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Shantanu Patil
February 23, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc.)
I loaded the readr library with the command library(readr).
The birds.csv file has 30977 rows and 14 columns. I used the head function to display the column headers and the first 3 rows.
Domain.Code Domain Area.Code Area Element.Code Element Item.Code
1 QA Live Animals 2 Afghanistan 5112 Stocks 1057
2 QA Live Animals 2 Afghanistan 5112 Stocks 1057
3 QA Live Animals 2 Afghanistan 5112 Stocks 1057
Item Year.Code Year Unit Value Flag Flag.Description
1 Chickens 1961 1961 1000 Head 4700 F FAO estimate
2 Chickens 1962 1962 1000 Head 4900 F FAO estimate
3 Chickens 1963 1963 1000 Head 5000 F FAO estimate
# A tibble: 1 × 14
QA Live Animal…¹ `2` Afgha…² `5112` Stocks `1057` Chick…³ 1961.…⁴ 1961.…⁵
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QA Live Animals 2 Afghan… 5112 Stocks 1057 Chicke… 1962 1962
# … with 4 more variables: `1000 Head` <chr>, `4700` <dbl>, F <chr>,
# `FAO estimate` <chr>, and abbreviated variable names ¹`Live Animals`,
# ²Afghanistan, ³Chickens, ⁴`1961...9`, ⁵`1961...10`
We can see that the bird data has 14 columns, of which 8 are of character type and the remaining 6 are of integer type. To list the column names we can use the colnames function. The data contains information about Domain, Area, Element, Item, Year, Unit, Value, Flag, and Flag.Description.
'data.frame': 30977 obs. of 14 variables:
$ Domain.Code : chr "QA" "QA" "QA" "QA" ...
$ Domain : chr "Live Animals" "Live Animals" "Live Animals" "Live Animals" ...
$ Area.Code : int 2 2 2 2 2 2 2 2 2 2 ...
$ Area : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
$ Element.Code : int 5112 5112 5112 5112 5112 5112 5112 5112 5112 5112 ...
$ Element : chr "Stocks" "Stocks" "Stocks" "Stocks" ...
$ Item.Code : int 1057 1057 1057 1057 1057 1057 1057 1057 1057 1057 ...
$ Item : chr "Chickens" "Chickens" "Chickens" "Chickens" ...
$ Year.Code : int 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
$ Year : int 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 ...
$ Unit : chr "1000 Head" "1000 Head" "1000 Head" "1000 Head" ...
$ Value : int 4700 4900 5000 5300 5500 5800 6600 6290 6300 6000 ...
$ Flag : chr "F" "F" "F" "F" ...
$ Flag.Description: chr "FAO estimate" "FAO estimate" "FAO estimate" "FAO estimate" ...
[1] "Domain.Code" "Domain" "Area.Code" "Area"
[5] "Element.Code" "Element" "Item.Code" "Item"
[9] "Year.Code" "Year" "Unit" "Value"
[13] "Flag" "Flag.Description"
We can see that the data covers the years 1961 to 2018.
---
title: "Exploring and Analysing the Birds Dataset"
author: "Shantanu Patil"
description: "Reading in data and creating a post"
date: "02/23/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - birds
  - hw1
  - wildbirds
  - shantanu patil
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc.)
## Reading in the bird data
I loaded the readr library with the command library(readr).
The birds.csv file has 30977 rows and 14 columns.
I used the head function to display the column headers and the first 3 rows.
```{r}
library(readr)
birds_data <- read.csv(file = "_data/birds.csv")
head(birds_data, 3)
# Re-read the file with readr, skipping the first line. That line is the header,
# so read_csv() uses the first data row as the column names instead.
bird_data2 <- read_csv(file = "_data/birds.csv", skip=1)
head(bird_data2, 1)
```
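
The effect of skipping the first line is easier to see on a small example. The chunk below is a minimal sketch, not part of the original analysis: `toy_csv_text`, `toy_full`, and `toy_skipped` are hypothetical names, and the CSV text is a hand-made subset of four columns built from the three rows shown above. It uses base R's `read.csv` with its `text` argument so that it runs without the data file.

```{r}
# Toy CSV text (an assumption for illustration), mirroring four columns of birds.csv
toy_csv_text <- "Area,Item,Year,Value
Afghanistan,Chickens,1961,4700
Afghanistan,Chickens,1962,4900
Afghanistan,Chickens,1963,5000"

# Default read: the first line supplies the column names
toy_full <- read.csv(text = toy_csv_text)
dim(toy_full)        # number of rows, then number of columns
colnames(toy_full)

# With skip = 1 the header line is dropped, so the first data row
# is consumed as the header and one observation is lost
toy_skipped <- read.csv(text = toy_csv_text, skip = 1)
dim(toy_skipped)
colnames(toy_skipped)
```

Comparing the two results shows why the row and column counts of a dataset should be taken from a read that keeps the header line.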
## Describe the data
We can see that the bird data has 14 columns, of which 8 are of character type and the remaining 6 are of integer type.
To list the column names we can use the colnames function.
The data contains information about Domain, Area, Element, Item, Year, Unit, Value, Flag, and Flag.Description.
```{r}
str(birds_data)
colnames(birds_data)
```
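
Rather than counting types by eye in the str output, the tally can be computed. The chunk below is a rough sketch under stated assumptions: `toy_types` is a hypothetical four-column stand-in for the real data, built from the rows shown earlier, and `col_types` is an illustrative name. The same two calls should work on `birds_data` unchanged.

```{r}
# Hypothetical stand-in for birds_data with two character and two integer columns
toy_types <- data.frame(
  Area  = c("Afghanistan", "Afghanistan", "Afghanistan"),
  Item  = c("Chickens", "Chickens", "Chickens"),
  Year  = c(1961L, 1962L, 1963L),
  Value = c(4700L, 4900L, 5000L),
  stringsAsFactors = FALSE
)

# Class of every column, as a named character vector
col_types <- sapply(toy_types, class)

# How many columns there are of each type
table(col_types)

# Names of the character columns only
names(col_types)[col_types == "character"]
```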
## Finding the start and end years of the data
We can see that the data covers the years 1961 to 2018.
```{r}
max(birds_data$Year)
min(birds_data$Year)
```
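
As a possible alternative to separate max and min calls, base R's `range` returns both ends at once. This is a small sketch on a hypothetical vector `toy_years`, filled with the three years visible in the head output, not the full Year column.

```{r}
# Hypothetical stand-in for birds_data$Year
toy_years <- c(1961L, 1962L, 1963L)

# range() returns the minimum and the maximum in a single vector
year_range <- range(toy_years)
year_range

# Number of distinct years covered
length(unique(toy_years))
```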