library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Quinn He
August 15, 2022
Today’s challenge is to
read in a dataset, and
describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
The birds data set contains a wide range of entries. With the function below we can see all the column names listed. For a few of them it is hard to figure out exactly what they represent and how important they are.
[1] "Domain Code" "Domain" "Area Code" "Area"
[5] "Element Code" "Element" "Item Code" "Item"
[9] "Year Code" "Year" "Unit" "Value"
[13] "Flag" "Flag Description"
It appears the data set was taken from a farm organization. The data is a little messy, but its layout makes sense from the data entry side. For each country, the rows run through chickens, ducks, and fowls year by year from 1961 to 2018. Some of the columns look redundant: in the summary below, Year Code and Year have identical statistics, and Element Code is 5112 in every row. The whole data set keeps track of the Value recorded for these three types of birds over a window of nearly 60 years. There is also a possibility this data set came from a larger set with other types of animals, because the “Domain” column lists ‘Livestock’ throughout the entire data set.
 Domain Code           Domain            Area Code        Area
 Length:30977       Length:30977       Min.   :   1   Length:30977
 Class :character   Class :character   1st Qu.:  79   Class :character
 Mode  :character   Mode  :character   Median : 156   Mode  :character
                                       Mean   :1202
                                       3rd Qu.: 231
                                       Max.   :5504
  Element Code    Element            Item Code        Item
 Min.   :5112   Length:30977       Min.   :1057   Length:30977
 1st Qu.:5112   Class :character   1st Qu.:1057   Class :character
 Median :5112   Mode  :character   Median :1068   Mode  :character
 Mean   :5112                      Mean   :1066
 3rd Qu.:5112                      3rd Qu.:1072
 Max.   :5112                      Max.   :1083
   Year Code         Year          Unit               Value
 Min.   :1961   Min.   :1961   Length:30977       Min.   :       0
 1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:     171
 Median :1992   Median :1992   Mode  :character   Median :    1800
 Mean   :1991   Mean   :1991                      Mean   :   99411
 3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   15404
 Max.   :2018   Max.   :2018                      Max.   :23707134
                                                  NA's   :1036
     Flag           Flag Description
 Length:30977       Length:30977
 Class :character   Class :character
 Mode  :character   Mode  :character
[1] 30977 14
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1961 1961 1000…
2 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1962 1962 1000…
3 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1963 1963 1000…
4 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1964 1964 1000…
5 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1965 1965 1000…
6 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1966 1966 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
# ℹ Use `colnames()` to see all variable names
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QA Live … 5504 Poly… 5112 Stocks 1068 Ducks 2013 2013 1000…
2 QA Live … 5504 Poly… 5112 Stocks 1068 Ducks 2014 2014 1000…
3 QA Live … 5504 Poly… 5112 Stocks 1068 Ducks 2015 2015 1000…
4 QA Live … 5504 Poly… 5112 Stocks 1068 Ducks 2016 2016 1000…
5 QA Live … 5504 Poly… 5112 Stocks 1068 Ducks 2017 2017 1000…
6 QA Live … 5504 Poly… 5112 Stocks 1068 Ducks 2018 2018 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
# ℹ Use `colnames()` to see all variable names
I’m wondering if I should use the %>% operator here. I’m also having an issue with my functions because they don’t run correctly. Is it from the “delimiter” error above? It may also be an issue with my working directory.
I commented out ‘table(birds)’ because it was giving me an error when I rendered the document.
---
title: "Challenge 1 Quinn He"
author: "Quinn He"
description: "Reading in data and creating a post"
date: "08/15/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_1
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xlsx ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
```{r}
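# Read the birds data; the path is relative to the current working directory
birds <- read_csv("_data/birds.csv")
# view() opens an interactive data viewer, which is not needed when rendering
# view(birds)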
```
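A minimal sketch of checking how `read_csv()` parsed a file, assuming readr 2.0 or later (needed for the `I()` literal-data wrapper). The small CSV text and its values are invented for illustration and are not taken from birds.csv.

```r
library(readr)

# Hypothetical inline CSV standing in for birds.csv (values are invented)
csv_text <- "Area,Item,Year,Value
Area A,Chickens,1961,100
Area A,Chickens,1962,110
Area A,Ducks,1961,"

# I() tells read_csv() to treat the string as literal data, not a file path
toy <- read_csv(I(csv_text), show_col_types = FALSE)

# spec() reports the column types readr guessed
spec(toy)

# problems() lists any values that failed to parse (zero rows means none)
problems(toy)
```

If `problems()` returns rows for a real file, the delimiter or column types are worth a second look before moving on.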
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
The birds data set contains a wide range of entries. With the function below we can see all the column names listed. For a few of them it is hard to figure out exactly what they represent and how important they are.
```{r}
colnames(birds)
```
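One way to work out what the "Code" columns represent is to list each code next to its label. This is only a sketch on a made-up table that mimics the column layout; the rows are invented, and only the pairings of 1057 with Chickens and 1068 with Ducks appear in the output of this post.

```r
library(dplyr)
library(tibble)

# Hypothetical rows mimicking the birds column layout (values are invented)
toy <- tribble(
  ~`Item Code`, ~Item,      ~`Year Code`, ~Year,
  1057,         "Chickens", 1961,         1961,
  1057,         "Chickens", 1962,         1962,
  1068,         "Ducks",    1961,         1961
)

# If a code column is just an ID, each code pairs with exactly one label
item_lookup <- toy %>% distinct(`Item Code`, Item)
item_lookup

# TRUE when Year Code carries no information beyond Year
all(toy$`Year Code` == toy$Year)
```

Column names that contain spaces need backticks, as in `` `Item Code` ``.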
It appears the data set was taken from a farm organization. The data is a little messy, but its layout makes sense from the data entry side. For each country, the rows run through chickens, ducks, and fowls year by year from 1961 to 2018. Some of the columns look redundant: in the summary below, Year Code and Year have identical statistics, and Element Code is 5112 in every row. The whole data set keeps track of the Value recorded for these three types of birds over a window of nearly 60 years. There is also a possibility this data set came from a larger set with other types of animals, because the "Domain" column lists 'Livestock' throughout the entire data set.
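A quick way to check which columns are constant and how the cases are spread across bird types is to count distinct values per column and rows per item. The sketch below runs on a small invented table (area names and values are hypothetical), so the numbers say nothing about the real data.

```r
library(dplyr)
library(tibble)

# Hypothetical rows, invented for illustration
toy <- tribble(
  ~Domain,     ~Area,    ~Item,      ~Year, ~Value,
  "Livestock", "Area A", "Chickens", 1961,  100,
  "Livestock", "Area A", "Chickens", 1962,  110,
  "Livestock", "Area A", "Ducks",    1961,  NA,
  "Livestock", "Area B", "Ducks",    2018,  5
)

# Distinct values in every column; a count of 1 means the column is constant
distinct_counts <- toy %>% summarise(across(everything(), n_distinct))
distinct_counts

# Rows per bird type and the span of years each one covers
item_summary <- toy %>%
  group_by(Item) %>%
  summarise(rows = n(), first_year = min(Year), last_year = max(Year))
item_summary
```

Run against `birds`, the same two calls would show whether "Domain" really is constant and how many bird types the "Item" column holds.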
```{r}
#| label: Summary
summary(birds)
dim(birds)
head(birds)
tail(birds)
#table(birds)
# Column names inside aes() are unquoted; a geom layer is needed to draw anything
ggplot(birds, mapping = aes(x = Year, y = Value)) +
  geom_point()
```
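A short sketch of how `aes()` treats quoted versus unquoted names, using an invented four-row data frame. A quoted name is a constant string, so every row lands on the same position; a bare name refers to the column.

```r
library(ggplot2)

# Hypothetical values, invented for illustration
toy <- data.frame(
  Year  = c(1961, 1962, 1963, 1964),
  Value = c(100, 110, 125, 140)
)

# Quoted names are constants: all rows share one x position and one y position
p_quoted <- ggplot(toy, aes(x = "Year", y = "Value")) + geom_point()

# Bare names map the columns, so each row gets its own position
p_bare <- ggplot(toy, aes(x = Year, y = Value)) + geom_point()

# layer_data() returns the values a layer will actually draw
length(unique(layer_data(p_quoted)$x))
layer_data(p_bare)$x

p_bare
```

Without a layer such as `geom_point()`, `ggplot()` only sets up the axes and draws no data.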
I'm wondering if I should use the %>% operator here. I'm also having an issue with my functions because they don't run correctly. Is it from the "delimiter" error above? It may also be an issue with my working directory.
I commented out 'table(birds)' because it was giving me an error when I rendered the document.
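A likely explanation for the `table(birds)` error, though not confirmed here, is that `table()` on a whole data frame cross-tabulates every column at once, which becomes enormous with 14 columns. Tabulating a single column avoids that. The sketch below uses an invented three-row table and also shows that the pipe is just another way of writing a function call.

```r
library(dplyr)
library(tibble)

# Hypothetical rows, invented for illustration
toy <- tribble(
  ~Item,      ~Year, ~Value,
  "Chickens", 1961,  100,
  "Chickens", 1962,  110,
  "Ducks",    1961,  20
)

# Tabulate one column instead of the whole data frame
item_table <- table(toy$Item)
item_table

# The same tally with the pipe: x %>% f(y) is another way of writing f(x, y)
item_counts <- toy %>% count(Item)
item_counts

# The piped and unpiped calls give the same result
all.equal(item_counts, count(toy, Item))
```

Whether to use `%>%` is mostly a readability choice; it helps most when several steps are chained.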
```{r}
birds %>%
select(Item)
```