library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Darron Bunt
October 9, 2022
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc.)
Read in one (or more) of the following data sets, using the correct R package and command.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
Ok, so from what’s been read in above, we know that the birds dataset has 30,977 rows and 14 columns. Eight of those columns are character-based, while the remaining six are number-based. Neat.
So now if I run birds, I should get a tibble, and in theory that tibble is going to help me perform a high-level description of the data.
# A tibble: 30,977 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1961 1961
2 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1962 1962
3 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1963 1963
4 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1964 1964
5 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1965 1965
6 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1966 1966
7 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1967 1967
8 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1968 1968
9 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1969 1969
10 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1970 1970
# … with 30,967 more rows, 4 more variables: Unit <chr>, Value <dbl>,
# Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
# ¹`Domain Code`, ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
The data appears to show worldwide historical stocks of live birds, reported in units of 1,000 head, for five categories of birds. Specifically, the dataset covers chickens, ducks, geese and guinea fowls, pigeons and other birds, and turkeys, across 248 areas of the world (some countries, some aggregate regions), from 1961 to 2018.
# A tibble: 5 × 2
Item n
<chr> <int>
1 Chickens 13074
2 Ducks 6909
3 Geese and guinea fowls 4136
4 Pigeons, other birds 1165
5 Turkeys 5693
# A tibble: 248 × 2
Area n
<chr> <int>
1 Afghanistan 58
2 Africa 290
3 Albania 232
4 Algeria 232
5 American Samoa 58
6 Americas 232
7 Angola 58
8 Antigua and Barbuda 58
9 Argentina 232
10 Armenia 54
# … with 238 more rows
# A tibble: 58 × 2
Year n
<dbl> <int>
1 1961 493
2 1962 493
3 1963 493
4 1964 493
5 1965 494
6 1966 495
7 1967 495
8 1968 495
9 1969 498
10 1970 498
# … with 48 more rows
Judging by the flag descriptions, the values come from a variety of sources, most commonly official data (10,773 rows) and FAO (Food and Agriculture Organization) estimates (10,007 rows).
# A tibble: 6 × 2
`Flag Description` n
<chr> <int>
1 Aggregate, may include official, semi-official, estimated or calculated… 6488
2 Data not available 1002
3 FAO data based on imputation methodology 1213
4 FAO estimate 10007
5 Official data 10773
6 Unofficial figure 1494
Several columns are constant or redundant: Domain Code and Domain hold the same value in every row (QA and Live Animals, respectively), as do Element Code and Element (5112 and Stocks, respectively). Year Code and Year repeat the same data, and Unit is the same for the entire dataset (1,000 head).
I used a series of count commands to confirm the above; for reference, I have included the ones for Domain Code and Domain.
# A tibble: 1 × 2
`Domain Code` n
<chr> <int>
1 QA 30977
# A tibble: 1 × 2
Domain n
<chr> <int>
1 Live Animals 30977
---
title: "Challenge 1 - Darron Bunt"
author: "Darron Bunt"
description: "Reading in data and creating a post"
date: "10/09/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - birds
  - darron bunt
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Step 1 - Read in the Data
*Read in one (or more) of the following data sets, using the correct R package and command.*
- birds.csv ⭐⭐
```{r}
birds <- read_csv("_data/birds.csv")
```
*Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.*
## Step 2 - Describe the data
*Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).*
Ok, so from what's been read in above, we know that the birds dataset has 30,977 rows and 14 columns. Eight of those columns are character-based, while the remaining six are number-based. Neat.
So now if I run `birds`, I should get a tibble, and in theory that tibble will help me give a high-level description of the data.
```{r}
birds
```
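The column-type split mentioned above can also be checked in code rather than read off the import message. This is a minimal sketch, not part of the original challenge: `tally_column_types()` is a hypothetical helper name, and it assumes `birds` was loaded with `read_csv()` as in Step 1. If the description above holds, it should show eight character columns and six numeric ones.

```{r}
library(dplyr)
library(tibble)

# Hypothetical helper (an illustration, not from the original post):
# tally the columns of a data frame by their first class label.
tally_column_types <- function(df) {
  tibble(
    column = names(df),
    type   = vapply(df, function(col) class(col)[1], character(1), USE.NAMES = FALSE)
  ) %>%
    count(type, name = "n_columns")
}

# Applied only when `birds` from Step 1 is available in the session.
if (exists("birds")) tally_column_types(birds)
```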
The data appears to show worldwide historical stocks of live birds, reported in units of 1,000 head, for five categories of birds. Specifically, the dataset covers chickens, ducks, geese and guinea fowls, pigeons and other birds, and turkeys, across 248 areas of the world (some countries, some aggregate regions), from 1961 to 2018.
```{r}
count(birds, Item)
count(birds, Area)
count(birds, Year)
```
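The number of distinct items, areas, and years can also be pulled out in one step instead of being read from the dimensions of three separate tables. A minimal sketch, assuming the `birds` tibble from Step 1; `count_distinct_values()` is a hypothetical helper name introduced here for illustration. Going by the `count()` tables, I would expect 5 items, 248 areas, and 58 years.

```{r}
library(dplyr)

# Hypothetical helper (an illustration, not from the original post):
# number of distinct values in each of the named columns.
count_distinct_values <- function(df, cols) {
  df %>% summarise(across(all_of(cols), n_distinct))
}

# Applied only when `birds` from Step 1 is available in the session.
if (exists("birds")) count_distinct_values(birds, c("Item", "Area", "Year"))
```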
Judging by the flag descriptions, the values come from a variety of sources, most commonly official data (10,773 rows) and FAO (Food and Agriculture Organization) estimates (10,007 rows).
```{r}
count(birds, `Flag Description`)
```
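To make "most commonly" easier to judge, the counts can be sorted and turned into shares of all rows. This is a rough sketch under the assumption that `birds` is loaded as in Step 1; `count_with_share()` is a hypothetical helper name, not something from the challenge template.

```{r}
library(dplyr)

# Hypothetical helper (an illustration, not from the original post):
# count rows per value of one column, largest first, and add each
# value's share of the total row count.
count_with_share <- function(df, col) {
  df %>%
    count({{ col }}, sort = TRUE) %>%
    mutate(share = n / sum(n))
}

# Applied only when `birds` from Step 1 is available in the session.
if (exists("birds")) count_with_share(birds, `Flag Description`)
```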
Several columns are constant or redundant: Domain Code and Domain hold the same value in every row (QA and Live Animals, respectively), as do Element Code and Element (5112 and Stocks, respectively). Year Code and Year repeat the same data, and Unit is the same for the entire dataset (1,000 head).
I used a series of `count()` calls to confirm the above; for reference, I have included the ones for Domain Code and Domain.
```{r}
count(birds, `Domain Code`)
count(birds, Domain)
```
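Rather than running `count()` once per column, the same check can be done for every column at once. A minimal sketch, again assuming the `birds` tibble from Step 1; `constant_columns()` is a hypothetical helper name. If the description above is right, it should list Domain Code, Domain, Element Code, Element, and Unit, and the year comparison should come back `TRUE`.

```{r}
library(dplyr)

# Hypothetical helper (an illustration, not from the original post):
# names of the columns that contain exactly one distinct value.
constant_columns <- function(df) {
  n_values <- vapply(df, n_distinct, numeric(1))
  names(n_values)[n_values == 1]
}

# Applied only when `birds` from Step 1 is available in the session.
if (exists("birds")) {
  print(constant_columns(birds))
  # TRUE only if the two year columns agree in every row
  print(all(birds$`Year Code` == birds$Year))
}
```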