library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Kekai Liu
February 21, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
The birds.csv file is read in using read_csv().
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
This dataset has 30,977 rows and 14 columns. The 30,977 rows represent 30,977 observations or cases, and the 14 columns represent 14 variables: Domain Code, Domain, Area Code, Area, Element Code, Element, Item Code, Item, Year Code, Year, Unit, Value, Flag, and Flag Description.
spc_tbl_ [30,977 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Domain Code : chr [1:30977] "QA" "QA" "QA" "QA" ...
$ Domain : chr [1:30977] "Live Animals" "Live Animals" "Live Animals" "Live Animals" ...
$ Area Code : num [1:30977] 2 2 2 2 2 2 2 2 2 2 ...
$ Area : chr [1:30977] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
$ Element Code : num [1:30977] 5112 5112 5112 5112 5112 ...
$ Element : chr [1:30977] "Stocks" "Stocks" "Stocks" "Stocks" ...
$ Item Code : num [1:30977] 1057 1057 1057 1057 1057 ...
$ Item : chr [1:30977] "Chickens" "Chickens" "Chickens" "Chickens" ...
$ Year Code : num [1:30977] 1961 1962 1963 1964 1965 ...
$ Year : num [1:30977] 1961 1962 1963 1964 1965 ...
$ Unit : chr [1:30977] "1000 Head" "1000 Head" "1000 Head" "1000 Head" ...
$ Value : num [1:30977] 4700 4900 5000 5300 5500 5800 6600 6290 6300 6000 ...
$ Flag : chr [1:30977] "F" "F" "F" "F" ...
$ Flag Description: chr [1:30977] "FAO estimate" "FAO estimate" "FAO estimate" "FAO estimate" ...
- attr(*, "spec")=
.. cols(
.. `Domain Code` = col_character(),
.. Domain = col_character(),
.. `Area Code` = col_double(),
.. Area = col_character(),
.. `Element Code` = col_double(),
.. Element = col_character(),
.. `Item Code` = col_double(),
.. Item = col_character(),
.. `Year Code` = col_double(),
.. Year = col_double(),
.. Unit = col_character(),
.. Value = col_double(),
.. Flag = col_character(),
.. `Flag Description` = col_character()
.. )
- attr(*, "problems")=<externalptr>
The dataset covers 1961-2018; earlier years have fewer cases than recent years, rising from 493 cases in 1961 to 577 in 2018.
Year
1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976
493 493 493 493 494 495 495 495 498 498 498 498 498 499 499 499
1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992
498 498 497 496 498 498 495 498 499 499 500 502 503 512 514 569
1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
574 574 574 574 574 574 574 575 575 575 575 575 575 576 576 576
2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
576 576 576 577 577 577 577 577 577 577
The Flag Description provides information on the sources of data. 6,488 cases are aggregate data, 1,002 cases do not have data available, 1,213 cases are FAO imputed data, 10,007 cases are FAO estimates, 10,773 cases are from official data, and 1,494 cases are unofficial figures.
Flag Description
Aggregate, may include official, semi-official, estimated or calculated data
6488
Data not available
1002
FAO data based on imputation methodology
1213
FAO estimate
10007
Official data
10773
Unofficial figure
1494
Of the 30,977 total cases, chickens comprised 13,074, ducks comprised 6,909, geese and guinea fowls comprised 4,136, pigeons and other birds comprised 1,165, and turkeys comprised 5,693.
Item
Chickens Ducks Geese and guinea fowls
13074 6909 4136
Pigeons, other birds Turkeys
1165 5693
The data only covers live animals.
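These are ten of the areas with the most cases. The output shows that the data includes supranational aggregates (Africa, Asia, Eastern Asia, Europe, Northern Africa, South-eastern Asia) alongside individual countries. Several areas share the maximum of 290 cases, which corresponds to all five bird types in each of the 58 years (5 × 58 = 290).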
Area
Africa Asia Eastern Asia Egypt
290 290 290 290
Europe France Greece Myanmar
290 290 290 290
Northern Africa South-eastern Asia
290 290
These are the ten areas with the fewest cases. South Sudan and Sudan are tied for the fewest, with seven cases each.
Area
South Sudan Sudan Montenegro Luxembourg Eritrea
7 7 13 19 26
Ethiopia North Macedonia Tajikistan Aruba Ethiopia PDR
26 27 27 29 32
The data contains cases from 248 unique areas.
[1] 248
Here is the five-number summary of Value, along with the mean. The smallest stock value is 0 and the largest is 23,707,134, both in units of 1000 Head. The mean across cases with non-missing values is 99,411 units of 1000 Head, far above the median of 1,800, which indicates a strongly right-skewed distribution.
Value
Min. : 0
1st Qu.: 171
Median : 1800
Mean : 99411
3rd Qu.: 15404
Max. :23707134
NA's :1036
There are 1,036 cases with missing stock values.
[1] 1036
From this quick analysis, we can summarize this as a dataset of live bird stocks for five categories of birds, measured in units of 1000 Head, across 248 areas of the world (countries and supranational regions) for the years 1961-2018. A case corresponds to the live stock of one type of bird in one area in one calendar year.
---
title: "Challenge 1 Reading Birds"
author: "Kekai Liu"
description: "Reading in birds.csv"
date: "02/21/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_1
- birds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
The birds.csv file is read in using read_csv().
```{r}
birds <- read_csv("_data/birds.csv") #read in the data and assign it to birds
```
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
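As a minimal sketch of how `read_csv()` treats column names that contain spaces and fields that are empty, the chunk below writes a tiny made-up CSV to a temporary file and reads it back. The file contents and the names `toy_path` and `toy_read` are hypothetical and are not taken from birds.csv.

```{r}
library(readr)

# Hypothetical three-row file that mimics a few birds.csv column names
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Area Code,Area,Year,Value",
  "2,Afghanistan,1961,4700",
  "2,Afghanistan,1962,4900",
  "3,Albania,1961,"
), toy_path)

# show_col_types = FALSE silences the column specification message
toy_read <- read_csv(toy_path, show_col_types = FALSE)

# Names with spaces are kept as-is, so they need backticks when used in code
toy_read$`Area Code`

# An empty field is read in as NA
is.na(toy_read$Value)
```

The same backtick quoting applies to columns such as `Flag Description` in the real file.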
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
This dataset has 30,977 rows and 14 columns. The 30,977 rows represent 30,977 observations or cases, and the 14 columns represent 14 variables: Domain Code, Domain, Area Code, Area, Element Code, Element, Item Code, Item, Year Code, Year, Unit, Value, Flag, and Flag Description.
```{r}
str(birds) #produce a summary of the contents (dimensions, variables, variable types) of the data
```
The dataset covers 1961-2018; earlier years have fewer cases than recent years, rising from 493 cases in 1961 to 577 in 2018.
```{r}
table((select(birds, Year))) #retrieve Year column from birds, calculate frequencies
```
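An alternative to `table()` is `dplyr::count()`, which returns the frequencies as a tibble that can be filtered or plotted later. The sketch below uses a small hypothetical data frame (`toy_years`), not the birds data.

```{r}
library(dplyr)

# Hypothetical data: not taken from birds.csv
toy_years <- tibble(
  Area = c("A", "A", "B", "B", "B"),
  Year = c(1961, 1962, 1961, 1962, 1963)
)

# count() returns one row per Year, with the number of cases in column n
year_counts <- count(toy_years, Year)
year_counts

# range() gives the first and last year covered
range(toy_years$Year)
```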
The Flag Description provides information on the sources of data. 6,488 cases are aggregate data, 1,002 cases do not have data available, 1,213 cases are FAO imputed data, 10,007 cases are FAO estimates, 10,773 cases are from official data, and 1,494 cases are unofficial figures.
```{r}
table((select(birds, "Flag Description"))) #retrieve "Flag Description" column from birds, calculate frequencies
```
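Raw counts are easier to compare when expressed as shares of all cases. The following is a sketch on hypothetical data (`toy_flags`) showing one way to compute them with `count()` and `mutate()`; the values are made up for illustration.

```{r}
library(dplyr)

# Hypothetical data: not taken from birds.csv
toy_flags <- tibble(
  `Flag Description` = c(
    "Official data", "Official data",
    "FAO estimate", "Data not available"
  )
)

# Count each description, most frequent first, then add its share of all cases
flag_shares <- toy_flags %>%
  count(`Flag Description`, sort = TRUE) %>%
  mutate(share = n / sum(n))

flag_shares
```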
Of the 30,977 total cases, chickens comprised 13,074, ducks comprised 6,909, geese and guinea fowls comprised 4,136, pigeons and other birds comprised 1,165, and turkeys comprised 5,693.
```{r}
table((select(birds, Item))) #retrieve Item column from birds, calculate frequencies
```
The data only covers live animals.
```{r}
table((select(birds, Domain))) #retrieve Domain column from birds, calculate frequencies
```
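These are ten of the areas with the most cases. The output shows that the data includes supranational aggregates (Africa, Asia, Eastern Asia, Europe, Northern Africa, South-eastern Asia) alongside individual countries. Several areas share the maximum of 290 cases, which corresponds to all five bird types in each of the 58 years (5 × 58 = 290).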
```{r}
head(sort(table((select(birds, Area))),decreasing=TRUE), n=10) #retrieve Area column from birds, calculate frequencies, sort in descending order, display first ten
```
These are the ten areas with the fewest cases. South Sudan and Sudan are tied for the fewest, with seven cases each.
```{r}
head(sort(table((select(birds, Area))),decreasing=FALSE), n=10) #retrieve Area column from birds, calculate frequencies, sort in ascending order, display first ten
```
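Because `head(..., n=10)` cuts a sorted table at ten entries, it can drop areas that are tied with the ones shown. One way around this, sketched below on hypothetical data (`toy_areas`), is `slice_max()` and `slice_min()`, which keep ties by default.

```{r}
library(dplyr)

# Hypothetical data: areas A and B are tied with three cases each
toy_areas <- tibble(
  Area = c("A", "A", "A", "B", "B", "B", "C", "D", "D")
)

area_counts <- count(toy_areas, Area)

# with_ties = TRUE is the default, so every area tied for the top count is kept
most_cases <- slice_max(area_counts, n, n = 1)
fewest_cases <- slice_min(area_counts, n, n = 1)

most_cases
fewest_cases
```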
The data contains cases from 248 unique areas.
```{r}
nrow(unique(select(birds, Area))) #retrieve Area column from birds, identify unique Area values, calculate total number of unique Area values
```
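The same count can be obtained with `n_distinct()`, which works directly on a vector. A brief sketch on hypothetical data (`toy_area_col`):

```{r}
library(dplyr)

# Hypothetical data: three distinct areas across five rows
toy_area_col <- tibble(Area = c("Africa", "Egypt", "Egypt", "France", "Africa"))

# n_distinct() counts unique values without building an intermediate table
n_unique_areas <- n_distinct(toy_area_col$Area)
n_unique_areas

# Equivalent to the nrow(unique(select(...))) approach
nrow(unique(select(toy_area_col, Area)))
```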
Here is the five-number summary of Value, along with the mean. The smallest stock value is 0 and the largest is 23,707,134, both in units of 1000 Head. The mean across cases with non-missing values is 99,411 units of 1000 Head, far above the median of 1,800, which indicates a strongly right-skewed distribution.
```{r}
summary(select(birds, Value)) #retrieve Value column from birds, produce summary statistics (min, quartiles, median, mean, max, NA count) of Value
```
There are 1,036 cases with missing stock values.
```{r}
sum(is.na(select(birds, Value))) #retrieve Value column from birds, identify cases with missing values, total the number of cases with missing values
```
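Note that `summary()` drops missing values before computing the mean, whereas `mean()` on its own returns `NA` unless `na.rm = TRUE` is set. The sketch below, on a hypothetical column (`toy_values`), gathers the missing-value count and the NA-aware statistics in one `summarise()` call.

```{r}
library(dplyr)

# Hypothetical data: one missing value out of five
toy_values <- tibble(Value = c(0, 10, 20, NA, 70))

# Without na.rm = TRUE the mean of a column containing NA is itself NA
mean(toy_values$Value)

value_summary <- summarise(
  toy_values,
  n_cases = n(),                               # all rows, including missing
  n_missing = sum(is.na(Value)),               # rows where Value is NA
  mean_value = mean(Value, na.rm = TRUE),      # mean over non-missing rows
  median_value = median(Value, na.rm = TRUE)   # median over non-missing rows
)

value_summary
```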
From this quick analysis, we can summarize this as a dataset of live bird stocks for five categories of birds, measured in units of 1000 Head, across 248 areas of the world (countries and supranational regions) for the years 1961-2018. A case corresponds to the live stock of one type of bird in one area in one calendar year.
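One way to check a case definition like this is to confirm that no combination of area, item, and year appears more than once. The sketch below does this on a small hypothetical data frame (`toy_cases`); it has not been run against birds.csv here, so it says nothing about the result for the real data.

```{r}
library(dplyr)

# Hypothetical data: each Area/Item/Year combination appears exactly once
toy_cases <- tibble(
  Area = c("A", "A", "A", "B"),
  Item = c("Chickens", "Chickens", "Ducks", "Chickens"),
  Year = c(1961, 1962, 1961, 1961),
  Value = c(10, 12, 3, 7)
)

# Keep only the key combinations that occur more than once
duplicate_keys <- toy_cases %>%
  count(Area, Item, Year) %>%
  filter(n > 1)

# Zero rows means the three columns uniquely identify a case
nrow(duplicate_keys) == 0
```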