For this challenge, I will read in the dataset birds.csv.
birds <- readr::read_csv("_data/birds.csv")
Rows: 30977 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
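The message above suggests declaring the column types up front. Here is a minimal sketch of that, assuming readr is installed. It writes a made-up three-line CSV to a temporary file and reads it back with an explicit cols() specification. The file and its rows are hypothetical and do not come from birds.csv.

```r
library(readr)

# Hypothetical CSV reusing a few birds.csv column names; rows are made up
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Area,Item,Year,Value",
  "Area A,Chickens,2000,100",
  "Area B,Ducks,2000,"
), tmp)

# Declaring every column type replaces readr's type guessing,
# so no column-specification message is printed
toy <- read_csv(
  tmp,
  col_types = cols(
    Area = col_character(),
    Item = col_character(),
    Year = col_double(),
    Value = col_double()   # the empty field in the last row becomes NA
  )
)
```

For birds.csv itself, the same cols() call would need to list all 14 columns. Alternatively, show_col_types = FALSE silences the message and keeps readr's guessed types.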
Let’s look at the first several rows of the dataset:
head(birds)
# A tibble: 6 × 14
`Domain Code` Domain `Area Code` Area `Element Code` Element `Item Code`
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 QA Live Anima… 2 Afgh… 5112 Stocks 1057
2 QA Live Anima… 2 Afgh… 5112 Stocks 1057
3 QA Live Anima… 2 Afgh… 5112 Stocks 1057
4 QA Live Anima… 2 Afgh… 5112 Stocks 1057
5 QA Live Anima… 2 Afgh… 5112 Stocks 1057
6 QA Live Anima… 2 Afgh… 5112 Stocks 1057
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
# Value <dbl>, Flag <chr>, `Flag Description` <chr>
The ‘Year’ and ‘Year Code’ columns appear to hold the same values, so one of them is probably redundant.
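One way to confirm this is to compare the two columns row by row. Below is a sketch using a hypothetical helper, columns_match(), which is not part of any package. It runs on a made-up toy data frame that has the same two column names.

```r
# Hypothetical helper: TRUE only if two columns agree in every row.
# An NA in either column makes the result FALSE.
columns_match <- function(df, col_a, col_b) {
  isTRUE(all(df[[col_a]] == df[[col_b]]))
}

# Hypothetical toy data shaped like the year columns of birds.csv
toy <- data.frame(
  `Year Code` = c(2000, 2001, 2002),
  Year = c(2000, 2001, 2002),
  check.names = FALSE
)

columns_match(toy, "Year", "Year Code")
```

On the real data this would be columns_match(birds, "Year", "Year Code"). A TRUE result would confirm that ‘Year Code’ can be dropped without losing information.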
As shown in the read_csv() message above, the data has 30,977 rows and 14 columns. The column names are listed in the spec() output below.
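Now, I will use spec() to inspect the data type of each column. Eight of the variables are stored as character (categorical) and six as double (numeric). Note that ‘Area Code’, ‘Element Code’, ‘Item Code’ and ‘Year Code’ are numeric identifiers rather than quantities.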
spec(birds)
cols(
`Domain Code` = col_character(),
Domain = col_character(),
`Area Code` = col_double(),
Area = col_character(),
`Element Code` = col_double(),
Element = col_character(),
`Item Code` = col_double(),
Item = col_character(),
`Year Code` = col_double(),
Year = col_double(),
Unit = col_character(),
Value = col_double(),
Flag = col_character(),
`Flag Description` = col_character()
)
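The eight/six split can also be tallied directly instead of read off the specification. Below is a sketch with a hypothetical helper, count_column_types(), which counts the class of every column. It runs on a made-up toy data frame with two character and two double columns.

```r
# Hypothetical helper: tally the (first) class of every column in a data frame
count_column_types <- function(df) {
  table(vapply(df, function(x) class(x)[1], character(1)))
}

# Hypothetical toy data: two character columns and two double columns
toy <- data.frame(
  Area = c("Area A", "Area B"),
  Item = c("Chickens", "Ducks"),
  Year = c(2000, 2000),
  Value = c(100, NA),
  stringsAsFactors = FALSE
)

count_column_types(toy)
```

Applied to birds, this should report 8 character and 6 numeric columns, because readr's double columns have class "numeric".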
Here is a table of the types of birds listed in the ‘Item’ column. Chickens have the most rows (13,074 of 30,977). However, this counts rows rather than birds; the number of birds is recorded in ‘Value’.
table(birds$Item)
Chickens Ducks Geese and guinea fowls
13074 6909 4136
Pigeons, other birds Turkeys
1165 5693
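Each row appears to be one area, item and year combination, so table() shows how often each bird type is reported rather than how many birds there are. The sketch below compares actual stocks using made-up toy data in the same layout. It totals ‘Value’ per ‘Item’ within a single year. Restricting to one year matters because ‘Value’ is a yearly stock (the ‘Element’ is Stocks), so summing across years would count the same birds repeatedly. The sketch also assumes ‘Area’ holds non-overlapping areas. If birds.csv includes regional or world totals, those rows would need to be filtered out first.

```r
# Hypothetical toy data in the birds.csv layout: one row per Area x Item x Year
toy <- data.frame(
  Area = c("Area A", "Area A", "Area B", "Area B", "Area B"),
  Item = c("Chickens", "Ducks", "Chickens", "Ducks", "Ducks"),
  Year = c(2000, 2000, 2000, 2000, 2001),
  Value = c(50, 400, 30, NA, 20),
  stringsAsFactors = FALSE
)

# table() counts rows per Item (Ducks has more rows here)
rows_per_item <- table(toy$Item)

# Summing Value within one year gives the stock per Item; the formula
# interface of aggregate() drops rows where Value is NA by default
birds_per_item <- aggregate(Value ~ Item, data = toy[toy$Year == 2000, ], FUN = sum)
```

In this toy example, Ducks has more rows and also the larger stock in 2000 (400 versus 80). A row count and a stock total can disagree, though, which is why the ‘Value’ column is the one to compare.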
Now, we will use colSums(is.na()) to count the missing values in each column. Only ‘Value’ (1,036 missing) and ‘Flag’ (10,773 missing) contain missing data.
colSums(is.na(birds))
Domain Code Domain Area Code Area
0 0 0 0
Element Code Element Item Code Item
0 0 0 0
Year Code Year Unit Value
0 0 0 1036
Flag Flag Description
10773 0
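colSums(is.na()) works because is.na() on a data frame returns a logical matrix with TRUE for every missing cell. colSums() then adds up the TRUEs, counting each as 1, column by column. Swapping in colMeans() gives the share of missing values instead. Here is a small sketch on made-up toy data.

```r
# Hypothetical toy data with gaps in Value and Flag, as in birds.csv
toy <- data.frame(
  Value = c(100, NA, 250, 80),
  Flag = c(NA, "X", NA, NA),
  Item = c("Chickens", "Ducks", "Turkeys", "Chickens"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell is missing
missing_matrix <- is.na(toy)

# Count of missing cells per column, and the share of missing cells
missing_count <- colSums(missing_matrix)
missing_share <- round(colMeans(missing_matrix), 2)
```

On birds, the shares would be about 3.3% for ‘Value’ (1,036 of 30,977) and 34.8% for ‘Flag’ (10,773 of 30,977).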
The dataset records stocks of live birds (‘Value’, in the units given by ‘Unit’) by type (‘Item’), area (‘Area’ and ‘Area Code’) and year (‘Year’ and ‘Year Code’). Based on the ‘Flag Description’ column, the values appear to be a mix of collected data and estimates.
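‘Flag’ has 10,773 missing values, but ‘Flag Description’ has none. This suggests that a blank flag has a description of its own rather than marking truly missing information. One way to check is to list the distinct pairs of the two columns. Below is a sketch with a hypothetical helper, flag_lookup(). The flag codes and descriptions in the toy data are made up and are not the values in birds.csv.

```r
# Hypothetical helper: distinct Flag / Flag Description pairs.
# unique() on a data frame treats NA values as equal, so blank flags form one pair.
flag_lookup <- function(df) {
  pairs <- unique(df[c("Flag", "Flag Description")])
  rownames(pairs) <- NULL
  pairs
}

# Hypothetical toy rows; codes and descriptions are placeholders
toy <- data.frame(
  Flag = c(NA, "A", NA, "B", "A"),
  `Flag Description` = c("Collected data", "Estimate", "Collected data",
                         "Imputed", "Estimate"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

flag_lookup(toy)
```

Running flag_lookup(birds) would show which description accompanies the missing flags. It would also show whether the collected-versus-estimated reading of ‘Flag Description’ holds.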
Source Code
---
title: "Challenge 1"
description: "Reading in data and creating a post"
author: "Danny Holt"
date: "2023-06-01"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
execute:
  echo: false
---

```{r}
#| label: setup
#| warning: false

library(tidyverse)
library(readr)
knitr::opts_chunk$set(echo = TRUE)
```

## `birds.csv`

For this challenge, I will be reading in the dataset birds.csv

```{r}
birds <- readr::read_csv("_data/birds.csv")
```

Let's look at first several rows of the dataset:

```{r}
head(birds)
```

Year and YearCode appear to be duplicate variables.

As shown above, the data has 14 columns and 30977 rows. Let's look at the column names:

```{r}
colnames(birds)
```

Now, I will use `spec()` to inspect the data types of each of the columns in the dataset. Eight of the variables are categorical and six are numeric.

```{r}
spec(birds)
```

Here is a table of all of the types of birds found in the dataset under the column 'Item'. Chickens appear to be the most common type of bird here.

```{r}
table(birds$Item)
```

Now, we will use `colSums(is.na())` to see where data is missing. We see that some data is missing in the 'Value' and 'Flag' columns.

```{r}
colSums(is.na(birds))
```

The dataset counts different types of live birds (shown in column ‘Item’) in different areas (columns ‘Area’ and ‘Area Code’) and years (‘Year’ and ‘Year Code’). Based on the information in the ‘Flag Description’ column, the data appears to be a mix of collected data and estimates.