library(tidyverse)
library(readr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Siddharth Goel
January 20, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc.)
cols(
`Domain Code` = col_character(),
Domain = col_character(),
`Area Code` = col_double(),
Area = col_character(),
`Element Code` = col_double(),
Element = col_character(),
`Item Code` = col_double(),
Item = col_character(),
`Year Code` = col_double(),
Year = col_double(),
Unit = col_character(),
Value = col_double(),
Flag = col_character(),
`Flag Description` = col_character()
)
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1961 1961 1000…
2 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1962 1962 1000…
3 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1963 1963 1000…
4 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1964 1964 1000…
5 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1965 1965 1000…
6 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1966 1966 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
This dataset has 14 columns and 30,977 rows. Every column is parsed as either `col_character` or `col_double`.
The column specification and the first rows show that several pairs of columns represent the same information in two forms: `Area` and `Area Code`, `Domain` and `Domain Code`, `Element` and `Element Code`, `Item` and `Item Code`, and `Year` and `Year Code`. We could keep one column from each pair and store the code-to-label mappings in separate lookup tables to reduce the size of the data. In addition, `Domain` and `Element` each contain a single value, so these columns can be dropped without losing information.
[1] "Live Animals"
[1] 248
[1] "Chickens" "Ducks" "Geese and guinea fowls"
[4] "Turkeys" "Pigeons, other birds"
[1] "Stocks"
[1] 58
[1] "F" NA "Im" "M" "*" "A"
From the data values and the unique column values, we can conclude that the dataset records livestock stocks for five categories of birds across 248 areas and 58 distinct years. It covers a single domain, `Live Animals`, and a single element, `Stocks`, so it mainly describes bird stocks and the regions they belong to.
---
title: "Challenge 1"
author: "Siddharth Goel"
description: "Reading in data and creating a post"
date: "01/20/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc.)
## Read in the Data
```{r}
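# Read the CSV; readr guesses a type for each column
birds_set <- read_csv('_data/birds.csv')
spec(birds_set)  # column specification used for parsing
head(birds_set)  # first six rows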
```
This dataset has 14 columns and 30,977 rows. Every column is parsed as either `col_character` or `col_double`.
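As a quick check on the shape and column types described above, here is a minimal sketch. It runs on a small made-up tibble called `toy` (a hypothetical stand-in, not rows from the file) so that it works on its own; the same calls should apply to `birds_set`.

```{r}
library(dplyr)

# Hypothetical stand-in for birds_set: column names mirror the real data,
# the values are invented for illustration
toy <- tibble(
  Area  = c("Afghanistan", "Albania"),
  Year  = c(1961, 1961),
  Value = c(10, 20)
)

toy_dim   <- dim(toy)            # c(number of rows, number of columns)
toy_types <- sapply(toy, class)  # R class of each column

toy_dim
table(toy_types)                 # how many columns of each class
```

Columns parsed with `col_character()` show up as class `character`, and columns parsed with `col_double()` show up as class `numeric`.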
## Describe the data
The column specification and the first rows show that several pairs of columns represent the same information in two forms: `Area` and `Area Code`, `Domain` and `Domain Code`, `Element` and `Element Code`, `Item` and `Item Code`, and `Year` and `Year Code`. We could keep one column from each pair and store the code-to-label mappings in separate lookup tables to reduce the size of the data.
In addition, `Domain` and `Element` each contain a single value, so these columns can be dropped without losing information.
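The de-duplication idea can be sketched as follows. This is only an illustration under stated assumptions: `toy`, `area_lookup`, `constant_cols`, and `toy_slim` are hypothetical names, and the rows are invented rather than taken from `birds.csv`.

```{r}
library(dplyr)

# Hypothetical toy rows that mimic a few columns of the birds data
toy <- tibble(
  `Area Code` = c(2, 2, 3, 3),
  Area        = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  Domain      = "Live Animals",
  Year        = c(1961, 1962, 1961, 1962),
  Value       = c(10, 11, 20, 21)
)

# Lookup table: one row per code/label pair
area_lookup <- toy %>% distinct(`Area Code`, Area)

# Columns with a single distinct value add nothing at the row level
constant_cols <- names(which(sapply(toy, n_distinct) == 1))

# Slim table: keep the code, drop the label and the constant columns
toy_slim <- toy %>% select(-Area, -all_of(constant_cols))

# The label can be restored whenever it is needed
toy_restored <- toy_slim %>% left_join(area_lookup, by = "Area Code")
```

Keeping the numeric code in the main table and the label in the lookup table is one possible design; keeping the label and dropping the code would work just as well for a dataset of this size.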
```{r}
#| label: summary
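unique(birds_set$Domain)         # distinct domains
length(unique(birds_set$Area))   # number of distinct areas
unique(birds_set$Item)           # distinct bird categories
unique(birds_set$Element)        # distinct elements
length(unique(birds_set$Year))   # number of distinct years
unique(birds_set$Flag)           # distinct flag codes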
```
From the data values and the unique column values, we can conclude that the dataset records livestock stocks for five categories of birds across 248 areas and 58 distinct years. It covers a single domain, `Live Animals`, and a single element, `Stocks`, so it mainly describes bird stocks and the regions they belong to.
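To put numbers on the coverage per bird category, a grouped summary is one option. The sketch below is a rough illustration on invented rows: `toy` and `coverage` are hypothetical names, and the values are not from the real file.

```{r}
library(dplyr)

# Hypothetical toy rows; column names mirror the real data, values are invented
toy <- tibble(
  Area  = c("Afghanistan", "Afghanistan", "Albania", "Albania", "Albania"),
  Item  = c("Chickens", "Chickens", "Chickens", "Ducks", "Ducks"),
  Year  = c(1961, 1962, 1961, 1961, 1963),
  Value = c(10, 11, 20, 5, 6)
)

# One row per bird category with its area count and year span
coverage <- toy %>%
  group_by(Item) %>%
  summarise(
    n_areas    = n_distinct(Area),
    n_years    = n_distinct(Year),
    first_year = min(Year),
    last_year  = max(Year),
    .groups    = "drop"
  )

coverage
```

Replacing `toy` with `birds_set` should give the same kind of table for the five bird categories, assuming the `Area`, `Item`, and `Year` columns are read as shown earlier.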