---
title: "Challenge 1"
author: "Siddharth Goel"
description: "Reading in data and creating a post"
date: "01/20/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
---
```{r}
#| label: setup
#| warning: false
#| message: false
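library(tidyverse) # attaches readr, dplyr, tidyr, ggplot2, and the other core packages
library(readr)     # already attached by tidyverse; harmless to load again
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)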
```
## Challenge Overview

Today's challenge is to

1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc.).
## Read in the Data
```{r}
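# Read the live-animal stocks data for birds
birds_set <- read_csv("_data/birds.csv")
# Column types that readr guessed while parsing
spec(birds_set)
# First six rows
head(birds_set)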
```
The dataset has 14 columns and 30,977 rows. Eight of the columns are character (`col_character()`) and the remaining six are double (`col_double()`).
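
The row and column counts and the type breakdown can also be checked in code instead of being read off the printed spec. Below is a minimal sketch; `column_types()` is a hypothetical helper, and `toy_birds` is a made-up two-row tibble that borrows a few `birds.csv` column names so the chunk runs on its own.

```{r}
library(dplyr)

# Hypothetical helper: storage type of every column ("character", "double", ...)
column_types <- function(df) {
  vapply(df, typeof, character(1))
}

# Made-up stand-in using a few birds.csv column names; not the real data
toy_birds <- tibble(
  `Area Code` = c(2, 2),
  Area        = c("Afghanistan", "Afghanistan"),
  Item        = c("Chickens", "Chickens"),
  Year        = c(1961, 1962)
)

dim(toy_birds)                 # number of rows, number of columns
table(column_types(toy_birds)) # how many columns of each storage type
```

Called as `dim(birds_set)` and `table(column_types(birds_set))` in this post, the same lines would report the shape and type counts of the real data.
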
## Describe the data
From the columns and the data, we can see that several columns encode the same information twice: `Area` and `Area Code`, `Domain` and `Domain Code`, `Element` and `Element Code`, `Item` and `Item Code`, and `Year` and `Year Code`. We can de-duplicate these pairs by keeping one column of each pair in the main table and moving the code-to-label pairs into separate lookup tables, which reduces the size of the data.
Also, the columns `Domain` and `Element` each hold a single value, so they carry no information and can be eliminated as well.
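
The de-duplication idea could look roughly like the sketch below. It assumes every code maps to exactly one label, so it checks that first with a hypothetical helper, `is_one_to_one()`. The tiny `toy_birds` tibble is illustrative only: it reuses values visible in the `head()` output (`Area Code` 2 for Afghanistan, `Item Code` 1057 for Chickens, `Element Code` 5112 for Stocks) and is not the real data.

```{r}
library(dplyr)

# Made-up rows shaped like birds.csv (values borrowed from the head() output)
toy_birds <- tibble(
  `Domain Code`  = "QA",
  Domain         = "Live Animals",
  `Area Code`    = 2,
  Area           = "Afghanistan",
  `Element Code` = 5112,
  Element        = "Stocks",
  `Item Code`    = 1057,
  Item           = "Chickens",
  `Year Code`    = c(1961, 1962, 1963),
  Year           = c(1961, 1962, 1963)
)

# Hypothetical helper: TRUE when every code has exactly one label and vice versa
is_one_to_one <- function(df, code, label) {
  pairs <- distinct(select(df, all_of(c(code, label))))
  nrow(pairs) == n_distinct(pairs[[code]]) &&
    nrow(pairs) == n_distinct(pairs[[label]])
}

# Check the assumption before dropping anything
stopifnot(
  is_one_to_one(toy_birds, "Area Code", "Area"),
  is_one_to_one(toy_birds, "Item Code", "Item"),
  is_one_to_one(toy_birds, "Year Code", "Year")
)

# Keep the code-to-label pairs in small lookup tables ...
area_lookup <- distinct(toy_birds, `Area Code`, Area)
item_lookup <- distinct(toy_birds, `Item Code`, Item)

# ... and drop the labels, the duplicate Year Code, and the
# single-valued Domain/Element columns from the main table
birds_slim <- select(
  toy_birds,
  -c(Area, Item, `Year Code`, `Domain Code`, Domain, `Element Code`, Element)
)
birds_slim
```

Keeping the numeric codes in the main table and the labels in lookup tables is one possible design; `left_join(birds_slim, area_lookup, by = "Area Code")` brings the area names back when they are needed. On the real data, the same `is_one_to_one()` checks should be run on `birds_set` before any column is dropped.
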
```{r}
#| label: summary
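unique(birds_set$Domain)          # the domain(s) present
length(unique(birds_set$Area))    # number of distinct areas
unique(birds_set$Item)            # the birds covered
unique(birds_set$Element)         # the element(s) present
length(unique(birds_set$Year))    # number of distinct years
unique(birds_set$Flag)            # flag codes, including NA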
```
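
The separate `unique()` calls above could be condensed into a single table of distinct-value counts per column, which also makes the single-valued columns easy to spot. This sketch uses `dplyr::across()` with `n_distinct()` on a made-up tibble (`Area A` and `Area B` are placeholder names); on the real data the call would be `summarise(birds_set, across(everything(), n_distinct))`.

```{r}
library(dplyr)

# Made-up stand-in using a few birds.csv column names; not the real data
toy_birds <- tibble(
  Domain  = "Live Animals",
  Element = "Stocks",
  Area    = c("Area A", "Area A", "Area B"),
  Year    = c(1961, 1962, 1961)
)

# Number of distinct values in every column (n_distinct counts NA as a value)
distinct_counts <- summarise(toy_birds, across(everything(), n_distinct))
distinct_counts

# Columns holding a single value, i.e. candidates for removal
names(distinct_counts)[unlist(distinct_counts) == 1]
```
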
By analyzing the data values and the unique column values, we can conclude that the dataset contains livestock data (the `Stocks` element) for five kinds of birds: chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds. All rows belong to a single domain, `Live Animals`, and the data are broken down by `Area` (248 distinct values) and `Year` (58 distinct values).
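
The conclusion could be backed by a per-bird summary of area coverage and year range. A minimal sketch follows; the rows in `toy_birds` are invented placeholders (including the area names and years), not values from `birds.csv`.

```{r}
library(dplyr)

# Invented stand-in rows; not the real data
toy_birds <- tibble(
  Item = c("Chickens", "Chickens", "Ducks", "Ducks"),
  Area = c("Area A", "Area B", "Area A", "Area A"),
  Year = c(1961, 1962, 1961, 1965)
)

# One row per bird: how many areas report it and which years it spans
item_summary <- toy_birds %>%
  group_by(Item) %>%
  summarise(
    n_areas    = n_distinct(Area),
    first_year = min(Year),
    last_year  = max(Year)
  )
item_summary
```

Run on `birds_set` instead of `toy_birds`, this would give one row per bird, with the number of areas reporting it and the first and last year it appears in.
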