library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Priyanka Perumalla
May 15, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
The FAOSTAT_egg_chicken.csv file is read in using read_csv().
# A tibble: 6 × 14
`Domain Code` Domain `Area Code` Area `Element Code` Element `Item Code`
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 QL Livestock … 2 Afgh… 5313 Laying 1062
2 QL Livestock … 2 Afgh… 5410 Yield 1062
3 QL Livestock … 2 Afgh… 5510 Produc… 1062
4 QL Livestock … 2 Afgh… 5313 Laying 1062
5 QL Livestock … 2 Afgh… 5410 Yield 1062
6 QL Livestock … 2 Afgh… 5510 Produc… 1062
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
# Value <dbl>, Flag <chr>, `Flag Description` <chr>
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
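Description: The dataset ‘FAOSTAT_egg_chicken.csv’ contains FAOSTAT livestock data on chicken eggs, reported by area (country or region) for each year from 1961 to 2018. Each row is one measurement for an area, year, and element (for example, Laying or Yield), with the figure in Value and its unit in Unit. The Flag and Flag Description columns appear to qualify how each figure was obtained.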
# A tibble: 6 × 14
`Domain Code` Domain `Area Code` Area `Element Code` Element `Item Code`
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 QL Livestock … 2 Afgh… 5313 Laying 1062
2 QL Livestock … 2 Afgh… 5410 Yield 1062
3 QL Livestock … 2 Afgh… 5510 Produc… 1062
4 QL Livestock … 2 Afgh… 5313 Laying 1062
5 QL Livestock … 2 Afgh… 5410 Yield 1062
6 QL Livestock … 2 Afgh… 5510 Produc… 1062
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
# Value <dbl>, Flag <chr>, `Flag Description` <chr>
Displaying the summary of the dataset.
Domain Code Domain Area Code Area
Length:38170 Length:38170 Min. : 1.0 Length:38170
Class :character Class :character 1st Qu.: 70.0 Class :character
Mode :character Mode :character Median : 143.0 Mode :character
Mean : 771.1
3rd Qu.: 215.0
Max. :5504.0
Element Code Element Item Code Item
Min. :5313 Length:38170 Min. :1062 Length:38170
1st Qu.:5313 Class :character 1st Qu.:1062 Class :character
Median :5410 Mode :character Median :1062 Mode :character
Mean :5411 Mean :1062
3rd Qu.:5510 3rd Qu.:1062
Max. :5510 Max. :1062
Year Code Year Unit Value
Min. :1961 Min. :1961 Length:38170 Min. : 1
1st Qu.:1976 1st Qu.:1976 Class :character 1st Qu.: 2600
Median :1991 Median :1991 Mode :character Median : 31996
Mean :1990 Mean :1990 Mean : 291341
3rd Qu.:2005 3rd Qu.:2005 3rd Qu.: 93836
Max. :2018 Max. :2018 Max. :76769955
NA's :40
Flag Flag Description
Length:38170 Length:38170
Class :character Class :character
Mode :character Mode :character
Displaying the dimensions of the dataset.
Printing all columns
[1] "Domain Code" "Domain" "Area Code" "Area"
[5] "Element Code" "Element" "Item Code" "Item"
[9] "Year Code" "Year" "Unit" "Value"
[13] "Flag" "Flag Description"
Counting the unique years
The dataset covers 58 unique years, 1961 to 2018.
Counting the unique areas
The dataset covers 245 unique areas.
Counting the unique domains (e.g., livestock)
The data all comes from a single domain.
---
title: "Challenge 1"
author: "Priyanka Perumalla"
description: "Getting acquainted with the properties of the dataset"
date: "05/15/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
The `FAOSTAT_egg_chicken.csv` file is read in using `read_csv()`.
```{r}
df <- read_csv('/Users/priyankaperumalla/Desktop/daccs/601_Spring_2023/posts/_data/FAOSTAT_egg_chicken.csv', show_col_types = FALSE)
head(df)
```
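Several FAOSTAT column names, such as `Domain Code` and `Area Code`, contain spaces, so they need backticks when referenced in code. The following is a minimal sketch of that, assuming a small made-up CSV written to a temporary file; the rows and the names `toy_path`, `toy`, and `toy_a` are illustrative and not taken from the dataset.

```r
library(readr)
library(dplyr)

# Hypothetical toy file standing in for the real CSV; all values are made up.
toy_path <- tempfile(fileext = ".csv")
write_lines(
  c("Area Code,Area,Year,Value",
    "1,Area A,2000,10",
    "1,Area A,2001,12",
    "2,Area B,2000,7"),
  toy_path
)

toy <- read_csv(toy_path, show_col_types = FALSE)

# Column names with spaces must be wrapped in backticks inside dplyr verbs.
toy_a <- toy %>%
  filter(`Area Code` == 1) %>%
  select(Area, Year, Value)
toy_a
```

The same backtick pattern should carry over to `df`, for example `df %>% filter(`Area Code` == 2)`.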
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
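Description: The dataset `FAOSTAT_egg_chicken.csv` contains FAOSTAT livestock data on chicken eggs, reported by area (country or region) for each year from 1961 to 2018. Each row is one measurement for an area, year, and element (for example, Laying or Yield), with the figure in `Value` and its unit in `Unit`. The `Flag` and `Flag Description` columns appear to qualify how each figure was obtained.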
```{r}
#| label: summary
head(df)
```
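To pin down what a single case represents, it can help to tabulate which elements occur and in which units they are measured. This is a rough sketch on made-up rows; the unit labels and the name `element_units` are placeholders, not values from the dataset.

```r
library(dplyr)

# Made-up rows for illustration only; unit labels are placeholders.
toy <- tibble(
  Element = c("Laying", "Yield", "Laying", "Yield"),
  Unit    = c("unit a", "unit b", "unit a", "unit b"),
  Value   = c(10, 200, 12, 210)
)

# One row per Element/Unit combination, with the number of records in `n`.
element_units <- toy %>% count(Element, Unit)
element_units
```

Run against `df`, `df %>% count(Element, Unit)` would show how many records each element contributes and whether each element uses a single unit.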
Displaying the summary of the dataset.
```{r}
summary(df)
```
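The `summary()` output appears to report 40 `NA` entries in `Value`. One way to check missingness column by column is sketched below on a made-up tibble; `toy` and `na_counts` are illustrative names.

```r
library(dplyr)

# Made-up rows for illustration only.
toy <- tibble(
  Area  = c("Area A", "Area A", "Area B"),
  Year  = c(2000, 2001, 2000),
  Value = c(10, NA, 7)
)

# One row, one column per variable, each holding that variable's NA count.
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```

Applied to `df`, the same `summarise(across(...))` call gives the number of missing entries in each of the 14 columns at once.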
Displaying the dimensions of the dataset.
```{r}
dim(df)
```
Printing all columns
```{r}
colnames(df)
```
Counting the unique years
```{r}
unique_years <- df %>% select(Year) %>% n_distinct()
unique_years
```
The dataset covers 58 unique years, 1961 to 2018.
Counting the unique areas
```{r}
unique_areas <- df %>% select(Area) %>% n_distinct()
unique_areas
```
The dataset covers 245 unique areas.
Counting the unique domains (e.g., livestock)
The data all comes from a single domain.
```{r}
# Store the domain count under its own name rather than reusing unique_areas
unique_domains <- df %>% select(Domain) %>% n_distinct()
unique_domains
```
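The three separate counts can also be computed in one step. This is a minimal sketch on a made-up tibble with the same column names; `toy` and `distinct_counts` are illustrative names.

```r
library(dplyr)

# Made-up rows for illustration only.
toy <- tibble(
  Year   = c(2000, 2001, 2000),
  Area   = c("Area A", "Area A", "Area B"),
  Domain = c("Domain X", "Domain X", "Domain X")
)

# Count the distinct values in each selected column in a single call.
distinct_counts <- toy %>%
  summarise(across(c(Year, Area, Domain), n_distinct))
distinct_counts
```

Swapping `toy` for `df` returns one row with the number of distinct years, areas, and domains side by side.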