library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Priyanka Perumalla
May 15, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
The FAOSTAT_egg_chicken.csv file is read in using read_csv().
# A tibble: 6 × 14
  `Domain Code` Domain      `Area Code` Area  `Element Code` Element `Item Code`
  <chr>         <chr>             <dbl> <chr>          <dbl> <chr>         <dbl>
1 QL            Livestock …           2 Afgh…           5313 Laying         1062
2 QL            Livestock …           2 Afgh…           5410 Yield          1062
3 QL            Livestock …           2 Afgh…           5510 Produc…        1062
4 QL            Livestock …           2 Afgh…           5313 Laying         1062
5 QL            Livestock …           2 Afgh…           5410 Yield          1062
6 QL            Livestock …           2 Afgh…           5510 Produc…        1062
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
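Description: The dataset ‘FAOSTAT_egg_chicken.csv’ contains FAOSTAT livestock records on chicken eggs, reported by area for each year from 1961 to 2018. Each row appears to describe one element (for example, Laying or Yield) for one area and year, with the measurement in Value and its unit in Unit. The data were most likely compiled by the FAO from figures reported by the individual areas.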
# A tibble: 6 × 14
  `Domain Code` Domain      `Area Code` Area  `Element Code` Element `Item Code`
  <chr>         <chr>             <dbl> <chr>          <dbl> <chr>         <dbl>
1 QL            Livestock …           2 Afgh…           5313 Laying         1062
2 QL            Livestock …           2 Afgh…           5410 Yield          1062
3 QL            Livestock …           2 Afgh…           5510 Produc…        1062
4 QL            Livestock …           2 Afgh…           5313 Laying         1062
5 QL            Livestock …           2 Afgh…           5410 Yield          1062
6 QL            Livestock …           2 Afgh…           5510 Produc…        1062
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>
Displaying the summary of the dataset.
 Domain Code           Domain            Area Code          Area          
 Length:38170       Length:38170       Min.   :   1.0   Length:38170      
 Class :character   Class :character   1st Qu.:  70.0   Class :character  
 Mode  :character   Mode  :character   Median : 143.0   Mode  :character  
                                       Mean   : 771.1                     
                                       3rd Qu.: 215.0                     
                                       Max.   :5504.0                     
                                                                          
  Element Code    Element            Item Code        Item          
 Min.   :5313   Length:38170       Min.   :1062   Length:38170      
 1st Qu.:5313   Class :character   1st Qu.:1062   Class :character  
 Median :5410   Mode  :character   Median :1062   Mode  :character  
 Mean   :5411                      Mean   :1062                     
 3rd Qu.:5510                      3rd Qu.:1062                     
 Max.   :5510                      Max.   :1062                     
                                                                    
   Year Code         Year          Unit               Value         
 Min.   :1961   Min.   :1961   Length:38170       Min.   :       1  
 1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:    2600  
 Median :1991   Median :1991   Mode  :character   Median :   31996  
 Mean   :1990   Mean   :1990                      Mean   :  291341  
 3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   93836  
 Max.   :2018   Max.   :2018                      Max.   :76769955  
                                                  NA's   :40        
     Flag           Flag Description  
 Length:38170       Length:38170      
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
Displaying the dimensions of the dataset.
Printing all columns
 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"
Printing the number of unique years
The dataset covers 58 unique years, from 1961 to 2018.
Printing the number of unique areas
The dataset covers 245 unique areas.
Printing the number of unique domains (e.g., Livestock Primary)
All of the data come from a single domain.
---
title: "Challenge 1"
author: "Priyanka Perumalla"
description: "Getting acquainted with the properties of the dataset"
date: "05/15/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
## Challenge Overview
Today's challenge is to
1)  read in a dataset, and
2)  describe the dataset using both words and any supporting information (e.g., tables)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
-   railroad_2012_clean_county.csv ⭐
-   birds.csv ⭐⭐
-   FAOstat\*.csv ⭐⭐
-   wild_bird_data.xlsx ⭐⭐⭐
-   StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
The `FAOSTAT_egg_chicken.csv` file is read in using `read_csv()`.
```{r}
df <- read_csv('/Users/priyankaperumalla/Desktop/daccs/601_Spring_2023/posts/_data/FAOSTAT_egg_chicken.csv', show_col_types = FALSE)
head(df)
```
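As a minimal sketch of what `read_csv()` does with a file like this, the example below parses a few made-up rows supplied inline via `I()` instead of the real file. The contents of `toy_csv` are illustrative assumptions, not rows copied from FAOSTAT. It shows that column names containing spaces are kept as they are and need backticks when referenced, and that an empty field is read as `NA`.

```r
library(readr)

# Hypothetical inline CSV standing in for the real file (values are made up).
toy_csv <- I("Area Code,Area,Element,Year,Value
2,Afghanistan,Laying,1961,4000
2,Afghanistan,Yield,1961,25000
3,Albania,Laying,1961,
")

toy <- read_csv(toy_csv, show_col_types = FALSE)

# Column names with spaces are preserved; refer to them with backticks.
toy$`Area Code`

# The empty Value field in the last row is parsed as a missing value.
is.na(toy$Value)

# Inspect the column types readr guessed for each column.
spec(toy)
```

Checking `spec()` right after reading is a cheap way to catch a numeric column that was silently parsed as text.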
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
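Description: The dataset 'FAOSTAT_egg_chicken.csv' contains FAOSTAT livestock records on chicken eggs, reported by area for each year from 1961 to 2018. Each row appears to describe one element (for example, Laying or Yield) for one area and year, with the measurement in `Value` and its unit in `Unit`. The data were most likely compiled by the FAO from figures reported by the individual areas.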
```{r}
#| label: summary
head(df)
```
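To check what a single case is, one can test whether a combination of columns identifies each row uniquely. The sketch below does this on a small made-up tibble (`toy` and its values are assumptions for illustration), using `Area`, `Element`, and `Year` as the candidate key. Whether that combination is unique in the real data would need to be confirmed by running the same `count()` on `df`.

```r
library(dplyr)
library(tibble)

# Made-up rows mimicking a few columns of the dataset.
toy <- tibble(
  Area    = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania"),
  Element = c("Laying", "Yield", "Laying", "Laying"),
  Year    = c(1961, 1961, 1962, 1961),
  Value   = c(4000, 25000, 4400, 1200)
)

# Key combinations that occur more than once; zero rows means the key is unique.
duplicates <- toy %>%
  count(Area, Element, Year) %>%
  filter(n > 1)

nrow(duplicates)
```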
Displaying the summary of the dataset.
```{r}
summary(df)
```
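`summary()` reports `NA` counts only for numeric columns, so missing values in character columns such as `Flag` do not show up there. A compact way to count missing values in every column at once is sketched below on a small made-up tibble (`toy` and its values are assumptions for illustration); the same `summarise()` call should work on `df`.

```r
library(dplyr)
library(tibble)

# Made-up rows mimicking a few columns of the dataset.
toy <- tibble(
  Area  = c("Afghanistan", "Afghanistan", "Albania"),
  Value = c(4000, NA, 1200),
  Flag  = c("F", NA, NA)
)

# One row holding the number of missing values in each column.
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```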
Displaying the dimensions of the dataset.
```{r}
dim(df)
```
Printing all columns
```{r}
colnames(df)
```
Printing the number of unique years
```{r}
# Count the distinct values in the Year column
unique_years <- n_distinct(df$Year)
unique_years
```
The dataset covers 58 unique years, from 1961 to 2018.
Printing the number of unique areas
```{r}
# Count the distinct values in the Area column
unique_areas <- n_distinct(df$Area)
unique_areas
```
The dataset covers 245 unique areas.
Printing the number of unique domains (e.g., Livestock Primary).
All of the data come from a single domain.
```{r}
# Count the distinct values in the Domain column
unique_domains <- n_distinct(df$Domain)
unique_domains
```
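The three counts above can also be obtained in one step. As a sketch on a made-up tibble (the rows are illustrative assumptions, not real FAOSTAT records), `across()` applies `n_distinct()` to every column, which also makes constant columns easy to spot.

```r
library(dplyr)
library(tibble)

# Made-up rows: one domain, two areas, two years.
toy <- tibble(
  Domain = c("Livestock", "Livestock", "Livestock", "Livestock"),
  Area   = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  Year   = c(1961, 1962, 1961, 1962)
)

# One row holding the number of distinct values in each column.
distinct_counts <- toy %>%
  summarise(across(everything(), n_distinct))

distinct_counts
```

A column whose count is 1, like `Domain` in this toy example, carries no information that distinguishes rows and can usually be dropped before further analysis.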