library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
FNU Avinesh Krishnan
May 13, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
Read the data from the csv file.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
The data come from the FAOSTAT "Livestock Primary" domain (domain code QL), so they were most likely compiled by the Food and Agriculture Organization (FAO). The file has 38,170 rows and 14 columns and covers 245 areas over the years 1961 to 2018. Each row is one value for one area, element, and year. The item code is 1062 (eggs) in every row, and the elements are laying, yield, and production, each recorded in its own unit. The variables are Domain Code, Domain, Area Code, Area, Element Code, Element, Item Code, Item, Year Code, Year, Unit, Value, Flag, and Flag Description. The two flag columns annotate each value; one of the flag descriptions is "Official data".
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1961 1961 1000…
2 QL Lives… 2 Afgh… 5410 Yield 1062 Eggs… 1961 1961 100m…
3 QL Lives… 2 Afgh… 5510 Produc… 1062 Eggs… 1961 1961 tonn…
4 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1962 1962 1000…
5 QL Lives… 2 Afgh… 5410 Yield 1062 Eggs… 1962 1962 100m…
6 QL Lives… 2 Afgh… 5510 Produc… 1062 Eggs… 1962 1962 tonn…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
Look at the dataset’s first six rows to get a sense of the type of data that is present.
`dim()` gives the number of rows (observations) and columns (variables): 38,170 rows and 14 columns.
[1] "Domain Code" "Domain" "Area Code" "Area"
[5] "Element Code" "Element" "Item Code" "Item"
[9] "Year Code" "Year" "Unit" "Value"
[13] "Flag" "Flag Description"
These are the dataset's column names.
# A tibble: 58 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1961 1961
2 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1962 1962
3 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1963 1963
4 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1964 1964
5 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1965 1965
6 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1966 1966
7 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1967 1967
8 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1968 1968
9 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1969 1969
10 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1970 1970
# … with 48 more rows, 4 more variables: Unit <chr>, Value <dbl>, Flag <chr>,
# `Flag Description` <chr>, and abbreviated variable names ¹`Domain Code`,
# ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
This filters the data to Afghanistan and the Laying element, which returns 58 rows (consistent with one row per year from 1961 to 2018).
There are 245 distinct areas.
# A tibble: 245 × 1
Area
<chr>
1 Afghanistan
2 Albania
3 Algeria
4 American Samoa
5 Angola
6 Antigua and Barbuda
7 Argentina
8 Armenia
9 Australia
10 Austria
# … with 235 more rows
The output above lists the distinct area names.
# A tibble: 1 × 1
n
<int>
1 7548
There are 7,548 rows whose Flag Description is "Official data".
Domain Code Domain Area Code Area
Length:38170 Length:38170 Min. : 1.0 Length:38170
Class :character Class :character 1st Qu.: 70.0 Class :character
Mode :character Mode :character Median : 143.0 Mode :character
Mean : 771.1
3rd Qu.: 215.0
Max. :5504.0
Element Code Element Item Code Item
Min. :5313 Length:38170 Min. :1062 Length:38170
1st Qu.:5313 Class :character 1st Qu.:1062 Class :character
Median :5410 Mode :character Median :1062 Mode :character
Mean :5411 Mean :1062
3rd Qu.:5510 3rd Qu.:1062
Max. :5510 Max. :1062
Year Code Year Unit Value
Min. :1961 Min. :1961 Length:38170 Min. : 1
1st Qu.:1976 1st Qu.:1976 Class :character 1st Qu.: 2600
Median :1991 Median :1991 Mode :character Median : 31996
Mean :1990 Mean :1990 Mean : 291341
3rd Qu.:2005 3rd Qu.:2005 3rd Qu.: 93836
Max. :2018 Max. :2018 Max. :76769955
NA's :40
Flag Flag Description
Length:38170 Length:38170
Class :character Class :character
Mode :character Mode :character
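`summary()` gives a brief overview of every column. Value has 40 missing entries, and its range (1 to 76,769,955) spans all three elements and their different units.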
---
title: "Challenge 1"
author: "FNU Avinesh Krishnan"
description: "Reading in data and creating a post"
date: "05/13/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_1
- FNU Avinesh Krishnan
- wild_bird_data
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
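```{r}
# readxl is only needed for the .xls/.xlsx options; read_csv() comes from readr (tidyverse)
library(readxl)
# The object keeps the name wild_birds, but the file holds FAOSTAT egg (chicken) data
wild_birds <- read_csv("~/Desktop/601_Spring_2023/posts/_data/FAOstat_egg_chicken.csv")
# view() opens the interactive viewer; it has no effect when the document is rendered
view(wild_birds)
```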
Read the data from the csv file.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
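As a minimal sketch of the read step, the block below writes a tiny made-up CSV to a temporary file and reads it back with `readr::read_csv()`. The file contents and the names `csv_path` and `toy` are invented for illustration; only the column names come from the real data. Passing `col_types` is optional, but it states the expected type of each column instead of relying on guessing.

```r
library(readr)

# Hypothetical toy file: column names mirror the real data, values are made up
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Area,Element,Year,Value,Flag Description",
  "Area A,Laying,1961,10,Official data",
  "Area A,Yield,1961,20,Official data",
  "Area B,Laying,1961,,Hypothetical flag"
), csv_path)

# Declare column types up front rather than letting readr guess them
toy <- read_csv(
  csv_path,
  col_types = cols(
    Area = col_character(),
    Element = col_character(),
    Year = col_double(),
    Value = col_double(),
    `Flag Description` = col_character()
  )
)

dim(toy)       # rows first, then columns
colnames(toy)  # names are kept as written, including the space
```

An empty field in a numeric column, like the last `Value` here, is read as `NA`.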
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
The data come from the FAOSTAT "Livestock Primary" domain (domain code QL), so they were most likely compiled by the Food and Agriculture Organization (FAO). The file has 38,170 rows and 14 columns and covers 245 areas over the years 1961 to 2018. Each row is one value for one area, element, and year. The item code is 1062 (eggs) in every row, and the elements are laying, yield, and production, each recorded in its own unit. The variables are Domain Code, Domain, Area Code, Area, Element Code, Element, Item Code, Item, Year Code, Year, Unit, Value, Flag, and Flag Description. The two flag columns annotate each value; one of the flag descriptions is "Official data".
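The prompt asks what a case is. One way to test a candidate definition, assuming here that a case is one `Area`, `Element`, `Year` combination (an assumption to be checked, not something the file states), is to count rows per combination and look for repeats. The sketch uses made-up rows in a hypothetical `toy` tibble.

```r
library(dplyr)

# Made-up rows; column names follow the real data
toy <- tibble::tibble(
  Area = c("Area A", "Area A", "Area B", "Area B"),
  Element = c("Laying", "Yield", "Laying", "Yield"),
  Year = c(1961, 1961, 1961, 1961),
  Value = c(10, 20, 30, 40)
)

# Count rows per candidate case and keep any combination seen more than once
dupes <- toy %>%
  count(Area, Element, Year) %>%
  filter(n > 1)

nrow(dupes)  # zero rows would mean every combination is unique
```

Running the same pipeline on the real data would confirm or reject the assumed case definition.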
```{r}
#| label: summary
head(wild_birds)
```
Look at the dataset's first six rows to get a sense of the type of data that is present.
```{r}
dim(wild_birds)
```
`dim()` gives the number of rows (observations) and columns (variables): 38,170 rows and 14 columns.
```{r}
colnames(wild_birds)
```
These are the dataset's column names.
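Several of these names contain spaces, so they have to be wrapped in backticks whenever they are used in code. A small sketch on a made-up tibble shows this, along with one optional way to switch to names without spaces. The `toy` object, its values, and the new names are all invented for illustration.

```r
library(dplyr)

# Made-up values; only the column names come from the real data
toy <- tibble::tibble(
  `Area Code` = c(1, 2),
  `Flag Description` = c("Official data", "Hypothetical flag")
)

# Names with spaces must be quoted with backticks
codes <- select(toy, `Area Code`)

# rename(new = old) replaces them with names that need no quoting
toy_clean <- rename(
  toy,
  area_code = `Area Code`,
  flag_description = `Flag Description`
)

colnames(toy_clean)
```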
```{r}
filter(wild_birds, `Area`=="Afghanistan" & `Element`=="Laying")
```
This filters the data to Afghanistan and the Laying element, which returns 58 rows (consistent with one row per year from 1961 to 2018).
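The filter above joins two conditions with `&`. An equivalent form, sketched here on made-up rows, passes the conditions as separate arguments, which `filter()` also combines with AND. The `toy` tibble and its values are invented.

```r
library(dplyr)

# Made-up rows; column names follow the real data
toy <- tibble::tibble(
  Area = c("Area A", "Area A", "Area B", "Area B"),
  Element = c("Laying", "Yield", "Laying", "Yield"),
  Year = c(1961, 1961, 1961, 1961),
  Value = c(10, 20, 30, 40)
)

# Both calls keep rows where the two conditions hold at once
with_and <- filter(toy, Area == "Area A" & Element == "Laying")
with_comma <- filter(toy, Area == "Area A", Element == "Laying")

all.equal(with_and, with_comma)
```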
```{r}
wild_birds %>%
  select(`Area`) %>%
  n_distinct(.)
```
There are 245 distinct areas.
```{r}
wild_birds %>%
  select(`Area`) %>%
  distinct(.)
```
The output above lists the distinct area names.
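The two chunks above select the column first and then count or list its unique values. A shorter variant, sketched on a made-up `toy` tibble, passes the column straight to `n_distinct()` and `distinct()`.

```r
library(dplyr)

# Made-up rows; "Area A" appears twice on purpose
toy <- tibble::tibble(
  Area = c("Area A", "Area A", "Area B"),
  Element = c("Laying", "Yield", "Laying")
)

# Number of unique values in one column
n_areas <- n_distinct(toy$Area)

# One-column tibble holding each unique value once
areas <- distinct(toy, Area)

n_areas
areas
```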
```{r}
count_flag <- wild_birds %>%
  filter(`Flag Description` == "Official data") %>%
  count()
count_flag
```
There are 7,548 rows whose `Flag Description` is "Official data".
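To see how the rows split across every flag description rather than a single one, `count()` can be given the column directly. This is a sketch on made-up values; "Hypothetical flag" is an invented label, not one taken from the data.

```r
library(dplyr)

# Made-up flag values for illustration
toy <- tibble::tibble(
  `Flag Description` = c("Official data", "Official data", "Hypothetical flag")
)

# One row per distinct flag description, most frequent first
flag_counts <- count(toy, `Flag Description`, sort = TRUE)

flag_counts
```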
```{r}
summary(wild_birds)
```
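`summary()` gives a brief overview of every column. `Value` has 40 missing entries, and its range (1 to 76,769,955) spans all three elements and their different units.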
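Because `summary()` pools `Value` across elements that are recorded in different units, its overall mean and quartiles mix quantities that are not comparable. One possible follow-up, sketched on made-up numbers with hypothetical unit labels (`unit_a`, `unit_b`), summarises `Value` within each `Element` instead.

```r
library(dplyr)

# Made-up values; the unit labels are placeholders, not the real units
toy <- tibble::tibble(
  Element = c("Laying", "Laying", "Yield", "Yield"),
  Unit = c("unit_a", "unit_a", "unit_b", "unit_b"),
  Value = c(10, 30, 200, NA)
)

# Summarise Value separately for each element and unit
by_element <- toy %>%
  group_by(Element, Unit) %>%
  summarise(
    n = n(),
    n_missing = sum(is.na(Value)),
    mean_value = mean(Value, na.rm = TRUE),
    .groups = "drop"
  )

by_element
```

Grouping by `Unit` as well as `Element` also reveals whether any element is recorded in more than one unit.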