Abhinav Reddy Yadatha
February 26, 2023
---
title: "Challenge 1"
author: "Abhinav Reddy Yadatha"
description: "Reading in data and creating a post"
date: "02/26/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - my name
  - dataset
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
```{r}
dataframe <- read_csv('_data/birds.csv', show_col_types = FALSE)
head(dataframe)
```
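`read_csv()` guesses each column's type from the data. Declaring the types explicitly keeps the import stable if the file changes. Below is a minimal sketch of that approach. To stay self-contained it writes a tiny hypothetical CSV (made-up values, not FAOSTAT figures) to a temporary file instead of reading `_data/birds.csv`. The three-column layout is an assumption for illustration.

```r
library(readr)

# Write a tiny, made-up CSV (hypothetical values) to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Area,Item,Year,Value",
  "Afghanistan,Chickens,1961,100",
  "Afghanistan,Ducks,1961,"
), path)

# Declare the expected types; any column not listed is read as character
birds_toy <- read_csv(
  path,
  col_types = cols(
    Year = col_integer(),
    Value = col_double(),
    .default = col_character()
  )
)

birds_toy
```

The empty `Value` field in the second row is read as `NA`. On the real file, names that contain spaces, such as `Area Code`, need backticks inside `cols()`.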
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
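Description: The dataset 'birds.csv' appears to be an extract from FAOSTAT, the statistical database of the UN Food and Agriculture Organization (FAO). Its `Domain` is "Live Animals", its `Element` is "Stocks", and its flags distinguish official data, unofficial figures, FAO estimates and imputed values. It records the number of live poultry (chickens, ducks, geese and guinea fowls, turkeys, and pigeons/other birds) in units of 1000 head. The data covers 248 areas (individual countries as well as regional aggregates such as Africa, Asia and Europe) and runs annually from 1961 to 2018. Each row is one area, one bird type and one year, and the `Value` column holds the stock count.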
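A plausible reading of the case structure is that each row is one area, one bird type and one year. If those three columns identify every row, counting their combinations should find no duplicates. Below is a minimal sketch of that check on a small hypothetical tibble with made-up values. On the real data, the same pipeline starting from `dataframe` should also return zero rows.

```r
library(dplyr)

# Hypothetical toy rows mimicking a few birds.csv columns (values are made up)
birds_toy <- tibble(
  Area  = c("Afghanistan", "Afghanistan", "Afghanistan", "Egypt"),
  Item  = c("Chickens", "Chickens", "Ducks", "Chickens"),
  Year  = c(1961, 1962, 1961, 1961),
  Unit  = "1000 Head",
  Value = c(100, 110, NA, 250)
)

# Any Area x Item x Year combination appearing more than once
dup_cases <- birds_toy %>%
  count(Area, Item, Year) %>%
  filter(n > 1)

nrow(dup_cases)  # 0: the three keys identify each case
```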
```{r}
# Display the first few rows.
head(dataframe)
```
Displaying the summary of the dataset.
```{r}
#| label: summary
library(summarytools)
dfSummary(dataframe)
```
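`dfSummary()` already reports valid and missing counts per column. When only the missing counts are needed, `colSums(is.na(...))` is a quick alternative. Below is a minimal sketch on a small hypothetical tibble with made-up values; on the real data, pass `dataframe` instead.

In the summary output, the number of missing `Flag` values (10773) equals the number of rows described as "Official data". That suggests a blank flag marks an official figure rather than lost information, but treat this as an inference to verify.

```r
library(dplyr)

# Hypothetical toy data with gaps in Value and Flag (values are made up)
birds_toy <- tibble(
  Item  = c("Chickens", "Ducks", "Turkeys", "Chickens"),
  Value = c(100, NA, 40, 120),
  Flag  = c(NA, "F", "*", NA)
)

# Number of NA cells in each column
na_counts <- colSums(is.na(birds_toy))
na_counts
```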
Checking the dimensions of the dataset:
```{r}
dim(dataframe)
```
It can be observed that there are 30977 rows and 14 columns.
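The row count can be cross-checked against the panel structure. With 5 bird types and 58 years, an area that reports every type in every year contributes 5 × 58 = 290 rows, and `dfSummary()` lists several areas (e.g. Africa, Egypt, France) with exactly 290. Since 248 × 290 = 71,920 is far more than 30977, many areas must report only some bird types or years.

The sketch below uses a hypothetical two-area tibble to flag which areas are fully reported. On the real data, replace `birds_toy` with `dataframe`.

```r
library(dplyr)

# Hypothetical toy panel: 2 bird types x 3 years; area "B" reports only chickens
birds_toy <- tibble(
  Area = c(rep("A", 6), rep("B", 3)),
  Item = c(rep(c("Chickens", "Ducks"), each = 3), rep("Chickens", 3)),
  Year = c(rep(1961:1963, 2), 1961:1963)
)

# Rows a fully reported area would have: every item in every year
max_rows <- n_distinct(birds_toy$Item) * n_distinct(birds_toy$Year)

# Rows per area, and whether the area reports every item-year combination
coverage <- birds_toy %>%
  count(Area) %>%
  mutate(complete = n == max_rows)
coverage
```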
Displaying the column names of the dataset:
```{r}
colnames(dataframe)
# Lists the 14 column names; several contain spaces and need backticks in code.
```
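`dfSummary()` reports that `Domain Code`, `Domain`, `Element Code`, `Element` and `Unit` each hold a single value, and that `Year Code` has the same statistics as `Year`. Columns like these carry no information between rows and can usually be dropped before analysis.

Below is a minimal sketch of detecting them programmatically. It uses a hypothetical tibble that borrows some of the birds.csv column names, with made-up values.

```r
library(dplyr)

# Hypothetical toy rows using some birds.csv column names (values are made up)
birds_toy <- tibble(
  `Domain Code` = "QA",
  Domain        = "Live Animals",
  Element       = "Stocks",
  `Year Code`   = c(1961, 1962, 1963),
  Year          = c(1961, 1962, 1963),
  Unit          = "1000 Head",
  Item          = c("Chickens", "Ducks", "Chickens")
)

# Columns holding a single distinct value
constant_cols <- names(birds_toy)[sapply(birds_toy, n_distinct) == 1]
constant_cols

# Check whether Year Code is an exact copy of Year before dropping it
identical(birds_toy$`Year Code`, birds_toy$Year)
```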
Number of unique years:
```{r}
unique_years <- n_distinct(dataframe$Year)
unique_years
```
The dataset covers 58 distinct years, from 1961 to 2018. Since 2018 - 1961 + 1 = 58, every year in that span is present.
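Below is a sketch for listing any years missing between the first and last year. It uses a hypothetical `Year` column with 1963 deliberately left out. On the real data, use `dataframe$Year`; an empty result would confirm that the span is complete.

```r
library(dplyr)

# Hypothetical toy Year column with 1963 deliberately missing
birds_toy <- tibble(Year = c(1961, 1962, 1964, 1964, 1965))

# First and last year present
year_span <- range(birds_toy$Year)

# Years inside that span that never appear in the data
missing_years <- setdiff(seq(year_span[1], year_span[2]), birds_toy$Year)

year_span      # 1961 1965
missing_years  # 1963
```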
Number of unique bird types:
```{r}
unique_birds <- n_distinct(dataframe$Item)
unique_birds
```
The dataset covers 5 types of birds: chickens, ducks, geese and guinea fowls, turkeys, and pigeons/other birds.
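Once the number of bird types is known, the next question is how many rows each one contributes. Below is a minimal sketch using `count(..., sort = TRUE)` on a hypothetical `Item` column with made-up values. On the real data, `count(dataframe, Item, sort = TRUE)` gives the per-type row counts.

Note that row counts reflect how many area-year combinations report each bird, not how many birds there are. The bird numbers themselves are in `Value`.

```r
library(dplyr)

# Hypothetical toy Item column (not the real frequencies)
birds_toy <- tibble(
  Item = c("Chickens", "Ducks", "Chickens", "Turkeys", "Chickens", "Ducks")
)

# Rows per bird type, most frequent first
item_counts <- birds_toy %>% count(Item, sort = TRUE)
item_counts
```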
Number of unique areas / countries:
```{r}
unique_areas <- n_distinct(dataframe$Area)
unique_areas
```
The dataset covers 248 distinct areas. These are not all countries: the list also includes regional aggregates such as Africa, Asia, Europe and South-eastern Asia.
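Regional aggregates overlap the countries inside them, so summing `Value` over all 248 areas would count the same birds more than once. One way to separate the two relies on an assumed FAOSTAT convention: aggregate regions use area codes of 5000 and above. Verify this against the data; the summary shows `Area Code` ranging from 1 to 5504.

Below is a sketch of that split on a hypothetical tibble with illustrative codes.

```r
library(dplyr)

# Hypothetical toy areas with illustrative codes; treating codes >= 5000
# as regional aggregates is an ASSUMED FAOSTAT convention, not verified here
birds_toy <- tibble(
  `Area Code` = c(2, 59, 68, 5100, 5300, 5400),
  Area        = c("Afghanistan", "Egypt", "France", "Africa", "Asia", "Europe")
)

# One row per area, flagged as aggregate or individual country
areas <- birds_toy %>%
  distinct(`Area Code`, Area) %>%
  mutate(is_aggregate = `Area Code` >= 5000)

# How many areas fall on each side of the split
count(areas, is_aggregate)
```

Filtering to `!is_aggregate` before summing `Value` would avoid the double counting described above.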