library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Jaswanth Reddy Kommuru
May 8, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
I read the CSV file twice: first in full, then again with skip = 4, which skips the first 4 lines of the file (the header line included).
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
The data describes live animal stocks, specifically birds such as chickens, by area and year. Each row (case) is one combination of area, item (type of bird), and year, with the stock size recorded in Value. The Flag and Flag Description columns record how each value was obtained (for example “FAO estimate” or “Official data”), which suggests the data was compiled by the FAO from official national figures supplemented by its own estimates. The dataset has 30,977 rows and 14 variables and covers the years 1961 to 2018.
# A tibble: 6 × 14
`Domain Code` Domain `Area Code` Area `Element Code` Element `Item Code`
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 QA Live Anima… 2 Afgh… 5112 Stocks 1057
2 QA Live Anima… 2 Afgh… 5112 Stocks 1057
3 QA Live Anima… 2 Afgh… 5112 Stocks 1057
4 QA Live Anima… 2 Afgh… 5112 Stocks 1057
5 QA Live Anima… 2 Afgh… 5112 Stocks 1057
6 QA Live Anima… 2 Afgh… 5112 Stocks 1057
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
# Value <dbl>, Flag <chr>, `Flag Description` <chr>
A look at the first 6 rows gives an idea of what kind of data is available.
The dimensions of the dataset: 30,977 rows and 14 columns.
[1] "Domain Code" "Domain" "Area Code" "Area"
[5] "Element Code" "Element" "Item Code" "Item"
[9] "Year Code" "Year" "Unit" "Value"
[13] "Flag" "Flag Description"
The column names of the dataset.
# A tibble: 5 × 14
`Domain Code` Domain `Area Code` Area `Element Code` Element `Item Code`
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 QA Live Anima… 5100 Afri… 5112 Stocks 1057
2 QA Live Anima… 5100 Afri… 5112 Stocks 1068
3 QA Live Anima… 5100 Afri… 5112 Stocks 1072
4 QA Live Anima… 5100 Afri… 5112 Stocks 1083
5 QA Live Anima… 5100 Afri… 5112 Stocks 1079
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
# Value <dbl>, Flag <chr>, `Flag Description` <chr>
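The rows for the year 1968 where the area is Africa. Africa is a regional aggregate rather than a single country, and there is one row for each of the five item codes recorded for that area and year.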
# A tibble: 248 × 1
Area
<chr>
1 Afghanistan
2 Albania
3 Algeria
4 American Samoa
5 Angola
6 Antigua and Barbuda
7 Argentina
8 Armenia
9 Aruba
10 Australia
# ℹ 238 more rows
The distinct area names in the dataset. The list mixes individual countries with regional aggregates such as Africa.
The number of distinct area names (248).
# A tibble: 30,977 × 9
Domain Area Element Item Year Unit Value Flag `Flag Description`
<chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
1 Live Animals Afghan… Stocks Chic… 1961 1000… 4700 F FAO estimate
2 Live Animals Afghan… Stocks Chic… 1962 1000… 4900 F FAO estimate
3 Live Animals Afghan… Stocks Chic… 1963 1000… 5000 F FAO estimate
4 Live Animals Afghan… Stocks Chic… 1964 1000… 5300 F FAO estimate
5 Live Animals Afghan… Stocks Chic… 1965 1000… 5500 F FAO estimate
6 Live Animals Afghan… Stocks Chic… 1966 1000… 5800 F FAO estimate
7 Live Animals Afghan… Stocks Chic… 1967 1000… 6600 F FAO estimate
8 Live Animals Afghan… Stocks Chic… 1968 1000… 6290 <NA> Official data
9 Live Animals Afghan… Stocks Chic… 1969 1000… 6300 F FAO estimate
10 Live Animals Afghan… Stocks Chic… 1970 1000… 6000 F FAO estimate
# ℹ 30,967 more rows
The dataset without the columns whose names contain the word “Code”. The remaining columns are easier to read and can be used to group the data.
[1] "Value" "Flag"
The columns that have at least one NA value.
Domain Code Domain Area Code Area
Length:30977 Length:30977 Min. : 1 Length:30977
Class :character Class :character 1st Qu.: 79 Class :character
Mode :character Mode :character Median : 156 Mode :character
Mean :1202
3rd Qu.: 231
Max. :5504
Element Code Element Item Code Item
Min. :5112 Length:30977 Min. :1057 Length:30977
1st Qu.:5112 Class :character 1st Qu.:1057 Class :character
Median :5112 Mode :character Median :1068 Mode :character
Mean :5112 Mean :1066
3rd Qu.:5112 3rd Qu.:1072
Max. :5112 Max. :1083
Year Code Year Unit Value
Min. :1961 Min. :1961 Length:30977 Min. : 0
1st Qu.:1976 1st Qu.:1976 Class :character 1st Qu.: 171
Median :1992 Median :1992 Mode :character Median : 1800
Mean :1991 Mean :1991 Mean : 99411
3rd Qu.:2005 3rd Qu.:2005 3rd Qu.: 15404
Max. :2018 Max. :2018 Max. :23707134
NA's :1036
Flag Flag Description
Length:30977 Length:30977
Class :character Class :character
Mode :character Mode :character
A brief summary of every column. The data covers 1961 to 2018, and Value has 1,036 missing entries.
---
title: "Challenge 1"
author: "Jaswanth Reddy Kommuru"
description: "Reading in data and creating a post"
date: "05/08/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - Jaswanth Reddy Kommuru
  - birds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
```{r}
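library(readxl) # only needed for the Excel data sets; read_csv() comes from readr
# Read the whole file; the first line supplies the column names.
birds <- read_csv("~/Documents/601/601_Spring_2023/posts/_data/birds.csv")
# Read it again, skipping the first 4 lines of the file (header line included).
birds_skip <- read_csv("~/Documents/601/601_Spring_2023/posts/_data/birds.csv", skip = 4)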
```
I read the CSV file twice: first in full, then again with `skip = 4`, which skips the first 4 lines of the file (the header line included).
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
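To see what `skip = 4` does in practice, here is a minimal sketch on a toy CSV. The `toy_csv` string and its values are invented for illustration and are not taken from birds.csv; the literal-data wrapper `I()` assumes readr 2.0 or later. Because `read_csv()` counts skipped lines from the top of the file, the header line is among those skipped and the fifth line is promoted to column names.

```r
library(readr)

# Hypothetical toy file: one header line followed by five data rows.
toy_csv <- I("Area,Year,Value\nA,1961,10\nB,1961,20\nC,1961,30\nD,1961,40\nE,1961,50\n")

# Full read: the first line supplies the column names.
full <- read_csv(toy_csv, show_col_types = FALSE)

# skip = 4 drops the header and the first three data rows,
# so the line "D,1961,40" is treated as the header.
skipped <- read_csv(toy_csv, skip = 4, show_col_types = FALSE)

names(full)
nrow(full)
names(skipped)
nrow(skipped)
```

For a file like this one, which has no leading metadata lines, the full read is the one to keep; `skip` is mainly useful when a file starts with notes or titles above the real header.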
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
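The data describes live animal stocks, specifically birds such as chickens, by area and year.
Each row (case) is one combination of area, item (type of bird), and year, with the stock size recorded in `Value`. The `Flag` and `Flag Description` columns record how each value was obtained (for example "FAO estimate" or "Official data"), which suggests the data was compiled by the FAO from official national figures supplemented by its own estimates.
The dataset has 30,977 rows and 14 variables and covers the years 1961 to 2018.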
```{r}
#| label: summary
head(birds)
```
A look at the first 6 rows gives an idea of what kind of data is available.
```{r}
dim(birds)
```
The dimensions of the dataset: 30,977 rows and 14 columns.
```{r}
colnames(birds)
```
The column names of the dataset.
```{r}
filter(birds, `Year`==1968 & `Area`=="Africa")
```
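The rows for the year 1968 where the area is Africa. Africa is a regional aggregate rather than a single country, and there is one row for each of the five item codes recorded for that area and year.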
```{r}
birds %>%
  distinct(Area)
```
The distinct area names in the dataset. The list mixes individual countries with regional aggregates such as Africa.
```{r}
n_distinct(birds$Area)
```
The number of distinct area names (248).
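Since the area list mixes countries with regional aggregates, it can help to count the two separately. The sketch below runs on a hypothetical `toy_birds` tibble and rests on an assumption the source does not confirm: that aggregates are the areas with an `Area Code` of 5000 or more, as Africa's code of 5100 suggests. The cutoff should be checked against the real data before relying on it.

```r
library(dplyr)

# Hypothetical toy rows mimicking the birds columns. The codes 2 and 5100
# appear in the printed output; the Albania row is purely illustrative.
toy_birds <- tibble(
  `Area Code` = c(2, 2, 3, 5100, 5100),
  Area        = c("Afghanistan", "Afghanistan", "Albania", "Africa", "Africa"),
  Year        = c(1961, 1962, 1961, 1961, 1962)
)

# Assumption: codes >= 5000 mark regional aggregates rather than countries.
areas <- toy_birds %>%
  distinct(`Area Code`, Area) %>%
  mutate(is_aggregate = `Area Code` >= 5000)

n_countries  <- sum(!areas$is_aggregate)
n_aggregates <- sum(areas$is_aggregate)

areas
```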
```{r}
birds.sm <- birds %>%
  select(-contains("Code"))
birds.sm
```
The dataset without the columns whose names contain the word "Code". The remaining columns are easier to read and can be used to group the data.
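As an illustration of grouping on one of the remaining columns, here is a minimal sketch that summarises `Value` per `Item`. The `toy_sm` tibble is hypothetical: its values are invented and the item label "Ducks" is an assumption, so the numbers say nothing about the real data.

```r
library(dplyr)

# Hypothetical stand-in for birds.sm with invented values.
toy_sm <- tibble(
  Area  = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  Item  = c("Chickens", "Ducks", "Chickens", "Ducks"),
  Year  = c(1961, 1961, 1961, 1961),
  Value = c(4700, NA, 1580, 270)
)

# One row per item: number of rows, missing values, and total stock.
by_item <- toy_sm %>%
  group_by(Item) %>%
  summarise(
    n_rows      = n(),
    n_missing   = sum(is.na(Value)),
    total_value = sum(Value, na.rm = TRUE),
    .groups     = "drop"
  )

by_item
```

Applied to the real data, totals like this would double count unless regional aggregates are filtered out first, because an aggregate's rows presumably repeat the stocks of the countries it contains.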
```{r}
column_with_na <- birds %>%
  select(where(~ any(is.na(.x)))) %>%
  names()
column_with_na
```
The columns that have at least one NA value.
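Knowing how many values are missing is often more useful than knowing only which columns are affected. A minimal sketch using base R's `colSums()` on a hypothetical `toy` tibble (all values invented):

```r
library(dplyr)

# Hypothetical toy data with missing entries in two columns.
toy <- tibble(
  Area  = c("A", "B", "C"),
  Value = c(10, NA, NA),
  Flag  = c("F", NA, "F")
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(toy))
na_counts

# Names of the columns with at least one NA.
cols_with_na <- names(na_counts)[na_counts > 0]
cols_with_na
```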
```{r}
summary(birds)
```
A brief summary of every column. The data covers 1961 to 2018, and `Value` has 1,036 missing entries.