Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Connor Skowyra
August 16, 2022
library(readxl) StateCounty2012.csv <- read_excel(“posts/_data/StateCounty2012.xls”)
library(readr) Birds.csv <- read_csv(“posts/_data/Birds.csv”)
The data for all sets were either created in Excel or CSV. To understand the data sets, first, we must load the data to be analyzed.
library(tidyverse) load(Birds.csv)
We can create a summary to tell us the min and max as well as mean and median for the data.
An example is: summary(Birds.csv)
To find out how many columns and rows are in the dataset, we can use the dim() code to help find this answer.
An example is dim(Birds.csv)
On the left side of the answer is the amount of rows, which is 30,977, while on the right is the amount of columns, which is 14.
Conduct some exploratory data analysis, using dplyr commands such as group_by()
, select()
, filter()
, and summarise()
. Find the central tendency (mean, median, mode) and dispersion (standard deviation, mix/max/quantile) for different subgroups within the data set.
I decided to research the amounts of chickens in Poland by year. The reason I decided on this research is that I am Polish, I found it interesting to see the amount of chickens in Poland per year and how the values changes. Using Filters, Arrange, and Select functions, I manipulated the data to show me the only important columns to this project which are the area, item, year, and value.
filter(Birds.csv, Item
== ‘Chickens’ & Area
== ‘Poland’) %>% arrange(Year
) %>% select(Area,Item,Year,Value)
In the data, the value of chickens begin to increase on a yearly basis. The main reasoning for this fact is the modernization of Poland. The modernization of Poland comes with improved ways to take care of chickens, more people to feed due to a higher population, and higher GDP capita per person due to an economic boom through the years after Europe is rebuilt after World War II.
Finally, we need to summarize the data to be easily shared and discussed between groups of people with the intent of making decisions around the chicken population of Poland.
Birds %>% filter(
Item
== ‘Chickens’ &Area
== ‘Poland’) %>% group_by(Area) %>% select(Area,Item,Year,Value) %>% summarise(mean_value = mean(Value, na.rm=TRUE), median_value = median(Value, na.rm=TRUE))
Birds %>% filter(
Item
== ‘Chickens’ &Area
== ‘Poland’) %>% group_by(Area) %>% select(Area,Item,Year,Value) %>% summarise(min_value = min(Value, na.rm=TRUE), max_value = max(Value, na.rm=TRUE), sd_value= sd(Value, na.rm=TRUE), IQR_value = IQR(Value, na.rm = TRUE))
Using this code, we found the mean to be 85,845 chickens and median of 72.520 chickens in Poland for the last 50 years. Now, let’s compare the mean and median to a similar Eastern European country, Ukraine.
Birds %>% filter(Item
== ‘Chickens’ & Area
== ‘Ukraine’) %>% group_by(Area) %>% select(Area,Item,Year,Value) %>% summarise(mean_value = mean(Value, na.rm=TRUE), median_value = median(Value, na.rm=TRUE))
To find the difference between Poland and Ukraine’s Chicken, I subtracted the means and medians to give a final statement, Ukraine has 67,548 chickens more than that Poland on average over the last 50 years.
---
title: "Challenge 2"
author: "Connor Skowyra"
desription: "Data wrangling: using group() and summarise()"
date: "08/16/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_2
- railroads
- faostat
- hotel_bookings
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Reading State County 2012
> library(readxl)
> StateCounty2012.csv <- read_excel("posts/_data/StateCounty2012.xls")
## Reading Birds
> library(readr)
> Birds.csv <- read_csv("posts/_data/Birds.csv")
## Connor's Answer to Reading the data
The data for all sets were either created in Excel or CSV. To understand the data sets, first, we must load the data to be analyzed.
> library(tidyverse)
> load(Birds.csv)
We can create a summary to tell us the min and max as well as mean and median for the data.
An example is: summary(Birds.csv)
To find out how many columns and rows are in the dataset, we can use the dim() code to help find this answer.
An example is dim(Birds.csv)
On the left side of the answer is the amount of rows, which is 30,977, while on the right is the amount of columns, which is 14.
## Provide Grouped Summary Statistics
Conduct some exploratory data analysis, using dplyr commands such as `group_by()`, `select()`, `filter()`, and `summarise()`. Find the central tendency (mean, median, mode) and dispersion (standard deviation, mix/max/quantile) for different subgroups within the data set.
## Exploratory Data Analysis
I decided to research the amounts of chickens in Poland by year. The reason I decided on this research is that I am Polish, I found it interesting to see the amount of chickens in Poland per year and how the values changes. Using Filters, Arrange, and Select functions, I manipulated the data to show me the only important columns to this project which are the area, item, year, and value.
filter(Birds.csv, `Item` == 'Chickens' & `Area` == 'Poland') %>% arrange(`Year`) %>% select(Area,Item,Year,Value)
In the data, the value of chickens begin to increase on a yearly basis. The main reasoning for this fact is the modernization of Poland. The modernization of Poland comes with improved ways to take care of chickens, more people to feed due to a higher population, and higher GDP capita per person due to an economic boom through the years after Europe is rebuilt after World War II.
Finally, we need to summarize the data to be easily shared and discussed between groups of people with the intent of making decisions around the chicken population of Poland.
## Mean and Median
> Birds %>% filter(`Item` == 'Chickens' & `Area` == 'Poland') %>% group_by(Area) %>% select(Area,Item,Year,Value) %>% summarise(mean_value = mean(Value, na.rm=TRUE), median_value = median(Value, na.rm=TRUE))
## Min/Max, SD, IQR
> Birds %>% filter(`Item` == 'Chickens' & `Area` == 'Poland') %>% group_by(Area) %>% select(Area,Item,Year,Value) %>% summarise(min_value = min(Value, na.rm=TRUE), max_value = max(Value, na.rm=TRUE), sd_value= sd(Value, na.rm=TRUE), IQR_value = IQR(Value, na.rm = TRUE))
Using this code, we found the mean to be 85,845 chickens and median of 72.520 chickens in Poland for the last 50 years. Now, let's compare the mean and median to a similar Eastern European country, Ukraine.
Birds %>% filter(`Item` == 'Chickens' & `Area` == 'Ukraine') %>% group_by(Area) %>% select(Area,Item,Year,Value) %>% summarise(mean_value = mean(Value, na.rm=TRUE), median_value = median(Value, na.rm=TRUE))
To find the difference between Poland and Ukraine's Chicken, I subtracted the means and medians to give a final statement, Ukraine has 67,548 chickens more than that Poland on average over the last 50 years.