Data Import - With eggs_tidy.xlsx (clean dataset)
For Homework 2, I used eggs_tidy.xlsx (a clean dataset) from the example datasets.
knitr::opts_chunk$set(echo = TRUE)
#Set path
.libPaths() #Library path
[1] "/Library/Frameworks/R.framework/Versions/4.1/Resources/library"
setwd("/Library/Frameworks/R.framework/Versions/4.1/Resources/library") #Set the working directory with the library path
getwd() # Check the working directory
[1] "/Library/Frameworks/R.framework/Versions/4.1/Resources/library"
#Load packages
library(readxl)
library(dplyr)
#1. Read in a dataset.
eggs_data<-read_excel(path="/Users/eunsolnoh/Desktop/dacss601/eggs_tidy.xlsx")
eggs_data
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
# … with 110 more rows, and 1 more variable: extra_large_dozen <dbl>
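The import step can be tried without access to eggs_tidy.xlsx. The sketch below is only an illustration: it assumes the sample workbook datasets.xlsx that ships with readxl as a stand-in, and the names example_path, sheet_names, and example_data are made up for this example.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook bundled with readxl.
# It stands in here for eggs_tidy.xlsx (an assumption for illustration only).
example_path <- readxl_example("datasets.xlsx")

# Check that the file exists before reading, so a wrong path fails early.
stopifnot(file.exists(example_path))

# List the sheets in the workbook, then read the first one.
sheet_names <- excel_sheets(example_path)
example_data <- read_excel(path = example_path, sheet = sheet_names[1])

# Rows and columns of the imported sheet
dim(example_data)
```

Checking file.exists() first is a small design choice: a mistyped path then stops with a clear error instead of a less obvious one from read_excel().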
#2. Explain the variables in the dataset.
colnames(eggs_data)
[1] "month" "year"
[3] "large_half_dozen" "large_dozen"
[5] "extra_large_half_dozen" "extra_large_dozen"
# month: character (chr)
# year, large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen: double (dbl)
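The column types can also be checked programmatically rather than read off the printed tibble. This is a minimal sketch on a two-row stand-in called eggs_sample (a hypothetical name); its values are copied from rows 1 and 3 of the output above, where they are printed exactly, and the extra_large_dozen column is left out because its values are not shown.

```r
library(dplyr)

# Two-row stand-in for eggs_data (rows 1 and 3 of the printed output).
eggs_sample <- tibble(
  month = c("January", "March"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 131),
  large_dozen = c(230, 225),
  extra_large_half_dozen = c(132, 137)
)

# class() reports "numeric" for double columns; typeof() reports "double",
# which is what the <dbl> label in the tibble header abbreviates.
col_classes <- sapply(eggs_sample, class)
col_types <- sapply(eggs_sample, typeof)

col_classes
col_types
```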
#3. Perform basic data-wrangling operations.
dim(eggs_data)
[1] 120 6
#3.1
#Filtering the data only for January
eggs_data_january<-filter(eggs_data, month == "January")
#Arrange the filtered data from the oldest January to the latest January
eggs_data_arrange<-arrange(eggs_data_january, year)
eggs_data_arrange
# A tibble: 10 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 January 2005 128. 234. 136.
3 January 2006 128. 234. 136.
4 January 2007 128. 234. 136.
5 January 2008 132 237 139
6 January 2009 174. 278. 186.
7 January 2010 174. 272. 186.
8 January 2011 174. 268. 186.
9 January 2012 174. 268. 186.
10 January 2013 178 268. 188.
# … with 1 more variable: extra_large_dozen <dbl>
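Sort direction in arrange() depends on how the column is passed: a bare column name sorts ascending, and wrapping it in desc() sorts descending. A quoted "year" is a constant string rather than the column, so it gives arrange() nothing to order by. The sketch below uses a three-row stand-in called jan_sample (a hypothetical name) with January values taken from the output above, deliberately shuffled so the sorting is visible.

```r
library(dplyr)

# Three January rows from the printed output, in shuffled order.
jan_sample <- tibble(
  month = "January",
  year = c(2008, 2004, 2013),
  large_half_dozen = c(132, 126, 178)
)

# Bare column name: oldest year first
oldest_first <- arrange(jan_sample, year)

# desc() around the bare column name: latest year first
newest_first <- arrange(jan_sample, desc(year))

oldest_first
newest_first
```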
#3.2
#For each year, keep the row with the highest extra_large_dozen value. slice(1) on grouped data returns one row per group, ordered by year.
eggs_data %>%
  select(year, extra_large_dozen) %>%
  group_by(year) %>%
  arrange(desc(extra_large_dozen)) %>%
  slice(1)
# A tibble: 10 × 2
# Groups: year [10]
year extra_large_dozen
<dbl> <dbl>
1 2004 241
2 2005 241
3 2006 242.
4 2007 245
5 2008 286.
6 2009 286.
7 2010 286.
8 2011 286.
9 2012 290
10 2013 290
# If we treat these numbers as the number of eggs consumed each year, the yearly maximum of extra_large_dozen tends to increase over time, which would suggest rising egg consumption (based on extra_large_dozen only).
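An alternative way to get one maximum per year is summarise() with max(), followed by an explicit arrange() so that the ordering from the least to the most is stated in the code. This is a sketch, not the method used above; the tibble toy and the column max_extra_large_dozen are hypothetical names, and the values are made up for illustration rather than taken from eggs_tidy.xlsx.

```r
library(dplyr)

# Hypothetical toy values, NOT taken from eggs_tidy.xlsx.
toy <- tibble(
  year = c(2005, 2005, 2004, 2004),
  extra_large_dozen = c(245, 238, 230, 241)
)

yearly_max <- toy %>%
  group_by(year) %>%
  # One row per year holding that year's largest value
  summarise(max_extra_large_dozen = max(extra_large_dozen), .groups = "drop") %>%
  # Order the years from the smallest maximum to the largest
  arrange(max_extra_large_dozen)

yearly_max
```

Compared with arrange(desc(...)) followed by slice(1), this version names the summary column explicitly and does not rely on the row order within each group.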
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Noh (2022, Feb. 23). Data Analytics and Computational Social Science: Homework2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomenoh869331/
BibTeX citation
@misc{noh2022homework2,
  author = {Noh, Eunsol},
  title = {Data Analytics and Computational Social Science: Homework2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomenoh869331/},
  year = {2022}
}