This is my Homework 2 for DACSS 601.
The dataset, eggs_tidy.xlsx, is clean and comes from the “Sample Datasets” section on Google Classroom.
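library(readxl)  # provides read_excel()
library(dplyr)   # provides %>%, filter(), arrange(), select()
egg_data <- read_excel("C:/Users/zhang/OneDrive - University of Massachusetts/_601/Sample Datasets/eggs_tidy.xlsx")
egg_data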
# A tibble: 120 x 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
# ... with 110 more rows, and 1 more variable:
# extra_large_dozen <dbl>
I used the `str()` function to check the data type of each variable.
str(egg_data)
tibble [120 x 6] (S3: tbl_df/tbl/data.frame)
$ month : chr [1:120] "January" "February" "March" "April" ...
$ year : num [1:120] 2004 2004 2004 2004 2004 ...
$ large_half_dozen : num [1:120] 126 128 131 131 131 ...
$ large_dozen : num [1:120] 230 226 225 225 225 ...
$ extra_large_half_dozen: num [1:120] 132 134 137 137 137 ...
$ extra_large_dozen : num [1:120] 230 230 230 234 236 ...
Variable | Data type | Description |
---|---|---|
month | Character | Month of the observation. |
year | Number | Year of the observation. |
large_half_dozen | Number | Number recorded for large eggs, half-dozen size. |
large_dozen | Number | Number recorded for large eggs, dozen size. |
extra_large_half_dozen | Number | Number recorded for extra-large eggs, half-dozen size. |
extra_large_dozen | Number | Number recorded for extra-large eggs, dozen size. |
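The data types listed in the table can also be checked programmatically. The following is a minimal sketch in base R, not part of the original homework. `toy_eggs` is a hypothetical stand-in for `egg_data`, built from two rows of the rounded printed output above so that the example runs without the Excel file.

```r
# Hypothetical stand-in for egg_data: two rows copied from the rounded
# console output shown earlier (January and March 2004).
toy_eggs <- data.frame(
  month = c("January", "March"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 131),
  large_dozen = c(230, 225),
  extra_large_half_dozen = c(132, 137),
  extra_large_dozen = c(230, 230),
  stringsAsFactors = FALSE
)

# First class of every column, returned as a named character vector
col_types <- vapply(toy_eggs, function(col) class(col)[1], character(1))
print(col_types)

# Names of the columns that are not numeric
non_numeric <- names(toy_eggs)[!vapply(toy_eggs, is.numeric, logical(1))]
print(non_numeric)
```

Running the same two `vapply()` calls on the real `egg_data` should flag `month` as the only non-numeric column, assuming the file was read as shown above.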
The following code shows the five years with the highest February values for extra-large eggs, sorted first by extra_large_half_dozen and then by extra_large_dozen.
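egg_data %>%
  filter(month == "February") %>%
  # sort descending; ties in the half-dozen column are broken by the dozen column
  arrange(desc(extra_large_half_dozen), desc(extra_large_dozen)) %>%
  # keep the year plus every column whose name contains "extra"
  select(year, contains("extra")) %>%
  head(5)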
# A tibble: 5 x 3
year extra_large_half_dozen extra_large_dozen
<dbl> <dbl> <dbl>
1 2013 188. 290
2 2012 186. 288.
3 2009 186. 286.
4 2010 186. 286.
5 2011 186. 286.
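Several years in the output above display the same rounded values, so which rows land in the top five can depend on how ties are handled. The sketch below compares `arrange()` followed by `head()` with dplyr's `slice_max()`, which keeps tied rows by default. It is an illustration only: `toy_feb` and its values are invented for this example and are not taken from the dataset.

```r
library(dplyr)

# Hypothetical February values, invented for illustration only.
toy_feb <- tibble(
  year = 2008:2013,
  extra_large_half_dozen = c(180, 186, 186, 186, 187, 188)
)

# Approach used in this post: sort descending, then keep the first 3 rows.
# Rows tied with the last kept row are cut off.
top_head <- toy_feb %>%
  arrange(desc(extra_large_half_dozen)) %>%
  head(3)

# slice_max() keeps every row tied for the last place (with_ties = TRUE by default).
top_ties <- toy_feb %>%
  slice_max(extra_large_half_dozen, n = 3)

print(top_head)
print(top_ties)
```

With these invented values, `head(3)` returns exactly three rows, while `slice_max()` returns five because three years share the third-highest value. Which behavior is preferable depends on whether a fixed-length list or a complete list of tied years is wanted.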
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Zhang (2021, Dec. 28). Data Analytics and Computational Social Science: HW2 by Guodong Zhang. Retrieved from https://github.com/DACSS/dacss_course_website/posts/hw2-by-guodong-zhang/
BibTeX citation
@misc{zhang2021hw2,
  author = {Zhang, Guodong},
  title = {Data Analytics and Computational Social Science: HW2 by Guodong Zhang},
  url = {https://github.com/DACSS/dacss_course_website/posts/hw2-by-guodong-zhang/},
  year = {2021}
}