HW2 by Guodong Zhang

This is my Homework 2 for DACSS 601.

Guodong Zhang
2021-12-28

1. Read in a dataset.

The dataset, eggs_tidy.xlsx, is clean and comes from the “Sample Datasets” section on Google Classroom.

library(readxl)  # provides read_excel()

egg_data <- read_excel("C:/Users/zhang/OneDrive - University of Massachusetts/_601/Sample Datasets/eggs_tidy.xlsx")
egg_data
# A tibble: 120 x 6
   month      year large_half_dozen large_dozen extra_large_half_dozen
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>
 1 January    2004             126         230                    132 
 2 February   2004             128.        226.                   134.
 3 March      2004             131         225                    137 
 4 April      2004             131         225                    137 
 5 May        2004             131         225                    137 
 6 June       2004             134.        231.                   137 
 7 July       2004             134.        234.                   137 
 8 August     2004             134.        234.                   137 
 9 September  2004             130.        234.                   136.
10 October    2004             128.        234.                   136.
# ... with 110 more rows, and 1 more variable:
#   extra_large_dozen <dbl>
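
The absolute path above only works on my machine. As a minimal sketch of a more portable approach, the file location can be built from a folder name and checked before reading. The helper name read_eggs and its data_dir argument are hypothetical and not part of the original homework; only read_excel() comes from readxl.

```r
library(readxl)

# Hypothetical helper: build the path from a folder and fail early
# with a clear message if the workbook is not there.
read_eggs <- function(data_dir) {
  path <- file.path(data_dir, "eggs_tidy.xlsx")
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  read_excel(path)
}
```

Calling read_eggs() with the folder that holds the sample datasets should return the same tibble as the call above, assuming the file name is unchanged.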

2. Explain the variables in your dataset.

I used the str() function to check the data type of each variable.

str(egg_data)
tibble [120 x 6] (S3: tbl_df/tbl/data.frame)
 $ month                 : chr [1:120] "January" "February" "March" "April" ...
 $ year                  : num [1:120] 2004 2004 2004 2004 2004 ...
 $ large_half_dozen      : num [1:120] 126 128 131 131 131 ...
 $ large_dozen           : num [1:120] 230 226 225 225 225 ...
 $ extra_large_half_dozen: num [1:120] 132 134 137 137 137 ...
 $ extra_large_dozen     : num [1:120] 230 230 230 234 236 ...

Variable                Data type  Description
----------------------  ---------  ------------------------------------
month                   Character  Which month the data is from.
year                    Number     Which year the data is from.
large_half_dozen        Number     How many large-half-dozen eggs.
large_dozen             Number     How many large-dozen eggs.
extra_large_half_dozen  Number     How many extra-large-half-dozen eggs.
extra_large_dozen       Number     How many extra-large-dozen eggs.
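
The same type check can also be collected into a named vector, which is easier to compare against the table than the str() printout. This is only a sketch: eggs_sample is a small hypothetical data frame built from a few of the rows printed earlier (values as displayed there), not the full dataset.

```r
# Illustrative rows only, copied from the printed output above;
# the real analysis uses the full egg_data tibble.
eggs_sample <- data.frame(
  month = c("January", "March", "April"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(126, 131, 131),
  large_dozen = c(230, 225, 225),
  stringsAsFactors = FALSE
)

# One class name per column, returned as a named character vector
col_types <- vapply(eggs_sample, class, character(1))
col_types
```

Here month comes back as "character" and the other columns as "numeric", which matches the chr and num labels in the str() output.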

3. Demonstrate your knowledge.

The following code shows the five years in which February had the highest values for extra-large eggs, sorting by extra_large_half_dozen first and using extra_large_dozen to break ties.

library(dplyr)  # provides %>%, filter(), arrange(), select()

egg_data %>%
  filter(month == "February") %>%
  arrange(desc(extra_large_half_dozen), desc(extra_large_dozen)) %>%
  select(year, contains("extra")) %>%
  head(5)
# A tibble: 5 x 3
   year extra_large_half_dozen extra_large_dozen
  <dbl>                  <dbl>             <dbl>
1  2013                   188.              290 
2  2012                   186.              288.
3  2009                   186.              286.
4  2010                   186.              286.
5  2011                   186.              286.
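
Several of the printed rows show the same extra_large_half_dozen value, so the second sort key decides their order. The sketch below illustrates that tie-breaking on a small toy tibble; the values in toy are hypothetical and are not taken from eggs_tidy.xlsx.

```r
library(dplyr)

# Hypothetical toy values chosen so that three years tie on the first key
toy <- tibble(
  year = c(2009, 2010, 2011, 2012),
  extra_large_half_dozen = c(186, 186, 186, 188),
  extra_large_dozen = c(285, 287, 286, 290)
)

# Sort by the first key, then break ties with the second key
top3 <- toy %>%
  arrange(desc(extra_large_half_dozen), desc(extra_large_dozen)) %>%
  head(3)

top3$year
```

In this toy case 2012 comes first because of its larger half-dozen value, and the tied years are then ordered by extra_large_dozen, so 2010 precedes 2011 and 2009 is dropped.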

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Zhang (2021, Dec. 28). Data Analytics and Computational Social Science: HW2 by Guodong Zhang. Retrieved from https://github.com/DACSS/dacss_course_website/posts/hw2-by-guodong-zhang/

BibTeX citation

@misc{zhang2021hw2,
  author = {Zhang, Guodong},
  title = {Data Analytics and Computational Social Science: HW2 by Guodong Zhang},
  url = {https://github.com/DACSS/dacss_course_website/posts/hw2-by-guodong-zhang/},
  year = {2021}
}