Data Analytics and Computational Social Science: Homework2

Eunsol Noh

For Homework 2, I used eggs_tidy.xlsx, the clean dataset from the example datasets.

knitr::opts_chunk$set(echo = TRUE)

#Set path
.libPaths() #Library path

[1] "/Library/Frameworks/R.framework/Versions/4.1/Resources/library"

setwd("/Library/Frameworks/R.framework/Versions/4.1/Resources/library") # Set the working directory to the library path
getwd() # Check the working directory

[1] "/Library/Frameworks/R.framework/Versions/4.1/Resources/library"

#Load packages
library(readxl)
library(dplyr)

#1. Read in a dataset.
eggs_data<-read_excel(path="/Users/eunsolnoh/Desktop/dacss601/eggs_tidy.xlsx")
eggs_data

# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>
 1 January    2004             126         230                    132 
 2 February   2004             128.        226.                   134.
 3 March      2004             131         225                    137 
 4 April      2004             131         225                    137 
 5 May        2004             131         225                    137 
 6 June       2004             134.        231.                   137 
 7 July       2004             134.        234.                   137 
 8 August     2004             134.        234.                   137 
 9 September  2004             130.        234.                   136.
10 October    2004             128.        234.                   136.
# … with 110 more rows, and 1 more variable: extra_large_dozen <dbl>

#2. Explain the variables in the dataset.
colnames(eggs_data)

[1] "month"                  "year"                  
[3] "large_half_dozen"       "large_dozen"           
[5] "extra_large_half_dozen" "extra_large_dozen"

# month: chr (character)
# year, large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen: dbl (double)
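
The column types listed above can also be checked programmatically. The following is a minimal sketch on a small hand-made data frame (`eggs_sample` is a hypothetical stand-in that copies two rows from the printed output, not the full spreadsheet). Note that `class()` reports double columns as "numeric", while `typeof()` reports "double".

```r
# Hypothetical stand-in for eggs_data: two rows and three columns copied from the output above
eggs_sample <- data.frame(
  month = c("January", "March"),
  year = c(2004, 2004),
  large_dozen = c(230, 225)
)

# Class of every column, returned as a named character vector
col_classes <- sapply(eggs_sample, class)
col_classes

# Underlying storage type of every column
col_types <- sapply(eggs_sample, typeof)
col_types
```

On the real data, the same two `sapply()` calls could be run on `eggs_data` directly.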

#3. Perform basic data-wrangling operations.
dim(eggs_data)

[1] 120   6

#3.1
#Filter the data to keep only the January rows
eggs_data_january<-filter(eggs_data, month == "January") 

#Arrange the filtered data from the oldest January to the latest January
#(arrange() sorts in ascending order by default; pass the bare column name, not a quoted string)
eggs_data_arrange<-arrange(eggs_data_january, year)
eggs_data_arrange

# A tibble: 10 × 6
   month    year large_half_dozen large_dozen extra_large_half_dozen
   <chr>   <dbl>            <dbl>       <dbl>                  <dbl>
 1 January  2004             126         230                    132 
 2 January  2005             128.        234.                   136.
 3 January  2006             128.        234.                   136.
 4 January  2007             128.        234.                   136.
 5 January  2008             132         237                    139 
 6 January  2009             174.        278.                   186.
 7 January  2010             174.        272.                   186.
 8 January  2011             174.        268.                   186.
 9 January  2012             174.        268.                   186.
10 January  2013             178         268.                   188.
# … with 1 more variable: extra_large_dozen <dbl>
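
To illustrate the sort direction in step 3.1, here is a small sketch on a deliberately shuffled sample (`january_sample` is a hypothetical data frame holding three rows copied from the January output above). It assumes dplyr is installed. `arrange()` expects bare column names; a quoted string such as "year" is treated as a constant rather than as the column.

```r
library(dplyr)

# Hypothetical sample: three January rows from the output above, in shuffled order
january_sample <- data.frame(
  month = "January",
  year = c(2008, 2004, 2013),
  large_half_dozen = c(132, 126, 178)
)

# Ascending order: oldest January first
oldest_first <- arrange(january_sample, year)
oldest_first

# Descending order: latest January first
latest_first <- arrange(january_sample, desc(year))
latest_first
```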

#3.2
#For each year, keep the row with the highest extra_large_dozen value (the result is listed by year).

eggs_data %>%
  select(year, extra_large_dozen) %>%
  group_by(year) %>%
  arrange(desc(extra_large_dozen)) %>%
  slice(1)

# A tibble: 10 × 2
# Groups:   year [10]
    year extra_large_dozen
   <dbl>             <dbl>
 1  2004              241 
 2  2005              241 
 3  2006              242.
 4  2007              245 
 5  2008              286.
 6  2009              286.
 7  2010              286.
 8  2011              286.
 9  2012              290 
10  2013              290

# If we treat these numbers as the number of eggs consumed each year, the yearly maximum tends to increase over the years (based only on extra_large_dozen).
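
The per-year maximum from step 3.2 could also be computed with `summarise()`, which makes the intent explicit and allows a final sort from the least to the most. This is only a sketch of an alternative, not the approach used above; `toy_eggs` and its values are made up for illustration and are not taken from the spreadsheet.

```r
library(dplyr)

# Made-up toy data (not the real spreadsheet): two observations per year
toy_eggs <- data.frame(
  year = c(2004, 2004, 2005, 2005),
  extra_large_dozen = c(10, 30, 20, 25)
)

yearly_max <- toy_eggs %>%
  group_by(year) %>%
  # One row per year holding that year's highest value
  summarise(max_extra_large_dozen = max(extra_large_dozen)) %>%
  # Sort the years from the smallest maximum to the largest
  arrange(max_extra_large_dozen)

yearly_max
```

Applied to the real data, `toy_eggs` would be replaced by `eggs_data`.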
