Challenge 3

challenge_3

animal_weights

eggs

australian_marriage

usa_households

sce_labor

Author

Lai Wei

Published

November 8, 2022

Code

library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
identify what needs to be done to tidy the current data
anticipate the shape of pivoted data
pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

animal_weights.csv ⭐
eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
australian_marriage*.xls ⭐⭐⭐
USA Households*.xlsx ⭐⭐⭐⭐
sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟

Code

Households <- read_excel("_data/USA Households by Total Money Income, Race, and Hispanic Origin of Householder 1967 to 2019.xlsx",
                         skip = 5,
                         col_names = c("Year","Household","Total",str_c("income",1:9,sep = "_i"),"Median_income","MarginOfError","Mean_income","MarginOfError"))
Households

# A tibble: 383 × 16
   Year    House…¹ Total incom…² incom…³ incom…⁴ incom…⁵ incom…⁶ incom…⁷ incom…⁸
   <chr>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 ALL RA… <NA>       NA    NA      NA      NA      NA      NA      NA      NA  
 2 2019    128451    100     9.1     8       8.3    11.7    16.5    12.3    15.5
 3 2018    128579    100    10.1     8.8     8.7    12      17      12.5    15  
 4 2017 2  127669    100    10       9.1     9.2    12      16.4    12.4    14.7
 5 2017    127586    100    10.1     9.1     9.2    11.9    16.3    12.6    14.8
 6 2016    126224    100    10.4     9       9.2    12.3    16.7    12.2    15  
 7 2015    125819    100    10.6    10       9.6    12.1    16.1    12.4    14.9
 8 2014    124587    100    11.4    10.5     9.6    12.6    16.4    12.1    14  
 9 2013 3  123931    100    11.4    10.3     9.5    12.5    16.8    12      13.9
10 2013 4  122952    100    11.3    10.4     9.7    13.1    17      12.5    13.6
# … with 373 more rows, 6 more variables: income_i8 <dbl>, income_i9 <dbl>,
#   Median_income <dbl>, MarginOfError...14 <dbl>, Mean_income <chr>,
#   MarginOfError...16 <chr>, and abbreviated variable names ¹Household,
#   ²income_i1, ³income_i2, ⁴income_i3, ⁵income_i4, ⁶income_i5, ⁷income_i6,
#   ⁸income_i7

Read the data and rename the columns.

Briefly describe the data

Existing the number of rows

Code

nrow(Households)

[1] 383

There are 383 statistics in different years in total.

Challenge: Describe the final dimensions

Code

Households %>% 
  filter(Year%in%c(2007,2017))

# A tibble: 16 × 16
   Year  Household Total incom…¹ incom…² incom…³ incom…⁴ incom…⁵ incom…⁶ incom…⁷
   <chr> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 2017  127586      100    10.1     9.1     9.2    11.9    16.3    12.6    14.8
 2 2007  116783      100    10      10       9      12.8    17.3    12.6    15.3
 3 2017  100065      100     8.6     8.7     8.9    11.7    16.5    12.9    15.7
 4 2007  95112       100     8.4     9.6     8.9    12.6    17.5    13      16.1
 5 2017  84681       100     8       8.5     8.5    11.2    16.1    13.1    16.3
 6 2007  82765       100     7.9     9.2     8.5    12      17.3    13.2    16.9
 7 2017  17801       100    18.9    12.1    11.8    13.8    15.7    10.6     9.9
 8 2007  14976       100    19.5    13.1    10.5    14.7    16.6     9.8    10.2
 9 2017  16997       100    19.1    12.2    11.8    13.9    15.7    10.3     9.8
10 2007  14551       100    19.6    13.1    10.5    14.7    16.7     9.8    10  
11 2017  7114        100     8.6     6.3     6       9.3    14.4    12.6    16.3
12 2007  4715        100     8.6     6.6     6.6     9.7    15.4    12.8    19.1
13 2017  6735        100     8.7     6.4     5.7     9.1    14.4    12.8    16.3
14 2007  4494        100     8.6     6.7     6.5     9.5    15.5    12.5    19.5
15 2017  17318       100    11.6    10.3    11.4    14.3    18.5    12.5    12.2
16 2007  13339       100    12.5    12.3    11.5    16.6    18.6    11.9    10.5
# … with 6 more variables: income_i8 <dbl>, income_i9 <dbl>,
#   Median_income <dbl>, MarginOfError...14 <dbl>, Mean_income <chr>,
#   MarginOfError...16 <chr>, and abbreviated variable names ¹income_i1,
#   ²income_i2, ³income_i3, ⁴income_i4, ⁵income_i5, ⁶income_i6, ⁷income_i7

Use the filter() to get the information in 2007 and 2017.

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Pivot_Wider

Code

Households %>% 
  filter(Year%in%2018) %>% 
  select(Household,Year,Median_income) %>% 
  pivot_wider(values_from = Median_income ,names_from = Year)

# A tibble: 8 × 2
  Household `2018`
  <chr>      <dbl>
1 128579     64324
2 100528     68156
3 84727      71922
4 18095      42447
5 17167      42110
6 7416       88388
7 6981       88774
8 17758      52382

By using pivot_wider() function, we can get a new table that showing in year 2018, how much each household have in the median range.