Code
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)Lai Wei
November 8, 2022
Today’s challenge is to:
pivot_longerRead in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 383 × 16
   Year    House…¹ Total incom…² incom…³ incom…⁴ incom…⁵ incom…⁶ incom…⁷ incom…⁸
   <chr>   <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 ALL RA… <NA>       NA    NA      NA      NA      NA      NA      NA      NA  
 2 2019    128451    100     9.1     8       8.3    11.7    16.5    12.3    15.5
 3 2018    128579    100    10.1     8.8     8.7    12      17      12.5    15  
 4 2017 2  127669    100    10       9.1     9.2    12      16.4    12.4    14.7
 5 2017    127586    100    10.1     9.1     9.2    11.9    16.3    12.6    14.8
 6 2016    126224    100    10.4     9       9.2    12.3    16.7    12.2    15  
 7 2015    125819    100    10.6    10       9.6    12.1    16.1    12.4    14.9
 8 2014    124587    100    11.4    10.5     9.6    12.6    16.4    12.1    14  
 9 2013 3  123931    100    11.4    10.3     9.5    12.5    16.8    12      13.9
10 2013 4  122952    100    11.3    10.4     9.7    13.1    17      12.5    13.6
# … with 373 more rows, 6 more variables: income_i8 <dbl>, income_i9 <dbl>,
#   Median_income <dbl>, MarginOfError...14 <dbl>, Mean_income <chr>,
#   MarginOfError...16 <chr>, and abbreviated variable names ¹Household,
#   ²income_i1, ³income_i2, ⁴income_i3, ⁵income_i4, ⁶income_i5, ⁷income_i6,
#   ⁸income_i7Read the data and rename the columns.
Existing the number of rows
There are 383 statistics in different years in total.
# A tibble: 16 × 16
   Year  Household Total incom…¹ incom…² incom…³ incom…⁴ incom…⁵ incom…⁶ incom…⁷
   <chr> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 2017  127586      100    10.1     9.1     9.2    11.9    16.3    12.6    14.8
 2 2007  116783      100    10      10       9      12.8    17.3    12.6    15.3
 3 2017  100065      100     8.6     8.7     8.9    11.7    16.5    12.9    15.7
 4 2007  95112       100     8.4     9.6     8.9    12.6    17.5    13      16.1
 5 2017  84681       100     8       8.5     8.5    11.2    16.1    13.1    16.3
 6 2007  82765       100     7.9     9.2     8.5    12      17.3    13.2    16.9
 7 2017  17801       100    18.9    12.1    11.8    13.8    15.7    10.6     9.9
 8 2007  14976       100    19.5    13.1    10.5    14.7    16.6     9.8    10.2
 9 2017  16997       100    19.1    12.2    11.8    13.9    15.7    10.3     9.8
10 2007  14551       100    19.6    13.1    10.5    14.7    16.7     9.8    10  
11 2017  7114        100     8.6     6.3     6       9.3    14.4    12.6    16.3
12 2007  4715        100     8.6     6.6     6.6     9.7    15.4    12.8    19.1
13 2017  6735        100     8.7     6.4     5.7     9.1    14.4    12.8    16.3
14 2007  4494        100     8.6     6.7     6.5     9.5    15.5    12.5    19.5
15 2017  17318       100    11.6    10.3    11.4    14.3    18.5    12.5    12.2
16 2007  13339       100    12.5    12.3    11.5    16.6    18.6    11.9    10.5
# … with 6 more variables: income_i8 <dbl>, income_i9 <dbl>,
#   Median_income <dbl>, MarginOfError...14 <dbl>, Mean_income <chr>,
#   MarginOfError...16 <chr>, and abbreviated variable names ¹income_i1,
#   ²income_i2, ³income_i3, ⁴income_i4, ⁵income_i5, ⁶income_i6, ⁷income_i7Use the filter() to get the information in 2007 and 2017.
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.
# A tibble: 8 × 2
  Household `2018`
  <chr>      <dbl>
1 128579     64324
2 100528     68156
3 84727      71922
4 18095      42447
5 17167      42110
6 7416       88388
7 6981       88774
8 17758      52382By using pivot_wider() function, we can get a new table that showing in year 2018, how much each household have in the median range.
---
title: "Challenge 3"
author: "Lai Wei"
desription: "Tidy Data: Pivoting"
date: "11/08/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_3
  - animal_weights
  - eggs
  - australian_marriage
  - usa_households
  - sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1.  read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2.  identify what needs to be done to tidy the current data
3.  anticipate the shape of pivoted data
4.  pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
-   animal_weights.csv ⭐
-   eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
-   australian_marriage\*.xls ⭐⭐⭐
-   USA Households\*.xlsx ⭐⭐⭐⭐
-   sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
```{r}
Households <- read_excel("_data/USA Households by Total Money Income, Race, and Hispanic Origin of Householder 1967 to 2019.xlsx",
                         skip = 5,
                         col_names = c("Year","Household","Total",str_c("income",1:9,sep = "_i"),"Median_income","MarginOfError","Mean_income","MarginOfError"))
Households
```
Read the data and rename the columns. 
### Briefly describe the data
Existing the number of rows
```{r}
nrow(Households)
```
There are 383 statistics in different years in total. 
### Challenge: Describe the final dimensions
```{r}
Households %>% 
  filter(Year%in%c(2007,2017)) 
```
Use the filter() to get the information in 2007 and 2017.
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
### Pivot_Wider
```{r}
Households %>% 
  filter(Year%in%2018) %>% 
  select(Household,Year,Median_income) %>% 
  pivot_wider(values_from = Median_income ,names_from = Year)
```
By using pivot_wider() function, we can get a new table that showing in year 2018, how much each household have in the median range.