Code
library(tidyverse)
library(readxl)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Lai Wei
November 8, 2022
Today’s challenge is to:
pivot_longer
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 383 × 16
Year House…¹ Total incom…² incom…³ incom…⁴ incom…⁵ incom…⁶ incom…⁷ incom…⁸
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ALL RA… <NA> NA NA NA NA NA NA NA NA
2 2019 128451 100 9.1 8 8.3 11.7 16.5 12.3 15.5
3 2018 128579 100 10.1 8.8 8.7 12 17 12.5 15
4 2017 2 127669 100 10 9.1 9.2 12 16.4 12.4 14.7
5 2017 127586 100 10.1 9.1 9.2 11.9 16.3 12.6 14.8
6 2016 126224 100 10.4 9 9.2 12.3 16.7 12.2 15
7 2015 125819 100 10.6 10 9.6 12.1 16.1 12.4 14.9
8 2014 124587 100 11.4 10.5 9.6 12.6 16.4 12.1 14
9 2013 3 123931 100 11.4 10.3 9.5 12.5 16.8 12 13.9
10 2013 4 122952 100 11.3 10.4 9.7 13.1 17 12.5 13.6
# … with 373 more rows, 6 more variables: income_i8 <dbl>, income_i9 <dbl>,
# Median_income <dbl>, MarginOfError...14 <dbl>, Mean_income <chr>,
# MarginOfError...16 <chr>, and abbreviated variable names ¹Household,
# ²income_i1, ³income_i2, ⁴income_i3, ⁵income_i4, ⁶income_i5, ⁷income_i6,
# ⁸income_i7
Read the data and rename the columns.
Existing the number of rows
There are 383 statistics in different years in total.
# A tibble: 16 × 16
Year Household Total incom…¹ incom…² incom…³ incom…⁴ incom…⁵ incom…⁶ incom…⁷
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017 127586 100 10.1 9.1 9.2 11.9 16.3 12.6 14.8
2 2007 116783 100 10 10 9 12.8 17.3 12.6 15.3
3 2017 100065 100 8.6 8.7 8.9 11.7 16.5 12.9 15.7
4 2007 95112 100 8.4 9.6 8.9 12.6 17.5 13 16.1
5 2017 84681 100 8 8.5 8.5 11.2 16.1 13.1 16.3
6 2007 82765 100 7.9 9.2 8.5 12 17.3 13.2 16.9
7 2017 17801 100 18.9 12.1 11.8 13.8 15.7 10.6 9.9
8 2007 14976 100 19.5 13.1 10.5 14.7 16.6 9.8 10.2
9 2017 16997 100 19.1 12.2 11.8 13.9 15.7 10.3 9.8
10 2007 14551 100 19.6 13.1 10.5 14.7 16.7 9.8 10
11 2017 7114 100 8.6 6.3 6 9.3 14.4 12.6 16.3
12 2007 4715 100 8.6 6.6 6.6 9.7 15.4 12.8 19.1
13 2017 6735 100 8.7 6.4 5.7 9.1 14.4 12.8 16.3
14 2007 4494 100 8.6 6.7 6.5 9.5 15.5 12.5 19.5
15 2017 17318 100 11.6 10.3 11.4 14.3 18.5 12.5 12.2
16 2007 13339 100 12.5 12.3 11.5 16.6 18.6 11.9 10.5
# … with 6 more variables: income_i8 <dbl>, income_i9 <dbl>,
# Median_income <dbl>, MarginOfError...14 <dbl>, Mean_income <chr>,
# MarginOfError...16 <chr>, and abbreviated variable names ¹income_i1,
# ²income_i2, ³income_i3, ⁴income_i4, ⁵income_i5, ⁶income_i6, ⁷income_i7
Use the filter() to get the information in 2007 and 2017.
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.
# A tibble: 8 × 2
Household `2018`
<chr> <dbl>
1 128579 64324
2 100528 68156
3 84727 71922
4 18095 42447
5 17167 42110
6 7416 88388
7 6981 88774
8 17758 52382
By using pivot_wider() function, we can get a new table that showing in year 2018, how much each household have in the median range.
---
title: "Challenge 3"
author: "Lai Wei"
desription: "Tidy Data: Pivoting"
date: "11/08/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- animal_weights
- eggs
- australian_marriage
- usa_households
- sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
```{r}
Households <- read_excel("_data/USA Households by Total Money Income, Race, and Hispanic Origin of Householder 1967 to 2019.xlsx",
skip = 5,
col_names = c("Year","Household","Total",str_c("income",1:9,sep = "_i"),"Median_income","MarginOfError","Mean_income","MarginOfError"))
Households
```
Read the data and rename the columns.
### Briefly describe the data
Existing the number of rows
```{r}
nrow(Households)
```
There are 383 statistics in different years in total.
### Challenge: Describe the final dimensions
```{r}
Households %>%
filter(Year%in%c(2007,2017))
```
Use the filter() to get the information in 2007 and 2017.
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
### Pivot_Wider
```{r}
Households %>%
filter(Year%in%2018) %>%
select(Household,Year,Median_income) %>%
pivot_wider(values_from = Median_income ,names_from = Year)
```
By using pivot_wider() function, we can get a new table that showing in year 2018, how much each household have in the median range.