DACSS 601: Data Science Fundamentals - FALL 2022

Homework3

Author

Tejaswini_Ketineni

Published

November 28, 2022

library(tidyverse)
library(ggplot2)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

The emphasis in this homework is on exploratory data analysis using both graphics and statistics. You should build on your prior homework - incorporating any feedback and adjusting the code and text as needed. These homeworks are intended to be cumulative. Therefore, while it is fine to switch datasets, you will need to include all of the information from HW1 for your new (or old) dataset in this hw submission as well.

  • Include descriptive statistics (e.g., mean, median, and standard deviation for numerical variables, and frequencies and/or mode for categorical variables).
  • Include relevant visualizations using ggplot2 to complement these descriptive statistics. Be sure to use faceting, coloring, and titles as needed. Each visualization should be accompanied by descriptive text that highlights:
    • the variable(s) used
    • what questions might be answered with the visualizations
    • what conclusions you can draw
  • Use group_by() and summarise() to compute descriptive stats and/or visualizations for any relevant groupings. For example, if you were interested in how average income varies by state, you might compute mean income for all states combined, and then compare this to the range and distribution of mean income for each individual state in the US.
  • Identify limitations of your visualization, such as:
    • What questions are left unanswered with your visualizations
    • What about the visualizations may be unclear to a naive viewer
    • How could you improve the visualizations for the final project
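
As a rough illustration of the group_by() and summarise() requirement, the sketch below computes per-group descriptive statistics on a small long-format table. The table `toy_long`, its column names `indicator` and `value`, and all numbers in it are hypothetical and made up for this example; they are not taken from FedFundsRate.csv.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data in long format; the values are invented for illustration.
toy_long <- tribble(
  ~indicator,          ~value,
  "Unemployment Rate",  5.8,
  "Unemployment Rate",  6.0,
  "Unemployment Rate",  NA,
  "Inflation Rate",     2.0,
  "Inflation Rate",     4.0,
  "Inflation Rate",     NA
)

# One row of descriptive statistics per indicator, ignoring missing values
toy_stats <- toy_long %>%
  group_by(indicator) %>%
  summarise(
    n_obs        = sum(!is.na(value)),
    mean_value   = mean(value, na.rm = TRUE),
    median_value = median(value, na.rm = TRUE),
    sd_value     = sd(value, na.rm = TRUE),
    .groups = "drop"
  )

print(toy_stats)
```

Passing `na.rm = TRUE` to each statistic keeps the incomplete rows in the data while still returning a number for every group.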

Read in data

FedFundsRate <- read_csv("_data/FedFundsRate.csv")
head(FedFundsRate)
# A tibble: 6 × 10
   Year Month   Day Federal Fu…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
  <dbl> <dbl> <dbl>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1  1954     7     1           NA      NA      NA    0.8      4.6     5.8      NA
2  1954     8     1           NA      NA      NA    1.22    NA       6        NA
3  1954     9     1           NA      NA      NA    1.06    NA       6.1      NA
4  1954    10     1           NA      NA      NA    0.85     8       5.7      NA
5  1954    11     1           NA      NA      NA    0.83    NA       5.3      NA
6  1954    12     1           NA      NA      NA    1.28    NA       5        NA
# … with abbreviated variable names ¹​`Federal Funds Target Rate`,
#   ²​`Federal Funds Upper Target`, ³​`Federal Funds Lower Target`,
#   ⁴​`Effective Federal Funds Rate`, ⁵​`Real GDP (Percent Change)`,
#   ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`
colnames(FedFundsRate)
 [1] "Year"                         "Month"                       
 [3] "Day"                          "Federal Funds Target Rate"   
 [5] "Federal Funds Upper Target"   "Federal Funds Lower Target"  
 [7] "Effective Federal Funds Rate" "Real GDP (Percent Change)"   
 [9] "Unemployment Rate"            "Inflation Rate"              
library(funModeling)
plot_num(FedFundsRate)

sapply(FedFundsRate,function(x)sum(is.na(x)))
                        Year                        Month 
                           0                            0 
                         Day    Federal Funds Target Rate 
                           0                          442 
  Federal Funds Upper Target   Federal Funds Lower Target 
                         801                          801 
Effective Federal Funds Rate    Real GDP (Percent Change) 
                         152                          654 
           Unemployment Rate               Inflation Rate 
                         152                          194 

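Seven of the ten columns contain NA values; only Year, Month, and Day are complete. The counts range from 152 (Effective Federal Funds Rate and Unemployment Rate) to 801 (the upper and lower targets). We keep the NAs as they are, because dropping every incomplete row would remove at least 801 of the 904 rows.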
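
To make the data-loss argument concrete, here is a minimal sketch on a hypothetical data frame `toy` whose values are invented and only mimic the pattern in which different columns are missing in different rows. It counts NAs per column, as the sapply() call above does, and then counts how many rows would survive if every incomplete row were dropped.

```r
# Hypothetical toy data frame; the values are invented for illustration.
toy <- data.frame(
  target    = c(NA, NA, 5.5, 5.0),
  upper     = c(NA, 0.25, NA, NA),
  effective = c(0.8, 1.22, 1.06, NA)
)

# Missing values per column (same idea as sapply(df, function(x) sum(is.na(x))))
na_per_column <- colSums(is.na(toy))

# Rows that would remain if every row containing an NA were removed
n_complete <- sum(complete.cases(toy))

print(na_per_column)
print(n_complete)
```

In this toy case no row is complete, so dropping incomplete rows would leave nothing to analyze, which is why keeping the NAs and handling them per statistic or per plot is the safer choice here.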

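# is.null() tests each column object as a whole, not its elements, so every count below is 0 by construction
sapply(FedFundsRate,function(x)sum(is.null(x)))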
                        Year                        Month 
                           0                            0 
                         Day    Federal Funds Target Rate 
                           0                            0 
  Federal Funds Upper Target   Federal Funds Lower Target 
                           0                            0 
Effective Federal Funds Rate    Real GDP (Percent Change) 
                           0                            0 
           Unemployment Rate               Inflation Rate 
                           0                            0 
print(summarytools::dfSummary(FedFundsRate,
                        varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

FedFundsRate

Dimensions: 904 x 10
Duplicates: 0
Variable                                Mean (sd)      min ≤ med ≤ max       IQR (CV)   Distinct values  Missing
Year [numeric]                          1986.7 (17.2)  1954 ≤ 1987.5 ≤ 2017  28 (0)     64               0 (0.0%)
Month [numeric]                         6.6 (3.5)      1 ≤ 7 ≤ 12            6 (0.5)    12               0 (0.0%)
Day [numeric]                           3.6 (6.8)      1 ≤ 1 ≤ 31            0 (1.9)    29               0 (0.0%)
Federal Funds Target Rate [numeric]     5.7 (2.6)      1 ≤ 5.5 ≤ 11.5        4 (0.5)    63               442 (48.9%)
Federal Funds Upper Target [numeric]    0.3 (0.1)      0.2 ≤ 0.2 ≤ 1         0 (0.5)    4                801 (88.6%)
Federal Funds Lower Target [numeric]    0.1 (0.1)      0 ≤ 0 ≤ 0.8           0 (2.4)    4                801 (88.6%)
Effective Federal Funds Rate [numeric]  4.9 (3.6)      0.1 ≤ 4.7 ≤ 19.1      4.2 (0.7)  466              152 (16.8%)
Real GDP (Percent Change) [numeric]     3.1 (3.6)      -10 ≤ 3.1 ≤ 16.5      3.5 (1.1)  113              654 (72.3%)
Unemployment Rate [numeric]             6 (1.6)        3.4 ≤ 5.7 ≤ 10.8      2.1 (0.3)  71               152 (16.8%)
Inflation Rate [numeric]                3.7 (2.6)      0.6 ≤ 2.8 ≤ 13.6      2.7 (0.7)  106              194 (21.5%)

Generated by summarytools 1.0.1 (R version 4.2.1)
2022-12-21

FedFundsRate_mutate <- FedFundsRate%>%
  mutate(Date = str_c(Day,Month,Year,sep="/"),Date = dmy(Date))
head(FedFundsRate_mutate)
# A tibble: 6 × 11
   Year Month   Day Federal Fu…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
  <dbl> <dbl> <dbl>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1  1954     7     1           NA      NA      NA    0.8      4.6     5.8      NA
2  1954     8     1           NA      NA      NA    1.22    NA       6        NA
3  1954     9     1           NA      NA      NA    1.06    NA       6.1      NA
4  1954    10     1           NA      NA      NA    0.85     8       5.7      NA
5  1954    11     1           NA      NA      NA    0.83    NA       5.3      NA
6  1954    12     1           NA      NA      NA    1.28    NA       5        NA
# … with 1 more variable: Date <date>, and abbreviated variable names
#   ¹​`Federal Funds Target Rate`, ²​`Federal Funds Upper Target`,
#   ³​`Federal Funds Lower Target`, ⁴​`Effective Federal Funds Rate`,
#   ⁵​`Real GDP (Percent Change)`, ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`
FedFundsRate_mutate = subset(FedFundsRate_mutate,select = -c(Day,Month,Year))
head(FedFundsRate_mutate)
# A tibble: 6 × 8
  Federal Funds Tar…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷ Date      
                <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <date>    
1                  NA      NA      NA    0.8      4.6     5.8      NA 1954-07-01
2                  NA      NA      NA    1.22    NA       6        NA 1954-08-01
3                  NA      NA      NA    1.06    NA       6.1      NA 1954-09-01
4                  NA      NA      NA    0.85     8       5.7      NA 1954-10-01
5                  NA      NA      NA    0.83    NA       5.3      NA 1954-11-01
6                  NA      NA      NA    1.28    NA       5        NA 1954-12-01
# … with abbreviated variable names ¹​`Federal Funds Target Rate`,
#   ²​`Federal Funds Upper Target`, ³​`Federal Funds Lower Target`,
#   ⁴​`Effective Federal Funds Rate`, ⁵​`Real GDP (Percent Change)`,
#   ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`
FedFundsRate_mutate <- FedFundsRate_mutate %>%
  pivot_longer(
    # reshape the five indicator columns into name/value pairs
    cols = c(`Federal Funds Target Rate`, `Effective Federal Funds Rate`,
             `Real GDP (Percent Change)`, `Unemployment Rate`, `Inflation Rate`),
    names_to = "Economic indicators",
    values_to = "Economic Indicator Value"
  )
head(FedFundsRate_mutate)
# A tibble: 6 × 5
  `Federal Funds Upper Target` Federal Funds Lower …¹ Date       Econo…² Econo…³
                         <dbl>                  <dbl> <date>     <chr>     <dbl>
1                           NA                     NA 1954-07-01 Federa…    NA  
2                           NA                     NA 1954-07-01 Effect…     0.8
3                           NA                     NA 1954-07-01 Real G…     4.6
4                           NA                     NA 1954-07-01 Unempl…     5.8
5                           NA                     NA 1954-07-01 Inflat…    NA  
6                           NA                     NA 1954-08-01 Federa…    NA  
# … with abbreviated variable names ¹​`Federal Funds Lower Target`,
#   ²​`Economic indicators`, ³​`Economic Indicator Value`
ggplot(FedFundsRate_mutate, aes(Date, `Economic Indicator Value`, color = `Economic indicators`)) +
  geom_line(na.rm = TRUE) +
  labs(title = "Economic Rates over Time")

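ggplot(FedFundsRate_mutate, aes(Date, `Economic Indicator Value`)) +
  # a fixed point color is set outside aes() so it is not treated as a data mapping
  geom_point(color = "lightcoral", na.rm = TRUE) +
  labs(title = "Economic Indicators across the Years") +
  facet_wrap(vars(`Economic indicators`))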

ggplot(FedFundsRate_mutate, aes(Date, `Economic Indicator Value`, color = `Economic indicators`)) +
  geom_line(na.rm = TRUE) +
  labs(title = "Economic Rates over Time") +
  facet_wrap(vars(`Economic indicators`))
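
The faceted plots above share one y-axis across all panels, which is the facet_wrap() default. One possible improvement for the final project, sketched here on hypothetical toy data with invented values and made-up column names `indicator` and `value`, is to give each panel its own y-axis and to label the axes explicitly.

```r
library(ggplot2)

# Hypothetical toy data in long format; the values are invented for illustration.
toy_long <- data.frame(
  Date      = as.Date(rep(c("2000-01-01", "2000-02-01", "2000-03-01"), times = 2)),
  indicator = rep(c("Unemployment Rate", "Inflation Rate"), each = 3),
  value     = c(4.0, 4.1, 4.0, 2.7, NA, 3.2)
)

p <- ggplot(toy_long, aes(Date, value, color = indicator)) +
  geom_line(na.rm = TRUE) +
  facet_wrap(vars(indicator), scales = "free_y") +  # each panel gets its own y-axis
  labs(title = "Economic indicators over time (toy data)",
       x = "Date", y = "Value (percent)") +
  theme(legend.position = "none")                   # facet strips already name each series

print(p)
```

Free y-axes make small movements within one indicator easier to see, at the cost of making levels harder to compare across panels, so the choice depends on which question the plot is meant to answer.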

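Compared with Homework 2, the new steps in Homework 3 are the mutate() call that builds a single Date column and the pivot_longer() call that reshapes the indicators into long format. The pivoted data also made better plots possible.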
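
The two reshaping steps can be sketched end to end on a small example. The table `toy_wide` and its values are hypothetical, and make_date() from lubridate is used here as one alternative to pasting the parts into a string and parsing it with dmy(); it is not the approach taken in this homework.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data in wide format; the values are invented for illustration.
toy_wide <- tibble(
  Year  = c(2000, 2000),
  Month = c(1, 2),
  Day   = c(1, 1),
  `Unemployment Rate` = c(4.0, 4.1),
  `Inflation Rate`    = c(2.7, NA)
)

toy_long <- toy_wide %>%
  # make_date() builds a Date directly from the numeric year, month, and day
  mutate(Date = make_date(Year, Month, Day)) %>%
  select(-Year, -Month, -Day) %>%
  # one row per date and indicator instead of one column per indicator
  pivot_longer(
    cols      = c(`Unemployment Rate`, `Inflation Rate`),
    names_to  = "indicator",
    values_to = "value"
  )

print(toy_long)
```

The long format is what allows a single `color = indicator` mapping or a single facet_wrap() call to cover all indicators at once.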
