Jerin Jacob
May 9, 2023
| | region | group | fertility | ppgdp | lifeExpF | pctUrban |
|---|---|---|---|---|---|---|
| Afghanistan | Asia | other | 5.968 | 499.0 | 49.49 | 23 |
| Albania | Europe | other | 1.525 | 3677.2 | 80.40 | 53 |
| Algeria | Africa | africa | 2.142 | 4473.0 | 75.00 | 67 |
| Angola | Africa | africa | 5.135 | 4321.9 | 53.17 | 59 |
| Anguilla | Caribbean | other | 2.000 | 13750.1 | 81.10 | 100 |
| Argentina | Latin Amer | other | 2.172 | 9162.1 | 79.89 | 93 |
Question 1: a. The predictor is ppgdp and the response is fertility. b.
A straight line will not be plausible unless and until some kind of data transformation is applied.
Yes, a straight line now looks plausible as the mean function for a simple linear regression. The relationship is negative: as the log of ppgdp increases, the log of fertility decreases.
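Question 2: a. The conversion divides each USD value by 1.33. If the converted variable is the predictor, the slope is multiplied by 1.33, because a one-pound change equals a 1.33-dollar change; if it is the response, the slope is divided by 1.33.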
Question 3:
Year appears to be largely unrelated to each of the other variables. The three variables starting with “O” appear to be correlated with one another: every panel that pairs two of them shows a stronger dependence than the panels pairing an “O” variable with any other variable. The three variables starting with “A” appear to form a second correlated group. BSAAM is more closely related to the “O” variables than to the “A” variables.
Question 4:
Quality, Clarity and Helpfulness seem to be strongly correlated with one another. Easiness is fairly correlated with those three variables. raterInterest is also moderately correlated with them, although the raters report that they almost always have a fairly good interest in the subject. Overall, professors who do well on one of these variables tend to do well on the others too.
Question 5: a.
Dependent variable:

| | Pol. Ideology (1) | HS GPA (2) |
|---|---|---|
| Religiosity | 0.970*** | |
| | (0.179) | |
| Hours of TV | | -0.018** |
| | | (0.009) |
| Constant | 0.931** | 3.441*** |
| | (0.425) | (0.085) |
| Observations | 60 | 60 |
| R2 | 0.336 | 0.072 |
| Adjusted R2 | 0.324 | 0.056 |
| Residual Std. Error (df = 58) | 1.345 | 0.447 |
| F Statistic (df = 1; 58) | 29.336*** | 4.471** |

Note: *p<0.1; **p<0.05; ***p<0.01
Religiosity is positively and statistically significantly (at the 0.01 significance level) associated with conservatism.
Hours of TV is negatively and statistically significantly (at the 0.05 significance level) associated with High School GPA. Watching an average of 1 more hour of TV per week is associated with a 0.018 decline in High School GPA.
---
title: "Homework 3"
author: "Jerin Jacob"
description: "Homework 3- 603 Spring 2023"
date: "05/09/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw3
  - challenge3
  - Jerin Jacob
---
```{r}
#| label: setup
#| warning: false
library(readxl)
library(dplyr)
library(magrittr)
library(alr4)
library(smss)
library(ggplot2)
library(stargazer)
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
data("UN11", package = "alr4")
head(UN11)
```
Question 1:
a. The predictor is ppgdp and the response is fertility.
b.
```{r}
ggplot(data = UN11, aes(x = ppgdp, y = fertility)) +
  geom_point()
```
A straight line will not be plausible unless and until some kind of data transformation is applied.
c.
```{r}
ggplot(data = UN11, aes(x = log(ppgdp), y = log(fertility))) +
  geom_point()
```
Yes, a straight line now looks plausible as the mean function for a simple linear regression. The relationship is negative: as the log of ppgdp increases, the log of fertility decreases.
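As a rough check of the visual impression, the log-log relationship can also be fit directly. This is only a sketch: the object name `fit_log` is introduced here for illustration and is not part of the original assignment.

```{r}
library(alr4)  # provides the UN11 data

data("UN11", package = "alr4")

# Simple linear regression on the log-log scale (natural logs)
fit_log <- lm(log(fertility) ~ log(ppgdp), data = UN11)

# A negative slope on log(ppgdp) matches the downward trend in the scatterplot
coef(fit_log)
```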
Question 2:
a.
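The conversion divides each USD value by 1.33. If the converted variable is the predictor, the slope is multiplied by 1.33, because a one-pound change equals a 1.33-dollar change; if it is the response, the slope is divided by 1.33.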
b.
The correlation will not change when the unit of measurement changes, because it is a standardized, unit-free measure: every value is rescaled by the same positive factor.
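Which way the slope moves depends on whether the converted variable is the predictor or the response. The sketch below uses simulated data to illustrate both cases and the unchanged correlation; the variables `usd`, `gbp`, and `other` are hypothetical and are not taken from the assignment.

```{r}
set.seed(1)  # simulated data, purely illustrative

usd   <- runif(100, 20000, 80000)     # hypothetical dollar amounts
other <- 0.0001 * usd + rnorm(100)    # hypothetical second variable
gbp   <- usd / 1.33                   # the same amounts in pounds

# Case 1: the converted variable is the predictor
b_usd_x <- coef(lm(other ~ usd))[["usd"]]
b_gbp_x <- coef(lm(other ~ gbp))[["gbp"]]
all.equal(b_gbp_x, 1.33 * b_usd_x)    # slope is multiplied by 1.33

# Case 2: the converted variable is the response
b_usd_y <- coef(lm(usd ~ other))[["other"]]
b_gbp_y <- coef(lm(gbp ~ other))[["other"]]
all.equal(b_gbp_y, b_usd_y / 1.33)    # slope is divided by 1.33

# In both cases the correlation is unaffected by the unit change
all.equal(cor(usd, other), cor(gbp, other))
```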
Question 3:
```{r}
data("water")
pairs(water)
```
Year appears to be largely unrelated to each of the other variables.
The three variables starting with “O” appear to be correlated with one another: every panel that pairs two of them shows a stronger dependence than the panels pairing an “O” variable with any other variable. The three variables starting with “A” appear to form a second correlated group.
BSAAM is more closely related to the “O” variables than to the “A” variables.
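The visual groupings in the scatterplot matrix can be cross-checked numerically. A minimal sketch, assuming Pearson correlations are an adequate summary of these roughly linear relationships:

```{r}
library(alr4)  # provides the water data

data("water", package = "alr4")

# Pairwise Pearson correlations, rounded for readability
round(cor(water), 2)
```

The BSAAM row shows how strongly runoff is associated with each of the "O" and "A" sites.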
Question 4:
```{r}
data("Rateprof")
pairs(Rateprof[,c('quality', 'clarity', 'helpfulness', 'easiness', 'raterInterest')])
```
Quality, Clarity and Helpfulness seem to be strongly correlated with one another. Easiness is fairly correlated with those three variables. raterInterest is also moderately correlated with them, although the raters report that they almost always have a fairly good interest in the subject. Overall, professors who do well on one of these variables tend to do well on the others too.
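A correlation matrix for the same five ratings gives a numerical counterpart to the plot. This is a sketch only; the helper name `rating_vars` is introduced here for illustration.

```{r}
library(alr4)  # provides the Rateprof data

data("Rateprof", package = "alr4")

# The five rating variables shown in the scatterplot matrix
rating_vars <- c("quality", "clarity", "helpfulness", "easiness", "raterInterest")

# Pairwise Pearson correlations, rounded for readability
round(cor(Rateprof[, rating_vars]), 2)
```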
Question 5:
a.
```{r}
data(student.survey)
ggplot(data = student.survey, aes(x = re, fill = pi)) +
  geom_bar(position = "fill")
```
```{r}
ggplot(data = student.survey, aes(x = tv, y = hi)) +
  geom_point()
```
b.
```{r}
# Ordered factors are converted to integer codes so they can be used in lm()
model1 <- lm(as.numeric(pi) ~ as.numeric(re),
             data = student.survey)
model2 <- lm(hi ~ tv, data = student.survey)
```
```{r}
stargazer(model1, model2, type = 'text',
          dep.var.labels = c('Pol. Ideology', 'HS GPA'),
          covariate.labels = c('Religiosity', 'Hours of TV')
          )
```
Religiosity is positively and statistically significantly (at the 0.01 significance level) associated with conservatism.
Hours of TV is negatively and statistically significantly (at the 0.05 significance level) associated with High School GPA. Watching an average of 1 more hour of TV per week is associated with a 0.018 decline in High School GPA.
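To put the TV coefficient in context, the slope can be scaled to a larger change in viewing time and paired with a confidence interval. A minimal sketch, refitting the same model under the illustrative name `fit_gpa`; the 10-hour comparison is an arbitrary choice made here, not part of the assignment.

```{r}
library(smss)  # provides the student.survey data

data("student.survey", package = "smss")

# Same specification as the HS GPA model: GPA regressed on weekly TV hours
fit_gpa <- lm(hi ~ tv, data = student.survey)

# 95% confidence interval for the intercept and slope
confint(fit_gpa, level = 0.95)

# Estimated GPA difference associated with 10 more hours of TV per week
10 * coef(fit_gpa)[["tv"]]
```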