hw3
challenge3
Jerin Jacob
Author: Jerin Jacob

Published: May 9, 2023

Code
library(readxl)
library(dplyr)
library(magrittr)
library(alr4)
library(smss)
library(ggplot2)
library(stargazer)
knitr::opts_chunk$set(echo = TRUE)
Code
data("UN11", package = "alr4")
head(UN11)
                region  group fertility   ppgdp lifeExpF pctUrban
Afghanistan       Asia  other     5.968   499.0    49.49       23
Albania         Europe  other     1.525  3677.2    80.40       53
Algeria         Africa africa     2.142  4473.0    75.00       67
Angola          Africa africa     5.135  4321.9    53.17       59
Anguilla     Caribbean  other     2.000 13750.1    81.10      100
Argentina   Latin Amer  other     2.172  9162.1    79.89       93

Question 1: a. The predictor is ppgdp and the response is fertility. b. Scatterplot of fertility against ppgdp:

Code
ggplot(data = UN11, aes(x = ppgdp, y = fertility)) +
  geom_point()

A straight line is not plausible unless and until some kind of data transformation is applied.

Code
ggplot(data = UN11, aes(x = log(ppgdp), y = log(fertility))) +
  geom_point()

Yes, on the log-log scale a straight line looks like a plausible mean function for simple linear regression. The relationship is negative: as the log of per-capita GDP increases, the log of fertility decreases.
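
One way to check the straight-line impression is to fit the regression on the log-log scale and overlay the least-squares line on the same scatterplot. This is only a sketch, assuming the alr4 and ggplot2 packages loaded above; the object names fit_loglog and p_loglog are introduced here for illustration and are not part of the original answer.

```r
library(alr4)
library(ggplot2)

data("UN11", package = "alr4")

# Simple linear regression on the log-log scale
fit_loglog <- lm(log(fertility) ~ log(ppgdp), data = UN11)
coef(fit_loglog)  # the slope on log(ppgdp) should come out negative

# Same scatterplot as above, with the fitted least-squares line added
p_loglog <- ggplot(data = UN11, aes(x = log(ppgdp), y = log(fertility))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p_loglog
```

If the points scatter evenly around the line with no visible curvature, the simple linear regression model is a reasonable description on this scale.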

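Question 2: a. The conversion divides each USD value by 1.33. How the slope changes depends on the role of the converted variable: if it is the response, the slope is also divided by 1.33; if it is the explanatory variable, the slope is multiplied by 1.33, because a one-pound step in x corresponds to a 1.33-dollar step.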

b. The correlation does not change when the unit of measurement changes, because it is a standardized measure. Every value is rescaled by the same positive factor.
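
The effect of a unit change on the slope and on the correlation can be checked numerically. The sketch below uses simulated data (all variable names and data-generating values are made up for illustration) and rescales a dollar-valued variable by 1.33, once as the predictor and once as the response.

```r
set.seed(1)

# Simulated data: income in USD as predictor, arbitrary response y
income_usd <- runif(100, min = 20000, max = 100000)
y <- 5 + 0.0002 * income_usd + rnorm(100)
income_gbp <- income_usd / 1.33  # convert dollars to pounds

slope_usd <- coef(lm(y ~ income_usd))[[2]]
slope_gbp <- coef(lm(y ~ income_gbp))[[2]]

# Converted variable as predictor: the slope is multiplied by 1.33
all.equal(slope_gbp, 1.33 * slope_usd)

# Converted variable as response: the slope is divided by 1.33
x <- rnorm(100)
price_usd <- 1000 + 50 * x + rnorm(100)
slope_resp_usd <- coef(lm(price_usd ~ x))[[2]]
slope_resp_gbp <- coef(lm(I(price_usd / 1.33) ~ x))[[2]]
all.equal(slope_resp_gbp, slope_resp_usd / 1.33)

# The correlation is the same in dollars and in pounds
all.equal(cor(income_usd, y), cor(income_gbp, y))
```

Each all.equal() call compares the two quantities up to floating-point tolerance.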

Question 3:

Code
data("water")
pairs(water)

Year appears to be largely unrelated to each of the other variables. The three variables starting with “O” (OPBPC, OPRC, OPSLAKE) appear to be correlated with each other: every panel pairing two of them shows a stronger dependence than the panels pairing an “O” variable with any other variable. The three variables starting with “A” (APMAM, APSAB, APSLAKE) form a second correlated group. BSAAM is more closely related to the “O” variables than to the “A” variables.

Question 4:

Code
data("Rateprof")
pairs(Rateprof[,c('quality', 'clarity', 'helpfulness', 'easiness', 'raterInterest')])

Quality, clarity and helpfulness are strongly correlated with one another. Easiness is fairly correlated with these three variables. raterInterest is also moderately correlated with them, although raters almost always report a fairly good level of interest in the subject. Overall, professors who do well on one of these variables tend to do well on the others too.
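
The visual impression from the scatterplot matrix can be backed up with a correlation matrix. This is a sketch, assuming the alr4 package is installed; the names rating_vars and cor_ratings are introduced here for illustration.

```r
library(alr4)

data("Rateprof", package = "alr4")

rating_vars <- c("quality", "clarity", "helpfulness", "easiness", "raterInterest")

# Pairwise Pearson correlations, rounded for readability
cor_ratings <- round(cor(Rateprof[, rating_vars], use = "complete.obs"), 2)
cor_ratings
```

Entries close to 1 among quality, clarity and helpfulness would match the tight, nearly linear panels in the plot.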

Question 5: a.

Code
data(student.survey)
ggplot(data = student.survey, aes(x = re, fill = pi)) +
    geom_bar(position = "fill")

Code
ggplot(data = student.survey, aes(x = tv, y = hi)) +
  geom_point() 

Code
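# Convert the factors pi and re to their integer level codes so they enter lm() as numeric scores
model1 <- lm(as.numeric(pi) ~ as.numeric(re),
             data = student.survey)

# Regress high school GPA (hi) on hours of TV watched per week (tv)
model2 <- lm(hi ~ tv, data = student.survey)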
Code
stargazer(model1, model2, type = 'text', 
          dep.var.labels = c('Pol. Ideology', 'HS GPA'),
          covariate.labels = c('Religiosity', 'Hours of TV')
          )

==========================================================
                                  Dependent variable:     
                              ----------------------------
                               Pol. Ideology     HS GPA   
                                    (1)            (2)    
----------------------------------------------------------
Religiosity                       0.970***                
                                  (0.179)                 
                                                          
Hours of TV                                     -0.018**  
                                                 (0.009)  
                                                          
Constant                          0.931**       3.441***  
                                  (0.425)        (0.085)  
                                                          
----------------------------------------------------------
Observations                         60            60     
R2                                 0.336          0.072   
Adjusted R2                        0.324          0.056   
Residual Std. Error (df = 58)      1.345          0.447   
F Statistic (df = 1; 58)         29.336***       4.471**  
==========================================================
Note:                          *p<0.1; **p<0.05; ***p<0.01

Religiosity is positively and statistically significantly (at the 0.01 significance level) associated with conservatism.
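
Because model1 wraps both variables in as.numeric(), its coefficient is read on the scale of integer level codes. A minimal sketch of that coding follows, using a made-up ordered factor rather than the survey data; the level labels are illustrative assumptions.

```r
# Hypothetical ordered factor; the labels are for illustration only
attendance <- factor(
  c("never", "every week", "occasionally", "never"),
  levels = c("never", "occasionally", "most weeks", "every week"),
  ordered = TRUE
)

# as.numeric() returns the position of each value in levels(), not a measured quantity
codes <- as.numeric(attendance)
codes  # 1 4 2 1
```

Under this coding, the slope of 0.970 means that a one-level step up in religiosity is associated with a 0.970-level step toward the conservative end of the ideology scale, treating adjacent levels as equally spaced.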

Hours of TV is negatively and statistically significantly (at the 0.05 significance level) associated with High School GPA. Watching an average of 1 more hour of TV per week is associated with a 0.018 decline in High School GPA.
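
The fitted equation for the second model can be read off the table: predicted GPA = 3.441 - 0.018 × hours of TV. A small sketch that evaluates it at a few values (the helper name predict_gpa and the chosen hour values are illustrative):

```r
# Coefficients copied from the regression table above
intercept <- 3.441
slope_tv <- -0.018

# Hypothetical helper: predicted high school GPA for a given number of weekly TV hours
predict_gpa <- function(tv_hours) intercept + slope_tv * tv_hours

predict_gpa(c(0, 10, 20))  # 3.441 3.261 3.081
```

Ten additional hours of TV per week therefore correspond to a predicted GPA that is lower by 0.18 points.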