Homework 3

hw3

challenge3

Jerin Jacob

Author

Jerin Jacob

Published

May 9, 2023

library(readxl)
library(dplyr)
library(magrittr)
library(alr4)
library(smss)
library(ggplot2)
library(stargazer)
knitr::opts_chunk$set(echo = TRUE)

data("UN11", package = "alr4")
head(UN11)

                region  group fertility   ppgdp lifeExpF pctUrban
Afghanistan       Asia  other     5.968   499.0    49.49       23
Albania         Europe  other     1.525  3677.2    80.40       53
Algeria         Africa africa     2.142  4473.0    75.00       67
Angola          Africa africa     5.135  4321.9    53.17       59
Anguilla     Caribbean  other     2.000 13750.1    81.10      100
Argentina   Latin Amer  other     2.172  9162.1    79.89       93

Question 1: a. The predictor is ppgdp and the response is fertility. b.

ggplot(data = UN11, aes(x = ppgdp, y = fertility)) +
  geom_point()

A straight-line mean function is not plausible for these data unless and until some kind of transformation is applied.

ggplot(data = UN11, aes(x = log(ppgdp), y = log(fertility))) +
  geom_point()

Yes, after taking logs of both variables the points fall roughly along a straight line, so a simple linear regression is plausible. The relationship is negative: as the log of ppgdp increases, the log of fertility decreases.
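As a rough check on the visual impression above, the log-log relationship can be fitted directly. This is a minimal sketch that assumes the alr4 package is installed; the object name `fit_loglog` is introduced here only for illustration, and natural logs are used, matching `log()` in the plot.

```r
# Load UN11 from alr4 (assumes the package is installed)
data("UN11", package = "alr4")

# Simple linear regression on the log-log scale (natural logs)
fit_loglog <- lm(log(fertility) ~ log(ppgdp), data = UN11)

# Intercept and slope; a negative slope matches the downward trend in the plot
coef(fit_loglog)

# Base-graphics version of the log-log scatterplot with the fitted line added
plot(log(fertility) ~ log(ppgdp), data = UN11)
abline(fit_loglog)
```

If the fitted line tracks the point cloud without obvious curvature, that supports treating the transformed relationship as linear.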

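Question 2: a. Converting from dollars to pounds divides each USD value by 1.33. With the converted variable as the response, the slope is therefore also divided by 1.33. (If the converted variable were the predictor instead, the slope would be multiplied by 1.33.)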

b. The correlation does not change when the unit of measurement changes, because it is a standardized measure. Every value is rescaled by the same constant factor.
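The effect of a unit change can be checked numerically. The sketch below uses simulated data, not data from the assignment; the variable names and the data-generating values are hypothetical, and only the 1.33 conversion factor comes from the text.

```r
set.seed(1)  # simulated data, for illustration only

usd_per_gbp <- 1.33                       # conversion factor from the text
x <- runif(50, 0, 10)                     # hypothetical second variable
income_usd <- 20000 + 3000 * x + rnorm(50, sd = 2000)
income_gbp <- income_usd / usd_per_gbp    # same incomes expressed in pounds

# Case 1: income is the response, so the slope is divided by 1.33
slope_usd <- coef(lm(income_usd ~ x))[["x"]]
slope_gbp <- coef(lm(income_gbp ~ x))[["x"]]
all.equal(slope_gbp, slope_usd / usd_per_gbp)

# Case 2: income is the predictor, so the slope is multiplied by 1.33
slope_pred_usd <- coef(lm(x ~ income_usd))[["income_usd"]]
slope_pred_gbp <- coef(lm(x ~ income_gbp))[["income_gbp"]]
all.equal(slope_pred_gbp, slope_pred_usd * usd_per_gbp)

# The correlation is the same in either currency
all.equal(cor(x, income_usd), cor(x, income_gbp))
```

Each `all.equal()` call compares the rescaled quantity with its expected value up to numerical tolerance.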

Question 3:

data("water")
pairs(water)

Year appears to be largely unrelated to each of the other variables. The three variables starting with “O” are correlated with one another: every plot involving two of them shows a stronger dependence than the plots pairing an “O” variable with the other variables. The three variables starting with “A” form a second correlated group. BSAAM is more closely related to the “O” variables than to the “A” variables.
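The groupings read off the scatterplot matrix can be cross-checked with pairwise correlations. This is a small sketch assuming the alr4 package is installed; `cor_water` is a name introduced here for illustration.

```r
# Load the water data from alr4 (assumes the package is installed)
data("water", package = "alr4")

# Pairwise Pearson correlations for all columns, rounded for readability
cor_water <- round(cor(water), 2)
cor_water

# Row for BSAAM: compare its correlations with the "O" and the "A" variables
cor_water["BSAAM", ]
```

Correlations only capture linear association, so they complement the plots rather than replace them.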

Question 4:

data("Rateprof")
pairs(Rateprof[,c('quality', 'clarity', 'helpfulness', 'easiness', 'raterInterest')])

Quality, clarity, and helpfulness are strongly correlated with one another. Easiness is fairly correlated with those three variables. raterInterest is also moderately correlated with them, although raters report a fairly good interest in the subject almost across the board. Overall, professors who do well on one of these variables tend to do well on the others too.
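To put numbers on "strong", "fair", and "moderate", the five rating variables can be summarized in a correlation matrix. This sketch assumes the alr4 package is installed; `rating_vars` and `cor_ratings` are illustrative names.

```r
# Load Rateprof from alr4 (assumes the package is installed)
data("Rateprof", package = "alr4")

# The same five columns used in the scatterplot matrix
rating_vars <- c("quality", "clarity", "helpfulness", "easiness", "raterInterest")

# Pairwise Pearson correlations, rounded for readability
cor_ratings <- round(cor(Rateprof[, rating_vars]), 2)
cor_ratings
```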

Question 5: a.

data(student.survey)
ggplot(data = student.survey, aes(x = re, fill = pi)) +
    geom_bar(position = "fill")

ggplot(data = student.survey, aes(x = tv, y = hi)) +
  geom_point()

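# Political ideology on religiosity; as.numeric() converts the factor levels to integer scores
model1 <- lm(as.numeric(pi) ~ as.numeric(re),
             data = student.survey)

# High school GPA on weekly hours of TV
model2 <- lm(hi ~ tv, data = student.survey)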

stargazer(model1, model2, type = 'text', 
          dep.var.labels = c('Pol. Ideology', 'HS GPA'),
          covariate.labels = c('Religiosity', 'Hours of TV')
          )


==========================================================
                                  Dependent variable:     
                              ----------------------------
                               Pol. Ideology     HS GPA   
                                    (1)            (2)    
----------------------------------------------------------
Religiosity                       0.970***                
                                  (0.179)                 
                                                          
Hours of TV                                     -0.018**  
                                                 (0.009)  
                                                          
Constant                          0.931**       3.441***  
                                  (0.425)        (0.085)  
                                                          
----------------------------------------------------------
Observations                         60            60     
R2                                 0.336          0.072   
Adjusted R2                        0.324          0.056   
Residual Std. Error (df = 58)      1.345          0.447   
F Statistic (df = 1; 58)         29.336***       4.471**  
==========================================================
Note:                          *p<0.1; **p<0.05; ***p<0.01

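Religiosity is positively and statistically significantly (at the 0.01 significance level) associated with conservatism. With both ordered factors coded as integer scores, each one-category increase in religiosity is associated with a 0.970-point higher score on the political ideology scale.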

Hours of TV are negatively and statistically significantly (at the 0.05 significance level) associated with high school GPA. Watching 1 more hour of TV per week on average is associated with a 0.018-point decline in high school GPA, although the model explains only about 7% of the variation in GPA (R2 = 0.072).
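Significance stars say little about how precisely each slope is estimated, so confidence intervals are a useful supplement. The sketch below refits the two models and assumes the smss package is installed; `confint()` is the standard base R accessor, and the 95% level is its default rather than a choice made in the original analysis.

```r
# Load the survey data from smss (assumes the package is installed)
data("student.survey", package = "smss")

# Refit the two models from the analysis
model1 <- lm(as.numeric(pi) ~ as.numeric(re), data = student.survey)
model2 <- lm(hi ~ tv, data = student.survey)

# 95% confidence intervals for the intercepts and slopes
confint(model1)
confint(model2)

# Share of variance in GPA explained by hours of TV
summary(model2)$r.squared
```

A narrow interval that excludes zero tells the same story as the stars, while also showing the range of slopes compatible with the data.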