Loading required package: car
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
Code
library(smss)
Warning: package 'smss' was built under R version 4.2.2
Code
library(dplyr)
library(ggplot2)
Question 1A
The predicted selling price is 107,296 dollars and the residual is 37,704 dollars, so the model underpredicted the actual selling price of 145,000 dollars.
Question 1B
With lot size held fixed, the predicted selling price increases by 53.8 dollars for each additional square foot of home size. A square foot of home adds far more to the price than a square foot of lot (53.8 versus 2.84 dollars), because finished living space is more valuable than empty lot space.
Question 1C
Lot size would have to increase by about 18.94 square feet (53.8 / 2.84) to have the same effect on the predicted price as a one-square-foot increase in home size.
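The arithmetic behind Questions 1A and 1C can be sketched in a few lines of R. The coefficients and the home and lot sizes (1240 and 18000 square feet) are the ones used in this homework's own calculation; the variable names are illustrative.

```r
# Coefficients of the fitted equation used in this homework
intercept <- -10536
b_home <- 53.8  # dollars per square foot of home size
b_lot <- 2.84   # dollars per square foot of lot size

# Prediction for a 1240 sq ft home on an 18000 sq ft lot
predicted <- intercept + b_home * 1240 + b_lot * 18000
actual <- 145000
residual <- actual - predicted

# Lot square feet needed to match one extra square foot of home
lot_per_home_sqft <- b_home / b_lot

predicted                    # 107296
residual                     # 37704
round(lot_per_home_sqft, 2)  # 18.94
```

A positive residual means the observed price is above the fitted value, which is why the prediction counts as an underestimate.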
Question 2
Code
data <- salary
model1 <- lm(salary ~ sex, data = data)
summary(model1)
Call:
lm(formula = salary ~ sex, data = data)
Residuals:
Min 1Q Median 3Q Max
-8602.8 -4296.6 -100.8 3513.1 16687.9
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24697 938 26.330 <2e-16 ***
sexFemale -3340 1808 -1.847 0.0706 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5782 on 50 degrees of freedom
Multiple R-squared: 0.0639, Adjusted R-squared: 0.04518
F-statistic: 3.413 on 1 and 50 DF, p-value: 0.0706
Code
model2 <- lm(salary ~ degree + rank + sex + year + ysdeg, data = data)
summary(model2)
Call:
lm(formula = salary ~ degree + rank + sex + year + ysdeg, data = data)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15746.05 800.18 19.678 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAssoc 5292.36 1145.40 4.621 3.22e-05 ***
rankProf 11118.76 1351.77 8.225 1.62e-10 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
data$rank <- factor(data$rank, levels = c("Prof", "Assoc", "Asst"))
model3 <- lm(salary ~ degree + rank + sex + year + ysdeg, data = data)
summary(model3)
Call:
lm(formula = salary ~ degree + rank + sex + year + ysdeg, data = data)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26864.81 1375.29 19.534 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAssoc -5826.40 1012.93 -5.752 7.28e-07 ***
rankAsst -11118.76 1351.77 -8.225 1.62e-10 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
Code
model4 <- lm(salary ~ degree + sex + year + ysdeg, data = data)
summary(model4)
Call:
lm(formula = salary ~ degree + sex + year + ysdeg, data = data)
Residuals:
Min 1Q Median 3Q Max
-8146.9 -2186.9 -491.5 2279.1 11186.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17183.57 1147.94 14.969 < 2e-16 ***
degreePhD -3299.35 1302.52 -2.533 0.014704 *
sexFemale -1286.54 1313.09 -0.980 0.332209
year 351.97 142.48 2.470 0.017185 *
ysdeg 339.40 80.62 4.210 0.000114 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared: 0.6312, Adjusted R-squared: 0.5998
F-statistic: 20.11 on 4 and 47 DF, p-value: 1.048e-09
Code
# dean is 1 when ysdeg is at most 15 and 0 otherwise
data$dean <- ifelse(data$ysdeg > 15, 0, 1)
model5 <- lm(data = data, salary ~ sex + year + dean + degree)
summary(model5)
Call:
lm(formula = salary ~ sex + year + dean + degree, data = data)
Residuals:
Min 1Q Median 3Q Max
-10740.1 -2550.1 -3.3 1942.4 11718.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22598.7 1792.1 12.610 < 2e-16 ***
sexFemale -523.5 1355.1 -0.386 0.701017
year 531.4 130.2 4.082 0.000172 ***
dean -4449.8 1347.2 -3.303 0.001834 **
degreePhD -1186.6 1191.2 -0.996 0.324267
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3958 on 47 degrees of freedom
Multiple R-squared: 0.5878, Adjusted R-squared: 0.5527
F-statistic: 16.75 on 4 and 47 DF, p-value: 1.338e-08
Question 2A
Model 1 does not show a statistically significant difference in mean salary by sex at the 0.05 level. Women earn 3,340 dollars less than men on average in this sample, but the null hypothesis of no difference cannot be rejected (p = 0.0706).
Question 2B
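Controlling for degree, rank, year, and ysdeg in model 2, the 95% confidence interval for the female minus male salary difference runs from -697.8183 to 3030.56452 dollars. Because the interval contains zero, the difference is not statistically significant.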
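As a rough check, the interval can be rebuilt by hand from the sexFemale row of the model 2 summary. This is a sketch using the rounded estimate and standard error printed above, so it should agree with the reported interval only up to rounding; the variable names are illustrative.

```r
# Estimate and standard error for sexFemale, copied from the model 2 summary
estimate <- 1166.37
std_error <- 925.57
df_resid <- 45  # residual degrees of freedom reported for model 2

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)
round(ci, 1)
```

When the fitted model object is available, `confint(model2)` returns the same interval without the rounding of the printed summary.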
Question 2C
Rank and years in current rank (year) are the only statistically significant predictors of salary in model 2. Relative to assistant professors, full professors earn about 11,119 dollars more and associate professors about 5,292 dollars more, and each additional year in current rank adds about 476 dollars. Degree, sex, and years since degree (ysdeg) are not statistically significant at the 0.05 level.
Question 2D
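Changing the baseline category from assistant to full professor flips the sign of the rank coefficients: relative to full professors, associate professors earn about 5,826 dollars less and assistant professors about 11,119 dollars less. The other coefficients and the overall fit (R-squared of 0.855) are unchanged, because this is the same model with a different reference level.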
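The releveled coefficients can be recovered from the model 2 output by simple arithmetic, which shows that models 2 and 3 describe the same fit. A minimal sketch, with the numbers copied from the summaries above and illustrative variable names:

```r
# Coefficients from model 2, where assistant professor is the baseline
intercept_asst <- 15746.05
assoc_vs_asst <- 5292.36
prof_vs_asst <- 11118.76

# Re-expressed with full professor as the baseline, as in model 3
intercept_prof <- intercept_asst + prof_vs_asst  # 26864.81
assoc_vs_prof <- assoc_vs_asst - prof_vs_asst    # -5826.4
asst_vs_prof <- -prof_vs_asst                    # -11118.76
```

These values match the intercept, rankAssoc, and rankAsst rows of the model 3 summary.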
Question 2E
Excluding rank makes degreePhD and ysdeg statistically significant, most likely because rank was previously absorbing their variance. Without rank in the model, these variables likely act as proxies for the higher pay that comes with higher rank. However, there is still no statistically significant evidence of a salary difference by sex (p = 0.332).
Question 2F
Model 5 provides some evidence that the dean has been making less generous offers than before: faculty with 15 or fewer years since their degree (dean = 1) earn about 4,450 dollars less, holding sex, year, and degree fixed (p = 0.0018). Multicollinearity can make it harder to draw inferences. I limited it by leaving ysdeg, from which dean is derived, and rank out of model 5.
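To illustrate why an indicator and the variable it is derived from should not share a model, here is a small sketch on simulated data (not the salary data). It computes a variance inflation factor by hand as 1 / (1 - R^2); the names `ysdeg_sim` and `dean_sim` and the uniform distribution are assumptions made only for this example.

```r
set.seed(1)
n <- 200

# Simulated years since degree, and an indicator cut at 15 years
ysdeg_sim <- runif(n, min = 0, max = 35)
dean_sim <- ifelse(ysdeg_sim > 15, 0, 1)

# VIF of a predictor = 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on the other predictors in the model
r2 <- summary(lm(ysdeg_sim ~ dean_sim))$r.squared
vif_ysdeg <- 1 / (1 - r2)
vif_ysdeg
```

A VIF well above 1 means the standard error of that coefficient is inflated relative to a model where the predictors are unrelated. On the real data, `car::vif()` applied to a fitted model reports the same kind of diagnostic.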
Question 3
Code
# Load the data set explicitly; attaching smss did not make the object available
data("house.selling.price", package = "smss")
summary(house.selling.price)
data2 <- house.selling.price
model1 <- lm(data = data2, Price ~ Size + New)
summary(model1)
Question 3A
Both the size of the home and whether it is new are associated with a statistically significant increase in selling price.
Question 3B
New Home Price = 116.132(size in square feet) + 57736.283(if New) - 40230.867
Not New Home Price = 116.132(size in square feet) - 40230.867
Question 3C
For a home of 3000 square feet:
New Home Prediction = 365901.416
Not New Home Prediction = 308165.133
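The two predictions follow directly from the additive equation in Question 3B. A minimal sketch, where `predict_additive` is an illustrative helper name and the coefficients are the ones quoted above:

```r
# Additive model: Price = -40230.867 + 116.132 * Size + 57736.283 * New
predict_additive <- function(size, new) {
  -40230.867 + 116.132 * size + 57736.283 * new
}

new_home <- predict_additive(3000, new = 1)      # about 365901.4
not_new_home <- predict_additive(3000, new = 0)  # about 308165.1

# In the additive model the gap is the New coefficient at every size
new_home - not_new_home
```

With a fitted model object, `predict(model1, newdata = data.frame(Size = 3000, New = c(1, 0)))` would give the same predictions without retyping coefficients.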
Question 3D
The interaction between Size and New is positive: each additional square foot adds about 61.92 dollars more to the predicted price of a new home than to one that is not new. With the interaction in the model, the main effect of New is no longer statistically significant.
Question 3E
New home Prediction = 104.438(size in square feet) - 78527.502(if new) + 61.916(new times size) - 22227.808
Not new home Prediction = 104.438(size in square feet) - 22227.808
Question 3F
For a home of 3000 square feet:
New Home Prediction = 398306.69
Not New Home Prediction = 291086.192
Question 3G
For a home of 1500 square feet:
New Home Prediction = 148775.69
Not New Home Prediction = 134429.192
As the home gets smaller, being new adds less to the predicted selling price: the gap between new and not new homes shrinks from about 107,220 dollars at 3000 square feet to about 14,346 dollars at 1500 square feet.
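The way the premium for a new home depends on size can be made explicit with a short sketch of the interaction equation from Question 3E. The helper name `predict_interaction` is illustrative, and the coefficients are those quoted above.

```r
# Interaction model:
# Price = -22227.808 + 104.438 * Size - 78527.502 * New + 61.916 * Size * New
predict_interaction <- function(size, new) {
  -22227.808 + 104.438 * size - 78527.502 * new + 61.916 * size * new
}

sizes <- c(1500, 3000)

# Premium for a new home at each size: -78527.502 + 61.916 * Size
gap <- predict_interaction(sizes, new = 1) - predict_interaction(sizes, new = 0)
names(gap) <- sizes
gap

# Size at which the predicted premium for a new home is zero
break_even <- 78527.502 / 61.916
break_even
```

Below the break-even size (a little under 1270 square feet) the equation predicts a lower price for a new home, so the model should be read with the range of observed sizes in mind.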
Question 3H
I prefer the interaction model. It lets the premium for a new home vary with size, rather than adding a flat amount of more than 50,000 dollars to every new home regardless of size. That graded adjustment is a more realistic description than a single binary shift.
Source Code
---
title: "Homework 4"
author: "Donny Snyder"
description: "Homework 4 Submission"
date: "11/14/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw4
  - regression
---

```{r}
library(tidyverse)
library(alr4)
library(smss)
library(dplyr)
library(ggplot2)
```

# Question 1

```{r, echo=T}
predictSell <- -10536 + (53.8 * 1240) + (2.84 * 18000)
realSell <- 145000
residual <- realSell - predictSell
ratioLottoHome <- 53.8 / 2.84
```

# Question 1A

The predicted selling price is 107,296 dollars and the residual is 37,704 dollars, so the model underpredicted the actual selling price of 145,000 dollars.

# Question 1B

With lot size held fixed, the predicted selling price increases by 53.8 dollars for each additional square foot of home size. A square foot of home adds far more to the price than a square foot of lot (53.8 versus 2.84 dollars), because finished living space is more valuable than empty lot space.

# Question 1C

Lot size would have to increase by about 18.94 square feet (53.8 / 2.84) to have the same effect on the predicted price as a one-square-foot increase in home size.

# Question 2

```{r, echo=T}
data <- salary
model1 <- lm(salary ~ sex, data = data)
summary(model1)
model2 <- lm(salary ~ degree + rank + sex + year + ysdeg, data = data)
summary(model2)
confint(model2)
data$rank <- factor(data$rank, levels = c("Prof", "Assoc", "Asst"))
model3 <- lm(salary ~ degree + rank + sex + year + ysdeg, data = data)
summary(model3)
model4 <- lm(salary ~ degree + sex + year + ysdeg, data = data)
summary(model4)
# dean is 1 when ysdeg is at most 15 and 0 otherwise
data$dean <- ifelse(data$ysdeg > 15, 0, 1)
model5 <- lm(data = data, salary ~ sex + year + dean + degree)
summary(model5)
```

# Question 2A

Model 1 does not show a statistically significant difference in mean salary by sex at the 0.05 level. Women earn 3,340 dollars less than men on average in this sample, but the null hypothesis of no difference cannot be rejected (p = 0.0706).

# Question 2B

Controlling for degree, rank, year, and ysdeg in model 2, the 95% confidence interval for the female minus male salary difference runs from -697.8183 to 3030.56452 dollars. Because the interval contains zero, the difference is not statistically significant.

# Question 2C

Rank and years in current rank (year) are the only statistically significant predictors of salary in model 2. Relative to assistant professors, full professors earn about 11,119 dollars more and associate professors about 5,292 dollars more, and each additional year in current rank adds about 476 dollars. Degree, sex, and years since degree (ysdeg) are not statistically significant at the 0.05 level.

# Question 2D

Changing the baseline category from assistant to full professor flips the sign of the rank coefficients: relative to full professors, associate professors earn about 5,826 dollars less and assistant professors about 11,119 dollars less. The other coefficients and the overall fit (R-squared of 0.855) are unchanged, because this is the same model with a different reference level.

# Question 2E

Excluding rank makes degreePhD and ysdeg statistically significant, most likely because rank was previously absorbing their variance. Without rank in the model, these variables likely act as proxies for the higher pay that comes with higher rank. However, there is still no statistically significant evidence of a salary difference by sex (p = 0.332).

# Question 2F

Model 5 provides some evidence that the dean has been making less generous offers than before: faculty with 15 or fewer years since their degree (dean = 1) earn about 4,450 dollars less, holding sex, year, and degree fixed (p = 0.0018). Multicollinearity can make it harder to draw inferences. I limited it by leaving ysdeg, from which dean is derived, and rank out of model 5.

# Question 3

```{r, echo=T}
# Load the data set explicitly; attaching smss did not make the object available
data("house.selling.price", package = "smss")
summary(house.selling.price)
data2 <- house.selling.price
model1 <- lm(data = data2, Price ~ Size + New)
summary(model1)
predictNew <- (116.132 * 3000) + 57736.283 * 1 - 40230.867
predictNotNew <- (116.132 * 3000) + 57736.283 * 0 - 40230.867
model2 <- lm(data = data2, Price ~ Size + New + Size * New)
summary(model2)
predictNew2 <- 104.438 * 3000 - 78527.502 + 61.916 * 3000 - 22227.808
predictNotNew2 <- 104.438 * 3000 - 22227.808
predictNew3 <- 104.438 * 1500 - 78527.502 + 61.916 * 1500 - 22227.808
predictNotNew3 <- 104.438 * 1500 - 22227.808
```

# Question 3A

Both the size of the home and whether it is new are associated with a statistically significant increase in selling price.

# Question 3B

New Home Price = 116.132(size in square feet) + 57736.283(if New) - 40230.867

Not New Home Price = 116.132(size in square feet) - 40230.867

# Question 3C

For a home of 3000 square feet:

New Home Prediction = 365901.416

Not New Home Prediction = 308165.133

# Question 3D

The interaction between Size and New is positive: each additional square foot adds about 61.92 dollars more to the predicted price of a new home than to one that is not new. With the interaction in the model, the main effect of New is no longer statistically significant.

# Question 3E

New home Prediction = 104.438(size in square feet) - 78527.502(if new) + 61.916(new times size) - 22227.808

Not new home Prediction = 104.438(size in square feet) - 22227.808

# Question 3F

For a home of 3000 square feet:

New Home Prediction = 398306.69

Not New Home Prediction = 291086.192

# Question 3G

For a home of 1500 square feet:

New Home Prediction = 148775.69

Not New Home Prediction = 134429.192

As the home gets smaller, being new adds less to the predicted selling price: the gap between new and not new homes shrinks from about 107,220 dollars at 3000 square feet to about 14,346 dollars at 1500 square feet.

# Question 3H

I prefer the interaction model. It lets the premium for a new home vary with size, rather than adding a flat amount of more than 50,000 dollars to every new home regardless of size. That graded adjustment is a more realistic description than a single binary shift.