---
title: "HW4"
author: "Steph Roberts"
description: "Homework 4"
date: "11/14/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- hw4
- Steph Roberts
---
### Homework 4
```{r}
#| label: setup
#| warning: false
library(tidyverse)
library(alr4)
library(smss)
knitr::opts_chunk$set(echo = TRUE)
```
## Question 1
For recent data in Jacksonville, Florida, on y = selling price of home (in dollars), x1 = size of home (in square feet), and x2 = lot size (in square feet), the prediction equation is
ŷ = −10,536 + 53.8x1 + 2.84x2.
### 1A
A particular home of 1240 square feet on a lot of 18,000 square feet sold for $145,000. Find the predicted selling price and the residual, and interpret.
```{r}
est <- (53.8*1240) + (2.84*18000)-10536
print(paste("The predicted selling price is", est))
```
```{r}
res <- 145000 - est
print(paste("The residual is", res))
```
The residual of $37,704 means this home sold for $37,704 more than the model predicts for a home of its size and lot size.
### 1B
For fixed lot size, how much is the house selling price predicted to increase for each square-foot increase in home size? Why?
The coefficient on home size is 53.8, so with lot size held fixed, the predicted selling price increases by $53.80 for each additional square foot of home size, because that is the slope attached to x1 in the prediction equation.
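This can be verified directly from the prediction equation; a quick check (the `yhat` helper below is just an illustration, not part of the textbook problem), holding lot size fixed at 18,000 square feet:

```{r}
# Prediction equation as a function; compare two homes 1 sq ft apart in size
yhat <- function(x1, x2) -10536 + 53.8 * x1 + 2.84 * x2
yhat(1241, 18000) - yhat(1240, 18000)  # 53.8
```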
### 1C
According to this prediction equation, for fixed home size, how much would lot size need to increase to have the same impact as a one-square-foot increase in home size?
```{r}
sfcoef <- 53.8
lotcoef <- 2.84
lotinc <- sfcoef / lotcoef
print(paste("The lot size would need to increase by about", round(lotinc, 2), "square feet for every 1 square foot increase in home size"))
```
## Question 2
(Data file: salary in alr4 R package). The data file concerns salary and other characteristics of all faculty in a small Midwestern college collected in the early 1980s for presentation in legal proceedings for which discrimination against women in salary was at issue. All persons in the data hold tenured or tenure track positions; temporary faculty are not included. The variables include degree, a factor with levels PhD and MS; rank, a factor with levels Asst, Assoc, and Prof; sex, a factor with levels Male and Female; Year, years in current rank; ysdeg, years since highest degree, and salary, academic year salary in dollars.
### 2A
Test the hypothesis that the mean salary for men and women is the same, without regard to any other variable but sex. Explain your findings.
```{r}
data("salary")
dim(salary)
head(salary)
```
```{r}
t.test(salary ~ sex, data=salary)
```
The null hypothesis is that males and females have the same mean salary. With a p-value of 0.09, we fail to reject the null hypothesis at the usual significance level of alpha = 0.05. Based on these data, there is not enough evidence of a difference between the true mean salaries of the two groups.
### 2B
Run a multiple linear regression with salary as the outcome variable and everything else as predictors, including sex. Assuming no interactions between sex and the other predictors, obtain a 95% confidence interval for the difference in salary between males and females.
```{r}
allvar <- lm(salary ~ ., data = salary)
summary(allvar)
```
Consistent with the t-test above, sex is not a statistically significant predictor of salary once the other variables are controlled for (p = 0.214).
```{r}
confint(allvar)
```
The 95% confidence interval for the female-male salary difference is -$697.82 to $3,030.56. Because this interval contains zero, the difference is not statistically significant: females may earn either more or less than males even after the other variables are accounted for.
### 2C
Interpret your finding for each predictor variable; discuss (a) statistical significance, (b) interpretation of the coefficient / slope in relation to the outcome variable and other variables
Considering alpha = 0.05, we can interpret each predictor as follows.
**Degree:** not significant; its coefficient of 1388.61 means those with PhDs are predicted to earn an average of $1,388.61 more than those with a master's, holding the other variables constant.
**Rank (Associate):** significant; the coefficient of 5292.36 means an Associate is predicted to earn an average of $5,292.36 more than an Assistant, all else equal.
**Rank (Professor):** significant; the coefficient of 11118.76 means a Professor is predicted to earn an average of $11,118.76 more than an Assistant, all else equal.
**Sex:** not significant; the coefficient of 1166.37 means females are predicted to earn an average of $1,166.37 more than males, all else equal, though this difference is not distinguishable from zero.
**Year:** significant; each additional year in rank is associated with an average salary increase of $476.31.
**Years since degree:** not significant; each additional year since the highest degree is associated with an average salary decrease of $124.57.
### 2D
Change the baseline category for the rank variable. Interpret the coefficients related to rank again.
```{r}
table(salary$rank)
```
```{r}
salary$rank <- relevel(salary$rank, ref = "Assoc")
mod_rank <- lm(salary ~ ., data = salary)
summary(mod_rank)
```
Changing the baseline of the rank variable changes the output because the model now answers a different question: the rank coefficients are measured relative to Associate rather than the default Assistant. Here, moving from Associate to Assistant is associated with an average salary change of -$5,292.36, and moving from Associate to Professor with an average change of +$5,826.40.
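As a sanity check, the Associate-baseline coefficients are simple differences of the Assistant-baseline estimates reported in 2B; computing them from the printed values:

```{r}
# Asst-baseline model: rankAssoc = 5292.36, rankProf = 11118.76
# Assoc-baseline model should give rankAsst = -5292.36 and
# rankProf = 11118.76 - 5292.36 = 5826.40, matching the summary above
c(rankAsst = -5292.36, rankProf = 11118.76 - 5292.36)
```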
### 2E
Finkelstein (1980), in a discussion of the use of regression in discrimination cases, wrote, “[a] variable may reflect a position or status bestowed by the employer, in which case if there is discrimination in the award of the position or status, the variable may be ‘tainted.’ ” Thus, for example, if discrimination is at work in promotion of faculty to higher ranks, using rank to adjust salaries before comparing the sexes may not be acceptable to the courts. Exclude the variable rank, refit, and summarize how your findings changed, if they did.
```{r}
no_rank <- lm(salary ~ degree + sex + year + ysdeg, data = salary)
summary(no_rank)
```
Excluding rank changes things noticeably. Now, holding a PhD is associated with a salary about $3,299 lower on average than a master's degree. Removing rank also affects the sex variable: it is still not significant, but its coefficient of -1286.54 suggests females earn less when rank is no longer controlled for. Years in rank is less significant than before, while years since degree becomes highly significant.
However, with an R-squared of 0.63 compared with the earlier 0.86, this model fits the data considerably worse.
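Because the no-rank model is nested in the full model, the loss of fit from dropping rank can also be tested formally; a sketch using the two model objects fitted above (`allvar` from 2B, `no_rank` from here):

```{r}
# Partial F-test: does removing rank significantly worsen the fit?
anova(no_rank, allvar)
```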
### 2F
Everyone in this dataset was hired the year they earned their highest degree. It is also known that a new Dean was appointed 15 years ago, and everyone in the dataset who earned their highest degree 15 years ago or less than that has been hired by the new Dean. Some people have argued that the new Dean has been making offers that are a lot more generous to newly hired faculty than the previous one and that this might explain some of the variation in Salary. Create a new variable that would allow you to test this hypothesis and run another multiple regression model to test this. Select variables carefully to make sure there is no multicollinearity. Explain why multicollinearity would be a concern in this case and how you avoided it. Do you find support for the hypothesis that the people hired by the new Dean are making higher than those that were not?
```{r}
# Create new variable
salary <- salary %>%
  mutate(new = case_when(ysdeg > 15 ~ "no",
                         ysdeg <= 15 ~ "yes"))
new_hire <- lm(salary ~ degree + sex + rank + year + new, data = salary)
summary(new_hire)
```
To avoid multicollinearity, I excluded ysdeg, since the "new" variable is constructed directly from it (ysdeg <= 15) and the two would carry largely the same information. With this modification, R-squared remains high at 0.86. Rank and years in rank are still significant. Interestingly, "new" is also significant: its coefficient of 2163.46 suggests that faculty hired by the new Dean earn an average of $2,163.46 more, all else equal. This supports the hypothesis that those hired by the new Dean received better offers.
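The multicollinearity concern can be made concrete: because new is a deterministic function of ysdeg, the two predictors are strongly correlated. A quick check of that dependence (the point-biserial correlation below is just an illustration):

```{r}
# 'new' is defined entirely by ysdeg (<= 15), so the two overlap heavily
cor(salary$ysdeg, as.numeric(salary$new == "yes"))
```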
## Question 3
(Data file: house.selling.price in smss R package)
### 3A
Using the house.selling.price data, run and report regression results modeling y = selling price (in dollars) in terms of size of home (in square feet) and whether the home is new (1 = yes; 0 = no). In particular, for each variable; discuss statistical significance and interpret the meaning of the coefficient.
```{r}
data("house.selling.price")
house <- house.selling.price
head(house)
```
```{r}
size_new <- lm(Price ~ Size + New, data = house)
summary(size_new)
```
Both the size of the house and whether it is new are statistically significant predictors of selling price. Holding newness fixed, each additional square foot of house size increases the predicted price by $116.13. A new home is predicted to sell for $57,736.28 more, on average, than a comparable home that is not new.
### 3B
Report and interpret the prediction equation, and form separate equations relating selling price to size for new and for not new homes.
The prediction equation for the size_new model is ŷ = -40,230.867 + 116.132x1 + 57,736.283x2, where x1 = size and x2 = new.
The prediction equation for new homes (x2 = 1) is ŷ = 17,505.416 + 116.132x1.
The prediction equation for not-new homes (x2 = 0) is ŷ = -40,230.867 + 116.132x1.
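Rather than transcribing coefficients by hand, the two lines can be pulled from the fitted `size_new` model; a sketch:

```{r}
b <- coef(size_new)
# Not-new homes (New = 0): intercept and slope as estimated
c(intercept = unname(b["(Intercept)"]), slope = unname(b["Size"]))
# New homes (New = 1): intercept shifts up by the New coefficient
c(intercept = unname(b["(Intercept)"] + b["New"]), slope = unname(b["Size"]))
```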
### 3C
Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.
```{r}
print(paste("The predicted selling price for a home of 3000 square feet that is NEW is $",-40230.867 +116.132*3000 +57736.283))
print(paste("The predicted selling price for a home of 3000 square feet that is NOT NEW is $",-40230.867 +116.132*3000))
```
### 3D
Fit another model, this time with an interaction term allowing interaction between size and new, and report the regression results
```{r}
ia <- lm(Price ~ Size*New, data = house)
summary(ia)
```
The R-squared of the interaction model (0.744) is slightly higher than the additive model's (0.723), suggesting it may fit better. The interaction between size and "newness" is significant: for new homes, each additional square foot adds an extra $61.92 on top of the base slope of $104.44 per square foot.
### 3E
Report the lines relating the predicted selling price to the size for homes that are (i) new, (ii) not new.
The prediction equation is ŷ = -22,227.808 + 104.438x1 - 78,527.502x2 + 61.916x1x2, where x1 = size and x2 = new.
For new homes (x2 = 1), this simplifies to ŷ = -100,755.310 + 166.354x1.
For not-new homes (x2 = 0), it is ŷ = -22,227.808 + 104.438x1.
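Collecting terms confirms the new-home line: its intercept is the base intercept plus the New shift, and its slope is the Size slope plus the interaction:

```{r}
c(intercept_new = -22227.808 + (-78527.502),  # -100755.31
  slope_new     = 104.438 + 61.916)           # 166.354
```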
### 3F
Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.
```{r}
print(paste("The predicted selling price for a home of 3000 square feet that is NEW is $", -22227.808 + (104.438 * 3000) - (78527.502 * 1) + (61.916 * 3000 * 1)))
print(paste("The predicted selling price for a home of 3000 square feet that is NOT NEW is $",-22227.808 + (104.438 * 3000)))
```
### 3G
Find the predicted selling price for a home of 1500 square feet that is (i) new, (ii) not new. Comparing to (F), explain how the difference in predicted selling prices changes as the size of home increases.
```{r}
print(paste("The predicted selling price for a home of 1500 square feet that is NEW is $", -22227.808 + (104.438 * 1500) - (78527.502 * 1) + (61.916 * 1500 * 1)))
print(paste("The predicted selling price for a home of 1500 square feet that is NOT NEW is $",-22227.808 + (104.438 * 1500)))
```
The difference is smaller for smaller homes because the interaction term makes the new-home premium grow with size: the predicted new-minus-not-new gap is -$78,527.50 plus $61.92 per square foot. At 1,500 square feet the gap is about $14,346, while at 3,000 square feet it is about $107,220. The bigger the house, the more being new adds to the price.
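The widening gap is the interaction at work: the predicted new-minus-not-new difference at size x1 is -78,527.502 + 61.916x1, which can be evaluated at both sizes:

```{r}
gap <- function(x1) -78527.502 + 61.916 * x1
c(at_1500 = gap(1500), at_3000 = gap(3000))  # ~14346.50 and ~107220.50
```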
### 3H
Do you think the model with interaction or the one without it represents the relationship of size and new to the outcome price? What makes you prefer one model over another?
I think the model with the interaction better represents the relationship of size and newness to price, for two reasons. First, its fit, as seen in the higher R-squared, is slightly better than that of the non-interaction model. Second, it makes intuitive sense that a larger new house, with more new materials (lumber, counters, lights), would command a higher premium.
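This preference can also be checked formally: since the additive model is nested in the interaction model, an F-test on the two fits above asks whether the Size:New term significantly improves the fit:

```{r}
# F-test comparing the additive (size_new) and interaction (ia) models
anova(size_new, ia)
```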