Homework 4

hw4

linear regression

emma_narkewicz

Emma Narkewicz HW4

Author

Emma Narkewicz

Published

May 1, 2023

Question 1

For recent data in Jacksonville, Florida, on y = selling price of home (in dollars), x1 = size of home (in square feet), and x2 = lot size (in square feet), the prediction equation is ŷ = −10,536 + 53.8x1 + 2.84x2.

A

A particular home of 1240 square feet on a lot of 18,000 square feet sold for $145,000. Find the predicted selling price and the residual, and interpret.

To solve for the predicted price, I plugged in x1 = 1,240 & x2 = 18,000

Code

#Predicted Price
pp <- -10536 + (53.8 * 1240) + (2.84 * 18000)  
pp

[1] 107296

From the equation, the predicted selling price of the house is $107,296

The residual can be calculated using the equation:

Residual = (actual y-value) - (predicted y value)

Code

#Residual

R = 145000 - 107296
R

[1] 37704

The residual of positive $37,704 indicates that the house sold for more than the model predicted, meaning the model under-predicted the selling price.

B

For fixed lot size, how much is the house selling price predicted to increase for each square- foot increase in home size? Why?

Based on the prediction equation for a fixed lot size (x2), the price of the house is expected to increase $53.8 dollars for each square foot of house size. This is because with a fixed lot size (x2) t the 53.8x1 in the prediction equation explains the increase in house price per each square foot of house size

ŷ = −10,536 + 53.8x1 + 2.84x2.

C

According to this prediction equation, for fixed home size, how much would lot size need to increase to have the same impact as a one-square-foot increase in home size?

Home size = x1 = 1 Lot size = x2

53.81 (x1) = 2.84x2 53.8 = 2.84x2

Code

53.8/2.84

[1] 18.94366

Based on the prediction equation, the lot size would need to increase by 18.94 square feet to have the same impact as a 1-square-foot increase in home size.

Question 2

(Data file: salary in alr4 R package). The data file concerns salary and other characteristics of all faculty in a small Midwestern college collected in the early 1980s for presentation in legal proceedings for which discrimination against women in salary was at issue. All persons in the data hold tenured or tenure track positions; temporary faculty are not included. The variables include degree, a factor with levels PhD and MS; rank, a factor with levels Asst, Assoc, and Prof; sex, a factor with levels Male and Female; Year, years in current rank; ysdeg, years since highest degree, and salary, academic year salary in dollars.

A

Code

library(alr4)

Loading required package: car

Loading required package: carData

Loading required package: effects

lattice theme set by effectsTheme()
See ?effectsTheme for details.

Code

head(salary)

   degree rank    sex year ysdeg salary
1 Masters Prof   Male   25    35  36350
2 Masters Prof   Male   13    22  35350
3 Masters Prof   Male   10    23  28200
4 Masters Prof Female    7    27  26775
5     PhD Prof   Male   19    30  33696
6 Masters Prof   Male   16    21  28516

Code

summary(salary)

     degree      rank        sex          year            ysdeg      
 Masters:34   Asst :18   Male  :38   Min.   : 0.000   Min.   : 1.00  
 PhD    :18   Assoc:14   Female:14   1st Qu.: 3.000   1st Qu.: 6.75  
              Prof :20               Median : 7.000   Median :15.50  
                                     Mean   : 7.481   Mean   :16.12  
                                     3rd Qu.:11.000   3rd Qu.:23.25  
                                     Max.   :25.000   Max.   :35.00  
     salary     
 Min.   :15000  
 1st Qu.:18247  
 Median :23719  
 Mean   :23798  
 3rd Qu.:27258  
 Max.   :38045

Test the hypothesis that the mean salary for men and women is the same, without regard to any other variable but sex. Explain your findings.

To test the hypothesis if the mean salary for men and women is the same, I created a linear regression model for salary with only sex as the explanatory variable.

The null hypothesis is that mean salary men and women is the same - H0: μm = μw

The alternative hypothesis is that the mean salary of men and women is not the same - Ha: μm ≠ μw

Code

##Hypothesis testing with linear regression only
summary(lm(salary ~ sex, data = salary))


Call:
lm(formula = salary ~ sex, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-8602.8 -4296.6  -100.8  3513.1 16687.9 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    24697        938  26.330   <2e-16 ***
sexFemale      -3340       1808  -1.847   0.0706 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5782 on 50 degrees of freedom
Multiple R-squared:  0.0639,    Adjusted R-squared:  0.04518 
F-statistic: 3.413 on 1 and 50 DF,  p-value: 0.0706

The resulting coefficient for the sexFemale explanatory variable is -3340, suggesting Female staff make on average $3,340 less than Male professors. In the linear regression model, sexFemale has a p-value of 0.0706 meaning at the 0.05 level we fail to reject the null hypothesis that there is no difference in the mean salaries for men and women. We could however reject the null hypothesis at the 0.1 significance level.

B

Run a multiple linear regression with salary as the outcome variable and everything else as predictors, including sex. Assuming no interactions between sex and the other predictors, obtain a 95% confidence interval for the difference in salary between males and females.

Code

##95% confidence interval
lm(salary ~ degree + rank + sex + ysdeg + year, data = salary) |> confint()

                 2.5 %      97.5 %
(Intercept) 14134.4059 17357.68946
degreePhD    -663.2482  3440.47485
rankAssoc    2985.4107  7599.31080
rankProf     8396.1546 13841.37340
sexFemale    -697.8183  3030.56452
ysdeg        -280.6397    31.49105
year          285.1433   667.47476

The 95% confidence interval for the sexFemale is (-663, 3340) which can be interpreted as women making between $663 less or $3340 more than their male colleagues.

Compared to to the coefficient of -3340 in the model with only sex as an explanatory variable, this suggests that controlling for rank, degree, years since degree, and years of experience explains some of the difference between male and female salaries.

C

Interpret your finding for each predictor variable; discuss (a) statistical significance, (b) interpretation of the coefficient / slope in relation to the outcome variable and other variables

Code

summary (lm(salary ~ degree + rank + sex + ysdeg + year, data = salary))


Call:
lm(formula = salary ~ degree + rank + sex + ysdeg + year, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-4045.2 -1094.7  -361.5   813.2  9193.1 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15746.05     800.18  19.678  < 2e-16 ***
degreePhD    1388.61    1018.75   1.363    0.180    
rankAssoc    5292.36    1145.40   4.621 3.22e-05 ***
rankProf    11118.76    1351.77   8.225 1.62e-10 ***
sexFemale    1166.37     925.57   1.260    0.214    
ysdeg        -124.57      77.49  -1.608    0.115    
year          476.31      94.91   5.018 8.65e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared:  0.855, Adjusted R-squared:  0.8357 
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16

degree PhD has a p=value of 0.180, meaning it is not statistically significant at the 0.1 or 0.05 level. It has a positive coefficient of 1388.61 suggesting that that having a PhD increases a person’s salary by $1388 over having a Master’s degree. This is the 3rd largest coefficient of any of the explanatory variables in the model.
rankAssoc has as p-value of 3.22 * e^-05, meaning it a statistically significant explanatory variable at the 0.001 level. It has a positive coefficient of 5293.36 suggesting that being an associate professor results in $5292 larger salary than an Assistant professor. This is the 2nd largest coefficient of any of the explanatory variables in the model.
rankProf has a p-value of 1.62 * e^-10, meaning it is a statistically significant explanatory variable at the 0.001 level. It has a positive coefficient of 11,118, suggesting that having a rank of full professorship results in a salary of $11,118 more than an Assistant professor. This is the largest coefficient of any of the explanatory variables in the model, suggesting it is responsible for the largest change in salary of any variable in the model.
sexFemale has a p-value of 0.214, meaning it is not statistically significant at the 0.1 or 0.05 level. The coefficient of 1166 suggests that being female increases salary by $1166, which is surprising given the well-known phenomenon of the wage gap.
ysdeg has a p-value of 0.115, meaning it is not statistically significant at the 0.1 or 0.05 level. The coefficient of -124 suggests that every year since getting a degree, an individual’s salary decreases by $124.
year has a p-value of 8.65 * e^-06, meaning it is statistically significant at the 0.001 level. The coefficient of 476 suggests that that every year someone is in their job as a professor, their salary increases by $476.

D

Change the baseline category for the rank variable. Interpret the coefficients related to rank again.

Code

#Relevel 
salary$rank <- relevel(salary$rank, ref = 'Prof')
summary (lm(salary ~ degree + rank + sex + ysdeg + year, data = salary))


Call:
lm(formula = salary ~ degree + rank + sex + ysdeg + year, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-4045.2 -1094.7  -361.5   813.2  9193.1 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  26864.81    1375.29  19.534  < 2e-16 ***
degreePhD     1388.61    1018.75   1.363    0.180    
rankAsst    -11118.76    1351.77  -8.225 1.62e-10 ***
rankAssoc    -5826.40    1012.93  -5.752 7.28e-07 ***
sexFemale     1166.37     925.57   1.260    0.214    
ysdeg         -124.57      77.49  -1.608    0.115    
year           476.31      94.91   5.018 8.65e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared:  0.855, Adjusted R-squared:  0.8357 
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16

After re-leveling the rank to Prof instead of Assistant professor and re-running the linear regression model, rankAsst now has a coefficient of -11118 and rankAssociate has a coefficient of -5826. This means an Assistant professor makes $11,118 less than a full professsor and a an Associate professor makes $5,826 less than a full professors. These are the same coefficients for the rankProf and RankAssoc in the previous model, expect negative, as they now represent the distance from the top rank (professor) as opposed to the lowest rank (assistant).

E

Finkelstein (1980), in a discussion of the use of regression in discrimination cases, wrote, “a variable may reflect a position or status bestowed by the employer, in which case if there is discrimination in the award of the position or status, the variable may be ‘tainted.’” Thus, for example, if discrimination is at work in promotion of faculty to higher ranks, using rank to adjust salaries before comparing the sexes may not be acceptable to the courts.

Exclude the variable rank, refit, and summarize how your findings changed, if they did.

Code

#Rerun model excluding rank
summary (lm(salary ~ degree + sex + ysdeg + year, data = salary))


Call:
lm(formula = salary ~ degree + sex + ysdeg + year, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-8146.9 -2186.9  -491.5  2279.1 11186.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 17183.57    1147.94  14.969  < 2e-16 ***
degreePhD   -3299.35    1302.52  -2.533 0.014704 *  
sexFemale   -1286.54    1313.09  -0.980 0.332209    
ysdeg         339.40      80.62   4.210 0.000114 ***
year          351.97     142.48   2.470 0.017185 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared:  0.6312,    Adjusted R-squared:  0.5998 
F-statistic: 20.11 on 4 and 47 DF,  p-value: 1.048e-09

Rerunning the model excluding rank changes the coefficient of sexFemale to negative -1286, from previously being positive +1166. However, with a p-value of 0.332 the explanatory variable of sexFemale is still not statistically significant at any level after removing rank.

F

Everyone in this data set was hired the year they earned their highest degree. It is also known that a new Dean was appointed 15 years ago, and everyone in the data set who earned their highest degree 15 years ago or less than that has been hired by the new Dean. Some people have argued that the new Dean has been making offers that are a lot more generous to newly hired faculty than the previous one and that this might explain some of the variation in Salary.

Create a new variable that would allow you to test this hypothesis and run another multiple regression model to test this. Select variables carefully to make sure there is no multicollinearity. Explain why multicollinearity would be a concern in this case and how you avoided it. Do you find support for the hypothesis that the people hired by the new Dean are making higher than those that were not?

I created a new variable “New_Dean” where anyone who earned their highest degree (ysdeg) over 15 years ago are coded as a 0 & anyone who earned their highest degree (ysdeg) 15 or less years ago coded as a 1.

Code

#recreating new variable
salary_Dean <- salary %>%
  mutate(New_Dean = case_when(
    ysdeg > 15 ~ "0",
    ysdeg <= 15 ~ "1"))

Error in salary %>% mutate(New_Dean = case_when(ysdeg > 15 ~ "0", ysdeg <= : could not find function "%>%"

Code

salary_Dean

Error in eval(expr, envir, enclos): object 'salary_Dean' not found

Multicollinearity occurs when one explanatory variable is predicted by another variable in the model to a substantial degree. This can be caused by an explanatory variable is a combination of another variable in the model or there are two almost identical variables in the model.

To avoid multicollinearity in this model while testing if being hired by the New Dean results in a higher salary, I will not include ysdeg in the linear regression model. This is because the New Dean explanatory variable was created from the ysdeg model, meaning that they are likely substantially correlated.

Code

##Testing New Dean Hypothesis after removing ysdeg
summary (lm(salary ~ degree + rank + sex + New_Dean + year, data = salary_Dean))

Error in is.data.frame(data): object 'salary_Dean' not found

According to the linear regression model, having been hired by the New Dean does result in a higher salary of $2,163, as indicated by the coefficient to this variable. Furthermore, the New Dean explanatory variable is just barely statistically significant at the 0.05 level with a p-value of 0.0496.

Question 3

Code

#Load data
library(smss)
data("house.selling.price")
head(house.selling.price)

  case Taxes Beds Baths New  Price Size
1    1  3104    4     2   0 279900 2048
2    2  1173    2     1   0 146500  912
3    3  3076    4     2   0 237700 1654
4    4  1608    3     2   0 200000 2068
5    5  1454    3     3   0 159900 1477
6    6  2997    3     2   1 499900 3153

Code

summary(house.selling.price)

      case            Taxes           Beds       Baths           New      
 Min.   :  1.00   Min.   :  20   Min.   :2   Min.   :1.00   Min.   :0.00  
 1st Qu.: 25.75   1st Qu.:1178   1st Qu.:3   1st Qu.:2.00   1st Qu.:0.00  
 Median : 50.50   Median :1614   Median :3   Median :2.00   Median :0.00  
 Mean   : 50.50   Mean   :1908   Mean   :3   Mean   :1.96   Mean   :0.11  
 3rd Qu.: 75.25   3rd Qu.:2238   3rd Qu.:3   3rd Qu.:2.00   3rd Qu.:0.00  
 Max.   :100.00   Max.   :6627   Max.   :5   Max.   :4.00   Max.   :1.00  
     Price             Size     
 Min.   : 21000   Min.   : 580  
 1st Qu.: 93225   1st Qu.:1215  
 Median :132600   Median :1474  
 Mean   :155331   Mean   :1629  
 3rd Qu.:169625   3rd Qu.:1865  
 Max.   :587000   Max.   :4050

A

Using the house.selling.price data, run and report regression results modeling y = selling price (in dollars) in terms of size of home (in square feet) and whether the home is new (1 = yes; 0 = no). In particular, for each variable; discuss statistical significance and interpret the meaning of the coefficient.

Code

#linear regression model y = selling price, explanatory variables = Size & New
summary(lm(Price ~ Size + New, data = house.selling.price))


Call:
lm(formula = Price ~ Size + New, data = house.selling.price)

Residuals:
    Min      1Q  Median      3Q     Max 
-205102  -34374   -5778   18929  163866 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -40230.867  14696.140  -2.738  0.00737 ** 
Size           116.132      8.795  13.204  < 2e-16 ***
New          57736.283  18653.041   3.095  0.00257 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 53880 on 97 degrees of freedom
Multiple R-squared:  0.7226,    Adjusted R-squared:  0.7169 
F-statistic: 126.3 on 2 and 97 DF,  p-value: < 2.2e-16

Size of a house statistically significant at the 0.0001 level with a p-value less than 2 * e^-16. The coefficient of the Size variable is 116 suggesting that for every square-foot the size of a house increases, the predicted price increases by $116.
New The newness of a house is statistically significant at the 0.05 level with a p-value of 0.00257. The coefficient of 57735 suggests that the predicted price of a new house is $57736.283 more than a house than is not new.

B

Report and interpret the prediction equation, and form separate equations relating selling price to size for new and for not new homes.

Y = -40230.867 + 116.132(x1) + 57726.263(x2)

where x1 = size of house in square feet, x2 = if a house is new (1 = yes, 0 = no)

This prediction equation can be interpreted that the predicted price of a house in $ can be calculated by multiplying the size of the house in square feet by ~ $116, subtracting the intercept of -$40,230 from the price, and adding $57,726 if the house is new.

Seperate equations for the selling price for new and not new houses can be generated by subbing in x2 =1 for new houses and x2 =0 for not new houses and calculating the resulting equation

Ynew house = -40230.867 + 116.132(x1) + 57726.263(1) = -40,230.867 + 57,726.263 + 116.132 (x1) Ynew house = 17495.4 + 116.132(x1) where x1 = size of house in square-feet

This equation can be interpreted as the Predicted Price of a new house in dollars can be calculated from $17;495 + $116 for every square foot in size the house is.

Code

#substraction of intercepts
57726.263 - 40230.867

[1] 17495.4

Ynot new = -40230.867 + 116.132(x1) + 57726.263(0) = -40230.867 + 116.132(x1) + 0

*Ynotnew = -40230.867 + 116.132(x1) ** where x1 = size of house in square-feet

This equation can be interpreted as the price of a not-new house in dollars is calculated as $116 for every square-foot a house increases in size, minus $40,230.

C

Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.

for a new house where x1 = 3000 Y = 17495.4 + 116.132(x1) Y = 17495.4 + 116.132 (3000)

Code

#Arithmitic
116.132 * 3000

[1] 348396

Code

348396 + 17495.4

[1] 365891.4

Y = 17495.4 + 34896 Y = 365891.4 The predicted price of a new house with a size of 3000 square feet is $365,891.4

For a non-new house where x1 = 3000 Y = -40230.867 + 116.132(x1) Y = = -40230.867 + 116.132(3000)

Code

#Arithmetic
116.132 * 3000

[1] 348396

Code

348396 - 40230.867

[1] 308165.1

Y = -40230.867 + 348396
Y = 308165.1

The predicted price of a not new house with a size of 3000 square feet is $308,165.1

D

Fit another model, this time with an interaction term allowing interaction between size and new, and report the regression results

Code

#linear regression model y = selling price, explanatory variables = Interaction Size * New
summary(lm(Price ~ Size * New, data = house.selling.price))


Call:
lm(formula = Price ~ Size * New, data = house.selling.price)

Residuals:
    Min      1Q  Median      3Q     Max 
-175748  -28979   -6260   14693  192519 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -22227.808  15521.110  -1.432  0.15536    
Size           104.438      9.424  11.082  < 2e-16 ***
New         -78527.502  51007.642  -1.540  0.12697    
Size:New        61.916     21.686   2.855  0.00527 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 52000 on 96 degrees of freedom
Multiple R-squared:  0.7443,    Adjusted R-squared:  0.7363 
F-statistic: 93.15 on 3 and 96 DF,  p-value: < 2.2e-16

In the model with the interaction between Size * New, the Adjusted R-squared of the model is 0.7363 as opposed to the model with Size & New as explanatory variables without an interaction, which had an adjusted R-squared of 0.7169. In the interaction model:

Size has a p-value of less than 2*e-^16, being statistically significant at the 0.05 level. The coefficient of 104.438 suggest that the price of the house increases $104 for every square foot a house increases in size
New has a p-value of 0.12697, not being statistically significant at the 0.05 or 0.1 levels. The coefficient of -78527.502 suggests that if a house is new, regardless of size, the predicted price of the house decreases by $78527. In the previous model without the interaction between new & size, new had a positive coefficient and was statistically significant.
Size x New has a p-value of 0.00527, which is statistically significant at the 0.01 level. This suggests there is a interaction between the newness of a house and the size of a house. The coefficient of 61.916 suggests the predicted price of only new houses increases by $61.9 for each square foot the size of the house increases.

Written out as an equation for predicted price, where x1 = size of house in square feet & x2 = newness of house

Y = -22227.808 + 104.438(x1) - 78527.502(x2) + 61.916(x1)(x2)

E

Report the lines relating the predicted selling price to the size for homes that are (i) new, (ii) not new.

1. new, x2 = 1. Plugging that into the prediction equation: Y = -22227.808 + 104.438(x1) - 78527.502(x2) + 61.916(x1)(x2) Y = -22227.808 + 104.438(x1) - 78527.502(1) + 61.916(x1)(1) Y = -22227.808 + 104.438(x1) - 78527.502 + 61.916(x1)

Code

#Arithmetic
-22227.808 -78527.502

[1] -100755.3

Code

104.438 + 61.916

[1] 166.354

Ynew = -100755.3 + 166.354(x1)

1. not new, x2 = 0. Plugging that into the prediction equation: Y = -22227.808 + 104.438(x1) - 78527.502(0) + 61.916(x1)(0) Y = -22227.808 + 104.438(x1) - 0 + 0 Ynotnew = -22227.808 + 104.438(x1)

F

Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.

1. new, x1 = 3000 square feet Ynew = -100755.3 + 166.354(x1) Ynew = -100755.3 + 166.354(3000)

Code

166.354 * 3000

[1] 499062

Ynew = -100755.3 + 499062

Code

-100755.3 + 499062

[1] 398306.7

Ynew = 398306.7 The predicted selling price of a new house 3000 square feet in size is $398306.7

1. not new, x1 = 3000 square feet Ynotnew = -22227.808 + 104.438(x1) Ynotnew = -22227.808 + 104.438(3000)

Code

104.438 * 3000

[1] 313314

Ynotnew = -22227.808 + 313314

Code

-22227.808 + 313314

[1] 291086.2

Ynotnew = 291086.2 The predicted selling price of a not new house 3000 square feet in size is $291,086.2

G

Find the predicted selling price for a home of 1500 square feet that is (i) new, (ii) not new. Comparing to (F), explain how the difference in predicted selling prices changes as the size of home increases.

1. new, x1 = 1500 square feet Ynew = -100755.3 + 166.354(x1) Ynew = -100755.3 + 166.354(1500)

Code

166.354 * 1500

[1] 249531

Ynew = -100755.3 + 249531

Code

-100755.3 + 249531

[1] 148775.7

Ynew = 148775.7

The predicted selling price of a new house 1500 square feet in size is $148,775.7

1. not new, x1 = 1500 square feet Ynotnew = -22227.808 + 104.438(x1) Ynotnew = -22227.808 + 104.438(1500)

Code

 104.438 * 1500

[1] 156657

Ynotnew = -22227.808 + 156657

Code

-22227.808 + 156657

[1] 134429.2

Ynotnew = 134429.2 The predicted selling price of a not new house 1500 square feet in size is $ 134429.2

Difference in prices new & not new by size

Code

#Difference new, not new x1 = 3000 square feet
398306.7 - 291086.2

[1] 107220.5

Code

##Difference new, not new x1 = 1500 square feet
 148775.7 - 134429.2

[1] 14346.5

Code

#Division 
107220.5/14346.5

[1] 7.473635

The difference between the price of a new & not house 3000 feet in size is $107,220.5

The difference between the price of a new & not new house 1500 feet in size is $14,346.5

A house that is 3000 square feet in size is twice the size of a house 1500 square feet in size, the difference between a new & not new houses is 7.473635x more for a house that is 3000 square feet than a house that is 15000 square feet in size. This suggests that the larger the size of a house the more influence the newness of a house has on price.

H

Do you think the model with interaction or the one without it represents the relationship of size and new to the outcome price? What makes you prefer one model over another?

Based on the statistical significance of the interaction between New:Size at the 0.001 level & the higher Adjusted R-Squared of the model with the interaction term 0.7363) than the adjusted R-squared of the model without the interaction term (0.7169) both make me prefer the model with the interacction term over the other interms of best representing the relationship of size and new to the outcome price.