hw4
regression analysis
Homework 4
Author

Guanhua Tan

Published

April 1, 2023

Code
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(smss)
library(alr4)
Loading required package: car
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
Code
library(ggplot2)
data(UN11)

Question 1

Code
y<--10536 + 53.8 *1240 +2.84 *18000
y
[1] 107296
Code
redidual_1 <- 145000-y
redidual_1
[1] 37704
Code
x2 <-53.8/2.84
x2
[1] 18.94366

A the predicted selling price is 107,292; the residual is 37,704; interpret is -10536.

B For fixed lot size, if the house size increase one square feet, the house price will increase 53.8.

C if the lot size need to increase 18.94, that will have the same impact as a one-square-foot increase in home size.

Question 2

Code
data(salary)
head(salary)
   degree rank    sex year ysdeg salary
1 Masters Prof   Male   25    35  36350
2 Masters Prof   Male   13    22  35350
3 Masters Prof   Male   10    23  28200
4 Masters Prof Female    7    27  26775
5     PhD Prof   Male   19    30  33696
6 Masters Prof   Male   16    21  28516
Code
summary(lm(salary~sex, salary))

Call:
lm(formula = salary ~ sex, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-8602.8 -4296.6  -100.8  3513.1 16687.9 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    24697        938  26.330   <2e-16 ***
sexFemale      -3340       1808  -1.847   0.0706 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5782 on 50 degrees of freedom
Multiple R-squared:  0.0639,    Adjusted R-squared:  0.04518 
F-statistic: 3.413 on 1 and 50 DF,  p-value: 0.0706

A. According the Chisq test, sex has the impact on salary.

Code
lm(salary ~ ., data = salary) |>
  confint()
                 2.5 %      97.5 %
(Intercept) 14134.4059 17357.68946
degreePhD    -663.2482  3440.47485
rankAssoc    2985.4107  7599.31080
rankProf     8396.1546 13841.37340
sexFemale    -697.8183  3030.56452
year          285.1433   667.47476
ysdeg        -280.6397    31.49105

The 95% confidence intervals for the female salary is (-697.81, 3030.56), other variables controlled.

Code
sex_salary <- lm(salary~degree+rank+sex+year+ysdeg, data=salary)
summary(sex_salary)

Call:
lm(formula = salary ~ degree + rank + sex + year + ysdeg, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-4045.2 -1094.7  -361.5   813.2  9193.1 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15746.05     800.18  19.678  < 2e-16 ***
degreePhD    1388.61    1018.75   1.363    0.180    
rankAssoc    5292.36    1145.40   4.621 3.22e-05 ***
rankProf    11118.76    1351.77   8.225 1.62e-10 ***
sexFemale    1166.37     925.57   1.260    0.214    
year          476.31      94.91   5.018 8.65e-06 ***
ysdeg        -124.57      77.49  -1.608    0.115    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared:  0.855, Adjusted R-squared:  0.8357 
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16

C. degreePhD, sexFemale, and ysdeg doesn’t show statistical significance. For other fixed terms, if one person hold a associate professor ranking, the salary increases 5292.36, compared to a assistant professor. In the same condition, if one person holds a full professor, the salary increases 11118.76, compared to a assistant professor. People’s age has the positive impact on their salary. When the age increases 1 year, the salary increases 476.31.

Code
salary$rank <-relevel(salary$rank, ref="Prof")
salary$sex <-relevel(salary$sex, ref="Female")

sex_salary <- lm(salary~degree+rank+sex+year+ysdeg, data=salary)
summary(sex_salary)

Call:
lm(formula = salary ~ degree + rank + sex + year + ysdeg, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-4045.2 -1094.7  -361.5   813.2  9193.1 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  28031.18    1677.06  16.715  < 2e-16 ***
degreePhD     1388.61    1018.75   1.363    0.180    
rankAsst    -11118.76    1351.77  -8.225 1.62e-10 ***
rankAssoc    -5826.40    1012.93  -5.752 7.28e-07 ***
sexMale      -1166.37     925.57  -1.260    0.214    
year           476.31      94.91   5.018 8.65e-06 ***
ysdeg         -124.57      77.49  -1.608    0.115    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared:  0.855, Adjusted R-squared:  0.8357 
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16

D. For other fixed terms, if one person holds a assistant professor ranking, the salary decreases 11118.76, compared to a full professor. In the same condition, if one person hold a associate professor, the salary decreases 5826.40, compared to a full professor.

Code
sex_salary <- lm(salary~degree+sex+year+ysdeg, data=salary)
summary(sex_salary)

Call:
lm(formula = salary ~ degree + sex + year + ysdeg, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-8146.9 -2186.9  -491.5  2279.1 11186.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15897.03    1259.87  12.618  < 2e-16 ***
degreePhD   -3299.35    1302.52  -2.533 0.014704 *  
sexMale      1286.54    1313.09   0.980 0.332209    
year          351.97     142.48   2.470 0.017185 *  
ysdeg         339.40      80.62   4.210 0.000114 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared:  0.6312,    Adjusted R-squared:  0.5998 
F-statistic: 20.11 on 4 and 47 DF,  p-value: 1.048e-09

E if the model drops the variable rank, the variable years after the degree presents the strongly statistical significance.

Code
salary_F <- salary %>%
  mutate(fifteen_years= case_when(ysdeg > 15 ~0,ysdeg <= 15 ~ 1))
summary(lm(salary~degree + sex + rank+ year+fifteen_years,salary_F))

Call:
lm(formula = salary ~ degree + sex + rank + year + fifteen_years, 
    data = salary_F)

Residuals:
    Min      1Q  Median      3Q     Max 
-3403.3 -1387.0  -167.0   528.2  9233.8 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    25332.46    1034.63  24.485  < 2e-16 ***
degreePhD        818.93     797.48   1.027   0.3100    
sexMale         -907.14     840.54  -1.079   0.2862    
rankAsst      -11096.95    1191.00  -9.317 4.54e-12 ***
rankAssoc      -6124.28    1028.58  -5.954 3.65e-07 ***
year             434.85      78.89   5.512 1.65e-06 ***
fifteen_years   2163.46    1072.04   2.018   0.0496 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2362 on 45 degrees of freedom
Multiple R-squared:  0.8594,    Adjusted R-squared:  0.8407 
F-statistic: 45.86 on 6 and 45 DF,  p-value: < 2.2e-16

Question 3

Code
data("house.selling.price")
head(house.selling.price)
  case Taxes Beds Baths New  Price Size
1    1  3104    4     2   0 279900 2048
2    2  1173    2     1   0 146500  912
3    3  3076    4     2   0 237700 1654
4    4  1608    3     2   0 200000 2068
5    5  1454    3     3   0 159900 1477
6    6  2997    3     2   1 499900 3153
Code
house_model_1 <- lm(Price~Size+New, data=house.selling.price)
summary(house_model_1)

Call:
lm(formula = Price ~ Size + New, data = house.selling.price)

Residuals:
    Min      1Q  Median      3Q     Max 
-205102  -34374   -5778   18929  163866 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -40230.867  14696.140  -2.738  0.00737 ** 
Size           116.132      8.795  13.204  < 2e-16 ***
New          57736.283  18653.041   3.095  0.00257 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 53880 on 97 degrees of freedom
Multiple R-squared:  0.7226,    Adjusted R-squared:  0.7169 
F-statistic: 126.3 on 2 and 97 DF,  p-value: < 2.2e-16

A. The variables Size and New show the statistical significance. the price will increase 116 when the house size increases 1 square feet. If the house is new, the price will increase 57736.283, compared with the size old house.

B. Price= 116.132Size+57736.283New-40230.86 For new homes Price=115.132Size + 57736.283-40230.86 For not new homes Price=115.132Size -40239.86.

Code
y_new <- 116.32*3000+57736.283-40230.86
y_old <- 116.32*3000-40230.86
y_new
[1] 366465.4
Code
y_old
[1] 308729.1

C. the new home price is 366465.4; the old home price is 308729.1

Code
house_model_2 <- lm(Price~Size+New*Size, data=house.selling.price)
summary(house_model_2)

Call:
lm(formula = Price ~ Size + New * Size, data = house.selling.price)

Residuals:
    Min      1Q  Median      3Q     Max 
-175748  -28979   -6260   14693  192519 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -22227.808  15521.110  -1.432  0.15536    
Size           104.438      9.424  11.082  < 2e-16 ***
New         -78527.502  51007.642  -1.540  0.12697    
Size:New        61.916     21.686   2.855  0.00527 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 52000 on 96 degrees of freedom
Multiple R-squared:  0.7443,    Adjusted R-squared:  0.7363 
F-statistic: 93.15 on 3 and 96 DF,  p-value: < 2.2e-16

D. The regression result shows that the interaction term is statistical significance and r squared value is larger than the previous model.

E. for new homes, y=-22227.808+104.438Size-78527.502+61.916Size for old homes, y=104.438*Size-100755.3

Code
Size_3000 <-3000
Price_new <- 22227.808+104.438*Size_3000-78527.502+61.916*Size_3000
Price_old <-104.438*Size_3000-100755.3
Price_new
[1] 442762.3
Code
Price_old
[1] 212558.7

F. For 3000 square feet, the new hous is 442,762.3 which the old one is 212,558.7

Code
Size_1500 <-1500
Price_new_1500 <- 22227.808+104.438*Size_1500-78527.502+61.916*Size_1500
Price_old_1500 <-104.438*Size_1500-100755.3
Price_new_1500
[1] 193231.3
Code
Price_old_1500
[1] 55901.7

G. According to the predicted lines, when the size increases, the new house price will increase more than the old one’s.

H. I think the model with the interaction one represents the relationship between size and new because it has a larger value of R square.