Homework 4

hw4
Author

Emma Rasmussen

Published

November 13, 2022

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

library(readxl)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(stringr)
library(alr4)
library(smss)

1a.

Prediction equation: ŷ = −10,536 + 53.8x1 + 2.84x2.

#plugging in size of home and lot size into prediction equation
-10536 + 53.8*1240 + 2.84*18000
[1] 107296
#Calculating residual, actual-predicted
145000-107296
[1] 37704

Predicted selling price: $107,296. Residual: $37,704, so the house sold for $37,704 more than the model predicted.
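The same calculation can be wrapped in a small helper so other homes can be plugged in. This is only a sketch: the name `predict_price` is made up for illustration, and the coefficients are copied from the prediction equation above.

```r
# Hypothetical helper; coefficients come from the prediction equation in 1a.
predict_price <- function(home_size, lot_size) {
  -10536 + 53.8 * home_size + 2.84 * lot_size
}

# Home from the problem: 1240 sq ft house, 18000 sq ft lot, sold for $145,000.
predicted <- predict_price(home_size = 1240, lot_size = 18000)
resid_value <- 145000 - predicted  # residual = actual - predicted

predicted
resid_value
```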

1b.

According to the prediction equation, for a fixed lot size, the predicted selling price increases by $53.80 for each additional square foot of home size.

1c.

53.8/2.84
[1] 18.94366

For a fixed home size, lot size would need to increase by about 18.94 square feet to have the same impact on predicted price as a one-square-foot increase in home size.

#Check: 
-10536 + 53.8*1400 + 2.84*20000
[1] 121584
-10536 + 53.8*1400+ 2.84*20018.94366
[1] 121637.8
121637.8-121584
[1] 53.8

2a.

H0: mean salary (male) = mean salary (female). Ha: mean salary (male) ≠ mean salary (female).

data(salary)
head(salary)
   degree rank    sex year ysdeg salary
1 Masters Prof   Male   25    35  36350
2 Masters Prof   Male   13    22  35350
3 Masters Prof   Male   10    23  28200
4 Masters Prof Female    7    27  26775
5     PhD Prof   Male   19    30  33696
6 Masters Prof   Male   16    21  28516
t.test(salary ~ sex, data=salary)

    Welch Two Sample t-test

data:  salary by sex
t = 1.7744, df = 21.591, p-value = 0.09009
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
 -567.8539 7247.1471
sample estimates:
  mean in group Male mean in group Female 
            24696.79             21357.14 

According to this Welch two-sample t-test (p = 0.090), there is not sufficient evidence of a difference in mean salary between male and female faculty at the 5% significance level. The difference would be significant only at the 10% level.
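As a rough check on the reported p-value, the two-sided tail probability can be recomputed from the t statistic and the Welch degrees of freedom printed in the output. The numbers below are copied from that output, so the result only agrees up to rounding.

```r
# Values copied from the t.test output (rounded as printed).
t_stat   <- 1.7744
df_welch <- 21.591

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

p_value
p_value < 0.05  # compare against the 5% level
p_value < 0.10  # compare against the 10% level
```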

2b.

#creating a model
summary(lm(salary ~ degree + rank + sex + year + ysdeg, data=salary))

Call:
lm(formula = salary ~ degree + rank + sex + year + ysdeg, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-4045.2 -1094.7  -361.5   813.2  9193.1 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15746.05     800.18  19.678  < 2e-16 ***
degreePhD    1388.61    1018.75   1.363    0.180    
rankAssoc    5292.36    1145.40   4.621 3.22e-05 ***
rankProf    11118.76    1351.77   8.225 1.62e-10 ***
sexFemale    1166.37     925.57   1.260    0.214    
year          476.31      94.91   5.018 8.65e-06 ***
ysdeg        -124.57      77.49  -1.608    0.115    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared:  0.855, Adjusted R-squared:  0.8357 
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16
#assigning model to an object
prof_fit_1<- lm(salary ~ degree + rank + sex + year + ysdeg, data=salary)

#Creating a confidence interval for coefficients in the model
confint(prof_fit_1)
                 2.5 %      97.5 %
(Intercept) 14134.4059 17357.68946
degreePhD    -663.2482  3440.47485
rankAssoc    2985.4107  7599.31080
rankProf     8396.1546 13841.37340
sexFemale    -697.8183  3030.56452
year          285.1433   667.47476
ysdeg        -280.6397    31.49105

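The 95% confidence interval for the difference in salary between female and male faculty (female minus male), controlling for degree, rank, year, and ysdeg, is (-697.82, 3030.56). Because the interval contains 0, the difference is not statistically significant at the 5% level.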
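The interval for sexFemale can also be rebuilt by hand from the coefficient table as estimate ± t critical value × standard error. This sketch uses the rounded values printed in the summary, so it matches confint() only up to rounding.

```r
# Values copied from the sexFemale row of the summary output.
estimate  <- 1166.37
std_error <- 925.57
df_resid  <- 45  # residual degrees of freedom of the model

# 95% interval: estimate +/- t critical value * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

ci
```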

2c.

Degree: The coefficient for degree is not statistically significant (p = 0.180). However, according to this regression equation, the predicted salary for a faculty member with a PhD is $1,388.61 higher than for a faculty member with a master's degree (all other variables held constant).

Rank: The baseline category is Asst (Assistant Professor). For a faculty member of rank Associate, all other variables held constant, the predicted salary is $5,292.36 more than for an Assistant Professor.

For a faculty member of rank Professor, the predicted salary is $11,118.76 more than for a faculty member of rank Assistant Professor (all other variables held constant).

These salary differences are statistically significant at the 0.0001 alpha level for both the Associate and Professor ranks.

Sex: For a faculty member who is female, the predicted salary is $1,166.37 more than for a faculty member who is male (all other variables held constant). However, this coefficient is not statistically significant at any conventional alpha level (p = 0.214).

Year: For each additional year in their current rank, salary is expected to increase by $476.31 (all other variables held constant). The coefficient is significant at the 0.0001 alpha level.

ysdeg: For each additional year since completion of their highest degree, salary is expected to decrease by $124.57 (all other variables held constant). However, this coefficient is not significant at any conventional alpha level (p = 0.115).

2d.

salary$rank<- relevel(salary$rank, ref = 'Assoc')
summary(lm(salary ~ degree + rank + sex + year + ysdeg, data=salary))

Call:
lm(formula = salary ~ degree + rank + sex + year + ysdeg, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-4045.2 -1094.7  -361.5   813.2  9193.1 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 21038.41    1109.12  18.969  < 2e-16 ***
degreePhD    1388.61    1018.75   1.363    0.180    
rankAsst    -5292.36    1145.40  -4.621 3.22e-05 ***
rankProf     5826.40    1012.93   5.752 7.28e-07 ***
sexFemale    1166.37     925.57   1.260    0.214    
year          476.31      94.91   5.018 8.65e-06 ***
ysdeg        -124.57      77.49  -1.608    0.115    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared:  0.855, Adjusted R-squared:  0.8357 
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16

The baseline category is now Assoc. According to these coefficients, faculty of rank Asst are expected to make $5,292.36 less than Associate professors, and faculty of rank Prof are expected to make $5,826.40 more than Associate professors (all other variables held constant). The other coefficients and the overall fit are unchanged.
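Releveling only changes which rank the others are compared against, so the new rank coefficients and intercept can be recovered from the 2b output with simple arithmetic. The variable names below are made up for illustration, and the values are copied from the 2b summary.

```r
# Coefficients from the model in 2b, where Asst is the baseline.
intercept_asst <- 15746.05
assoc_vs_asst  <- 5292.36
prof_vs_asst   <- 11118.76

# With Assoc as the baseline, everything shifts by the Assoc coefficient.
intercept_assoc <- intercept_asst + assoc_vs_asst  # new intercept
asst_vs_assoc   <- -assoc_vs_asst                  # new rankAsst coefficient
prof_vs_assoc   <- prof_vs_asst - assoc_vs_asst    # new rankProf coefficient

c(intercept_assoc, asst_vs_assoc, prof_vs_assoc)
```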

2e.

summary(lm(salary ~ degree + sex + year + ysdeg, data=salary))

Call:
lm(formula = salary ~ degree + sex + year + ysdeg, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-8146.9 -2186.9  -491.5  2279.1 11186.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 17183.57    1147.94  14.969  < 2e-16 ***
degreePhD   -3299.35    1302.52  -2.533 0.014704 *  
sexFemale   -1286.54    1313.09  -0.980 0.332209    
year          351.97     142.48   2.470 0.017185 *  
ysdeg         339.40      80.62   4.210 0.000114 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared:  0.6312,    Adjusted R-squared:  0.5998 
F-statistic: 20.11 on 4 and 47 DF,  p-value: 1.048e-09

Excluding the rank variable flips the sign of the sex coefficient: female faculty are now predicted to make $1,286.54 less than male faculty, holding degree, year, and ysdeg constant. However, this difference is not significant at any conventional alpha level (p = 0.332).

2f.

#creating a dummy variable new and old dean
salary<-mutate(salary, dean= case_when(ysdeg < 15 ~"new",
                               ysdeg >=15 ~"old"))
summary(lm(salary ~ dean + degree + sex + rank +year, data=salary))

Call:
lm(formula = salary ~ dean + degree + sex + rank + year, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-3588.0 -1532.2  -232.2   565.7  9132.5 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  20468.7      951.7  21.507  < 2e-16 ***
deanold      -2421.6     1187.9  -2.038   0.0474 *  
degreePhD     1073.5      843.3   1.273   0.2096    
sexFemale     1046.7      858.0   1.220   0.2289    
rankAsst     -5012.5     1002.3  -5.001 9.16e-06 ***
rankProf      6213.3     1045.0   5.946 3.76e-07 ***
year           450.7       81.5   5.530 1.55e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2360 on 45 degrees of freedom
Multiple R-squared:  0.8597,    Adjusted R-squared:  0.841 
F-statistic: 45.95 on 6 and 45 DF,  p-value: < 2.2e-16

I excluded ysdeg after creating the new variable to avoid multicollinearity, because dean is derived directly from ysdeg.

According to this model, faculty hired by the old dean are predicted to make $2,421.60 less than faculty hired by the new dean, controlling for degree, sex, rank, and year. This difference is significant at the 0.05 alpha level (p = 0.047).

3a.

data(house.selling.price)
house.selling.price
    case Taxes Beds Baths New  Price Size
1      1  3104    4     2   0 279900 2048
2      2  1173    2     1   0 146500  912
3      3  3076    4     2   0 237700 1654
4      4  1608    3     2   0 200000 2068
5      5  1454    3     3   0 159900 1477
6      6  2997    3     2   1 499900 3153
7      7  4054    3     2   0 265500 1355
8      8  3002    3     2   1 289900 2075
9      9  6627    5     4   0 587000 3990
10    10   320    3     2   0  70000 1160
11    11   630    3     2   0  64500 1220
12    12  1780    3     2   0 167000 1690
13    13  1630    3     2   0 114600 1380
14    14  1530    3     2   0 103000 1590
15    15   930    3     1   0 101000 1050
16    16   590    2     1   0  70000  770
17    17  1050    3     2   0  85000 1410
18    18    20    3     1   0  22500 1060
19    19   870    2     2   0  90000 1300
20    20  1320    3     2   0 133000 1500
21    21  1350    2     1   0  90500  820
22    22  5616    4     3   1 577500 3949
23    23   680    2     1   0 142500 1170
24    24  1840    3     2   0 160000 1500
25    25  3680    4     2   0 240000 2790
26    26  1660    3     1   0  87000 1030
27    27  1620    3     2   0 118600 1250
28    28  3100    3     2   0 140000 1760
29    29  2070    2     3   0 148000 1550
30    30   830    3     2   0  69000 1120
31    31  2260    4     2   0 176000 2000
32    32  1760    3     1   0  86500 1350
33    33  2750    3     2   1 180000 1840
34    34  2020    4     2   0 179000 2510
35    35  4900    3     3   1 338000 3110
36    36  1180    4     2   0 130000 1760
37    37  2150    3     2   0 163000 1710
38    38  1600    2     1   0 125000 1110
39    39  1970    3     2   0 100000 1360
40    40  2060    3     1   0 100000 1250
41    41  1980    3     1   0 100000 1250
42    42  1510    3     2   0 146500 1480
43    43  1710    3     2   0 144900 1520
44    44  1590    3     2   0 183000 2020
45    45  1230    3     2   0  69900 1010
46    46  1510    2     2   0  60000 1640
47    47  1450    2     2   0 127000  940
48    48   970    3     2   0  86000 1580
49    49   150    2     2   0  50000  860
50    50  1470    3     2   0 137000 1420
51    51  1850    3     2   0 121300 1270
52    52   820    2     1   0  81000  980
53    53  2050    4     2   0 188000 2300
54    54   710    3     2   0  85000 1430
55    55  1280    3     2   0 137000 1380
56    56  1360    3     2   0 145000 1240
57    57   830    3     2   0  69000 1120
58    58   800    3     2   0 109300 1120
59    59  1220    3     2   0 131500 1900
60    60  3360    4     3   0 200000 2430
61    61   210    3     2   0  81900 1080
62    62   380    2     1   0  91200 1350
63    63  1920    4     3   0 124500 1720
64    64  4350    3     3   0 225000 4050
65    65  1510    3     2   0 136500 1500
66    66  4154    3     3   0 381000 2581
67    67  1976    3     2   1 250000 2120
68    68  3605    3     3   1 354900 2745
69    69  1400    3     2   0 140000 1520
70    70   790    2     2   0  89900 1280
71    71  1210    3     2   0 137000 1620
72    72  1550    3     2   0 103000 1520
73    73  2800    3     2   0 183000 2030
74    74  2560    3     2   0 140000 1390
75    75  1390    4     2   0 160000 1880
76    76  5443    3     2   0 434000 2891
77    77  2850    2     1   0 130000 1340
78    78  2230    2     2   0 123000  940
79    79    20    2     1   0  21000  580
80    80  1510    4     2   0  85000 1410
81    81   710    3     2   0  69900 1150
82    82  1540    3     2   0 125000 1380
83    83  1780    3     2   1 162600 1470
84    84  2920    2     2   1 156900 1590
85    85  1710    3     2   1 105900 1200
86    86  1880    3     2   0 167500 1920
87    87  1680    3     2   0 151800 2150
88    88  3690    5     3   0 118300 2200
89    89   900    2     2   0  94300  860
90    90   560    3     1   0  93900 1230
91    91  2040    4     2   0 165000 1140
92    92  4390    4     3   1 285000 2650
93    93   690    3     1   0  45000 1060
94    94  2100    3     2   0 124900 1770
95    95  2880    4     2   0 147000 1860
96    96   990    2     2   0 176000 1060
97    97  3030    3     2   0 196500 1730
98    98  1580    3     2   0 132200 1370
99    99  1770    3     2   0  88400 1560
100  100  1430    3     2   0 127200 1340
summary(lm(Price ~ Size + New, data= house.selling.price))

Call:
lm(formula = Price ~ Size + New, data = house.selling.price)

Residuals:
    Min      1Q  Median      3Q     Max 
-205102  -34374   -5778   18929  163866 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -40230.867  14696.140  -2.738  0.00737 ** 
Size           116.132      8.795  13.204  < 2e-16 ***
New          57736.283  18653.041   3.095  0.00257 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 53880 on 97 degrees of freedom
Multiple R-squared:  0.7226,    Adjusted R-squared:  0.7169 
F-statistic: 126.3 on 2 and 97 DF,  p-value: < 2.2e-16

According to the coefficient for size, the price of a house is expected to increase by $116.13 for each additional square foot, holding new/old status fixed. The coefficient is significant at the 0.0001 alpha level, which is strong evidence of a positive association between size and price.

According to the coefficient for new, a new house is expected to cost $57,736.28 more than an old house of the same size. This coefficient is significant at the 0.01 level (p = 0.00257), so there is good evidence that new houses sell for more than old houses once size is taken into account.

3b.

Y = -40230.867 + 116.132(X1) + 57736.283 (X2) where X1 represents size and X2 represents new/old.

For a new house: Y = -40230.867 + 116.132(size) + 57736.283

For an old house: Y = -40230.867 + 116.132(size)

3c.

#new:
-40230.867 + 116.132*3000 +  57736.283
[1] 365901.4
#not new:
-40230.867 + 116.132*3000 +  57736.283*0
[1] 308165.1

New: $365,901.40. Not new: $308,165.10.
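Because this model has no interaction term, the two prediction lines are parallel, so the new-versus-old gap is the same at every size. A short sketch of that check follows; the function name `predict_no_interaction` is made up, and the coefficients are copied from the summary in 3a.

```r
# Hypothetical helper; coefficients come from the Price ~ Size + New output.
predict_no_interaction <- function(size, new) {
  -40230.867 + 116.132 * size + 57736.283 * new
}

# Gap between new and not-new houses at two different sizes.
sizes <- c(1500, 3000)
gap <- predict_no_interaction(sizes, new = 1) - predict_no_interaction(sizes, new = 0)

gap  # equals the New coefficient at both sizes
```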

3d.

summary(lm(Price ~ Size + New + Size*New, data= house.selling.price))

Call:
lm(formula = Price ~ Size + New + Size * New, data = house.selling.price)

Residuals:
    Min      1Q  Median      3Q     Max 
-175748  -28979   -6260   14693  192519 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -22227.808  15521.110  -1.432  0.15536    
Size           104.438      9.424  11.082  < 2e-16 ***
New         -78527.502  51007.642  -1.540  0.12697    
Size:New        61.916     21.686   2.855  0.00527 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 52000 on 96 degrees of freedom
Multiple R-squared:  0.7443,    Adjusted R-squared:  0.7363 
F-statistic: 93.15 on 3 and 96 DF,  p-value: < 2.2e-16

Y = -22227.808 + 104.438(size) - 78527.502(new) + 61.916(size:new). Both size and the interaction between size and new are statistically significant. The coefficient for new on its own is no longer statistically significant.

3e.

For a new house: Y = -22227.808 + 166.354(size) - 78527.502 = -100755.31 + 166.354(size). For an old house: Y = -22227.808 + 104.438(size).

3f.

#new: 
-22227.808 + 166.354*3000 - 78527.502
[1] 398306.7
#not new:
-22227.808 + 104.438*3000
[1] 291086.2

New: $398,306.70. Not new: $291,086.20.

3g.

#new: 
-22227.808 + 166.354*1500 - 78527.502
[1] 148775.7
#not new:
-22227.808 + 104.438*1500
[1] 134429.2

New: $148,775.70. Not new: $134,429.20.

According to this model, larger houses are predicted to cost much more, and the premium for a new house grows with size. At 3,000 square feet a new house is predicted to cost about $107,220 more than an old house of the same size, compared with about $14,347 more at 1,500 square feet.
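The new-versus-old gap in the interaction model depends only on the New and Size:New coefficients, because the intercept and the Size term cancel when the two predictions are subtracted. A minimal sketch, with `price_gap` as a made-up helper name and coefficients copied from the 3d summary:

```r
# Hypothetical helper: predicted price of a new house minus a not-new house
# of the same size, using the New and Size:New coefficients from 3d.
price_gap <- function(size) {
  -78527.502 + 61.916 * size
}

gaps <- price_gap(c(1500, 3000))
gaps
```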

3h.

I prefer the second model with the interaction term, which gives a clearer picture of how the effect of square footage on price differs between new and old houses. The interaction term is statistically significant (p = 0.005), and the model also has a larger adjusted R-squared (0.7363 versus 0.7169).

However, I would be skeptical of using this model for small homes: for a home that is 1,000 square feet, the predicted price for a new house is lower than for an old house, which seems implausible. In other words, I do not think this model would be good at predicting prices for very small homes.

#for a 1000 sq foot home:
#New:
-22227.808 + 166.354*1000 - 78527.502
[1] 65598.69
#Not new:
-22227.808 + 104.438*1000
[1] 82210.19
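One way to see where the interaction model starts behaving oddly is to solve for the size at which the predicted new and not-new prices are equal. This is a sketch using only the New and Size:New coefficients from the 3d summary; the names `break_even_size` and `gap_at_1000` are made up for illustration.

```r
# New minus not-new predicted price is -78527.502 + 61.916 * size.
# Setting that to zero gives the break-even size.
break_even_size <- 78527.502 / 61.916
break_even_size  # roughly 1268 square feet

# Below the break-even size the model predicts new houses to be cheaper.
gap_at_1000 <- -78527.502 + 61.916 * 1000
gap_at_1000
```

Under this reading, the model predicts a lower price for a new house than for an old one of the same size whenever the house is smaller than the break-even size, which covers a fair number of homes in the data set.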