price <- -10536 + (53.8 * 1240) + (2.84 * 18000)
print(price)
[1] 107296
residual <- 145000 - price
print(residual)
[1] 37704
April 22, 2023
$$\hat{y} = -10{,}536 + 53.8x_1 + 2.84x_2$$
where $y$ = selling price of home (in dollars), $x_1$ = size of home (in square feet), and $x_2$ = lot size (in square feet).
A. A particular home of 1240 square feet on a lot of 18,000 square feet sold for $145,000. Find the predicted selling price and the residual, and interpret.
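The predicted price is $107,296. The home actually sold for $145,000, so the residual (observed minus predicted) is $145,000 - $107,296 = $37,704: the home sold for $37,704 more than the equation predicts for a house of its size and lot size.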
B. For fixed lot size, how much is the house selling price predicted to increase for each square-foot increase in home size? Why?
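The predicted selling price increases by $53.80 for each one-square-foot increase in home size. That is the coefficient on $x_1$ (home size) in the equation; because lot size is held fixed, the lot-size term does not change, so the whole change in the prediction comes from the home-size term.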
C. According to this prediction equation, for fixed home size, how much would lot size need to increase to have the same impact as a one-square-foot increase in home size?
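Lot size would need to increase by about 18.94 square feet, the ratio of the two coefficients (53.8 / 2.84 ≈ 18.94), to have the same impact on the predicted price as a one-square-foot increase in home size.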
degree rank sex year ysdeg salary
1 Masters Prof Male 25 35 36350
2 Masters Prof Male 13 22 35350
3 Masters Prof Male 10 23 28200
4 Masters Prof Female 7 27 26775
5 PhD Prof Male 19 30 33696
6 Masters Prof Male 16 21 28516
A. Test the hypothesis that the mean salary for men and women is the same, without regard to any other variable but sex.
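With a p-value of about 0.09, which is above the usual 0.05 level, we fail to reject the null hypothesis that mean salaries for men and women are equal. This does not show that the means are the same; it means the data do not provide strong evidence of a difference when sex is the only variable considered.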
B. Run a multiple linear regression with salary as the outcome variable and everything else as predictors, including sex.
Call:
lm(formula = salary ~ ., data = salary)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15746.05 800.18 19.678 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAssoc 5292.36 1145.40 4.621 3.22e-05 ***
rankProf 11118.76 1351.77 8.225 1.62e-10 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
2.5 % 97.5 %
(Intercept) 14134.4059 17357.68946
degreePhD -663.2482 3440.47485
rankAssoc 2985.4107 7599.31080
rankProf 8396.1546 13841.37340
sexFemale -697.8183 3030.56452
year 285.1433 667.47476
ysdeg -280.6397 31.49105
C. Interpret your finding for each predictor variable; discuss (a) statistical significance, (b) interpretation of the coefficient / slope in relation to the outcome variable and other variables
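degreePhD - Holding rank, sex, year, and ysdeg constant, PhD holders are predicted to earn about $1,388.61 more than faculty with a master's degree. This difference is not statistically significant (p = 0.180; the 95% CI, -$663 to $3,440, includes zero).
rankAssoc - Compared with assistant professors (the baseline rank) and holding the other variables constant, associate professors are predicted to earn about $5,292.36 more. This is statistically significant (p < 0.001).
rankProf - Full professors are predicted to earn about $11,118.76 more than assistant professors, all else equal. This is statistically significant (p < 0.001).
sexFemale - Holding the other variables constant, women are predicted to earn about $1,166.37 more than men, but this is not statistically significant (p = 0.214; 95% CI -$698 to $3,031). After adjusting for the other predictors, the data show no clear salary difference by sex.
year - Each additional year in the current rank is associated with about $476.31 more in salary, all else equal. This is statistically significant (p < 0.001).
ysdeg - Each additional year since the highest degree is associated with about $124.57 less in salary, all else equal, but this is not statistically significant (p = 0.115; the 95% CI includes zero).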
D. Change the baseline category for the rank variable. Interpret the coefficients related to rank again.
Call:
lm(formula = salary ~ ., data = salary)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26864.81 1375.29 19.534 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAsst -11118.76 1351.77 -8.225 1.62e-10 ***
rankAssoc -5826.40 1012.93 -5.752 7.28e-07 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
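With full professors as the baseline (and the other variables held constant), assistant professors are predicted to earn about $11,118.76 less and associate professors about $5,826.40 less than full professors. Both differences are statistically significant (p < 0.001). Changing the baseline only re-expresses the same comparisons: the fit, residuals, and non-rank coefficients are identical to the first model.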
E. Removing rank from the model
Call:
lm(formula = salary ~ degree + sex + year + ysdeg, data = salary)
Residuals:
Min 1Q Median 3Q Max
-8146.9 -2186.9 -491.5 2279.1 11186.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17183.57 1147.94 14.969 < 2e-16 ***
degreePhD -3299.35 1302.52 -2.533 0.014704 *
sexFemale -1286.54 1313.09 -0.980 0.332209
year 351.97 142.48 2.470 0.017185 *
ysdeg 339.40 80.62 4.210 0.000114 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared: 0.6312, Adjusted R-squared: 0.5998
F-statistic: 20.11 on 4 and 47 DF, p-value: 1.048e-09
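When rank is removed from the model, the degreePhD coefficient changes sign: PhD holders are now predicted to earn about $3,299.35 less than master's holders, all else equal (p = 0.015). The sexFemale coefficient also turns negative (about -$1,286.54) but remains not statistically significant (p = 0.332). The year coefficient shrinks from $476.31 to $351.97 per year but is still positive and significant (p = 0.017). The biggest shift is ysdeg, which goes from -$124.57 to +$339.40 per year and becomes highly significant (p < 0.001). The model also fits worse without rank (adjusted R-squared falls from 0.836 to 0.600).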
F. New variable, new hypothesis
Some people have argued that the new Dean has been making offers that are a lot more generous to newly hired faculty than the previous one and that this might explain some of the variation in Salary.
degree rank sex year ysdeg salary
1 Masters Prof Male 25 35 36350
2 Masters Prof Male 13 22 35350
3 Masters Prof Male 10 23 28200
4 Masters Prof Female 7 27 26775
5 PhD Prof Male 19 30 33696
6 Masters Prof Male 16 21 28516
7 PhD Prof Female 0 32 24900
8 Masters Prof Male 16 18 31909
9 PhD Prof Male 13 30 31850
10 PhD Prof Male 13 31 32850
11 Masters Prof Male 12 22 27025
12 Masters Assoc Male 15 19 24750
13 Masters Prof Male 9 17 28200
14 PhD Assoc Male 9 27 23712
15 Masters Prof Male 9 24 25748
16 Masters Prof Male 7 15 29342
17 Masters Prof Male 13 20 31114
18 PhD Assoc Male 11 14 24742
19 PhD Assoc Male 10 15 22906
20 PhD Prof Male 6 21 24450
21 PhD Asst Male 16 23 19175
22 PhD Assoc Male 8 31 20525
23 Masters Prof Male 7 13 27959
24 Masters Prof Female 8 24 38045
25 Masters Assoc Male 9 12 24832
26 Masters Prof Male 5 18 25400
27 Masters Assoc Male 11 14 24800
28 Masters Prof Female 5 16 25500
29 PhD Assoc Male 3 7 26182
30 PhD Assoc Male 3 17 23725
31 PhD Asst Female 10 15 21600
32 PhD Assoc Male 11 31 23300
33 PhD Asst Male 9 14 23713
34 PhD Assoc Female 4 33 20690
35 PhD Assoc Female 6 29 22450
36 Masters Assoc Male 1 9 20850
37 Masters Asst Female 8 14 18304
38 Masters Asst Male 4 4 17095
39 Masters Asst Male 4 5 16700
40 Masters Asst Male 4 4 17600
41 Masters Asst Male 3 4 18075
42 PhD Asst Male 3 11 18000
43 Masters Assoc Male 0 7 20999
44 Masters Asst Female 3 3 17250
45 Masters Asst Male 2 3 16500
46 Masters Asst Male 2 1 16094
47 Masters Asst Female 2 6 16150
48 Masters Asst Female 2 2 15350
49 Masters Asst Male 1 1 16244
50 Masters Asst Female 1 1 16686
51 Masters Asst Female 1 1 15000
52 Masters Asst Female 0 2 20300
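I left ysdeg out of the model because the new indicator ysdeg15 is derived from it, and including both would introduce multicollinearity.
The correlation between ysdeg and ysdeg15 is -0.8434239, which confirms that the two variables are strongly related and supports dropping ysdeg. This correlation does not, however, test the Dean hypothesis itself. That test is the t-test on the ysdeg15 coefficient in summary(model4): a positive, statistically significant coefficient would indicate that faculty whose degrees are 15 or fewer years old earn more, all else equal, which is consistent with more generous recent offers.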
case Taxes Beds Baths New Price Size
1 1 3104 4 2 0 279900 2048
2 2 1173 2 1 0 146500 912
3 3 3076 4 2 0 237700 1654
4 4 1608 3 2 0 200000 2068
5 5 1454 3 3 0 159900 1477
6 6 2997 3 2 1 499900 3153
A. Using the house.selling.price data, run and report regression results modeling y = selling price (in dollars) in terms of size of home (in square feet) and whether the home is new (1 = yes; 0 = no).
Call:
lm(formula = Price ~ Size + New, data = house.selling.price)
Residuals:
Min 1Q Median 3Q Max
-205102 -34374 -5778 18929 163866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -40230.867 14696.140 -2.738 0.00737 **
Size 116.132 8.795 13.204 < 2e-16 ***
New 57736.283 18653.041 3.095 0.00257 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 53880 on 97 degrees of freedom
Multiple R-squared: 0.7226, Adjusted R-squared: 0.7169
F-statistic: 126.3 on 2 and 97 DF, p-value: < 2.2e-16
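Holding size constant, a new home is predicted to sell for about $57,736.28 more than a home that is not new, and holding newness constant, each additional square foot adds about $116.13 to the predicted price. Both effects are statistically significant (p = 0.003 for New, p < 0.001 for Size).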
B. Report and interpret the prediction equation, and form separate equations relating selling price to size for new and for not new homes.
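$$\hat{y} = -40{,}230.87 + 116.13x_1 + 57{,}736.28x_2$$
where $x_1$ is size (in square feet) and $x_2$ = 1 for a new home and 0 otherwise. Setting $x_2$ to 0 or 1 gives a separate line for each group; the two lines share the same slope and differ only in their intercepts.
For homes that are not new ($x_2 = 0$): $\hat{y} = -40{,}230.87 + 116.13x_1$
For new homes ($x_2 = 1$): $\hat{y} = 17{,}505.42 + 116.13x_1$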
C. Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.
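A 3,000-square-foot home is predicted to sell for about $365,901 if it is new and about $308,165 if it is not new; the gap is the New coefficient, about $57,736.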
D. Fit another model, this time with an interaction term allowing interaction between size and new, and report the regression results
Call:
lm(formula = Price ~ Size * New, data = house.selling.price)
Residuals:
Min 1Q Median 3Q Max
-175748 -28979 -6260 14693 192519
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -22227.808 15521.110 -1.432 0.15536
Size 104.438 9.424 11.082 < 2e-16 ***
New -78527.502 51007.642 -1.540 0.12697
Size:New 61.916 21.686 2.855 0.00527 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 52000 on 96 degrees of freedom
Multiple R-squared: 0.7443, Adjusted R-squared: 0.7363
F-statistic: 93.15 on 3 and 96 DF, p-value: < 2.2e-16
E. Report the lines relating the predicted selling price to the size for homes that are (i) new, (ii) not new.
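For homes that are not new, $\hat{y} = -22{,}227.81 + 104.44x_1$; for new homes, $\hat{y} = (-22{,}227.81 - 78{,}527.50) + (104.44 + 61.92)x_1 = -100{,}755.31 + 166.35x_1$, where $x_1$ is size in square feet.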
F. Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.
Using the interaction model, a 3,000-square-foot home is predicted to sell for about $398,307 if it is new and about $291,086 if it is not new.
G. Find the predicted selling price for a home of 1500 square feet that is (i) new, (ii) not new. Comparing to (F), explain how the difference in predicted selling prices changes as the size of home increases.
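At 1,500 square feet, a new home is predicted to sell for about $148,776 and a home that is not new for about $134,429, a gap of about $14,346. At 3,000 square feet (F), the gap is about $107,220. Because the Size:New coefficient is positive (about $61.92), the predicted premium for a new home grows by about $61.92 for each additional square foot, so the difference between new and not-new homes widens as home size increases.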
H. Do you think the model with interaction or the one without it represents the relationship of size and new to the outcome price? What makes you prefer one model over another?
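I prefer the model without the interaction because it is simpler and easier to interpret, and its fit is only slightly worse: adjusted R-squared is 0.717 versus 0.736, and the residual standard error is $53,880 versus $52,000. That said, the interaction term itself is statistically significant (p = 0.005), which is evidence that each additional square foot adds more to the price of a new home than to one that is not new. Judged on that test and on adjusted R-squared, the interaction model fits better, so the choice comes down to how much weight is given to simplicity.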
---
title: "Kristin Abijaoude_HW4"
editor: visual
date: "04/22/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- Hw4
- kristin abijaoude
---
# Question 1
$$\hat{y} = -10{,}536 + 53.8x_1 + 2.84x_2$$
where $y$ = selling price of home (in dollars), $x_1$ = size of home (in square feet), and $x_2$ = lot size (in square feet).
A. A particular home of 1240 square feet on a lot of 18,000 square feet sold for \$145,000. Find the predicted selling price and the residual, and interpret.
```{r}
# Predicted selling price from the fitted equation
price <- -10536 + (53.8 * 1240) + (2.84 * 18000)
print(price)
# Residual = observed price - predicted price
residual <- 145000 - price
print(residual)
```
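The predicted price is `$107,296`. The home actually sold for `$145,000`, so the residual (observed minus predicted) is `$145,000 - $107,296 = $37,704`: the home sold for `$37,704` more than the equation predicts for a house of its size and lot size.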
B. For fixed lot size, how much is the house selling price predicted to increase for each square-foot increase in home size? Why?
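The predicted selling price increases by `$53.80` for each one-square-foot increase in home size. That is the coefficient on $x_1$ (home size); because lot size ($x_2$) is held fixed, its term does not change, so the whole change in $\hat{y}$ comes from the $53.8x_1$ term.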
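As a quick numerical check, here is a minimal sketch that codes the prediction equation as an R function (`predict_price` is a hypothetical helper name, not part of the assignment) and compares two homes that differ by one square foot of living space on the same 18,000-square-foot lot.

```{r}
# Hypothetical helper: the fitted equation from Question 1
predict_price <- function(size, lot) {
  -10536 + 53.8 * size + 2.84 * lot
}
# Same lot size, home size differs by one square foot
one_more_sqft <- predict_price(1241, 18000) - predict_price(1240, 18000)
one_more_sqft  # the size coefficient, 53.8, up to floating-point rounding
```

The lot term cancels in the difference, which is why the answer does not depend on which fixed lot size is chosen.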
C. According to this prediction equation, for fixed home size, how much would lot size need to increase to have the same impact as a one-square-foot increase in home size?
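Lot size would need to increase by about 18.94 square feet, the ratio of the two coefficients ($53.8 / 2.84 \approx 18.94$), to have the same impact on the predicted price as a one-square-foot increase in home size.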
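```{r}
# Ratio of the home-size (x1) coefficient to the lot-size (x2) coefficient
lot <- 53.80 / 2.84
print(lot)
# Check: raising lot size by this amount changes the prediction by 53.8
2.84 * lot
```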
# Question 2
```{r}
library(alr4)
data(salary, package = "alr4")
head(salary)
```
A. Test the hypothesis that the mean salary for men and women is the same, without regard to any other variable but sex.
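With a p-value of about 0.09, which is above the usual 0.05 level, we fail to reject the null hypothesis that mean salaries for men and women are equal. This does not show that the means are the same; it means the data do not provide strong evidence of a difference when sex is the only variable considered.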
```{r}
# Welch two-sample t-test of mean salary by sex, ignoring all other variables
sex_test <- t.test(salary ~ sex, data = salary,
                   var.equal = FALSE, alternative = "two.sided")
print(sex_test)
```
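The same comparison can also be framed as a regression with sex as the only predictor. The sketch below uses a small made-up data frame (`toy`, hypothetical values, not the alr4 data) to show that the `sexFemale` slope in `lm(salary ~ sex)` equals the difference in group means, and that its p-value matches a pooled-variance t-test (`var.equal = TRUE`). The Welch test above (`var.equal = FALSE`) drops the equal-variance assumption, so its p-value can differ slightly.

```{r}
# Hypothetical toy data, for illustration only
toy <- data.frame(
  salary = c(21000, 23500, 25000, 27250, 19800, 22100, 24300, 20500),
  sex    = factor(c("Male", "Male", "Male", "Male",
                    "Female", "Female", "Female", "Female"),
                  levels = c("Male", "Female"))
)
# Pooled-variance two-sample t-test
pooled_test <- t.test(salary ~ sex, data = toy, var.equal = TRUE)
# Equivalent regression: the sexFemale slope is mean(Female) - mean(Male)
toy_fit <- lm(salary ~ sex, data = toy)
slope_p <- summary(toy_fit)$coefficients["sexFemale", "Pr(>|t|)"]
c(t_test_p = pooled_test$p.value, regression_p = slope_p)
```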
B. Run a multiple linear regression with salary as the outcome variable and everything else as predictors, including sex.
```{r}
model <- lm(formula = salary ~ ., data = salary)
summary(model)
```
```{r}
confint(model, 'sexFemale', level=0.95)
```
```{r}
confint(model, level=0.95)
```
C. Interpret your finding for each predictor variable; discuss (a) statistical significance, (b) interpretation of the coefficient / slope in relation to the outcome variable and other variables
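`degreePhD` - Holding rank, sex, year, and ysdeg constant, PhD holders are predicted to earn about `$1,388.61` more than faculty with a master's degree. This difference is not statistically significant (p = 0.180; the 95% CI, `-$663` to `$3,440`, includes zero).
`rankAssoc` - Compared with assistant professors (the baseline rank) and holding the other variables constant, associate professors are predicted to earn about `$5,292.36` more. This is statistically significant (p < 0.001).
`rankProf` - Full professors are predicted to earn about `$11,118.76` more than assistant professors, all else equal. This is statistically significant (p < 0.001).
`sexFemale` - Holding the other variables constant, women are predicted to earn about `$1,166.37` more than men, but this is not statistically significant (p = 0.214; 95% CI `-$698` to `$3,031`). After adjusting for the other predictors, the data show no clear salary difference by sex.
`year` - Each additional year in the current rank is associated with about `$476.31` more in salary, all else equal. This is statistically significant (p < 0.001).
`ysdeg` - Each additional year since the highest degree is associated with about `$124.57` less in salary, all else equal, but this is not statistically significant (p = 0.115; the 95% CI includes zero).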
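To make "holding everything else constant" concrete, this sketch plugs two hypothetical faculty profiles into the rounded coefficients printed by `summary(model)`. The helper `make_profile()` and the profiles `asst` and `assoc` are made up for illustration; they differ only in rank, so their predicted salaries differ by the `rankAssoc` coefficient.

```{r}
# Coefficients copied (rounded) from summary(model) above; baseline rank = Asst
b_full <- c(intercept = 15746.05, degreePhD = 1388.61, rankAssoc = 5292.36,
            rankProf = 11118.76, sexFemale = 1166.37, year = 476.31,
            ysdeg = -124.57)
# Hypothetical helper: one row of predictor values, in the same order as b_full
make_profile <- function(phd, rank, female, year, ysdeg) {
  c(1, phd, rank == "Assoc", rank == "Prof", female, year, ysdeg)
}
asst  <- make_profile(phd = 1, rank = "Asst",  female = 0, year = 5, ysdeg = 10)
assoc <- make_profile(phd = 1, rank = "Assoc", female = 0, year = 5, ysdeg = 10)
# Predicted salaries; only rank differs, so the gap is the rankAssoc coefficient
pred_asst  <- sum(b_full * asst)
pred_assoc <- sum(b_full * assoc)
pred_assoc - pred_asst
```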
D. Change the baseline category for the rank variable. Interpret the coefficients related to rank again.
```{r}
salary$rank <- relevel(salary$rank, ref = "Prof")
model2 <- lm(formula = salary ~ ., data = salary)
summary(model2)
```
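With full professors as the baseline (and the other variables held constant), assistant professors are predicted to earn about `$11,118.76` less and associate professors about `$5,826.40` less than full professors. Both differences are statistically significant (p < 0.001). Changing the baseline only re-expresses the same comparisons: the fit, residuals, and non-rank coefficients are identical to the first model.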
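The releveled coefficients can be recovered by hand from the first model, which is a useful check that releveling only moves the reference point. This sketch uses the rounded estimates printed by `summary(model)`; the results match the `(Intercept)`, `rankAsst`, and `rankAssoc` rows of `summary(model2)`.

```{r}
# Estimates copied from summary(model), where the baseline rank is Asst
intercept_asst <- 15746.05
b_assoc <- 5292.36
b_prof  <- 11118.76
# Re-expressed with Prof as the baseline
intercept_prof <- intercept_asst + b_prof   # Prof intercept
asst_vs_prof   <- 0 - b_prof                # Asst relative to Prof
assoc_vs_prof  <- b_assoc - b_prof          # Assoc relative to Prof
c(intercept_prof = intercept_prof, asst_vs_prof = asst_vs_prof,
  assoc_vs_prof = assoc_vs_prof)
```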
E. Removing `rank` from the model
```{r}
model3 <- lm(formula = salary ~ degree + sex + year + ysdeg, data = salary)
summary(model3)
```
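When `rank` is removed from the model, the `degreePhD` coefficient changes sign: PhD holders are now predicted to earn about `$3,299.35` less than master's holders, all else equal (p = 0.015). The `sexFemale` coefficient also turns negative (about `-$1,286.54`) but remains not statistically significant (p = 0.332). The `year` coefficient shrinks from `$476.31` to `$351.97` per year but is still positive and significant (p = 0.017). The biggest shift is `ysdeg`, which goes from `-$124.57` to `+$339.40` per year and becomes highly significant (p < 0.001). The model also fits worse without `rank` (adjusted R-squared falls from 0.836 to 0.600).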
F. New variable, new hypothesis
Some people have argued that the new Dean has been making offers that are a lot more generous to newly hired faculty than the previous one and that this might explain some of the variation in Salary.
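```{r}
library(dplyr)
# Indicator: 1 if the highest degree was earned 15 or fewer years ago, else 0
salary <- salary %>%
  mutate(ysdeg15 = ifelse(ysdeg <= 15, 1, 0))
salary
# ysdeg is left out because ysdeg15 is derived from it
model4 <- lm(formula = salary ~ degree + sex + year + ysdeg15, data = salary)
# summary() reports the t-test on the ysdeg15 coefficient; print() shows only estimates
summary(model4)
cor.test(salary$ysdeg, salary$ysdeg15)
```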
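I left `ysdeg` out of the model because the new indicator `ysdeg15` is derived from it, and including both would introduce multicollinearity.
The correlation between `ysdeg` and `ysdeg15` is `-0.8434239`, which confirms that the two variables are strongly related and supports dropping `ysdeg`. This correlation does not, however, test the Dean hypothesis itself. That test is the t-test on the `ysdeg15` coefficient in `summary(model4)`: a positive, statistically significant coefficient would indicate that faculty whose degrees are 15 or fewer years old earn more, all else equal, which is consistent with more generous recent offers.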
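For reference, the indicator can also be built in base R without dplyr. This sketch types in the 52 `ysdeg` values from the printed `salary` data (transcribed by hand, so treat it as illustrative; `ysdeg_vals` and `ysdeg15_vals` are hypothetical names chosen so they do not clash with the data frame's columns). Because the indicator is 1 exactly for the small `ysdeg` values, its correlation with `ysdeg` is necessarily negative.

```{r}
# ysdeg values transcribed by hand from the printed salary data (rows 1-52)
ysdeg_vals <- c(35, 22, 23, 27, 30, 21, 32, 18, 30, 31, 22, 19, 17, 27, 24, 15,
                20, 14, 15, 21, 23, 31, 13, 24, 12, 18, 14, 16, 7, 17, 15, 31,
                14, 33, 29, 9, 14, 4, 5, 4, 4, 11, 7, 3, 3, 1, 6, 2, 1, 1, 1, 2)
# Base-R equivalent of mutate(ysdeg15 = ifelse(ysdeg <= 15, 1, 0))
ysdeg15_vals <- ifelse(ysdeg_vals <= 15, 1, 0)
table(ysdeg15_vals)
# The indicator is 1 for small ysdeg values, so the correlation is negative
cor(ysdeg_vals, ysdeg15_vals)
```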
# Question 3
```{r}
library(smss)
data("house.selling.price", package = "smss")
head(house.selling.price)
```
A. Using the house.selling.price data, run and report regression results modeling y = selling price (in dollars) in terms of size of home (in square feet) and whether the home is new (1 = yes; 0 = no).
```{r}
selling <- lm(formula = Price ~ Size + New, data = house.selling.price)
summary(selling)
```
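Holding size constant, a new home is predicted to sell for about `$57,736.28` more than a home that is not new, and holding newness constant, each additional square foot adds about `$116.13` to the predicted price. Both effects are statistically significant (p = 0.003 for `New`, p < 0.001 for `Size`).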
B. Report and interpret the prediction equation, and form separate equations relating selling price to size for new and for not new homes.
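$$\hat{y} = -40{,}230.87 + 116.13x_1 + 57{,}736.28x_2$$
where $x_1$ is size (in square feet) and $x_2$ = 1 for a new home and 0 otherwise. Setting $x_2$ to 0 or 1 gives a separate line for each group; the two lines share the same slope and differ only in their intercepts.
For homes that are not new ($x_2 = 0$): $\hat{y} = -40{,}230.87 + 116.13x_1$
For new homes ($x_2 = 1$): $\hat{y} = 17{,}505.42 + 116.13x_1$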
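A minimal sketch of how the two lines follow from the printed coefficients: setting `New` to 0 or 1 changes only the intercept. Values are copied from `summary(selling)`; the object names are illustrative.

```{r}
# Coefficients copied from summary(selling) above
b0     <- -40230.867   # (Intercept)
b_size <- 116.132      # Size
b_new  <- 57736.283    # New
# Not new (New = 0) and new (New = 1): same slope, different intercepts
lines_additive <- rbind(
  not_new = c(intercept = b0,         slope = b_size),
  new     = c(intercept = b0 + b_new, slope = b_size)
)
lines_additive
# Evaluate both lines at 3,000 sq ft
lines_additive[, "intercept"] + lines_additive[, "slope"] * 3000
```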
C. Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.
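A 3,000-square-foot home is predicted to sell for about `$365,901` if it is new and about `$308,165` if it is not new; the gap is the `New` coefficient, about `$57,736`.
```{r}
# Predicted prices for a 3,000 sq ft home: (i) new, (ii) not new
# predict() uses the full-precision coefficients rather than rounded ones
predict(selling, newdata = data.frame(Size = 3000, New = c(1, 0)))
```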
D. Fit another model, this time with an interaction term allowing interaction between size and new, and report the regression results
```{r}
selling2 <- lm(formula = Price ~ Size*New, data = house.selling.price)
summary(selling2)
```
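`Size*New` in an R formula expands to both main effects plus their product. This short sketch on two hypothetical homes (`toy_homes`, made-up rows) shows the columns `lm()` builds for this formula; the `Size:New` column is simply `Size` times `New`, so it is zero for homes that are not new.

```{r}
# Two hypothetical homes, only to show the columns that Size * New creates
toy_homes <- data.frame(Size = c(1500, 3000), New = c(0, 1))
X <- model.matrix(~ Size * New, data = toy_homes)
X
# The interaction column is the product of Size and New
X[, "Size:New"]
```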
E. Report the lines relating the predicted selling price to the size for homes that are (i) new, (ii) not new.
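```{r}
b <- coef(selling2)
# (ii) Not new (New = 0): baseline intercept and Size slope
c(intercept = unname(b["(Intercept)"]), slope = unname(b["Size"]))
# (i) New (New = 1): New shifts the intercept, Size:New shifts the slope
c(intercept = unname(b["(Intercept)"] + b["New"]),
  slope = unname(b["Size"] + b["Size:New"]))
# Predicted prices at 3,000 sq ft (used in F): new, then not new
predict(selling2, newdata = data.frame(Size = 3000, New = c(1, 0)))
```
For homes that are not new, $\hat{y} = -22{,}227.81 + 104.44x_1$; for new homes, $\hat{y} = (-22{,}227.81 - 78{,}527.50) + (104.44 + 61.92)x_1 = -100{,}755.31 + 166.35x_1$, where $x_1$ is size in square feet.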
F. Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.
Using the interaction model, a 3,000-square-foot home is predicted to sell for about `$398,307` if it is new and about `$291,086` if it is not new.
G. Find the predicted selling price for a home of 1500 square feet that is (i) new, (ii) not new. Comparing to (F), explain how the difference in predicted selling prices changes as the size of home increases.
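```{r}
# Interaction-model predictions at 1,500 sq ft: (i) new, (ii) not new
predict(selling2, newdata = data.frame(Size = 1500, New = c(1, 0)))
# Predicted new-minus-not-new gap at 1,500 and 3,000 sq ft
b <- coef(selling2)
unname(b["New"] + b["Size:New"] * c(1500, 3000))
```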
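At 1,500 square feet, a new home is predicted to sell for about `$148,776` and a home that is not new for about `$134,429`, a gap of about `$14,346`. At 3,000 square feet (F), the gap is about `$107,220`. Because the `Size:New` coefficient is positive (about `$61.92`), the predicted premium for a new home grows by about `$61.92` for each additional square foot, so the difference between new and not-new homes widens as home size increases.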
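The widening gap follows directly from the interaction coefficients. This sketch (the helper name `price_gap` is illustrative) evaluates the predicted new-minus-not-new difference, $-78{,}527.50 + 61.92x_1$, from the rounded estimates in `summary(selling2)`, and solves for the size at which the two fitted lines cross.

```{r}
# Rounded coefficients copied from summary(selling2) above
b_new_shift <- -78527.502   # New
b_interact  <- 61.916       # Size:New
# Hypothetical helper: predicted price gap (new minus not new) at a given size
price_gap <- function(size) b_new_shift + b_interact * size
price_gap(c(1500, 3000))
# Size at which the two fitted lines cross (gap = 0)
crossing_size <- -b_new_shift / b_interact
crossing_size
```

The lines cross at roughly 1,268 square feet, so below that size the fitted model predicts new homes to sell for less than comparable homes that are not new; whether that is meaningful depends on how many small new homes are actually in the data.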
H. Do you think the model with interaction or the one without it represents the relationship of size and new to the outcome price? What makes you prefer one model over another?
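I prefer the model without the interaction because it is simpler and easier to interpret, and its fit is only slightly worse: adjusted R-squared is 0.717 versus 0.736, and the residual standard error is `$53,880` versus `$52,000`. That said, the interaction term itself is statistically significant (p = 0.005), which is evidence that each additional square foot adds more to the price of a new home than to one that is not new. Judged on that test and on adjusted R-squared, the interaction model fits better, so the choice comes down to how much weight is given to simplicity.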
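Two numbers behind this comparison can be reproduced from the printed output. When two nested models differ by a single term, the partial F-test comparing them (what `anova(selling, selling2)` reports when run on the data) has F equal to the square of that term's t statistic and the same p-value. Adjusted R-squared follows from R-squared, the sample size, and the number of predictors. This sketch uses the rounded values from the two summaries, with n = 100 inferred from the 97 residual degrees of freedom of the additive model.

```{r}
# Values copied from the two summaries above
t_interact <- 2.855   # t value for Size:New in selling2
df_resid   <- 96      # residual degrees of freedom of selling2
# With one extra term, the partial F statistic is t^2 and the p-values agree
F_stat <- t_interact^2
p_F <- pf(F_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)
p_t <- 2 * pt(-abs(t_interact), df = df_resid)
c(F = F_stat, p_F = p_F, p_t = p_t)
# Adjusted R-squared from R-squared, sample size n, and number of predictors k
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
c(additive = adj_r2(0.7226, n = 100, k = 2),
  interaction = adj_r2(0.7443, n = 100, k = 3))
```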