library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)
library(alr4)   # provides the salary data; also attaches car, carData and effects
Question 1
For recent data in Jacksonville, Florida, on $y$ = selling price of home (in dollars), $x_1$ = size of home (in square feet), and $x_2$ = lot size (in square feet), the prediction equation is $\hat{y} = -10{,}536 + 53.8x_1 + 2.84x_2$.
A. A particular home of 1240 square feet on a lot of 18,000 square feet sold for $145,000. Find the predicted selling price and the residual, and interpret.
# predicted price = intercept + 53.8 * home size + 2.84 * lot size
-10536 + 53.8 * 1240 + 2.84 * 18000
[1] 107296
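The computation above gives only the predicted price; the question also asks for the residual, which is the observed price minus the predicted price. A short R check using only the numbers stated in the question:

```r
# Prediction equation from the question: y-hat = -10536 + 53.8*x1 + 2.84*x2
predicted <- -10536 + 53.8 * 1240 + 2.84 * 18000
observed <- 145000

# Residual = observed selling price minus predicted selling price
residual <- observed - predicted
c(predicted = predicted, residual = residual)
```

The residual is $37,704: this home sold for about $37,704 more than the equation predicts for a house with its home size and lot size.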
B. For fixed lot size, how much is the house selling price predicted to increase for each square-foot increase in home size? Why?
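$53.80. The home-size coefficient is the change in predicted price for a one-unit increase in home size while lot size is held fixed, so each extra square foot of home adds $53.80 to the predicted selling price.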
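Writing out the prediction for two homes on the same lot that differ by one square foot of living area shows why the answer is the coefficient itself: the intercept and lot-size terms cancel.

$$
\begin{aligned}
\hat{y}(x_1 + 1, x_2) - \hat{y}(x_1, x_2)
  &= \bigl[-10{,}536 + 53.8(x_1 + 1) + 2.84x_2\bigr] - \bigl[-10{,}536 + 53.8x_1 + 2.84x_2\bigr] \\
  &= 53.8
\end{aligned}
$$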
C. According to this prediction equation, for fixed home size, how much would lot size need to increase to have the same impact as a one-square-foot increase in home size?
53.8/2.84
[1] 18.94366
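The ratio comes from setting the effect of a lot-size increase $\Delta x_2$ equal to the effect of one extra square foot of home:

$$
2.84\,\Delta x_2 = 53.8 \quad\Longrightarrow\quad \Delta x_2 = \frac{53.8}{2.84} \approx 18.94 \text{ square feet}
$$

So, with home size fixed, the lot would need to grow by about 18.9 square feet to raise the predicted price as much as one additional square foot of home.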
Question 2
(Data file: salary in alr4 R package). The data file concerns salary and other characteristics of all faculty in a small Midwestern college collected in the early 1980s for presentation in legal proceedings for which discrimination against women in salary was at issue. All persons in the data hold tenured or tenure track positions; temporary faculty are not included. The variables include degree, a factor with levels Masters and PhD; rank, a factor with levels Asst, Assoc, and Prof; sex, a factor with levels Male and Female; year, years in current rank; ysdeg, years since highest degree; and salary, academic year salary in dollars.
A. Test the hypothesis that the mean salary for men and women is the same, without regard to any other variable but sex. Explain your findings.
head(salary)
degree rank sex year ysdeg salary
1 Masters Prof Male 25 35 36350
2 Masters Prof Male 13 22 35350
3 Masters Prof Male 10 23 28200
4 Masters Prof Female 7 27 26775
5 PhD Prof Male 19 30 33696
6 Masters Prof Male 16 21 28516
unique(salary$rank)
[1] Prof Assoc Asst
Levels: Asst Assoc Prof
summary(lm(salary~sex, data=salary))
Call:
lm(formula = salary ~ sex, data = salary)
Residuals:
Min 1Q Median 3Q Max
-8602.8 -4296.6 -100.8 3513.1 16687.9
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24697 938 26.330 <2e-16 ***
sexFemale -3340 1808 -1.847 0.0706 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5782 on 50 degrees of freedom
Multiple R-squared: 0.0639, Adjusted R-squared: 0.04518
F-statistic: 3.413 on 1 and 50 DF, p-value: 0.0706
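Reading the output above: the intercept ($24,697) is the mean salary of men (the baseline level), and the sexFemale coefficient estimates that women earn on average $3,340 less. The two-sided p-value is 0.0706, so at the 5% level we do not reject the hypothesis that mean salaries are equal, although the difference is significant at the 10% level. With a single binary predictor this test is equivalent to a pooled two-sample t-test, for example `t.test(salary ~ sex, data = salary, var.equal = TRUE)`. The sketch below recomputes the group means, t statistic and p-value from the rounded printed values, so the last digit may differ slightly from the summary:

```r
# Rounded values copied from summary(lm(salary ~ sex, data = salary))
b0 <- 24697        # (Intercept): mean salary of men (baseline level)
b_female <- -3340  # sexFemale: mean(women) - mean(men)
se_female <- 1808
df_resid <- 50

mean_men <- b0
mean_women <- b0 + b_female

t_stat <- b_female / se_female
p_value <- 2 * pt(-abs(t_stat), df = df_resid)  # two-sided test of equal means
c(mean_men = mean_men, mean_women = mean_women, t = t_stat, p = p_value)
```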
B. Run a multiple linear regression with salary as the outcome variable and everything else as predictors, including sex. Assuming no interactions between sex and the other predictors, obtain a 95% confidence interval for the difference in salary between males and females.
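The `confint()` output for this model does not appear in the rendered page. As a cross-check, a 95% interval can be rebuilt from the sexFemale estimate (1166.37) and standard error (925.57) printed in the full-model summary, with 45 residual degrees of freedom. This is the estimate ± t(0.975, 45) × SE rule that `confint()` uses for `lm` fits; because the inputs are rounded, expect small differences from `confint()` itself.

```r
# Values copied from the full-model summary (salary ~ degree + rank + sex + year + ysdeg)
estimate <- 1166.37   # sexFemale: Female minus Male, other predictors held fixed
std_error <- 925.57
df_resid <- 45

t_crit <- qt(0.975, df = df_resid)
ci_female_minus_male <- estimate + c(-1, 1) * t_crit * std_error
ci_male_minus_female <- -rev(ci_female_minus_male)  # same interval, sign flipped

round(ci_female_minus_male)   # roughly -698 to 3031
round(ci_male_minus_female)   # roughly -3031 to 698
```

Either way the interval contains 0, so after adjusting for degree, rank, year and ysdeg the data show no significant salary difference between men and women.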
C. Interpret your finding for each predictor variable; discuss (a) statistical significance,
Call:
lm(formula = salary ~ degree + rank + sex + year + ysdeg, data = salary)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15746.05 800.18 19.678 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAssoc 5292.36 1145.40 4.621 3.22e-05 ***
rankProf 11118.76 1351.77 8.225 1.62e-10 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
(b) interpretation of the coefficient / slope in relation to the outcome variable and other variables
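Rank and year are statistically significant, while degree (p = 0.180), sex (p = 0.214), and ysdeg (p = 0.115) are not significant at the 5% level. Holding the other predictors fixed, associate professors earn about $5,292 more than assistant professors, and full professors about $11,119 more than assistant professors. Because year is a continuous variable, each additional year in the current rank is associated with about $476 more in salary. The sexFemale estimate would mean women earn about $1,166 more than comparable men, but it is not significant.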
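Each slope is a comparison with all other predictors held fixed. Plugging hypothetical faculty profiles into the printed coefficients (baselines: Asst, Masters, Male) makes this concrete; the helper `predict_salary` and the profiles are illustrative, not part of any package or the data.

```r
# Coefficients copied from the full-model summary above
b <- c(intercept = 15746.05, degreePhD = 1388.61, rankAssoc = 5292.36,
       rankProf = 11118.76, sexFemale = 1166.37, year = 476.31, ysdeg = -124.57)

# Hypothetical helper: predicted salary for one faculty profile
predict_salary <- function(rank, phd, female, year, ysdeg) {
  unname(b["intercept"] +
    b["degreePhD"] * phd +
    (rank == "Assoc") * b["rankAssoc"] +
    (rank == "Prof") * b["rankProf"] +
    b["sexFemale"] * female +
    b["year"] * year +
    b["ysdeg"] * ysdeg)
}

# Same degree, sex, year and ysdeg; only rank differs
prof <- predict_salary("Prof", phd = 1, female = 0, year = 5, ysdeg = 15)
asst <- predict_salary("Asst", phd = 1, female = 0, year = 5, ysdeg = 15)
prof - asst   # 11118.76, the rankProf coefficient

# Same profile with one more year in rank
predict_salary("Asst", phd = 1, female = 0, year = 6, ysdeg = 15) - asst  # 476.31, the year slope
```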
D. Change the baseline category for the rank variable. Interpret the coefficients related to rank again.
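Based on this analysis, with full professor as the baseline and the other predictors held fixed, an assistant professor earns about $11,119 less than a full professor and an associate professor about $5,826 less; both differences are highly significant (p < 0.001).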
# make full professor the baseline level of rank, then refit with all predictors
salary$rank <- relevel(salary$rank, ref = "Prof")
summary(lm(salary ~ ., data = salary))
Call:
lm(formula = salary ~ ., data = salary)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26864.81 1375.29 19.534 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAsst -11118.76 1351.77 -8.225 1.62e-10 ***
rankAssoc -5826.40 1012.93 -5.752 7.28e-07 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
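Changing the baseline only reparameterizes the rank effect: the residuals, R² and the non-rank coefficients are identical in the two fits, and the new rank coefficients are differences of the old ones. A quick check with the printed estimates:

```r
# Rank coefficients with Asst as baseline (first full-model summary)
asst_base <- c(Assoc = 5292.36, Prof = 11118.76)
intercept_asst <- 15746.05

# Implied coefficients with Prof as baseline
prof_base <- c(Asst = 0 - asst_base[["Prof"]],
               Assoc = asst_base[["Assoc"]] - asst_base[["Prof"]])
intercept_prof <- intercept_asst + asst_base[["Prof"]]

prof_base        # Asst -11118.76, Assoc -5826.40
intercept_prof   # 26864.81
```

These match the rankAsst, rankAssoc and intercept estimates in the releveled summary.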
E. Finkelstein (1980), in a discussion of the use of regression in discrimination cases, wrote, “[a] variable may reflect a position or status bestowed by the employer, in which case if there is discrimination in the award of the position or status, the variable may be ‘tainted.’” Thus, for example, if discrimination is at work in promotion of faculty to higher ranks, using rank to adjust salaries before comparing the sexes may not be acceptable to the courts. Exclude the variable rank, refit, and summarize how your findings changed, if they did.
Call:
lm(formula = salary ~ degree + sex + year + ysdeg, data = salary)
Residuals:
Min 1Q Median 3Q Max
-8146.9 -2186.9 -491.5 2279.1 11186.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17183.57 1147.94 14.969 < 2e-16 ***
degreePhD -3299.35 1302.52 -2.533 0.014704 *
sexFemale -1286.54 1313.09 -0.980 0.332209
year 351.97 142.48 2.470 0.017185 *
ysdeg 339.40 80.62 4.210 0.000114 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared: 0.6312, Adjusted R-squared: 0.5998
F-statistic: 20.11 on 4 and 47 DF, p-value: 1.048e-09
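How the findings change, read off the printed fits: dropping rank lowers R² from 0.855 to 0.631; degreePhD turns negative and significant (-3,299, p = 0.015); ysdeg turns positive and highly significant (about $339 per year, p < 0.001); and the sexFemale estimate flips from +$1,166 to -$1,287 but stays non-significant (p = 0.332). The sketch below lines up the sexFemale estimate with and without rank and recomputes each two-sided p-value from the rounded numbers:

```r
# sexFemale estimate, standard error and residual df copied from the printed summaries
sex_effect <- data.frame(
  model    = c("with rank", "without rank"),
  estimate = c(1166.37, -1286.54),
  se       = c(925.57, 1313.09),
  df       = c(45, 47)
)
sex_effect$t <- sex_effect$estimate / sex_effect$se
sex_effect$p <- 2 * pt(-abs(sex_effect$t), df = sex_effect$df)  # two-sided
sex_effect$significant_5pct <- sex_effect$p < 0.05
sex_effect
```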
F. Everyone in this dataset was hired the year they earned their highest degree. It is also known that a new Dean was appointed 15 years ago, and everyone in the dataset who earned their highest degree 15 years ago or less than that has been hired by the new Dean. Some people have argued that the new Dean has been making offers that are a lot more generous to newly hired faculty than the previous one and that this might explain some of the variation in Salary. Create a new variable that would allow you to test this hypothesis and run another multiple regression model to test this. Select variables carefully to make sure there is no multicollinearity. Explain why multicollinearity would be a concern in this case and how you avoided it. Do you find support for the hypothesis that the people hired by the new Dean are making higher than those that were not?
Call:
lm(formula = salary ~ degree + sex + newyear + ysdeg + year,
data = salary)
Residuals:
Min 1Q Median 3Q Max
-8314.2 -2146.3 -222.6 2240.8 11044.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15012.81 4821.50 3.114 0.003175 **
degreePhD -3411.96 1335.79 -2.554 0.014017 *
sexFemale -1233.85 1329.06 -0.928 0.358065
newyear 202.79 437.23 0.464 0.644984
ysdeg 341.08 81.38 4.191 0.000125 ***
year 375.97 152.72 2.462 0.017633 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3775 on 46 degrees of freedom
Multiple R-squared: 0.6329, Adjusted R-squared: 0.593
F-statistic: 15.86 on 5 and 46 DF, p-value: 4.589e-09
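With the variable as constructed above, newyear is far from significant (estimate 202.79, p = 0.645), so this fit gives no support for the hypothesis. Note, however, that the problem defines new-Dean hires by years since highest degree (ysdeg ≤ 15), whereas `newyear` in the source code is built from `year <= 20` and coded 10/0. Below is a minimal sketch of an indicator that follows the problem statement, assuming the alr4 `salary` data frame; the helper name `add_new_dean` is hypothetical. Multicollinearity is a concern because the indicator is a deterministic function of ysdeg, and year and ysdeg both measure seniority, so entering all three together can inflate the standard errors. The sketch therefore drops ysdeg (and, like the fits above, leaves rank out); `car::vif()` can check the remaining predictors.

```r
# Hypothetical helper: flag faculty hired by the new Dean.
# Everyone was hired the year of their highest degree, so ysdeg <= 15
# means hired during the new Dean's 15 years.
add_new_dean <- function(df, cutoff = 15) {
  df$newdean <- as.integer(df$ysdeg <= cutoff)  # 1 = new Dean, 0 = previous Dean
  df
}

# Usage with the alr4 data (not run here):
# salary <- add_new_dean(salary)
# cor(salary$newdean, salary$ysdeg)   # negative by construction
# fit <- lm(salary ~ degree + sex + newdean + year, data = salary)  # ysdeg dropped
# summary(fit)
# car::vif(fit)                       # variance inflation factors
```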
Question 3
(Data file: house.selling.price in smss R package)
A. Using the house.selling.price data, run and report regression results modeling y = selling price (in dollars) in terms of size of home (in square feet) and whether the home is new (1 = yes; 0 = no). In particular, for each variable; discuss statistical significance and interpret the meaning of the coefficient.
B. Report and interpret the prediction equation, and form separate equations relating selling price to size for new and for not new homes.
C. Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.
D. Fit another model, this time with an interaction term allowing interaction between size and new, and report the regression results
E. Report the lines relating the predicted selling price to the size for homes that are (i) new, (ii) not new.
F. Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.
G. Find the predicted selling price for a home of 1500 square feet that is (i) new, (ii) not new. Comparing to (F), explain how the difference in predicted selling prices changes as the size of home increases.
H. Do you think the model with interaction or the one without it represents the relationship of size and new to the outcome price? What makes you prefer one model over another?
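The Question 3 code chunks are still empty. A hedged sketch of how the requested equations could be extracted, assuming `house.selling.price` from smss has numeric columns `Price`, `Size` and `New` (0/1); the helpers `group_lines` and `new_premium` are hypothetical. In the additive model the two lines share the Size slope and differ only in intercept; with the `Size:New` interaction the slopes differ as well, so the new-minus-not-new gap in predicted price changes with size (parts E to G).

```r
# Usage (not run here; column names are an assumption):
# library(smss); data(house.selling.price)
# fit_add <- lm(Price ~ Size + New, data = house.selling.price)   # parts A to C
# fit_int <- lm(Price ~ Size * New, data = house.selling.price)   # parts D to G
# group_lines(coef(fit_add)); group_lines(coef(fit_int))
# predict(fit_int, newdata = data.frame(Size = c(3000, 3000, 1500, 1500), New = c(1, 0, 1, 0)))
# new_premium(coef(fit_int), c(1500, 3000))

# One prediction line per group from the coefficients of Price ~ Size + New (+ Size:New)
group_lines <- function(b) {
  # the additive model has no interaction term, so treat it as 0
  b_int <- if ("Size:New" %in% names(b)) b[["Size:New"]] else 0
  data.frame(
    group = c("not new", "new"),
    intercept = c(b[["(Intercept)"]], b[["(Intercept)"]] + b[["New"]]),
    slope = c(b[["Size"]], b[["Size"]] + b_int)
  )
}

# Predicted price gap (new minus not new) at a given size:
# constant b_New without interaction, b_New + b_Size:New * size with it
new_premium <- function(b, size) {
  b_int <- if ("Size:New" %in% names(b)) b[["Size:New"]] else 0
  b[["New"]] + b_int * size
}
```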
Source Code
---
title: "Homework 4"
author: "Xiaoyan"
description: "Template of course blog qmd file"
date: "04/17/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw4
---

```{r}
library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)
library(alr4)
```

# Question 1

For recent data in Jacksonville, Florida, on y = selling price of home (in dollars), x1 = size of home (in square feet), and x2 = lot size (in square feet), the prediction equation is ŷ = −10,536 + 53.8x1 + 2.84x2.

A. A particular home of 1240 square feet on a lot of 18,000 square feet sold for $145,000. Find the predicted selling price and the residual, and interpret.

```{r}
(-10536) +53.8*1240+2.84*18000
```

B. For fixed lot size, how much is the house selling price predicted to increase for each square-foot increase in home size? Why?

53.8

C. According to this prediction equation, for fixed home size, how much would lot size need to increase to have the same impact as a one-square-foot increase in home size?

```{r}
53.8/2.84
```

# Question 2

(Data file: salary in alr4 R package). The data file concerns salary and other characteristics of all faculty in a small Midwestern college collected in the early 1980s for presentation in legal proceedings for which discrimination against women in salary was at issue. All persons in the data hold tenured or tenure track positions; temporary faculty are not included. The variables include degree, a factor with levels PhD and MS; rank, a factor with levels Asst, Assoc, and Prof; sex, a factor with levels Male and Female; Year, years in current rank; ysdeg, years since highest degree, and salary, academic year salary in dollars.

A. Test the hypothesis that the mean salary for men and women is the same, without regard to any other variable but sex. Explain your findings.

```{r}
head(salary)
unique(salary$rank)
summary(lm(salary~sex, data=salary))
```

B. Run a multiple linear regression with salary as the outcome variable and everything else as predictors, including sex. Assuming no interactions between sex and the other predictors, obtain a 95% confidence interval for the difference in salary between males and females.

```{r}
lm(salary~degree+rank+sex+year+ysdeg, data=salary)|>confint()
```

C. Interpret your finding for each predictor variable; discuss (a) statistical significance,

```{r}
summary(lm(salary~degree+rank+sex+year+ysdeg, data=salary))
```

(b) interpretation of the coefficient / slope in relation to the outcome variable and other variables

degree, sex, and ysdeg are not statistically significant in this model. Associate professors make 5292 more than assistant professors and professors make 11118 more than assistant professors. As year is a continuous variable, a one-year increase means 476 more salary.

D. Change the baseline category for the rank variable. Interpret the coefficients related to rank again.

Based on this analysis, an assistant professor makes 11118 less than a full professor and an associate professor makes 5826 less than a full professor.

```{r}
salary$rank<-relevel(salary$rank, ref ="Prof")
summary(lm(salary~., data = salary))
```

E. Finkelstein (1980), in a discussion of the use of regression in discrimination cases, wrote, “[a] variable may reflect a position or status bestowed by the employer, in which case if there is discrimination in the award of the position or status, the variable may be ‘tainted.’ ” Thus, for example, if discrimination is at work in promotion of faculty to higher ranks, using rank to adjust salaries before comparing the sexes may not be acceptable to the courts.
Exclude the variable rank, refit, and summarize how your findings changed, if they did.

```{r}
summary(lm(salary~degree+sex+year+ysdeg, data=salary))
```

F. Everyone in this dataset was hired the year they earned their highest degree. It is also known that a new Dean was appointed 15 years ago, and everyone in the dataset who earned their highest degree 15 years ago or less than that has been hired by the new Dean. Some people have argued that the new Dean has been making offers that are a lot more generous to newly hired faculty than the previous one and that this might explain some of the variation in Salary.
Create a new variable that would allow you to test this hypothesis and run another multiple regression model to test this. Select variables carefully to make sure there is no multicollinearity. Explain why multicollinearity would be a concern in this case and how you avoided it. Do you find support for the hypothesis that the people hired by the new Dean are making higher than those that were not?

```{r}
salary$newyear<-ifelse(salary$year <=20,10,0)
cor.test(salary$newyear, salary$year)
summary(lm(salary~degree+sex+newyear+ysdeg, data=salary))
summary(lm(salary~degree+sex+newyear+ysdeg+year, data=salary))
```

# Question 3

(Data file: house.selling.price in smss R package)

A. Using the house.selling.price data, run and report regression results modeling y = selling price (in dollars) in terms of size of home (in square feet) and whether the home is new (1 = yes; 0 = no). In particular, for each variable; discuss statistical significance and interpret the meaning of the coefficient.

```{r}
```

B. Report and interpret the prediction equation, and form separate equations relating selling price to size for new and for not new homes.

```{r}
```

C. Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.

```{r}
```

D. Fit another model, this time with an interaction term allowing interaction between size and new, and report the regression results

```{r}
```

E. Report the lines relating the predicted selling price to the size for homes that are (i) new, (ii) not new.

```{r}
```

F. Find the predicted selling price for a home of 3000 square feet that is (i) new, (ii) not new.

```{r}
```

G. Find the predicted selling price for a home of 1500 square feet that is (i) new, (ii) not new. Comparing to (F), explain how the difference in predicted selling prices changes as the size of home increases.

```{r}
```

H. Do you think the model with interaction or the one without it represents the relationship of size and new to the outcome price? What makes you prefer one model over another?

```{r}
```