Homework 4
sai Pothula
Author

Sai Padma pothula

Published

May 2, 2023

Code
library(alr4)
Loading required package: car
Loading required package: carData
Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
Code
library(smss)
library(ggplot2)
library(stargazer)

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 

A:

Code
y = -10536 + (53.8 * 1240) + (2.84 * 18000)
y
[1] 107296
Code
actual_value = 145000 - y
actual_value
[1] 37704

The actual selling price exceeded the predicted selling price by $37,704, indicating that the home sold for a higher price than what was estimated by the model.

1B: When the lot size is fixed, the coefficient of x1 (53.8) captures the specific impact of the home size on the predicted selling price, controlling for the influence of lot size. Therefore, for a fixed lot size, the house selling price is predicted to increase by $53.8 for each additional square foot increase in home size, as determined by the coefficient of x1 in the model.

1C: One-square-foot increase in home size is associated with an increase in price of $53.8. One- square-foot increase in lot size is associated with an increase in price of $2.84.(53.8 / 2.84) to have the same impact as a one-square-foot increase in home size, the lot size would need to increase by approximately 18.90 square feet according to the given prediction equation.

2A:

Code
data(salary)
Code
t.test(formula = salary ~ sex, data = salary)

    Welch Two Sample t-test

data:  salary by sex
t = 1.7744, df = 21.591, p-value = 0.09009
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
 -567.8539 7247.1471
sample estimates:
  mean in group Male mean in group Female 
            24696.79             21357.14 

The results of the t-test indicate that there is a difference between the mean salary for men and women. However, with a p-value of 0.09009, we do not have enough evidence to reject the null hypothesis at the 95% confidence level. In other words, we cannot conclude that there is a statistically significant difference in the mean salary between men and women when considering only the variable of sex.

2B

Code
lm(salary ~ ., data = salary) |>
  confint() 
                 2.5 %      97.5 %
(Intercept) 14134.4059 17357.68946
degreePhD    -663.2482  3440.47485
rankAssoc    2985.4107  7599.31080
rankProf     8396.1546 13841.37340
sexFemale    -697.8183  3030.56452
year          285.1433   667.47476
ysdeg        -280.6397    31.49105

The 95% confidence interval for the “sexFemale” variable is (-697, 3031). This confidence interval indicates that, when considering other variables, female faculty members may earn approximately $697 less to $3031 more in salary compared to male faculty members. This range provides an estimate of the potential difference in salaries between the genders, while taking into account the influence of other factors in the dataset.

2C:

Code
summary(lm(salary ~ ., data = salary))

Call:
lm(formula = salary ~ ., data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-4045.2 -1094.7  -361.5   813.2  9193.1 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15746.05     800.18  19.678  < 2e-16 ***
degreePhD    1388.61    1018.75   1.363    0.180    
rankAssoc    5292.36    1145.40   4.621 3.22e-05 ***
rankProf    11118.76    1351.77   8.225 1.62e-10 ***
sexFemale    1166.37     925.57   1.260    0.214    
year          476.31      94.91   5.018 8.65e-06 ***
ysdeg        -124.57      77.49  -1.608    0.115    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared:  0.855, Adjusted R-squared:  0.8357 
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16

Among the predictor variables examined, both rank and years in the current rank have statistically significant effects on salary. Controlling for all other factors, being at the Associate rank is associated with an average salary increase of $5,292.36 compared to the Assistant rank, while being a full professor corresponds to an average salary increase of $11,118.76 relative to the Assistant rank.

Additionally, each additional year in the current rank is linked to an average salary increase of $476.31. On the other hand, the number of years since obtaining the highest degree does not have a statistically significant impact on salary, suggesting that it does not contribute significantly to changes in salary when controlling for other factors.

Regarding sex, controlling for all other variables, being a female professor is associated with a salary increase of $1,166 compared to being a male professor. However, this result is not statistically significant, as indicated by the p-value of 0.214.

Finally, the variable of degree does not show a statistically significant effect on salary when controlling for other variables. Specifically, having a PhD is associated with a salary increase of $1,388.61 compared to having a master’s degree, but this result is not statistically significant with a p-value of 0.180.

In summary, among the examined predictor variables, rank and years in the current rank have statistically significant impacts on salary, while the effects of sex, degree, and years since obtaining the highest degree are not statistically significant when controlling for other factors.

2D:

Code
salary$rank <- relevel(salary$rank, ref = 'Prof')
summary(lm(salary ~ ., data = salary))

Call:
lm(formula = salary ~ ., data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-4045.2 -1094.7  -361.5   813.2  9193.1 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  26864.81    1375.29  19.534  < 2e-16 ***
degreePhD     1388.61    1018.75   1.363    0.180    
rankAsst    -11118.76    1351.77  -8.225 1.62e-10 ***
rankAssoc    -5826.40    1012.93  -5.752 7.28e-07 ***
sexFemale     1166.37     925.57   1.260    0.214    
year           476.31      94.91   5.018 8.65e-06 ***
ysdeg         -124.57      77.49  -1.608    0.115    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared:  0.855, Adjusted R-squared:  0.8357 
F-statistic: 44.24 on 6 and 45 DF,  p-value: < 2.2e-16

n the updated model, the coefficients for the dummy variables representing the two categories other than the reference category (Prof) are as follows: rankAsst being -11118 and rankAssoc being -5826. These coefficients indicate the difference in salary compared to the reference category (Full Professors), while controlling for other variables.

Specifically, the coefficient of -11118 for rankAsst suggests that Assistant Professors earn $11118 less on average than Full Professors, considering the influence of other factors. Similarly, the coefficient of -5826 for rankAssoc indicates that Associate Professors earn $5826 less on average than Full Professors, controlling for other variables.

In summary, the new model presents the same information as before, but in a different way, emphasizing the salary differences between each rank category (Assistant and Associate Professors) and the reference category (Full Professors) while taking into account the impact of other variables.

2E

Code
model <- lm(salary ~ degree + sex + year + ysdeg, data = salary)

# Print the summary of the initial model
summary(model)

Call:
lm(formula = salary ~ degree + sex + year + ysdeg, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-8146.9 -2186.9  -491.5  2279.1 11186.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 17183.57    1147.94  14.969  < 2e-16 ***
degreePhD   -3299.35    1302.52  -2.533 0.014704 *  
sexFemale   -1286.54    1313.09  -0.980 0.332209    
year          351.97     142.48   2.470 0.017185 *  
ysdeg         339.40      80.62   4.210 0.000114 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared:  0.6312,    Adjusted R-squared:  0.5998 
F-statistic: 20.11 on 4 and 47 DF,  p-value: 1.048e-09

With the exclusion of the “rank” variable from the model, the findings regarding the impact of different predictor variables on salary have changed.

Degree: Without the “rank” variable, the variable “degreePhD” now becomes a statistically significant predictor. When controlling for all other factors, having a PhD (compared to a master’s degree) is associated with a decrease in salary of $3,299.35. This is notably different from when the “rank” variable was included, where there was no statistical significance and a PhD increased salary by $1,388.61.

Years Since Highest Degree: In the absence of the “rank” variable, the variable “ysdeg” (years since obtaining the highest degree) becomes a statistically significant predictor. Controlling for other variables, each passing year now results in an increase in salary of $339.40. This differs from the previous model with the “rank” variable, where there was no statistical significance and a decrease in salary was suggested.

Year: The variable “year” remains statistically significant even after excluding the “rank” variable, although the p-value has increased. Controlling for other factors, each additional year of experience now contributes $351.97 to salary.

Sex: The result for the variable “sex” remains not statistically significant, indicating that its impact on salary did not change significantly. However, the direction of the effect has changed from positive to negative.

2F

Code
salary$newDean <- ifelse(salary$ysdeg <= 15, 0, 1)
# Run regression analysis with newDean included as a predictor variable
lm_salary_NewDean <- lm(salary ~ degree + rank + sex + year + newDean, data = salary)
summary(lm_salary_NewDean)

Call:
lm(formula = salary ~ degree + rank + sex + year + newDean, data = salary)

Residuals:
    Min      1Q  Median      3Q     Max 
-3403.3 -1387.0  -167.0   528.2  9233.8 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  26588.79    1168.06  22.763  < 2e-16 ***
degreePhD      818.93     797.48   1.027   0.3100    
rankAsst    -11096.95    1191.00  -9.317 4.54e-12 ***
rankAssoc    -6124.28    1028.58  -5.954 3.65e-07 ***
sexFemale      907.14     840.54   1.079   0.2862    
year           434.85      78.89   5.512 1.65e-06 ***
newDean      -2163.46    1072.04  -2.018   0.0496 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2362 on 45 degrees of freedom
Multiple R-squared:  0.8594,    Adjusted R-squared:  0.8407 
F-statistic: 45.86 on 6 and 45 DF,  p-value: < 2.2e-16

The analysis provides evidence supporting the hypothesis that individuals hired by the new Dean are earning higher salaries compared to those who were not. The statistical significance of the “newDean” variable, with a p-value of 0.0496, indicates that there is a notable difference in salaries between these two groups, even when considering other variables. Specifically, the findings suggest that faculty members who were not hired by the new Dean earn an average of $2,163.46 less, holding other factors constant. 3

Code
data(house.selling.price)

A

Code
model <- lm(Price ~ Size + New, data = house.selling.price)
summary(model)

Call:
lm(formula = Price ~ Size + New, data = house.selling.price)

Residuals:
    Min      1Q  Median      3Q     Max 
-205102  -34374   -5778   18929  163866 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -40230.867  14696.140  -2.738  0.00737 ** 
Size           116.132      8.795  13.204  < 2e-16 ***
New          57736.283  18653.041   3.095  0.00257 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 53880 on 97 degrees of freedom
Multiple R-squared:  0.7226,    Adjusted R-squared:  0.7169 
F-statistic: 126.3 on 2 and 97 DF,  p-value: < 2.2e-16

In the regression model, both the “Size” and “New” variables exhibit positive associations with the “Price” of the house. Furthermore, these associations are statistically significant at the 5% level. Specifically, a one-square-foot increase in the size of the house is associated with an average increase in price of $116, while controlling for whether the house is new or not. Additionally, new houses, when compared to old houses of the same size, have an average price difference of $57,736, indicating that new houses tend to be more expensive.

B y = −40,230.867 + 116.132(Size) + 57736.283(New).

Where:

y represents the predicted selling price.

y = −40,230.867 + 116.132(Size) + 57736.283(1). #New house y = −40,230.867 + 116.132(Size) #Old house New houses have high prices

C

Code
#New house
y_new = -40230.867+(116.132*3000)+(57736.283*1)
y_new
[1] 365901.4
Code
#Old house 
y_old = -40230.867+(116.132*3000)+(57736.283*0)
y_old
[1] 308165.1

The predicted selling price for a new 3,000 square foot home is $365,901.40. The predicted selling price for a 3,000 square foot home that is not new is $308,165.10.

D

Code
model <- lm(Price ~ Size + New + Size * New, data = house.selling.price)

summary(model)

Call:
lm(formula = Price ~ Size + New + Size * New, data = house.selling.price)

Residuals:
    Min      1Q  Median      3Q     Max 
-175748  -28979   -6260   14693  192519 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -22227.808  15521.110  -1.432  0.15536    
Size           104.438      9.424  11.082  < 2e-16 ***
New         -78527.502  51007.642  -1.540  0.12697    
Size:New        61.916     21.686   2.855  0.00527 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 52000 on 96 degrees of freedom
Multiple R-squared:  0.7443,    Adjusted R-squared:  0.7363 
F-statistic: 93.15 on 3 and 96 DF,  p-value: < 2.2e-16

E

New_home_P = −22227.808+104.438∗size+−78527.502∗1+61.916∗Size∗1 = −100755.3+166.354∗size

Old_home_P = −22227.808+104.438∗size+−78527.502∗0+61.916∗Size∗0 = −22227.808+104.438∗size

F

Code
New_home <- -22227.808 + (104.438*3000) -78527.502 + (61.916*3000)
Old_home <- -22227.808 + (104.438*3000)
New_home
[1] 398306.7
Code
Old_home
[1] 291086.2

G

Code
New_home <- -22227.808 + (104.438*1500) -78527.502 + (61.916*1500)
Old_home <- -22227.808 + (104.438*1500)
New_home
[1] 148775.7
Code
Old_home
[1] 134429.2

For homes with a size of 3000 square feet, the predicted selling price is $398,306.70 for new homes and $291,086.20 for not new homes. This indicates a significant difference of $107,220.50 in the sale price between new and not new homes of the same size.

In contrast, for homes with a size of 1500 square feet, the predicted selling price is $148,775.70 for new homes and $134,429.20 for not new homes. The difference in sale price between new and not new homes in this case is smaller, amounting to $14,346.50.

The disparity in the price difference between new and not new homes for different sizes can be explained by the structure of the prediction equation. According to the predictive model, a house needs to have a size of approximately 606 square feet before it can be sold at a profit. In other words, only the square footage beyond this threshold contributes to the increase in the sale price.

For instance, a 1500 square foot home has approximately 894 “money-making” square feet, while a 3000 square foot home has around 2394 such square feet. Each additional money-making square foot adds approximately $166.354 to the sale price.

Therefore, the observed differences in sale prices align with the predictive model’s expectations, reflecting the influence of the size of the home on its profitability and the corresponding impact on the selling price.

H: The inclusion of the interaction term in the model improves the representation of the relationship between size, newness, and the selling price. This is supported by the statistical significance of the interaction term, as indicated by its low p-value. Additionally, the adjusted R-squared value for the model with the interaction (0.7363) is higher compared to the model without the interaction (0.7169).

These findings suggest that the interaction between size and newness plays a significant role in explaining the variation in the selling price. The higher adjusted R-squared value indicates that the model with the interaction term captures a greater proportion of the variability in the data, providing a better fit to the observed selling prices.

Therefore, the model with the interaction term is more reliable and informative in understanding how both size and newness influence the predicted selling price, taking into account their combined effect on the outcome variable.