lattice theme set by effectsTheme()
See ?effectsTheme for details.
Code
library(smss)
library(ggplot2)
library(stargazer)
Please cite as:
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
1A:
Code
# Predicted selling price for a 1,240 sq ft home on an 18,000 sq ft lot
y <- -10536 + (53.8 * 1240) + (2.84 * 18000)
y
[1] 107296
Code
# Residual: actual selling price minus predicted selling price
actual_value <- 145000 - y
actual_value
[1] 37704
The actual selling price exceeded the predicted selling price by $37,704, indicating that the home sold for a higher price than what was estimated by the model.
1B: With lot size held fixed, the coefficient of x1 (53.8) gives the change in predicted selling price per square foot of home size. For a fixed lot size, the predicted selling price therefore increases by $53.80 for each additional square foot of home size.
1C: A one-square-foot increase in home size is associated with a $53.80 increase in predicted price, while a one-square-foot increase in lot size is associated with a $2.84 increase. To have the same impact as a one-square-foot increase in home size, lot size would need to increase by 53.8 / 2.84 ≈ 18.94 square feet according to the given prediction equation.
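The ratio in 1C can be checked directly. This is a minimal sketch that uses only the two coefficients from the prediction equation; the variable names are illustrative and do not come from the assignment.

```r
# Coefficients from the given prediction equation
b_home <- 53.8   # dollars per additional square foot of home size
b_lot  <- 2.84   # dollars per additional square foot of lot size

# Lot-size increase (sq ft) with the same predicted effect
# as one additional square foot of home size
lot_equiv <- b_home / b_lot
round(lot_equiv, 2)
```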
2A:
Code
data(salary)
Code
t.test(formula = salary ~ sex, data = salary)
Welch Two Sample t-test
data: salary by sex
t = 1.7744, df = 21.591, p-value = 0.09009
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
-567.8539 7247.1471
sample estimates:
mean in group Male mean in group Female
24696.79 21357.14
The sample means differ: men earn $24,696.79 on average and women $21,357.14. However, with a p-value of 0.09009, we do not have enough evidence to reject the null hypothesis of equal means at the 5% significance level, and the 95% confidence interval for the difference (-567.85, 7247.15) includes zero. In other words, we cannot conclude that there is a statistically significant difference in mean salary between men and women when considering only the variable of sex.
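As a rough check on the reported p-value, the two-sided tail probability can be recomputed from the t statistic and the Welch degrees of freedom printed above. This sketch assumes only those two rounded numbers, so it matches the output only approximately.

```r
# Values copied from the Welch t-test output (rounded as printed)
t_stat <- 1.7744
df     <- 21.591

# Two-sided p-value: probability of a |t| at least this large under H0
p_value <- 2 * pt(-abs(t_stat), df)
p_value

# Decision at the 5% significance level
p_value < 0.05
```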
2B:
Code
lm(salary ~ ., data = salary) |> confint()
The 95% confidence interval for the “sexFemale” coefficient is approximately (-697.8, 3030.6). Holding the other variables constant, female faculty members are therefore estimated to earn between about $698 less and $3,031 more than male faculty members. Because the interval includes zero, the data are consistent with no difference in salary between the sexes once the other factors in the dataset are taken into account.
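The interval can also be reproduced by hand from the coefficient table in 2C. The sketch below assumes the estimate (1166.37), standard error (925.57), and residual degrees of freedom (45) printed in that summary; because these are rounded, the result agrees with `confint()` only to about one decimal place.

```r
# sexFemale estimate and standard error from the full-model summary
est <- 1166.37
se  <- 925.57
df_resid <- 45   # residual degrees of freedom

# 95% CI: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df_resid)
ci <- est + c(-1, 1) * t_crit * se
round(ci, 1)
```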
2C:
Code
summary(lm(salary ~ ., data = salary))
Call:
lm(formula = salary ~ ., data = salary)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15746.05 800.18 19.678 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAssoc 5292.36 1145.40 4.621 3.22e-05 ***
rankProf 11118.76 1351.77 8.225 1.62e-10 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
Among the predictor variables examined, both rank and years in the current rank have statistically significant effects on salary. Controlling for all other factors, being at the Associate rank is associated with an average salary increase of $5,292.36 compared to the Assistant rank, while being a full professor corresponds to an average salary increase of $11,118.76 relative to the Assistant rank.
Additionally, each additional year in the current rank is linked to an average salary increase of $476.31. On the other hand, the number of years since obtaining the highest degree does not have a statistically significant impact on salary, suggesting that it does not contribute significantly to changes in salary when controlling for other factors.
Regarding sex, controlling for all other variables, being a female professor is associated with a salary increase of $1,166 compared to being a male professor. However, this result is not statistically significant, as indicated by the p-value of 0.214.
Finally, the variable of degree does not show a statistically significant effect on salary when controlling for other variables. Specifically, having a PhD is associated with a salary increase of $1,388.61 compared to having a master’s degree, but this result is not statistically significant with a p-value of 0.180.
In summary, among the examined predictor variables, rank and years in the current rank have statistically significant impacts on salary, while the effects of sex, degree, and years since obtaining the highest degree are not statistically significant when controlling for other factors.
2D:
Code
# Make full professors the reference category for rank
salary$rank <- relevel(salary$rank, ref = 'Prof')
summary(lm(salary ~ ., data = salary))
Call:
lm(formula = salary ~ ., data = salary)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26864.81 1375.29 19.534 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAsst -11118.76 1351.77 -8.225 1.62e-10 ***
rankAssoc -5826.40 1012.93 -5.752 7.28e-07 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
In the updated model, the coefficients for the dummy variables representing the two categories other than the reference category (Prof) are -11,118.76 for rankAsst and -5,826.40 for rankAssoc. These coefficients give the difference in salary relative to the reference category (Full Professors), while controlling for the other variables.
Specifically, the coefficient for rankAsst indicates that Assistant Professors earn $11,118.76 less on average than Full Professors, holding the other variables constant. Similarly, the coefficient for rankAssoc indicates that Associate Professors earn $5,826.40 less on average than Full Professors.
In summary, the new model presents the same information as before, but in a different way, emphasizing the salary differences between each rank category (Assistant and Associate Professors) and the reference category (Full Professors) while taking into account the impact of other variables.
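That the two fits carry the same information can be verified with arithmetic on the printed coefficients. This is a small sketch using the rounded estimates from the 2C summary (Assistant as reference); the variable names are illustrative.

```r
# Estimates from the model with Asst as the reference category
intercept_asst <- 15746.05
b_assoc <- 5292.36    # Assoc vs. Asst
b_prof  <- 11118.76   # Prof vs. Asst

# Implied values when Prof becomes the reference category
intercept_prof <- intercept_asst + b_prof   # new intercept
asst_vs_prof   <- -b_prof                   # rankAsst coefficient
assoc_vs_prof  <- b_assoc - b_prof          # rankAssoc coefficient

c(intercept_prof, asst_vs_prof, assoc_vs_prof)
```

The three values agree with the intercept, rankAsst, and rankAssoc estimates in the releveled output, while all other coefficients are unchanged.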
2E
Code
# Fit the model without rank
model <- lm(salary ~ degree + sex + year + ysdeg, data = salary)
# Print the summary of the model
summary(model)
Call:
lm(formula = salary ~ degree + sex + year + ysdeg, data = salary)
Residuals:
Min 1Q Median 3Q Max
-8146.9 -2186.9 -491.5 2279.1 11186.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17183.57 1147.94 14.969 < 2e-16 ***
degreePhD -3299.35 1302.52 -2.533 0.014704 *
sexFemale -1286.54 1313.09 -0.980 0.332209
year 351.97 142.48 2.470 0.017185 *
ysdeg 339.40 80.62 4.210 0.000114 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared: 0.6312, Adjusted R-squared: 0.5998
F-statistic: 20.11 on 4 and 47 DF, p-value: 1.048e-09
With the exclusion of the “rank” variable from the model, the findings regarding the impact of different predictor variables on salary have changed.
Degree: Without the “rank” variable, “degreePhD” becomes a statistically significant predictor (p = 0.0147). Controlling for the other variables, having a PhD (compared to a master’s degree) is associated with a salary that is $3,299.35 lower. In the model that included “rank”, the coefficient was positive ($1,388.61) and not statistically significant.
Years Since Highest Degree: In the absence of the “rank” variable, “ysdeg” (years since obtaining the highest degree) becomes a statistically significant predictor (p = 0.000114). Controlling for the other variables, each additional year is associated with a salary that is $339.40 higher. In the model that included “rank”, the coefficient was negative (-124.57) and not statistically significant.
Year: The variable “year” remains statistically significant after excluding “rank”, although its p-value has increased from 8.65e-06 to 0.0172. Controlling for the other variables, each additional year in the current rank is associated with a salary that is $351.97 higher.
Sex: The coefficient for “sexFemale” remains not statistically significant (p = 0.332). However, its sign has changed from positive (1,166.37) to negative (-1,286.54).
2F
Code
# Dummy variable: 0 if ysdeg <= 15 (hired under the new Dean), 1 if ysdeg > 15
salary$newDean <- ifelse(salary$ysdeg <= 15, 0, 1)
# Run regression analysis with newDean included as a predictor variable (ysdeg is left out)
lm_salary_NewDean <- lm(salary ~ degree + rank + sex + year + newDean, data = salary)
summary(lm_salary_NewDean)
Call:
lm(formula = salary ~ degree + rank + sex + year + newDean, data = salary)
Residuals:
Min 1Q Median 3Q Max
-3403.3 -1387.0 -167.0 528.2 9233.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26588.79 1168.06 22.763 < 2e-16 ***
degreePhD 818.93 797.48 1.027 0.3100
rankAsst -11096.95 1191.00 -9.317 4.54e-12 ***
rankAssoc -6124.28 1028.58 -5.954 3.65e-07 ***
sexFemale 907.14 840.54 1.079 0.2862
year 434.85 78.89 5.512 1.65e-06 ***
newDean -2163.46 1072.04 -2.018 0.0496 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2362 on 45 degrees of freedom
Multiple R-squared: 0.8594, Adjusted R-squared: 0.8407
F-statistic: 45.86 on 6 and 45 DF, p-value: < 2.2e-16
The analysis provides some evidence supporting the hypothesis that faculty hired by the new Dean earn higher salaries than those who were not. Because “newDean” is coded 1 for faculty with more than 15 years since their highest degree (those not hired by the new Dean), its coefficient of -2,163.46 means that these faculty members earn $2,163.46 less on average, holding degree, rank, sex, and year constant. The coefficient is statistically significant at the 5% level, but only barely (p = 0.0496), so the evidence is not strong.
3
Code
data(house.selling.price)
A
Code
model <- lm(Price ~ Size + New, data = house.selling.price)
summary(model)
Call:
lm(formula = Price ~ Size + New, data = house.selling.price)
Residuals:
Min 1Q Median 3Q Max
-205102 -34374 -5778 18929 163866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -40230.867 14696.140 -2.738 0.00737 **
Size 116.132 8.795 13.204 < 2e-16 ***
New 57736.283 18653.041 3.095 0.00257 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 53880 on 97 degrees of freedom
Multiple R-squared: 0.7226, Adjusted R-squared: 0.7169
F-statistic: 126.3 on 2 and 97 DF, p-value: < 2.2e-16
In the regression model, both the “Size” and “New” variables exhibit positive associations with the “Price” of the house. Furthermore, these associations are statistically significant at the 5% level. Specifically, a one-square-foot increase in the size of the house is associated with an average increase in price of $116, while controlling for whether the house is new or not. Additionally, new houses, when compared to old houses of the same size, have an average price difference of $57,736, indicating that new houses tend to be more expensive.
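B
y = −40,230.867 + 116.132(Size) + 57,736.283(New)
Where:
y represents the predicted selling price.
New house (New = 1): y = −40,230.867 + 116.132(Size) + 57,736.283(1) = 17,505.416 + 116.132(Size)
Old house (New = 0): y = −40,230.867 + 116.132(Size)
At any given size, the predicted price of a new house is $57,736.28 higher than that of an old house.
C
Code
# New house
y_new <- -40230.867 + (116.132 * 3000) + (57736.283 * 1)
y_new
[1] 365901.4
Code
# Old house
y_old <- -40230.867 + (116.132 * 3000) + (57736.283 * 0)
y_old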
[1] 308165.1
The predicted selling price for a new 3,000 square foot home is $365,901.40. The predicted selling price for a 3,000 square foot home that is not new is $308,165.10.
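The two predictions above follow one pattern, which can be wrapped in a small helper. This is an illustrative sketch that hard-codes the rounded coefficients from the additive model; `predict_price` is a hypothetical name, and with the fitted model object available, `predict()` on a new data frame would be the more usual route.

```r
# Prediction from the additive model Price ~ Size + New
# size: square feet; new: 1 for a new home, 0 otherwise
predict_price <- function(size, new) {
  -40230.867 + 116.132 * size + 57736.283 * new
}

y_new <- predict_price(3000, new = 1)
y_old <- predict_price(3000, new = 0)

# Without an interaction term the gap is the New coefficient at every size
y_new - y_old
predict_price(1500, new = 1) - predict_price(1500, new = 0)
```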
D
Code
model <- lm(Price ~ Size + New + Size * New, data = house.selling.price)
summary(model)
Call:
lm(formula = Price ~ Size + New + Size * New, data = house.selling.price)
Residuals:
Min 1Q Median 3Q Max
-175748 -28979 -6260 14693 192519
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -22227.808 15521.110 -1.432 0.15536
Size 104.438 9.424 11.082 < 2e-16 ***
New -78527.502 51007.642 -1.540 0.12697
Size:New 61.916 21.686 2.855 0.00527 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 52000 on 96 degrees of freedom
Multiple R-squared: 0.7443, Adjusted R-squared: 0.7363
F-statistic: 93.15 on 3 and 96 DF, p-value: < 2.2e-16
E
New_home_P = −22,227.808 + 104.438(Size) − 78,527.502(1) + 61.916(Size)(1) = −100,755.31 + 166.354(Size)
Old_home_P = −22,227.808 + 104.438(Size) − 78,527.502(0) + 61.916(Size)(0) = −22,227.808 + 104.438(Size)
F
Code
New_home <- -22227.808 + (104.438 * 3000) - 78527.502 + (61.916 * 3000)
Old_home <- -22227.808 + (104.438 * 3000)
New_home
Old_home
[1] 398306.7
[1] 291086.2
G
Code
New_home <- -22227.808 + (104.438 * 1500) - 78527.502 + (61.916 * 1500)
Old_home <- -22227.808 + (104.438 * 1500)
New_home
Old_home
[1] 148775.7
[1] 134429.2
For homes with a size of 3,000 square feet, the predicted selling price is $398,306.70 for new homes and $291,086.20 for not new homes, a difference of $107,220.50 between new and not new homes of the same size.
In contrast, for homes with a size of 1,500 square feet, the predicted selling price is $148,775.70 for new homes and $134,429.20 for not new homes. The difference in this case is much smaller, amounting to $14,346.50.
The disparity is explained by the interaction term in the prediction equation. The predicted difference between a new and a not new home of the same size is −78,527.502 + 61.916(Size), so the premium for a new home grows by about $61.92 for each additional square foot and is zero at roughly 1,268 square feet.
Equivalently, each additional square foot adds $166.354 to the predicted price of a new home (104.438 + 61.916) but only $104.438 to that of a not new home. The new-home line can be written as 166.354(Size − 606), so it crosses zero at about 606 square feet; a 1,500 square foot home lies about 894 square feet above that point and a 3,000 square foot home about 2,394. This threshold is a feature of the fitted line, not a statement about profit.
Therefore, the observed differences in predicted prices are what the model implies: the larger the home, the larger the predicted premium for being new.
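The size dependence of the new-home premium can be made explicit in code. This sketch assumes only the rounded New and Size:New coefficients from the interaction model; `price_gap` and `break_even` are illustrative names.

```r
# Predicted price difference (new minus not new) at a given size,
# from the interaction model: New + Size:New * size
price_gap <- function(size) {
  -78527.502 + 61.916 * size
}

gaps <- price_gap(c(1500, 3000))
gaps

# Size at which the model predicts no difference between new and not new homes
break_even <- 78527.502 / 61.916
break_even
```

The two gaps agree with the differences computed from the separate predictions ($14,346.50 and $107,220.50).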
H: The inclusion of the interaction term in the model improves the representation of the relationship between size, newness, and the selling price. This is supported by the statistical significance of the interaction term (p = 0.00527). Additionally, the adjusted R-squared value for the model with the interaction (0.7363) is higher than that of the model without the interaction (0.7169).
These findings suggest that the interaction between size and newness plays a significant role in explaining the variation in the selling price. The higher adjusted R-squared value indicates that the model with the interaction term captures a greater proportion of the variability in the data, providing a better fit to the observed selling prices.
Therefore, the model with the interaction term is more reliable and informative in understanding how both size and newness influence the predicted selling price, taking into account their combined effect on the outcome variable.
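The adjusted R-squared values quoted in H can be recovered from the multiple R-squared values in the two summaries. The sketch below uses the standard adjustment formula and assumes n = 100 observations, which follows from the residual degrees of freedom (97 with two predictors, 96 with three); `adj_r2` is an illustrative helper, not part of the original analysis.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_without <- adj_r2(0.7226, n = 100, p = 2)  # Price ~ Size + New
adj_with    <- adj_r2(0.7443, n = 100, p = 3)  # adds Size:New

round(c(adj_without, adj_with), 4)
```

Because the adjustment penalizes the extra parameter, the higher adjusted value for the interaction model indicates that the improvement in fit is more than the added term would produce by chance alone.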
Source Code
---title: "Homework 4"author: "Sai Padma pothula"desription: ""date: "05/02/2023"format: html: toc: true code-fold: true code-copy: true code-tools: truecategories: - Homework 4 - sai Pothula---```{r}library(alr4)library(smss)library(ggplot2)library(stargazer)```A:```{r}y =-10536+ (53.8*1240) + (2.84*18000)yactual_value =145000- yactual_value```The actual selling price exceeded the predicted selling price by $37,704, indicating that the home sold for a higher price than what was estimated by the model.1B: When the lot size is fixed, the coefficient of x1 (53.8) captures the specific impact of the home size on the predicted selling price, controlling for the influence of lot size. Therefore, for a fixed lot size, the house selling price is predicted to increase by $53.8 for each additional square foot increase in home size, as determined by the coefficient of x1 in the model.1C: One-square-foot increase in home size is associated with an increase in price of $53.8. One-square-foot increase in lot size is associated with an increase in price of $2.84.(53.8 / 2.84) to have the same impact as a one-square-foot increase in home size, the lot size would need to increase by approximately 18.90 square feet according to the given prediction equation.2A:```{r}data(salary)``````{r}t.test(formula = salary ~ sex, data = salary)```The results of the t-test indicate that there is a difference between the mean salary for men and women. However, with a p-value of 0.09009, we do not have enough evidence to reject the null hypothesis at the 95% confidence level. In other words, we cannot conclude that there is a statistically significant difference in the mean salary between men and women when considering only the variable of sex.2B```{r}lm(salary ~ ., data = salary) |>confint() ```The 95% confidence interval for the "sexFemale" variable is (-697, 3031). 
This confidence interval indicates that, when considering other variables, female faculty members may earn approximately $697 less to $3031 more in salary compared to male faculty members. This range provides an estimate of the potential difference in salaries between the genders, while taking into account the influence of other factors in the dataset.2C:```{r}summary(lm(salary ~ ., data = salary))```Among the predictor variables examined, both rank and years in the current rank have statistically significant effects on salary. Controlling for all other factors, being at the Associate rank is associated with an average salary increase of $5,292.36 compared to the Assistant rank, while being a full professor corresponds to an average salary increase of $11,118.76 relative to the Assistant rank.Additionally, each additional year in the current rank is linked to an average salary increase of $476.31. On the other hand, the number of years since obtaining the highest degree does not have a statistically significant impact on salary, suggesting that it does not contribute significantly to changes in salary when controlling for other factors.Regarding sex, controlling for all other variables, being a female professor is associated with a salary increase of $1,166 compared to being a male professor. However, this result is not statistically significant, as indicated by the p-value of 0.214.Finally, the variable of degree does not show a statistically significant effect on salary when controlling for other variables. 
Specifically, having a PhD is associated with a salary increase of $1,388.61 compared to having a master's degree, but this result is not statistically significant with a p-value of 0.180.In summary, among the examined predictor variables, rank and years in the current rank have statistically significant impacts on salary, while the effects of sex, degree, and years since obtaining the highest degree are not statistically significant when controlling for other factors.2D:```{r}salary$rank <-relevel(salary$rank, ref ='Prof')summary(lm(salary ~ ., data = salary))```n the updated model, the coefficients for the dummy variables representing the two categories other than the reference category (Prof) are as follows: rankAsst being -11118 and rankAssoc being -5826. These coefficients indicate the difference in salary compared to the reference category (Full Professors), while controlling for other variables.Specifically, the coefficient of -11118 for rankAsst suggests that Assistant Professors earn $11118 less on average than Full Professors, considering the influence of other factors. Similarly, the coefficient of -5826 for rankAssoc indicates that Associate Professors earn $5826 less on average than Full Professors, controlling for other variables.In summary, the new model presents the same information as before, but in a different way, emphasizing the salary differences between each rank category (Assistant and Associate Professors) and the reference category (Full Professors) while taking into account the impact of other variables.2E```{r}model <-lm(salary ~ degree + sex + year + ysdeg, data = salary)# Print the summary of the initial modelsummary(model)```With the exclusion of the "rank" variable from the model, the findings regarding the impact of different predictor variables on salary have changed.Degree:Without the "rank" variable, the variable "degreePhD" now becomes a statistically significant predictor. 
When controlling for all other factors, having a PhD (compared to a master's degree) is associated with a decrease in salary of $3,299.35. This is notably different from when the "rank" variable was included, where there was no statistical significance and a PhD increased salary by $1,388.61.Years Since Highest Degree:In the absence of the "rank" variable, the variable "ysdeg" (years since obtaining the highest degree) becomes a statistically significant predictor. Controlling for other variables, each passing year now results in an increase in salary of $339.40. This differs from the previous model with the "rank" variable, where there was no statistical significance and a decrease in salary was suggested.Year:The variable "year" remains statistically significant even after excluding the "rank" variable, although the p-value has increased. Controlling for other factors, each additional year of experience now contributes $351.97 to salary.Sex:The result for the variable "sex" remains not statistically significant, indicating that its impact on salary did not change significantly. However, the direction of the effect has changed from positive to negative.2F```{r}salary$newDean <-ifelse(salary$ysdeg <=15, 0, 1)# Run regression analysis with newDean included as a predictor variablelm_salary_NewDean <-lm(salary ~ degree + rank + sex + year + newDean, data = salary)summary(lm_salary_NewDean)```The analysis provides evidence supporting the hypothesis that individuals hired by the new Dean are earning higher salaries compared to those who were not. The statistical significance of the "newDean" variable, with a p-value of 0.0496, indicates that there is a notable difference in salaries between these two groups, even when considering other variables. 
Specifically, the findings suggest that faculty members who were not hired by the new Dean earn an average of $2,163.46 less, holding other factors constant.3```{r}data(house.selling.price)```A```{r}model <-lm(Price ~ Size + New, data = house.selling.price)summary(model)```In the regression model, both the "Size" and "New" variables exhibit positive associations with the "Price" of the house. Furthermore, these associations are statistically significant at the 5% level. Specifically, a one-square-foot increase in the size of the house is associated with an average increase in price of $116, while controlling for whether the house is new or not. Additionally, new houses, when compared to old houses of the same size, have an average price difference of $57,736, indicating that new houses tend to be more expensive.By = −40,230.867 + 116.132(Size) + 57736.283(New).Where:y represents the predicted selling price.y = −40,230.867 + 116.132(Size) + 57736.283(1). #New house y = −40,230.867 + 116.132(Size) #Old house New houses have high prices C```{r}#New housey_new =-40230.867+(116.132*3000)+(57736.283*1)y_new#Old house y_old =-40230.867+(116.132*3000)+(57736.283*0)y_old```The predicted selling price for a new 3,000 square foot home is $365,901.40. 
The predicted selling price for a 3,000 square foot home that is not new is $308,165.10.D```{r}model <-lm(Price ~ Size + New + Size * New, data = house.selling.price)summary(model)```ENew_home_P = −22227.808+104.438∗size+−78527.502∗1+61.916∗Size∗1 = −100755.3+166.354∗sizeOld_home_P = −22227.808+104.438∗size+−78527.502∗0+61.916∗Size∗0 = −22227.808+104.438∗sizeF```{r}New_home <--22227.808+ (104.438*3000) -78527.502+ (61.916*3000)Old_home <--22227.808+ (104.438*3000)New_homeOld_home```G```{r}New_home <--22227.808+ (104.438*1500) -78527.502+ (61.916*1500)Old_home <--22227.808+ (104.438*1500)New_homeOld_home```For homes with a size of 3000 square feet, the predicted selling price is $398,306.70 for new homes and $291,086.20 for not new homes. This indicates a significant difference of $107,220.50 in the sale price between new and not new homes of the same size.In contrast, for homes with a size of 1500 square feet, the predicted selling price is $148,775.70 for new homes and $134,429.20 for not new homes. The difference in sale price between new and not new homes in this case is smaller, amounting to $14,346.50.The disparity in the price difference between new and not new homes for different sizes can be explained by the structure of the prediction equation. According to the predictive model, a house needs to have a size of approximately 606 square feet before it can be sold at a profit. In other words, only the square footage beyond this threshold contributes to the increase in the sale price.For instance, a 1500 square foot home has approximately 894 "money-making" square feet, while a 3000 square foot home has around 2394 such square feet. 
Each additional money-making square foot adds approximately $166.354 to the sale price.Therefore, the observed differences in sale prices align with the predictive model's expectations, reflecting the influence of the size of the home on its profitability and the corresponding impact on the selling price.H: The inclusion of the interaction term in the model improves the representation of the relationship between size, newness, and the selling price. This is supported by the statistical significance of the interaction term, as indicated by its low p-value. Additionally, the adjusted R-squared value for the model with the interaction (0.7363) is higher compared to the model without the interaction (0.7169).These findings suggest that the interaction between size and newness plays a significant role in explaining the variation in the selling price. The higher adjusted R-squared value indicates that the model with the interaction term captures a greater proportion of the variability in the data, providing a better fit to the observed selling prices.Therefore, the model with the interaction term is more reliable and informative in understanding how both size and newness influence the predicted selling price, taking into account their combined effect on the outcome variable.