The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
```r
library(readxl)
library(ggplot2)
library(alr4)
```
Loading required package: car
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
```r
library(smss)
library(stargazer)
```
Please cite as:
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
Question 1
United Nations (Data file: UN11 in alr4) The data in the file UN11 contains several variables, including ppgdp, the gross national product per person in U.S. dollars, and fertility, the birth rate per 1000 females, both from the year 2009. The data are for 199 localities, mostly UN member countries, but also other areas such as Hong Kong that are not independent countries. The data were collected from the United Nations (2011). We will study the dependence of fertility on ppgdp.
(a) Identify the predictor and the response.
```r
head(UN11)
```
```
                region  group fertility   ppgdp lifeExpF pctUrban
Afghanistan       Asia  other     5.968   499.0    49.49       23
Albania         Europe  other     1.525  3677.2    80.40       53
Algeria         Africa africa     2.142  4473.0    75.00       67
Angola          Africa africa     5.135  4321.9    53.17       59
Anguilla     Caribbean  other     2.000 13750.1    81.10      100
Argentina   Latin Amer  other     2.172  9162.1    79.89       93
```
The predictor is ppgdp and the response is fertility (the birth rate).
(b) Draw the scatterplot of fertility on the vertical axis versus ppgdp on the horizontal axis and summarize the information in this graph. Does a straight-line mean function seem to be plausible for a summary of this graph?
```r
ggplot(UN11, aes(ppgdp, fertility)) +
  geom_point()
```
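A straight-line mean function does not seem plausible as a summary of this graph: the relationship between fertility and ppgdp is clearly curved rather than linear.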
(c) Draw the scatterplot of log(fertility) versus log(ppgdp) using natural logarithms. Does the simple linear regression model seem plausible for a summary of this graph? If you use a different base of logarithms, the shape of the graph won’t change, but the values on the axes will change.
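A minimal sketch of the plot this part asks for, assuming the alr4 and ggplot2 packages are installed. It uses natural logarithms directly, overlays a least-squares line, and checks that switching to base-10 logarithms leaves the correlation unchanged. The object names `p_log`, `fit_log`, `r_ln`, and `r_log10` are illustrative and not part of the original homework.

```r
library(alr4)     # provides the UN11 data frame
library(ggplot2)

# Scatterplot on the natural-log scale with a least-squares line overlaid
p_log <- ggplot(UN11, aes(x = log(ppgdp), y = log(fertility))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "log(ppgdp)", y = "log(fertility)")
p_log

# Simple linear regression of log(fertility) on log(ppgdp)
fit_log <- lm(log(fertility) ~ log(ppgdp), data = UN11)
coef(fit_log)

# A different base only rescales both axes, so the correlation is identical
r_ln    <- cor(log(UN11$ppgdp), log(UN11$fertility))
r_log10 <- cor(log10(UN11$ppgdp), log10(UN11$fertility))
c(natural = r_ln, base10 = r_log10)
```

If the points scatter evenly around the fitted line with no visible curvature, that would support the simple linear regression model on the log scale.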
Question 2
Annual income, in dollars, is an explanatory variable in a regression analysis. For a British version of the report on the analysis, all responses are converted to British pounds sterling (1 pound equals about 1.33 dollars, as of 2016).
(a) How, if at all, does the slope of the prediction equation change?
The slope is multiplied by 1.33. Income in pounds equals income in dollars divided by 1.33, so a one-pound increase in the explanatory variable corresponds to a 1.33-dollar increase. If the original equation is ŷ = a + b·(income in dollars), the converted equation is ŷ = a + (1.33·b)·(income in pounds).
(b) How, if at all, does the correlation change?
The correlation does not change, because it does not depend on the units of measurement.
Question 3
Water runoff in the Sierras (Data file: water in alr4) Can Southern California’s water supply in future years be predicted from past data? One factor affecting water availability is stream runoff. If runoff could be predicted, engineers, planners, and policy makers could do their jobs more efficiently. The data file contains 43 years’ worth of precipitation measurements taken at six sites in the Sierra Nevada mountains (labeled APMAM, APSAB, APSLAKE, OPBPC, OPRC, and OPSLAKE) and stream runoff volume at a site near Bishop, California, labeled BSAAM. Draw the scatterplot matrix for these data and summarize the information available from these plots. (Hint: Use the pairs() function.)
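The scatterplot matrix for the water data can be sketched as follows, assuming the alr4 package is installed. Dropping the Year column and adding a rounded correlation matrix are choices made here for readability, not part of the question. `water_vars` and `water_cor` are illustrative names.

```r
library(alr4)  # provides the water data frame

# Keep the six precipitation sites and the runoff variable (Year is left out)
water_vars <- water[, c("APMAM", "APSAB", "APSLAKE",
                        "OPBPC", "OPRC", "OPSLAKE", "BSAAM")]

# Scatterplot matrix, as suggested by the hint
pairs(water_vars)

# Pairwise correlations as a numeric companion to the plots
water_cor <- round(cor(water_vars), 2)
water_cor
```

Reading the BSAAM row of `water_cor` alongside the last row of the plot matrix shows which sites track runoff most closely.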
Question 4
Professor ratings (Data file: Rateprof in alr4) In the website and online forum RateMyProfessors.com, students rate and comment on their instructors. Launched in 1999, the site includes millions of ratings on thousands of instructors. The data file includes the summaries of the ratings of 364 instructors at a large campus in the Midwest (Bleske-Rechek and Fritsch, 2011). Each instructor included in the data had at least 10 ratings over a several year period. Students provided ratings of 1–5 on quality, helpfulness, clarity, easiness of instructor’s courses, and raterInterest in the subject matter covered in the instructor’s courses. The data file provides the averages of these five ratings. Create a scatterplot matrix of these five variables. Provide a brief description of the relationships between the five ratings.
```r
head(Rateprof)
```
```
  gender numYears numRaters numCourses pepper discipline              dept
1   male        7        11          5     no        Hum           English
2   male        6        11          5     no        Hum Religious Studies
3   male       10        43          2     no        Hum               Art
4   male       11        24          5     no        Hum           English
5   male       11        19          7     no        Hum           Spanish
6   male       10        15          9     no        Hum           Spanish
   quality helpfulness  clarity easiness raterInterest sdQuality sdHelpfulness
1 4.636364    4.636364 4.636364 4.818182      3.545455 0.5518564     0.6741999
2 4.318182    4.545455 4.090909 4.363636      4.000000 0.9020179     0.9341987
3 4.790698    4.720930 4.860465 4.604651      3.432432 0.4529343     0.6663898
4 4.250000    4.458333 4.041667 2.791667      3.181818 0.9325048     0.9315329
5 4.684211    4.684211 4.684211 4.473684      4.214286 0.6500112     0.8200699
6 4.233333    4.266667 4.200000 4.533333      3.916667 0.8632717     1.0327956
  sdClarity sdEasiness sdRaterInterest
1 0.5045250  0.4045199       1.1281521
2 0.9438798  0.5045250       1.0744356
3 0.4129681  0.5407021       1.2369438
4 0.9990938  0.5882300       1.3322506
5 0.5823927  0.6117753       0.9749613
6 0.7745967  0.6399405       0.6685579
```
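The scatterplot matrix of the five average ratings can be sketched as below, assuming the alr4 package is installed. The correlation matrix is an addition made here to put numbers next to the plots. `rating_vars` and `rating_cor` are illustrative names.

```r
library(alr4)  # provides the Rateprof data frame

# The five average ratings named in the question
rating_vars <- c("quality", "helpfulness", "clarity",
                 "easiness", "raterInterest")

# Scatterplot matrix of the five ratings
pairs(Rateprof[, rating_vars])

# Pairwise correlations; complete.obs guards against any missing values
rating_cor <- round(cor(Rateprof[, rating_vars], use = "complete.obs"), 2)
rating_cor
```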
Among these five ratings, quality, clarity, and helpfulness are strongly positively correlated with one another, while easiness and raterInterest show much weaker relationships.
Question 5
For the student.survey data file in the smss package, conduct regression analyses relating (by convention, y denotes the outcome variable, x denotes the explanatory variable)
(i) y = political ideology and x = religiosity,
(ii) y = high school GPA and x = hours of TV watching.
(You can use ?student.survey in the R console, after loading the package, to see what each variable means.)
(a) Graphically portray how the explanatory variable relates to the outcome variable in each of the two cases.
```r
# student.survey is not available after library(smss) alone, so load it explicitly
data("student.survey", package = "smss")
head(student.survey)
# (i) religiosity (re) on the x-axis, bars split by political ideology (pi)
ggplot(data = student.survey, aes(x = re, fill = pi)) +
  geom_bar(position = "dodge")
```
```r
# (ii) hours of TV watching (tv) is the explanatory variable, high school GPA (hi) the outcome
ggplot(data = student.survey, aes(x = tv, y = hi)) +
  geom_point()
```
(b) Summarize and interpret the results of inferential analyses.
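One way to set up the two regressions is sketched below, assuming the smss package is installed. Treating the ordered factors pi and re as equally spaced integer scores is an assumption made here so that lm() can be applied; the homework itself does not specify a coding. `survey_scores`, `fit_ideology`, and `fit_gpa` are illustrative names.

```r
library(smss)

# Load the data set explicitly; it is not found after library(smss) alone
data("student.survey", package = "smss")

# (i) Political ideology on religiosity.
# Assumption: ordered factor levels are converted to integer scores 1, 2, 3, ...
survey_scores <- data.frame(
  pi_score = as.numeric(student.survey$pi),
  re_score = as.numeric(student.survey$re)
)
fit_ideology <- lm(pi_score ~ re_score, data = survey_scores)
summary(fit_ideology)

# (ii) High school GPA (hi) on hours of TV watching (tv)
fit_gpa <- lm(hi ~ tv, data = student.survey)
summary(fit_gpa)
```

In each summary, the slope estimate, its t-test p-value, and the R-squared are the quantities to report and interpret.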
Source Code
---
title: "Homework 3"
author: "Xiaoyan"
description: "Template of course blog qmd file"
date: "04/05/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw3
  - desriptive statistics
  - probability
---
```{r}
library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)
library(alr4)
library(smss)
library(stargazer)
```
# Question 1
1. United Nations (Data file: UN11in alr4) The data in the file UN11 contains several variables,
including ppgdp, the gross national product per person in U.S. dollars, and fertility, the birth
rate per 1000 females, both from the year 2009. The data are for 199 localities, mostly UN
member countries, but also other areas such as Hong Kong that are not independent countries.
The data were collected from the United Nations (2011). We will study the dependence of
fertility on ppgdp.
(a) Identify the predictor and the response.
```{r}
head(UN11)
```
predictor is ppgdp and response is fertility and birth rate.
(b) Draw the scatterplot of fertility on the vertical axis versus ppgdp on the horizontal axis
and summarize the information in this graph. Does a straight-line mean function seem to
be plausible for a summary of this graph?
```{r}
ggplot(UN11, aes(ppgdp, fertility)) +
  geom_point()
```
straight line doesn't fit for summary this graph.
(c) Draw the scatterplot of log(fertility) versus log(ppgdp) using natural logarithms. Does
the simple linear regression model seem plausible for a summary of this graph? If you use
a different base of logarithms, the shape of the graph won’t change, but the values on the
axes will change.
```{r}
ggplot(UN11, aes(ppgdp, fertility)) +
  geom_point() +
  scale_y_log10() +
  scale_x_log10()
```
# Question 2
Annual income, in dollars, is an explanatory variable in a regression analysis. For a British
version of the report on the analysis, all responses are converted to British pounds sterling (1 pound
equals about 1.33 dollars, as of 2016).
(a) How, if at all, does the slope of the prediction equation change?
the slope will also divided by 1.33
(b) How, if at all, does the correlation change?
correlation will not change
# Question 3
Water runoff in the Sierras (Data file: water in alr4) Can Southern California’s water
supply in future years be predicted from past data? One factor affecting water availability is stream
runoff. If runoff could be predicted, engineers, planners, and policy makers could do their jobs
more efficiently. The data file contains 43 years’ worth of precipitation measurements taken at six
sites in the Sierra Nevada mountains (labeled APMAM, APSAB, APSLAKE, OPBPC, OPRC, and
OPSLAKE) and stream runoff volume at a site near Bishop, California, labeled BSAAM. Draw
the scatterplot matrix for these data and summarize the information available from these
plots. (Hint: Use the pairs() function.)
```{r}
head(water)
pairs(water)
```
# Question 4
Professor ratings (Data file: Rateprof in alr4) In the website and online forum
RateMyProfessors.com, students rate and comment on their instructors. Launched in 1999, the site
includes millions of ratings on thousands of instructors. The data file includes the summaries of
the ratings of 364 instructors at a large campus in the Midwest (Bleske-Rechek and Fritsch, 2011).
Each instructor included in the data had at least 10 ratings over a several year period. Students
provided ratings of 1–5 on quality, helpfulness, clarity, easiness of instructor’s courses, and
raterInterest in the subject matter covered in the instructor’s courses. The data file provides the
averages of these five ratings. Create a scatterplot matrix of these five variables. Provide a
brief description of the relationships between the five ratings.
```{r}
head(Rateprof)
pairs(Rateprof[,c('quality', 'clarity', 'helpfulness','easiness', 'raterInterest')])
```
In this five parameters, quality, Clarity and helpfulness are positively correlated. while easiness and raterinterest were not significantly related.
# Question 5
For the student.survey data file in the smss package, conduct regression analyses relating
(by convention, y denotes the outcome variable, x denotes the explanatory variable)
(i) y = political ideology and x = religiosity,
(ii) y = high school GPA and x = hours of TV watching.
(You can use ?student.survey in the R console, after loading the package, to see what each variable
means.)
(a) Graphically portray how the explanatory variable relates to the outcome variable in
each of the two cases
```{r}
head(student.survey)
ggplot(data = student.survey, aes(x = re,fill=pi)) +
  geom_bar(position ="dodge" )
```
```{r}
ggplot(data = student.survey, aes(x = hi, y = tv)) +
  geom_point()
```
(b) Summarize and interpret results of inferential analyses.
```{r}
```