hw3
linear regression
distribution transformation
Challenge
Author

Miguel Curiel

Published

April 20, 2023

# load necessary packages
library(tidyverse)
library(alr4)
library(smss)

Question 1

United Nations (Data file: UN11 in alr4) The data in the file UN11 contains several variables, including ppgdp, the gross national product per person in U.S. dollars, and fertility, the birth rate per 1000 females, both from the year 2009. The data are for 199 localities, mostly UN member countries, but also other areas such as Hong Kong that are not independent countries. The data were collected from the United Nations (2011). We will study the dependence of fertility on ppgdp.

  1. Identify the predictor and the response.

    1. Predictor = ppgdp

    2. Response = fertility

  2. Draw the scatterplot of fertility on the vertical axis versus ppgdp on the horizontal axis and summarize the information in this graph. Does a straight-line mean function seem to be plausible for a summary of this graph?

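    1. No, a straight-line mean function does not seem plausible. The relationship is curvilinear: fertility drops steeply as ppgdp increases among low-income localities and then levels off at higher ppgdp, so the fitted line misses most of the pattern.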
  3. Draw the scatterplot of log(fertility) versus log(ppgdp) using natural logarithms. Does the simple linear regression model seem plausible for a summary of this graph? If you use a different base of logarithms, the shape of the graph won’t change, but the values on the axes will change.

    1. Yes, a simple linear regression model seems plausible once both variables are log-transformed. On the log-log scale the curvilinear pattern straightens into an approximately linear, decreasing relationship.
# 1.b: fertility versus ppgdp on the original scale, with a least-squares line
ggplot(data = UN11, aes(x = ppgdp, y = fertility)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "1.b. Scatterplot of fertility versus ppgdp")
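
The log-log plot asked for in 1.c can be drawn the same way. The following is a minimal sketch, assuming the `UN11` data frame from `alr4` is available. The object names `fit_raw` and `fit_log` are illustrative and not part of the original assignment. Comparing the two R-squared values is one rough way to check whether the straight line describes the data better after the transformation.

```r
library(ggplot2)
library(alr4)  # provides the UN11 data set

# 1.c: scatterplot with both variables on the natural-log scale
ggplot(data = UN11, aes(x = log(ppgdp), y = log(fertility))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "1.c. Scatterplot of log(fertility) versus log(ppgdp)")

# Fit a simple linear regression on each scale (illustrative object names)
fit_raw <- lm(fertility ~ ppgdp, data = UN11)
fit_log <- lm(log(fertility) ~ log(ppgdp), data = UN11)

# Compare how much variation each straight-line fit explains
r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
c(raw = r2_raw, log_log = r2_log)

# Slope on the log-log scale: the approximate percent change in fertility
# associated with a 1 percent change in ppgdp
coef(fit_log)[["log(ppgdp)"]]
```

Because both variables are logged, the base of the logarithm only rescales the axes. The fitted slope stays the same under a different base, and only the intercept changes.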