HW2

Introduction to Visualization

Author

Lai Wei

Published

November 15, 2022

Goals of this HW: gain experience working with external data, dplyr, and the pipe operator.

Background for mmr_2015.csv: The maternal mortality ratio (MMR) is defined as the number of maternal deaths per 100,000 live births. The UN maternal mortality estimation group produces estimates of the MMR for all countries in the world.

In this HW, I will use mmr_2015.csv, which is a data set that contains a subset of the (real) data that were used to generate the United Nations Maternal mortality estimates, as published in the year 2015. Variables in the data set mmr_2015.csv are as follows:

Iso = ISO code
Name = country name
Year = observation year
MMR = observed maternal mortality ratio, which is defined as (number of maternal deaths / number of live births) * 100,000
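The definition of the MMR can be written as a formula. The worked numbers below are hypothetical and chosen only to illustrate the arithmetic; they are not taken from mmr_2015.csv.

```latex
\[
\mathrm{MMR} = \frac{\text{number of maternal deaths}}{\text{number of live births}} \times 100{,}000
\]
% Hypothetical example: 50 maternal deaths among 25,000 live births
\[
\mathrm{MMR} = \frac{50}{25{,}000} \times 100{,}000 = 200
\]
```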

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2

Warning: package 'ggplot2' was built under R version 4.2.2

Warning: package 'stringr' was built under R version 4.2.2

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(babynames)  # if missing, install once with install.packages("babynames")

Error in library(babynames): there is no package called 'babynames'

library(dplyr)

Using mmr_2015.csv: Read in mmr_2015.csv. Then construct a graph that shows the observed values of the MMR plotted against year (starting in 2000) for India and Thailand, as in the example Figure 1 below. Use the pipe operator so that the graph follows from a multi-line command that starts with “mmr %>%”. Hint 1: Use data transformation functions to keep only rows with (i) year >= 2000 and (ii) country equal to India or Thailand. Hint 2: Use ggplot() to display the data.

mmr <- read.csv("D:/Umass Amherst/BIOSTATS 597D/HW/mmr_2015.csv")

Warning in file(file, "rt"): cannot open file 'D:/Umass Amherst/BIOSTATS
597D/HW/mmr_2015.csv': No such file or directory

Error in file(file, "rt"): cannot open the connection

# Keep India and Thailand from 2000 onward, then plot MMR against year
mmr %>%
  filter(country %in% c("India", "Thailand"), year >= 2000) %>%
  ggplot(aes(x = year, y = mmr, color = country)) +
  geom_point()

Using babynames, as used in the lecture slides:

Reproduce the example Figure 2 below where babynames was filtered to include only those rows with year > 1975, sex equal to male, and either prop > 0.025 or n > 50000. Note that the y-axis starts at zero.

babynames %>%
  filter(year > 1975, sex == "M", prop > 0.025 | n > 50000) %>%
  ggplot(aes(x = year, y = prop, group = name, color = name)) +
  geom_point(size = 2) +
  geom_line() +
  expand_limits(y = 0)  # force the y-axis to start at zero

Error in filter(., year > 1975, sex == "M", prop > 0.025 | n > 50000): object 'babynames' not found

Construct and print a tibble that shows the countries sorted by their average observed MMR (rounded to zero digits), with the country with the highest average MMR listed first, as example Figure 3 below:

# Average observed MMR per country, rounded to zero digits, highest first
mmr %>%
  group_by(country) %>%
  summarise(ave = round(mean(mmr), 0)) %>%
  arrange(desc(ave))

Continuing with the mmr data set

Part a: For each year

first calculate the mean observed value for each country (to allow for settings where countries may have more than 1 value per year; note that this is not true in this data set)
then rank countries by increasing MMR for each year.

Calculate the mean ranking across all years, extract the mean ranking for 10 countries with the lowest ranking across all years, and print the resulting table.
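The rank-then-average step can be illustrated on a tiny example. This is a minimal sketch on made-up data: the data frame `toy`, its three countries, and all of its values are hypothetical, and `min_rank()` is one possible choice of ranking function (ties share the smaller rank).

```r
library(dplyr)

# Hypothetical toy data: three made-up countries observed in two years
toy <- data.frame(
  country = c("A", "B", "C", "A", "B", "C"),
  year    = c(2000, 2000, 2000, 2005, 2005, 2005),
  mmr     = c(300, 100, 200, 150, 120, 250)
)

toy_ranks <- toy %>%
  group_by(year) %>%
  mutate(rank_in_year = min_rank(mmr)) %>%  # rank 1 = lowest MMR in that year
  group_by(country) %>%
  summarise(mean_rank = mean(rank_in_year)) %>%  # average rank across years
  arrange(mean_rank)

print(toy_ranks)
```

In this toy data, country B has the lowest MMR in both years, so its mean rank is 1 and it is listed first.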

# Mean MMR per country and year, ranked within each year (rank 1 = lowest MMR)
data2 <- mmr %>%
  group_by(year, country) %>%
  summarise(mean_mmr = mean(mmr, na.rm = TRUE), .groups = "drop") %>%
  group_by(year) %>%
  mutate(rank_in_year = min_rank(mean_mmr)) %>%
  # Average each country's rank across all years
  group_by(country) %>%
  summarise(mean_rank = mean(rank_in_year)) %>%
  arrange(mean_rank)

# The 10 countries with the lowest mean rank
lowest10 <- head(data2, 10)
print(lowest10)

Part b: do the same thing but now with rankings calculated separately for two periods, with period 1 referring to years < 2000 and period 2 referring to years >= 2000. For each period

first calculate the mean observed value for each country (to allow for settings where countries may have more than 1 value per period)
then rank countries by increasing MMR for each period.

Calculate the mean ranking across all periods, extract the 10 countries with the lowest ranking across all periods, and print the table.
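Assigning observations to the two periods before averaging can be sketched as follows. The data frame `toy` and its values are hypothetical, and coding the periods as the integers 1 and 2 with `if_else()` is an assumption made for illustration.

```r
library(dplyr)

# Hypothetical toy data: two made-up countries, one observation per period
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1995, 2005, 1998, 2010),
  mmr     = c(400, 300, 100, 90)
)

toy_periods <- toy %>%
  mutate(period = if_else(year < 2000, 1L, 2L)) %>%  # 1: before 2000, 2: 2000 onward
  group_by(period, country) %>%
  summarise(mean_mmr = mean(mmr), .groups = "drop")  # mean MMR per country and period

print(toy_periods)
```

The result has one row per country and period, which is the shape needed before ranking countries within each period.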

# Assign each observation to period 1 (year < 2000) or period 2 (year >= 2000)
by_period <- mmr %>%
  mutate(period = if_else(year < 2000, 1L, 2L)) %>%
  # Mean MMR per country and period, ranked within each period (rank 1 = lowest MMR)
  group_by(period, country) %>%
  summarise(mean_mmr = mean(mmr, na.rm = TRUE), .groups = "drop") %>%
  group_by(period) %>%
  mutate(rank_in_period = min_rank(mean_mmr)) %>%
  # Average each country's rank across the periods
  group_by(country) %>%
  summarise(mean_rank = mean(rank_in_period)) %>%
  arrange(mean_rank)

# The 10 countries with the lowest mean rank
print(head(by_period, 10))