Challenge 10

purrr
Author: Noah Dixon
Published: July 6, 2023

Read Data

Using the read.csv function, we read the _data/cereal.csv file into a data frame.

cereal <- read.csv("_data/cereal.csv")
cereal
                  Cereal Sodium Sugar Type
1    Frosted Mini Wheats      0    11    A
2            Raisin Bran    340    18    A
3               All Bran     70     5    A
4            Apple Jacks    140    14    C
5         Captain Crunch    200    12    C
6               Cheerios    180     1    C
7  Cinnamon Toast Crunch    210    10    C
8     Crackling Oat Bran    150    16    A
9              Fiber One    100     0    A
10        Frosted Flakes    130    12    C
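read.csv() infers each column's type from its contents. As a small sketch, a hand-typed excerpt of the rows above (not the real _data/cereal.csv) can be passed through the text argument and inspected to see which types it picks. This assumes R 4.0.0 or later, where strings are no longer converted to factors by default.

```r
# Hand-typed excerpt of the cereal rows above (an assumption, not the full file)
csv_text <- "Cereal,Sodium,Sugar,Type
Frosted Mini Wheats,0,11,A
Raisin Bran,340,18,A
Apple Jacks,140,14,C
Cheerios,180,1,C"

# read.csv() can parse a literal string via its text argument
cereal_sample <- read.csv(text = csv_text)

# In R >= 4.0.0, Cereal and Type come back as character, Sodium and Sugar as integer
str(cereal_sample)
sapply(cereal_sample, class)
```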

Next, we split the cereal data frame by cereal type (the Type column), which gives a list with one data frame per type.

cereal_types <- split(cereal, cereal$Type)
cereal_types
$A
                  Cereal Sodium Sugar Type
1    Frosted Mini Wheats      0    11    A
2            Raisin Bran    340    18    A
3               All Bran     70     5    A
8     Crackling Oat Bran    150    16    A
9              Fiber One    100     0    A
12 Honey Bunches of Oats    180     7    A
16          Honey Smacks     50    15    A
17             Special K    220     4    A
18              Wheaties    180     4    A
19           Corn Flakes    200     3    A

$C
                  Cereal Sodium Sugar Type
4            Apple Jacks    140    14    C
5         Captain Crunch    200    12    C
6               Cheerios    180     1    C
7  Cinnamon Toast Crunch    210    10    C
10        Frosted Flakes    130    12    C
11           Froot Loops    140    14    C
13    Honey Nut Cheerios    190     9    C
14                  Life    160     6    C
15         Rice Krispies    290     3    C
20             Honeycomb    210    11    C
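split() groups the rows of a data frame by the values of a factor (here Type) and returns a named list whose element names come from those values, which is why the pieces print as $A and $C. Row names are carried over from the original data frame, so $A lists rows 1, 2, 3, 8 and so on. A minimal sketch on a hand-typed subset of the rows above:

```r
# Hand-typed subset of the cereal rows above
cereal_sample <- data.frame(
  Cereal = c("Raisin Bran", "Apple Jacks", "All Bran", "Cheerios"),
  Sodium = c(340L, 140L, 70L, 180L),
  Sugar  = c(18L, 14L, 5L, 1L),
  Type   = c("A", "C", "A", "C")
)

# One data frame per distinct value of Type, returned as a named list
by_type <- split(cereal_sample, cereal_sample$Type)

names(by_type)         # element names come from the Type values
sapply(by_type, nrow)  # number of rows in each group
by_type$A$Cereal       # original row order is kept within each group
```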

Now we recreate the summary-statistics function from Challenge 9, altering it slightly so that it takes the column name as an argument along with the data frame.

statsFunction <- function(df, col_name) {
  column <- df[[col_name]]  # extract the requested column by name
  print("Summary Statistics:")
  # na.rm = TRUE on every statistic so missing values are handled consistently
  print(paste0("Maximum: ", max(column, na.rm = TRUE)))
  print(paste0("Minimum: ", min(column, na.rm = TRUE)))
  print(paste0("Mean: ", mean(column, na.rm = TRUE)))
  print(paste0("Median: ", median(column, na.rm = TRUE)))
  print(paste0("Standard Deviation: ", sd(column, na.rm = TRUE)))
}
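Because statsFunction() only prints, the value collected for each group is just what the last print() call returns (the standard-deviation string). A hypothetical variant, statsVector(), sketched below as an alternative design rather than the Challenge 9 function, returns the statistics as a named numeric vector so they can be stored, combined, or checked later. The Sugar values are the type A cereals from the split output above.

```r
# Hypothetical variant of statsFunction(): returns the statistics
# as a named numeric vector instead of printing them
statsVector <- function(df, col_name) {
  column <- df[[col_name]]
  c(
    Maximum = max(column, na.rm = TRUE),
    Minimum = min(column, na.rm = TRUE),
    Mean    = mean(column, na.rm = TRUE),
    Median  = median(column, na.rm = TRUE),
    SD      = sd(column, na.rm = TRUE)
  )
}

# Sugar values of the type A cereals, copied from the split output above
type_a <- data.frame(Sugar = c(11L, 18L, 5L, 16L, 0L, 7L, 15L, 4L, 4L, 3L))
stats_a <- statsVector(type_a, "Sugar")
stats_a
```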

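Finally, we use the map function from the purrr package to apply statsFunction to the Sugar column of both data frames in the cereal_types list. map() works through the list in order, so the first block of output is for type A cereals and the second for type C.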

library(purrr)  # provides map()
result <- map(cereal_types, ~ statsFunction(.x, "Sugar"))
[1] "Summary Statistics:"
[1] "Maximum: 18"
[1] "Minimum: 0"
[1] "Mean: 8.3"
[1] "Median: 6"
[1] "Standard Deviation: 6.25477595299961"
[1] "Summary Statistics:"
[1] "Maximum: 14"
[1] "Minimum: 1"
[1] "Mean: 9.2"
[1] "Median: 10.5"
[1] "Standard Deviation: 4.49196814077947"
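Since statsFunction() is called for its printed output, purrr's walk() family states that intent more directly than map(), and typed variants such as map_dbl() return a plain named vector when the function yields a single number. A small sketch, assuming purrr is installed, using the Sugar values from the split output above:

```r
library(purrr)

# Sugar values per type, copied from the split output above
sugar_by_type <- list(
  A = data.frame(Sugar = c(11L, 18L, 5L, 16L, 0L, 7L, 15L, 4L, 4L, 3L)),
  C = data.frame(Sugar = c(14L, 12L, 1L, 10L, 12L, 14L, 9L, 6L, 3L, 11L))
)

# map_dbl() keeps the list names and returns a named double vector
mean_sugar <- map_dbl(sugar_by_type, ~ mean(.x$Sugar))
mean_sugar

# iwalk() runs a function for its side effect; .y holds each element's name
iwalk(sugar_by_type, ~ print(paste0(.y, " median sugar: ", median(.x$Sugar))))
```

The means match the Mean lines printed by statsFunction() above (8.3 for type A, 9.2 for type C).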