Challenge 10
challenge_10
purrr
Read Data
Using the read.csv function, we can read the cereal.csv data into a data frame.
cereal <- read.csv("_data/cereal.csv")
cereal
Next, we will split the cereal data frame into a list of data frames, one per cereal type.
cereal_types <- split(cereal, cereal$Type)
cereal_types
$A
                  Cereal Sodium Sugar Type
1    Frosted Mini Wheats      0    11    A
2            Raisin Bran    340    18    A
3               All Bran     70     5    A
8     Crackling Oat Bran    150    16    A
9              Fiber One    100     0    A
12 Honey Bunches of Oats    180     7    A
16          Honey Smacks     50    15    A
17             Special K    220     4    A
18              Wheaties    180     4    A
19           Corn Flakes    200     3    A

$C
                  Cereal Sodium Sugar Type
4            Apple Jacks    140    14    C
5         Captain Crunch    200    12    C
6               Cheerios    180     1    C
7  Cinnamon Toast Crunch    210    10    C
10        Frosted Flakes    130    12    C
11           Froot Loops    140    14    C
13    Honey Nut Cheerios    190     9    C
14                  Life    160     6    C
15         Rice Krispies    290     3    C
20             Honeycomb    210    11    C
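To make the behavior of split easier to see, here is a minimal sketch on a toy data frame. The name toy and its four rows (copied from the output above) are only for illustration and are not part of the original challenge.

```r
# Toy data frame built from four rows of the output above (illustration only).
toy <- data.frame(
  Cereal = c("Raisin Bran", "All Bran", "Cheerios", "Life"),
  Sodium = c(340, 70, 180, 160),
  Sugar  = c(18, 5, 1, 6),
  Type   = c("A", "A", "C", "C")
)

# split() returns a named list with one data frame per distinct value of Type.
toy_types <- split(toy, toy$Type)

# The list names are the group labels.
names(toy_types)

# Each element is an ordinary data frame, so nrow() works on every piece.
sapply(toy_types, nrow)
```

Each piece keeps its original row names, which is why the row numbers in the output above are not consecutive within a group.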
Now we will recreate the summary statistics function from Challenge 9. We will alter it slightly so that it accepts the column name as an argument along with the data frame.
statsFunction <- function(df, col_name) {
  # Pull the requested column out of the data frame as a vector
  column <- df[[col_name]]
  print(paste0("Summary Statistics:"))
  # na.rm = TRUE is used throughout so missing values are handled consistently
  print(paste0("Maximum: ", max(column, na.rm = TRUE)))
  print(paste0("Minimum: ", min(column, na.rm = TRUE)))
  print(paste0("Mean: ", mean(column, na.rm = TRUE)))
  print(paste0("Median: ", median(column, na.rm = TRUE)))
  print(paste0("Standard Deviation: ", sd(column, na.rm = TRUE)))
}
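Since the function above only prints its results, the numbers are awkward to reuse later. One possible variant, sketched here under the assumption that a named numeric vector is a convenient return type, hands the statistics back instead. The name statsSummary is hypothetical and does not come from Challenge 9; the Sugar values are the type A values from the split output.

```r
# Hypothetical variant: return the statistics instead of printing them.
statsSummary <- function(df, col_name) {
  column <- df[[col_name]]
  c(
    maximum = max(column, na.rm = TRUE),
    minimum = min(column, na.rm = TRUE),
    mean    = mean(column, na.rm = TRUE),
    median  = median(column, na.rm = TRUE),
    sd      = sd(column, na.rm = TRUE)
  )
}

# Sugar values for the type A cereals, copied from the split output.
sugar_A <- data.frame(Sugar = c(11, 18, 5, 16, 0, 7, 15, 4, 4, 3))

stats_A <- statsSummary(sugar_A, "Sugar")
stats_A
```

Individual values can then be picked out by name, for example stats_A[["mean"]].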
Finally, we will use the map function from the purrr package to apply this function to the Sugar column of both data frames in the cereal_types list.
library(purrr)
result <- map(cereal_types, ~statsFunction(.x, "Sugar"))
[1] "Summary Statistics:"
[1] "Maximum: 18"
[1] "Minimum: 0"
[1] "Mean: 8.3"
[1] "Median: 6"
[1] "Standard Deviation: 6.25477595299961"
[1] "Summary Statistics:"
[1] "Maximum: 14"
[1] "Minimum: 1"
[1] "Mean: 9.2"
[1] "Median: 10.5"
[1] "Standard Deviation: 4.49196814077947"
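As a small follow-up, the sketch below contrasts map with map_dbl, both from purrr. It assumes purrr is installed. The list sugar_by_type is built by hand from the Sugar values in the split output, so it stands in for cereal_types only for illustration.

```r
library(purrr)

# Sugar values per cereal type, copied from the split output earlier in the post.
sugar_by_type <- list(
  A = c(11, 18, 5, 16, 0, 7, 15, 4, 4, 3),
  C = c(14, 12, 1, 10, 12, 14, 9, 6, 3, 11)
)

# map() always returns a list with one element per input element.
mean_list <- map(sugar_by_type, ~ mean(.x, na.rm = TRUE))

# map_dbl() applies the same function but returns a named numeric vector.
mean_vec <- map_dbl(sugar_by_type, ~ mean(.x, na.rm = TRUE))

mean_vec
```

The two means match the values printed above (8.3 for type A and 9.2 for type C). map_dbl is often the more convenient choice when each group produces a single number.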