Challenge 10 Solutions

challenge_10
purrr
Author

Sean Conway

Published

July 6, 2023

Challenge Overview

The purrr package is a powerful tool for functional programming. It lets the user apply a single function to each element of a list or vector, and it can replace many for loops with a single, more readable function call.

For example, we can draw a sample of size n from each of 10 normal distributions, using a vector of 10 means.

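library(tidyverse) # loads purrr, plus the dplyr, tidyr, and ggplot2 functions used below
n <- 100           # sample size
m <- seq(1, 10)    # means
samps <- map(m, rnorm, n = n) # each element of m is passed to rnorm() as its `mean` argument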
samps
[[1]]
  [1]  0.268458130  1.805058809  2.850437381  1.981456163  1.989510845
  [6]  2.053036692  0.080427688  1.080486631  1.519875926  0.271701408
 [11]  2.244070685  1.791907073  0.107315332  1.111323682  2.007059258
 [16]  0.154593711 -0.168036683  1.027192048 -0.798746046  2.460586858
 [21]  1.636178177  0.814818486 -0.832419503  0.745375433  0.912161500
 [26]  0.765410547 -1.389209133  1.541892802  1.142690595  1.561627758
 [31]  2.488199340  1.442615066  2.158000705  1.375759346  0.652806052
 [36] -1.854021140 -0.001099008  0.689068067  1.361606769  1.132893538
 [41]  2.902815003  1.135665010  0.679537916  0.963032432  0.821293093
 [46]  0.164142062  1.261276948  1.135965320  0.587392267  1.983310788
 [51]  1.369952377 -0.730781642  0.127642852  0.196331242  1.543374136
 [56]  0.813015174 -0.796774117 -0.917717948  1.959668407  2.043064430
 [61]  0.644985822  1.411156108  1.733970616  2.344453865  2.691232391
 [66] -0.360347889  0.952766201 -0.141840750 -0.666188451  1.052123339
 [71]  0.827121690  0.079729868  1.129250612  2.367229687  0.126578103
 [76]  0.371188340  2.802423758  2.263973343 -0.583531900  0.063985412
 [81]  0.828370506  0.271737428  1.358450307  1.774134930  0.914732100
 [86]  2.183848359  1.576500433  1.690008980  0.052923252  0.773521995
 [91]  0.095434884 -0.312756963  1.930563183  0.866573616  0.850336693
 [96]  1.443802756  1.746105327  0.353189756  0.589599577  1.314099238

[[2]]
  [1]  3.4213316  2.1773042  2.9989417  1.6848160  1.1031708  2.4811625
  [7]  1.3206888  2.5352591  2.6951389  2.3811273  0.6746779  1.1652051
 [13]  3.3800205  2.6878627  3.0290476  1.3015955  1.5943286  3.1304413
 [19]  2.3657316  2.1056012  2.0248525  1.4044024  1.9280501  0.4711288
 [25]  0.7499365  2.2919731  0.9019173  3.7525119  0.5769933  2.3125075
 [31]  2.4626770  2.6690676  1.8994103  3.7024116  1.2595332  2.0132914
 [37]  1.9911367  1.6516412  2.0758082  3.1932766  2.0561536  1.7210857
 [43]  0.7974941 -0.2243914  1.1811639  3.7258634  0.7222775  1.5341697
 [49]  2.4150230  2.6393095  3.2723973  2.6466146  2.3050969  0.3113854
 [55]  1.3092199  2.5009514  2.1313592  2.1526343  1.2312833  1.8985097
 [61]  1.4057728  2.3441052  1.1031590  1.5953727  1.9896394  2.1648868
 [67]  2.1917130  2.0515841  2.2641962  2.3756484  2.7708286  1.7720878
 [73]  2.0692621  3.2281927  1.9716207  2.6352069  0.8097524  0.4104619
 [79]  2.5115633  4.0369971  1.9877930  3.6695963  2.1279395  2.5882936
 [85]  1.6169111  2.8877187  1.0819231  1.9306155 -0.1976967  2.1912348
 [91]  2.4978101  1.0800988  2.2075221  1.4764784  3.2698633  2.2190558
 [97]  1.3756575  2.4599892  2.2713205  2.4506695

[[3]]
  [1] 2.849388 3.275419 3.887502 2.661112 2.241833 2.327815 4.079780 1.646765
  [9] 4.247939 3.057717 4.755694 3.321319 2.316313 2.232794 2.359763 3.124312
 [17] 3.731267 4.118570 1.395507 3.374049 3.797304 3.456335 3.878998 1.694063
 [25] 3.865238 4.312225 2.621145 2.067114 3.085590 4.058947 1.948555 3.678394
 [33] 2.975859 3.077720 1.875007 3.133204 3.361260 3.641329 3.351381 2.526533
 [41] 2.379244 3.263238 2.556551 2.769018 5.137459 3.564894 2.238057 3.045533
 [49] 1.383442 5.536588 3.452727 4.126669 4.124214 3.650804 3.175500 3.006370
 [57] 3.423461 1.452312 3.270103 2.306489 3.423046 1.820633 3.118646 3.275302
 [65] 2.953491 3.051793 3.313827 1.645863 3.914151 3.220667 3.402753 2.768472
 [73] 2.995866 3.147580 3.589511 2.604507 2.284266 2.595755 3.448061 2.685283
 [81] 4.499424 3.074671 1.296883 3.204911 3.279439 3.134744 3.895504 4.647770
 [89] 3.823168 2.353586 3.137422 3.347107 1.766439 4.298840 3.698805 3.294489
 [97] 3.577747 1.109990 3.094454 3.303195

[[4]]
  [1] 2.909101 5.565002 4.329330 4.487694 3.673659 2.386289 3.236789 4.837758
  [9] 4.120674 4.139167 4.197418 4.844362 5.220403 5.314739 3.740268 4.575939
 [17] 5.509264 3.225708 4.603033 4.194045 3.879794 4.876978 4.779621 3.210556
 [25] 2.643782 2.760828 4.788262 4.685030 4.671720 4.977641 4.435364 4.302177
 [33] 4.777089 2.805352 4.851540 3.505579 2.815000 4.703671 2.866514 4.309735
 [41] 3.736249 3.703144 2.916645 3.111574 5.032504 5.474812 5.348718 5.000874
 [49] 4.250878 3.036741 4.290564 3.648204 4.780317 4.470074 2.979266 3.614318
 [57] 4.406711 3.580283 4.092946 4.788032 3.670807 5.044847 1.542944 3.581498
 [65] 1.797159 3.290718 4.812917 5.381617 3.360792 2.661766 3.971591 4.409649
 [73] 5.629199 3.823959 4.350915 4.986306 3.823890 5.409713 5.308825 5.759710
 [81] 3.507546 3.242976 5.192751 4.510785 3.779038 2.453765 4.829490 3.407364
 [89] 4.080830 4.212391 3.342785 4.700756 4.902640 3.387637 5.871850 3.326346
 [97] 4.030990 4.128260 3.585292 2.851284

[[5]]
  [1] 6.101048 5.510790 4.090887 6.052491 4.020157 6.492446 7.316474 4.455562
  [9] 7.029689 4.232981 5.378409 6.939321 4.473825 6.450459 4.407387 4.740987
 [17] 4.662560 3.736878 6.798850 2.945911 6.262191 4.225913 3.659185 4.733412
 [25] 6.116324 7.248214 4.255195 5.685438 3.986471 4.318510 6.180469 5.080252
 [33] 4.666482 3.755502 4.254060 6.525292 4.449268 4.186136 6.227843 5.195076
 [41] 5.791629 6.014242 3.740156 5.028988 4.316829 4.144595 5.804596 5.271576
 [49] 4.337883 4.971070 5.598393 5.068108 6.084045 5.085871 6.561756 3.936155
 [57] 5.695948 6.224938 4.343231 4.756046 5.732667 6.333938 3.455375 5.559245
 [65] 5.103436 5.551539 5.040560 4.762688 5.314857 5.962616 3.930919 6.604384
 [73] 6.258845 5.340836 3.823944 2.860539 6.036557 4.661485 4.455820 3.913337
 [81] 5.953889 5.571327 4.077226 5.159839 5.488512 6.034312 5.285816 4.268168
 [89] 4.691877 5.771440 5.789221 5.126011 4.226983 5.478381 7.209924 4.174580
 [97] 4.670375 4.852552 3.279181 4.181330

[[6]]
  [1] 5.157754 7.378809 4.781528 6.477789 5.032303 5.578639 5.278332 7.482551
  [9] 5.597662 6.269840 7.235227 5.368733 7.356507 6.883053 6.229005 6.019629
 [17] 6.591643 7.746407 8.615281 7.018589 5.999659 5.381808 4.623965 7.332709
 [25] 6.760416 5.532726 7.366359 8.224478 5.124630 6.096700 3.651660 5.703005
 [33] 6.307832 5.604344 7.293577 5.954842 5.594733 6.073168 5.408861 6.087452
 [41] 6.356125 6.325823 6.874012 6.911382 6.162324 6.670889 6.502121 7.541481
 [49] 6.722524 6.782897 6.900867 7.473466 5.889876 6.306327 5.453883 5.775001
 [57] 7.223017 6.139948 5.729072 5.773861 4.220888 6.033537 4.622212 5.373821
 [65] 6.391049 5.437170 6.946522 4.309184 4.991137 6.898836 6.255827 6.256083
 [73] 6.610911 8.155066 4.744625 6.255503 3.454818 6.565708 6.234452 5.478558
 [81] 5.079481 4.834396 7.012601 6.683881 4.958339 4.676383 5.319896 4.235249
 [89] 6.634659 4.579088 7.120645 6.318637 5.238755 6.412613 6.446754 5.166346
 [97] 5.390285 7.361652 7.006868 8.006277

[[7]]
  [1] 6.402471 6.194810 7.900129 8.139702 6.727481 6.723286 7.615240 8.528159
  [9] 5.909887 7.322088 7.407308 5.777374 6.532791 8.088526 7.105431 7.610605
 [17] 7.548097 6.919883 6.888251 5.558936 6.727806 7.583347 6.632602 7.908410
 [25] 5.326804 6.788731 6.322464 7.461177 7.190528 5.946723 8.376530 5.541241
 [33] 7.490220 7.203998 5.590601 6.657010 6.325185 7.520668 7.688981 8.003149
 [41] 5.282302 5.768235 7.709313 6.185665 6.377766 6.394810 7.164838 7.714391
 [49] 7.360359 6.837345 7.870399 6.302780 6.294957 6.435253 5.435022 6.192847
 [57] 7.775367 6.935891 6.767892 7.657569 6.800202 6.267998 7.542220 6.189420
 [65] 7.918263 8.063880 6.321646 7.619768 6.604228 8.713627 6.577755 7.215723
 [73] 5.906993 5.796383 7.279002 5.319842 6.695906 5.470839 7.416835 6.470555
 [81] 5.906952 8.294792 6.439227 6.493457 8.710688 7.070846 6.915834 7.218159
 [89] 6.127085 6.234658 6.014314 8.720263 7.636827 8.048245 6.606637 6.271184
 [97] 6.438818 7.267039 7.395047 8.011923

[[8]]
  [1]  7.260039  9.093702  8.957614  7.337442  9.250030  7.811487  8.135249
  [8]  8.478739  9.149628  9.462939  9.030277  8.003083  7.096476  6.846704
 [15]  7.360071  7.206038  8.737749  7.487870  6.972552  5.662759  7.486435
 [22]  7.792269  5.828574  6.258681  6.813699  6.694434  9.139767  7.601866
 [29] 10.654703  6.874182  6.643528  7.431364  7.019931  9.002242  8.125005
 [36]  9.072721  8.179681  6.486804  6.537731  7.545667  8.750910  7.884447
 [43]  7.735273  8.150766  7.956882  7.036281  8.492775  8.773177  9.278230
 [50]  7.803994  8.684940  5.862633  6.964846  6.392633  7.914810  9.332209
 [57]  8.053307  8.614812  5.232762  8.538430  9.119135  6.973781  6.501858
 [64]  8.384892  6.751632  8.539838  7.668791  7.415485  7.790122  9.202261
 [71]  7.489985  6.960745  8.514709  9.693491  5.115094  8.266659  7.823602
 [78]  6.509247  7.358881  9.110748  6.325313  8.739057  8.749901  8.704304
 [85]  7.070760  9.125652  8.084477  8.919265  9.372269  7.442062  7.221604
 [92]  8.670422  7.878583  5.679776  7.575565  7.806229  8.135077  7.619470
 [99]  8.486658  8.243160

[[9]]
  [1]  9.126423  6.492825  9.759623  9.015525  7.781886  9.724508  7.962595
  [8]  7.792892  7.055123  9.412364 10.196067  9.355844  8.511063  8.667114
 [15]  9.984545  9.557694  8.439534  9.609796  8.473127  8.833193  9.169522
 [22]  8.959740  7.775719  9.062358  8.351518  9.338643 10.041384 10.291176
 [29]  7.986984 10.678237  8.722119  9.242342  8.654386  8.800456 10.057541
 [36]  7.273664  8.280249  8.643718  9.624214  9.571291 10.210710  9.561939
 [43] 10.870747 10.845488  8.732542 11.155110  9.242930 10.856360  7.933317
 [50]  7.865069  8.871460 10.251372 13.071356 10.441054  7.465638  6.960587
 [57]  8.783978  8.957982  9.758598  9.659952  9.420352 11.058641  7.870306
 [64]  8.001907  9.169500 10.838688  8.349900 10.005245  9.685935  8.738429
 [71] 10.547047  8.137919  7.826117  8.310148  8.087978  8.692287  8.719213
 [78]  9.301791  8.875563  9.898609  9.578387  8.704105  8.821449  8.204982
 [85]  6.769590  8.714529  8.327589 10.434764 10.321879  9.045867 10.214030
 [92]  9.321948 10.149178  7.834123  7.255850  9.126612 10.268186  9.668153
 [99]  8.111253  7.065867

[[10]]
  [1] 10.940431  9.655367  9.702246 10.348212 10.040725 10.761855  9.526864
  [8]  9.858697 10.445138  9.859076 10.077429 10.463395  9.814670 10.242008
 [15] 11.293190  8.788702  9.034744  9.428154 11.118861 11.169829  8.943704
 [22] 11.264952  9.250037  9.405424  9.836878  9.895092 10.020501  9.002539
 [29]  8.325159  8.177161 10.652136 11.069552 10.282327  9.636455  8.560088
 [36] 10.019359  9.244668  9.743809  9.646116  8.444149 11.024892  9.891176
 [43] 11.926914 10.944104  8.858591 11.439752  8.800259 10.136550 10.242429
 [50] 11.964047 10.416332  9.201005 10.328812  9.885357 10.831335 10.498519
 [57] 10.091813  8.515794 10.630576 10.317034 11.837400 10.311029 11.219555
 [64] 10.846636 10.639684  9.216272 12.185906 10.183914 10.473126  9.848617
 [71] 11.476663  9.387814  9.997179 10.559720  9.856581 10.734400 10.233080
 [78]  9.391727  9.586382 10.511176 11.585117 10.755687 10.630383 10.738890
 [85] 11.299522 10.027602  8.705846  8.695347 10.510384 11.968574 10.919258
 [92] 12.785376 10.849988 11.323014 10.686161 11.690689 10.160984  7.975584
 [99] 11.227056 10.707378

We can then use map_dbl() to check that this worked by computing the mean of each sample. Each sample mean should be close to its target mean.

samps %>%
  map_dbl(mean)
 [1]  0.9683571  2.0278843  3.1037180  4.0998333  5.1161893  6.1145581
 [7]  6.9166271  7.8302441  9.0724711 10.2167469
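To make the "replaces a for loop" idea concrete, here is a minimal sketch of the same simulation written both ways. With the same seed, the loop and map() should draw from the random number stream in the same order, so the two results should be identical. The seed value is arbitrary and not part of the original solution.

```r
library(purrr)

n <- 100        # sample size
m <- seq(1, 10) # means

# for-loop version: preallocate a list, then fill it one element at a time
set.seed(123)
samps_loop <- vector("list", length(m))
for (i in seq_along(m)) {
  samps_loop[[i]] <- rnorm(n, mean = m[i])
}

# purrr version: map() calls rnorm(m[[i]], n = n), so m[[i]] matches `mean`
set.seed(123)
samps_map <- map(m, rnorm, n = n)

identical(samps_loop, samps_map)
```

The loop needs explicit preallocation and indexing; map() handles both, which is most of the readability gain.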

purrr is tricky to learn, but extremely useful once you get the hang of it. It is therefore important that you complete the purrr and map readings before attempting this challenge.

The challenge

Use purrr with a function to perform a data science task of your choice. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for Challenge 9.

Solutions

There are innumerable ways to use purrr in your coding.

Using purrr to perform simple computations

Below, we use the map_dbl() function to compute the mean of several variables from the mtcars dataset (specifically weight, horsepower, and miles per gallon). We combine the variables into a list and pass that list to map_dbl(). We use map_dbl() because we know each mean will be a single value of type double, which lets purrr simplify the output into a numeric vector instead of a list.

# the dataset
mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
map_dbl(list(mtcars$wt, mtcars$hp, mtcars$mpg),mean)
[1]   3.21725 146.68750  20.09062

The above operation returns a numeric vector of means. If we use plain map() instead, the operation still works, but we get back a list, which is less convenient to work with.

map(list(mtcars$wt, mtcars$hp, mtcars$mpg),mean)
[[1]]
[1] 3.21725

[[2]]
[1] 146.6875

[[3]]
[1] 20.09062
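Because a data frame is a list of columns, map_dbl() can also iterate over a subset of columns directly. A minimal sketch: selecting the columns by name means the result keeps those names, which makes it easier to tell which mean belongs to which variable.

```r
library(purrr)

# a data frame is a list of columns, so map_dbl() iterates over the selected columns
col_means <- map_dbl(mtcars[c("wt", "hp", "mpg")], mean)
col_means
```

The values are the same as in the unnamed vector above; only the names are added.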

A function that computes multiple summary statistics

I modified this function to take a required id argument. This lets us use map2_dfr() to apply the function to multiple columns and bind the results into a single data frame in which each variable remains identifiable. I also added a standard error (SE) calculation.

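sum_stat <- function(x, id){
  stat <- tibble(
    id = id,
    mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    se = sd / sqrt(sum(!is.na(x))) # `sd` is the column defined above; count only non-missing values
  )
  return(stat)
}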

We’ll use the mtcars dataset.

In the example below, we use map2_dfr() to compute the mean, median, SD, and SE of mpg (miles per gallon), hp (horsepower), and wt (weight) in the mtcars dataset. We need one of the map2 variants because we have to pass two inputs in parallel: the vector of numeric values and the character identifier for that variable (e.g., both mtcars$mpg and "mpg").

mtcars_stats <- map2_dfr(list(mtcars$mpg, mtcars$hp, mtcars$wt), 
         list("mpg","hp","wt"),
         sum_stat)
mtcars_stats
# A tibble: 3 × 5
  id      mean median     sd     se
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>
1 mpg    20.1   19.2   6.03   1.07 
2 hp    147.   123    68.6   12.1  
3 wt      3.22   3.32  0.978  0.173
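As of purrr 1.0.0, the _dfr variants such as map2_dfr() are superseded; the recommended pattern is to return a list of data frames and bind it with list_rbind(). If the input is named, imap() passes each element's name as the second argument, so the id does not have to be supplied separately. A minimal sketch, using a cut-down stand-in for sum_stat() (the name sum_stat_mini is hypothetical) so the block runs on its own:

```r
library(purrr)
library(tibble)

# hypothetical cut-down version of sum_stat(), defined here so the block is self-contained
sum_stat_mini <- function(x, id) {
  tibble(id = id, mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# selecting columns by name gives a named list; imap() calls sum_stat_mini(column, name)
mtcars_stats2 <- mtcars[c("mpg", "hp", "wt")] |>
  imap(sum_stat_mini) |>
  list_rbind()
mtcars_stats2
```

This requires purrr 1.0.0 or later (for list_rbind()) and R 4.1 or later (for the native pipe).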

We can even use pivot_longer() to get a “long” format of these statistics.

mtcars_stats_long <- mtcars_stats %>%
  pivot_longer(-id) # every column except id becomes a (name, value) pair
mtcars_stats_long
# A tibble: 12 × 3
   id    name     value
   <chr> <chr>    <dbl>
 1 mpg   mean    20.1  
 2 mpg   median  19.2  
 3 mpg   sd       6.03 
 4 mpg   se       1.07 
 5 hp    mean   147.   
 6 hp    median 123    
 7 hp    sd      68.6  
 8 hp    se      12.1  
 9 wt    mean     3.22 
10 wt    median   3.32 
11 wt    sd       0.978
12 wt    se       0.173
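If you need the wide layout back, pivot_wider() reverses the operation. A small sketch on a toy tibble (the values are illustrative, not the mtcars results):

```r
library(tibble)
library(tidyr)

# toy wide table of summary statistics (illustrative values only)
stats_wide <- tibble(id = c("a", "b"), mean = c(1, 2), sd = c(0.5, 0.25))

# wide -> long: every column except id becomes a (name, value) pair
stats_long <- pivot_longer(stats_wide, -id)

# long -> wide: spread the names back out into columns
stats_back <- pivot_wider(stats_long, names_from = name, values_from = value)
stats_back
```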

A purrr pipeline

Below, I demonstrate a complete pipeline that begins with purrr, uses our function to compute the summary statistics, and ends with ggplot visualizing the mean and SE of each numeric variable.

# compute summary stats
map2_dfr(list(mtcars$mpg, mtcars$hp, mtcars$wt), 
         list("miles per gallon","horsepower","weight"),
         sum_stat) %>%
  mutate(se_lower=mean-se, # get lower and upper bounds for error bars
         se_upper=mean+se) %>%
  ggplot(aes(id,mean))+
  geom_col(fill="lightblue")+ # visualize w/ geom_col
  geom_errorbar(aes(ymin=se_lower,ymax=se_upper),width=.25)+ # add error bars
  labs(x="variable",y="mean value",caption="Error bars are +/- 1 SE of the mean.")+
  ggthemes::theme_few()+
  theme(plot.caption=element_text(hjust=0))

This plot is probably not the best use of ggplot. The variables are on very different scales (for example, wt is measured in thousands of pounds, so its mean is tiny next to that of hp), which makes it very difficult to see exactly where each mean lies. Take this as an example of what you can do with purrr, and use this kind of pipeline when it makes sense for your research question.
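One way to work around the scale problem (an alternative, not what the original solution does) is to give each variable its own panel with its own y-axis via facet_wrap(scales = "free"). A minimal sketch that computes the means and SEs inline with dplyr rather than with sum_stat(), so it runs on its own:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# long format: one row per (variable, value), then mean and SE per variable
stats <- mtcars |>
  select(mpg, hp, wt) |>
  pivot_longer(everything(), names_to = "id") |>
  group_by(id) |>
  summarise(mean = mean(value), se = sd(value) / sqrt(n()))

# one panel per variable, each with its own axis ranges
p <- ggplot(stats, aes(id, mean)) +
  geom_col(fill = "lightblue") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.25) +
  facet_wrap(~id, scales = "free") +
  labs(x = NULL, y = "mean value")
p
```

Free scales make each error bar readable, at the cost of no longer comparing bar heights across panels.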

A function that plots a histogram

In Challenge 9, we made a function that creates a histogram using ggplot. Here, we use it to make multiple histograms using map().

We have to modify the function using defusion (see footnote 1).

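make_my_hist <- function(dat, colname, fill="purple", xlab=NULL, ylab="n"){
  colname <- rlang::ensym(colname) # capture the column name (symbol or string) as a symbol
  if (is.null(xlab)) xlab <- rlang::as_string(colname) # default x label: the column name
  dat %>%
    ggplot(aes(!!colname))+ # inject the captured symbol into aes()
    geom_histogram(fill=fill)+
    labs(x=xlab,
         y=ylab)
}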

Making the histograms

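We pass the names of the variables we want to graph to the make_my_hist() function. Inside the lambda, .x holds a string such as "mpg", so we use the injection operator !! ("bang-bang") to hand that value, rather than the symbol .x itself, to rlang::ensym().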
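A tiny sketch of why !! is needed: ensym() captures whatever expression the caller wrote, so a bare variable name is captured as that symbol, while !! injects the variable's value first. ensym() also converts a string into a symbol. The helper name capture_col is hypothetical.

```r
library(rlang)

# hypothetical helper: returns the symbol that ensym() captures from its argument
capture_col <- function(col) ensym(col)

col_name <- "mpg"
capture_col(col_name)    # the symbol `col_name`: the variable name itself
capture_col(!!col_name)  # the symbol `mpg`: the string is injected, then converted
```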

map(c("mpg", "hp", "wt"), ~make_my_hist(dat=mtcars, colname=!!.x))
[Output: a list of three ggplot histograms, one each for mpg, hp, and wt]
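A simpler alternative (not the original solution) avoids defusion entirely: if the function takes the column name as a string, ggplot2's .data pronoun can look it up, and no !! is needed in the map() call. A minimal sketch; make_my_hist_str is a hypothetical name.

```r
library(purrr)
library(ggplot2)

# hypothetical string-based variant: colname is a character string such as "mpg"
make_my_hist_str <- function(dat, colname, fill = "purple", ylab = "n") {
  ggplot(dat, aes(x = .data[[colname]])) + # look the column up by name
    geom_histogram(fill = fill, bins = 30) +
    labs(x = colname, y = ylab)
}

# plain strings can be mapped over directly
hists <- map(c("mpg", "hp", "wt"), ~ make_my_hist_str(dat = mtcars, colname = .x))
```

The trade-off is that callers must always pass strings; the ensym() version accepts either a bare column name or a string.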

A function that computes counts of a categorical variable

When a variable is categorical, we typically summarise it by computing the counts (or frequencies) of each value. Base R uses the table() function to do so, but the result is of class "table", which does not always fit neatly into a tidyverse workflow.

The function below uses map_dbl() and sum() to add up the counts of each unique value of a categorical variable. A second, optional argument lets the user also compute proportions, also known as relative frequencies (see footnote 2).

The function

# function for counting
table_data <- function(x, props=FALSE){
  # get all unique values of x
  v <- unique(x)
  
  # using purrr, count the num of values at each unique level of x
  counts <- map_dbl(v, ~sum(x==.x))
  
  # combine results in a tibble
  res <- tibble(
    name=v,
    n=counts
  )
  
  # compute props if desired
  if(props){
    res <- res %>%
      mutate(prop=n/sum(n))
  }
  return(res)
}

Using the functions

# randomly sampled vector
vec <- sample(c("a","b","c"),size=1000,replace=T)
head(vec)
[1] "b" "c" "b" "b" "a" "a"
# count w/o props
table_data(vec)
# A tibble: 3 × 2
  name      n
  <chr> <dbl>
1 b       328
2 c       349
3 a       323
# count w/ props
table_data(vec,T)
# A tibble: 3 × 3
  name      n  prop
  <chr> <dbl> <dbl>
1 b       328 0.328
2 c       349 0.349
3 a       323 0.323
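As a sanity check, and to compare with the alternatives mentioned in footnote 2, here is a minimal sketch that counts a vector three ways: the map_dbl() step used inside table_data(), dplyr::count(), and base table(). The seed is arbitrary and not from the original.

```r
library(purrr)
library(tibble)
library(dplyr)

set.seed(42)  # arbitrary seed, not from the original
vec <- sample(c("a", "b", "c"), size = 1000, replace = TRUE)

# the purrr counting step from table_data(), inlined: one sum() per unique value
levels_seen <- unique(vec)
purrr_counts <- set_names(map_dbl(levels_seen, ~ sum(vec == .x)), levels_seen)

# dplyr::count() returns a tibble sorted by value, with an integer column n
dplyr_counts <- count(tibble(x = vec), x)

# base R table() returns an object of class "table"
base_counts <- table(vec)

# all three should agree once aligned by value
all(purrr_counts[dplyr_counts$x] == dplyr_counts$n)
all(purrr_counts[names(base_counts)] == as.vector(base_counts))
```

Note that the purrr version lists values in order of first appearance, while count() and table() sort them.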

Wrapping up

The above examples are just a few ways you can use purrr to implement functional programming in R.

Footnotes

  1. Defusion is beyond the scope of this challenge and this course. If you're interested, you are welcome to read more about it. The main point here is that we use some rlang functions to capture the column names we want from the mtcars data frame.

  2. A user might also use the dplyr functions n() or count(), though these have drawbacks of their own.