Challenge 10
challenge_10
functions
purrr
audrey_bertin
purrr
For this challenge, I’ll take the function I wrote in Challenge 9, which calculates z-scores, and apply it multiple times.
A common use of z-scores is in anomaly detection. In this practice, we compare the most recent value in a sequence to all the values that came before it to see whether that value is an anomaly.
We can use a built-in dataset for this, called airquality, which stores time series air quality information:
data(airquality)
head(airquality)
Our original function looks as follows:
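library(tidyverse)  # provides tibble(), %>% and purrr::map_dfr()

z_score <- function(baseline, value){
  # Summarise the baseline; these local names shadow mean()/sd() only as
  # variables, so the function calls still resolve correctly
  mean <- mean(baseline)
  sd <- sd(baseline)
  # Absolute distance of the value from the baseline mean, in SD units
  z_score <- abs((value - mean)/sd)

  results <- tibble(mean = mean, sd = sd, input_value = value, z_score = z_score)
  return(results)
}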
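As a rough sanity check of the arithmetic, here is a minimal base-R sketch with made-up numbers. The `baseline` and `value` below are hypothetical and are not taken from airquality; they only illustrate how a value far from the baseline mean produces a large z-score.

```r
# Hypothetical baseline readings and a new, suspiciously large reading
baseline <- c(10, 12, 11, 13, 12)
value <- 20

# Same calculation as in the function: distance from the mean in SD units
baseline_mean <- mean(baseline)
baseline_sd <- sd(baseline)
z <- abs((value - baseline_mean) / baseline_sd)

# A common (but arbitrary) rule of thumb flags |z| > 3 as an anomaly
is_anomaly <- z > 3
print(z)
print(is_anomaly)
```

The cutoff of 3 is an assumption for illustration; the right threshold depends on the data and how many false alarms are acceptable.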
We can rewrite this so that it takes a single vector as input and determines the baseline and value itself:
z_score <- function(vec){
  baseline <- vec %>% head(-1)  # everything except the last element
  value <- vec %>% tail(1)      # the most recent (last) element

  mean <- mean(baseline, na.rm=TRUE)
  sd <- sd(baseline, na.rm=TRUE)
  z_score <- abs((value - mean)/sd)

  results <- tibble(baseline_mean = mean, baseline_sd = sd, most_recent_value = value, z_score = z_score)
  return(results)
}
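To see how the vector is split, here is a small base-R sketch using a hypothetical toy vector (not airquality data). It shows what `head(-1)` and `tail(1)` return, and why `na.rm=TRUE` matters for the baseline.

```r
# Hypothetical toy vector with one missing value in the middle
v <- c(5, 7, 6, NA, 9)

baseline <- head(v, -1)  # all but the last element: 5 7 6 NA
value <- tail(v, 1)      # only the last element: 9

# Without na.rm the baseline mean is NA; with it, the NA is dropped
mean_with_na <- mean(baseline)
mean_without_na <- mean(baseline, na.rm = TRUE)

# na.rm only protects the baseline: if the LAST element is missing,
# the value itself is NA and so is any z-score computed from it
v_missing_last <- c(5, 7, 6, 9, NA)
last_value <- tail(v_missing_last, 1)
```

One consequence worth keeping in mind: a column whose final observation is missing would yield an NA z-score, so it may be worth checking the last value before relying on the result.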
Running this on a single column we get:
z_score(airquality$Temp)
We can use purrr::map_dfr, a member of the purrr::map family, to compute this for multiple columns and row-bind the results into a single data frame:
cols <- list(airquality$Ozone, airquality$Wind, airquality$Temp)

map_dfr(cols, z_score)
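Because the list is unnamed, the rows of the combined result do not say which column they came from. One possible variation, sketched below, names the list elements and passes the `.id` argument of `map_dfr` so that a label column is added. The column name `variable` and the helper name `z_score_vec` are choices made for this sketch, and the helper is just a compact restatement of the vector-based function so that the block runs on its own. Note that `map_dfr` relies on dplyr being installed.

```r
library(tidyverse)  # tibble(), purrr::map_dfr(); dplyr is needed for row-binding

# Compact restatement of the vector-based z-score function (sketch only)
z_score_vec <- function(vec) {
  baseline <- head(vec, -1)
  value <- tail(vec, 1)
  baseline_mean <- mean(baseline, na.rm = TRUE)
  baseline_sd <- sd(baseline, na.rm = TRUE)
  tibble(
    baseline_mean = baseline_mean,
    baseline_sd = baseline_sd,
    most_recent_value = value,
    z_score = abs((value - baseline_mean) / baseline_sd)
  )
}

data(airquality)

# Naming the list elements lets .id record the source column for each row
named_cols <- list(
  Ozone = airquality$Ozone,
  Wind = airquality$Wind,
  Temp = airquality$Temp
)

labelled <- map_dfr(named_cols, z_score_vec, .id = "variable")
print(labelled)
```

In recent purrr releases `map_dfr` is marked as superseded in favour of `map()` followed by `list_rbind()`, but it still works as shown here.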