data(airquality)
head(airquality)Challenge 10
challenge_10
functions
purrr
audrey_bertin
purrr
For this challenge, I’ll be using the function I wrote in challenge 9 that calculates z scores and apply it multiple times.
A common use of z-scores is in anomaly detection. In this practice, we compare the most recent value in a sequence to all the values that came before to see if that value is an anomaly or not.
We can use a built in dataset for this, called airquality, which stores time series air quality information:
Our original function looks as follows:
z_score <- function(baseline, value){
mean <- mean(baseline)
sd <- sd(baseline)
z_score <- abs((value - mean)/sd)
results = tibble(mean = mean, sd = sd, input_value = value, z_score = z_score)
return(results)
}We can rewrite this so that it determines the baseline and value itself, and instead takes a vector as input:
z_score <- function(vec){
baseline = vec %>% head(-1)
value = vec %>% tail(1)
mean <- mean(baseline, na.rm=TRUE)
sd <- sd(baseline, na.rm=TRUE)
z_score <- abs((value - mean)/sd)
results = tibble(baseline_mean = mean, baseline_sd = sd, most_recent_value = value, z_score = z_score)
return(results)
}Running this on a single column we get:
z_score(airquality$Temp)We can use purrr::map to compute this for multiple columns and join them into a single dataframe:
cols = list(airquality$Ozone, airquality$Wind, airquality$Temp)
map_dfr(cols, z_score)