simdata_corr <- simulate_correlated_normal(offdiag = 0.9)
visualise_ts(simdata_corr)
Tl;dr
This is part one of a two-part series. Stay tuned until October 27th.
Static Visualisation
The idea behind this plot is that we draw one line for each data-generating process, which we will refer to as a series. In my application, each series is a separate well within a groundwater monitoring site.
If the lines coalesce, we can qualitatively infer that the series are correlated.
Conversely, if the lines deviate from each other at random, the series are likely weakly correlated or independent.
For example, compare these two plots with different correlations.
simdata_indep <- simulate_correlated_normal(offdiag = 0.2)
visualise_ts(simdata_indep)
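The visual impression of "coalescing" lines can be backed up with a single number. The sketch below assumes that the average off-diagonal entry of the sample correlation matrix is a reasonable summary. To stay self-contained it uses a hypothetical base-R stand-in, `simulate_cs()`, which draws through the Cholesky factor of the same `Sigma` rather than calling `MASS::mvrnorm()`. `mean_offdiag_cor()` is also a hypothetical helper.

```r
# Hypothetical helper: mean of the off-diagonal sample correlations,
# a rough numeric summary of how strongly the series move together.
mean_offdiag_cor <- function(x) {
  r <- cor(x)
  mean(r[upper.tri(r)])
}

# Hypothetical base-R stand-in for simulate_correlated_normal():
# if z has iid N(0, 1) entries, then z %*% chol(Sigma) has covariance Sigma.
simulate_cs <- function(n = 50, p = 30, offdiag = 0.95) {
  Sigma <- diag(1 - offdiag, p) + matrix(offdiag, p, p)
  z <- matrix(rnorm(n * p), n, p)
  as.data.frame(z %*% chol(Sigma))
}

set.seed(1)
high <- mean_offdiag_cor(simulate_cs(offdiag = 0.9))
low  <- mean_offdiag_cor(simulate_cs(offdiag = 0.2))
```

With only 50 observations the estimates are noisy. Still, `high` should sit roughly near 0.9 and `low` roughly near 0.2, which matches the difference between the two plots.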
Functions
The two functions used are defined as follows.
simulate_correlated_normal.R
#' Simulate Correlated Data
#'
#' Simulates `n` by `p` multivariate normal data with
#' suggested correlation.
#'
#' @param n integer. Number of observations.
#' @param p integer. Number of variables.
#' @param offdiag numeric, between 0 and 1. Informs correlation matrix.
simulate_correlated_normal <- function(n = 50, p = 30, offdiag = 0.95) {
  # Zero mean for every variable
  mu <- rep(0, p)
  # Unit variances on the diagonal, `offdiag` everywhere else
  Sigma <- diag(1 - offdiag, p) + matrix(offdiag, p, p)
  as.data.frame(MASS::mvrnorm(n, mu, Sigma))
}
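`Sigma` has a compound-symmetry structure, with 1 on the diagonal and `offdiag` on every off-diagonal entry. A standard result gives its eigenvalues as `1 + (p - 1) * offdiag` (once) and `1 - offdiag` (`p - 1` times). This is why `offdiag` is kept between 0 and 1: any value below 1 keeps `Sigma` positive definite. The small check below illustrates this for `p = 4`.

```r
# Rebuild the covariance matrix used in simulate_correlated_normal()
p <- 4
offdiag <- 0.9
Sigma <- diag(1 - offdiag, p) + matrix(offdiag, p, p)

# Eigenvalues in decreasing order, as returned by eigen()
ev <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
expected <- c(1 + (p - 1) * offdiag, rep(1 - offdiag, p - 1))  # 3.7, 0.1, 0.1, 0.1
```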
visualise_ts.R
#' Visualise Time-Series Data
#'
#' Plot used in blog post to show many time-series like data.
#'
#' @param x data frame. Only data columns should be given here.
#' @param highlight character vector. Names of variables to highlight.
visualise_ts <- function(x, highlight = NULL) {
  # Long format: one row per (time point, series) with columns rowid, name, value
  plot_data <- tidyr::pivot_longer(
    tibble::rowid_to_column(x),
    cols = -tidyselect::all_of("rowid")
  )

  out <- ggplot2::ggplot(plot_data, ggplot2::aes(
    x = .data$rowid,
    y = .data$value,
    colour = .data$name
  )) +
    ggplot2::geom_line(alpha = 0.8) +
    ggplot2::geom_point(alpha = 0.8) +
    ggplot2::labs(x = "Time", y = "Value", colour = "Series") +
    ggplot2::guides(colour = "none")  # too many series for a readable legend

  if (is.null(highlight)) return(out)

  # Keep the highlighted series in colour and fade the rest to alpha 0.2
  out +
    gghighlight::gghighlight(
      .data$name %in% highlight,
      unhighlighted_params = ggplot2::aes(alpha = 0.2),
      use_group_by = FALSE
    ) +
    ggplot2::guides(colour = "legend")
}
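The key step in `visualise_ts()` is reshaping the wide data (one column per series) into long data, so that ggplot2 can map `name` to colour. The sketch below is a hypothetical base-R equivalent, `to_long()`, meant only to show the shape of `plot_data`. It is not the tidyr implementation. Its rows come out column by column, whereas `pivot_longer()` orders them by `rowid`, but the ordering does not matter for the plot.

```r
# Hypothetical base-R equivalent of
# tidyr::pivot_longer(tibble::rowid_to_column(x), ...):
# one row per (time point, series) pair.
to_long <- function(x) {
  data.frame(
    rowid = rep(seq_len(nrow(x)), times = ncol(x)),  # time index, repeated per series
    name  = rep(names(x), each = nrow(x)),           # series label
    value = unlist(x, use.names = FALSE)             # columns stacked end to end
  )
}

wide <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 20, 30))
long <- to_long(wide)
```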
Issue
When there are many series, a legend becomes hard to read, and similarly coloured lines are easily confused.
Adobe suggests limiting categorical colours to six at most, and states that 12 colours are "extremely difficult to understand".
Re-adding the legend makes this clear.
visualise_ts(simulate_correlated_normal(offdiag = 0.9)) +
  ggplot2::guides(colour = "legend")
So we can see a global picture of the data, but drilling down into specific series, or working out which series have high values or deviate from the overall trend, is near impossible.
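To put numbers on the colour problem, the sketch below mimics what we assume are the defaults of `scales::hue_pal()`, the palette behind ggplot2's default discrete colour scale: hues spread evenly from 15 degrees around the colour wheel, chroma 100, luminance 65. `default_hues()` is a hypothetical helper. With six series, neighbouring hues are 60 degrees apart. With 30 series the gap shrinks to 12 degrees.

```r
# Hypothetical helper mimicking (as an assumption) scales::hue_pal() defaults:
# n hues evenly spaced around the wheel, starting at 15 degrees.
default_hues <- function(n) {
  h <- c(0, 360) + 15
  # Drop the end point so the first and last hues do not coincide
  if ((diff(h) %% 360) < 1) h[2] <- h[2] - 360 / n
  seq(h[1], h[2], length.out = n)
}

hues_6  <- default_hues(6)   # 15, 75, ..., 315
hues_30 <- default_hues(30)  # 15, 27, ..., 363
cols_30 <- grDevices::hcl(h = hues_30, c = 100, l = 65)
```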
gghighlight
The gghighlight package lets us focus on particular values of an aesthetic.
For example, if we were interested in the 30th variable and how it appears in these plots, we would add the function as follows, using dplyr::filter() syntax.
ggplot2::ggplot(...) +
  ggplot2::geom_point(...) +
  ggplot2::geom_line(...) +
  gghighlight::gghighlight(series == "V30")
See the gghighlight documentation for more details.
In our implementation, one or more variables can optionally be identified by name and highlighted, while the opacity of the other series is reduced from 0.8 to 0.2.
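The rule described above can be written out directly on long-format data. The sketch below is not how gghighlight works internally (it adds its own layers). It only makes the predicate and the opacity rule explicit. `highlight_alpha()` and the toy `long` data frame are hypothetical.

```r
# Hypothetical helper: highlighted series keep alpha 0.8, the rest drop to 0.2
highlight_alpha <- function(name, highlight, on = 0.8, off = 0.2) {
  ifelse(name %in% highlight, on, off)
}

# Toy long-format data: two time points for each of three series
long <- data.frame(
  rowid = rep(1:2, times = 3),
  name  = rep(c("V1", "V2", "V30"), each = 2),
  value = c(0.1, 0.3, -0.2, 0.5, 1.1, 0.9)
)

# Same predicate as in visualise_ts(): name %in% highlight
long$alpha  <- highlight_alpha(long$name, c("V1", "V30"))
highlighted <- subset(long, name %in% c("V1", "V30"))
```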
visualise_ts(simdata_corr, highlight = "V30")
visualise_ts(simdata_indep, highlight = "V30")
visualise_ts(simdata_indep, highlight = c("V1", "V30"))
Next Time
Looking at these visualisations, there is a natural desire to click on a line and highlight the series it belongs to.
Obviously, this is not possible with a static image, but what if we were to use Shiny and the nearPoints() function?
Image Credit
Josh Cowley. October 7th, 2022. “The Catalyst, Newcastle Upon Tyne”.