Josh Cowley - Using Shiny in a Simulation Study

Tl;dr

By creating a package to contain all required code (R/) and data (data/) we can create a simple script (app.R) alongside a DESCRIPTION file, listing dependencies, that creates a minimal shiny app.

Using this idea in a simulation study setting allows for live visualisation updates for any parameter change.

Why?

Last week, I attended the RSC (Research Student Conference) 2022 in Nottingham (good conference, would recommend) and as part of that talk I needed to

illustrate my data structure, without revealing it due to a NDA;
demonstrate the efficacy of the Mixture of Experts model I was presenting.

What is a simulation study?

One key idea of statistics is parameter estimation; assume your data follows some clearly defined model and estimate one (or many) specified parameters.

In any ‘real-life’ application, completing this task perfectly is impossible as only a sample of the population is available, the parameter is unknowable and even then, it would only reflect our model of reality, not reality itself.

Simulation studies are a key tool where we simulate some data similar to what we expect. There are many reasons to do this (Morris, White, and Crowther 2019), such as:

validating the model fitting process (code and mathematical logic);
comparing two or more models;
understanding the power of a model.

Proposed Workflow

Originally, I would simulate the data from a script or R package and save the dataset to be visualised afterwards. There would be a lot of back and forth between simulating, visualising, reviewing idiosyncrasies in the data and repeat.

Shiny allows us to do all three steps in a local or remote web application where any changes to parameters or the RNG seed will automatically update any visualisations.

It is possible to create a single file or multiple file Shiny application, however I would recommend building a minimal Shiny package as it allows for easier transferring of R code.

Worked Example

We aim to simulate from a mixture of regressions. That is, we have a single design matrix $X$ that $Y$ will regress on, conditional on a nominal, latent group membership variable $Z$.

\[ Y_i \sim \mathcal{N}(x_i^T \beta_{z_i}, \sigma^2) \]

\[ Z_i \sim \mathrm{Categorical}(\pi_i) \] where $\pi_i = (\Pr(Z_i = 1), \Pr(Z_i = 2), \dots, \Pr(Z_i = K))$.

Structure

We want a minimal package and so the file structure we are aiming for is as suggested in Mastering Shiny:

MixRegrApp
|   DESCRIPTION
|   app.R
|   MixRegrApp.RProj
|   .Rbuildignore  
|
└── R/
|   |   simulate.R
|   |   visualise.R
|   |   ui.R
|   |   server.R
|   |   ...
|
└── man/
|   |   ...
|
└── data/
|   |   ...

Here,

DESCRIPTION describes the package and its dependencies. Can be created via usethis::use_description;
app.R will be a simple script with a few lines of code to load the package and run the app;
MixRegrApp.Rproj and .Rbuildignore are used for building packages.
All files in the R/ directory are sourced making functions available for use, the organising of functions into files is subjective. See https://r-pkgs.org/Code.html;
Roxygen comments above the R code generates documentation in man/ to inform collaborators or your later self on their functionality.
If external data is needed, save them as a .rda file inside data/ or by using usethis::use_data(). See https://r-pkgs.org/data.html.

Description File

The DESCRIPTION file is fairly straightforward and lists all the packages used for this project under Imports.

DESCRIPTION

DESCRIPTION

Package: MixRegrApp
Title: Mixture of Regressions App
Description: Brief Shiny application to create Mixture of Regressions data.
Version: 0.0.0.9000
AuthorUrl: https://josh.quarto.pub/
Authors@R: 
    person(given = "Josh",
           family = "Cowley",
           role = c("aut", "cre"),
           email = "j.cowley1@ncl.ac.uk")
Maintainer: Josh Cowley <j.cowley1@ncl.ac.uk>
License: GPL-3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
Imports:
    ggplot2,
    pkgload,
    rlang,
    shiny,
    shinyMatrix,
    stats,
    tidyr,
    tidyselect,

R Code

The files simulate.R and visualise.R define functions that can simulate data given some parameters and produce plots given some data respectively.

All functions are documented as in packages using Roxygen comments, see https://r-pkgs.org/man.html.

simulate.R

simulate.R

#' Mixture of Regressions
#'
#' Simulate from `K` (`length(groups)`) distinct regressions based on group
#'   membership variable `z`.
#'
#' That is,
#'   \eqn{Y_i \sim N(x_i * regr_{i, z_i}, sd^2)}
#'
#' @param x matrix. Design matrix.
#' @param z integer. Group membership variable.
#' @param regr matrix. Regression matrix with `p` rows and `k` columns.
#' @param sd numeric. Standard deviation of the observational error.
#'
#' @export
sim_mixregr <- function(x, z, regr, sd) {
    n <- length(z)
    p <- ncol(x)
    n_groups <- if (is.factor(z)) nlevels(z) else length(z)

    stopifnot(
      "Invalid dimensions for `regr`" = identical(dim(regr), c(p, n_groups)),
      "Invalid dimensions for `x`" = (nrow(x) == n)
    )

    lp <- x %*% regr
    y_means <- vapply(seq_len(n), function(i) lp[i, z[i]], numeric(1))

    y <- stats::rnorm(n, mean = y_means, sd = sd)

    return(y)
}


#' Design Matrix
#'
#' Simulated from i.i.d N(0,1) distributions, optional intercept.
#'
#' @param n integer. Number of observations.
#' @param nx integer. Number of explanatory variables, excluding intercept.
#' @param add_intercept logical. An intercept column is prepened when `TRUE`.
#'
#' @export
sim_design_matrix <- function(n, nx = 2, add_intercept = TRUE) {
    x <- matrix(stats::rnorm(n * nx), nrow = n, ncol = nx)
    colnames(x) <- paste0("X", seq_len(ncol(x)))
    if (add_intercept) return(cbind(Intercept = 1, x)) else return(x)
}

#' Group Membership
#'
#' Returns (factor of) groupings based on a vector of probabilities.
#'
#' @param n integer. Number of observations.
#' @param prob passed to \code{\link{sample}}.
#'
#' @export
sim_group_membership <- function(n, prob) {
    z_int <- sample(seq_along(prob), size = n, replace = TRUE, prob = prob)
    return(factor(z_int, levels = seq_along(prob)))
}

visualise.R

visualise.R

#' Visualisation Methods
#'
#' @param x matrix. Design matrix
#' @param y vector. Response variable data.
#' @param z vector/factor. Group membership data.
#'
#' @importFrom rlang .data
#'
#' @name vis
NULL


#' @describeIn vis
#'   Scatterplot with additional group membership data shown as colour.
#'
#' @export
vis_scatterplot <- function(x, y, z) {
  plot_data <-
    tidyr::pivot_longer(
      data = tidyr::tibble(as.data.frame(x), y, z),
      cols = tidyselect::matches("(X)|(Intercept)")
    )

  ggplot2::ggplot(plot_data, ggplot2::aes(
    x = .data$value,
    y = .data$y,
    colour = as.factor(.data$z)
  )) +
    ggplot2::geom_point() +
    ggplot2::facet_wrap(ggplot2::vars(.data$name)) +
    ggplot2::labs(y = "Y", x = "X", colour = "Group") +
    ggplot2::scale_color_discrete(drop = FALSE)
}


#' @describeIn vis
#'   Boxplot of `y` for each group (independent of `x`).
#'
#' @export
vis_boxgroup <- function(y, z) {
  plot_data <- data.frame(y = y, z = z)

  ggplot2::ggplot(plot_data, ggplot2::aes(y = y)) +
    ggplot2::geom_boxplot() +
    ggplot2::facet_wrap(
      ggplot2::vars(.data$z),
      labeller = function(tb) list(z = paste("Z =", tb$z)),
      drop = FALSE,
      nrow = 1
    ) +
    ggplot2::labs(y = "Y")
}

Shiny UI

There are two necessary functions app_ui and app_server to handle the frontend (inputs and outputs) and the backend (computation and simulation) respectively.

First we define the inputs to all appear on a sidebar, see https://rstudio.github.io/shinydashboard/ and other resources for different layouts not covered here.

Each argument to sidebarPanel is a different input that allows for different data entry types. Most input methods take the following form where further arguments typically provide further control such as setting minimum and maximum allowable values.

someInput(inputId, label, value, ...)

We also use the shinyMatrix package to leverage matrix input that is not provided as standard by Shiny.

Inputs

inputs <-
  shiny::sidebarPanel(
    shiny::numericInput("seed", "Seed", 3),
    
    shiny::numericInput("n", "Observations (n)", 50),
    
    shiny::checkboxInput("add_intercept", "Include Intercept?", TRUE),
    
    shiny::tags$label("Probabilities"),
    shiny::splitLayout(
      shiny::numericInput("prob1", "k = 1", 0.1),
      shiny::numericInput("prob2", "k = 2", 0.2),
      shiny::numericInput("prob3", "k = 3", 0.3),
      shiny::numericInput("prob4", "k = 4", 0.4)
    ),
    
    shinyMatrix::matrixInput(
      "regr",
      "Regression Matrix",
      value = matrix(c(1, 3, 0.1, 4, -4, 5, 2, 6, -2, 3, 10, 3.5), 3, 4),
      rows = list(names = FALSE), cols = list(names = FALSE), class = "numeric"
    )
  )

The output of this app are simply plot objects, so we combine each plot into a tabset that can be navigated without sacrificing screen real-estate. Each someOutput function defines the location of the plot, its dimensions and an ID to be used by the server. It does not create the plot or do any requisite computation.

Outputs

outputs <-
  shiny::mainPanel(
    shiny::tabsetPanel(
      # Plot 1
      shiny::tabPanel(
        "Scatterplot",
        shiny::plotOutput("scatterplot", height = "800px")
      ),
      
      # Plot 2
      shiny::tabPanel(
        "Group Histogram",
        shiny::plotOutput("group_hist", height = "800px")
      )
    )
  )

We then combine both of these constituents of IO into a titled fluid page.

# Return a single fluid page for I/O
shiny::fluidPage(
  shiny::titlePanel("Mixture of Regressions - Simulation Study App"),
  shiny::sidebarLayout(inputs, outputs)
)

The entire process is combined into a single function as shown in ui.R.

ui.R

ui.R

#' Application I/O
#'
#' @keywords internal
app_ui <- function() {
  inputs <-
    shiny::sidebarPanel(
      shiny::textOutput("tmp"),

      shiny::numericInput("seed", "Seed", 3),

      shiny::numericInput("n", "Observations (n)", 50),

      shiny::checkboxInput("add_intercept", "Include Intercept?", TRUE),

      shiny::tags$label("Probabilities"),
      shiny::splitLayout(
        shiny::numericInput("prob1", "k = 1", 0.1),
        shiny::numericInput("prob2", "k = 2", 0.2),
        shiny::numericInput("prob3", "k = 3", 0.3),
        shiny::numericInput("prob4", "k = 4", 0.4)
      ),

      shinyMatrix::matrixInput(
        "regr",
        "Regression Matrix",
        value = matrix(c(1, 3, 0.1, 4, -4, 5, 2, 6, -2, 3, 10, 3.5), 3, 4),
        rows = list(names = FALSE), cols = list(names = FALSE), class = "numeric"
      ),

      shiny::numericInput("sd", "Standard Deviation", 5),
    )

  outputs <-
    shiny::mainPanel(
      shiny::tabsetPanel(
        shiny::tabPanel(
          "Scatterplot",
          shiny::plotOutput("scatterplot", height = "800px")
        ),

        shiny::tabPanel(
          "Boxplot by Group",
          shiny::plotOutput("boxgroup", height = "800px")
        )
      )
    )

  # Return a single fluid page for I/O
  shiny::fluidPage(
    shiny::titlePanel("Mixture of Regressions - Simulation Study App"),
    shiny::sidebarLayout(inputs, outputs)
  )
}

Shiny Server

The role of app_server is to be a function factory, a function that returns a function. In this context, the return value is a method required by Shiny with the arguments input and output.

There are three key sections to app_server.

preamble can be run outside the returned function, such as ggplot2 options;
computation or simulation inside the returned function, using reactive;
assignment of objects to the output IDs mentioned in the UI stage.

We use reactive wherever an expression may change over time, usually by an input change.

This is used twice in the example, once as a derived quantity where we combine our probabilities (from 4 separate inputs) into a single vector.

probs <- shiny::reactive(c(input$prob1, input$prob2, input$prob3, input$prob4))

This quantity is then accessed via probs().

Note

reactive takes any expression so we could have done more computation if we desired, for example validating they all sum to 1.

We combine all of our main data simulating into single method with a set.seed call at the top. This ensure the seed is set once (and only once) so our simulation works as it would in a separate script, should we need to retrace our steps.

simdata <- 
  shiny::reactive({
    set.seed(input$seed)
    
    x <- sim_design_matrix(input$n)
    z <- sim_group_membership(input$n, probs())
    y <- sim_mixregr(x, z, regr, sd)
    
    list(x = x, y = y, z = z)
  })

Our data is then accessed by simdata()$x, simdata()$y and simdata()$z.

The side effect of this design is any change in the inputs require all computation to be re-done. If computation time is significant, better designs that update parts at a time should be utilised and are beyond the scope of this blog.

Finally, we must assign objects to our output argument. This is done by the methods defined in visualise.R making the code fairly readable.

output$scatterplot <-
  shiny::renderPlot(vis_scatterplot(simdata()$x, simdata()$y, simdata()$z))

output$group_hist <-
  shiny::renderPlot(vis_grouphist(simdata()$z))

server.R

server.R

#' Application Backend
#'
#' @keywords internal
app_server <- function(sample_info) {
  ggplot2::theme_set(ggplot2::theme_minimal(base_size = 20))
  ggplot2::theme_update(panel.spacing = ggplot2::unit(4, "lines"))

  function(input, output) {
    probs <- shiny::reactive(c(input$prob1, input$prob2, input$prob3, input$prob4))

    simdata <-
      shiny::reactive({
        set.seed(input$seed)

        add_intercept <- input$add_intercept
        nx <- if (add_intercept) 2 else 3

        x <-
          sim_design_matrix(input$n, nx = nx, add_intercept = add_intercept)

        z <- sim_group_membership(input$n, probs())

        y <- sim_mixregr(x, z, input$regr, input$sd)

        return(list(x = x, y = y, z = z))
      })

    output$scatterplot <-
    shiny::renderPlot(vis_scatterplot(simdata()$x, simdata()$y, simdata()$z))

    output$boxgroup <-
      shiny::renderPlot(vis_boxgroup(simdata()$y, simdata()$z))
  }
}

Execution

Now we can run our app using app.R, which is one line for loading the package (that is, anything in R/ and data/) and another to run the actual app.

app.R

pkgload::load_all(".")
shiny::shinyApp(app_ui(), app_server())

This structure can be published to service providers such as https://shinyapps.io by sharing app.R, DESCRIPTION, R/ and data/ if required.

All code for this particular example is available from this blogs GitHub page, that is, https://github.com/nclJoshCowley/jcblog/tree/master/posts/2022-09-15-shiny-simulation-study/MixRegrApp.

Upshot

The advantage of creating a package is multifaceted:

you can take advantage of package development tools such as
- devtools::document (CTRL + SHIFT + D);
- R CMD CHECK (CTRL + SHIFT + E);
- unit testing (CTRL + SHIFT + T), see https://r-pkgs.org/testing-basics.html.
publishing the app is straightforward, for example, MixRegrApp;
storing the app on github allows for easy installation via the suite of devtools::install_* tools.

Extensions to this app are plentiful due to the nature of Shiny. To name one, we could have added a button to save the data, generate a report using R markdown or send an email.

Any further work is project specific and can lead to more involved shiny development using tools such as golem.

Image Credit

Josh Cowley. September 7th, 2022. “Trent Building, Nottingham”.

References

Morris, Tim P, Ian R White, and Michael J Crowther. 2019. “Using Simulation Studies to Evaluate Statistical Methods.” Statistics in Medicine 38 (11): 2074–2102.