inputs <- shiny::sidebarPanel(
  shiny::numericInput("seed", "Seed", 3),
  shiny::numericInput("n", "Observations (n)", 50),
  shiny::checkboxInput("add_intercept", "Include Intercept?", TRUE),
  shiny::tags$label("Probabilities"),
  shiny::splitLayout(
    shiny::numericInput("prob1", "k = 1", 0.1),
    shiny::numericInput("prob2", "k = 2", 0.2),
    shiny::numericInput("prob3", "k = 3", 0.3),
    shiny::numericInput("prob4", "k = 4", 0.4)
  ),
  shinyMatrix::matrixInput(
    "regr",
    "Regression Matrix",
    value = matrix(c(1, 3, 0.1, 4, -4, 5, 2, 6, -2, 3, 10, 3.5), 3, 4),
    rows = list(names = FALSE), cols = list(names = FALSE), class = "numeric"
  )
)
Tl;dr
Why?
Last week, I attended the RSC (Research Student Conference) 2022 in Nottingham (good conference, would recommend) and, as part of my talk, I needed to
illustrate my data structure, without revealing it due to an NDA;
demonstrate the efficacy of the Mixture of Experts model I was presenting.
What is a simulation study?
One key idea of statistics is parameter estimation; assume your data follows some clearly defined model and estimate one (or many) specified parameters.
In any ‘real-life’ application, completing this task perfectly is impossible as only a sample of the population is available, the parameter is unknowable and even then, it would only reflect our model of reality, not reality itself.
Simulation studies are a key tool where we simulate some data similar to what we expect. There are many reasons to do this (Morris, White, and Crowther 2019), such as:
validating the model fitting process (code and mathematical logic);
comparing two or more models;
understanding the power of a model.
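As a minimal sketch of the first reason (validating a fitting process), the base R snippet below repeatedly simulates data from a known linear model and checks that the least-squares slope estimate is centred on the true value. The model, sample size and parameter values are illustrative assumptions and are not taken from the study discussed in this post.

```r
# Illustrative simulation study: is the least-squares slope centred on the truth?
# All values below (true_slope, n, n_reps, noise sd) are assumptions.
set.seed(1)
true_intercept <- 1
true_slope <- 2
n <- 50
n_reps <- 500

slope_estimates <- replicate(n_reps, {
  x <- runif(n)
  y <- true_intercept + true_slope * x + rnorm(n, sd = 0.5)
  unname(coef(lm(y ~ x))["x"])
})

# Monte Carlo estimate of the bias and its standard error
bias <- mean(slope_estimates) - true_slope
mcse <- sd(slope_estimates) / sqrt(n_reps)
c(bias = bias, mcse = mcse)
```

A bias that is small relative to its Monte Carlo standard error suggests the fitting code recovers the parameter it is meant to estimate.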
Proposed Workflow
Originally, I would simulate the data from a script or R package and save the dataset to be visualised afterwards. There would be a lot of back and forth between simulating, visualising and reviewing idiosyncrasies in the data, then repeating.
Shiny allows us to do all three steps in a local or remote web application where any changes to parameters or the RNG seed will automatically update any visualisations.
It is possible to create a single-file or multiple-file Shiny application; however, I would recommend building a minimal Shiny package as it makes the R code easier to transfer between projects.
Worked Example
We aim to simulate from a mixture of regressions. That is, we have a single design matrix \(X\) that \(Y\) will regress on, conditional on a nominal, latent group membership variable \(Z\).
\[ Y_i \sim \mathcal{N}(x_i^T \beta_{z_i}, \sigma^2) \]
\[ Z_i \sim \mathrm{Categorical}(\pi_i) \] where \(\pi_i = (\Pr(Z_i = 1), \Pr(Z_i = 2), \dots, \Pr(Z_i = K))\).
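The model can be simulated directly in base R. The sketch below is an assumption about how such a simulation might look, not the code used in the app: it takes the probabilities to be the same for every observation, uses the default probabilities and 3 x 4 regression matrix from the app's inputs (one column of coefficients per group), and picks \(\sigma = 1\) and a standard normal design arbitrarily.

```r
set.seed(3)
n <- 50
probs <- c(0.1, 0.2, 0.3, 0.4)
regr <- matrix(c(1, 3, 0.1, 4, -4, 5, 2, 6, -2, 3, 10, 3.5), 3, 4)
sigma <- 1  # assumed value, not from the original post

# Design matrix: intercept column plus two standard normal covariates (assumed)
x <- cbind(1, matrix(rnorm(n * 2), n, 2))

# Latent group membership: Z_i ~ Categorical(probs)
z <- sample(seq_along(probs), size = n, replace = TRUE, prob = probs)

# Y_i ~ N(x_i^T beta_{z_i}, sigma^2), where beta_k is the k-th column of regr
mu <- rowSums(x * t(regr[, z, drop = FALSE]))
y <- rnorm(n, mean = mu, sd = sigma)
```

Indexing `regr` by `z` picks out each observation's coefficient vector, so the linear predictor is computed without an explicit loop.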
Structure
We want a minimal package and so the file structure we are aiming for is as suggested in Mastering Shiny:
MixRegrApp
| DESCRIPTION
| app.R
| MixRegrApp.Rproj
| .Rbuildignore
|
└── R/
| | simulate.R
| | visualise.R
| | ui.R
| | server.R
| | ...
|
└── man/
| | ...
|
└── data/
| | ...
Here,
DESCRIPTION describes the package and its dependencies. It can be created via usethis::use_description();
app.R will be a simple script with a few lines of code to load the package and run the app;
MixRegrApp.Rproj and .Rbuildignore are used for building packages.
All files in the R/ directory are sourced, making their functions available for use; how functions are organised into files is subjective. See https://r-pkgs.org/Code.html;
Roxygen comments above the R code generate documentation in man/ to inform collaborators, or your later self, of their functionality.
If external data is needed, save it as a .rda file inside data/ or by using usethis::use_data(). See https://r-pkgs.org/data.html.
Description File
The DESCRIPTION file is fairly straightforward and lists all the packages used for this project under Imports.
R Code
The files simulate.R and visualise.R define functions that, respectively, simulate data given some parameters and produce plots given some data.
All functions are documented as in packages using Roxygen comments, see https://r-pkgs.org/man.html.
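For illustration, a Roxygen-documented function in simulate.R might look like the sketch below. The name sim_group_membership appears in the server code later in this post, but the body and documentation shown here are assumptions rather than the package's actual implementation.

```r
#' Simulate latent group membership
#'
#' @param n integer. Number of observations.
#' @param probs numeric vector. Group probabilities, rescaled to sum to one.
#'
#' @return An integer vector of length `n` with values in `1:length(probs)`.
#'
#' @export
sim_group_membership <- function(n, probs) {
  stopifnot(n >= 1, all(probs >= 0), sum(probs) > 0)
  sample.int(length(probs), size = n, replace = TRUE, prob = probs / sum(probs))
}
```

Running devtools::document() on a package containing this file would then write the corresponding help page to man/.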
Shiny UI
There are two necessary functions, app_ui and app_server, to handle the frontend (inputs and outputs) and the backend (computation and simulation) respectively.
First, we define the inputs so that they all appear in a sidebar; see https://rstudio.github.io/shinydashboard/ and other resources for different layouts not covered here.
Each argument to sidebarPanel is a different input, and each input type allows a different kind of data entry. Most input functions take the following form, where further arguments typically provide further control such as setting minimum and maximum allowable values.
someInput(inputId, label, value, ...)
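For example, shiny::numericInput accepts min, max and step arguments. The bounds chosen below are illustrative assumptions, not values used by the app.

```r
# Same input as the app's "n", with assumed bounds added for illustration
n_input <- shiny::numericInput(
  inputId = "n",
  label = "Observations (n)",
  value = 50,
  min = 1,
  max = 1000,
  step = 1
)

# Inputs are HTML tag objects, so printing one shows the markup Shiny will send
print(n_input)
```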
We also use the shinyMatrix package to provide a matrix input, which is not offered as standard by Shiny.
The outputs of this app are simply plots, so we combine them into a tabset that can be navigated without sacrificing screen real estate. Each someOutput function defines the location of the plot, its dimensions and an ID to be used by the server. It does not create the plot or do any requisite computation.
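A minimal sketch of such an output definition is given below. The output IDs scatterplot and group_hist match those assigned in the server code later in this post; the tab titles and the use of mainPanel are assumptions.

```r
# One tab per plot; plotOutput() only reserves a slot and an ID for each plot
outputs <- shiny::mainPanel(
  shiny::tabsetPanel(
    shiny::tabPanel("Scatter Plot", shiny::plotOutput("scatterplot")),
    shiny::tabPanel("Group Histogram", shiny::plotOutput("group_hist"))
  )
)
```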
We then combine both of these constituents of IO into a titled fluid page.
# Return a single fluid page for I/O
shiny::fluidPage(
  shiny::titlePanel("Mixture of Regressions - Simulation Study App"),
  shiny::sidebarLayout(inputs, outputs)
)
The entire process is combined into a single function as shown in ui.R.
Shiny Server
The role of app_server is to be a function factory, that is, a function that returns a function. In this context, the return value is the server function required by Shiny, with the arguments input and output.
There are three key sections to app_server.
preamble that can be run outside the returned function, such as ggplot2 options;
computation or simulation inside the returned function, using reactive;
assignment of objects to the output IDs mentioned in the UI stage.
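The three sections can be sketched as the skeleton below. It is an outline under assumptions, not the package's server.R: the output ID histogram is hypothetical, and the placeholder simulation and base graphics plot stand in for the functions in simulate.R and visualise.R.

```r
app_server <- function() {
  # 1. Preamble: runs once, when the factory is called
  #    (plotting options and similar setup would go here)

  function(input, output) {
    # 2. Computation or simulation, wrapped in reactive()
    draws <- shiny::reactive({
      set.seed(input$seed)
      stats::rnorm(input$n)
    })

    # 3. Assignment of objects to output IDs declared in the UI
    output$histogram <- shiny::renderPlot(graphics::hist(draws()))
  }
}

# Calling the factory returns the function that Shiny will use as the server
server <- app_server()
is.function(server)
names(formals(server))
```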
We use reactive wherever an expression may change over time, usually by an input change.
This is used twice in the example, once as a derived quantity where we combine our probabilities (from 4 separate inputs) into a single vector.
probs <- shiny::reactive(c(input$prob1, input$prob2, input$prob3, input$prob4))
This quantity is then accessed via probs().
We combine all of our main data simulation into a single reactive expression with a set.seed call at the top. This ensures the seed is set once (and only once) per simulation, so our simulation works as it would in a separate script, should we need to retrace our steps.
simdata <- shiny::reactive({
  set.seed(input$seed)
  x <- sim_design_matrix(input$n)
  z <- sim_group_membership(input$n, probs())
  y <- sim_mixregr(x, z, regr, sd)

  list(x = x, y = y, z = z)
})
Our data is then accessed by simdata()$x, simdata()$y and simdata()$z.
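Because the seed is set at the top of the reactive expression, the same inputs always give the same data. The base R sketch below shows the idea outside Shiny; simulate_once is a hypothetical stand-in for the body of simdata, and its toy model is an assumption.

```r
# Hypothetical stand-in for the body of the simdata reactive
simulate_once <- function(seed, n) {
  set.seed(seed)
  x <- rnorm(n)
  y <- 2 * x + rnorm(n)
  list(x = x, y = y)
}

first <- simulate_once(seed = 3, n = 50)
second <- simulate_once(seed = 3, n = 50)
other <- simulate_once(seed = 4, n = 50)

# Same seed and inputs reproduce the data exactly; a new seed does not
identical(first, second)
identical(first, other)
```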
The side effect of this design is that any change in the inputs requires all computation to be redone. If computation time is significant, a better design that updates only the affected parts should be used; such designs are beyond the scope of this post.
Finally, we must assign objects to our output argument. This is done by the functions defined in visualise.R, making the code fairly readable.
output$scatterplot <-
  shiny::renderPlot(vis_scatterplot(simdata()$x, simdata()$y, simdata()$z))

output$group_hist <-
  shiny::renderPlot(vis_grouphist(simdata()$z))
Execution
Now we can run our app using app.R
, which is one line for loading the package (that is, anything in R/
and data/
) and another to run the actual app.
app.R
pkgload::load_all(".")
shiny::shinyApp(app_ui(), app_server())
This structure can be published to service providers such as https://shinyapps.io by sharing app.R, DESCRIPTION, R/ and, if required, data/.
All code for this particular example is available from this blog's GitHub page, that is, https://github.com/nclJoshCowley/jcblog/tree/master/posts/2022-09-15-shiny-simulation-study/MixRegrApp.
Upshot
The advantage of creating a package is multifaceted:
you can take advantage of package development tools such as devtools::document (CTRL + SHIFT + D); R CMD CHECK (CTRL + SHIFT + E); and unit testing (CTRL + SHIFT + T), see https://r-pkgs.org/testing-basics.html;
publishing the app is straightforward, for example, MixRegrApp;
storing the app on GitHub allows for easy installation via the suite of devtools::install_* tools.
Extensions to this app are plentiful due to the nature of Shiny. To name a few, we could have added a button to save the data, generate a report using R Markdown or send an email.
Any further work is project specific and can lead to more involved Shiny development using tools such as golem.
Image Credit
Josh Cowley. September 7th, 2022. “Trent Building, Nottingham”.