Skip to contents

1 Introduction

1.1 What are we going to do in this Vignette

In this vignette, we’ll explore how to use the primarycensoreddist package in conjunction with Stan. We’ll cover the following key points:

  1. Introduction to Stan and its relevance for our analysis
  2. Overview of the packages we’ll be using
  3. How to access and use Stan functions provided by primarycensoreddist
  4. Methods for integrating these Stan functions into your workflow

1.2 What is Stan and why are we using it

Stan is a probabilistic programming language for statistical inference. It provides a flexible and efficient platform for Bayesian modeling and is widely used in various fields of data science and statistics. In this vignette, we’ll use Stan in conjunction with primarycensoreddist to perform Bayesian inference on censored data.

For more information on Stan:

1.3 Packages used in this vignette

Alongside the primarycensoreddist package we will use the cmdstanr package for interfacing with cmdstan.

2 Using Stan code in primarycensoreddist

primarycensoreddist includes a set of Stan functions that mirror the R functions in primarycensoreddist. Documentation for these functions can be found here. We support a range of approaches to integrate this Stan code into your workflow.

2.1 Checking available Stan functions using pcd_stan_functions()

Aside from reading the documentation it is also possible to list the available Stan functions using a helper function directly in R.

##  [1] "expgrowth_pdf"                          
##  [2] "expgrowth_lpdf"                         
##  [3] "expgrowth_cdf"                          
##  [4] "expgrowth_lcdf"                         
##  [5] "expgrowth_rng"                          
##  [6] "check_for_analytical"                   
##  [7] "primary_censored_gamma_uniform_lcdf"    
##  [8] "primary_censored_lognormal_uniform_lcdf"
##  [9] "primary_censored_dist_analytical_lcdf"  
## [10] "primary_censored_dist_analytical_cdf"   
## [11] "dist_lcdf"                              
## [12] "primary_dist_lpdf"                      
## [13] "primary_censored_ode"                   
## [14] "primary_censored_dist_cdf"              
## [15] "primary_censored_dist_lcdf"             
## [16] "primary_censored_dist_lpmf"             
## [17] "primary_censored_dist_pmf"              
## [18] "primary_censored_sone_lpmf_vectorized"  
## [19] "primary_censored_sone_pmf_vectorized"

2.2 Accessing Stan functions

Stan functions are accessed using the pcd_load_stan_functions() function. This function takes the name of the function as an argument and returns the function as a string. It can additionally write the functions to a file and wrap them in a functions{} block.

pcd_load_stan_functions("primary_censored_dist_lpmf")
## [1] "// Stan functions from primarycensoreddist version 0.5.0.1000\nreal primary_censored_dist_lpmf(data int d, int dist_id, array[] real params,\n                                data real pwindow, data real d_upper,\n                                data real D, int primary_dist_id,\n                                array[] real primary_params) {\n  if (d_upper > D) {\n    reject(\"Upper truncation point is greater than D. It is \", d_upper,\n           \" and D is \", D, \". Resolve this by increasing D to be greater or equal to d + swindow or decreasing swindow.\");\n  }\n  if (d_upper <= d) {\n    reject(\"Upper truncation point is less than or equal to d. It is \", d_upper,\n           \" and d is \", d, \". Resolve this by increasing d to be less than d_upper.\");\n  }\n  real log_cdf_upper = primary_censored_dist_lcdf(\n    d_upper | dist_id, params, pwindow, positive_infinity(), primary_dist_id, primary_params\n  );\n  real log_cdf_lower = primary_censored_dist_lcdf(\n    d | dist_id, params, pwindow, positive_infinity(), primary_dist_id, primary_params\n  );\n  if (!is_inf(D)) {\n    real log_cdf_D;\n\n    if (d_upper == D) {\n      log_cdf_D = log_cdf_upper;\n    } else {\n      log_cdf_D = primary_censored_dist_lcdf(\n        D | dist_id, params, pwindow, positive_infinity(), primary_dist_id, primary_params\n      );\n    }\n    return log_diff_exp(log_cdf_upper, log_cdf_lower) - log_cdf_D;\n  } else {\n    return log_diff_exp(log_cdf_upper, log_cdf_lower);\n  }\n}"

2.3 Linking the Stan functions to your workflow

2.3.1 Writing functions to a file

One option for using Stan functions is to write them to a file and then compile them using cmdstanr. This is a good approach as it means that once the functions are written they can be used in the same way as any other stan functions you might use. The downside is that it may mean more work keeping up to date with changes to the functions. We can do this using the pcd_load_stan_functions() function.

expgrowth_rng_file <- file.path(tempdir(), "expgrowth_rng.stan")
exp_model <- pcd_load_stan_functions(
  "expgrowth_rng",
  write_to_file = TRUE,
  output_file = expgrowth_rng_file,
  wrap_in_block = TRUE
)
## Stan functions written to: /tmp/RtmpybEN2F/expgrowth_rng.stan

This can now be compiled and used in the same way as any other cmdstanr model.

model <- cmdstan_model(expgrowth_rng_file)
model
## functions {
## // Stan functions from primarycensoreddist version 0.5.0.1000
## real expgrowth_rng(real min, real max, real r) {
##   real u = uniform_rng(0, 1);
##   if (abs(r) < 1e-10) {
##     return min + u * (max - min);
##   }
##   return min + log(u * (exp(r * max) - exp(r * min)) + exp(r * min)) / r;
## }
## }

Alternatively, you could use #include expgrowth_rng.stan in a stan file functions block to include the function along with the path to that file as with any other stan file (see here).

2.3.2 Including the functions directly via include_paths

Rather than writing the functions to a file it is also possible to include them directly in the stan file using the include_paths argument to cmdstan_model(). This is useful if you don’t to clutter your model with the stan code from primarycensoreddist and want automatic updating of the functions. To demonstrate we will first write a small model has has expgrowth.stan in its include paths (rather than writing it to a file and then including it). The first step is find the file and path for the expgrowth_rng function.

pcd_stan_files("expgrowth_rng")
## [1] "expgrowth.stan"

With that done we now write stan wrapper model.

expgrowth_stan_file <- file.path(tempdir(), "expgrowth.stan")
writeLines(
  text = c(
    "functions {",
    "#include expgrowth.stan",
    "}",
    "generated quantities {",
    "  real y = expgrowth_rng(0, 1, 0.4);",
    "}"
  ),
  con = expgrowth_stan_file
)

We can now use this file to compile a model. Note that we need to include the path to the primarycensoreddist Stan functions using the include_paths argument to cmdstan_model().

model <- cmdstan_model(expgrowth_stan_file, include_paths = pcd_stan_path())
model
## functions {
## #include expgrowth.stan
## }
## generated quantities {
##   real y = expgrowth_rng(0, 1, 0.4);
## }

We can then sample from the model (we set fixed_param = TRUE here as our toy example doesn’t require MCMC sampling).

samples <- model$sample(chains = 1, fixed_param = TRUE)
## Running MCMC with 1 chain...
## 
## Chain 1 Iteration:   1 / 1000 [  0%]  (Sampling) 
## Chain 1 Iteration: 100 / 1000 [ 10%]  (Sampling) 
## Chain 1 Iteration: 200 / 1000 [ 20%]  (Sampling) 
## Chain 1 Iteration: 300 / 1000 [ 30%]  (Sampling) 
## Chain 1 Iteration: 400 / 1000 [ 40%]  (Sampling) 
## Chain 1 Iteration: 500 / 1000 [ 50%]  (Sampling) 
## Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
## Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
## Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
## Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
## Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
## Chain 1 finished in 0.0 seconds.
samples
##  variable mean median   sd  mad   q5  q95 rhat ess_bulk ess_tail
##         y 0.51   0.52 0.29 0.38 0.06 0.95 1.00     1065      983

2.4 Using Stan functions directly in R

Whilst it is possible to use Stan functions directly in R it is not recommended for most use cases (use the R functions in primarycensoreddist instead). However, it can be useful to understand what is going on under the hood or for exploration (indeed we use this internally in primarycensoreddist to check our functions against the R implementations). To do this we use the expose_functions() method on our already compiled model.

model$expose_functions(global = TRUE)

We can now use the function in R. Note that this may get slightly more complicated if our stan function depends on other stan functions (i.e. you need to have those included in your compiled model as well).

expgrowth_rng(0, 1, 0.4)
## [1] 0.6524848