# AutoPPL

## Table of Contents

## Overview

AutoPPL is a C++ template library providing high-level support for probabilistic programming. Using operator overloading and expression templates, AutoPPL provides a generic framework for specifying probabilistic models and applying inference algorithms.

The library is still at an experimental stage.

### Who should use AutoPPL?

The goal of this project is to provide a framework for practitioners, students, and researchers.
It is often desired to have a framework for specifying a probabilistic model separate from any inference algorithms.
While AutoPPL does provide a few inference algorithms such as NUTS and Metropolis-Hastings,
it allows users to write their *own* sampling algorithms and even add new distributions.

### How is AutoPPL different from existing PPLs like STAN and PyMC3?

AutoPPL can be thought of as a hybrid of STAN and PyMC3. It is similar to STAN in that it is extremely optimized for high performance (see Benchmarks) and it uses much of the same logic discussed in the STAN reference guide. It is similar to PyMC3 in that it is a library rather than a separate domain-specific language.

However, it is unique in that it is purely a C++ library. While STAN requires the user to write STAN code, which gets translated into C++ code by the STAN compiler and then compiled into a binary, AutoPPL just requires the user to directly write C++ code. Some benefits include the following:

- eliminates the extra layer of abstraction to a separate domain-specific language
- users can use native C++ tools to clean and prepare data, and also examine the posterior samples
- easily extend the library such as adding new distributions or a sampling algorithm

In the future, we plan to provide Python and R bindings such that the user can write a C++ function using AutoPPL that defines and samples from the model and export the Python/R binding to examine the posterior samples using a more comfortable, scripting language.

## Design Choices

### Intuitive Model Specification

In Bayesian statistics, the first step in performing probabilistic inference is to specify a generative model for the data of interest. For example, in mathematical notation, a Gaussian model could look like the following:

```
X ~ Normal(theta, I)
theta ~ Uniform(-1, 1)
```

Given its simplicity and expressiveness, we wanted to mimic this compact notation as much as possible for users of our library. For the model specified above, the code in AutoPPL could look like the following:

```
Data<double, vec> X({...}); // load data
Param<double> theta; // define scalar parameter
auto model = ( // specify model
theta |= uniform(-1., 1.),
X |= normal(theta, 1.)
);
```

### Efficient Memory Usage

We make the assumption that users are able to specify the probabilistic model at compile-time.
As a result, AutoPPL can construct all model expressions at compile-time simply from the type information.
A model object makes no heap-allocations (with the exception of `ppl::for_each`

) and is minimal in size.
It is simply a small, contiguous slab of memory representing the binary tree.
The `model`

object in the previous section
is about `88 bytes`

on `x86_64-apple-darwin17.7.0`

using `clang-11.0.3`

.

For a more complicated model such as the following:

```
Data<double> X;
std::array<Param<double>, 6> theta;
auto model = (
theta[0] |= uniform(-1., 1.),
theta[1] |= uniform(theta[0], theta[0] + 2.),
theta[2] |= normal(theta[1], theta[0] * theta[0]),
theta[3] |= normal(-2., 1.),
theta[4] |= uniform(-0.5, 0.5),
theta[5] |= normal(theta[2] + theta[3], theta[4]),
X |= normal(theta[5], 1.)
);
```

The size of the model is `440 bytes`

on the same machine and compiler.

A model expression simply references the variables used in the expression
such as `theta[0], theta[1], ..., theta[5], X`

, i.e. it does not copy any data or values.

### High-performance Inference Methods

Users interface with the inference methods via model expressions and other configuration parameters for that particular method. Hence, the inference algorithms are completely general and work with any model so long as the model expression is properly constructed. Due to the statically-known model specification, algorithms have opportunities to make compile-time optimizations. See Benchmarks for performance comparisons with STAN.

We were largely inspired by STAN and followed their reference and also their implementation to compute ESS, perform adaptations, and stabilize sampling algorithms. However, our library works very differently underneath, especially with automatic differentiation and handling model expressions.

## Installation

First, clone the repository:

`git clone https://github.com/JamesYang007/autoppl ~/autoppl`

The following are the dependencies:

We provide a shell script for Mac and Linux that automatically installs these libraries locally in `lib`

:

```
cd ~/autoppl
./setup.sh
```

Since AutoPPL is a template library, there is nothing to build!
Just pass the compiler flag `-I<path>`

when building your program
to include the path to `include`

inside cloned directory,
`include`

to Eigen3.3, and `include`

to FastAD.
If you ran `./setup.sh`

, the paths for the latter two are
`lib/FastAD/libs/eigen-3.3.7/build/include`

and
`lib/FastAD/build/include`

(relative to cloned directory).

For CMake users, they can follow these steps:

```
./clean-build.sh release -DCMAKE_INSTALL_PREFIX=.. -DAUTOPPL_ENABLE_TEST=OFF
cd build/release
make install
```

This will simply set the CMake variable `CMAKE_INSTALL_PREFIX`

to the `build`

directory and won't build anything.
If you want to install AutoPPL into the system, remove `-DCMAKE_INSTALL_PREFIX=..`

.
The `make install`

will install the `include`

directory and CMake shared files in `build`

directory.

In your own CMake project, assuming you have a single source file
`main.cpp`

in the same directory as your CMakeLists.txt,
write the following as a minimal configuration:

```
cmake_minimum_required(VERSION 3.7)
project("MyProject")
find_package(AutoPPL CONFIG REQUIRED)
find_package(FastAD CONFIG REQUIRED)
find_package(Eigen3 CONFIG REQUIRED)
add_executable(main main.cpp)
target_link_libraries(main AutoPPL::AutoPPL FastAD::FastAD Eigen3::Eigen)
```

If you installed `AutoPPL`

locally as instructed before,
you should add a hint to the `find_package`

like:

`find_package(AutoPPL CONFIG REQUIRED HINTS path_to_autoppl/build/share)`

The above rule applies for `FastAD`

and `Eigen3`

as well.
If you installed these libraries using `setup.sh`

locally,
the required hint paths are:

```
find_package(FastAD CONFIG REQUIRED HINTS path_to_autoppl/lib/FastAD/build/share)
find_package(Eigen3 CONFIG REQUIRED HINTS path_to_autoppl/lib/FastAD/libs/Eigen3/build/share)
```

## Quick Guide

We assume that we are in `namespace ppl`

throughout this section.

### Variable

There are only a few variable types that a user will need to define a model.

```
DataView<ValueType, ShapeType> Xv(ptr, rows, cols);
Data<ValueType, ShapeType> X(rows, cols);
Param<ValueType, ShapeType, ConstraintType> p(rows, cols);
TParam<ValueType, ShapeType> tp(rows, cols);
```

For all types listed above, `ValueType`

should be either `double`

or `int`

, and
`ShapeType`

must be one of `scl, vec, mat`

(scalar, column vector, matrix) to indicate the general shape.
By default, `ShapeType`

is `scl`

.
Depending on the shape, the user may omit `rows`

or `cols`

.
For example, if `ShapeType`

is `scl`

, one can omit both `rows`

and `cols`

(both set to 1),
and if `ShapeType`

is `vec`

, one can omit `cols`

(set to 1).
For `Param`

specifically, see Constraint for more information about `ConstraintType`

.
For this section, this third template parameter is not important.

The difference between `DataView`

and `Data`

is that
`DataView`

only views existing data (does not copy any data) and `Data`

*owns* data.
`DataView`

will view data in column major format and `Data`

will own data in column major format.
To get the underlying data that `DataView`

views or `Data`

owns,
we expose the member function `get`

, which will return a reference to `Eigen::Map`

or `Eigen::Matrix`

, respectively.
We ask the users to refer to `Eigen`

documentation for modifying the data.

`Param`

and `TParam`

objects do not own or view any values.
There is nothing to do from the user other than constructing them.
More details on their distinction will become clearer in the later sections.
At a high level, a `Param`

is a parameter that can be sampled
and a `TParam`

is a transformation of parameters and is not sampled.

It is worth mentioning that currently `TParam<T, vec>`

objects are the only ones
that provide subsetting, i.e. have `operator[]`

defined.
There should be no need with `Data`

or `DataView`

since one can subset the underlying Eigen object itself
and create a new `Data`

or `DataView`

object to own a copy of or view that subset.
Note that `tp[i]`

returns another expression and does not actually retrieve any value;
it is simply a lightweight object that refers to `tp`

and the `i`

th value it represents.
Users should not concern themselves how `tp`

actually binds to values during MCMC.

### Variable Expression

Variable expressions are any expressions that are "mathematical" functions of variables.
We provide overloads for `operator+,-,*,/,+=,-=,*=,/=,=`

,
functions such as `sin, cos, tan, log, exp, sqrt, dot, for_each`

.
All functions are vectorized whenever possible.
Here is an example:

```
Data<double, mat> X(m,n);
Param<double, vec> w(n), w2(m);
TParam<double, scl> s;
auto var_expr = sin(dot(X, w) + s) - w2;
```

The expression above first creates an expression for the dot-product,
then a vectorized sum with a scalar (`s`

),
then a vectorized `sin`

,
then finally vectorized `operator-`

with `w2`

.

The only non-obvious function is `for_each`

.
It has the same syntax as `std::for_each`

,
however, the lambda function must return some variable expression:

```
TParam<double, vec> h(10);
auto expr = for_each(util::counting_iterator<size_t>(0),
util::counting_iterator<size_t>(h.size()),
[&](size_t i) { return h[i] += h[i-1] * 2.; });
```

Note that we provide our own `counting_iterator`

since they become quite useful in this context.
One can think of this expression as a "lazy-version" of the following:

```
for (size_t i = 0; i < h.size(); ++i) {
h[i] += h[i-1] * 2.;
}
```

assuming `h`

in this context is, for example, `std::vector<double>`

.
Again, nothing is computed when the expression is constructed.
All computation is done lazily during MCMC sampling.

Finally, users can create constants by writing literals directly, as shown above, when constructing variable expressions. If the user wishes to create a constant vector or matrix, they can simply use those objects when constructing the variable expression and our library will wrap them as constant objects. The difference between constants and data is that constants cannot be assigned a distribution.

### Constraint

This section only applies to `Param`

objects.
Sometimes, parameters must be constrained.
Some notable examples are
covariance matrix (symmetric and positive definite),
probability values (bounded by 0 and 1),
and standard deviation (bounded below by 0).
It is a well-known problem that sampling constrained parameters directly is highly-inefficient.

Every parameter has an associated unconstrained and constrained value.
Most MCMC algorithms like NUTS and Metropolis-Hastings will sample unconstrained values,
and transform the unconstrained values to constrained values when computing the log-pdf (corrected with the Jacobian).
For more information on how these transformations are performed, we direct the readers to look at
STAN reference guide.
Note that users will always receive *constrained* values as their samples after invoking a MCMC sampler.

Currently we only support lower bounds, lower-and-upper bounds, (symmetric) positive-definite, and no constraint.
We recommend using C++17 class template argument deduction (CTAD)
instead of `auto`

to indicate that these objects are indeed parameters,
but of course, one can certainly just use `auto`

:

```
// Declaration pseudo-code:
// make_param<ValueType, ShapeType=scl, ConstraintType>(rows, cols, constraint);
// make_param<ValueType, ShapeType=scl, ConstraintType>(rows, constraint);
// make_param<ValueType, ShapeType=scl, ConstraintType>(constraint);
Param sigma = make_param<double>(lower(0.)); // scalar lower bounded by 0.
Param p = make_param<double, vec>(10, bounded(0., 1.)); // 10-element vector bounded by 0., 1.
Param Sigma = make_param<double, mat>(3, pos_def()); // 3x3 covariance matrix
```

From section Variable, we saw that `Param`

had a third template parameter.
Using the `make_param`

helper function, we can deduce that third parameter type,
which is precisely the constraint argument to `make_param`

.
In general, the constraint can be a complicated expression depending on other parameters as such:

```
Param<double> w1;
Param w2 = make_param<double>(lower(w1 - 1. + w1 * w1));
```

so it is crucial that we are able to deduce the type.

### Distribution Expression

A distribution expression internally defines how to compute the log-pdf (dropping any constants). Currently we support the following distributions:

Distribution | Syntax |
---|---|

Bernoulli | `ppl::bernoulli(p)` |

Cauchy | `ppl::cauchy(x0, gamma)` |

Normal | `ppl::normal(mu, sigma)` |

Multivariate Normal | `ppl::normal(mu, Sigma)` |

Uniform | `ppl::uniform(min, max)` |

Wishart | `ppl::wishart(V, n)` |

Here are some examples:

```
Param<double> m1, m2, M1, M2;
auto uniform_expr = uniform(m1 + m2, M1 + M2);
Param<double> l1, l2, s;
auto normal_expr = normal(l1 + l2, s);
Param V = make_param<double, mat>(3, pos_def);
auto wishart_expr = wishart(V, 5);
```

### Model Expression

There are two operators that govern model expressions: `operator|=`

and `operator,`

.
`operator|=`

assigns a distribution to a parameter or data and
`operator,`

"glues" such assignments together, defining a joint distribution.
The choice for `operator|=`

is deliberate because it captures both the intuition that we are
assigning a distribution to a variable and that the distribution is *conditional* (`|`

).

Example:

```
Data<double, vec> y(10);
Param<double> m1, m2, M1, M2;
Param<double> l1, l2, s;
auto model = (
m1 |= normal(1., 1.),
m2 |= normal(-1., 1.),
M1 |= uniform(m2, m1),
M2 |= uniform(M, 3.24 + m1),
y |= normal(l1 + l2, s)
);
```

### Transformed Parameters

There are cases where defining a model expression is not enough. Sometimes parameters have to be further transformed to cache some common transformation. For example, here is a stochastic volatility model taken from STAN, but using AutoPPL:

```
DataView<double, vec> y(ptr, n_data);
Param phi = make_param<double>(bounded(-1., 1.));
Param sigma = make_param<double>(lower(0.));
Param<double> mu;
Param<double, vec> h_std(n_data);
TParam<double, vec> h(n_data);
// define transformed parameter expression
auto tp_expr = (
h = h_std * sigma,
h[0] /= sqrt(1. - phi * phi),
h += mu,
for_each(util::counting_iterator<>(1),
util::counting_iterator<>(h.size()),
[&](size_t i) { return h[i] += phi * (h[i-1] - mu); })
);
auto model = (
phi |= uniform(-1., 1.),
sigma |= cauchy(0., 5.),
mu |= cauchy(0., 10.),
h_std |= normal(0., 1.),
y |= normal(0., exp(h / 2.))
);
```

A couple of notes:

- transformed parameter expression
*must*be defined as a separate expression from model expression - it is more efficient to make a transformed parameter expression if there is a common transformation used in multiple places when defining a model expression
- all variable expressions are perfectly valid when defining TP expressions,
but note that
`operator=`

is only available for`TParam`

objects. In fact, all`TParam`

objects that are used in a model expression*must have exactly one*`operator=`

expression assigning some expression to that`TParam`

object. The user cannot assign a vector`TParam`

with some expression and then also assign a subview again like:

because this introduces ambiguity when back-evaluating during automatic differentiation. However, it is fine to assign each subview exactly once like:`h = h_std * sigma, h[0] = 1.`

`for_each(util::counting_iterator<>(0), util::counting_iterator<>(h.size), [&](size_t i) { return h[i] = h_std[i] * sigma; });`

- it is possible to assign a
`TParam`

with an expression composed of only data and constants, but this is much less efficient than precomputing that expression and creating a new constant out of that. This precomputation only requires using Eigen library and should not be a part of any autoppl expressions.

### Program Expression

A program expression simply combines a transformed parameter expression with a model expression.
If there is no transformed parameter expression, then a program expression simply wraps a model expression.
A program expression is what gets passed to MCMC samplers.
The user does not need to convert a model expression into a program expression,
but if there is a transformed parameter expression,
the user should use `operator|`

to "pipe" transformed parameter expression with a model expression
to create a program expression (order matters! The wrong order will raise a compiler error):

`auto program = tp_expr | model;`

### Sampling Algorithms

Currently, we support the following algorithms:

MCMC Algorithm | Syntax |
---|---|

Metropolis-Hastings | ppl::mh(program, config) |

No-U-Turn Sampler (NUTS) | ppl::nuts(program, config) |

Every sampling algorithm has a corresponding configuration object associated with it. The user does not need to pass a configuration, in which case, a default-constructed object gets passed with the default settings.

We briefly list the configuration declaration:

```
struct ConfigBase
{
size_t warmup = 1000; // number of warmup iterations
size_t samples = 1000; // number of samples
size_t seed = mcmc::random_seed(); // seed (default: random)
bool prune = true; // extra step during initialization to make sure log-pdf is not degenerate
};
// MH-specific
struct MHConfig : ConfigBase
{
double sigma = 1.0; // proposal distribution SD (N(0, sigma))
double alpha = 0.25; // discrete proposal distribution (triangular on {-1,0,1})
// with alpha probability on -1 and 1 (1-2*alpha on 0).
};
// NUTS-specific
struct StepConfig
{
double delta = 0.8;
double gamma = 0.05;
double t0 = 10.;
double kappa = 0.75;
};
struct VarConfig
{
size_t init_buffer = 75;
size_t term_buffer = 50;
size_t window_base = 25;
};
// template parameter one of: unit_var, diag_var
template <class VarAdapterPolicy=diag_var>
struct NUTSConfig: ConfigBase
{
using var_adapter_policy_t = VarAdapterPolicy;
size_t max_depth = 10;
StepConfig step_config;
VarConfig var_config;
};
```

For the NUTS-specific configuration, we direct the reader to STAN.

Every sampler will return a `ppl::MCMCResult<>`

object.
The template parameter indicates the row or column-major for the underlying sample matrix.
The sampler may choose to sample the unconstrained values and write in a row-major result object
for speed purposes and then convert to a column-major result object with constrained values.
We briefly show the class declaration:

```
template <int Major = Eigen::ColMajor>
struct MCMCResult
{
// ...
cont_samples_t cont_samples; // (continuous) sample matrix
disc_samples_t disc_samples; // (discrete) sample matrix
std::string name; // MCMC algorithm name
double warmup_time = 0; // time elapsed for warmup
double sampling_time = 0; // time elapsed for sampling
};
```

The sample matrices will always be `samples x n_constrained_values`

,
where `samples`

is the number of samples requested from the config object
and `n_constrained_values`

is the number of constrained parameter values (flattened into a row).
The algorithms guarantee that each row consists of sampled parameter values
in the same order as the priors in the model.
As an example, for the following model

```
auto model = (
t1 |= ...,
t2 |= ...,
t3 |= ...,
...
);
```

a sample row would consist of values for `t1`

, `t2`

, `t3`

in that order.
If the parameters are multi-dimensional, their values are flattened assuming column-major format.
So if `t1`

is a vector with 4 elements, the first four elements of a row will be values for `t1`

.
If `t2`

is a 2x2 matrix, the next four elements of a row will be values for `t2(0,0), t2(1,0), t2(0,1), t2(1,1)`

.

Currently, we do not support a `summary`

function yet to output a summary of the samples.
The user can, however, directly call `res.cont_samples.colwise().mean()`

to compute the mean for each column.
We also provide `ppl::math::ess(matrix)`

to compute column-wise effective-sample-size (ESS).
The user can then divide this result with `res.sampling_time`

to get `ESS/s`

,
which is the preferred metric to compare performance among MCMC algorithms.
ESS was computed as outlined
here.
We also made some adjustments to use Geyer's biased estimator for ESS
as in the current implementation of STAN
(source).

## Examples

### Sampling from Joint Distribution

Although AutoPPL was designed to perform inference on posterior distributions,
one can certainly use it to sample from any joint distribution defined by the priors and conditional distributions.
For example, we can sample `1000`

points with `1000`

warmup iterations from a
standard normal distribution using Metropolis-Hastings in the following way:

```
Param<double> theta;
auto model = (theta |= normal(0., 1.));
auto res = mh(model, config);
```

In general, so long as the joint PDF is known, or equivalently and more commonly if the conditional and prior PDFs are known, one can sample from the distribution. As another example, we may sample from a more complicated joint distribution:

```
Param<double> theta1;
Param<double> theta2;
auto model = (
theta1 |= uniform(-1., 1.),
theta2 |= normal(theta1, 1.)
);
auto res = mh(model);
```

### Sampling Posterior Mean and Standard Deviation

The following is an example of fitting a Gaussian model to some data.
We put a `Normal(0,3)`

prior on the mean and `Uniform(0,2)`

prior on the
standard deviation.
While in the previous section, we used Metropolis-Hastings
to demonstrate how to use it,
it is recommended to use the state-of-the-art NUTS sampler to sample
from the posterior distribution.

```
Data<double, vec> x(5);
Param<double> mu;
Param<double> sigma;
x.get() << 1.0, 1.5, 1.7, 1.2, 1.5; // load data
auto model = (
mu |= normal(0., 3.),
sigma |= uniform(0., 2.),
x |= normal(mu, sigma)
);
auto res = nuts(model);
```

### Bayesian Linear Regression

This example covers ridge regression in a Bayesian setting.
We created a fictitious dataset consisting of `(x,y)`

coordinates.
The true relationship is the following: `y = x + 1`

.
By specifying two parameters for the weight and bias, we propose
the following probabilistic model:

```
y ~ Normal(x*w + b, 0.5)
w ~ Uniform(0, 2)
b ~ Uniform(0, 2)
```

In AutoPPL, we can write the following code and sample from the posterior:

```
Data<double> x(6);
Data<double> y(6);
Param<double> w;
Param<double> b;
x.get() << 2.4, 3.1, 3.6, 4, 4.5, 5.;
y.get() << 3.5, 4, 4.4, 5.01, 5.46, 6.1;
auto model = (
w |= uniform(0., 2.),
b |= uniform(0., 2.),
y |= normal(x * w + b, 0.5)
);
auto res = nuts(model);
```

### Stochastic Volatility

This example was discussed in a previous section, but we mention it again as reference:

```
DataView<double, vec> y(ptr, n_data);
Param phi = make_param<double>(bounded(-1., 1.));
Param sigma = make_param<double>(lower(0.));
Param<double> mu;
Param<double, vec> h_std(n_data);
TParam<double, vec> h(n_data);
auto tp_expr = (
h = h_std * sigma,
h[0] /= sqrt(1. - phi * phi),
h += mu,
for_each(util::counting_iterator<>(1),
util::counting_iterator<>(h.size()),
[&](size_t i) { return h[i] += phi * (h[i-1] - mu); })
);
auto model = (
phi |= uniform(-1., 1.),
sigma |= cauchy(0., 5.),
mu |= cauchy(0., 10.),
h_std |= normal(0., 1.),
y |= normal(0., exp(h / 2.))
);
auto res = nuts(tp_expr | model);
```

## Benchmarks

In the following examples, we show benchmarks with STAN.

We list the benchmark settings for completion:

- Machine: x86_64-apple-darwin19.5.0
- CPU: 3.4 GHz Quad-Core Intel Core i5
- Compiler: Clang 11.0.3

### Bayesian Linear Regression

We collected a dataset regarding life expectancy released by WHO (source). After cleaning and extracting three predictors: "Alcohol", "HIV/AIDS", and "GDP", the dataset consisted of 157 points. We performed a Bayesian linear regression with this data and the following model:

```
y ~ Normal(X*w + b, s*s + 2.)
w ~ Normal(0., 5.)
b ~ Normal(0., 5.)
s ~ Uniform(0.5, 8.)
```

where `w`

is a 3-dimensional parameter vector,
and `b`

and `s`

are scalar parameters.

Using the same dataset and model specification, we performed NUTS to sample various number of samples. We also set the number of chains and cores to 1 and adaptation method to diagonal precision matrix.

The following plots show benchmarks between run-times and effective sample size (ESS) per second:

The reported mean, standard deviation, and ESS values were almost identical in all cases, which is not surprising since we used the same algorithm to estimate ESS and perform NUTS.

The runtimes have similar log-like behavior, but it is clear that STAN (dotted lines) takes far longer in both sampling and warmup times by a factor of about 6.5-7. As for ESS/s, upon comparing by colors (corresponding to a parameter) between dotted (STAN) and solid (AutoPPL) lines, we see that AutoPPL has uniformly larger ESS/s by a factor of 6.5-7 as well. This difference quickly becomes more noticeable as sample size grows. From these plots and that sampling results were identical show that the drastic difference in ESS/s is simply from faster automatic differentation and a good use of memory to take advantage of cache.

The following is the AutoPPL code for the model specification without data loading. The full code can be found here:

```
DataView<double, mat> X(X_data.data(), X_data.rows(), X_data.cols());
DataView<double, vec> y(y_data.data(), y_data.rows());
Param<double, vec> w(3);
Param<double> b;
Param<double> s;
auto model = (s |= uniform(0.5, 8.),
b |= normal(0., 5.),
w |= normal(0., 5.),
y |= normal(dot(X, w) + b, s * s + 2.));
NUTSConfig<> config;
config.warmup = num_samples;
config.samples = num_samples;
auto res = nuts(model, config);
```

### Gaussian Model

We generated 1000 values from standard normal distribution to form our data. Our model is defined as follows:

```
y ~ Normal(l1 + l2, s)
l1 ~ Normal(0., 10.)
l2 ~ Normal(0., 10.)
s ~ Uniform(0., 20.)
```

where all parameters are scalar. The benchmark configurations are exactly the same as in the previous section. The following plots show benchmarks between run-times and effective sample size (ESS) per second:

We note that both STAN and AutoPPL outputted almost identical means, standard deviation, and ESS.

The runtimes have a similar linear trend,
and it is clear that STAN (dotted lines) takes far longer
in both sampling and warmup times.
Comparing the sampling times, for example, we see about 20 times improvement.
The ESS/s for `l1`

and `l2`

overlap completely (red and blue) in both STAN and AutoPPL
and this is expected since they are symmetric in the model specification.
With the exception of the two smallest sample sizes (100, 500),
ESS/s is fairly constant as sample size varies.
It is quite evident that AutoPPL (solid) has a larger ESS/s by a factor of 20.

The reason for this difference is in how we handle expressions where data vector elements are iid (independent and identically distributed). For most distributions, especially those that are in some exponential family, they can be highly optimized in iid settings to perform quicker differentiation. However, it is worth noting that this optimization does not apply when the data are simply independent but not identically distributed (as in the linear regression case), or when the variable is a parameter, not data. Nonetheless, our AD is extremely fast due to vectorization and is in general faster than STAN.

The following is the AutoPPL code without data generation. The full code can be found here.

```
Data<double, vec> y(n_data);
Param<double> lambda1, lambda2, sigma;
auto model = (
sigma |= uniform(0.0, 20.0),
lambda1 |= normal(0.0, 10.0),
lambda2 |= normal(0.0, 10.0),
y |= normal(lambda1 + lambda2, sigma)
);
NUTSConfig<> config;
config.n_samples = n_samples;
config.warmup = n_samples;
auto res = nuts(model, config);
```

## Contributors

_{Jacob Austin}💻 🎨 📖 |
_{Jenny Chen}💻 🎨 |
_{lucieleblanc}💻 🎨 |
_{Matt Ludkin}💻 |

## Third Party Tools

Many thanks to the following third party tools used in this project:

- Clang: one of the main compilers used.
- CMake: build system.
- Coveralls: check test coverage.
- Cpp Coveralls: check test coverage specifically for C++ code.
- Eigen: matrix library.
- FastAD: automatic differentiation library.
- GCC: one of the main compilers used.
- Google Benchmark: benchmark library algorithms.
- GoogleTest: unit/integration-tests.
- Travis CI: continuous integration for Linux using GCC.
- Valgrind: check memory leak and errors.