A library for modular probabilistic modelling.
A library for probabilistic programming using algebraic effects. The emphasis is on modular definitions of probabilistic models, and also compositional implementation of model execution (inference) in terms of effect handlers.
ProbFX
Prelude
ProbFX is a library for probabilistic programming using algebraic effects that implements the paper Modular Probabilistic Models via Algebraic Effects -- this paper provides a comprehensive motivation and walkthrough of this library. To have a more interactive and visual play-around with ProbFX, please see the artifact branch: this corresponds parts of the paper to the implementation, and also provides an executable version of ProbFX as a script.
Description
ProbFx is a PPL that places emphasis on being able to define modular and reusable probabilistic models, where the decision to sample
or observe
against a random variable or distribution of a model is delayed until the point of execution; this allows a model to be defined just once and then reused for a variety of applications. We also implement a compositional approach towards model execution (inference) by using effect handlers.
Building and executing models
A large number of example ProbFX programs are documented in the examples directory, showing how to define and then execute a probabilistic model.
In general, the process is:
Define an appropriate model of type
Model env es a
, and (optionally) a corresponding model environment typeenv
.For example, a logistic regression model that takes a list of
Double
s as inputs and generates a list ofBool
s, modelling the probability of an event occurring or not:-- | The model environment type, for readability purposes type LogRegrEnv = '[ "y" ':= Bool, -- ^ output "m" ':= Double, -- ^ mean "b" ':= Double -- ^ intercept ] -- | Logistic regression model logRegr :: (Observable env "y" Bool , Observables env '["m", "b"] Double) => [Double] -> Model env rs [Bool] logRegr xs = do -- | Specify the distributions of the model parameters -- mean m <- normal 0 5 #m -- intercept b <- normal 0 1 #b -- noise sigma <- gamma' 1 1 -- | Specify distribution of model outputs let sigmoid x = 1.0 / (1.0 + exp((-1.0) * x)) ys <- foldM (\ys x -> do -- probability of event occurring p <- normal' (m * x + b) sigma -- generate as output whether the event occurs y <- bernoulli (sigmoid p) #y return (ys ++ [y])) [] xs return ys
The
Observables
constraint says that, for example,"m"
and"b"
are observable variables in the model environmentenv
that may later be provided a trace of observed values of typeDouble
.Calling a primitive distribution such as
normal 0 5 #m
lets us later provide observed values for "m" when executing the model.Calling a primed variant of primitive distribution such as
gamma' 1 1
will disable observed values from being provided to that distribution.Execute a model under a model environment, using one of the
Inference
library functions.Below simulates from a logistic regression model using model parameters
m = 2
andb = -0.15
but provides no values fory
: this will result inm
andb
being observed buty
being sampled.simulateLogRegr :: Sampler [(Double, Bool)] simulateLogRegr = do -- | Specify the model inputs let xs = map (/50) [(-50) .. 50] -- | Specify the model environment env = (#y := []) <:> (#m := [2]) <:> (#b := [-0.15]) <:> nil -- | Simulate from logistic regression (ys, envs) <- SIM.simulate logRegr env xs return (zip xs ys)
Below performs Metropolis-Hastings inference on the same model, by providing values for the model output
y
and hence observing (conditioning against) them, but providing none for the model parametersm
andb
and hence sampling them.-- | Metropolis-Hastings inference inferMHLogRegr :: Sampler [(Double, Double)] inferMHLogRegr = do -- | Simulate data from log regression (xs, ys) <- unzip <$> simulateLogRegr -- | Specify the model environment let env = (#y := ys) <:> (#m := []) <:> (#b := []) <:> nil -- | Run MH inference for 20000 iterations mhTrace :: [Env LogRegrEnv] <- MH.mh 20000 logRegr (xs, env) ["m", "b"] -- | Retrieve values sampled for #m and #b during MH let m_samples = concatMap (get #m) mhTrace b_samples = concatMap (get #b) mhTrace return (zip m_samples b_samples)
One may have noticed by now that lists of values are always provided to observable variables in a model environment; each run-time occurrence of that variable will then result in the head value being observed and consumed, and running out of values will default to sampling.
Running the function
mh
returns a trace of output model environments, from which we can retrieve the trace of sampled model parameters viaget #m
andget #b
. These represent the posterior distribution overm
andb
. (The argument["m", "b"]
tomh
is optional for indicating interest in learning#m
and#b
in particular).Sampler
computations can be evaluated withsampleIO :: Sampler a -> IO a
to produce anIO
computation.sampleIO simulateLogRegr :: [(Double, Bool)]