In this post I’ll discuss how to perform incremental updates to a simple statistical model using PyMC. The short answer is that you have to create a new model each time. In this example, I’ll use a Bernoulli random variable from `scipy.stats`

to generate coin flips, and I will use PyMC to model a prior and likelihood distribution, and produce a posterior distribution as output.

First we import some modules and initialize our variables. We’ll set the parameters to the beta prior distribution to one, which is equivalent to a uniform distribution between zero and one. The actual probability that the coin will come up heads is 2/3, or 66.66%. I chose this because it’s close enough to 50% to look like a fair coin after a few flips.

import pymc, scipy.stats, numpy as np # initialize variables flips = 0 heads = 0 alpha = 1.0 beta = 1.0 # the actual value of the proportion theta, i.e., # the probability that the coin will come up heads prob = 2.0/3.0

Next, we’ll perform 18 flips, updating the Beta distribution prior with the new information using the parameters `alpha`

and `beta`

after each flip. (An explanation of Beta priors may be found here.) Each model will use 50,000 iterations, and will burn-in (throw out) the first 10,000 iterations.

for i in range(18): # flip a coin flip = scipy.stats.bernoulli( prob ).rvs() # update variables flips += 1 heads += flip # create a prior distribution for the parameter theta, the proportion of heads theta = pymc.Beta( name="theta", alpha=alpha, beta=beta ) # create a likelihood distribution # n - the number of trials/flips, # p - prior distribution of theta # value - the number of observed heads like = pymc.Binomial( name="N", n=flips, p=theta, value=heads, observed=True ) # create a model object model = pymc.Model( [ theta, like ] ) # pass the model to the Markov Chain Monte Carlo simulation mcmc = pymc.MCMC( model ) # run the simulation for 50K iterations, # "burning-in" (tossing out) the first 10K iterations mcmc.sample( iter=50000, burn=10000 ) # update the parameters for the next prior alpha = heads if alpha == 0: alpha = 0.5 beta = flips - heads if beta == 0: beta = 0.5

We can visualize the posterior, the distribution of the values of `theta`

, using the `trace()`

method.

hist( theta.trace(), bins=np.linspace( 0, 1, 21 ), facecolor=(0,0,1,0.5), edgecolor=(0.25,0.25,1), linewidth=3 ) ; title('Iterative Bayesian Coin Flipping') ylabel('Flips') ; xlabel('$\\theta$') savefig('iterative_bayesian_coin_flipping_example.png', dpi=200 )

We can call the `summary()`

method of `theta`

to obtain percentiles of the posterior, the 95% highest density interval, the variance, and some quantiles.

theta.summary()

theta: MC Mean SD Error 95% HPD interval ------------------------------------- 0.704 0.068 0.0 [ 0.568 0.834] Posterior quantiles: 2.5 25 50 75 97.5 |------|======|======|------| 0.562 0.66 0.708 0.753 0.828

We can also access specific parts of the summary using the dictionary returned by the `stats()`

method of the `theta`

distribution.

theta.stats()

{'95% HPD interval': array([ 0.56802781, 0.83376255]), 'mc error': 0.00033181076170894026, 'mean': 0.70441527658710223, 'n': 40000, 'quantiles': {2.5: 0.56198261307785313, 25: 0.65963730749639737, 50: 0.70767338838336302, 75: 0.75303639985853832, 97.5: 0.82848204538592407}, 'standard deviation': 0.068484753569964044}

# References

- Abraham Flaxman has some great PyMC resources in his blog, Healthy Algorithms
- Cameron Davidson-Pilon has a very thorough and interesting (free) book on Bayesian analysis using PyMC: Probabilistic Programming & Bayesian Methods for Hackers
- There is also some example code by the PyMC developers available here