In this post I’ll discuss how to perform incremental updates to a simple statistical model using PyMC. The short answer is that you have to create a new model each time. In this example, I’ll use a Bernoulli random variable from scipy.stats to generate coin flips, and I’ll use PyMC to model a prior and likelihood distribution and produce a posterior distribution as output.
First we import some modules and initialize our variables. We’ll set both parameters of the Beta prior distribution to one, which is equivalent to a uniform distribution between zero and one. The actual probability that the coin comes up heads is 2/3, or about 66.7%. I chose this value because it’s close enough to 50% to look like a fair coin after a few flips.
import pymc
import scipy.stats
import numpy as np

# initialize variables
flips = 0
heads = 0
alpha = 1.0
beta = 1.0

# the actual value of the proportion theta, i.e.,
# the probability that the coin will come up heads
prob = 2.0 / 3.0
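As a quick sanity check (this snippet is not part of the original code), we can verify with scipy.stats that a Beta(1, 1) distribution really is flat, i.e. uniform on [0, 1]:

```python
import numpy as np
import scipy.stats

# A Beta(1, 1) prior assigns the same density to every value of theta,
# so it is equivalent to a uniform distribution on [0, 1].
uniform_prior = scipy.stats.beta(a=1.0, b=1.0)
grid = np.linspace(0.01, 0.99, 50)
print(uniform_prior.pdf(grid).min(), uniform_prior.pdf(grid).max())  # 1.0 1.0
```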
Next, we’ll perform 18 flips, updating the Beta prior with the new information through its parameters alpha and beta after each flip. (An explanation of Beta priors may be found here.) Each model will run for 50,000 iterations and will burn in (throw out) the first 10,000 iterations.
for i in range(18):
    # flip a coin
    flip = scipy.stats.bernoulli(prob).rvs()

    # update variables
    flips += 1
    heads += flip

    # create a prior distribution for the parameter theta,
    # the proportion of heads
    theta = pymc.Beta(name="theta", alpha=alpha, beta=beta)

    # create a likelihood distribution
    #   n     - the number of trials/flips
    #   p     - prior distribution of theta
    #   value - the number of observed heads
    like = pymc.Binomial(name="N", n=flips, p=theta, value=heads, observed=True)

    # create a model object
    model = pymc.Model([theta, like])

    # pass the model to the Markov chain Monte Carlo simulation
    mcmc = pymc.MCMC(model)

    # run the simulation for 50K iterations,
    # "burning in" (tossing out) the first 10K iterations
    mcmc.sample(iter=50000, burn=10000)

    # update the parameters for the next prior
    alpha = heads
    if alpha == 0:
        alpha = 0.5
    beta = flips - heads
    if beta == 0:
        beta = 0.5
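Because the Beta prior is conjugate to the Binomial likelihood, each posterior above also has a closed form: Beta(alpha + heads, beta + flips − heads). As a sketch (using scipy.stats only, with illustrative counts of 12 heads in 18 flips rather than an actual simulated run), the analytic posterior can be computed directly and compared against what MCMC estimates:

```python
import scipy.stats

# illustrative counts -- an actual run of the loop above will vary
alpha0, beta0 = 1.0, 1.0      # Beta(1, 1) = uniform prior
flips, heads = 18, 12

# conjugate update: add observed heads to alpha, tails to beta
posterior = scipy.stats.beta(a=alpha0 + heads, b=beta0 + flips - heads)
print(posterior.mean())          # (alpha0 + heads) / (alpha0 + beta0 + flips) = 0.65
print(posterior.interval(0.95))  # central 95% credible interval
```

This closed-form answer is a useful cross-check on the sampled posterior: the MCMC mean and quantiles should land close to the analytic values.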
We can visualize the posterior, the distribution of the values of theta, using the trace() method.
from matplotlib.pyplot import hist, title, ylabel, xlabel, savefig

hist(theta.trace(),
     bins=np.linspace(0, 1, 21),
     facecolor=(0, 0, 1, 0.5),
     edgecolor=(0.25, 0.25, 1),
     linewidth=3)
title('Iterative Bayesian Coin Flipping')
ylabel('Flips')
xlabel('$\\theta$')
savefig('iterative_bayesian_coin_flipping_example.png', dpi=200)
We can call the summary() method of theta to obtain the posterior mean, standard deviation, Monte Carlo error, the 95% highest posterior density interval, and several quantiles.
theta.summary()
theta:

    Mean      SD      MC Error   95% HPD interval
    ---------------------------------------------
    0.704     0.068   0.0        [ 0.568  0.834]

    Posterior quantiles:

    2.5        25        50        75        97.5
     |----------|=========|=========|----------|
    0.562      0.66      0.708     0.753     0.828
We can also access specific parts of the summary using the dictionary returned by the stats() method of the theta distribution.
theta.stats()
{'95% HPD interval': array([ 0.56802781,  0.83376255]),
 'mc error': 0.00033181076170894026,
 'mean': 0.70441527658710223,
 'n': 40000,
 'quantiles': {2.5: 0.56198261307785313,
               25: 0.65963730749639737,
               50: 0.70767338838336302,
               75: 0.75303639985853832,
               97.5: 0.82848204538592407},
 'standard deviation': 0.068484753569964044}
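The same quantiles can also be recovered directly from an array of posterior samples with NumPy. Below is a sketch using synthetic Beta samples as a stand-in for theta.trace() (the shape parameters 13 and 7 are illustrative, not taken from an actual run):

```python
import numpy as np
import scipy.stats

# synthetic stand-in for theta.trace(): 40,000 draws from a Beta posterior
samples = scipy.stats.beta(a=13.0, b=7.0).rvs(size=40000, random_state=0)

# the same quantiles that stats() reports under 'quantiles'
q = np.percentile(samples, [2.5, 25, 50, 75, 97.5])
print(dict(zip([2.5, 25, 50, 75, 97.5], q)))
```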
References
- Abraham Flaxman has some great PyMC resources in his blog, Healthy Algorithms
- Cameron Davidson-Pilon has a very thorough and interesting (free) book on Bayesian analysis using PyMC: Probabilistic Programming & Bayesian Methods for Hackers
- There is also some example code by the PyMC developers available here