In this post I’ll discuss how to perform incremental updates to a simple statistical model using PyMC. The short answer is that you have to create a new model each time. In this example, I’ll use a Bernoulli random variable from scipy.stats to generate coin flips, and I’ll use PyMC to model a prior and likelihood distribution and produce a posterior distribution as output.
First we import some modules and initialize our variables. We’ll set both parameters of the Beta prior distribution to one, which is equivalent to a uniform distribution between zero and one. The actual probability that the coin comes up heads is 2/3, or about 66.7%. I chose this value because it’s close enough to 50% to look like a fair coin after a few flips.
import pymc
import scipy.stats
import numpy as np

# initialize variables
flips = 0
heads = 0
alpha = 1.0
beta = 1.0

# the actual value of the proportion theta, i.e.,
# the probability that the coin will come up heads
prob = 2.0 / 3.0
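As a quick sanity check (this snippet is not part of the original code), we can verify with scipy.stats that a Beta(1, 1) distribution really is flat, i.e. uniform on [0, 1]:

```python
import numpy as np
import scipy.stats

# A Beta(1, 1) prior assigns the same density to every value of theta,
# so it is equivalent to a uniform distribution on [0, 1].
uniform_prior = scipy.stats.beta(a=1.0, b=1.0)
grid = np.linspace(0.01, 0.99, 50)
print(uniform_prior.pdf(grid).min(), uniform_prior.pdf(grid).max())  # 1.0 1.0
```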
Next, we’ll perform 18 flips, updating the Beta prior with the new information through its parameters alpha and beta after each flip. (An explanation of Beta priors may be found here.) Each model will run for 50,000 iterations and will burn in (throw out) the first 10,000 iterations.
for i in range(18):
    # flip a coin
    flip = scipy.stats.bernoulli(prob).rvs()

    # update variables
    flips += 1
    heads += flip

    # create a prior distribution for the parameter theta,
    # the proportion of heads
    theta = pymc.Beta(name="theta", alpha=alpha, beta=beta)

    # create a likelihood distribution
    #   n     - the number of trials/flips
    #   p     - prior distribution of theta
    #   value - the number of observed heads
    like = pymc.Binomial(name="N", n=flips, p=theta, value=heads, observed=True)

    # create a model object
    model = pymc.Model([theta, like])

    # pass the model to the Markov chain Monte Carlo simulation
    mcmc = pymc.MCMC(model)

    # run the simulation for 50K iterations,
    # "burning in" (tossing out) the first 10K iterations
    mcmc.sample(iter=50000, burn=10000)

    # update the parameters for the next prior
    alpha = heads
    if alpha == 0:
        alpha = 0.5
    beta = flips - heads
    if beta == 0:
        beta = 0.5
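Because the Beta prior is conjugate to the Binomial likelihood, each posterior above also has a closed form: Beta(alpha + heads, beta + flips − heads). As a sketch (using scipy.stats only, with illustrative counts of 12 heads in 18 flips rather than an actual simulated run), the analytic posterior can be computed directly and compared against what MCMC estimates:

```python
import scipy.stats

# illustrative counts -- an actual run of the loop above will vary
alpha0, beta0 = 1.0, 1.0      # Beta(1, 1) = uniform prior
flips, heads = 18, 12

# conjugate update: add observed heads to alpha, tails to beta
posterior = scipy.stats.beta(a=alpha0 + heads, b=beta0 + flips - heads)
print(posterior.mean())          # (alpha0 + heads) / (alpha0 + beta0 + flips) = 0.65
print(posterior.interval(0.95))  # central 95% credible interval
```

This closed-form answer is a useful cross-check on the sampled posterior: the MCMC mean and quantiles should land close to the analytic values.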
We can visualize the posterior, the distribution of the values of theta, using the trace() method.
from matplotlib.pyplot import hist, title, ylabel, xlabel, savefig

hist(theta.trace(),
     bins=np.linspace(0, 1, 21),
     facecolor=(0, 0, 1, 0.5),
     edgecolor=(0.25, 0.25, 1),
     linewidth=3)
title('Iterative Bayesian Coin Flipping')
ylabel('Flips')
xlabel('$\\theta$')
savefig('iterative_bayesian_coin_flipping_example.png', dpi=200)
We can call the summary() method of theta to obtain the posterior mean, standard deviation, Monte Carlo error, the 95% highest posterior density interval, and several quantiles.
theta.summary()
theta:

    Mean      SD      MC Error   95% HPD interval
    ---------------------------------------------
    0.704     0.068   0.0        [ 0.568  0.834]

    Posterior quantiles:

    2.5        25        50        75        97.5
     |----------|=========|=========|----------|
    0.562      0.66      0.708     0.753     0.828
We can also access specific parts of the summary using the dictionary returned by the stats() method of the theta distribution.
theta.stats()
{'95% HPD interval': array([ 0.56802781,  0.83376255]),
 'mc error': 0.00033181076170894026,
 'mean': 0.70441527658710223,
 'n': 40000,
 'quantiles': {2.5: 0.56198261307785313,
               25: 0.65963730749639737,
               50: 0.70767338838336302,
               75: 0.75303639985853832,
               97.5: 0.82848204538592407},
 'standard deviation': 0.068484753569964044}
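The same quantiles can also be recovered directly from an array of posterior samples with NumPy. Below is a sketch using synthetic Beta samples as a stand-in for theta.trace() (the shape parameters 13 and 7 are illustrative, not taken from an actual run):

```python
import numpy as np
import scipy.stats

# synthetic stand-in for theta.trace(): 40,000 draws from a Beta posterior
samples = scipy.stats.beta(a=13.0, b=7.0).rvs(size=40000, random_state=0)

# the same quantiles that stats() reports under 'quantiles'
q = np.percentile(samples, [2.5, 25, 50, 75, 97.5])
print(dict(zip([2.5, 25, 50, 75, 97.5], q)))
```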
References
- Abraham Flaxman has some great PyMC resources in his blog, Healthy Algorithms
- Cameron Davidson-Pilon has a very thorough and interesting (free) book on Bayesian analysis using PyMC: Probabilistic Programming & Bayesian Methods for Hackers
- There is also some example code by the PyMC developers available here