Performing Incremental Updates using PyMC

In this post I’ll discuss how to perform incremental updates to a simple statistical model using PyMC. The short answer is that you have to create a new model each time. In this example, I’ll use a Bernoulli random variable from scipy.stats to generate coin flips, and I will use PyMC to model a prior and likelihood distribution, and produce a posterior distribution as output.

First we import some modules and initialize our variables. We’ll set the parameters to the beta prior distribution to one, which is equivalent to a uniform distribution between zero and one. The actual probability that the coin will come up heads is 2/3, or 66.66%. I chose this because it’s close enough to 50% to look like a fair coin after a few flips.

import pymc, scipy.stats, numpy as np

# initialize variables
flips = 0
alpha = 1.0
beta = 1.0
# the actual value of the proportion theta, i.e.,
# the probability that the coin will come up heads
prob = 2.0/3.0

Next, we’ll perform 18 flips, updating the Beta distribution prior with the new information using the parameters alpha and beta after each flip. (An explanation of Beta priors may be found here.) Each model will use 50,000 iterations, and will burn-in (throw out) the first 10,000 iterations.

for i in range(18):

# flip a coin
flip = scipy.stats.bernoulli( prob ).rvs()
# update variables
flips += 1

# create a prior distribution for the parameter theta, the proportion of heads
theta = pymc.Beta( name="theta", alpha=alpha, beta=beta )

# create a likelihood distribution
#   n - the number of trials/flips,
#   p - prior distribution of theta
#   value - the number of observed heads
like = pymc.Binomial( name="N", n=flips, p=theta, value=heads, observed=True )

# create a model object
model = pymc.Model( [ theta, like ] )

# pass the model to the Markov Chain Monte Carlo simulation
mcmc = pymc.MCMC( model )

# run the simulation for 50K iterations,
# "burning-in" (tossing out) the first 10K iterations
mcmc.sample( iter=50000, burn=10000 )

# update the parameters for the next prior
if alpha == 0: alpha = 0.5
beta = flips - heads
if beta == 0: beta = 0.5

We can visualize the posterior, the distribution of the values of theta, using the trace() method.

hist( theta.trace(), bins=np.linspace( 0, 1, 21 ), facecolor=(0,0,1,0.5), edgecolor=(0.25,0.25,1), linewidth=3 ) ;
title('Iterative Bayesian Coin Flipping')
ylabel('Flips') ; xlabel('$\\theta$')
savefig('iterative_bayesian_coin_flipping_example.png', dpi=200 ) We can call the summary() method of theta to obtain percentiles of the posterior, the 95% highest density interval, the variance, and some quantiles.

theta.summary()
theta:
MC
Mean   SD     Error  95% HPD interval
-------------------------------------
0.704  0.068  0.0    [ 0.568  0.834]

Posterior quantiles:

2.5    25     50     75    97.5
|------|======|======|------|
0.562  0.66  0.708  0.753  0.828

We can also access specific parts of the summary using the dictionary returned by the stats() method of the theta distribution.

theta.stats()
{'95% HPD interval': array([ 0.56802781,  0.83376255]),
'mc error': 0.00033181076170894026,
'mean': 0.70441527658710223,
'n': 40000,
'quantiles': {2.5: 0.56198261307785313,
25: 0.65963730749639737,
50: 0.70767338838336302,
75: 0.75303639985853832,
97.5: 0.82848204538592407},
'standard deviation': 0.068484753569964044}

References

This site uses Akismet to reduce spam. Learn how your comment data is processed.