Modeling with Beta Distributions

The beta distribution takes two parameters, usually called a and b, or alpha and beta. If you are modeling a Bernoulli process, a sequence of binary outcomes (success or failure) with a constant probability of success, you can use a beta distribution to describe your uncertainty about that probability, setting the parameter a to the number of successes and the parameter b to the number of failures. The neat thing about the beta distribution is that the greater the total number of trials (the sum of the successes and failures), the more peaked, or narrow, the distribution becomes.
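As a minimal sketch of that mapping, assuming the outcomes are available as a list of 0s and 1s (the `flips` data and the variable names here are made up for illustration), one could count the successes and failures and hand them to `scipy.stats.beta`:

```python
import scipy.stats

# hypothetical observations: 1 = success/heads, 0 = failure/tails
flips = [1, 0, 1, 1, 0]

# a is the number of successes, b is the number of failures
a = sum(flips)
b = len(flips) - a

# frozen beta distribution with those parameters
rv = scipy.stats.beta(a, b)

# the mean of a beta distribution is a / (a + b)
mean = rv.mean()
print("a =", a, "b =", b, "mean =", mean)
```

Note that this sketch assumes at least one success and one failure, since both parameters must be greater than zero.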

For example, below we have a distribution where we’ve observed three successes and two failures. Would you wager $100 that a coin that came up heads three times in five flips was biased? Probably not. This uncertainty shows up in the shape of the distribution: it leans toward a higher probability of heads, but it’s still pretty spread out.

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

# the domain: 100 evenly spaced points on [0, 1]
x = np.linspace(0, 1, 100)

# parameters
#==========================
# number of successes/heads
a = 3
# number of failures/tails
b = 2

# initialize the (frozen) distribution
rv = scipy.stats.beta(a, b)

# the range: the frozen pdf evaluated on the domain
y = rv.pdf(x)

# normalize so the sampled values sum to 1
y /= np.sum(y)

# plotting code
plt.plot(x, y)
# fix the y-axis for comparison
plt.ylim(0, 0.1)
plt.title("Beta( a=3, b=2 )")
plt.savefig("beta_3_2.png", dpi=200)

Changing the parameters in the code listing above to a=30 and b=20, we get the following figure. In contrast with the first figure, this one is more peaked, lending greater confidence to the assertion that the coin is biased. The interpretation is that after we’ve seen thirty heads in fifty flips, we’re more willing to bet $100 that the coin is biased.
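One rough way to put a number on that willingness to bet, assuming "biased" here means "favors heads", is to ask how much of the distribution's mass lies above 0.5. The helper name `prob_heads_favored` is made up for this sketch; it uses the survival function `sf` of the frozen distribution:

```python
import scipy.stats

def prob_heads_favored(a, b):
    """Mass that Beta(a, b) places on success probabilities above 0.5."""
    return scipy.stats.beta(a, b).sf(0.5)

# three heads in five flips
p_small = prob_heads_favored(3, 2)
# thirty heads in fifty flips
p_large = prob_heads_favored(30, 20)

print("Beta(3, 2):   P(p > 0.5) =", p_small)
print("Beta(30, 20): P(p > 0.5) =", p_large)
```

With three heads in five flips, about 69% of the mass sits above 0.5; with thirty in fifty the share is noticeably larger, which matches the intuition from the two figures.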

We see that although the ratio a/b never changed (3/2 still equals 30/20), the shape of the distribution did change, and it did so intuitively. If we had changed a and b to 300 and 200 instead, the distribution would become even more peaked. It is important to note at this point that the beta distribution also accepts non-integer parameters; there is nothing stopping you from setting a to 3.75. This does not break the metaphor of successes/failures or wins/losses. Setting a to 3.75 is akin to the Census Bureau reporting that the average family has 2.6 children.
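To see the narrowing in numbers rather than in a plot, here is a small sketch that prints the mean and standard deviation for several parameter pairs with the same ratio. The pair (3.75, 2.5) is an arbitrary non-integer example chosen for illustration:

```python
import scipy.stats

# parameter pairs that all share the ratio a/b = 3/2
pairs = [(3, 2), (30, 20), (300, 200), (3.75, 2.5)]

results = []
for a, b in pairs:
    rv = scipy.stats.beta(a, b)
    # mean is a / (a + b); std shrinks as a + b grows
    results.append((a, b, rv.mean(), rv.std()))

for a, b, mean, std in results:
    print("a =", a, "b =", b, "mean =", mean, "std =", std)
```

The mean stays at 0.6 in every case, while the standard deviation shrinks as the total a + b grows. The non-integer pair works just like the others.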

Connor Johnson
