Compound Poisson Processes

In this post I’ll discuss compound Poisson processes, which I read about in the final chapter of Hassett and Stewart’s Probability for Risk Management last night. These model a stochastic process where at regular intervals (months, quarters, etc.) some number of events occur according to a Poisson process with rate \lambda, and the intensity of each event is determined independently by another other distribution.

Here is more concrete example: imagine you are an insurance company and claims come in every week according to a Poisson distribution with a rate \lambda=3. This means that every week, you see about 3 claims, but some weeks you might not have any claims, and some weeks you might have as many as five or seven claims. In addition to that, each claim is drawn from a uniform distribution ranging from $500 to $5000.

This is the gist of the compound Poisson process, events occur according to a Poisson distribution, and the severity of events are determined independently by an arbitrary distribution.

Today I learned from the Wikipedia page on compound Poisson processes that there is an analytical estimate of the expected value this sort of process known as Wald’s equation. For one unit of time, it’s the rate of the Poisson process times the expected value of the other distribution.

As far as formulas go, that’s pretty darn simple/intuitive/elegant. I’m more interested in distributions, so I’ll provide some Python code for simulating a compound Poisson process. We’ll use the following modules,

import numpy as np
import scipy, scipy.stats
import seaborn

Next we’ll generate a sample of 1000 events from a Poisson process with rate \lambda=3, this will represent our number of claims per week.

num_claims = scipy.stats.poisson(3).rvs(1000)

Next, we’ll associate a cost for each claim, and sum those costs for the week.

wkly_amts = [ scipy.stats.uniform(500,5000-500).rvs(claim) for claim in num_claims ]
wkly_amts = map( np.sum, wkly_amts )

Finally, we’ll plot a distribution using the seaborn module.

seaborn.distplot( wkly_amts )
title("Distribution of Weekly Total Claims")
xlabel("USD")
savefig("distribution_weekly_claims.png", dpi=200 )

Then, out of curiosity, we can look at the 95% percentile,

seaborn.ci( np.array( wkly_amts ), which=95 )
array([     0.        ,  20038.468])

We can also look at the difference between the empirical mean of the distribution and compare it to the approximation provided by Wald’s equation. It turns out that we see that there’s about a 1.5% difference between the two.

empirical = np.mean( amts )
wald_approx = 3 * np.mean( scipy.stats.uniform(500,5000-500).rvs(1000) )
( empirical - wald_approx )/wald_approx
-0.0156637