Here, I’ll introduce some ideas regarding spatial point processes using Python. First I’ll present the Poisson point process, and then I’ll cover two other processes: the Thomas point process and the Matérn point process. I’ll use these tools in two future posts regarding measuring fractal dimension, and kriging. An excellent resource for spatial statistics is the R package `spstat`

. The manual is a really great read. The `spstat`

package implements the Thomas and Matérn point processes as `rThomas`

and `rMatern`

.

# Poisson Point Process

A Poisson random variable describes the number of events one is expected to observe over a given time period, assuming that those events occur independently of one another, and occur according to a constant rate, . Most actuarial science books start out with the Poisson processes, and a favorite example involves hurricanes. (I refer, of course, to the tropical cyclonic storm, not the French Quarter beverage beloved by tourists.) The Hurricane Problem is this: given a rate of 2.75 hurricanes per year, what is the probability of having at most one hurricane; what is the probability of having four or more hurricanes, etc. The PDF of the Poisson Distribution is given by,

Note that this is a discrete PDF, and that although the rate, , is a real number, and the probabilities are real numbers, we’re still talking about discrete events–the probabilities of one or two, or three (discrete) hurricanes.

More abstractly, we’re interested in the number of events with respect to a rate, , over some dimension. In the Hurricane Problem, the rate is hurricanes per annum, and the dimension is time. We can also consider a Poisson process occurring over two spatial dimensions. Consider snowfall in your yard. At this scale (as opposed to the scale of parishes or states) the snowflakes can be said to fall at a constant rate over the whole yard. That implies that we can estimate a rate by counting the number of new snowflakes to fall in a, say, five foot by five foot region of the yard over 15 minutes, and then develop an estimate of the snowflakes per square foot per hour for the entire yard, assuming snowflake independence and a constant yard-wide rate of snowfall.

With this idea of events per time period per unit of area, we can write a Python function to simulate a Poisson point process. It will take as an input the rate–events per time period–and two spatial dimensions, and it will just assume that you’re interested in a rectangular region.

def PoissonPP( rt, Dx, Dy=None ): ''' Determines the number of events `N` for a rectangular region, given the rate `rt` and the dimensions, `Dx`, `Dy`. Returns a <2xN> NumPy array. ''' if Dy == None: Dy = Dx N = scipy.stats.poisson( rt*Dx*Dy ).rvs() x = scipy.stats.uniform.rvs(0,Dx,((N,1))) y = scipy.stats.uniform.rvs(0,Dy,((N,1))) P = np.hstack((x,y)) return P

We can plot an example quickly using the code below. Note that at **line 9** I set the limits of the axes as functions of `Dx`

. This is so that if you make several plots of different sized regions, you can compare them without having to think about different scales.

rate, Dx = 0.2, 20 P = PoissonPP( rate, Dx ).T fig, ax = subplots() ax = fig.add_subplot(111) ax.scatter( P[0], P[1], edgecolor='b', facecolor='none', alpha=0.5 ) # lengths of the axes are functions of `Dx` xlim(0,Dx) ; ylim(0,Dx) # label the axes and force a 1:1 aspect ratio xlabel('X') ; ylabel('Y') ; ax.set_aspect(1) title('Poisson Process, ={}'.format(rate)) savefig( 'poisson_lambda_0p2.png', fmt='png', dpi=100 )

As we can see, most of the heavy lifting is done by a uniform distribution. The only thing *Poisson* about this is the number of points. If you wanted to get really fancy, you could partition your region and pick a Poisson random variate `N`

for each partition, and populate the region according to the local `N`

.

# Thomas Point Process

The Thomas point process produces clusters using Poisson point processes in two stages. First, a set of `M`

parent points are determined according to a Poisson process with a rate `kappa`

, and distributed uniformly over a region. Next, for each parent point, a set of `N`

child points are determined according to a second Poisson process with a rate `mu`

. These child points are distributed about each parent according to an isotropic Gaussian distribution with a variance `sigma`

. (Isotropic means *uniformly in all directions*.) Note that the `M`

parent points never show up in the final distribution.

def ThomasPP( kappa, sigma, mu, Dx ): ''' A Poisson( kappa ) number of parents are created, each forming a Poisson( mu ) numbered cluster of points, having an isotropic Gaussian distribution with variance `sigma` ''' # create a set of parent points from a Poisson( kappa ) # distribution on the square region [0,Dx] X [0,Dx] parents = PoissonPP( kappa, Dx ) # M is the number of parents M = parents.shape[0] # an empty list for the Thomas process points TP = list() # for each parent point.. for i in range( M ): # determine a number of children according # to a Poisson( mu ) distribution N = scipy.stats.poisson( mu ).rvs() # for each child point.. for j in range( N ): # place a point centered on the location of the parent according # to an isotropic Gaussian distribution with sigma variance pdf = scipy.stats.norm( loc=parents[i,:2], scale=(sigma,sigma) ) # add the child point to the list TP TP.append( list( pdf.rvs(2) ) ) # return a numpy array TP = np.array( TP ) return TP

# Matérn Point Process

The Matérn point process works very much the same way as the Thomas process, except it replaces the isotropic Gaussian distribution with a uniform distribution about a disk of a given radius.

In order to implement this process we’ll need to implement a function very similar to the Thomas process above, and a function for returning a set of points from a uniform distribution over a disk which will take the `x`

and `y`

coordinates of its parent point, and a value `r`

for the radius of the disk. First, it redefines `r`

within its scope to some uniformly distributed value from , and then it draws an angle `theta`

uniformly from . Finally, it uses `r`

and `theta`

in a polar coordinate transform to produce a child point within some neighborhood `r`

of its parent.

def uniformDisk( x, y, r ): ''' Returns a uniformly distributed point in a disk of radius `r` centered at the point (x,y). ''' r = scipy.stats.uniform( 0, r**2.0 ).rvs() theta = scipy.stats.uniform( 0, 2*np.pi ).rvs() xt = np.sqrt( r ) * np.cos( theta ) yt = np.sqrt( r ) * np.sin( theta ) return x+xt, y+yt

Finally, we implement the Matérn point process using the uniform diskribution. (Boom, words.)

def MaternPP( kappa, r, mu, Dx ): ''' A Poisson( kappa ) number of parents are created, each forming a Poisson( mu ) numbered cluster of points, distributed uniformly in a circle of radius `r` ''' # create a set of parent points from a Poisson( kappa ) # distribution on the square region [0,Dx] X [0,Dx] parents = PoissonPP( kappa, Dx ) # M is the number of parents M = parents.shape[0] # an empty list for the Matern process points MP = list() # for each parent point.. for i in range( M ): # determine a number of children according # to a Poisson( mu ) distribution N = scipy.stats.poisson( mu ).rvs() # for each child point.. for j in range( N ): # produce a uniformly distributed child point from a # circle of radius `r`, centered about a parent point x, y = uniformDisk( parents[i,0], parents[i,1], r ) # add the child point to the list MP MP.append( [ x, y ] ) # return a numpy array MP = np.array( MP ) return MP

# The Hurricane Problem

In order to answer the Hurricane problem mentioned above, we would let , and then we’d think about . For the first part, we’d calculate the PDF for , sum those values and that would be it.

For the second part, we’d calculate the PDF for and then subtract that from 1, since we are considering the complement.

In Python we’d write,

import scipy.stats L = 2.75 Pr_X_lt_2 = scipy.stats.poisson(2.75).cdf(1) Pr_X_gt_3 = 1-scipy.stats.poisson(2.75).cdf(3) print 'Pr(X<2) = {:.5f}'.format( Pr_X_lt_2 ) print 'Pr(X>3) = {:.5f}'.format( Pr_X_gt_3 )

Pr(X<2) = 0.23973 Pr(X>3) = 0.29696

Hi, how thoroughly did you check that your uniformDisk( x, y, r ) really produces uniformly distributed points?

My blog post on this from a while back:

http://www.anderswallin.net/2009/05/uniform-random-points-in-a-circle-using-polar-coordinates/

also, John Baez posted about Voronoi Diagrams of poisson point processes over here:

https://plus.google.com/117663015413546257905/posts/FPryzySMdm5

and that inspired me to do a direct calculation of the average number of closest neighbours a point has:

http://www.anderswallin.net/2011/12/poisson-voronoi-diagram-statistics/

Have fun!

Anders

I did not check that, but I fixed it. Thanks for pointing that out. I’ll check out the Voroni diagrams and nearest neighbors calculations this evening. Thanks again!

Can I have your acceptance to post this on my twitter?

Yeah, sure. Knock yourself out!