Simple Kriging in Python

March 20, 2014 cjohnson318 27 Comments

In this post I will work through an example of Simple Kriging. Kriging is a set of techniques for interpolation. It differs from other interpolation techniques in that it sacrifices smoothness for the integrity of sampled points. Most interpolation techniques will over or undershoot the value of the function at sampled locations, but kriging honors those measurements and keeps them fixed. In future posts I would like to cover other types of kriging, other semivariaogram models, and colocated co-kriging. Until then, I’m keeping relatively up to date code at my GitHub project, geostatsmodels.

Data

The data used in this exercise is in a zip file at this site. (Click on Geostatistics Resources.) I have adapted my material from the Kriging document on the same site. My approach will focus more on programming. First we will import some modules and then load the data and parse it,

from pylab import *
import numpy as np
from pandas import DataFrame, Series
from scipy.spatial.distance import pdist, squareform

z = open( 'WGTutorial/ZoneA.dat','r' ).readlines()
z = [ i.strip().split() for i in z[10:] ]
z = np.array( z, dtype=np.float )
z = DataFrame( z, columns=['x','y','thk','por','perm','lperm','lpermp','lpermr'] )

Next, we will plot the data,

fig, ax = subplots()
ax.scatter( z.x, z.y, c=z.por, cmap='gray' )
ax.set_aspect(1)
xlim(-1500,22000)
ylim(-1500,17500)
xlabel('Easting [m]')
ylabel('Northing [m]')
title('Porosity %') ;

In surveys, we generally specify one point in latitude and longitude, and then measure things as North and East of that point, hence the Northing and Easting.

The Semivariogram

The semivariogram encodes data about spatial variance over the region at a given distance or lag. We generally expect data points that are close together spatially to share other characteristics, and we expect points that are separated by greater distances to have lesser correlation. The semivariogram allows us to model the similarity points in a field as a function of distance. The semivariogram is given by,

(1) $\begin{equation*} \hat{\gamma}(h) = \dfrac{1}{2N(h)} \displaystyle \sum_{N(h)} ( z_{i} - z_{j} )^{2} \end{equation*}$

Here, $h$ is distance specified by the user, and $z_{i}$ and $z_{j}$ are two points that are separated spatially by $h$ . The $N(h)$ term is the number of points we have that are separated by the distance $h$ . The semivariogram then is the sum of squared differences between values separated by a distance $h$ . As an aside, contrast this with the formulation for variance,

(2) $\begin{equation*} s = \dfrac{1}{N-1} \displaystyle \sum_{k=1}^{N} (z_{k} - \hat{\mu} )^{2} \end{equation*}$

Here, $N$ is the number of data points, $\hat{\mu}$ is the sample mean, and $z_{k}$ is a data point. For sample variance, we are taking the squared difference between data points and the mean, and in the semivariogram we are taking the squared difference between data points separated by distance $h$ . We can write some functions to calculate the semivariogram at one lag, and then at multiple lags as follows.

def SVh( P, h, bw ):
    '''
    Experimental semivariogram for a single lag
    '''
    pd = squareform( pdist( P[:,:2] ) )
    N = pd.shape[0]
    Z = list()
    for i in range(N):
        for j in range(i+1,N):
            if( pd[i,j] >= h-bw )and( pd[i,j] <= h+bw ):
                Z.append( ( P[i,2] - P[j,2] )**2.0 )
    return np.sum( Z ) / ( 2.0 * len( Z ) )

def SV( P, hs, bw ):
    '''
    Experimental variogram for a collection of lags
    '''
    sv = list()
    for h in hs:
        sv.append( SVh( P, h, bw ) )
    sv = [ [ hs[i], sv[i] ] for i in range( len( hs ) ) if sv[i] > 0 ]
    return np.array( sv ).T

def C( P, h, bw ):
    '''
    Calculate the sill
    '''
    c0 = np.var( P[:,2] )
    if h == 0:
        return c0
    return c0 - SVh( P, h, bw )

The C() function is the covariance function, and will be used later. Let us now calculate and plot the semivariogram,

# part of our data set recording porosity
P = np.array( z[['x','y','por']] )
# bandwidth, plus or minus 250 meters
bw = 500
# lags in 500 meter increments from zero to 10,000
hs = np.arange(0,10500,bw)
sv = SV( P, hs, bw )
plot( sv[0], sv[1], '.-' )
xlabel('Lag [m]')
ylabel('Semivariance')
title('Sample Semivariogram') ;
savefig('sample_semivariogram.png',fmt='png',dpi=200)

Modeling

Now that we’ve calculated the semivariogram, we will need to fit a model to the data. There are three popular models, the spherical, exponential, and the Gaussian. Here, we’ll implement the spherical model. First, we will present a function named opt() for determining the optimal value a for the spherical model.

def opt( fct, x, y, C0, parameterRange=None, meshSize=1000 ):
    if parameterRange == None:
        parameterRange = [ x[1], x[-1] ]
    mse = np.zeros( meshSize )
    a = np.linspace( parameterRange[0], parameterRange[1], meshSize )
    for i in range( meshSize ):
        mse[i] = np.mean( ( y - fct( x, a[i], C0 ) )**2.0 )
    return a[ mse.argmin() ]

The opt() function finds the optimal parameter for fitting a spherical model to the semivariogram data. The spherical model is given by the function spherical(). On the last line we see that spherical() returns itself in a map() function, which seems odd. The idea is that the input h can be a single float value, or list or NumPy array of floats. If h is a single value, then line 9 is called. If h is a list or an array (an iterable) then line 17 is called, which applies line 9 to each value of h.

def spherical( h, a, C0 ):
    '''
    Spherical model of the semivariogram
    '''
    # if h is a single digit
    if type(h) == np.float64:
        # calculate the spherical function
        if h <= a:
            return C0*( 1.5*h/a - 0.5*(h/a)**3.0 )
        else:
            return C0
    # if h is an iterable
    else:
        # calcualte the spherical function for all elements
        a = np.ones( h.size ) * a
        C0 = np.ones( h.size ) * C0
        return map( spherical, h, a, C0 )

Next, cvmodel() fits a model to the semivariogram data and returns a covariance method named covfct().

def cvmodel( P, model, hs, bw ):
    '''
    Input:  (P)      ndarray, data
            (model)  modeling function
                      - spherical
                      - exponential
                      - gaussian
            (hs)     distances
            (bw)     bandwidth
    Output: (covfct) function modeling the covariance
    '''
    # calculate the semivariogram
    sv = SV( P, hs, bw )
    # calculate the sill
    C0 = C( P, hs[0], bw )
    # calculate the optimal parameters
    param = opt( model, sv[0], sv[1], C0 )
    # return a covariance function
    covfct = lambda h, a=param: C0 - model( h, a, C0 )
    return covfct

At this point we’ll plot our model and see if it represents our data well.

sp = cvmodel( P, model=spherical, hs=np.arange(0,10500,500), bw=500 )
plot( sv[0], sv[1], '.-' )
plot( sv[0], sp( sv[0] ) ) ;
title('Spherical Model')
ylabel('Semivariance')
xlabel('Lag [m]')
savefig('semivariogram_model.png',fmt='png',dpi=200)

Simple Kriging

Now that we have a model for the semivariogram, we can write a function to perform the kriging. The fundamental relationship is a matrix equation,

(3) $\begin{equation*} K \lambda = k \Rightarrow \lambda = K^{-1} k \end{equation*}$

Here, $K$ is a matrix of covariances calculated using the spherical model, $\lambda$ is a vector of simple kriging weights, and $k$ is the vector of covariances between the data points and an unsampled point. Our kriging function takes the data set P, the model, the distances hs, the bandwidth bw, the coordinates of the unsampled point u, and the number of surrounding points N to use in the calculation.

def krige( P, model, hs, bw, u, N ):
    '''
    Input  (P)     ndarray, data
           (model) modeling function
                    - spherical
                    - exponential
                    - gaussian
           (hs)    kriging distances
           (bw)    kriging bandwidth
           (u)     unsampled point
           (N)     number of neighboring
                   points to consider
    '''

    # covariance function
    covfct = cvmodel( P, model, hs, bw )
    # mean of the variable
    mu = np.mean( P[:,2] )

    # distance between u and each data point in P
    d = np.sqrt( ( P[:,0]-u[0] )**2.0 + ( P[:,1]-u[1] )**2.0 )
    # add these distances to P
    P = np.vstack(( P.T, d )).T
    # sort P by these distances
    # take the first N of them
    P = P[d.argsort()[:N]]

    # apply the covariance model to the distances
    k = covfct( P[:,3] )
    # cast as a matrix
    k = np.matrix( k ).T

    # form a matrix of distances between existing data points
    K = squareform( pdist( P[:,:2] ) )
    # apply the covariance model to these distances
    K = covfct( K.ravel() )
    # re-cast as a NumPy array -- thanks M.L.
    K = np.array( K )
    # reshape into an array
    K = K.reshape(N,N)
    # cast as a matrix
    K = np.matrix( K )

    # calculate the kriging weights
    weights = np.linalg.inv( K ) * k
    weights = np.array( weights )

    # calculate the residuals
    residuals = P[:,2] - mu

    # calculate the estimation
    estimation = np.dot( weights.T, residuals ) + mu

    return float( estimation )

Calculation

Here, we’ll calculate the kriging estimate at a number of unsampled points.

X0, X1 = P[:,0].min(), P[:,0].max()
Y0, Y1 = P[:,1].min(), P[:,1].max()
Z = np.zeros((80,100))
dx, dy = (X1-X0)/100.0, (Y1-Y0)/80.0
for i in range( 80 ):
    print i,
    for j in range( 100 ):
        Z[i,j] = krige( P, model, hs, bw, (dy*j,dx*i), 16 )

Plotting

Finally, for fun, we’ll try to reproduce one of the graphs found on the site with the data. I kind of guessed on the colormap.

cdict = {'red':   ((0.0, 1.0, 1.0),
                   (0.5, 225/255., 225/255. ),
                   (0.75, 0.141, 0.141 ),
                   (1.0, 0.0, 0.0)),
         'green': ((0.0, 1.0, 1.0),
                   (0.5, 57/255., 57/255. ),
                   (0.75, 0.0, 0.0 ),
                   (1.0, 0.0, 0.0)),
         'blue':  ((0.0, 0.376, 0.376),
                   (0.5, 198/255., 198/255. ),
                   (0.75, 1.0, 1.0 ),
                   (1.0, 0.0, 0.0)) }

my_cmap = matplotlib.colors.LinearSegmentedColormap('my_colormap',cdict,256)

fig, ax = subplots()
H = np.zeros_like( Z )
for i in range( Z.shape[0] ):
    for j in range( Z.shape[1] ):
        H[i,j] = np.round( Z[i,j]*3 )

ax.matshow( H, cmap=my_cmap, interpolation='nearest' )
ax.scatter( z.x/200.0, z.y/200.0, facecolor='none', linewidths=0.75, s=50 )
xlim(0,99) ; ylim(0,80)
xticks( [25,50,75], [5000,10000,15000] )
yticks( [25,50,75], [5000,10000,15000] )

savefig( 'krigingpurple.png', fmt='png', dpi=200 )

27 thoughts on “Simple Kriging in Python”

awallin55 says:

March 20, 2014 at 6:15 PM

how about using a syntax-highlighting plugin!? makes the code more readable:
https://wordpress.org/plugins/wp-syntax/

also whitespace matters in python, and at least my browser shows some additional unwanted line-breaks in the code-windows on this page 🙁
1. cjohnson318 says:
  
  March 20, 2014 at 7:18 PM
  
  I have syntax highlighting, and all the code looks alright to me. What browser are you using?
manejandodatos (@manejandodatos) says:

March 25, 2014 at 10:19 AM

Brilliant!
Patrick Li says:

April 12, 2014 at 12:22 PM

great. Thank you.
Ngoc An says:

June 17, 2014 at 2:00 AM

Could you please show me how to find the file ZoneA.dat?
1. cjohnson318 says:
  
  June 21, 2014 at 6:37 PM
  
  Right, that was obscure. Go to http://people.ku.edu/~gbohling/geostats/index.html, and then click on My tutorial on reservoir modeling with WinGslib, with example data. This will download a ZIP file with the data sets in it.
luisblacuttluis says:

June 26, 2014 at 5:14 PM

Hi, I’m having problems running the
fig, ax = subplots()
line
1. cjohnson318 says:
  
  June 27, 2014 at 10:33 AM
  
  You might need to execute something like “from pylab import *”. I use the IPython notebook interface with the “–pylab=inline” flag at startup, and I forget that that imports a lot of plotting functionality automatically. Let me know if the “from pylab import *” line works for you.
  1. luis says:
    
    June 28, 2014 at 1:00 PM
    
    Hi, the instruction went through, however I get the following error message
    
    Traceback (most recent call last):
    File “<pyshell#10>”, line 1, in
    ax.scatter( z.x, z.y, c=z.por, cmap=’gray’ )
    AttributeError: ‘list’ object has no attribute ‘x’
    
    I’m working on a 64-bit machine with OS Ubuntu 14.04
    1. cjohnson318 says:
      
      June 28, 2014 at 3:15 PM
      
      Okay, that says that (lowercase) z is a list, but at the top we declare (lowercase) z to be a pandas DataFrame. Using a DataFrame isn’t essential, I just like being able to name columns so I can have more readable code. If you want, you can email me your code and data and I can look at it. I’m connor (dot) labaume (dot) johnson (at gmail dot com).
Pingback: Two Dimensional Sequential Gaussian Simulation in Python | Connor Johnson
martin says:

July 1, 2014 at 3:14 AM

I have a problem with your script:

sp.sv[0] (from sp = cvmodel( P, model=spherical, hs=np.arange(0,10500,500), bw=500 ))

so.sv[0]
Traceback (most recent call last):
File “”, line 1, in
AttributeError: ‘function’ object has no attribute ‘sv’

For me, the correct formulation is :

sp(sv[0])

After:with model=spherical

sp.model( sp.sv[0] ) ) is sp(model(sp.sv[0])) = sp(spherical(sp.sv[0]))

TypeError: spherical() takes exactly 3 arguments (1 given)

effectively from def spherical( h, a, C0 ):

Thanks
1. cjohnson318 says:
  
  July 2, 2014 at 11:13 AM
  
  Hi Martin, I simplified the code a few weeks ago from a more OOP approach to the present state in order to save time. (It didn’t save any time.) I accidentally forgot to update some of the examples. I’m apologize for the inconvenience. I also have code up at https://github.com/cjohnson318/geostatistics
  1. Martin says:
    
    July 2, 2014 at 12:20 PM
    
    Thanks,I know, but how to plot plot( sv[0], sp( sv[0] ) ) ?
    1. cjohnson318 says:
      
      July 2, 2014 at 12:43 PM
      
      Oh, okay. Try adding “from pylab import *” before you plot.
      1. Martin says:
        
        July 2, 2014 at 2:53 PM
        
        Sorry, but It is not the problem of matplotlib, it is the problem of your new solution:
        
        plot( sv[0], sp( sv[0] ) )
        
        The resulting curve is horizontally reflected
        
        http://i.imgur.com/lnlfFfp.png
      2. cjohnson318 says:
        
        July 2, 2014 at 3:20 PM
        
        Oh, right. So, the model fits the empirical semivariogram, but the output is flipped because it is modeling the covariance described by the semivariogram. For small distances the semivariogram is low, but the covariance is high. For greater distances, the semivariogram is high, but the covariance is low.
Martin says:

July 3, 2014 at 6:41 AM

Many thanks, it works now with spherical (an also with exponential and gaussian from your https://github.com/cjohnson318/geostatistics/blob/master/krige.py script)

A little correction in the function krige:
you need to put K = np.array(K) between the lines 29 and 31 ,

K = covfct( K.ravel() ) K =np.array(K) #reshape into an array K = K.reshape(N,N)

otherwise:

K = K.reshape(N,N)
AttributeError: ‘list’ object has no attribute ‘reshape’

because originally: type(K) = list()
1. cjohnson318 says:
  
  July 3, 2014 at 8:48 AM
  
  Excellent! Thanks! I’ll definitely fix that!
2. Javier Jofré says:
  
  August 6, 2014 at 7:56 AM
  
  What happened with the Python code?
3. guido says:
  
  November 20, 2014 at 6:25 AM
  
  i have the same problem of the flipped semivariogram, the link https://github.com/cjohnson318/geostatistics/blob/master/krige.py it’s broken, where can I find the files kriege.py? can you help me?
  thanks in advance
  1. cjohnson318 says:
    
    November 22, 2014 at 3:23 PM
    
    Hi, I’m sorry, try this link: https://github.com/cjohnson318/geostatsmodels. There’s some more tutorial material there also.
    1. Guido says:
      
      November 28, 2014 at 9:22 AM
      
      Hi, tank you, I have studied the intro to geostatistcs and variogram analysis, I may have found the problem, line 19 of cvmodel function
      covfct = lambda h, a=param: C0 – model( h, a, C0 )
      should be changed in
      covfct = lambda h, a=param: model( h, a, C0 )
      what do you think about? may be the solution?
      thanks in advance
      1. cjohnson318 says:
        
        November 28, 2014 at 12:34 PM
        
        I think the semivariogram and the covariance function are always flipped versions of each other. The semivariogram shows how the variance increases with distance, and the covariance function describes how covariance decreases with distance. You examine the semivariogram to decide on a model, and the sill and nugget parameters, but internally kriging uses a covariance function, which is a flipped version of the semivariogram.
Jesús Gómez says:

July 25, 2014 at 2:15 PM

Great. Just to bring you back something: there are little changes to make all that code work on Python3:

the map call used in the spherical function definition should be wrapped with a list call: return list(map( spherical, h, a, C0 )). This is due Python3 map function returns a iterator and not a list directly.
The print statement when calculating the kriging estimate should be substituted with a print function call: print(i). In Python3, the print statement is gone and substituted with a print function.

Additionally, you should point that before “calculating the kriging estimate” the variable model must be set to any of the model functions: in this case, the spherical model. model = spherical.

Thank you.
Patrick Ng says:

January 29, 2015 at 11:41 AM

A nice and easy to follow geostats example. Attaboy Mr. Johnson.
John says:

May 2, 2015 at 11:03 PM

It is easy to understand, excellent explanation,
great tool. Thank you very much !!!

Comments are closed.