Z-score Transform for Geostatistics

In this post I’ll present the z-score forward and backward transforms used in Sequential Gaussian Simulation, to be discussed at a later date. Some geostatistical algorithms assume that data is distributed normally, but interesting data is generally never normally distributed? Solution: force normality, or quasi-normality. All of this is loosely based on Clayton V. Deutsche’s work on the GSLIB library, and his books.

The first function creates a cumulative distribution function, and its inverse, for a given data set d. The CDF and its inverse are reported as <Nx2> arrays, where N is the number of data points in d, the input data set.

def cdf( d, bins=12 ):
    N = len( d )
    counts, intervals = np.histogram( d, bins=bins )
    h = np.diff( intervals ) / 2.0
    f, finv = np.zeros((N,2)), np.zeros((N,2))
    idx, k, T = 0, 0, float( np.sum( counts ) )
    for count in counts:
        for i in range( count ):
            x = intervals[idx]+h[0]
            y = np.cumsum( counts[:idx+1] )[-1] / T
            f[k,:] = x, y 
            finv[k,:] = y, x
            k += 1
        idx += 1
    return f, finv

This next awkward function returns an interpolation function for the CDFs created by the previous function. It’s a function that returns a function, thanks to functions being first class citizens in Python land. The basis of this function is the scipy.interpolate.interp1d() function, but since this function throws an error when you give it a value outside the original interpolation range, I’ve taken care of those boundary conditions. I could have alternatively used a try/except structure, and I may yet, but this works for now.

def fit( d ):
    x, y = d[:,0], d[:,1]
    def f(t):
        if t <= x.min():
            return y[ np.argmin(x) ]
        elif t >= x.max():
            return y[ np.argmax(x) ]
        else:
            intr = scipy.interpolate.interp1d( x, y )
            return intr(t)
    return f

Finally, here are my transformation and back-transformation functions. Depending on the original distribution of the data, the transformation to a standard normal distribution may or may not work out so nicely.

# transform data to normal dist
def to_norm( data, bins=12 ):
    mu = np.mean( data )
    sd = np.std( data )
    z = ( data - mu ) / sd
    f, inv = cdf( z, bins=bins )
    z = scipy.stats.norm(0,1).ppf( f[:,1] )
    z = np.where( z==np.inf, np.nan, z )
    z = np.where( np.isnan( z ), np.nanmax( z ), z )
    param = ( mu, sd )
    return z, inv, param, mu, sd

# transform data from normal dist back
def from_norm( data, inv, param, mu, sd ):
    h = fit( inv )
    f = scipy.stats.norm(0,1).cdf( data )
    z = [ h(i)*sd + mu for i in f ]
    return z

So that’s it. I’ll revisit this code and make it nicer and put comments in it, but this is a working example. Tomorrow I’ll work on the two dimensional Sequential Gaussian Simulation.

Connor Johnson

Z-score Transform for Geostatistics

One thought on “Z-score Transform for Geostatistics”

Blog about math, programming, and data.