A co-worker was interested in segmenting a list of data points, and I went down a rabbit hole on one-dimensional segmentation. I found an article on the Jenks natural breaks optimization on Wikipedia, and another article that had some examples. The method bins data points so that the members of a natural cluster always land in the same bin. There is an iterative method that takes unordered data, but this implementation just sorts the data before binning.
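To make the idea concrete, here is a minimal sketch of sort-then-bin segmentation. It is not the code from the post: the function name `natural_breaks` is hypothetical, and I am assuming the usual objective of minimizing the total within-bin sum of squared deviations, solved here with a plain dynamic program over the sorted data.

```python
def natural_breaks(values, n_classes):
    """Split the sorted data into n_classes contiguous bins that minimize
    the total within-bin sum of squared deviations (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviation of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in data:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j points into k bins.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk the split table backwards to recover the bins.
    bins, j = [], n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        bins.append(data[i:j])
        j = i
    bins.reverse()
    return bins


print(natural_breaks([50, 1, 11, 2, 51, 10, 3, 12], 3))
# -> [[1, 2, 3], [10, 11, 12], [50, 51]]
```

This version is cubic in the worst case, which is fine for small lists; the point is only that sorting first turns the problem into choosing break positions.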
In this post I will discuss an implementation of sequential Gaussian simulation (SGS) from the field of geostatistics. Geostatistics is simply the statistical treatment of spatially distributed data. Sequential Gaussian simulation is a technique used to “fill in” a grid representing the area of interest using a smattering of observations and a model of the observed trend. The basic workflow incorporates three steps:
- Modeling the measured variation using a semivariogram
- Using the semivariogram to perform interpolation by kriging
- Running simulations to estimate the spatial distribution of the variable(s) of interest
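As a rough illustration of the first step, the sketch below computes an empirical semivariogram: for each lag, half the mean squared difference between pairs of observations separated by roughly that distance. The function name, the `tol` parameter, and the toy data are my own assumptions for illustration, not the code used in the post.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lags, tol):
    """Return one semivariance per lag (hypothetical helper).

    A pair of points contributes to a lag when its separation distance
    is within tol of that lag. Lags with no pairs yield NaN.
    """
    sums = [0.0] * len(lags)
    counts = [0] * len(lags)
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        d = math.dist(p, q)
        for k, h in enumerate(lags):
            if abs(d - h) <= tol:
                sums[k] += (zp - zq) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else float("nan") for s, c in zip(sums, counts)]


# Toy data: four points on a line with a steady upward trend.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [0.0, 1.0, 2.0, 3.0]
print(empirical_semivariogram(coords, values, lags=[1, 2, 3], tol=0.1))
# -> [0.5, 2.0, 4.5]
```

A model (spherical, exponential, and so on) would then be fitted to these points before kriging.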
In this post I’ll present the z-score forward and backward transforms used in sequential Gaussian simulation, which I will discuss at a later date. Some geostatistical algorithms assume that data is normally distributed, but interesting data is rarely normally distributed. Solution: force normality, or quasi-normality. All of this is loosely based on Clayton V. Deutsch’s work on the GSLIB library, and his books.
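One common way to force normality is a rank-based normal score transform, sketched below. This is an assumption on my part about the general approach, not a copy of GSLIB or of the code in the post: the function names and the `(rank + 0.5) / n` plotting position are illustrative choices. The forward transform keeps a lookup table so the backward transform can interpolate back to data units.

```python
from bisect import bisect_right
from statistics import NormalDist


def forward_transform(values):
    """Map each value to a standard normal score based on its rank.

    Returns the scores (in the input order) and a lookup table of
    (value, score) pairs sorted by value, for the backward transform.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # Plotting position strictly inside (0, 1), so inv_cdf is defined.
        scores[i] = standard_normal.inv_cdf((rank + 0.5) / n)
    table = [(values[i], scores[i]) for i in order]
    return scores, table


def backward_transform(scores, table):
    """Map normal scores back to data units by linear interpolation.

    Scores outside the table are clamped to the smallest or largest value.
    """
    xs = [v for v, _ in table]
    ys = [s for _, s in table]
    out = []
    for s in scores:
        if s <= ys[0]:
            out.append(xs[0])
        elif s >= ys[-1]:
            out.append(xs[-1])
        else:
            j = bisect_right(ys, s)  # ys[j - 1] <= s < ys[j]
            t = (s - ys[j - 1]) / (ys[j] - ys[j - 1])
            out.append(xs[j - 1] + t * (xs[j] - xs[j - 1]))
    return out


data = [3.0, 100.0, 1.0, 7.0, 2.0]  # strongly skewed toy data
scores, table = forward_transform(data)
recovered = backward_transform(scores, table)
```

The round trip returns the original values, and a simulated score that falls between two table entries comes back as a value between the two corresponding data points.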
In this post I will work through an example of simple kriging. Kriging is a set of techniques for interpolation. It differs from other interpolation techniques in that it sacrifices smoothness for the integrity of sampled points. Most interpolation techniques will overshoot or undershoot the value of the function at sampled locations, but kriging honors those measurements and keeps them fixed. In future posts I would like to cover other types of kriging, other semivariogram models, and colocated co-kriging. Until then, I’m keeping relatively up-to-date code in my GitHub project, geostatsmodels.
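For readers who want to see the shape of the calculation, here is a minimal simple kriging sketch using NumPy. It is not taken from geostatsmodels: the function names, the exponential covariance model, and its `sill` and `corr_range` parameters are assumptions chosen for illustration, and the mean is assumed known, as simple kriging requires.

```python
import numpy as np


def exponential_covariance(h, sill=1.0, corr_range=10.0):
    """Exponential covariance model (assumed here for illustration)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / corr_range)


def simple_kriging(coords, values, target, mean, cov=exponential_covariance):
    """Estimate the value at target from scattered samples.

    Solves K w = k for the weights, where K holds covariances between
    samples and k holds covariances between each sample and the target.
    Returns the estimate and the kriging variance.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)

    d_samples = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_target = np.linalg.norm(coords - target, axis=-1)

    K = cov(d_samples)
    k = cov(d_target)
    weights = np.linalg.solve(K, k)

    estimate = mean + weights @ (values - mean)
    variance = cov(0.0) - weights @ k
    return estimate, variance


coords = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
values = [1.0, 3.0, 2.0]

# At a sampled location the estimate reproduces the measurement.
at_sample, var_at_sample = simple_kriging(coords, values, [5.0, 0.0], mean=2.0)

# Far from every sample the estimate falls back to the known mean.
far_away, var_far_away = simple_kriging(coords, values, [1000.0, 1000.0], mean=2.0)
```

The first call shows the property described above: the estimate at a sampled point equals the measured value and the kriging variance there is essentially zero.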