In this post I’ll present the z-score forward and backward transforms used in Sequential Gaussian Simulation, to be discussed at a later date. Some geostatistical algorithms assume that data is normally distributed, but interesting data is rarely normally distributed. The solution: force normality, or quasi-normality. All of this is loosely based on Clayton V. Deutsch’s work on the GSLIB library, and his books.
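As a rough illustration of the idea, here is a minimal rank-based sketch of a forward and backward transform using only the standard library. This is not GSLIB's implementation: the function names, the `(rank + 0.5) / n` plotting position, and the clamping of out-of-range scores to the data extremes are all assumptions made for the example.

```python
import bisect
from statistics import NormalDist


def forward_transform(values):
    """Map data to normal scores by rank; also return a (score, value) lookup table."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # Assumed plotting position: keeps the probability strictly inside (0, 1)
        p = (rank + 0.5) / n
        scores[i] = std_normal.inv_cdf(p)
    table = sorted(zip(scores, values))
    return scores, table


def backward_transform(scores, table):
    """Map normal scores back to data units by linear interpolation in the table."""
    zs = [z for z, _ in table]
    vs = [v for _, v in table]
    out = []
    for z in scores:
        if z <= zs[0]:
            out.append(vs[0])      # clamp below the smallest tabulated score
        elif z >= zs[-1]:
            out.append(vs[-1])     # clamp above the largest tabulated score
        else:
            j = bisect.bisect_right(zs, z)  # zs[j - 1] <= z < zs[j]
            t = (z - zs[j - 1]) / (zs[j] - zs[j - 1])
            out.append(vs[j - 1] + t * (vs[j] - vs[j - 1]))
    return out


data = [3.2, 0.1, 7.5, 1.4, 2.2]
scores, table = forward_transform(data)
recovered = backward_transform(scores, table)
```

Scores that coincide with tabulated values map back to the original data exactly; scores in between (such as simulated values) are interpolated.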

# Tag Archives: Python

# Traversing a Directory Tree in Python and Go(lang)

In this post I’ll discuss the basics of walking through a directory tree in Python and Go. If you are dealing with a smaller directory, it may be more convenient to use Python. If you are dealing with a larger directory containing hundreds of subdirectories and thousands of files, you may want to look into using Go, or another compiled language. I enjoy using Go because it compiles quickly, and it doesn’t use pointer arithmetic.
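On the Python side, the standard tool for this is `os.walk`. The sketch below is only an illustration: the `summarize_tree` helper and the throwaway demo tree are made up for this example.

```python
import os
import tempfile


def summarize_tree(root):
    """Return (subdirectory count, file count, total bytes) for everything under root."""
    n_dirs = n_files = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # e.g. a broken symlink; skip it rather than abort the walk
    return n_dirs, n_files, total_bytes


# Demo on a small temporary tree: root/top.txt, root/a/mid.txt, root/a/b/leaf.txt
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in ("top.txt", os.path.join("a", "mid.txt"), os.path.join("a", "b", "leaf.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("hello")
    demo = summarize_tree(root)

print(demo)  # (2, 3, 15)
```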

# An Iterative Closest Point Algorithm

In this post I’ll demonstrate an iterative closest point (ICP) algorithm that works reasonably well. An ICP algorithm seeks to find a transformation between two sets of points that minimizes the error between them, i.e., you are trying to find a transformation that will lay one set of points exactly on top of another.

# Puzzle: The Wolf, the Goat, and the Cabbage

In this post I’ll present a solution to a puzzle using Python. I think the primary value of this post is that it provides an example of how to translate an objective and a set of constraints into data structures and functions that can be interpreted by a computer. This problem breaks down into two interrelated parts:

- Translate the problem into data structures and functions
- Choose a strategy for finding the solution
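One possible way to carry out those two parts is sketched below. It is not necessarily the approach taken in the full post: the state encoding (the set of items on the left bank plus a flag for the farmer's side) and the choice of breadth-first search as the strategy are assumptions made for this example.

```python
from collections import deque

ITEMS = frozenset({"wolf", "goat", "cabbage"})
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]


def is_safe(bank):
    """A bank left without the farmer is safe if no eater/eaten pair is on it."""
    return not any(pair <= bank for pair in UNSAFE)


def neighbors(state):
    """Yield (cargo, next_state) for every legal crossing from state."""
    left, farmer_left = state
    here = left if farmer_left else ITEMS - left
    for cargo in [None, *sorted(here)]:  # None means the farmer crosses alone
        new_left = set(left)
        if cargo is not None:
            if farmer_left:
                new_left.discard(cargo)
            else:
                new_left.add(cargo)
        new_left = frozenset(new_left)
        # Only the bank the farmer just left needs checking
        behind = new_left if farmer_left else ITEMS - new_left
        if is_safe(behind):
            yield cargo, (new_left, not farmer_left)


def solve():
    """Breadth-first search from everything on the left to everything on the right."""
    start = (ITEMS, True)
    goal = (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for cargo, nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [cargo]))
    return None


solution = solve()  # list of cargo per crossing; None marks an empty-handed trip
```

Because breadth-first search explores states in order of the number of crossings, the first path it returns is a shortest one.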

# Spatial Statistical Hypothesis Testing

In this post I’ll consider performing a local hypothesis test for a difference in means with spatial data. I do not know if this is the optimal way to go about this sort of thing, but I have not yet found another solution. I think the best way to describe the problem is to consider the artificial data, and then wade through the code.

# Classical Hypothesis Testing, Statistical Power, and Type-II Errors

This is one of the fundamental tasks in science. You do a study, and then you have to determine if there is a statistically meaningful difference between the test and control data. It is important to understand hypothesis testing, because a lot of interesting functions in R are hypothesis tests. I’ll consider the simple z-test for testing whether the mean of the sample is the same as the hypothesized mean of the population. We’ll see how statistical power, which is the probability of detecting a difference in means, changes with sample size and with effect size, which is the size of the difference between the observed sample mean and the hypothesized population mean. We’ll also see that the significance level is comparable to the Type-II (false negative) error rate.
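To make the dependence on sample size and effect size concrete, here is a small sketch in Python of the textbook power formula for a two-sided, one-sample z-test with known standard deviation. The function name and its parameters are assumptions made for this example, not code from the full post.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect: true mean minus hypothesized mean
    sigma:  known population standard deviation
    n:      sample size
    alpha:  significance level
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * sqrt(n) / sigma  # mean of the test statistic under the alternative
    # Probability that the test statistic lands in either rejection tail
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


power_small_n = z_test_power(effect=0.5, sigma=1.0, n=16)
power_large_n = z_test_power(effect=0.5, sigma=1.0, n=64)
type_ii_rate = 1 - power_large_n  # false negative rate at n = 64
```

With the effect and standard deviation held fixed, a larger sample gives higher power and therefore a lower Type-II error rate.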

# Develop Windows Executables from Python Scripts for 32-bit and 64-bit Architectures

In this post I’ll discuss building a Windows executable from a Python script for 32-bit and 64-bit Windows. Producing a 64-bit executable on a 64-bit machine in Windows is easy using PyInstaller, but producing a 32-bit executable on a 64-bit machine takes some tinkering. I ended up setting up a chroot environment on Ubuntu for this task.

# Using PyBrain for Optical Character Recognition (First Whack)

This is my first whack at using PyBrain for optical character recognition. I am limiting myself to numerical data, since that’s what I have lying around needing to be optically recognized the most. I’m also focusing on extra small and heavily corrupted data.

# Tkinter Optical Character Recognition Training Data Labeler

In this post I’ll demonstrate how to build an object-oriented Tkinter GUI application for associating labels with filenames in order to quickly and easily build a set of training data. The *Submit* button will associate the label with the file, and the *Save and Quit* button will dump the file and its associated label into a Python dict, and then a cPickle file for later use. This is still a little rough around the edges; it assumes that you’re looking for PNG data in the current directory, and the output overwrites previous output, but it’s a start.

# Open an Excel File in Pandas

In this post I’ll demonstrate how to open an Excel file in Python using Pandas, a (the) module for data manipulation. I love using Pandas, and I cannot recommend it enough.