In this post I’ll look at replicating Hadley Wickham’s gather() tool from his tidyr package using the pandas melt() function. Why would anyone want to do this? Well, Dr. Wickham’s work is beautiful, and the pandas.melt() function is not as elegant as the tidyr::gather() function. You may read Dr. Wickham’s pre-print paper here.
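As a quick illustration of the correspondence (a sketch with made-up data, not the code from the full post), a tidyr call such as `gather(wide, year, value, -name)` maps roughly onto `pd.melt` with its `id_vars`, `var_name`, and `value_name` parameters:

```python
import pandas as pd

# A small, made-up "wide" table: one row per person, one column per year.
wide = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "2014": [10, 20],
    "2015": [11, 21],
})

# Roughly what tidyr::gather(wide, year, value, -name) does in R:
# keep "name" as an identifier and stack the remaining columns into
# a key column ("year") and a value column ("value").
long = pd.melt(wide, id_vars=["name"], var_name="year", value_name="value")
print(long)
```

Where gather() takes the columns to exclude (`-name`), melt() takes the columns to keep fixed (`id_vars`); every other column is stacked.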
Continue reading tidyr and pandas: Gather and Melt →
One of my favorite things about pandas is that you can easily combine temporal data sets that use different time scales. Behind the scenes, pandas fills in the empty gaps with null values, and then quietly skips those null values when you make a scatter plot or compute a statistic such as a mean. (A rolling mean is the exception: by default, any window containing a null value returns null unless you lower min_periods.) It takes so much tedious book-keeping out of the data analysis process.
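A minimal sketch of that behavior, using two made-up series (one daily, one every second day) rather than the data from the full post:

```python
import numpy as np
import pandas as pd

# Two made-up series sampled on different time scales.
daily = pd.Series(
    np.arange(6, dtype=float),
    index=pd.date_range("2015-01-01", periods=6, freq="D"),
    name="daily",
)
every_other = pd.Series(
    [100.0, 200.0, 300.0],
    index=pd.date_range("2015-01-01", periods=3, freq="2D"),
    name="every_other",
)

# Side-by-side concatenation aligns on the union of the timestamps;
# days missing from `every_other` are filled with NaN.
combined = pd.concat([daily, every_other], axis=1).sort_index()

# Plain reductions skip NaN by default (skipna=True).
print(combined["every_other"].mean())

# A rolling mean only skips NaN when windows may be partially empty.
smoothed = combined["every_other"].rolling(window=3, min_periods=1).mean()
print(smoothed)
```

With the default `min_periods` (equal to the window size), every 3-day window here contains a NaN, so the rolling mean would be NaN throughout.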
Continue reading Concatenating and Visualizing Data in Pandas →
In this post I’ll demonstrate an iterative closest point (ICP) algorithm that works reasonably well. An ICP algorithm seeks the transformation between two sets of points that minimizes the error between them, i.e., you are trying to find a transformation that will lay one set of points as closely as possible on top of the other.
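The post’s own implementation isn’t reproduced here. As a hedged sketch, one common textbook variant alternates brute-force nearest-neighbour matching with a least-squares rigid fit computed by SVD (the Kabsch method). The function names are hypothetical, and the code assumes NumPy arrays of shape (n, d) with d = 2 or 3:

```python
import numpy as np

def best_fit_transform(A, B):
    """Least-squares rotation R and translation t mapping points A onto B (Kabsch/SVD)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)          # cross-covariance of the centred sets
    U, S, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # avoid returning a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cb - R @ ca
    return R, t

def icp(source, target, max_iter=50, tol=1e-10):
    """Iteratively align `source` to `target`; returns (R, t, aligned source)."""
    src = source.astype(float).copy()
    prev_err = np.inf
    for _ in range(max_iter):
        # Brute-force nearest neighbour in `target` for every source point.
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        idx = d.argmin(axis=1)
        err = d[np.arange(len(src)), idx].mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
        # Rigid fit to the current matches, then move the source points.
        R, t = best_fit_transform(src, target[idx])
        src = src @ R.T + t
    # Overall transform from the original source to its final position.
    R, t = best_fit_transform(source, src)
    return R, t, src
```

Brute-force matching costs O(n·m) per iteration; for larger point sets a k-d tree (e.g. `scipy.spatial.cKDTree`) is the usual replacement. Like any ICP, this only converges to the right answer when the starting misalignment is small enough for the nearest-neighbour matches to be mostly correct.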
Continue reading An Iterative Closest Point Algorithm →
Continue reading Using THREE.js to Visualize 3D Point Sets →
In this post I’ll consider performing a local hypothesis test for a difference in means with spatial data. I do not know if this is the optimal way to go about this sort of thing, but I have not yet found another solution. I think the best way to describe the problem is to consider the artificial data, and then wade through the code.
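The post walks through its own code. As a stand-in, here is one simple way to run a local test for a difference in means: compare the points within a chosen radius of a location against all the others, and get a p-value from a permutation test. This is an assumption-laden sketch (Euclidean distance, a fixed radius, and the hypothetical function name are all choices made here), not the method from the post:

```python
import numpy as np

def local_mean_difference_test(coords, values, center, radius, n_perm=2000, seed=0):
    """Permutation test: do points within `radius` of `center` differ in mean from the rest?

    Returns (observed difference in means, two-sided permutation p-value).
    """
    rng = np.random.default_rng(seed)
    inside = np.linalg.norm(coords - center, axis=1) <= radius
    if inside.all() or not inside.any():
        raise ValueError("the neighbourhood must split the data into two non-empty groups")
    observed = values[inside].mean() - values[~inside].mean()

    # Shuffle the inside/outside labels to build the null distribution.
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(inside)
        diff = values[perm].mean() - values[~perm].mean()
        if abs(diff) >= abs(observed):
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

Note that a plain permutation test treats the observations as exchangeable, so it ignores spatial autocorrelation; with strongly autocorrelated data the p-values can be optimistic.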
Continue reading Spatial Statistical Hypothesis Testing →
This is my first whack at using PyBrain for optical character recognition. I am limiting myself to numerical data, since that’s what I have lying around most in need of optical recognition. I’m also focusing on extra-small, heavily corrupted data.
Continue reading Using PyBrain for Optical Character Recognition (First Whack) →
In this post I’ll demonstrate how to build an object-oriented Tkinter GUI application for associating labels with filenames in order to quickly and easily build a set of training data. The Submit button associates the label with the file, and the Save and Quit button stores each filename and its label in a Python dict and then dumps that dict to a cPickle file for later use. This is still a little rough around the edges: it assumes that you’re looking for PNG data in the current directory, and each save overwrites the previous output, but it’s a start.
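The GUI wiring isn’t shown here. As a hedged sketch of the bookkeeping behind the two buttons, the hypothetical `LabelStore` class below gathers the PNG files in a directory, records labels, and pickles the resulting dict. It uses Python 3’s `pickle`, which replaced the Python 2 `cPickle` module the post mentions:

```python
import glob
import os
import pickle

class LabelStore:
    """Hypothetical non-GUI core of the labeler.

    A Tkinter front end would call submit() from the Submit button and
    save() from the Save and Quit button.
    """

    def __init__(self, directory="."):
        # Like the post's tool, only look for PNG files in a single directory.
        self.files = sorted(glob.glob(os.path.join(directory, "*.png")))
        self.labels = {}

    def submit(self, filename, label):
        # Associate (or re-associate) a label with one file.
        self.labels[filename] = label

    def save(self, path="labels.pkl"):
        # Overwrites any previous output, as noted in the post.
        with open(path, "wb") as fh:
            pickle.dump(self.labels, fh)
```

Keeping this logic out of the widget class makes it easy to test without a display; the Tkinter class then only reads the entry field and calls these two methods.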
Continue reading Tkinter Optical Character Recognition Training Data Labeler →
In this post I’ll demonstrate how to open an Excel file in Python using pandas, a (the) module for data manipulation. I love using pandas, and I cannot recommend it enough.
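A minimal sketch of the core call (not the post’s own code): `pd.read_excel` reads one sheet by default, and `sheet_name=None` returns every sheet as a dict of DataFrames. The helper name is hypothetical, and reading `.xlsx` files assumes an engine such as openpyxl is installed:

```python
import pandas as pd

def read_workbook(path):
    """Return {sheet name: DataFrame} for every sheet in an Excel workbook."""
    # sheet_name=None asks pandas for all sheets rather than just the first.
    return pd.read_excel(path, sheet_name=None)
```

For a single sheet, `pd.read_excel(path, sheet_name="Sheet1")` (or a 0-based sheet position) returns one DataFrame directly.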
Continue reading Open an Excel File in Pandas →
The Google Maps API lets you query elevation data using WGS84 coordinates. All you have to do is construct a URL with the coordinates, and Google will return a JSON file. A JSON file is basically a text file with some extra structure, in the form of keys, brackets, braces, colons, and commas.
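The post does this in PowerShell. For consistency with the other examples, here is a hedged Python sketch of the same two steps: building the request URL and pulling elevations out of the JSON reply. The endpoint and the `results`/`elevation`/`status` fields follow Google’s Elevation API documentation as I understand it; the `key` parameter reflects the assumption that current versions of the API require an API key. The network request itself is left out:

```python
import json
from urllib.parse import urlencode

ELEVATION_ENDPOINT = "https://maps.googleapis.com/maps/api/elevation/json"

def build_elevation_url(points, api_key):
    """points: iterable of (lat, lng) pairs in WGS84 decimal degrees."""
    # Multiple locations are separated by "|" in a single `locations` parameter.
    locations = "|".join(f"{lat},{lng}" for lat, lng in points)
    return ELEVATION_ENDPOINT + "?" + urlencode({"locations": locations, "key": api_key})

def parse_elevations(json_text):
    """Return the elevation values from an Elevation API JSON response."""
    data = json.loads(json_text)
    if data.get("status") != "OK":
        raise RuntimeError(f"request failed: {data.get('status')}")
    return [result["elevation"] for result in data["results"]]
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded body to `parse_elevations` completes the round trip.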
Continue reading Query Google Maps API Using Windows PowerShell →
This is the first in a series of posts using the small data sets from The Handbook of Small Data Sets to illustrate introductory techniques in text processing, plotting, statistics, etc. The data sets are collected in a ZIP file on the publisher’s website. Someone decided to format the data files to resemble the published format as closely as possible, which makes parsing the files interesting. First, we will import our modules,
Continue reading Small Data: Germinating Seeds →