In this post I’ll look at replicating Hadley Wickham’s `gather()` tool from his `tidyr` package using the pandas `melt()` function. Why would anyone want to do this? Well, Dr. Wickham’s work is beautiful, and the `pandas.melt()` function is not as elegant as the `tidyr::gather()` function. You may read Dr. Wickham’s pre-print paper here.
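As a rough illustration of the correspondence, here is a minimal sketch using a made-up wide table; the column names `country`, `year`, and `cases` are assumptions for the example, not data from the post.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "1999": [10, 30],
    "2000": [20, 40],
})

# Roughly what tidyr's gather(wide, key = "year", value = "cases", -country)
# would do: keep `country` fixed and stack the year columns into key/value pairs.
long = pd.melt(wide, id_vars=["country"], var_name="year", value_name="cases")
print(long)
```

Note that `melt()` asks for the columns to keep (`id_vars`), whereas `gather()` lets you name the columns to collect or exclude.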

# Tag Archives: Data

# Concatenating and Visualizing Data in Pandas

One of my favorite things about pandas is that you can easily combine temporal data sets using different time scales. Behind the scenes, pandas will fill in the empty gaps with null values, and then quietly ignore those null values when you want to make a scatter plot or do some other computation, like a rolling mean. It takes *so much* tedious book-keeping out of the data analysis process.
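A small sketch of that behavior, using two invented series (one daily, one every other day); the names `daily` and `sparse` are placeholders for the example.

```python
import numpy as np
import pandas as pd

# Two hypothetical series on different time scales.
daily = pd.Series(
    np.arange(6, dtype=float),
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
    name="daily",
)
sparse = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range("2020-01-01", periods=3, freq="2D"),
    name="sparse",
)

# Align on the union of both indexes; dates missing from `sparse` become NaN.
combined = pd.concat([daily, sparse], axis=1)

# The rolling mean skips NaN inside each window; min_periods=1 means a single
# valid observation is enough to produce a value.
rolling = combined["sparse"].rolling(3, min_periods=1).mean()
print(combined.assign(sparse_rolling=rolling))
```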

# An Iterative Closest Point Algorithm

In this post I’ll demonstrate an iterative closest point (ICP) algorithm that works reasonably well. An ICP algorithm seeks to find a transformation between two sets of points that minimizes the error between them, i.e., you are trying to find a transformation that will lay one set of points exactly on top of another.
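To make the idea concrete, here is a minimal sketch of one common ICP variant: brute-force nearest-neighbour matching followed by a least-squares rigid fit via the SVD. This is an assumption about the general technique, not necessarily the approach taken in the post, and the function names and test points are invented for the example.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping paired src onto dst."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against returning a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_mean - R @ src_mean
    return R, t

def icp(src, dst, n_iter=20):
    """Iteratively move src toward dst; returns the transformed copy of src."""
    current = src.copy()
    for _ in range(n_iter):
        # Match each current point to its closest point in dst (brute force).
        d2 = ((current[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        # Fit the rigid transform for these matches and apply it.
        R, t = best_rigid_transform(current, matched)
        current = current @ R.T + t
    return current

# Hypothetical data: dst is a small 2D point set, src is a slightly
# rotated and shifted copy of it.
dst = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [0.0, 2.0], [3.0, 3.0]])
theta = 0.05
R0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
src = dst @ R0.T + np.array([0.1, -0.05])

aligned = icp(src, dst)
print(np.abs(aligned - dst).max())  # largest remaining coordinate error
```

The brute-force matching step is quadratic in the number of points; for larger sets a spatial index such as a k-d tree is the usual replacement.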

# Using THREE.js to Visualize 3D Point Sets

In this post I’ll describe a bare-bones application for inputting and displaying three-dimensional data in the browser. The data is entered through HTML fields, parsed with JavaScript, and then plotted using the THREE.js library. The input is split on whitespace, so colleagues can copy-paste data from Excel into the fields for plotting.

# Spatial Statistical Hypothesis Testing

In this post I’ll consider performing a local hypothesis test for a difference in means with spatial data. I do not know if this is the optimal way to go about this sort of thing, but I have not yet found another solution. I think the best way to describe the problem is to consider the artificial data, and then wade through the code.

# Using PyBrain for Optical Character Recognition (First Whack)

This is my first whack at using PyBrain for optical character recognition. I am limiting myself to numerical data, since that’s what I have lying around needing to be optically recognized the most. I’m also focusing on extra-small, heavily corrupted data.

# Tkinter Optical Character Recognition Training Data Labeler

In this post I’ll demonstrate how to build an object-oriented Tkinter GUI application for associating labels with filenames in order to quickly and easily build a set of training data. The *Submit* button will associate the label with the file, and the *Save and Quit* button will dump the file and its associated label into a Python dict, and then a cPickle file for later use. This is still a little rough around the edges; it assumes that you’re looking for PNG data in the current directory, and the output overwrites previous output, but it’s a start.

# Open an Excel File in Pandas

In this post I’ll demonstrate how to open an Excel file in Python using Pandas, a (the) module for data manipulation. I love using Pandas, and I cannot recommend it enough.

# Query Google Maps API Using Windows PowerShell

The Google Maps API lets you query elevation data using WGS84 coordinates. All you have to do is construct a URL with the coordinates, and then Google will return a JSON response. JSON is basically plain text with some extra structure, in the form of some keywords, brackets, braces, commas, and colons.
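The post itself uses PowerShell; the same two steps (build the URL, read the JSON) can be sketched in Python as follows. The endpoint and the `results`/`elevation` field names are my understanding of the Elevation API and should be checked against Google's documentation; the API key, the helper names, and the sample payload are placeholders invented for the example, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the Google Maps Elevation API.
BASE_URL = "https://maps.googleapis.com/maps/api/elevation/json"

def build_elevation_url(lat, lng, api_key):
    """Build a query URL for one WGS84 coordinate (hypothetical helper)."""
    query = urlencode({"locations": f"{lat},{lng}", "key": api_key})
    return f"{BASE_URL}?{query}"

def parse_elevations(payload):
    """Pull the elevation values out of a JSON response string."""
    data = json.loads(payload)
    return [result["elevation"] for result in data.get("results", [])]

# Illustrative payload only, shaped like the response described above;
# the numbers are made up.
sample = '{"results": [{"elevation": 1608.6, "location": {"lat": 39.74, "lng": -104.98}}], "status": "OK"}'

print(build_elevation_url(39.74, -104.98, "YOUR_API_KEY"))
print(parse_elevations(sample))
```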

# Small Data: Germinating Seeds

This is the first in a series of posts using the small data sets from The Handbook of Small Data Sets to illustrate introductory techniques in text processing, plotting, statistics, etc. The data sets are collected in a ZIP file at the publisher’s website, linked above. Someone decided to format the data files to resemble the published format to the greatest degree possible, which makes parsing the files interesting. First, we will import our modules,