Tag Archives: Data

tidyr and pandas: Gather and Melt

August 28, 2014 cjohnson318 4 Comments

In this post I’ll look at replicating Hadley Wickham’s gather() tool from his tidyr package using the pandas melt() function. Why would anyone want to do this? Well, Dr. Wickham’s work is beautiful, and the pandas.melt() function is not as elegant as the tidyr::gather() function. You may read Dr. Wickham’s pre-print paper here.
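As a rough illustration of the idea, here is a minimal sketch of reshaping a wide table into long form with pandas.melt(). The table, its column names, and the values are made up for this example and are not taken from the full post.

```python
import pandas as pd

# Hypothetical wide table: one row per person, one column per treatment.
wide = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "treatment_a": [1.0, 2.0],
    "treatment_b": [3.0, 4.0],
})

# Roughly what tidyr's gather(wide, treatment, result, -name) does in R:
# keep "name" as an identifier and stack the remaining columns into
# a key column ("treatment") and a value column ("result").
long = pd.melt(
    wide,
    id_vars=["name"],
    var_name="treatment",
    value_name="result",
)

print(long)
```

Each former column header ends up as a value in the "treatment" column, so the two-row, three-column table becomes four rows of (name, treatment, result).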

Continue reading tidyr and pandas: Gather and Melt →

Uncategorized

Concatenating and Visualizing Data in Pandas

July 10, 2014 cjohnson318

One of my favorite things about pandas is that you can easily combine temporal data sets using different time scales. Behind the scenes, pandas will fill in the empty gaps with null values, and then quietly ignore those null values when you want to make a scatter plot or do some other computation, like a rolling mean. It takes so much tedious book-keeping out of the data analysis process.
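A minimal sketch of that behavior, assuming two made-up series sampled daily and every other day (the names and values are illustrative only). Note one assumption in the rolling step: rolling() only skips the gaps if min_periods is lowered, since by default a window containing a null yields a null.

```python
import numpy as np
import pandas as pd

# Hypothetical series on two different time scales.
daily = pd.Series(
    np.arange(6, dtype=float),
    index=pd.date_range("2014-01-01", periods=6, freq="D"),
    name="daily",
)
every_other_day = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range("2014-01-01", periods=3, freq="2D"),
    name="every_other_day",
)

# Aligning on the union of both indexes fills the gaps with NaN.
combined = pd.concat([daily, every_other_day], axis=1)
n_missing = combined["every_other_day"].isna().sum()

# Reductions such as mean() skip NaN by default.
overall_mean = combined["every_other_day"].mean()

# With min_periods=1 the rolling mean uses whatever values are in the window.
rolling = combined["every_other_day"].rolling(window=3, min_periods=1).mean()

print(combined)
print(rolling)
```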

Continue reading Concatenating and Visualizing Data in Pandas →

Uncategorized

An Iterative Closest Point Algorithm

June 6, 2014 cjohnson318

In this post I’ll demonstrate an iterative closest point (ICP) algorithm that works reasonably well. An ICP algorithm seeks to find a transformation between two sets of points that minimizes the error between them, i.e., you are trying to find a transformation that will lay one set of points exactly on top of another.
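For orientation, here is a minimal sketch of one common textbook variant: point-to-point ICP with brute-force nearest neighbours and an SVD-based (Kabsch) rigid fit. It is not necessarily the approach taken in the full post, and the function names, iteration count, and test points are assumptions made for this example.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst,
    given that row i of src corresponds to row i of dst (Kabsch method)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against returning a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_mean - R @ src_mean
    return R, t

def icp(src, dst, n_iter=20):
    """Repeatedly match each source point to its nearest target point,
    then apply the rigid transform that best fits those matches."""
    current = src.copy()
    for _ in range(n_iter):
        # Brute-force pairwise distances; fine for small point sets.
        dists = np.linalg.norm(current[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[dists.argmin(axis=1)]
        R, t = best_rigid_transform(current, matched)
        current = current @ R.T + t
    return current

# Hypothetical target points, and a slightly rotated and shifted copy.
dst = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [0, 2], [3, 1]], dtype=float)
angle = 0.05  # radians
R0 = np.array([[np.cos(angle), -np.sin(angle)],
               [np.sin(angle),  np.cos(angle)]])
src = dst @ R0.T + np.array([0.1, -0.05])

aligned = icp(src, dst)
print(np.abs(aligned - dst).max())
```

The example uses a small misalignment on purpose. ICP only finds a local optimum, so it relies on the two point sets starting reasonably close to each other.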

Continue reading An Iterative Closest Point Algorithm →

Uncategorized

Using THREE.js to Visualize 3D Point Sets

May 11, 2014 cjohnson318

In this post I’ll describe a bare-bones application for inputting and displaying three-dimensional data in the browser. The data is entered into HTML input fields, parsed with JavaScript, and then plotted using the THREE.js library. The input is split on whitespace, so colleagues can copy-paste data from Excel into the fields for plotting.

Continue reading Using THREE.js to Visualize 3D Point Sets →

Uncategorized

Spatial Statistical Hypothesis Testing

May 10, 2014 cjohnson318

In this post I’ll consider performing a local hypothesis test for a difference in means with spatial data. I do not know if this is the optimal way to go about this sort of thing, but I have not yet found another solution. I think the best way to describe the problem is to consider the artificial data, and then wade through the code.

Continue reading Spatial Statistical Hypothesis Testing →

Uncategorized

Using PyBrain for Optical Character Recognition (First Whack)

April 20, 2014 cjohnson318 1 Comment

This is my first whack at using PyBrain for optical character recognition. I am limiting myself to numerical data, since that’s what I have lying around needing to be optically recognized the most. I’m also focusing on extra small and heavily corrupted data.

Continue reading Using PyBrain for Optical Character Recognition (First Whack) →

Uncategorized

Tkinter Optical Character Recognition Training Data Labeler

April 19, 2014 cjohnson318

In this post I’ll demonstrate how to build an object-oriented Tkinter GUI application for associating labels with filenames in order to quickly and easily build a set of training data. The Submit button will associate the label with the file, and the Save and Quit button will dump the file and its associated label into a Python dict, and then a cPickle file for later use. This is still a little rough around the edges; it assumes that you’re looking for PNG data in the current directory, and the output overwrites previous output, but it’s a start.
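The GUI itself is left to the full post, but the storage side can be sketched without Tkinter. The sketch below assumes Python 3, where the standard pickle module takes the place of cPickle. The helper names, file names, and labels are hypothetical, and the demo writes to a temporary directory rather than the current one.

```python
import os
import pickle
import tempfile

def list_png_files(directory="."):
    """Return the PNG filenames in a directory, sorted for a stable order."""
    return sorted(f for f in os.listdir(directory) if f.lower().endswith(".png"))

def save_labels(labels, path):
    """Pickle the filename -> label dict, overwriting any previous output."""
    with open(path, "wb") as handle:
        pickle.dump(labels, handle)

def load_labels(path):
    """Read a filename -> label dict back from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

with tempfile.TemporaryDirectory() as workdir:
    # Create empty placeholder files standing in for real images.
    for name in ("img_0.png", "img_1.png", "notes.txt"):
        open(os.path.join(workdir, name), "wb").close()

    # In the GUI, each Submit click would add one entry like these.
    labels = {name: str(i) for i, name in enumerate(list_png_files(workdir))}

    out_path = os.path.join(workdir, "labels.pkl")
    save_labels(labels, out_path)
    restored = load_labels(out_path)

print(restored)
```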

Continue reading Tkinter Optical Character Recognition Training Data Labeler →

Uncategorized

Open an Excel File in Pandas

April 17, 2014 cjohnson318 1 Comment

In this post I’ll demonstrate how to open an Excel file in Python using Pandas, a (the) module for data manipulation. I love using Pandas, and I cannot recommend it enough.

Continue reading Open an Excel File in Pandas →

Uncategorized

Query Google Maps API Using Windows PowerShell

April 3, 2014 cjohnson318 1 Comment

The Google Maps API lets you query elevation data using WGS84 coordinates. All you have to do is construct a URL with the coordinates, and then Google will return a JSON file. A JSON file is basically a text file with some extra structure in the form of keywords, brackets, braces, and colons.
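The full post uses PowerShell; the same two steps, building the URL and reading the JSON, can be sketched in Python. This sketch makes no network request. It assumes the present-day Elevation API endpoint and its key parameter, the API key is a placeholder, and the sample response is hand-written with invented numbers to show the general shape only.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/elevation/json"

def build_elevation_url(lat, lng, api_key):
    """Build a request URL for one WGS84 latitude/longitude pair."""
    query = urlencode({"locations": f"{lat},{lng}", "key": api_key})
    return f"{BASE_URL}?{query}"

def parse_elevations(payload):
    """Extract the elevation values (in meters) from a JSON response string."""
    data = json.loads(payload)
    if data.get("status") != "OK":
        raise ValueError(f"request failed with status {data.get('status')!r}")
    return [result["elevation"] for result in data["results"]]

url = build_elevation_url(39.5, -105.0, "YOUR_API_KEY")

# Illustrative response only; the values below are invented.
sample_response = """
{
  "results": [
    {
      "elevation": 1608.6,
      "location": {"lat": 39.5, "lng": -105.0},
      "resolution": 4.8
    }
  ],
  "status": "OK"
}
"""

elevations = parse_elevations(sample_response)
print(url)
print(elevations)
```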

Continue reading Query Google Maps API Using Windows PowerShell →

Uncategorized

Small Data: Germinating Seeds

February 11, 2014 cjohnson318

This is the first in a series of posts using the small data sets from The Handbook of Small Data Sets to illustrate introductory techniques in text processing, plotting, statistics, etc. The data sets are collected in a ZIP file at the publisher’s website in the link above. Someone decided to format the data files to resemble the published format to the greatest degree possible, which makes parsing the files interesting. First, we will import our modules,

Continue reading Small Data: Germinating Seeds →

Connor Johnson

Blog about math, programming, and data.