Classical Hypothesis Testing, Statistical Power, and Type-II Errors

April 30, 2014 cjohnson318 1 Comment

Hypothesis testing is one of the fundamental tasks in science. You do a study, and then you have to determine whether there is a statistically meaningful difference between the test and control data. It is important to understand hypothesis testing, because a lot of interesting functions in R are hypothesis tests. I’ll consider the simple z-test for testing whether the mean of the sample is the same as the hypothesized mean of the population. We’ll see how statistical power, which is the probability of detecting a difference in means when one exists, changes with sample size and effect size, which is the size of the difference between the observed sample mean and the hypothesized population mean. Note that the significance level $\alpha$ is the Type-I (false positive) error rate; the Type-II (false negative) error rate is one minus the power.
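
As a rough illustration of how power depends on sample size and effect size, here is a minimal Python sketch for a two-sided, one-sample z-test. It assumes a known population standard deviation `sigma`; the function name `z_test_power` and its parameters are made up for this example and are not taken from the full post.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (hypothetical helper).

    effect: difference between the true mean and the hypothesized mean
    sigma:  population standard deviation, assumed known
    n:      sample size
    alpha:  significance level (the Type-I error rate)
    """
    std_normal = NormalDist()
    # Critical value for a two-sided test at level alpha
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Shift of the test statistic, in standard errors, under the alternative
    shift = effect * n ** 0.5 / sigma
    # Probability that the statistic lands in either rejection region
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# Power grows with sample size for a fixed effect size
for n in (8, 32, 128):
    power = z_test_power(effect=0.5, sigma=1.0, n=n)
    print(n, round(power, 3), "Type-II rate:", round(1 - power, 3))
```

With an effect of zero the function returns `alpha`, which is one way to see that the significance level is the false positive rate.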

Develop Windows Executables from Python Scripts for 32-bit and 64-bit Architectures

April 22, 2014 cjohnson318

In this post I’ll discuss building a Windows executable from a Python script for 32-bit and 64-bit Windows. Producing a 64-bit executable on a 64-bit machine in Windows is easy using PyInstaller, but producing a 32-bit executable on a 64-bit machine takes some tinkering. I ended up setting up a chroot environment on Ubuntu for this task.

Using PyBrain for Optical Character Recognition (First Whack)

April 20, 2014 cjohnson318 1 Comment

This is my first whack at using PyBrain for optical character recognition. I am limiting myself to numerical data, since that’s what I have lying around needing to be optically recognized the most. I’m also focusing on extra small and heavily corrupted data.

Tkinter Optical Character Recognition Training Data Labeler

April 19, 2014 cjohnson318

In this post I’ll demonstrate how to build an object-oriented Tkinter GUI application for associating labels with filenames in order to quickly and easily build a set of training data. The Submit button will associate the label with the file, and the Save and Quit button will dump each file and its associated label into a Python dict, and then into a cPickle file for later use. This is still a little rough around the edges; it assumes that you’re looking for PNG data in the current directory, and the output overwrites previous output, but it’s a start.
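
The data side of that workflow, without the GUI, can be sketched roughly as follows. This is only an illustration: the function names `submit` and `save_labels`, the file name `labels.pkl`, and the sample labels are hypothetical, and it uses Python 3’s `pickle` module, which replaces Python 2’s `cPickle`.

```python
import os
import pickle
import tempfile


def submit(labels, filename, label):
    """Associate a label with a filename (what a Submit button might do)."""
    labels[filename] = label


def save_labels(labels, path):
    """Dump the filename-to-label dict to disk, overwriting earlier output."""
    with open(path, "wb") as handle:
        pickle.dump(labels, handle)


# Hypothetical usage: label two images, save them, and read them back
labels = {}
submit(labels, "digit_001.png", "7")
submit(labels, "digit_002.png", "3")

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "labels.pkl")
    save_labels(labels, out_path)
    with open(out_path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == labels)
```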

Open an Excel File in Pandas

April 17, 2014 cjohnson318 1 Comment

In this post I’ll demonstrate how to open an Excel file in Python using Pandas, a (the) module for data manipulation. I love using Pandas, and I cannot recommend it enough.

Wiring a Tilt Switch

April 13, 2014 cjohnson318

I bought one of the Arduino Sidekick component kits from RadioShack this weekend, and I’d like to build a few circuits with those parts over the next few posts. I’ll be using Mike Margolis’ Arduino Cookbook, which is the best text on tinkering with Arduinos that I have found, and I highly recommend it.

Using QGIS and OSGeo4W for Geo-Data Tasks

April 9, 2014 cjohnson318

In this post I’ll discuss creating and altering shapefiles, and converting point sets from one coordinate reference system to another. I’ll also touch on scripting these tasks for large data sets. I’ll begin with the installation of Quantum GIS and Python for manipulating geographical data. I mainly use QGIS for visualizing and building shapefiles, and I use OSGeo4W from the command line for adding/converting shapefile projections, and converting point sets from one CRS to another.

Query Google Maps API Using Windows PowerShell

April 3, 2014 cjohnson318 1 Comment

The Google Maps API lets you query elevation data using WGS84 coordinates. All you have to do is construct a URL with the coordinates, and then Google will return a JSON file. A JSON file is basically a text file with some extra structure, in the form of some keywords, brackets, braces, colons, and commas.
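
The URL-then-JSON pattern can be sketched in a few lines. The full post uses PowerShell; this sketch uses Python instead, and it makes several assumptions that are not in the text above: the endpoint and the `locations` and `key` query parameters follow my understanding of the Elevation API, the key is a placeholder, and the sample response is hand-written with made-up values rather than a real reply from Google. No network request is made.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for elevation queries (not stated in the post excerpt)
BASE_URL = "https://maps.googleapis.com/maps/api/elevation/json"


def elevation_url(lat, lon, api_key="YOUR_API_KEY"):
    """Build a query URL for one WGS84 coordinate pair (hypothetical helper)."""
    query = urlencode({"locations": f"{lat},{lon}", "key": api_key}, safe=",")
    return f"{BASE_URL}?{query}"


def parse_elevation(json_text):
    """Pull the first elevation value out of a JSON response string."""
    data = json.loads(json_text)
    return data["results"][0]["elevation"]


# Hand-written sample response with made-up values, for illustration only
sample = '{"results": [{"elevation": 1608.6, "location": {"lat": 39.7392, "lng": -104.9903}}], "status": "OK"}'

print(elevation_url(39.7392, -104.9903))
print(parse_elevation(sample))
```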

Computing Principal Components in Python

April 2, 2014 cjohnson318 2 Comments

In this post I will walk through the computation of principal components from a data set using Python. A number of languages and modules implement principal components analysis (PCA), but implementations can vary slightly, which may lead to confusion if you are trying to follow someone else’s code or you are using multiple languages. Perhaps more importantly, as a data analyst you should at all costs avoid using a tool if you do not understand how it works. I will use data from The Handbook of Small Data Sets to illustrate this example. The data sets can be found in a zipped directory on the site linked above.

Connor Johnson

Monthly Archives: April 2014

Blog about math, programming, and data.