Using PyBrain for Optical Character Recognition (First Whack)

This is my first whack at using PyBrain for optical character recognition. I am limiting myself to numerical data, since that’s what I have lying around most in need of being optically recognized. I’m also focusing on extra-small, heavily corrupted data.

Data Preprocessing

Since my PNG images are all different sizes, I decided to vignette them all onto the largest format, which was 10×9 pixels. Here, ds is a dict that maps filenames of PNG images to class labels. This dict was created using the image labeling GUI detailed in my previous post. The list raw is a list of two-element lists: the first element is a (three-dimensional) NumPy array, and the second is the class label.

import cv2
import numpy as np

raw = list()
for k, v in ds.iteritems():
    im = cv2.imread( k )   ## BGR image as a 3-D NumPy array
    im = im[:,:,0]         ## keep a single channel
    raw.append( [ im, v ] )

Here, sz is a list of image dimensions. The values mh and mw are the maximum height and width image dimensions found in the training data.

sz = np.array( [ i[0].shape for i in raw ] )

mh, mw = np.max( sz[:,0] ), np.max( sz[:,1] )
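With a couple of made-up image shapes, those same two lines behave like this (the shapes here are hypothetical, chosen so the maximum format comes out to the 10×9 mentioned above):

```python
import numpy as np

## hypothetical (height, width) shapes for three images
sz = np.array( [ (8, 6), (10, 5), (9, 9) ] )

## the maximum height and width need not come from the same image
mh, mw = np.max( sz[:,0] ), np.max( sz[:,1] )   ## 10 and 9
```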

This loop vignettes each image onto a standard size, mh by mw, and then normalizes the pixel data from the range 0–255 to 0–1. The results go into the list normed. Each row of normed is a list of values between 0 and 1 from the image data, followed by a class label.

normed = list()
for i in range( len( raw ) ):
    img, lbl = raw[i]
    z = np.ones( ( mh, mw ) )*255.0   ## white canvas of the maximum size
    h, w = img.shape
    dh = ( mh - h ) // 2              ## offsets that center the image
    dw = ( mw - w ) // 2
    z[dh:dh+h,dw:dw+w] = img          ## paste the image onto the canvas
    z /= 255.0                        ## normalize pixel values to [0, 1]
    z = z.ravel()                     ## flatten to a 1-D feature vector
    normed.append( list( z ) + [ int( lbl ) ] )
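As a sanity check on the centering arithmetic, here is a toy version of the same vignetting step, with a made-up 2×3 all-black image pasted onto a 4×5 white canvas (floor division `//` gives the same result as `/` on Python 2 ints):

```python
import numpy as np

## made-up sizes for illustration only
mh, mw = 4, 5
img = np.zeros( ( 2, 3 ) )            ## tiny all-black "image"

z = np.ones( ( mh, mw ) )*255.0       ## white canvas
h, w = img.shape
dh = ( mh - h ) // 2                  ## centering offsets: 1 and 1
dw = ( mw - w ) // 2
z[dh:dh+h,dw:dw+w] = img              ## border stays white, center goes black
z /= 255.0                            ## white -> 1.0, black -> 0.0
z_flat = z.ravel()                    ## 20-element feature vector
```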

Using PyBrain

Here is the code for using PyBrain. It looks like all of the useful functions are scattered across a number of modules. I’m not very familiar with the tool yet, and this is cobbled together from the documentation. First we initialize and populate the data set with the training data; in this example the training data is everything except the last 20 items, which are held out for testing.

from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
import time

## initialize a data set (note: this reuses the name ds from earlier)
ds = SupervisedDataSet( len( normed[0] ) - 1, 1 )

## populate the data set
N = len( normed )
for i in range( N-20 ):
    ds.addSample( normed[i][:-1], normed[i][-1] )

Next we set up a neural network and train it. The 90, 60, 30, ... arguments are the numbers of neurons per layer of the neural network: the first (input) layer has 90 neurons, one per pixel, the next layer has 60 neurons, and so on. I settled on the output layer having one neuron since I’m looking for a single classification, but maybe that’s not the best idea? I’m still working on it.
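If the single output neuron does turn out to be the weak point, the usual alternative is one output neuron per class with one-hot target vectors (PyBrain’s ClassificationDataSet supports this conversion via _convertToOneOfMany). A minimal NumPy sketch of the idea; the one_hot helper is my own name for illustration, not a PyBrain function:

```python
import numpy as np

def one_hot( label, n_classes=10 ):
    ## target vector with a 1.0 at the class index, 0.0 elsewhere
    t = np.zeros( n_classes )
    t[label] = 1.0
    return t

## the predicted class is then the index of the strongest of the 10 outputs
target = one_hot( 7 )
pred = int( np.argmax( target ) )
```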

## set up the neural network
net = buildNetwork( 90, 60, 30, 20, 10, 1, bias=True )
trainer = BackpropTrainer( net, ds )

## initial time
t0 = time.time()

## training
for i in range( 500 ):
    trainer.train()

## final time
t1 = time.time()

This part collects the output. The err term is used for checking convergence. I ran this bad boy all night and it never converged, so I settled instead on training for 500 epochs. The accuracy is the percentage of correct classifications, based on the class labels, on the last 20 items, which the network was not trained on.

## one final epoch; train() returns the average error
err = trainer.train()

## determine the accuracy
score = 0
for i in range( N-20, N ):
    p = net.activate( normed[i][:-1] )
    res = np.round( p )
    if res == normed[i][-1]:
        score += 1
acc = score / float( 20 )

## print the training time (in minutes), the error, and the accuracy
print '{:.2f}  {:.5e}  {:.2f}'.format( (t1-t0)/60.0, err, acc )

Conclusion

I ended up with 65-70% accuracy, which I think is pretty good based on the dismal quality of my data, and the fact that I did not do any feature extraction. This was essentially a proof of principle exercise to myself, as I did not know if it would work at all on my data. In the future I’d like to look at feature extraction techniques, and further neural network options and topologies.

One thought on “Using PyBrain for Optical Character Recognition (First Whack)”

  1. I’m pretty sure your error was in having just 1 output node/neuron. It should be (in general) a number of output neurons equal to the number of classes, i.e., 2.
