Friday, September 17, 2010

Digit recognition with a neural network. First attempt!

I bought myself "On Intelligence", and I thought that, as the first step on the long road to building an intelligent machine, I would go back to neural networks. I have my own little library that seems to work (kind of), so I decided to try to write a little digit recognition program. I was of course inspired by ai-junkie, but it was a bit short on details...
So here is the problem: I have a 5x5 grid of LEDs that can be on or off, and I want to recognize the digit that they show. 
The input will be a list of 25 numbers, either 0 (off) or 1 (on).
The output will be a list of 10 numbers, either 0 or 1. The position of the 1 indicates the result (so if the first element is 1, the digit is 0, etc.).
I arbitrarily decide to set the number of hidden neurons to 50, twice the number of input neurons. So my topology is 25-50-10.
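To make that output convention concrete, here is a small sketch (the helper names targetFor and decode are just mine for illustration): a digit is encoded as a one-hot list for training, and the network's answer is decoded by taking the index of the strongest activation, since a trained network rarely outputs exact 0s and 1s.

import Data.List (maximumBy)
import Data.Ord (comparing)

-- One-hot target for a digit: 1.0 at the digit's index, 0.0 elsewhere.
targetFor :: Int -> [Double]
targetFor d = [if i == d then 1.0 else 0.0 | i <- [0 .. 9]]

-- Decode the network's output: the strongest activation wins.
decode :: [Double] -> Int
decode = fst . maximumBy (comparing snd) . zip [0 ..]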
I train the network with only one version of each digit. For example, 3 would be:
*****
    *
 ****
    *
*****
I wrote a few lines of Haskell code to read the training sets (moving from the representation with *, spaces and new lines to a list of 1s and 0s, so that 3 becomes: [1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0]).
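The parsing code isn't worth showing in full, but it boils down to something like this (a sketch, not my exact code):

-- Convert a 5x5 block of '*' and spaces into the 25 input values.
-- Missing lines and short lines are padded so trailing blanks can be omitted.
readDigit :: String -> [Double]
readDigit s = concatMap row (take 5 (lines s ++ repeat ""))
  where
    row l = map cell (take 5 (l ++ repeat ' '))
    cell '*' = 1.0
    cell _   = 0.0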
Then I train the network. It learns after 200 iterations and successfully recognizes the training sets (OK, so things are working as they should).
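I won't inflict my whole library on you; conceptually, each training iteration does something like the step below (a sketch with one hidden layer, sigmoid units and plain online backpropagation, not my actual code):

type Layer = [[Double]]   -- one row of weights per neuron, bias weight last

sigmoid :: Double -> Double
sigmoid x = 1 / (1 + exp (-x))

dot :: [Double] -> [Double] -> Double
dot a b = sum (zipWith (*) a b)

-- Activations of one layer (a constant 1.0 is appended to feed the bias).
forward :: Layer -> [Double] -> [Double]
forward layer xs = [ sigmoid (dot w (xs ++ [1])) | w <- layer ]

-- One backpropagation step over a single (input, target) pair.
trainStep :: Double -> (Layer, Layer) -> ([Double], [Double]) -> (Layer, Layer)
trainStep eta (hid, out) (xs, ts) = (hid', out')
  where
    hs   = forward hid xs
    os   = forward out hs
    -- output deltas: (target - output) scaled by the sigmoid derivative
    dOut = zipWith (\t o -> (t - o) * o * (1 - o)) ts os
    -- hidden deltas: errors propagated back through the output weights
    dHid = [ h * (1 - h) * sum (zipWith (\d w -> d * (w !! j)) dOut out)
           | (j, h) <- zip [0 ..] hs ]
    out' = [ zipWith (\w x -> w + eta * d * x) ws (hs ++ [1])
           | (d, ws) <- zip dOut out ]
    hid' = [ zipWith (\w x -> w + eta * d * x) ws (xs ++ [1])
           | (d, ws) <- zip dHid hid ]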
Then the real test: can my network recognize small variants of the training set?
Let's try with:
****
    *
 ***
    *
****
(a slightly rounded 3): the network answers 3!!
It also works for a slightly rounded 6. But then:
 ***
*   *
 ***
*   *
 ***
(a rounded 8) gives me 6. Oh dear... I could probably add more cases to the training sets and hopefully resolve the ambiguities, but there are bigger issues (I know, I know, I'm very naive, but it's one thing to be told something in a book, and another to build something that actually shows it to you). For example:
***
* *
***
This is a small zero that doesn't take up the full 5-LED width. My network recognizes 0 when it fills the full square, but not when it's smaller (it tells me it's a 5).
And of course, 1 is a disaster. The training "1" is a vertical row of LEDs on the left. A vertical row of LEDs on the right usually gives me 9 (I suppose because 9 has all the LEDs on the right on).
So not only is basic training insufficient to detect simple variations, but the fact that we deal with absolute positioning of LEDs means that scaling and simple side translations are not supported either. So I suppose the next step is to work not with absolute positions, but with relative positions of "on" LEDs. Another day's work...
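As a first cut at "relative positions", something like the following might already help with side translations (just a sketch; scaling would need more than this): shift the lit LEDs so their bounding box is anchored at the top-left corner before feeding the grid to the network.

-- Translate the pattern so the lit LEDs hug the top-left corner.
normalize :: [Double] -> [Double]
normalize grid
  | null lit  = grid                       -- nothing lit: leave as is
  | otherwise = [ if (r + minR, c + minC) `elem` lit then 1.0 else 0.0
                | r <- [0 .. 4], c <- [0 .. 4] ]
  where
    lit  = [ (i `div` 5, i `mod` 5) | (i, v) <- zip [0 ..] grid, v > 0.5 ]
    minR = minimum (map fst lit)
    minC = minimum (map snd lit)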

6 comments:

Unknown said...

You should try http://en.wikipedia.org/wiki/Scale-invariant_feature_transform, which gives the needed invariants (scale, translation, rotation).

JP Moresmau said...

Thanks for the link! A bit of reading to do...

Muddle-headed Wombat said...

I'm doing research for my MSc on neural networks in Haskell. A post of yours on perceptrons helped me get started, about a year ago, so it would be great if I can return the favour by offering a suggestion or two.

If I understood correctly, you're training your network only with perfect examples, and hoping that the network will be able to recognise variations. I think you need to train it with some variations too, so that the network can "get a feel" for what kinds of variations are reasonable. And so...

I think that you would benefit from a larger grid. I don't think there are enough possibilities for variation with a 5x5 grid. For example, those two 3s that you have pretty much span the space of possible 3s.

Depending on what learning rule you're using, you may need to ensure that all your inputs are > 0 and < 1. Check to see if your weights are growing without bound. If so, you may want to change your representation of 3 from [1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0] to [0.9,0.9,0.9,0.9,0.9,0.1,0.1,0.1,0.1,0.9,0.1,0.9,0.9,0.9,0.9,0.1,0.1,0.1,0.1,0.9,0.9,0.9,0.9,0.9,0.9].
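In Haskell that rescaling is a one-line tweak, something like this (just a sketch; adjust the constants to taste):

-- Map hard 0/1 values into [0.1, 0.9].
soften :: [Double] -> [Double]
soften = map (\x -> 0.1 + 0.8 * x)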

Hope that helps.

Muddle-headed Wombat said...

BTW, I found some of the advice on this page to be very helpful:

http://www.cs.ucdavis.edu/~vemuri/classes/ecs170/Program2-Digit%20Recognition.htm

JP Moresmau said...

Thanks, Wombat. Yes, I'm starting small so I can control what's happening, and a bigger grid would make things more interesting. The remark about the weights is interesting. I had read somewhere that -0.5 and +0.5 were better, but my networks don't converge in that case. I'll try with 0.1 and 0.9 to see. I also agree I need to train the network with variations, and maybe it also helps to train it with negative examples (to tell it that not recognizing anything when it just sees a blob is "OK").