Friday, September 24, 2010

Haskell Neural Network: plugging a space leak

First, the good news: last week when I posted about my little digit recognition program I had made a mistake in my test code: when I give a rounded eight to my network, it does recognize it, even though it's been trained only with a square eight! Yoohoo!

Also, a few people pointed to online database of handwritten digits so that I could get more data to train my network. I dully downloaded and started working with some files. Soon I knew I had a problem. Not only was the learning slow, but the memory usage of the program showed exponential growth, like using 400MB after a few iterations. Ha ha, I thought, so it's me against lazy evaluation again!

Well, I was right. A couple of strictness annotation in the neuron output calculation function helped performance, nearly making the code twice as fast, but didn't help the memory consumption. Going mad with ! (BangPatterns is your friend) didn't help any more. WTF?

Then I started to read a bit more about seq and Weak Head Normal Form. So when I put a BangPattern in front of a tuple, it evaluates the tuple but not what's inside it. And, lo and behold, if at each learning iteration I traced the network to the standard output, memory usage stayed constant! So I needed to tell Haskell to fully evaluate the network each time, so as not to keep thunks full of arithmetic operations hanging round. A distant memory of a package called deepseq provided me the final help I needed. I didn't use the package itself because I wanted to understand really what's going on, so I just roll out a few lines of my own:

And by calling deepSeq on my network in each iteration, my program runs in constant space! And I looked at the flat line of memory usage in my task manager, and I was pleased. Now, I can try to improve the speed of the dang thing, but that's another set of problems...


Anonymous said...

You might find that at least some of the performance issue you're having are directly because of the DeepSeq analog you've created. Running a recursive seq over a list every time it's passed from one function to another is actually quite slow.

You would probably benefit *heavily* from using strict data types, instead of type aliases for lazy types.

Float's already strict, so you can stick with the same declaration for it. But the other declarations should be something like:

type Value = Float
data Neuron = Neuron !(StrictList Value) !Value
data Network = Network !(StrictList Neuron) !(StrictList Neuron)
data StrictList a = Nil | Cons !a !(StrictList a)

The use of ! in data types is part of haskell 98, so doesn't require an extension. The semantics of it are that when the constructor is evaluated to WHNF, it also evaluates each element with a ! to WHNF.

The advantage to using strict data structures over something like deepseq is that the strictness is applied just once - when the expression resulting in the constructor is first evaluated. After that, no use of seq is required, meaning you can avoid repeated traversals of your data structures looking for values not in WHNF.

JP Moresmau said...

Thank you Carl. Insightful comments, but I don't think in my case I'm losing too much: I only run deepSeq once per training iteration, not through each function call. But yes, ensuring the lists are always strict, and not just at regular points in the program, may improve performance. Can I get a strict list with the normal list API (fold, zip, etc) somewhere, or do I have to roll out my own?

Anonymous said...

Looks like the answer is... Maybe. If you can use the vector library, it's certain to be faster than any sort of list. But I'm not sure if its api is appropriate for what you're doing.

See the documentation on hackage. It doesn't look like there's a strict singly-linked list type on hackage, though. So it will depend a lot on your use pattern.

Derek Elkins said...

I agree with Carl albeit more forcefully. Using deepSeq (or equivalents) is like smashing your door in with a sledgehammer because your key sticks in the lock. It gets the door open, but it lacks finesse and has some unpleasant consequences. Usually these types of problems can be solved with a few well placed seqs, often only one. However, it is much easier to design your code not to have these problems in the first place than to try to patch them up after the fact. While there are common patterns of good and bad behavior, usually you can figure out what you should do when defining each function by just giving a little thought to the behavior and usage patterns. Using strict data types, as Carl suggests, is an important way of getting good performance by construction. However, unlike Carl's data types suggests, the appropriate solution is usually not "make everything in sight strict." So for example, what is usually desired is not Carl's StrictList type, but a head strict list, data L a = Nil | Cons !a (L a). In general, you usually want the "spines" of your structures lazy and often the elements strict.