JP Moresmau's Programming Blog: performance

Showing posts with label performance. Show all posts

Wednesday, September 30, 2015

Symbolic differentiation to the rescue! ... Or not.

I'm still playing with my LSTM networks, inspired by a few blog posts about their "unreasonable efficiency". I spent a lot of time messing with genetic algorithms, but then I came back to more predictable methods, namely gradient descent. I was trying to optimize the performance of the cost function I use via AD (even asking help on stack overflow) when I stumbled across a couple of blog posts on symbolic differentiation (here and here). The last one, combining automatic and symbolic differentiation, struck a chord. If my differentiation calculation was taking so much time to calculate, could I just not calculate it once with symbolic expressions, then close the resulting expression over my variables (my LSTM network) repeatedly while applying the gradients. I should only pay the price for the derivation once!

So I extended the data type suggested in the blog post to include all operations I was using in my function, manage to sort out all the types and verify via a few tests that it would work. I had great hopes! But as soon as I started testing on a real LSTM, the code just crawled to a halt. I even thought I had some infinite loop, maybe some recursion on the expression, but testing more thoroughly showed that it was the sheer size of the generated expression that was the issue. A LSTM of 2 cells is represented in the cost function used by AD as an array of 44 doubles, so basically for a LSTM of 2 cells, I'll have 44 variables in my gradient expression. My simple test that tries to use a LSTM to generate the string "hello world!" uses 9 cells (9 different characters in the string) , which is 702 variables. Even printing the expression takes forever. Running it through a simplifying step takes also forever. So my idea was not as good as it first looked, but it was fun testing it!

The code for my expressions can be found here, and the code doing the gradient descent via the symbolic differentiation is here. All of that looks probably very naive for you calculus and machine learning experts but hey, I'm a human learning... If everybody has any idea to speed up my code, I'l happily listen!

Tuesday, February 05, 2013

EclipseFP and performance

OK, ok, performance in EclipseFP could be better. I'll try to present a bit the challenges and why things are the way they are.

You see, EclipseFP used scion for a while, and scion was a long running process, which ended up eating loads of memory, and had a few issues due to the GHC API maintaining state and not releasing memory (I've written about this elsewhere). So I rewrote the whole Haskell backend with BuildWrapper, and I took opposite approach: BuildWrapper is an short-lived executable. You start BuildWrapper with some parameters, it does its stuff and outputs some JSON answer. And that's fine for a lot of operations. But people have started to complain that when they edit a source file, the feedback loop is quite slow, and they keep seeing that the synchronization job keeps on running all the time. That's because every time Eclipse says that the editor has changed (not every keystroke, luckily), EclipseFP goes back to BuildWrapper to analyze the source. So it fires the BuildWrapper executable (and only this can be slow if you have an over zealous anti virus running), which in turn builds up a GHC session, analyses the AST, writes it to JSON, and tears everything down. For small projects and fast machines, this is acceptable. But I eat my own dogfood, of course, and when working on BuildWrapper itself, I noticed things could be too slow for comfort. Sometimes it would take 4 seconds to analyze completely a file.

So I went the middle route. I create another command in BuildWrapper that makes the executable stay around and listen for more commands. More specifically, there is now a long running build command, that can repeatedly analyze a file while staying in the GHC session. It won't have the same issues as the long running scion because it's tied to the file you're editing. When you close the file in Eclipse, the executable dies. When you change something in the Cabal file, the executable restarts to it can take into account new modules, flags or dependencies. The synchronization job now runs in something like a quarter of the time, and even faster.

I believe we have a happy medium: most operations are short-lived and we don't need to worry about memory usage or stray processes, but where performance matters (when you're in the flow and writing Haskell and don't want to wait for the editor to catch up) we have a long running process that's efficient, but not too persistent.

This will be part of the upcoming 2.5.0 release of EclipseFP. Of course people are welcome to get the source and test it for themselves before the official release!

Friday, September 24, 2010

Haskell Neural Network: plugging a space leak

First, the good news: last week when I posted about my little digit recognition program I had made a mistake in my test code: when I give a rounded eight to my network, it does recognize it, even though it's been trained only with a square eight! Yoohoo!

Also, a few people pointed to online database of handwritten digits so that I could get more data to train my network. I dully downloaded and started working with some files. Soon I knew I had a problem. Not only was the learning slow, but the memory usage of the program showed exponential growth, like using 400MB after a few iterations. Ha ha, I thought, so it's me against lazy evaluation again!

Well, I was right. A couple of strictness annotation in the neuron output calculation function helped performance, nearly making the code twice as fast, but didn't help the memory consumption. Going mad with ! (BangPatterns is your friend) didn't help any more. WTF?

Then I started to read a bit more about seq and Weak Head Normal Form. So when I put a BangPattern in front of a tuple, it evaluates the tuple but not what's inside it. And, lo and behold, if at each learning iteration I traced the network to the standard output, memory usage stayed constant! So I needed to tell Haskell to fully evaluate the network each time, so as not to keep thunks full of arithmetic operations hanging round. A distant memory of a package called deepseq provided me the final help I needed. I didn't use the package itself because I wanted to understand really what's going on, so I just roll out a few lines of my own:

And by calling deepSeq on my network in each iteration, my program runs in constant space! And I looked at the flat line of memory usage in my task manager, and I was pleased. Now, I can try to improve the speed of the dang thing, but that's another set of problems...

Monday, September 06, 2010

EclipseFP development: using the GHC Lexer

The new version of EclipseFP is slowly coming along. Slowly, because of some holidays, because of a lot of bug fixes, and mainly because of some big changes.
There were a few bugs revolving around syntax highlighting. The original code was using some lexing rules from the JFace packages in Eclipse, but it didn't cover some common enough cases of Haskell syntax, and of course was fairly hopeless at more complex stuff (things like Template Haskell). So I could either trod on and fix the lexing rules for everything, or I could use a different approach. So I decided to use Scion again, and to add support for lexing. Now Scion exposes a method that calls the GHC Lexer on an arbitrary Haskell source. We use the result to color highlight properly the Haskell source code. Of course I have to handle things like CPP preprocessing directives and literal source files myself, but that was ok. The two main issues I had to tackle were performance, and advanced lexing options.
Performance was poor at start, and there are two reasons: the underlying Scion code deals with JSON Strings while the socket code uses ByteStrings, and, I think, the Eclipse console. For the first issue, I'm using now AttoJson, so that Json marshalling is done with ByteStrings and try to use ByteString as much as possible in the code. The Eclipse console is a different matter. I think when too much data is written to it it struggles to update the console control, blocks the UI thread and the whole UI becomes unresponsive. For the moment, I have disabled server output in the console for commands (since now getting the tokens from the source code is quite a big response). Since lexing is also done in the UI thread, we've probably lost a bit in that area, but I feel the advantages (less Haskell knowledge in the Java code, more reliance on the GHC code, which I suspect knows about Haskell pretty well) outweigh the costs.
One funny thing with the Lexer I found: for example when you have TemplateHaskell code in your source file, and you have the TemplateHaskell pragma, the Lexer doesn't automatically set the TemplateHaskell flag. So I decided that I was going to enable all the lexer flags, since performance didn't seem to suffer. Be liberal in what you parse...

Anyway, I'm not going to release the next version right now, a bit of testing is still needed, and there are a few little things that I want to fix. The testing is easy, in a way: I use the development version of EclipseFP to hack the Scion code, so by eating my own dog food I hope that I can release something of value to Haskell programmers.
So don't worry, EclipseFP is coming soon!

Tuesday, September 02, 2008

Arrays to the rescue!

Some time ago I blogged about Haskell Generics and how they simplified my code of a little RPG game I'm writing. My data types were designed with lists. For example a character has traits, and traits are a list of Characteristic objects, that have a name (Strength, Dexterity, etc.) and values. The Generics code helped me to deal with finding the right item in the list depending on the name, updating its value, etc. Records were no good because you can pass the generated function as an accessor but not as a mutator.

I have some unit tests that generate random characters, and I thought they were pretty slow, which I thought was due to the random number generator. Then I ran the tests with profiling on. Guess what? The Generics code amounted for a huge amount of the time spent. And I'd say the underlying list operation, with loads of list updates, were not great either.

So I got rid of the Generics code, and I decided to replace my lists by something else. In the Java world (my day time job) I rely heavily on Maps. I thought it was a bit overkill in this case, so I turned to the Haskell array package. And there was light!

You see, Arrays in Haskell are a different beast that arrays in other languages. You can use different types for the indices, not only integers. So since the data type representing the names of characteristics is a simple enumeration, I could use that as the array index. So the characters traits are an array with the names as indices:


data Characteristic = Strength | Dexterity | ...
  deriving (Show,Read,Eq,Enum,Bounded,Ord,Ix)

Ix is the type class that an array index type must implement


type CharacteristicRatings=Array Characteristic Rating

data Character = Character {
  ...
  ,traits::CharacteristicRatings
  ...
  }

And Rating follow the same pattern, since a rating has the current value for the characteristic, the normal value, etc.

Then the array operations make it easy enough to read and modify values: ! to get the value at the specified index, // to update. I get type safety and expressiveness (no need to even use toEnum and fromEnum), and performance...

Performance? With the Generics code and lists, my tests took 8 seconds. With arrays and no Generics, 0.26s. Nuff said.

JP Moresmau's Programming Blog