Monday, September 06, 2010

EclipseFP development: using the GHC Lexer

The new version of EclipseFP is slowly coming along. Slowly, because of some holidays, because of a lot of bug fixes, and mainly because of some big changes.
There were a few bugs revolving around syntax highlighting. The original code was using some lexing rules from the JFace packages in Eclipse, but it didn't cover some common enough cases of Haskell syntax, and of course was fairly hopeless at more complex stuff (things like Template Haskell). So I could either trod on and fix the lexing rules for everything, or I could use a different approach. So I decided to use Scion again, and to add support for lexing. Now Scion exposes a method that calls the GHC Lexer on an arbitrary Haskell source. We use the result to color highlight properly the Haskell source code. Of course I have to handle things like CPP preprocessing directives and literal source files myself, but that was ok. The two main issues I had to tackle were performance, and advanced lexing options.
Performance was poor at start, and there are two reasons: the underlying Scion code deals with JSON Strings while the socket code uses ByteStrings, and, I think, the Eclipse console. For the first issue, I'm using now AttoJson, so that Json marshalling is done with ByteStrings and try to use ByteString as much as possible in the code. The Eclipse console is a different matter. I think when too much data is written to it it struggles to update the console control, blocks the UI thread and the whole UI becomes unresponsive. For the moment, I have disabled server output in the console for commands (since now getting the tokens from the source code is quite a big response). Since lexing is also done in the UI thread, we've probably lost a bit in that area, but I feel the advantages (less Haskell knowledge in the Java code, more reliance on the GHC code, which I suspect knows about Haskell pretty well) outweigh the costs.
One funny thing with the Lexer I found: for example when you have TemplateHaskell code in your source file, and you have the TemplateHaskell pragma, the Lexer doesn't automatically set the TemplateHaskell flag. So I decided that I was going to enable all the lexer flags, since performance didn't seem to suffer. Be liberal in what you parse...


Anyway, I'm not going to release the next version right now, a bit of testing is still needed, and there are a few little things that I want to fix. The testing is easy, in a way: I use the development version of EclipseFP to hack the Scion code, so by eating my own dog food I hope that I can release something of value to Haskell programmers.
So don't worry, EclipseFP is coming soon!

3 comments:

Thomas ten Cate said...

That's great news! Support for CPP too... you sure have been thorough :D

Exception said...

Brilliant! I think it is a wise decision to not duplicate the parsing intelligence.

LĂ©onard said...

Great ! keep your good work !