Saturday, July 16, 2016

Another Web-based Haskell IDE

After giving up on EclipseFP, I've worked a bit on haskell-ide-engine and leksah, contributing little things here and there to try to make the Haskell IDE ecosystem a little bit better. But at some point, I tried to update the GTK libraries on my Ubuntu machine to get leksah to run, and broke my whole desktop. Hours of fun followed to get back to a working system. So I thought again at my efforts last year to have a web based IDE for Haskell, because using the browser as the UI saves users a lot of pain, no UI libraries to install or update!

I started another little effort that I call "reload", both because it's another take on something I had started before and of course because it issues ":reload" commands to ghci when you change files. I have changes the setup, though. Now I use Scotty for the back end, with a REST API, and I use a pure Javascript front-end, with the Polymer library providing the web component framework and material design. I also use a web socket to send back GHCi results from the back end to the browser. I still use ghcid for the backend, maybe one day when haskell-ide-engine is released I can use that instead.

The functionality is fairly simple yet: there is a file browser on the left, and the editor (I'm using the ACE web editor) on the right. There is no save button, any change is automatically saved to disk (you use source version control, right?). On the server, there is a GHCi session for each cabal component in your project, and any change causes a reload, and you can see the errors/warnings in a menu and in the editor's annotations. You can build, run tests and benchmarks, and I've just added ":info" support. The fact that we're using GHCi makes it fast, but I'm sure there's loads of wrinkles to iron out still.

Anyway, if you're interested in a test ride, just clone from Github and get going!

Tuesday, February 23, 2016

Haskell Through of Disillusionment

I'm going through a hard pass with Haskell. I still love the language, of course, but some things in the ecosystem bother me as they impact seriously both the fun of writing Haskell and my productivity.

Sometimes it's the lack of good development environment that gets me. I have failed with EclipseFP to build a community and gather enough support, but it doesn't seem that other efforts go that much further. I contribute to Leksah and haskell-ide-engine, and there a plugins now for Atom or other modern editors, but when I do a spot of Android development I see what a good IDE is and how much I miss in Haskell.

But today it's more the open source libraries issues that irks me. It's great that we have loads of libraries, and they're open source and usually good quality. But of course the maintainers are all volunteers, and sometimes have better things to do. But there are a few libraries that I use in my code that now actually stop me from progressing. I have provided enhancements or bug fixes that I need for my projects as pull requests, and they languish in the maintainers' inboxes for months. So what am I to do? Hound the maintainers? Fork the library to apply my patches? Rewrite my code so it doesn't use that library but another, better maintained? Not use libraries but write everything myself? And of course if I offer to take over maintainership I'll end up being overloaded and will perpetuate the problem. I suppose the best approach will be to offer to be one of MANY maintainers for the library, so that I can merge my changes and release on Hackage if the others maintainers are otherwise busy/uninterested. I'm not sure how that can work in the general case, though, if loads of people are maintainers for loads of libraries, I believe that having one person with the vision and the drive for a project is best, but for little libraries it may not matter much.


Thursday, February 11, 2016

Starting out on Android

Oohhh, it's been a while since I've posted. I spent some time working on haskell-ide-engine, but then I got Haskell fatigue and decided to look at Android instead. A change is as good as a rest!

Coming from the Java world, it's not difficult to get into Android. I followed the guide on the Google site to get the basics. At first I got a bit afraid of the unholy mix of Gradle build scripts, visual layout editor, XML files and Java code, but you get over it. The IDE has some nice touches and it's good to get useful autocompletion in resources like ids and strings. I liked the warning about SDK level APIs and missing resources once I started translated my app into French.

But I have to say, when I was working on EclipseFP I often ran into IDEA fanatics that swore that IDEA was miles ahead of Eclipse. Android Studio famously moved from Eclipse to IDEA, and frankly, I don't see what the hype is about. Yes, Eclipse has some annoying bugs and idiosyncrasies. But Android Studio, at least with the default settings, as I haven't spent much time customizing it, is not such a wonderful IDE. It's slow to start up, the view layout is sometimes confusing, the font is too fine or small in places (I must be getting old, but sometimes I couldn't see that a semi colon was in fact a colon) and the warning/error markers in the gutter are way too small and nearly invisible, to the point that sometimes the build fails and I can't see where in the source is the error! Maybe there's a special "grumpy old man with failing eye sight" setting to make things bigger.

Since I believe that you learn by doing, I've developed my first app, and I'm in the process of publishing it to the App Store (so I can claim I'm a published Android developer in my CV, wink). Apart from the fact that you need to supply an icon in 2 million different sizes, which is challenging for somebody with my artistic abilities, it's a straightforward process. My app is an uninteresting workout log app (to record how much weight you supposedly lift at the gym) that's purely local (no social features, so you can't boast online about the weights you may have lifted), but hey, it's a start. The code is of course on Github.

I've also wrote a basic game using the framework presented here, and the nice asset from Kenney (see note about artistic abilities above). Nothing fancy, just a rabbit hopping over holes and getting carrots for points, but just seeing my little game running on my family's phones and tablets is nice! It's on Github too.

Wednesday, September 30, 2015

Symbolic differentiation to the rescue! ... Or not.

I'm still playing with my LSTM networks, inspired by a few blog posts about their "unreasonable efficiency". I spent a lot of time messing with genetic algorithms, but then I came back to more predictable methods, namely gradient descent. I was trying to optimize the performance of the cost function I use via AD (even asking help on stack overflow) when I stumbled across a couple of blog posts on symbolic differentiation (here and here). The last one, combining automatic and symbolic differentiation, struck a chord. If my differentiation calculation was taking so much time to calculate, could I just not calculate it once with symbolic expressions, then close the resulting expression over my variables (my LSTM network) repeatedly while applying the gradients. I should only pay the price for the derivation once!

So I extended the data type suggested in the blog post to include all operations I was using in my function, manage to sort out all the types and verify via a few tests that it would work. I had great hopes! But as soon as I started testing on a real LSTM, the code just crawled to a halt. I even thought I had some infinite loop, maybe some recursion on the expression, but testing more thoroughly showed that it was the sheer size of the generated expression that was the issue. A LSTM of 2 cells is represented in the cost function used by AD as an array of 44 doubles, so basically for a LSTM of 2 cells, I'll have 44 variables in my gradient expression. My simple test that tries to use a LSTM to generate the string "hello world!" uses 9 cells (9 different characters in the string) , which is 702 variables. Even printing the expression takes forever. Running it through a simplifying step takes also forever. So my idea was not as good as it first looked, but it was fun testing it!

The code for my expressions can be found here, and the code doing the gradient descent via the symbolic differentiation is here. All of that looks probably very naive for you calculus and machine learning experts but hey, I'm a human learning...  If everybody has any idea to speed up my code, I'l happily listen!

Friday, August 28, 2015

A Reddit clone (very basic) using Local Storage and no server side storage

Some weeks ago there was a bit of a tussle over at Reddit, with subreddits going private in protest, talk of censorship, etc. This was interesting to see, from a distance. It got me thinking about trying to replicate Reddit, a site where people can share stuff and have discussions, but without having the server control all the data. So I've developed a very basic Reddit clone, where you can post links and stories and comment on them. You can also upvote stories and comments, and downvote what you upvoted (cancel your upvote). But there's a catch: the site has no database. Things are kept in memory, and on the users machine, via the HTML 5 LocalStorage. That's all!

Everytime you upload or upvote something, it gets saved to your LocalStorage for the site. Once something gets downvoted to zero, it disappear. When you go to the site, whatever is in your LocalStorage gets uploaded and upvoted again. So stories can come and go as users connect and disconnect, and only the most popular stories will always be visible on the site (since at least one connected user needs to have uploaded or upvoted a story for it to be visible).

Of course, there is still a server, that could decide to censor stories, modify text, but at least you can always check that what you have on YOUR machine is the data you wanted. you can always copy that data elsewhere easily for safe keeping (browser developer tools let you inspect your LocalStorage content).

All in all, this was probably only an excuse for me to play with Javascript and Java (I did the server side in Java, since it was both easy to build and deploy) and Heroku. I've deployed the app at https://fivemegs.herokuapp.com  and the source code can be found at https://github.com/JPMoresmau/5megs. Any feedback welcome!

Sunday, August 02, 2015

Playing with Recurrent Neural Networks in Haskell

Some time ago an interesting article surfaced on Reddit, about using recurrent neural networks to generate plausible looking text. Andrej did a very good job in explaining how they work and some of the techniques and algorithms he used. I thought this was worth some explorations on my own, so I did a bit of research and tried to implement my own networks in Haskell.

I implemented the basics using the hmatrix package to have vectors and matrices. I went to Khan Academy to learn more about these topics because I had stopped math in school before learning matrices. Once I managed to implement hopefully correctly the algorithm I used the simple-genetic-algorithm package (well, my fork, uploaded on hackage) to evolve the network via genetic selection and mutation.

This gave some results, especially when I fixed my mutation code, that in fact was doing nothing, thus emphasizing again that mutation is a critical part of genetic algorithms, and not just crossovers.

Then I went to implement proper learning algorithms, a bit less random than genetic evolution. I was clearly out of my depth there, but learned a lot. To implement gradient descent I actually used the ad library and had to rewrite the algorithms without hmatrix so they would work on simple lists. Given that a couple of weeks again I didn't even understand what AD was, I'm happy I got some stuff to work. I was lucky to find some good code in python, even trying to understand the under-specified RMSProp algorithm.

The end result is not great, though. My network can learn the really complex sentence "Hello world!" and regenerate it after a couple of thousand iterations of the learning algorithm, in around a minute on my machine. I haven't managed to parallelize the treatment yet, and running on all my cores instead of just one make performance a lot worse. On more complex sentences performance becomes really bad, as both the network becomes bigger (more characters to deal with, etc) and the data to evaluate is bigger.

Trying to use a sparse representation for characters using 2 values instead of 1 as in the standard 1-of-k encoding didn't work well. I also realize that tweaks in the cost function had a huge impact on how well and how fast the network can learn. I have started to look at cutting the data in batches but I think I need to make the learning on simple data as fast as possible before moving on.

So, nothing earth shattering, no great progress or insight, but I feel I learned a lot by implementing my own stuff and running into all the issues that are probably well known of people familiar with these subjects. My artificial intelligence is still pretty dumb, but hopefully my own natural intelligence has progressed a bit!

The code is on github of course, so if anybody wants to have a look, I'll gladly take any feedback!!

Thursday, May 14, 2015

EclipseFP end of life (from me at least)

Hello, after a few years and several releases, I am now stopping the maintenance of EclipseFP and its companion Haskell packages (BuildWrapper, ghc-pkg-lib and scion-browser). If anybody wants to take over I' ll gladly give them all what's required to get started. Feel free to fork and continue!

Why am I stopping? Not for a very specific reason, though seeing that I had to adapt BuildWrapper to GHC 7.10 didn't exactly fill me with joy, but more generally I got tired of being the single maintainer for this project. I got a few pull requests over the years and some people have at some stage participated (thanks to you, you know who you are!), but not enough, and the biggest part of the work has been always on my shoulders. Let's say I got tired of getting an endless stream of issues reports and enhancement requests with nobody stepping up to actually address them.

Also, I don't think on the Haskell side it makes sense for me to keep on working on a GHC API wrapper like BuildWrapper. There are other alternatives, and with the release of ide-backend, backed up by FPComplete, a real company staffed by competent people who seem to have more that 24 hours per day to hack on Haskell tools, it makes more sense to have consolidation there.

The goal of EclipseFP was to make it easy for Java developers or other Eclipse users to move to Haskell, and I think this has been a failure, mainly due to the inherent complexity of the setup (the Haskell stack and the Java stack) and the technical challenges of integrating GHC and Cabal in a complex IDE like Eclipse. Of course we could have done better with the constraints we were operating under, but if more eyes had looked at the code and more hands had worked on deck we could have succeeded.

Personally I would now be interested in maybe getting the Atom editor to use ide-backend-client, or maybe work on a web based (but local) Haskell IDE. Some of my dabblings can be found at https://github.com/JPMoresmau/dbIDE. But I would much prefer to not work on my own, so if you have an open source project you think could do with my help, I'll be happy to hear about it!

I still think Haskell is a great language that would deserve a top-notch IDE, for newbies and experts alike, and I hope one day we'll get there.

For you EclipseFP users, you can of course keep using it as long as it works, but if no other maintainers step up, down the line you'll have to look for other options, as compatibility with the Haskell ecosystem will not be assured. Good luck!

Happy Haskell Hacking!