Saturday, October 01, 2016

Everything is broken

This week was, I suppose, fairly typical. Started using a new library, the excellent sqlg, which provides the TinkerPop graph API on top of relational databases. Found a bug pretty quickly. Off we go to contribute to another open source project, good for my street cred I suppose. Let’s fork it, and open the source code in IDEA (Community edition). After years of hearing abuse about Eclipse, I’m now trying to use “the best IDE ever” (say all the fanboys) instead. Well, that didn’t go so well: apparently importing a Maven project and resolving its dependencies proves too much for IDEA. I fought with it for a while, then gave up.

Fired up Eclipse, it opened and built the sqlg projects without a hitch. Wrote a test, fixed the bug, raised a PR, got it accepted with a thank you, life is good.

Then I find another bug. Except that upon investigation, it’s not in sqlg, it’s in the actual TinkerPop code. The generics on a map are wrong: there are values that are not instances of the key class (thanks, generic type erasure!). So I can fix it by changing the method signature, or by changing the keys. Both will break existing code. Sigh…
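To illustrate the kind of problem, here is a made-up minimal example. It is not the TinkerPop code, and the class and method names are hypothetical. Because generic type parameters are erased at run time, an unchecked cast lets a map hold entries that contradict its declared types, and nothing fails until somebody reads the bad entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; none of these names come from TinkerPop or sqlg.
final class ErasureDemo {

    // The cast is unchecked: it compiles (with a warning) and, because of
    // erasure, nothing verifies the key or value types at run time.
    @SuppressWarnings("unchecked")
    static <K, V> Map<K, V> unsafeView(Map<?, ?> raw) {
        return (Map<K, V>) raw;
    }

    // Builds a map declared as String -> Integer that really holds a String value.
    static Map<String, Integer> pollutedMap() {
        Map<Object, Object> raw = new HashMap<>();
        raw.put("answer", 42);
        raw.put("oops", "not a number");
        return unsafeView(raw);
    }

    // Returns true if reading an existing entry as an Integer throws.
    // The key must be present in the map.
    static boolean readFails(Map<String, Integer> map, String key) {
        try {
            int doubled = map.get(key) * 2; // the compiler-inserted cast to Integer runs here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = pollutedMap();
        System.out.println("entries: " + map.size());
        System.out.println("reading 'answer' fails: " + readFails(map, "answer"));
        System.out.println("reading 'oops' fails: " + readFails(map, "oops"));
    }
}
```

The bad entry goes in without complaint, and the failure only shows up later, at whatever call site happens to read it, which is why this sort of bug tends to be found far away from where it was introduced.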

Oh, and the TinkerPop project doesn’t build in Eclipse. The Eclipse compiler chokes on some Java 8 code. Off to the Eclipse bug tracker. Maybe I need to have three different Java IDEs to be able to handle all the projects I may find bugs in.


Everything is broken. Off I go to my own code to add my own bugs.

Wednesday, August 03, 2016

Scratching an itch: generating Cabal data-files field automatically

Maybe I didn't look hard enough, but I'm not aware of a tool to generate the contents of the Cabal data-files field automatically when you have loads of folders to include. Cabal has very simple wildcard matching for file references in this field, by design (to avoid including too much data in a source distribution). So it only supports wildcards to replace the file name inside a directory for a given extension, and doesn't support subdirectories.

For the reload project - first release on Hackage! - I had to include loads of files, all the Polymer web components the UI depends on, which are all in different subdirectories, with a bunch of different extensions. So I wrote a little tool to generate the field automatically, and put it on Hackage too.

You pass it a directory name, possibly some subdirectories and extensions to ignore, and it generates all the required entries. Saved me loads of time, and scratched my own itch!
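The idea can be sketched as follows. This is a hypothetical Java illustration, not the code of the tool published on Hackage, and every class, method and parameter name is an assumption. It groups files by directory and extension, emits one wildcard entry per group, and lists files without an extension explicitly, since a wildcard entry needs an extension.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the code of the tool published on Hackage.
final class DataFilesSketch {

    // Turns file paths (relative to the package root, '/'-separated) into
    // data-files entries: one "dir/*.ext" wildcard per directory and extension.
    static SortedSet<String> entries(Collection<String> files,
                                     Set<String> ignoredDirs,
                                     Set<String> ignoredExts) {
        SortedSet<String> result = new TreeSet<>();
        for (String file : files) {
            int slash = file.lastIndexOf('/');
            String dir = slash < 0 ? "" : file.substring(0, slash + 1); // keeps the trailing '/'
            String name = file.substring(slash + 1);
            boolean ignoredDir = false;
            for (String part : dir.split("/")) {
                if (ignoredDirs.contains(part)) ignoredDir = true;
            }
            if (ignoredDir) continue;
            // Use everything after the FIRST dot: older Cabal versions only match
            // a wildcard when the file's full extension is identical.
            int dot = name.indexOf('.');
            if (dot <= 0 || dot == name.length() - 1) {
                result.add(file); // no usable extension: list the file itself
                continue;
            }
            String ext = name.substring(dot + 1);
            if (!ignoredExts.contains(ext)) result.add(dir + "*." + ext);
        }
        return result;
    }

    // Lists all regular files under packageDir/dataDir, relative to packageDir.
    static List<String> listFiles(Path packageDir, String dataDir) {
        try (Stream<Path> walk = Files.walk(packageDir.resolve(dataDir))) {
            return walk.filter(Files::isRegularFile)
                       .map(p -> packageDir.relativize(p).toString().replace('\\', '/'))
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage: run from the package root, passing the data directory name.
    public static void main(String[] args) {
        List<String> files = listFiles(Paths.get(""), args[0]);
        Set<String> none = Collections.emptySet();
        System.out.println("data-files:");
        for (String entry : entries(files, none, none)) {
            System.out.println("    " + entry);
        }
    }
}
```

Keeping the path-to-entry logic separate from the directory walk means it can be checked on a plain list of file names, without touching the disk.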

Saturday, July 16, 2016

Another Web-based Haskell IDE

After giving up on EclipseFP, I've worked a bit on haskell-ide-engine and leksah, contributing little things here and there to try to make the Haskell IDE ecosystem a little bit better. But at some point, I tried to update the GTK libraries on my Ubuntu machine to get leksah to run, and broke my whole desktop. Hours of fun followed to get back to a working system. So I thought again about my efforts last year to build a web-based IDE for Haskell, because using the browser as the UI saves users a lot of pain: no UI libraries to install or update!

I started another little effort that I call "reload", both because it's another take on something I had started before and of course because it issues ":reload" commands to ghci when you change files. I have changed the setup, though. Now I use Scotty for the back end, with a REST API, and I use a pure JavaScript front end, with the Polymer library providing the web component framework and material design. I also use a web socket to send back GHCi results from the back end to the browser. I still use ghcid for the back end; maybe one day when haskell-ide-engine is released I can use that instead.

The functionality is fairly simple so far: there is a file browser on the left, and the editor (I'm using the ACE web editor) on the right. There is no save button, any change is automatically saved to disk (you use source version control, right?). On the server, there is a GHCi session for each cabal component in your project, and any change causes a reload, and you can see the errors/warnings in a menu and in the editor's annotations. You can build, run tests and benchmarks, and I've just added ":info" support. The fact that we're using GHCi makes it fast, but I'm sure there are loads of wrinkles to iron out still.

Anyway, if you're interested in a test ride, just clone from GitHub and get going!

Tuesday, February 23, 2016

Haskell Trough of Disillusionment

I'm going through a rough patch with Haskell. I still love the language, of course, but some things in the ecosystem bother me as they seriously impact both the fun of writing Haskell and my productivity.

Sometimes it's the lack of a good development environment that gets me. I have failed with EclipseFP to build a community and gather enough support, but it doesn't seem that other efforts go that much further. I contribute to Leksah and haskell-ide-engine, and there are plugins now for Atom or other modern editors, but when I do a spot of Android development I see what a good IDE is and how much I miss in Haskell.

But today it's more the open source library issues that irk me. It's great that we have loads of libraries, and they're open source and usually good quality. But of course the maintainers are all volunteers, and sometimes have better things to do. And there are a few libraries that I use in my code that now actually stop me from progressing. I have provided enhancements or bug fixes that I need for my projects as pull requests, and they languish in the maintainers' inboxes for months. So what am I to do? Hound the maintainers? Fork the library to apply my patches? Rewrite my code so it doesn't use that library but another, better maintained? Not use libraries but write everything myself? And of course if I offer to take over maintainership I'll end up being overloaded and will perpetuate the problem. I suppose the best approach will be to offer to be one of MANY maintainers for the library, so that I can merge my changes and release on Hackage if the other maintainers are otherwise busy/uninterested. I'm not sure how that can work in the general case, though, if loads of people are maintainers for loads of libraries. I believe that having one person with the vision and the drive for a project is best, but for little libraries it may not matter much.


Thursday, February 11, 2016

Starting out on Android

Oohhh, it's been a while since I've posted. I spent some time working on haskell-ide-engine, but then I got Haskell fatigue and decided to look at Android instead. A change is as good as a rest!

Coming from the Java world, it's not difficult to get into Android. I followed the guide on the Google site to get the basics. At first I got a bit afraid of the unholy mix of Gradle build scripts, visual layout editor, XML files and Java code, but you get over it. The IDE has some nice touches and it's good to get useful autocompletion in resources like ids and strings. I liked the warnings about SDK level APIs and missing resources once I started translating my app into French.

But I have to say, when I was working on EclipseFP I often ran into IDEA fanatics who swore that IDEA was miles ahead of Eclipse. Android Studio famously moved from Eclipse to IDEA, and frankly, I don't see what the hype is about. Yes, Eclipse has some annoying bugs and idiosyncrasies. But Android Studio, at least with the default settings, as I haven't spent much time customizing it, is not such a wonderful IDE. It's slow to start up, the view layout is sometimes confusing, the font is too fine or small in places (I must be getting old, but sometimes I couldn't see that a semicolon was in fact a colon) and the warning/error markers in the gutter are way too small and nearly invisible, to the point that sometimes the build fails and I can't see where in the source the error is! Maybe there's a special "grumpy old man with failing eyesight" setting to make things bigger.

Since I believe that you learn by doing, I've developed my first app, and I'm in the process of publishing it to the App Store (so I can claim I'm a published Android developer in my CV, wink). Apart from the fact that you need to supply an icon in 2 million different sizes, which is challenging for somebody with my artistic abilities, it's a straightforward process. My app is an uninteresting workout log app (to record how much weight you supposedly lift at the gym) that's purely local (no social features, so you can't boast online about the weights you may have lifted), but hey, it's a start. The code is of course on GitHub.

I've also written a basic game using the framework presented here, and the nice assets from Kenney (see note about artistic abilities above). Nothing fancy, just a rabbit hopping over holes and getting carrots for points, but just seeing my little game running on my family's phones and tablets is nice! It's on GitHub too.

Wednesday, September 30, 2015

Symbolic differentiation to the rescue! ... Or not.

I'm still playing with my LSTM networks, inspired by a few blog posts about their "unreasonable effectiveness". I spent a lot of time messing with genetic algorithms, but then I came back to more predictable methods, namely gradient descent. I was trying to optimize the performance of the cost function I use via AD (even asking for help on Stack Overflow) when I stumbled across a couple of blog posts on symbolic differentiation (here and here). The last one, combining automatic and symbolic differentiation, struck a chord. If my differentiation calculation was taking so much time, could I not just calculate it once with symbolic expressions, then close the resulting expression over my variables (my LSTM network) repeatedly while applying the gradients? I should only pay the price for the differentiation once!

So I extended the data type suggested in the blog post to include all operations I was using in my function, managed to sort out all the types and verified via a few tests that it would work. I had great hopes! But as soon as I started testing on a real LSTM, the code just slowed to a crawl. I even thought I had some infinite loop, maybe some recursion on the expression, but testing more thoroughly showed that it was the sheer size of the generated expression that was the issue. An LSTM of 2 cells is represented in the cost function used by AD as an array of 44 doubles, so basically for an LSTM of 2 cells, I'll have 44 variables in my gradient expression. My simple test that tries to use an LSTM to generate the string "hello world!" uses 9 cells (9 different characters in the string), which is 702 variables. Even printing the expression takes forever. Running it through a simplifying step also takes forever. So my idea was not as good as it first looked, but it was fun testing it!
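As a rough illustration of how quickly an unsimplified symbolic derivative can grow, here is a tiny sketch in Java. It is not the expression type from the code discussed in this post, all names are hypothetical, and it only has constants, variables, sums and products. It differentiates a product of n copies of x with the product rule and counts the nodes of the resulting tree.

```java
import java.util.Map;

// Hypothetical sketch; not the expression type from the post's code.
final class SymbolicSketch {

    interface Expr {
        Expr diff(String var);                 // symbolic derivative, no simplification
        double eval(Map<String, Double> env);  // every variable used must be in env
        int size();                            // number of nodes in the tree
    }

    static final class Const implements Expr {
        final double value;
        Const(double value) { this.value = value; }
        public Expr diff(String var) { return new Const(0); }
        public double eval(Map<String, Double> env) { return value; }
        public int size() { return 1; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        public Expr diff(String var) { return new Const(name.equals(var) ? 1 : 0); }
        public double eval(Map<String, Double> env) { return env.get(name); }
        public int size() { return 1; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public Expr diff(String var) { return new Add(left.diff(var), right.diff(var)); }
        public double eval(Map<String, Double> env) { return left.eval(env) + right.eval(env); }
        public int size() { return 1 + left.size() + right.size(); }
    }

    static final class Mul implements Expr {
        final Expr left, right;
        Mul(Expr left, Expr right) { this.left = left; this.right = right; }
        // Product rule: (l * r)' = l' * r + l * r'
        public Expr diff(String var) {
            return new Add(new Mul(left.diff(var), right), new Mul(left, right.diff(var)));
        }
        public double eval(Map<String, Double> env) { return left.eval(env) * right.eval(env); }
        public int size() { return 1 + left.size() + right.size(); }
    }

    // x * x * ... * x with n factors, built as a left-nested product.
    static Expr power(String var, int n) {
        Expr e = new Var(var);
        for (int i = 1; i < n; i++) e = new Mul(e, new Var(var));
        return e;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 20, 40}) {
            Expr e = power("x", n);
            System.out.println(n + " factors: " + e.size() + " nodes, derivative: "
                    + e.diff("x").size() + " nodes");
        }
    }
}
```

In this toy case the original expression grows linearly with the number of factors while its derivative grows quadratically: a 10-factor product has 19 nodes, and its derivative has 127. A real cost function with hundreds of variables, one partial derivative per variable and many more operations is a far bigger tree, so this only hints at the effect.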

The code for my expressions can be found here, and the code doing the gradient descent via the symbolic differentiation is here. All of that probably looks very naive to you calculus and machine learning experts but hey, I'm a human learning... If anybody has any idea to speed up my code, I'll happily listen!

Friday, August 28, 2015

A Reddit clone (very basic) using Local Storage and no server side storage

Some weeks ago there was a bit of a tussle over at Reddit, with subreddits going private in protest, talk of censorship, etc. This was interesting to see, from a distance. It got me thinking about trying to replicate Reddit, a site where people can share stuff and have discussions, but without having the server control all the data. So I've developed a very basic Reddit clone, where you can post links and stories and comment on them. You can also upvote stories and comments, and downvote what you upvoted (cancel your upvote). But there's a catch: the site has no database. Things are kept in memory, and on the user's machine, via the HTML5 LocalStorage. That's all!

Every time you upload or upvote something, it gets saved to your LocalStorage for the site. Once something gets downvoted to zero, it disappears. When you go to the site, whatever is in your LocalStorage gets uploaded and upvoted again. So stories can come and go as users connect and disconnect, and only the most popular stories will always be visible on the site (since at least one connected user needs to have uploaded or upvoted a story for it to be visible).
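The server-side bookkeeping described above could look something like the following. This is a hypothetical sketch and not the actual 5megs code: the class and method names are invented, stories are reduced to plain string ids, and it assumes that a user's votes stop counting when that user disconnects.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the actual 5megs server code.
final class StoryBoard {

    // story id -> ids of the users whose upload or upvote currently counts
    private final Map<String, Set<String>> voters = new HashMap<>();

    // Uploading a story and upvoting it are treated the same way here.
    synchronized void upvote(String story, String user) {
        voters.computeIfAbsent(story, k -> new HashSet<>()).add(user);
    }

    // Cancels a user's upvote; a story with no votes left disappears.
    synchronized void cancelUpvote(String story, String user) {
        Set<String> users = voters.get(story);
        if (users == null) return;
        users.remove(user);
        if (users.isEmpty()) voters.remove(story);
    }

    // Called when a browser connects: replays everything found in its LocalStorage.
    synchronized void connect(String user, Collection<String> storedStories) {
        for (String story : storedStories) upvote(story, user);
    }

    // ASSUMPTION: the votes of a disconnected user stop counting.
    synchronized void disconnect(String user) {
        voters.values().forEach(users -> users.remove(user));
        voters.values().removeIf(Set::isEmpty);
    }

    synchronized int score(String story) {
        Set<String> users = voters.get(story);
        return users == null ? 0 : users.size();
    }

    synchronized boolean isVisible(String story) {
        return voters.containsKey(story);
    }
}
```

With this bookkeeping nothing needs to be persisted on the server: after a restart the map is empty, and it fills up again as browsers reconnect and replay their LocalStorage.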

Of course, there is still a server that could decide to censor stories or modify text, but at least you can always check that what you have on YOUR machine is the data you wanted. You can always copy that data elsewhere easily for safe keeping (browser developer tools let you inspect your LocalStorage content).

All in all, this was probably only an excuse for me to play with JavaScript and Java (I did the server side in Java, since it was both easy to build and deploy) and Heroku. I've deployed the app at https://fivemegs.herokuapp.com and the source code can be found at https://github.com/JPMoresmau/5megs. Any feedback welcome!