Wednesday, September 30, 2015

Symbolic differentiation to the rescue! ... Or not.

I'm still playing with my LSTM networks, inspired by a few blog posts about their "unreasonable efficiency". I spent a lot of time messing with genetic algorithms, but then I came back to more predictable methods, namely gradient descent. I was trying to optimize the performance of the cost function I use via AD (even asking help on stack overflow) when I stumbled across a couple of blog posts on symbolic differentiation (here and here). The last one, combining automatic and symbolic differentiation, struck a chord. If my differentiation calculation was taking so much time to calculate, could I just not calculate it once with symbolic expressions, then close the resulting expression over my variables (my LSTM network) repeatedly while applying the gradients. I should only pay the price for the derivation once!

So I extended the data type suggested in the blog post to include all operations I was using in my function, manage to sort out all the types and verify via a few tests that it would work. I had great hopes! But as soon as I started testing on a real LSTM, the code just crawled to a halt. I even thought I had some infinite loop, maybe some recursion on the expression, but testing more thoroughly showed that it was the sheer size of the generated expression that was the issue. A LSTM of 2 cells is represented in the cost function used by AD as an array of 44 doubles, so basically for a LSTM of 2 cells, I'll have 44 variables in my gradient expression. My simple test that tries to use a LSTM to generate the string "hello world!" uses 9 cells (9 different characters in the string) , which is 702 variables. Even printing the expression takes forever. Running it through a simplifying step takes also forever. So my idea was not as good as it first looked, but it was fun testing it!

The code for my expressions can be found here, and the code doing the gradient descent via the symbolic differentiation is here. All of that looks probably very naive for you calculus and machine learning experts but hey, I'm a human learning...  If everybody has any idea to speed up my code, I'l happily listen!

Friday, August 28, 2015

A Reddit clone (very basic) using Local Storage and no server side storage

Some weeks ago there was a bit of a tussle over at Reddit, with subreddits going private in protest, talk of censorship, etc. This was interesting to see, from a distance. It got me thinking about trying to replicate Reddit, a site where people can share stuff and have discussions, but without having the server control all the data. So I've developed a very basic Reddit clone, where you can post links and stories and comment on them. You can also upvote stories and comments, and downvote what you upvoted (cancel your upvote). But there's a catch: the site has no database. Things are kept in memory, and on the users machine, via the HTML 5 LocalStorage. That's all!

Everytime you upload or upvote something, it gets saved to your LocalStorage for the site. Once something gets downvoted to zero, it disappear. When you go to the site, whatever is in your LocalStorage gets uploaded and upvoted again. So stories can come and go as users connect and disconnect, and only the most popular stories will always be visible on the site (since at least one connected user needs to have uploaded or upvoted a story for it to be visible).

Of course, there is still a server, that could decide to censor stories, modify text, but at least you can always check that what you have on YOUR machine is the data you wanted. you can always copy that data elsewhere easily for safe keeping (browser developer tools let you inspect your LocalStorage content).

All in all, this was probably only an excuse for me to play with Javascript and Java (I did the server side in Java, since it was both easy to build and deploy) and Heroku. I've deployed the app at https://fivemegs.herokuapp.com  and the source code can be found at https://github.com/JPMoresmau/5megs. Any feedback welcome!

Sunday, August 02, 2015

Playing with Recurrent Neural Networks in Haskell

Some time ago an interesting article surfaced on Reddit, about using recurrent neural networks to generate plausible looking text. Andrej did a very good job in explaining how they work and some of the techniques and algorithms he used. I thought this was worth some explorations on my own, so I did a bit of research and tried to implement my own networks in Haskell.

I implemented the basics using the hmatrix package to have vectors and matrices. I went to Khan Academy to learn more about these topics because I had stopped math in school before learning matrices. Once I managed to implement hopefully correctly the algorithm I used the simple-genetic-algorithm package (well, my fork, uploaded on hackage) to evolve the network via genetic selection and mutation.

This gave some results, especially when I fixed my mutation code, that in fact was doing nothing, thus emphasizing again that mutation is a critical part of genetic algorithms, and not just crossovers.

Then I went to implement proper learning algorithms, a bit less random than genetic evolution. I was clearly out of my depth there, but learned a lot. To implement gradient descent I actually used the ad library and had to rewrite the algorithms without hmatrix so they would work on simple lists. Given that a couple of weeks again I didn't even understand what AD was, I'm happy I got some stuff to work. I was lucky to find some good code in python, even trying to understand the under-specified RMSProp algorithm.

The end result is not great, though. My network can learn the really complex sentence "Hello world!" and regenerate it after a couple of thousand iterations of the learning algorithm, in around a minute on my machine. I haven't managed to parallelize the treatment yet, and running on all my cores instead of just one make performance a lot worse. On more complex sentences performance becomes really bad, as both the network becomes bigger (more characters to deal with, etc) and the data to evaluate is bigger.

Trying to use a sparse representation for characters using 2 values instead of 1 as in the standard 1-of-k encoding didn't work well. I also realize that tweaks in the cost function had a huge impact on how well and how fast the network can learn. I have started to look at cutting the data in batches but I think I need to make the learning on simple data as fast as possible before moving on.

So, nothing earth shattering, no great progress or insight, but I feel I learned a lot by implementing my own stuff and running into all the issues that are probably well known of people familiar with these subjects. My artificial intelligence is still pretty dumb, but hopefully my own natural intelligence has progressed a bit!

The code is on github of course, so if anybody wants to have a look, I'll gladly take any feedback!!

Thursday, May 14, 2015

EclipseFP end of life (from me at least)

Hello, after a few years and several releases, I am now stopping the maintenance of EclipseFP and its companion Haskell packages (BuildWrapper, ghc-pkg-lib and scion-browser). If anybody wants to take over I' ll gladly give them all what's required to get started. Feel free to fork and continue!

Why am I stopping? Not for a very specific reason, though seeing that I had to adapt BuildWrapper to GHC 7.10 didn't exactly fill me with joy, but more generally I got tired of being the single maintainer for this project. I got a few pull requests over the years and some people have at some stage participated (thanks to you, you know who you are!), but not enough, and the biggest part of the work has been always on my shoulders. Let's say I got tired of getting an endless stream of issues reports and enhancement requests with nobody stepping up to actually address them.

Also, I don't think on the Haskell side it makes sense for me to keep on working on a GHC API wrapper like BuildWrapper. There are other alternatives, and with the release of ide-backend, backed up by FPComplete, a real company staffed by competent people who seem to have more that 24 hours per day to hack on Haskell tools, it makes more sense to have consolidation there.

The goal of EclipseFP was to make it easy for Java developers or other Eclipse users to move to Haskell, and I think this has been a failure, mainly due to the inherent complexity of the setup (the Haskell stack and the Java stack) and the technical challenges of integrating GHC and Cabal in a complex IDE like Eclipse. Of course we could have done better with the constraints we were operating under, but if more eyes had looked at the code and more hands had worked on deck we could have succeeded.

Personally I would now be interested in maybe getting the Atom editor to use ide-backend-client, or maybe work on a web based (but local) Haskell IDE. Some of my dabblings can be found at https://github.com/JPMoresmau/dbIDE. But I would much prefer to not work on my own, so if you have an open source project you think could do with my help, I'll be happy to hear about it!

I still think Haskell is a great language that would deserve a top-notch IDE, for newbies and experts alike, and I hope one day we'll get there.

For you EclipseFP users, you can of course keep using it as long as it works, but if no other maintainers step up, down the line you'll have to look for other options, as compatibility with the Haskell ecosystem will not be assured. Good luck!

Happy Haskell Hacking!

Thursday, February 05, 2015

Genetic evolution of a neural network

In  a previous post I was trying the LambdaNet library for neural networks, training one with a function my own little human brain designed. But can we make the network learn without knowing the actual algorithm?

As a implementation of a genetic algorithm, I use simple-genetic-algorithm, that is, ahem, simple to use compared to some other packages. I've modified it, though, to use MonadRandom instead of having to thread through the RandomGen, so if you want to run the code make sure to take the version from my fork in the MonadRandom branch, until these changes are released by the maintainer on Hackage.

To use a genetic algorithm, we need two things: a representation of the structure we want to evolve that is suitable for the genetic transformations, and the implementation of these transformations (crossover and mutation) themselves.

For the first task, we convert the network to simple vectors:

-- | Convert a network for only vectors structures
toVectors :: (Floating (Vector a), Container Vector a, Floating a) => 
  Network a -> [([Vector a],Vector a)]
toVectors = map (tovs . layerToShowable) . layers
  where
    tovs (ShowableLayer ws bs) = (toRows ws,bs)

-- | Convert back to a network given a network definition and vector structures
fromVectors :: (Floating (Vector a), Container Vector a, Floating a) => 
  LayerDefinition a -> [([Vector a],Vector a)] -> Network a 
fromVectors ld = Network . map ((\sl-> showableToLayer (sl,ld)) . frvs)
  where
    frvs (vs,v)=ShowableLayer (fromRows vs) v

Note that we need a LayerDefinition to rebuild the network from the Vectors. Currently each layer of the network has the same definition and the algorithm does NOT evolve this structure, only the weights and biases.

We're going to keep that information along with some Worlds that we use for fitness testing:

-- | Store the network information as vectors
data NetworkData = NetworkData [([Vector Float],Vector Float)] [World]
  deriving (Show,Read,Eq,Ord)


Then we can implement the Chromosome class. Crossover takes the average of all weights and biases of the two parents, mutation changes one value randomly. Of course other implementations could be found.

-- | Chromosome instance
instance Chromosome NetworkData where
    -- Take average
    crossover (NetworkData xs ws) (NetworkData ys _) =
        return [ NetworkData (Prelude.zipWith zipW xs ys) ws]
        where
          zipW (vs1,v1) (vs2,v2) = (Prelude.zipWith zipW1 vs1 vs2,zipW1 v1 v2)
          zipW1 = V.zipWith (\x y -> (x + y) / 2) 

    -- Mutate one weight randomly
    mutation (NetworkData xs ws) = do
        xs' <- font="" r1="" randomchange="" xs="">
        return $ NetworkData xs' ws
      where
        randomChange _ [] =  return []
        randomChange f xs2 = do
          idx <- -="" 1="" font="" getrandomr="" length="" xs2="">
          mapM (\(i,x)->if i==idx then f x else return x) $ zip [0..] xs2
        r1 (vs,v) = do
          (v3:vs2) <- font="" r2="" randomchange="" v:vs="">
          return (vs2,v3)
        r2 v2 = do
          idx2 <- -="" 1="" font="" getrandomr="" v.length="" v2="">
          dx   <- 20="" font="" getrandomr="" nbsp="">
          return $ v2  V.// [(idx2,dx)]         

    -- calculate fitness
    fitness (NetworkData xs ws) = sum (map (fitWorld xs) ws) / fromIntegral (length ws)

For the fitness function we calculate the fitness for each given world and average it. I'm not trying to be too clever with that code.
For each world we run the food searching algorithm from each corners, and evaluate how far we are from the target, and if we reached it how long it took us. So networks that find the food will always rank higher than the ones who don't, and the quicker among them will rank higher again.

-- | Calculate fitness on a given world   
fitWorld :: [([Vector Float],Vector Float)] -> World -> Double
fitWorld dat w = sum (map fitCorner $ corners $ wSize w) / 4
  where
    fitCorner pos = 
      let network = fromVectors layerDef dat
          poss = algSteps w (neuralAlg w network) 50 pos
          endSmell = currentSmell w $ last poss
          possLength = length poss
      in fitFormula w endSmell possLength (distance pos $ wFood w)
      
-- | Fitness formula
fitFormula :: World -> Int -> Int -> Int -> Double
fitFormula w endSmell possLenth dist = case fromIntegral endSmell / fromIntegral (wSmell w) of
    1 -> 2 + (fromIntegral dist / fromIntegral possLenth)
    n -> n

Then we just need a stop condition: either a maximum number of generations or reaching the maximum possible fitness (shortest path found from all corners)

-- | Maximum for the fitness
maxFit :: Double
maxFit = fitFormula (World undefined undefined 10 undefined) 10 10 10

-- | Stop function
stopf ::  NetworkData -> Int -> IO Bool
stopf nd gen= return $ gen > 300 || fitness nd == maxFit

And code to generate random networks

-- | Build a random network data
buildNetworkData :: (Monad m,RandomGen g) => [World] -> RandT g m NetworkData 
buildNetworkData ws= do
  g <- font="" getsplit="">
  let n = buildNetwork g
  return $ NetworkData (toVectors n) ws

We can then evolve our population of networks, say on two worlds, w and w2:

runGAIO 64 0.1 (buildNetworkData [w,w2]) stopf

 And we get after a couple of minutes a network that can find the food from another point that the tested corners:

Iteration:1
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#........#
#.......@#
#........#
#........#
##########
Iteration:2
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#......@.#
#........#
#........#
#........#
##########
Iteration:3
##########
#........#
#........#
#........#
#...X....#
#........#
#.....@..#
#........#
#........#
#........#
#........#
##########
Iteration:4
##########
#........#
#........#
#........#
#...X....#
#....@...#
#........#
#........#
#........#
#........#
#........#
##########
Iteration:5
##########
#........#
#........#
#........#
#...@....#
#........#
#........#
#........#
#........#
#........#
#........#
##########
Yummy!

Interested in all this? I recommend the ai-junkie tutorial, then! My code is of course on github. Feedback and suggestions welcome!

Sunday, February 01, 2015

HGraphStorage on Hackage

I've published my prototype of a graph database, HGraphStorage, on Hackage. Not to say that this is a database you should use in production! This is just a test bed to play around graph database concepts and low level algorithms for storage and retrieval.

Just releasing it was, ahem, fun, as somebody noticed that it didn't build in a clean sandbox, and then that the haddock comments were broken (it's a bit annoying that a one space error in Haddock markup causes the whole install to fail). So hopefully now it's ready for people to try and improve!

Pull requests welcome!!


Saturday, January 31, 2015

Searching for food using a LambdaNet neural network

From time to time I have a fit and start doing a bit of AI. I saw that a new version of LambdaNet was released so I though I would take it for a spin and try something (a little) bit more complicated that their XOR example.

The problem is simple. In a rectangular world, there is food is one place. The food "smells" and so each position in the world has a smell associated with it, the higher the smell meaning the closer to the food. Can we have a neural network that can navigate to the food?

A few definitions:

-- | Direction to go to
data Direction = TopLeft | Left | BottomLeft | Top | Center | Bottom | TopRight | Right | BottomRight
  deriving (Show,Read,Eq,Ord,Bounded,Enum)

-- | Strength of the smell
type Smell = Int
-- | Input information
type Input = [(Direction,Smell)]
-- | Position in world
type Position = (Int,Int)
-- | Size of the world
type Size = (Int,Int)
-- | Maximum number of steps to take
type Max = Int

-- | Number of directions
dirLength :: Int

dirLength = 1 + fromEnum (maxBound :: Direction)

-- | The world
data World = World
    { wSize   :: Size -- ^ size
    , wSmell  :: Smell -- ^ Smell of the food position
    , wSmells :: DM.Map Position Smell -- ^ All smell strengths by position
    }
    deriving (Show,Read,Eq,Ord)

-- | Function deciding in which direction to move
type StepFunction = Position -> Input -> Direction

Fundamental is the concept of Direction, since we want to move. Basically, when we are in a given position in a world, we can get nine directions and their associated smell (staying in the same place is one position). The function to decide what to do in a given position given all the smells of the neighbouring positions is called StepFunction.

The algorithm is easy to write for a human brain:

-- | The base algorithm: just go toward the highest smell
baseAlg :: StepFunction
baseAlg _ = fst . maximumBy (comparing snd)

Note that we ignore the current position, we only work with the input structure.

On top of that, we need function to build the world with the proper smell indicators, run the algorithm till we find the food, etc. All this code can be found in the GitHub project but is not really critical for our understanding of neural networks. One function of interest is running one step of the algorithm, showing the intermediate structures generated:

-- | Perform one step and return the information generated: direction/smell input, direction output
algStepExplain :: World -> StepFunction -> Position -> (Position,([(Direction,Smell)],Direction))

We get the position back, and the second element of the tuple is the input and the output of the StepFunction.

What we want to do is train a neural network, which should be easy since we have an algorithm we know will work well to find the best position to move to, and then use that network as an implementation of StepFunction.

The hardest in neural network programming is to design the input and output structures, so that they represent adequately the information about your problem in a format that the network can deal with. Here, we have a fixed input size: the smells of the 9 neighbouring positions. The StepFunction returns a Direction, and a Direction is an enum of nine values, so the output of the network could also be 9 values, the highest of these indicating the direction chosen by the network.

The networks in LambdaNet requires Vectors as their input and output data, so lets format the inputs:

-- | Format the inputs suitable for the network
formatInputs ::  World -> [(Direction,Smell)] ->  Vector Float
formatInputs w =   fromList . map (\i-> fromIntegral (snd i) / fromIntegral (wSmell w))    

So an input of 1 means we're on the food itself, and the input value will decrease as we're further from the food, while staying between 0 and 1.

If we have a network, the implementation of StepFunction is straightforward:

-- | Use the network to give the answer 
neuralAlg ::  World -> Network Float -> StepFunction
neuralAlg w n _ is = toEnum $ maxIndex $ predict (formatInputs w is) n 

We format the input, run predict, retrieve the index for the maximum value in the output vector, and use that as the index in the Direction enum. We just need a trained network!

To get that, we generate the training data from a given world. We list all possible positions in the world, calculate the corresponding inputs, run the basic algorithm on the input to get the optimal answer. For the result direction will set the output value to 1, and zero for all the others

-- | Training data: for each position in the world, use the base algorithm to get the training answer
trainData ::  World -> [(Vector Float, Vector Float)]
trainData w = map onePos $ allPositions w
  where
    onePos p = 
      let (_,(is,dir)) = algStepExplain w baseAlg p
          os = map (\(d,_)->if dir==d then 1 else 0) is 
      in (formatInputs w is,fromList os) 

From here, we unimaginatively reuse the LambdaNet tutorial code to build a network...

-- | Create the network
buildNetwork :: RandomGen g => g -> Network Float
buildNetwork g = createNetwork normals g $ replicate 3 $ LayerDefinition sigmoidNeuron dirLength connectFully

And train it:

-- | Train a network on several given worlds
train :: Network Float -> [World] -> Network Float
train n ws = 
  let t = BackpropTrainer (3 :: Float) quadraticCost quadraticCost'
      dat = concatMap trainData ws
  in trainUntilErrorLessThan n t online dat 0.01

What is critical here is that we train the network on several different worlds. I tried training only one world, the resulting network would perform well on worlds of the same size or smaller, but not bigger worlds, because it was too fit for the actual smell values. Training even on only two quite different worlds brought big enhancements in the intelligence of the network, at the code of longer learning time.

Once the network is trained, you can run it on several different worlds and see how it can find the food. There is a simple visualization module that allows you to see clearly the moves, for example:

Iteration 1
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#........#
#.......@#
#........#
#........#
########## 

(X being the food, @ the current position)

Iteration 3
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#......@.#
#........#
#........#
#........#
##########

Iteration 6
##########
#........#
#........#
#........#
#...@....#
#........#
#........#
#........#
#........#
#........#
#........#
##########

Yummy!

If you're interested, the full source code with tasty unit tests in on Github.

This is of course very basic, and only begs to be enhanced with more complicated worlds (maybe with walls, several sources of food, places with no smell at all, etc). What do you do when you don't know the best algorithm yourself? Maybe I'll come back for more later to find out!


Thursday, January 29, 2015

EclipseFP 2.6.4 released!

Hello all. I've just released EclipseFP 2.6.4, to provide a fix for people that could not create projects due to a NullPointerException. Unfortunately I am at the moment the only contributor to EclipseFP which means nobody else than myself tests it before a release, so regressions happen. I would love to see more people contribute and build from source!

A couple of fixes and small enhancements have been made, the release note are here.

Just update by pointing your Eclipse update feature to http://eclipsefp.sf.net/updates.

Happy Haskell Hacking!

Friday, January 23, 2015

Writing a low level graph database

I've been interested in graph databases for a long time, and I've developed several applications that offer an API close enough to a graph API, but with relational storage. I've also played with off the shelf graph databases, but I thought it would be fun to try an implement my own, in Haskell of course.
I found that in general literature is quite succinct on how database products manage their physical storage, so I've used some of the ideas behind the Neo4J database, as explained in the Graph Databases books and in a few slideshows online.
So I've written the start of a very low level graph database, writing directly on disk via Handles and some Binary instances. I try to use fixed length record so that their IDs translate easily into offsets in the file. Mostly everything ends up looking like linked lists on disk: vertices have a pointer to their first property and their first edge, and in turn these have pointers to the next property or edge. Vertex have pointers to edges linking to and from them.
I've also had some fun trying to implement an index trie on disk.
All in all, it's quite fun, even though I realize my implementations are quite naive, and I just hope that the OS disk caching is enough to make performance acceptable. I've written a small benchmark using the Hackage graph of packages as sample data, but I would need to write the same with a relational backend.

If anybody is interested in looking at the code or even participate, everything is of course on Github!

Monday, January 12, 2015

EclipseFP 2.6.3 released

A new version of EclipseFP, the Eclipse plugins for Haskell development, has been released. The release notes can be found here.

As usual install or update from Eclipse by using the update site http://eclipsefp.sf.net/updates.

Happy Haskell Hacking!