Showing posts with label Neural Network.

Sunday, March 10, 2019

Rust for linear algebra and neural networks

From time to time I do a little bit of fooling around with algorithms and AI; I suppose it's a break from what I do at work, which is more about building business applications. I don't have a real goal in mind, and I don't think I'm going to stumble upon an algorithm for general intelligence or anything, but I like to understand how things work at a low level.

So I've been writing a bit of Rust code to implement some linear algebra (some matrix and vector operations) and used that to build some small neural networks. You can find the current state of that code here. Of course, if you really want production-ready code for linear algebra you should check https://www.rustsim.org/, and if neural networks and AI in Rust are what you're after, please check https://github.com/AtheMathmo/rusty-machine. Note that rusty-machine is still in a state of flux, since it currently uses its own linear algebra library, but there is talk of switching to nalgebra from rustsim; hopefully that will bring all these initiatives together into one unified Rust platform for numeric operations and AI.

I have to say I'm still impressed with Rust. I like that you can do a lot of things in a functional style while still being able to update variable values or perform other imperative operations when it makes sense. It does feel a lot like "pragmatic Haskell" at times. I haven't struggled with the borrow checker yet, but the code I've written so far is fairly simple and self-contained.

As always, I'm still wondering if there are some projects it would make sense for me to contribute to. I'm certainly not an AI or math expert, but if you have a Rust project that could do with some contributions, let me know!

Happy Rust Hacking!

Wednesday, September 30, 2015

Symbolic differentiation to the rescue! ... Or not.

I'm still playing with my LSTM networks, inspired by a few blog posts about their "unreasonable effectiveness". I spent a lot of time messing with genetic algorithms, but then I came back to more predictable methods, namely gradient descent. I was trying to optimize the performance of the cost function I use via AD (even asking for help on Stack Overflow) when I stumbled across a couple of blog posts on symbolic differentiation (here and here). The last one, combining automatic and symbolic differentiation, struck a chord. If my differentiation was taking so much time, could I not calculate it once with symbolic expressions, then repeatedly close the resulting expression over my variables (my LSTM network) while applying the gradients? I would only pay the price for the differentiation once!

So I extended the data type suggested in the blog post to include all the operations I was using in my function, managed to sort out all the types, and verified via a few tests that it would work. I had great hopes! But as soon as I started testing on a real LSTM, the code just crawled to a halt. At first I thought I had an infinite loop, maybe some recursion on the expression, but testing more thoroughly showed that the sheer size of the generated expression was the issue. An LSTM of 2 cells is represented in the cost function used by AD as an array of 44 doubles, so for an LSTM of 2 cells I'll have 44 variables in my gradient expression. My simple test that tries to use an LSTM to generate the string "hello world!" uses 9 cells (9 different characters in the string), which means 702 variables. Even printing the expression takes forever. Running it through a simplifying step also takes forever. So my idea was not as good as it first looked, but it was fun testing it!
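To give an idea of the approach, here is a minimal sketch of symbolic differentiation over an expression tree, in the spirit of the blog posts mentioned above (the type and function names are illustrative, not the actual code from my repository). The product rule also shows why the expressions explode: every Mul node in the input yields two Mul nodes in its derivative, so deeply nested products grow the derivative exponentially.

```haskell
-- | A tiny symbolic expression type (illustrative sketch).
data Expr
  = Const Double
  | Var Int            -- ^ variables indexed like the flattened network array
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Show, Eq)

-- | Symbolic derivative with respect to variable i.
diff :: Int -> Expr -> Expr
diff _ (Const _) = Const 0
diff i (Var j)   = Const (if i == j then 1 else 0)
diff i (Add a b) = Add (diff i a) (diff i b)
diff i (Mul a b) = Add (Mul (diff i a) b) (Mul a (diff i b))  -- product rule

-- | Evaluate an expression against an environment of variable values;
-- this is the "closing over my variables" step.
eval :: [Double] -> Expr -> Double
eval _   (Const c) = c
eval env (Var j)   = env !! j
eval env (Add a b) = eval env a + eval env b
eval env (Mul a b) = eval env a * eval env b
```

With 702 variables, one such tree per partial derivative is exactly the size problem described above.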

The code for my expressions can be found here, and the code doing the gradient descent via symbolic differentiation is here. All of that probably looks very naive to you calculus and machine learning experts, but hey, I'm a human, learning... If anybody has any idea to speed up my code, I'll happily listen!

Sunday, August 02, 2015

Playing with Recurrent Neural Networks in Haskell

Some time ago an interesting article surfaced on Reddit about using recurrent neural networks to generate plausible-looking text. Andrej Karpathy did a very good job explaining how they work and some of the techniques and algorithms he used. I thought this was worth some exploration of my own, so I did a bit of research and tried to implement my own networks in Haskell.

I implemented the basics using the hmatrix package for vectors and matrices. I went to Khan Academy to learn more about these topics, because I had stopped math in school before learning about matrices. Once I had (hopefully correctly) implemented the algorithm, I used the simple-genetic-algorithm package (well, my fork, uploaded on Hackage) to evolve the network via genetic selection and mutation.

This gave some results, especially once I fixed my mutation code, which was in fact doing nothing, emphasizing again that mutation is a critical part of genetic algorithms, not just crossover.

Then I went on to implement proper learning algorithms, a bit less random than genetic evolution. I was clearly out of my depth there, but I learned a lot. To implement gradient descent I used the ad library, and had to rewrite the algorithms without hmatrix so they would work on simple lists. Given that a couple of weeks ago I didn't even understand what AD was, I'm happy I got some stuff to work. I was lucky to find some good code in Python, even when trying to understand the under-specified RMSProp algorithm.

The end result is not great, though. My network can learn the really complex sentence "Hello world!" and regenerate it after a couple of thousand iterations of the learning algorithm, in around a minute on my machine. I haven't managed to parallelize the processing yet, and running on all my cores instead of just one makes performance a lot worse. On more complex sentences performance becomes really bad, since the network gets bigger (more characters to deal with, etc.) and there is more data to evaluate.

Trying a sparser representation for characters, using 2 values instead of 1 as in the standard 1-of-k encoding, didn't work well. I also realized that tweaks to the cost function had a huge impact on how well and how fast the network can learn. I have started to look at cutting the data into batches, but I think I need to make learning on simple data as fast as possible before moving on.
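For reference, the standard 1-of-k ("one-hot") encoding mentioned above can be sketched like this (helper names are mine, not the project's):

```haskell
import Data.List (nub, elemIndex)
import Data.Maybe (fromJust)

-- | A vector of k values with a single 1 at position i.
oneHot :: Int -> Int -> [Double]
oneHot k i = [if j == i then 1 else 0 | j <- [0 .. k - 1]]

-- | Encode each character of a string as a 1-of-k vector over the
-- string's own alphabet of distinct characters.
encodeString :: String -> [[Double]]
encodeString s =
  let alphabet = nub s   -- distinct characters, e.g. 9 for "hello world!"
      k        = length alphabet
  in map (\c -> oneHot k (fromJust (elemIndex c alphabet))) s
```

This is where the 9 cells for "hello world!" come from: 9 distinct characters, so each input is a 9-value vector.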

So, nothing earth-shattering, no great progress or insight, but I feel I learned a lot by implementing my own stuff and running into all the issues that are probably well known to people familiar with these subjects. My artificial intelligence is still pretty dumb, but hopefully my own natural intelligence has progressed a bit!

The code is on github of course, so if anybody wants to have a look, I'll gladly take any feedback!!

Thursday, February 05, 2015

Genetic evolution of a neural network

In a previous post I tried the LambdaNet library for neural networks, training a network with a function my own little human brain designed. But can we make the network learn without knowing the actual algorithm?

As an implementation of a genetic algorithm, I use simple-genetic-algorithm, which is, ahem, simple to use compared to some other packages. I've modified it, though, to use MonadRandom instead of having to thread the RandomGen through, so if you want to run the code make sure to take the version from my fork, in the MonadRandom branch, until these changes are released by the maintainer on Hackage.

To use a genetic algorithm, we need two things: a representation of the structure we want to evolve that is suitable for the genetic transformations, and the implementation of these transformations (crossover and mutation) themselves.

For the first task, we convert the network to simple vectors:

-- | Convert a network to plain vector structures
toVectors :: (Floating (Vector a), Container Vector a, Floating a) => 
  Network a -> [([Vector a],Vector a)]
toVectors = map (tovs . layerToShowable) . layers
  where
    tovs (ShowableLayer ws bs) = (toRows ws,bs)

-- | Convert back to a network given a network definition and vector structures
fromVectors :: (Floating (Vector a), Container Vector a, Floating a) => 
  LayerDefinition a -> [([Vector a],Vector a)] -> Network a 
fromVectors ld = Network . map ((\sl-> showableToLayer (sl,ld)) . frvs)
  where
    frvs (vs,v)=ShowableLayer (fromRows vs) v

Note that we need a LayerDefinition to rebuild the network from the Vectors. Currently each layer of the network has the same definition and the algorithm does NOT evolve this structure, only the weights and biases.

We're going to keep that information along with some Worlds that we use for fitness testing:

-- | Store the network information as vectors
data NetworkData = NetworkData [([Vector Float],Vector Float)] [World]
  deriving (Show,Read,Eq,Ord)


Then we can implement the Chromosome class. Crossover takes the average of all weights and biases of the two parents, mutation changes one value randomly. Of course other implementations could be found.

-- | Chromosome instance
instance Chromosome NetworkData where
    -- Take average
    crossover (NetworkData xs ws) (NetworkData ys _) =
        return [ NetworkData (Prelude.zipWith zipW xs ys) ws]
        where
          zipW (vs1,v1) (vs2,v2) = (Prelude.zipWith zipW1 vs1 vs2,zipW1 v1 v2)
          zipW1 = V.zipWith (\x y -> (x + y) / 2) 

    -- Mutate one weight randomly
    -- Mutate one weight randomly
    mutation (NetworkData xs ws) = do
        xs' <- randomChange r1 xs
        return $ NetworkData xs' ws
      where
        randomChange _ [] = return []
        randomChange f xs2 = do
          idx <- getRandomR (0, length xs2 - 1)
          mapM (\(i,x) -> if i == idx then f x else return x) $ zip [0..] xs2
        r1 (vs,v) = do
          (v3:vs2) <- randomChange r2 (v:vs)
          return (vs2,v3)
        r2 v2 = do
          idx2 <- getRandomR (0, V.length v2 - 1)
          dx   <- getRandomR (0, 20)
          return $ v2 V.// [(idx2,dx)]

    -- calculate fitness
    fitness (NetworkData xs ws) = sum (map (fitWorld xs) ws) / fromIntegral (length ws)

For the fitness function we calculate the fitness for each given world and average it. I'm not trying to be too clever with that code.
For each world we run the food-searching algorithm from each corner and evaluate how far we end up from the target, and, if we reached it, how long it took us. So networks that find the food will always rank higher than those that don't, and the quicker among them will rank higher still.

-- | Calculate fitness on a given world   
fitWorld :: [([Vector Float],Vector Float)] -> World -> Double
fitWorld dat w = sum (map fitCorner $ corners $ wSize w) / 4
  where
    fitCorner pos = 
      let network = fromVectors layerDef dat
          poss = algSteps w (neuralAlg w network) 50 pos
          endSmell = currentSmell w $ last poss
          possLength = length poss
      in fitFormula w endSmell possLength (distance pos $ wFood w)
      
-- | Fitness formula
fitFormula :: World -> Int -> Int -> Int -> Double
fitFormula w endSmell possLength dist = case fromIntegral endSmell / fromIntegral (wSmell w) of
    1 -> 2 + (fromIntegral dist / fromIntegral possLength)
    n -> n

Then we just need a stop condition: either a maximum number of generations or reaching the maximum possible fitness (shortest path found from all corners)

-- | Maximum for the fitness
maxFit :: Double
maxFit = fitFormula (World undefined undefined 10 undefined) 10 10 10

-- | Stop function
stopf ::  NetworkData -> Int -> IO Bool
stopf nd gen= return $ gen > 300 || fitness nd == maxFit

And code to generate random networks

-- | Build a random network data
buildNetworkData :: (Monad m,RandomGen g) => [World] -> RandT g m NetworkData 
buildNetworkData ws = do
  g <- getSplit
  let n = buildNetwork g
  return $ NetworkData (toVectors n) ws

We can then evolve our population of networks, say on two worlds, w and w2:

runGAIO 64 0.1 (buildNetworkData [w,w2]) stopf

And after a couple of minutes we get a network that can find the food from a point other than the tested corners:

Iteration:1
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#........#
#.......@#
#........#
#........#
##########
Iteration:2
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#......@.#
#........#
#........#
#........#
##########
Iteration:3
##########
#........#
#........#
#........#
#...X....#
#........#
#.....@..#
#........#
#........#
#........#
#........#
##########
Iteration:4
##########
#........#
#........#
#........#
#...X....#
#....@...#
#........#
#........#
#........#
#........#
#........#
##########
Iteration:5
##########
#........#
#........#
#........#
#...@....#
#........#
#........#
#........#
#........#
#........#
#........#
##########
Yummy!

Interested in all this? I recommend the ai-junkie tutorial, then! My code is of course on github. Feedback and suggestions welcome!

Saturday, January 31, 2015

Searching for food using a LambdaNet neural network

From time to time I have a fit and start doing a bit of AI. I saw that a new version of LambdaNet was released, so I thought I would take it for a spin and try something a (little) bit more complicated than their XOR example.

The problem is simple. In a rectangular world, there is food in one place. The food "smells", so each position in the world has a smell associated with it; the higher the smell, the closer the food. Can we have a neural network that can navigate to the food?
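The actual smell computation lives in the GitHub project; here is a plausible sketch, assuming smell decreases with distance from the food (the distance choice, names and formula here are my guesses for illustration, not the project's code):

```haskell
import qualified Data.Map as DM

type Position = (Int, Int)

-- | Build a smell value for every position: maximum at the food,
-- decreasing by 1 per step away from it (Chebyshev distance, so
-- diagonal neighbours count as one step).
smellMap :: (Int, Int) -> Position -> Int -> DM.Map Position Int
smellMap (w, h) food maxSmell = DM.fromList
  [ ((x, y), max 0 (maxSmell - dist (x, y) food))
  | x <- [0 .. w - 1], y <- [0 .. h - 1] ]
  where
    dist (x1, y1) (x2, y2) = max (abs (x1 - x2)) (abs (y1 - y2))
```

Any monotonically decreasing function of distance would do; what matters for the network is that the gradient points at the food.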

A few definitions:

-- | Direction to go to
data Direction = TopLeft | Left | BottomLeft | Top | Center | Bottom | TopRight | Right | BottomRight
  deriving (Show,Read,Eq,Ord,Bounded,Enum)

-- | Strength of the smell
type Smell = Int
-- | Input information
type Input = [(Direction,Smell)]
-- | Position in world
type Position = (Int,Int)
-- | Size of the world
type Size = (Int,Int)
-- | Maximum number of steps to take
type Max = Int

-- | Number of directions
dirLength :: Int
dirLength = 1 + fromEnum (maxBound :: Direction)

-- | The world
data World = World
    { wSize   :: Size -- ^ size
    , wSmell  :: Smell -- ^ Smell of the food position
    , wSmells :: DM.Map Position Smell -- ^ All smell strengths by position
    }
    deriving (Show,Read,Eq,Ord)

-- | Function deciding in which direction to move
type StepFunction = Position -> Input -> Direction

Fundamental here is the concept of Direction, since we want to move. When we are at a given position in the world, we get nine directions and their associated smells (staying in the same place counts as one of the nine). The function deciding what to do at a given position, given all the smells of the neighbouring positions, is called StepFunction.

The algorithm is easy to write for a human brain:

-- | The base algorithm: just go toward the highest smell
baseAlg :: StepFunction
baseAlg _ = fst . maximumBy (comparing snd)

Note that we ignore the current position; we only work with the input structure.

On top of that, we need functions to build the world with the proper smell indicators, run the algorithm until we find the food, etc. All this code can be found in the GitHub project but is not really critical to our understanding of neural networks. One function of interest runs one step of the algorithm, exposing the intermediate structures generated:

-- | Perform one step and return the information generated: direction/smell input, direction output
algStepExplain :: World -> StepFunction -> Position -> (Position,([(Direction,Smell)],Direction))

We get the position back, and the second element of the tuple is the input and the output of the StepFunction.

What we want to do is train a neural network, which should be easy since we have an algorithm we know will work well to find the best position to move to, and then use that network as an implementation of StepFunction.

The hardest part of neural network programming is designing the input and output structures so that they adequately represent the information about your problem in a format the network can deal with. Here we have a fixed input size: the smells of the 9 neighbouring positions. The StepFunction returns a Direction, and Direction is an enum of nine values, so the output of the network can also be 9 values, the highest of which indicates the direction chosen by the network.

The networks in LambdaNet require Vectors as their input and output data, so let's format the inputs:

-- | Format the inputs suitable for the network
formatInputs ::  World -> [(Direction,Smell)] ->  Vector Float
formatInputs w =   fromList . map (\i-> fromIntegral (snd i) / fromIntegral (wSmell w))    

So an input of 1 means we're on the food itself, and the value decreases as we get further from the food, while staying between 0 and 1.

If we have a network, the implementation of StepFunction is straightforward:

-- | Use the network to give the answer 
neuralAlg ::  World -> Network Float -> StepFunction
neuralAlg w n _ is = toEnum $ maxIndex $ predict (formatInputs w is) n 

We format the input, run predict, retrieve the index for the maximum value in the output vector, and use that as the index in the Direction enum. We just need a trained network!

To get that, we generate the training data from a given world: we list all possible positions in the world, calculate the corresponding inputs, and run the basic algorithm on those inputs to get the optimal answer. For the resulting direction we set the output value to 1, and to 0 for all the others.

-- | Training data: for each position in the world, use the base algorithm to get the training answer
trainData ::  World -> [(Vector Float, Vector Float)]
trainData w = map onePos $ allPositions w
  where
    onePos p = 
      let (_,(is,dir)) = algStepExplain w baseAlg p
          os = map (\(d,_)->if dir==d then 1 else 0) is 
      in (formatInputs w is,fromList os) 

From here, we unimaginatively reuse the LambdaNet tutorial code to build a network...

-- | Create the network
buildNetwork :: RandomGen g => g -> Network Float
buildNetwork g = createNetwork normals g $ replicate 3 $ LayerDefinition sigmoidNeuron dirLength connectFully

And train it:

-- | Train a network on several given worlds
train :: Network Float -> [World] -> Network Float
train n ws = 
  let t = BackpropTrainer (3 :: Float) quadraticCost quadraticCost'
      dat = concatMap trainData ws
  in trainUntilErrorLessThan n t online dat 0.01

What is critical here is that we train the network on several different worlds. When I trained on only one world, the resulting network performed well on worlds of the same size or smaller, but not on bigger worlds, because it was overfitted to the actual smell values. Training on just two quite different worlds brought big improvements in the intelligence of the network, at the cost of a longer learning time.

Once the network is trained, you can run it on several different worlds and see how it finds the food. There is a simple visualization module that lets you see the moves clearly, for example:

Iteration 1
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#........#
#.......@#
#........#
#........#
########## 

(X being the food, @ the current position)

Iteration 3
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#......@.#
#........#
#........#
#........#
##########

Iteration 6
##########
#........#
#........#
#........#
#...@....#
#........#
#........#
#........#
#........#
#........#
#........#
##########

Yummy!

If you're interested, the full source code, with tasty unit tests, is on GitHub.

This is of course very basic, and only begs to be enhanced with more complicated worlds (maybe with walls, several sources of food, places with no smell at all, etc). What do you do when you don't know the best algorithm yourself? Maybe I'll come back for more later to find out!


Monday, August 18, 2014

Fame at last!

I was reading the book "Haskell Data Analysis Cookbook" when suddenly my name popped up! Funny to see a link to a 7-year-old blog entry; who knew I would go down in history for a few lines of code for a perceptron? It's deep in Chapter 7, for those interested. Maybe this is a sign that I should abandon everything else and spend my time on AI, since that's obviously where fame and riches abound! Right...

Friday, September 24, 2010

Haskell Neural Network: plugging a space leak


First, the good news: last week when I posted about my little digit recognition program I had made a mistake in my test code: when I give a rounded eight to my network, it does recognize it, even though it's been trained only with a square eight! Yoohoo!


Also, a few people pointed me to online databases of handwritten digits so that I could get more data to train my network. I duly downloaded them and started working with some files. Soon I knew I had a problem. Not only was the learning slow, but the memory usage of the program showed runaway growth, reaching something like 400MB after a few iterations. Ha ha, I thought, so it's me against lazy evaluation again!


Well, I was right. A couple of strictness annotations in the neuron output calculation function helped performance, nearly making the code twice as fast, but didn't help the memory consumption. Going mad with ! (BangPatterns is your friend) didn't help any more. WTF?
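With hindsight, the reason is easy to demonstrate: a bang pattern (or seq) only evaluates to the outermost constructor, so the thunks inside a tuple survive. A tiny illustration, not from the original program:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Forcing a tuple with a bang pattern only reaches weak head normal
-- form: the (,) constructor is evaluated, but its components are not.
whnfDemo :: Int
whnfDemo =
  let !t = (undefined :: Int, 42 :: Int)  -- forcing t does not touch the undefined
  in snd t                                -- so this safely returns 42
```

The same thing happens with a network full of unevaluated arithmetic: forcing the outer structure leaves all the pending additions and multiplications alive on the heap.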


Then I started to read a bit more about seq and weak head normal form. When I put a bang pattern in front of a tuple, it evaluates the tuple but not what's inside it. And, lo and behold, if at each learning iteration I traced the network to the standard output, memory usage stayed constant! So I needed to tell Haskell to fully evaluate the network each time, so as not to keep thunks full of arithmetic operations hanging around. A distant memory of a package called deepseq provided the final help I needed. I didn't use the package itself because I wanted to understand what was really going on, so I just rolled out a few lines of my own:



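The original snippet seems to have been lost from this page, but the idea can be sketched as follows, assuming the network boils down to nested lists of weights (the names and representation here are illustrative, not my actual code):

```haskell
-- Force every weight in the network, not just its spine. seq alone
-- stops at the outermost constructor, so we recurse through both list
-- levels to make sure no thunk of pending arithmetic survives.
deepSeqNet :: [[Double]] -> b -> b
deepSeqNet net b = foldr forceLayer b net
  where
    forceLayer layer acc = foldr seq acc layer
```

Threading each freshly computed network through such a function at every iteration is what keeps the heap flat.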
And by calling deepSeq on my network in each iteration, my program runs in constant space! I looked at the flat line of memory usage in my task manager, and I was pleased. Now I can try to improve the speed of the dang thing, but that's another set of problems...

Friday, September 17, 2010

Digit recognition with a neural network. First attempt!

I bought myself "On Intelligence", and I thought that as a first step on the long road to building an intelligent machine I would go back to neural networks. I have my own little library that seems to work (kind of), so I decided to try writing a little number recognition program. I was inspired of course by ai-junkie, but it was a bit short on details...
So here is the problem: I have a 5x5 grid of LEDs that can be on or off, and I want to recognize the digit that they show. 
The input will be a list of 25 numbers, either 0 (off) or 1 (on)
The output will be a list of 10 numbers, either 0 or 1. The position of the 1 indicates the result (so if the first element is 1, the digit is 0, etc.).
I arbitrarily decided to set the number of hidden neurons to 50, twice the number of input neurons. So my topology is 25-50-10.
I train the network with only one version of each digit. For example 3 would be:
*****
    *
 ****
    *
*****
I wrote a few lines of Haskell code to read the training sets (moving from the representation with *, spaces and newlines to the list of 1s and 0s, so that 3 becomes: [1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0]).
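Those few lines might look something like this (a sketch; the actual reader is in my code, and padding short rows with spaces is my assumption):

```haskell
-- | Turn a 5x5 glyph, given as rows of '*' (LED on) and ' ' (LED off),
-- into the flat list of 25 inputs for the network.
glyphToInput :: [String] -> [Double]
glyphToInput = concatMap (map led . pad)
  where
    pad r   = take 5 (r ++ repeat ' ')  -- rows in the file may be short
    led '*' = 1.0
    led _   = 0.0
```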
Then I train the network. It learns after 200 iterations, and successfully recognizes the training sets (ok, so things are working as they should be).
Then the real test: can my network recognize small variants of the training set?
Lets's try with:
****
    *
 ***
    *
****
(a slightly rounded 3): the network answers 3!!
Works also for a slightly rounded 6. But then:
 ***
*   *
 ***
*   *
 ***
(a rounded 8) gives me 6. Oh dear... I could probably add more cases to the training sets and hopefully resolve the ambiguities, but there are bigger issues (I know, I know, I'm very candid, but it's one thing being told something in a book, and another to build something that actually shows it to you). For example:
***
* *
***
is a small zero that doesn't take up the full 5-LED width. My network recognizes 0 when it fills the full square, not when it's smaller (it tells me it's a 5).
And of course, 1 is a disaster. The training "1" is a vertical row of LEDs on the left. A vertical row of LEDs on the right gives me 9, usually (and I suppose, 9 has all the LEDs on the right on).
So not only is basic training insufficient to detect simple variations, but the fact that we deal with absolute positioning of LEDs means that scaling and simple translations are not supported. So I suppose the next step is to work not with absolute positions but with relative positions of "on" LEDs. Another day's work...

Friday, April 09, 2010

Found my neural network bug (three years later)

After the University of Waterloo AI Challenge I went back to a bit of AI programming in Haskell. Three years ago I had written some code to implement a neural network, and it didn't seem to learn as fast as the book said it should. I had not worked on it any more, but I decided to look at it again. And I found a bug!! I had mixed up the deltas to pass on from one run to the next. Changing two lines of code made my network learn much faster.
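For context, the deltas in question are the error terms backpropagation passes backwards through the layers. A sketch with a sigmoid activation (illustrative only, not the actual code where my bug lived):

```haskell
-- | Derivative of the sigmoid, expressed in terms of the output y.
sigmoid' :: Double -> Double
sigmoid' y = y * (1 - y)

-- | Delta for an output neuron, from its target and actual output.
outputDelta :: Double -> Double -> Double
outputDelta target out = (out - target) * sigmoid' out

-- | Delta for a hidden neuron: its outgoing weights combined with the
-- deltas of the *next* layer. Feeding the wrong deltas here is exactly
-- the kind of bug that makes a network learn slowly rather than fail.
hiddenDelta :: [Double] -> [Double] -> Double -> Double
hiddenDelta ws nextDeltas out = sigmoid' out * sum (zipWith (*) ws nextDeltas)
```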
So now, on to something a bit meatier than exclusive or: very simple character recognition. I want to be able to recognize numbers written as a digital alarm clock would display them (at least mine): you have 7 lights, four vertical and three horizontal, and various patterns make the numbers. If all lights are on, it's 8; if only the middle horizontal light is off, we have zero, etc...
I'm just testing things with a network that has 7 input neurons (one for each light), 10 output neurons (one for each digit) and 8 hidden neurons (roughly the average of input and output).

g<-getStdGen
n<-network 7 8 10
let trainingSet=[
([1,1,1,-1,1,1,1],[1,0,0,0,0,0,0,0,0,0]), -- 0
([-1,-1,1,-1,-1,1,-1],[0,1,0,0,0,0,0,0,0,0]), -- 1
([1,-1,1,1,1,-1,1],[0,0,1,0,0,0,0,0,0,0]), -- 2
([1,-1,1,1,-1,1,1],[0,0,0,1,0,0,0,0,0,0]), -- 3
([-1,1,1,1,-1,1,-1],[0,0,0,0,1,0,0,0,0,0]), -- 4
([1,1,-1,1,-1,1,1],[0,0,0,0,0,1,0,0,0,0]), -- 5
([1,1,-1,1,1,1,1],[0,0,0,0,0,0,1,0,0,0]), -- 6
([1,1,1,-1,-1,1,-1],[0,0,0,0,0,0,0,1,0,0]), -- 7
([1,1,1,1,1,1,1],[0,0,0,0,0,0,0,0,1,0]), -- 8
([1,1,1,1,-1,1,1],[0,0,0,0,0,0,0,0,0,1]) -- 9
]
let (LNS(n',err,ds),e) = run (initialState n) defaultLearningRate trainingSet (1,1000)

And after something like 150 iterations I get a network that has a low error rate. A little helper method to massage the results:

runNumber n inputs = do
  let (_, output) = exec n inputs
      normalized  = map (\v -> if (1 - v) > 0.5 then 0 else 1) output
  return $ elemIndex 1 normalized

Et voila!

Main> runNumber nn [1,-1,1,1,-1,1,1]
Just 3

Now, things are starting to get interesting! Hopefully I won't wait another three years before trying maybe more complex character recognition.