Some time ago an interesting article surfaced on Reddit, about using recurrent neural networks to generate plausible looking text. Andrej did a very good job in explaining how they work and some of the techniques and algorithms he used. I thought this was worth some explorations on my own, so I did a bit of research and tried to implement my own networks in Haskell.
I implemented the basics using the hmatrix package to have vectors and matrices. I went to Khan Academy to learn more about these topics because I had stopped math in school before learning matrices. Once I managed to implement hopefully correctly the algorithm I used the simple-genetic-algorithm package (well, my fork, uploaded on hackage) to evolve the network via genetic selection and mutation.
This gave some results, especially when I fixed my mutation code, that in fact was doing nothing, thus emphasizing again that mutation is a critical part of genetic algorithms, and not just crossovers.
Then I went to implement proper learning algorithms, a bit less random than genetic evolution. I was clearly out of my depth there, but learned a lot. To implement gradient descent I actually used the ad library and had to rewrite the algorithms without hmatrix so they would work on simple lists. Given that a couple of weeks again I didn't even understand what AD was, I'm happy I got some stuff to work. I was lucky to find some good code in python, even trying to understand the under-specified RMSProp algorithm.
The end result is not great, though. My network can learn the really complex sentence "Hello world!" and regenerate it after a couple of thousand iterations of the learning algorithm, in around a minute on my machine. I haven't managed to parallelize the treatment yet, and running on all my cores instead of just one make performance a lot worse. On more complex sentences performance becomes really bad, as both the network becomes bigger (more characters to deal with, etc) and the data to evaluate is bigger.
Trying to use a sparse representation for characters using 2 values instead of 1 as in the standard 1-of-k encoding didn't work well. I also realize that tweaks in the cost function had a huge impact on how well and how fast the network can learn. I have started to look at cutting the data in batches but I think I need to make the learning on simple data as fast as possible before moving on.
So, nothing earth shattering, no great progress or insight, but I feel I learned a lot by implementing my own stuff and running into all the issues that are probably well known of people familiar with these subjects. My artificial intelligence is still pretty dumb, but hopefully my own natural intelligence has progressed a bit!
The code is on github of course, so if anybody wants to have a look, I'll gladly take any feedback!!
In this blog I talk about some of the personal programming I do as a hobby. From Java to Rust via Haskell, I've played around with a lot of technologies and still try to have fun with new languages and APIs!
Showing posts with label Genetic algorithms. Show all posts
Showing posts with label Genetic algorithms. Show all posts
Sunday, August 02, 2015
Thursday, February 05, 2015
Genetic evolution of a neural network
In a previous post I was trying the LambdaNet library for neural networks, training one with a function my own little human brain designed. But can we make the network learn without knowing the actual algorithm?
As a implementation of a genetic algorithm, I use simple-genetic-algorithm, that is, ahem, simple to use compared to some other packages. I've modified it, though, to use MonadRandom instead of having to thread through the RandomGen, so if you want to run the code make sure to take the version from my fork in the MonadRandom branch, until these changes are released by the maintainer on Hackage.
To use a genetic algorithm, we need two things: a representation of the structure we want to evolve that is suitable for the genetic transformations, and the implementation of these transformations (crossover and mutation) themselves.
For the first task, we convert the network to simple vectors:
-- | Convert a network for only vectors structures
toVectors :: (Floating (Vector a), Container Vector a, Floating a) =>
Network a -> [([Vector a],Vector a)]
toVectors = map (tovs . layerToShowable) . layers
where
tovs (ShowableLayer ws bs) = (toRows ws,bs)
-- | Convert back to a network given a network definition and vector structures
fromVectors :: (Floating (Vector a), Container Vector a, Floating a) =>
LayerDefinition a -> [([Vector a],Vector a)] -> Network a
fromVectors ld = Network . map ((\sl-> showableToLayer (sl,ld)) . frvs)
where
frvs (vs,v)=ShowableLayer (fromRows vs) v
Note that we need a LayerDefinition to rebuild the network from the Vectors. Currently each layer of the network has the same definition and the algorithm does NOT evolve this structure, only the weights and biases.
We're going to keep that information along with some Worlds that we use for fitness testing:
-- | Store the network information as vectors
data NetworkData = NetworkData [([Vector Float],Vector Float)] [World]
deriving (Show,Read,Eq,Ord)
Then we can implement the Chromosome class. Crossover takes the average of all weights and biases of the two parents, mutation changes one value randomly. Of course other implementations could be found.
-- | Chromosome instance
instance Chromosome NetworkData where
-- Take average
crossover (NetworkData xs ws) (NetworkData ys _) =
return [ NetworkData (Prelude.zipWith zipW xs ys) ws]
where
zipW (vs1,v1) (vs2,v2) = (Prelude.zipWith zipW1 vs1 vs2,zipW1 v1 v2)
zipW1 = V.zipWith (\x y -> (x + y) / 2)
-- Mutate one weight randomly
mutation (NetworkData xs ws) = do
xs' <- font="" r1="" randomchange="" xs="">->
return $ NetworkData xs' ws
where
randomChange _ [] = return []
randomChange f xs2 = do
idx <- -="" 1="" font="" getrandomr="" length="" xs2="">->
mapM (\(i,x)->if i==idx then f x else return x) $ zip [0..] xs2
r1 (vs,v) = do
(v3:vs2) <- font="" r2="" randomchange="" v:vs="">->
return (vs2,v3)
r2 v2 = do
idx2 <- -="" 1="" font="" getrandomr="" v.length="" v2="">->
dx <- 20="" font="" getrandomr="" nbsp="">->
return $ v2 V.// [(idx2,dx)]
-- calculate fitness
fitness (NetworkData xs ws) = sum (map (fitWorld xs) ws) / fromIntegral (length ws)
For the fitness function we calculate the fitness for each given world and average it. I'm not trying to be too clever with that code.
For each world we run the food searching algorithm from each corners, and evaluate how far we are from the target, and if we reached it how long it took us. So networks that find the food will always rank higher than the ones who don't, and the quicker among them will rank higher again.
-- | Calculate fitness on a given world
fitWorld :: [([Vector Float],Vector Float)] -> World -> Double
fitWorld dat w = sum (map fitCorner $ corners $ wSize w) / 4
where
fitCorner pos =
let network = fromVectors layerDef dat
poss = algSteps w (neuralAlg w network) 50 pos
endSmell = currentSmell w $ last poss
possLength = length poss
in fitFormula w endSmell possLength (distance pos $ wFood w)
-- | Fitness formula
fitFormula :: World -> Int -> Int -> Int -> Double
fitFormula w endSmell possLenth dist = case fromIntegral endSmell / fromIntegral (wSmell w) of
1 -> 2 + (fromIntegral dist / fromIntegral possLenth)
n -> n
Then we just need a stop condition: either a maximum number of generations or reaching the maximum possible fitness (shortest path found from all corners)
-- | Maximum for the fitness
maxFit :: Double
maxFit = fitFormula (World undefined undefined 10 undefined) 10 10 10
-- | Stop function
stopf :: NetworkData -> Int -> IO Bool
stopf nd gen= return $ gen > 300 || fitness nd == maxFit
And code to generate random networks
-- | Build a random network data
buildNetworkData :: (Monad m,RandomGen g) => [World] -> RandT g m NetworkData
buildNetworkData ws= do
g <- font="" getsplit="">->
let n = buildNetwork g
return $ NetworkData (toVectors n) ws
As a implementation of a genetic algorithm, I use simple-genetic-algorithm, that is, ahem, simple to use compared to some other packages. I've modified it, though, to use MonadRandom instead of having to thread through the RandomGen, so if you want to run the code make sure to take the version from my fork in the MonadRandom branch, until these changes are released by the maintainer on Hackage.
To use a genetic algorithm, we need two things: a representation of the structure we want to evolve that is suitable for the genetic transformations, and the implementation of these transformations (crossover and mutation) themselves.
For the first task, we convert the network to simple vectors:
-- | Convert a network for only vectors structures
toVectors :: (Floating (Vector a), Container Vector a, Floating a) =>
Network a -> [([Vector a],Vector a)]
toVectors = map (tovs . layerToShowable) . layers
where
tovs (ShowableLayer ws bs) = (toRows ws,bs)
-- | Convert back to a network given a network definition and vector structures
fromVectors :: (Floating (Vector a), Container Vector a, Floating a) =>
LayerDefinition a -> [([Vector a],Vector a)] -> Network a
fromVectors ld = Network . map ((\sl-> showableToLayer (sl,ld)) . frvs)
where
frvs (vs,v)=ShowableLayer (fromRows vs) v
Note that we need a LayerDefinition to rebuild the network from the Vectors. Currently each layer of the network has the same definition and the algorithm does NOT evolve this structure, only the weights and biases.
We're going to keep that information along with some Worlds that we use for fitness testing:
-- | Store the network information as vectors
data NetworkData = NetworkData [([Vector Float],Vector Float)] [World]
deriving (Show,Read,Eq,Ord)
Then we can implement the Chromosome class. Crossover takes the average of all weights and biases of the two parents, mutation changes one value randomly. Of course other implementations could be found.
-- | Chromosome instance
instance Chromosome NetworkData where
-- Take average
crossover (NetworkData xs ws) (NetworkData ys _) =
return [ NetworkData (Prelude.zipWith zipW xs ys) ws]
where
zipW (vs1,v1) (vs2,v2) = (Prelude.zipWith zipW1 vs1 vs2,zipW1 v1 v2)
zipW1 = V.zipWith (\x y -> (x + y) / 2)
-- Mutate one weight randomly
mutation (NetworkData xs ws) = do
xs' <- font="" r1="" randomchange="" xs="">->
return $ NetworkData xs' ws
where
randomChange _ [] = return []
randomChange f xs2 = do
idx <- -="" 1="" font="" getrandomr="" length="" xs2="">->
mapM (\(i,x)->if i==idx then f x else return x) $ zip [0..] xs2
r1 (vs,v) = do
(v3:vs2) <- font="" r2="" randomchange="" v:vs="">->
return (vs2,v3)
r2 v2 = do
idx2 <- -="" 1="" font="" getrandomr="" v.length="" v2="">->
dx <- 20="" font="" getrandomr="" nbsp="">->
return $ v2 V.// [(idx2,dx)]
-- calculate fitness
fitness (NetworkData xs ws) = sum (map (fitWorld xs) ws) / fromIntegral (length ws)
For the fitness function we calculate the fitness for each given world and average it. I'm not trying to be too clever with that code.
For each world we run the food searching algorithm from each corners, and evaluate how far we are from the target, and if we reached it how long it took us. So networks that find the food will always rank higher than the ones who don't, and the quicker among them will rank higher again.
-- | Calculate fitness on a given world
fitWorld :: [([Vector Float],Vector Float)] -> World -> Double
fitWorld dat w = sum (map fitCorner $ corners $ wSize w) / 4
where
fitCorner pos =
let network = fromVectors layerDef dat
poss = algSteps w (neuralAlg w network) 50 pos
endSmell = currentSmell w $ last poss
possLength = length poss
in fitFormula w endSmell possLength (distance pos $ wFood w)
-- | Fitness formula
fitFormula :: World -> Int -> Int -> Int -> Double
fitFormula w endSmell possLenth dist = case fromIntegral endSmell / fromIntegral (wSmell w) of
1 -> 2 + (fromIntegral dist / fromIntegral possLenth)
n -> n
Then we just need a stop condition: either a maximum number of generations or reaching the maximum possible fitness (shortest path found from all corners)
-- | Maximum for the fitness
maxFit :: Double
maxFit = fitFormula (World undefined undefined 10 undefined) 10 10 10
-- | Stop function
stopf :: NetworkData -> Int -> IO Bool
stopf nd gen= return $ gen > 300 || fitness nd == maxFit
And code to generate random networks
-- | Build a random network data
buildNetworkData :: (Monad m,RandomGen g) => [World] -> RandT g m NetworkData
buildNetworkData ws= do
g <- font="" getsplit="">->
let n = buildNetwork g
return $ NetworkData (toVectors n) ws
We can then evolve our population of networks, say on two worlds, w and w2:
runGAIO 64 0.1 (buildNetworkData [w,w2]) stopf
And we get after a couple of minutes a network that can find the food from another point that the tested corners:
Iteration:1
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#........#
#.......@#
#........#
#........#
##########
Iteration:2
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#......@.#
#........#
#........#
#........#
##########
Iteration:3
##########
#........#
#........#
#........#
#...X....#
#........#
#.....@..#
#........#
#........#
#........#
#........#
##########
Iteration:4
##########
#........#
#........#
#........#
#...X....#
#....@...#
#........#
#........#
#........#
#........#
#........#
##########
Iteration:5
##########
#........#
#........#
#........#
#...@....#
#........#
#........#
#........#
#........#
#........#
#........#
##########
Yummy!
Interested in all this? I recommend the ai-junkie tutorial, then! My code is of course on github. Feedback and suggestions welcome!
Subscribe to:
Posts (Atom)