Friday, June 01, 2007

A very dumb neural network in Haskell

Continuing my implementations of Michael Negnevitsky's book on Artificial Intelligence in Haskell, I have tried to code a three-layer neural network that implements exclusive or (Intelligence indeed...).

A neuron is made of its input weights and its threshold, and its output is calculated as follows (apologies to all of you who gave me tips about parameter order and newtype declarations, I haven't changed the code yet to follow your good advice!):

-- imports needed by the code further down
import Data.List (transpose)
import Control.Monad (replicateM)
import System.Random

type Neuron = ([Value],Value)   -- (input weights, threshold)
type Value  = Float

neuronOutput :: [Value] -> Neuron -> Value
neuronOutput inputs (weights,threshold) =
    let
        tot = sum (zipWith (*) inputs weights) - threshold
    in
        1 / (1 + exp (-tot))     -- sigmoid activation
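
As a quick sanity check, the first hidden neuron of the book's starting network (the one reused in the test at the end of this post), fed with both inputs at 1, gives roughly:

neuronOutput [1,1] ([0.5,0.4],0.8)
-- ~ 0.525, i.e. 1 / (1 + exp (-(0.5 + 0.4 - 0.8)))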


A three-layer network whose input layer does nothing but pass its inputs on to the middle layer can be modelled by two lists of neurons, and I use a TestCase type to keep track of inputs and expected outputs:
       
type Network  = ([Neuron],[Neuron])   -- (hidden layer, output layer)
type TestCase = ([Value],[Value])     -- (inputs, expected outputs)
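
For exclusive or, the test cases are simply the truth table. This is the same list the test function at the end inlines; xorTests is only a name I use here for illustration:

xorTests :: [TestCase]
xorTests = [([1,1],[0]),([0,1],[1]),([1,0],[1]),([0,0],[0])]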

Running the whole network with some input values is easy:

exec :: Network -> [Value] -> ([Value],[Value])
exec (hidden,output) inputs =
    let
        hiddenOutputs = map (neuronOutput inputs) hidden
        outputOutputs = map (neuronOutput hiddenOutputs) output
    in
        (hiddenOutputs,outputOutputs)
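
For instance, with the book's starting network (the one used in the test below), the first XOR case gives approximately:

exec ([([0.5,0.4],0.8),([0.9,1.0],-0.1)],[([-1.2,1.1],0.3)]) [1,1]
-- ~ ([0.525,0.881],[0.510])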

The meat is all in one function, which calculates the output of the network for a given test case and adapts the weights accordingly, first for the output layer and then for the hidden layer:

defaultLearningRate :: Value
defaultLearningRate = 0.1

step :: (Network,Value) -> TestCase -> (Network,Value)
step ((hidden,output),err) (inputs,expected) =
    let
        (hiddenOutputs,outputOutputs) = exec (hidden,output) inputs
        -- error and gradient for each output neuron
        errorOutputs    = zipWith (-) expected outputOutputs
        gradientOutputs = zipWith (\o d -> o * (1-o) * d) outputOutputs errorOutputs
        -- adapt the output layer, using the hidden outputs as its inputs
        newOutputs = zipWith (\(ws,t) g ->
                        ( zipWith (\h w -> w + defaultLearningRate*h*g) hiddenOutputs ws
                        , t + defaultLearningRate*g*(-1) ))
                        output gradientOutputs
        -- back-propagate: each hidden neuron gets the output weights attached to it
        weightsByHidden = transpose (map fst output)
        gradientHidden  = zipWith (\o ws -> o * (1-o) * sum (zipWith (*) ws gradientOutputs))
                                  hiddenOutputs weightsByHidden
        -- adapt the hidden layer, using the original inputs
        newHidden = zipWith (\(ws,t) g ->
                        ( zipWith (\i w -> w + defaultLearningRate*i*g) inputs ws
                        , t + defaultLearningRate*g*(-1) ))
                        hidden gradientHidden
        newErr = err + sum (map (^2) errorOutputs)
    in
        ((newHidden,newOutputs),newErr)

The result is a new adapted network and the sum of squared errors.
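
To put rough numbers on the update rule (my own back-of-the-envelope check on the first test case ([1,1],[0]) with the starting network used below, not figures taken from the book):

-- outputOutputs   ~ [0.510]
-- errorOutputs    ~ [0 - 0.510]                      = [-0.510]
-- gradientOutputs ~ [0.510 * (1 - 0.510) * (-0.510)] = [-0.127]
-- each output weight then moves by 0.1 * hiddenOutput * (-0.127),
-- and the output threshold moves by 0.1 * 0.127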

An epoch is a set of tests:

epoch :: Network -> [TestCase] -> (Network,Value)
epoch network = foldl step (network,0)

We can then repeat epoch until the sum of squared errors for an epoch is less than the threshold for convergence:

errorConvergence :: Value
errorConvergence = 0.001

run :: Network -> [TestCase] -> Int -> (Network,Int,Value)
run network allInputs epochNb =
    let (newNetwork,delta) = epoch network allInputs
    in  (?) (delta <= errorConvergence || epochNb > 225)
            (newNetwork,epochNb,delta)
            (run newNetwork allInputs (epochNb+1))

(?) :: Bool -> a -> a -> a
(?) True a _ = a
(?) False _ a = a

(Is there a standard function for (?)? I got tired of if then else...)

Then I test my code with the same starting network as in the book:

test :: IO ()
test = do
    let n = ([([0.5,0.4],0.8),([0.9,1.0],-0.1)],[([-1.2,1.1],0.3)])
    let (n',e,err) = run n [([1,1],[0]),([0,1],[1]),([1,0],[1]),([0,0],[0])] 1
    print (n',e,err)
    return ()

And I have mixed feelings about the result: the network does converge, and it does work, but only after something like 54000 epochs, whereas in the book it supposedly takes 224! The book gives the result for the first test case and I'm getting exactly the same values, but it looks like my network learns a lot more slowly than it should. I suppose I have a subtle error somewhere, but for the life of me I can't find it... If it jumps out at anybody, please let me know. My hope now is that when I refactor the code and follow the book's advice on improving performance, I will find the bug!!

I have also tried with random starting networks:

network :: Int -> Int -> Int -> IO Network
network inputNb hiddenNb outputNb = do
    a <- randomNeurons inputNb hiddenNb
    b <- randomNeurons hiddenNb outputNb
    return (a,b)

randomNeurons :: Int -> Int -> IO [Neuron]
randomNeurons nbInput nbNeurons = replicateM nbNeurons (randomNeuron nbInput)

randomNeuron :: Int -> IO Neuron
randomNeuron nb = do
    w <- replicateM nb (randomWeight nb)
    t <- randomWeight nb
    return (w,t)

-- weights (and the threshold) are drawn from (-2.4/n, 2.4/n), n being the number of inputs
randomWeight :: Int -> IO Value
randomWeight nbInput = do
    let interval = randomR (-2.4 / fromIntegral nbInput, 2.4 / fromIntegral nbInput)
    getStdRandom interval
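
To plug this into the rest of the code, something along these lines should work (testRandom is just an illustrative name):

testRandom :: IO ()
testRandom = do
    n <- network 2 2 1
    let (n',e,err) = run n [([1,1],[0]),([0,1],[1]),([1,0],[1]),([0,0],[0])] 1
    print (n',e,err)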

But I don't get better results...