A neuron is made of its input weight and its threshold, and its output is calculated thus (apologies to all of you who gave me some tips about parameter order and newtype declaration, I haven't changed the code yet to follow your good advice!):

type Neuron= ([Value],Value)

type Value = Float

neuronOutput :: [Value] -> Neuron -> Value

neuronOutput inputs (weights,threshold) =

let

tot=(foldl (+) 0 (zipWith (*) inputs weights)) - threshold

in

1 / (1 + ((2.7182) ** (-tot)))

A three layer network whose input layer does nothing but fire its input to the midle layer can be modelled by to lists of neurons, and I use a TestCase type to keep track of inputs and expected outputs

type Network = ([Neuron],[Neuron])

type TestCase = ([Value],[Value])

Running the whole network with some input values is easy:

exec :: Network -> [Value] -> ([Value],[Value])

exec (hidden,output) inputs =

let

hiddenOutputs = map (neuronOutput inputs) hidden

outputOutputs = map (neuronOutput hiddenOutputs) output

in

(hiddenOutputs,outputOutputs)

The meat is all in one function, that calculates the output of the network for a given test and adapts the weights accordingly, first for the output layer then for the hidden layer:

defaultLearningRate::Value

defaultLearningRate=0.1

step :: (Network,Value) -> TestCase -> (Network,Value)

step ((hidden,output),err) (inputs,expected)=

let

(hiddenOutputs,outputOutputs)=exec (hidden,output) inputs

errorOutputs= zipWith (-) expected outputOutputs

gradientOutputs=zipWith (\o d->o*(1-o)*d) outputOutputs errorOutputs

newOutputs=zipWith (\(ws,t) g -> ( zipWith (\h w->w+ (defaultLearningRate*h*g)) hiddenOutputs ws ,t+(defaultLearningRate*g*(-1)))) output gradientOutputs

weightsByHidden= transpose (map fst output)

gradientHidden =zipWith (\o ws-> o * (1-o) * (foldl (+) 0 (zipWith (*) ws gradientOutputs))) hiddenOutputs weightsByHidden

newHidden=zipWith (\(ws,t) g -> ( zipWith (\h w->w+ (defaultLearningRate*h*g)) inputs ws ,t+(defaultLearningRate*g*(-1)))) hidden gradientHidden

newErr=err + (foldl (+) 0 (map (^2) errorOutputs))

in ((newHidden,newOutputs),newErr)

The result is a new adapted network and the sum of squared errors.

An epoch is a set of tests:

epoch :: Network -> [TestCase] -> (Network,Value)

epoch network= foldl step (network,0)

We can then repeat epoch till the sum of squared errors for an epoch is less than the threshold for convergence:

errorConvergence::Value

errorConvergence=0.001

run :: Network -> [TestCase] -> Int -> (Network,Int,Value)

run network allInputs epochNb=

let (newNetwork,delta) = epoch network allInputs

in (?) (delta <= errorConvergence || epochNb>225)

(newNetwork, epochNb,delta)

(run newNetwork allInputs (epochNb+1))

(?) :: Bool -> a -> a -> a

(?) True a _ = a

(?) False _ a = a

(Is there a standard function for (?)? I got tired of if then else...)

Then I test my code with the same start network as in the book:

test = do

let n = ([([0.5,0.4],0.8),([0.9,1.0],-0.1)],[([-1.2,1.1],0.3)])

let (n',e,err) = run n [([1,1],[0]),([0,1],[1]),([1,0],[1]),([0,0],[0])] 1

print (n',e,err)

return ()

And I get a mixed feeling about the result: the network does converge, and then it does work, but after something like 54000 epochs, while in the book it supposedly take 224! The book gives the result for the first test and I'm getting exactly the same thing, but it looks like my network learns a lot slower than it should. I would suppose I have a subtle error somewhere, but for the life of me I can't find it... If it jumps at anybody, please let me know. My hope now is that when refactoring the code and following the books advice on improving the performance I will find the bug!!

I have tried with random networks:

network :: Int -> Int -> Int-> IO Network

network inputNb hiddenNb outputNb= do

a<-randomNeurons inputNb hiddenNb

b<-randomNeurons hiddenNb outputNb

return (a,b)

randomNeurons:: Int -> Int -> IO [Neuron]

randomNeurons nbInput nbNeurons= replicateM nbNeurons (randomNeuron nbInput)

randomNeuron:: Int -> IO Neuron

randomNeuron nb= do

w<-replicateM nb (randomWeight nb)

t<-randomWeight nb

return (w,t)

randomWeight:: Int -> IO Value

randomWeight nbInput= do

let interval= randomR ((-2.4)/(fromIntegral nbInput),2.4/(fromIntegral nbInput))

getStdRandom interval

But I don't get better results...