Saturday, January 31, 2015

Searching for food using a LambdaNet neural network

From time to time I get the urge to do a bit of AI. I saw that a new version of LambdaNet was released, so I thought I would take it for a spin and try something a (little) bit more complicated than their XOR example.

The problem is simple. In a rectangular world, there is food in one place. The food "smells", so each position in the world has a smell associated with it: the higher the smell, the closer the food. Can we build a neural network that navigates to the food?

A few definitions:

-- Imports used by the snippets in this post (plus the LambdaNet imports
-- from its own tutorial); Prelude's Left and Right are hidden because
-- they would clash with the Direction constructors below.
import Prelude hiding (Left, Right)
import Data.List (maximumBy)
import Data.Ord (comparing)
import qualified Data.Map as DM
import Numeric.LinearAlgebra (Vector, fromList, maxIndex)

-- | Direction to go to
data Direction = TopLeft | Left | BottomLeft | Top | Center | Bottom | TopRight | Right | BottomRight
  deriving (Show,Read,Eq,Ord,Bounded,Enum)

-- | Strength of the smell
type Smell = Int
-- | Input information
type Input = [(Direction,Smell)]
-- | Position in world
type Position = (Int,Int)
-- | Size of the world
type Size = (Int,Int)
-- | Maximum number of steps to take
type Max = Int

-- | Number of directions
dirLength :: Int

dirLength = 1 + fromEnum (maxBound :: Direction)

-- | The world
data World = World
    { wSize   :: Size -- ^ size
    , wSmell  :: Smell -- ^ Smell of the food position
    , wSmells :: DM.Map Position Smell -- ^ All smell strengths by position
    }
    deriving (Show,Read,Eq,Ord)

-- | Function deciding in which direction to move
type StepFunction = Position -> Input -> Direction

The fundamental concept is Direction, since we want to move. From any given position in the world, there are nine directions we can take, each with an associated smell (staying in the same place, Center, counts as one of them). The function that decides what to do at a given position, given the smells of all the neighbouring positions, is called StepFunction.
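To make the nine directions concrete, here is a hypothetical helper (the real project has its own geometry code; this sketch assumes x grows to the right and y grows downward):

-- Hypothetical helper: the offset each Direction applies to a Position.
move :: Direction -> Position -> Position
move d (x,y) = (x + dx, y + dy)
  where
    (dx,dy) = case d of
      TopLeft     -> (-1,-1)
      Left        -> (-1, 0)
      BottomLeft  -> (-1, 1)
      Top         -> ( 0,-1)
      Center      -> ( 0, 0)
      Bottom      -> ( 0, 1)
      TopRight    -> ( 1,-1)
      Right       -> ( 1, 0)
      BottomRight -> ( 1, 1)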

The algorithm is easy for a human brain to write:

-- | The base algorithm: just go toward the highest smell
baseAlg :: StepFunction
baseAlg _ = fst . maximumBy (comparing snd)

Note that we ignore the current position; we only work with the input structure.
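For instance, in GHCi, with a made-up input where Bottom carries the strongest smell (the position argument is irrelevant):

*Main> baseAlg (0,0) [(Top,2),(Bottom,5),(Left,1)]
Bottom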

On top of that, we need functions to build the world with the proper smell indicators, run the algorithm until we find the food, and so on. All this code can be found in the GitHub project, but it is not really critical to our understanding of neural networks. One function of interest performs one step of the algorithm and shows the intermediate structures generated:

-- | Perform one step and return the information generated: direction/smell input, direction output
algStepExplain :: World -> StepFunction -> Position -> (Position,([(Direction,Smell)],Direction))

We get the new position back, and the second element of the tuple gives us both the input and the output of the StepFunction.
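The actual implementations live in the GitHub project; as a rough sketch (my own assumptions: the hypothetical move helper from above, a smell that falls off with the Chebyshev distance to the food, and smell 0 outside the world), they might look like this:

-- Hypothetical world builder: the smell peaks at wSmell on the food
-- itself and falls off with the Chebyshev distance to it.
mkWorld :: Size -> Position -> World
mkWorld sz@(w,h) (fx,fy) = World sz maxSmell smells
  where
    maxSmell = max w h
    smells   = DM.fromList
      [ ((x,y), max 0 (maxSmell - max (abs (x-fx)) (abs (y-fy))))
      | x <- [0..w-1], y <- [0..h-1] ]

-- Hypothetical body for algStepExplain: gather the nine neighbouring
-- smells, ask the StepFunction for a direction, and move that way.
algStepExplain :: World -> StepFunction -> Position -> (Position,([(Direction,Smell)],Direction))
algStepExplain w f p =
  let is  = [ (d, DM.findWithDefault 0 (move d p) (wSmells w))
            | d <- [minBound .. maxBound] ]
      dir = f p is
  in (move dir p, (is,dir))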

What we want to do is train a neural network and then use that network as an implementation of StepFunction. Training should be easy, since we already have an algorithm that we know finds the best position to move to.

The hardest part of neural network programming is designing the input and output structures so that they adequately represent the information about your problem, in a format the network can deal with. Here we have a fixed input size: the smells of the nine neighbouring positions. The StepFunction returns a Direction, and Direction is an enum of nine values, so the output of the network can also be nine values, the highest of which indicates the direction chosen by the network.
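To make the encoding concrete, here is a hypothetical helper (the training code below actually builds this inline):

-- Hypothetical helper: encode a Direction as a one-hot vector of length
-- dirLength, e.g. oneHot Top == fromList [0,0,0,1,0,0,0,0,0].
oneHot :: Direction -> Vector Float
oneHot d = fromList [ if i == fromEnum d then 1 else 0 | i <- [0 .. dirLength - 1] ]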

The networks in LambdaNet require Vectors as their input and output data, so let's format the inputs:

-- | Format the inputs into a form suitable for the network
formatInputs :: World -> [(Direction,Smell)] -> Vector Float
formatInputs w = fromList . map (\i -> fromIntegral (snd i) / fromIntegral (wSmell w))

So an input of 1 means we're on the food itself, and the input value decreases as we get further from the food, while staying between 0 and 1.
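A quick sanity check in GHCi, assuming a world w whose wSmell is 10 (and a two-entry input just for illustration):

*Main> formatInputs w [(Top,7),(Center,10)]
fromList [0.7,1.0]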

If we have a network, the implementation of StepFunction is straightforward:

-- | Use the network to give the answer
neuralAlg :: World -> Network Float -> StepFunction
neuralAlg w n _ is = toEnum $ maxIndex $ predict (formatInputs w is) n

We format the input, run predict, retrieve the index of the maximum value in the output vector, and use that as an index into the Direction enum. We just need a trained network!
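The project has its own runner; a hypothetical driving loop could look like this, stopping when the smell under our feet reaches the maximum (we are standing on the food) or when the step budget runs out:

-- Hypothetical driver: apply a StepFunction repeatedly and return the
-- path taken, up to a maximum number of steps.
runAlg :: World -> Max -> StepFunction -> Position -> [Position]
runAlg w mx f = go mx
  where
    go 0 p = [p]
    go n p
      | DM.lookup p (wSmells w) == Just (wSmell w) = [p]   -- on the food
      | otherwise = let (p',_) = algStepExplain w f p
                    in p : go (n-1) p'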

To get that, we generate the training data from a given world: we list all possible positions in the world, calculate the corresponding inputs, and run the basic algorithm on each input to get the optimal answer. For the resulting direction we set the output value to 1, and to 0 for all the others:

-- | Training data: for each position in the world, use the base algorithm to get the training answer
trainData ::  World -> [(Vector Float, Vector Float)]
trainData w = map onePos $ allPositions w
  where
    onePos p = 
      let (_,(is,dir)) = algStepExplain w baseAlg p
          os = map (\(d,_)->if dir==d then 1 else 0) is 
      in (formatInputs w is,fromList os) 

From here, we unimaginatively reuse the LambdaNet tutorial code to build a network: three fully connected layers (input, hidden, and output) of dirLength sigmoid neurons each...

-- | Create the network
buildNetwork :: RandomGen g => g -> Network Float
buildNetwork g = createNetwork normals g $ replicate 3 $ LayerDefinition sigmoidNeuron dirLength connectFully

And train it:

-- | Train a network on several given worlds
train :: Network Float -> [World] -> Network Float
train n ws = 
  let t = BackpropTrainer (3 :: Float) quadraticCost quadraticCost'
      dat = concatMap trainData ws
  in trainUntilErrorLessThan n t online dat 0.01

What is critical here is that we train the network on several different worlds. I tried training on only one world: the resulting network would perform well on worlds of the same size or smaller, but not on bigger worlds, because it was too closely fitted to the actual smell values. Training on even just two quite different worlds brought big improvements in the intelligence of the network, at the cost of a longer learning time.
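Putting the pieces together, a hypothetical end-to-end session (reusing the mkWorld and runAlg sketches from above):

import System.Random (getStdGen)

main :: IO ()
main = do
  g <- getStdGen
  let worlds  = [mkWorld (10,10) (4,3), mkWorld (20,15) (12,4)]
      trained = train (buildNetwork g) worlds
  -- walk the trained network through the first world, starting at (8,7)
  mapM_ print $ runAlg (head worlds) 50 (neuralAlg (head worlds) trained) (8,7)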

Once the network is trained, you can run it on several different worlds and see how it finds the food. There is a simple visualization module that lets you see the moves clearly, for example:

Iteration 1
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#........#
#.......@#
#........#
#........#
########## 

(X being the food, @ the current position)

Iteration 3
##########
#........#
#........#
#........#
#...X....#
#........#
#........#
#......@.#
#........#
#........#
#........#
##########

Iteration 6
##########
#........#
#........#
#........#
#...@....#
#........#
#........#
#........#
#........#
#........#
#........#
##########

Yummy!
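For the curious, the rendering boils down to something like this hypothetical sketch (the project's visualization module is the real thing), given the food and walker positions:

-- Hypothetical renderer, assuming wSize is (width,height): '#' frames
-- the world, 'X' marks the food, '@' the walker, '.' everything else.
-- Usage: putStr $ render w (4,3) (8,7)
render :: World -> Position -> Position -> String
render wo food me = unlines $ [border] ++ map row [0..h-1] ++ [border]
  where
    (w,h)  = wSize wo
    border = replicate (w+2) '#'
    row y  = '#' : [ cell (x,y) | x <- [0..w-1] ] ++ "#"
    cell p | p == me   = '@'
           | p == food = 'X'
           | otherwise = '.'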

If you're interested, the full source code, with tasty unit tests, is on GitHub.

This is of course very basic, and it begs to be enhanced with more complicated worlds (maybe with walls, several sources of food, places with no smell at all, etc.). And what do you do when you don't know the best algorithm yourself? Maybe I'll come back later to find out!

