As I mentioned in previous questions, I'm writing a maze-solving application to help me learn about more theoretical CS subjects. After some trouble, I've got a Genetic Algorithm working that can evolve a set of rules (represented as boolean values) in order to find a good path through a maze.

That being said, the GA alone is okay, but I'd like to beef it up with a Neural Network, even though I have no real working knowledge of Neural Networks (and no formal theoretical CS education). After doing a bit of reading on the subject, I found that a Neural Network could be used to train a genome in order to improve results. Let's say I have a genome (a group of genes) such as:

1 0 0 1 0 1 0 1 0 1 1 1 0 0...

How could I use a Neural Network (I'm assuming an MLP?) to train and improve my genome?
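(For what it's worth, from my reading the combination usually runs the other way around, i.e. neuroevolution: the genome encodes the network's weights, the GA evolves the genome, and the network itself decides the moves. Below is a minimal sketch of that idea; the class, the 8-bits-per-weight encoding, and the [-1, 1] scaling are all my own illustrative assumptions, not an established API.)

```java
// Hypothetical sketch: decode a bit-string genome into MLP weights
// (neuroevolution: the GA evolves the genome, the decoded network
// would then pick moves in the maze).
public class GenomeDecoder {
    // Assumed encoding: each weight is read from BITS_PER_WEIGHT
    // consecutive genome bits and scaled into [-1, 1].
    static final int BITS_PER_WEIGHT = 8;

    static double[] decode(int[] genome, int numWeights) {
        double[] weights = new double[numWeights];
        for (int w = 0; w < numWeights; w++) {
            int value = 0;
            for (int b = 0; b < BITS_PER_WEIGHT; b++) {
                // Pack the next 8 bits into an integer 0..255
                value = (value << 1) | genome[w * BITS_PER_WEIGHT + b];
            }
            // Map 0..255 linearly onto -1.0..+1.0
            weights[w] = (value / 255.0) * 2.0 - 1.0;
        }
        return weights;
    }

    public static void main(String[] args) {
        int[] genome = new int[16];    // 2 weights * 8 bits, all zero
        genome[8] = 1; genome[15] = 1; // second weight = 10000001b = 129
        double[] w = decode(genome, 2);
        System.out.println(w[0] + " " + w[1]);
    }
}
```

The GA's fitness function would then run the decoded network through the maze and score how far it gets, so the NN is improved *by* the GA rather than the GA being trained by the NN.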

In addition to this, since I know nothing about Neural Networks, I've been looking into implementing some form of Reinforcement Learning using my maze matrix (a 2-dimensional array), although I'm a bit stuck on what the following algorithm wants from me:

(from http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm)

1. Set parameter γ, and environment reward matrix R

2. Initialize matrix Q as a zero matrix

3. For each episode:

   * Select a random initial state

   * Do while the goal state has not been reached:

     * Select one among all possible actions for the current state

     * Using this possible action, consider going to the next state

     * Get the maximum Q value of this next state, based on all possible actions

     * Compute: Q(state, action) = R(state, action) + γ · Max[Q(next state, all actions)]

     * Set the next state as the current state

   * End Do

4. End For
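(From what I can tell, the "Compute" step in that tutorial is the update Q(state, action) = R(state, action) + γ · Max[Q(next state, all actions)]. Here is my attempt at a minimal sketch of the whole loop; the 5-cell corridor "maze", the LEFT/RIGHT actions, and all class names are hypothetical simplifications just to keep the example short.)

```java
import java.util.Random;

// Minimal sketch of the episode loop above. States are the cells of a
// hypothetical 5-cell corridor; actions are LEFT (0) and RIGHT (1).
// R rewards the single transition that reaches the goal cell (index 4).
public class QLearningSketch {
    static final int STATES = 5, ACTIONS = 2;
    static final double GAMMA = 0.8;                          // step 1
    static final double[][] R = new double[STATES][ACTIONS];  // step 1
    static final double[][] Q = new double[STATES][ACTIONS];  // step 2: zeros

    // Environment transition: move left/right, clamped to the corridor.
    static int next(int s, int a) {
        int t = (a == 0) ? s - 1 : s + 1;
        return Math.max(0, Math.min(STATES - 1, t));
    }

    public static void main(String[] args) {
        R[3][1] = 100;                 // moving RIGHT from cell 3 hits the goal
        Random rnd = new Random(42);
        for (int episode = 0; episode < 500; episode++) {
            int s = rnd.nextInt(STATES - 1);        // step 3: random initial state
            while (s != STATES - 1) {               // until goal state reached
                int a = rnd.nextInt(ACTIONS);                 // pick an action
                int s2 = next(s, a);                          // next state
                double maxQ = Math.max(Q[s2][0], Q[s2][1]);   // max over actions
                Q[s][a] = R[s][a] + GAMMA * maxQ;             // the "Compute" step
                s = s2;                                       // advance
            }
        }
        System.out.println(Q[3][1]);   // should settle at 100.0
    }
}
```

After enough episodes, Q should decay geometrically away from the goal (Q[2][RIGHT] ≈ 80, Q[1][RIGHT] ≈ 64, ...), which is exactly the "gradient" the agent follows greedily at solve time.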

The big problems for me are implementing the reward matrix R, understanding what the Q matrix exactly is, and getting the Q value. I use a multi-dimensional array for my maze and enum states for every move. How would these be used in a Q-Learning algorithm?
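(My current understanding, which may be off: with a 2-D maze, each cell becomes one state, indexed as `row * width + col`; each enum move becomes one action; and `R[state][action]` is the immediate reward for taking that move from that cell. Here's a sketch of building R on that assumption; the `Move` enum, the 0/1 wall encoding, and the reward values are hypothetical choices of mine.)

```java
// Hypothetical sketch: flatten a 2-D maze into state indices and fill
// the reward matrix R. maze[r][c] == 1 means wall; reaching the goal
// cell earns 100, a move into a wall or off the grid is marked -1.
public class RewardMatrix {
    enum Move { UP, DOWN, LEFT, RIGHT }

    static double[][] buildR(int[][] maze, int goalRow, int goalCol) {
        int h = maze.length, w = maze[0].length;
        double[][] R = new double[h * w][Move.values().length];
        for (int r = 0; r < h; r++) {
            for (int c = 0; c < w; c++) {
                for (Move m : Move.values()) {
                    // Cell this move would land on
                    int nr = r + (m == Move.DOWN ? 1 : m == Move.UP ? -1 : 0);
                    int nc = c + (m == Move.RIGHT ? 1 : m == Move.LEFT ? -1 : 0);
                    double reward;
                    if (nr < 0 || nr >= h || nc < 0 || nc >= w || maze[nr][nc] == 1) {
                        reward = -1;    // illegal move (wall or outside)
                    } else if (nr == goalRow && nc == goalCol) {
                        reward = 100;   // reaching the goal
                    } else {
                        reward = 0;     // ordinary step
                    }
                    R[r * w + c][m.ordinal()] = reward;
                }
            }
        }
        return R;
    }
}
```

The Q matrix then has the same shape as R (one row per cell, one column per move) and just starts at zero; it accumulates the learned long-term value of each move, whereas R only holds the immediate reward.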

If someone could help out by explaining what I would need to do to implement the above, preferably in Java (although C# would be nice too), possibly with some source code examples, it would be appreciated.