
As I have mentioned in previous questions, I am writing a maze-solving application to help me learn about more theoretical CS subjects. After some trouble, I've got a genetic algorithm working that can evolve a set of rules (encoded as boolean values) in order to find a good path through a maze.

That being said, the GA alone is okay, but I'd like to beef it up with a neural network, even though I have no real working knowledge of neural networks (no formal theoretical CS education). After doing a bit of reading on the subject, I found that a neural network could be used to train a genome in order to improve results. Let's say I have a genome (a group of genes), such as

1 0 0 1 0 1 0 1 0 1 1 1 0 0...

How could I use a neural network (I'm assuming an MLP?) to train and improve my genome?

In addition to this, since I know nothing about neural networks, I've been looking into implementing some form of reinforcement learning using my maze matrix (a 2-dimensional array), although I'm a bit stuck on what the following algorithm wants from me:

1. Set the parameter γ (the discount factor), and the environment reward matrix R.

2. Initialize matrix Q as a zero matrix.

3. For each episode:
   * Select a random initial state.
   * Do while the goal state has not been reached:
     * Select one among all possible actions for the current state.
     * Using this possible action, consider going to the next state.
     * Get the maximum Q value of this next state, based on all possible actions.
     * Compute Q(state, action) = R(state, action) + γ · max[Q(next state, all actions)].
     * Set the next state as the current state.
   * End Do

4. End For
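The steps above can be sketched directly in Java. Everything below is illustrative: the 6-state example, the R matrix, the discount factor GAMMA = 0.8, the goal index, and the 1000-episode count are all made-up values, not something from your maze; for a real maze you would derive R from your grid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class QLearning {
    static final int N = 6;          // number of states (toy example)
    static final double GAMMA = 0.8; // the "parameter" from step 1
    static final int GOAL = 5;       // goal state index (toy example)

    // R[s][s2] = immediate reward for moving from state s to state s2;
    // -1 marks a transition that is not allowed.
    static double[][] R = {
        {-1, -1, -1, -1,  0, -1},
        {-1, -1, -1,  0, -1, 100},
        {-1, -1, -1,  0, -1, -1},
        {-1,  0,  0, -1,  0, -1},
        { 0, -1, -1,  0, -1, 100},
        {-1,  0, -1, -1,  0, 100}
    };
    static double[][] Q = new double[N][N]; // step 2: zero matrix

    public static void main(String[] args) {
        Random rng = new Random();
        for (int episode = 0; episode < 1000; episode++) {   // step 3
            int state = rng.nextInt(N);                      // random initial state
            while (state != GOAL) {                          // until goal reached
                List<Integer> actions = new ArrayList<>();
                for (int s2 = 0; s2 < N; s2++)
                    if (R[state][s2] >= 0) actions.add(s2);  // possible actions
                int next = actions.get(rng.nextInt(actions.size()));
                double maxQ = 0;                             // max Q of next state
                for (int s2 = 0; s2 < N; s2++)
                    maxQ = Math.max(maxQ, Q[next][s2]);
                // the "Compute" step: Q(s,a) = R(s,a) + γ · max Q(next, ·)
                Q[state][next] = R[state][next] + GAMMA * maxQ;
                state = next;                                // advance
            }
        }
    }
}
```

After enough episodes, the entries of Q settle, and the best move from any state is simply the action with the largest Q value in that state's row.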

The big problem for me is implementing the reward matrix R, understanding what the Q matrix exactly is, and getting the Q values. I use a multi-dimensional array for my maze and an enum of states for every move. How would these be used in a Q-learning algorithm?

If someone could help out by explaining what I would need to do to implement this, preferably in Java (although C# would be nice too), possibly with some source code examples, it would be appreciated.


The Q matrix stores what the model has learned: each entry Q(state, action) estimates the long-term value of taking that action from that state, so after training it encodes the best path from any initial state to the goal state. The algorithm is how the model learns from its training, and each episode is equivalent to one training session.

In each training session, the model explores the environment on its own, which is represented by the R matrix. As it explores, it collects rewards, until it reaches the goal state.

This whole process is done to build up the "brain" of the model, which is represented by the Q matrix.
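Once the Q matrix is trained, consulting that brain just means repeatedly taking the highest-valued action from the current state until the goal is reached. A minimal sketch, assuming Q is indexed as Q[state][nextState] and the hypothetical size guard is there to stop a partially trained Q from looping forever:

```java
public class Policy {
    // Follow the greedy policy encoded in Q from start to goal.
    static java.util.List<Integer> greedyPath(double[][] Q, int start, int goal) {
        java.util.List<Integer> path = new java.util.ArrayList<>();
        int state = start;
        path.add(state);
        while (state != goal && path.size() < Q.length + 1) { // loop guard
            int best = 0;
            double bestQ = Double.NEGATIVE_INFINITY;
            for (int s2 = 0; s2 < Q[state].length; s2++) {
                if (Q[state][s2] > bestQ) { bestQ = Q[state][s2]; best = s2; }
            }
            state = best;       // move to the most valuable next state
            path.add(state);
        }
        return path;
    }
}
```

If the greedy walk hits the guard without reaching the goal, the Q matrix simply needs more training episodes.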