First of all: This is not a question about how to make a program play Five in a Row. Been there, done that.
Introductory explanation
I have made a five-in-a-row-game as a framework to experiment with genetically improving AI (ouch, that sounds awfully pretentious). As with most turn-based games, the best move is decided by assigning a score to every possible move and then playing the move with the highest score. The function for assigning a score to a move (a square) goes something like this:
If the square already has a token, the score is 0 since it would be illegal to place a new token in the square.
Each square can be a part of up to 20 different winning rows (5 horizontal, 5 vertical, 10 diagonals). The score of the square is the sum of the score of each of these rows.
The score of a row depends on the number of friendly and enemy tokens already in the row. Examples:
A row with four friendly tokens should have an infinite score because if you place a token there you win the game.
The score for a row with four enemy tokens should be very high since if you don't put a token there, the opponent will win on his next turn.
A row with both friendly and enemy tokens will score 0 since this row can never be part of a winning row.
Given this algorithm, I have declared a type called TBrain:
type TBrain = array[cFriendly..cEnemy , 0..4] of integer;
The values in the array indicate the score of a row with either N friendly tokens and 0 enemy tokens, or 0 friendly tokens and N enemy tokens. If there are 5 tokens in a row there's no score since the row is full.
It's actually quite easy to decide which values should be in the array. Brain[0,4] (four friendly tokens) should be "infinite", let's call that 1.000.000. vBrain[1,4] should be very high, but not so high that the brain would prefer blocking several enemies wins rather than wining itself
Consider the following (improbable) board:
0123456789 +---------- 0|1...1...12 1|.1..1..1.2 2|..1.1.1..2 3|...111...2 4|1111.1111. 5|...111.... 6|..1.1.1... 7|.1..1..1.. 8|1...1...1.
Player 2 should place his token in (9,4), winning the game, not in (4,4) even though he would then block 8 potential winning rows for player 1. Ergo, vBrain[1,4] should be (vBrain[0,4]/8)-1. Working like this we can find optimal values for the "brain", but again, this is not what I'm interested in. I want an algorithm to find the best values.
I have implemented this framework so that it's totally deterministic. There are no random values added to the scores, and if several squares have the same score the top-left will be chosen.
Actual problem
That's it for the introduction, now to the interesting part (for me, at least)
I have two "brains", vBrain1 and vBrain2. How should I iteratively make these better? Imagine something like this:
Initialize vBrain1 and vBrain2 with random values.
Simulate a game between them.
Assign the values from the winner to the loser, then randomly change one of them slightly.
This doesn't seem to work. The brains don't get any smarter. Why?
Should the score-method add some small random values to the result, so that two games between the same two brains would be different? How much should the values change for each iteration? How should the "brains" be initialized? With constant values? With random values?
Also, does this have anything to do with AI or genetic algorithms at all?
PS: The question has nothing to do with Five in a Row. That's just something I chose because I can declare a very simple "Brain" to experiment on.