in AI and Deep Learning by (44.8k points)

I'm writing a Connect4 game with an AI opponent using adversarial search techniques and I have somewhat run into a wall. I feel that I'm not far from a solution but that there's perhaps a problem where I'm switching perspectives (as in the perspective of which participant I'm basing my evaluation scores on), missing a minus sign somewhere or something like that.

The problem is one of two behaviors, depending on the variation I try: either the AI plays a perfect game except that it fails to block the player when the player has three-in-a-row, or it prefers blocking the player even when it has the chance to win the game outright. It also seems to matter whether the search depth is even or odd, as the AI plays terribly at a 6-ply search, which is a strong hint that something is wrong.

Search

The algorithm used is negamax with alpha-beta pruning and is implemented as follows:

private int Negamax(int depth, int alpha, int beta, Player player)
{
    Player winner;

    if (Evaluator.IsLeafNode(game, out winner))
    {
        return winner == player ? (10000 / depth) : (-10000 / depth);
    }

    if (depth == Constants.RecursionDepth)
    {
        return Evaluator.Evaluate(game, depth, player);
    }

    foreach (var move in moves)
    {
        int row;
        if (board.DoMove(move, player, out row))
        {
            var value = -Negamax(depth + 1, -beta, -alpha, (Player)1 - (int)player);

            board.UndoMove(move, row, player);

            if (value > alpha)
            {
                alpha = value;
                if (player == Player.AI)
                {
                    bestColumn = move;
                }
            }

            if (alpha >= beta)
            {
                return alpha;
            }
        }
    }

    return alpha;
}

I don't suspect that the problem is in this function, but it might be.
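To make the negamax sign-flip convention concrete, here is a minimal self-contained sketch over an explicit toy game tree (written in Java purely for illustration; `Node`, `negamax` and the example tree are hypothetical, not from the question). The key idea is that every node's value is scored from the perspective of the side to move at that node, so each child's value is negated and the alpha-beta window is negated and swapped:

```java
import java.util.Arrays;
import java.util.List;

public class NegamaxSketch {
    // A tree node: either a leaf with a score (from the perspective of
    // the side to move at that leaf) or an internal node with children.
    static final class Node {
        final int score;            // only meaningful for leaves
        final List<Node> children;  // null for leaves
        Node(int score) { this.score = score; this.children = null; }
        Node(Node... children) { this.score = 0; this.children = Arrays.asList(children); }
    }

    static int negamax(Node node, int alpha, int beta) {
        if (node.children == null) {
            return node.score; // leaf: already scored from the mover's view
        }
        for (Node child : node.children) {
            // The child position is scored from the opponent's view,
            // so negate the value and the window.
            int value = -negamax(child, -beta, -alpha);
            if (value > alpha) alpha = value;
            if (alpha >= beta) break; // beta cutoff
        }
        return alpha;
    }

    public static void main(String[] args) {
        // Root's children are leaves scored from the opponent's view:
        // 3, -5, 2 for the opponent means -3, 5, -2 for the root player.
        Node root = new Node(new Node(3), new Node(-5), new Node(2));
        System.out.println(negamax(root, -1000, 1000)); // best for root: 5
    }
}
```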

Evaluation

I've based the evaluation function on the fact that there are only 69 possible ways to get four-in-a-row on a 7x6 board. I have a look-up table of about 350 items that hardcodes, for every column and row, which win-combinations that cell is part of. For example, for row 0 and column 0, the table looks like this:

//c1r1
table[0][0] = new int[3];
table[0][0][0] = 21;
table[0][0][1] = 27;
table[0][0][2] = 61;

This means that column 0, row 0 is part of win-combination 21, 27 and 61.
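As an alternative to hand-maintaining the ~350-entry table, the 69 lines can be generated programmatically. Here is a sketch (in Java for illustration; the class and method names are my own, not from the question) that enumerates every horizontal, vertical and diagonal four-in-a-row and can list, for any cell, the indices of the lines passing through it. It produces 69 lines in total, and 3 lines through cell (0,0), consistent with the table above:

```java
import java.util.ArrayList;
import java.util.List;

public class WinCombos {
    static final int COLS = 7, ROWS = 6;

    // Each winning line is four (col,row) cells, stored as int[4][2].
    static List<int[][]> allLines() {
        List<int[][]> lines = new ArrayList<>();
        // Directions: right, up, diagonal up-right, diagonal down-right.
        int[][] dirs = { {1, 0}, {0, 1}, {1, 1}, {1, -1} };
        for (int[] d : dirs) {
            for (int c = 0; c < COLS; c++) {
                for (int r = 0; r < ROWS; r++) {
                    int endC = c + 3 * d[0], endR = r + 3 * d[1];
                    if (endC < 0 || endC >= COLS || endR < 0 || endR >= ROWS) continue;
                    int[][] line = new int[4][2];
                    for (int i = 0; i < 4; i++) {
                        line[i] = new int[] { c + i * d[0], r + i * d[1] };
                    }
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    // Indices of every line through (col,row): the generated equivalent
    // of the question's table[col][row].
    static List<Integer> linesThrough(int col, int row) {
        List<Integer> result = new ArrayList<>();
        List<int[][]> lines = allLines();
        for (int i = 0; i < lines.size(); i++) {
            for (int[] cell : lines.get(i)) {
                if (cell[0] == col && cell[1] == row) {
                    result.add(i);
                    break;
                }
            }
        }
        return result;
    }
}
```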

I have a second table that records, for each player, how many stones that player has in each of the win-combinations. When I do a move, I do the following:

public bool DoMove(int column, Player p, out int row)
{
    row = moves[column];

    if (row >= 0)
    {
        Cells[column + row * Constants.Columns] = p;
        moves[column]--;

        var combinations = this.Game.PlayerCombinations[p];
        foreach (int i in TerminalPositionsTable.Get(column, row))
        {
            combinations[i]++;
        }

        return true;
    }
    else
    {
        return false;
    }
}

The opposite is, of course, being done for UndoMove.

So after doing a move on column 0, row 0 by Player.Human, the table will be filled with a value of 1 at index 21, 27 and 61. If I do another move in a cell that is also part of win-combination 27, then the player combination table gets incremented at index 27 to 2.

I hope I have made that clear, as it's used in the evaluation function to very quickly determine how close a player is to scoring four-in-a-row.
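The do/undo bookkeeping described above can be sketched like this (Java for illustration; `counts`, `doMove` and `undoMove` are hypothetical names, and the combination indices 21, 27, 61 for cell (0,0) are taken from the question's table). The essential property is that `undoMove` reverses `doMove` exactly, so the counters always stay consistent with the board during search:

```java
public class ComboCounters {
    static final int NUM_COMBOS = 69;

    // counts[player][combo] = how many stones that player currently has
    // in that win-combination.
    static int[][] counts = new int[2][NUM_COMBOS];

    // Called when a stone is placed: bump the player's counter for every
    // win-combination the cell belongs to.
    static void doMove(int player, int[] combosOfCell) {
        for (int i : combosOfCell) counts[player][i]++;
    }

    // Called when the move is taken back: exact inverse of doMove.
    static void undoMove(int player, int[] combosOfCell) {
        for (int i : combosOfCell) counts[player][i]--;
    }
}
```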

The evaluation function, where I suspect the problem lies, is as follows:

public static int Evaluate(Game game, int depth, Player player)
{
    var combinations = game.PlayerCombinations[player];

    int score = 0;

    for (int i = 0; i < combinations.Length; i++)
    {
        switch (combinations[i])
        {
            case 1: score += 1;
                break;
            case 2: score += 5;
                break;
            case 3: score += 15;
                break;
        }
    }

    return score;
}

So I simply loop through the 69 possible win-combinations and add an amount to the score based on whether it's a single stone, two-in-a-row or three.

The part of this whole adversarial search I'm still confused about is whether I should care which player is making a move. That is, should I pass in the player as I do here, or should I always evaluate the board from the perspective of the AI player? I've tried many combinations, such as aiScore - humanScore, or always evaluating from the perspective of Player.AI, but I've hit a dead end and every combination I've tried was clearly flawed.

So:

  1. Is the logic of my evaluation solid on its basis?

  2. When should I 'switch perspective'?

Any help would be much appreciated.
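For reference, the usual negamax convention is to evaluate from the perspective of the player to move, and to make the score symmetric: myScore minus opponentScore. A sketch under that convention (Java for illustration; the 1/5/15 weights follow the question, while the names and the large weight for a completed combination are my own additions):

```java
public class SymmetricEval {
    // Score contribution by number of own stones in a win-combination.
    // Index 4 (a completed four) is normally caught by the leaf check;
    // the large value here is just a safety net.
    static final int[] WEIGHTS = {0, 1, 5, 15, 1000};

    // combos[p][i] = how many stones player p has in win-combination i.
    // Returns a score that is positive when the side to move is ahead.
    static int evaluate(int[][] combos, int playerToMove) {
        int opponent = 1 - playerToMove;
        int mine = 0, theirs = 0;
        for (int i = 0; i < combos[playerToMove].length; i++) {
            mine += WEIGHTS[combos[playerToMove][i]];
            theirs += WEIGHTS[combos[opponent][i]];
        }
        return mine - theirs;
    }
}
```

With this shape, evaluate(combos, p) == -evaluate(combos, 1 - p) always holds, which is exactly the symmetry the `-Negamax(...)` sign flip relies on.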

1 Answer

by (97.1k points)

Your code for the evaluation works fine, but as I try to understand the search part of your code, I think you have to make some replacements.

Replace this:

return winner == Player.AI ? (10000 / depth) : (-10000 / depth);

with this:

return winner == player ? (10000 / depth) : (-10000 / depth);

And replace this:

return score;

with this:

return player == Player.AI ? score : -score;

You also need to make two more changes. The first is to only change bestColumn when depth is 0, so that the root's best move is not overwritten by moves found deeper in the tree. The second is an integer overflow error: the assumption that the negation of a negative number is positive breaks down for extreme values; in fact, int.MinValue equals its own negation. Thus, when you initialize alpha to int.MinValue and at the next level of the tree let beta = -alpha, beta isn't large and positive like you expect, but instead very negative, causing incorrect pruning. I would suggest using -10000 and 10000 to initialize alpha and beta, not the integer extremes.
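The overflow described above is easy to demonstrate. In two's-complement arithmetic, negating the minimum 32-bit integer wraps around to itself, so an alpha of int.MinValue does not turn into a large positive beta (shown here in Java, whose Integer.MIN_VALUE behaves the same way as C#'s int.MinValue in an unchecked context):

```java
public class NegationOverflow {
    // Plain unary negation: overflows silently for Integer.MIN_VALUE,
    // because +2147483648 is not representable in 32 bits.
    static int negate(int x) {
        return -x;
    }

    public static void main(String[] args) {
        System.out.println(negate(Integer.MIN_VALUE)); // still MIN_VALUE
        System.out.println(negate(-10000));            // 10000, as expected
    }
}
```

This is why initializing the window with sentinel values like -10000 and 10000, whose negations are exactly representable, sidesteps the problem.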



