I want to use a minimax search (with alpha-beta pruning), or rather a negamax search, to make a computer program play a card game.
The card game actually consists of 4 players. So in order to be able to use minimax etc., I simplify the game to "me" against the "others". After each "move", you can objectively read the current state's evaluation from the game itself. When all 4 players have placed the card, the highest win them all - and the cards' values count.
As you don't know how the distribution of cards between the other 3 players is exactly, I thought you must simulate all possible distributions ("worlds") with the cards that are not yours. You have 12 cards, the other 3 players have 36 cards in total.
So my approach is this algorithm, where a player is a number between 1 and 3 symbolizing the three computer players that the program might need to find moves for. And -player stands for the opponents, namely all the other three players together.
private Card computerPickCard(GameState state, ArrayList<Card> cards) { int bestScore = Integer.MIN_VALUE; Card bestMove = null; int nCards = cards.size(); for (int i = 0; i < nCards; i++) { if (state.moveIsLegal(cards.get(i))) { // if you are allowed to place this card int score; GameState futureState = state.testMove(cards.get(i)); // a move is the placing of a card (which returns a new game state) score = negamaxSearch(-state.getPlayersTurn(), futureState, 1, Integer.MIN_VALUE, Integer.MAX_VALUE); if (score > bestScore) { bestScore = score; bestMove = cards.get(i); } } } // now bestMove is the card to place } private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) { ArrayList<Card> cards; if (player >= 1 && player <= 3) { cards = state.getCards(player); } else { if (player == -1) { cards = state.getCards(0); cards.addAll(state.getCards(2)); cards.addAll(state.getCards(3)); } else if (player == -2) { cards = state.getCards(0); cards.addAll(state.getCards(1)); cards.addAll(state.getCards(3)); } else { cards = state.getCards(0); cards.addAll(state.getCards(1)); cards.addAll(state.getCards(2)); } } if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached if (player >= 1 && player <= 3) { return state.getCurrentPoints(player); // player's points as a positive value (for self) } else { return -state.getCurrentPoints(-player); // player's points as a negative value (for others) } } else { int score; int nCards = cards.size(); if (player > 0) { // make one move (it's player's turn) for (int i = 0; i < nCards; i++) { GameState futureState = state.testMove(cards.get(i)); if (futureState != null) { // wenn Zug gültig ist score = negamaxSuche(-player, futureState, depthLeft-1, -beta, -alpha); if (score >= beta) { return score; } if (score > alpha) { alpha = score; // alpha acts like max } } } return alpha; } else { // make three moves (it's the others' turn) for (int i = 0; i < nCards; i++) { GameState futureState = state.testMove(cards.get(i)); if (futureState != null) { // if move is valid for (int k = 0; k < nCards; k++) { if (k != i) { GameState futureStateLevel2 = futureState.testMove(cards.get(k)); if (futureStateLevel2 != null) { // if move is valid for (int m = 0; m < nCards; m++) { if (m != i && m != k) { GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m)); if (futureStateLevel3 != null) { // if move is valid score = negamaxSuche(-player, futureStateLevel3, depthLeft-1, -beta, -alpha); if (score >= beta) { return score; } if (score > alpha) { alpha = score; // alpha acts like max } } } } } } } } } return alpha; } } }
This seems to work fine, but for a depth of 1 (depthLeft=1), the program already needs to calculate 50,000 moves (placed cards) on average. This is too much, of course!
So my questions are:
Is the implementation correct at all? Can you simulate a game like this? Regarding the imperfect information, especially?
How can you improve the algorithm in speed and workload?
Can I, for example, reduce the set of possible moves to a random set of 50% to improve speed, while keeping good results?
I found a UCT algorithm to be a good solution (maybe). Do you know this algorithm? Can you help me implement it?