2 views

I want to use a minimax search (with alpha-beta pruning), or rather a negamax search, to make a computer program play a card game.

The card game actually consists of 4 players. So in order to be able to use minimax etc., I simplify the game to "me" against the "others". After each "move", you can objectively read the current state's evaluation from the game itself. When all 4 players have placed the card, the highest win them all - and the cards' values count.

As you don't know how the distribution of cards between the other 3 players is exactly, I thought you must simulate all possible distributions ("worlds") with the cards that are not yours. You have 12 cards, the other 3 players have 36 cards in total.

So my approach is this algorithm, where a player is a number between 1 and 3 symbolizing the three computer players that the program might need to find moves for. And -player stands for the opponents, namely all the other three players together.

private Card computerPickCard(GameState state, ArrayList<Card> cards) { int bestScore = Integer.MIN_VALUE; Card bestMove = null; int nCards = cards.size(); for (int i = 0; i < nCards; i++) { if (state.moveIsLegal(cards.get(i))) { // if you are allowed to place this card int score; GameState futureState = state.testMove(cards.get(i)); // a move is the placing of a card (which returns a new game state) score = negamaxSearch(-state.getPlayersTurn(), futureState, 1, Integer.MIN_VALUE, Integer.MAX_VALUE); if (score > bestScore) { bestScore = score; bestMove = cards.get(i); } } } // now bestMove is the card to place } private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) { ArrayList<Card> cards; if (player >= 1 && player <= 3) { cards = state.getCards(player); } else { if (player == -1) { cards = state.getCards(0); cards.addAll(state.getCards(2)); cards.addAll(state.getCards(3)); } else if (player == -2) { cards = state.getCards(0); cards.addAll(state.getCards(1)); cards.addAll(state.getCards(3)); } else { cards = state.getCards(0); cards.addAll(state.getCards(1)); cards.addAll(state.getCards(2)); } } if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached if (player >= 1 && player <= 3) { return state.getCurrentPoints(player); // player's points as a positive value (for self) } else { return -state.getCurrentPoints(-player); // player's points as a negative value (for others) } } else { int score; int nCards = cards.size(); if (player > 0) { // make one move (it's player's turn) for (int i = 0; i < nCards; i++) { GameState futureState = state.testMove(cards.get(i)); if (futureState != null) { // wenn Zug gültig ist score = negamaxSuche(-player, futureState, depthLeft-1, -beta, -alpha); if (score >= beta) { return score; } if (score > alpha) { alpha = score; // alpha acts like max } } } return alpha; } else { // make three moves (it's the others' turn) for (int i = 0; i < nCards; i++) { GameState futureState = state.testMove(cards.get(i)); if (futureState != null) { // if move is valid for (int k = 0; k < nCards; k++) { if (k != i) { GameState futureStateLevel2 = futureState.testMove(cards.get(k)); if (futureStateLevel2 != null) { // if move is valid for (int m = 0; m < nCards; m++) { if (m != i && m != k) { GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m)); if (futureStateLevel3 != null) { // if move is valid score = negamaxSuche(-player, futureStateLevel3, depthLeft-1, -beta, -alpha); if (score >= beta) { return score; } if (score > alpha) { alpha = score; // alpha acts like max } } } } } } } } } return alpha; } } }

This seems to work fine, but for a depth of 1 (depthLeft=1), the program already needs to calculate 50,000 moves (placed cards) on average. This is too much, of course!

So my questions are:

1. Is the implementation correct at all? Can you simulate a game like this? Regarding the imperfect information, especially?

2. How can you improve the algorithm in speed and workload?

3. Can I, for example, reduce the set of possible moves to a random set of 50% to improve speed, while keeping good results?

4. I found a UCT algorithm to be a good solution (maybe). Do you know this algorithm? Can you help me implement it?

by (108k points)

In many card games, you can sample the unknown cards that your opponent could have done instead of generating all of them. You can take into account information like short suits and the probability of holding certain cards given play so far when doing this sampling to weight the likelihood of each possible hand. Then, you solve each hand using a perfect information search. The best move over all of these worlds is often the best move overall - with some caveat.

In games like Poker, this won't work very well -- the game is all about the hidden information. You have to be precise while balancing your actions to keep the information about your hand hidden.

But, in games like trick-based card games, this works pretty well - particularly since new information is being revealed all the time. Really good players have a good idea what everyone holds anyway. So, reasonably strong Skat and Bridge programs have been based on these ideas.

If you can completely solve the underlying world, that is best, but if you can't, you can use minimax or UCT to choose the best move in each world. There are also hybrid algorithms (ISMCTS) that try to mix this process together. Be careful about the claims here. Simple sampling approaches are easier to code -- you should try the simpler approach before a more complex one.

Here are some research papers that will provide more information on when the sampling approach to imperfect information has worked well:

Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search (This paper analyzes when the sampling approach is likely to work.)

Improving State Evaluation, Inference, and Search in Trick-Based Card Games (This paper describes the use of sampling in Skat)

Imperfect information in a computationally challenging game (This paper describes sampling in Bridge)

Information Set Monte Carlo Tree Search (This paper merges sampling and UCT/Monte Carlo Tree Search to avoid the issues in the first reference.)