0 votes
1 view
in AI and Deep Learning by (43.2k points)

I had a small but potentially stupid question about Monte Carlo Tree Search. I understand most of it but have been looking at some implementations and noticed that after the MCTS is run for a given state and the best move returned, the tree is thrown away. So for the next move, we have to run MCTS from scratch on this new state to get the next best position.

I was just wondering why we don't retain some of the information from the old tree. It seems like there is valuable information about the states in the old tree, especially given that the best move is one where the MCTS has explored most. Is there any particular reason we can't use this old information in some useful way?

1 Answer

0 votes
by (92.8k points)

Probably because of stochastical dependence. The root-problem changed and thus the different paths might be traversed. In minmax I would think, given a 50-move decision, we could reuse 1/50 of our already pre-computed data (simplified; loss is huge), but in MCTS it's maybe not as irrelevant in terms of math-proofs if we are to re-use these or not. I think this paper is explaining this (chapter 5).

Some implementations do indeed retain the information.

For example, the AlphaGo Zero paper says:

The search tree is reused at subsequent time-steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is maintained along with all its statistics, while the remains of the tree are discarded.

If you want to learn about monte Carlo tree search then visit this Python Course.

Welcome to Intellipaat Community. Get your technical queries answered by top developers !