Q Learning Algorithm for Tic Tac Toe

Question

1 Answer

vinita · Answer 1 · 2019-07-04T11:59:30+0000

Tic-tac-toe is a two-player game, so a Q-learning agent needs an opponent to play against while it learns. You can implement another algorithm as the opponent (e.g. Minimax), play against the agent yourself, or use another reinforcement learning agent, which may be a second instance of the same Q-learning algorithm (self-play).
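To make the opponent idea concrete, here is a minimal sketch of a board representation plus the simplest possible opponent, one that plays a random legal move. This is only a stand-in for the options listed above (it is not Minimax), and every name in it (`EMPTY_BOARD`, `legal_moves`, `winner`, `place`, `random_opponent`) is made up for illustration.

```python
import random

# Winning index triples on a 3x3 board stored as a flat tuple of 9 cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

# Assumed encoding: " " is an empty cell, "X" and "O" are the two marks.
EMPTY_BOARD = (" ",) * 9


def legal_moves(board):
    """Return the indices of all empty cells."""
    return [i for i, cell in enumerate(board) if cell == " "]


def winner(board):
    """Return "X" or "O" if that mark completes a line, otherwise None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def place(board, index, mark):
    """Return a new board with `mark` written at `index`."""
    return board[:index] + (mark,) + board[index + 1:]


def random_opponent(board, rng=random):
    """Hypothetical baseline opponent: choose any empty cell uniformly."""
    return rng.choice(legal_moves(board))
```

Because the board is an immutable tuple, it can be used directly as a dictionary key, which is convenient if the Q values are stored in a table keyed by state. A random opponent is easy to train against but weak, so an agent trained only this way may not play well against a stronger opponent.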

The machine learning approach we will use is called Reinforcement Learning, and the particular variant is Tabular Q-Learning: you keep a table with one Q value for each state-action pair and update one entry after every action you perform. More precisely, if applying action a1 in state s1 takes you to state s2 and yields a reward r, then you update Q(s1, a1) as follows, where max Q(s2, _) is the largest Q value over the actions available in s2 (taken as 0 if s2 is terminal):

Q(s1, a1) = Q(s1, a1) + learning_rate * (r + discount_factor * max Q(s2, _) - Q(s1, a1))
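The update rule above can be sketched in a few lines of Python. This is an illustration under assumptions, not a full agent: the table is a `defaultdict` so unseen state-action pairs start at 0, the function name `q_update`, its `next_actions` parameter, and the default values 0.1 and 0.9 are all chosen here for the example.

```python
from collections import defaultdict


def q_update(Q, s1, a1, r, s2, next_actions,
             learning_rate=0.1, discount_factor=0.9):
    """Apply one tabular Q-learning update to Q[(s1, a1)] and return it.

    `next_actions` lists the actions available in s2; pass an empty list
    when s2 is terminal so that the max term is treated as 0.
    """
    # max Q(s2, _) over the available actions, or 0.0 if there are none.
    best_next = max((Q[(s2, a)] for a in next_actions), default=0.0)
    # r + discount_factor * max Q(s2, _) - Q(s1, a1)
    td_error = r + discount_factor * best_next - Q[(s1, a1)]
    Q[(s1, a1)] += learning_rate * td_error
    return Q[(s1, a1)]


# Usage with placeholder state labels: a winning move worth reward 1,
# then the move before it, which picks up a discounted share of that value.
Q = defaultdict(float)
q_update(Q, "s1", 0, 1.0, "end", [])    # Q[("s1", 0)] moves from 0 toward 1
q_update(Q, "s0", 3, 0.0, "s1", [0])    # Q[("s0", 3)] becomes slightly positive
```

In tic-tac-toe the states would be board positions and the actions cell indices. One design choice to decide for yourself is what counts as s2: a common option is the position the agent sees after the opponent has replied, so that the opponent's move is treated as part of the environment.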
