0 votes
in AI and Deep Learning by (50.2k points)

I don't understand how to update Q values for the game of tic-tac-toe. I have read a lot about it, but I still can't picture how it works in practice. I read that the Q value is updated at the end of the game, but I don't understand whether there is a Q value for each action.

1 Answer

0 votes
by (108k points)

Tic-tac-toe is a two-player game, so a Q-learning agent needs an opponent to play against while it learns. That means you need to implement another algorithm (e.g. Minimax), play against the agent yourself, or use another reinforcement learning agent (which might be the same Q-learning algorithm).
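As a rough illustration of the "another algorithm" option, here is a minimal sketch of the simplest possible opponent: one that plays a random legal move. This is a stand-in chosen for brevity, not Minimax. The board representation (a list of 9 cells) and the names `EMPTY`, `legal_moves` and `random_opponent` are assumptions made for this example.

```python
import random

EMPTY = " "  # assumed marker for an unoccupied cell

def legal_moves(board):
    """Return the indices of the empty cells on a 9-cell board."""
    return [i for i, cell in enumerate(board) if cell == EMPTY]

def random_opponent(board, rng=random):
    """Pick a uniformly random legal move (a stand-in for Minimax or a second learner)."""
    return rng.choice(legal_moves(board))

board = [EMPTY] * 9
board[4] = "X"                   # the learning agent takes the centre
reply = random_opponent(board)   # the opponent answers with any free cell
board[reply] = "O"
```

A random opponent is easy to beat, so an agent trained only against it may play poorly against a stronger player; swapping in Minimax or a second Q-learning agent only requires replacing `random_opponent`.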

The machine learning approach here is reinforcement learning, and the particular variant is tabular Q-learning. There is a Q value for each state-action pair, not one per game, and you can update one Q value after every action you perform rather than waiting for the end of the game. More precisely, if applying action a1 from state s1 gets you into state s2 and brings you some reward r, then you update Q(s1, a1) as follows:

Q(s1, a1) = Q(s1, a1) + learning_rate * (r + discount_factor * max Q(s2, _) - Q(s1, a1))
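
The update rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Q table is a dictionary keyed by (state, action), states are 9-character strings, s2 is the board the agent sees on its next turn (after the opponent has replied), and the values of `LEARNING_RATE` and `DISCOUNT_FACTOR` are arbitrary. When s2 is the end of the game there are no further actions, so the `max Q(s2, _)` term is taken as 0.

```python
from collections import defaultdict

LEARNING_RATE = 0.5      # assumed value, chosen for the example
DISCOUNT_FACTOR = 0.9    # assumed value, chosen for the example

# Q table: maps (state, action) -> value; unseen pairs default to 0.0
Q = defaultdict(float)

def update_q(s1, a1, r, s2, actions_s2):
    """Apply one tabular Q-learning update for the transition (s1, a1, r, s2).

    actions_s2 lists the legal actions in s2; pass an empty list when s2 is
    terminal, so that the future-value term is 0.
    """
    best_next = max((Q[(s2, a)] for a in actions_s2), default=0.0)
    td_target = r + DISCOUNT_FACTOR * best_next
    Q[(s1, a1)] += LEARNING_RATE * (td_target - Q[(s1, a1)])
    return Q[(s1, a1)]

# The winning final move: X completes the top row, s2 is terminal, reward 1.
q_win = update_q("XX_OO____", 2, 1.0, "XXXOO____", [])

# The move before it: X plays cell 1, O replies in cell 4, reward 0.
q_prev = update_q("X__O_____", 1, 0.0, "XX_OO____", [2, 5, 6, 7, 8])
```

With these numbers, the winning move's value becomes 0.5, and the earlier move picks up a discounted share of it (0.225) even though its own reward was 0. This is how a reward that arrives only at the end of the game propagates back to earlier state-action pairs over repeated games.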
