0 votes
1 view
in AI and Deep Learning by (38k points)

I can't work out how to update the Q values for a tic-tac-toe game. I have read a lot about it, but I still can't picture how to do it. I read that the Q value is updated at the end of the game, but I don't understand whether there is a Q value for each action.

1 Answer

0 votes
by (82.1k points)

Tic-tac-toe is a two-player game, so when learning with Q-learning you need an opponent to play against. That means you either implement another algorithm (e.g. Minimax), play against the agent yourself, or use another reinforcement learning agent (which might be the same Q-learning algorithm).
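As a rough sketch of the simplest possible opponent, a player that picks a random empty cell can stand in for Minimax while you get the learning loop working. This is a swapped-in placeholder, not one of the options listed above. It assumes the board is a list of 9 cells with " " marking an empty cell; that representation and the names legal_moves and random_opponent_move are illustrative only.

```python
import random


def legal_moves(board):
    """Return the indices of the empty cells on a 9-cell board."""
    return [i for i, cell in enumerate(board) if cell == " "]


def random_opponent_move(board, rng=random):
    """Pick a uniformly random empty cell (a stand-in for a stronger opponent)."""
    moves = legal_moves(board)
    if not moves:
        raise ValueError("no legal moves left on a full board")
    return rng.choice(moves)
```

A random opponent is easy to beat, so an agent trained only against it may play poorly against a stronger player; swapping in Minimax or self-play later only requires replacing random_opponent_move.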

The machine learning approach here is reinforcement learning, and the particular variant is tabular Q-learning. You have a Q value for each state-action pair, not just for each action. You also do not have to wait until the end of the game: you can update one Q value after every action you perform. More precisely, if applying action a1 from state s1 gets you into state s2 and brings you some reward r, then you update Q(s1, a1) as follows, where max Q(s2, _) is the largest Q value over the actions available in s2:

Q(s1, a1) = Q(s1, a1) + learning_rate * (r + discount_factor * max Q(s2, _) - Q(s1, a1))
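The update can be sketched in Python roughly like this. It is a minimal sketch under some assumptions that are not in the answer: states are hashable (for example a 9-character string), the table is a defaultdict so unseen pairs start at 0.0, the learning rate and discount factor values are arbitrary, and max Q(s2, _) is taken as 0.0 when s2 has no legal actions (the game is over). The name update_q is illustrative.

```python
from collections import defaultdict

LEARNING_RATE = 0.1    # assumed value, tune for your setup
DISCOUNT_FACTOR = 0.9  # assumed value, tune for your setup


def update_q(q, s1, a1, r, s2, s2_actions,
             learning_rate=LEARNING_RATE, discount_factor=DISCOUNT_FACTOR):
    """Apply one tabular Q-learning update to q[(s1, a1)] and return the new value."""
    # Largest Q value over the actions available in s2.
    # Assumption: a terminal s2 (no legal actions) contributes 0.0.
    best_next = max((q[(s2, a)] for a in s2_actions), default=0.0)
    q[(s1, a1)] += learning_rate * (r + discount_factor * best_next - q[(s1, a1)])
    return q[(s1, a1)]


# Q table: one entry per (state, action) pair, defaulting to 0.0.
q_table = defaultdict(float)

# X plays cell 2 and completes the top row; reward 1.0, game over.
new_value = update_q(q_table, "XX OO    ", 2, 1.0, "XXXOO    ", [])
print(new_value)  # 0.1
```

In tic-tac-toe one common choice is to give a nonzero reward only when the game ends and to treat s2 as the board after the opponent has replied, so the values learned at the final moves gradually propagate back to earlier moves over many games.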
