
How is Q-learning different from value iteration in reinforcement learning?

I know Q-learning is model-free and training samples are transitions (s, a, s', r). But since we know the transitions and the reward for every transition in Q-learning, is it not the same as model-based learning where we know the reward for a state and action pair, and the transitions for every action from a state (be it stochastic or deterministic)? I do not understand the difference.

1 Answer


Q-learning is a model-free reinforcement learning algorithm. Its goal is to learn a policy that tells the agent which action to take in each state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring any adaptations.

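Value iteration is used when you have the transition probabilities, that is, when you know the probability of getting from state x to state x' with action a, along with the reward for each transition. With the full model in hand, it computes the values by sweeping over all states and actions, without having to interact with the environment.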
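To make the "known model" case concrete, here is a minimal sketch of value iteration in Python with NumPy. The two-state MDP, the array names P and R, and the discount factor of 0.9 are all assumptions made up for illustration; they do not come from the question.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Value iteration on a fully known model.

    P[s, a, s2] is the probability of reaching s2 from s under action a.
    R[s, a] is the expected immediate reward for taking a in s.
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman optimality backup, using the known probabilities directly.
        Q = R + gamma * (P @ V)          # shape: (n_states, n_actions)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)

# Hypothetical toy MDP: 2 states, action 0 = "stay", action 1 = "move".
# "move" reaches the other state with probability 0.8, otherwise stays put.
P = np.array([
    [[1.0, 0.0], [0.2, 0.8]],   # transitions from state 0
    [[0.0, 1.0], [0.8, 0.2]],   # transitions from state 1
])
# Reward 1 only for staying in state 1.
R = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
])

V, policy = value_iteration(P, R)
print("values:", V)
print("greedy policy:", policy)
```

Note that the loop never samples a transition: every backup reads the probabilities in P directly, which is only possible because the model is given.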

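In contrast, you might only have a black box that lets you simulate or interact with the environment: you give it a state and an action, and it returns a next state and a reward, but you are never given the probabilities themselves. That is the model-free setting, and it is where you apply Q-learning. This is also where your reasoning goes wrong: a sample (s, a, s', r) is a single observed outcome, not the full distribution over s' for that (s, a) pair, so having samples is not the same as knowing the model. Q-learning estimates the action values directly from those samples, without ever building the transition probabilities.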
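For comparison, here is a rough sketch of tabular Q-learning in Python with NumPy, applying the standard update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) to one sampled transition at a time. The step function is a hypothetical black-box simulator invented for this example, and the learning rate, discount factor, step count, and purely random exploration are assumptions rather than recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable

def step(state, action):
    """Hypothetical black-box simulator: returns (next_state, reward).

    The learner can call this but cannot look inside it, so it never
    sees the 0.8 below.
    """
    if action == 0:                      # "stay"
        return state, (1.0 if state == 1 else 0.0)
    if rng.random() < 0.8:               # "move" succeeds most of the time
        return 1 - state, 0.0
    return state, 0.0

def q_learning(step, n_states, n_actions, n_steps=20_000, alpha=0.1, gamma=0.9):
    """Tabular Q-learning from sampled transitions only."""
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        # Explore uniformly at random; Q-learning is off-policy, so it can
        # still learn the greedy action values from this behaviour.
        a = int(rng.integers(n_actions))
        s2, r = step(s, a)
        # Update from the single observed sample (s, a, s2, r).
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q

Q = q_learning(step, n_states=2, n_actions=2)
policy = Q.argmax(axis=1)
print("learned Q:", Q)
print("greedy policy:", policy)
```

The only access to the environment is through calls to step, so the learner works from individual (s, a, s', r) samples. With enough samples, the learned values should approach what value iteration would compute from the full model, but only approximately, since a constant learning rate leaves some noise in the estimates.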
