
In reinforcement learning, what is the difference between policy iteration and value iteration?

As far as I understand, in value iteration you use the Bellman equation to solve for the optimal policy, whereas in policy iteration you randomly select a policy π and find the value of that policy.

My doubt is: if you are selecting a random policy π in policy iteration, how is it guaranteed to converge to the optimal policy, even if we choose several random policies?


Policy iteration algorithms: These algorithms manipulate the policy directly, rather than finding it indirectly via the optimal value function. Starting from an arbitrary (even random) policy, policy iteration first computes the value function of that policy (policy evaluation), then derives a new policy that is greedy with respect to that value function (policy improvement), and repeats. Each new policy is guaranteed to be at least as good as the previous one, and strictly better unless the previous one was already optimal, so the random starting point does not matter: the sequence of policies converges to the optimal policy.
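To make the evaluate-then-improve loop concrete, here is a minimal sketch of policy iteration in Python. The tiny two-state MDP (the transition table `P`, discount `gamma`) is made up purely for illustration:

```python
import numpy as np

# Toy 2-state, 2-action MDP (invented for illustration).
# P[s][a] = list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9
states, actions = [0, 1], [0, 1]

def q_value(s, a, V):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def policy_evaluation(policy, tol=1e-8):
    """Iteratively compute V^pi for a fixed deterministic policy."""
    V = np.zeros(len(states))
    while True:
        delta = 0.0
        for s in states:
            v = q_value(s, policy[s], V)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration():
    policy = {s: 0 for s in states}   # arbitrary starting policy
    while True:
        V = policy_evaluation(policy)             # evaluation step
        stable = True
        for s in states:                          # greedy improvement step
            best = max(actions, key=lambda a: q_value(s, a, V))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:                                # no change => optimal
            return policy, V

policy, V = policy_iteration()
print(policy)
```

Even though the initial policy (always take action 0) is poor, the improvement step replaces it with the greedy policy after a single evaluation, and the loop terminates once no state's action changes.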

Value iteration algorithm: You start with an arbitrary (random) value function and repeatedly apply the Bellman optimality backup to obtain a new, improved value function, iterating until it converges to the optimal value function. The optimal policy is then extracted greedily from that optimal value function.

You can say that both algorithms share the same working principle: they alternate (or interleave) evaluating values and improving behavior. Both methods are special cases of generalized policy iteration.