in AI and Deep Learning by (50.2k points)

I am having trouble understanding the SARSA algorithm: http://en.wikipedia.org/wiki/SARSA

In particular, when updating the Q value, what is gamma, and what values are used for s(t+1) and a(t+1)?

Can someone explain this algorithm to me?

Thanks.

1 Answer

by (108k points)

Here is how it works:

In your first step, you just get a state. Store it as s_t. Then consult your value function to choose the action to take in this state (usually the best-valued one, with occasional exploration, e.g. epsilon-greedy) and store it as a_t.

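In each later step, you receive a reward r_{t+1} and a new state s_{t+1}. Again use your value function to choose an action, a_{t+1}. These are the values the update uses for s(t+1) and a(t+1): the state you actually arrived in and the action you will actually take there. The temporal-difference error for the transition is r_{t+1} + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t), where gamma is the discount factor, a number between 0 and 1 that sets how much future value counts relative to immediate reward. Use this error to update your long-term estimate of the previous action's value, Q(s_t, a_t). Lastly, store s_{t+1} and a_{t+1} as s_t and a_t for the next step.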
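The single update step described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `sarsa_update`, the dictionary-based Q table, the state and action labels, and the values of `alpha` (step size) and `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one SARSA update to Q[(s, a)] in place and return the TD error."""
    # TD error: reward plus discounted value of the next pair, minus the old estimate.
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    # Move the old estimate a fraction alpha of the way toward the target.
    Q[(s, a)] += alpha * td_error
    return td_error

# Hypothetical two-state example: unseen pairs default to a value of 0.0.
Q = defaultdict(float)
Q[("B", "right")] = 2.0
delta = sarsa_update(Q, "A", "right", 1.0, "B", "right", alpha=0.5, gamma=0.9)
# delta is about 1.0 + 0.9 * 2.0 - 0.0 = 2.8, so Q[("A", "right")] becomes about 1.4
```

Note that the update uses the action `a_next` that will actually be taken in `s_next`, which is what makes SARSA an on-policy method.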

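In short, the value function is a running average of these update targets for each state-action pair: every update moves Q(s_t, a_t) a small step (set by a learning rate, usually called alpha) toward r_{t+1} + gamma * Q(s_{t+1}, a_{t+1}).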
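To see why this behaves like a running average, here is a small sketch that repeatedly nudges an estimate toward a stream of targets. The helper name `running_average` and the numbers are made up for illustration; in SARSA the targets would change over time rather than stay fixed.

```python
def running_average(q, targets, alpha):
    """Nudge estimate q toward each target in turn with step size alpha."""
    for target in targets:
        # Equivalent to q = (1 - alpha) * q + alpha * target
        q += alpha * (target - q)
    return q

# Fifty updates toward a constant target of 10.0 bring the estimate close to 10.0.
q = running_average(0.0, [10.0] * 50, alpha=0.1)

# With alpha = 1.0 the estimate simply jumps to the most recent target.
latest = running_average(0.0, [3.0, 7.0], alpha=1.0)
```

A smaller alpha averages over more past targets and changes slowly; a larger alpha weights recent targets more heavily.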

At every step after the first, you get a state and a reward. The value of the previous action, the value of the current action, and the current reward give SARSA the information it needs to revise (raise or lower) its estimate of the long-term value of the previous action.
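Putting the steps together, a complete learning loop might look like the sketch below. Everything about the environment is an assumption made for the example: a hypothetical five-state corridor where reaching the right end pays a reward of 1.0. The epsilon-greedy action choice and the parameter values are likewise illustrative, not prescribed by the algorithm.

```python
import random
from collections import defaultdict

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy task)
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Toy environment: reward 1.0 on reaching the last state, else 0.0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy: mostly the best-valued action, sometimes a random one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])

def run_sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        a = choose_action(Q, s, epsilon, rng)       # first step: store s_t and a_t
        done = False
        while not done:
            s_next, r, done = step(s, a)            # observe r_{t+1} and s_{t+1}
            a_next = choose_action(Q, s_next, epsilon, rng)  # choose a_{t+1}
            # A terminal state has no future value, so the target is just the reward.
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next                   # roll forward for the next step
    return Q

Q = run_sarsa()
```

After training, `Q[(3, 1)]` (stepping right from the state next to the goal) should be close to 1.0, and values for moving right should shrink by roughly a factor of gamma for each extra step away from the goal.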
