in AI and Deep Learning by (50.2k points)

In any of the standard reinforcement learning algorithms that use generalized temporal-difference learning (e.g. SARSA, Q-learning), the question arises of what values to use for the lambda and gamma hyperparameters for a specific task.

I understand that lambda is tied to the length of the eligibility traces and gamma can be interpreted as how much to discount future rewards, but how do I know when my lambda value is too low for a given task, or my gamma too high?

I realize these questions don't have well-defined answers, but knowing some 'red flags' for having inappropriate values would be very useful.

Take the standard cart-pole or inverted pendulum task for example. Should I set gamma to be high, since it requires many steps to fail the task, or low because the state information is completely Markovian? And I can't even fathom the rationale for lambda values…

1 Answer

by (107k points)

Gamma (γ) is the discount rate. It takes a value between 0 and 1; the higher the value, the less future rewards are discounted. Gamma is usually regarded as part of the problem, not of the algorithm. For each state, a reinforcement learning algorithm tries to maximize the cumulative discounted reward:

r1 + gamma*r2 + gamma^2*r3 + gamma^3*r4 + ...

where rn is the reward received at time step n, counted from the current state. Different choices of gamma therefore define different objectives: with one value the algorithm optimizes one thing, and with another it optimizes something else.
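To make the effect of gamma concrete, here is a minimal sketch that evaluates the sum above for a few discount rates. The function name discounted_return and the reward sequence (a reward of 1 on every step for 200 steps, loosely in the spirit of cart-pole) are assumptions for illustration, not something taken from a particular library or benchmark.

```python
def discounted_return(rewards, gamma):
    """Return rewards[0] + gamma*rewards[1] + gamma**2*rewards[2] + ...

    Accumulating from the last reward backward avoids computing
    explicit powers of gamma.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Assumed toy episode: reward of 1 on each of 200 steps.
    rewards = [1.0] * 200
    for gamma in (0.5, 0.9, 0.99):
        # For a constant reward of 1, an infinite sum would equal 1 / (1 - gamma),
        # which is a rough measure of how many steps ahead the agent "sees".
        print(gamma, discounted_return(rewards, gamma), 1.0 / (1.0 - gamma))
```

With a small gamma the return saturates after a handful of steps, so a failure that lies many steps ahead barely changes the value of the current state. That is one possible red flag for a gamma that is too low on a task with long-delayed consequences.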

Lambda (λ) is a credit assignment variable. It also takes a value between 0 and 1. The higher the value, the more credit is assigned to states and actions further back in time. Lambda is part of the algorithm, not of the problem. It decides how much you bootstrap on previously learned value estimates versus relying on the current Monte Carlo roll-out. This is a trade-off between more bias (low lambda) and more variance (high lambda). In many cases, setting lambda to zero already gives a perfectly good algorithm, but a higher lambda often speeds up learning.
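The credit assignment role of lambda can be illustrated with a small sketch of tabular TD(lambda) value prediction using accumulating eligibility traces. The function name td_lambda_episode, the transition tuple layout, and the three-state chain are hypothetical choices made for this example; it is one common formulation, not the only one.

```python
def td_lambda_episode(values, transitions, alpha, gamma, lam):
    """Run one episode of tabular TD(lambda) with accumulating traces.

    values:      dict mapping state -> current value estimate (updated in place)
    transitions: list of (state, reward, next_state, done) tuples
    """
    traces = {state: 0.0 for state in values}
    for state, reward, next_state, done in transitions:
        # One-step TD error; a terminal next state contributes no value.
        bootstrap = 0.0 if done else gamma * values[next_state]
        delta = reward + bootstrap - values[state]
        traces[state] += 1.0
        for s in values:
            # Every state is updated in proportion to its trace,
            # then its trace decays by gamma * lambda.
            values[s] += alpha * delta * traces[s]
            traces[s] *= gamma * lam
    return values


if __name__ == "__main__":
    # Assumed toy chain: 0 -> 1 -> 2 -> terminal, reward 1 only at the end.
    episode = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, None, True)]
    for lam in (0.0, 0.5, 1.0):
        start = {0: 0.0, 1: 0.0, 2: 0.0}
        print(lam, td_lambda_episode(start, episode, alpha=0.5, gamma=1.0, lam=lam))
```

In this toy chain, lambda = 0 updates only the state right before the reward after one episode, while higher values push part of the same TD error back to earlier states. If learning is slow because rewards are sparse and delayed, a lambda that is too low is a plausible suspect; if estimates are noisy and unstable, it may be too high.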
