
In any of the standard reinforcement learning algorithms that use generalized temporal differencing (e.g. SARSA, Q-learning), the question arises of which values to use for the lambda and gamma hyper-parameters on a specific task.

I understand that lambda is tied to the length of the eligibility traces and gamma can be interpreted as how much to discount future rewards, but how do I know when my lambda value is too low for a given task, or my gamma too high?

I realize these questions don't have well-defined answers, but knowing some 'red flags' for having inappropriate values would be very useful.

Take the standard cart-pole or inverted pendulum task for example. Should I set gamma to be high, since it requires many steps to fail the task, or low because the state information is completely Markovian? And I can't even fathom the rationale for lambda values…


Gamma (γ) is the discount rate. It varies between 0 and 1. The higher the value, the less you are discounting future rewards. Gamma is seen as part of the problem, not of the algorithm. A reinforcement learning algorithm tries, for each state, to optimize the cumulative discounted reward:

R1 + gamma*R2 + gamma^2*R3 + gamma^3*R4 + ...

where Rn is the reward received n time steps after the current state. So, for one choice of gamma, the algorithm may optimize one thing, and for another choice, it will optimize something else.
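To make that concrete, here is a minimal Python sketch. The two reward sequences and the helper names (`discounted_return`, `effective_horizon`) are made up for illustration and are not from any library. The `1 / (1 - gamma)` horizon is only a common rule of thumb for roughly how many steps ahead still matter; it is an assumption added here, not something stated above.

```python
def discounted_return(rewards, gamma):
    """Return R1 + gamma*R2 + gamma^2*R3 + ... for a list [R1, R2, R3, ...]."""
    total = 0.0
    # Work backwards so each step is: reward + gamma * (return of the rest).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb number of steps that still matter (requires gamma < 1)."""
    return 1.0 / (1.0 - gamma)


# Hypothetical choices: a small reward now versus a larger reward three steps later.
small_now = [1.0, 0.0, 0.0, 0.0]
large_later = [0.0, 0.0, 0.0, 2.0]

for gamma in (0.5, 0.9):
    now = discounted_return(small_now, gamma)
    later = discounted_return(large_later, gamma)
    preferred = "small_now" if now > later else "large_later"
    print(f"gamma={gamma}: small_now={now:.3f}, large_later={later:.3f}, prefers {preferred}")
```

With the lower gamma the immediate reward scores higher, and with the higher gamma the delayed reward does, so changing gamma changes which behaviour counts as optimal. For a task like cart-pole, where failure can be many steps away, this suggests choosing gamma so that the rough horizon covers the delay you care about.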

Lambda (λ) is a credit assignment parameter. It also varies between 0 and 1. The higher the value, the more credit you can assign to states and actions further back in time. Lambda is part of the algorithm, not of the problem. It decides how much you bootstrap on previously learned value estimates versus using the current Monte Carlo roll-out. This is a trade-off between more bias (low lambda) and more variance (high lambda). In many cases, setting lambda to zero already gives a fine algorithm, but a higher lambda often speeds up learning.
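The bootstrap-versus-roll-out trade-off can be seen in the lambda-return, the "forward view" target that eligibility traces approximate. The sketch below is illustrative only: the function name `lambda_returns`, the episode rewards and the value estimates are all hypothetical, and it assumes an episodic task where the terminal state has value 0.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """Compute the lambda-return target for every step of one finished episode.

    rewards[t]     : reward received after step t
    next_values[t] : current value estimate of the state reached after step t
                     (0.0 for the terminal state)
    """
    targets = [0.0] * len(rewards)
    next_target = 0.0  # nothing follows the terminal state
    for t in reversed(range(len(rewards))):
        # Mix the learned estimate (bootstrap) with the rest of the roll-out.
        mixed = (1.0 - lam) * next_values[t] + lam * next_target
        targets[t] = rewards[t] + gamma * mixed
        next_target = targets[t]
    return targets


# Hypothetical 3-step episode with a reward only at the end.
rewards = [0.0, 0.0, 1.0]
next_values = [0.5, 0.2, 0.0]  # made-up estimates; the last entry is the terminal state

for lam in (0.0, 0.5, 1.0):
    print(lam, lambda_returns(rewards, next_values, gamma=0.9, lam=lam))
```

With `lam=0.0` every target is the one-step TD target `reward + gamma * V(next state)`, so it leans entirely on the (possibly wrong) estimates. With `lam=1.0` the targets are the plain Monte Carlo discounted returns, which ignore the estimates but depend on everything that happened later in the episode. Values in between blend the two.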
