Temporal-Difference learning (TD) with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation.
To better understand this problem, we investigate how approximation errors in areas of sharp discontinuities of the value function are further propagated by bootstrap updates.
We present empirical evidence of this leakage propagation and show analytically that it must occur in a simple Markov chain whenever function approximation errors are present (a minimal sketch of the effect is given below).
For reversible policies, the result can be interpreted as a tension between two terms of the loss function that TD minimizes, as recently described. We show that, while the resulting upper bounds hold, they do not determine whether, or under what conditions, leakage propagation occurs.
Finally, we examine whether the problem can be mitigated by a better state representation, and whether such a representation can be learned in an unsupervised manner, without rewards or privileged information.
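As an illustration of the first claim, here is a minimal, hypothetical Python sketch (this is not the chain, the policy, or the function approximator used in the paper): TD(0) and Monte-Carlo regression evaluate the same deterministic 12-state chain under a shared coarse state aggregation. The feature group that straddles the step in the true value function cannot be fit exactly, and TD's bootstrapped targets resolve that error differently from Monte-Carlo's regression on full returns.

```python
import numpy as np

# Hypothetical toy chain: the agent walks deterministically from state 11 down to
# the terminal state 0 and receives a single reward of +1 on the transition 7 -> 6.
# The true value function is therefore a step: V(s) = 1 for s >= 7, V(s) = 0 for s <= 6.
# Both estimators use the same coarse state aggregation (states 2k and 2k+1 share one
# weight), so the group {6, 7} straddles the discontinuity and cannot be fit exactly.

N, GAMMA, ALPHA, EPISODES = 12, 1.0, 0.05, 3000

def group(s):
    return s // 2                       # state aggregation: 6 groups of 2 states

def rollout():
    """One episode as (state, reward, next_state) triples, from state 11 down to 0."""
    transitions, s = [], N - 1
    while s > 0:
        r = 1.0 if s == 7 else 0.0      # +1 only when leaving the "high-value" region
        transitions.append((s, r, s - 1))
        s -= 1
    return transitions

w_td = np.zeros(N // 2)                 # weights for TD(0)
w_mc = np.zeros(N // 2)                 # weights for Monte-Carlo regression

for _ in range(EPISODES):
    traj = rollout()

    # TD(0): each state's target bootstraps on the current estimate of its successor.
    for s, r, s_next in traj:
        v_next = 0.0 if s_next == 0 else w_td[group(s_next)]
        w_td[group(s)] += ALPHA * (r + GAMMA * v_next - w_td[group(s)])

    # Monte-Carlo: regress each visited state on its observed return (no bootstrapping).
    G = 0.0
    for s, r, _ in reversed(traj):
        G = r + GAMMA * G
        w_mc[group(s)] += ALPHA * (G - w_mc[group(s)])

true_v = np.array([0.0] * 7 + [1.0] * 5)        # step-shaped ground truth for states 0..11
for s in range(1, N):
    print(f"state {s:2d}  true {true_v[s]:.1f}  TD {w_td[group(s)]:.2f}  MC {w_mc[group(s)]:.2f}")
```

In this toy setup the shared weight of the straddling group settles near 1 under TD (because state 7's target keeps bootstrapping on that same weight), giving an error of roughly 1 at state 6, whereas Monte-Carlo splits the difference at about 0.5 for both states. This conveys the flavour of the effect described above, but the paper's actual experiments use neural network function approximation rather than state aggregation.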
For more information, refer to the following link: https://www.researchgate.net/publication/326290611_Temporal_Difference_Learning_with_Neural_Networks_-_Study_of_the_Leakage_Propagation_Problem