**Temporal Difference** (TD) learning is an approach to learning how to predict a quantity that depends on the future values of a given signal. It can be used to learn both the V-function and the Q-function, whereas **Q-learning** is a specific TD algorithm used to learn the Q-function. As stated by Don Reba, you need the Q-function to select an action (e.g., following an epsilon-greedy policy). If you have only the V-function, you can still derive the Q-function, but only if you also know the state-transition function and the rewards: for each action, you iterate over all the possible next states and compute the expected reward plus the discounted V-value of the next state, and then you choose the action with the highest resulting value. For examples and more insights, I recommend the classic book by Sutton and Barto.
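
To make the distinction concrete, here is a minimal tabular sketch of the two update rules and of the one-step lookahead that turns a V-function into a Q-function. The function names, the array layout (`V[s]`, `Q[s, a]`, `P[s, a, s']`, `R[s, a, s']`) and the default `alpha` and `gamma` are my own assumptions for illustration, not taken from any particular library.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """TD(0): move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q-learning: same TD idea, but the target uses the greedy value max over Q[s_next]."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def q_from_v(V, P, R, gamma=0.9):
    """One-step lookahead; needs the model P[s, a, s'] and the rewards R[s, a, s'].

    Q[s, a] = sum over s' of P[s, a, s'] * (R[s, a, s'] + gamma * V[s'])
    """
    return np.einsum("ijk,ijk->ij", P, R + gamma * V[None, None, :])
```

Note that only `q_from_v` needs `P` and `R`; the two update functions work from a single sampled transition, which is why they fit the model-free setting.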

In **model-free** RL you don't learn the state-transition function (*the model*) and rely only on samples. However, you might also be interested in learning it, for example because you cannot collect many samples and want to generate some virtual ones. In this case, we talk about **model-based** RL. Model-based RL is quite common in robotics, where you cannot run many real-world trials or the robot will break. This is a good survey with many examples (but it only talks about policy search algorithms). For another example have a look at this paper. Here the authors learn, along with a policy, a Gaussian process that approximates the forward model of the robot, in order to simulate trajectories and reduce the number of real robot interactions.
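
As a rough illustration of "learning the model to generate virtual samples", here is a sketch for a small discrete environment. It uses simple transition counts rather than the Gaussian process mentioned above, and the class `TabularModel` with its methods is a hypothetical name I made up for this example.

```python
import numpy as np

class TabularModel:
    """Hypothetical count-based model of a small discrete environment."""

    def __init__(self, n_states, n_actions):
        self.counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sum = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        """Record one real transition (s, a, r, s_next)."""
        self.counts[s, a, s_next] += 1
        self.reward_sum[s, a] += r

    def sample(self, s, a, rng):
        """Draw a virtual (reward, next_state) pair from the empirical estimates."""
        n = self.counts[s, a].sum()
        if n == 0:
            raise ValueError("no real samples for this state-action pair")
        # Next state is drawn from the observed transition frequencies.
        s_next = rng.choice(len(self.counts[s, a]), p=self.counts[s, a] / n)
        # Reward is the average reward observed for (s, a).
        return self.reward_sum[s, a] / n, int(s_next)
```

The virtual transitions returned by `sample` could then be fed to any model-free update (Q-learning, for instance) in place of real interactions, which is roughly the idea behind Dyna-style methods in Sutton and Barto's book.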
