In reinforcement learning, an agent tries to come up with the best action given a state.
For example, in the video game Pac-Man, the state would be the 2D game world you are in and the surrounding items (pac-dots, enemies, walls, etc.), and the actions would be moving through that 2D space (going up/down/left/right).
So, given the state of the game world, the agent needs to pick the best action to maximize rewards. Through reinforcement learning's trial and error, it accumulates "knowledge" about these (state, action) pairs: that is, it learns whether a given (state, action) pair leads to a positive or negative reward. Let's call this value Q(state, action).
A rudimentary way to store this information would be a table like the one below:
| state | action | Q(state, action) |
| ----- | ------ | ---------------- |
| ...   | ...    | ...              |
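To make the table concrete, here is a minimal tabular Q-learning sketch. The environment (a toy 1-D corridor) and all names are illustrative, not from the article; the table itself is just a dictionary keyed by (state, action).

```python
import random
from collections import defaultdict

# Toy environment: positions 0..4 along a corridor, reward +1 at position 4.
ACTIONS = ["left", "right"]
GOAL = 4

def step(state, action):
    """Move one cell left or right; reward 1.0 on reaching the goal."""
    next_state = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = defaultdict(float)  # the table: (state, action) -> Q value, default 0

alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration
random.seed(0)

for episode in range(200):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the table, occasionally explore
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update for this (state, action) entry
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, moving toward the goal should score higher in every state
print(all(Q[(s, "right")] > Q[(s, "left")] for s in range(GOAL)))
```

Each entry of `Q` corresponds to one row of the table above; the update rule nudges the stored value toward the observed reward plus the discounted best value of the next state.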
The (state, action) space can be very big
However, when the game gets complicated, the knowledge space can become huge and it is no longer feasible to store all (state, action) pairs. Considered in raw terms, even a slightly different state is still a distinct state (e.g. a different position of the enemy coming through the same corridor). Instead of storing and looking up every single distinct state, you could use something that generalizes the knowledge.
So, what you can do is create a neural network that, e.g., predicts the reward for an input (state, action) pair (or picks the best action given a state, however you like to look at it).
Approximating the Q value with a Neural Network
So, what you effectively have is a NN that predicts the Q value, based on the input (state, action). This is way more tractable than storing every possible value as we did in the table above.
Q = neural_network.predict(state, action)
Deep Reinforcement Learning
Deep Neural Networks
To do that for complicated games, the NN may need to be "deep", meaning a few hidden layers may not suffice to capture all the intricate details of that knowledge, hence the use of deep NNs (many hidden layers).
The extra hidden layers allow the network to internally come up with features that help it learn and generalize complex problems that would have been intractable for a shallow network.
Summary: Deep RL uses a Deep Neural Network to approximate Q(s, a). Non-Deep RL defines Q(s, a) using a tabular function.