Deep learning involves a huge number of matrix multiplications and other computations. GPUs can run many of these operations in parallel, which can greatly speed up model training.
A single GPU can have thousands of cores, whereas a typical CPU has far fewer (usually somewhere between a handful and a few dozen). Even though each GPU core is slower than a CPU core, GPUs more than make up for it through their sheer number of cores and their higher-bandwidth memory, which together suit massively parallel operations. Sequential code, however, still runs faster on CPUs.
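Why does matrix multiplication parallelize so well? Each entry of the result is an independent dot product, so the work can be split into many small units that do not wait on each other. The following is a minimal, illustrative sketch in plain Python (the function names are hypothetical). It uses a thread pool only to show that rows can be handed out independently. Because of Python's global interpreter lock, it will not actually run faster than the sequential version. A GPU exploits the same independence in hardware, running thousands of such units at once.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_row(a_row, b):
    """Compute one row of A @ B; each entry is an independent dot product."""
    n_inner = len(b)
    n_cols = len(b[0])
    return [sum(a_row[k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]


def matmul_sequential(a, b):
    """Compute the rows one after another, as a single core would."""
    return [matmul_row(row, b) for row in a]


def matmul_parallel(a, b, workers=4):
    """Hand each row to a separate worker; no row depends on any other row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows
        return list(pool.map(lambda row: matmul_row(row, b), a))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_sequential(a, b))  # [[19, 22], [43, 50]]
print(matmul_parallel(a, b))    # [[19, 22], [43, 50]]
```

Both versions produce the same result; only the scheduling of the independent row computations differs. In practice, deep learning frameworks delegate this work to GPU kernels rather than Python threads.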
You can watch this video to learn more: