The explanation
These days, the best-performing deep learning models for image classification tasks (e.g. ImageNet) are typically "deep convolutional neural networks" (deep ConvNets). They look roughly like this ConvNet configuration by Krizhevsky et al.:
For inference (classification), you feed an image into the left side (notice that the depth on the left side is 3, for RGB), crunch through a series of convolution filters, and the network spits out a 1000-dimensional vector on the right-hand side. This picture is specific to ImageNet, which focuses on classifying 1000 categories of images, so each entry of the 1000-d vector is "a score of how likely it is that this image fits in that category."
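The inference pass above can be sketched in a few lines of numpy. This is a toy illustration, not the real architecture: the image size (8×8), the number of filters (4), and the number of categories (10 instead of 1000) are all made up to keep it small, and real ConvNets stack many such layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, filters):
    """Valid 2-D convolution of an (H, W, C) image with (K, k, k, C) filters."""
    K, k, _, C = filters.shape
    H, W, _ = image.shape
    out = np.zeros((H - k + 1, W - k + 1, K))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            patch = image[i:i + k, j:j + k, :]  # (k, k, C) window
            # Dot each of the K filters against the window -> K responses
            out[i, j] = np.tensordot(patch, filters, axes=([0, 1, 2], [1, 2, 3]))
    return out

def classify(image, filters, weights):
    """Minimal ConvNet forward pass: conv -> ReLU -> pool -> linear -> softmax."""
    feat = np.maximum(conv2d(image, filters), 0)  # ReLU nonlinearity
    pooled = feat.mean(axis=(0, 1))               # global average pooling
    scores = pooled @ weights                     # linear classifier
    exp = np.exp(scores - scores.max())           # stable softmax
    return exp / exp.sum()                        # per-category "scores"

image = rng.standard_normal((8, 8, 3))     # tiny stand-in for an RGB image
filters = rng.standard_normal((4, 3, 3, 3))  # 4 hypothetical 3x3 conv filters
weights = rng.standard_normal((4, 10))       # 10 categories instead of 1000
probs = classify(image, filters, weights)    # one probability per category
```

The output is a vector of non-negative scores summing to 1; for ImageNet, the same idea runs with a 1000-wide final layer.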
Training the neural net is only slightly more complicated. For training, you basically run classification repeatedly, and every so often you do backpropagation (see Andrew Ng's lectures) to improve the convolution filters in the network. Essentially, backpropagation asks "what did the network classify correctly/incorrectly? For misclassified examples, let's fix the network a little bit."
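That "classify, then nudge the weights" loop can be sketched with a toy linear classifier trained by gradient descent. The data, learning rate, and step count here are all invented for illustration; a real ConvNet backpropagates through every layer, not just one weight matrix, but the shape of the loop is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 100 samples, 5 features, 2 classes (labels from a random linear rule)
X = rng.standard_normal((100, 5))
true_w = rng.standard_normal((5, 2))
y = (X @ true_w).argmax(axis=1)

W = np.zeros((5, 2))  # the "network" we will train

def forward(X, W):
    """Classification pass: scores, then softmax probabilities per class."""
    scores = X @ W
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

losses = []
for step in range(200):
    probs = forward(X, W)                          # run classification
    loss = -np.log(probs[np.arange(len(y)), y]).mean()
    losses.append(loss)
    grad_scores = probs.copy()
    grad_scores[np.arange(len(y)), y] -= 1         # d(loss)/d(scores)
    grad_W = X.T @ grad_scores / len(y)            # backpropagate to weights
    W -= 0.5 * grad_W                              # fix the network a little bit
```

After training, the loss is lower than where it started; that is all "improving the filters" means, repeated across millions of images and many layers.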
Implementation
Caffe is a very fast open-source implementation (faster than cuda-convnet from Krizhevsky et al.) of deep convolutional neural networks. The Caffe code is pretty easy to read; there is essentially one C++ file per type of network layer (e.g. convolutional layers, max-pooling layers, etc.).
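The one-file-per-layer design boils down to each layer exposing a forward pass and a backward (gradient) pass. Here is a hypothetical Python analogue of that interface, using a ReLU layer as the example; Caffe's real layers are C++ classes, and the names `forward`/`backward` here are just illustrative.

```python
import numpy as np

class ReLULayer:
    """Sketch of a single layer type, mirroring the one-class-per-layer design.

    forward() computes the layer's output from its input ("bottom" blob);
    backward() routes the gradient from the output back to the input.
    """

    def forward(self, bottom):
        self.mask = bottom > 0          # remember which inputs were positive
        return np.where(self.mask, bottom, 0.0)

    def backward(self, top_grad):
        return top_grad * self.mask     # gradient flows only where input > 0

layer = ReLULayer()
top = layer.forward(np.array([-1.0, 2.0]))    # negative inputs are zeroed
grad = layer.backward(np.array([5.0, 5.0]))   # gradient blocked where zeroed
```

A whole network is then just a list of such layers: forward runs them left to right, backpropagation runs their `backward` methods right to left.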
For more details, check out Deep Learning with TensorFlow.