I am new to the field of neural networks and I would like to know the difference between Deep Belief Networks and Convolutional Networks. Also, is there a Deep Convolutional Network which is the combination of Deep Belief and Convolutional Neural Nets?
This is what I have gathered till now. Please correct me if I am wrong.
For an image classification problem, Deep Belief networks have many layers, each of which is trained using a greedy layer-wise strategy. For example, if my image size is 50 x 50, and I want a Deep Network with 4 layers namely
Input Layer
Hidden Layer 1 (HL1)
Hidden Layer 2 (HL2)
Output Layer
My input layer will have 50 x 50 = 2500 neurons, HL1 = 1000 neurons (say) , HL2 = 100 neurons (say) and output layer = 10 neurons, in order to train the weights (W1) between Input Layer and HL1, I use an AutoEncoder (2500 - 1000 - 2500) and learn W1 of size 2500 x 1000 (This is unsupervised learning). Then I feed forward all images through the first hidden layers to obtain a set of features and then use another autoencoder ( 1000 - 100 - 1000) to get the next set of features and finally use a softmax layer (100 - 10) for classification. (only learning the weights of the last layer (HL2 - Output which is the softmax layer) is supervised learning).
(I could use RBM instead of autoencoder).
If the same problem was solved using Convolutional Neural Networks, then for 50x50 input images, I would develop a network using only 7 x 7 patches (say). My layers would be
Input Layer (7 x 7 = 49 neurons)
HL1 (25 neurons for 25 different features) - (convolution layer)
Pooling Layer
Output Layer (Softmax)
And for learning the weights, I take 7 x 7 patches from images of size 50 x 50 and feed forward through a convolutional layer, so I will have 25 different feature maps each of size (50 - 7 + 1) x (50 - 7 + 1) = 44 x 44.
I then use a window of say 11x11 for pooling hand hence get 25 feature maps of size (4 x 4) for as the output of the pooling layer. I use these feature maps for classification.
While learning the weights, I don't use the layer-wise strategy as in Deep Belief Networks (Unsupervised Learning), but instead, use supervised learning and learn the weights of all the layers simultaneously. Is this correct or is there any other way to learn the weights?
Is what I have understood correct?
So if I want to use DBN's for image classification, I should resize all my images to a particular size (say 200x200) and have that many neurons in the input layer, whereas in case of CNN's, I train only on a smaller patch of the input (say 10 x 10 for an image of size 200x200) and convolve the learned weights over the entire image?
Do DBNs provide better results than CNN's or is it purely dependent on the dataset?
Thank You.