I have read a few books and articles about convolutional neural networks. I think I understand the concept, but I don't know how to put it together as in the image below:

(source: __what-when-how.com__)

From a 28x28 normalized pixel INPUT, we get 4 feature maps of size 24x24. But how are they obtained? By resizing the INPUT image? By performing image transformations, and if so, what kind? Or by cutting the input image into 4 pieces of size 24x24, one from each corner? I don't understand the process. To me, it seems the image is cut up or resized into smaller images at each step. Please help, thanks.
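
To make the sizes concrete, here is a minimal sketch of one reading that would produce these numbers. It assumes each feature map comes from sliding a 5x5 filter over the input with stride 1 and no padding, since 28 - 5 + 1 = 24. The 5x5 size is inferred from the arithmetic, not stated in the figure, and the random filter values and the helper name `conv2d_valid` are purely illustrative (in a real network the filter values would be learned).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1  # output shrinks by (kernel size - 1)
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # Each output pixel is a weighted sum of one kh x kw input patch.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))              # stand-in for the normalized 28x28 INPUT
kernels = rng.standard_normal((4, 5, 5))  # 4 hypothetical 5x5 filters (assumed size)

# One feature map per filter: nothing is cropped or resized; every
# filter scans the whole image and produces its own 24x24 map.
feature_maps = np.stack([conv2d_valid(image, k) for k in kernels])
print(feature_maps.shape)  # (4, 24, 24)
```

Under this assumption, the 4 maps differ only because the 4 filters differ, not because they cover different regions of the input.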