I am a newbie to convolutional neural networks and only have a basic idea of feature maps and of how convolution is applied to images to extract features. I would be glad to learn some details about applying batch normalization in a CNN.

I read this paper https://arxiv.org/pdf/1502.03167v3.pdf and could understand the BN algorithm when it is applied to ordinary data, but at the end the authors mention that a slight modification is required when applying it to a CNN:

For convolutional layers, we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a mini-batch, over all locations. In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations – so for a mini-batch of size m and feature maps of size p × q, we use the effective mini-batch of size m′ = |B| = m · pq. We learn a pair of parameters γ(k) and β(k) per feature map, rather than per activation. Alg. 2 is modified similarly so that during inference the BN transform applies the same linear transformation to each activation in a given feature map.
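To make my confusion concrete, here is how I currently read that paragraph in NumPy. This is only my own sketch, so it may well be wrong — the array layout `(m, C, p, q)` and all variable names are my own choices, not from the paper:

```python
import numpy as np

# My example sizes: a mini-batch of m images, C feature maps, each p x q
m, C, p, q = 4, 3, 5, 5
x = np.random.randn(m, C, p, q)  # hypothetical activations of a conv layer

# My reading: for each feature map k, compute ONE mean and ONE variance
# over the batch axis AND both spatial axes (0, 2, 3), i.e. over the
# effective mini-batch of size m' = m * p * q
mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
var = x.var(axis=(0, 2, 3), keepdims=True)    # shape (1, C, 1, 1)
x_hat = (x - mean) / np.sqrt(var + 1e-5)

# One gamma and one beta PER FEATURE MAP, not per activation
gamma = np.ones((1, C, 1, 1))
beta = np.zeros((1, C, 1, 1))
y = gamma * x_hat + beta
```

If this is right, then "normalized in the same way" would mean that every position in feature map k is shifted and scaled by the same `mean[k]`, `var[k]`, `gamma[k]`, `beta[k]` — but please correct me if I have misread it.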

I am totally confused when they say "so that different elements of the same feature map, at different locations, are normalized in the same way".

I know what feature maps are, and I thought the different elements were the weights in every feature map. But I could not understand what "location" or "spatial location" means.

I also could not understand the following sentence at all: "In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations".
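My best guess of what that sentence means, written as code (again, this is my own guess with my own made-up sizes, so please correct me if it is wrong):

```python
import numpy as np

m, C, p, q = 4, 3, 5, 5          # my own example sizes
x = np.random.randn(m, C, p, q)  # mini-batch of conv-layer activations

k = 0                            # pick one feature map
# Is B simply all values of feature map k, collected across every image
# in the mini-batch and every spatial position?
B = x[:, k, :, :].ravel()
assert B.size == m * p * q       # the effective mini-batch size m' = |B|
```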

I would be glad if someone could elaborate and explain this to me in much simpler terms.