To answer your first question: in such a case, you have one 2D kernel per input channel (plane).
You perform each convolution (2D input, 2D kernel) separately, then sum the results to obtain the final output feature map.
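As a rough illustration of that per-channel convolve-then-sum step, here is a minimal NumPy sketch. The 5x5 image size, the 3x3 kernel size, the random values, and the helper name `conv2d_valid` are all assumptions made for the example, and it uses the cross-correlation form that most deep learning libraries call "convolution".

```python
import numpy as np

def conv2d_valid(plane, kernel):
    """2D 'valid' cross-correlation of one input plane with one 2D kernel."""
    kh, kw = kernel.shape
    out_h = plane.shape[0] - kh + 1
    out_w = plane.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position (i, j).
            out[i, j] = np.sum(plane[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 5, 5))    # assumed RGB input: 3 planes of 5x5
kernels = rng.standard_normal((3, 3, 3))  # one 3x3 kernel per input plane

# Convolve each plane with its own kernel, then sum the three results
# to get a single output feature map.
feature_map = sum(conv2d_valid(image[c], kernels[c]) for c in range(3))
print(feature_map.shape)
```

With these sizes the output feature map is 3x3, and each of its values mixes contributions from all three color planes.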
As for your second question: yes, the same weights are shared across each color channel.
For a given output feature map, you have three 2D kernels (i.e., one kernel per input channel). Each 2D kernel applies the same weights at every position of its input channel (R, G, or B here).
So the weights of the whole convolutional layer form a 4D tensor (number of input planes x number of output planes x kernel width x kernel height).
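To make the shape concrete, here is a small sketch of such a weight tensor in NumPy. The sizes (3 input planes, 4 output planes, 3x3 kernels) are assumed purely for illustration, biases are left out, and the axis order simply follows the one given above; real frameworks often order the axes differently.

```python
import numpy as np

# Assumed sizes: RGB input, 4 output feature maps, 3x3 kernels.
n_in, n_out, k_w, k_h = 3, 4, 3, 3
weights = np.zeros((n_in, n_out, k_w, k_h))

# The three 2D kernels that feed output map 0, one per input channel.
kernels_for_map0 = weights[:, 0]

# Total number of kernel weights in the layer (biases not counted).
n_params = weights.size
print(kernels_for_map0.shape, n_params)
```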
Why have they split the RGB component over several regions?
They are split so that each color component gets its own input plane and its own weights.