When training a CNN, people usually just resize every image to a square (for example, ResNet takes a 224x224 input). That looks ugly to me, especially when the aspect ratio is far from 1.
(In fact, the distortion might change the ground truth: e.g., the label an expert would give the distorted image could differ from the one they would give the original.)
So now I resize the image to, say, 224x160, keeping the original aspect ratio, and then pad it with zeros (i.e., paste it at a random location in an all-black 224x224 image).
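For concreteness, here is a minimal sketch of the padding approach described above. It assumes Pillow is available; the function name `pad_to_square`, the bilinear resampling, and the `rng` parameter are illustrative choices, not part of any particular library's preprocessing pipeline.

```python
import random
from PIL import Image


def pad_to_square(img, size=224, rng=None):
    """Resize `img` so its longer side equals `size` (aspect ratio kept),
    then paste it at a random offset onto a black size x size canvas."""
    rng = rng or random.Random()
    w, h = img.size
    scale = size / max(w, h)
    # Round to whole pixels; never let a side collapse to zero.
    new_w = max(1, round(w * scale))
    new_h = max(1, round(h * scale))
    resized = img.resize((new_w, new_h), Image.BILINEAR)

    # All-black canvas, i.e. zero padding.
    canvas = Image.new(img.mode, (size, size), 0)
    # Random top-left corner such that the resized image fits entirely.
    x = rng.randint(0, size - new_w)
    y = rng.randint(0, size - new_h)
    canvas.paste(resized, (x, y))
    return canvas


# Usage: a 448x320 image becomes 224x160 inside a 224x224 black square.
example = Image.new("RGB", (448, 320), (255, 255, 255))
padded = pad_to_square(example, size=224, rng=random.Random(0))
```

The "usual" approach would instead be a single `img.resize((224, 224))`, which stretches the image and ignores the aspect ratio.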
My approach doesn't seem original to me, and yet I cannot find any information whatsoever comparing it with the "usual" approach. Funky!
So, which approach is better, and why? (If the answer is data-dependent, please share your thoughts on when one is preferable to the other.)