Normalizing the input of your network is a well-established technique for improving its convergence properties. Yes, you can use batch normalization right after the input layer. The nice thing about batch normalization, beyond stabilizing the activation distributions, is that its learned scale and shift parameters let the effective mean and standard deviation adapt as the network learns.
Effectively, placing batch normalization right after the input layer is a fancy data pre-processing step. It helps, sometimes a lot (e.g. in linear regression). But it's easier and more efficient to compute the mean and variance of the whole training set once than to learn them per batch.
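A minimal sketch of the two approaches, assuming PyTorch and a toy dataset with 3 features (the tensor shapes and scaling values here are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Option 1: batch normalization right after the input layer.
# Statistics are estimated per batch and refined during training.
bn_model = nn.Sequential(
    nn.BatchNorm1d(3),   # normalizes each of the 3 input features
    nn.Linear(3, 1),
)

# Option 2: fixed pre-processing. Compute mean/std over the whole
# training set once, then apply the same transform to every batch.
X = torch.randn(1000, 3) * torch.tensor([1.0, 5.0, 0.1]) \
    + torch.tensor([0.0, 10.0, -2.0])
mu, sigma = X.mean(dim=0), X.std(dim=0)

def standardize(x):
    # Same statistics at train and test time, no learning needed.
    return (x - mu) / sigma

X_norm = standardize(X)
print(X_norm.mean(dim=0))  # close to 0 for each feature
print(X_norm.std(dim=0))   # close to 1 for each feature
```

With option 2 the statistics are exact and fixed; with option 1 they are noisy per-batch estimates, which is the overhead the answer above refers to.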