0 votes
in Machine Learning by (19k points)
I want to separate my data into training and test sets. Should I apply normalization to the data before or after the split? Does it make any difference when building a predictive model? Thanks in advance.

1 Answer

0 votes
by (33.1k points)
Split the data into training and test sets first, and normalize after the split. Yes, the order makes a difference.

The test set stands in for real-world data that the model has not seen. Feature normalization (standardization) of the explanatory (or predictor) variables centers and scales each feature by subtracting its mean and dividing by its standard deviation (not the variance). If you compute the mean and standard deviation over the whole dataset, information from the test set leaks into the training explanatory variables, and the test results may look more optimistic than they should.

So compute the normalization statistics on the training data only and normalize the training set with them. Then normalize the test instances as well, but using the mean and standard deviation of the training explanatory variables. That way you can test and evaluate whether the model generalizes well to new, unseen data points.
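
The procedure above can be sketched in a few lines. This is only a minimal illustration using the Python standard library; the helper names `fit_scaler` and `transform` and the sample values are made up for the example, and it assumes a single numeric feature that has already been split.

```python
import statistics

def fit_scaler(train_column):
    """Compute the mean and standard deviation from training values only."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)  # population standard deviation
    return mean, std

def transform(column, mean, std):
    """Standardize values using statistics that were fitted beforehand."""
    if std == 0:
        # A constant feature carries no scale; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

# Hypothetical single-feature data, already split into train and test.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 11.0]

# Fit on the training set only...
mean, std = fit_scaler(train)

# ...then apply the same statistics to both sets.
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
```

Note that the scaled test values need not have mean 0 or standard deviation 1, which is expected, since they are scaled with the training statistics. If you use scikit-learn, the same pattern is to call `StandardScaler().fit(X_train)` and then `transform` on both `X_train` and `X_test`.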

Hope this answer helps you!
