Is there a rule-of-thumb for how to divide a dataset into training and validation sets?

Question

1 Answer

Shrutiparna · Answer 1 · 2019-05-27T10:09:27+0000

@kavita, there are two competing concerns when dividing the data:

1. With less training data, your parameter estimates have greater variance.

2. With less testing data, your performance statistic will have greater variance.

The data should be divided so that neither variance is too high. An 80/20 split is a common starting point, often justified by the Pareto principle, but it is a convention rather than a strict rule.
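As a minimal sketch of an 80/20 split, assuming the dataset fits in memory as a plain list (the helper name `split_dataset` and the fixed seed are illustrative choices, not part of any library):

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and split it into (train, held_out)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)  # index where the held-out part starts
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))                        # stand-in for real examples
train, test = split_dataset(data, train_fraction=0.8)
print(len(train), len(test))
```

Shuffling before cutting matters when the rows are stored in some meaningful order (by date or by class, for example), since a plain slice would then give unrepresentative sets.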

Assuming you have enough data for a proper split, the following steps are an instructive way to get a handle on the variances:

1. Split the data into training and testing sets.
2. Then split the training data into training and validation sets.
3. Subsample random selections of the training data, train the classifier on each, and record its performance on the validation set.
4. Repeat with different amounts of training data; you should generally see better performance with more data.
5. To get a handle on the variance due to the size of the test data, follow the same procedure in reverse.
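The subsampling procedure above could be sketched roughly as follows. Everything here is an assumption made for illustration: the synthetic one-dimensional data, the toy nearest-centroid classifier, the split sizes, and the helper names. Substitute your own data and model.

```python
import random
import statistics

def make_data(n, rng):
    """Synthetic 1-D two-class data: class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    return [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]

def fit_centroids(samples):
    """Toy classifier: remember the mean feature value of each class."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: statistics.mean(xs) for y, xs in by_label.items()}

def accuracy(centroids, samples):
    """Fraction of samples whose nearest class centroid matches their label."""
    hits = sum(
        min(centroids, key=lambda c: abs(x - centroids[c])) == y
        for x, y in samples
    )
    return hits / len(samples)

def learning_curve(train, validation, sizes, repeats, rng):
    """For each training size, return (mean, stdev) of validation accuracy."""
    results = {}
    for size in sizes:
        scores = [
            accuracy(fit_centroids(rng.sample(train, size)), validation)
            for _ in range(repeats)
        ]
        results[size] = (statistics.mean(scores), statistics.stdev(scores))
    return results

rng = random.Random(0)
data = make_data(1000, rng)
rng.shuffle(data)
train_pool, test = data[:800], data[800:]                # training vs. testing
train, validation = train_pool[:600], train_pool[600:]   # training vs. validation
curve = learning_curve(train, validation, sizes=[10, 50, 400], repeats=30, rng=rng)
for size, (mean_acc, spread) in curve.items():
    print(f"train size {size}: accuracy {mean_acc:.3f} (stdev {spread:.3f})")
```

The standard deviation across repeats is the quantity of interest: it estimates how much the result depends on which training examples happened to be drawn at that size. The `test` set is deliberately left unused so it can serve for a final, one-time evaluation.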

