@kavita, there are two main concerns regarding the division:
1. With less training data, your parameter estimates have greater variance.
2. With less testing data, your performance statistic will have greater variance.
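The data should be divided so that neither variance is too high. An 80/20 train/test split is a common starting point. It is often loosely attributed to the Pareto principle, but it is a rule of thumb rather than a provably optimal ratio.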
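To make the second concern concrete, here is a rough sketch of how the uncertainty of a measured accuracy shrinks as the test set grows. It uses the standard binomial approximation, which assumes the test examples are independent. The function name `accuracy_standard_error` and the 0.9 accuracy figure are made up for illustration.

```python
import math


def accuracy_standard_error(p, n_test):
    """Approximate standard error of an accuracy estimate.

    Assumes n_test independent test examples and a true accuracy p
    (binomial approximation: sqrt(p * (1 - p) / n_test)).
    """
    return math.sqrt(p * (1.0 - p) / n_test)


# Hypothetical true accuracy of 0.9, measured on test sets of different sizes.
for n_test in (100, 1000, 10000):
    se = accuracy_standard_error(0.9, n_test)
    print(f"n_test={n_test:>5}  standard error={se:.4f}")
```

The standard error falls with the square root of the test set size, so cutting the test set to a quarter of its size roughly doubles the noise in the reported score.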
Assuming you have enough data for a proper split, the following steps are an instructive way to get a handle on both variances:
- Split the data into training and testing sets.
- Then split the training data into training and validation sets.
- Subsample random selections of the training data, train the classifier on each, and record its performance on the validation set.
- Try different subsample sizes; you should see better performance with more training data.
- To get a handle on the variance due to test set size, follow the same procedure in reverse: train on all the training data, then repeatedly subsample the validation set and record the performance on each subsample.
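The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (one feature, two classes), the "classifier" is a toy threshold halfway between the class means, and every name here (`make_data`, `train`, `accuracy`, `spread`) is hypothetical. Swap in your own data and model.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic two-class data: one feature, class means at 0 and 2."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


def train(samples):
    """Toy classifier: a threshold halfway between the two class means."""
    xs0 = [x for x, y in samples if y == 0]
    xs1 = [x for x, y in samples if y == 1]
    if not xs0 or not xs1:  # degenerate subsample: fall back to the overall mean
        return statistics.mean(x for x, _ in samples)
    return (statistics.mean(xs0) + statistics.mean(xs1)) / 2.0


def accuracy(threshold, samples):
    """Fraction of samples classified correctly (class 1 above the threshold)."""
    correct = sum((x > threshold) == (y == 1) for x, y in samples)
    return correct / len(samples)


def spread(scores):
    """Mean and sample standard deviation of a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


rng = random.Random(0)
data = make_data(2500, rng)
test, rest = data[:500], data[500:]            # hold out the test set first
validation, training = rest[:500], rest[500:]  # then split off a validation set

# Effect of training set size: subsample the training data repeatedly,
# train on each subsample, and score on the full validation set.
train_results = {}
for size in (20, 200, 1000):
    scores = [accuracy(train(rng.sample(training, size)), validation)
              for _ in range(200)]
    train_results[size] = spread(scores)
    print(f"train size {size:>4}: mean={train_results[size][0]:.3f} "
          f"std={train_results[size][1]:.4f}")

# The reverse: train once on all training data, then score on repeated
# random subsamples of the validation set.
full_model = train(training)
val_results = {}
for size in (20, 100, 400):
    scores = [accuracy(full_model, rng.sample(validation, size))
              for _ in range(200)]
    val_results[size] = spread(scores)
    print(f"val size   {size:>4}: mean={val_results[size][0]:.3f} "
          f"std={val_results[size][1]:.4f}")
```

The `test` set is left untouched here on purpose; it is meant for a single final evaluation once the split sizes and the model are settled.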
If you are a beginner and want to learn more about machine learning, check out this course by Intellipaat, which teaches ML from the basics: Machine Learning Course