0 votes
in Machine Learning by (19k points)

I have a multi-class classification problem and my dataset is skewed: one class has 100 instances and another has, say, 10. I want to split the dataset so that the ratio between classes is preserved. For example, if I want 30% of the records to go into the training set, it should contain 30 of the 100 instances of the first class, 3 of the 10 instances of the second class, and so on.

1 Answer

0 votes
by (33.1k points)

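Use train_test_split with stratify=y, which keeps the class proportions of y the same in both splits. The call below holds out 25% of the data for testing; to put 30% of the records in the training set, as in the question, pass train_size=0.3 instead of test_size=0.25: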

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25
)
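
To make clearer what a stratified split does, here is a minimal pure-Python sketch of the idea, using the 100/10 class counts from the question. The names stratified_split, train_fraction and seed are hypothetical and chosen only for illustration; this is not scikit-learn's implementation, just a rough picture of sampling the same fraction from each class separately.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx), taking train_fraction of each class.

    Hypothetical helper for illustration only, not part of scikit-learn.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class, rounded to a whole count.
        n_train = round(len(indices) * train_fraction)
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return train_idx, test_idx


# Toy labels matching the question: 100 samples of class "a", 10 of class "b".
y = ["a"] * 100 + ["b"] * 10
train_idx, test_idx = stratified_split(y, train_fraction=0.3)

train_counts = Counter(y[i] for i in train_idx)
test_counts = Counter(y[i] for i in test_idx)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

With these counts the training set receives 30 samples of the large class and 3 of the small one, and the remaining 70 and 7 go to the test set. In practice train_test_split with stratify=y is the better choice, since it also handles class counts that do not divide evenly.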

