in Machine Learning by (19k points)

I am using sklearn for a multi-class classification task. I need to split the data into a train set and a test set, and I want to randomly take the same number of samples from each class. Currently I am using this function:

X_train, X_test, y_train, y_test = cross_validation.train_test_split(Data, Target, test_size=0.3, random_state=0)

but it gives an unbalanced dataset! Any suggestions?

1 Answer

0 votes
by (33.1k points)

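You can use the train_test_split function available in scikit-learn. By default it splits at random without regard to class, so pass stratify=y to keep the class proportions the same in the train and test sets. Note that this preserves the original class ratios; it does not equalize the number of samples per class.

For example:

# import the function

from sklearn.model_selection import train_test_split

# split into train and test sets, preserving class proportions

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)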
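If you need exactly the same number of samples from every class, as the question asks, one option is to sample the indices class by class. The following is only a minimal sketch using NumPy: the function name equal_per_class_split and its parameters are hypothetical (not part of scikit-learn), and it assumes every class has at least n_train_per_class + n_test_per_class samples.

```python
import numpy as np


def equal_per_class_split(X, y, n_train_per_class, n_test_per_class, random_state=0):
    """Hypothetical helper: draw the same number of train and test samples from each class."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    needed = n_train_per_class + n_test_per_class
    train_idx, test_idx = [], []
    for label in np.unique(y):
        # Indices of all samples belonging to this class, in random order
        idx = rng.permutation(np.flatnonzero(y == label))
        if len(idx) < needed:
            raise ValueError(f"class {label!r} has {len(idx)} samples, need {needed}")
        # Disjoint slices, so no sample lands in both sets
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:needed])
    # Shuffle so the classes are not grouped together in the output
    train_idx = rng.permutation(np.array(train_idx, dtype=int))
    test_idx = rng.permutation(np.array(test_idx, dtype=int))
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy data with unbalanced classes: 12, 5 and 3 samples
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 12 + [1] * 5 + [2] * 3)

X_train, X_test, y_train, y_test = equal_per_class_split(X_demo, y_demo, 2, 1)
print(np.bincount(y_train), np.bincount(y_test))  # [2 2 2] [1 1 1]
```

The trade-off is that samples from the larger classes are discarded, so the per-class counts are limited by the smallest class.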

Hope this answer helps.

