0 votes
in Machine Learning by (19k points)

I am using sklearn for a multi-class classification task. I need to split the data into train_set and test_set, and I want to randomly take the same number of samples from each class. Currently, I am using this function:

X_train, X_test, y_train, y_test = cross_validation.train_test_split(Data, Target, test_size=0.3, random_state=0)

but it gives an unbalanced dataset! Any suggestions?

1 Answer

0 votes
by (33.1k points)

You can use the train_test_split function from scikit-learn's model_selection module.

For example:

# import the function
from sklearn.model_selection import train_test_split

# split X and y into train (67%) and test (33%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
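Note that a plain random split does not balance the classes by itself. Below is a minimal sketch of two ways to get closer to what the question asks. The toy data and the helper name sample_equal_per_class are assumptions made up for illustration, not part of the original answer. Option 1 uses the stratify parameter of train_test_split, which keeps the class proportions of y in both sets (it does not equalize the class counts). Option 2 draws the same number of samples from every class with NumPy.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced data (hypothetical): 60 / 30 / 10 samples in classes 0 / 1 / 2.
rng = np.random.RandomState(0)
y = np.array([0] * 60 + [1] * 30 + [2] * 10)
X = rng.rand(len(y), 4)

# Option 1: stratify=y keeps the class proportions of y in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


# Option 2 (hypothetical helper): draw the same number of samples per class.
def sample_equal_per_class(X, y, n_per_class, random_state=0):
    """Return a subset of (X, y) with exactly n_per_class rows of each label."""
    local_rng = np.random.RandomState(random_state)
    picked = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        # replace=False raises ValueError if a class has fewer rows than requested
        picked.append(local_rng.choice(idx, size=n_per_class, replace=False))
    picked = np.concatenate(picked)
    return X[picked], y[picked]


X_bal, y_bal = sample_equal_per_class(X, y, n_per_class=10)
```

If equal class counts are needed in the training set only, one possible approach is to make the stratified split first and then apply the helper to X_train and y_train, so the test set keeps the original class distribution.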

Hope this answer helps.

