in Machine Learning by (17.3k points)

I am using sklearn for a multi-class classification task. I need to split the data into a training set and a test set, and I want to randomly take the same number of samples from each class. Currently, I am using this function:

X_train, X_test, y_train, y_test = cross_validation.train_test_split(Data, Target, test_size=0.3, random_state=0)

but it gives an unbalanced dataset. Any suggestions?

1 Answer

by (33.2k points)

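You can use the train_test_split function from scikit-learn's model_selection module (the cross_validation module used in the question was deprecated and later removed). By default the split is purely random, so the class proportions can differ between the training and test sets. Passing stratify=y keeps the class proportions of y the same in both sets.

For example:

# import the function

from sklearn.model_selection import train_test_split

# split with stratification so each class keeps its share in both sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)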
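If the goal is literally the same number of samples from each class, rather than the same class proportions as in the full data, one option is to sample indices per class by hand. The sketch below is a minimal illustration using only the standard library; balanced_train_test_indices and its parameters are hypothetical names chosen here and are not part of scikit-learn.

```python
import random
from collections import defaultdict


def balanced_train_test_indices(y, n_train_per_class, seed=0):
    """Hypothetical helper (not a scikit-learn function).

    Picks exactly n_train_per_class random indices from every class
    for the training set; all remaining indices go to the test set.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label, indices in by_class.items():
        if len(indices) < n_train_per_class:
            raise ValueError(
                f"class {label!r} has only {len(indices)} samples"
            )
        rng.shuffle(indices)
        train_idx.extend(indices[:n_train_per_class])
        test_idx.extend(indices[n_train_per_class:])

    # Shuffle so the sets are not ordered by class.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Toy labels: class "a" is much more frequent than "b" and "c".
y = ["a"] * 10 + ["b"] * 4 + ["c"] * 3
train_idx, test_idx = balanced_train_test_indices(y, n_train_per_class=2, seed=0)

# Count how often each class appears in the training set.
train_counts = {
    label: sum(1 for i in train_idx if y[i] == label) for label in set(y)
}
print(train_counts)  # every class appears exactly twice
```

The returned index lists can be used to select rows from NumPy arrays, for example X[train_idx] and y[train_idx]. Note that only the training set is balanced in this sketch; the test set keeps whatever is left of each class.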

Hope this answer helps.
