+1 vote
5 views
in Python by (250 points)
I want to split a pandas DataFrame into three separate sets. I know that train_test_split from sklearn.cross_validation (now sklearn.model_selection) can divide the data into two sets (train and test), but I couldn't find a way to split the data into three sets. Preferably, I would also like to keep the indices of the original data.
 
I know we could call train_test_split twice and adjust the indices somehow. But is there a standard or built-in way to split the data into three sets instead of two?
Kindly help.

2 Answers

+4 votes
by (10.9k points)

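You can split your dataset into train, validation and test sets using the numpy.split() method. Passing an integer as indices_or_sections produces that many equal parts; passing a sorted list of indices cuts the array at those positions, which allows unequal splits such as 60/20/20: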

Syntax-

numpy.split(ary, indices_or_sections, axis=0)
 

Example-

>>> import numpy as np
>>> a = np.arange(9.0)
>>> np.split(a, 3)
[array([ 0.,  1.,  2.]),
 array([ 3.,  4.,  5.]),
 array([ 6.,  7.,  8.])]
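To apply this to a DataFrame while keeping the original indices, one option is to shuffle the row positions and cut them at two points. The following is only a minimal sketch: the toy DataFrame, its column names, the seed and the 60/20/20 proportions are assumptions for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame standing in for the real data.
df = pd.DataFrame(
    {"feature": np.arange(10) * 1.5, "label": np.arange(10) % 2},
    index=[f"row{i}" for i in range(10)],
)

n = len(df)
rng = np.random.default_rng(seed=1)

# Shuffle the row positions, then cut them at 60% and 80% of the length.
positions = rng.permutation(n)
train_pos, val_pos, test_pos = np.split(positions, [int(0.6 * n), int(0.8 * n)])

# Select rows by position; each subset keeps its original index labels.
train = df.iloc[train_pos]
val = df.iloc[val_pos]
test = df.iloc[test_pos]
```

Because the rows are selected with iloc, train.index, val.index and test.index still refer to the labels of the original DataFrame.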

0 votes
by (29.5k points)

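Simply call the train_test_split function twice. With the sizes below, the result is a 60/20/20 train/validation/test split:

from sklearn.model_selection import train_test_split

# Hold out 20% of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# 25% of the remaining 80% is 20% of the original data, used for validation.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)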
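Since the question also asks for the indices of the original data, it may help to pass the DataFrame itself to train_test_split, which returns DataFrames that keep their original index. This is a rough sketch under assumed data: the toy DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy DataFrame standing in for the real data.
df = pd.DataFrame({"feature": np.arange(20), "label": np.arange(20) % 2})

# First split: hold out 20% of the rows as the test set.
train_val_df, test_df = train_test_split(df, test_size=0.2, random_state=1)

# Second split: 25% of the remaining 80% is 20% of the original data.
train_df, val_df = train_test_split(train_val_df, test_size=0.25, random_state=1)

# The index of each subset still points at rows of the original DataFrame.
train_idx, val_idx, test_idx = train_df.index, val_df.index, test_df.index
```

The three index objects can then be used with df.loc to recover the corresponding rows from the original data.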
