I am reading a file in python using pandas and then saving it in a numpy array. The file has the dimension of 11303402 rows x 10 columns. I need to split the data for cross-validation and for that I sliced the data into 11303402 rows x 9 columns of examples and 1 array of 11303402 rows x 1 col of labels. The following is the code:
tdata=pd.read_csv('train.csv') tdata.columns='Arrival_Time','Creation_Time','x','y','z','User','Model','Device','sensor','gt']
User_Data = np.array(tdata)
features = User_Data[:,0:9]
labels = User_Data[:,9:10]
The error comes in the following code:
classes=np.unique(labels)
idx=labels==classes[0]
Yt=labels[idx]
Xt=features[idx,:]
On the line:
Xt=features[idx,:]
it says 'too many indices for array'
The shapes of all 3 data sets are:
print np.shape(tdata) = (11303402, 10)
print np.shape(features) = (11303402, 9)
print np.shape(labels) = (11303402, 1)
If anyone knows the problem, please help.