
I'm using Python and I need to split data imported from a .csv file into two parts, a training set and a test set, e.g. 70% training and 30% test.

I keep getting various errors, such as 'list' object is not callable and so on.

Is there any easy way of doing this?

Thanks

EDIT:

The code is basic; I'm just looking to split the dataset.

from csv import reader
with open('C:/Dataset.csv', 'r') as f:
    data = list(reader(f)) #Imports the CSV
data[0:1] ( data )

TypeError: 'list' object is not callable

1 Answer


You can use df.sample, which draws a random subset of the rows (here 70%) for training, and then take every row that was not sampled as the test set.

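from numpy.random import RandomState
import pandas as pd

df = pd.read_csv('C:/Dataset.csv')

# Pass a seed, e.g. RandomState(42), if you need the same split on every run.
rng = RandomState()

# Randomly pick 70% of the rows for training.
train = df.sample(frac=0.7, random_state=rng)

# The test set is every row whose index label was not sampled.
test = df.loc[~df.index.isin(train.index)]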
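Here is a self-contained version of the same df.sample idea, run on a small in-memory DataFrame instead of the CSV so it can be tried directly. The column names and the seed 42 are illustrative assumptions. Passing an integer seed makes the split reproducible, and df.drop(train.index) is an equivalent way to take the remaining rows, assuming the index has no duplicate labels.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('C:/Dataset.csv'): 10 rows, 2 columns.
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Draw 70% of the rows at random; an integer seed makes the split repeatable.
train = df.sample(frac=0.7, random_state=42)

# Every row not drawn into train goes to test (assumes a unique index).
test = df.drop(train.index)

print(len(train), len(test))
```

Swap the DataFrame line for your pd.read_csv call to use it on the real file.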

Alternatively, you can build a random boolean mask with NumPy and use it to select the rows:

import numpy as np
import pandas as pd

df = pd.read_csv('C:/Dataset.csv')

# One uniform random number in [0, 1) per row; True for roughly 70% of rows.
msk = np.random.rand(len(df)) < 0.7

train = df[msk]
test = df[~msk]
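Because each row is kept independently with probability 0.7, the boolean-mask approach gives roughly, not exactly, 70% training rows. If you need an exact 70/30 split, one option is to shuffle the row positions with NumPy and cut the permutation. This is a sketch on a small in-memory DataFrame; the seed and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('C:/Dataset.csv').
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

rng = np.random.default_rng(42)       # seeded generator for a repeatable split
positions = rng.permutation(len(df))  # shuffled row positions 0..n-1
cut = int(0.7 * len(df))              # number of training rows

train = df.iloc[positions[:cut]]      # first 70% of the shuffled positions
test = df.iloc[positions[cut:]]       # the remaining 30%

print(len(train), len(test))
```

Using iloc with positions means this works even if the index labels are not unique.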

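Finally, about the error 'list' object is not callable: in your code, data[0:1] ( data ) puts parentheses after a list (the slice data[0:1]), so Python tries to call that list like a function, which raises TypeError: 'list' object is not callable. Remove that line; to look at the first row, just use data[0:1] or print(data[0]). The same error also appears if you assigned a variable named list earlier, because that shadows the built-in list() used in list(reader(f)). That's why you shouldn't name variables after built-in functions.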
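If you'd rather stay with the csv module, as in the question, a minimal sketch is to read the rows into a list, shuffle them, and slice. It uses io.StringIO with made-up data in place of open('C:/Dataset.csv'), and it assumes the first row is a header; drop that line if your file has none.

```python
import csv
import io
import random

# Made-up CSV content standing in for open('C:/Dataset.csv', newline='').
csv_text = "x,y\n" + "".join(f"{i},{i % 2}\n" for i in range(10))

with io.StringIO(csv_text) as f:
    rows = list(csv.reader(f))  # calls the built-in list(), not a list object

header, data = rows[0], rows[1:]  # assumes the first row is a header

random.seed(42)       # repeatable shuffle
random.shuffle(data)  # shuffles the rows in place

cut = int(len(data) * 0.7)
train, test = data[:cut], data[cut:]

print(header, len(train), len(test))
```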


Welcome to Intellipaat Community. Get your technical queries answered by top developers !

