Explore Courses Blog Tutorials Interview Questions
0 votes
in Python by (47.6k points)

I load some machine learning data from a CSV file. The first 2 columns are observations and the remaining columns are features.

Currently, I do the following:

data = pandas.read_csv('mydata.csv')

which gives something like:

data = pandas.DataFrame(np.random.rand(10,5), columns = list('abcde'))

I'd like to slice this dataframe in two data frames: one containing the columns a and b and one containing the columns c, d and e.

It is not possible to write something like

observations = data[:'c'] 

features = data['c':]

I'm not sure what the best method is. Do I need a pd.Panel?

By the way, I find dataframe indexing pretty inconsistent: data['a'] is permitted, but data[0] is not. On the other side, data['a':] is not permitted but data[0:] is. Is there a practical reason for this? This is really confusing if columns are indexed by Int, given that data[0] != data[0:1]

1 Answer

0 votes
by (106k points)

To take column-slices of dataframe in pandas .loc uses label based indexing to select both rows and columns. The labels being the values of the index or the columns. Slicing with .loc includes the last element.

Browse Categories