
I saw that sklearn ships with some predefined datasets. For instance, mydataset = datasets.load_digits() gives an array of the data, mydataset.data, and an array of the corresponding labels, mydataset.target. However, I want to load my own dataset so that I can use it with sklearn. How, and in which format, should I load my data? My file has the following format.

-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1

-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3

...

...

-0.2348,0.3826,0.6142,0.7492,0.0546,-0.4020,label2

-0.1856,0.3592,0.7126,0.7366,0.3414,0.1018,label1

1 Answer


You can use NumPy's genfromtxt function to read the data from the file:

import numpy as np

mydata = np.genfromtxt(filename, delimiter=",")

However, if you have text columns (such as the label column in your file), using genfromtxt is trickier, since you need to specify the data types.
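As a rough sketch of what specifying the types can look like, one option is to read the numeric columns and the label column in two separate passes. The sample rows are copied from the question. The in-memory `csv_text` string stands in for the real file, and the column positions (six features, then one label) are assumptions based on that sample.

```python
import io
import numpy as np

# Four rows in the question's format: six numeric features, then a text label.
# csv_text is a stand-in for the real file (an assumption for illustration).
csv_text = (
    "-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1\n"
    "-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3\n"
    "-0.2348,0.3826,0.6142,0.7492,0.0546,-0.4020,label2\n"
    "-0.1856,0.3592,0.7126,0.7366,0.3414,0.1018,label1\n"
)

# First pass: columns 0..5 as floats (the default dtype).
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     usecols=range(6), encoding="utf-8")

# Second pass: column 6 as strings.
target = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                       usecols=6, dtype=str, encoding="utf-8")

print(data.shape)   # (n_samples, n_features)
print(target)
```

With a real file, the path can be passed in place of the `io.StringIO` object.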

It is much easier with the excellent Pandas library:

import pandas as pd

mydata = pd.read_csv(filename)

target = mydata["Label"]  # assumes the CSV has a header row and the label column is named "Label"

# select all but the last column as data

data = mydata.iloc[:, :-1]  # .ix was removed in pandas 1.0; .iloc selects by position
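The sample in the question does not appear to have a header row, so here is a hedged variant for that case. It passes header=None and selects the label by position instead of by name. The in-memory `csv_text` string is only a stand-in for the real file, and treating the last column as the label is an assumption based on the sample rows.

```python
import io
import pandas as pd

# Sample rows copied from the question; csv_text stands in for the real file.
csv_text = (
    "-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1\n"
    "-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3\n"
    "-0.2348,0.3826,0.6142,0.7492,0.0546,-0.4020,label2\n"
    "-0.1856,0.3592,0.7126,0.7366,0.3414,0.1018,label1\n"
)

# header=None tells pandas that the first line is data, not column names.
mydata = pd.read_csv(io.StringIO(csv_text), header=None)

# All columns except the last are features; the last column is the label.
data = mydata.iloc[:, :-1].to_numpy()    # shape (n_samples, n_features)
target = mydata.iloc[:, -1].to_numpy()   # shape (n_samples,)

print(data.shape, target.shape)
```

The resulting data and target arrays have the same layout as mydataset.data and mydataset.target from the built-in datasets, so they can be passed to an sklearn estimator's fit method in the same way.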
