0 votes
in Data Science by (17.6k points)

I am trying to run the following code. By the way, I am new to both Python and sklearn.

import pandas as pd

import numpy as np

from sklearn.linear_model import LogisticRegression

# data import and preparation

trainData = pd.read_csv('train.csv')

train = trainData.values

testData = pd.read_csv('test.csv')

test = testData.values

X = np.c_[train[:, 0], train[:, 2], train[:, 6:7],  train[:, 9]]

X = np.nan_to_num(X)

y = train[:, 1]

Xtest = np.c_[test[:, 0:1], test[:, 5:6],  test[:, 8]]

Xtest = np.nan_to_num(Xtest)

# model

lr = LogisticRegression().fit(X, y)

where y is an np.ndarray of 0s and 1s.

I receive the following error:

File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\", line 1174, in fit check_classification_targets(y)

File "C:\Anaconda3\lib\site-packages\sklearn\utils\", line 172, in check_classification_targets raise ValueError("Unknown label type: %r" % y_type)

ValueError: Unknown label type: 'unknown'

From the sklearn documentation:

y : array-like, shape (n_samples,) Target values (class labels in classification, real numbers in regression)

What is my error?


y is array([0.0, 1.0, 1.0, ..., 0.0, 1.0, 0.0], dtype=object) and its shape is (891,)

1 Answer

0 votes
by (41.4k points)

Your y has dtype object, so sklearn cannot determine what kind of labels it holds and reports the label type as 'unknown'. Cast it to integers by adding the line y = y.astype('int') right after the line y = train[:, 1].
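As a rough illustration of the fix, here is a minimal sketch. The small X and y_obj arrays are made-up stand-ins for the columns taken from train.csv (not the real data), and type_of_target from sklearn.utils.multiclass is used only to show how sklearn classifies the labels before and after the cast.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy features standing in for the columns selected from train.csv.
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]])

# Labels with dtype=object, mimicking what train[:, 1] looks like in the question.
y_obj = np.array([0.0, 1.0, 0.0, 1.0], dtype=object)
label_type_before = type_of_target(y_obj)  # how sklearn sees the object array

# The fix from the answer: cast the labels to integers.
y = y_obj.astype('int')
label_type_after = type_of_target(y)  # how sklearn sees the integer array

# With integer labels, fitting no longer fails the label-type check.
lr = LogisticRegression().fit(X, y)
predictions = lr.predict(X)

print(label_type_before, label_type_after, y.dtype)
```

You can also run print(y.dtype) on your own y before fitting; if it prints object, the cast is needed.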
