
I have the following code to test some of the most popular ML algorithms in the sklearn Python library:

```python
import numpy as np
from sklearn                        import metrics, svm
from sklearn.linear_model           import LinearRegression
from sklearn.linear_model           import LogisticRegression
from sklearn.tree                   import DecisionTreeClassifier
from sklearn.neighbors              import KNeighborsClassifier
from sklearn.discriminant_analysis  import LinearDiscriminantAnalysis
from sklearn.naive_bayes            import GaussianNB
from sklearn.svm                    import SVC

trainingData    = np.array([ [2.3, 4.3, 2.5],  [1.3, 5.2, 5.2],  [3.3, 2.9, 0.8],  [3.1, 4.3, 4.0]  ])
trainingScores  = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData  = np.array([ [2.5, 2.4, 2.7],  [2.7, 3.2, 1.2] ])

clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))

clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))

clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))

clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))

clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))

clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))

clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))

clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))
```

The first two work fine, but I get the following error in the LogisticRegression call:

```
python stack.py
LinearRegression
[ 15.72023529   6.46666667]
SVR
[ 3.95570063  4.23426243]
Traceback (most recent call last):
  File "stack.py", line 28, in <module>
    clf.fit(trainingData, trainingScores)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
```

The input data is the same as in the previous calls, so what is going on here?

And by the way, why is there such a huge difference between the first predictions of LinearRegression() and SVR() (15.72 vs. 3.95)?
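One way to probe that difference, as a sketch under stated assumptions (Python 3, a recent scikit-learn, and the data copied from the question): with only 4 training rows and 3 features, LinearRegression has 4 parameters (3 weights plus an intercept) and can pass exactly through all 4 training points, so its predictions on new points are unconstrained extrapolations. SVR with its default regularization (C=1.0) cannot fit the training targets that tightly, and its predictions stay near the middle of the target range. Comparing the training R² of both models makes this visible; the names lin, svr, lin_r2 and svr_r2 are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Data copied from the question: 4 samples, 3 features
trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])

lin = LinearRegression().fit(trainingData, trainingScores)
svr = SVR().fit(trainingData, trainingScores)  # default RBF kernel, C=1.0, epsilon=0.1

# R^2 on the training data itself: 3 weights + 1 intercept can interpolate
# 4 points exactly, so LinearRegression scores (numerically) 1.0 here.
lin_r2 = lin.score(trainingData, trainingScores)
svr_r2 = svr.score(trainingData, trainingScores)

print("LinearRegression train R^2:", lin_r2)
print("SVR train R^2:", svr_r2)
print("LinearRegression weights:", lin.coef_, "intercept:", lin.intercept_)
```

A perfect training fit on 4 points says little about how the model behaves on new inputs, which is one plausible reason the two predictions diverge so much.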

1 Answer


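You are passing floats to a classifier, which expects categorical values (discrete class labels) as the target vector. LinearRegression and SVR are regressors, so they accept continuous targets; LogisticRegression is a classifier, so it rejects them.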
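The difference between the first two calls and the rest is the estimator type, not the data. A small sketch (assuming Python 3 and a recent scikit-learn) that uses sklearn.base.is_classifier to label each estimator from the question, then runs check_classification_targets, the function named in the traceback, on the same targets. The names kinds and error_msg are illustrative.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import check_classification_targets, type_of_target

trainingScores = np.array([3.4, 7.5, 4.5, 1.6])  # targets from the question

# Label each estimator from the question as a classifier or a regressor
estimators = [LinearRegression(), SVR(), LogisticRegression(), DecisionTreeClassifier(),
              KNeighborsClassifier(), LinearDiscriminantAnalysis(), GaussianNB(), SVC()]
kinds = {type(est).__name__: ("classifier" if is_classifier(est) else "regressor")
         for est in estimators}
for name, kind in kinds.items():
    print(f"{name:28s} {kind}")

print(type_of_target(trainingScores))  # continuous

# The check from the traceback: it raises ValueError for continuous targets
error_msg = ""
try:
    check_classification_targets(trainingScores)
except ValueError as exc:
    error_msg = str(exc)
print("Rejected:", error_msg)
```

The exact wording of the error message differs between scikit-learn versions, but it names the "continuous" label type in each case.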

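So convert the targets to integer class labels, either by casting them with astype('int') (which truncates, e.g. 3.4 becomes 3) or by encoding them with LabelEncoder, and they will be accepted as input: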

```python
import numpy as np
from sklearn import preprocessing
from sklearn.utils.multiclass import type_of_target

trainingScores = np.array([3.4, 7.5, 4.5, 1.6])  # targets from the question

# Map each distinct float value to an integer class label
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
print(encoded)                                       # [1 3 2 0]

print(type_of_target(trainingScores))                # continuous
print(type_of_target(trainingScores.astype('int')))  # multiclass
print(type_of_target(encoded))                       # multiclass
```
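Putting the pieces together, a minimal sketch (assuming Python 3 and a recent scikit-learn) that fits LogisticRegression on the encoded labels and maps the predicted class indices back to the original scores with LabelEncoder.inverse_transform. With this data every training row becomes its own class, so the classifier can only ever predict one of the four scores it has seen. The names pred_labels and pred_scores are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Data copied from the question
trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Treat every distinct score as its own class label (0..n_classes-1)
lab_enc = LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)

clf = LogisticRegression()
clf.fit(trainingData, encoded)  # no "Unknown label type" error: targets are discrete

pred_labels = clf.predict(predictionData)
# Map predicted class indices back to the original float scores
pred_scores = lab_enc.inverse_transform(pred_labels)
print(pred_labels, pred_scores)
```

If the scores are genuinely continuous quantities rather than categories, a regressor such as LinearRegression or SVR is usually the more natural choice than forcing them into classes.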
