I have the following code to test some of the most popular ML algorithms in the sklearn Python library:

import numpy as np
from sklearn                        import metrics, svm
from sklearn.linear_model           import LinearRegression
from sklearn.linear_model           import LogisticRegression
from sklearn.tree                   import DecisionTreeClassifier
from sklearn.neighbors              import KNeighborsClassifier
from sklearn.discriminant_analysis  import LinearDiscriminantAnalysis
from sklearn.naive_bayes            import GaussianNB
from sklearn.svm                    import SVC

trainingData    = np.array([ [2.3, 4.3, 2.5],  [1.3, 5.2, 5.2],  [3.3, 2.9, 0.8],  [3.1, 4.3, 4.0]  ])
trainingScores  = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData  = np.array([ [2.5, 2.4, 2.7],  [2.7, 3.2, 1.2] ])

clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))

clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))

clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))

clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))

clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))

clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))

clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))

clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))

The first two work fine, but the LogisticRegression call fails with the following error:

[email protected]:/home/ouhma# python stack.py
LinearRegression
[ 15.72023529   6.46666667]
SVR
[ 3.95570063  4.23426243]
Traceback (most recent call last):
  File "stack.py", line 28, in <module>
    clf.fit(trainingData, trainingScores)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

The input data is the same as in the previous calls, so what is going on here?

By the way, why is there such a large difference between the first predictions of LinearRegression() and SVR() (15.72 vs 3.95)?

1 Answer

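You are passing floats to a classifier, which expects categorical class labels as the target vector. Despite its name, LogisticRegression is a classifier, not a regressor, so it rejects a target that sklearn detects as 'continuous'. LinearRegression and SVR are regressors, which is why the same data works for them.

Encode the scores as integer class labels and the target will be accepted as input: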

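from sklearn import preprocessing
from sklearn.utils.multiclass import type_of_target

# Map each distinct score to an integer class id (sorted order of the values).
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
# encoded is array([1, 3, 2, 0], dtype=int64)

# The raw float scores are what triggers the error.
print(type_of_target(trainingScores))
# continuous

# Casting to int is also accepted, but it truncates the values (3.4 becomes 3).
print(type_of_target(trainingScores.astype('int')))
# multiclass

print(type_of_target(encoded))
# multiclass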
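The encoding step can be carried through to an actual fit. The following is a minimal sketch, not a recommendation: it assumes the data from the question, and the variable names `predicted_ids`, `predicted_scores`, `reg`, and `tree_scores` are introduced here for illustration. Option 1 treats every distinct score as its own class, which with four samples means one example per class. If the scores are really continuous quantities, a regressor such as `DecisionTreeRegressor` (Option 2) is probably the better match for the problem.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Option 1: treat every distinct score as its own class label.
lab_enc = LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)  # integer class ids 0..3

clf = LogisticRegression()
clf.fit(trainingData, encoded)
predicted_ids = clf.predict(predictionData)
# Map the predicted class ids back to the original score values.
predicted_scores = lab_enc.inverse_transform(predicted_ids)
print("LogisticRegression", predicted_scores)

# Option 2 (assumption: the scores are continuous): use a regressor instead.
reg = DecisionTreeRegressor(random_state=0)
reg.fit(trainingData, trainingScores)
tree_scores = reg.predict(predictionData)
print("DecisionTreeRegressor", tree_scores)
```

With Option 1 the classifier can only ever return one of the four scores it saw during training, which is usually not what you want from a score predictor.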

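As for the gap between LinearRegression() and SVR() (15.72 vs 3.95), a likely explanation is model flexibility, though this has not been checked against the sklearn version used in the question. With four samples, three features, and an intercept, ordinary least squares has as many free parameters as data points, so it can reproduce the training scores exactly and is free to extrapolate far outside their range. SVR with its defaults (RBF kernel, C=1.0, epsilon=0.1) is regularized and cannot follow the training scores that closely. The sketch below compares the residuals on the training points; `lin`, `svr`, `lin_residuals`, and `svr_residuals` are names chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])

lin = LinearRegression().fit(trainingData, trainingScores)
svr = SVR().fit(trainingData, trainingScores)  # defaults: kernel='rbf', C=1.0, epsilon=0.1

# Residuals on the training points themselves.
lin_residuals = trainingScores - lin.predict(trainingData)
svr_residuals = trainingScores - svr.predict(trainingData)

print("LinearRegression max |residual|:", np.max(np.abs(lin_residuals)))
print("SVR max |residual|:", np.max(np.abs(svr_residuals)))
```

A near-zero residual for the linear model on so few points is a sign of interpolation rather than a good fit, so neither prediction should be trusted much with only four training samples.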