When we need to check or visualize the performance of a classification problem, we use the AUC (Area Under the Curve) of the ROC (Receiver Operating Characteristic) curve.
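For instance, here is a minimal sketch of plotting an ROC curve with scikit-learn and matplotlib (the toy labels and scores below are purely illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Toy binary labels and probability-like scores, just for illustration
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label='ROC (AUC = %.3f)' % roc_auc_score(y_true, y_score))
plt.plot([0, 1], [0, 1], linestyle='--', label='chance level')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()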
However, AUC is not always the area under an ROC curve. "Area Under the Curve" is the (abstract) area under some curve, so it is a more general concept than AUROC. With imbalanced classes, it may be better to compute the AUC of a precision-recall curve instead.
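For example, a small sketch of computing the area under a precision-recall curve with scikit-learn (again with made-up, illustrative data):

import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

# Imbalanced toy data: two positives out of ten, probability-like scores
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
y_score = np.array([0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.9, 0.6, 0.35])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(auc(recall, precision))                    # area under the precision-recall curve
print(average_precision_score(y_true, y_score))  # a related (but not identical) summary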
See the scikit-learn source for roc_auc_score (from an older version):
def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
    def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
        fpr, tpr, tresholds = roc_curve(y_true, y_score,
                                        sample_weight=sample_weight)
        return auc(fpr, tpr, reorder=True)

    return _average_binary_score(
        _binary_roc_auc_score, y_true, y_score, average,
        sample_weight=sample_weight)
In the code above, roc_auc_score first computes an ROC curve and then calls auc() to get the area under it.
I guess your problem is the predict_proba() call. With a plain predict(), the outputs are always the same:
For example:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, roc_auc_score

est = LogisticRegression(class_weight='balanced')
X = np.random.rand(10, 2)
y = np.random.randint(2, size=10)
est.fit(X, y)

false_positive_rate, true_positive_rate, thresholds = roc_curve(y, est.predict(X))
print(auc(false_positive_rate, true_positive_rate))
# e.g. 0.857142857143 (the exact value depends on the random data)
print(roc_auc_score(y, est.predict(X)))
# e.g. 0.857142857143 (always the same as the line above)
If you change the above to this, you'll sometimes get different outputs:
false_positive_rate, true_positive_rate, thresholds = roc_curve(y, est.predict_proba(X)[:, 1])
# These two may now differ: auc() is computed from the probabilities,
# while roc_auc_score() is still given the hard 0/1 predictions.
print(auc(false_positive_rate, true_positive_rate))
print(roc_auc_score(y, est.predict(X)))
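And if you pass the same probability scores to both functions, the two numbers should agree again (a small sketch reusing est, X and y from above):

# Use the positive-class probabilities for both computations
scores = est.predict_proba(X)[:, 1]
false_positive_rate, true_positive_rate, thresholds = roc_curve(y, scores)
print(auc(false_positive_rate, true_positive_rate))
print(roc_auc_score(y, scores))
# both print the same value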
Hope this answer helps.