Explore Courses Blog Tutorials Interview Questions
0 votes
in Machine Learning by (19k points)

I'm doing a multiclass text classification in Scikit-Learn. The dataset is being trained using the Multinomial Naive Bayes classifier having hundreds of labels. Here's an extract from the Scikit Learn script for fitting the MNB model

from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()

# fit a Multinomial Naive Bayes model, y_train)

# make class predictions

y_pred_class = nb.predict(X_test_dtm)

# generate classification report

from sklearn import metrics

print(metrics.classification_report(y_test, y_pred_class))

And a simplified output of the metrics.classification_report on command line screen looks like this:

             precision  recall   f1-score   support

     12       0.84      0.48      0.61      2843

     13       0.00      0.00      0.00        69

     15       1.00      0.19      0.32       232

      8       0.50      0.56      0.53      7555      

  avg/total 0.59      0.48      0.45     35919

I was wondering if there was any way to get the report output into a standard csv file with regular column headers

When I send the command line output into a csv file or try to copy/paste the screen output into a spreadsheet - Openoffice Calc or Excel, It lumps the results in one column.

1 Answer

0 votes
by (33.1k points)

As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas Dataframe is by simply having the report returned as a dict:

report = classification_report(y_test, y_pred, output_dict=True)

and then construct a Dataframe and transpose it:

df = pandas.DataFrame(report).transpose()

From here on, you are free to use the standard pandas methods to generate your desired output formats (CSV, HTML, LaTeX, ...).

Hope this answer helps you!

Browse Categories