
I'm doing multiclass text classification in scikit-learn. I'm training a Multinomial Naive Bayes classifier on a dataset with hundreds of labels. Here's an extract from the script that fits the MNB model:

from sklearn import metrics
from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()

# fit a Multinomial Naive Bayes model
nb.fit(X_train_dtm, y_train)

# make class predictions
y_pred_class = nb.predict(X_test_dtm)

# generate classification report
print(metrics.classification_report(y_test, y_pred_class))

A simplified version of the metrics.classification_report output on the command line looks like this:

             precision    recall  f1-score   support

         12       0.84      0.48      0.61      2843
         13       0.00      0.00      0.00        69
         15       1.00      0.19      0.32       232
          8       0.50      0.56      0.53      7555

  avg/total       0.59      0.48      0.45     35919

Is there any way to get the report output into a standard CSV file with regular column headers?

When I redirect the command-line output to a CSV file, or copy and paste the screen output into a spreadsheet (OpenOffice Calc or Excel), it lumps all the results into one column.

1 Answer


As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas DataFrame is to have the report returned as a dict:

report = metrics.classification_report(y_test, y_pred_class, output_dict=True)

and then construct a DataFrame and transpose it, so that each label becomes a row and each metric a column:

import pandas

df = pandas.DataFrame(report).transpose()

From here on, you are free to use the standard pandas methods to generate your desired output formats (CSV, HTML, LaTeX, ...).
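
For the CSV case asked about in the question, a minimal end-to-end sketch might look like the following. The toy label lists and the file name classification_report.csv are assumptions made up for illustration; in the real script you would pass y_test and y_pred_class instead.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels standing in for y_test and y_pred_class (illustrative only).
y_true = [12, 12, 13, 15, 8, 8, 8, 12]
y_pred = [12, 8, 13, 15, 8, 8, 12, 12]

# output_dict=True (scikit-learn >= 0.20) returns nested dicts instead of text.
report = classification_report(y_true, y_pred, output_dict=True)

# Transpose so that each label is a row and each metric is a column.
df = pd.DataFrame(report).transpose()

# index_label names the first CSV column, which holds the class labels.
# The output file name is a placeholder.
df.to_csv("classification_report.csv", index_label="label")
```

The resulting file has one header row (label, precision, recall, f1-score, support) and opens as separate columns in Calc or Excel. Depending on the scikit-learn version, the summary rows at the bottom may be named differently from the "avg/total" line in the printed report.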

Hope this answer helps you!
