I am working on a supervised learning task to train a binary classifier.
I have a dataset with a large class imbalance: 8 negative instances for every positive one.
I use the F-measure, i.e. the harmonic mean of precision and recall (sensitivity), to assess the performance of a classifier.
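For reference, with precision $P$ and recall $R$, that is

$$F_1 = 2\,\frac{P \cdot R}{P + R}.$$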
I plot the ROC curves of several classifiers and they all show a high AUC, suggesting good classification performance. However, when I test the classifiers and compute the F-measure, I get a really low value. I know this discrepancy is caused by the class skew of the dataset, and so far I have found two options to deal with it:
Adopting a cost-sensitive approach by assigning weights to the dataset's instances (see this post; sketched below)
Thresholding the predicted probabilities returned by the classifiers to reduce the number of false positives and false negatives (also sketched below)
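For context, here is a minimal sketch of the first option, assuming scikit-learn; the synthetic 8:1 dataset and the names (`X_train`, `clf`, etc.) are illustrative stand-ins for my actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: roughly 8 negatives per positive.
X, y = make_classification(n_samples=9000, weights=[8 / 9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# class_weight="balanced" scales each class by n_samples / (n_classes * count),
# so every positive instance weighs roughly 8x as much as a negative one.
clf = LogisticRegression(class_weight="balanced")
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```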
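And a sketch of the second option, tuning the decision threshold instead of using the default 0.5 (reusing `clf`, `X_test`, `y_test` from the sketch above; in practice the threshold should be chosen on a separate validation set, not the test set):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

proba = clf.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, proba)

# F1 at every candidate threshold; the small epsilon avoids division by zero.
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = np.argmax(f1[:-1])  # the last precision/recall pair has no threshold

print(f"best threshold: {thresholds[best]:.3f}, F1 there: {f1[best]:.3f}")
y_pred = (proba >= thresholds[best]).astype(int)
```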
I went with the first option and it solved my issue (the F-measure is now satisfactory). But now my question is: which of these methods is preferable? And what are the differences between them?