in Data Science by (17.6k points)

I'm trying to solve a binary classification problem where 80% of the data belongs to class x and 20% of the data belongs to class y. All my models (AdaBoost, Neural Networks and SVC) just predict all data to be part of class x as this is the highest accuracy they can achieve.

My goal is to achieve a higher precision for all entries of class x and I don't care how many entries are falsely classified to be part of class y.

My idea would be to just put entries in class x when the model is super sure about them and put them in class y otherwise.

How would I achieve this? Is there a way to move the threshold so that only very obvious entries are classified as class x?

I'm using Python and sklearn.

Sample Code:

adaboost = AdaBoostClassifier(random_state=1)

adaboost.fit(X_train, y_train)

adaboost_prediction = adaboost.predict(X_test)

confusion_matrix(adaboost_prediction,y_test) outputs:

array([[    0,     0],
       [10845, 51591]])

1 Answer

by (38.4k points)

AdaBoostClassifier can output class probabilities through predict_proba, and you can apply your own threshold to those probabilities instead of calling predict:

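from sklearn.ensemble import AdaBoostClassifier

adaboost = AdaBoostClassifier(random_state=1)
adaboost.fit(X_train, y_train)

# predict_proba (instead of predict) returns one column per class, ordered as in adaboost.classes_
adaboost_probs = adaboost.predict_proba(X_test)

threshold = 0.8  # for example

# Column 1 is the probability of adaboost.classes_[1]; use the column that matches class x
thresholded_adaboost_prediction = adaboost_probs[:, 1] > threshold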
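To make the idea concrete, here is a minimal, self-contained sketch of the same thresholding approach. It is only an illustration: the data is synthetic (generated with make_classification as a stand-in for your roughly 80/20 split), class x is assumed to be label 1 and class y label 0, the helper predict_with_threshold is a hypothetical name and not part of sklearn, and the threshold of 0.55 is arbitrary. AdaBoost's probabilities can sit close to 0.5, so it is worth looking at their distribution before picking a value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split


def predict_with_threshold(model, X, target_label, other_label, threshold):
    """Hypothetical helper: label a row as target_label only when its
    predicted probability for that class exceeds threshold."""
    # predict_proba columns follow the order of model.classes_
    target_column = list(model.classes_).index(target_label)
    target_probs = model.predict_proba(X)[:, target_column]
    return np.where(target_probs > threshold, target_label, other_label)


# Synthetic stand-in for the real data: about 80% label 1 ("x"), 20% label 0 ("y")
X, y = make_classification(n_samples=2000, weights=[0.2, 0.8], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1
)

adaboost = AdaBoostClassifier(random_state=1)
adaboost.fit(X_train, y_train)

# Default predictions versus predictions that demand more confidence for class x
default_prediction = adaboost.predict(X_test)
strict_prediction = predict_with_threshold(adaboost, X_test, 1, 0, threshold=0.55)

print("precision for x (default):",
      precision_score(y_test, default_prediction, pos_label=1, zero_division=0))
print("precision for x (strict): ",
      precision_score(y_test, strict_prediction, pos_label=1, zero_division=0))
```

Raising the threshold can only shrink the set of rows labelled x, so you trade recall on x for (usually) higher precision. If you would rather pick the threshold from the data than guess it, sklearn.metrics.precision_recall_curve returns the precision reached at each candidate threshold.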
