
I'm trying to solve a binary classification problem where 80% of the data belongs to class x and 20% belongs to class y. All my models (AdaBoost, neural networks and SVC) simply predict class x for every entry, because that already gives them the highest accuracy they can reach (about 80%).

My goal is to achieve higher precision for class x, and I don't care how many entries are wrongly classified as class y.
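In terms of counts, the precision being asked about can be written as follows, where $TP_x$ is the number of class x entries predicted as x and $FP_x$ the number of class y entries predicted as x:

```latex
\text{precision}_x = \frac{TP_x}{TP_x + FP_x}
```

Class x entries that end up predicted as class y (false negatives for x) do not appear in this formula, which is why they can be ignored here. Making the model stricter about class x usually lowers $FP_x$, at the cost of labelling fewer entries as x.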

My idea would be to just put entries in class x when the model is super sure about them and put them in class y otherwise.

How would I achieve this? Is there a way to move the threshold so that only very obvious entries are classified as class x?

I'm using Python and scikit-learn (sklearn).

Sample Code:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix

adaboost = AdaBoostClassifier(random_state=1)
adaboost.fit(X_train, y_train)
adaboost_prediction = adaboost.predict(X_test)

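# sklearn's signature is confusion_matrix(y_true, y_pred); with the arguments in this
# order, rows are predicted labels and columns are true labels.
confusion_matrix(adaboost_prediction, y_test)

outputs:

array([[    0,     0],
       [10845, 51591]])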
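As a quick sanity check, the snippet below reads the precision for class x straight off this matrix. Because the call passed the predictions first, rows are predicted labels and columns are true labels. It assumes class x is the second label (index 1), which the counts suggest (about 83% of entries fall in that column) but the question does not state. Since every entry is predicted as class x, precision for x simply equals class x's share of the test set, which is the baseline any stricter threshold has to beat.

```python
import numpy as np

# Confusion matrix from the question, produced by
# confusion_matrix(adaboost_prediction, y_test): rows = predicted, columns = true.
cm = np.array([[0, 0],
               [10845, 51591]])

# Assumption: class x is the second label (index 1).
X_IDX = 1
predicted_as_x = cm[X_IDX].sum()   # every entry the model labelled class x
correct_x = cm[X_IDX, X_IDX]       # of those, how many really are class x
precision_x = correct_x / predicted_as_x

# Fraction of test entries whose true label is class x
share_of_x = cm[:, X_IDX].sum() / cm.sum()

print(f"precision for class x: {precision_x:.4f}")
print(f"share of class x in the test set: {share_of_x:.4f}")
```

Both lines print 0.8263, so "predict x for everything" already gives about 82.6% precision for x; a useful threshold has to push precision above that.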

1 Answer


AdaBoostClassifier can output class probabilities with predict_proba (instead of hard labels with predict), and you can then apply your own threshold to the probability of class x:

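import numpy as np
from sklearn.ensemble import AdaBoostClassifier

adaboost = AdaBoostClassifier(random_state=1)
adaboost.fit(X_train, y_train)

# predict_proba instead of predict: one column per class, ordered as in adaboost.classes_
adaboost_probs = adaboost.predict_proba(X_test)

x_index = 1      # column of class x in adaboost.classes_ (assumed; adjust to your labels)
threshold = 0.8  # for example; a higher value makes class x predictions stricter

# Predict class x only when its probability exceeds the threshold, otherwise class y
is_x = adaboost_probs[:, x_index] > threshold
thresholded_adaboost_prediction = np.where(is_x, adaboost.classes_[x_index], adaboost.classes_[1 - x_index])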
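A fixed cutoff such as 0.8 is only a guess, and AdaBoost's predict_proba values can be squeezed toward 0.5, so a cutoff like that may label nothing as class x. The sketch below is a self-contained variant that instead picks the threshold from a precision target, using sklearn.metrics.precision_recall_curve on a held-out validation split. Assumptions: make_classification stands in for the question's data (label 1 plays class x at about 80%, label 0 plays class y), and the helper names predict_with_threshold and threshold_for_precision and the 0.95 target are hypothetical, chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import precision_recall_curve, precision_score
from sklearn.model_selection import train_test_split


def predict_with_threshold(p_x, threshold):
    """Label an entry class x (1) only when P(class x) >= threshold, else class y (0)."""
    return np.where(p_x >= threshold, 1, 0)


def threshold_for_precision(y_true, p_x, target_precision, pos_label=1):
    """Smallest threshold whose precision for class x reaches the target, or None."""
    precision, _, thresholds = precision_recall_curve(y_true, p_x, pos_label=pos_label)
    # precision[i] belongs to thresholds[i] (increasing); the last precision has no threshold.
    ok = np.flatnonzero(precision[:-1] >= target_precision)
    return thresholds[ok[0]] if ok.size else None


# Synthetic stand-in for the question's data (assumption): label 1 ~ class x, label 0 ~ class y.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.2, 0.8],
                           flip_y=0.05, random_state=1)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.25,
                                                  stratify=y, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25,
                                                  stratify=y_rest, random_state=1)

adaboost = AdaBoostClassifier(random_state=1)
adaboost.fit(X_train, y_train)
x_index = list(adaboost.classes_).index(1)  # predict_proba column for class x

p_val = adaboost.predict_proba(X_val)[:, x_index]
p_test = adaboost.predict_proba(X_test)[:, x_index]
print(f"P(class x) range on validation: {p_val.min():.3f} to {p_val.max():.3f}")

# Choose the threshold on validation data, then report precision on the untouched test set.
chosen = threshold_for_precision(y_val, p_val, target_precision=0.95)
if chosen is None:
    print("Target precision not reachable on the validation split; lower the target.")
else:
    y_pred = predict_with_threshold(p_test, chosen)
    test_precision = precision_score(y_test, y_pred, pos_label=1, zero_division=0)
    print(f"threshold={chosen:.3f}, entries labelled x={y_pred.sum()}, "
          f"test precision for x={test_precision:.3f}")
```

Choosing the threshold on a validation split rather than on X_test keeps the reported test precision honest. Raising target_precision trades recall for precision: fewer entries get labelled class x, which is exactly the trade-off the question asks for. predict_with_threshold uses >= so that it matches how precision_recall_curve counts an entry as positive at a given threshold.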
