SMOTETomek - how to set ratio as dictionary for fixed balance

Question

asked Jul 15, 2019 in Data Science by sourav (17.6k points)

I've tried to use this technique to correct very imbalanced classes.

My data set has the following class counts, for example:

In [123]:
data['CON_CHURN_TOTAL'].value_counts()
Out[123]:
0 100
1 10
Name: CON_CHURN_TOTAL, dtype: int64

I want to use SMOTETomek to under-sample class 0 and over-sample class 1 to reach an 80:20 ratio. However, I cannot find the right way to specify the dictionary. In the full code, the 80:20 counts will of course be calculated from the number of rows.

When I try:

from imblearn.combine import SMOTETomek
smt = SMOTETomek(ratio={1:20, 0:80})

I get this error:

ValueError: With over-sampling methods, the number of samples in a class should be greater or equal to the original number of samples. Originally, there is 100 samples and 80 samples are asked.

But this method is supposed to do both under-sampling and over-sampling at the same time.

Unfortunately, the documentation is currently unavailable (404 error).

1 Answer

Shlok Pandey · Answer 1 · 2019-07-20T10:18:51+0000

If you want to under-sample one class as well as over-sample the other, you can chain two samplers in a pipeline: an over-sampler followed by an under-sampler.

Refer to the code below:

from sklearn.datasets import load_breast_cancer
import pandas as pd
from imblearn.pipeline import make_pipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss

data = load_breast_cancer()
X = pd.DataFrame(data=data.data, columns=data.feature_names)

# Target number of samples per class after resampling
count_class_0 = 300
count_class_1 = 300

# SMOTE over-samples the smaller class 0 up to its target;
# NearMiss then under-samples the larger class 1 down to its target.
pipe = make_pipeline(
    SMOTE(sampling_strategy={0: count_class_0}),
    NearMiss(sampling_strategy={1: count_class_1}),
)
X_smt, y_smt = pipe.fit_resample(X, data.target)

