I've tried to use SMOTETomek to correct very imbalanced classes.
My data set has class counts like these, for example:
In [123]:
data['CON_CHURN_TOTAL'].value_counts()
Out[123]:
0 100
1 10
Name: CON_CHURN_TOTAL, dtype: int64
I wanted to use SMOTETomek to undersample class 0 and oversample class 1 to reach a ratio of 80:20. However, I cannot find the correct way to specify the dictionary. Of course, in the full code the 80:20 ratio will be calculated from the number of rows.
When I try:
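To make the intended calculation concrete, the target dictionary could be derived from the labels roughly like this. It is only a minimal sketch: `target_counts` and its `total` and `majority_pct` parameters are hypothetical names of my own, not part of imblearn, and it assumes exactly two classes and a desired total number of rows chosen by the caller.

```python
from collections import Counter


def target_counts(labels, total, majority_pct=80):
    """Return {class_label: desired_count} for a two-class label sequence.

    Hypothetical helper: `total` is the desired number of rows after
    resampling, `majority_pct` the share (in percent) given to the
    currently larger class.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # most_common sorts by frequency, so the larger class comes first
    (majority, _), (minority, _) = counts.most_common(2)
    n_major = total * majority_pct // 100
    return {majority: n_major, minority: total - n_major}


# Same class counts as in the value_counts() output above
labels = [0] * 100 + [1] * 10
print(target_counts(labels, total=100))  # {0: 80, 1: 20}
```

The resulting dictionary is the one I would like to pass to the sampler.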
from imblearn.combine import SMOTETomek
smt = SMOTETomek(ratio={1:20, 0:80})
I get this error:
ValueError: With over-sampling methods, the number of samples in a class should be greater or equal to the original number of samples. Originally, there is 100 samples and 80 samples are asked.
But this method is supposed to handle both undersampling and oversampling at the same time.
Unfortunately, the documentation is unavailable right now because the page returns a 404 error.