I have incorporated the InfogainLossLayer as suggested by Shai. I've also added another custom layer that builds the infogain matrix H based on the imbalance in the current batch.
Currently, the matrix is configured as follows:
H(i, j) = 0 if i != j
H(i, i) = 1 - f(i), where f(i) is the frequency of class i in the batch
I'm planning on experimenting with different configurations for the matrix in the future.
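For reference, here is a minimal numpy sketch of how such a per-batch H could be computed (the function name and interface are my own, not part of Caffe; in an actual Caffe Python layer this would go in the layer's forward/reshape logic):

```python
import numpy as np

def build_infogain_matrix(labels, num_classes):
    """Build a per-batch infogain matrix H with
    H[i, i] = 1 - f(i), where f(i) is the frequency of
    class i in the batch, and H[i, j] = 0 for i != j."""
    counts = np.bincount(labels, minlength=num_classes)
    freqs = counts / float(len(labels))
    return np.diag(1.0 - freqs)

# Example: a 10:1 imbalanced batch of 11 samples
labels = np.array([0] * 10 + [1])
H = build_infogain_matrix(labels, num_classes=2)
# Majority class gets a small diagonal weight (1/11),
# minority class a large one (10/11).
```

The resulting H then has to be handed to the InfogainLoss layer; depending on your Caffe version this can be done via a third bottom blob or a stored matrix file, so check which mechanism your build supports.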
I have tested this on a 10:1 imbalance. The results show that the network is now learning something useful (numbers after 30 epochs):
Accuracy: ~70% (down from ~97%);
Precision: ~20% (up from 0%);
Recall: ~60% (up from 0%).
These numbers were reached at around 20 epochs and didn't change significantly after that.
!! The results stated above are merely a proof of concept; they were obtained by training a simple network on a 10:1 imbalanced dataset. !!