Combining probabilities/scores arbitrarily is very problematic, because the performance of your different classifiers can differ (for example, an SVM with 2 different kernels + a random forest + another classifier trained on a different training set).
One possible way to "weigh" the various classifiers might be to use their Jaccard score as a "weight". (But be warned: as far as I know, the various scores are not "all created equal". A gradient boosting classifier I have in my ensemble gives all its scores as 0.97, 0.98, 1.00 or 0.41/0, i.e. it's very overconfident.) For now, the following example simply averages the predicted probabilities of all classifiers.
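import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class EnsembleClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, classifiers=None):
        self.classifiers = classifiers

    def fit(self, X, y):
        # Fit every base classifier on the same training data.
        for classifier in self.classifiers:
            classifier.fit(X, y)
        return self

    def predict_proba(self, X):
        # Collect each classifier's class probabilities, then average them.
        self.predictions_ = [classifier.predict_proba(X) for classifier in self.classifiers]
        return np.mean(self.predictions_, axis=0)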
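If you want to try the Jaccard-score weighting mentioned earlier instead of a plain mean, a minimal sketch could look like the one below. The toy dataset, the two base classifiers and the train/validation split are my own assumptions for illustration, not part of the original ensemble, and the caveat about overconfident classifiers still applies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split

# Toy binary problem (assumption: stands in for your real data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Illustrative base classifiers; swap in your own.
classifiers = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
for clf in classifiers:
    clf.fit(X_train, y_train)

# One weight per classifier: its Jaccard score on held-out data.
weights = np.array(
    [jaccard_score(y_val, clf.predict(X_val)) for clf in classifiers]
)

# Stack to shape (n_classifiers, n_samples, n_classes), then take the
# weighted average over the classifier axis. np.average divides by the
# sum of the weights, so each row still sums to 1.
# X_val is reused here only to keep the sketch short; in practice you
# would apply the weights to new, unseen data.
probas = np.stack([clf.predict_proba(X_val) for clf in classifiers])
weighted_proba = np.average(probas, axis=0, weights=weights)
```

Computing the weights on held-out data rather than on the training set matters, since a classifier that overfits would otherwise get an inflated weight.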
Also have a look at sklearn.ensemble.VotingClassifier, which implements this kind of ensemble directly.
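As a rough sketch of how VotingClassifier covers the same ground: with voting="soft" it averages the predict_proba outputs of its estimators, and the optional weights argument sets each one's share. The estimators, the weights [1, 2, 1] and the toy data below are arbitrary choices of mine for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy binary problem (assumption: stands in for your real data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        # SVC needs probability=True to expose predict_proba.
        ("svc", SVC(probability=True, random_state=0)),
    ],
    voting="soft",      # average predicted probabilities, not hard votes
    weights=[1, 2, 1],  # illustrative weights, not tuned
)
voting.fit(X, y)

# Weighted average of the three classifiers' class probabilities.
proba = voting.predict_proba(X[:5])
```

The weights could just as well come from a validation metric such as the Jaccard score rather than being set by hand.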