
I'm using sklearn.pipeline.Pipeline to chain feature extractors and a classifier. Is there a way to combine multiple feature extraction classes (for example, the ones from sklearn.feature_extraction.text) in parallel and join their output?

My code right now looks as follows:

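from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier())])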

This produces a single linear chain:

vect -> tfidf -> clf

I want to be able to specify a pipeline that looks as follows:

vect1 -> tfidf1 \
                 -> clf
vect2 -> tfidf2 /
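One way to express this layout, assuming scikit-learn's FeatureUnion from sklearn.pipeline, is to wrap each vect/tfidf chain in its own Pipeline and place both chains inside a FeatureUnion that feeds the classifier. This is a minimal sketch: the word-level vs. character-level n-gram settings and the toy corpus are hypothetical choices, made only so the two branches produce different features.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline

# Branch 1: word unigrams and bigrams (hypothetical settings)
branch1 = Pipeline([
    ('vect1', CountVectorizer(analyzer='word', ngram_range=(1, 2))),
    ('tfidf1', TfidfTransformer()),
])

# Branch 2: character 2- and 3-grams (hypothetical settings)
branch2 = Pipeline([
    ('vect2', CountVectorizer(analyzer='char', ngram_range=(2, 3))),
    ('tfidf2', TfidfTransformer()),
])

pipeline = Pipeline([
    # FeatureUnion fits every branch on the same raw input and
    # concatenates their output matrices column-wise.
    ('features', FeatureUnion([('branch1', branch1), ('branch2', branch2)])),
    ('clf', SGDClassifier(random_state=0)),
])

# Hypothetical toy corpus, only to exercise the API
docs = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]

pipeline.fit(docs, labels)
predictions = pipeline.predict(docs)
```

Because each branch is an ordinary Pipeline, parameters stay addressable with the usual double-underscore syntax, e.g. `features__branch1__vect1__ngram_range` in a grid search.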

1 Answer


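This has been implemented in scikit-learn under the name FeatureUnion (sklearn.pipeline.FeatureUnion), which had recently landed in the master branch at the time of writing. It applies several transformers to the same input in parallel and concatenates their outputs column-wise: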

http://scikit-learn.org/dev/modules/pipeline.html#feature-union
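To illustrate what "join their output" means, the sketch below checks that a FeatureUnion's output matches fitting each transformer separately and stacking the columns with scipy.sparse.hstack. The corpus and the choice of CountVectorizer plus TfidfVectorizer as the two parallel transformers are hypothetical, picked only for illustration.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Hypothetical corpus
docs = ["the cat sat", "the dog sat", "a cat and a dog"]

union = FeatureUnion([
    ('counts', CountVectorizer()),
    ('tfidf', TfidfVectorizer()),
])
X_union = union.fit_transform(docs)

# Fit each transformer on its own, then place their columns side by side
X_manual = sp.hstack([
    CountVectorizer().fit_transform(docs),
    TfidfVectorizer().fit_transform(docs),
]).tocsr()

same = np.allclose(X_union.toarray(), X_manual.toarray())
```

The resulting matrix has as many columns as all transformers combined, in the order they are listed in the FeatureUnion.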

...