
I would like to do supervised learning.

So far I know how to do supervised learning using all features.

However, I would also like to run an experiment with only the K best features.

I read the documentation and found that scikit-learn has a SelectKBest class.

Unfortunately, I am not sure how to create a new dataframe after finding those best features.

Let's assume I would like to run the experiment with the 5 best features:

import pandas as pd

from sklearn.feature_selection import SelectKBest, f_classif

select_k_best_classifier = SelectKBest(score_func=f_classif, k=5)

fit_transofrmed_features = select_k_best_classifier.fit_transform(features_dataframe, targeted_class)

Now if I add the next line:

dataframe = pd.DataFrame(fit_transofrmed_features)

I get a new dataframe without feature names (the columns are just labelled 0 to 4).

I should replace it with:

dataframe = pd.DataFrame(fit_transofrmed_features, columns=features_names)

My question is: how do I create the features_names list?

I know that I should use select_k_best_classifier.get_support(), which returns an array of boolean values.

Each True value in the array marks the position of a selected column.

How should I combine this boolean array with the list of all feature names, which I can get via:

feature_names = list(features_dataframe.columns.values)

1 Answer


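Use the following code. Note that select_k_best_classifier must be the fitted SelectKBest object itself, not the array returned by fit_transform, because get_support() is a method of the selector:

mask = select_k_best_classifier.get_support()  # boolean array, True for each selected feature

new_features = []  # names of your K best features

for selected, feature in zip(mask, feature_names):
    if selected:
        new_features.append(feature)

After that, pass these names as the column labels of the new dataframe:

dataframe = pd.DataFrame(fit_transofrmed_features, columns=new_features)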
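For anyone who wants to try this end to end, here is a minimal sketch of the same idea. It is not the asker's actual data: the dataset is generated with make_classification, the column names f0 to f9 are made up, and the names selector and selected_values are chosen here for illustration. It also indexes the column labels directly with the boolean mask, which should give the same list as the loop above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the question's data (hypothetical column names f0..f9).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)
features_dataframe = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])
targeted_class = pd.Series(y, name="target")

# Keep the selector object so that get_support() can be called after fitting.
selector = SelectKBest(score_func=f_classif, k=5)
selected_values = selector.fit_transform(features_dataframe, targeted_class)

# Boolean mask with one entry per original column; True marks a kept feature.
mask = selector.get_support()

# Index the column labels with the mask instead of looping over them.
new_features = features_dataframe.columns[mask].tolist()

# Rebuild a dataframe with the original names and the original row index.
dataframe = pd.DataFrame(selected_values, columns=new_features, index=features_dataframe.index)
print(dataframe.head())
```

Passing index=features_dataframe.index is optional; it just keeps the row labels aligned with the original dataframe in case they are not the default 0..n-1 range.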
