I am new to statistics. I am trying to select the best features for classification on my data set, and I chose to do so by running SelectKBest from scikit-learn.
Here is my code:
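import pandas as pd
import sklearn.feature_selection as fs

# Keep the 10 highest-scoring features (X is a DataFrame, y holds the class labels)
kb = fs.SelectKBest(k=10)
kb.fit(X, y)

# Names and scores of the selected features only
names = X.columns.values[kb.get_support()]
scores = kb.scores_[kb.get_support()]
names_scores = list(zip(names, scores))
ns_df = pd.DataFrame(data=names_scores, columns=['Feat_names', 'F_Scores'])

# Sort by score (descending), breaking ties by feature name
ns_df_sorted = ns_df.sort_values(['F_Scores', 'Feat_names'], ascending=[False, True])
print(ns_df_sorted)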
This gives output like this:
  Feat_names   F_Scores
4     go_out  29.870218
8     fun1_2  27.374212
6     fun1_1  26.470766
3       date  25.035227
7    shar1_1  17.629153
2    imprace  11.331197
0      order  11.290014
5    sinc1_1   8.309805
9    shar1_2   5.009775
1   field_cd   4.515538
I am not sure what the F score here signifies and what I can interpret from it.
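For context on where these numbers come from: when no score_func is passed, SelectKBest uses f_classif, which computes a one-way ANOVA F-statistic for each feature against the class labels (roughly, between-class variance of the feature divided by within-class variance). The sketch below checks this on synthetic data. The make_classification data, the feat_0 ... feat_14 column names, and the extra p_values column are assumptions for illustration and are not taken from my data set.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data (an assumption; not the data set from the question).
X_arr, y = make_classification(
    n_samples=200, n_features=15, n_informative=5, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(X_arr.shape[1])])

# With no score_func given, SelectKBest falls back to f_classif.
kb = SelectKBest(k=10)
kb.fit(X, y)

# Compute the ANOVA F-statistics and p-values directly for comparison.
F, p = f_classif(X, y)
scores_match = np.allclose(kb.scores_, F)
pvalues_match = np.allclose(kb.pvalues_, p)
print("scores_ equal f_classif F-values:", scores_match)
print("pvalues_ equal f_classif p-values:", pvalues_match)

# Same table as in the question, plus the p-value for each kept feature.
mask = kb.get_support()
summary = pd.DataFrame(
    {
        "Feat_names": X.columns[mask],
        "F_Scores": kb.scores_[mask],
        "p_values": kb.pvalues_[mask],
    }
).sort_values("F_Scores", ascending=False)
print(summary)
```

If that reading is right, a larger F score means the feature's mean differs more between classes relative to its spread within each class, and kb.pvalues_ gives the matching p-value for each feature. Each feature is scored on its own, so the ranking says nothing about redundancy or interactions between features.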