0 votes
in Data Science by (17.6k points)

I am new to statistics. I am trying to select the best features for classification on my data set, and I chose to do so by running SelectKBest from scikit-learn.

Here is my code:

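 import pandas as pd
 import sklearn.feature_selection as fs

 # X is a DataFrame of features, y holds the class labels
 kb = fs.SelectKBest(k=10).fit(X, y)  # default score_func is f_classif

 # Names and F-scores of the 10 selected features
 names = X.columns.values[kb.get_support()]
 scores = kb.scores_[kb.get_support()]
 names_scores = list(zip(names, scores))

 ns_df = pd.DataFrame(data=names_scores, columns=['Feat_names', 'F_Scores'])
 ns_df_sorted = ns_df.sort_values(['F_Scores', 'Feat_names'], ascending=[False, True])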


This gives output like this:

  Feat_names   F_Scores
4     go_out  29.870218
8     fun1_2  27.374212
6     fun1_1  26.470766
3       date  25.035227
7    shar1_1  17.629153
2    imprace  11.331197
0      order  11.290014
5    sinc1_1   8.309805
9    shar1_2   5.009775
1   field_cd   4.515538

I am not sure what the F score here signifies and what I can interpret from it.

1 Answer

0 votes
by (41.4k points)

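An F-test is carried out on each feature separately, and the F-score tells you how informative that feature is for separating the classes. By default, SelectKBest uses f_classif as its score function, which computes the ANOVA F-value: the ratio of the explained (between-class) variance to the unexplained (within-class) variance of that feature. A higher F-score means the class means of the feature differ more relative to the spread within each class. In your output, go_out is therefore the most discriminative of the selected features and field_cd the least. Because each feature is scored on its own, the F-score does not capture interactions between features.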
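
For reference, the one-way ANOVA F-statistic for a single feature can be written as below. The notation is introduced here for illustration and is not from the question: k is the number of classes, n_j the number of samples in class j, N the total number of samples, \bar{x}_j the mean of the feature in class j, and \bar{x} its overall mean.

```latex
F = \frac{\displaystyle \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}\right)^2 \big/ (k - 1)}
         {\displaystyle \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2 \big/ (N - k)}
```

The numerator is the between-class (explained) variance and the denominator is the within-class (unexplained) variance.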

Refer to the method documentation for more details.
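
To make the ratio concrete, here is a minimal sketch on synthetic data (the data, the helper `manual_f`, and the variable names are assumptions made for illustration, not the asker's data set). It computes the between-class over within-class variance ratio by hand and compares it with what `f_classif` and `SelectKBest` report.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Synthetic two-class data (illustrative assumption, not the asker's data).
n = 200
y = np.repeat([0, 1], n // 2)
informative = rng.normal(loc=y * 2.0, scale=1.0)  # class means differ by 2
noise = rng.normal(loc=0.0, scale=1.0, size=n)    # same distribution in both classes
X = np.column_stack([informative, noise])


def manual_f(x, y):
    """One-way ANOVA F-statistic for one feature (hypothetical helper)."""
    classes = np.unique(y)
    grand_mean = x.mean()
    # Explained variation: spread of the class means around the overall mean.
    ss_between = sum(
        x[y == c].size * (x[y == c].mean() - grand_mean) ** 2 for c in classes
    )
    # Unexplained variation: spread of samples around their own class mean.
    ss_within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    df_between = classes.size - 1
    df_within = x.size - classes.size
    return (ss_between / df_between) / (ss_within / df_within)


f_scores, p_values = f_classif(X, y)
manual_scores = np.array([manual_f(X[:, j], y) for j in range(X.shape[1])])

# SelectKBest uses f_classif as its default score function.
kb = SelectKBest(k=1).fit(X, y)

print("f_classif F-scores:", f_scores)
print("manual F-scores:   ", manual_scores)
print("p-values:          ", p_values)
print("selected mask:     ", kb.get_support())
```

The hand-computed values should agree with `f_classif`, and the feature whose class means differ should receive a much larger F-score than the pure-noise feature. The p-values returned alongside the scores (also available as `kb.pvalues_`) can help judge whether a given F-score is large enough to matter.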

