0 votes
2 views
in Data Science by (17.6k points)

I am new to statistics. I am trying to select the best features for classification on my data set, and I chose to do so by running SelectKBest from scikit-learn.

Here is my code:

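 import pandas as pd
 import sklearn.feature_selection as fs

 # X is a pandas DataFrame of features, y holds the class labels
 # f_classif (ANOVA F-test) is the default score function, made explicit here
 kb = fs.SelectKBest(score_func=fs.f_classif, k=10)
 kb.fit(X, y)

 # Names and F-scores of the 10 selected features
 mask = kb.get_support()
 names = X.columns.values[mask]
 scores = kb.scores_[mask]

 ns_df = pd.DataFrame(list(zip(names, scores)), columns=['Feat_names', 'F_Scores'])
 ns_df_sorted = ns_df.sort_values(['F_Scores', 'Feat_names'], ascending=[False, True])
 print(ns_df_sorted)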

This gives an output like this:

  Feat_names   F_Scores
4     go_out  29.870218
8     fun1_2  27.374212
6     fun1_1  26.470766
3       date  25.035227
7    shar1_1  17.629153
2    imprace  11.331197
0      order  11.290014
5    sinc1_1   8.309805
9    shar1_2   5.009775
1   field_cd   4.515538

I am not sure what the F score here signifies and what I can interpret from it.

1 Answer

0 votes
by (41.4k points)

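An F-test is carried out on each feature, and the resulting F-score tells you how informative that feature is for your dataset. The F-score is the ratio of explained to unexplained variance: by default, SelectKBest scores features with f_classif (a one-way ANOVA F-test), which compares how much a feature's mean varies between classes with how much the feature varies within each class. A higher score means the feature separates the classes better on its own, so in your output go_out is the most discriminative of the 10 selected features and field_cd the least.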

You can refer to the method documentation to learn more.
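To make the "explained versus unexplained variance" ratio concrete, here is a minimal sketch that computes the one-way ANOVA F-statistic by hand and compares it with scikit-learn's f_classif. The helper name anova_f and the synthetic two-feature data are assumptions made purely for illustration; they are not part of the original question or its dataset.

```python
import numpy as np
from sklearn.feature_selection import f_classif


def anova_f(feature, labels):
    """One-way ANOVA F-statistic for a single feature (hypothetical helper)."""
    classes = np.unique(labels)
    grand_mean = feature.mean()
    groups = [feature[labels == c] for c in classes]
    # Explained (between-class) sum of squares
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Unexplained (within-class) sum of squares
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(classes) - 1
    df_within = len(feature) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)


# Synthetic data, assumed for illustration only
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
informative = rng.normal(loc=y * 2.0, scale=1.0)  # mean shifts with the class
noise = rng.normal(size=100)                      # unrelated to the class
X = np.column_stack([informative, noise])

manual_scores = np.array([anova_f(X[:, j], y) for j in range(X.shape[1])])
sklearn_scores, p_values = f_classif(X, y)

print(manual_scores)   # hand-computed F-scores
print(sklearn_scores)  # should match the values above
```

The feature whose mean changes with the class should get a much larger F-score than the pure-noise feature. Note that each feature is scored on its own, so the ranking says nothing about redundancy or interactions between features.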

 
