Intellipaat
in Data Science by (17.6k points)

I am new to statistics. I am trying to select the best features for classification on my dataset, and I chose to do so with SelectKBest from scikit-learn.

Here is my code:

 import pandas as pd
 import sklearn.feature_selection as fs

 # X: pandas DataFrame of features, y: class labels.
 # SelectKBest scores features with f_classif (ANOVA F-test) by default.
 kb = fs.SelectKBest(k=10)
 kb.fit(X, y)

 mask = kb.get_support()  # boolean mask of the 10 selected columns
 names = X.columns.values[mask]
 scores = kb.scores_[mask]

 ns_df = pd.DataFrame(list(zip(names, scores)), columns=['Feat_names', 'F_Scores'])
 ns_df_sorted = ns_df.sort_values(['F_Scores', 'Feat_names'], ascending=[False, True])
 print(ns_df_sorted)

This gives the following output:

  Feat_names   F_Scores
4     go_out  29.870218
8     fun1_2  27.374212
6     fun1_1  26.470766
3       date  25.035227
7    shar1_1  17.629153
2    imprace  11.331197
0      order  11.290014
5    sinc1_1   8.309805
9    shar1_2   5.009775
1   field_cd   4.515538

I am not sure what the F-score signifies here or what I can interpret from it.

1 Answer

by (41.4k points)

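By default, SelectKBest scores features with f_classif, which runs a one-way ANOVA F-test on each feature separately. For a given feature, the F-score is the ratio of the variance explained by the class labels (between-class variance) to the unexplained variance (within-class variance):

F = [SS_between / (k - 1)] / [SS_within / (n - k)]

where k is the number of classes and n is the number of samples. A large F-score means the feature's class means lie far apart compared with the spread inside each class, so the feature is informative for separating the classes. In your output, go_out is the most discriminative of the ten selected features and field_cd the least. Treat the scores as a ranking signal only: each feature is tested on its own, and the test detects differences in class means, so redundancy between features and interaction effects are not captured. The matching p-values are stored in kb.pvalues_.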
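To make the ratio concrete, here is a minimal NumPy sketch that computes the one-way ANOVA F-score for each feature by hand. The function name anova_f_scores and the toy data are hypothetical, for illustration only. The values should agree with the F-scores from sklearn.feature_selection.f_classif up to floating-point rounding, but this is a sketch, not scikit-learn's actual implementation.

```python
import numpy as np


def anova_f_scores(X, y):
    """Hypothetical helper: one-way ANOVA F-score of each column of X vs. labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_features = X.shape
    n_classes = len(classes)

    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(n_features)  # variation explained by class membership
    ss_within = np.zeros(n_features)   # variation left unexplained inside each class
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += len(Xc) * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)

    # Divide each sum of squares by its degrees of freedom, then take the ratio
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    return ms_between / ms_within


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not
X_toy = np.array([[1, 5], [2, 7], [3, 6], [7, 6], [8, 5], [9, 7]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
print(anova_f_scores(X_toy, y_toy))
```

Feature 0 has class means 2 and 8 with little spread inside each class, giving F = 54, while feature 1 has identical class means, giving F = 0. SelectKBest would therefore rank feature 0 first.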

You can refer to the scikit-learn documentation for SelectKBest and f_classif to learn more.
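As a complement to the documentation, the following sketch runs SelectKBest with f_classif on scikit-learn's bundled Iris dataset (chosen only because it ships with the library; it is not the asker's data) and prints each feature's F-score next to its p-value from kb.pvalues_. It assumes a scikit-learn version that supports load_iris(return_X_y=True).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# f_classif is SelectKBest's default score_func; passed explicitly for clarity
kb = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# Print features from highest to lowest F-score, with their p-values
for i in np.argsort(kb.scores_)[::-1]:
    print(f"feature {i}: F = {kb.scores_[i]:.1f}, p = {kb.pvalues_[i]:.2e}")

print("selected:", kb.get_support(indices=True))
```

On Iris the two petal measurements (columns 2 and 3) receive by far the largest F-scores, so they are the ones kept with k=2. A tiny p-value only says the class means differ; it does not say how much the feature will help a particular classifier.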
