
I have a data set and I am trying to get the feature importances using SelectKBest with chi2, but SelectKBest returns every feature score as nan.

The data file and code file are present at this link

import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Path to the data file
file_path = r"D:\Data_Sets\Mobile_Prices\data.csv"

# Read the data file into the south_data data frame
south_data = pd.read_csv(file_path)

# Print the number of data points and the number of columns of south_data
print("The number of data points in the data :", south_data.shape[0])
print("The number of columns in the data :", south_data.shape[1])

# Print the head of the south_data data frame
print(south_data.head())

# Check for nulls
print(south_data.isnull().sum())

# Separate x and y
x = south_data.drop("tss", axis=1)
y = south_data["tss"]

# Find the scores of the features
bestfit = SelectKBest(score_func=chi2, k=5)
features = bestfit.fit(x, y)
x_new = features.transform(x)
print(features.scores_)

# The output of features.scores_ is displayed as
# array([nan, nan, nan, nan, nan, nan, nan, nan, nan])

1 Answer


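The reason for the nan values in scores_ is most likely that your target variable contains only one distinct value (for example, every entry is 1). chi2 compares how each feature is distributed across the target classes, so with a single class the score is undefined. You should verify that your target variable has at least two classes.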

For example:

>>> from sklearn.datasets import load_digits
>>> import numpy as np
>>> from sklearn.feature_selection import SelectKBest, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> feature_selector = SelectKBest(chi2, k=20)
>>> X_new = feature_selector.fit_transform(X, np.ones(len(X)))
>>> feature_selector.scores_
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
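
To check whether this is what is happening in your case, you can count the distinct values in the target before scoring. The following is only a minimal sketch on the same digits data: count_classes is a hypothetical helper name introduced for illustration, and chi2 is called directly instead of going through SelectKBest.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import chi2


def count_classes(y):
    """Return the number of distinct values in the target (hypothetical helper)."""
    return len(np.unique(y))


X, y = load_digits(return_X_y=True)
constant_y = np.ones(len(X))  # degenerate target: a single class

# A usable classification target needs at least two distinct values.
print(count_classes(y))           # the digits data has ten classes
print(count_classes(constant_y))  # only one class

# chi2 returns a (scores, p_values) pair; only the scores are used here.
scores_real, _ = chi2(X, y)
scores_constant, _ = chi2(X, constant_y)

# With a single class every score is nan, matching the output above.
print(np.isnan(scores_constant).all())

# With the real labels the scores are not all nan. Individual scores can
# still be nan, for instance when a feature column contains only zeros.
print(np.isnan(scores_real).all())
```

On the data frame from the question, the equivalent check would be south_data["tss"].nunique(). If it returns 1, the "tss" column cannot be used as a classification target for chi2.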
