
I have a pandas DataFrame with some categorical predictors (i.e. variables) coded as 0 and 1, and some numeric variables. When I fit it with a statsmodels model like:

est = sm.OLS(y, X).fit()

It throws:

Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

I converted all the dtypes of the DataFrame using:

df.convert_objects(convert_numeric=True)

After this, all columns of the DataFrame appear as int32 or int64, but the last line of the output still shows dtype: object, like this:

4516        int32
4523        int32
4525        int32
4531        int32
4533        int32
4542        int32
4562        int32
sex         int64
race        int64
dispstd     int64
age_days    int64
dtype: object

Here 4516, 4523, and so on are variable labels.

Any idea? I need to build a multiple regression model on hundreds of variables. For that, I have concatenated 3 pandas DataFrames to produce the final DataFrame used for model building.

1 Answer


If X is your DataFrame, cast it to float with the .astype method when fitting the model:

est = sm.OLS(y, X.astype(float)).fit()
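To see where the object dtype comes from, it may help to check what NumPy actually receives, as the error message suggests. The sketch below uses a small made-up frame (the column values and the object-typed "sex" column are assumptions for illustration, not the asker's data). It also shows that the trailing "dtype: object" in the printed df.dtypes output describes the dtypes Series itself, not the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "sex" holds numbers stored as generic Python objects.
X = pd.DataFrame({
    "4516": np.array([0, 1, 1, 0], dtype="int32"),
    "sex": pd.Series([0, 1, 0, 1], dtype=object),
    "age_days": [100, 200, 300, 400],
})

# X.dtypes is a Series whose values are dtype objects, so the
# "dtype: object" line at the bottom refers to that Series.
dtypes = X.dtypes
print(dtypes)

# The check recommended by the error message: a single object column
# is enough to make the whole array object-typed.
as_array_dtype = np.asarray(X).dtype

# List the columns that are still stored as object.
object_cols = X.select_dtypes(include=["object"]).columns.tolist()

# Cast everything to float, as suggested in the answer.
X_float = X.astype(float)
float_array_dtype = np.asarray(X_float).dtype
print(as_array_dtype, object_cols, float_array_dtype)
```

If object_cols is empty for X, the same check is worth running on y, since both are converted to NumPy arrays when the model is built. X_float is what would then be passed to sm.OLS.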

Categorical variables should be converted into dummy (indicator) variables before they go into the model; columns already coded as 0 and 1 are already in that form.
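One possible way to build the dummy variables is pd.get_dummies. This is a minimal sketch on a made-up frame: treating "race" as a three-level categorical code is an assumption for illustration, and the values are invented.

```python
import pandas as pd

# Hypothetical frame: "race" is a categorical code with three levels,
# "sex" is already a 0/1 indicator, "age_days" is numeric.
df = pd.DataFrame({
    "race": [1, 2, 3, 2],
    "sex": [0, 1, 0, 1],
    "age_days": [100, 200, 300, 400],
})

# Expand "race" into indicator columns. drop_first=True drops one level
# so the dummies are not perfectly collinear with an intercept.
X = pd.get_dummies(df, columns=["race"], drop_first=True)

# Depending on the pandas version the dummies may be bool or uint8,
# so cast the whole frame to float before fitting.
X = X.astype(float)
print(X.dtypes)
```

The resulting X contains only float columns and can be passed to sm.OLS in place of the original frame.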
