in Machine Learning by (19k points)

I have used sklearn.preprocessing.OneHotEncoder to transform some data, and the output is a scipy.sparse.csr.csr_matrix. How can I merge it back into my original DataFrame along with the other columns?

I tried to use pd.concat, but I get:

TypeError: cannot concatenate a non-NDFrame object

Thanks

1 Answer

by (33.1k points)

Assuming A is your csr_matrix, call its .toarray() method, which returns a dense NumPy ndarray, or its .todense() method, which returns a numpy.matrix. Either result can be passed to the pandas DataFrame constructor.

For example:

df = pd.DataFrame(A.toarray())

# The resulting DataFrame can then be combined with your other columns using pd.concat().

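from scipy.sparse import csr_matrix
import pandas as pd

A = csr_matrix([[1, 0, 2], [0, 3, 0]])
print(A)
#   (0, 0)    1
#   (0, 2)    2
#   (1, 1)    3
print(type(A))
# <class 'scipy.sparse.csr.csr_matrix'>   (exact module path varies by SciPy version)

df = pd.DataFrame(A.todense())
print(df)
#    0  1  2
# 0  1  0  2
# 1  0  3  0

df.info()
# <class 'pandas.core.frame.DataFrame'>   (output format from older pandas)
# RangeIndex: 2 entries, 0 to 1
# Data columns (total 3 columns):
# 0    2 non-null int64
# 1    2 non-null int64
# 2    2 non-null int64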
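Since the question is about putting the encoded columns back next to the other columns, here is a minimal sketch of the pd.concat step. It assumes a small hypothetical DataFrame named original and a hand-built csr_matrix standing in for the OneHotEncoder output; the column names are made up for illustration. The important detail is index=original.index: a DataFrame built from a bare array gets a fresh RangeIndex, and pd.concat(axis=1) aligns rows on the index, so mismatched indexes would produce rows full of NaN.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical original data: one categorical column plus a numeric one,
# with a non-default index to show why index alignment matters.
original = pd.DataFrame(
    {"color": ["red", "blue", "red"], "price": [10.0, 12.5, 8.0]},
    index=[101, 102, 103],
)

# Stand-in for the OneHotEncoder output: one row per row of `original`,
# one column per category (blue, red).
encoded = csr_matrix([[0, 1], [1, 0], [0, 1]])

# Densify and reuse the original index so pd.concat lines rows up
# correctly instead of introducing NaNs.
encoded_df = pd.DataFrame(
    encoded.toarray(),
    index=original.index,
    columns=["color_blue", "color_red"],  # hypothetical column names
)

# Drop the raw categorical column and append the encoded columns.
merged = pd.concat([original.drop(columns="color"), encoded_df], axis=1)
print(merged)
```

With a real encoder, scikit-learn 1.0 and later can supply the column names through the fitted encoder's get_feature_names_out() method instead of typing them by hand.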

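Starting with pandas 0.20, a SparseDataFrame could also be built directly from a scipy.sparse matrix. SparseDataFrame was deprecated in pandas 0.25 and removed in 1.0; in current pandas, the sparse accessor (pd.DataFrame.sparse) provides the equivalent functionality and keeps the data sparse.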
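In pandas 0.25 and later, pd.DataFrame.sparse.from_spmatrix builds a DataFrame with sparse columns directly from a scipy.sparse matrix, so the zeros are never materialized. A minimal sketch, using a small stand-in matrix and made-up column names:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Small stand-in for an encoder's sparse output.
A = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Each column gets a SparseDtype, so only the non-zero values are stored
# (requires pandas >= 0.25).
sdf = pd.DataFrame.sparse.from_spmatrix(A, columns=["a", "b", "c"])
print(sdf.dtypes)

# The sparse accessor can convert back when needed.
dense = sdf.sparse.to_dense()  # ordinary dense DataFrame
coo = sdf.sparse.to_coo()      # scipy.sparse COO matrix
```

Keeping the frame sparse is mainly worthwhile when the encoder produces many columns that are mostly zeros; for a handful of categories, the dense conversion shown earlier is simpler.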

Alternatively, you can skip the conversion back to pandas entirely and pass a sparse matrix straight to scikit-learn, which avoids running out of memory on a large dense array. To do this, convert your remaining (numeric) columns to sparse format by passing a NumPy array to the scipy.sparse.csr_matrix constructor, then combine them with the encoded matrix using scipy.sparse.hstack.
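A minimal sketch of that all-sparse route, assuming a hypothetical frame named original that holds only the numeric columns to keep, and a hand-built csr_matrix standing in for the one-hot output:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack

# Hypothetical original frame: the numeric columns we want to keep.
original = pd.DataFrame({"price": [10.0, 12.5, 8.0], "qty": [1, 4, 2]})

# Stand-in for the one-hot encoded categorical column(s).
encoded = csr_matrix([[0, 1], [1, 0], [0, 1]])

# Convert the remaining columns to CSR by passing a NumPy array
# to the csr_matrix constructor.
numeric = csr_matrix(original.to_numpy(dtype=np.float64))

# Stack the blocks side by side; format="csr" requests CSR output
# explicitly rather than letting SciPy choose the result format.
X = hstack([numeric, encoded], format="csr")
print(X.shape)
```

X can then be passed to scikit-learn estimators that accept sparse input. Note that the row order of both blocks must match, since hstack has no index to align on.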

Hope this answer helps.
