I have a data frame with alphanumeric keys which I want to save as a CSV and read back later. For various reasons I need to explicitly read this key column as strings: some keys are strictly numeric or, even worse, look like 1234E5, which Pandas interprets as a float. This obviously makes the key completely useless.

The problem is that when I specify a string dtype for the data frame, or for any column of it, I just get garbage back. Here is some example code:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(2, 2),
                  index=['1A', '1B'],
                  columns=['A', 'B'])

# savefile is the path of the CSV file to write
df.to_csv(savefile)

The data frame looks like:

           A         B
1A  0.209059  0.275554
1B  0.742666  0.721165

Then I read it like so:

df_read = pd.read_csv(savefile, dtype=str, index_col=0)

and the result is:

   A  B
B  (  <

Is this a problem with my computer, or something I'm doing wrong here, or just a bug?

1 Answer


If you don't know the columns beforehand, use a converter that applies to every column:

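import pandas as pd

class StringConverter(dict):
    """Dict-like object that returns str as the converter for any column."""

    def __contains__(self, item):
        # Claim to hold a converter for every column name or position.
        return True

    def __getitem__(self, item):
        return str

    def get(self, key, default=None):
        # Match dict.get's signature; the converter is always str.
        return str

# file_or_buffer is the path or file-like object to read
pd.read_csv(file_or_buffer, converters=StringConverter())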
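
If you do know which column holds the keys, a simpler route may be to force just that column to str with a dtype mapping and set the index afterwards. The following is only a sketch: the column name "key" and the sample values are made up for illustration, and the in-memory buffer stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample data: "key" holds identifiers that look numeric.
csv_text = "key,A\n1234E5,0.209059\n0012,0.742666\n"

# With default type inference the keys are parsed as floats.
df_default = pd.read_csv(io.StringIO(csv_text))

# Forcing only the key column to str keeps the original text;
# column A is still inferred as float.
df_keys = pd.read_csv(io.StringIO(csv_text), dtype={"key": str})
df_keys = df_keys.set_index("key")

print(df_default["key"].tolist())
print(df_keys.index.tolist())
```

Setting the index after reading, rather than passing index_col, keeps the dtype request and the index step separate, so it is easy to check that the keys survived before they become the index.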
