0 votes
2 views
in Machine Learning by (19k points)

Applying pandas.to_numeric to a dataframe column which contains strings that represent numbers (and possibly other unparsable strings) results in an error message like this:

ValueError                                Traceback (most recent call last)
<ipython-input-66-07383316d7b6> in <module>()
      1 for column in shouldBeNumericColumns:
----> 2     trainData[column] = pandas.to_numeric(trainData[column])

/usr/local/lib/python3.5/site-packages/pandas/tools/util.py in to_numeric(arg, errors)
    113         try:
    114             values = lib.maybe_convert_numeric(values, set(),
--> 115                                                coerce_numeric=coerce_numeric)
    116         except:
    117             if errors == 'raise':

pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:53558)()

pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:53344)()

ValueError: Unable to parse string

Wouldn't it be helpful to see which value failed to parse?

1 Answer

0 votes
by (33.1k points)

You can still use pandas.to_numeric, but with a different errors setting.

By default it raises on the first value it cannot parse. Pass errors='coerce' to convert unparsable values to NaN instead, then build a mask with isnull and use boolean indexing to select the rows that failed:

For example:

print (df[pd.to_numeric(df.col, errors='coerce').isnull()])

Sample:

import pandas as pd

df = pd.DataFrame({'B':['a','7','8'],
                   'C':[7,8,9]})

print(df)
   B  C
0  a  7
1  7  8
2  8  9

print(df[pd.to_numeric(df.B, errors='coerce').isnull()])
   B  C
0  a  7
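One caveat with the coerce approach: values that were already missing are also NaN after conversion, so they show up in the mask too. Below is a minimal sketch that reports only the values that were present but failed to parse. The helper name `unparsable_values` and the sample data are made up for illustration and are not part of pandas.

```python
import pandas as pd


def unparsable_values(series):
    """Return entries of `series` that are present but cannot be parsed as numbers.

    Hypothetical helper for illustration; not a pandas function.
    """
    converted = pd.to_numeric(series, errors='coerce')
    # NaN after conversion but not missing before: the value failed to parse
    failed = converted.isnull() & series.notnull()
    return series[failed]


# Made-up sample: two bad strings, one value that is missing to begin with
df = pd.DataFrame({'B': ['a', '7', None, '8.5', 'x1'],
                   'C': [7, 8, 9, 10, 11]})

bad = unparsable_values(df['B'])
print(bad)
```

The same helper could be called inside a loop over the columns that should be numeric, printing the offending values before attempting the real conversion.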

Alternatively, if the column mixes actual numbers with strings and you need to find every string value, check the type of each value with isinstance:

df = pd.DataFrame({'B':['a',7, 8],
                   'C':[7,8,9]})

print(df)
   B  C
0  a  7
1  7  8
2  8  9

print(df[df.B.apply(lambda x: isinstance(x, str))])
   B  C
0  a  7
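The two checks answer slightly different questions, which matters when a column holds numeric-looking strings such as '8'. The sketch below, using made-up data, puts them side by side: the isinstance test flags every string, while the coerce test flags only values that cannot be turned into a number.

```python
import pandas as pd

# Made-up sample: 'a' is unparsable, 7 is a real int, '8' is a numeric-looking string
df = pd.DataFrame({'B': ['a', 7, '8'],
                   'C': [7, 8, 9]})

# True for every value stored as a Python string, parsable or not
is_str = df['B'].apply(lambda x: isinstance(x, str))

# True only where conversion to a number fails
not_numeric = pd.to_numeric(df['B'], errors='coerce').isnull()

str_rows = df[is_str]
bad_rows = df[not_numeric]

print(str_rows)
print(bad_rows)
```

If the goal is to find what makes to_numeric raise, the coerce mask is probably the one to use; the isinstance mask is more useful for auditing how values are stored.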

Hope this answer helps.
