
I am fairly new to Pandas, and I am working on a project where I have a column that looks like the following:






I am trying to pull out the rows where the cost is above 7000. The problem is that this column has the object dtype (it holds strings), so I know I probably cannot compare it directly with a number. My code looks like the following:

import pandas as pd

health_data = pd.read_csv("inpatientCharges.csv")

state = input("What is your state: ")
issue = input("What is your issue: ")

# This line of code will create a new dataframe based on the two-letter state code
state_data = health_data[(health_data.ProviderState == state)]

# With the new data set I search it for the injury the person has.

# I then make it replace the $ sign with a '' so I have a number. I also believe at this point my code may be starting to break down.
issue_data = issue_data['AverageTotalPayments'].str.replace('$', '')

# Since the previous line took out the $, I convert it from an object to a float
issue_data = issue_data[['AverageTotalPayments']].astype(float)

# I attempt to print out the values.
cost = issue_data[(issue_data.AverageTotalPayments >= 10000)]


When I run this code, I simply get NaN back, which is not what I want. Any help with what is wrong would be great! Thank you in advance.
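The premise in the question, that a string column cannot be compared directly with a number, can be checked with a small sketch. The values below are made up in the same `$1234.56` format as the answer's example data; in the pandas versions I am familiar with, the direct comparison raises `TypeError`, while converting to float first works.

```python
import pandas as pd

# Hypothetical sample values in the "$1234.56" format used in the answer.
payments = pd.Series(["$7064.38", "$7455.75", "$6921.90"], name="AverageTotalPayments")

try:
    payments > 7000  # strings vs. an int: pandas refuses the comparison
    comparison_failed = False
except TypeError as exc:
    comparison_failed = True
    print("TypeError:", exc)

# Remove the literal "$" (regex=False so it is not read as a regex anchor),
# then convert to float; now the comparison is numeric.
numeric = payments.str.replace("$", "", regex=False).astype(float)
over_7000 = numeric > 7000
print(over_7000.tolist())  # [True, True, False]
```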

1 Answer


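Try this solution, which extracts the numeric part of each value with a regular expression, converts it to float, and then compares it with 7000: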

In [126]: df
Out[126]:
  AverageTotalPayments
0             $7064.38
1             $7455.75
2             $6921.90
3                  aaa

In [127]: df.AverageTotalPayments.str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float) > 7000
Out[127]:
0     True
1     True
2    False
3    False
Name: AverageTotalPayments, dtype: bool

In [128]: df[df.AverageTotalPayments.str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float) > 7000]
Out[128]:
  AverageTotalPayments
0             $7064.38
1             $7455.75
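A minimal sketch that wraps the answer's approach in a reusable function, so the numeric column is computed once instead of being repeated inside the indexing expression. The function names `rows_above` and `rows_above_numeric` are hypothetical, and the sample frame reuses the four values shown above. The second function swaps in a different technique: `str.replace` with `regex=False` to strip the literal `$`, followed by `pd.to_numeric(..., errors="coerce")`, which turns unparseable values such as `aaa` into NaN.

```python
import pandas as pd

def rows_above(df: pd.DataFrame, column: str, threshold: float) -> pd.DataFrame:
    """Hypothetical helper: rows whose dollar-formatted `column` exceeds threshold."""
    # Same pattern as the answer: capture the first run of digits with an
    # optional decimal part; strings with no digits (e.g. "aaa") become NaN.
    amounts = df[column].str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float)
    # NaN > threshold evaluates to False, so unparseable rows are dropped.
    return df[amounts > threshold]

def rows_above_numeric(df: pd.DataFrame, column: str, threshold: float) -> pd.DataFrame:
    """Hypothetical alternative: strip "$" literally, then coerce to numbers."""
    stripped = df[column].str.replace("$", "", regex=False)
    # errors="coerce" maps anything that is not a valid number to NaN.
    amounts = pd.to_numeric(stripped, errors="coerce")
    return df[amounts > threshold]

df = pd.DataFrame({"AverageTotalPayments": ["$7064.38", "$7455.75", "$6921.90", "aaa"]})
print(rows_above(df, "AverageTotalPayments", 7000))
print(rows_above_numeric(df, "AverageTotalPayments", 7000))
```

Both calls should keep rows 0 and 1. Separately, note that in the question's code `issue_data` becomes a Series after the `str.replace` line, so the later `issue_data[['AverageTotalPayments']]` and `issue_data.AverageTotalPayments` lookups no longer refer to a DataFrame column; that may contribute to the unexpected result.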
