How can I remove extra whitespace from strings when parsing a csv file in Pandas?

Question

asked Sep 21, 2019 in Data Science by sourav (17.6k points)

I have the following file named 'data.csv':

1997,Ford,E350
1997, Ford , E350
1997,Ford,E350,"Super, luxurious truck"
1997,Ford,E350,"Super ""luxurious"" truck"
1997,Ford,E350," Super luxurious truck "
"1997",Ford,E350
1997,Ford,E350
2000,Mercury,Cougar

And I would like to parse it into a pandas DataFrame so that the DataFrame looks as follows:

Year Make Model Description
0 1997 Ford E350 None
1 1997 Ford E350 None
2 1997 Ford E350 Super, luxurious truck
3 1997 Ford E350 Super "luxurious" truck
4 1997 Ford E350 Super luxurious truck
5 1997 Ford E350 None
6 1997 Ford E350 None
7 2000 Mercury Cougar None

The best I could do was:

pd.read_table("data.csv", sep=r',', names=["Year", "Make", "Model", "Description"])

Which gets me:

Year Make Model Description
0 1997 Ford E350 None
1 1997 Ford E350 None
2 1997 Ford E350 Super, luxurious truck
3 1997 Ford E350 Super "luxurious" truck
4 1997 Ford E350 Super luxurious truck
5 1997 Ford E350 None
6 1997 Ford E350 None
7 2000 Mercury Cougar None

How can I get the DataFrame without those whitespaces?

1 Answer

Shlok Pandey · Answer 1 · 2019-09-21T09:53:56+0000

Use converters:

import pandas as pd
def strip(text):
try:
return text.strip()
except AttributeError:
return text
def make_int(text):
return int(text.strip('" '))
table = pd.read_table("data.csv", sep=r',',
names=["Year", "Make", "Model", "Description"],
converters = {'Description' : strip,
'Model' : strip,
'Make' : strip,
'Year' : make_int})
print(table)

Output:

Year Make Model Description
0 1997 Ford E350 None
1 1997 Ford E350 None
2 1997 Ford E350 Super, luxurious truck
3 1997 Ford E350 Super "luxurious" truck
4 1997 Ford E350 Super luxurious truck
5 1997 Ford E350 None
6 1997 Ford E350 None
7 2000 Mercury Cougar None

If you are interested in learning Pandas and want to become an expert in Python Programming, then check out this Python Course and upskill yourself.

How can I remove extra whitespace from strings when parsing a csv file in Pandas?

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources