
I'm reading in a CSV file with multiple datetime columns. I need to set the data types when reading the file, but datetimes appear to be a problem. For instance:

headers = ['col1', 'col2', 'col3', 'col4']

dtypes = ['datetime', 'datetime', 'str', 'float']

pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

When run, this gives an error:

TypeError: data type "datetime" not understood

Converting columns after the fact via pandas.to_datetime() isn't an option, because I can't know in advance which columns will hold datetime values. That information can change and comes from whatever informs my dtypes list.

Alternatively, I've tried loading the CSV file with numpy.genfromtxt, setting the dtypes in that function, and then converting to a pandas.DataFrame, but it garbles the data. Any help is greatly appreciated!

1 Answer


The pandas.read_csv() function has a keyword parameter called parse_dates.

With it you can convert strings, floats, or integers into datetimes on the fly, using the default date_parser (dateutil.parser.parser). The columns to be parsed are listed in parse_dates rather than given a datetime entry in dtype:

headers = ['col1', 'col2', 'col3', 'col4']

dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}

parse_dates = ['col1', 'col2']

pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

With the above code, pandas reads col1 and col2 as strings, which they most likely are ("2016-05-05" etc.). Each string is then passed to the date parser, and the column ends up holding whatever that function returns.
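Since the question says the list of column types comes from elsewhere and can change, one possible approach is to derive both the dtype mapping and the parse_dates list from that single list. The following is only a sketch under some assumptions: the in-memory sample data and the name dtype_map are made up for illustration, and the label 'datetime' is taken to be the marker the asker's list uses for date columns.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the asker's tab-separated file.
sample = io.StringIO(
    "2016-05-05\t2016-05-06\tabc\t1.5\n"
    "2016-06-01\t2016-06-02\tdef\t2.5\n"
)

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']

# Columns labelled 'datetime' are handed to parse_dates;
# every other column keeps its entry in the dtype mapping.
parse_dates = [h for h, t in zip(headers, dtypes) if t == 'datetime']
dtype_map = {h: t for h, t in zip(headers, dtypes) if t != 'datetime'}

df = pd.read_csv(sample, sep='\t', header=None, names=headers,
                 dtype=dtype_map, parse_dates=parse_dates)

# Inspect the resulting column types.
print(df.dtypes)
```

This way the caller still maintains only one list, and no column has to be converted with pandas.to_datetime() after the file has been read.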
