datetime dtypes in pandas read_csv

Question

asked Sep 10, 2019 in Data Science by ashely (50.2k points)

I'm reading in a CSV file with multiple datetime columns. I need to set the data types when reading in the file, but datetimes appear to be a problem. For instance:

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

Running this gives an error:

TypeError: data type "datetime" not understood

Converting columns after the fact via pandas.to_datetime() isn't an option, because I can't know in advance which columns will be datetime objects. That information can change and comes from whatever informs my dtypes list.

Alternatively, I've tried to load the CSV file with numpy.genfromtxt, set the dtypes in that function, and then convert to a pandas.DataFrame, but it garbles the data. Any help is greatly appreciated!

1 Answer

vinita · Answer 1 · 2019-09-10T10:10:06+0000

The pandas.read_csv() function has a keyword parameter called parse_dates.

Using this, you can convert strings, floats or integers into datetimes on the fly using the default date_parser (dateutil.parser.parser).

import pandas as pd

headers = ['col1', 'col2', 'col3', 'col4']
# Date columns are read as plain strings; parse_dates then converts them.
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
parse_dates = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

With the above code, pandas reads col1 and col2 as strings, which they most likely are ("2016-05-05" etc.). After reading them, it passes each column's strings to the date parser and stores whatever that function returns.
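Since the question starts from a list of type names rather than a dict, here is a minimal sketch of how that list could be split into the two arguments read_csv expects. The 'datetime' marker comes from the question; the dtype_map name and the inline sample data are assumptions made only for illustration.

```python
import io

import pandas as pd

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']

# Columns marked 'datetime' go to parse_dates; every other column goes to dtype.
parse_dates = [name for name, kind in zip(headers, dtypes) if kind == 'datetime']
dtype_map = {name: kind for name, kind in zip(headers, dtypes) if kind != 'datetime'}

# Hypothetical tab-separated sample standing in for the real file.
sample = (
    "2016-05-05\t2016-05-06 12:30:00\tabc\t1.5\n"
    "2016-06-01\t2016-06-02 08:00:00\tdef\t2.25\n"
)

df = pd.read_csv(io.StringIO(sample), sep='\t', header=None, names=headers,
                 dtype=dtype_map, parse_dates=parse_dates)
print(df.dtypes)
```

Here the date columns are simply left out of dtype instead of being declared as 'str'. Either way, which columns get parsed is still driven by whatever produces the dtypes list.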
