I'm trying to parse a csv file into a dataFrame as I need to do some analysis on the timestamps. the csv file is well structured, and I can read it without a problem by using pd.read_csv:
import pandas as pd
import datetime as dt
df = pd.read_csv('trip_data.csv', low_memory=False, parse_dates=['datetime'], infer_datetime_format=True)
However, even when giving parse_dates and infer_datetime_format as arguments, I still end up with a dataFrame that doesn't parse the timestamps on my file:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8771828 entries, 0 to 8771827
Data columns (total 3 columns):
UserID int64
datetime object
amount float64
dtypes: float64(1), int64(1), object(1)
memory usage: 1.1+ GB
So when I try to get the minimum date, e.g.:
print(df['datetime'].min())
I get an incorrect answer, as I can see that the minimum timestamp on my df is 2018-01-01 00:08:26 and I get 2018-01-27 04:06:37 as minimum... am I missing anything, or is there any way to cast this to datetime64 in another way?
Here's a peak of my csv file:
UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-01-01 00:44:55,15.3
1,2018-01-01 00:08:26,8.3
1,2018-01-01 00:20:22,34.8
1,2018-01-01 00:09:18,16.55
1,2018-01-01 00:29:29,5.8
1,2018-01-01 00:38:08,12.35
1,2018-01-01 00:49:29,6.3