I'm trying to parse a CSV file into a DataFrame because I need to do some analysis on the timestamps. The CSV file is well structured, and I can read it without a problem using pd.read_csv:

import pandas as pd
import datetime as dt

df = pd.read_csv('trip_data.csv', low_memory=False, parse_dates=['datetime'], infer_datetime_format=True)

However, even with parse_dates and infer_datetime_format set, I still end up with a DataFrame whose datetime column is not parsed as timestamps:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8771828 entries, 0 to 8771827
Data columns (total 3 columns):
UserID                   int64
datetime                 object
amount                   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 1.1+ GB

So when I try to get the minimum date, e.g.:

print(df['datetime'].min())

I get an incorrect answer: the earliest timestamp in my data is 2018-01-01 00:08:26, but min() returns 2018-01-27 04:06:37. Am I missing something, or is there another way to cast this column to datetime64?

Here's a peek at my CSV file:

UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-01-01 00:44:55,15.3
1,2018-01-01 00:08:26,8.3
1,2018-01-01 00:20:22,34.8
1,2018-01-01 00:09:18,16.55
1,2018-01-01 00:29:29,5.8
1,2018-01-01 00:38:08,12.35
1,2018-01-01 00:49:29,6.3
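As a quick check, the eight sample rows above do parse cleanly with parse_dates, so the object dtype most likely comes from some other row in the full 8.7-million-row file. A minimal reproduction, using io.StringIO in place of trip_data.csv:

```python
import io

import pandas as pd

# The sample rows from the question, inlined so the snippet runs without trip_data.csv
sample = """UserID,datetime,amount
1,2018-01-01 00:21:05,5.8
1,2018-01-01 00:44:55,15.3
1,2018-01-01 00:08:26,8.3
1,2018-01-01 00:20:22,34.8
1,2018-01-01 00:09:18,16.55
1,2018-01-01 00:29:29,5.8
1,2018-01-01 00:38:08,12.35
1,2018-01-01 00:49:29,6.3
"""

df = pd.read_csv(io.StringIO(sample), parse_dates=["datetime"])

print(df["datetime"].dtype)  # a datetime64 dtype, not object
print(df["datetime"].min())  # 2018-01-01 00:08:26
```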

1 Answer

pandas has a function called to_datetime that you can use to convert the column to datetime64. Then call min() on the converted column to get the earliest timestamp, like this:

df['datetime'] = pd.to_datetime(df['datetime'])

print(df['datetime'].min())
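If pd.to_datetime raises a parsing error on the full file, that would also explain the original problem: read_csv returns a column unaltered, as object dtype, when any of its values cannot be parsed as a datetime. A sketch, assuming every valid timestamp uses the %Y-%m-%d %H:%M:%S layout shown in the sample: errors='coerce' turns unparseable entries into NaT (which min() skips), and the original strings of those rows can then be inspected. The malformed values below are made up for illustration.

```python
import pandas as pd

# Hypothetical column: two well-formed timestamps plus two malformed entries
raw = pd.Series(["2018-01-01 00:21:05", "2018-01-01 00:08:26", "datetime", "not a date"])

# Explicit format matching the sample rows; unparseable values become NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Show the raw strings that failed to parse (e.g. a repeated header line)
bad = raw[parsed.isna()]
print(bad.tolist())  # ['datetime', 'not a date']

# min() skips NaT by default, so this is the earliest valid timestamp
print(parsed.min())  # 2018-01-01 00:08:26
```

On the question's frame the same call would be df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S', errors='coerce'), after which df['datetime'].isna().sum() shows how many rows ended up as NaT.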
