Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (50.2k points)

I have data with a time-stamp in UTC. I'd like to convert the timezone of this timestamp to 'US/Pacific' and add it as a hierarchical index to a pandas DataFrame. I've been able to convert the timestamp as an Index, but it loses the timezone formatting when I try to add it back into the DataFrame, either as a column or as an index.

 

>>> import pandas as pd

>>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)})

>>> dat.dtypes

#datetime    object

#label       object

#value        int64

#dtype: object

Now if I try to convert the Series directly I run into an error.

 

>>> times = pd.to_datetime(dat['datetime'])

>>> times.tz_localize('UTC')

#Traceback (most recent call last):

#  File "<stdin>", line 1, in <module>

#  File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize

#    raise Exception('Cannot tz-localize non-time series')

#Exception: Cannot tz-localize non-time series

If I convert it to an Index then I can manipulate it as a timeseries. Notice that the index now has the Pacific timezone.

 

>>> times_index = pd.Index(times)

>>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific')

>>> times_index_pacific

#<class 'pandas.tseries.index.DatetimeIndex'>

#[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00]

#Length: 6, Freq: None, Timezone: US/Pacific

However, now I run into problems adding the index back to the dataframe as it loses its timezone formatting:

 

>>> dat_index = dat.set_index([dat['label'], times_index_pacific])

>>> dat_index

#                                      datetime label value

#label                                                      

#a     2011-07-19 07:00:00  2011-07-19 07:00:00   a 0

#      2011-07-19 08:00:00  2011-07-19 08:00:00   a 1

#      2011-07-19 09:00:00  2011-07-19 09:00:00   a 2

#b     2011-07-19 07:00:00  2011-07-19 07:00:00   b 3

#      2011-07-19 08:00:00  2011-07-19 08:00:00   b 4

#      2011-07-19 09:00:00  2011-07-19 09:00:00   b 5

You'll notice the index is back on the UTC timezone instead of the converted Pacific timezone.

How can I change the timezone and add it as an index to a DataFrame?

1 Answer

0 votes
by (108k points)

You'll have to call it twice for the given example,i.e., once for each level:

dataframe.tz_localize('UTC', level=0)

If you are interested in learning Pandas and want to become an expert in Python Programming, then check out this Python Course and upskill yourself.

Browse Categories

...