0 votes
1 view
in Data Science by (30.6k points)

I have data with a time-stamp in UTC. I'd like to convert the timezone of this timestamp to 'US/Pacific' and add it as a hierarchical index to a pandas DataFrame. I've been able to convert the timestamp as an Index, but it loses the timezone formatting when I try to add it back into the DataFrame, either as a column or as an index.

 

>>> import pandas as pd

>>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)})

>>> dat.dtypes

#datetime    object

#label       object

#value        int64

#dtype: object

Now if I try to convert the Series directly I run into an error.

 

>>> times = pd.to_datetime(dat['datetime'])

>>> times.tz_localize('UTC')

#Traceback (most recent call last):

#  File "<stdin>", line 1, in <module>

#  File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize

#    raise Exception('Cannot tz-localize non-time series')

#Exception: Cannot tz-localize non-time series

If I convert it to an Index then I can manipulate it as a timeseries. Notice that the index now has the Pacific timezone.

 

>>> times_index = pd.Index(times)

>>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific')

>>> times_index_pacific

#<class 'pandas.tseries.index.DatetimeIndex'>

#[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00]

#Length: 6, Freq: None, Timezone: US/Pacific

However, now I run into problems adding the index back to the dataframe as it loses its timezone formatting:

 

>>> dat_index = dat.set_index([dat['label'], times_index_pacific])

>>> dat_index

#                                      datetime label value

#label                                                      

#a     2011-07-19 07:00:00  2011-07-19 07:00:00   a 0

#      2011-07-19 08:00:00  2011-07-19 08:00:00   a 1

#      2011-07-19 09:00:00  2011-07-19 09:00:00   a 2

#b     2011-07-19 07:00:00  2011-07-19 07:00:00   b 3

#      2011-07-19 08:00:00  2011-07-19 08:00:00   b 4

#      2011-07-19 09:00:00  2011-07-19 09:00:00   b 5

You'll notice the index is back on the UTC timezone instead of the converted Pacific timezone.

How can I change the timezone and add it as an index to a DataFrame?

1 Answer

0 votes
by (67.8k points)

You'll have to call it twice for the given example,i.e., once for each level:

dataframe.tz_localize('UTC', level=0)

If you are interested in learning Pandas and want to become an expert in Python Programming, then check out this Python Course and upskill yourself.

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...