0 votes
1 view
in Data Science by (10.7k points)

I have the following data frame in IPython, where each row is a single stock:

In [261]: bdata

Out[261]:

<class 'pandas.core.frame.DataFrame'>

Int64Index: 21210 entries, 0 to 21209

Data columns:

BloombergTicker      21206  non-null values

Company              21210  non-null values

Country              21210  non-null values

MarketCap            21210  non-null values

PriceReturn          21210  non-null values

SEDOL                21210  non-null values

yearmonth            21210  non-null values

dtypes: float64(2), int64(1), object(4)

I want to apply a groupby operation that computes cap-weighted average return across everything, per each date in the "yearmonth" column.

This works as expected:

In [262]: bdata.groupby("yearmonth").apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

Out[262]:

yearmonth

201204      -0.109444

201205      -0.290546

But then I want to sort of "broadcast" these values back to the indices in the original data frame, and save them as constant columns where the dates match.

In [263]: dateGrps = bdata.groupby("yearmonth")

In [264]: dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

/mnt/bos-devrnd04/usr6/home/espears/ws/Research/Projects/python-util/src/util/<ipython-input-264-4a68c8782426> in <module>()

----> 1 dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

TypeError: 'DataFrameGroupBy' object does not support item assignment

I realize this naive assignment should not work. But what is the "right" Pandas idiom for assigning the result of a groupby operation into a new column on the parent dataframe?

In the end, I want a column called "MarketReturn" than will be a repeated constant value for all indices that have matching date with the output of the groupby operation.

One hack to achieve this would be the following:

marketRetsByDate  = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

bdata["MarketReturn"] = np.repeat(np.NaN, len(bdata))

for elem in marketRetsByDate.index.values:

    bdata["MarketReturn"][bdata["yearmonth"]==elem] = marketRetsByDate.ix[elem]

But this is slow, bad, and unPythonic.

Please log in or register to answer this question.

...