This is the setup:
import numpy as np
import pandas as pd

arrays = [["2010-01-01", "2010-01-01", "2010-01-02", "2010-01-02", "2010-01-03", "2010-01-03"],
          ["MSFT", "AAPL", "MSFT", "AAPL", "MSFT", "AAPL"]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["date", "symbol"])
df = pd.DataFrame(data=np.random.randn(6, 4), index=index, columns=["high", "low", "open", "close"])

def fn_sum(close, high, low):
    return close + high + low

def fn_plus(close):
    return close + 1
The DF looks like this:
                       high       low      open     close
date       symbol
2010-01-01 MSFT    1.144042  0.889603 -0.193715  1.005927
           AAPL    0.433530 -0.291510  1.420505  0.326206
2010-01-02 MSFT   -1.509419 -0.273476 -0.620735 -0.205946
           AAPL    0.454401 -0.085008  0.686485  1.309894
2010-01-03 MSFT    1.487588 -0.777500 -0.218993 -1.242664
           AAPL   -0.456024 -0.819463 -2.224953  1.263124
I want to apply technical analysis functions to every symbol in a groupby().apply() fashion, like this:
df["1"] = df.groupby(level="symbol").apply(lambda x: fn_sum(x["close"], x["high"], x["low"]))
This results in a broadcasting error:
ValueError: operands could not be broadcast together with shapes (6,2) (3,) (6,2)
Doing the same on a single column works, though:
df["2"] = df.groupby(level="symbol").close.apply(lambda x: fn_plus(x))
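For reference, the single-column case can also be written with transform(), which returns a result aligned to the original index. This is only a minimal sketch on a frame shaped like the one above (the seeded random data is an assumption made for reproducibility):

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same (date, symbol) MultiIndex as in the setup.
dates = ["2010-01-01", "2010-01-02", "2010-01-03"]
index = pd.MultiIndex.from_product([dates, ["MSFT", "AAPL"]], names=["date", "symbol"])
df = pd.DataFrame(np.random.default_rng(0).standard_normal((6, 4)),
                  index=index, columns=["high", "low", "open", "close"])

def fn_plus(close):
    return close + 1

# transform() calls fn_plus once per symbol and returns a Series that
# carries df's original index, so plain column assignment lines up.
df["2"] = df.groupby(level="symbol")["close"].transform(fn_plus)
```

Since transform() works on one column at a time, it does not by itself cover functions that need several columns at once.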
Questions:
How do I get this to work when applying a function to multiple columns and assigning the result back to the DataFrame without broadcasting issues?
I would also be grateful for a better implementation that works with MultiIndex DataFrames like the one above.
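One possible way to sidestep the shape of what groupby().apply() returns is to loop over the groups explicitly and concatenate the per-symbol results. The following is a sketch under that assumption, not necessarily the most idiomatic solution; the seeded random data is only there to make it reproducible:

```python
import numpy as np
import pandas as pd

dates = ["2010-01-01", "2010-01-02", "2010-01-03"]
index = pd.MultiIndex.from_product([dates, ["MSFT", "AAPL"]], names=["date", "symbol"])
df = pd.DataFrame(np.random.default_rng(0).standard_normal((6, 4)),
                  index=index, columns=["high", "low", "open", "close"])

def fn_sum(close, high, low):
    return close + high + low

# Compute the result symbol by symbol. Each piece is a Series that keeps
# the (date, symbol) index of the rows belonging to that group.
pieces = [fn_sum(g["close"], g["high"], g["low"])
          for _, g in df.groupby(level="symbol")]

# The concatenated Series is in group order, not the original row order;
# column assignment aligns it with df by index label.
df["1"] = pd.concat(pieces)
```

Passing group_keys=False to groupby() and keeping the apply() call may give a similar result, but how apply() handles group keys has changed between pandas releases, so the explicit loop is the more predictable variant.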
For more context: I want to use technical analysis functions from the TA-Lib package. See: https://mrjbq7.github.io/ta-lib/func_groups/volatility_indicators.html
The functions look like this (for example):
ATR(high, low, close[, timeperiod=?])
Average True Range (Volatility Indicators)
Inputs:
    prices: ['high', 'low', 'close']
Parameters:
    timeperiod: 14
Outputs:
    real
With these functions I get the same broadcasting error as in the contrived example above.
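To illustrate the multi-input case without depending on TA-Lib, here is a sketch with a hypothetical stand-in, range_mean, that mimics the "arrays in, array out" shape of such indicators. It is not a TA-Lib function and does not compute ATR. The assumption is that the real indicator accepts NumPy arrays and returns an array of the same length, which is then wrapped in a Series with the group's index:

```python
import numpy as np
import pandas as pd

dates = ["2010-01-01", "2010-01-02", "2010-01-03"]
index = pd.MultiIndex.from_product([dates, ["MSFT", "AAPL"]], names=["date", "symbol"])
df = pd.DataFrame(np.random.default_rng(0).standard_normal((6, 4)),
                  index=index, columns=["high", "low", "open", "close"])

def range_mean(high, low, close, timeperiod=2):
    """Hypothetical indicator: rolling mean of |high - low|, NaN during warm-up."""
    spread = np.abs(high - low)
    out = np.full(len(close), np.nan)
    for i in range(timeperiod - 1, len(close)):
        out[i] = spread[i - timeperiod + 1 : i + 1].mean()
    return out

def per_symbol(g):
    # Pass plain arrays in, then re-attach the group's index to the output
    # so the result can be aligned with df afterwards.
    values = range_mean(g["high"].to_numpy(), g["low"].to_numpy(), g["close"].to_numpy())
    return pd.Series(values, index=g.index)

df["indicator"] = pd.concat([per_symbol(g) for _, g in df.groupby(level="symbol")])
```

Swapping range_mean for a real indicator should only require that it follows the same calling convention.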