0 votes
in Data Science by (17.6k points)

I'm sure this is simple, but as a complete newbie to Python, I'm having trouble figuring out how to iterate over the columns of a pandas DataFrame and run a regression with each one.

Here's what I'm doing:

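# Assumed imports; the original post does not show them
import pandas_datareader.data as web
from pandas import DataFrame

all_data = {}
for ticker in ['FIUIX', 'FSAIX', 'FSAVX', 'FSTMX']:
    all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2015')

# .items() replaces the Python 2-only .iteritems()
prices = DataFrame({tic: data['Adj Close'] for tic, data in all_data.items()})
returns = prices.pct_change()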

I know I can run a regression like this:

import statsmodels.api as sm  # assumed import behind the `sm` alias
regs = sm.OLS(returns.FIUIX, returns.FSTMX).fit()

but suppose I want to do this for each column in the dataframe. In particular, I want to regress FIUIX on FSTMX, and then FSAIX on FSTMX, and then FSAVX on FSTMX. After each regression I want to store the residuals.

I've tried various versions of the following, but I must be getting the syntax wrong:

resids = {}
for k in returns.keys():
    reg = sm.OLS(returns[k], returns.FSTMX).fit()
    resids[k] = reg.resid

I think the problem is I don't know how to refer to the returns column by key, so returns[k] is probably wrong.

Any guidance on the best way to do this would be much appreciated. Perhaps there's a common pandas approach I'm missing.

1 Answer

0 votes
by (36.8k points)

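Iterating over a DataFrame yields its column labels, so you can loop over the columns like this:

for column in df:
    print(df[column])  # df[column] is that column as a Series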


