
Scikit-learn uses a very convenient approach based on fit and predict methods. I have time-series data in a format suited for fit and predict.

For example, I have the following Xs:

[[1.0, 2.3, 4.5], [6.7, 2.7, 1.2], ..., [3.2, 4.7, 1.1]]

and the corresponding ys:

[[1.0], [2.3], ..., [7.7]]

These data have the following meaning. The values stored in ys form a time series. The values in Xs are corresponding time-dependent "factors" that are known to have some influence on the values in ys (for example temperature, humidity, and atmospheric pressure).

Now, of course, I can use fit(Xs, ys). But then I get a model in which future values of ys depend only on the factors and not (at least not directly) on previous y values, which is a limitation of the model. I would like a model in which Y_n also depends on Y_{n-1}, Y_{n-2}, and so on. For example, I might want to use an exponential moving average as a model. What is the most elegant way to do this in scikit-learn?


It sounds like you are looking for an exponentially weighted moving average (EWMA) of ys.

You can compute it with pandas. The old pandas.stats.moments.ewma function has been removed from recent pandas versions; the current equivalent is Series.ewm(...).mean().

For example:

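import numpy
import pandas

# Flatten ys from [[1.0], [2.3], ...] into a 1-D Series of floats
y = pandas.Series(numpy.asarray(ys, dtype=float).ravel())

# Exponentially weighted moving average with center of mass com=2
EMOV_n = y.ewm(com=2).mean()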

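The com parameter is the "center of mass" of the weights. It sets the smoothing factor alpha = 1 / (1 + com), so com=2 gives alpha = 1/3. A larger com puts more weight on older values and gives a smoother average. It is a hyperparameter you choose (for example by validation); the function does not optimize it for you.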
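To make the role of com concrete, here is a small sketch that recomputes pandas' default (adjust=True) EWMA by hand from alpha = 1 / (1 + com) and checks it against Series.ewm. The helper name ewma_manual and the input values are made up for illustration.

```python
import numpy as np
import pandas as pd


def ewma_manual(values, com):
    """EWMA with pandas' default adjust=True weighting (hypothetical helper).

    Row n is sum_i w_i * y_{n-i} / sum_i w_i with w_i = (1 - alpha)**i.
    """
    alpha = 1.0 / (1.0 + com)
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for n in range(len(values)):
        # Weights for y_n, y_{n-1}, ..., y_0 (newest value gets weight 1)
        weights = (1.0 - alpha) ** np.arange(n + 1)
        out[n] = np.dot(weights, values[n::-1]) / weights.sum()
    return out


y = [1.0, 2.3, 4.0, 3.1, 7.7]  # illustrative values only
manual = ewma_manual(y, com=2)
via_pandas = pd.Series(y).ewm(com=2).mean().to_numpy()
print(np.allclose(manual, via_pandas))  # True
```

With com=2 each step back in time multiplies a value's weight by 2/3, which is why larger com values react more slowly to new observations.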

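You can then append EMOV_n to Xs as an extra feature column. Note that EMOV_n at row n includes y_n itself, so for forecasting you should shift it by one step so that row n only uses y_0, ..., y_{n-1}: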

Xs = numpy.column_stack((Xs, EMOV_n))  # EMOV_n becomes an extra column (vstack would stack rows instead)
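Because EMOV_n at row n already contains y_n, a model fitted on it partly sees the value it is supposed to predict. A minimal sketch of a past-only version, assuming the same list-of-lists layout as Xs and ys above (the numbers here are made up): shift the EMA down one row with Series.shift(1), then drop the first row, which has no history.

```python
import numpy as np
import pandas as pd

# Illustrative data in the question's layout (values are made up)
Xs = [[1.0, 2.3, 4.5], [6.7, 2.7, 1.2], [3.3, 1.9, 2.2], [3.2, 4.7, 1.1]]
ys = [[1.0], [2.3], [4.1], [7.7]]

y = pd.Series(np.asarray(ys, dtype=float).ravel())

# EMA of y_0..y_{n-1} only: shifting by one row means row n never
# contains y_n itself; row 0 becomes NaN because it has no past values
past_ema = y.ewm(com=2).mean().shift(1)

# column_stack appends past_ema as a 4th column, giving shape (n, 4)
X_aug = np.column_stack((np.asarray(Xs, dtype=float), past_ema.to_numpy()))

# Drop row 0 (NaN feature) from both the features and the target
X_aug, y_target = X_aug[1:], y.to_numpy()[1:]
print(X_aug.shape, y_target.shape)  # (3, 4) (3,)
```

Losing one row is the price of having no leakage; with a long series this is negligible.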

Now you can fit any scikit-learn model on the augmented Xs.

For example:

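from sklearn import linear_model

clf = linear_model.LinearRegression()

# Fit on the augmented features; ys is the target defined above
clf.fit(Xs, ys)

# One coefficient per feature column of Xs
print(clf.coef_)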
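The question asks for Y_n to depend on Y_{n-1}, Y_{n-2}, and so on. One common way to get that with an ordinary scikit-learn regressor is to add lagged copies of y as feature columns next to the factors. The sketch below assumes synthetic data (the column names temperature/humidity/pressure and all coefficients are made up) and assumes the factors at time n are known when predicting Y_n, as in the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the question's factors (made-up data)
factors = pd.DataFrame(rng.normal(size=(n, 3)),
                       columns=["temperature", "humidity", "pressure"])
signal = factors.to_numpy() @ np.array([1.0, -0.5, 0.3])
noise = rng.normal(scale=0.1, size=n)

# Target depends on its own past (AR(2)) and on the current factors
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.2 * y[t - 2] + signal[t] + noise[t]
y = pd.Series(y, name="y")

# Lag features plus a past-only EMA, all aligned on the same index
features = factors.copy()
features["y_lag1"] = y.shift(1)
features["y_lag2"] = y.shift(2)
features["ema_past"] = y.ewm(com=2).mean().shift(1)

# Drop the first rows, which have NaN lags
data = pd.concat([features, y], axis=1).dropna()
X, target = data.drop(columns="y"), data["y"]

# Time-ordered split: train on the past, evaluate on the future
split = int(0.8 * len(data))
model = LinearRegression().fit(X.iloc[:split], target.iloc[:split])
test_r2 = model.score(X.iloc[split:], target.iloc[split:])
print(model.coef_)
print(test_r2)
```

Splitting by time rather than with a shuffled train_test_split keeps future rows out of the training set; sklearn.model_selection.TimeSeriesSplit applies the same idea to cross-validation.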