2 views

How do I get the exponential weighted moving average in NumPy just like the following in pandas?

import pandas as pd

from datetime import datetime

# Declare variables

ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']

windowSize = 20

# Get PANDAS exponential weighted moving average

ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().as_matrix()

print(ewm_pd)

I tried the following with NumPy

import numpy as np

from datetime import datetime

def strided_app(a, L, S): # Window len = L, Stride len/stepsize = S

nrows = ((a.size - L) // S) + 1

n = a.strides[0]

return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

def numpyEWMA(price, windowSize):

weights = np.exp(np.linspace(-1., 0., windowSize))

weights /= weights.sum()

a2D = strided_app(price, windowSize, 1)

returnArray = np.empty((price.shape[0]))

returnArray.fill(np.nan)

for index in (range(a2D.shape[0])):

returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]

return np.reshape(returnArray, (-1, 1))

# Declare variables

ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']

windowSize = 20

# Get NumPy exponential weighted moving average

ewma_np = numpyEWMA(ibm, windowSize)

print(ewma_np)

But the results are not similar to the ones in pandas.

Is there maybe a better approach to calculate the exponential weighted moving average directly in NumPy and get the exact same result as the `pandas.ewm().mean()`?

At 60,000 requests on pandas solution, I get about 230 seconds. I am sure that with a pure NumPy, this can be decreased significantly.

by (108k points)

There is a vectorized version of the numpy_ewma function that's required to be producing the correct results. Refer the following code:

def numpy_ewma_vectorized(data, window):

alpha = 2 /(window + 1.0)

alpha_rev = 1-alpha

scale = 1/alpha_rev

n = data.shape[0]

r = np.arange(n)

scale_arr = scale**r

offset = data[0]*alpha_rev**(r+1)

pw0 = alpha*alpha_rev**(n-1)

mult = data*pw0*scale_arr

cumsums = mult.cumsum()

out = offset + cumsums*scale_arr[::-1]

return out

Further boost

We can boost it further with some code re-use, like so -

def numpy_ewma_vectorized_v2(data, window):

alpha = 2 /(window + 1.0)

alpha_rev = 1-alpha

n = data.shape[0]

pows = alpha_rev**(np.arange(n+1))

scale_arr = 1/pows[:-1]

offset = data[0]*pows[1:]

pw0 = alpha*alpha_rev**(n-1)

mult = data*pw0*scale_arr

cumsums = mult.cumsum()

out = offset + cumsums*scale_arr[::-1]

return out

Runtime test

Let's time these two against the same loopy function for a big dataset.

In [97]: data = np.random.randint(2,9,(5000))

...: window = 20

...:

In [98]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized(data, window))

Out[98]: True

In [99]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized_v2(data, window))

Out[99]: True

In [100]: %timeit numpy_ewma(data, window)

100 loops, best of 3: 6.03 ms per loop

In [101]: %timeit numpy_ewma_vectorized(data, window)

1000 loops, best of 3: 665 µs per loop

In [102]: %timeit numpy_ewma_vectorized_v2(data, window)

1000 loops, best of 3: 357 µs per loop

In [103]: 6030/357.0

Out[103]: 16.89075630252101