Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (50.2k points)

How do I get the exponential weighted moving average in NumPy just like the following in pandas?

import pandas as pd

import pandas_datareader as pdr

from datetime import datetime

# Declare variables

ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']

windowSize = 20

# Get PANDAS exponential weighted moving average

ewm_pd = pd.DataFrame(ibm).ewm(span=windowSize, min_periods=windowSize).mean().as_matrix()

print(ewm_pd)

I tried the following with NumPy

import numpy as np

import pandas_datareader as pdr

from datetime import datetime

def strided_app(a, L, S): # Window len = L, Stride len/stepsize = S

    nrows = ((a.size - L) // S) + 1

    n = a.strides[0]

    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

def numpyEWMA(price, windowSize):

    weights = np.exp(np.linspace(-1., 0., windowSize))

    weights /= weights.sum()

    a2D = strided_app(price, windowSize, 1)

    returnArray = np.empty((price.shape[0]))

    returnArray.fill(np.nan)

    for index in (range(a2D.shape[0])):

        returnArray[index + windowSize-1] = np.convolve(weights, a2D[index])[windowSize - 1:-windowSize + 1]

    return np.reshape(returnArray, (-1, 1))

# Declare variables

ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1)).reset_index(drop=True)['Adj Close']

windowSize = 20

# Get NumPy exponential weighted moving average

ewma_np = numpyEWMA(ibm, windowSize)

print(ewma_np)

But the results are not similar to the ones in pandas.

Is there maybe a better approach to calculate the exponential weighted moving average directly in NumPy and get the exact same result as the pandas.ewm().mean()?

At 60,000 requests on pandas solution, I get about 230 seconds. I am sure that with a pure NumPy, this can be decreased significantly.

1 Answer

0 votes
by (108k points)

There is a vectorized version of the numpy_ewma function that's required to be producing the correct results. Refer the following code:

def numpy_ewma_vectorized(data, window):

    alpha = 2 /(window + 1.0)

    alpha_rev = 1-alpha

    scale = 1/alpha_rev

    n = data.shape[0]

    r = np.arange(n)

    scale_arr = scale**r

    offset = data[0]*alpha_rev**(r+1)

    pw0 = alpha*alpha_rev**(n-1)

    mult = data*pw0*scale_arr

    cumsums = mult.cumsum()

    out = offset + cumsums*scale_arr[::-1]

    return out

Further boost

We can boost it further with some code re-use, like so -

def numpy_ewma_vectorized_v2(data, window):

    alpha = 2 /(window + 1.0)

    alpha_rev = 1-alpha

    n = data.shape[0]

    pows = alpha_rev**(np.arange(n+1))

    scale_arr = 1/pows[:-1]

    offset = data[0]*pows[1:]

    pw0 = alpha*alpha_rev**(n-1)

    mult = data*pw0*scale_arr

    cumsums = mult.cumsum()

    out = offset + cumsums*scale_arr[::-1]

    return out

Runtime test

Let's time these two against the same loopy function for a big dataset.

In [97]: data = np.random.randint(2,9,(5000))

    ...: window = 20

    ...:

In [98]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized(data, window))

Out[98]: True

In [99]: np.allclose(numpy_ewma(data, window), numpy_ewma_vectorized_v2(data, window))

Out[99]: True

In [100]: %timeit numpy_ewma(data, window)

100 loops, best of 3: 6.03 ms per loop

In [101]: %timeit numpy_ewma_vectorized(data, window)

1000 loops, best of 3: 665 µs per loop

In [102]: %timeit numpy_ewma_vectorized_v2(data, window)

1000 loops, best of 3: 357 µs per loop

In [103]: 6030/357.0

Out[103]: 16.89075630252101

Related questions

Browse Categories

...