in BI by (47.2k points)

I have some values below showing weekly results over a period of time. In Week 19, a new process was implemented that was supposed to lower the results further.

However, it is clear that there was already a week-over-week reduction in the results before Week 19. What is the best way to quantify the impact of the 'New Process' versus the rate of improvement that already existed before Week 19? I do not want to 'double-count' the effect of the New Process.

Week #   Result  Status
Week 1   849.27  NA
Week 2   807.59  NA
Week 3   803.59  NA
Week 4   849.7   NA
Week 5   852.19  NA
Week 6   845.06  NA
Week 7   833.77  NA
Week 8   788.46  NA
Week 9   800.32  NA
Week 10  814.66  NA
Week 11  829.21  NA
Week 12  799.49  NA
Week 13  812.24  NA
Week 14  772.62  NA
Week 15  782.13  NA
Week 16  779.66  NA
Week 17  752.86  NA
Week 18  758.39  NA
Week 19  738.47  New Process
Week 20  721.11  New Process
Week 21  642.04  New Process
Week 22  718.72  New Process
Week 23  743.47  New Process
Week 24  709.57  New Process
Week 25  704.48  New Process
Week 26  673.51  New Process

1 Answer

by (17.6k points)
  • The models below are estimated with OLS, each including a shift in the constant at Week 19; the first model also includes a shift in the trend.

  • The last model uses Poisson, since the values of the dependent variable are positive and Poisson estimates an exponential model. The standard errors are correct if we use a robust covariance matrix. (We are using Poisson just to estimate an exponential mean function; we don't assume that the underlying distribution is Poisson.)

  • Note that this is a pure NumPy version; I didn't bother with pandas or patsy formulas. Poisson can run into optimization problems if some of the explanatory variables are too large.

import numpy as np
import statsmodels.api as sm

data = np.array(
      [ 849.27,  807.59, 803.59,  849.7 , 852.19, 845.06,  833.77,
        788.46,  800.32, 814.66,  829.21, 799.49, 812.24,  772.62,
        782.13,  779.66, 752.86,  758.39, 738.47, 721.11,  642.04,
        718.72,  743.47, 709.57,  704.48, 673.51])

nobs = len(data)
trend = np.arange(nobs)
proc = (trend >= 18).astype(int)   # dummy: 1 from Week 19 (index 18) onwards

# Design matrix: intercept, pre-existing trend, level shift at Week 19,
# and a trend shift that starts at Week 19
x = np.column_stack((np.ones(nobs), trend, proc, (trend - 18) * proc))

# Model 1: OLS with a shift in both the constant and the trend
res = sm.OLS(data, x).fit()
res.model.exog_names[:] = ['const', 'trend', 'const_diff', 'trend_new']
print(res.summary())

# Model 2: OLS with a shift in the constant only (drop the trend-shift column)
res2 = sm.OLS(data, x[:, :3]).fit()
res2.model.exog_names[:] = ['const', 'trend', 'const_diff']
print(res2.summary())

# Model 3: log-linear OLS, i.e. an exponential model estimated in logs
res4 = sm.OLS(np.log(data), x[:, :3]).fit()
res4.model.exog_names[:] = ['const', 'trend', 'const_diff']
print(res4.summary())

# Model 4: Poisson as QMLE for an exponential mean function, with robust
# (HC0) standard errors; Nelder-Mead first, then BFGS from those estimates
res3 = sm.Poisson(data, x[:, :3]).fit(cov_type='HC0', method='nm', maxiter=5000)
res3 = sm.Poisson(data, x[:, :3]).fit(start_params=res3.params,
                                      cov_type='HC0', method='bfgs')
res3.model.exog_names[:] = ['const', 'trend', 'const_diff']
print(res3.summary())
print(np.exp(res3.params))   # multiplicative effects in the exponential model
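
To translate the estimates back into the units of the question, one option (a sketch of my own, not part of the code above) is to compare the full model's fitted values against a counterfactual in which the New Process columns are zeroed out, i.e. the pre-Week-19 trend extrapolated forward. The gap between the two is the estimated effect of the New Process net of the improvement that was already underway, which avoids double-counting:

# Sketch: quantify the New Process effect net of the pre-existing trend,
# using the full model `res` from above.
x_cf = x.copy()
x_cf[:, 2:] = 0.0              # zero out the level shift and trend shift

fitted = res.predict(x)        # full model fit
baseline = res.predict(x_cf)   # pre-Week-19 trend extrapolated forward

effect = fitted - baseline     # estimated weekly impact of the New Process
print(effect[18:])             # Weeks 19-26; zero before Week 19 by construction

In the exponential (Poisson) model, the analogous quantity is multiplicative: np.exp(const_diff) from the last print statement is the proportional level change at Week 19.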
