Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)

This is my data

import numpy as np

import pandas as pd

data = {

    'cases': [120, 100, np.nan, np.nan, np.nan, np.nan, np.nan],

    'percent_change': [0.03, 0.01, 0.00, -0.001, 0.05, -0.1, 0.003],

    'tag': [7, 6, 5, 4, 3, 2, 1],

}

   cases  percent_change  tag

0  120.0           0.030    7

1  100.0           0.010    6

2    NaN           0.000    5

3    NaN          -0.001    4

4    NaN           0.050    3

5    NaN          -0.100    2

6    NaN           0.003    1

I want to create next cases' value as (next value) = (previous value) * (1+current per_change). Specifically, I want it to be done in rows that have a tag value less than 6 (and I must use a mask (i.e., df.loc for this row selection). This should give me:

   cases  percent_change  tag

0  120.0           0.030    7

1  100.0           0.010    6

2  100.0           0.000    5

3   99.9          -0.001    4

4  104.9           0.050    3

5   94.4          -0.100    2

6   94.7           0.003    1

 I have tried this:

df_index = np.where(df['tag'] == 6)

index = df_index[0][0]

df.loc[(df.tag<6), 'cases'] = (df.percent_change.shift(0).fillna(1) + 1).cumprod() * df.at[index, 'cases']

        cases  percent_change  tag

0  120.000000           0.030    7

1  100.000000           0.010    6

2  104.030000           0.000    5

3  103.925970          -0.001    4

4  109.122268           0.050    3

5   98.210042          -0.100    2

6   98.504672           0.003    1

1 Answer

0 votes
by (36.8k points)

This is what I do:

s = df.cases.isna()

percents = df.percent_change.where(s,0)+1

df['cases'] = df.cases.ffill()*percents.cumprod()

Output:

        cases  percent_change  tag

0  120.000000           0.030    7

1  100.000000           0.010    6

2  100.000000           0.000    5

3   99.900000          -0.001    4

4  104.895000           0.050    3

5   94.405500          -0.100    2

6   94.688716           0.003    1

update: If you really insist on masking on the Tag==6:

s = df.tag.eq(6).shift()

s = s.where(s).ffill()

percents = df.percent_change.where(s,0)+1

df['cases'] = df.cases.ffill()*percents.cumprod()

 Improve your knowledge in data science from scratch using Data science online courses

Browse Categories

...