
I am trying to create a new column that is True when the last n rows of another column are all True. My code works, but it is very slow. This is my code:

import numpy as np

import pandas as pd

dfx = pd.DataFrame({'A':[False,False,False,False,True,True,True,True,False,True]})

n=2 ## n to cover 10 min range samples 

cl_id = dfx.columns.tolist().index('A')  ### cl_id for index number of the column for using in .iloc 

l1=[False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]

dfx['B'] = l1

print(dfx)

   #old_col   # New_col

       A      B

0  False  False

1  False  False

2  False  False

3  False  False

4   True  False

5   True   True  ## Here A col last two rows True, hence True

6   True   True  ## Here A col last two rows True, hence True

7   True   True  ## Here A col last two rows True, hence True

8  False  False

9   True  False

Can anyone suggest a faster way to achieve this?

1 Answer


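You can use pandas.Series.rolling: take the rolling sum over a window of n rows and check whether it equals n, which is only the case when every value in the window is True: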

n = 2

dfx["A"].rolling(n).sum().eq(n)

Output:

0    False

1    False

2    False

3    False

4    False

5     True

6     True

7     True

8    False

9    False

Name: A, dtype: bool
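To make the reasoning behind the one-liner explicit, here is a minimal sketch that wraps it in a helper. The function name all_true_in_last_n and the explicit astype(int) cast are my own additions for illustration, not part of the answer above. Each True counts as 1, so a window of n rows sums to n only when every row in it is True, and the first n-1 rows have incomplete windows that sum to NaN and therefore compare as False.

```python
import pandas as pd


def all_true_in_last_n(flags: pd.Series, n: int) -> pd.Series:
    """Return True where the current row and the n-1 rows before it are all True.

    Hypothetical helper name, used here only for illustration.
    """
    # Cast to int so each True counts as 1 and each False as 0.
    window_sums = flags.astype(int).rolling(n).sum()
    # Incomplete windows (the first n-1 rows) sum to NaN, and NaN == n is False.
    return window_sums.eq(n)


dfx = pd.DataFrame(
    {"A": [False, False, False, False, True, True, True, True, False, True]}
)
dfx["B"] = all_true_in_last_n(dfx["A"], n=2)
print(dfx)
```

One difference to be aware of: the code in the question forces the first n rows to False, while rolling(n) leaves only the first n-1 rows without a full window. The two agree on the sample data, but they would differ at row n-1 if the first n values of A were all True.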

Benchmark against the OP's code on 10,000 rows (the rolling version is roughly 1,300x faster in this run):

dfx = pd.DataFrame({'A':[False,False,False,False,True,True,True,True,False,True]*1000}) 

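%timeit -n10 dfx["A"].rolling(n).sum().eq(n)

# 702 µs ± 88.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n10 [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]

# 908 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Assignments made inside %timeit are not kept, so compute both results once more to compare them

l1 = dfx["A"].rolling(n).sum().eq(n)

l2 = [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]

l1.tolist() == l2

# True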
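%timeit is an IPython/Jupyter magic, so the benchmark above will not run in a plain Python script. As a rough sketch, the same comparison can be done with the standard-library timeit module. The function names rolling_version and loop_version are made up for this example, and the frame is deliberately smaller (1,000 rows) so the slow loop finishes quickly. Absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 10,000-row frame above, to keep the slow loop quick.
dfx = pd.DataFrame(
    {"A": [False, False, False, False, True, True, True, True, False, True] * 100}
)
n = 2
cl_id = dfx.columns.get_loc("A")  # positional index of column A for .iloc


def rolling_version():
    # Vectorised: rolling sum over n rows equals n only if all are True.
    return dfx["A"].rolling(n).sum().eq(n)


def loop_version():
    # The approach from the question: slice and test every window in Python.
    return [False] * n + [
        all(dfx.iloc[x + 1 - n:x + 1, cl_id].tolist())
        for x in np.arange(n, len(dfx))
    ]


rolling_seconds = timeit.timeit(rolling_version, number=10) / 10
loop_seconds = timeit.timeit(loop_version, number=3) / 3
print(f"rolling: {rolling_seconds:.6f} s per call")
print(f"loop:    {loop_seconds:.6f} s per call")

# Check that both approaches produce the same result on this data.
same = rolling_version().tolist() == loop_version()
print(same)
```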
