Python Dataframe Set True if last n rows are True

Question

asked Sep 4, 2020 in Data Science by blackindya (18.4k points)

I am trying to create a new column that is True when the last n rows of another column are all True. My code works, but it takes a lot of time to run. This is my code:

import numpy as np
import pandas as pd

dfx = pd.DataFrame({'A':[False,False,False,False,True,True,True,True,False,True]})
n = 2  # n to cover 10 min range samples
cl_id = dfx.columns.tolist().index('A')  # positional index of column 'A', for use with .iloc
l1 = [False]*n + [all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]
dfx['B'] = l1
print(dfx)
#old_col # New_col
A B
0 False False
1 False False
2 False False
3 False False
4 True False
5 True True ## Here A col last two rows True, hence True
6 True True ## Here A col last two rows True, hence True
7 True True ## Here A col last two rows True, hence True
8 False False
9 True False

Can anyone suggest a better way to achieve this?

1 Answer

supriya · Answer 1 · 2020-09-04T04:44:54+0000

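You can use pandas.Series.rolling: take the rolling sum over a window of n rows and check whether it equals n, which is only the case when every value in the window is True: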

n = 2
dfx["A"].rolling(n).sum().eq(n)
Output:
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 False
9 False
Name: A, dtype: bool
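
To make the rolling approach easier to reuse, here is a minimal sketch that wraps it in a function and writes the result back as column B. The function name all_last_n_true is hypothetical (it does not appear in the question or the answer), and the explicit astype(int) cast is an added assumption to make the counting of True values obvious:

```python
import pandas as pd


def all_last_n_true(series: pd.Series, n: int) -> pd.Series:
    """Return True where the current row and the n-1 rows before it are all True.

    Hypothetical helper name, used here only for illustration.
    """
    # Cast to int so the rolling sum counts the True values in each window.
    counts = series.astype(int).rolling(n).sum()
    # The first n-1 rows have an incomplete window, so their sum is NaN,
    # and comparing NaN with n yields False.
    return counts.eq(n)


dfx = pd.DataFrame({"A": [False, False, False, False, True, True, True, True, False, True]})
dfx["B"] = all_last_n_true(dfx["A"], n=2)
print(dfx)
```

One difference worth knowing: the loop in the question hard-codes the first n rows to False, whereas the rolling version can return True at index n-1 when the first n values are all True. With the sample data both give the same result.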

Benchmark against the OP's approach (the rolling version is about 1000x faster):

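dfx = pd.DataFrame({'A':[False,False,False,False,True,True,True,True,False,True]*1000})
%timeit -n10 dfx["A"].rolling(n).sum().eq(n)
# 702 µs ± 88.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n10 [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]
# 908 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Names assigned inside %timeit do not persist, so compute both results again for the comparison.
l1 = dfx["A"].rolling(n).sum().eq(n)
l2 = [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]
l1.tolist() == l2
# True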
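
The %timeit lines above only work in IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard-library timeit module can be used instead. The function names rolling_version and loop_version are hypothetical, and the data is 100 repetitions rather than 1000 so the slow loop finishes quickly; the timings will differ by machine:

```python
import timeit

import numpy as np
import pandas as pd

n = 2
dfx = pd.DataFrame({"A": [False, False, False, False, True, True, True, True, False, True] * 100})
cl_id = dfx.columns.get_loc("A")  # positional index of column 'A' for .iloc


def rolling_version():
    # Vectorized: a window of n rows is all True when its sum equals n.
    return dfx["A"].astype(int).rolling(n).sum().eq(n)


def loop_version():
    # The question's approach: slice each window with .iloc in a Python loop.
    return [False] * n + [
        all(dfx.iloc[x + 1 - n:x + 1, cl_id].tolist()) for x in np.arange(n, len(dfx))
    ]


# Check that both approaches agree on this data before timing them.
same = rolling_version().tolist() == loop_version()

# Total time in seconds for 10 calls of each function.
t_rolling = timeit.timeit(rolling_version, number=10)
t_loop = timeit.timeit(loop_version, number=10)

print(f"results match: {same}")
print(f"rolling: {t_rolling:.4f} s, loop: {t_loop:.4f} s for 10 runs each")
```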

