Intellipaat

in Data Science by (18.4k points)

I am trying to create a new column that is True when the last n rows of another column are all True. My code works, but it is very slow. This is my code:

import numpy as np
import pandas as pd

dfx = pd.DataFrame({'A':[False,False,False,False,True,True,True,True,False,True]})
n = 2  ## n to cover 10 min range samples
cl_id = dfx.columns.tolist().index('A')  ### cl_id is the position of column A, used in .iloc

## the first n rows are padded with False; every later row checks the window of n rows ending at it
l1 = [False]*n + [all(dfx.iloc[x+1-n:x+1, cl_id].tolist()) for x in np.arange(n, len(dfx))]
dfx['B'] = l1
print(dfx)

   #old_col   # New_col
       A      B
0  False  False
1  False  False
2  False  False
3  False  False
4   True  False
5   True   True  ## the last two rows of A are True, hence True
6   True   True  ## the last two rows of A are True, hence True
7   True   True  ## the last two rows of A are True, hence True
8  False  False
9   True  False

Can anyone suggest a faster way to do this?

1 Answer

by (36.8k points)

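You can use pandas.Series.rolling: take a rolling sum over the last n values (True counts as 1) and check whether it equals n. The first n-1 rows do not have a complete window, so their sum is NaN and the comparison gives False: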

n = 2
dfx["A"].rolling(n).sum().eq(n)
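A self-contained version of the same rolling-sum idea, wrapped in a small helper so it can be reused for any n. The name `last_n_all_true` is hypothetical (it is not a pandas API), and the sketch assumes the column holds only True/False values with no missing data.

```python
import pandas as pd


def last_n_all_true(s: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: True where the current row and the n-1 rows before it are all True."""
    # Cast booleans to 0/1 so the rolling sum counts the True values in each window.
    window_sum = s.astype(int).rolling(n).sum()
    # The first n-1 windows are incomplete, so their sum is NaN and .eq(n) returns False there.
    return window_sum.eq(n)


dfx = pd.DataFrame({'A': [False, False, False, False, True, True, True, True, False, True]})
dfx['B'] = last_n_all_true(dfx['A'], n=2)
print(dfx)
```

Changing `n` only changes the window length; for example `n=3` marks rows 6 and 7, the only rows preceded by two more True values.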

Output:

0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8    False
9    False
Name: A, dtype: bool
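One behavioural difference from the question's list comprehension is worth knowing: the OP's code pads the first n rows with False, while `rolling(n)` already produces a result at row n-1, the first complete window. The two agree on the example data because it starts with False values, but they differ when the first n rows are all True. A small check on made-up data:

```python
import numpy as np
import pandas as pd

n = 2
# Made-up data in which the first n rows are all True.
df = pd.DataFrame({'A': [True, True, False, True, True]})
cl_id = df.columns.get_loc('A')

# Question's approach: the first n rows are hard-coded to False.
loop_result = [False] * n + [all(df.iloc[x + 1 - n:x + 1, cl_id].tolist())
                             for x in np.arange(n, len(df))]

# Answer's approach: row n-1 already has a full window of n rows.
rolling_result = df['A'].rolling(n).sum().eq(n).tolist()

print(loop_result)     # [False, False, False, False, True]
print(rolling_result)  # [False, True, False, False, True]
```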

Benchmark against the OP's code in IPython/Jupyter, on 10,000 rows (more than 1000x faster):

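n = 2
dfx = pd.DataFrame({'A':[False,False,False,False,True,True,True,True,False,True]*1000})
cl_id = dfx.columns.tolist().index('A')

%timeit -n10 dfx["A"].rolling(n).sum().eq(n)
# 702 µs ± 88.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n10 [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]
# 908 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Variables assigned inside %timeit are not kept, so compute both results once to compare them:
l1 = dfx["A"].rolling(n).sum().eq(n)
l2 = [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]
l1.tolist() == l2
# True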
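If you are already working with NumPy arrays, the same window sum can be computed from a cumulative sum instead of `rolling`. This is a swapped-in technique, not part of the answer above, and the function name `last_n_all_true_np` is hypothetical. It is a sketch that reproduces the `rolling(n).sum().eq(n)` result, including False for the first n-1 rows; no timing is claimed for it.

```python
import numpy as np
import pandas as pd


def last_n_all_true_np(values, n: int) -> np.ndarray:
    """Hypothetical helper: out[i] is True when values[i-n+1 : i+1] are all True."""
    if n < 1:
        raise ValueError("n must be at least 1")
    a = np.asarray(values, dtype=np.int64)  # True -> 1, False -> 0
    # Prepend 0 so that csum[i + 1] - csum[i + 1 - n] is the sum of the window ending at row i.
    csum = np.concatenate(([0], np.cumsum(a)))
    out = np.zeros(len(a), dtype=bool)  # incomplete windows (first n-1 rows) stay False
    if n <= len(a):
        window_sums = csum[n:] - csum[:-n]  # one entry per complete window
        out[n - 1:] = window_sums == n
    return out


# Usage: compare with the rolling version on the benchmark-sized data.
n = 2
s = pd.Series([False, False, False, False, True, True, True, True, False, True] * 1000)
fast = last_n_all_true_np(s.to_numpy(), n)
print((fast == s.rolling(n).sum().eq(n).to_numpy()).all())  # True
```

The prepended 0 is what makes the subtraction line up: without it, the window ending at row n-1 would have no earlier cumulative value to subtract.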

