
I often use Pandas mask and where methods for cleaner logic when updating values in a series conditionally. However, for relatively performance-critical code I notice a significant performance drop relative to numpy.where.

While I'm happy to accept this for specific cases, I'm interested to know:

  1. Do Pandas mask / where methods offer any additional functionality, apart from the inplace / errors / try_cast parameters? I understand those 3 parameters but rarely use them. For example, I have no idea what the level parameter refers to.
  2. Is there any non-trivial counter-example where mask / where outperforms numpy.where? If such an example exists, it could influence how I choose appropriate methods going forward.

For reference, here's some benchmarking on Pandas 0.19.2 / Python 3.6.0:

import numpy as np
import pandas as pd

np.random.seed(0)
n = 10000000
df = pd.DataFrame(np.random.random(n))

# Sanity check: both approaches produce identical results
assert (df[0].mask(df[0] > 0.5, 1).values == np.where(df[0] > 0.5, 1, df[0])).all()

%timeit df[0].mask(df[0] > 0.5, 1)       # 145 ms per loop
%timeit np.where(df[0] > 0.5, 1, df[0])  # 113 ms per loop

The performance appears to diverge further for non-scalar values:

%timeit df[0].mask(df[0] > 0.5, df[0]*2)       # 338 ms per loop
%timeit np.where(df[0] > 0.5, df[0]*2, df[0])  # 153 ms per loop

1 Answer


Pandas has the potential to be at least slightly faster than NumPy in some cases. However, pandas' somewhat opaque handling of data copying makes it hard to predict when that potential is overshadowed by unnecessary copies. When where/mask is the actual bottleneck, I would use numba or cython to improve performance.

The idea is to start from the

np.where(df[0] > 0.5, df[0]*2, df[0])

version and eliminate the need to materialize the temporary array df[0]*2.
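As a concrete illustration of that idea, here is a minimal pure-NumPy sketch (the function name double_where_gt_half is mine, not from the original post; a numba- or cython-compiled loop could fuse the operations further):

```python
import numpy as np

def double_where_gt_half(arr):
    """Equivalent to np.where(arr > 0.5, arr * 2, arr), but without
    materializing a full-size arr * 2 temporary array."""
    out = arr.copy()    # one copy of the input instead of two temporaries
    mask = arr > 0.5
    out[mask] *= 2      # double only the selected elements, in place
    return out
```

For example, double_where_gt_half(df[0].values) matches the np.where expression above while only writing to the masked elements.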
