Dynamic Expression Evaluation in pandas using pd.eval()

Question

asked Sep 12, 2019 in Data Science by ashely (50.2k points)

Given two DataFrames

import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df1
   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
3  8  8  1  6
4  7  7  8  1

df2
   A  B  C  D
0  5  9  8  9
1  4  3  0  3
2  5  0  2  3
3  8  1  3  3
4  3  7  0  1

I would like to perform arithmetic on one or more columns using pd.eval. Specifically, I would like to port the following code:

x = 5
df2['D'] = df1['A'] + (df1['B'] * x)

...to code using eval. The reason for using eval is that I would like to automate many workflows, so creating them dynamically will be useful to me.

I am trying to better understand the engine and parser arguments to determine how best to solve my problem. I have gone through the documentation but the difference was not made clear to me.

What arguments should be used to ensure my code runs at maximum performance?
Is there a way to assign the result of the expression back to df2?
Also, to make things more complicated, how do I pass x as an argument inside the string expression?

1 Answer

vinita · Answer 1 · 2019-09-12T14:09:26+0000

Before jumping into eval/query, note that both are considerably slower than plain pandas operations when your dataset has fewer than about 15,000 rows.

In that case, simply use df.loc[mask1, mask2].

See the pandas documentation on this point: https://pandas.pydata.org/pandas-docs/version/0.22/enhancingperf.html#enhancingperf-eval
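
For larger frames, here is a minimal sketch of how the expression from the question could be ported to pd.eval. It is one possible approach, not the only one. The DataFrame values are copied from the output printed in the question. Passing the variables through local_dict is a choice made here so that the expression sees only the names it is given, which seems to suit dynamically generated expressions.

```python
import pandas as pd

# Values copied from the df1 and df2 output shown in the question.
df1 = pd.DataFrame(
    [[5, 0, 3, 3], [7, 9, 3, 5], [2, 4, 7, 6], [8, 8, 1, 6], [7, 7, 8, 1]],
    columns=list("ABCD"),
)
df2 = pd.DataFrame(
    [[5, 9, 8, 9], [4, 3, 0, 3], [5, 0, 2, 3], [8, 1, 3, 3], [3, 7, 0, 1]],
    columns=list("ABCD"),
)
x = 5

# The expression is an ordinary string, so it can be built at runtime.
# "D = ..." assigns the result to column D of whatever is passed as target.
expr = "D = df1.A + (df1.B * x)"

# local_dict lists the names the expression may use (here df1 and x).
# With the default inplace=False, pd.eval returns a modified copy of df2
# and leaves df2 itself unchanged.
result = pd.eval(expr, local_dict={"df1": df1, "x": x}, target=df2)
print(result)
```

To write the column into df2 directly, pass inplace=True instead of capturing the returned copy. The engine and parser arguments are left at their defaults in this sketch; by default pandas uses the numexpr engine when numexpr is installed and falls back to the python engine otherwise.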

