Given two DataFrames

import numpy as np
import pandas as pd

np.random.seed(0)

df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))

df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))

df1

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
3  8  8  1  6
4  7  7  8  1

df2

   A  B  C  D
0  5  9  8  9
1  4  3  0  3
2  5  0  2  3
3  8  1  3  3
4  3  7  0  1

I would like to perform arithmetic on one or more columns using pd.eval. Specifically, I would like to port the following code:

x = 5

df2['D'] = df1['A'] + (df1['B'] * x) 

...to code using eval. The reason for using eval is that I would like to automate many workflows, so creating them dynamically will be useful to me.

I am trying to better understand the engine and parser arguments to determine how best to solve my problem. I have gone through the documentation but the difference was not made clear to me.

  1. What arguments should be used to ensure my code runs at maximum performance?

  2. Is there a way to assign the result of the expression back to df2?

  3. Also, to make things more complicated, how do I pass x as an argument inside the string expression?

1 Answer


Before reaching for eval/query, note that both perform noticeably worse than plain pandas operations if your dataset has fewer than about 15,000 rows.

In that case, simply use ordinary indexing and arithmetic, e.g. df.loc[mask1, mask2].

See the pandas documentation on this topic: https://pandas.pydata.org/pandas-docs/version/0.22/enhancingperf.html#enhancingperf-eval
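For the three specific questions, here is a minimal sketch rather than a definitive recipe. It assumes a reasonably recent pandas; the variable names expected, result, via_method and df2_new are made up for illustration. In a top-level pd.eval call, local variables such as x are referred to by plain name (the @ prefix is only for DataFrame.eval and DataFrame.query), and the target argument is what assigns the result back to df2. If engine is left unset, pandas uses numexpr when it is installed and falls back to the python engine otherwise; parser='pandas' is the default.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
x = 5

# Plain pandas version from the question, kept as a reference result.
expected = df1['A'] + (df1['B'] * x)

# Passing x: in top-level pd.eval, refer to local variables by name (no '@').
# engine='python' always works; omit it to let pandas pick numexpr if installed.
result = pd.eval("df1.A + (df1.B * x)", engine="python", parser="pandas")

# With DataFrame.eval, columns are bare names and locals need the '@' prefix.
via_method = df1.eval("A + (B * @x)")

# Assigning back: with target=df2 and the default inplace=False,
# pd.eval returns a modified copy and leaves df2 untouched.
df2_new = pd.eval("D = df1.A + (df1.B * x)", target=df2)

# With inplace=True, df2 itself is modified and the call returns None.
pd.eval("D = df1.A + (df1.B * x)", target=df2, inplace=True)
```

Whether engine='numexpr' actually helps depends on the data size, so it is worth timing both engines on your own workload before committing to one.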
