in Data Science by (50.2k points)

Given two DataFrames:

import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))

df1
   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
3  8  8  1  6
4  7  7  8  1

df2
   A  B  C  D
0  5  9  8  9
1  4  3  0  3
2  5  0  2  3
3  8  1  3  3
4  3  7  0  1

I would like to perform arithmetic on one or more columns using pd.eval. Specifically, I would like to port the following code:

x = 5
df2['D'] = df1['A'] + (df1['B'] * x)


...into code that uses eval. I want to use eval because I would like to automate many workflows, and being able to create the expressions dynamically will be useful.

I am trying to better understand the engine and parser arguments to determine how best to solve my problem. I have gone through the documentation, but the difference between them was not clear to me.

  1. Which arguments should I use to ensure my code runs at maximum performance?
  2. Is there a way to assign the result of the expression back to df2?
  3. Also, to make things more complicated, how do I pass x as an argument inside the string expression?

1 Answer

by (107k points)

Before reaching for eval/query, note that they are slower than plain pandas operations on small DataFrames: below roughly 15,000 rows, the overhead of parsing the expression outweighs any speed-up.
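One way to check the 15,000-row claim on your own machine is to time both forms. This is a minimal sketch, not the benchmark behind that figure: the two frame sizes, the 50 repetitions, and the use of `numpy.random.default_rng` are assumptions chosen for illustration, and results will vary with hardware, dtypes, and whether numexpr is installed.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sizes chosen only to straddle the ~15,000-row figure;
# the real crossover depends on hardware, dtypes, expression complexity,
# and whether numexpr is installed.
rng = np.random.default_rng(0)
x = 5

for n_rows in (1_000, 100_000):
    df = pd.DataFrame(rng.integers(0, 10, size=(n_rows, 2)), columns=["A", "B"])

    # Sanity check: both forms must compute the same values.
    plain = df["A"] + df["B"] * x
    via_eval = pd.eval("df.A + df.B * x")
    assert (plain == via_eval).all()

    # Time 50 repetitions of each form on this frame.
    t_plain = timeit.timeit(lambda: df["A"] + df["B"] * x, number=50)
    t_eval = timeit.timeit(lambda: pd.eval("df.A + df.B * x"), number=50)
    print(f"{n_rows:>7} rows: plain {t_plain:.4f}s, pd.eval {t_eval:.4f}s")
```

If the claim holds on your setup, pd.eval should be noticeably slower than the plain expression at the smaller size.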

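For frames that small, use ordinary pandas operations instead: plain column arithmetic in place of eval, and boolean indexing such as df.loc[mask1, mask2] in place of query.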
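To make the df.loc[mask1, mask2] suggestion concrete, here is a minimal sketch using df1 from the question. The filter conditions and the column allow-list are hypothetical, chosen only for illustration; the point is that a boolean row mask reproduces what query returns, and a second boolean mask can select columns.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list("ABCD"))

# Row selection with query() versus an equivalent boolean mask with .loc.
via_query = df1.query("A > 4 and B < 8")
row_mask = (df1["A"] > 4) & (df1["B"] < 8)
via_loc = df1.loc[row_mask]
assert via_query.equals(via_loc)

# .loc also accepts a boolean mask on the column axis ("mask2");
# here it keeps only the columns in a hypothetical allow-list.
col_mask = df1.columns.isin(["A", "C"])
subset = df1.loc[row_mask, col_mask]
print(subset)
```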

Refer to the following section of the pandas documentation for details: https://pandas.pydata.org/pandas-docs/version/0.22/enhancingperf.html#enhancingperf-eval
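For larger frames, where eval is worthwhile, the question's code can be ported to pd.eval. This is a minimal sketch assuming a recent pandas version, not a definitive recipe. It uses the `target` argument of `pandas.eval` to assign the result into df2 (question 2), resolves `x` by name from the calling scope or through `local_dict` (question 3), and sets `engine`/`parser` explicitly (question 1). `engine="python"` is chosen here only because it does not require numexpr; the default engine uses numexpr when it is installed, which is the option aimed at large frames. The default `parser="pandas"` gives `&` and `|` the same precedence as `and` and `or`, while `parser="python"` keeps strict Python semantics; for a purely arithmetic expression like this one, both give the same result.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list("ABCD"))
x = 5

# Reference result from the plain pandas code in the question.
expected = df1["A"] + (df1["B"] * x)

# Assignment inside the expression plus target= writes column D into a
# copy of df2; with inplace=False (the default) that copy is returned.
# x is a plain Python variable, so pd.eval resolves it by name from the
# calling scope (top-level pd.eval does not accept the '@' prefix).
df2 = pd.eval("D = df1.A + (df1.B * x)", target=df2)

# engine/parser set explicitly; local_dict passes the names in directly
# instead of relying on the calling scope.
explicit = pd.eval(
    "df1.A + (df1.B * x)",
    engine="python",
    parser="pandas",
    local_dict={"df1": df1, "x": x},
)

# DataFrame.eval works on a single frame's columns; there '@x' refers to a
# local Python variable.
via_method = df1.eval("A + B * @x")

assert (df2["D"] == expected).all()
assert (explicit == expected).all()
assert (via_method == expected).all()
print(df2)
```

Passing `inplace=True` together with `target=df2` modifies df2 directly instead of returning a copy.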

