I have a problem with the pandas apply function when using multiple columns of the following DataFrame:

import numpy as np
import pandas as pd

# Two numeric columns ('a', 'c') and one string column ('b')
df = pd.DataFrame({'a': np.random.randn(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': np.random.randn(6)})

and the following function:

def my_test(a, b):
    return a % b

When I try to apply this function with:

df['Value'] = df.apply(lambda row: my_test(row[a], row[c]), axis=1)

I get the error message:

NameError: ("global name 'a' is not defined", u'occurred at index 0')

I do not understand this message; as far as I can tell, I defined the name properly.

I would highly appreciate any help on this issue.

Update

Thanks for your help. I did indeed make a syntax mistake in the code: the column names should be in quotes. However, I still get the same issue using a more complex function such as:

def my_test(a):
    cum_diff = 0
    for ix in df.index():
        cum_diff = cum_diff + (a - df['a'][ix])
    return cum_diff

1 Answer


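You forgot the quotes around the column names. Without them, row[a] looks up a Python variable named a, which does not exist, hence the NameError. With quotes, row['a'] selects the column labeled 'a':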

In [43]: df['Value'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)

In [44]: df
Out[44]:
          a    b         c     Value
0 -1.674308  foo  0.343801  0.044698
1 -2.163236  bar -2.046438 -0.116798
2 -0.199115  foo -0.458050 -0.199115
3  0.918646  bar -0.007185 -0.001006
4  1.336830  foo  0.534292  0.268245
5  0.976844  bar -0.773630 -0.570417
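
The function in the question's update is not addressed above, and the exact error it produced is not shown. A minimal sketch of one likely cause follows. It assumes the function is meant to be applied to column 'a' alone, and it swaps the random data for small fixed values (purely illustrative) so the result can be checked by hand. In pandas, df.index is an attribute, not a method, so calling df.index() raises a TypeError; iterating over df.index directly avoids that.

```python
import pandas as pd

# Illustrative fixed values instead of np.random.randn, so results are reproducible
df = pd.DataFrame({'a': [1.0, 2.0, 4.0],
                   'b': ['foo', 'bar', 'foo'],
                   'c': [3.0, 5.0, 6.0]})

def my_test(a):
    cum_diff = 0
    # df.index is an attribute: iterate over it without calling it
    for ix in df.index:
        cum_diff = cum_diff + (a - df['a'][ix])
    return cum_diff

# Apply to a single column: each element of df['a'] is passed in as `a`
df['Value'] = df['a'].apply(my_test)

# Equivalent without a Python loop:
# the sum over i of (a - a_i) equals n * a - sum(a)
df['Value_vec'] = len(df) * df['a'] - df['a'].sum()

print(df)
```

Both columns should hold the same numbers. The loop version rescans the whole column for every row, so the arithmetic form is usually the better choice on larger frames.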
