+5 votes
2 views
in Python by (1.4k points)

I have 2 columns:

X Y

1 3

1 4

2 6

1 6

2 3

How to sum up values of Y where X=1 e.g this will give me [3+4+6=13] in pandas?

2 Answers

+2 votes
by (10.9k points)
edited by

 @Alex , you may refer to the following approaches:

1.Using groupby() which splits the dataframe into parts according to the value in column ‘X’ -

df.groupby('X')['Y'].sum()[1]

13

2.Similarly, we can use Boolean indexing where loc is used to handle indexing of rows and columns-

df.loc[df['X'] == 1, 'Y'].sum()

13

3.Query can also be used in order to filter rows you are interested in-

df.query("X == 1")['Y'].sum()

13

Similarly, if you had three columns :

Ex-

       X Y  Z
       1 3   2

       1 4   2

       2 6   2

       1 6   2

       2  3  2

And you want to sum the rows of Y where Z is 2 and X is 2 ,then we may use the following:

1.groupby()

   df.groupby('X')['Y'].sum()

2.Query

df.query("X == 2 and Z == 2")['Y'].sum()

3.Boolean indexing

df.loc[(df['X'] == 2) & (df['Z'] == 2), 'Y'].sum() 

If You want to learn python for data science visit this python course by Intellipaat.

0 votes
by (107k points)

You can also do this without using groupby or loc. By simply including the condition in code. Let the name of dataframe be df. Then you can try :

df[df['a']==1]['b'].sum()

or you can also try :

sum(df[df['a']==1]['b'])

Another way could be to use the numpy library of python :

import numpy as np

print(np.where(df['a']==1, df['b'],0).sum())

You can use the following video tutorials to clear all your doubts:-

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...