2 views

I have the below dataframe:

import pandas as pd

data = pd.DataFrame({'start':['2020-08-01','2020-08-02','2020-08-03','2020-08-04','2020-08-05','2020-08-06','2020-08-07','2020-08-08'],

'end':['2020-08-03','2020-08-03','2020-08-06','2020-08-06','2020-08-06','2020-08-08','2020-08-08','2020-08-08'],

'score':[74, 81, 38, 49, 79, 17, 53, 69]})

I am trying to compute the score difference between the start date and its corresponding end date as:

start         end  score  result

0   2020-08-01  2020-08-03     74      36  # 74-38 as score on 08/03 is 38

1   2020-08-02  2020-08-03     81      43  # 81-38

2   2020-08-03  2020-08-06     38      21  # 38-17 as score on 08/06 is 17

3   2020-08-04  2020-08-06     49      32  # 49-17

4   2020-08-05  2020-08-06     79      62  # 79-17

5   2020-08-06  2020-08-08     17     -52  # 17-69 as score on 08/08 is 69

6   2020-08-07  2020-08-08     53     -16  # 53-69

7   2020-08-08  2020-08-08     69       0  # 69-69

Is there the good pandas way to do this?

by (36.8k points)

Suppose if start values are unique then subtracting them by using mapped values:

data['result'] = data['score'].sub(data['end'].map(data.set_index('start')['score']))

print (data)

start         end  score  result

0  2020-08-01  2020-08-03     74      36

1  2020-08-02  2020-08-03     81      43

2  2020-08-03  2020-08-06     38      21

3  2020-08-04  2020-08-06     49      32

4  2020-08-05  2020-08-06     79      62

5  2020-08-06  2020-08-08     17     -52

6  2020-08-07  2020-08-08     53     -16

7  2020-08-08  2020-08-08     69       0

Detail:

print (data['end'].map(data.set_index('start')['score']))

0    38

1    38

2    17

3    17

4    17

5    69

6    69

7    69

Name: end, dtype: int64

Learn data science with python course to improve your technical knowledge.