I am working on jupyter notebook were I am using 2 data frames.
df = pd.DataFrame({"Date": ['2014-11-21 11:00:00', '2014-11-21 11:00:03', '2014-11-21 11:00:04', '2014-11-21 11:00:05', '2014-11-21 11:00:07', '2014-11-21 11:00:08', '2014-11-21 11:00:10', '2014-11-21 11:00:11', '2014-10-24 10:00:55', '2014-10-24 10:00:59'], "A":[1, 2, 5, 3, 9, 6, 3, 0, 8, 10]})
Date A
0 2014-11-21 11:00:00 1
1 2014-11-21 11:00:03 2
2 2014-11-21 11:00:04 5
3 2014-11-21 11:00:05 3
4 2014-11-21 11:00:07 9
5 2014-11-21 11:00:08 6
6 2014-11-21 11:00:10 3
7 2014-11-21 11:00:11 0
8 2014-10-24 10:00:55 8
9 2014-10-24 10:00:59 10
The 2nd data frame contains the Date and time range as shown below:
info = pd.DataFrame({"Start": ['2014-11-21 11:00:00', '2014-11-21 11:08:00', '2014-10-24 10:55:00'], "Stop": ['2014-11-21 11:07:00', '2014-11-21 11:11:00', '2014-10-24 10:59:00']})
Start Stop
0 2014-11-21 11:00:00 2014-11-21 11:00:07
1 2014-11-21 11:00:08 2014-11-21 11:00:11
2 2014-10-24 10:00:55 2014-10-24 10:00:59
I am trying to get the cumulated sum of the data frame using the second window. It should only be done if and only if the row in the data frame is present in the range of that particular row. For example, the cumulated sum of the row with date 2014-11-21 11:00:08 should be 0 because its the beginning of the range. The cumulative for date 2014-11-21 11:00:07 should be 12(9+3).
I have given the code below for getting the result but I am not getting expected result:
import pandas as pd
import numpy as np
df = pd.DataFrame({"Date": ['2014-11-21 11:00:00', '2014-11-21 11:00:03', '2014-11-21 11:00:04', '2014-11-21 11:00:05', '2014-11-21 11:00:07', '2014-11-21 11:00:08', '2014-11-21 11:00:10', '2014-11-21 11:00:11', '2014-10-24 10:00:55', '2014-10-24 10:00:59'], "A":[1, 2, 5, 3, 9, 6, 3, 0, 8, 10]})
info = pd.DataFrame({"Start": ['2014-11-21 11:00:00', '2014-11-21 11:00:08', '2014-10-24 10:00:55'], "Stop": ['2014-11-21 11:00:07', '2014-11-21 11:00:11', '2014-10-24 10:00:59']})
#info = pd.DataFrame({"Start": ['2014-11-21 11:00:00', '2014-11-21 11:00:00', '2014-11-21 11:00:00', '2014-11-21 11:00:01', '2014-11-21 11:00:02', '2014-11-21 11:00:03', '2014-11-21 11:00:04', '2014-11-21 11:00:05'], "Stop": ['2014-11-21 11:00:00', '2014-11-21 11:00:01', '2014-11-21 11:00:02', '2014-11-21 11:00:03', '2014-11-21 11:00:04', '2014-11-21 11:00:05', '2014-11-21 11:00:06', '2014-11-21 11:00:07']})
info['groupnum']=info.index
info.Start=pd.to_datetime(info.Start)
info.Stop=pd.to_datetime(info.Stop)
cinfo = info.set_index(pd.IntervalIndex.from_arrays(info.Start, info.Stop, closed='both'))['groupnum']
df['groupnum']=pd.to_datetime(df.Date).map(cinfo)
df['cum'] = df.groupby('groupnum').A.cumsum()
print(df)
output:
Date A group num cum
0 2014-11-21 11:00:00 1 0 1
1 2014-11-21 11:00:03 2 0 3
2 2014-11-21 11:00:04 5 0 8
3 2014-11-21 11:00:05 3 0 11
4 2014-11-21 11:00:07 9 0 20
5 2014-11-21 11:00:08 6 1 6
6 2014-11-21 11:00:10 3 1 9
7 2014-11-21 11:00:11 0 1 9
8 2014-10-24 10:00:55 8 2 8
9 2014-10-24 10:00:59 10 2 18
But i wanted the result in way:
Date A group num cum
0 2014-11-21 11:00:00 1 0 1
1 2014-11-21 11:00:03 2 0 2
2 2014-11-21 11:00:04 5 0 7
3 2014-11-21 11:00:05 3 0 10
4 2014-11-21 11:00:07 9 0 12
5 2014-11-21 11:00:08 6 1 6
6 2014-11-21 11:00:10 3 1 9
7 2014-11-21 11:00:11 0 1 3
8 2014-10-24 10:00:55 8 2 8
9 2014-10-24 10:00:59 10 2 10