Merge two dataframes on closest matching datetime index

Question

asked Aug 13, 2020 in Data Science by blackindya (18.4k points)

I have two data frames with the closest matching DateTime index, sometimes matching. An object is to merge two of them using one index as a reference and appending a second to the closest matching (within 1 minute) on the first one.

My code and output:

import pandas as pd
masterdf = pd.DataFrame({"AA":[77.368607,77.491655,77.425134,76.490991]})
masterdf.index = ['2019-10-01 07:52:07','2019-10-01 07:53:01','2019-10-01 07:53:54','2019-10-01 07:54:47']
masterdf.index.name = 'datetime'
slavedf = pd.DataFrame({"BB":[50,60,70,80]})
slavedf.index = ['2019-10-01 07:53:00','2019-10-01 07:53:54','2019-10-01 10:54:47','2019-10-01 10:00:00']
slavedf.index.name = 'datetime'
maindf = masterdf.merge(slavedf,left_index=True,right_index=True)

Present output:

masterdf =
AA
datetime
2019-10-01 07:52:07 77.368607
2019-10-01 07:53:01 77.491655
2019-10-01 07:53:54 77.425134
2019-10-01 07:54:47 76.490991
slavedf =
BB
datetime
2019-10-01 07:53:00 50
2019-10-01 07:53:54 60
2019-10-01 10:54:47 70
2019-10-01 10:00:00 80
maindf =
datetime AA BB
2019-10-01 07:53:54 77.425134 60

Expected output:

maindf =
datetime AA BB
2019-10-01 07:53:01 77.491655 50
2019-10-01 07:53:54 77.425134 60

How do I achieve this?

1 Answer

supriya · Answer 1 · 2020-08-13T10:07:25+0000

Logic here use a merge_asof , we need to adjust it due to , merge_asof will use the 2nd dataframe mutiple times , then we need additional key here is datetime to drop a duplicate

masterdf.index=pd.to_datetime(masterdf.index)
masterdf=masterdf.sort_index().reset_index()
slavedf.index=pd.to_datetime(slavedf.index)
slavedf=slavedf.sort_index().reset_index()
slavedf['datetime2']=slavedf['datetime']
slavedf['key']=slavedf.index
newdf=pd.merge_asof(masterdf,slavedf,on='datetime',tolerance=pd.Timedelta('60s'),direction='nearest')
newdf['diff']=(newdf.datetime-newdf.datetime2).abs()
newdf=newdf.sort_values('diff').drop_duplicates('key')
newdf
Out[35]:
datetime AA BB datetime2 diff
2 2019-10-01 07:53:54 77.425134 60 2019-10-01 07:53:54 00:00:00
1 2019-10-01 07:53:01 77.491655 50 2019-10-01 07:53:00 00:00:01

If you are a beginner and want to know more about Data Science the do check out the Data Science course

Merge two dataframes on closest matching datetime index

1 Answer

Related questions

Browse Categories