Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

I need to merge two points together when they are within +2.75 and -2.75 from each other. The two points are in two separate dataframes (both processed in earlier steps of the pipeline).

I think merge_asof can help me in this case with a tolerance of 2.75 and direction 'nearest'.

However, I get an error:

MergeError: key must be integer, timestamp or float

This is one of the two dataframes:

    Unnamed: 0  Section_id  Section_location    

36015   36015   055_305AR_10.8  397.0   

7344    7344    055_305AR_10.8  659.0   

Now I have a second dataframe with also the Section_id, and section_locations like 402.5. Hence I want to merge if (in this example) the section_location of the second dataframe is bigger or equal than 394.25 and smaller or equal than 399.75.

I also sorted the values of both the dataframes with section_id and section_location.

I tried the following code, which gets the error.

 def mergeasof_dfs(df1, df2):

    return pd.merge_asof(left = df1, right = df2, 

                         on='Section_id', 

                         by='Section_location',

                         tolerance = 2.75,

                         direction = 'nearest'

                        )

---------------------------------------------------------------------------

MergeError                                Traceback (most recent call last)

<ipython-input-66-641a0dfae9af> in <module>

----> 1 test = mergeasof_dfs(df1, df2)

<ipython-input-65-bc88146fa086> in mergeasof_dfs(df1, df2)

      5                          by='Section_location',

      6                          tolerance = 2.75,

----> 7                          direction = 'nearest'

      8                         )

Error:

MergeError: key must be integer, timestamp or float

1 Answer

0 votes
by (41.4k points)

Create helper integer columns for merging :

First is created DataFrame by concat with keys parameter and new column by factorize for integers.

 

df1 = pd.DataFrame({

        'Section_location':list('abcymdc'),

 

})

df2 = pd.DataFrame({

        'Section_location':list('abhucda'),

 

})

 

 

df3 = pd.concat([df1[['Section_location']],df2[['Section_location']]], keys=('df1','df2'))

df3['Section_id_new'] = pd.factorize(df3['Section_location'])[0]

print (df3)

      Section_location  Section_id_new

df1 0                a               0

    1                b               1

    2                c               2

    3                y               3

    4                m               4

    5                d               5

    6                c               2

df2 0                a               0

    1                b               1

    2                h               6

    3                u               7

    4                c               2

    5                d               5

    6                a               0

df1['Section_id_new'] = df3.loc['df1', 'Section_id_new']

print (df1)

df2['Section_id_new'] = df3.loc['df2', 'Section_id_new']

print (df2)

  Section_location  Section_id_new

0                a               0

1                b               1

2                c               2

3                y               3

4                m               4

5                d               5

6                c               2

  Section_location  Section_id_new

0                a               0

1                b               1

2                h               6

3                u               7

4                c               2

5                d               5

6                a               0

So, solution is

 def mergeasof_dfs(df1, df2):

 

    df3 = pd.concat([df1[['Section_location']],df2[['Section_location']]], keys=('df1','df2'))

    df3['Section_id_new'] = pd.factorize(df3['Section_location'])[0]

    df1['Section_id_new'] = df3.loc['df1', 'Section_id_new']

    df2['Section_id_new'] = df3.loc['df2', 'Section_id_new']

 

    df = pd.merge_asof(left = df1, right = df2, 

                         on='Section_id_new', 

                         by='Section_location',

                         tolerance = 2.75,

                         direction = 'nearest'

                        )

    return df.drop('Section_id_new', axis=1)

 

Browse Categories

...