0 votes
1 view
in Data Science by (17.6k points)

Suppose I have dataframe df1 which includes two columns - A & B. Value of A represents the lower range and value of B represents the upper range.

  A     B

10.5  20.5

30.5  40.5

50.5  60.5

I've another dataframe which includes two columns - C & D containing a different range of numbers.

  C     D

12.34  15.90

13.68  19.13

33.5   35.60

35.12  38.76

50.6   59.1

Now I want to list all the pairs from df2 that fall under the groups (between the lower and upper range) in the df1.

Final output should be like this -

     Key                Values

(10.5, 20.5)  [(12.34, 15.90), (13.68, 19.13)]

(30.5, 40.5)  [(33.5, 35.60), (35.12, 38.76)]

(50.5, 60.5)  [(50.6, 59.1)]

The solution should be efficient as I have 5000 groups of range and 85000 numbers from different range.

1 Answer

0 votes
by (41.4k points)

Firstly, you have to  create output DF with the keys from df with A and B

df = pd.DataFrame({'Key':[(a,b) for a,b in df1.itertuples(index=False)]})

Then, you have to define a function to find range in df2 within the range from the Keys column

def find_range( key, df_2):

    mask = (key[0] <= df_2['C']) & (key[1] >= df_2['D'])

    return [(c,d) for c,d in df_2[mask].itertuples(index=False)]

Now using apply for creating the column Values.

df['Values'] = df['Key'].apply(find_range, args=(df2,))

 Output:

            Key                           Values

0  (10.5, 20.5)  [(12.34, 15.9), (13.68, 19.13)]

1  (30.5, 40.5)                   [(33.5, 35.6)]

If you want to be build successful data science career then enroll for best data science certification.

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...