Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

I am working with 3 data frames, out of which 2 data frames contains additional bin number assigned to column data based on the range in which they belong (in separate columns).

df_1

A   B

5   6

8   1

6   7

4   9

1   3

9   2

2   5

df_2

A   B   A_bin   B_bin

5   6   2     2

8   1   1     1

6   7   3     2

4   9   3     3

1   3   1     1

9   2   1     1

2   5   2     2

df_3

A   B   C   D   A_bin   B_bin   C_bin   D_bin

5   6   2   6    2     2     1     2

8   1   6   4    1     1     2     2    

6   7   3   1    3     2     1     1    

4   9   1   9    3     3     1     3    

1   3   8   7    1     1     3     3    

9   2   4   8    1     1     2     3    

2   5   9   2    2     2     3     1

df_1 contain just two columns, df_2 have additional column which contain the bin assigned to column A and B according to the range in which the belong, similarly, df_3 contains columns with values and additional column with bin number assigned.

I want to extract the rows from df_3 such that it only extract data where df_2 columns have bin value "2" for each column respectively in a separate data frame.

The Main problem i am facing is to do it WITHOUT mentioning the column names anywhere in the code.

expected output

df_output_1 (where bin values for column 'A' in df_2 is 2)

A   B   C   D  

5   6   2   6

2   5   9   2

df_output_2 (where bin values for column 'B' in df_2 is 2)

A   B   C   D  

5   6   2   6

6   7   3   1

2   5   9   2

1 Answer

0 votes
by (41.4k points)
edited by

For filtering the data frame, we use merging (right or left) 

    for bin_name in (column_name + "_bin" for column_name in df_1_columns):

      print(bin_name)

      df_3_joined = pd.merge(df_3[df_3_op_columns], df_2[df_2[bin_name] == 2][df_1_columns], how='right', on=df_1_columns, suffixes=['_l', ''])

      print(df_3_joined)

Here is the complete illustration:

import pandas as pd

df_1 = pd.DataFrame(columns = ['A', 'B'])

df_1.loc[len(df_1)] = [5,6]

df_1.loc[len(df_1)] = [8, 1]

df_1.loc[len(df_1)] = [6, 7]

df_1.loc[len(df_1)] = [4, 9]

df_1.loc[len(df_1)] = [1, 3]

df_1.loc[len(df_1)] = [9, 2]

df_1.loc[len(df_1)] = [2, 5]

df_2 = pd.DataFrame(columns = ['A', 'B', 'A_bin', 'B_bin'])

df_2.loc[len(df_2)] = [5, 6, 2, 2]

df_2.loc[len(df_2)] = [8, 1, 1, 1]

df_2.loc[len(df_2)] = [6, 7, 3, 2]

df_2.loc[len(df_2)] = [4, 9, 3, 3]

df_2.loc[len(df_2)] = [1, 3, 1, 1]

df_2.loc[len(df_2)] = [9, 2, 1, 1]

df_2.loc[len(df_2)] = [2, 5, 2, 2]

df_3 = pd.DataFrame(columns = ['A', 'B', 'C', 'D', 'A_bin', 'B_bin', 'C_bin', 'D_bin'])

df_3.loc[len(df_3)] = [5, 6, 2, 6, 2, 2, 1, 2]

df_3.loc[len(df_3)] = [8, 1, 6, 4, 1, 1, 2, 2]

df_3.loc[len(df_3)] = [6, 7, 3, 1, 3, 2, 1, 1]

df_3.loc[len(df_3)] = [4, 9, 1, 9, 3, 3, 1, 3]

df_3.loc[len(df_3)] = [1, 3, 8, 7, 1, 1, 3, 3]

df_3.loc[len(df_3)] = [9, 2, 4, 8, 1, 1, 2, 3]

df_3.loc[len(df_3)] = [2, 5, 9, 2, 2, 2, 3, 1]

results = {}

df_1_columns = list(df_1.columns)

df_3_op_columns = [cname for cname in list(df_3.columns) if not cname.endswith("_bin")]

for bin_name in (column_name + "_bin" for column_name in df_1_columns):

    df_3_joined = pd.merge(df_3[df_3_op_columns], df_2[df_2[bin_name] == 2][df_1_columns], how='right', on=df_1_columns)

    results[bin_name] = df_3_joined

for binName, result in results.iteritems():

    print(binName)

    print(result)

You can retrieve the result if you know the bin names as depicted in the below code:

A_bin_df = results['A_bin']

print(A_bin_df)

B_bin_df = results['B_bin']

print(B_bin_df)

If you are interested in learning Data Science then watch this video:

Browse Categories

...