How to generate a new dataframe in python by considering conditions from other dataframes?

Question

asked Jul 27, 2019 in Data Science by sourav (17.6k points)

I am performing data manipulation in python using pandas on a very large dataset, say 100 million rows. I have two dataframes and wish to generate third dataframe as per the conditions mentioned, the scenario is explained below:

dataframe 1:

Col_B and Col_D are of int64 type

Col_A Col_B Col_C Col_D
A 11 B 20
A 11 C 24
B 14 R 32
... ... ... ...

dataframe 2:

Col_Z is of type float64 and remaining columns are of int64

Col_X Col_Y Col_P Col_Q Col_Z
10 15 16 21 0.99
10 15 17 22 0.89
... ... ... ... ...
... ... ... ... ...
11 15 30 35 0.67
... ... ... ... ...

Condition to be applied: Consider only first row of both the dataframe, for the sake of understanding conditions:

if the value of (Col_B is between the value of Col_X and Col_Y) and value of (Col_D is between the value of Col_P and Col_Q) then return the corresponding value of Col_A, Col_C and Col_Z, otherwise return NaN

Expected Output (Dataframe 3):

Col_A Col_C Col_Z
A B 0.99
NaN NaN NaN
B R 0.67

Note: This output is generated merely considering if there are only these three rows in dataframes but in actual each value of Dataframe 1 has to scan all of the values in Dataframe 2 until desired conditions is achieved.

My Code:

df3 = {}
Col_A = []
Col_C = []
Col_Z = []
for i in df1.iterrows():
value = float(df2[(i[1][1] > df2['Col_X'].values) &
(i[1][1] < df2['Col_Y'].values) &
(i[1][3] > df2['Col_P'].values) &
(i[1][3] < df2['Col_Q'].values)]['Col_Z'])
if bool(value):
Col_Z.append(value)
Col_A.append(i[1][0])
Col_C.append(i[1][2])
else:
Col_Z.append(float('NaN'))
Col_A.append(float('NaN'))
Col_C.append(float('NaN'))

This code is working fine uptill the condition is met, as soon as condition does'nt met, it throws a TypeError. Please can any rectify this.

Also, I wanted to know if there is any alternate and efficient way to perform it. Please let me know.

1 Answer

Shlok Pandey · Answer 1 · 2019-08-01T06:26:23+0000

This code will work fine:

df3 = {}
Col_A = []
Col_C = []
Col_Z = []
for i in df1.iterrows():
value = df2[(i[1][1] > df2['Col_X'].values) &
(i[1][1] < df2['Col_Y'].values) &
(i[1][3] > df2['Col_P'].values) &
(i[1][3] < df2['Col_Q'].values)]['Col_Z']
if value.empty:
Col_Z.append(float('NaN'))
Col_A.append(float('NaN'))
Col_C.append(float('NaN'))
else:
Col_Z.append(value)
Col_A.append(i[1][0])
Col_C.append(i[1][2])
df3['A'] = Col_A
df3['C'] = Col_C
df3['Z'] = Col_Z
df3 = pd.DataFrame(df3)

How to generate a new dataframe in python by considering conditions from other dataframes?

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources