Python Pandas: np.searchsorted() returns wrong index value

Question

asked Jun 24, 2020 in Data Science by blackindya (18.4k points)

I have two data frames. One (df1) has 75799 rows and the other (df2) has 13715 rows.

df1:

OBJECTID  ENABLED  DATECREATED         DATEMODIFIED       OWNER  STATUS      ACCURACY  INSTALLATIONDATE
2         1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR    In Service  Unknown   26/09/1969 0:00:00
8         1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR    In Service  Unknown   23/08/1989 0:00:00
12        1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR    In Service  Unknown   13/04/1971 0:00:00
19        1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR    In Service  Unknown   22/03/1976 0:00:00

df2:

OBJECTID  FID_OHElectricLineSegment_2k  ENABLED  DATECREATED         DATEMODIFIED       OWNER
1         19                            1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR
2         41                            1        23/08/2001 0:00:00  2/04/2020 0:00:00  AUR
3         98                            1        23/08/2001 0:00:00  3/04/2020 0:00:00  CONS
4         167                           1        23/08/2001 0:00:00  3/04/2020 0:00:00  CONS

I am comparing OBJECTID in df1 with FID_OHElectricLineSegment_2k in df2. I want to create a new column 'Zone' in df1 and set it to 1 wherever the two columns have the same value. Here is how I am doing it:

df1.loc[np.searchsorted(df1['OBJECTID'].values, df2['FID_OHElectricLineSegment_2k'].values), 'Zone'] = 1

However, it raises this error:

KeyError: '[75799] not in index'

I understand that df1 has 75799 rows in total (positions 0 to 75798). However, I cannot understand how np.searchsorted() returns an index value that does not exist.

1 Answer

supriya · Answer 1 · 2020-06-24T03:23:03+0000

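np.searchsorted returns insertion positions, not index labels, so the rows should be selected by position (as with iloc) rather than passed to loc directly. For any value larger than every OBJECTID it returns len(df1), which is where the out-of-range 75799 comes from, so clip the positions first. An insertion position is not necessarily a match, so also keep only the positions where the values are equal:

import numpy as np

idx = np.searchsorted(df1['OBJECTID'].values, df2['FID_OHElectricLineSegment_2k'].values)
idx = np.clip(idx, a_min=0, a_max=len(df1) - 1)  # keep positions inside 0..len(df1)-1
idx = idx[df1['OBJECTID'].values[idx] == df2['FID_OHElectricLineSegment_2k'].values]  # exact matches only
df1.loc[df1.index[idx], 'Zone'] = 1  # map positions to labels so 'Zone' can be set by name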
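
To see where the out-of-range position comes from, here is a minimal sketch on made-up data. The two small frames and their values are assumptions for illustration, not the asker's data, and OBJECTID is assumed to be sorted, which np.searchsorted requires. The sketch also shows Series.isin as a possible alternative that does not need sorted input.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real frames (values are made up for illustration).
df1 = pd.DataFrame({"OBJECTID": [2, 8, 12, 19]})
df2 = pd.DataFrame({"FID_OHElectricLineSegment_2k": [19, 41, 98, 167]})

ids = df1["OBJECTID"].to_numpy()  # assumed sorted in ascending order
wanted = df2["FID_OHElectricLineSegment_2k"].to_numpy()

# searchsorted returns insertion points in the range 0..len(ids).
# Values larger than every OBJECTID map to len(ids), which is not a row.
raw_pos = np.searchsorted(ids, wanted)  # here: [3, 4, 4, 4]

# Clip into the valid range, then keep only positions that truly match.
pos = np.clip(raw_pos, 0, len(ids) - 1)
matched = pos[ids[pos] == wanted]

df1["Zone"] = 0
df1.iloc[matched, df1.columns.get_loc("Zone")] = 1

# Alternative without searchsorted: flag every OBJECTID present in df2.
zone_isin = df1["OBJECTID"].isin(wanted).astype(int)
```

With this toy data only the row with OBJECTID 19 is flagged, and both approaches agree. Clipping without the equality check would instead flag the last row once for every unmatched value, whether or not it is a real match.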

