
I have two data frames. One data frame (df1) has 75799 rows and another data frame (df2) has 13715 rows.

df1:

OBJECTID  ENABLED  DATECREATED         DATEMODIFIED       OWNER  STATUS      ACCURACY  INSTALLATIONDATE
2         1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR    In Service  Unknown   26/09/1969 0:00:00
8         1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR    In Service  Unknown   23/08/1989 0:00:00
12        1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR    In Service  Unknown   13/04/1971 0:00:00
19        1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR    In Service  Unknown   22/03/1976 0:00:00

df2:

OBJECTID  FID_OHElectricLineSegment_2k  ENABLED  DATECREATED         DATEMODIFIED       OWNER
1         19                            1        23/08/2001 0:00:00  3/04/2020 0:00:00  AUR
2         41                            1        23/08/2001 0:00:00  2/04/2020 0:00:00  AUR
3         98                            1        23/08/2001 0:00:00  3/04/2020 0:00:00  CONS
4         167                           1        23/08/2001 0:00:00  3/04/2020 0:00:00  CONS

I am comparing OBJECTID of df1 with FID_OHElectricLineSegment_2k of df2. I want to create a new column 'Zone' in df1 and set it to 1 in every row whose OBJECTID also appears in FID_OHElectricLineSegment_2k. Here is what I am doing:

df1.loc[np.searchsorted(df1['OBJECTID'].values,df2['FID_OHElectricLineSegment_2k'].values),'Zone']=1

However, it raises this error:

KeyError: '[75799] not in index'

I understand that df1 has 75799 rows in total (0 to 75798). However, I do not understand how np.searchsorted() can return an index value that does not exist.

1 Answer


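Change loc to iloc, because np.searchsorted returns positions rather than index labels. Each position is the place where a value would have to be inserted to keep df1['OBJECTID'] sorted, so any value larger than every OBJECTID gets position len(df1), here 75799, which is one past the last row. That is why loc raises the KeyError. Clip the positions to the valid range before indexing with iloc.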
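To see where the out-of-range value comes from, here is a minimal sketch with small made-up arrays standing in for the two columns (the names object_ids and lookup are hypothetical, not from the question):

```python
import numpy as np

# Hypothetical sorted IDs standing in for df1['OBJECTID'].
object_ids = np.array([2, 8, 12, 19])

# Hypothetical lookup values standing in for df2['FID_OHElectricLineSegment_2k'].
lookup = np.array([19, 41, 1])

# searchsorted returns the insertion position of each lookup value.
positions = np.searchsorted(object_ids, lookup)
print(positions)  # [3 4 0]
```

Position 4 equals len(object_ids), so it points past the last element. Position 0 is in range, but the value 1 is not actually present there, so an in-range position does not by itself mean a match.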

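import numpy as np
obj = df1['OBJECTID'].values  # np.searchsorted assumes this is sorted ascending
fid = df2['FID_OHElectricLineSegment_2k'].values
idx = np.clip(np.searchsorted(obj, fid), a_min=0, a_max=len(df1) - 1)  # keep positions in range
idx = idx[obj[idx] == fid]  # keep only positions where the values really match
df1['Zone'] = 0  # iloc cannot create a column, so add it first
df1.iloc[idx, df1.columns.get_loc('Zone')] = 1  # iloc needs the column position, not its name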
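If the goal is only to flag matching rows, a membership test may be simpler. The sketch below is an alternative to the searchsorted approach, not the answer's method. It uses pandas' Series.isin on small made-up frames with the same column names, and it does not require OBJECTID to be sorted:

```python
import pandas as pd

# Small made-up frames using the column names from the question.
df1 = pd.DataFrame({"OBJECTID": [2, 8, 12, 19]})
df2 = pd.DataFrame({"FID_OHElectricLineSegment_2k": [19, 41, 98, 167]})

# True where OBJECTID appears anywhere in df2's column, then cast to 0/1.
df1["Zone"] = df1["OBJECTID"].isin(df2["FID_OHElectricLineSegment_2k"]).astype(int)
print(df1)
```

Here only the row with OBJECTID 19 gets Zone 1. Every other row gets 0.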
