in Python by (19.9k points)

I have the following dataframe, df

Index   time   block   cell
 0       9      25      c1
 1       9      25      c1
 2       33     35      c2
 3       47     4       c1
 4       47     17      c2
 5       100    21      c1
 6       120    21      c1
 7       120    36      c2

Duplicates should be dropped based on the time column, subject to one condition:
- If two or more rows with the same time also have the same cell (for example, index 0 and index 1 both have c1), keep any one of those rows.
- If two or more rows with the same time have different cells (for example, index 3 and 4, or index 6 and 7), keep all of those rows.

The resulting data frame will be as follows: df_result =

Index   time   block   cell
 0       9      25      c1
 2       33     35      c2
 3       47     4       c1
 4       47     17      c2
 5       100    21      c1
 6       120    21      c1
 7       120    36      c2

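I tried df.drop_duplicates('time'), but that keeps only the first row for each time, so rows with the same time and different cells are dropped as well.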

1 Answer

0 votes
by (25.1k points)

You can group by the time column, then drop duplicates on the cell column within each group, and combine the groups again as follows:

import pandas as pd

df = pd.DataFrame({'time': [9, 9, 33, 47, 47, 100, 120, 120],
                   'block': [25, 25, 35, 4, 17, 21, 21, 36],
                   'cell': ['c1', 'c1', 'c2', 'c1', 'c2', 'c1', 'c1', 'c2']})

# Within each time group, keep the first row for every distinct cell.
# DataFrame.append was removed in pandas 2.0, so the pieces are joined with pd.concat.
final_df = pd.concat([gr.drop_duplicates('cell') for _, gr in df.groupby('time')])
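
The group-and-deduplicate logic above can probably be written as a single call, since a row is redundant only when both its time and its cell match an earlier row. This is a minimal sketch that assumes that is the whole condition and that keeping the first matching row is acceptable; the name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'time':  [9, 9, 33, 47, 47, 100, 120, 120],
    'block': [25, 25, 35, 4, 17, 21, 21, 36],
    'cell':  ['c1', 'c1', 'c2', 'c1', 'c2', 'c1', 'c1', 'c2'],
})

# Treat a row as a duplicate only if BOTH time and cell repeat;
# the first occurrence of each (time, cell) pair is kept.
result = df.drop_duplicates(subset=['time', 'cell'])
print(result)
```

On the sample data this keeps the original index labels of the surviving rows, and it avoids the Python-level loop over groups.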
