
I have the input dataframe:

import pandas

df1 = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland", "Portland", "Seattle", "Seattle"],
})

I want to group by Name, but only collapse consecutive repeats rather than all duplicates, so the output should be:

["Alice","Bob","Mallory","Bob","Mallory", "Alice"]

I couldn't find an efficient way to do this. Is there a way that avoids iterating over all rows?

1 Answer


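Compare each name with the one in the previous row, take the cumulative sum of those change flags to label each consecutive run, and group by that label: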

df1.groupby((df1['Name'] != df1['Name'].shift()).cumsum()).first()

It gives the output as:

      Name      City
1    Alice   Seattle
2      Bob   Seattle
3  Mallory  Portland
4      Bob  Portland
5  Mallory   Seattle
6    Alice   Seattle
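To see why the grouping key works, it can help to build it in two steps. This is only a sketch: the names `changed` and `run_id` are made up for illustration and do not appear in the one-liner.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland", "Portland", "Seattle", "Seattle"],
})

# True wherever the name differs from the previous row. The first row is
# always True because shift() leaves a missing value there.
changed = df1["Name"] != df1["Name"].shift()

# Running count of the changes: rows in the same consecutive run share a label.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 2, 3, 3, 3, 4, 4, 5, 6]
```

Grouping by `run_id` therefore gives one group per consecutive run, which is why "Bob" and "Mallory" each show up twice in the result.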

If you only want the 'Name' values as an array, select that column before calling first():

df1.groupby((df1['Name'] != df1['Name'].shift()).cumsum())['Name'].first().values

Which will give you the output:

['Alice' 'Bob' 'Mallory' 'Bob' 'Mallory' 'Alice']
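If only the first row of each run is needed, the groupby can probably be skipped altogether by using the same comparison as a boolean mask. This is a different route from the answer above, shown as a minimal sketch; `keep` and `names` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Mallory", "Bob", "Bob", "Mallory", "Alice"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland", "Portland", "Seattle", "Seattle"],
})

# Keep a row only when its name differs from the previous row's name.
keep = df1["Name"] != df1["Name"].shift()

# Select the 'Name' values of the kept rows as a plain Python list.
names = df1.loc[keep, "Name"].tolist()

print(names)  # ['Alice', 'Bob', 'Mallory', 'Bob', 'Mallory', 'Alice']
```

Unlike the groupby version, `df1[keep]` keeps the original row index (0, 1, 2, 5, 7, 8), which may be handy when you need to know where each run starts.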
