Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

In Python, I have a data frame with ids and many event dates and type of events. I want to find the first event for each id and create a new column stating that date and another create another column for their first type of event.

  event_id  date    typeofevent

        e0 2015-07-20   1

        e0 2015-07-25   4

        e1 2015-06-02   2

        e1 2015-07-18   1

        e1 2015-07-20   3

        e2 2015-04-25   2

        e2 2015-07-28   2

        e2 2015-06-20   3

        e2 2015-05-25   4

df['first_event'] = df.groupby(['event_id'])['date'].first()

1 Answer

0 votes
by (41.4k points)

Use min() inside a groupby(). It will take the minimum date of each event_id. After that, you can merge them using merge().

Code:

import pandas as pd

df = pd.DataFrame({

    'event_id': ['e0', 'e0', 'e1', 'e1', 'e1', 'e2', 'e2', 'e2', 'e2'],

    'date': [

        '2015-07-20', '2015-07-25', '2015-06-02', '2015-07-18', '2015-07-20',

        '2015-04-25', '2015-07-28', '2015-06-20', '2015-05-25'

    ],

    'typeofevent': [1, 4, 2, 1, 3, 2, 2, 3, 4]})

temp_df = df.groupby('event_id').agg({'date': 'min'}).reset_index()

df = df.merge(temp_df, on='event_id')

df.rename(columns={'date_x': 'date', 'date_y': 'first'}, inplace=True)

Output:

  event_id        date  typeofevent       first

0       e0  2015-07-20            1  2015-07-20

1       e0  2015-07-25            4  2015-07-20

2       e1  2015-06-02            2  2015-06-02

3       e1  2015-07-18            1  2015-06-02

4       e1  2015-07-20            3  2015-06-02

5       e2  2015-04-25            2  2015-04-25

6       e2  2015-07-28            2  2015-04-25

7       e2  2015-06-20            3  2015-04-25

8       e2  2015-05-25            4  2015-04-25

Browse Categories

...