in Data Science by (17.6k points)

I'm a beginner in Python data science. I'm working on clickstream data and trying to count the consecutive clicks on an item within a given session. I compute a cumulative sum in the 'Block' column, then aggregate on Block to get the count for each block. Finally, I want to group by Session and Item and sum the block counts, because an item can occur consecutively m times and then, after other items, consecutively n more times (as with Sid=6 here). In that case the consecutive count should be m+n.

Here is the dataset:

    Sid                    Tstamp     Itemid
0     1  2014-04-07T10:51:09.277Z  214536502
1     1  2014-04-07T10:54:09.868Z  214536500
2     1  2014-04-07T10:54:46.998Z  214536506
3     1  2014-04-07T10:57:00.306Z  214577561
4     2  2014-04-07T13:56:37.614Z  214662742
5     2  2014-04-07T13:57:19.373Z  214662742
6     2  2014-04-07T13:58:37.446Z  214825110
7     2  2014-04-07T13:59:50.710Z  214757390
8     2  2014-04-07T14:00:38.247Z  214757407
9     2  2014-04-07T14:02:36.889Z  214551617
10    3  2014-04-02T13:17:46.940Z  214716935
11    3  2014-04-02T13:26:02.515Z  214774687
12    3  2014-04-02T13:30:12.318Z  214832672
13    4  2014-04-07T12:09:10.948Z  214836765
14    4  2014-04-07T12:26:25.416Z  214706482
15    6  2014-04-03T10:44:35.672Z  214821275
16    6  2014-04-03T10:45:01.674Z  214821275
17    6  2014-04-03T10:45:29.873Z  214821371
18    6  2014-04-03T10:46:12.162Z  214821371
19    6  2014-04-03T10:46:57.355Z  214821371
20    6  2014-04-03T10:53:22.572Z  214717089
21    6  2014-04-03T10:53:49.875Z  214563337
22    6  2014-04-03T10:55:19.267Z  214706462
23    6  2014-04-03T10:55:47.327Z  214821371
24    6  2014-04-03T10:56:30.520Z  214821371
25    6  2014-04-03T10:57:19.331Z  214821371
26    6  2014-04-03T10:57:39.433Z  214819762

Here is my code:

k['Block'] =( k['Itemid'] != k['Itemid'].shift(1) ).astype(int).cumsum()

y=k.groupby('Block').count()

z=k.groupby(['Sid','Itemid']).agg({"y[Count]": lambda x: x.sum()})

1 Answer

0 votes
by (41.4k points)

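Summing the lengths of all consecutive runs of an item within a session gives the same number as counting that item's rows in the session, so a plain count per (Sid, Itemid) group gives the desired output: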

k.groupby(['Sid', 'Itemid']).Block.count()

Sid  Itemid   
1    214536500    1
     214536502    1
     214536506    1
     214577561    1
2    214551617    1
     214662742    2
     214757390    1
     214757407    1
     214825110    1
3    214716935    1
     214774687    1
     214832672    1
4    214706482    1
     214836765    1
6    214563337    1
     214706462    1
     214717089    1
     214819762    1
     214821275    2
     214821371    6
Name: Block, dtype: int64
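
To see why the plain count matches the "m+n" requirement, here is a minimal sketch that computes the per-run lengths explicitly and then sums them per session and item. It rebuilds only the Sid=6 rows from the question (timestamps omitted). The names `changed`, `run_lengths`, `summed`, `direct` and the `RunLength` column are illustrative and not part of the original code. It also starts a new block when the session changes, which is an assumption added here so that runs never span two sessions.

```python
import pandas as pd

# Session 6 from the question; the Tstamp column is left out for brevity.
k = pd.DataFrame({
    "Sid": [6] * 12,
    "Itemid": [214821275, 214821275, 214821371, 214821371, 214821371,
               214717089, 214563337, 214706462, 214821371, 214821371,
               214821371, 214819762],
})

# A new block starts whenever the item or the session differs from the previous row.
changed = (k["Itemid"] != k["Itemid"].shift()) | (k["Sid"] != k["Sid"].shift())
k["Block"] = changed.cumsum()

# Length of each consecutive run (one row per block).
run_lengths = (
    k.groupby(["Sid", "Itemid", "Block"]).size().rename("RunLength").reset_index()
)

# Sum of run lengths per (session, item): the "m+n" from the question.
summed = run_lengths.groupby(["Sid", "Itemid"])["RunLength"].sum()

# The direct row count used in the answer.
direct = k.groupby(["Sid", "Itemid"])["Block"].count()

# Both approaches agree for every (session, item) pair.
assert (summed == direct).all()
print(summed)
```

Item 214821371 appears in two separate runs of three clicks each, and both approaches report 6 for it. The intermediate `run_lengths` table is only needed if the individual run lengths matter, for example to find the longest run per item.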
