Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in SQL by (20.3k points)

The following User History table contains one record for every day a given user has accessed a website (in a 24 hour UTC period). It has many thousands of records, but only one record per day per user. If the user has not accessed the website for that day, no record will be generated.

Id      UserId   CreationDate

------  ------   ------------

750997      12   2009-07-07 18:42:20.723

750998      15   2009-07-07 18:42:20.927

751000      19   2009-07-07 18:42:22.283

What I'm looking for is a SQL query on this table with good performance, that tells me which userids have accessed the website for (n) continuous days without missing a day.

In other words, how many users have (n) records in this table with sequential (day-before, or day-after) dates? If any day is missing from the sequence, the sequence is broken and should restart again at 1; we're looking for users who have achieved a continuous number of days here with no gaps.

Any resemblance between this query and a particular Stack Overflow badge is purely coincidental, of course.. :)

1 Answer

0 votes
by (40.7k points)

Use the below code:

WITH numberedrows

     AS (SELECT ROW_NUMBER() OVER (PARTITION BY UserID 

                                       ORDER BY CreationDate)

                - DATEDIFF(day,'19000101',CreationDate) AS TheOffset,

                CreationDate,

                UserID

         FROM   tablename)

SELECT MIN(CreationDate),

       MAX(CreationDate),

       COUNT(*) AS NumConsecutiveDays,

       UserID

FROM   numberedrows

GROUP  BY UserID,

          TheOffset

Note: Make sure the previous statement is ending with a semi-colon.  

If you have the list of the days (as a number), and the row_number, then missed days will make the offset between these two lists slightly bigger. 

Therefore, you should look for the range that has a consistent offset.

You can also use "ORDER BY NumConsecutiveDays DESC" at the end of this, or for a threshold you can use "HAVING count(*) > 14".

It will work in SQL2005 and above and will be helped by an index on tablename(UserID, CreationDate).

Note: As Offset is a reserved word, so it's recommended to use TheOffset.

Related questions

Browse Categories

...