Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in SQL by (6.1k points)

I'm actually doing a cohort analysis and can get the group of users to examine, after that they transacted in the months following on. But I need the output as:

Of that group in December, who transacted in Jan; of the Jan group from Dec, who transacted in Feb. Basically I'm tracking decay of the customer base

I do not get in return any month following Dec, which is as follows: 

WITH start_sample AS (

SELECT

  user_fk,

  created_at AS start_sample_date

  FROM transactions

    WHERE created_at >= '2016-11-01' AND created_at < '2016-12-01'

      GROUP BY user_fk,

        start_sample_date),

start_sample_min AS (

SELECT

  user_fk,

  MIN(start_sample_date) AS first_transaction

    FROM start_sample

      GROUP BY user_fk

  )

SELECT

  DATE_TRUNC('month', created_at) AS transacting_month,

  COUNT(DISTINCT user_fk)

    FROM transactions

        WHERE created_at >= '2016-11-01'

        AND t.user_fk IN(SELECT user_fk FROM start_sample_min)

          GROUP BY transacting_month

            ORDER BY transacting_month;

Then I made a churn model to see if it would get what I need, but it doesn't:

WITH monthly_users AS (

    SELECT

      user_fk AS monthly_user_fk,

      DATE_TRUNC('month', created_at) AS month

        FROM transactions

          WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'

            GROUP BY monthly_user_fk, month

            ORDER BY monthly_user_fk, month

),

lag_lead AS (

  SELECT

    monthly_user_fk,

    month,

    LAG(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lag,

    LEAD(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lead

      FROM monthly_users),

lag_lead_with_diffs AS (

  SELECT

    monthly_user_fk,

    month,

    lag AS previous_month,

    lead AS next_month,

    EXTRACT(EPOCH FROM (month - lag)/86400)::INT AS lag_size,

    EXTRACT(EPOCH FROM (lead - month)/86400)::INT AS lead_size

      FROM lag_lead

  ),

calculated AS (

      SELECT

      month,

      CASE WHEN previous_month IS NULL THEN 'ACTIVATION'

          WHEN lag_size <= 31 THEN 'ACTIVE'

          WHEN lag_size > 31 THEN 'RETURN' END AS this_month_values,

      CASE WHEN (lead_size > 31 OR lead_size IS NULL) THEN 'CHURN' ELSE NULL END AS next_month_churn,

      COUNT(DISTINCT monthly_user_fk) AS c_d_users

   FROM lag_lead_with_diffs

  GROUP BY month, 2, 3

)

SELECT

  month,

  this_month_values,

  SUM(c_d_users) AS distinct_users

  FROM calculated

  GROUP BY month, this_month_values

UNION

SELECT month + INTERVAL '1 month',

  'CHURN',

  SUM(c_d_users)

  FROM calculated

    WHERE next_month_churn IS NOT NULL

      GROUP BY month + INTERVAL '1 month', 2

        HAVING (EXTRACT(EPOCH FROM (month + INTERVAL '1 month'))) < 1512086400

          ORDER BY month, this_month_values;

However, this is not fixed in the initial group. The Active group rolls from month to month.

Yes, it is complicated but I need help with this.

1 Answer

0 votes
by (11.7k points)

I think you need the following solution:

with Monthly_Users as (

select user_fk

     , date_trunc('month',created_at) as month

     , (date_part('year', created_at) - 2016) * 12

     + date_part('month', created_at) - 11 as Months_Between

  from transactions

 where created_at between date '2016-11-01'

                      and date '2017-12-01'

 group by user_fk, month, months_between

), t2 as (

select Monthly_Users.*

     , count(*) over (partition by user_fk

                          order by month rows between unbounded preceding

                                                  and 1 preceding) prev_rec_cnt

  from Monthly_Users

)

select month

     , count(*)

  from t2

 where Months_Between = Prev_Rec_Cnt

 group by month

 order by month;

In the above query, Monthly_Users CTE is not the same as yours but adds a computation of the number of Months_Between the created_at date and your initial starting date. In the other Common Table Expression, I am counting the number of occurrences of each user_fk prior to the current month's record. Finally, in the output query, I limit the results to only those records where the Months_Between value matches the Prev_Rec_Cnt value. Any missed months will cause the Prev_Rec_Cnt value to not match the Months_Between value, so you'll be able to see the fall off of user_fk values from month to month.

I hope this will help you.

If you want to get more insights into SQL, check out this SQL Course from Intellipaat.

Related questions

0 votes
1 answer
0 votes
4 answers
0 votes
1 answer
0 votes
1 answer

31k questions

32.8k answers

501 comments

693 users

Browse Categories

...