Pandas - Check if Numbers in a Column Are in a Row

Question

asked Jul 10, 2019 in Data Science by sourav (17.6k points)

I have a pandas dataframe as follows:

user_id product_id order_number
1 1 1
1 1 2
1 1 3
1 2 1
1 2 5
2 1 1
2 1 3
2 1 4
2 1 5
3 1 1
3 1 2
3 1 6

For each (user_id, product_id) pair, I want to query this df for the longest streak (a run of consecutive order_number values with none skipped) and the last streak (the streak that ends at the most recent order_number).

The ideal result is as follows:

user_id product_id longest_streak last_streak
1 1 3 3
1 2 0 0
2 1 3 3
3 1 2 0

I'd appreciate any insights on this.
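The expected result implies a specific streak definition: a streak needs at least two consecutive order numbers, so an isolated order counts as 0. This is a minimal sketch that applies that definition to one pair's sorted list of order numbers. The function name streaks is hypothetical and does not come from the question or the answer.

```python
def streaks(orders):
    """Return (longest_streak, last_streak) for a sorted list of order numbers.

    Assumption inferred from the expected output: a streak is a run of at
    least two consecutive order numbers; an isolated order counts as 0.
    """
    longest = 0
    run = 1  # length of the consecutive run ending at the current order
    for prev, cur in zip(orders, orders[1:]):
        run = run + 1 if cur == prev + 1 else 1
        longest = max(longest, run if run >= 2 else 0)
    last = run if run >= 2 else 0  # streak ending at the latest order
    return longest, last

print(streaks([1, 2, 3]))     # (3, 3)  user 1, product 1
print(streaks([1, 5]))        # (0, 0)  user 1, product 2
print(streaks([1, 3, 4, 5]))  # (3, 3)  user 2, product 1
print(streaks([1, 2, 6]))     # (2, 0)  user 3, product 1
```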

1 Answer

Shlok Pandey · Answer 1 · 2019-07-12T06:48:05+0000

You can get the desired output with a single loop over the (sorted) rows and collections.defaultdict:

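from collections import defaultdict
import pandas as pd

df = pd.DataFrame({
    'user_id':      [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    'product_id':   [1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
    'order_number': [1, 2, 3, 1, 5, 1, 3, 4, 5, 1, 2, 6],
})

# Rows must be sorted by user_id, product_id, order_number (as in the question).
prev = defaultdict(lambda: None)  # previous order_number per (user_id, product_id)
longest = defaultdict(int)        # longest streak seen so far per pair
current = defaultdict(int)        # streak ending at the pair's latest order
for i, j, k in df.itertuples(index=False):
    if prev[(i, j)] == k - 1:
        # Consecutive order: a new streak starts at 2, an existing one grows by 1
        current[(i, j)] += 1 if current[(i, j)] else 2
        longest[(i, j)] = max(longest[(i, j)], current[(i, j)])
    else:
        # Gap or first order: reset, and make sure the pair is listed in longest
        current[(i, j)] = 0
        longest.setdefault((i, j), 0)
    prev[(i, j)] = k

# Tuple keys become a (user_id, product_id) MultiIndex
result = pd.concat(
    [pd.Series(d) for d in [longest, current]],
    axis=1, keys=['longest_streak', 'last_streak']
).rename_axis(['user_id', 'product_id']).reset_index()
print(result)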

user_id product_id longest_streak last_streak
0 1 1 3 3
1 1 2 0 0
2 2 1 3 3
3 3 1 2 0
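For larger frames, a vectorized alternative may be preferable. The following is not part of the original answer. It is a sketch that assumes the same column names and the same streak definition (runs shorter than two orders count as 0). It labels each run of consecutive order numbers using groupby, diff and cumsum. The intermediate names d, cont, run and streak are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id':      [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    'product_id':   [1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
    'order_number': [1, 2, 3, 1, 5, 1, 3, 4, 5, 1, 2, 6],
})

keys = ['user_id', 'product_id']
d = df.sort_values(keys + ['order_number']).reset_index(drop=True)

# True where this order immediately follows the pair's previous order
cont = d.groupby(keys)['order_number'].diff().eq(1)
# Every False starts a new run; the running sum gives each run its own id
d['run'] = (~cont).cumsum()
# Number of rows in each run (order_number has no missing values)
run_len = d.groupby('run')['order_number'].transform('count')
# A single isolated order is not a streak
d['streak'] = run_len.where(run_len >= 2, 0)

# Longest streak per pair, and the streak containing the latest order
result = (
    d.groupby(keys)['streak']
     .agg(longest_streak='max', last_streak='last')
     .reset_index()
)
print(result)
```

The cumsum runs over the whole sorted frame. The first row of every pair has a NaN diff, which compares as False, so a run id can never span two pairs.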
