
I need to analyze the activity of users who use an application during a given period, where each period is defined by a start and an end timestamp. I tried a bar chart, but I do not know how to include every hour that falls inside the interval. For example, the user with uid=2 uses the application during hours [18, 19, 20, 21].

My dataframe is like:

uid   sex   start                 end
1     0     2000-01-28 16:47:00   2000-01-28 17:47:00
2     1     2000-01-28 18:07:00   2000-01-28 21:47:00
3     1     2000-01-28 18:47:00   2000-01-28 20:17:00
4     0     2000-01-28 08:00:00   2000-01-28 10:00:00
5     1     2000-01-28 02:05:00   2000-01-28 02:30:00
6     0     2000-01-28 15:10:00   2000-01-28 18:04:00
7     0     2000-01-28 01:50:00   2000-01-28 03:00:00

df['hour_s'] = pd.to_datetime(df['start']).dt.hour
df['hour_e'] = pd.to_datetime(df['end']).dt.hour

uid   sex   start                 end                   hour_s   hour_e
1     0     2000-01-28 16:47:00   2000-01-28 17:47:00   16       17
2     1     2000-01-28 18:07:00   2000-01-28 21:47:00   18       21
3     1     2000-01-28 18:47:00   2000-01-28 20:17:00   18       20
4     0     2000-01-28 08:00:00   2000-01-28 10:00:00   08       10
5     1     2000-01-28 02:05:00   2000-01-28 02:30:00   02       02
6     0     2000-01-28 15:10:00   2000-01-28 18:04:00   15       18
7     0     2000-01-28 01:50:00   2000-01-28 03:00:00   01       03

I need to find the number of users active in each specific hour.

1 Answer


From your question, I assume you need a histogram showing the number of users (frequency) for each hour of the day. For that, you can use the following code:

#!/usr/bin/python3
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Read the data (the file must already contain the hour_s and hour_e columns)
df = pd.read_csv("data.csv")

# List every hour in which one user (one row) was active, end hour included
def active_hours(obs):
    return list(range(obs['hour_s'], obs['hour_e'] + 1))

# One list of active hours per user
hours_2d = list(df.apply(active_hours, axis=1))

# Flatten into a single list of hours across all users
hours_flat = [hour for sublist in hours_2d for hour in sublist]

# One bin per hour, centered on the integer tick
# (the default of 10 bins would merge neighbouring hours)
plt.hist(hours_flat, bins=np.arange(-0.5, 24.5, 1.0), rwidth=0.5)
plt.xticks(np.arange(0, 24, 1))
plt.xlabel('Hour of day')
plt.ylabel('Users')
plt.show()
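If you need the actual numbers rather than only the plot, the same idea can be sketched with pandas alone. This is a minimal sketch, not the only way to do it: the DataFrame is typed in by hand from the sample rows in the question, and the names `hour` and `users_per_hour` are made up for illustration. Like the code above, it assumes every interval starts and ends on the same day and counts the end hour as active.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "uid": [1, 2, 3, 4, 5, 6, 7],
    "hour_s": [16, 18, 18, 8, 2, 15, 1],
    "hour_e": [17, 21, 20, 10, 2, 18, 3],
})

# One list of active hours per user, end hour included.
# "hour" is a hypothetical helper column, not part of the original data.
df["hour"] = [list(range(s, e + 1)) for s, e in zip(df["hour_s"], df["hour_e"])]

# explode() yields one row per (user, hour); counting rows per hour
# gives the number of users active in that hour.
users_per_hour = (
    df.explode("hour")["hour"]
    .astype(int)
    .value_counts()
    .reindex(range(24), fill_value=0)  # keep hours with no activity as 0
    .sort_index()
)

print(users_per_hour.loc[18])  # users active during hour 18
```

With the sample data, hour 18 is covered by the users with uid 2, 3 and 6. If a bar chart is preferred, `users_per_hour.plot.bar()` should draw one bar per hour from this Series.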

Here, data.csv is the sample provided:

uid,sex,start,end,hour_s,hour_e
1,0,2000-01-28 16:47:00,2000-01-28 17:47:00,16,17
2,1,2000-01-28 18:07:00,2000-01-28 21:47:00,18,20
3,1,2000-01-28 18:47:00,2000-01-28 20:17:00,18,20
4,0,2000-01-28 08:00:00,2000-01-28 10:00:00,08,10
5,1,2000-01-28 02:05:00,2000-01-28 02:30:00,02,02
6,0,2000-01-28 15:10:00,2000-01-28 18:04:00,15,18
7,0,2000-01-28 01:50:00,2000-01-28 03:00:00,01,03
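If your file only has the raw start and end timestamps, the hour columns can be derived while loading. The following is a rough sketch under a few assumptions: the CSV text is inlined through `io.StringIO` only to keep the example self-contained (a path such as "data.csv" would be passed in practice), and only three of the sample rows are reproduced.

```python
import io

import pandas as pd

# Three of the sample rows, without the precomputed hour columns.
csv_text = """uid,sex,start,end
1,0,2000-01-28 16:47:00,2000-01-28 17:47:00
2,1,2000-01-28 18:07:00,2000-01-28 21:47:00
5,1,2000-01-28 02:05:00,2000-01-28 02:30:00
"""

# parse_dates converts both columns to datetimes on load;
# skipinitialspace tolerates a stray space after a comma in the header.
df = pd.read_csv(
    io.StringIO(csv_text),
    skipinitialspace=True,
    parse_dates=["start", "end"],
)

# Derive the start and end hour of each interval.
df["hour_s"] = df["start"].dt.hour
df["hour_e"] = df["end"].dt.hour

print(df[["uid", "hour_s", "hour_e"]])
```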

...