Explore Courses Blog Tutorials Interview Questions
0 votes
1 view
in Data Science by (17.6k points)

I have online user data with the following information,


USER Timestamp  day_of_week Busi_days   Busi_hours

AAD 2017-07-11 09:31:44 TRUE    TRUE

AAD 2017-07-11 23:24:43 TRUE    FALSE

AAD 2017-07-12 13:24:43 TRUE    TRUE

SAP 2017-07-23 14:24:34 FALSE   FALSE

SAP 2017-07-24 16:58:49 TRUE    TRUE

YAS 2017-07-31 21:10:35 TRUE    FALSE

I wanted to compute the activity of USER column and create three new columns namely: 1. Activity: With the information based on how active the user is, meaning if the same user clicked more than twice then call it TRUE else false. 2.Multiple_days: If the user clicked the website on the more one day, if the same user clicked more than 2 days call the column TRUE else FALSE. 3. Busniess_days: whether the user clicked on the weekdays, if the user clicked the website on businesses days within business hours then call it True else FALSE

I have the following script performing the above-mentioned task, but its really slow for my huge data frame my data frame is 117Mb in size. Any better solutions would be great

My try:

df.Timestamp = pd.to_datetime(df.Timestamp)

df['date'] = [ for x in df.Timestamp]

target_df = pd.DataFrame()

target_df['USER'] = df.USER.unique()

a = df.groupby(['USER', 'date']).size()

a = a[a>1]


target_df['Active'] = [True if x in UID else False for x in target_df.USER.values]

a = df.groupby('USER')['Timestamp'].nunique()

a = a[a>1]


target_df['Multiple_days'] = [True if x in UUID2 else False for x in target_df.USER.values]

a = df[(df.Busi_days==True)&(df.Busi_hours==True)].USER.unique()

target_df['Busi_weekday'] = [True if x in a else False for x in target_df.USER.values]


USER Active  Multiple_days   Busi_weekday




1 Answer

0 votes
by (41.4k points)

Use this below code which uses custom functi



int (df1)

  USER   Timestamp day_of_week  Busi_days  Busi_hours

0  AAD  2017-07-11    09:31:44       True        True

1  AAD  2017-07-11    23:24:43       True       False

2  AAD  2017-07-12    13:24:43       True        True

3  SAP  2017-07-23    14:24:34      False       False

4  SAP  2017-07-24    16:58:49       True        True

5  YAS  2017-07-31    21:10:35       True       False

def func(df, time_col, user_col):

    df[time_col] = pd.to_datetime(df[time_col])

    df['date'] = df[time_col].dt.floor('d')

    u = df.USER.unique()

    a = df.groupby([user_col, 'date']).size().reset_index(level=1, drop=True)

    a = a[a>1]

    target_df = (a[~a.index.duplicated()]

                    .astype(bool).reindex(u, fill_value=False).to_frame(name='Active'))

    a = df.groupby(user_col)[time_col].nunique()

    target_df['Multiple_days'] = a[a>1].astype(bool).reindex(u, fill_value=False)

    a = df.loc[(df.Busi_days==True)&(df.Busi_hours==True), user_col].unique()

    target_df['Busi_weekday'] = target_df.index.isin(a)

    return target_df

#inputs are name of DataFrame, column for timestamp and column for user    

print (func(df1, 'Timestamp', 'USER'))

      Active  Multiple_days  Busi_weekday


AAD     True           True          True

SAP    False           True          True

YAS    False          False         False

If you wish to learn Data Science then visit this Data Science Course.

Welcome to Intellipaat Community. Get your technical queries answered by top developers!

28.4k questions

29.7k answers


94k users

Browse Categories