Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)

I am new to data science and I am working on a data set which has the column named timestamp

df = pd.DataFrame({'timestamp': ['2019-10-22T15:13:36.000Z', '2019-07-16T02:13:36.000Z']})

                  timestamp

0  2019-10-22T15:13:36.000Z

1  2019-07-16T02:13:36.000Z

And I have a second data frame which consists of column name as Start Stop and datetimecolumn. The data frame is as follows:

 L=[['2019-07-16T12:25:07.000Z', '2019-07-16T00:25:07.000000Z', '2019-07-16T13:25:07.000000Z'], ['2019-03-20T09:56:29.000Z','2019-03-19T21:56:29.000000Z','2019-03-20T10:56:29.000000Z'],['2019-08-28T08:26:03.000Z', '2019-08-27T20:26:03.000000Z','2019-08-28T09:26:03.000000Z']]

    cinfo = pd.DataFrame(data=L, columns=['c_time', 'start', 'stop'])

                     c_time                        start  \

0  2019-07-16T12:25:07.000Z  2019-07-16T00:25:07.000000Z   

1  2019-03-20T09:56:29.000Z  2019-03-19T21:56:29.000000Z   

2  2019-08-28T08:26:03.000Z  2019-08-27T20:26:03.000000Z   

                          stop  

0  2019-07-16T13:25:07.000000Z  

1  2019-03-20T10:56:29.000000Z  

2  2019-08-28T09:26:03.000000Z  

Now I want to check the timestamp of my first dataset, if the timestamp is between the start and stop columns of the second dataset then I want to give the c_time value which is in the second dataset. If it doesn't match the stop and start columns then i want to set my c_time to NAN. Consider the example below for better understanding:

    timestamp                c_time

0  2019-10-22T15:13:36.000Z  NaN

1  2019-07-16T02:13:36.000Z  2019-07-16T12:25:07.000Z

To get the results I have used the code as shown below:

import pandas as pd

df = pd.DataFrame({'timestamp': ['2019-10-22T15:13:36.000Z']})

L=[['2019-07-16T12:25:07.000Z', '2019-07-16T00:25:07.000000Z', '2019-07-16T13:25:07.000000Z'], ['2019-03-20T09:56:29.000Z','2019-03-19T21:56:29.000000Z','2019-03-20T10:56:29.000000Z'],['2019-08-28T08:26:03.000Z', '2019-08-27T20:26:03.000000Z','2019-08-28T09:26:03.000000Z']]

cinfo = pd.DataFrame(data=L, columns=['c_time', 'start', 'stop'])

cinfo.start=pd.to_datetime(cinfo.start)

cinfo.stop=pd.to_datetime(cinfo.stop)

cinterval = cinfo.set_index(pd.IntervalIndex.from_arrays(cinfo.start, cinfo.stop, closed='both'))['c_time']

df['c_time'] = pd.to_datetime(df['timestamp']).map(cinterval)

But I am getting an error:

KeyError: 1571757216000000000

Can anyone help me?

1 Answer

0 votes
by (36.8k points)

You can use the code below for getting the desired output:

import pandas as pd

df = pd.DataFrame({'timestamp': ['2019-10-22T15:13:36.000Z']})

L=[['2019-07-16T12:25:07.000Z', '2019-07-16T00:25:07.000000Z', '2019-07-16T13:25:07.000000Z'], ['2019-03-20T09:56:29.000Z','2019-03-19T21:56:29.000000Z','2019-03-20T10:56:29.000000Z'],['2019-08-28T08:26:03.000Z', '2019-08-27T20:26:03.000000Z','2019-08-28T09:26:03.000000Z']]

cinfo = pd.DataFrame(data=L, columns=['c_time', 'start', 'stop'])

cinfo.start=pd.to_datetime(cinfo.start)

cinfo.stop=pd.to_datetime(cinfo.stop)

cinterval = cinfo.set_index(pd.IntervalIndex.from_arrays(cinfo.start, cinfo.stop, closed='both'))['c_time']

df['c_time'] = pd.to_datetime(df['timestamp']).map(cinterval)

print(df)

 Improve your knowledge in data science from scratch by click on the link Data Science

Browse Categories

...