
I am a beginner in data science and I am working with a dataset that has two datetime columns, A and B:

                     A                    B
0  2019-03-13 08:12:20  2019-03-13 08:12:25
1  2019-03-15 10:02:18  2019-03-13 10:02:20

I am trying to generate a range of seconds between the A column and the B column. As an output I need to get this:

                    A
0 2019-03-13 08:12:20
1 2019-03-13 08:12:21
2 2019-03-13 08:12:22
3 2019-03-13 08:12:23
4 2019-03-13 08:12:24
5 2019-03-13 08:12:25

The code below produces this output, but it is slow because the real data has around 1M rows. Can anyone suggest a faster way to do it?

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ["2019-03-13 08:12:20", "2019-03-15 10:02:18"],
                   'B': ["2019-03-13 08:12:25", "2019-03-13 10:02:20"]})
l = [pd.date_range(start=df.iloc[i]['A'], end=df.iloc[i]['B'], freq='S') for i in range(len(df))]
df1 = (pd.DataFrame(l).T)[0]
print(df1)

1 Answer


You can iterate over both columns in parallel with zip and flatten the per-row ranges into a single list, which avoids the repeated df.iloc[i] lookups:

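# Walk both columns in parallel and flatten every per-row range into one list.
# 's' is the one-second frequency alias; the uppercase 'S' is deprecated since pandas 2.2.
l = [x for a, b in zip(df.A, df.B) for x in pd.date_range(a, b, freq='s')]
df1 = pd.DataFrame({'A': l})
print(df1)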

                    A
0 2019-03-13 08:12:20
1 2019-03-13 08:12:21
2 2019-03-13 08:12:22
3 2019-03-13 08:12:23
4 2019-03-13 08:12:24
5 2019-03-13 08:12:25

Alternatively, build one Series per row with itertuples and concatenate them:

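# ignore_index=True renumbers the result 0..n-1; without it each row's Series restarts at 0.
df1 = (pd.concat([pd.Series(pd.date_range(r.A, r.B, freq='s')) for r in df.itertuples()],
                 ignore_index=True)
         .to_frame('A'))
print(df1)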

                    A
0 2019-03-13 08:12:20
1 2019-03-13 08:12:21
2 2019-03-13 08:12:22
3 2019-03-13 08:12:23
4 2019-03-13 08:12:24
5 2019-03-13 08:12:25
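
Both approaches above still call pd.date_range once per row, which may remain slow at around 1M rows. As a rough sketch (not benchmarked here), the same expansion can be done with NumPy array operations only. The function name expand_seconds and its parameters are made up for this illustration, and it assumes both columns parse cleanly with pd.to_datetime and that whole-second resolution is enough. Rows where B is earlier than A contribute no timestamps, which matches what pd.date_range returns for such a row.

```python
import numpy as np
import pandas as pd


def expand_seconds(df, start_col="A", end_col="B"):
    """Return one row per second between start_col and end_col (inclusive).

    Hypothetical helper for illustration; sub-second parts are truncated.
    """
    start = pd.to_datetime(df[start_col]).to_numpy().astype("datetime64[s]")
    end = pd.to_datetime(df[end_col]).to_numpy().astype("datetime64[s]")

    # Number of timestamps each row produces; rows with end < start produce none.
    counts = ((end - start).astype("int64") + 1).clip(min=0)

    # Repeat each start time once per timestamp it will generate.
    repeated_start = np.repeat(start, counts)

    # Offsets 0, 1, 2, ... restarting at 0 for every source row.
    first_position = np.repeat(np.cumsum(counts) - counts, counts)
    offsets = np.arange(counts.sum()) - first_position

    values = repeated_start + offsets.astype("timedelta64[s]")
    return pd.DataFrame({start_col: values})


df = pd.DataFrame({"A": ["2019-03-13 08:12:20", "2019-03-15 10:02:18"],
                   "B": ["2019-03-13 08:12:25", "2019-03-13 10:02:20"]})
df1 = expand_seconds(df)
print(df1)
```

The trade-off is memory: every output timestamp is materialised at once, so the result can be large when the ranges are long.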
