Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

I have a simple question I cant seem to find a strait answer for.

Suppose I have a data frame with date, open, high, low, close and volume.

What I am trying to do is first find the current date which I can do with:

today = pd.datetime.today().date()

My issue comes in selecting the last 20 days of data from the current date.

I need to select the last 20 rows because I need to then find the highest and lowest values in the close colum of this dataset.

Any pointers would help a lot. Iv searched google for a while and keep finding different answers.

Thanks!

2 Answers

0 votes
by (41.4k points)

Use df[-20:] for getting the last 20 lines from a DataFrame

If you want to have the date 20 days ago, use  pd.Timedelta(-19, unit='d') + pd.datetime.today().date().

In [1]: index = pd.date_range(start=(pd.Timedelta(-30, unit='d')+pd.datetime.today().date()), periods=31)

In [2]: df = pd.DataFrame(np.random.rand(31, 4), index=index, columns=['O', 'H', 'L', 'C'])

In [3]: df = df.reset_index().rename(columns={'index': 'Date'})

In [4]: df

Out[4]:

         Date         O         H         L         C

0  2017-08-28  0.616856  0.518961  0.378005  0.716371

1  2017-08-29  0.300977  0.652217  0.713013  0.842369

2  2017-08-30  0.875668  0.232998  0.566047  0.969647

3  2017-08-31  0.273934  0.086575  0.386617  0.390749

4  2017-09-01  0.667561  0.336419  0.648809  0.619215

5  2017-09-02  0.988234  0.563675  0.402908  0.671333

6  2017-09-03  0.111710  0.549302  0.321546  0.201828

7  2017-09-04  0.469041  0.736152  0.345069  0.336593

8  2017-09-05  0.674844  0.276839  0.350289  0.862777

9  2017-09-06  0.128124  0.968918  0.713846  0.415061

10 2017-09-07  0.920488  0.252980  0.573531  0.270999

11 2017-09-08  0.113368  0.781649  0.190273  0.758834

12 2017-09-09  0.414453  0.545572  0.761805  0.586717

13 2017-09-10  0.348459  0.830177  0.779591  0.783887

14 2017-09-11  0.571877  0.230465  0.262744  0.360188

15 2017-09-12  0.844286  0.821388  0.312319  0.473672

16 2017-09-13  0.605548  0.570590  0.457141  0.882498

17 2017-09-14  0.242154  0.066617  0.028913  0.969698

18 2017-09-15  0.725521  0.742362  0.904866  0.890942

19 2017-09-16  0.460858  0.749581  0.429131  0.723394

20 2017-09-17  0.767445  0.452113  0.906294  0.978368

21 2017-09-18  0.342970  0.702579  0.029031  0.743489

22 2017-09-19  0.221478  0.339948  0.403478  0.349097

23 2017-09-20  0.147785  0.633542  0.692545  0.194496

24 2017-09-21  0.656189  0.419257  0.099094  0.708530

25 2017-09-22  0.329901  0.087101  0.683207  0.558431

26 2017-09-23  0.902550  0.155262  0.304506  0.756210

27 2017-09-24  0.072132  0.045242  0.058175  0.755649

28 2017-09-25  0.149873  0.340870  0.198454  0.725051

29 2017-09-26  0.972721  0.505842  0.886602  0.231916

30 2017-09-27  0.511109  0.990975  0.330336  0.898291

In [5]: df[-20:]

Out[5]:

         Date         O         H         L         C

11 2017-09-08  0.113368  0.781649  0.190273  0.758834

12 2017-09-09  0.414453  0.545572  0.761805  0.586717

13 2017-09-10  0.348459  0.830177  0.779591  0.783887

14 2017-09-11  0.571877  0.230465  0.262744  0.360188

15 2017-09-12  0.844286  0.821388  0.312319  0.473672

16 2017-09-13  0.605548  0.570590  0.457141  0.882498

17 2017-09-14  0.242154  0.066617  0.028913  0.969698

18 2017-09-15  0.725521  0.742362  0.904866  0.890942

19 2017-09-16  0.460858  0.749581  0.429131  0.723394

20 2017-09-17  0.767445  0.452113  0.906294  0.978368

21 2017-09-18  0.342970  0.702579  0.029031  0.743489

22 2017-09-19  0.221478  0.339948  0.403478  0.349097

23 2017-09-20  0.147785  0.633542  0.692545  0.194496

24 2017-09-21  0.656189  0.419257  0.099094  0.708530

25 2017-09-22  0.329901  0.087101  0.683207  0.558431

26 2017-09-23  0.902550  0.155262  0.304506  0.756210

27 2017-09-24  0.072132  0.045242  0.058175  0.755649

28 2017-09-25  0.149873  0.340870  0.198454  0.725051

29 2017-09-26  0.972721  0.505842  0.886602  0.231916

30 2017-09-27  0.511109  0.990975  0.330336  0.898291

In [6]: df[df.Date.isin(pd.date_range(pd.Timedelta(-19, unit='d')+pd.datetime.today().date(), periods=20))]

Out[6]:

         Date         O         H         L         C

11 2017-09-08  0.113368  0.781649  0.190273  0.758834

12 2017-09-09  0.414453  0.545572  0.761805  0.586717

13 2017-09-10  0.348459  0.830177  0.779591  0.783887

14 2017-09-11  0.571877  0.230465  0.262744  0.360188

15 2017-09-12  0.844286  0.821388  0.312319  0.473672

16 2017-09-13  0.605548  0.570590  0.457141  0.882498

17 2017-09-14  0.242154  0.066617  0.028913  0.969698

18 2017-09-15  0.725521  0.742362  0.904866  0.890942

19 2017-09-16  0.460858  0.749581  0.429131  0.723394

20 2017-09-17  0.767445  0.452113  0.906294  0.978368

21 2017-09-18  0.342970  0.702579  0.029031  0.743489

22 2017-09-19  0.221478  0.339948  0.403478  0.349097

23 2017-09-20  0.147785  0.633542  0.692545  0.194496

24 2017-09-21  0.656189  0.419257  0.099094  0.708530

25 2017-09-22  0.329901  0.087101  0.683207  0.558431

26 2017-09-23  0.902550  0.155262  0.304506  0.756210

27 2017-09-24  0.072132  0.045242  0.058175  0.755649

28 2017-09-25  0.149873  0.340870  0.198454  0.725051

29 2017-09-26  0.972721  0.505842  0.886602  0.231916

30 2017-09-27  0.511109  0.990975  0.330336  0.898291

0 votes
by (36.8k points)

I have setted up the code this way:

today = pd.datetime.today().date()

df = pd.DataFrame(

    np.random.rand(20, 4),

    pd.date_range(end=today, periods=20, freq='3D'),

    columns=['O', 'H', 'L', 'C'])

df

                   O         H         L         C

2017-08-01  0.821996  0.894122  0.829814  0.429701

2017-08-04  0.883512  0.668642  0.524440  0.914845

2017-08-07  0.035753  0.231787  0.421547  0.163865

2017-08-10  0.742781  0.293591  0.874033  0.054421

2017-08-13  0.252422  0.632991  0.547044  0.650622

2017-08-16  0.316752  0.190016  0.504701  0.827450

2017-08-19  0.777069  0.533121  0.329742  0.603473

2017-08-22  0.843260  0.546845  0.600270  0.060620

2017-08-25  0.834180  0.395653  0.189499  0.820043

2017-08-28  0.806369  0.850968  0.753335  0.902687

2017-08-31  0.336096  0.145325  0.876519  0.114923

2017-09-03  0.590195  0.946520  0.009151  0.832992

2017-09-06  0.901101  0.616852  0.375829  0.332625

2017-09-09  0.537892  0.852527  0.082807  0.966297

2017-09-12  0.104929  0.803415  0.345942  0.245934

2017-09-15  0.085703  0.743497  0.256762  0.530267

2017-09-18  0.823960  0.397983  0.173706  0.091678

2017-09-21  0.211412  0.980942  0.833802  0.763510

2017-09-24  0.312950  0.850760  0.913519  0.846466

2017-09-27  0.921168  0.568595  0.460656  0.016313

To solve it:

I am using the DateTime index slicing. This is a simple method if we use the panda's devs for this problem.

df[today - pd.offsets.Day(20):]

                   O         H         L         C

2017-09-09  0.537892  0.852527  0.082807  0.966297

2017-09-12  0.104929  0.803415  0.345942  0.245934

2017-09-15  0.085703  0.743497  0.256762  0.530267

2017-09-18  0.823960  0.397983  0.173706  0.091678

2017-09-21  0.211412  0.980942  0.833802  0.763510

2017-09-24  0.312950  0.850760  0.913519  0.846466

2017-09-27  0.921168  0.568595  0.460656  0.016313

Note: The code used will grab all the rows which are present within the last 20days. 

If you are a beginner and want to know more about Data Science the do check out the Data Science course

Browse Categories

...