Back

Explore Courses Blog Tutorials Interview Questions
0 votes
1 view
in Data Science by (17.6k points)

I have a simple question I cant seem to find a strait answer for.

Suppose I have a data frame with date, open, high, low, close and volume.

What I am trying to do is first find the current date which I can do with:

today = pd.datetime.today().date()

My issue comes in selecting the last 20 days of data from the current date.

I need to select the last 20 rows because I need to then find the highest and lowest values in the close colum of this dataset.

Any pointers would help a lot. Iv searched google for a while and keep finding different answers.

Thanks!

2 Answers

0 votes
by (41.4k points)

Use df[-20:] for getting the last 20 lines from a DataFrame

If you want to have the date 20 days ago, use  pd.Timedelta(-19, unit='d') + pd.datetime.today().date().

In [1]: index = pd.date_range(start=(pd.Timedelta(-30, unit='d')+pd.datetime.today().date()), periods=31)

In [2]: df = pd.DataFrame(np.random.rand(31, 4), index=index, columns=['O', 'H', 'L', 'C'])

In [3]: df = df.reset_index().rename(columns={'index': 'Date'})

In [4]: df

Out[4]:

         Date         O         H         L         C

0  2017-08-28  0.616856  0.518961  0.378005  0.716371

1  2017-08-29  0.300977  0.652217  0.713013  0.842369

2  2017-08-30  0.875668  0.232998  0.566047  0.969647

3  2017-08-31  0.273934  0.086575  0.386617  0.390749

4  2017-09-01  0.667561  0.336419  0.648809  0.619215

5  2017-09-02  0.988234  0.563675  0.402908  0.671333

6  2017-09-03  0.111710  0.549302  0.321546  0.201828

7  2017-09-04  0.469041  0.736152  0.345069  0.336593

8  2017-09-05  0.674844  0.276839  0.350289  0.862777

9  2017-09-06  0.128124  0.968918  0.713846  0.415061

10 2017-09-07  0.920488  0.252980  0.573531  0.270999

11 2017-09-08  0.113368  0.781649  0.190273  0.758834

12 2017-09-09  0.414453  0.545572  0.761805  0.586717

13 2017-09-10  0.348459  0.830177  0.779591  0.783887

14 2017-09-11  0.571877  0.230465  0.262744  0.360188

15 2017-09-12  0.844286  0.821388  0.312319  0.473672

16 2017-09-13  0.605548  0.570590  0.457141  0.882498

17 2017-09-14  0.242154  0.066617  0.028913  0.969698

18 2017-09-15  0.725521  0.742362  0.904866  0.890942

19 2017-09-16  0.460858  0.749581  0.429131  0.723394

20 2017-09-17  0.767445  0.452113  0.906294  0.978368

21 2017-09-18  0.342970  0.702579  0.029031  0.743489

22 2017-09-19  0.221478  0.339948  0.403478  0.349097

23 2017-09-20  0.147785  0.633542  0.692545  0.194496

24 2017-09-21  0.656189  0.419257  0.099094  0.708530

25 2017-09-22  0.329901  0.087101  0.683207  0.558431

26 2017-09-23  0.902550  0.155262  0.304506  0.756210

27 2017-09-24  0.072132  0.045242  0.058175  0.755649

28 2017-09-25  0.149873  0.340870  0.198454  0.725051

29 2017-09-26  0.972721  0.505842  0.886602  0.231916

30 2017-09-27  0.511109  0.990975  0.330336  0.898291

In [5]: df[-20:]

Out[5]:

         Date         O         H         L         C

11 2017-09-08  0.113368  0.781649  0.190273  0.758834

12 2017-09-09  0.414453  0.545572  0.761805  0.586717

13 2017-09-10  0.348459  0.830177  0.779591  0.783887

14 2017-09-11  0.571877  0.230465  0.262744  0.360188

15 2017-09-12  0.844286  0.821388  0.312319  0.473672

16 2017-09-13  0.605548  0.570590  0.457141  0.882498

17 2017-09-14  0.242154  0.066617  0.028913  0.969698

18 2017-09-15  0.725521  0.742362  0.904866  0.890942

19 2017-09-16  0.460858  0.749581  0.429131  0.723394

20 2017-09-17  0.767445  0.452113  0.906294  0.978368

21 2017-09-18  0.342970  0.702579  0.029031  0.743489

22 2017-09-19  0.221478  0.339948  0.403478  0.349097

23 2017-09-20  0.147785  0.633542  0.692545  0.194496

24 2017-09-21  0.656189  0.419257  0.099094  0.708530

25 2017-09-22  0.329901  0.087101  0.683207  0.558431

26 2017-09-23  0.902550  0.155262  0.304506  0.756210

27 2017-09-24  0.072132  0.045242  0.058175  0.755649

28 2017-09-25  0.149873  0.340870  0.198454  0.725051

29 2017-09-26  0.972721  0.505842  0.886602  0.231916

30 2017-09-27  0.511109  0.990975  0.330336  0.898291

In [6]: df[df.Date.isin(pd.date_range(pd.Timedelta(-19, unit='d')+pd.datetime.today().date(), periods=20))]

Out[6]:

         Date         O         H         L         C

11 2017-09-08  0.113368  0.781649  0.190273  0.758834

12 2017-09-09  0.414453  0.545572  0.761805  0.586717

13 2017-09-10  0.348459  0.830177  0.779591  0.783887

14 2017-09-11  0.571877  0.230465  0.262744  0.360188

15 2017-09-12  0.844286  0.821388  0.312319  0.473672

16 2017-09-13  0.605548  0.570590  0.457141  0.882498

17 2017-09-14  0.242154  0.066617  0.028913  0.969698

18 2017-09-15  0.725521  0.742362  0.904866  0.890942

19 2017-09-16  0.460858  0.749581  0.429131  0.723394

20 2017-09-17  0.767445  0.452113  0.906294  0.978368

21 2017-09-18  0.342970  0.702579  0.029031  0.743489

22 2017-09-19  0.221478  0.339948  0.403478  0.349097

23 2017-09-20  0.147785  0.633542  0.692545  0.194496

24 2017-09-21  0.656189  0.419257  0.099094  0.708530

25 2017-09-22  0.329901  0.087101  0.683207  0.558431

26 2017-09-23  0.902550  0.155262  0.304506  0.756210

27 2017-09-24  0.072132  0.045242  0.058175  0.755649

28 2017-09-25  0.149873  0.340870  0.198454  0.725051

29 2017-09-26  0.972721  0.505842  0.886602  0.231916

30 2017-09-27  0.511109  0.990975  0.330336  0.898291

0 votes
by (36.8k points)

I have setted up the code this way:

today = pd.datetime.today().date()

df = pd.DataFrame(

    np.random.rand(20, 4),

    pd.date_range(end=today, periods=20, freq='3D'),

    columns=['O', 'H', 'L', 'C'])

df

                   O         H         L         C

2017-08-01  0.821996  0.894122  0.829814  0.429701

2017-08-04  0.883512  0.668642  0.524440  0.914845

2017-08-07  0.035753  0.231787  0.421547  0.163865

2017-08-10  0.742781  0.293591  0.874033  0.054421

2017-08-13  0.252422  0.632991  0.547044  0.650622

2017-08-16  0.316752  0.190016  0.504701  0.827450

2017-08-19  0.777069  0.533121  0.329742  0.603473

2017-08-22  0.843260  0.546845  0.600270  0.060620

2017-08-25  0.834180  0.395653  0.189499  0.820043

2017-08-28  0.806369  0.850968  0.753335  0.902687

2017-08-31  0.336096  0.145325  0.876519  0.114923

2017-09-03  0.590195  0.946520  0.009151  0.832992

2017-09-06  0.901101  0.616852  0.375829  0.332625

2017-09-09  0.537892  0.852527  0.082807  0.966297

2017-09-12  0.104929  0.803415  0.345942  0.245934

2017-09-15  0.085703  0.743497  0.256762  0.530267

2017-09-18  0.823960  0.397983  0.173706  0.091678

2017-09-21  0.211412  0.980942  0.833802  0.763510

2017-09-24  0.312950  0.850760  0.913519  0.846466

2017-09-27  0.921168  0.568595  0.460656  0.016313

To solve it:

I am using the DateTime index slicing. This is a simple method if we use the panda's devs for this problem.

df[today - pd.offsets.Day(20):]

                   O         H         L         C

2017-09-09  0.537892  0.852527  0.082807  0.966297

2017-09-12  0.104929  0.803415  0.345942  0.245934

2017-09-15  0.085703  0.743497  0.256762  0.530267

2017-09-18  0.823960  0.397983  0.173706  0.091678

2017-09-21  0.211412  0.980942  0.833802  0.763510

2017-09-24  0.312950  0.850760  0.913519  0.846466

2017-09-27  0.921168  0.568595  0.460656  0.016313

Note: The code used will grab all the rows which are present within the last 20days. 

If you are a beginner and want to know more about Data Science the do check out the Data Science course

Welcome to Intellipaat Community. Get your technical queries answered by top developers!

28.4k questions

29.7k answers

500 comments

94.1k users

Browse Categories

...