Pandas Select last 20 days of data.

Question

2 Answers

Shlok Pandey · Answer 1 · 2019-07-27T14:04:46+0000

Use df[-20:] for getting the last 20 lines from a DataFrame.

If you want to have the date 20 days ago, use pd.Timedelta(-19, unit='d') + pd.datetime.today().date().

In [1]: index = pd.date_range(start=(pd.Timedelta(-30, unit='d')+pd.datetime.today().date()), periods=31)
In [2]: df = pd.DataFrame(np.random.rand(31, 4), index=index, columns=['O', 'H', 'L', 'C'])
In [3]: df = df.reset_index().rename(columns={'index': 'Date'})
In [4]: df

Out[4]:
Date O H L C
0 2017-08-28 0.616856 0.518961 0.378005 0.716371
1 2017-08-29 0.300977 0.652217 0.713013 0.842369
2 2017-08-30 0.875668 0.232998 0.566047 0.969647
3 2017-08-31 0.273934 0.086575 0.386617 0.390749
4 2017-09-01 0.667561 0.336419 0.648809 0.619215
5 2017-09-02 0.988234 0.563675 0.402908 0.671333
6 2017-09-03 0.111710 0.549302 0.321546 0.201828
7 2017-09-04 0.469041 0.736152 0.345069 0.336593
8 2017-09-05 0.674844 0.276839 0.350289 0.862777
9 2017-09-06 0.128124 0.968918 0.713846 0.415061
10 2017-09-07 0.920488 0.252980 0.573531 0.270999
11 2017-09-08 0.113368 0.781649 0.190273 0.758834
12 2017-09-09 0.414453 0.545572 0.761805 0.586717
13 2017-09-10 0.348459 0.830177 0.779591 0.783887
14 2017-09-11 0.571877 0.230465 0.262744 0.360188
15 2017-09-12 0.844286 0.821388 0.312319 0.473672
16 2017-09-13 0.605548 0.570590 0.457141 0.882498
17 2017-09-14 0.242154 0.066617 0.028913 0.969698
18 2017-09-15 0.725521 0.742362 0.904866 0.890942
19 2017-09-16 0.460858 0.749581 0.429131 0.723394
20 2017-09-17 0.767445 0.452113 0.906294 0.978368
21 2017-09-18 0.342970 0.702579 0.029031 0.743489
22 2017-09-19 0.221478 0.339948 0.403478 0.349097
23 2017-09-20 0.147785 0.633542 0.692545 0.194496
24 2017-09-21 0.656189 0.419257 0.099094 0.708530
25 2017-09-22 0.329901 0.087101 0.683207 0.558431
26 2017-09-23 0.902550 0.155262 0.304506 0.756210
27 2017-09-24 0.072132 0.045242 0.058175 0.755649
28 2017-09-25 0.149873 0.340870 0.198454 0.725051
29 2017-09-26 0.972721 0.505842 0.886602 0.231916
30 2017-09-27 0.511109 0.990975 0.330336 0.898291

In [5]: df[-20:]
Out[5]:
Date O H L C
11 2017-09-08 0.113368 0.781649 0.190273 0.758834
12 2017-09-09 0.414453 0.545572 0.761805 0.586717
13 2017-09-10 0.348459 0.830177 0.779591 0.783887
14 2017-09-11 0.571877 0.230465 0.262744 0.360188
15 2017-09-12 0.844286 0.821388 0.312319 0.473672
16 2017-09-13 0.605548 0.570590 0.457141 0.882498
17 2017-09-14 0.242154 0.066617 0.028913 0.969698
18 2017-09-15 0.725521 0.742362 0.904866 0.890942
19 2017-09-16 0.460858 0.749581 0.429131 0.723394
20 2017-09-17 0.767445 0.452113 0.906294 0.978368
21 2017-09-18 0.342970 0.702579 0.029031 0.743489
22 2017-09-19 0.221478 0.339948 0.403478 0.349097
23 2017-09-20 0.147785 0.633542 0.692545 0.194496
24 2017-09-21 0.656189 0.419257 0.099094 0.708530
25 2017-09-22 0.329901 0.087101 0.683207 0.558431
26 2017-09-23 0.902550 0.155262 0.304506 0.756210
27 2017-09-24 0.072132 0.045242 0.058175 0.755649
28 2017-09-25 0.149873 0.340870 0.198454 0.725051
29 2017-09-26 0.972721 0.505842 0.886602 0.231916
30 2017-09-27 0.511109 0.990975 0.330336 0.898291

In [6]: df[df.Date.isin(pd.date_range(pd.Timedelta(-19, unit='d')+pd.datetime.today().date(), periods=20))]
Out[6]:
Date O H L C
11 2017-09-08 0.113368 0.781649 0.190273 0.758834
12 2017-09-09 0.414453 0.545572 0.761805 0.586717
13 2017-09-10 0.348459 0.830177 0.779591 0.783887
14 2017-09-11 0.571877 0.230465 0.262744 0.360188
15 2017-09-12 0.844286 0.821388 0.312319 0.473672
16 2017-09-13 0.605548 0.570590 0.457141 0.882498
17 2017-09-14 0.242154 0.066617 0.028913 0.969698
18 2017-09-15 0.725521 0.742362 0.904866 0.890942
19 2017-09-16 0.460858 0.749581 0.429131 0.723394
20 2017-09-17 0.767445 0.452113 0.906294 0.978368
21 2017-09-18 0.342970 0.702579 0.029031 0.743489
22 2017-09-19 0.221478 0.339948 0.403478 0.349097
23 2017-09-20 0.147785 0.633542 0.692545 0.194496
24 2017-09-21 0.656189 0.419257 0.099094 0.708530
25 2017-09-22 0.329901 0.087101 0.683207 0.558431
26 2017-09-23 0.902550 0.155262 0.304506 0.756210
27 2017-09-24 0.072132 0.045242 0.058175 0.755649
28 2017-09-25 0.149873 0.340870 0.198454 0.725051
29 2017-09-26 0.972721 0.505842 0.886602 0.231916
30 2017-09-27 0.511109 0.990975 0.330336 0.898291

supriya · Answer 2 · 2020-05-15T04:06:29+0000

I have setted up the code this way:

today = pd.datetime.today().date()
df = pd.DataFrame(
np.random.rand(20, 4),
pd.date_range(end=today, periods=20, freq='3D'),
columns=['O', 'H', 'L', 'C'])
df
O H L C
2017-08-01 0.821996 0.894122 0.829814 0.429701
2017-08-04 0.883512 0.668642 0.524440 0.914845
2017-08-07 0.035753 0.231787 0.421547 0.163865
2017-08-10 0.742781 0.293591 0.874033 0.054421
2017-08-13 0.252422 0.632991 0.547044 0.650622
2017-08-16 0.316752 0.190016 0.504701 0.827450
2017-08-19 0.777069 0.533121 0.329742 0.603473
2017-08-22 0.843260 0.546845 0.600270 0.060620
2017-08-25 0.834180 0.395653 0.189499 0.820043
2017-08-28 0.806369 0.850968 0.753335 0.902687
2017-08-31 0.336096 0.145325 0.876519 0.114923
2017-09-03 0.590195 0.946520 0.009151 0.832992
2017-09-06 0.901101 0.616852 0.375829 0.332625
2017-09-09 0.537892 0.852527 0.082807 0.966297
2017-09-12 0.104929 0.803415 0.345942 0.245934
2017-09-15 0.085703 0.743497 0.256762 0.530267
2017-09-18 0.823960 0.397983 0.173706 0.091678
2017-09-21 0.211412 0.980942 0.833802 0.763510
2017-09-24 0.312950 0.850760 0.913519 0.846466
2017-09-27 0.921168 0.568595 0.460656 0.016313

To solve it:

I am using the DateTime index slicing. This is a simple method if we use the panda's devs for this problem.

df[today - pd.offsets.Day(20):]
O H L C
2017-09-09 0.537892 0.852527 0.082807 0.966297
2017-09-12 0.104929 0.803415 0.345942 0.245934
2017-09-15 0.085703 0.743497 0.256762 0.530267
2017-09-18 0.823960 0.397983 0.173706 0.091678
2017-09-21 0.211412 0.980942 0.833802 0.763510
2017-09-24 0.312950 0.850760 0.913519 0.846466
2017-09-27 0.921168 0.568595 0.460656 0.016313

Note: The code used will grab all the rows which are present within the last 20days.

If you are a beginner and want to know more about Data Science the do check out the Data Science course

Pandas Select last 20 days of data.

2 Answers

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources