0 votes
1 view
in Data Science by (13.1k points)

New to pandas development. How do I forward fill a DataFrame with the value contained in one previously seen column?

Self-contained example:

import pandas as pd

import numpy as np

O = [1, np.nan, 5, np.nan]

H = [5, np.nan, 5, np.nan]

L = [1, np.nan, 2, np.nan]

C = [5, np.nan, 2, np.nan]

timestamps = ["2017-07-23 03:13:00", "2017-07-23 03:14:00", "2017-07-23 03:15:00", "2017-07-23 03:16:00"]

dict = {'Open': O, 'High': H, 'Low': L, 'Close': C}

df = pd.DataFrame(index=timestamps, data=dict)

ohlc = df[['Open', 'High', 'Low', 'Close']]

This yields the following DataFrame:

print(ohlc)

                         Open    High    Low    Close

2017-07-23 03:13:00      1.0     5.0     1.0    5.0

2017-07-23 03:14:00      NaN     NaN     NaN    NaN

2017-07-23 03:15:00      5.0     5.0     2.0    2.0

2017-07-23 03:16:00      NaN     NaN     NaN    NaN

I want to go from that last DataFrame to something like this:

                             Open     High    Low    Close

2017-07-23 03:13:00   1.0      5.0     1.0     5.0

2017-07-23 03:14:00   5.0      5.0     5.0     5.0

2017-07-23 03:15:00   5.0      5.0     2.0     2.0

2017-07-23 03:16:00   2.0      2.0     2.0     2.0

So that the previously-seen value in 'Close' forward fills entire rows until there's a new populated row seen. It's simple enough to fill column 'Close' like so:

column2fill = 'Close'

ohlc[column2fill] = ohlc[column2fill].ffill()

print(ohlc)

                          Open    High    Low    Close

2017-07-23 03:13:00       1.0     5.0     1.0    5.0

2017-07-23 03:14:00       NaN     NaN     NaN    5.0

2017-07-23 03:15:00       5.0     5.0     2.0    2.0

2017-07-23 03:16:00       NaN     NaN     NaN    2.0

But is there a way to fill across the 03:14:00 and 03:16:00 rows with the 'Close' value of those rows? And is there a way to do it in one step using one forward fill instead of filling the 'Close' column first?

1 Answer

0 votes
by (19.9k points)

You should assign with ffill and then bfill per row by axis=1.

f = ohlc.assign(Close=ohlc['Close'].ffill()).bfill(axis=1)

print (df)

                       Open   High  Low  Close

2017-07-23 03:13:00    1.0    5.0   1.0   5.0

2017-07-23 03:14:00    5.0    5.0   5.0   5.0

2017-07-23 03:15:00    5.0    5.0   2.0   2.0

2017-07-23 03:16:00    2.0    2.0   2.0   2.0

That  is same as:

ohlc['Close'] = ohlc['Close'].ffill()

df = ohlc.bfill(axis=1)

print (df)

                      Open  High  Low  Close

2017-07-23 03:13:00   1.0   5.0   1.0   5.0

2017-07-23 03:14:00   5.0   5.0   5.0   5.0

2017-07-23 03:15:00   5.0   5.0   2.0   2.0

2017-07-23 03:16:00   2.0   2.0    2.0  2.0

...