Back

Explore Courses Blog Tutorials Interview Questions
0 votes
1 view
in Data Science by (17.6k points)

New to pandas development. How do I forward fill a DataFrame with the value contained in one previously seen column?

Self-contained example:

import pandas as pd

import numpy as np

O = [1, np.nan, 5, np.nan]

H = [5, np.nan, 5, np.nan]

L = [1, np.nan, 2, np.nan]

C = [5, np.nan, 2, np.nan]

timestamps = ["2017-07-23 03:13:00", "2017-07-23 03:14:00", "2017-07-23 03:15:00", "2017-07-23 03:16:00"]

dict = {'Open': O, 'High': H, 'Low': L, 'Close': C}

df = pd.DataFrame(index=timestamps, data=dict)

ohlc = df[['Open', 'High', 'Low', 'Close']]

This yields the following DataFrame:

print(ohlc)

                         Open    High    Low    Close

2017-07-23 03:13:00      1.0     5.0     1.0    5.0

2017-07-23 03:14:00      NaN     NaN     NaN    NaN

2017-07-23 03:15:00      5.0     5.0     2.0    2.0

2017-07-23 03:16:00      NaN     NaN     NaN    NaN

I want to go from that last DataFrame to something like this:

                             Open     High    Low    Close

2017-07-23 03:13:00   1.0      5.0     1.0     5.0

2017-07-23 03:14:00   5.0      5.0     5.0     5.0

2017-07-23 03:15:00   5.0      5.0     2.0     2.0

2017-07-23 03:16:00   2.0      2.0     2.0     2.0

So that the previously-seen value in 'Close' forward fills entire rows until there's a new populated row seen. It's simple enough to fill column 'Close' like so:

column2fill = 'Close'

ohlc[column2fill] = ohlc[column2fill].ffill()

print(ohlc)

                          Open    High    Low    Close

2017-07-23 03:13:00       1.0     5.0     1.0    5.0

2017-07-23 03:14:00       NaN     NaN     NaN    5.0

2017-07-23 03:15:00       5.0     5.0     2.0    2.0

2017-07-23 03:16:00       NaN     NaN     NaN    2.0

But is there a way to fill across the 03:14:00 and 03:16:00 rows with the 'Close' value of those rows? And is there a way to do it in one step using one forward fill instead of filling the 'Close' column first?

1 Answer

0 votes
by (41.4k points)

You should assign with ffill and then bfill per row by axis=1.

f = ohlc.assign(Close=ohlc['Close'].ffill()).bfill(axis=1)

print (df)

                       Open   High  Low  Close

2017-07-23 03:13:00    1.0    5.0   1.0   5.0

2017-07-23 03:14:00    5.0    5.0   5.0   5.0

2017-07-23 03:15:00    5.0    5.0   2.0   2.0

2017-07-23 03:16:00    2.0    2.0   2.0   2.0

That  is same as:

ohlc['Close'] = ohlc['Close'].ffill()

df = ohlc.bfill(axis=1)

print (df)

                      Open  High  Low  Close

2017-07-23 03:13:00   1.0   5.0   1.0   5.0

2017-07-23 03:14:00   5.0   5.0   5.0   5.0

2017-07-23 03:15:00   5.0   5.0   2.0   2.0

2017-07-23 03:16:00   2.0   2.0    2.0  2.0

If you wish to know about Python then visit this Python Course.

Welcome to Intellipaat Community. Get your technical queries answered by top developers!

28.4k questions

29.7k answers

500 comments

94.1k users

Browse Categories

...