In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using vectors (instead of a data.frame) is the following:

> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

I would like a function fill.NAs() that allows me to construct yy such that:

> yy

[1] NA  2  2  2  2  3  3  4  4  4

I need to repeat this operation for many small data.frames (~30-50 MB each, ~1 TB in total), where a row is NA if all its entries are. What is a good way to approach the problem?

The ugly solution I cooked up uses this function:

last <- function(x) {
  x[length(x)]
}

fill.NAs <- function(isNA) {
  if (isNA[1] == 1) {
    # leading NAs can't be forward filled
    isNA[1:max({which(isNA == 0)[1] - 1}, 1)] <- 0
  }
  isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
  isNA.pos[isNA.diff < 0] <- 0
  isNA.neg[isNA.diff > 0] <- 0
  which.isNA.neg <- which(as.logical(isNA.neg))
  if (length(which.isNA.neg) == 0) return(NULL)  # generates warnings later, but works
  which.isNA.pos <- which(as.logical(isNA.pos))
  which.isNA <- which(as.logical(isNA))
  if (length(which.isNA.neg) == length(which.isNA.pos)) {
    replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)],
                       which.isNA.neg[2:max(length(which.isNA.neg) - 1, 2)] -
                         which.isNA.pos[1:max(length(which.isNA.neg) - 1, 1)])
    replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
  } else {
    replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)],
                       which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])
    replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
  }
  replacement
}

The function fill.NAs is used as follows:

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))
replacement <- fill.NAs(isNA)
if (length(replacement)) {
  which.isNA <- which(as.logical(isNA))
  to.replace <- which.isNA[which(isNA == 0)[1]:length(which.isNA)]
  y[to.replace] <- y[replacement]
}

Output

> y

[1] NA  2  2  2  2  3  3  4  4  4

... which seems to work. But, man, is it ugly! Any suggestions?

To carry the last observation forward, you can use the na.locf() function from the zoo package as follows:

> library(zoo)
> x <- zoo(1:6)
> x
1 2 3 4 5 6
1 2 3 4 5 6

> y <- zoo(c(2, NA, 1, 4, 5, 2))
> na.locf(y)
1 2 3 4 5 6
2 2 1 4 5 2

> na.locf(y, fromLast = TRUE)
1 2 3 4 5 6
2 1 1 4 5 2

> z <- zoo(c(NA, 9, 3, 2, 3, 2))
> na.locf(z)
2 3 4 5 6
9 3 2 3 2
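
Note that in the z example the leading NA is dropped, so the result is shorter than the input. For the vector in the question, where the leading NA should stay in place, passing na.rm = FALSE to na.locf() keeps it. Below is a rough sketch rather than a tuned solution: fill_forward is a hypothetical base-R helper (it is not part of zoo) that uses a cummax index trick, the data frame df is made up for illustration, and the zoo call at the end only runs if the package is installed.

```r
# Hypothetical helper (not from zoo): forward-fill NAs in a vector with base R.
fill_forward <- function(x) {
  # Position of the most recent non-NA value at each element (0 if none yet).
  idx <- cummax(seq_along(x) * (!is.na(x)))
  # Leading NAs have nothing to copy from, so they stay NA.
  idx[idx == 0] <- NA
  x[idx]
}

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
filled <- fill_forward(y)
print(filled)
# [1] NA  2  2  2  2  3  3  4  4  4

# Apply column by column to a data frame (made-up example data).
df <- data.frame(a = y, b = rev(y))
df[] <- lapply(df, fill_forward)
print(df)

# Comparison with zoo, if it is installed; na.rm = FALSE keeps the leading NA.
if (requireNamespace("zoo", quietly = TRUE)) {
  print(zoo::na.locf(y, na.rm = FALSE))
}
```

Because the helper is fully vectorized, it should scale reasonably to many small data frames, but it is worth timing against na.locf() on your own data before committing to either.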