in R Programming by (5.3k points)

In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using a vector instead of a data.frame, is the following:

> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

I would like a function fill.NAs() that allows me to construct yy such that:

> yy
[1] NA  2  2  2  2  3  3  4  4  4

I need to repeat this operation for many small data.frames (~30-50 MB each, ~1 TB in total), where a row counts as NA if all of its entries are NA. What is a good way to approach the problem?
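Since a row only counts as missing when every entry in it is NA, one option is to work on row indices: for each row, find the most recent row that is not entirely NA and copy it. The sketch below is an illustration in base R, not the asker's method; the helper `fill_forward_rows()` and the toy data.frame `df` are hypothetical, and it assumes partially-NA rows should be left as they are.

```r
# Hypothetical helper: replace every all-NA row of a data.frame with the
# closest previous row that is not entirely NA. Leading all-NA rows stay NA.
fill_forward_rows <- function(df) {
  all_na <- rowSums(!is.na(df)) == 0             # TRUE where every entry is NA
  # Row number of the last non-empty row at or before each row (0 = none yet)
  src <- cummax(seq_len(nrow(df)) * !all_na)
  has_src <- src > 0
  df[has_src, ] <- df[src[has_src], ]            # copy whole rows forward
  df
}

# Toy data: rows 1, 4, 5 and 7 are entirely NA
df <- data.frame(a = c(NA, 1, 2, NA, NA, 3, NA),
                 b = c(NA, "x", "y", NA, NA, "z", NA))
fill_forward_rows(df)
```

Everything is vectorized (one `rowSums()`, one `cummax()`, one subset-assignment), so the cost grows linearly with the number of rows.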

The ugly solution I cooked up uses this function:

last <- function(x) {
    x[length(x)]
}

fill.NAs <- function(isNA) {
    if (isNA[1] == 1) {
        isNA[1:max({which(isNA == 0)[1] - 1}, 1)] <- 0  # leading NAs can't be forward filled
    }
    isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
    isNA.pos[isNA.diff < 0] <- 0
    isNA.neg[isNA.diff > 0] <- 0
    which.isNA.neg <- which(as.logical(isNA.neg))
    if (length(which.isNA.neg) == 0) return(NULL)  # generates warnings later, but works
    which.isNA.pos <- which(as.logical(isNA.pos))
    which.isNA <- which(as.logical(isNA))
    if (length(which.isNA.neg) == length(which.isNA.pos)) {
        replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)],
                           which.isNA.neg[2:max(length(which.isNA.neg) - 1, 2)] -
                           which.isNA.pos[1:max(length(which.isNA.neg) - 1, 1)])
        replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    } else {
        replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)],
                           which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])
        replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    }
    replacement
}

The function fill.NAs is used as follows:

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))
replacement <- fill.NAs(isNA)
if (length(replacement)) {
    which.isNA <- which(as.logical(isNA))
    to.replace <- which.isNA[which(isNA == 0)[1]:length(which.isNA)]
    y[to.replace] <- y[replacement]
}

Output

> y
[1] NA  2  2  2  2  3  3  4  4  4

... which seems to work. But, man, is it ugly! Any suggestions?
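For a plain vector, the run bookkeeping above can be avoided: `cumsum(!is.na(x))` counts how many non-NA values have been seen so far, which is exactly the position, within `x[!is.na(x)]`, of the value to carry forward. The following is a minimal base-R sketch of that idea; the name `fill_forward()` is hypothetical, and it assumes numeric, logical or character input (a factor would come back as its integer codes).

```r
# Hypothetical base-R fill-forward (last observation carried forward).
fill_forward <- function(x) {
  # Position of the most recent non-NA value seen so far (0 before the first one)
  pos <- cumsum(!is.na(x))
  # Prepend NA so that pos == 0 (leading NAs) maps to NA
  c(NA, x[!is.na(x)])[pos + 1]
}

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
fill_forward(y)
# [1] NA  2  2  2  2  3  3  4  4  4
```

It can be applied column by column with `df[] <- lapply(df, fill_forward)`, but note that this also fills NAs inside partially missing rows.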

1 Answer

by (25.3k points)

To carry the last non-NA observation forward into the NAs that follow it, you can use the na.locf() function ("last observation carried forward") from the zoo package:

> library(zoo)
> x <- zoo(1:6)
> x
1 2 3 4 5 6
1 2 3 4 5 6

> y <- zoo(c(2, NA, 1, 4, 5, 2))
> na.locf(y)
1 2 3 4 5 6
2 2 1 4 5 2
> na.locf(y, fromLast = TRUE)
1 2 3 4 5 6
2 1 1 4 5 2

> z <- zoo(c(NA, 9, 3, 2, 3, 2))
> na.locf(z)
2 3 4 5 6
9 3 2 3 2
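Note that na.locf() drops leading NAs by default (na.rm = TRUE), which is why the result for z starts at index 2. To keep the output the same length as the input, as in the question's expected yy, pass na.rm = FALSE. na.locf() also works on plain vectors, so it can be applied column by column to a data.frame; the small data.frame `df` below is made up for illustration.

```r
library(zoo)

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

# Keep the leading NA so the result has the same length as y
na.locf(y, na.rm = FALSE)
# [1] NA  2  2  2  2  3  3  4  4  4

# Illustrative data.frame: fill every column independently
df <- data.frame(a = c(NA, 1, NA, 3), b = c(NA, "u", NA, "v"))
df[] <- lapply(df, na.locf, na.rm = FALSE)
df
```

Filling each column independently also fills NAs in rows that are only partially missing; if only all-NA rows should be filled, the row indices need to be handled explicitly.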
