
I have a large data.table, with many missing values scattered throughout its ~200k rows and 200 columns. I would like to recode those NA values to zeros as efficiently as possible.

I see two options:

1: Convert to a data.frame

2: Some kind of cool data.table subsetting command

I'll be happy with a fairly efficient solution of type 1. Converting to a data.frame and then back to a data.table won't take too long.
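For comparison, here is a minimal sketch of option 1, the data.frame round trip. The table dt and its columns a and b are hypothetical toy data. Because this approach makes full copies of the data on the way to and from the data.frame, it will generally be slower and use more memory than updating a data.table by reference.

```r
library(data.table)

# Hypothetical toy table with scattered NAs
dt <- data.table(a = c(1, NA, 3), b = c(NA, 5, 6))

# Option 1: convert to a data.frame (this copies the data)
df <- as.data.frame(dt)

# Base R replacement using a logical matrix of NA positions
df[is.na(df)] <- 0

# Convert back to a data.table (another copy)
dt <- as.data.table(df)

print(dt)
```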

1 Answer


To replace NAs in a large data.table quickly, use the := operator from the data.table package, which updates columns by reference instead of copying the whole table:

First, create a large test data.table with 100 columns:

library(data.table)

create_dt <- function(nrow = 5, ncol = 5, propNA = 0.5) {
  v <- runif(nrow * ncol)
  # overwrite a random propNA share of the cells with NA
  v[sample(seq_len(nrow * ncol), propNA * nrow * ncol)] <- NA
  data.table(matrix(v, ncol = ncol))
}

set.seed(123) 

dt <- create_dt(1e5, 100, 0.1)  # 100,000 rows, 100 columns, 10% NA

dim(dt) 

[1] 100000 100

To replace the NAs with zeros, loop over the columns and update each one by reference:

f_NA <- function(DT) {
  for (i in names(DT)) {
    # rows where column i is NA are set to 0, by reference
    DT[is.na(get(i)), (i) := 0]
  }
  invisible(DT)
}
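A commonly suggested variant of the same loop uses data.table's set() function, which avoids the overhead of calling [.data.table once per column. This is a sketch assuming all columns are numeric; f_NA_set and the toy table are hypothetical names, not part of the original answer.

```r
library(data.table)

# Hypothetical helper: replace NAs with 0 column by column using set()
f_NA_set <- function(DT) {
  for (j in seq_along(DT)) {
    # row indices of the NAs in column j
    na_rows <- which(is.na(DT[[j]]))
    # set() assigns by reference; skip columns without NAs
    if (length(na_rows) > 0L) set(DT, i = na_rows, j = j, value = 0)
  }
  invisible(DT)
}

# Hypothetical toy table
dt <- data.table(a = c(1, NA, 3), b = c(NA, 5, 6))
f_NA_set(dt)  # modifies dt in place
print(dt)
```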

Then pass the data.table you created to the function. It is modified in place, so there is no need to reassign the result:

f_NA(dt)

This replaces every NA in the table with 0.
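Recent data.table releases also provide a built-in helper, setnafill(), which fills NAs by reference; it only accepts numeric (integer or double) columns. A minimal sketch on a hypothetical toy table, including a check that no NAs remain:

```r
library(data.table)

# Hypothetical toy table with numeric columns
dt <- data.table(a = c(1, NA, 3), b = c(NA, 5, 6))

# Fill every NA with the constant 0, by reference
setnafill(dt, type = "const", fill = 0)

# Sanity check: no missing values should remain
stopifnot(sum(is.na(dt)) == 0)
print(dt)
```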
