
I have a large data.table, with many missing values scattered throughout its ~200k rows and 200 columns. I would like to recode those NA values to zeros as efficiently as possible.

I see two options:

1: Convert to a data.frame

2: Some kind of cool data.table subsetting command

I'll be happy with a fairly efficient solution of type 1. Converting to a data.frame and then back to a data.table won't take too long.
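For concreteness, option 1 might look like this sketch (assuming dt is my data.table; note that both conversions copy the data):

df <- as.data.frame(dt)   # copy out to a data.frame
df[is.na(df)] <- 0        # matrix-style indexing replaces every NA at once
dt <- as.data.table(df)   # copy back to a data.table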

1 Answer


To replace NAs in a large data.table quickly, use the := operator from the data.table package, as follows.

First, create a large example data.table with 100 columns, in which 10% of the cells are NA:

library(data.table)

# helper that builds an nrow x ncol data.table in which a fraction
# propNA of the cells are set to NA
create_dt <- function(nrow = 5, ncol = 5, propNA = 0.5) {
  v <- runif(nrow * ncol)
  v[sample(seq_len(nrow * ncol), propNA * nrow * ncol)] <- NA
  data.table(matrix(v, ncol = ncol))
}

set.seed(123) 

dt = create_dt(1e5, 100, 0.1) 

dim(dt) 

[1] 100000 100
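As a quick check, the share of NA cells matches the propNA argument passed above (0.1); is.na() on a data.table returns a logical matrix, so its mean is the NA fraction:

mean(is.na(dt))

[1] 0.1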

To replace the NAs with zeros, use the following function:

f_NA = function(DT) {
  # loop over the columns by name; for each column, select the rows
  # where it is NA and assign 0 in place with :=
  for (i in names(DT))
    DT[is.na(get(i)), (i) := 0]
}

Then pass the data.table to the function:

f_NA(dt)

This replaces all NA values in the table with zeros, modifying dt by reference, so no copy of the large table is made.
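If you need it faster still, a loop over data.table::set() avoids the small per-call overhead of [.data.table. This is a sketch of that variant (the name f_set is just illustrative):

f_set = function(DT) {
  # set() also assigns by reference; for each column index j,
  # overwrite the rows where that column is NA with 0
  for (j in seq_len(ncol(DT)))
    set(DT, which(is.na(DT[[j]])), j, 0)
}

f_set(dt)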
