You can use the fread function from the data.table package in R to import large tables very quickly.
For example, to measure how long fread takes to read a table of 1 million rows, first generate a test file:
```r
library(data.table)

n <- 1e6
DT <- data.table(
  a = sample(1:1000, n, replace = TRUE),
  b = sample(1:1000, n, replace = TRUE),
  c = rnorm(n),
  d = sample(c("foo", "bar", "baz", "qux", "quux"), n, replace = TRUE),
  e = rnorm(n),
  f = sample(1:1000, n, replace = TRUE)
)

# Insert a few awkward values: NAs, an empty string, and +/-Inf.
DT[2, b := NA_integer_]
DT[4, c := NA_real_]
DT[3, d := NA_character_]
DT[5, d := ""]
DT[2, e := +Inf]
DT[3, e := -Inf]

write.table(DT, "test.csv", sep = ",", row.names = FALSE, quote = FALSE)
cat("File size (MB):", round(file.info("test.csv")$size / 1024^2), "\n")
```
Then read the file with fread, timing the call:

```r
library(data.table)
system.time(DT <- fread("test.csv"))
```
Output:

```
> system.time(DT <- fread("test.csv"))
   user  system elapsed
   0.07    0.05    0.09
```
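If you only need part of a file, fread can also restrict what it reads, which speeds things up further on wide files. A minimal sketch (the demo file and its column names here are made up for illustration, not the test.csv generated above):

```r
library(data.table)

# Write a small demo file so this snippet is self-contained.
fwrite(data.table(a = 1:10, c = rnorm(10), d = letters[1:10]), "demo.csv")

# select reads only the named columns; nrows caps the number of rows read.
DT <- fread("demo.csv", select = c("a", "d"), nrows = 5)
```

Both arguments are optional; by default fread reads the whole file and auto-detects the separator and column types.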
Comparison of fread with other import functions on the same file:

```
   user  system elapsed   Method
   2.59    0.08    2.70   read.csv (first time)
   2.61    0.09    2.72   read.csv (second time)
   1.08    0.06    1.14   Optimized read.table
   0.13    0.03    0.08   fread
```
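For reference, the "Optimized read.table" row above usually means supplying colClasses and nrows up front so read.table does not have to guess types. A sketch against the test.csv created above (the class vector is an assumption matching that file's columns):

```r
# Pre-declaring column classes and the row count lets read.table skip
# type guessing, which accounts for most of its speed-up.
classes <- c("integer", "integer", "numeric", "character", "numeric", "integer")
system.time(
  DT2 <- read.table("test.csv", header = TRUE, sep = ",",
                    colClasses = classes, nrows = 1e6, comment.char = "")
)
```

Even so, fread remains roughly an order of magnitude faster in the timings above, with none of the manual tuning.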