0 votes
in Machine Learning by (19k points)

It seems like R is really designed to handle datasets that it can pull entirely into memory. What R packages are recommended for signal processing and machine learning on very large datasets that cannot be loaded into memory?


If R is simply the wrong tool for this, I am open to other robust, free suggestions (e.g. scipy, if there is some nice way to handle very large datasets).

1 Answer

0 votes
by (33.1k points)

You should have a look at the "Large memory and out-of-memory data" subsection of the High-Performance Computing task view on CRAN. bigmemory and ff are two popular packages for working with data that does not fit in RAM. The bigmemory website has good presentations, vignettes, and overviews from Jay Emerson.

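The general idea behind these packages is to keep the data on disk, much like storing it in a database, and to read it in smaller batches for analysis. There are many approaches to this problem. For model fitting, you can go through some of the examples in the biglm package, which fits linear models by updating the fit one chunk of data at a time.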
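
Since you mentioned being open to Python, here is a minimal sketch of the batch-reading idea using only the standard library. It assumes a CSV file with a header row and two numeric columns (x, y); the function names iter_chunks and fit_line_in_chunks are made up for this illustration and are not part of any package. It fits a simple straight line by accumulating running sums, which is a simplified stand-in for chunked model fitting and not how biglm is implemented.

```python
import csv
from itertools import islice


def iter_chunks(path, chunk_size):
    """Yield lists of (x, y) float pairs, at most chunk_size rows at a time."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        while True:
            rows = list(islice(reader, chunk_size))
            if not rows:
                break
            yield [(float(x), float(y)) for x, y in rows]


def fit_line_in_chunks(path, chunk_size=10_000):
    """Fit y = intercept + slope * x while holding one chunk in memory."""
    n = sum_x = sum_y = sum_xx = sum_xy = 0.0
    for chunk in iter_chunks(path, chunk_size):
        # Only the running sums survive between chunks, so memory use
        # depends on chunk_size and not on the size of the file.
        for x, y in chunk:
            n += 1
            sum_x += x
            sum_y += y
            sum_xx += x * x
            sum_xy += x * y
    # Closed-form least-squares solution from the accumulated sums.
    denom = n * sum_xx - sum_x ** 2
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope
```

Calling fit_line_in_chunks("data.csv", chunk_size=50_000) would return the intercept and slope without ever loading the whole file (the file name here is a placeholder). Plain running sums can lose precision on badly scaled data, so treat this as an illustration of the pattern and not as a replacement for a tested package.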

Hope this answer helps.

