I am going through the data.table documentation, and I have also noticed from some of the conversations here on SO that rbindlist is supposed to be better than rbind.

I would like to know why rbindlist is better than rbind, and in which scenarios rbindlist really excels over rbind.

Is there any advantage in terms of memory utilization?

1 Answer

rbindlist is better than rbind in the following respects:

  • Memory efficiency

rbindlist is implemented in C and sets attributes by reference using setattr, which makes it memory efficient.

rbind.data.frame is implemented in R. It does a lot of assigning and uses attr<- (as well as class<- and rownames<-), all of which (internally) create copies of the data.frame being built.
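The difference between assigning an attribute and setting it by reference can be seen in a small sketch. It is only an illustration: the object names (df_base, alias_base, df_ref, alias_ref) and the "note" attribute are made up for the example, and it says nothing about how much memory a real rbind call uses.

```r
library(data.table)

# Base R: `attr<-` has to work on a copy, because alias_base
# still refers to the original object.
df_base <- data.frame(a = 1:3)
alias_base <- df_base
attr(df_base, "note") <- "tagged"
base_alias_changed <- !is.null(attr(alias_base, "note"))

# data.table: setattr() modifies the object in place, without copying,
# so every name bound to that object sees the new attribute.
df_ref <- data.frame(a = 1:3)
alias_ref <- df_ref
setattr(df_ref, "note", "tagged")
ref_alias_changed <- !is.null(attr(alias_ref, "note"))

print(c(base = base_alias_changed, by_reference = ref_alias_changed))
```

The alias is untouched in the base R case because a copy was made, while in the setattr case the alias changes too, which is exactly what "by reference" means here.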

  • rbindlist handles lists, data.frames and data.tables, and returns a data.table without row names, whereas with rbind you can get tangled up in row names.
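A rough sketch of the row-name point, assuming two small data.frames that happen to share the row names "r1" and "r2" (the names a, b, bound_df and bound_dt are invented for the illustration):

```r
library(data.table)

a <- data.frame(x = 1:2, row.names = c("r1", "r2"))
b <- data.frame(x = 3:4, row.names = c("r1", "r2"))

# rbind keeps row names, so the clashing labels have to be
# altered to stay unique in the result.
bound_df <- rbind(a, b)
print(rownames(bound_df))

# rbindlist accepts a mix of data.frame, data.table and plain list
# inputs and returns a data.table; the "r1"/"r2" labels are dropped.
bound_dt <- rbindlist(list(a, as.data.table(b), list(x = 5:6)))
print(bound_dt)
```

If the row names carry information you need, it is safer to move them into a proper column before binding.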

 

  • rbind does a lot of checking and matches columns by name (i.e. rbind.data.frame accounts for the fact that columns may be in a different order and matches them up by name). rbindlist doesn't do this kind of checking by default and binds by position.

For example:

> library('data.table')
> do.call(rbind, list(data.frame(a = 1:3, b = 2:4), data.frame(b = 4:6, a = 5:7)))
  a b
1 1 2
2 2 3
3 3 4
4 5 4
5 6 5
6 7 6

> rbindlist(list(data.frame(a = 1:3, b = 2:4), data.frame(b = 4:6, a = 5:7)))
   a b
1: 1 2
2: 2 3
3: 3 4
4: 4 5
5: 5 6
6: 6 7
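If you do want rbindlist to line columns up by name, current versions of data.table provide a use.names argument. A minimal sketch using the same two data.frames as above (x, y, by_name and by_pos are just illustrative names):

```r
library(data.table)

x <- data.frame(a = 1:3, b = 2:4)
y <- data.frame(b = 4:6, a = 5:7)

# use.names = TRUE matches columns by name, like rbind.data.frame does
by_name <- rbindlist(list(x, y), use.names = TRUE)

# use.names = FALSE binds strictly by position, as in the example above
by_pos <- rbindlist(list(x, y), use.names = FALSE)

print(by_name)
print(by_pos)
```

Spelling out use.names makes the intent explicit whenever the inputs may have their columns in different orders.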

 

However, the fact that rbindlist lacked certain features (such as checking factor levels or matching names) bears very little (or no) weight towards it being faster than rbind.data.frame. It is faster because it was carefully implemented in C and optimized for speed and memory.

...