Why is rbindlist “better” than rbind?

anonymous · Answer 1 · 2019-07-16T13:04:49+0000

Here are a few reasons why rbindlist is better than rbind:

Memory efficiency

rbindlist is implemented in C and sets attributes by reference using setattr, which makes it memory efficient.

rbind.data.frame is implemented in R. It does a lot of assigning and uses attr<- (as well as class<- and rownames<-), all of which (internally) create copies of the data.frame being built.
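The difference between setting an attribute by reference and through attr<- can be seen on a small vector. This is only a sketch of the general mechanism, not of the internals of either function; the variable names and the attribute "note" are made up for illustration, and address() is the data.table helper that reports where an object lives in memory.

```r
library(data.table)

# Two names bound to the same vector.
x <- c(1, 2, 3)
w <- x

# setattr() modifies the object in place: no copy is made,
# so the attribute is also visible through the second name.
setattr(x, "note", "set by reference")
same_object <- identical(address(x), address(w))
w_note <- attr(w, "note")

# attr<- must not disturb the other binding, so R copies first.
y <- c(1, 2, 3)
z <- y
attr(y, "note") <- "set with attr<-"
copied <- !identical(address(y), address(z))
z_note <- attr(z, "note")   # NULL: z still points at the untouched original
```

On a three-element vector the copy is negligible; on a large data.frame that is modified repeatedly, such copies add up.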

rbindlist handles lists, data.frames and data.tables, and returns a data.table without row names, whereas with rbind you can get tangled up in row names.

rbind does a lot of checking and matches columns by name (i.e. rbind.data.frame accounts for the fact that columns may be in different orders and matches them up by name). rbindlist doesn't do this kind of checking and binds by position.
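A minimal sketch of the point about input types and row names, using made-up data (the names pieces, combined, df1 and df2 are purely illustrative):

```r
library(data.table)

# One plain list, one data.frame and one data.table with the same columns.
pieces <- list(
  list(a = 1:2, b = c(0.1, 0.2)),
  data.frame(a = 3:4, b = c(0.3, 0.4)),
  data.table(a = 5:6, b = c(0.5, 0.6))
)

# rbindlist accepts the mix and always returns a data.table.
combined <- rbindlist(pieces)
is.data.table(combined)

# rbind on data.frames carries the input row names into the result ...
df1 <- data.frame(a = 1:2, row.names = c("r1", "r2"))
df2 <- data.frame(a = 3:4, row.names = c("r3", "r4"))
rownames(rbind(df1, df2))

# ... whereas rbindlist drops them and leaves plain 1..n numbering.
rownames(rbindlist(list(df1, df2)))
```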

For example:

> library(data.table)
> do.call(rbind, list(data.frame(a = 1:3, b = 2:4), data.frame(b = 4:6, a = 5:7)))
  a b
1 1 2
2 2 3
3 3 4
4 5 4
5 6 5
6 7 6

> rbindlist(list(data.frame(a = 1:3, b = 2:4), data.frame(b = 4:6, a = 5:7)))
   a b
1: 1 2
2: 2 3
3: 3 4
4: 4 5
5: 5 6
6: 6 7
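In data.table versions that provide the use.names argument of rbindlist, the binding mode can be chosen explicitly. The sketch below reuses the two data.frames from the example above; the variable names are illustrative, and how a mismatch in column order is handled when use.names is left unset may differ between data.table versions.

```r
library(data.table)

x <- data.frame(a = 1:3, b = 2:4)
y <- data.frame(b = 4:6, a = 5:7)

# Bind by position: y's first column (b) is stacked under x's first column (a).
by_position <- rbindlist(list(x, y), use.names = FALSE)

# Bind by name: y's columns are reordered to line up with x's before stacking.
by_name <- rbindlist(list(x, y), use.names = TRUE)

by_position$a
by_name$a
```

Stating use.names explicitly makes the intent visible to the reader and avoids relying on whatever the installed version does by default.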

However, the fact that rbindlist lacked certain features, such as checking factor levels or matching names, carries very little (or no) weight in explaining why it is faster than rbind.data.frame. The speed comes from its careful implementation in C, optimized for speed and memory.
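To compare the two on your own machine, a rough timing sketch along these lines can be used. The data are randomly generated and the sizes (200 pieces of 50 rows) are arbitrary assumptions; timings vary with hardware, R version and data.table version, so no numbers are quoted here.

```r
library(data.table)

set.seed(1)
# 200 small data.frames with identical columns (made-up data).
pieces <- lapply(1:200, function(i) data.frame(id = i, value = rnorm(50)))

# Time both approaches on the same input.
t_rbind     <- system.time(res_rbind     <- do.call(rbind, pieces))
t_rbindlist <- system.time(res_rbindlist <- rbindlist(pieces))

# Both should contain the same values in the same order.
same_values <- isTRUE(all.equal(res_rbind$value, res_rbindlist$value))

t_rbind["elapsed"]
t_rbindlist["elapsed"]
```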
