I'm having a little trouble understanding the pass-by-reference properties of data.table. Some operations seem to 'break' the reference, and I'd like to understand exactly what's happening.
On creating a data.table from other data.table (via <-, then updating the new table by :=, the original table is also altered. This is expected, as per:
?data.table::copy
Here's an example:
library(data.table)
DT <- data.table(a=c(1,2), b=c(11,12))
print(DT)
# a b
# [1,] 1 11
# [2,] 2 12
newDT <- DT # reference, not copy
newDT[1, a := 100] # modify new DT
print(DT) # DT is modified too.
# a b
# [1,] 100 11
# [2,] 2 12
However, if I insert a non-:= based modification between the <- assignment and the := lines above, DT is now no longer modified:
DT = data.table(a=c(1,2), b=c(11,12))
newDT <- DT
newDT$b[2] <- 200 # new operation
newDT[1, a := 100]
print(DT)
# a b
# [1,] 1 11
# [2,] 2 12
So it seems that the newDT$b[2] <- 200 line somehow 'breaks' the reference. I'd guess that this invokes a copy somehow, but I would like to understand fully how R is treating these operations, to ensure I don't introduce potential bugs in my code.
I'd very much appreciate if someone could explain this to me.