I'm having a little trouble understanding the pass-by-reference properties of data.table. Some operations seem to 'break' the reference, and I'd like to understand exactly what's happening.

When I create a data.table from another data.table (via `<-`) and then update the new table with **:=**, the original table is also altered. This is expected, as per:

?data.table::copy

Here's an example:

    library(data.table)

    DT <- data.table(a=c(1,2), b=c(11,12))
    print(DT)
    #      a  b
    # [1,] 1 11
    # [2,] 2 12

    newDT <- DT        # reference, not copy
    newDT[1, a := 100] # modify new DT

    print(DT)          # DT is modified too.
    #        a  b
    # [1,] 100 11
    # [2,]   2 12
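For comparison, here is a minimal sketch of what I understand `?data.table::copy` to recommend when the original should stay untouched. It assumes `copy()` returns an independent deep copy, so that `:=` on `newDT` no longer reaches `DT`:

```r
library(data.table)

DT <- data.table(a = c(1, 2), b = c(11, 12))

newDT <- copy(DT)    # explicit copy instead of a second name for the same object
newDT[1, a := 100]   # update by reference, which should now affect only newDT

print(DT)            # expected to still hold the original values
print(newDT)
```

If that reading is right, `<-` alone only binds a second name to the same table, and `copy()` is the deliberate way to separate the two.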

However, if I insert a non-:= based modification between the <- assignment and the := lines above, DT is now no longer modified:

    DT = data.table(a=c(1,2), b=c(11,12))
    newDT <- DT

    newDT$b[2] <- 200  # new operation
    newDT[1, a := 100]

    print(DT)
    #      a  b
    # [1,] 1 11
    # [2,] 2 12

So it seems that the newDT$b[2] <- 200 line somehow 'breaks' the reference. I'd guess that this invokes a copy somehow, but I would like to understand fully how R is treating these operations, to ensure I don't introduce potential bugs in my code.
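To test that guess, I tried comparing memory addresses before and after the suspect line. This is only a diagnostic sketch: it assumes `data.table::address()` reports where each object lives in memory, and the names `same_before` and `same_after` are just my own labels:

```r
library(data.table)

DT <- data.table(a = c(1, 2), b = c(11, 12))
newDT <- DT

# Right after `<-`, both names should point at the same object.
same_before <- address(DT) == address(newDT)

newDT$b[2] <- 200   # the operation I suspect of triggering a copy

# If a copy was made, newDT should now live at a different address.
same_after <- address(DT) == address(newDT)

newDT[1, a := 100]  # by-reference update, applied to whatever newDT now is

print(c(before = same_before, after = same_after))
print(DT)
```

If the addresses match before and differ after, that would be consistent with `$<-` replacing `newDT` with a copy, but it doesn't tell me why that happens or which other operations behave the same way.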

I'd very much appreciate it if someone could explain this to me.