What does .SD stand for? How is it helpful and when to use it?

According to the data.table documentation, .SD is a data.table containing the subset of x's data for each group, excluding the group column(s).

It can be used when grouping by i, when grouping by by, keyed by, and ad hoc by.

Does that mean that the subset data.tables are held in memory for the next operation?

1 Answer


.SD stands for "Subset of Data.table". There's no significance to the initial ".", except that it makes it even more unlikely that there will be a clash with a user-defined column name.

If this is your data.table:

library(data.table)

DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)   # sort by y and mark it as the key
DT
#    x y v
# 1: a 1 1
# 2: b 1 3
# 3: c 1 5
# 4: a 3 2
# 5: b 3 4
# 6: c 3 6

Running this may help you see what .SD is:

DT[ , .SD[ , paste(x, v, sep="", collapse="_")], by=y]
#    y       V1
# 1: 1 a1_b3_c5
# 2: 3 a2_b4_c6
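Since the question also asks when .SD is useful, here is a small sketch of one common pattern: applying the same function to several columns per group. It reuses the DT from above; the choice of sum() and of grouping by x is only for illustration.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 2), y = c(1, 3), v = 1:6)

# .SD is itself a data.table (a list of columns), so lapply() runs a
# function over each of its columns, once per group.
# .SDcols restricts which columns appear in .SD.
sums <- DT[, lapply(.SD, sum), by = x, .SDcols = c("y", "v")]

sums$v   # per-group sums of v: 3, 7, 11
```

Without .SDcols, .SD would hold every column that is not used in by.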

Basically, the by=y statement breaks the original data.table into these two sub-data.tables

DT[ , print(.SD), by=y]
# <1st sub-data.table, called '.SD' while it's being operated on>
#    x v
# 1: a 1
# 2: b 3
# 3: c 5
# <2nd sub-data.table, ALSO called '.SD' while it's being operated on>
#    x v
# 1: a 2
# 2: b 4
# 3: c 6
# <final output here, where print() handed nothing back to j; this can differ between data.table versions>
# Empty data.table (0 rows) of 1 col: y

and operates on them in turn.
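If you would rather inspect the groups without printing, a rough alternative is to ask each .SD for its column names and row count. This is only a sketch; the output column names sd_cols and sd_rows are made up for the example.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 2), y = c(1, 3), v = 1:6)
setkey(DT, y)

# One result row per group: which columns .SD holds and how many rows it has.
# The grouping column y does not show up in names(.SD).
info <- DT[, .(sd_cols = paste(names(.SD), collapse = ","),
               sd_rows = nrow(.SD)),
           by = y]

info$sd_cols   # "x,v" for both groups
```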

While it is operating on either one, it lets you refer to the current sub-data.table by the nickname/handle/symbol .SD. This is handy because you can access and operate on its columns just as if you were sitting at the command line working with a single data.table called .SD, except that here data.table carries out those operations on every sub-data.table defined by the values of the by column(s), "pasting" the results back together and returning them in a single data.table!
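To make the "split, operate, paste back together" idea concrete, the sketch below writes the same computation by hand and compares it with the grouped call. The names grouped, manual, and piece are made up here, and the hand-rolled loop is only an illustration of the idea, not how data.table implements grouping internally.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 2), y = c(1, 3), v = 1:6)

# Grouped call: j is evaluated once per value of y, with .SD bound
# to that group's rows (minus the grouping column).
grouped <- DT[, .SD[, paste(x, v, sep = "", collapse = "_")], by = y]

# Hand-rolled version: subset each group, apply the expression, combine.
manual <- rbindlist(lapply(unique(DT$y), function(g) {
  piece <- DT[y == g, .(x, v)]   # plays the role of .SD for this group
  data.table(y = g, V1 = piece[, paste(x, v, sep = "", collapse = "_")])
}))

all.equal(grouped, manual)
```

The grouped form is shorter, and data.table handles the bookkeeping of splitting and recombining for you.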
