
When I need to filter a data.frame, i.e., extract rows that meet certain conditions, I prefer to use the subset function:

subset(airquality, Month == 8 & Temp > 90)

Rather than the [ function:

airquality[airquality$Month == 8 & airquality$Temp > 90, ]

There are two main reasons for my preference:

  1. I find the code reads better, from left to right. Even people who know nothing about R could tell what the subset statement above is doing.
  2. Because columns can be referred to as variables in the condition, I can save a few keystrokes. In my example above, I only had to type airquality once with subset, but three times with [.
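As a quick sanity check of the comparison above, here is a short sketch that runs both filters on the built-in airquality data. The variable names are chosen for illustration only. It also shows one behavioral difference worth knowing: subset() treats rows where the condition is NA as FALSE, while [ keeps them as all-NA rows.

```r
# Both filters from the question, applied to the built-in airquality data set.
via_subset  <- subset(airquality, Month == 8 & Temp > 90)
via_bracket <- airquality[airquality$Month == 8 & airquality$Temp > 90, ]

# Month and Temp contain no NAs, so the two results should match.
print(all.equal(via_subset, via_bracket))

# Ozone does contain NAs. subset() drops rows where the condition is NA,
# while [ returns an all-NA row for every NA in the logical index.
ozone_subset  <- subset(airquality, Ozone > 100)
ozone_bracket <- airquality[airquality$Ozone > 100, ]
print(c(subset = nrow(ozone_subset), bracket = nrow(ozone_bracket)))
```

So the two forms are only interchangeable when the condition cannot produce NA.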

So I was living happily, using subset everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world broke apart. While reading the subset documentation, I noticed this section:

Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

Could someone help clarify what the authors mean?

First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode, but I don't see what difference it should make.

Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, and maybe provide an example?

1 Answer


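While the subset() function saves typing, it is hard to use non-interactively: it works well when you type the condition directly at the prompt, but not when the condition is passed in through another function, as you would do when programming.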
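To see why, it helps to look at how subset() evaluates its condition. The sketch below is a simplified stand-in, not the actual base R source: my_subset and toy_df are hypothetical names used only for illustration. It follows the same substitute()/eval() pattern that appears in the error message further down (eval(e, x, parent.frame())): the condition is captured unevaluated and then evaluated with the data frame's columns taking priority over the caller's variables.

```r
# Hypothetical, stripped-down version of what subset() does with its
# condition argument (simplified; not the real base R implementation).
my_subset <- function(df, condition) {
  expr <- substitute(condition)           # capture the unevaluated expression
  rows <- eval(expr, df, parent.frame())  # look up names in df first, then in the caller
  df[rows & !is.na(rows), , drop = FALSE] # treat NA as FALSE, like subset()
}

toy_df <- data.frame(x = 1:5, y = c(10, 20, 30, 40, 50))

# Typed at the prompt this works nicely: x is found as a column of toy_df.
my_subset(toy_df, x > 3)

# The same lookup rule means a column silently shadows a variable with
# the same name in the calling environment.
y <- 3
subset(toy_df, x > y)      # uses the column y (10, 20, ...): no rows returned
toy_df[toy_df$x > y, ]     # uses the global y = 3: rows 4 and 5 returned
```

Neither call raises an error, which is exactly the kind of "unanticipated consequence" the documentation warns about.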

For example:

Suppose we want a function that randomly reorders a subset of the rows of a data frame:

scramble <- function(x) x[sample(nrow(x)), ]

subscramble <- function(x, condition) {
  scramble(subset(x, condition))
}

subscramble(iris, Sepal.Length >= 5)

Output:

Error in eval(e, x, parent.frame()) : object 'Sepal.Length' not found 

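The problem is not scramble() itself. Inside subscramble(), subset() captures its argument unevaluated, so the expression it sees is the symbol condition, not Sepal.Length >= 5. It looks condition up among the columns of iris, does not find it, and falls back to subscramble()'s own argument. Forcing that argument evaluates Sepal.Length >= 5 in the global environment, where no object called 'Sepal.Length' exists, hence the error.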
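Two possible ways around this, sketched under the assumption that you control the wrapper function. The names subscramble_se and subscramble_nse are hypothetical. The first uses standard evaluation: the caller passes an ordinary logical vector and the function indexes with [. The second captures the caller's expression in the wrapper itself and evaluates it inside the data frame, mirroring what subset() does internally.

```r
# drop = FALSE keeps a data frame even if x has a single column.
scramble <- function(x) x[sample(nrow(x)), , drop = FALSE]

# Option 1 (standard evaluation): the caller builds the logical vector.
subscramble_se <- function(x, rows) {
  scramble(x[rows & !is.na(rows), , drop = FALSE])
}

# Option 2 (non-standard evaluation done in the wrapper): capture the
# caller's expression here and evaluate it inside x, falling back to the
# caller's environment for names that are not columns.
subscramble_nse <- function(x, condition) {
  expr <- substitute(condition)
  rows <- eval(expr, x, parent.frame())
  scramble(x[rows & !is.na(rows), , drop = FALSE])
}

set.seed(1)
a <- subscramble_se(iris, iris$Sepal.Length >= 5)
b <- subscramble_nse(iris, Sepal.Length >= 5)
```

Note that subscramble_nse inherits the same limitation as subset(): calling it from yet another function recreates the original problem, which is why the standard-evaluation version is usually the safer choice when programming.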
