When I need to filter a data.frame, i.e., extract the rows that meet certain conditions, I prefer to use the subset function:

subset(airquality, Month == 8 & Temp > 90)

Rather than the [ function:

airquality[airquality$Month == 8 & airquality$Temp > 90, ]

There are two main reasons for my preference:

  1. I find the code reads better, from left to right. Even people who know nothing about R could tell what the subset statement above is doing.
  2. Because columns can be referred to as variables in the subset expression, I can save a few keystrokes. In my example above, I only had to type airquality once with subset, but three times with [.

So I was living happily, using subset everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world broke apart. While reading the subset documentation, I noticed this section:

Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

Could someone help clarify what the authors mean?

First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode, but I don't see what difference it should make.

Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, maybe provide an example?

1 Answer

While the subset() function saves typing, it is difficult to use non-interactively, that is, from inside your own functions.

For example, suppose we want a function that randomly reorders a subset of the rows of a data frame:

scramble <- function(x) x[sample(nrow(x)), ]

subscramble <- function(x, condition) {
  scramble(subset(x, condition))
}

subscramble(iris, Sepal.Length >= 5)

Output:

Error in eval(e, x, parent.frame()) : object 'Sepal.Length' not found 

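The call fails because subset() uses non-standard evaluation: it captures the expression given as its second argument and evaluates it with the columns of x in scope. Inside subscramble(), that expression is just the name condition, so when R resolves it, Sepal.Length >= 5 is evaluated in the environment subscramble() was called from, where no object named Sepal.Length exists. Worse, if an unrelated object with that name did exist there, it would be used silently instead of the column. This is the kind of "unanticipated consequence" the documentation warns about.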
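One way to avoid the problem, as the documentation suggests, is to fall back on [ inside functions. The following is only a minimal sketch: subscramble_idx is a hypothetical name introduced for illustration, and it assumes the caller builds the logical vector with ordinary $ indexing before passing it in.

```r
# Randomly reorder the rows of a data frame (same as in the answer above).
scramble <- function(x) x[sample(nrow(x)), ]

# Hypothetical variant of subscramble(): instead of an unevaluated
# expression, it takes a ready-made logical vector with one entry per row.
subscramble_idx <- function(x, keep) {
  stopifnot(is.logical(keep), length(keep) == nrow(x))
  keep <- keep & !is.na(keep)          # treat NA as FALSE, as subset() does
  scramble(x[keep, , drop = FALSE])    # standard evaluation with [
}

# The condition is evaluated by the caller, so no scoping surprises occur.
res <- subscramble_idx(iris, iris$Sepal.Length >= 5)
nrow(res)
```

This costs a few more keystrokes than subset(), but the condition is an ordinary value by the time it reaches the function, so it behaves the same no matter where the function is called from.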
