When I need to filter a data.frame, i.e., extract the rows that meet certain conditions, I prefer to use the subset function:
subset(airquality, Month == 8 & Temp > 90)
rather than the [ function:
airquality[airquality$Month == 8 & airquality$Temp > 90, ]
There are two main reasons for my preference:
- I find the code reads better, from left to right. Even people who know nothing about R could tell what the subset statement above is doing.
- Because columns can be referred to as variables in the subset expression, I can save a few keystrokes. In my example above, I only had to type airquality once with subset, but three times with [.
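To make the comparison concrete, here is a small check that the two forms select the same rows in my example. It is only a sketch: the names cond, with_subset and with_bracket are mine, and the two results agree here because Month and Temp contain no missing values in airquality (subset drops rows where the condition is NA, whereas [ does not).

```r
# airquality ships with base R (datasets package), so nothing needs loading.

# The condition spelled out in full, as the [ form requires.
cond <- airquality$Month == 8 & airquality$Temp > 90

# Same filter written both ways.
with_subset  <- subset(airquality, Month == 8 & Temp > 90)
with_bracket <- airquality[cond, ]

# Compare the two results; they should match for this data set
# because the condition never evaluates to NA.
all.equal(with_subset, with_bracket)
```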
So I was living happily, using subset everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world fell apart. While reading the subset documentation, I noticed this section:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
Could someone help clarify what the authors mean?
First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode, but I don't see what difference it should make.
Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, ideally with an example?