I have a data frame ("data") with lots and lots of columns. Some of the columns contain a certain string ("search_string").

How can I use dplyr::select() to give me a subset including only the columns that contain the string?

I tried:

# columns as boolean vector

select(data, grepl("search_string",colnames(data)))

# columns as vector of column names

select(data, colnames(data)[grepl("search_string",colnames(data))]) 

Neither of them works.

I know that select() accepts numeric column positions in place of column names, e.g.:

select(data,5,7,9:20)

But I don't know how to get a numeric vector of column IDs from my grepl() expression.

1 Answer


To select columns whose names contain a string, use one of dplyr's selection helpers inside select(). contains() matches a literal substring:

library(dplyr)

data(iris)

select(iris, contains("Sepal"))

  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9
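
One detail worth knowing about contains(): it ignores case by default. The sketch below illustrates this on iris, assuming dplyr is installed; the variable names sepal_cols and none_cols are only for illustration.

```r
library(dplyr)

# contains() is a literal substring match and ignores case by default,
# so a lower-case "sepal" still finds Sepal.Length and Sepal.Width
sepal_cols <- names(select(iris, contains("sepal")))

# With ignore.case = FALSE the lower-case string matches no column,
# and select() returns a data frame with zero columns
none_cols <- names(select(iris, contains("sepal", ignore.case = FALSE)))
```

If the column names in your data differ only by case, setting ignore.case = FALSE is probably the safer choice.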

Or use matches(), which treats its argument as a regular expression:

select(iris, matches("Petal"))

  Petal.Length Petal.Width
1          1.4         0.2
2          1.4         0.2
3          1.3         0.2
4          1.5         0.2
5          1.4         0.2
6          1.7         0.4
...
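
If you would rather stay close to the grepl() approach from the question, base R's grep() returns the numeric positions directly. The following is a minimal sketch on iris, assuming a reasonably recent dplyr that exports all_of(); the variable names are illustrative only.

```r
library(dplyr)

# grep() returns the integer positions of the matching column names;
# which(grepl(...)) gives the same result
idx <- grep("Petal", colnames(iris))
by_position <- select(iris, all_of(idx))

# value = TRUE returns the matching names instead of positions;
# all_of() marks the vector as an external selection
petal_names <- grep("Petal", colnames(iris), value = TRUE)
by_name <- select(iris, all_of(petal_names))

# matches() accepts a regular expression, so the pattern can be anchored
by_regex <- select(iris, matches("^Petal\\."))
```

All three calls should select the same two columns here; the helpers are usually shorter, while grep() is handy when the positions are needed elsewhere.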