Subset if string contains

Question

asked Apr 12, 2020 in R Programming by ashely (50.2k points)

I have the following vector of strings.

first sp.z.o.o.
second s.a #should be removed
third sp z o o
fourth PP #should be removed
fifth sp z o.o.
sixth #should be removed
seventh sp z oo
eighth LTD. #should be removed
nineth sp-z-o-o
tenth spzoo
eleventh sp.zo.o

From these I want to keep only the strings that contain any variant of sp z o.o., with or without spaces/double spaces, dots, commas, and other symbols (such as * | - etc.). I tried the following code, but it doesn't work.

sample <- df[grepl("(sp\\.z\\.o\\.o\\.)", df$col_1), ]

and also

sample <- df[grepl("(sp\\.*z\\.*o\\.*o\\.*)", df$col_1), ]

1 Answer

vinita · Answer 1 · 2020-04-12T08:42:45+0000

You can use the following pattern:

sample <- subset(df, grepl('s.*p.*z.*o', col_1))

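This selects the rows where the letters s, p, z and o appear in that order, regardless of what comes between them (.* matches any run of characters, including none). Your second attempt, sp\\.*z\\.*o\\.*o\\.*, allows only dots between the letters, so it misses the variants separated by spaces or hyphens.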
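As a minimal, self-contained sketch of that subsetting step: the data frame below is made up for illustration and only mirrors the df and col_1 names from the question, and sample_rows is a hypothetical name chosen so that it does not mask base R's sample() function.

```r
# Hypothetical data frame standing in for the question's df
df <- data.frame(
  col_1 = c("first sp.z.o.o.", "second s.a", "third sp z o o", "fourth PP"),
  stringsAsFactors = FALSE
)

# Keep the rows whose col_1 contains s, p, z, o in that order
sample_rows <- subset(df, grepl("s.*p.*z.*o", col_1))

# The same filter written with logical indexing, as in the question
sample_rows_idx <- df[grepl("s.*p.*z.*o", df$col_1), , drop = FALSE]

print(sample_rows$col_1)
# [1] "first sp.z.o.o." "third sp z o o"
```

Both forms should keep the same two rows here; subset() is just the shorter way to write it.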

You can also test the regex on a vector.

x <- c('first sp.z.o.o.', 'second s.a', 'third sp z o o', 'fourth PP',
'fifth sp z o.o.', 'sixth', 'seventh sp z oo', 'eighth LTD.',
'nineth sp-z-o-o', 'tenth spzoo', 'eleventh sp.zo.o')
grep('s.*p.*z.*o', x, value = TRUE)
#[1] "first sp.z.o.o." "third sp z o o" "fifth sp z o.o." "seventh sp z oo"
#[5] "nineth sp-z-o-o" "tenth spzoo" "eleventh sp.zo.o"
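One caveat: because .* accepts any characters, the pattern also matches unrelated strings that merely contain s, p, z and o in order. The sketch below compares it with a stricter variant that allows only non-alphanumeric separators between the letters. The stricter pattern and the "super pizza co" entry are assumptions added for illustration, not part of the original answer, and the pattern may need adjusting for your real data (for example, it does not allow a separator between s and p).

```r
# Test strings from the question plus one made-up false positive
candidates <- c("first sp.z.o.o.", "third sp z o o", "nineth sp-z-o-o",
                "tenth spzoo", "second s.a", "super pizza co")

# Pattern from the answer: any characters may sit between the letters
loose <- "s.*p.*z.*o"

# Assumed stricter variant: "sp", then z, o, o, each optionally preceded
# by characters that are neither letters nor digits
strict <- "sp[^[:alnum:]]*z[^[:alnum:]]*o[^[:alnum:]]*o"

loose_hits  <- grepl(loose, candidates)
strict_hits <- grepl(strict, candidates)

print(loose_hits)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
print(strict_hits)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```

The loose pattern accepts "super pizza co", while the stricter one rejects it and still keeps the dot, space, hyphen and unseparated spellings.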

