0 votes
2 views
in R Programming by (50.2k points)

I have a list of words and want to check whether any of them appears in a text column. With the code below, the result column is always "Found", because each word is matched as a pattern anywhere in the string (for example, "ct" matches inside "instruction"). I want exact whole-word matches only, not matches inside longer words. The first three records should be "Not Found".

library(stringr)

col_1 <- c(1, 2, 3, 4, 5)
col_2 <- c("work instruction change",
           "technology npi inspections",
           " functional locations",
           "Construction has started",
           " there is going to be constn coon")

df <- as.data.frame(cbind(col_1, col_2))
df$col_2 <- tolower(df$col_2)

words <- c("const", "constn", "constrction", "construc",
           "construct", "construction", "constructs", "consttntype", "constypes",
           "ct", "ct#", "ct2")

# Joins the words with "|", so any word can match anywhere inside the text
pattern_words <- paste(words, collapse = "|")
df$result <- ifelse(str_detect(df$col_2, regex(pattern_words)), "Found", "Not Found")

1 Answer

0 votes
by (108k points)

You can wrap each word in word boundaries (\\b) so that only whole words match.

library(stringr)

pattern_words  <- paste0('\\b', words, '\\b', collapse = "|")
df$result <- c('Not Found', 'Found')[str_detect(df$col_2, pattern_words) + 1]

#OR with `ifelse`
#df$result <- ifelse(str_detect(df$col_2, pattern_words), "Found", "Not Found")

df
#  col_1                             col_2    result
#1     1           work instruction change Not Found
#2     2        technology npi inspections Not Found
#3     3              functional locations Not Found
#4     4          construction has started     Found
#5     5  there is going to be constn coon     Found
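
To see what the word boundary changes, here is a minimal sketch in base R. The two strings in `x` and the names `substring_hits` and `whole_word_hits` are made up for illustration and are not part of the original data.

```r
# Toy inputs (hypothetical, not from the question's data frame)
x <- c("work instruction change", "ct scan ordered")

# Without boundaries, "ct" matches inside "instruction" as well
substring_hits <- grepl("ct", x)
substring_hits
# [1] TRUE TRUE

# With \\b on both sides, "ct" must stand alone as a word
whole_word_hits <- grepl("\\bct\\b", x)
whole_word_hits
# [1] FALSE  TRUE
```

One caveat to keep in mind: \\b only marks the edge between a word character (letter, digit, underscore) and a non-word character. An entry such as "ct#" ends in a non-word character, so its trailing \\b only matches when a word character follows directly.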

If you prefer to stay in base R, grepl works with the same pattern:

grepl(pattern_words, df$col_2)
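
For completeness, a self-contained base R sketch of the same idea, with no stringr dependency. It reuses the words and text from the question; the plain vector `result` (instead of a data frame column) is just a simplification for this example.

```r
# Word list and text taken from the question
words <- c("const", "constn", "constrction", "construc",
           "construct", "construction", "constructs", "consttntype", "constypes",
           "ct", "ct#", "ct2")

col_2 <- tolower(c("work instruction change",
                   "technology npi inspections",
                   " functional locations",
                   "Construction has started",
                   " there is going to be constn coon"))

# Wrap every word in \\b ... \\b, then join the alternatives with "|"
pattern_words <- paste0("\\b", words, "\\b", collapse = "|")

# grepl returns TRUE/FALSE per element; ifelse maps that to labels
result <- ifelse(grepl(pattern_words, col_2), "Found", "Not Found")
result
# [1] "Not Found" "Not Found" "Not Found" "Found"     "Found"
```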

