I have a dataset named 'dt' with a column "subject" that I need to parse. For example:

ID    subject
1     USA(Texas)(Austin)
2     USA(California)(Sacramento)

I want to have the following table as my output:

ID    subject                       Country     State        Capital
1     USA(Texas)(Austin)            USA         Texas        Austin
2     USA(California)(Sacramento)   USA         California   Sacramento

When I had only one value in brackets, I used the following expression:

dt <- tidyr::extract(dt, subject, into = c("var1", "var2"), regex = "(.*)\\((.*)\\)", remove = FALSE)

But how do I change it when I have multiple expressions in parentheses?

1 Answer


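Your data has several groups in parentheses, so to extract each one you need to make the quantifier for the middle group lazy (.*? instead of .*). A lazy quantifier matches as few characters as possible, so the group stops at the first closing parenthesis instead of the last.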
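To see what the lazy quantifier changes, here is a minimal base R sketch on one value from the question. The variable names `x`, `greedy` and `lazy` are only illustrative, and `perl = TRUE` is used so that `.*?` is interpreted as a lazy quantifier.

```r
# One sample value from the question.
x <- "USA(Texas)(Austin)"

# Greedy: .* runs as far as possible, so the match ends at the LAST ")".
greedy <- regmatches(x, regexpr("\\(.*\\)", x, perl = TRUE))
print(greedy)  # "(Texas)(Austin)"

# Lazy: .*? stops as early as possible, so the match ends at the FIRST ")".
lazy <- regmatches(x, regexpr("\\(.*?\\)", x, perl = TRUE))
print(lazy)    # "(Texas)"
```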

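library(dplyr)
library(tidyr)

# Sample data from the question
dt <- data.frame(ID = 1:2,
                 subject = c("USA(Texas)(Austin)", "USA(California)(Sacramento)"))

# Three capture groups: text before the first "(", then each parenthesised part
extract(dt, subject, into = c("Country", "State", "Capital"),
        regex = "(.*)\\((.*?)\\)\\((.*)\\)", remove = FALSE)

#  ID                     subject Country      State    Capital
#1  1          USA(Texas)(Austin)     USA      Texas     Austin
#2  2 USA(California)(Sacramento)     USA California Sacramento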
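As an alternative sketch (not the method given in the answer above), the same split can be done in base R with negated character classes instead of lazy quantifiers. The names `pattern`, `matches`, `parts` and `result` are hypothetical, and the code assumes every value of `subject` has exactly the form `Country(State)(Capital)`; rows that do not match would need separate handling.

```r
# Sample data copied from the question.
dt <- data.frame(
  ID = 1:2,
  subject = c("USA(Texas)(Austin)", "USA(California)(Sacramento)"),
  stringsAsFactors = FALSE
)

# [^(]* matches up to the next "(" and [^)]* up to the next ")",
# so no group can run past its own delimiter.
pattern <- "^([^(]*)\\(([^)]*)\\)\\(([^)]*)\\)$"

# regexec() reports the full match plus one entry per capture group.
matches <- regmatches(dt$subject, regexec(pattern, dt$subject))

# Drop the full match (first element) and stack the groups as rows.
# Assumes every row matched; a non-matching row yields character(0).
parts <- do.call(rbind, lapply(matches, function(m) m[-1]))
colnames(parts) <- c("Country", "State", "Capital")

# Append the new columns to the original data.
result <- cbind(dt, parts, stringsAsFactors = FALSE)
print(result)
```

Negated classes state the delimiter rule explicitly, which can be easier to reason about than greedy versus lazy matching when more parenthesised groups are added.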
