
I have a tibble whose column contains character strings that mix English and Mandarin characters. I want to split it into two columns, one holding the English part and the other holding the Mandarin part. I managed it, but only by resorting to the following workaround:

    library(tibble)
    library(stringr)
    library(dplyr)

    tb <- tibble(x = c("I我", "love愛", "you你")) # create the tibble

    # split wherever a character is not an ASCII letter (A-Z, a-z)
    en <- str_split(tb[[1]], "[^A-Za-z]+", simplify = TRUE)

    # split on each run of ASCII letters
    ch <- str_split(tb[[1]], "[A-Za-z]+", simplify = TRUE)

    # each split leaves a column of blank ("") values, so keep only the
    # useful column of each matrix, then drop the original x column
    tb <- tb %>%
      mutate(EN = en[, 1],
             CH = ch[, 2]) %>%
      select(-x)

    tb

It works, but it feels clumsy, and I think my regex is what is causing the problem.
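To see why the workaround keeps `en[, 1]` and `ch[, 2]`, it helps to print the two intermediate matrices. This sketch reruns only the two `str_split()` calls from the question on a plain character vector (`x` is a name chosen here for illustration). It relies on `str_split()` keeping the empty string produced when a separator sits at the very start or end of a string, which is its default behaviour.

```r
library(stringr)

x <- c("I我", "love愛", "you你")

# Splitting on runs of non-letters: the English word lands in column 1,
# and the trailing Mandarin character (the separator) leaves "" in column 2.
en <- str_split(x, "[^A-Za-z]+", simplify = TRUE)

# Splitting on runs of letters: nothing precedes the English word, so
# column 1 is "", and the Mandarin character lands in column 2.
ch <- str_split(x, "[A-Za-z]+", simplify = TRUE)

en
ch
```

Each matrix therefore has one useful column and one blank column, which is exactly what the `en[, 1]` / `ch[, 2]` subsetting works around.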

1 Answer


You can use stringr's str_match() with a regex containing two capture groups, one for the English letters and one for the rest of the string, and get both parts in a single call:

    # [, -1] drops column 1, which holds the full match
    stringr::str_match(tb$x, "([A-Za-z]+)(.*)")[, -1]

    #     [,1]   [,2]
    #[1,] "I"    "我"
    #[2,] "love" "愛"
    #[3,] "you"  "你"
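The question asked for two tibble columns rather than a matrix. A minimal sketch, assuming every string is a run of ASCII letters followed by the Mandarin part (strings that don't match would get `NA`), plugs the `str_match()` capture groups into the question's own `mutate()`/`select()` pipeline. The intermediate name `m` is hypothetical.

```r
library(tibble)
library(dplyr)
library(stringr)

tb <- tibble(x = c("I我", "love愛", "you你"))

# Column 1 of the result is the full match; columns 2 and 3 are the
# capture groups: the leading ASCII letters and everything after them.
m <- str_match(tb$x, "([A-Za-z]+)(.*)")

tb <- tb %>%
  mutate(EN = m[, 2],
         CH = m[, 3]) %>%
  select(-x)

tb
```

Because the capture groups map directly onto the two wanted columns, there is no blank column to discard, unlike with the two `str_split()` calls.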
