str_split is returning only half of the string

Question

asked Apr 19, 2020 in R Programming by ashely (50.2k points)

I have a tibble whose only column holds character strings that combine English and Mandarin characters. I want to split that column in two, with one column holding the English and the other holding the Mandarin. However, I had to resort to the following workaround to accomplish this:

library(tidyverse) # loads tibble, stringr and dplyr
tb <- tibble(x = c("I我", "love愛", "you你")) # create tibble
en <- str_split(tb[[1]], "[^A-Za-z]+", simplify = TRUE) # split at each run of characters that are not A-Z or a-z
ch <- str_split(tb[[1]], "[A-Za-z]+", simplify = TRUE) # split at each run of A-Z or a-z characters
tb <- tb %>%
  mutate(EN = en[, 1],
         CH = ch[, 2]) %>% # keep the non-blank column of each matrix; the other column is all ""
  select(-x) # drop the original x column
tb

I think my regex is causing the problem.

1 Answer

vinita · Answer 1 · 2020-04-19T09:02:57+0000

In R, you can use str_match() with two capture groups to get the English letters and the rest of the characters separately.

stringr::str_match(tb$x, "([A-Za-z]+)(.*)")[, -1]
#      [,1]   [,2]
# [1,] "I"    "我"
# [2,] "love" "愛"
# [3,] "you"  "你"
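
To get from that matrix back to the two columns asked for in the question, something like the following should work. This is a minimal sketch, assuming the tibble and stringr packages are installed; the column names EN and CH come from the question, while m and res are just illustrative names.

```r
library(tibble)
library(stringr)

tb <- tibble(x = c("I我", "love愛", "you你"))

# str_match() returns a character matrix:
# column 1 is the full match, columns 2 and 3 are the two capture groups.
m <- str_match(tb$x, "([A-Za-z]+)(.*)")

# Build the result from the capture groups only.
res <- tibble(EN = m[, 2], CH = m[, 3])
res
```

As for the original attempt, the regex is not really wrong. In each string the delimiter sits at the very start or the very end, so str_split() returns an empty piece on that side, which is where the blank column comes from.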
