in Data Science by (17.6k points)

I have a (fairly long) list of vectors. The vectors consist of Russian words that I got by using the strsplit() function on sentences.

The following is what head() returns:

[[1]]

[1] "модно"     "создавать" "резюме"    "в"         "виде"     

[[2]]

[1] "ты"        "начианешь" "работать"  "с"         "этими"    

[[3]]

[1] "модно"            "называть"         "блогер-рилейшенз" "―"                "начинается"       "задолго"         

[[4]]

[1] "видел" "по"    "сыну," "что"   "он"   

[[5]]

[1] "четырнадцать," "я"             "поселился"     "на"            "улице"        

[[6]]

[1] "широко"     "продолжали" "род."

Note that the vectors have different lengths.

What I want is to be able to read the first word of each sentence, then the second word, the third, and so on.

The desired result would be something like this:

    P1              P2           P3                 P4    P5           P6

[1] "модно"         "создавать"  "резюме"           "в"   "виде"       NA

[2] "ты"            "начианешь"  "работать"         "с"   "этими"      NA

[3] "модно"         "называть"   "блогер-рилейшенз" "―"   "начинается" "задолго"         

[4] "видел"         "по"         "сыну,"            "что" "он"         NA

[5] "четырнадцать," "я"          "поселился"        "на"  "улице"      NA

[6] "широко"        "продолжали" "род."             NA    NA           NA

I tried using data.frame() directly, but that didn't work because the rows have different lengths. I also tried rbind.fill() from the plyr package, but that function can only process matrices.

I found some other questions here (that's where I got the plyr help from), but those were all about combining, for instance, two data frames of different sizes.

Thanks for your help.

1 Answer

by (31.1k points)

Try the code below, which gives the desired output:

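# Example list of character vectors with different lengths
word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])

# Length of each vector
n.obs <- sapply(word.list, length)

# Index positions from 1 up to the length of the longest vector
seq.max <- seq_len(max(n.obs))

# Index every vector with the full range (shorter ones are padded with NA), then transpose so each vector becomes a row
mat <- t(sapply(word.list, "[", i = seq.max))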

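The trick is that indexing a vector beyond its length returns NA for the missing positions. For example:

c(1:2)[1:4]

returns 1 2 NA NA, that is, the original vector followed by two NAs.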
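As a rough sketch of how this NA-padding applies to the data in the question, the same indexing can be run on a few of the Russian word vectors. The object names `sentences` and `padded`, and the P1, P2, ... column labels, are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical stand-in for the list produced by strsplit() in the question
sentences <- list(
  c("модно", "создавать", "резюме", "в", "виде"),
  c("видел", "по", "сыну,", "что", "он"),
  c("широко", "продолжали", "род.")
)

# The longest vector determines the number of columns
max.len <- max(lengths(sentences))

# Indexing past the end of a vector yields NA, so every row gets max.len entries
padded <- t(sapply(sentences, "[", seq_len(max.len)))

# Label the columns P1, P2, ... as in the desired output
colnames(padded) <- paste0("P", seq_len(max.len))

padded[, "P1"]  # first word of each sentence
padded[, "P4"]  # fourth word; NA where a sentence is shorter
```

Selecting a column of `padded` then gives the n-th word of every sentence, which is what the question asks for.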

A more concise alternative is a one-liner with plyr, which returns a data frame rather than a matrix:

plyr::ldply(word.list, rbind)
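If adding a plyr dependency is not desirable, one possible base R alternative (a different technique from the ones in this answer) is to extend each vector to the maximum length with `length<-`, which pads with NA, and then row-bind the results. The names `max.len` and `padded` below are illustrative assumptions.

```r
word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])

# Length of the longest vector
max.len <- max(lengths(word.list))

# Setting a length larger than the current one pads the vector with NA
padded <- lapply(word.list, `length<-`, max.len)

# Stack the equal-length vectors as rows of a character matrix
mat <- do.call(rbind, padded)

# Optional: convert to a data frame if column-wise access by name is preferred
df <- as.data.frame(mat, stringsAsFactors = FALSE)
```

This yields a matrix with one row per input vector, like the sapply() version, while the conversion on the last line gives a data frame comparable to the plyr result.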
