Given 100,000 word-to-phonemes mappings, how can I split the original words on the phoneme boundaries?

Question

asked Aug 31, 2019 in AI and Deep Learning by ashely (50.2k points)

I have a mapping of 100,000+ words to their phonemes (CMUdict), like:

ABANDONED => [ 'AH', 'B', 'AE', 'N', 'D', 'AH', 'N', 'D' ]

I want to split each word's letters into as many groups as there are phonemes, e.g.:

ABANDONED => [ 'A', 'B', 'A', 'N', 'D', 'O', 'N', 'ED' ]

I don't have a mapping of phonemes to graphemes, but it seems like I should be able to build a statistical model of phonemes to graphemes and then use it to decide where to split each word. (It would be nice if the same model could also convert new words to their probable phonemes.)

How can I do this? I suspect a hidden Markov model could be applicable, but beyond that hunch I don't know how to proceed.
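One way to learn the statistical model described above is hard (Viterbi) expectation-maximization. Start from rough co-occurrence counts, use dynamic programming to find the most probable split of every word under the current estimates of P(letters | phoneme), re-count the resulting letter chunks, and repeat. This is a swapped-in technique, not the method from the answer below. The sketch assumes stress digits have been stripped from CMUdict phonemes (e.g. `AH0` → `AH`) and that every phoneme is spelled by 1 to 3 letters (`MAX_CHUNK`). The function names are hypothetical. The dynamic-programming step plays the role that Viterbi decoding plays in a hidden Markov model, although this simple model has no phoneme-to-phoneme transition probabilities.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

MAX_CHUNK = 3  # assumption: one phoneme is spelled by 1 to 3 letters

Counts = Dict[str, Dict[str, float]]  # phoneme -> letter chunk -> count
LogP = Callable[[str, str], float]


def make_logp(counts: Counts, alpha: float = 0.1) -> LogP:
    """Add-alpha smoothed log P(letters | phoneme) from the count table."""
    totals = {p: sum(table.values()) for p, table in counts.items()}
    vocab = max(1, len({g for table in counts.values() for g in table}))

    def logp(phone: str, letters: str) -> float:
        c = counts.get(phone, {}).get(letters, 0.0)
        return math.log((c + alpha) / (totals.get(phone, 0.0) + alpha * vocab))

    return logp


def viterbi_split(word: str, phones: List[str], logp: LogP) -> Optional[List[str]]:
    """Most probable split of `word` into len(phones) non-empty letter chunks."""
    n, m = len(word), len(phones)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]  # best[i][j]: word[:i] vs phones[:j]
    back = [[0] * (m + 1) for _ in range(n + 1)]    # length of the last chunk used
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            for k in range(1, min(MAX_CHUNK, i) + 1):
                prev = best[i - k][j - 1]
                if prev == neg:
                    continue
                score = prev + logp(phones[j - 1], word[i - k:i])
                if score > best[i][j]:
                    best[i][j], back[i][j] = score, k
    if best[n][m] == neg:
        return None  # e.g. "X" -> EH K S: more phonemes than letters
    chunks, i = [], n
    for j in range(m, 0, -1):  # follow back-pointers from the end of the word
        k = back[i][j]
        chunks.append(word[i - k:i])
        i -= k
    return chunks[::-1]


def train(lexicon: List[Tuple[str, List[str]]], iterations: int = 5,
          alpha: float = 0.1) -> Counts:
    """Hard EM: align every word with the current model, then re-count."""
    # Initialisation: each phoneme co-occurs with every short substring of its word.
    counts: Counts = defaultdict(lambda: defaultdict(float))
    for word, phones in lexicon:
        subs = {word[a:a + k] for k in range(1, MAX_CHUNK + 1)
                for a in range(len(word) - k + 1)}
        for p in phones:
            for g in subs:
                counts[p][g] += 1.0
    for _ in range(iterations):
        logp = make_logp(counts, alpha)
        new_counts: Counts = defaultdict(lambda: defaultdict(float))
        for word, phones in lexicon:
            chunks = viterbi_split(word, phones, logp)
            if chunks is None:
                continue  # no split with 1..MAX_CHUNK letters per phoneme; skip
            for p, g in zip(phones, chunks):
                new_counts[p][g] += 1.0
        counts = new_counts
    return counts
```

With CMUdict loaded as a list of `(word, phones)` pairs, `logp = make_logp(train(lexicon))` followed by `viterbi_split(word, phones, logp)` returns one letter chunk per phoneme. Words that would need a phoneme with no letters (such as `X` → `EH K S`) or a chunk longer than `MAX_CHUNK` are skipped during training. A full EM would use expected (soft) counts from a forward-backward pass instead of only the single best split.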

1 Answer

vinita · Answer 1 · 2019-08-31T10:59:56+0000

First, align the word with its phonetic representation by matching letters to identical phonemes (like the letter N and the phoneme N). Dynamic programming finds the alignment with the most such matches. Then assign the remaining letters of the word to the remaining phonemes between those matched pairs.
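The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation. It assumes CMUdict stress digits have been stripped (e.g. `AH0` → `AH`). The helper names `anchor_pairs` and `split_on_phonemes` are hypothetical, and so is the rule for leftover letters: they are shared out in near-equal contiguous chunks among the unmatched phonemes between two matches, or attached to the next matched letter when there are no such phonemes. On the question's example it reproduces `['A', 'B', 'A', 'N', 'D', 'O', 'N', 'ED']`.

```python
from typing import List, Tuple


def anchor_pairs(word: str, phones: List[str]) -> List[Tuple[int, int]]:
    """LCS-style DP: the largest ordered set of (letter_idx, phone_idx) pairs
    where the letter is identical to the phoneme symbol (e.g. 'N' and 'N')."""
    n, m = len(word), len(phones)
    # dp[i][j] = most identical pairs when aligning word[i:] with phones[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if word[i] == phones[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:  # walk forward along one optimal alignment
        if word[i] == phones[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # skip a letter
        else:
            j += 1  # skip a phoneme
    return pairs


def split_on_phonemes(word: str, phones: List[str]) -> List[str]:
    """One letter group per phoneme, built around the matched (anchor) pairs."""
    groups = [""] * len(phones)
    prev_i, prev_j = -1, -1
    # A sentinel pair past the end handles letters/phonemes after the last anchor.
    for ai, aj in anchor_pairs(word, phones) + [(len(word), len(phones))]:
        letters = word[prev_i + 1:ai]
        gap = list(range(prev_j + 1, aj))  # unmatched phonemes in this stretch
        carry = ""
        if gap:
            # Near-equal contiguous chunks; a chunk is empty if letters run out.
            for k, pj in enumerate(gap):
                lo = k * len(letters) // len(gap)
                hi = (k + 1) * len(letters) // len(gap)
                groups[pj] = letters[lo:hi]
        else:
            carry = letters  # no phoneme of their own: join the next anchor
        if aj < len(phones):
            groups[aj] = carry + word[ai]  # the anchor letter itself
        elif carry and groups:
            groups[-1] += carry  # trailing letters after the last phoneme
        prev_i, prev_j = ai, aj
    return groups
```

Because only identical symbols can anchor the alignment, letters that never equal their phoneme (for example a C pronounced /K/) are placed purely by the gap heuristic, which is where a learned grapheme-to-phoneme model does better.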
