You want to take the CMU phonetic data set as input; it looks like this:

ABERRATION AE2 B ER0 EY1 SH AH0 N

ABERRATIONAL AE2 B ER0 EY1 SH AH0 N AH0 L

ABERRATIONS AE2 B ER0 EY1 SH AH0 N Z

ABERT AE1 B ER0 T

ABET AH0 B EH1 T

ABETTED AH0 B EH1 T IH0 D

ABETTING AH0 B EH1 T IH0 NG

ABEX EY1 B EH0 K S

ABEYANCE AH0 B EY1 AH0 N S

(The word is on the left; to the right is a series of phonemes; see the phoneme key for what each symbol means.)

And you want to use it as training data for a machine learning system that would take new words and guess how they would be pronounced in English.
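For concreteness, this is roughly how the file could be loaded into (word, phoneme list) pairs. It's a minimal sketch, assuming a plain-text copy named cmudict.txt, one entry per line, and the usual ";;;" comment lines:

```python
def load_cmudict(path):
    """Return a list of (word, phoneme_list) pairs."""
    entries = []
    with open(path, encoding="latin-1") as f:  # the CMU file is not pure UTF-8
        for line in f:
            line = line.strip()
            if not line or line.startswith(";;;"):  # skip blanks and comment lines
                continue
            word, *phonemes = line.split()
            entries.append((word, phonemes))
    return entries

pairs = load_cmudict("cmudict.txt")
# e.g. ('ABERRATION', ['AE2', 'B', 'ER0', 'EY1', 'SH', 'AH0', 'N'])
```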

It's not obvious, to me at least, how to do this, because there isn't a fixed number of letters that maps to a single phoneme. I have a feeling that something like a Markov chain might be the right way to go.

How would you do this?

1 Answer


This task is called grapheme-to-phoneme (G2P) conversion: finding the pronunciation of a word from its written form. It is a well-studied subproblem of natural language processing, with important applications in text-to-speech and speech recognition, so a quick search will turn up several papers.

Joint-sequence models are a simple probabilistic framework for this problem. The idea is to align each word with its pronunciation as a sequence of joint units ("graphones"), each pairing a short chunk of letters with a short chunk of phonemes, and then train an n-gram model over those joint units. This sidesteps your concern about there being no fixed number of letters per phoneme, because the segmentation is learned rather than assumed. The literature on these models also covers practical issues such as the effect of the maximum approximation in training and transcription, the interaction of model size parameters, n-best list generation, and confidence measures.
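To make the joint-sequence idea concrete, here is a toy sketch, not the published algorithm: real joint-sequence models learn the letter-to-phoneme alignment with EM and decode with a proper search over segmentations, whereas this sketch assumes a naive one-letter-to-one-phoneme, left-to-right alignment and decodes greedily, just to show the shape of the approach. The helper names (naive_align, train_graphone_bigrams, predict) are illustrative, not from any library.

```python
from collections import defaultdict

def naive_align(word, phonemes):
    """Hypothetical stand-in for a learned alignment: pair letters and
    phonemes left to right, padding the shorter side with empty units."""
    n = max(len(word), len(phonemes))
    letters = list(word) + [""] * (n - len(word))
    phones = list(phonemes) + [""] * (n - len(phonemes))
    return list(zip(letters, phones))  # each (letter, phoneme) pair is one "graphone"

def train_graphone_bigrams(entries):
    """Count bigrams over graphone units across the dictionary."""
    counts = defaultdict(lambda: defaultdict(int))
    for word, phonemes in entries:
        units = [("<s>", "<s>")] + naive_align(word, phonemes)
        for prev, cur in zip(units, units[1:]):
            counts[prev][cur] += 1
    return counts

def predict(word, counts):
    """Greedy decode: for each letter, pick the most frequent graphone that
    matches it given the previous graphone, backing off to context-free counts."""
    prev = ("<s>", "<s>")
    result = []
    for letter in word:
        candidates = {u: c for u, c in counts.get(prev, {}).items() if u[0] == letter}
        if not candidates:
            # back off: total count of each graphone for this letter, ignoring context
            backoff = defaultdict(int)
            for following in counts.values():
                for u, c in following.items():
                    if u[0] == letter:
                        backoff[u] += c
            candidates = backoff
        if candidates:
            best = max(candidates, key=candidates.get)
            if best[1]:                      # "" means the letter is silent here
                result.append(best[1])
            prev = best
    return result

# Tiny demo on two entries from the question; trained on the full dictionary
# the model would generalise much better.
entries = [
    ("ABET", ["AH0", "B", "EH1", "T"]),
    ("ABETTED", ["AH0", "B", "EH1", "T", "IH0", "D"]),
]
model = train_graphone_bigrams(entries)
print(predict("ABETTING", model))  # -> ['AH0', 'B', 'EH1', 'T', 'IH0'] on this toy data
```

A real system would replace naive_align with an EM-trained alignment over variable-length letter/phoneme chunks and the greedy loop with a beam search over segmentations, which is what off-the-shelf joint-sequence G2P tools do.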
