In most cases, the Baum-Welch algorithm is used to train a Hidden Markov model.

In many papers, however, it is argued that the BW algorithm will optimize until it got stuck in a local optimum.

Does there exist an exact algorithm that actually succeeds in finding the global optimum (except enumerating nearly all possible models and evaluating them)?

Of course, for most applications, BW will work fine. We are however interested in finding lower bounds of the amount of information loss when reducing the number of states. Therefore we always need to generate the best model possible.

We are thus looking for an efficient NP-hard algorithm (that only enumerates over a (potentially) the exponential number of extreme points) and not over a discretized number of floating points for each probability in the model.