I want to learn a Naive Bayes model for a problem where the class is boolean (takes on one of two values). Some of the features are boolean, but other features are categorical and can take on a small number of values (~5).

If all my features were boolean then I would want to use sklearn.naive_bayes.BernoulliNB. It seems clear that sklearn.naive_bayes.MultinomialNB is not what I want.

One solution is to split up my categorical features into boolean features. For instance, if a variable "X" takes on values "red", "green", "blue", I can have three variables: "X is red", "X is green", "X is blue". That violates the assumption of conditional independence of the variables given the class, so it seems totally inappropriate.

Another possibility is to encode the variable as a real-valued variable where 0.0 means red, 1.0 means green, and 2.0 means blue. That also seems totally inappropriate to use GaussianNB (for obvious reasons).

What I'm trying to do doesn't seem weird, but I don't understand how to fit it into the Naive Bayes models that sklearn gives me. It's easy to code up myself, but I prefer to use sklearn if possible for obvious reasons (most: to avoid bugs).