
I'm using NHunspell to check a string for spelling errors like so:

var words = content.Split(' ');
string[] incorrect;

using (var spellChecker = new Hunspell(affixFile, dictionaryFile))
{
    incorrect = words.Where(x => !spellChecker.Spell(x))
        .ToArray();
}

This generally works, but it has some problems. For example, if I'm checking the sentence "This is a (very good) example", it will report "(very" and "good)" as being misspelled. Or if the string contains a time such as "8:30", it will report that as a misspelled word. It also has problems with commas, etc.

Microsoft Word is smart enough to recognize a time, a fraction, or a comma-delimited list of words. It knows when not to use an English dictionary, and it knows when to ignore symbols. How can I get a similar, more intelligent spell check in my software? Are there any libraries that provide a little more intelligence?

EDIT: I don't want to force users to have Microsoft Word installed on their machine, so using COM interop is not an option.

1 Answer


If your spell checker cannot cope with punctuation, pre-tokenize its input: extract the words first, then feed them to the checker one at a time or as a single string joined with spaces. In Python, a simple regular expression such as \w+ is enough for that:

>>> import re
>>> s = "This is a (very good) example"
>>> re.findall(r"\w+", s)
['This', 'is', 'a', 'very', 'good', 'example']

According to the .NET docs, \w is supported there as well, so you only need the .NET counterpart of re.findall, which is Regex.Matches in the System.Text.RegularExpressions namespace.
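One caveat: a plain \w+ also matches digits, so "8:30" would come out as "8" and "30", and "don't" would be split at the apostrophe. The following is a minimal sketch of a slightly stricter tokenizer, not a definitive implementation. The names extract_words, misspelled, KNOWN, and toy_check are hypothetical, and toy_check is only a stand-in for a real dictionary lookup such as Hunspell's Spell method.

```python
import re

# Letters only (no digits or underscores), with optional internal apostrophes,
# so "don't" stays one token and "8:30" produces no tokens at all.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def extract_words(text):
    """Return the alphabetic words in text, dropping punctuation and numbers."""
    return WORD_RE.findall(text)


def misspelled(text, is_correct):
    """Return the words in text that the is_correct callback rejects."""
    return [word for word in extract_words(text) if not is_correct(word)]


# Hypothetical stand-in for a real dictionary (assumption, not a real word list).
KNOWN = {"this", "is", "a", "very", "good", "example", "at", "don't"}


def toy_check(word):
    """Toy spell check: case-insensitive lookup in the KNOWN set."""
    return word.lower() in KNOWN


print(extract_words("This is a (very good) example at 8:30"))
print(misspelled("This is a (very godo) example, don't worry", toy_check))
```

Because the pattern never matches digits, times and fractions are simply skipped instead of being reported as misspellings. The same pattern syntax should carry over to .NET's Regex.Matches, though that is untested here.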

For more on the .NET regular expression syntax, see the quick reference:

https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
