Where can I find some real world typo statistics?
I'm trying to match people's input text to internal objects, and people tend to make spelling mistakes.
There are 2 kinds of mistakes:
typos - "Helllo" instead of "Hello" / "Satudray" instead of "Saturday" etc.
Spelling - "Shikago" instead of "Chicago"
I use Damerau-Levenshtein distance for the typos and Double Metaphone for spelling (Python implementations here and here).
I want to focus on the Damerau-Levenshtein (or simply edit-distance). The textbook implementations always use '1' for the weight of deletions, insertions substitutions and transpositions. While this is simple and allows for nice algorithms it doesn't match "reality" / "real-world probabilities".