How does spell check work?

robphelan

Diamond Member
I'm working in a proprietary world, SAP, where there is not much in the way of a spell checker and I'm thinking about developing one.

I know the simplest solution would be to parse the string into individual words then compare against a dictionary.
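A rough sketch of that simplest approach, in Python purely for illustration (the same idea should port to ABAP). The `DICTIONARY` set and the `find_misspellings` name are made up for this example; a real checker would load a full word list.

```python
import re

# Hypothetical tiny word list; a real checker would load a full dictionary file.
DICTIONARY = {"the", "quick", "brown", "fox", "string", "strong"}

def find_misspellings(text, dictionary=DICTIONARY):
    """Return the words in text that are not in the dictionary, in order."""
    # Treat every run of ASCII letters as one word; punctuation is ignored here.
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w.lower() not in dictionary]

print(find_misspellings("The quick strnig fox"))
```

Keeping the dictionary in a set (or a hashed/sorted internal table) makes each lookup cheap, so the cost is dominated by splitting the text.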

But what about word suggestions? Say someone typed "strnig"; I want to suggest "string", or even "strong".

Thanks for your input.

EDIT: I would like to add that many SAP implementations have not evolved enough to make web service calls. I'm sure I could simply code a call to an existing online service, but that may not be the best solution - especially since you are relying on some 3rd party.
 
I programmed a spell check in college using Soundex. I am guessing a lot of spell checkers work similarly in that way.

That's really interesting and sounds doable.

A few questions on that:

Do you build your dictionary and convert all the words using this algorithm ahead of time?

Then, convert the input string and look for words that don't exist in the dictionary?

Then, using those words, encode them with the Soundex algorithm and find all the matching values in the dictionary to use as suggestions?
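Those three steps could be sketched roughly like this. It is a minimal take on the standard American Soundex rules, not necessarily what any particular spell checker does; `build_index` and `suggest` are names invented for the example, and the word list is a stand-in for a real dictionary.

```python
from collections import defaultdict

# Letter groups from the standard Soundex table; vowels, H, W and Y get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex code for word ('' if it has no letters)."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    result = word[0]
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts
    return (result + "000")[:4]

def build_index(words):
    """Step 1: encode the whole dictionary ahead of time."""
    index = defaultdict(list)
    for w in words:
        index[soundex(w)].append(w)
    return index

def suggest(word, index):
    """Step 3: look up dictionary words that share the misspelling's code."""
    return index.get(soundex(word), [])

index = build_index(["string", "strong", "spring", "the"])
print(suggest("strnig", index))
```

Step 2 (finding the unknown words) is just the plain dictionary lookup; only the words that fail it need to be encoded and looked up in the index.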
 
You've pretty much got it. Now you just have to code it 😉 Also, there is something called edit distance you should check out.
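For reference, here is a small sketch of one common edit distance, plain Levenshtein (insertions, deletions and substitutions each cost 1). The function name is just for this example. Note that plain Levenshtein counts a swapped pair of letters as two edits; the Damerau variant would count it as one.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions
    needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

print(levenshtein("strnig", "string"))
print(levenshtein("strnig", "strong"))
```

Each comparison costs time proportional to the product of the two word lengths, so it is usually run only against a shortlist of candidates (for example, the words sharing a phonetic code) rather than the whole dictionary.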

EDIT: LOL...looks like Crusty beat me to it. I'm not sure which one has the greater order of magnitude. Is O() always proportionate to CPU cycles?
 
Another thing you might consider is throwing in a Bayesian algorithm: http://en.wikipedia.org/wiki/Bayes%27_theorem

You'll need to record common words. Once you have a good list of them, you should be able to give a heavier weight to the words that are used most often.

For example, someone who types "hte" probably doesn't mean "hate", "height", or "hit"; they probably mean "the". Bayes' theorem provides a way to weigh words like "the" more heavily.

http://www.cs.indiana.edu/classes/b551/slides/IntroductionToBayesianClassification.pdf
Some slides that talk about it as well.
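A very simplified sketch of that weighting: it ranks candidates by how often each word is used (the prior) and assumes every candidate is an equally likely source of the typo, which a fuller Bayesian model would not. The counts below are invented for illustration, and `rank_candidates` is a made-up name.

```python
from collections import Counter

# Hypothetical usage counts; in practice these would be tallied from real text.
WORD_COUNTS = Counter({"the": 5000, "hit": 120, "height": 60, "hate": 40})

def rank_candidates(candidates, counts=WORD_COUNTS):
    """Order candidate corrections from most to least frequently used."""
    total = sum(counts.values())
    # counts[w] / total estimates P(word); unseen words get probability 0.
    scored = [(counts[w] / total, w) for w in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [w for _, w in scored]

print(rank_candidates(["hate", "height", "hit", "the"]))
```

The candidate list itself would come from an earlier step, such as the phonetic match or an edit-distance cutoff.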
 
"incorrect" results? Or just different but still likely results? Spell checking isn't really a problem with an exact solution.
 
Sorry, I meant that when I input a word in my program, I get:
dragon <-> TRN

while the online version shows:
dragon <-> TRKN

At this point, I'm only working on the Metaphone conversion of a single word at a time. The "suggestions" will come a little later.
 
I wish Google released a "Spell Check" app. Everything else I have used is inferior.

What is ridiculous is how horrible Chrome's spell checker is. I mean, come on, the Google search engine has one of the best spell checkers out there, yet Chrome's is one of the crappiest.
 
This may seem obvious, but what about punctuation? Just drop the apostrophes, hyphens, etc.?

Good question. For dashes, slashes, and underscores I would split and check the two components of the compound word separately. For apostrophes I would drop the mark and everything after it and check the remainder. That should work in English, anyway.
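Something like this is what I have in mind, as a rough sketch only. `checkable_parts` is a name made up for the example, and it only handles the straight ASCII apostrophe.

```python
import re

def checkable_parts(token):
    """Split a token on dashes, slashes and underscores, then drop any
    apostrophe and everything after it from each piece."""
    parts = re.split(r"[-/_]", token)
    stripped = [p.split("'", 1)[0] for p in parts]
    return [p for p in stripped if p]  # discard empty pieces

print(checkable_parts("well-known"))
print(checkable_parts("Rob's"))
print(checkable_parts("read/write_lock"))
```

Each returned piece would then go through the normal dictionary lookup.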
 
Well, sort of. For this application, it should be fine. However, it isn't exact. Words like isn't, doesn't, and couldn't are all correct, yet would be flagged as incorrect by this spellchecker.

Other things, like saying "I'm fixin' to do 'er right", could also pose a problem for these spell-checking rules.

This is one reason why the written English language is so hard to parse. All the rules are optional.
 
Good point, I wasn't thinking about contractions.
 
It looks like the dictionary I'm using doesn't have contractions.

I searched it for does* and it returns:
DOES
DOESKIN
DOESKIN
DOEST

I suppose I can add the contractions to my dictionary manually. Also, in pre-processing, I bet I can easily filter out words that contain special characters (hyphens, asterisks, etc.) and trust that the user knows what they're typing.
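Roughly what I'm picturing, sketched in Python rather than ABAP. The word lists are tiny placeholders and `is_flagged` is just a name for the example: contractions are added by hand, and any token containing other special characters is skipped rather than checked.

```python
import re

# Placeholder word lists; the real dictionary is far larger.
DICTIONARY = {"does", "doeskin", "doest", "it", "work"}
CONTRACTIONS = {"isn't", "doesn't", "couldn't", "i'm"}  # added manually
KNOWN_WORDS = DICTIONARY | CONTRACTIONS

# Letters only, optionally followed by one apostrophe and more letters.
WORD_ONLY = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def is_flagged(token):
    """Return True if the token should be reported as misspelled."""
    lowered = token.lower()
    if not WORD_ONLY.fullmatch(lowered):
        return False  # has hyphens, asterisks, etc.: trust the user
    return lowered not in KNOWN_WORDS

for token in ["doesn't", "dosn't", "does*", "well-known", "wrk"]:
    print(token, is_flagged(token))
```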
 