Language is tricky and people often misspell words, so the literal meaning of a query can be misleading or match nothing at all. The human mind has an amazing ability to fill in gaps, correct mistakes and understand things that can't even be written down (sarcasm, for example). Search needs to try to do these things too, which is harder than it sounds. Many search engines become terribly slow once features such as synonyms and "fuzzy search" (more on this below) are turned on, and others don't support these features at all.
Sajari offers a range of features that cover not only synonyms and fuzzy search, but also contextual forks (i.e. queries in which some of the text has an ambiguous meaning). Below is a brief outline of these search features.
You say "clever", I say "intelligent". You say "cost", I say "pricing". Language is full of words with similar or identical meanings, and synonyms help a search engine translate a query with this in mind. Unlike other search engines, these synonyms don't have to be equivalent: each one can be weighted as more or less important than the original term. This means you can control what people see in the results. If you want people searching for "surface pro" to see results with "ipad" above those with "surface pro", you can do that by setting a weight above 1.0, and vice versa. Also note that synonyms are not reversible by default, so you control whether each one applies in both directions.
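To make the weighting and one-way behaviour concrete, here is a minimal sketch of query expansion. The `SynonymTable` class, its method names and the `bidirectional` flag are hypothetical names chosen for illustration and are not Sajari's API. The sketch only assumes that each synonym carries a weight and applies in one direction unless told otherwise.

```python
from collections import defaultdict


class SynonymTable:
    """Hypothetical in-memory store of weighted, one-directional synonyms."""

    def __init__(self):
        # Maps a trigger term to a list of (synonym, weight) pairs.
        self._rules = defaultdict(list)

    def add(self, trigger, synonym, weight=1.0, bidirectional=False):
        """Register a synonym; it is one-way unless bidirectional is set."""
        self._rules[trigger].append((synonym, weight))
        if bidirectional:
            self._rules[synonym].append((trigger, weight))

    def expand(self, query):
        """Return (term, weight) pairs, with the original query at weight 1.0."""
        expanded = [(query, 1.0)]
        expanded.extend(self._rules.get(query, []))
        # Highest weight first, so a weight above 1.0 outranks the original.
        return sorted(expanded, key=lambda pair: pair[1], reverse=True)


table = SynonymTable()
table.add("surface pro", "ipad", weight=1.5)             # one-way, boosted
table.add("cost", "pricing", weight=1.0, bidirectional=True)

print(table.expand("surface pro"))  # [('ipad', 1.5), ('surface pro', 1.0)]
print(table.expand("ipad"))         # [('ipad', 1.0)]
```

Because the "surface pro" rule is one-way, a search for "ipad" is left untouched, while "cost" and "pricing" expand to each other.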
Synonyms are more than just a language construct; they're also a way to translate your visitors' queries to match your content. To help with this, we provide statistics on search queries so you can see which queries are performing poorly and create synonyms as required. This is incredibly useful when visitor queries don't match your content. Consider the search stats below: more searches were made for the term "mesahighdensity" than for the term "mesa high density", but the former has zero click-throughs. You might think spell checking and fuzzy query matching should fix this automatically, but tradeoffs between speed and coverage must be made, so a query like this may not be corrected. Keep in mind that on other occasions there may be much less overlap between two terms you want to use as synonyms.
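If you export query statistics, spotting synonym candidates can be as simple as filtering for queries that are searched often but never clicked. The sketch below assumes a hypothetical record shape (`QueryStat`) and a hypothetical `min_searches` cut-off. The numbers are made up for illustration and are not real Sajari statistics.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    """Hypothetical shape of one row of exported search statistics."""
    query: str
    searches: int
    clicks: int


def synonym_candidates(stats, min_searches=10):
    """Return queries searched often but never clicked, most searched first."""
    poor = [s for s in stats if s.searches >= min_searches and s.clicks == 0]
    return sorted(poor, key=lambda s: s.searches, reverse=True)


# Illustrative numbers only.
stats = [
    QueryStat("mesahighdensity", searches=40, clicks=0),
    QueryStat("mesa high density", searches=25, clicks=12),
    QueryStat("mesa", searches=5, clicks=0),
]

for stat in synonym_candidates(stats):
    print(stat.query, stat.searches)  # mesahighdensity 40
```

Each query this returns is a candidate trigger for a synonym pointing at the wording your content actually uses.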
Sajari allows synonyms to be added in seconds via your management interface, or bulk loaded via CSV. The form for setting these up in Sajari is shown below. Note the field called "potency": this is what allows you to control how closely the synonym is related to the parent. By default the potency is 1.0. Along with the "trigger" term (if this exists, we expand the query to include the "intro"), there is another term called the "clue" term. More on that below in the context clues section.
Fuzzy spelling (or approximate string matching) relates to the way people misspell words and phrases and how to autocorrect them efficiently. There are many ways to do this, some more efficient than others, and all of them balance speed, memory overhead and correction efficiency. We've spent quite some time on this problem and have open sourced our fuzzy spelling package, which details its accuracy and speed.
Our fuzzy matching algorithm is very fast. In fact, its effect on search speed is virtually negligible, even for larger sites and apps. It's also not a fixed model: it grows and adapts to your content as it is added. This also means it handles any character sequences, including jargon and/or various languages (we haven't tested them all, but feel free to try and let us know). Popular words are more likely to be replaced. As soon as the occurrences of a given term exceed a certain threshold, that term is automatically added to your fuzzy dictionary.
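The idea of a dictionary that grows with your content can be sketched in a few lines. This is not our production algorithm or the open sourced package: it is a deliberately naive illustration that scans every dictionary term and computes a Levenshtein distance, which would be far too slow at scale. The class name, the `threshold` and `max_distance` parameters, and the rule of preferring the more frequent term on ties are all assumptions made for the example.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


class FuzzyDictionary:
    """Hypothetical dictionary that learns terms from indexed content."""

    def __init__(self, threshold=1, max_distance=2):
        self.threshold = threshold        # assumed occurrence cut-off
        self.max_distance = max_distance  # assumed maximum edits to correct
        self.counts = Counter()

    def index(self, text):
        """Count every whitespace-separated term in newly added content."""
        self.counts.update(text.lower().split())

    def vocabulary(self):
        """Terms whose occurrence count exceeds the threshold."""
        return {t for t, n in self.counts.items() if n > self.threshold}

    def correct(self, term):
        """Return the closest dictionary term, or the input if none is close."""
        term = term.lower()
        vocab = self.vocabulary()
        if term in vocab:
            return term
        best_key, best_term = None, term
        for candidate in vocab:
            distance = edit_distance(term, candidate)
            if distance > self.max_distance:
                continue
            # Prefer fewer edits, then the more frequent term.
            key = (distance, -self.counts[candidate], candidate)
            if best_key is None or key < best_key:
                best_key, best_term = key, candidate
        return best_term


fuzzy = FuzzyDictionary(threshold=1, max_distance=2)
fuzzy.index("high density mesa")
fuzzy.index("mesa high density foam")

print(fuzzy.correct("densty"))  # density
print(fuzzy.correct("foan"))    # foan ("foam" has not passed the threshold yet)
```

Because the vocabulary is derived from term counts rather than a fixed word list, jargon and terms from other languages enter the dictionary the same way as any other word.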
Below is an example of a fuzzy match that corrects three mistakes.