What is search relevance?
Search relevance is a measure of relatedness between a search query and search results.
How search engines determine relevance is anything but simple, however. Relevance can be influenced by any number of factors: search terms, popularity, location, past search or purchase history, and browsing behavior, just to name a few.
For web search engines, Google has set the standard. It certainly helps that Google has nearly two decades of data, hundreds of billions of searches, and thousands of engineers and data scientists to fine-tune relevance settings.
On-site search can be more challenging, but as we’ll see, it’s possible to provide very good search results and a great search experience even with more modest search datasets.
In this article, we’ll take a look beneath the hood to see how we index web page content and interpret queries to provide more relevant results, and how AI and ranking algorithms can influence search relevance.
More than 40% of all Google search queries are four words or more. The number of query terms has grown over the last couple of decades, thanks in part to the advent of AI embeddings and voice search. Voice search has also changed how we search, because the way we type is different from how we speak. With trillions of documents on the web, people are used to writing longer, more specific queries to narrow down results.
Search is inherently fuzzy: language is often ambiguous, and a user’s intent is not always apparent from their query. A search engine needs to make sense of the different words in a query to return relevant documents. “Bank” is a classic example: does it mean a financial institution or the side of a river? In some e-commerce site search use cases, customers may even type in symptoms or adjectives to find answers. Without added context, it is difficult to know exactly what they need.
There are a number of techniques that site search platforms use to help parse the meaning of a user search, including:
- Natural language processing (NLP) is the process of analyzing unstructured text to infer structure and meaning.
- Semantic query understanding goes a step further and tries to infer the intent behind a query.
- Personalization adds information about the individual who is searching, such as past search history, purchase history, or location, to the query.
- Word embeddings, vectorization, query segmentation, scoping, and other techniques are available to help search engines make sense of a query.
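To make the first of these steps more concrete, here is a toy query normalizer. It lowercases and tokenizes a query, drops a handful of stopwords, and folds simple plurals. The stopword list and the plural-folding rule are assumptions chosen for illustration; real NLP pipelines use language-specific tokenizers, stemmers or lemmatizers, and much larger word lists.

```python
import re

# Hypothetical, deliberately tiny stopword list; real systems use larger,
# language-specific lists.
STOPWORDS = {"a", "an", "the", "for", "of", "in", "on", "to", "with"}


def fold_plural(token: str) -> str:
    """Crudely strip a trailing 's' (but not 'ss'), e.g. 'shoes' -> 'shoe'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess_query(query: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, drop stopwords, fold plurals."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [fold_plural(t) for t in tokens if t not in STOPWORDS]


print(preprocess_query("Red Running Shoes for Kids"))  # ['red', 'running', 'shoe', 'kid']
```

Because "shoes" and "shoe" normalize to the same term, a query using either form can match the same indexed records.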
Typos are another problem to manage. To avoid bad search results and improve relevance, search needs spell checking. Somewhere between 10 and 25% of queries typed into a search box can be misspelled, and Baymard reports that “27% of sites are incapable of handling misspelling of just a single character in the product title”. It seems like an obvious feature for a site search engine to have, but many, if not most, site searches lack good typo tolerance today.
Good search results and relevance start well before anyone has typed in a search query. To understand how search relevance can be improved, we need to start with search term indexing.
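Typo tolerance is commonly built on edit distance. The sketch below uses Levenshtein distance, the minimum number of single-character insertions, deletions, or substitutions needed to turn one word into another, to snap a misspelled term to the nearest word in a vocabulary. The vocabulary and the `max_distance=1` threshold are illustrative assumptions; a production engine would also weigh word frequency and use a faster lookup than scanning every word.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical vocabulary built from indexed product titles.
VOCABULARY = ["jacket", "jeans", "sweater"]


def correct(term: str, vocabulary: list[str], max_distance: int = 1) -> str:
    """Return the closest vocabulary word within max_distance, else the term itself."""
    best = min(vocabulary, key=lambda word: levenshtein(term, word))
    return best if levenshtein(term, best) <= max_distance else term


print(correct("jaket", VOCABULARY))  # jacket
print(correct("boots", VOCABULARY))  # boots (no word within one edit)
```

With a threshold of one edit, "jaket" resolves to "jacket", which is exactly the single-character case Baymard describes.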
The search index
Before a search engine can start determining result relevance, it needs to analyze each record you want to search. Search engines create a search index either with a web crawler or through an API, ingesting your site data from your sitemap, site links, or pages referenced in a datastore.
When indexing, multiple algorithms can run over each record to add extra fields and information to the dataset that are helpful at query time. For example, Sajari now includes a feature called index pipelines that enriches and transforms data as it is ingested. The index pipeline has some defaults but can also be extended. For instance, you could use the Google Vision API to automatically extract color metadata from images to build a richer index.
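One common structure behind the “index in the back of a book” analogy is an inverted index, which maps each term to the records that contain it. This minimal sketch builds one from a few made-up product titles and answers a query by intersecting the record sets for each term. It illustrates the general idea only and is not a description of Sajari’s internal index format.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Made-up product titles for illustration.
docs = {
    "p1": "Blue denim jacket",
    "p2": "Red wool jacket",
    "p3": "Blue running shoes",
}
index = build_inverted_index(docs)
print(sorted(search(index, "blue jacket")))  # ['p1']
print(sorted(search(index, "jacket")))       # ['p1', 'p2']
```

Lookups like this are very fast, but they only find exact term matches, which is why the techniques described elsewhere in this article (typo tolerance, synonyms, embeddings) are layered on top.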
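An index pipeline can be pictured as an ordered list of functions that each take a record and return an enriched copy. In the sketch below, the step names, the field names (`title_normalized`, `price_band`), and the price thresholds are all hypothetical. An image-color step backed by a vision API would slot into the same list, but it is not shown here.

```python
from typing import Callable

Record = dict
Step = Callable[[Record], Record]


def normalize_title(record: Record) -> Record:
    """Hypothetical step: add a lowercased, whitespace-collapsed copy of the title."""
    return {**record, "title_normalized": " ".join(record["title"].lower().split())}


def add_price_band(record: Record) -> Record:
    """Hypothetical step: bucket price into a coarse band usable as a filter or facet."""
    price = record.get("price", 0.0)
    band = "budget" if price < 50 else "mid" if price < 200 else "premium"
    return {**record, "price_band": band}


def run_pipeline(record: Record, steps: list[Step]) -> Record:
    """Apply each enrichment step in order, as a record is ingested."""
    for step in steps:
        record = step(record)
    return record


enriched = run_pipeline(
    {"title": "  Blue   Denim JACKET ", "price": 79.0},
    [normalize_title, add_price_band],
)
print(enriched["title_normalized"], enriched["price_band"])  # blue denim jacket mid
```

Each step returns a new dictionary rather than mutating its input, so steps can be reordered, added, or removed independently.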
When performing a search, Sajari assigns a relevance score to each document in your index. The score ranges from 0 (no match) to 1 (perfect match) and search results are ordered starting with the highest score. The relevance score consists of two score components, the index score and the feature score.
- Index score: The index score represents the textual-relevance portion of the total score; in other words, how well the search text matches the content of the documents. It takes into account spelling, synonyms, stemming, AI-based word embeddings, and other language-specific features.
- Feature score: The feature score represents business-specific ranking factors; customers can use it to make ranking adjustments that better tailor results to business requirements.
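The article doesn’t say how the two components are combined, so the following is only a sketch under an assumed formula: a weighted average of an index score and a feature score, both between 0 and 1, which keeps the total in that range too. The `feature_weight` parameter and its default of 0.2 are hypothetical.

```python
def relevance_score(index_score: float, feature_score: float,
                    feature_weight: float = 0.2) -> float:
    """Blend textual (index) and business (feature) scores into [0, 1].

    The weighted-average formula and the default feature_weight are
    assumptions for illustration, not a documented Sajari formula.
    """
    for name, value in (("index_score", index_score),
                        ("feature_score", feature_score),
                        ("feature_weight", feature_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (1 - feature_weight) * index_score + feature_weight * feature_score


results = [
    ("doc-a", relevance_score(0.90, 0.10)),  # strong text match, weak business signal
    ("doc-b", relevance_score(0.80, 0.90)),  # slightly weaker text match, strong business signal
]
ranked = sorted(results, key=lambda r: r[1], reverse=True)  # highest score first
print([doc for doc, _ in ranked])  # ['doc-b', 'doc-a']
```

Here doc-b outranks doc-a (0.82 vs. 0.74) even though its textual match is slightly weaker, because its feature score is much higher.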
Relevance metrics scoring
If your site search is improving search clicks and revenue, your search engine is probably doing a decent job of delivering relevant results. But there is another, more objective measure of relevance we can use to get an idea of how well search is performing and to discover ways to improve it.
The effect of AI models on relevance can be measured using ranking quality scores such as:
- nDCG: The normalized discounted cumulative gain measures how close the ordering of a query’s results is to the ideal ordering, giving more credit to highly relevant results that appear near the top. Scores range from 0 to 1; the higher the score, the more relevant the ordering. This is the scoring method we use when testing search relevance.
- MRR: The mean reciprocal rank looks at the position of the first relevant result for each query. As the name implies, it takes the reciprocal of that rank: the score is 1 if the first relevant page is ranked 1st, 0.5 if it is ranked 2nd, and so on. The mean reciprocal rank is the average of these scores across queries.
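Both metrics are short enough to compute by hand. This sketch assumes graded relevance labels from 0 (irrelevant) to 3 (perfect) for nDCG and yes/no labels for MRR; the labels themselves are made up. It uses the linear-gain form of DCG with a log2(rank + 1) discount, which is one of several common variants (another uses 2**rel - 1 as the gain).

```python
import math


def dcg(relevances: list[float]) -> float:
    """Sum of graded relevance, each discounted by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: list[float]) -> float:
    """DCG of the actual ordering divided by DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def reciprocal_rank(is_relevant: list[bool]) -> float:
    """1 / rank of the first relevant result, or 0 if no result is relevant."""
    for rank, relevant in enumerate(is_relevant, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: list[list[bool]]) -> float:
    """Average reciprocal rank across queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


# Made-up judgments: relevance labels of each query's results, in ranked order.
print(round(ndcg([3, 2, 0]), 3))  # 1.0
print(round(ndcg([0, 2, 3]), 3))  # 0.648
queries = [[True, False], [False, True], [False, False, True]]
print(round(mean_reciprocal_rank(queries), 3))  # 0.611
```

A ranking already in ideal order scores an nDCG of 1.0, while reversing it drops the score to about 0.65. The three example queries give an MRR of (1 + 1/2 + 1/3) / 3, or about 0.61.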
There are also a number of ranking methods, including:
- TF-IDF: One of the oldest ranking models, term frequency-inverse document frequency is a statistic that measures how important a word is to a document. It increases with the number of times the word appears in the document (or web page) and is offset by how many documents in the collection contain the word, so very common words count for less.
- Okapi BM25: The Okapi information retrieval system was developed at City University, London. The “BM” stands for best match. BM25 builds on the same term frequency and inverse document frequency ideas as TF-IDF, adding term-frequency saturation and document-length normalization; newer variants work similarly.
- Dense retrieval: Represents queries and documents as dense vectors and measures relevance mathematically, for example with cosine similarity. This is great, but it can be slow at scale.
- Learning to rank: Uses a multistage process, typically retrieving candidates first and then reordering them with a machine-learned relevance model.
- Hybrid retrieval: Combines the approaches above, for example blending keyword scores such as BM25 with dense vector similarity.
- Learning to hash: A newer technique that compresses vectors into “neural hashes” for fast and smart retrieval.
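The two classic lexical models in this list can be sketched in a few lines over tokenized documents. The BM25 function uses the widely cited defaults k1 = 1.2 and b = 0.75 and the always-positive IDF variant used by Lucene; other implementations differ in these details. The three-document corpus is made up for illustration.

```python
import math
from collections import Counter


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Raw term frequency in doc times log(N / document frequency)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0


def bm25(query: list[str], doc: list[str], corpus: list[list[str]],
         k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n  # average document length
    counts = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        # Lucene-style IDF: the +1 inside the log keeps it positive.
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = counts[term]
        # k1 saturates repeated terms; b normalizes for document length.
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score


corpus = [
    "blue denim jacket".split(),
    "red wool jacket with blue trim".split(),
    "blue running shoes".split(),
]
query = "denim jacket".split()
scores = [bm25(query, doc, corpus) for doc in corpus]
print([round(s, 3) for s in scores])
```

TF-IDF gives “blue” a weight of 0 here because it appears in every document and so does nothing to tell them apart. For the query “denim jacket”, BM25 ranks the denim jacket first, the wool jacket second, and gives the running shoes a score of 0.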
You can learn more about relevance scoring on this excellent Moz blog. The quality of the records matters too: content such as titles, descriptions, tags, headers, and metadata can greatly influence the quality score. (Check out our free Search Health Report to see how well optimized your site is for search engines.)
Dynamic boosting, signal boosting and machine learning
Relevance scoring is not static. The scores can be improved over time using signal boosting, machine learning, and algorithm adjustments.
Signal boosting is the process by which search engines leverage user behavior such as clicks and conversions to optimize search result ranking.
As more users click certain results, the system learns which pages (or products or records, etc.) are most popular and assigns them a higher relevance score. Similarly, search results that lead to site conversions (signup, shopping cart, revenue, etc.) are scored higher because they lead to important outcomes.
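A simple way to picture signal boosting is to add up weighted user events per record and multiply the base relevance score by a dampened popularity factor. Everything here is an assumption for illustration: the event names, the weights (which put conversions above clicks), the `log1p` dampening, and the `strength` parameter. It is not Sajari’s dynamic boosting algorithm, and the boosted score is not re-normalized to the 0 to 1 range.

```python
import math
from collections import Counter

# Hypothetical event weights: conversions count more than clicks.
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def signal_counts(events: list[tuple[str, str]]) -> Counter:
    """Sum weighted user events per document ID; unknown events count as 0."""
    totals: Counter = Counter()
    for doc_id, event in events:
        totals[doc_id] += EVENT_WEIGHTS.get(event, 0.0)
    return totals


def boosted_score(base_score: float, signal: float, strength: float = 0.1) -> float:
    """Multiply base relevance by a log-dampened popularity boost."""
    # log1p grows slowly, so a handful of very popular items cannot dominate.
    return base_score * (1 + strength * math.log1p(signal))


events = [("p1", "click"), ("p2", "click"), ("p2", "purchase"), ("p1", "click")]
signals = signal_counts(events)
base = {"p1": 0.80, "p2": 0.75}
ranked = sorted(base, key=lambda d: boosted_score(base[d], signals[d]), reverse=True)
print(ranked)  # ['p2', 'p1']
```

Product p2 starts with a lower base score than p1, but its purchase pushes it to the top (about 0.896 vs. 0.888).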
Signal boosting can improve ranking and relevance, and it’s part of our machine learning algorithm. Prior to AI, search relied on keyword lookups, much like the index in the back of a book. This is very fast, but it regularly misses relevant items. Humans can write rules ad infinitum, and there will still be endless accuracy issues. Yet this is still the standard for search today.
AI-based search offers tremendous power through continuous, automatic improvements driven by intelligent feedback loops (signal boosting!). AI uses vectors, a mathematical way of representing words that captures the meaning of text very effectively. AI ranking requires either off-the-shelf models or your own custom models to transform text-based queries into vectors.
Sajari packages signal-boosted relevance and AI (specifically reinforcement learning) together in a feature called dynamic boosting, which also includes the data collection required to improve relevance and ranking. The system automatically collects data from your site, app, or store to construct the data model, then applies machine learning to determine relevance and signal boosting to improve results. Based on your desired objectives (conversions, signups, revenue, clicks, etc.), the search platform does the rest.
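Vector-based relevance typically compares a query vector with document vectors using cosine similarity. In the sketch below, the three-dimensional vectors are hand-made stand-ins; real embeddings come from a trained model and usually have hundreds of dimensions. The point is that a query and a document can score as similar without sharing any words.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hand-made 3-d vectors standing in for model embeddings (hypothetical values).
query_vec = [0.9, 0.1, 0.0]  # "cheap sneakers"
doc_vecs = {
    "budget running shoes": [0.8, 0.2, 0.1],
    "leather office chair": [0.0, 0.1, 0.9],
}
best = max(doc_vecs, key=lambda title: cosine_similarity(query_vec, doc_vecs[title]))
print(best)  # budget running shoes
```

A keyword lookup would find no overlap between “cheap sneakers” and “budget running shoes”; in vector space they point in nearly the same direction.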
We also give customers the ability to adjust the ranking algorithm to improve results that meet specific business needs — for example, putting more weight on shopping cart activity than clicks, or layering on personalization to improve relevance.
Exceptional site search experiences are more achievable today than ever before. But not all search engines are equal. At a minimum, a search engine should include features such as NLP, semantic query understanding, typo tolerance, AI, and signal boosting, and package them with fast information retrieval.
Hopefully this article provided a good primer on search relevance, with some insight into how we have engineered a state-of-the-art relevance solution. Try implementing Sajari to see how it works for your use case: sign up for a free 14-day evaluation or contact us for a custom demo.