How do canonicals impact indexing?

A canonical tag (also known as "rel canonical") tells search engines that a specific URL represents the master copy of a page. It is set in the head section of the page, as shown below:


<link rel="canonical" href="https://www.sajari.com" />
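
To illustrate how a crawler might read this tag, here is a minimal sketch in JavaScript. The `findCanonical` helper is hypothetical and is not Sajari's implementation. It uses regular expressions for brevity and assumes quoted attribute values; a real crawler would more likely use a proper HTML parser.

```javascript
// Hypothetical helper: returns the href of the first <link rel="canonical"> tag
// found in a string of HTML, or null if there is none.
// Assumption: attribute values are wrapped in single or double quotes.
function findCanonical(html) {
  // Collect every <link ...> tag in the markup.
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];

  for (const tag of linkTags) {
    const rel = /\brel\s*=\s*["']([^"']*)["']/i.exec(tag);
    const href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag);

    // Only a link whose rel is exactly "canonical" counts.
    if (rel && href && rel[1].trim().toLowerCase() === "canonical") {
      return href[1];
    }
  }
  return null;
}

const page = '<head><link rel="canonical" href="https://www.sajari.com" /></head>';
console.log(findCanonical(page));
```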

Canonicals are used for a variety of reasons, such as choosing the preferred domain, stating an HTTP vs. HTTPS preference, and consolidating ranking "juice" for a given piece of content. Good canonicals can also help improve SEO. For more information, read how Google handles canonical tags and why the SEO community considers them important.

Canonicals are very important to the way Sajari works, and they are one of the most common reasons a crawl fails to index content correctly. They are a very strong signal: we generally won't index a URL if its canonical points elsewhere, and will instead try to index the canonical URL. The biggest mistakes we see with canonicals are:

  • Redirect loops: The canonical points to a different URL, which redirects back to the original, and so on.
  • Unresolvable: The URL in the canonical tag is either not a URL, does not exist, or cannot be resolved.
  • Self-referential: Sometimes developers and CMSs set the canonical of each page to the page itself, defeating the point of canonicals.
  • All the same: Every page on a site has the exact same canonical URL (often the root domain or homepage).

You can tell if you have some of these issues using our content debug tool. You should either a) fix these issues or b) remove canonical tags from your pages altogether. Removing all canonicals is much better than setting them incorrectly.
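
Some of the mistakes listed above can be roughly sketched as checks in code. This is only an illustration and not how the content debug tool works: `findCanonicalIssues` and its input shape (a map from absolute page URL to the canonical declared on that page) are assumptions. Redirect loops and non-existent URLs are left out because detecting them requires fetching pages.

```javascript
// Hypothetical checker for three of the canonical mistakes described above.
// Input: { "<absolute page URL>": "<canonical declared on that page>", ... }
function findCanonicalIssues(canonicals) {
  const issues = [];
  const entries = Object.entries(canonicals);
  const distinctTargets = new Set();
  let validCount = 0;

  for (const [page, canonical] of entries) {
    let target;
    try {
      // new URL() throws on anything that is not an absolute URL.
      target = new URL(canonical).href;
    } catch (err) {
      issues.push({ page, issue: "not-a-url" });
      continue;
    }
    validCount += 1;
    distinctTargets.add(target);

    // Compare normalized forms so "https://example.com" equals "https://example.com/".
    if (target === new URL(page).href) {
      issues.push({ page, issue: "self-referential" });
    }
  }

  // Every page declares the exact same canonical (often the homepage).
  if (entries.length > 1 && validCount === entries.length && distinctTargets.size === 1) {
    issues.push({ page: "*", issue: "all-the-same" });
  }
  return issues;
}

console.log(findCanonicalIssues({
  "https://example.com/": "https://example.com",
  "https://example.com/a": "https://example.com",
  "https://example.com/b": "https://example.com",
}));
```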

How to hide a field in a search interface?

Background

When you generate an interface via the console for a Site Search collection, we return the title, description, URL, and (optionally) image in the search results. In some instances, you might want to hide the title, description, or URL.

Limitation

Our default interface uses the URL field for click-tracking, so it must be returned in the response; otherwise, click-tracking won’t function. If you try to remove the URL field, the following error is returned:


tracking field 'url' missing from result.

Instructions

To hide the ‘title’ or ‘description’ field from the search interface:

  1. Generate an interface from the Integrate section in the console.
  2. After choosing the relevant options and generating the interface, click “View code”.
  3. Add the “fields” parameter to the values object. The example below returns and renders only ‘title’ and ‘url’:

values: {"q.override": true, "resultsPerPage": "10", "q": getUrlParam("q"), "fields": "title,url"}
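
Because the ‘url’ field must always be returned for click-tracking, one way to avoid the tracking error is to build the “fields” value with a small helper that never drops it. The sketch below is an assumption for illustration: `buildFields` is a hypothetical function, not part of the generated interface code, and the "q" entry is omitted because `getUrlParam` is only available inside the generated interface.

```javascript
// Hypothetical helper: builds the comma-separated "fields" value
// and always keeps "url", which click-tracking requires.
function buildFields(visibleFields) {
  // Trim whitespace and drop empty entries.
  const fields = visibleFields
    .map((field) => field.trim())
    .filter((field) => field !== "");

  if (!fields.includes("url")) {
    fields.push("url");
  }
  return fields.join(",");
}

// Hide "description" by requesting only "title"; "url" is appended automatically.
const values = {
  "q.override": true,
  "resultsPerPage": "10",
  "fields": buildFields(["title"]),
};

console.log(values.fields);
```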