How can I fix PDFs and DOCs that fail to index or have the wrong title?

If some PDF or DOC files are missing from your collection or show the wrong title, try the following steps.

  1. First, check how the crawler sees your document by entering the document's URL on the debug page.
  2. If the debug page shows that the document can be indexed correctly, go to the Domains section of the console and use “Diagnose” to see the document's current crawl status. A status of no-index or redirect means that rules in the collection, or a no-index tag in the document, are preventing us from crawling it.
  3. If the debug page shows an error saying that it can't download the document, the file is likely corrupt. Some programs may still be able to open a corrupt file, but not all of them can. We recommend re-saving or exporting the file with a different program or program version.
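If you can download the file yourself, a quick local check can help tell a corrupt or mislabeled file apart from a crawl-rule problem. The sketch below is a hypothetical helper, not part of the console tooling. It looks at the HTTP status, at the `X-Robots-Tag` response header (a common way to mark non-HTML files as no-index; whether that is what “Diagnose” reports is an assumption here), and at the file's leading signature bytes: PDF files start with `%PDF-`, legacy .doc files with the OLE2 signature `D0 CF 11 E0 A1 B1 1A E1`, and .docx files are ZIP archives starting with `PK\x03\x04`.

```javascript
// Minimal sketch (hypothetical helper, not part of the console tooling):
// classify a downloaded document from its HTTP status, headers, and first bytes.

// Known file signatures ("magic bytes").
const SIGNATURES = {
  pdf: [0x25, 0x50, 0x44, 0x46, 0x2d], // "%PDF-"
  doc: [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1], // OLE2 compound file (.doc)
  docx: [0x50, 0x4b, 0x03, 0x04], // ZIP container (.docx and other Office Open XML files)
};

// True if `bytes` begins with every byte of `signature`.
function startsWith(bytes, signature) {
  return signature.every((b, i) => bytes[i] === b);
}

// Returns "pdf", "doc", "docx", or "unknown".
function detectFileType(bytes) {
  for (const [type, sig] of Object.entries(SIGNATURES)) {
    if (bytes.length >= sig.length && startsWith(bytes, sig)) return type;
  }
  return "unknown";
}

// status: HTTP status code; headers: plain object with lower-case keys;
// bytes: Uint8Array (or Buffer) holding at least the first bytes of the body.
function diagnoseDocument({ status, headers = {}, bytes = new Uint8Array() }) {
  if (status >= 300 && status < 400) return "redirect";
  if (status !== 200) return `download failed (HTTP ${status})`;
  // Assumption: a no-index rule may be sent as an X-Robots-Tag header.
  const robots = (headers["x-robots-tag"] || "").toLowerCase();
  if (robots.includes("noindex")) return "no-index";
  const type = detectFileType(bytes);
  return type === "unknown" ? "possibly corrupt or not a PDF/DOC" : `ok (${type})`;
}

// Usage with an in-memory example body.
const sample = new TextEncoder().encode("%PDF-1.7\n");
console.log(diagnoseDocument({ status: 200, bytes: sample })); // prints "ok (pdf)"
```

In practice you would fill `status`, `headers`, and `bytes` from your own HTTP client. A signature match only shows the file starts like a PDF or DOC; a file can still be damaged further in, so re-saving it as described in step 3 remains the fix.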

For documents with the wrong title: we take the title from the document's metadata, and if no title is present, we use the filename instead. To update the title:

  1. Update either the document's title metadata or its filename, then upload the file to your CMS or website.
  2. Once the file is uploaded, we will re-index it on the next crawl cycle. To apply the change immediately, re-index the document's URL with the Diagnose tool in the Domains section of the console.
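The title rule above can be modeled as a small fallback function. This is a sketch under stated assumptions, not our actual implementation: it assumes the "filename" is the last path segment of the document's URL, percent-decoded, with its extension kept, and that a blank or whitespace-only metadata title counts as missing. The function name `resolveTitle` is hypothetical.

```javascript
// Minimal sketch of the described rule: metadata title first, filename second.
// Assumptions (not confirmed by the source): the filename is the last URL path
// segment, percent-decoded, extension kept; blank titles count as missing.
function resolveTitle(metadataTitle, documentUrl) {
  const title = (metadataTitle || "").trim();
  if (title !== "") return title;

  // e.g. "/files/Annual%20Report.pdf" -> "Annual%20Report.pdf"
  const path = new URL(documentUrl).pathname;
  const lastSegment = path.split("/").filter(Boolean).pop() || "";
  try {
    return decodeURIComponent(lastSegment); // "Annual Report.pdf"
  } catch {
    return lastSegment; // malformed percent-encoding: keep the raw segment
  }
}

console.log(resolveTitle("", "https://example.com/files/Annual%20Report.pdf")); // prints "Annual Report.pdf"
console.log(resolveTitle("Q3 Results", "https://example.com/files/q3.pdf")); // prints "Q3 Results"
```

One consequence of this order: if a document already carries a wrong title in its metadata, renaming the file alone will not change the displayed title; the metadata itself has to be corrected.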

How do I hide a field in a search interface?

Background

When you generate an interface in the console for a Site Search collection, the search results include the title, description, URL, and (optionally) an image. In some cases, you might want to hide the title, description, or URL.

Limitation

Our default interface uses the URL field for click tracking, so that field must be returned in the response; otherwise, click tracking won't work. If you try to remove the URL field, the interface returns this error:


tracking field 'url' missing from result.

Instructions

To hide the ‘title’ or ‘description’ field in the search interface:

  1. Generate an interface from the Integrate section in the console.
  2. After choosing the relevant options and generating the interface, click “View code”.
  3. Add the “fields” parameter to the values object. The example below returns and renders only ‘title’ and ‘url’:

values: {"q.override": true, "resultsPerPage": "10", "q": getUrlParam("q"), "fields": "title,url"}
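Because the default interface needs ‘url’ for click tracking, it can help to build the “fields” value so that ‘url’ is always kept. The sketch below is illustrative only: `buildFields` is a hypothetical helper, and since the generated code's `getUrlParam` is not shown here, the sketch defines a hypothetical stand-in based on the standard `URLSearchParams` API (with an optional query-string argument so it also runs outside a browser).

```javascript
// Hypothetical stand-in for the generated code's getUrlParam helper:
// reads a query parameter from a query string ("" if absent).
function getUrlParam(name, search = typeof window !== "undefined" ? window.location.search : "") {
  return new URLSearchParams(search).get(name) || "";
}

// Hypothetical helper: build the comma-separated "fields" value,
// always keeping "url" because the default interface uses it for click tracking.
function buildFields(fields) {
  const unique = new Set(fields.map((f) => f.trim()).filter(Boolean));
  unique.add("url"); // no-op if "url" is already listed
  return [...unique].join(",");
}

// Values object that returns and renders only title and url.
const values = {
  "q.override": true,
  "resultsPerPage": "10",
  "q": getUrlParam("q", "?q=annual+report"), // in the browser: getUrlParam("q")
  "fields": buildFields(["title"]), // "title,url"
};

console.log(values.fields); // prints "title,url"
```

To hide the title instead, you would pass `["description"]`, which yields `"description,url"`.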