Pipelines

Understanding the Pipeline Templates

This section explains the pipeline templates and the individual steps that are generated when you create a new collection. These pipelines provide a good search experience out of the box and a solid starting point for further optimizing the search results and tailoring them to your organization's needs.

After creating a collection, navigate to the pipeline editor. The query pipeline that was created has about 100 lines of configuration code. Let's break it down.

The steps in this pipeline fall broadly into five categories.

  1. Result settings
  2. Language settings
  3. Index scoring
  4. Feature scoring
  5. Data training

Result settings

These steps configure how the results are returned (i.e. which page of results, which specific fields are returned for each result, and requesting other data that can be aggregated from the result set). All come with reasonable defaults, so little extra work is needed in most cases.

Filters, fields and pagination

- id: set-filter
- id: set-fields
- id: pagination

These steps let you pass variables with the search request to set filters, choose which fields are returned for each result, and customize pagination, such as the number of results returned on each page.

Count aggregates

- id: count-aggregate
  params:
    fields:
      bind: count

Use aggregates to implement facets for categories, pricing or similar fields. The above example allows for the query to pass a count variable, defining the fields for which to count distinct values in the result set.

For example, to get a count of the unique values in the color field across the results matching the query, include a count variable with the query:

{
  "q": "your query",
  // ...
  "count": "color"
}

The resulting JSON response will contain a list of aggregates with the count for each value.

"aggregates": {
  "color": {
    "count": {
      "Aquamarine": 6,
      "Blue": 3,
      "Crimson": 3,
      "Yellow": 6
    }
  }
}
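As a rough illustration of how a client might turn these counts into facet options, here is a minimal Python sketch. It assumes the aggregates have the shape shown above (a field key that holds a "count" map); the function name facet_counts and the sort order are hypothetical choices, not part of Sajari's API.

```python
from typing import Dict, List, Tuple


def facet_counts(aggregates: Dict[str, dict]) -> Dict[str, List[Tuple[str, int]]]:
    """Return, per field, the (value, count) pairs ordered for display."""
    facets = {}
    for field, data in aggregates.items():
        counts = data.get("count", {})
        # Highest counts first; ties are broken alphabetically for a stable order.
        facets[field] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return facets


# Sample data mirroring the response above (the field key is an assumption).
sample_aggregates = {
    "color": {"count": {"Aquamarine": 6, "Blue": 3, "Crimson": 3, "Yellow": 6}}
}
facets = facet_counts(sample_aggregates)
```

Each entry in facets can then be rendered as a checkbox or link with its count next to it.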

Max and min aggregates

- id: min-aggregate
  params:
    fields:
      bind: min
- id: max-aggregate
  params:
    fields:
      bind: max

Max and min aggregates calculate the available range for a facet. Pricing and ratings are common use cases. Instead of showing arbitrary price ranges, you can tailor the available range to the results of the query.

For example, to calculate the minimum and maximum values of a price field, pass the following variables with the query:

{
  "q": "your query",
  // ...
  "min": "price",
  "max": "price"
}

The resulting JSON response contains two values: the price of the cheapest and the price of the most expensive product matching the query.

"aggregates": {
  "price": {
    "min": 89,
    "max": 299
  }
}
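A client could use these two values to set the bounds of a price slider. The following Python sketch assumes the response shape shown above; facet_range is a hypothetical helper, not a Sajari function.

```python
from typing import Optional, Tuple


def facet_range(aggregates: dict, field: str) -> Optional[Tuple[float, float]]:
    """Return (min, max) for a field, or None if either bound is missing."""
    data = aggregates.get(field, {})
    if "min" not in data or "max" not in data:
        return None
    return data["min"], data["max"]


# Sample data mirroring the response above.
sample_aggregates = {"price": {"min": 89, "max": 299}}
price_range = facet_range(sample_aggregates, "price")
```

Returning None when a bound is missing lets the interface hide the slider instead of showing an empty range.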

Sorting

- id: sort
  params:
    fields:
      bind: sort

Use the sort step to sort results by a particular field. For example, to sort by price, pass in a sort variable with the query.

{
  "q": "your query",
  // ...
  "sort": "price"
}

The sort order can be reversed by adding a "-" in front of the field name.

{
  "q": "your query",
  // ...
  "sort": "-price"
}
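When the sort order is driven by a user interface control, the variables can be assembled programmatically. This Python sketch only builds the request variables shown above; build_variables and its reverse parameter are hypothetical names used for illustration.

```python
from typing import Dict, Optional


def build_variables(query: str, sort_field: Optional[str] = None,
                    reverse: bool = False) -> Dict[str, str]:
    """Build the query variables, prefixing the sort field with "-" to reverse it."""
    variables = {"q": query}
    if sort_field:
        variables["sort"] = f"-{sort_field}" if reverse else sort_field
    return variables


default_order = build_variables("your query", sort_field="price")
reversed_order = build_variables("your query", sort_field="price", reverse=True)
```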

Language settings

Sajari delivers great matching out of the box. With a few tweaks to language-specific settings, you can tailor the results even more closely to your business.

Index Spelling

- id: index-spelling
  params:
    model:
      const: default
    phraseLabelWeights:
      const: query:1.0,title:1.0
    text:
      bind: q

The index-spelling step performs spelling correction on the query. The phraseLabelWeights constant specifies that previously entered queries (added through live training in the train-autocomplete step) and the title field carry equal weight for spelling suggestions.
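The phraseLabelWeights value is a comma-separated list of label:weight pairs. To make that format concrete, here is a small Python sketch that parses such a string into a dictionary. The parser is purely illustrative and assumes every pair is well formed; it is not how Sajari reads the value internally.

```python
from typing import Dict


def parse_label_weights(value: str) -> Dict[str, float]:
    """Parse a string such as "query:1.0,title:1.0" into {label: weight}."""
    weights = {}
    for pair in value.split(","):
        # partition splits on the first ":" only, leaving the weight text intact.
        label, _, weight = pair.strip().partition(":")
        weights[label] = float(weight)
    return weights


weights = parse_label_weights("query:1.0,title:1.0")
```

Raising the weight of one label relative to the other would make that source count for more in spelling suggestions.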

Synonyms

- id: synonym
  params:
    model:
      const: collection_name
    text:
      bind: q

Synonyms are words or phrases that share the same meaning in the same language. For example, car is a synonym of auto. This step augments the query with the synonyms defined in the collection, ensuring that a search for car also matches documents that contain the word auto.
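Conceptually, augmenting a query with synonyms means that each query term can be matched by any word in its synonym group. The Python sketch below illustrates that idea only; the synonym table and the expand_query function are hypothetical, and Sajari's actual expansion logic is not described here.

```python
from typing import Dict, List


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[List[str]]:
    """Return one group of acceptable words per query term."""
    groups = []
    for word in query.lower().split():
        # The original word always comes first, followed by its synonyms.
        alternatives = [s for s in synonyms.get(word, []) if s != word]
        groups.append([word] + alternatives)
    return groups


# Hypothetical synonym table with the pair used in the text above.
synonym_table = {"car": ["auto"], "auto": ["car"]}
expanded = expand_query("red car", synonym_table)
```

A document would then match if it contains at least one word from each group.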

Index scoring

Steps in this category define what fields should be searched and what weight each of these fields should receive.

Reinforcement learning

- id: index-text-score-instance-boost
  params:
    minCount:
      const: "5"
    threshold:
      const: "0.5"

This is one of the most powerful steps in Sajari's pipelines. It adds an ML score boost to results with positive interactions and decreases the score of results with negative interactions. For the boost to come into effect, a minimum of 5 interactions is required.

Relying solely on textual matching leads to subpar results when there is ambiguity in language. By learning which results had positive interactions (clicked for a website or purchased for products), Sajari automatically improves the relevancy of results over time.

Index Text boost

- id: index-text-index-boost
  params:
    field:
      const: description
    score:
      const: "0.5000"
    text:
      bind: q

A pipeline typically defines multiple index-text-index-boost steps, one for each searchable field. The above step assigns the description field a weight of 0.5. The weight should reflect the relative importance of the field, which then contributes accordingly to the overall result score.
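One simple way to picture relative field weights is a weighted sum of per-field match scores. The Python sketch below shows that mental model only; the exact formula Sajari uses to combine field scores is not stated here, and the title weight of 1.0 and the per-field scores are made-up values.

```python
from typing import Dict


def combined_score(field_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-field match scores; unweighted fields contribute nothing."""
    return sum(weights.get(field, 0.0) * score for field, score in field_scores.items())


# Hypothetical weights: description uses 0.5 as in the step above.
field_weights = {"title": 1.0, "description": 0.5}

# Hypothetical match scores for one document.
document_scores = {"title": 0.8, "description": 0.4}
total = combined_score(document_scores, field_weights)
```

In this model, a match in the description counts for half as much as an equally strong match in the title.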

Feature scoring

Feature scoring can be used to fine-tune the textual matching results of the index scoring. By taking business data like sales or margins into account, results can be promoted if they are more popular or have a bigger impact on the business. However, it's important to find the right balance between optimizing for business outcomes and the accuracy of the textual matching. Since every business is different, it often takes some experimentation to get this right.

Filter boost

- id: filter-boost
  params:
    filter:
      const: title ~ q
    score:
      const: "0.05"

The above example filters on the title, but filter boosts work equally well on fields that are not searchable, such as margins or sales. In this example, results that contain the query text in the title receive an additional boost of 0.05, the score configured in the step. To boost only exact matches rather than results that merely contain the query text, replace the "~" with "=".
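The difference between the "~" (contains) and "=" (exact match) operators can be sketched in Python as follows. This is an illustration under stated assumptions: the comparison is treated as case-insensitive and the boost is added to the existing score, neither of which is confirmed here. The helper names are hypothetical.

```python
def matches(value: str, query: str, operator: str) -> bool:
    """Evaluate a "~" (contains) or "=" (exact) comparison, ignoring case."""
    value, query = value.lower(), query.lower()
    if operator == "~":
        return query in value
    if operator == "=":
        return value == query
    raise ValueError(f"unsupported operator: {operator}")


def boosted_score(score: float, matched: bool, boost: float = 0.05) -> float:
    """Add the boost to the score when the filter matched (assumed additive)."""
    return score + boost if matched else score


title = "Blue Running Shoes"
contains_match = matches(title, "running shoes", "~")
exact_match = matches(title, "running shoes", "=")
```

Here the title contains the query but is not identical to it, so only the "~" variant of the filter would apply the boost.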

Data training

Post steps in the pipeline are executed after the results have been returned from the index. Typically, post steps change the order of results, insert additional results (such as promotions), and train the data models in Sajari.

Promotions

- id: promotions
  params:
    text:
      bind: q

Promotions are defined in the console. This step adds results for specific queries that match a promotion, even if those results don't appear in the regular result list.

Train autocomplete

- id: train-autocomplete
  params:
    label:
      const: query
    model:
      const: default
    text:
      bind: q

The last step in the pipeline takes the query text the user entered and trains the autocomplete model if the query successfully delivered results. This improves autocomplete suggestions over time based on your users' search behavior.
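The idea of training only on queries that returned results can be sketched with a toy model in Python. The AutocompleteModel class below is hypothetical and far simpler than a real autocomplete model; it simply counts successful queries and suggests the most frequent ones for a prefix.

```python
from collections import Counter
from typing import List


class AutocompleteModel:
    """Toy model that learns from queries which returned at least one result."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def train(self, query: str, result_count: int) -> None:
        # Queries without results are ignored so they are never suggested.
        if result_count > 0:
            self.counts[query.strip().lower()] += 1

    def suggest(self, prefix: str, limit: int = 5) -> List[str]:
        prefix = prefix.lower()
        found = [(q, n) for q, n in self.counts.items() if q.startswith(prefix)]
        # Most frequent first; ties are broken alphabetically.
        found.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in found[:limit]]


model = AutocompleteModel()
model.train("red shoes", result_count=12)
model.train("red shoes", result_count=9)
model.train("red shirt", result_count=4)
model.train("red shoez", result_count=0)  # no results, so not learned
suggestions = model.suggest("red")
```

Skipping queries with no results keeps misspellings and dead-end searches out of the suggestions.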

Summary

The Realtime Relevance Editor in Sajari makes it easy to experiment with different steps and understand the impact on your search results. Use the above as a starting point and explore more advanced steps as you get familiar with Sajari's capabilities.