Monday, October 5, 2020

Introducing Sajari Pipelines for Intelligently Engineered Search

Jens Schumacher

Engineering great search experiences is incredibly difficult, even for large companies with big budgets. 

There are generally two choices. 

  1. Choose a traditional search index technology. Such technologies are powerful and do the basics well, but require specialized knowledge and significant implementation effort for most advanced capabilities. 
  2. Choose a simplified turn-key solution. Such solutions are easier to understand and implement for specific use cases, but have limited capabilities. 

We didn’t like that trade-off. That’s why we developed a new search engine technology from the ground up. Sajari combines the speed and text matching capabilities of a traditional search engine with the power of a database. But the key to simplicity is how you configure your search solution: with pipelines. 

Pipelines for search customization

Pipelines allow you to break down complex problems into smaller pieces that can be mixed, matched, and combined to create incredibly powerful custom search solutions.

Pipelines define a series of steps that are executed sequentially to produce an outcome. Sajari features two types of pipelines:

  • Record pipelines are executed at index time, and update and augment information.
  • Query pipelines are executed at query time, and construct highly complex and effective queries from a series of steps.

There are several main advantages to pipelines: 

  1. Each pipeline step does one thing, so steps are easy to understand.
  2. The state is passed from step to step, so it’s easy to build highly complex workflows.
  3. Each step can be turned on/off using conditional expressions.
  4. Complex engine query requests are compiled for you.
  5. Pipelines can be versioned, A/B tested, and changed in real-time without reindexing your data.
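To make the first three advantages concrete, here is a minimal Python sketch of the idea. It is not Sajari's implementation: the `Step` class, the `run_pipeline` function, and the two example steps are hypothetical names chosen for illustration. The sketch only assumes that steps run in order, share state, and can be switched off by a condition.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = dict  # hypothetical: the values passed from step to step


@dataclass
class Step:
    """One unit of work; all names here are illustrative assumptions."""
    id: str
    run: Callable[[State], State]
    condition: Optional[Callable[[State], bool]] = None  # None means "always run"


def run_pipeline(steps: list[Step], state: State) -> State:
    """Execute steps in order, skipping any whose condition is not met."""
    for step in steps:
        if step.condition is None or step.condition(state):
            state = step.run(dict(state))  # each step receives a copy of the state
    return state


steps = [
    Step("lowercase-query", lambda s: {**s, "q": s["q"].lower()}),
    Step(
        "tag-sale-shoppers",
        lambda s: {**s, "discount_importance": "high"},
        condition=lambda s: s.get("campaign") == "black_friday",
    ),
]

result = run_pipeline(steps, {"q": "Gift Card", "campaign": "black_friday"})
skipped = run_pipeline(steps, {"q": "Gift Card"})
```

In this sketch, `result` carries the extra `discount_importance` value because the campaign matched, while `skipped` only has its query lowercased because the second step's condition was not satisfied.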

Steps

Steps are a unit of work in the pipeline flow. They can do many things, including:

  1. Query understanding (including query rewrites, spelling, NLP, and more).
  2. Filtering results.
  3. Changing relevance logic.
  4. Analytics.
  5. Training language models (including spelling and autocomplete).

How do steps work?

Steps are made up of several components:

  1. Constants are used to configure how steps work. They are fixed and cannot be changed for each execution of the pipeline. 
  2. Params include input and output params. By default, input params are bound to variables set in the query request, and output params are returned as variables in the query response. They can also be configured to be constant (i.e., set when the pipeline is created and then fixed for each execution of the pipeline, ignoring any variable values).
  3. Conditions are boolean expressions evaluated against the pipeline param values. If the condition is satisfied, the step executes; otherwise it is bypassed.
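The difference between a bound param and a constant param can be sketched in a few lines of Python. This is an illustration under assumptions, not Sajari's code: `resolve_params` is a hypothetical helper, and the dictionary shape simply mirrors the `bind` and `const` keys used in this post's step configurations.

```python
def resolve_params(step_params: dict, request_vars: dict) -> dict:
    """Resolve each step param from a bound request variable or a constant.

    Assumed shape (mirrors the step configs in this post):
    {"text": [{"bind": "q"}], "model": [{"const": "default"}]}
    """
    resolved = {}
    for name, sources in step_params.items():
        source = sources[0]  # this sketch only looks at the first entry
        if "const" in source:
            resolved[name] = source["const"]  # fixed value, request is ignored
        elif "bind" in source:
            resolved[name] = request_vars.get(source["bind"])  # read per request
    return resolved


spelling_params = {"text": [{"bind": "q"}], "model": [{"const": "default"}]}
resolved = resolve_params(spelling_params, {"q": "runing shoes", "model": "ignored"})
```

Here `text` takes its value from the request variable `q`, while `model` stays at its constant value even though the request tried to supply one.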

Let’s take a look at a few examples. 

Add spell-checking on the query (q) using a default language model: 

- id: index-spelling
  params:
    text:
    - bind: q
    model:
    - const: default    

Boost popular pages on your website:

- id: popularity-boost
  params:
    field:
    - const: popularityScore
    score:
    - const: "0.05"

Boost pages with a published_time in the last two weeks:

- id: recent-boost
  params:
    cut-off:
    - const: 336h0m0s
    field:
    - const: published_time
    score:
    - const: "0.05"
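The cut-off is written as a duration in hours, minutes, and seconds, so it is worth checking the arithmetic when choosing a window. A quick Python check using only the standard library confirms that 336 hours is exactly two weeks:

```python
from datetime import timedelta

cut_off = timedelta(hours=336)  # the "336h0m0s" value from the step above
is_two_weeks = cut_off == timedelta(weeks=2)
cut_off_days = cut_off.days
```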

OK, these have all been pretty straightforward, aside from the magic that happens behind the scenes with the spell-checking, but more on that in another post. 

Let’s take a look at a more advanced example that will boost discounted items for customers in a particular segment or coming from a particular campaign:

- id: set-param-value
  description: discount is known to be important for this person
  condition: campaign = "black_friday" OR segment = "discount_group"
  params:
    param:
    - bind: discount_importance
    value:
    - const: "high"
- id: percentage-boost
  description: boost products with increased discount
  condition: discount_importance = "high"
  params:
    field:
    - const: discount_percent
    score:
    - const: "0.05"

Breaking down the two steps above, the set-param-value step:

  1. Evaluates the query input params to see if the person searching arrived via a “black_friday” campaign or if they are in a known segment “discount_group”. 
  2. Sets a new param called discount_importance with the value high. This param can be accessed in subsequent steps and will be available in the query output. 

The percentage-boost step:

  1. Activates only if the discount_importance param is set to high.
  2. Applies a boost with a 5% weight, so products with discounts rise in the ranking in linear proportion to their level of discounting.

Note: Both these inputs (“black_friday” and “discount_group”) are created externally. You can leverage any business data and use them as inputs. 

This example illustrates how powerful the concept is. Highly complex personalization using dynamic filtering and ranking can be configured and tested in minutes. 
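The logic of the two steps can also be traced outside of Sajari with a short Python sketch. It is an illustration under assumptions, not the engine's scoring code: the function names, the product fields, and in particular the blending formula (95% base relevance plus 5% of the normalized discount) are hypothetical stand-ins for whatever the engine actually does.

```python
def set_param_value(params: dict) -> dict:
    """Mimic the first step: flag discount-sensitive searchers."""
    if params.get("campaign") == "black_friday" or params.get("segment") == "discount_group":
        return {**params, "discount_importance": "high"}
    return params


def percentage_boost(products: list[dict], params: dict, weight: float = 0.05) -> list[dict]:
    """Mimic the second step: blend discount into the score when the flag is set.

    Assumed formula: (1 - weight) * base score + weight * discount_percent / 100.
    """
    if params.get("discount_importance") != "high":
        return sorted(products, key=lambda p: p["score"], reverse=True)
    boosted = [
        {**p, "score": (1 - weight) * p["score"] + weight * p["discount_percent"] / 100}
        for p in products
    ]
    return sorted(boosted, key=lambda p: p["score"], reverse=True)


products = [
    {"name": "full-price jacket", "score": 0.80, "discount_percent": 0},
    {"name": "discounted jacket", "score": 0.79, "discount_percent": 40},
]

sale_shopper = set_param_value({"q": "jacket", "campaign": "black_friday"})
regular_shopper = set_param_value({"q": "jacket"})

sale_order = [p["name"] for p in percentage_boost(products, sale_shopper)]
regular_order = [p["name"] for p in percentage_boost(products, regular_shopper)]
```

With these made-up scores, the discounted jacket moves to the top for the Black Friday shopper, while the regular shopper still sees the full-price jacket first.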

Calling external systems

Another powerful feature of pipelines is the ability to call out to external systems. This allows you to augment records at indexing time by calling out to cloud functions or any external service.

For example, you can call out to an external vision API to extract metadata from an image. The metadata can then be stored in the record, allowing you to search and filter on that data. 

- id: http-fetch-json
  consts:
    url:
    - value: https://<my-project>.cloudfunctions.net/vision-api
    timeout:
    - value: 5000ms
    payloadFields:
    - value: image
    payloadParams:
    - value: visionIn,visionOut
    authToken:
    - value: <my-secret>

In the example above, which is a record pipeline, the step calls out to the Google Vision API for each inbound record via a simple intermediate cloud function.
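The intermediate function itself is not shown in this post. As a rough idea of what its core might look like, here is a Python sketch under stated assumptions: the request shape (`{"fields": {"image": ...}}`), the response field names, and the `detect_image_metadata` stub are all hypothetical, and the stub returns canned data instead of calling a real vision service.

```python
import json


def detect_image_metadata(image_url: str) -> dict:
    """Hypothetical stand-in for a real vision API call; returns canned data."""
    return {"dominant_color": "green", "labels": ["gift card", "envelope"]}


def handle_request(body: str) -> str:
    """Core of a hypothetical intermediate function: JSON in, JSON out.

    Assumed request shape: {"fields": {"image": "<url>"}}; the real payload
    sent by the http-fetch-json step may differ.
    """
    payload = json.loads(body)
    image_url = payload.get("fields", {}).get("image")
    if not image_url:
        return json.dumps({"error": "missing image field"})
    metadata = detect_image_metadata(image_url)
    # Return flat fields that could be stored on the record for search and filtering.
    return json.dumps({
        "color": metadata["dominant_color"],
        "image_labels": metadata["labels"],
    })


response = json.loads(handle_request('{"fields": {"image": "https://example.com/card.png"}}'))
```

Keeping the function to a plain JSON-in, JSON-out shape makes it easy to test locally before deploying it behind a URL like the one in the step configuration.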

This simple step allows images to be searched by color and descriptions generated by AI models. See this in action below, where the “gift card” query is being filtered for green items, which works perfectly even though none of these products mentions the word “green” anywhere!

Example of sorting by color where color metadata is generated automatically

Visual search can be added in just a few lines of code. Calling external systems is also useful if you have inventory information or other useful business data residing in a different system. This allows you to join data from multiple systems on each record update.

Version control

Pipelines are immutable and cannot be edited once created. This makes it easy to version pipelines and compare the changes. It also makes analytics for feeding machine learning algorithms consistent.

You can create as many pipelines as you want and A/B test different configurations in real-time without duplicating your index data. 
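One common way to split traffic between two pipeline versions is to hash a stable user identifier, so the same person always gets the same version. The Python sketch below shows that general technique; it is not a description of how Sajari assigns traffic, and the version names and `assign_pipeline` helper are hypothetical.

```python
import hashlib

# Hypothetical pipeline versions; each is treated as immutable once created.
PIPELINE_VERSIONS = ("website@1", "website@2")


def assign_pipeline(user_id: str, versions: tuple = PIPELINE_VERSIONS) -> str:
    """Deterministically map a user to one pipeline version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(versions)
    return versions[bucket]


first = assign_pipeline("user-123")
second = assign_pipeline("user-123")
```

Because the pipelines never change after creation, results logged against a version name always refer to the same configuration, which keeps the comparison between variants clean.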

Getting started with pipelines

Based on the format of your data and the resulting schema, Sajari creates a basic pipeline setup for you. From there you can edit the pipeline directly in the Sajari console via the pipeline editor or use your favorite code editor and version control software.

Editing a Sajari search pipeline

We’ve built documentation right into the editor. When you select a step id, Sajari displays the corresponding docs and code examples. It also displays a real-time search preview as you change the pipeline configuration, immediately showing how the changes affect the results. 

Sign up for a Sajari account and take pipelines for a spin!
