Engineering great search experiences is incredibly difficult, even for large companies with big budgets.
There are generally two choices.
- Choose a traditional search index technology. These are powerful and do the basics well, but they require specialized knowledge and significant implementation effort for most advanced capabilities.
- Choose a simplified turn-key solution. These are easier to understand and implement for specific use cases, but they have limited capabilities.
We didn’t like that trade-off. That’s why we developed a new search engine technology from the ground up. Sajari combines the speed and text matching capabilities of a traditional search engine with the power of a database. But the key to simplicity is how you configure your search solution — with pipelines.
Pipelines for search customization
Pipelines allow you to break down complex problems into smaller pieces that can be mixed, matched, and combined to create incredibly powerful custom search solutions.
Pipelines define a series of steps that are executed sequentially to produce an outcome. Sajari features two types of pipelines:
- Record pipelines are executed at index time, and update and augment information.
- Query pipelines are executed at query time, and construct highly complex and effective queries from a series of steps.
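To make the execution model concrete, here is a minimal, hypothetical Python sketch (not Sajari's actual implementation) of steps running sequentially with state passed from one step to the next:

```python
# Hypothetical sketch of sequential pipeline execution. The step names
# and state keys are invented for illustration only.

def lowercase_query(state):
    # Each step does one small thing to the shared state.
    state["q"] = state["q"].lower()
    return state

def strip_whitespace(state):
    state["q"] = state["q"].strip()
    return state

def run_pipeline(steps, state):
    # Steps execute in order; each receives the state left by the previous one.
    for step in steps:
        state = step(state)
    return state

result = run_pipeline([lowercase_query, strip_whitespace], {"q": "  Gift Card "})
print(result["q"])  # gift card
```

Because each step only reads and writes the shared state, small single-purpose steps compose into arbitrarily complex workflows.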
Pipelines have several key advantages:
- Each pipeline step does one thing, so they are easy to understand.
- The state is passed from step to step, so it’s easy to build highly complex workflows.
- Each step can be turned on/off using conditional expressions.
- Complex engine query requests are compiled for you.
- Pipelines can be versioned, A/B tested, and changed in real-time without reindexing your data.
Steps are a unit of work in the pipeline flow. They can do many things, including:
- Query understanding (including query rewrites, spelling, NLP, and more).
- Filtering results.
- Changing relevance logic.
- Training language models (including spelling and autocomplete).
How do steps work?
Steps are made up of several components:
- Constants are used to configure how steps work. They are fixed and cannot be changed for each execution of the pipeline.
- Params include input and output params. By default, input params are bound to variables set in the query request, and output params are returned as variables in the query response. They can also be configured to be constant (i.e., set when the pipeline is created and then fixed for each execution of the pipeline, ignoring any variable values).
- Conditions are boolean expressions evaluated against the pipeline param values. If the condition is satisfied, the step executes; otherwise it is bypassed.
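To illustrate how a condition gates a step, here is a small hypothetical Python sketch (the condition format and param names are invented, not Sajari's syntax):

```python
# Hypothetical illustration of a condition-gated step. A step runs only
# when its condition evaluates to True; otherwise it is bypassed and the
# params pass through unchanged.

def run_step(step, params):
    if step["condition"](params):
        return step["action"](params)
    return params

boost_step = {
    "condition": lambda p: p.get("segment") == "discount_group",
    "action": lambda p: {**p, "discount_importance": "high"},
}

print(run_step(boost_step, {"segment": "discount_group"}))
print(run_step(boost_step, {"segment": "other"}))
```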
Let’s take a look at a few examples.
Add spell-checking on the query (q) using a default language model:
```yaml
- id: index-spelling
  params:
    text:
      - bind: q
    model:
      - const: default
```
Boost popular pages on your website:
```yaml
- id: popularity-boost
  params:
    field:
      - const: popularityScore
    score:
      - const: "0.05"
```
Boost pages with a published_time in the last two weeks:
```yaml
- id: recent-boost
  params:
    cut-off:
      - const: 336h0m0s
    field:
      - const: published_time
    score:
      - const: "0.05"
```
OK, these examples have all been pretty straightforward, aside from the magic that happens behind the scenes with spell-checking (more on that in another post).
Let’s take a look at a more advanced example that will boost discounted items for customers in a particular segment or coming from a particular campaign.
```yaml
- id: set-param-value
  description: discount is known to be important for this person
  condition: campaign = "black_friday" OR segment = "discount_group"
  params:
    param:
      - bind: discount_importance
    value:
      - const: "high"

- id: percentage-boost
  description: boost products with increased discount
  condition: discount_importance = "high"
  params:
    field:
      - const: discount_percent
    score:
      - const: 0.05
```
Breaking down the two steps above, the set-param-value step:
- Evaluates the query input params to see if the person searching arrived via a “black_friday” campaign or if they are in a known segment “discount_group”.
- Sets a new param called discount_importance with the value high. This param can be accessed in subsequent steps and will be available in the query output.
The percentage-boost step:
- Activates only if the discount_importance param is set to high.
- When active, applies a 5% weight so that discounted products are boosted in ranking proportionally to their level of discounting.
Note: Both these inputs (“black_friday” and “discount_group”) are created externally. You can leverage any business data and use them as inputs.
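The combined logic of the two steps above can be sketched in Python. This is a hypothetical illustration, not how the engine executes pipelines, and the boost formula is an assumption made for the sketch:

```python
# Hypothetical sketch of the set-param-value and percentage-boost steps.
# The scoring formula below is an assumption; the real engine's math may differ.

def set_param_value(params):
    # Step 1: mark discount as important for the black_friday campaign
    # or the discount_group segment.
    if params.get("campaign") == "black_friday" or params.get("segment") == "discount_group":
        params["discount_importance"] = "high"
    return params

def percentage_boost(params, products, weight=0.05):
    # Step 2: only runs when the previous step set discount_importance.
    if params.get("discount_importance") != "high":
        return products
    # Linearly boost each product's score by its discount percentage.
    for product in products:
        product["score"] *= 1 + weight * product["discount_percent"] / 100.0
    return products

params = set_param_value({"campaign": "black_friday"})
items = [{"score": 1.0, "discount_percent": 50}, {"score": 1.0, "discount_percent": 0}]
print(percentage_boost(params, items))
```

For a shopper who did not arrive via the campaign and is not in the segment, step 1 never sets the param, so step 2 is bypassed and scores are untouched.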
This example illustrates how powerful the concept is. Highly complex personalization using dynamic filtering and ranking can be quickly configured and tested in minutes.
Calling external systems
Another powerful feature of pipelines is the ability to call out to external systems. This allows you to augment records at indexing time by calling out to cloud functions or any external service.
For example, you can call out to an external vision API to extract metadata from an image. The metadata can then be stored in the record, allowing you to search and filter on that data.
```yaml
- id: http-fetch-json
  consts:
    url:
      - value: https://<my-project>.cloudfunctions.net/vision-api
    timeout:
      - value: 5000ms
    payloadFields:
      - value: image
    payloadParams:
      - value: visionIn,visionOut
    authToken:
      - value: <my-secret>
```
This simple step allows images to be searched by color and descriptions generated by AI models. See this in action below, where the “gift card” query is being filtered for green items, which works perfectly even though none of these products mentions the word “green” anywhere!
Visual search can be added in just a few lines of code. Calling external systems is also useful if you have inventory information or other useful business data residing in a different system. This allows you to join data from multiple systems on each record update.
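Conceptually, an HTTP-fetch step posts selected record fields to an external service and merges the JSON response back into the record. Here is a hypothetical Python sketch of that pattern, with the external call stubbed out so the example is self-contained (field names and the response shape are invented):

```python
# Hypothetical sketch of calling an external service at index time and
# merging its JSON response into the record. The fetch is stubbed out;
# a real step would issue an HTTP POST with a timeout and an auth token.

def call_vision_api(payload):
    # Stand-in for an HTTP POST to a cloud function; returns fake metadata.
    return {"colors": ["green"], "description": "a green gift card"}

def http_fetch_json(record, payload_fields):
    # Send only the configured fields, then merge the response into the record.
    payload = {f: record[f] for f in payload_fields if f in record}
    response = call_vision_api(payload)
    record.update(response)
    return record

record = http_fetch_json({"image": "https://example.com/card.png"}, ["image"])
print(record["colors"])  # ['green']
```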
Pipelines are immutable and cannot be edited once created. This makes it easy to version pipelines and compare the changes. It also makes analytics for feeding machine learning algorithms consistent.
You can create as many pipelines as you want and A/B test different configurations in real-time without duplicating your index data.
Getting started with pipelines
Based on the format of your data and the resulting schema, Sajari creates a basic pipeline setup for you. From there you can edit the pipeline directly in the Sajari console via the pipeline editor or use your favorite code editor and version control software.
We’ve built documentation right into the editor. When you select a step id, Sajari displays the corresponding docs and code examples. It also displays a real-time search preview as you change the pipeline configuration, immediately showing how the changes affect the results.
Sign up for a Sajari account and take pipelines for a spin!