What is federated search?
Federated search is the ability to run a single search query across multiple sites, domains, or data sources, and to control how results from each source are ranked and displayed.
In a website context, you’re creating a real-time search application that cuts across multiple domains or subdomains. For enterprise search, it could include different information sources such as intranets, databases, and other datasets.
For our customers, it’s as easy as adding a new domain or subdomain to the list of sites to include in a central index, then creating a few additional rules for how search should appear. For example, if you have a “parent” website and several “sister” websites, you may want the parent website to display results from all sites equally, but on a sister site it could favor results from that sister site.
From a searcher’s perspective, it’s the same user interface: a single search box that helps them to pinpoint the information they need.
Federated search case study: NSW.gov.au
NSW.gov.au is the public face for the Australian state of New South Wales. Like other government agencies around the world, there’s a top-level government website plus dozens of other agency sites such as the department of treasury, ministry of health, department of industry, etc.
The team at NSW.gov.au found that visitors were often searching for information about driver’s licenses, liquor licenses, moving, and other topics that were covered on a sister agency site, ServiceNSW.
To accommodate visitors, NSW.gov.au used Sajari to blend datasets from ServiceNSW (including web pages, PDF, and DOCX content) and deliver results alongside the parent website’s content.
Web searches on NSW.gov.au include results from both sites, but relevance scoring and machine learning automatically adjust the ranking so that certain keyword searches favor the more relevant site. Visitors who type in “rego” (Australian shorthand for vehicle registration) on NSW.gov.au will get results from ServiceNSW.
Each site is crawled independently and the data is consolidated into a federated search index; no additional connectors were needed. In other situations, you may need to use APIs or other types of connectors to build a central index.
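To make the idea of a consolidated index more concrete, here is a minimal sketch that merges records from two independently crawled sites into one list and tags each record with its source. The record fields (`url`, `title`, `body`), the example domains, and the `build_central_index` helper are assumptions made for illustration; this is not Sajari's API.

```python
from urllib.parse import urlparse


def build_central_index(crawls):
    """Merge per-site crawl results into one list of records.

    `crawls` maps a site name to the list of page records crawled from it.
    Each record is copied and tagged with its site name and domain so that
    query-time rules can later filter or boost by source.
    """
    index = []
    for site, records in crawls.items():
        for record in records:
            merged = dict(record)  # copy so the crawl output is not mutated
            merged["site"] = site
            merged["domain"] = urlparse(record["url"]).netloc
            index.append(merged)
    return index


# Hypothetical crawl output for a parent site and a sister site.
crawls = {
    "parent": [
        {"url": "https://www.example.gov/about", "title": "About", "body": "About the government"},
    ],
    "sister": [
        {"url": "https://service.example.gov/licences", "title": "Licences", "body": "Renew a licence"},
    ],
}
central_index = build_central_index(crawls)
```

Keeping the source on every record is the design choice that matters here: it is what later allows per-site boosting, filtering, and reporting.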
Considerations for federated search
There are several considerations for determining exactly how to federate search data across sites. Here are just a few:
Organizing search results
Every site has its own goals and objectives. The content and audience can vary considerably from site to site, so it’s important to ensure that the right content is delivered for the right query.
You can configure rules to deliver result lists on one or more sites in many different ways. For example, you could organize by collection, by filters on the front end, or even by pipeline. And these are just a few ways to think about it. Let’s look at each of them briefly.
- Collection: Each site could have its own search synonyms or promotions. In that case, you may want to configure the system to search across collections differently on each site depending on where the query originates.
- Filters: You could have different tab filters on each site to narrow results to just that site or by topic. In other words, the filters could be explicitly by domain or, if the sites have very different content, the filters could be by content type.
- Pipelines: Index and query pipelines are a feature specific to Sajari. With index pipelines, for example, you can give content from websites A and B the same tag, so each record is tagged as it comes in, and set a different tag for site C. Then you could build rules for search faceting by tags. Or maybe you want to bias results toward the website someone is searching on. In that case, you would add steps at query time that boost results from that domain. You can also do filtering in the pipeline itself.
Most likely, your federated search setup will use some mixture of all of the above. There are many options, at both index time and query time, for delivering relevant information. Start by determining the goals and outcomes for visitors on each site, and then work out how to accomplish those goals with search.
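As an illustration of biasing results toward the site a query comes from, the sketch below multiplies the relevance score of records from the originating domain by a boost factor. The `rank_results` function, the `score` field, and the example domains are hypothetical; a real pipeline would express this as a query-time step rather than a Python function.

```python
def rank_results(results, origin_domain, boost=1.5):
    """Re-rank results, boosting records that belong to the origin domain.

    A boost of 1.0 treats all sites equally (the "parent site" behaviour);
    a boost above 1.0 favours the site the query came from.
    """
    def boosted(record):
        factor = boost if record["domain"] == origin_domain else 1.0
        return record["score"] * factor

    return sorted(results, key=boosted, reverse=True)


results = [
    {"domain": "parent.example.com", "title": "Licences overview", "score": 0.80},
    {"domain": "sister.example.com", "title": "Renew a licence", "score": 0.70},
]

# On the parent site, no boost: results keep their base relevance order.
parent_view = rank_results(results, "parent.example.com", boost=1.0)
# On the sister site, its own pages are boosted above the parent's.
sister_view = rank_results(results, "sister.example.com", boost=1.5)
```

The same index serves both sites; only the query-time rule changes, which is what lets a parent site show results equally while a sister site favors its own content.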
Indexing, schema, and data transformation
Perhaps the biggest challenge of federating site search is indexing and managing radically different schemas and site structures.
- Different schema: Sites can each use different schemas, such as Dublin Core, Open Graph, or Schema.org, which have different metadata fields and date and time formats.
- Domain structure: Each site could have a very different URL structure and hierarchy. Search engines can use the URL path (e.g., /index, /products/, /services/, /services/details/, etc.) to categorize results and improve relevance.
- Tagging: The index can be impacted by (1) how h1, h2, h3, etc., tags are structured and (2) what metadata is included within tags (e.g., meta labels and properties).
To manage these differences, you can add rules to transform data as it’s being indexed. For example, you will want to store fields such as dates and times in a consistent format. You may also choose to transform content for the search index. One website may call it “Corona virus” and another might call it “COVID-19,” so you’ll want the index to treat the two terms as synonyms. This can also be handled through more advanced vector analysis, which represents content as numeric vectors and clusters it by topic.
Another consideration is duplication of content. Different sources of data may have the same type of content, e.g., /about or /company pages, so when someone is searching for information about the business they could come across both. You will want to decide how to handle duplicate or very similar content.
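Here is a small sketch of what index-time transformation rules might look like: one function that converts dates from several formats into a single ISO 8601 form, and one that maps site-specific wording to a canonical term. The list of date formats, the synonym table, and the choice to treat dates without a timezone as UTC are all assumptions for this example.

```python
from datetime import datetime, timezone

# Hypothetical mapping from site-specific wording to one canonical term.
CANONICAL_TERMS = {"corona virus": "covid-19", "coronavirus": "covid-19"}

# Date formats we assume the source sites use.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d", "%d/%m/%Y")


def normalize_date(raw):
    """Parse a date in any known format and return an ISO 8601 UTC string."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # try the next known format
        if parsed.tzinfo is None:
            # Assumption: dates without a timezone are treated as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    raise ValueError(f"Unrecognised date format: {raw!r}")


def canonical_topics(text):
    """Return the canonical terms whose variants (or own name) appear in the text."""
    lowered = text.lower()
    found = {canon for variant, canon in CANONICAL_TERMS.items() if variant in lowered}
    found.update(canon for canon in CANONICAL_TERMS.values() if canon in lowered)
    return sorted(found)
```

For example, `normalize_date("15/03/2020")` and `normalize_date("2020-03-15")` both produce the same stored value, and pages that mention either “Corona virus” or “COVID-19” end up with the same `covid-19` topic.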
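One common way to spot very similar pages is to compare overlapping word sequences ("shingles") with Jaccard similarity. The sketch below is a generic illustration of that technique, not a description of how any particular search product handles duplicates; the `body` and `url` fields and the 0.8 threshold are assumptions.

```python
import re


def shingles(text, size=3):
    """Return the set of lowercase word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(records, threshold=0.8):
    """Return pairs of record URLs whose body text is very similar."""
    sets = [(r["url"], shingles(r["body"])) for r in records]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i][1], sets[j][1]) >= threshold:
                pairs.append((sets[i][0], sets[j][0]))
    return pairs
```

Once pairs are flagged, you still need a policy: keep both, keep only the page from the site being searched, or collapse them into one result. The pairwise comparison is fine for a sketch but would need a more scalable approach for a large index.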
Search filters help users narrow their search query to find exactly what they want. With federated search, you can create filters that cut across all sites or that are specific to each domain.
Generally speaking, there are three different types of filters. These are not mutually exclusive; you can use one or all three if you want.
- Static filters, which allow end users to filter content after entering a search query. For example, you can give visitors a way to filter results by topic or rating.
- Dynamic filters (also called facets), which are generated based on the values of the search result set. As an example, if a user is searching for “car” you could display all the brands — Toyota, Ford, Volvo, etc. — available for that category, which would be different from a search for “boats” or “motorcycles.”
- Filter expressions, which are applied to every query so that end users always see the filtered results. For example, you could limit results to only one site or to specific part(s) of your site(s). If you sell shirts, shoes, and jewelry, you could exclude results from one or more of those sections. Categories like these are a natural place to start with filters.
Whichever kind of filter(s) you choose, you’ll want to consider how they show up on each site. You can use the same filters on each site, or deliver filters contextually.
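The difference between restricting results and deriving facets from them can be sketched in a few lines. In this example, `apply_filter` stands in for a static filter or an always-on filter expression, and `facet_counts` computes a dynamic facet from whatever results remain. Both function names, the record fields, and the example domains are hypothetical.

```python
from collections import Counter


def apply_filter(records, field, allowed):
    """Keep only records whose `field` value is in `allowed`.

    This plays the role of an always-on filter expression, or of a static
    filter the visitor has selected.
    """
    return [r for r in records if r.get(field) in allowed]


def facet_counts(records, field):
    """Count the values of `field` in the current result set (a dynamic facet)."""
    return Counter(r[field] for r in records if field in r)


results = [
    {"title": "Corolla review", "domain": "cars.example.com", "brand": "Toyota"},
    {"title": "Focus review", "domain": "cars.example.com", "brand": "Ford"},
    {"title": "Camry review", "domain": "blog.example.com", "brand": "Toyota"},
]

# Restrict results to one site, then derive brand facets from what remains.
site_results = apply_filter(results, "domain", {"cars.example.com"})
brands = facet_counts(site_results, "brand")
```

Because the facet is computed after the filter, the brand counts a visitor sees on one site can differ from the counts on another, even though both sites share the same index.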
A brief note on analytics: if you’ve added federated searching across different sources, how do you know it’s working? Each site owner will want a view into site search performance on their site. Metrics such as search CTR, popular queries, and ineffective searches should be monitored to ensure visitors are finding what they need.
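As a rough sketch of per-site reporting, the function below summarises a list of search events into searches, clicks, zero-result searches, and CTR for each site. The event fields (`site`, `query`, `result_count`, `clicked`) are assumptions, and counting zero-result searches is only one simple proxy for "ineffective" searches.

```python
from collections import defaultdict


def search_metrics(events):
    """Summarise search events per site.

    Each event is assumed to have `site`, `query`, `result_count`, and
    `clicked` (True if the visitor clicked any result).
    """
    stats = defaultdict(lambda: {"searches": 0, "clicks": 0, "zero_results": 0})
    for event in events:
        site = stats[event["site"]]
        site["searches"] += 1
        site["clicks"] += 1 if event["clicked"] else 0
        site["zero_results"] += 1 if event["result_count"] == 0 else 0
    # CTR here is the share of searches that led to at least one click.
    return {
        name: {**s, "ctr": s["clicks"] / s["searches"]}
        for name, s in stats.items()
    }


# Hypothetical event log covering two sites that share one index.
events = [
    {"site": "parent", "query": "rego", "result_count": 12, "clicked": True},
    {"site": "parent", "query": "liquor licence", "result_count": 5, "clicked": False},
    {"site": "sister", "query": "boat trailer", "result_count": 0, "clicked": False},
]
metrics = search_metrics(events)
```

Grouping by the site where the query originated, rather than by where the clicked result lives, gives each site owner a view of how search performs for their own visitors.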
Federated search can provide a better search experience for end users, but it requires a good deal of planning to ensure the results match expectations on each site. It’s worth pointing out that federating search doesn’t mean that each site needs to use the same CMS or adhere to the exact same schema or metadata standards. As long as the data can be standardized in the search index, it is possible to deliver great results.