As I type into search boxes I’m always intrigued to see how they respond. Best case, I feel like my mind has been read; more often it’s the opposite, but either way I smile - bad results hint at the opportunity and great results show what is possible. Both are the reality of search technology today. Yet the future will be very different: what we use today will be the horse and cart of tomorrow, and the humble search interface will expand well beyond its current uses. The seeds of this shift have already been sown, but most are still trying to buy a better horse…
Around 80% of all information created today is unstructured (free text with little explicit structure). In 2017 alone, the information created was expected to exceed that of the previous 5,000 years combined. Beyond sheer volume, the rate of information creation is accelerating rapidly - projected to reach roughly 10x today’s rate by 2025, which is also the year each human is expected to interact with connected devices nearly 5,000 times a day.
So what does all that mean? Largely, humans have a data problem: while Moore’s law captures the ever-expanding computational power of machines, humans are left in the dust. We can’t read faster, we don’t store more information, and while the human brain is computationally very powerful for certain tasks, it’s orders of magnitude slower than computers and can’t be parallelized the way machines can. (Yes, there is collective intelligence, but it’s not the same.)
Thus we think alone, and the usefulness of our information-processing life is capped by two things: our lifespan and how fast we can learn. Lifespan is largely out of our control - it’s changing a little, but remains relatively constant. Learning rate is partially genetic and partially a conscious choice, yet even the part we can control has virtually no bearing on our information-crunching capacity. Until we can plug in Matrix-style, we humans are extremely limited. (Note: there are signs memories can actually be implanted using RNA, but this is not happening to humans anytime soon.)
Knowledge is a pyramid
A few hundred years ago, the same people would innovate in maths, physics, astronomy and other sciences within a single lifetime. A hundred years ago, people could innovate across a full discipline. Nowadays each discipline has been split over and over into smaller areas built on top of the existing information. The days of humans being fully across large areas of knowledge are gone. The speed at which we can learn, adapt and utilize information is what truly matters today.
“You may have noticed students who just try to remember and pound back what is remembered. Well, they fail in school and in life. You’ve got to hang experience on a latticework of models in your head.” — Charlie Munger
Learning adaptability is a problem. Some reports now project that up to 60% of university degrees may be leading people to jobs that won’t exist five to ten years from now. Humans can’t spend nearly the first 30 years of their lives absorbing knowledge only to face a world that no longer exists. Learning speed is also a problem: humans are relatively slow to read and capture information, and therefore risk spending less time actually adding value.
Highly successful people tend to have broader interests, and are better able to identify opportunities and accelerate their learning in those areas. They also devote more ongoing time to learning - and learning how to accelerate learning itself is arguably the most valuable skill of all.
So how can we maximize learning rate and knowledge utilization while humans remain heavily restricted by our information-processing capacity, and information itself grows at a rate that far outstrips that capacity anyway?
- Faster access to information
- Smarter use of information
The first of these is literally about traversing information faster; search technology is great for that. The second is about extending human processing capacity so that the machine-based traversal evaluates information more intelligently (much like we would). AI promises to be great for that.
Search and AI to the rescue?
In the case of information, search technology can take a query and, in a fraction of a second, return results to humans as if they had actually read and analyzed more information than could be read in a lifetime (many times over). That is of course the promise, but more often it’s as if a five-year-old has done the analysis: the volume of information is enormous, but the analysis is rather unintelligent. AI is set to change this and usher in a totally new era of search.
Humans on average interact directly with over 10 different search technologies daily (web, ecommerce, work, files, virtual assistants, news feeds, etc.), though most of these go unnoticed. McKinsey has also suggested that around 20% of knowledge workers’ time is spent looking for information. Search technology is quietly becoming part of the fabric of life we take for granted, yet it remains a seemingly long way from fulfilling its potential.
However, the reality is that AI is changing this very quickly - Google publicly acknowledged in 2015 that RankBrain (an AI-based ranking factor) was one of the top three contributors to Google search ranking; today it may even be the most important of all. Many SEO practitioners have stated that it’s no longer possible to game the Google ranking algorithm, likely due to the sheer complexity of the AI model (a challenge for AI engineering in general). AI models can, in essence, allow different queries to use totally different ranking criteria.
As amazing as this is, unfortunately for the rest of the world, search technology looks much as it did in the late 90s. Although methods such as learn-to-rank and other machine learning (ML) advances have helped search rankings improve, they are far from revolutionary, for now…
The impact of AI
The promise of AI is likely overhyped, but the reality is that it is already changing the world we live in drastically, and that impact is only going to increase. Search technology is not as easy to impact, however, as its data structures and processing techniques were originally built around keywords, filtering and boosting. Utilizing word and paragraph vectors and neural networks for ranking at scale is a whole different ball game.
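To make the contrast concrete, here is a minimal sketch of vector-based ranking: documents and queries are represented as vectors and ranked by similarity rather than keyword overlap. The vectors and document titles below are made up for illustration - in a real system they would come from a trained embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy "paragraph vectors" -- hand-written here purely for illustration;
# in practice these come from a model trained on large text corpora.
docs = {
    "how to train a neural network": [0.9, 0.1, 0.3],
    "horse and cart maintenance":    [0.1, 0.8, 0.2],
    "deep learning for ranking":     [0.8, 0.2, 0.4],
}

# Hypothetical embedding of the query "machine learning ranking".
query_vec = [0.85, 0.15, 0.35]

# Rank by semantic similarity: no keyword from the query needs to
# appear in the document for it to score highly.
ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
print(ranked)
```

Note that nothing in the top-ranked documents shares a literal keyword with the query - the similarity lives entirely in the vector space, which is exactly what keyword-oriented index structures were never designed to support at scale.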
The learn-to-rank approach used today is typically two-stage: an initial keyword query retrieves the top-k candidate results, which a trained ranking model then re-orders.
The main drawbacks of this approach are:
- The top-k is usually a small number of records, so this assumes the initial “dumb” query contains the best result
- The index structure is immutable, so the top-k does not change unless the index is re-written
- Training data is difficult to collect at scale, particularly for long-tail queries
- The ranking model is typically not neural-network based (e.g. RankSVM or gradient-boosted regression such as XGBoost) because of performance and training data limitations.
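The two-stage pipeline and its first drawback can be sketched in a few lines. The scoring functions and weights below are stand-ins: the term-overlap score plays the role of the initial “dumb” query, and the linear model stands in for something like RankSVM or XGBoost, with invented weights and features.

```python
def keyword_score(query, doc):
    """Stage 1: naive term-overlap score (the initial 'dumb' query)."""
    q_terms = set(query.lower().split())
    return sum(t in q_terms for t in doc["text"].lower().split())

def rerank_score(doc):
    """Stage 2: stand-in ranking model over hand-picked features."""
    weights = {"clicks": 0.7, "freshness": 0.3}  # hypothetical learned weights
    return sum(w * doc[f] for f, w in weights.items())

index = [
    {"text": "intro to search ranking", "clicks": 0.2, "freshness": 0.9},
    {"text": "search ranking with neural networks", "clicks": 0.9, "freshness": 0.4},
    {"text": "horse and cart repair", "clicks": 0.5, "freshness": 0.5},
]

query = "neural search ranking"
k = 2

# Stage 1: top-k by keyword overlap. Anything outside the top-k can
# never be surfaced, no matter how good the ranking model is -- the
# first drawback listed above.
candidates = sorted(index, key=lambda d: keyword_score(query, d), reverse=True)[:k]

# Stage 2: re-rank only the small candidate set with the trained model.
results = sorted(candidates, key=rerank_score, reverse=True)
print([d["text"] for d in results])
```

Because stage 2 only ever sees the k candidates that stage 1 surfaced, the quality ceiling of the whole system is set by the keyword query, however sophisticated the re-ranking model is.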
Aside from these issues, this approach also requires an enormous amount of engineering to get up and running. It’s difficult even for those who can find and afford engineers capable of building these types of systems, and far out of reach for the average business or organisation. So AI has not yet really made it into the search interfaces we use today, except at companies with massive investment, such as Google.
This is changing though. Over the next few years search technology will transform from a configurable, deterministic processing engine into an intelligent and evolving extension of human thought. AI will inject our intelligence into search technology; businesses will become more efficient and lives will be dramatically improved. The transformation has already begun, and we at Sajari can’t wait to see it all play out…