Beyond Term-Based Search
Until recently, Search at Spotify relied mostly on term matching. For example, if you type the query “electric cars climate impact”, Elasticsearch will return search results that contain everything that has each of those query words in its indexed metadata (like in the title of a podcast episode).
However, we know users don’t always type the exact words for what they want to listen to, and we have to use fuzzy matching, normalization, and even manual aliases to make up for it. While these techniques are very helpful for the user, they have limitations, as they cannot capture all variations of expressing yourself in natural language, especially when using natural language sentences.
Going back to the query “electric cars climate impact”, our Elasticsearch cluster did not actually retrieve anything for it… but does this mean that we don’t have any relevant content to show to the user for this query?
Enter Natural Language Search.
Natural Language Search
To enable users to find more relevant content with less effort, we started investigating a technique called Natural Language Search, also known as Semantic Search in the literature. In a nutshell, Natural Language Search matches a query and a textual document that are semantically correlated instead of needing exact word matches. It matches synonyms, paraphrases, etc., and any variation of natural language that express the same meaning.
To continue reading this article, click here.