Session-Aware Query Auto-completion using Extreme Multi-label Ranking

Paper Title

Session-Aware Query Auto-completion using Extreme Multi-label Ranking

Authors

Nishant Yadav, Rajat Sen, Daniel N. Hill, Arya Mazumdar, Inderjit S. Dhillon

Abstract

Query auto-completion (QAC) is a fundamental feature in search engines where the task is to suggest plausible completions of a prefix typed in the search bar. Previous queries in the user session can provide useful context for the user's intent and can be leveraged to suggest auto-completions that are more relevant while adhering to the user's prefix. Such session-aware QACs can be generated by recent sequence-to-sequence deep learning models; however, these generative approaches often do not meet the stringent latency requirements of responding to each user keystroke. Moreover, these generative approaches pose the risk of showing nonsensical queries. In this paper, we provide a solution to this problem: we take the novel approach of modeling session-aware QAC as an eXtreme Multi-Label Ranking (XMR) problem where the input is the previous query in the session and the user's current prefix, while the output space is the set of tens of millions of queries entered by users in the recent past. We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm. The proposed modifications yield a 3.9x improvement in terms of Mean Reciprocal Rank (MRR) over the baseline XMR approach on a public search logs dataset. We are able to maintain an inference latency of less than 10 ms while still using session context. When compared against baseline models of acceptable latency, we observed a 33% improvement in MRR for short prefixes of up to 3 characters. Moreover, our model yielded a statistically significant improvement of 2.81% over a production QAC system in terms of suggestion acceptance rate, when deployed on the search bar of an online shopping store as part of an A/B test.
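The abstract reports its gains in Mean Reciprocal Rank (MRR). As a reminder of what that metric measures, here is a minimal sketch using the standard definition: for each prefix, take the reciprocal of the rank at which the query the user actually submitted appears in the suggestion list (0 if it is absent), then average over all prefixes. The function names and the toy suggestion lists are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def reciprocal_rank(suggestions: Sequence[str], target: str) -> float:
    """Return 1/rank of `target` in the ranked `suggestions`, or 0.0 if absent."""
    for rank, candidate in enumerate(suggestions, start=1):
        if candidate == target:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    ranked_lists: Sequence[Sequence[str]], targets: Sequence[str]
) -> float:
    """Average the reciprocal rank over all (suggestion list, target) pairs."""
    if len(ranked_lists) != len(targets):
        raise ValueError("ranked_lists and targets must have the same length")
    if not targets:
        return 0.0
    total = sum(reciprocal_rank(s, t) for s, t in zip(ranked_lists, targets))
    return total / len(targets)


# Hypothetical toy data: three prefixes, each with a ranked suggestion list
# and the query the user eventually submitted.
toy_suggestions = [
    ["nike shoes", "nike socks"],      # target at rank 1 -> 1.0
    ["iphone case", "iphone charger"], # target at rank 2 -> 0.5
    ["laptop stand"],                  # target missing   -> 0.0
]
toy_targets = ["nike shoes", "iphone charger", "laptop bag"]

toy_mrr = mean_reciprocal_rank(toy_suggestions, toy_targets)
```

Under this definition, an improvement in MRR means the submitted query tends to appear closer to the top of the suggestion list, which matters most for short prefixes where many completions are possible.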
