Paper Title

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

Paper Authors

Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Paper Abstract

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms, however, are not without flaws, i.e., running the model on all query-document pairs at inference-time incurs a significant computational cost. This paper proposes a new training and inference paradigm for re-ranking. We propose to finetune a pretrained encoder-decoder model on document-to-query generation. Subsequently, we show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference. This results in significant inference-time speedups, since the decoder-only architecture only needs to learn to interpret static encoder embeddings during inference. Our experiments show that this new paradigm achieves results that are comparable to the more expensive cross-attention ranking approaches while being up to 6.8X faster. We believe this work paves the way for more efficient neural rankers that leverage large pretrained models.
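
To make the decomposition concrete, here is a minimal sketch using a stock HuggingFace T5 checkpoint: the expensive encoder is run once per document offline, and at query time only the decoder runs against the cached encoder states, with the query's log-likelihood serving as the relevance score. The checkpoint name, helper functions, and scoring choice here are illustrative assumptions; the paper's actual ED2LM training recipe and decomposition differ in detail.

```python
# Sketch of the inference-time encoder/decoder split (assumptions:
# stock t5-base checkpoint, one unpadded document at a time, and
# query log-likelihood as the relevance score).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

def precompute_doc_states(documents):
    """Offline step: run the expensive encoder once per document."""
    cache = {}
    for doc_id, text in documents.items():
        enc = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            # Static encoder embeddings, reused for every future query.
            cache[doc_id] = model.encoder(**enc)
    return cache

def rerank(query, cache):
    """Online step: only the decoder runs, reading cached encoder states."""
    labels = tokenizer(query, return_tensors="pt").input_ids
    scores = {}
    for doc_id, encoder_outputs in cache.items():
        with torch.no_grad():
            out = model(encoder_outputs=encoder_outputs, labels=labels)
        # out.loss is the mean cross-entropy over query tokens, so its
        # negation is the average query-token log-likelihood given the doc.
        scores[doc_id] = -out.loss.item()
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {"d1": "T5 is an encoder-decoder transformer.",
        "d2": "BERT is an encoder-only transformer."}
print(rerank("what kind of model is T5", precompute_doc_states(docs)))
```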
