Paper Title
Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval
Paper Authors
Paper Abstract
Pre-trained language models have been successful in many knowledge-intensive NLP tasks. However, recent work has shown that models such as BERT are not ``structurally ready'' to aggregate textual information into a [CLS] vector for dense passage retrieval (DPR). This ``lack of readiness'' results from the gap between language model pre-training and DPR fine-tuning. Previous solutions call for computationally expensive techniques such as hard negative mining, cross-encoder distillation, and further pre-training to learn a robust DPR model. In this work, we instead propose to fully exploit knowledge in a pre-trained language model for DPR by aggregating the contextualized token embeddings into a dense vector, which we call agg*. By concatenating vectors from the [CLS] token and agg*, our Aggretriever model substantially improves the effectiveness of dense retrieval models on both in-domain and zero-shot evaluations without introducing substantial training overhead. Code is available at https://github.com/castorini/dhr.
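The following is a minimal sketch of the high-level idea stated in the abstract: aggregate the encoder's contextualized token embeddings into a single dense vector (agg*) and concatenate it with the [CLS] vector. The max pooling used here is only an illustrative placeholder, not the paper's actual aggregation scheme; model names and the scoring function are likewise assumptions. See the linked repository for the real implementation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative only: max pooling stands in for the paper's agg* aggregation.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden_dim)
    cls_vec = hidden[:, 0, :]                      # [CLS] embedding
    agg_vec = hidden[:, 1:, :].max(dim=1).values   # placeholder aggregation of token embeddings
    return torch.cat([cls_vec, agg_vec], dim=-1)   # concatenated representation

# Queries and passages are scored with a dot product, as in standard dense retrieval.
query_vec = encode("what is dense passage retrieval?")
passage_vec = encode("Dense passage retrieval encodes queries and passages into vectors.")
score = (query_vec * passage_vec).sum(dim=-1)
```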