Paper Title
Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval
Paper Authors
Paper Abstract
Pre-trained language models have been successful in many knowledge-intensive NLP tasks. However, recent work has shown that models such as BERT are not ``structurally ready'' to aggregate textual information into a [CLS] vector for dense passage retrieval (DPR). This ``lack of readiness'' results from the gap between language model pre-training and DPR fine-tuning. Previous solutions call for computationally expensive techniques such as hard negative mining, cross-encoder distillation, and further pre-training to learn a robust DPR model. In this work, we instead propose to fully exploit knowledge in a pre-trained language model for DPR by aggregating the contextualized token embeddings into a dense vector, which we call agg*. By concatenating vectors from the [CLS] token and agg*, our Aggretriever model substantially improves the effectiveness of dense retrieval models on both in-domain and zero-shot evaluations without introducing substantial training overhead. Code is available at https://github.com/castorini/dhr.
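The following is a minimal sketch of the high-level idea stated in the abstract: aggregate the encoder's contextualized token embeddings into a single dense vector (agg*) and concatenate it with the [CLS] vector. The max pooling used here is only an illustrative placeholder, not the paper's actual aggregation scheme; model names and the scoring function are likewise assumptions. See the linked repository for the real implementation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative only: max pooling stands in for the paper's agg* aggregation.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden_dim)
    cls_vec = hidden[:, 0, :]                      # [CLS] embedding
    agg_vec = hidden[:, 1:, :].max(dim=1).values   # placeholder aggregation of token embeddings
    return torch.cat([cls_vec, agg_vec], dim=-1)   # concatenated representation

# Queries and passages are scored with a dot product, as in standard dense retrieval.
query_vec = encode("what is dense passage retrieval?")
passage_vec = encode("Dense passage retrieval encodes queries and passages into vectors.")
score = (query_vec * passage_vec).sum(dim=-1)
```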