Paper Title
LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval
Paper Authors
Paper Abstract
In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL), which iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps the representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, which includes 18 datasets covering 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach runs in milliseconds (22.5x faster) while achieving superior performance.
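To make the cache idea concrete, below is a minimal PyTorch sketch of a contrastive objective in the spirit of ICoL, assuming a standard InfoNCE loss in which cached embeddings (produced by the currently frozen encoder, hence lying in the same hidden space) simply extend the pool of negatives. The function name `icol_loss`, the cache layout, and the temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def icol_loss(q_emb, d_emb, cache, temperature=0.05):
    """InfoNCE loss over in-batch negatives plus cached negatives (sketch).

    q_emb: (B, H) query embeddings from the encoder currently being trained.
    d_emb: (B, H) positive document embeddings from the frozen encoder.
    cache: (C, H) embeddings cached from earlier steps of the same frozen
           encoder, so they share a hidden space with d_emb.
    """
    q_emb = F.normalize(q_emb, dim=-1)
    # Candidates = fresh in-batch documents followed by cached embeddings.
    candidates = F.normalize(torch.cat([d_emb, cache], dim=0), dim=-1)
    logits = q_emb @ candidates.T / temperature  # (B, B + C)
    # The positive for query i is document i, i.e. index i in `candidates`.
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    B, C, H = 8, 32, 128  # hypothetical batch, cache, and hidden sizes
    q, d, cached = torch.randn(B, H), torch.randn(B, H), torch.randn(C, H)
    print(icol_loss(q, d, cached).item())
```

In the iterative scheme described by the abstract, the two towers would alternate roles: while one encoder is updated, the other stays frozen and supplies both the fresh positives and the cached negatives, which is what keeps all cached representations consistent.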
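The abstract does not spell out how LEDR fuses the two signals, so the sketch below assumes a simple multiplicative combination of a lexical score (e.g. BM25) with the dense similarity for each candidate document. This is one natural instantiation of enhancing dense retrieval with lexical matching: scoring is a single elementwise operation over precomputed scores, which is consistent with avoiding a costly re-ranking pass.

```python
import numpy as np

def ledr_scores(dense_scores, lexical_scores):
    """Combine dense and lexical relevance scores per candidate (sketch).

    Both arrays are assumed to be aligned over the same candidate list.
    The multiplicative combination is an illustrative assumption: a strong
    lexical match amplifies the dense similarity, and vice versa.
    """
    return np.asarray(dense_scores) * np.asarray(lexical_scores)

# Hypothetical usage: rank candidates by the combined score.
dense = np.array([0.82, 0.75, 0.90])   # e.g. cosine similarities from the dual tower
lexical = np.array([3.1, 0.0, 1.7])    # e.g. BM25 scores for the same candidates
ranking = np.argsort(-ledr_scores(dense, lexical))
print(ranking)  # candidate indices, best to worst
```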