Paper Title
LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval
Paper Authors
Paper Abstract
In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL), which iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps the representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, which includes 18 datasets covering 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach runs in milliseconds (22.5x faster) while achieving superior performance.
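To make the cache idea concrete, below is a minimal PyTorch sketch of a contrastive objective in the spirit of ICoL, assuming a standard InfoNCE loss in which cached embeddings (produced by the currently frozen encoder, hence lying in the same hidden space) simply extend the pool of negatives. The function name `icol_loss`, the cache layout, and the temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def icol_loss(q_emb, d_emb, cache, temperature=0.05):
    """InfoNCE loss over in-batch negatives plus cached negatives (sketch).

    q_emb: (B, H) query embeddings from the encoder currently being trained.
    d_emb: (B, H) positive document embeddings from the frozen encoder.
    cache: (C, H) embeddings cached from earlier steps of the same frozen
           encoder, so they share a hidden space with d_emb.
    """
    q_emb = F.normalize(q_emb, dim=-1)
    # Candidates = fresh in-batch documents followed by cached embeddings.
    candidates = F.normalize(torch.cat([d_emb, cache], dim=0), dim=-1)
    logits = q_emb @ candidates.T / temperature  # (B, B + C)
    # The positive for query i is document i, i.e. index i in `candidates`.
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    B, C, H = 8, 32, 128  # hypothetical batch, cache, and hidden sizes
    q, d, cached = torch.randn(B, H), torch.randn(B, H), torch.randn(C, H)
    print(icol_loss(q, d, cached).item())
```

In the iterative scheme described by the abstract, the two towers would alternate roles: while one encoder is updated, the other stays frozen and supplies both the fresh positives and the cached negatives, which is what keeps all cached representations consistent.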
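The abstract does not spell out how LEDR fuses the two signals, so the sketch below assumes a simple multiplicative combination of a lexical score (e.g. BM25) with the dense similarity for each candidate document. This is one natural instantiation of enhancing dense retrieval with lexical matching: scoring is a single elementwise operation over precomputed scores, which is consistent with avoiding a costly re-ranking pass.

```python
import numpy as np

def ledr_scores(dense_scores, lexical_scores):
    """Combine dense and lexical relevance scores per candidate (sketch).

    Both arrays are assumed to be aligned over the same candidate list.
    The multiplicative combination is an illustrative assumption: a strong
    lexical match amplifies the dense similarity, and vice versa.
    """
    return np.asarray(dense_scores) * np.asarray(lexical_scores)

# Hypothetical usage: rank candidates by the combined score.
dense = np.array([0.82, 0.75, 0.90])   # e.g. cosine similarities from the dual tower
lexical = np.array([3.1, 0.0, 1.7])    # e.g. BM25 scores for the same candidates
ranking = np.argsort(-ledr_scores(dense, lexical))
print(ranking)  # candidate indices, best to worst
```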