Paper title
Neural Passage Retrieval with Improved Negative Contrast
Paper authors
Paper abstract
In this paper we explore the effects of negative sampling in dual encoder models used to retrieve passages for automatic question answering. We explore four negative sampling strategies that complement the straightforward random sampling of negatives typically used to train dual encoder models. Of the four strategies, three are based on retrieval and one on heuristics. Our retrieval-based strategies rely on the semantic similarity and the lexical overlap between questions and passages. We train the dual encoder models in two stages: pre-training with synthetic data and fine-tuning with domain-specific data. We apply negative sampling to both stages. The approach is evaluated on two passage retrieval tasks. Although no single sampling strategy clearly works best across all tasks, our strategies clearly help to improve the contrast between the response and all the other passages. Furthermore, mixing the negatives from different strategies achieves performance on par with the best-performing strategy in all tasks. Our results establish a new state-of-the-art level of performance on two of the open-domain question answering datasets that we evaluated.
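The idea of mixing retrieval-based negatives with random negatives, and scoring the positive passage against them, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the function names are hypothetical, Jaccard token overlap stands in for whatever lexical retriever the paper uses, and the loss is a generic softmax cross-entropy over one positive and its sampled negatives.

```python
import math
import random


def lexical_overlap(question, passage):
    """Jaccard overlap of whitespace tokens (assumed stand-in for a lexical retriever)."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    if not q or not p:
        return 0.0
    return len(q & p) / len(q | p)


def sample_negatives(question, positive, corpus, n_hard, n_random, seed=0):
    """Mix the n_hard most lexically similar non-answers with n_random random ones."""
    rng = random.Random(seed)
    candidates = [p for p in corpus if p != positive]
    # Rank candidates by overlap with the question; the top ones are "hard" negatives.
    ranked = sorted(candidates, key=lambda p: lexical_overlap(question, p), reverse=True)
    hard = ranked[:n_hard]
    rest = ranked[n_hard:]
    # Fill the remainder with uniformly sampled negatives.
    rand = rng.sample(rest, min(n_random, len(rest)))
    return hard + rand


def contrastive_loss(pos_score, neg_scores):
    """Softmax cross-entropy with the positive passage as the target class."""
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos_score
```

In this sketch, a negative whose score is close to the positive's raises the loss more than an easy one does, which is the sense in which harder negatives sharpen the contrast. In a real dual encoder the scores would be dot products of question and passage embeddings.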