Paper title
Establishing Strong Baselines for TripClick Health Retrieval
Paper authors
Paper abstract
We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the training data, which was originally too noisy, with a simple negative sampling policy. In the TripClick re-ranking task, we achieve large gains over BM25, which the original baselines did not achieve. Furthermore, we study the impact of different domain-specific pre-trained models on TripClick. Finally, we show that dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.