Paper Title
SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
Paper Authors
Paper Abstract
Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks. In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations. It includes 24 challenging and realistic tasks, 8 of which are new, across four formats: classification, regression, ranking and search. We then use this benchmark to study and improve the generalization ability of scientific document representation models. We show how state-of-the-art models like SPECTER and SciNCL struggle to generalize across the task formats, and that simple multi-task training fails to improve them. However, a new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance. We experiment with task-format-specific control codes and adapters and find they outperform the existing single-embedding state-of-the-art by over 2 points absolute. We release the resulting family of multi-format models, called SPECTER2, for the community to use and build on.
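Since the abstract describes SPECTER2 as a released family of encoders, a minimal sketch of computing document embeddings with the base model may be helpful. This is a sketch under assumptions: the Hugging Face model ID "allenai/specter2_base" and the title-[SEP]-abstract input convention come from the public SPECTER/SPECTER2 release, not from the abstract itself, and the example paper text is truncated.

```python
# Minimal sketch: embedding papers with the SPECTER2 base encoder via
# Hugging Face transformers. The model ID and input convention are
# assumptions based on the public release, not stated in the abstract.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/specter2_base")
model = AutoModel.from_pretrained("allenai/specter2_base")
model.eval()

papers = [
    {
        "title": "SciRepEval: A Multi-Format Benchmark for "
                 "Scientific Document Representations",
        "abstract": "Learned representations of scientific documents ...",
    },
]

# SPECTER-style input: title and abstract joined by the tokenizer's SEP token.
texts = [p["title"] + tokenizer.sep_token + p["abstract"] for p in papers]
inputs = tokenizer(
    texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs)

# The final-layer [CLS] token serves as the document embedding.
embeddings = outputs.last_hidden_state[:, 0, :]
print(embeddings.shape)  # (num_papers, hidden_size)
```

The base encoder alone yields a general-purpose embedding; the format-specific embeddings the abstract describes (e.g., tailored to classification versus ranking) are obtained by loading the corresponding task-format adapters from the release on top of this encoder.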