Paper Title
SCROLLS: Standardized CompaRison Over Long Language Sequences
Paper Authors
Paper Abstract
NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder, indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.