Paper Title

DaNetQA: a yes/no Question Answering Dataset for the Russian Language

Authors

Taisia Glushkova, Alexey Machnev, Alena Fenogenova, Tatiana Shavrina, Ekaterina Artemova, Dmitry I. Ignatov

Abstract

DaNetQA, a new question-answering corpus, follows the design of Clark et al. (2019): it comprises natural yes/no questions. Each question is paired with a paragraph from Wikipedia and an answer derived from the paragraph. The task is to take both the question and the paragraph as input and produce a yes/no answer, i.e. a binary output. In this paper, we present a reproducible approach to DaNetQA creation and investigate transfer learning methods for task and language transfer. For task transfer, we leverage three similar sentence modelling tasks: 1) a corpus of paraphrases, Paraphraser, 2) an NLI task, for which we use the Russian part of XNLI, and 3) another question answering task, SberQUAD. For language transfer, we use English-to-Russian translation together with multilingual language fine-tuning.
