Paper Title

DoQA -- Accessing Domain-Specific FAQs via Conversational QA

Authors

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

Abstract

The goal of this work is to build conversational Question Answering (QA) interfaces for the large body of domain-specific information available in FAQ sites. We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs. The dialogues are collected from three Stack Exchange sites using the Wizard of Oz method with crowdsourcing. Compared to previous work, DoQA comprises well-defined information needs, leading to more coherent and natural conversations with fewer factoid questions, and it is multi-domain. In addition, we introduce a more realistic information retrieval (IR) scenario where the system needs to find the answer in any of the FAQ documents. The results of an existing, strong system show that, thanks to transfer learning from a Wikipedia QA dataset and fine-tuning on a single FAQ domain, it is possible to build high-quality conversational QA systems for FAQs without in-domain training data. The good results carry over into the more challenging IR scenario. In both cases, there is still ample room for improvement, as indicated by the higher human upper bound.
