XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based Textual Knowledge Source

Paper title

XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based Textual Knowledge Source

Authors

Kiet Van Nguyen, Phong Nguyen-Thuan Do, Nhat Duy Nguyen, Tin Van Huynh, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

Abstract

Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction. It has attracted much attention from the computational linguistics and artificial intelligence research communities in recent years because of the strong development of machine reading comprehension (MRC)-based models. A reader-based QA system is a high-level search engine that can find correct answers to queries or questions in open-domain or domain-specific texts using MRC techniques. Most advances in data resources and machine-learning approaches for MRC and QA systems have been made in resource-rich languages such as English and Chinese. In contrast, research on QA systems for a low-resource language like Vietnamese remains scarce. This paper presents XLMRQA, the first Vietnamese QA system using a supervised transformer-based reader on the Wikipedia-based textual knowledge source (using the UIT-ViQuAD corpus), outperforming two robust QA systems using deep neural network models, DrQA and BERTserini, by 24.46% and 6.28%, respectively. From the results obtained on the three systems, we analyze the influence of question types on the performance of the QA systems.
