An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering

Paper Title

An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering

Authors

Kuo, Chia-Chih; Luo, Shang-Bao; Chen, Kuan-Yu

Abstract

In a spoken multiple-choice question answering (SMCQA) task, given a passage, a question, and multiple choices, all in the form of speech, the machine needs to pick the correct choice to answer the question. While the audio could contain useful cues for SMCQA, usually only the automatically transcribed text is used in system development. Thanks to large-scale pre-trained language representation models, such as Bidirectional Encoder Representations from Transformers (BERT), systems using only automatically transcribed text can still achieve a certain level of performance. However, previous studies have shown that acoustic-level statistics can offset text inaccuracies caused by automatic speech recognition (ASR) systems, or representation inadequacy lurking in word embedding generators, thereby making SMCQA systems more robust. Along this line of research, this study concentrates on designing a BERT-based SMCQA framework that not only inherits the advantages of the contextualized language representations learned by BERT but also integrates complementary acoustic-level information distilled from audio with the text-level information. Consequently, an audio-enriched BERT-based SMCQA framework is proposed. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and state-of-the-art (SOTA) systems on a published Chinese SMCQA dataset.
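The abstract does not say how the acoustic-level and text-level information are combined. Purely as an illustration of the general idea, the sketch below assumes a simple late-fusion design: each choice has a text vector (standing in for something like a BERT [CLS] embedding of passage + question + choice) and a sequence of frame-level acoustic feature vectors. The acoustic frames are mean-pooled, concatenated with the text vector, scored by a linear layer, and the choices are compared with a softmax. The function names, the pooling, the concatenation, and the hand-set weights are all hypothetical assumptions, not the authors' framework; in a real system the weights would be learned.

```python
import math
from typing import List, Sequence, Tuple

# HYPOTHETICAL late-fusion sketch for multiple-choice scoring.
# Not the paper's architecture: pooling, fusion, and weights are assumptions.


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level acoustic features into a single vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def fuse(text_vec: Sequence[float], acoustic_vec: Sequence[float]) -> List[float]:
    """Concatenate the text-level and acoustic-level representations."""
    return list(text_vec) + list(acoustic_vec)


def score(fused: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Linear scoring layer: one scalar logit per choice."""
    return sum(w * x for w, x in zip(weights, fused)) + bias


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the choice logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def answer(
    text_vecs: Sequence[Sequence[float]],
    acoustic_frames: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return the index of the highest-probability choice and all probabilities."""
    logits = [
        score(fuse(t, mean_pool(a)), weights)
        for t, a in zip(text_vecs, acoustic_frames)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy example: 3 choices, 2-dim text vectors, two 2-dim acoustic frames each.
text_vecs = [[0.1, 0.2], [0.9, 0.1], [0.3, 0.3]]
acoustic_frames = [
    [[0.0, 0.1], [0.2, 0.1]],
    [[0.5, 0.4], [0.7, 0.6]],
    [[0.1, 0.0], [0.1, 0.2]],
]
weights = [1.0, 0.5, 0.8, 0.2]  # text dimensions first, then acoustic dimensions
best, probs = answer(text_vecs, acoustic_frames, weights)
print(best, [round(p, 3) for p in probs])  # best == 1 for these toy inputs
```

Concatenation followed by a linear layer is only the simplest possible fusion; attention-based or gated fusion inside the encoder are common alternatives, and the abstract does not indicate which, if any, the proposed framework uses.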