Paper Title

Improving End-to-End Models for Set Prediction in Spoken Language Understanding

Paper Authors

Hong-Kwang J. Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon

Paper Abstract

The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition, which aims to produce verbatim transcripts. Advances in end-to-end (E2E) speech modeling have made it possible to train solely on semantic entities, which are far cheaper to collect than verbatim transcripts. We focus on this set prediction problem, where entity order is unspecified. Using two classes of E2E models, RNN transducers and attention-based encoder-decoders, we show that these models work best when the training entity sequence is arranged in spoken order. To improve E2E SLU models when entity spoken order is unknown, we propose a novel data augmentation technique along with an implicit attention-based alignment method to infer the spoken order. F1 scores increased significantly, by more than 11% for RNN-T and about 2% for attention-based encoder-decoder SLU models, outperforming previously reported results.
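The attention-based alignment idea can be pictured concretely. Below is a minimal sketch of one way spoken order might be inferred from a trained attention-based encoder-decoder, assuming the decoder's attention weights over audio frames are available: estimate each entity's position as the attention center of mass, then sort entities by that position. The function name, input shapes, and span format here are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def infer_spoken_order(entity_token_spans, attention_weights):
    """Estimate the spoken order of entities from decoder attention.

    entity_token_spans: list of (start, end) decoder-step ranges, one per
        entity, given in the original (arbitrary) training order.
    attention_weights: array of shape (decoder_steps, encoder_frames),
        each row a distribution over audio frames.

    Returns entity indices sorted by estimated spoken position.
    """
    frame_positions = np.arange(attention_weights.shape[1])
    centers = []
    for start, end in entity_token_spans:
        # Average attention over the entity's decoder steps, then take the
        # expected (center-of-mass) audio frame that the entity attends to.
        avg_attn = attention_weights[start:end].mean(axis=0)
        centers.append(float(avg_attn @ frame_positions) / float(avg_attn.sum()))
    return sorted(range(len(entity_token_spans)), key=lambda i: centers[i])

# Toy example: two entities whose attention peaks at frames 8 and 2.
attn = np.zeros((4, 10))
attn[0:2, 8] = 1.0   # entity 0 attends late in the audio
attn[2:4, 2] = 1.0   # entity 1 attends early
print(infer_spoken_order([(0, 2), (2, 4)], attn))  # -> [1, 0]
```

Under this reading, the inferred order could then be used to rearrange the training entity sequences into spoken order, the regime the abstract reports works best for both model classes.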
