On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR

Paper title

On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR

Authors

Yang, Zhao; Ng, Dianwen; Fu, Xiao; Han, Liping; Xi, Wei; Wang, Rui; Jiang, Rui; Zhao, Jizhong

Abstract

End-to-end automatic speech recognition (ASR) has achieved promising results. However, most existing end-to-end ASR methods neglect language-specific characteristics. In Mandarin Chinese ASR, Pinyin and characters are mutually reinforcing, since Chinese characters can be romanized in Pinyin. Based on this intuition, we first investigate several types of end-to-end encoder-decoder models in the single-input dual-output (SIDO) multi-task framework. We then propose a novel asynchronous decoding method with fuzzy Pinyin sampling, which exploits the one-to-one correspondence between Pinyin and characters. Furthermore, we propose a two-stage training strategy that makes training more stable and speeds up convergence. Results on the test sets of the AISHELL-1 dataset show that the proposed enhanced dual-decoder model, without a language model, outperforms strong baseline models by a large margin.
