A multi-view approach for Mandarin non-native mispronunciation verification

Paper title

A multi-view approach for Mandarin non-native mispronunciation verification

Authors

Wang, Zhenyu; Hansen, John H. L.; Xie, Yanlu

Abstract

Traditionally, the performance of non-native mispronunciation verification systems has relied on effective phone-level labelling of non-native corpora. In this study, a multi-view approach is proposed that incorporates discriminative feature representations and requires less annotation for non-native mispronunciation verification of Mandarin. Models are jointly learned to embed acoustic sequences and multi-source information for speech attributes and bottleneck features. Bidirectional LSTM embedding models with contrastive losses map the acoustic sequences and the multi-source information into fixed-dimensional embeddings. The distance between acoustic embeddings is taken as a measure of the similarity between phones. Accordingly, mispronounced phones are expected to have a small similarity score with their canonical pronunciations. On a mispronunciation verification task, the approach improves diagnostic accuracy by +11.23% over a Goodness of Pronunciation (GOP)-based approach and by +1.47% over a single-view approach.
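
The scoring idea in the abstract (fixed-dimensional embeddings, a distance between them, and a contrastive objective) can be illustrated with a minimal sketch. The abstract does not give the exact loss or distance, so the following assumes cosine distance and a margin-based, triplet-style contrastive loss. The function names, the margin, and the decision threshold are all hypothetical and are not taken from the paper. The embeddings are plain lists of floats standing in for the output of a trained bidirectional LSTM encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two fixed-dimensional embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed distance: 1 - cosine similarity (0 means same direction)."""
    return 1.0 - cosine_similarity(u, v)


def contrastive_loss(anchor, positive, negative, margin=0.5):
    """Assumed margin-based loss: pull the anchor toward the matching
    (positive) embedding and push it away from the mismatched (negative)
    one. The loss is zero once the negative is farther than the positive
    by at least `margin`."""
    return max(
        0.0,
        margin + cosine_distance(anchor, positive) - cosine_distance(anchor, negative),
    )


def is_mispronounced(segment_emb, canonical_emb, threshold=0.5):
    """Flag a phone segment whose similarity to the canonical phone
    embedding falls below a (hypothetical) threshold."""
    return cosine_similarity(segment_emb, canonical_emb) < threshold


if __name__ == "__main__":
    canonical = [1.0, 0.0]
    close_segment = [1.0, 0.1]  # nearly the same direction as canonical
    far_segment = [0.0, 1.0]    # orthogonal to canonical
    print(is_mispronounced(close_segment, canonical))
    print(is_mispronounced(far_segment, canonical))
```

In a multi-view setup the anchor and the positive would come from different views of the same phone (for example, one embedding from the acoustic sequence and one from the multi-source information), but how the views are paired in the actual system is not stated in the abstract.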
