Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

Paper title

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

Authors

Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

Abstract

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically derives optimal syllable-level representation from frame-level and phoneme-level audio features. Training this model is challenging because of the limited amount of incorrect stress patterns. To solve this problem, we propose to augment the training set with incorrectly stressed words generated with Neural TTS. Combining both techniques achieves 94.8% precision and 49.2% recall for the detection of incorrectly stressed words in L2 English speech of Slavic and Baltic speakers.
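The idea of deriving a syllable-level representation from frame-level features with attention can be illustrated with a small sketch. This is not the paper's model: the function names `softmax` and `attention_pool`, the fixed `query` vector, and the toy feature values are all assumptions made for illustration. In a trained model the query and the features would be learned, and the abstract says nothing about the actual architecture beyond its inputs and output.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse frame-level vectors into one vector with dot-product attention.

    frames: list of equal-length feature vectors, one per audio frame (toy data here).
    query:  vector of the same length; hypothetical stand-in for a learned parameter.
    Returns the weighted average of the frames and the attention weights.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


# Toy example: three frames of one syllable, two features per frame.
frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
query = [1.0, 1.0]  # assumed value, for illustration only
syllable_vec, weights = attention_pool(frames, query)
```

Here the middle frame has the largest dot product with the query, so it receives the largest weight. The point of the contrast drawn in the abstract is that such weights are computed from the data rather than fixed in advance to a region such as the syllable nucleus.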
