Dual Application of Speech Enhancement for Automatic Speech Recognition

Paper title

Dual Application of Speech Enhancement for Automatic Speech Recognition

Authors

Ashutosh Pandey, Chunxi Liu, Yun Wang, Yatharth Saraf

Abstract

In this work, we exploit speech enhancement for improving a recurrent neural network transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent network (DCRN) for complex spectral mapping based speech enhancement, and find it helpful for ASR in two ways: a data augmentation technique, and a preprocessing frontend. In using it for ASR data augmentation, we exploit a KL divergence based consistency loss that is computed between the ASR outputs of original and enhanced utterances. In using speech enhancement as an effective ASR frontend, we propose a three-step training scheme based on model pretraining and feature selection. We evaluate our proposed techniques on a challenging social media English video dataset, and achieve an average relative improvement of 11.2% with speech enhancement based data augmentation, 8.3% with enhancement based preprocessing, and 13.4% when combining both.
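
The abstract mentions a KL divergence based consistency loss computed between the ASR outputs of the original and enhanced utterances. The following is a minimal sketch of what such a loss could look like, not the authors' implementation. It assumes the ASR outputs are available as unnormalized logits with the vocabulary on the last axis, and it picks the direction KL(original || enhanced) arbitrarily; the abstract specifies neither. The names `log_softmax` and `kl_consistency_loss` are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def kl_consistency_loss(logits_orig, logits_enh):
    """Mean KL(p_orig || p_enh) over all leading positions.

    Assumption: both arrays have the same shape and hold the ASR output
    logits for the original and the enhanced version of the same utterance.
    """
    log_p = log_softmax(logits_orig)
    log_q = log_softmax(logits_enh)
    # KL divergence per position: sum_v p(v) * (log p(v) - log q(v))
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    orig = rng.normal(size=(2, 5, 8))  # (batch, positions, vocabulary)
    enh = orig + 0.1 * rng.normal(size=orig.shape)
    print(kl_consistency_loss(orig, enh))
```

In a training setup, a term like this would typically be added to the main ASR loss with a weighting factor, so that the model is encouraged to produce similar output distributions for an utterance and its enhanced counterpart. The weighting and the exact tensors it is applied to are details the abstract does not give.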
