Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition

Paper title

Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition

Authors

Catalin Zorila, Rama Doddipatla

Abstract

Improving the accuracy of single-channel automatic speech recognition (ASR) in noisy conditions is challenging. Strong speech enhancement front-ends are available; however, they typically require the ASR model to be retrained to cope with the processing artifacts. In this paper we explore a speaker reinforcement strategy for improving recognition performance without retraining the acoustic model (AM). This is achieved by remixing the enhanced signal with the unprocessed input to alleviate the processing artifacts. We evaluate the proposed approach using a speech denoiser based on DNN speaker extraction and trained with a perceptually motivated loss function. Results show that, without AM retraining, our method yields relative accuracy gains of about 23% and 25% over the unprocessed input on the monaural simulated and real CHiME-4 evaluation sets, respectively, and outperforms a state-of-the-art reference method.
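
The remixing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the mixing rule, so everything here is an assumption rather than the authors' implementation: the function name `remix`, the parameter `reinforcement_db`, and the choice to sum the unprocessed input with a gain-scaled copy of the enhanced signal are all hypothetical.

```python
import numpy as np


def remix(noisy, enhanced, reinforcement_db=0.0):
    """Sum the unprocessed input with a scaled copy of the enhanced signal.

    Hypothetical sketch: `reinforcement_db` is an assumed knob giving the
    gain (in dB) applied to the enhanced signal before it is added back.
    """
    noisy = np.asarray(noisy, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    if noisy.shape != enhanced.shape:
        raise ValueError("signals must be time-aligned and of equal length")
    gain = 10.0 ** (reinforcement_db / 20.0)  # dB to linear amplitude
    return noisy + gain * enhanced


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    speech = 0.1 * np.sin(2 * np.pi * 220.0 * t)    # stand-in for the target speaker
    noisy = speech + 0.05 * rng.standard_normal(t.shape)
    enhanced = speech                               # stand-in for an ideal extractor output
    mixed = remix(noisy, enhanced, reinforcement_db=0.0)
    print(mixed.shape)
```

With an ideal extractor and a 0 dB setting, the target speaker's amplitude in the mixture doubles while the noise is unchanged, which is the intuition behind reinforcing the speaker while still passing the unprocessed input through to the recognizer.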
