Speech Synthesis as Augmentation for Low-Resource ASR

Paper Title

Speech Synthesis as Augmentation for Low-Resource ASR

Authors

Deblin Bagchi, Shannon Wotherspoon, Zhuolin Jiang, Prasanna Muthukumar

Abstract

Speech synthesis might hold the key to low-resource speech recognition. Data augmentation techniques have become an essential part of modern speech recognition training. Yet, they are simple, naive, and rarely reflect real-world conditions. Meanwhile, speech synthesis techniques have been rapidly getting closer to the goal of achieving human-like speech. In this paper, we investigate the possibility of using synthesized speech as a form of data augmentation to lower the resources necessary to build a speech recognizer. We experiment with three different kinds of synthesizers: statistical parametric, neural, and adversarial. Our findings are interesting and point to new research directions for the future.
