Paper title
Speech Synthesis as Augmentation for Low-Resource ASR
Paper authors
Paper abstract
Speech synthesis might hold the key to low-resource speech recognition. Data augmentation techniques have become an essential part of modern speech recognition training. Yet, they are simple, naive, and rarely reflect real-world conditions. Meanwhile, speech synthesis techniques have been rapidly getting closer to the goal of achieving human-like speech. In this paper, we investigate the possibility of using synthesized speech as a form of data augmentation to lower the resources necessary to build a speech recognizer. We experiment with three different kinds of synthesizers: statistical parametric, neural, and adversarial. Our findings are interesting and point to new research directions for the future.
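The abstract does not say how the synthesized and real speech are combined. The sketch below assumes the simplest scheme: use some TTS system to synthesize audio for extra text-only transcripts, then mix that audio into the real training set at a chosen ratio. It is not the authors' pipeline. `Utterance`, `augment_with_tts`, `synthesize` and `synth_ratio` are all hypothetical names. The `synthesize` callable stands in for any of the three synthesizer types (statistical parametric, neural or adversarial), and the sketch makes no assumption about which one is used.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Utterance:
    """Hypothetical ASR training example: audio reference plus transcript."""
    audio: str       # path or ID of the audio (assumption for illustration)
    text: str        # transcript
    synthetic: bool  # True if the audio was produced by a TTS system


def augment_with_tts(
    real: Sequence[Utterance],
    extra_texts: Sequence[str],
    synthesize: Callable[[str], str],
    synth_ratio: float,
    seed: int = 0,
) -> List[Utterance]:
    """Hypothetical helper: return the real utterances plus up to
    synth_ratio * len(real) synthetic ones, shuffled together."""
    if synth_ratio < 0:
        raise ValueError("synth_ratio must be non-negative")
    rng = random.Random(seed)  # seeded so the mixed set is reproducible
    # Never request more synthetic items than there are extra texts.
    n_synth = min(len(extra_texts), int(synth_ratio * len(real)))
    chosen = rng.sample(list(extra_texts), n_synth)
    # `synthesize` maps a transcript to a reference to the generated audio.
    synth = [Utterance(synthesize(t), t, True) for t in chosen]
    mixed = list(real) + synth
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real = [Utterance(f"real_{i}.wav", f"sentence {i}", False) for i in range(10)]
    texts = [f"extra sentence {i}" for i in range(100)]
    fake_tts = lambda text: "tts_" + text.replace(" ", "_") + ".wav"
    train = augment_with_tts(real, texts, fake_tts, synth_ratio=0.5)
    print(len(train), sum(u.synthetic for u in train))  # 15 5
```

The sketch stores a `synthetic` flag on each example so that a training loop could later down-weight or filter synthetic data. Varying `synth_ratio` is one straightforward way to measure how much synthetic speech helps when real data is scarce.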