Paper title
Voice conversion with limited data and limitless data augmentations
Authors
Abstract
Voice conversion (VC) is the task of modifying an input speech signal so that the perceived speaker changes to a target speaker while the content of the input is preserved; it is a challenging but interesting problem. Over the last few years, this task has gained significant interest, and most systems now use data-driven machine learning models. Performing the conversion in a low-latency, real-world scenario is even more challenging and is constrained by the availability of high-quality data. Data augmentations such as pitch shifting and noise addition are often used to increase the amount of data available for training machine-learning-based models for this task. In this paper, we explore the efficacy of common data augmentation techniques for real-time voice conversion and introduce novel data augmentation techniques based on audio and voice transformation effects. We evaluate the conversions for both male and female target speakers using objective and subjective evaluation methodologies.