Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis

Paper Title

Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis

Authors

Fan-Lin Wang, Po-chun Hsu, Da-rong Liu, Hung-yi Lee

Abstract

Most recent speech synthesis systems are composed of a synthesizer and a vocoder. However, existing synthesizers and vocoders can only be matched to acoustic features extracted with a specific configuration. Hence, we cannot combine arbitrary synthesizers and vocoders to form a complete system, let alone pair them with a newly developed model. In this paper, we propose Universal Adaptor, which takes a Mel-spectrogram parametrized by a source configuration and converts it into a Mel-spectrogram parametrized by a target configuration, as long as the source and target configurations are provided as input. Experiments show that the quality of speech synthesized from the output of Universal Adaptor is comparable to that of speech synthesized from ground-truth Mel-spectrograms, in both single-speaker and multi-speaker scenarios. Moreover, Universal Adaptor can be applied to recent TTS systems and voice conversion systems without degrading quality.
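
To make the configuration mismatch described in the abstract concrete, the following minimal sketch computes the shape of a Mel-spectrogram for the same one-second utterance under two configurations. It is only an illustration: the `MelConfig` class, the `mel_shape` helper, and all parameter values are hypothetical and are not taken from the paper, and the frame count assumes the common centered-framing convention of one frame per hop plus one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MelConfig:
    """Hypothetical feature-extraction settings (not from the paper)."""
    sample_rate: int   # audio sampling rate in Hz
    n_fft: int         # FFT size in samples
    hop_length: int    # hop between successive frames in samples
    n_mels: int        # number of Mel filterbank channels


def mel_shape(config: MelConfig, duration_s: float) -> tuple:
    """Return (n_mels, num_frames) for audio of the given duration.

    Assumes centered framing, where the number of frames is
    1 + num_samples // hop_length.
    """
    num_samples = int(round(duration_s * config.sample_rate))
    num_frames = 1 + num_samples // config.hop_length
    return (config.n_mels, num_frames)


# Illustrative values only: a synthesizer trained with one configuration
# and a vocoder trained with another.
source = MelConfig(sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80)
target = MelConfig(sample_rate=24000, n_fft=1200, hop_length=300, n_mels=100)

print(mel_shape(source, 1.0))  # (80, 87)
print(mel_shape(target, 1.0))  # (100, 81)
```

Under these assumed settings the two feature matrices differ in both the number of Mel channels and the number of frames, so a vocoder expecting the second layout cannot directly consume output in the first. The adaptor described in the abstract is meant to bridge this kind of gap.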
