Paper Title
WaveNODE: A Continuous Normalizing Flow for Speech Synthesis
Authors
Abstract
In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real time. However, these models require either a well-trained teacher network or a large number of flow steps, which makes them memory-inefficient. In this paper, we propose a novel generative model called WaveNODE, which exploits a continuous normalizing flow for speech synthesis. Unlike conventional models, WaveNODE places no constraint on the function used for the flow operation, thus allowing the use of more flexible and complex functions. Moreover, WaveNODE can be optimized to maximize the likelihood without requiring any teacher network or auxiliary loss terms. We experimentally show that WaveNODE achieves comparable performance with fewer parameters than conventional flow-based vocoders.
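As a rough illustration of the continuous normalizing flow idea mentioned in the abstract, the sketch below integrates a state z(t) under dz/dt = f(z) while accumulating the log-density change d(log p)/dt = -tr(df/dz). It is not WaveNODE's implementation: the tanh dynamics, the fixed-step Euler integrator, and all names (`dynamics`, `cnf_forward`, `W`, `b`) are assumptions chosen only to show that f needs no special invertible structure.

```python
import math


def dynamics(z, W, b):
    """Hypothetical unconstrained dynamics f(z) = tanh(W z + b).

    Returns f(z) and the trace of its Jacobian df/dz.
    """
    n = len(z)
    pre = [sum(W[i][j] * z[j] for j in range(n)) + b[i] for i in range(n)]
    f = [math.tanh(p) for p in pre]
    # d f_i / d z_i = (1 - tanh(pre_i)^2) * W[i][i]
    trace = sum((1.0 - f[i] ** 2) * W[i][i] for i in range(n))
    return f, trace


def cnf_forward(z0, W, b, t1=1.0, steps=100):
    """Euler-integrate dz/dt = f(z) and d(log p)/dt = -tr(df/dz) on [0, t1].

    Returns the final state z(t1) and the accumulated change in log-density.
    """
    z = list(z0)
    delta_logp = 0.0
    h = t1 / steps
    for _ in range(steps):
        f, trace = dynamics(z, W, b)
        z = [zi + h * fi for zi, fi in zip(z, f)]
        delta_logp -= h * trace
    return z, delta_logp


def standard_normal_logpdf(z):
    """Log-density of an isotropic standard normal, used as the base density."""
    return sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)


if __name__ == "__main__":
    z0 = [0.3, -0.7]
    W = [[0.5, -0.2], [0.1, 0.4]]  # arbitrary illustrative parameters
    b = [0.0, 0.1]
    z1, delta_logp = cnf_forward(z0, W, b)
    # log p(z(t1)) = log p(z(0)) - integral of tr(df/dz) dt
    print(z1, standard_normal_logpdf(z0) + delta_logp)
```

In a practical system the Euler loop would typically be replaced by an adaptive ODE solver, and the exact trace by a cheaper estimate when the dimension is large; both are design choices outside what the abstract specifies.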