Title
Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices
Authors
Abstract
We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system is able to produce high-quality synthetic speech at 48 kHz on devices where neural vocoders are otherwise prohibitively complex. The system is trained adversarially using differentiable pitch-synchronous overlap-add, and reduces complexity by relying on a pitch-synchronous Inverse Short-Time Fourier Transform (ISTFT) to generate speech samples. Our system achieves quality comparable to a strong (HiFi-GAN) baseline while using only a fraction of the compute. We present results of a perceptual evaluation as well as an analysis of system complexity.
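As a rough illustration of the pitch-synchronous ISTFT idea mentioned in the abstract, the NumPy sketch below inverts one spectrum per pitch period and overlap-adds the resulting frames at the pitch marks. It is not the Puffin implementation: the function name `ps_istft_ola`, the Hann window, and the assumption that per-frame complex spectra and integer pitch marks are already available (for example, predicted by a network) are all illustrative choices.

```python
import numpy as np

def ps_istft_ola(spectra, pitch_marks, n_samples):
    """Overlap-add inverse-FFT frames centred on pitch marks (illustrative sketch).

    spectra:     (n_frames, n_fft // 2 + 1) complex one-sided spectra (assumed given)
    pitch_marks: integer sample index of each frame centre, 0 <= mark <= n_samples
    n_samples:   length of the output waveform
    """
    spectra = np.asarray(spectra)
    n_fft = 2 * (spectra.shape[1] - 1)
    half = n_fft // 2
    window = np.hanning(n_fft)  # assumed synthesis window

    # Pad by half a frame on each side so frames near the edges fit.
    out = np.zeros(n_samples + n_fft)
    for spec, mark in zip(spectra, pitch_marks):
        # One inverse real FFT per pitch period yields n_fft samples at once.
        frame = np.fft.irfft(spec, n=n_fft) * window
        # In the padded buffer, index `mark` corresponds to sample `mark - half`.
        out[mark:mark + n_fft] += frame
    return out[half:half + n_samples]
```

In this sketch the cost per pitch period is a single inverse FFT plus a windowed addition, which hints at why generating samples frame by frame can be cheaper than predicting every sample with a deep upsampling network.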