Paper Title

Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN

Authors

Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda

Abstract

In this paper, we present a description of the baseline system of Voice Conversion Challenge (VCC) 2020 with a cyclic variational autoencoder (CycleVAE) and Parallel WaveGAN (PWG), i.e., CycleVAEPWG. CycleVAE is a nonparallel VAE-based voice conversion that utilizes converted acoustic features to consider cyclically reconstructed spectra during optimization. On the other hand, PWG is a non-autoregressive neural vocoder that is based on a generative adversarial network for a high-quality and fast waveform generator. In practice, the CycleVAEPWG system can be straightforwardly developed with the VCC 2020 dataset using a unified model for both Task 1 (intralingual) and Task 2 (cross-lingual), where our open-source implementation is available at https://github.com/bigpon/vcc20_baseline_cyclevae. The results of VCC 2020 have demonstrated that the CycleVAEPWG baseline achieves the following: 1) a mean opinion score (MOS) of 2.87 in naturalness and a speaker similarity percentage (Sim) of 75.37% for Task 1, and 2) a MOS of 2.56 and a Sim of 56.46% for Task 2, showing an approximately or nearly average score for naturalness and an above average score for speaker similarity.
