Paper title
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores
Authors
Abstract
Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, it would be desirable to have accurate MOS prediction models for automatic evaluation. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech. A proposed module is also added to model the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on the BVCC dataset. The zero-shot transfer result on the BC2019 dataset is also significantly improved. DDOS also won second place in the Interspeech 2022 VoiceMOS Challenge in terms of system-level score.
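To make the relation between an opinion score distribution and the MOS concrete, the following minimal sketch computes the empirical distribution of listener ratings for one utterance and recovers the MOS as its expectation. It is not the DDOS model: the 5-point scale, the function names, and the ratings are assumptions made only for illustration.

```python
from collections import Counter

SCORES = (1, 2, 3, 4, 5)  # assumed 5-point opinion scale


def opinion_distribution(ratings):
    """Return the empirical probability of each score for one utterance."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    total = len(ratings)
    return [counts.get(s, 0) / total for s in SCORES]


def mos_from_distribution(probs):
    """Return the mean opinion score as the expectation of the distribution."""
    return sum(s * p for s, p in zip(SCORES, probs))


# Made-up listener ratings for a single utterance (illustrative only).
ratings = [4, 4, 3, 5, 4, 2, 4, 3]
probs = opinion_distribution(ratings)
mos = mos_from_distribution(probs)
print(probs, mos)
```

The expectation of the distribution equals the plain average of the ratings, so a model that predicts the full distribution can still report a single MOS while retaining information about how much listeners disagree.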