Multi-Scale Speaking Style Modelling with Hierarchical Context Information

Paper Title

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

Authors

Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract

Previous works on expressive speech synthesis focus on modelling the mono-scale style embedding from the current sentence or context, but the multi-scale nature of speaking style in human speech is neglected. In this paper, we propose a multi-scale speaking style modelling method to capture and predict multi-scale speaking style for improving the naturalness and expressiveness of synthetic speech. A multi-scale extractor is proposed to extract speaking style embeddings at three different levels from the ground-truth speech, and explicitly guide the training of a multi-scale style predictor based on hierarchical context information. Both objective and subjective evaluations on a Mandarin audiobooks dataset demonstrate that our proposed method can significantly improve the naturalness and expressiveness of the synthesized speech.
