Talking Face Generation with Multilingual TTS

Paper title

Talking Face Generation with Multilingual TTS

Authors

Hyoung-Kyu Song, Sang Hoon Woo, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim

Abstract

In this work, we propose a joint system that combines a talking face generation system with a text-to-speech system and can generate multilingual talking face videos from text input alone. Our system can synthesize natural multilingual speech while maintaining the vocal identity of the speaker, with lip movements synchronized to the synthesized speech. We demonstrate the generalization capabilities of our system by selecting four languages (Korean, English, Japanese, and Chinese), each from a different language family. We also compare the outputs of our talking face generation model to the outputs of a prior work that claims multilingual support. For our demo, we add a translation API to the preprocessing stage and present the system in the form of a neural dubber so that users can utilize its multilingual property more easily.
