Paper title
On the Role of Lip Articulation in Visual Speech Perception
Paper authors
Paper abstract
Generating realistic lip motion from audio to simulate speech production is critical for driving natural character animation. Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not a good indicator of subjective opinion of animation quality. Devising metrics that align with subjective opinion first requires understanding what impacts human perception of quality. In this work, we focus on the degree of articulation and run a series of experiments to study how articulation strength impacts human perception of lip motion accompanying speech. Specifically, we study how increasing under-articulated (dampened) and over-articulated (exaggerated) lip motion affects human perception of quality. We examine the impact of articulation strength on human perception when considering only lip motion, where viewers are presented with talking faces represented by landmarks, and in the context of embodied characters, where viewers are presented with photo-realistic videos. Our results show that viewers prefer over-articulated lip motion consistently more than under-articulated lip motion and that this preference generalizes across different speakers and embodiments.