QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation

Paper Title

QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation

Authors

Hong, Yuxin; Qian, Xuelin; Luo, Simian; Xue, Xiangyang; Fu, Yanwei

Abstract

This paper studies the task of conditional Human Motion Animation (cHMA). Given a source image and a driving video, the model should generate a new frame sequence in which the person in the source image performs motion similar to the pose sequence of the driving video. Despite the success of Generative Adversarial Network (GAN) methods in image and video synthesis, cHMA remains very challenging because it is difficult both to use the conditional guidance information (such as images or poses) efficiently and to generate images of good visual quality. To this end, this paper proposes a novel model that learns to Quantize, Scrabble, and Craft (QS-Craft) for conditional human motion animation. The key novelty lies in three newly introduced steps: quantize, scrabble, and craft. In particular, QS-Craft employs a transformer in its structure to exploit attention architectures. The guidance information is represented as a pose coordinate sequence extracted from the driving video. Extensive experiments on human motion datasets validate the efficacy of the model.
