Paper Title

Motion and Appearance Adaptation for Cross-Domain Motion Transfer

Paper Authors

Borun Xu, Biao Wang, Jinhong Deng, Jiale Tao, Tiezheng Ge, Yuning Jiang, Wen Li, Lixin Duan

Paper Abstract

Motion transfer aims to transfer the motion of a driving video to a source image. When there are considerable differences between the object in the driving video and the one in the source image, traditional single-domain motion transfer approaches often produce notable artifacts; for example, the synthesized image may fail to preserve the human shape of the source image (cf. Fig. 1(a)). To address this issue, in this work we propose a Motion and Appearance Adaptation (MAA) approach for cross-domain motion transfer, in which we regularize the object in the synthesized image to capture the motion of the object in the driving frame, while still preserving the shape and appearance of the object in the source image. On the one hand, considering that the object shapes of the synthesized image and the driving frame may differ, we design a shape-invariant motion adaptation module that enforces the consistency of the angles of object parts in the two images to capture the motion information. On the other hand, we introduce a structure-guided appearance consistency module designed to regularize the similarity between corresponding patches of the synthesized image and the source image without affecting the learned motion in the synthesized image. Our proposed MAA model can be trained in an end-to-end manner with a cyclic reconstruction loss, and it ultimately produces satisfactory motion transfer results (cf. Fig. 1(b)). We conduct extensive experiments on a human dancing setting (Mixamo-Video to Fashion-Video) and a human face setting (Vox-Celeb to CUFS); in both, our MAA model outperforms existing methods both quantitatively and qualitatively.
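To make the two regularizers concrete, below is a minimal PyTorch sketch of how an angle-based motion term and a patch-based appearance term could look. It assumes 2D part keypoints are available for both images and that patch correspondences are grid-aligned; the function names, the (B, K, 2) keypoint layout, and the part_pairs connectivity are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn.functional as F

def part_angles(keypoints, part_pairs):
    # keypoints: (B, K, 2) 2D coordinates; part_pairs: list of (i, j) keypoint
    # index pairs, one per object part. Returns (B, P) part angles in radians.
    angles = []
    for i, j in part_pairs:
        d = keypoints[:, j] - keypoints[:, i]      # (B, 2) part direction vector
        angles.append(torch.atan2(d[:, 1], d[:, 0]))
    return torch.stack(angles, dim=1)

def motion_adaptation_loss(kp_synth, kp_drive, part_pairs):
    # Shape-invariant motion term: match the angles of corresponding parts in
    # the synthesized image and the driving frame, so that differing part
    # lengths (i.e., object shapes) are not penalized. Angle differences are
    # wrapped to (-pi, pi] before squaring.
    diff = part_angles(kp_synth, part_pairs) - part_angles(kp_drive, part_pairs)
    diff = torch.atan2(torch.sin(diff), torch.cos(diff))
    return (diff ** 2).mean()

def appearance_consistency_loss(synth, source, patch=8, stride=8):
    # Appearance term: pull corresponding patches of the synthesized and source
    # images toward high cosine similarity. Patches are naively taken from the
    # same grid locations here; the paper's module instead matches patches
    # under structural guidance so the learned motion is left untouched.
    p_synth = F.unfold(synth, kernel_size=patch, stride=stride)  # (B, C*p*p, N)
    p_src = F.unfold(source, kernel_size=patch, stride=stride)
    return (1.0 - F.cosine_similarity(p_synth, p_src, dim=1)).mean()

Matching angles rather than keypoint positions is what makes the motion term shape-invariant: two skeletons with different limb lengths can still agree on every part angle, which matches the abstract's stated goal of capturing the driving motion without transferring the driving object's shape.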
