Paper Title
Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis
Paper Authors
Paper Abstract
We tackle human image synthesis, including human motion imitation, appearance transfer, and novel view synthesis, within a unified framework: once trained, the model can handle all of these tasks. Existing task-specific methods mainly use 2D keypoints to estimate human body structure. However, keypoints only express position information and can neither characterize a person's individual body shape nor model limb rotations. In this paper, we propose to use a 3D body mesh recovery module to disentangle pose and shape; it models not only joint locations and rotations but also the personalized body shape. To preserve source information such as texture, style, color, and face identity, we propose an Attentional Liquid Warping GAN with an Attentional Liquid Warping Block (AttLWB) that propagates the source information in both image and feature space to the synthesized reference. Specifically, the source features are extracted by a denoising convolutional auto-encoder to characterize the source identity well. Furthermore, our proposed method supports more flexible warping from multiple sources. To further improve generalization to unseen source images, one/few-shot adversarial learning is applied: the model is first trained on an extensive training set and then fine-tuned on one or a few unseen images in a self-supervised way to generate high-resolution (512 x 512 and 1024 x 1024) results. In addition, we build a new dataset, the iPER dataset, for evaluating human motion imitation, appearance transfer, and novel view synthesis. Extensive experiments demonstrate the effectiveness of our method in preserving face identity, shape consistency, and clothing details. All code and the dataset are available at https://impersonator.org/work/impersonator-plus-plus.html.
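To make the core mechanism concrete, the sketch below illustrates the general idea behind flow-based feature warping with attention-weighted multi-source fusion, as the abstract describes for AttLWB. This is a minimal, hypothetical PyTorch sketch, not the authors' actual implementation: the helper names (`warp_features`, `attentional_fusion`, `attn_net`) are assumptions, and the flow field is assumed to be precomputed elsewhere (e.g., from 3D body mesh correspondences).

```python
import torch
import torch.nn.functional as F

def warp_features(src_feats, flow):
    """Bilinearly warp source feature maps toward the target pose.

    src_feats: (N, C, H, W) features from the source encoder.
    flow:      (N, H, W, 2) normalized sampling grid in [-1, 1],
               assumed to come from 3D mesh correspondences.
    """
    return F.grid_sample(src_feats, flow, align_corners=True)

def attentional_fusion(warped_list, tgt_feats, attn_net):
    """Fuse several warped source feature maps into the target stream.

    warped_list: list of (N, C, H, W) warped source features,
                 one entry per source image (multi-source support).
    tgt_feats:   (N, C, H, W) features of the synthesis stream.
    attn_net:    a small conv net (hypothetical) that predicts one
                 attention map per source, output shape (N, S, H, W).
    """
    stacked = torch.stack(warped_list, dim=1)            # (N, S, C, H, W)
    n, s, c, h, w = stacked.shape
    # Predict per-source attention from all warped features plus the target.
    attn_in = torch.cat([stacked.view(n, s * c, h, w), tgt_feats], dim=1)
    attn = torch.softmax(attn_net(attn_in), dim=1)       # (N, S, H, W)
    # Attention-weighted sum over sources, then inject into the target stream.
    fused = (stacked * attn.unsqueeze(2)).sum(dim=1)     # (N, C, H, W)
    return tgt_feats + fused
```

Under these assumptions, warping in feature space (rather than only in image space) lets the generator reuse encoded texture, color, and identity cues from every source view, while the softmax attention decides, per spatial location, which source contributes most to the synthesized result.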