Paper Title
3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation
Paper Authors
Paper Abstract
In contrast to the traditional avatar creation pipeline, which is a costly process, contemporary generative approaches directly learn the data distribution from photographs. While plenty of works extend unconditional generative models and achieve some level of controllability, it is still challenging to ensure multi-view consistency, especially in large poses. In this work, we propose a network that generates 3D-aware portraits while being controllable according to semantic parameters regarding pose, identity, expression, and illumination. Our network uses a neural scene representation to model 3D-aware portraits, whose generation is guided by a parametric face model that supports explicit control. While the latent disentanglement can be further enhanced by contrasting images with partially different attributes, there still exists noticeable inconsistency in non-face areas, e.g., hair and background, when animating expressions. We solve this by proposing a volume blending strategy in which we form a composite output by blending dynamic and static areas, with the two parts segmented from the jointly learned semantic field. Our method outperforms prior art in extensive experiments, producing realistic portraits with vivid expressions in natural lighting when viewed from free viewpoints. It also demonstrates generalization ability to real images as well as out-of-domain data, showing great promise in real applications.
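To make the volume blending idea in the abstract more concrete, below is a minimal sketch of how a dynamic (face) radiance field and a static (hair/background) field could be composited along one ray using per-sample probabilities from a semantic field. All names (composite_volume_blend, face_prob, deltas) and the specific blending formula are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def composite_volume_blend(sigma_dyn, rgb_dyn, sigma_sta, rgb_sta, face_prob, deltas):
    """Blend a dynamic (face) field with a static (hair/background) field along a ray.

    sigma_dyn, sigma_sta : (N,) densities from the dynamic / static branches
    rgb_dyn, rgb_sta     : (N, 3) colors from the dynamic / static branches
    face_prob            : (N,) semantic-field probability that a sample is in the face region
    deltas               : (N,) distances between consecutive samples along the ray
    """
    # Per-sample blend of the two branches, weighted by the semantic segmentation.
    sigma = face_prob * sigma_dyn + (1.0 - face_prob) * sigma_sta
    rgb = face_prob[:, None] * rgb_dyn + (1.0 - face_prob[:, None]) * rgb_sta

    # Standard volume rendering: alpha compositing of the blended field.
    alpha = 1.0 - np.exp(-sigma * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)  # final pixel color (3,)
```

Under this reading, expression changes only affect the dynamic branch, while hair and background come from the static branch, which is one plausible way the composite output stays consistent in non-face areas.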