Paper Title

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis

Paper Authors

Wei Cheng, Su Xu, Jingtan Piao, Chen Qian, Wayne Wu, Kwan-Yee Lin, Hongsheng Li

Paper Abstract

This work targets using a general deep learning framework to synthesize free-viewpoint images of arbitrary human performers, requiring only a sparse set of camera views as input and no per-case fine-tuning. The large variation in geometry and appearance, caused by articulated body poses, body shapes, and clothing types, is the key bottleneck of this task. To overcome these challenges, we present a simple yet powerful framework, named Generalizable Neural Performer (GNR), that learns a generalizable and robust neural body representation across diverse geometries and appearances. Specifically, we compress the light fields for novel-view human rendering into conditional implicit neural radiance fields, conditioned on both geometry and appearance. We first introduce an Implicit Geometric Body Embedding strategy to enhance robustness, based on both a parametric 3D human body model and multi-view image hints. We further propose a Screen-Space Occlusion-Aware Appearance Blending technique to preserve high-quality appearance, interpolating source-view appearance into the radiance fields under relaxed but approximate geometric guidance. To evaluate our method, we present our ongoing effort to construct a dataset of remarkable complexity and diversity. The dataset, GeneBody-1.0, includes over 360M frames of 370 subjects captured by multi-view cameras, performing a large variety of poses, with diverse body shapes, clothing, accessories, and hairstyles. Experiments on GeneBody-1.0 and ZJU-Mocap show that our method is more robust than recent state-of-the-art generalizable methods across all cross-dataset, unseen-subject, and unseen-pose settings. We also demonstrate the competitiveness of our model compared with cutting-edge case-specific methods. The dataset, code, and models will be made publicly available.
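The abstract names two technical components: an implicit geometric body embedding that conditions the radiance field on a parametric body model, and a screen-space occlusion-aware blending of source-view appearance. As a rough illustration of how such a conditional radiance field could be wired together, here is a minimal PyTorch sketch. It is not the authors' implementation: the module names, feature dimensions, and the simple visibility-weighted blend are all illustrative assumptions.

```python
# Minimal sketch of a conditional implicit radiance field in the spirit of GNR.
# NOT the paper's implementation: all names, dimensions, and the blending rule
# below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalRadianceField(nn.Module):
    """Maps a 3D point plus per-point conditions to (density, color)."""
    def __init__(self, geo_dim=32, app_dim=24, hidden=128):
        super().__init__()
        # Geometry branch: query point + implicit body embedding -> density + feature.
        self.geo_mlp = nn.Sequential(
            nn.Linear(3 + geo_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)
        # Appearance branch: geometry feature + blended source-view feature + view dir.
        self.rgb_head = nn.Sequential(
            nn.Linear(hidden + app_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, pts, view_dirs, geo_embed, app_feat):
        # pts: (N, 3) sample points; view_dirs: (N, 3) unit view directions.
        # geo_embed: (N, geo_dim) features queried from a parametric body model
        #            (e.g. signed distances / skinning features around SMPL).
        # app_feat: (N, app_dim) occlusion-aware blend of source-view features.
        h = self.geo_mlp(torch.cat([pts, geo_embed], dim=-1))
        sigma = F.softplus(self.sigma_head(h))               # non-negative density
        rgb = torch.sigmoid(self.rgb_head(
            torch.cat([h, app_feat, view_dirs], dim=-1)))    # color in [0, 1]
        return sigma, rgb

def occlusion_aware_blend(src_feats, visibility):
    """Weight V source-view features by approximate per-view visibility.

    src_feats:  (N, V, C) features sampled from V source views.
    visibility: (N, V) approximate visibility weights, e.g. derived from
                projecting points against a screen-space body-model depth.
    """
    w = visibility / visibility.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return (w.unsqueeze(-1) * src_feats).sum(dim=1)          # -> (N, C)

# Toy usage: 1024 sample points, 4 source views.
field = ConditionalRadianceField()
pts = torch.randn(1024, 3)
dirs = F.normalize(torch.randn(1024, 3), dim=-1)
geo = torch.randn(1024, 32)
app = occlusion_aware_blend(torch.rand(1024, 4, 24), torch.rand(1024, 4))
sigma, rgb = field(pts, dirs, geo, app)                      # (1024, 1), (1024, 3)
```

The per-point densities and colors would then be composited along each camera ray with standard volume rendering; the visibility weights here merely stand in for the paper's screen-space occlusion reasoning, which the abstract only summarizes.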
