Paper Title

Human Performance Modeling and Rendering via Neural Animated Mesh

Authors

Fuqiang Zhao, Yuheng Jiang, Kaixin Yao, Jiakai Zhang, Liao Wang, Haizhao Dai, Yuhui Zhong, Yingliang Zhang, Minye Wu, Lan Xu, Jingyi Yu

Abstract

We have recently seen tremendous progress in neural advances for photo-real human modeling and rendering. However, it remains challenging to integrate them into existing mesh-based pipelines for downstream applications. In this paper, we present a comprehensive neural approach for high-quality reconstruction, compression, and rendering of human performances from dense multi-view videos. Our core intuition is to bridge the traditional animated mesh workflow with a new class of highly efficient neural techniques. We first introduce a neural surface reconstructor for high-quality surface generation in minutes, which marries implicit volumetric rendering of a truncated signed distance field (TSDF) with multi-resolution hash encoding. We further propose a hybrid neural tracker to generate animated meshes, combining explicit non-rigid tracking with implicit dynamic deformation in a self-supervised framework: the explicit stage provides a coarse warp back into the canonical space, while the implicit stage further predicts displacements using 4D hash encoding, as in our reconstructor. We then discuss rendering schemes using the obtained animated meshes, ranging from dynamic texturing to lumigraph rendering under various bandwidth settings. To strike an intricate balance between quality and bandwidth, we propose a hierarchical solution that first renders 6 virtual views covering the performer and then conducts occlusion-aware neural texture blending. We demonstrate the efficacy of our approach in a variety of mesh-based applications and photo-realistic free-view experiences on various platforms, e.g., inserting virtual human performances into real environments through mobile AR, or immersively watching talent shows with VR headsets.
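The multi-resolution hash encoding at the heart of the reconstructor can be illustrated with a small sketch. Everything below (the number of levels, table size, feature width, hashing primes, and the use of NumPy) is an illustrative assumption for demonstration, not the paper's actual configuration:

```python
import numpy as np

# Large primes for the spatial hash (the first coordinate is left unhashed,
# a common convention); these constants are an assumption, not the paper's.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_grid_encode(xyz, num_levels=4, base_res=16, growth=2.0,
                     table_size=2**14, feat_dim=2, seed=0):
    """Encode points in [0, 1]^3 into concatenated per-level features."""
    rng = np.random.default_rng(seed)
    # One small feature table per level; in training these entries would be
    # learnable parameters, here they are just randomly initialized.
    tables = [rng.normal(scale=1e-4, size=(table_size, feat_dim))
              for _ in range(num_levels)]
    feats = []
    for lvl in range(num_levels):
        res = base_res * growth ** lvl          # finer grid at each level
        scaled = xyz * res
        corner = np.floor(scaled).astype(np.uint64)   # lower cell corner
        w = scaled - corner                           # trilinear weights
        acc = np.zeros((xyz.shape[0], feat_dim))
        # Trilinearly interpolate features gathered from the 8 cell corners.
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    c = corner + np.array([dx, dy, dz], dtype=np.uint64)
                    # Spatial hash of the integer corner coordinates.
                    h = (c[:, 0] * PRIMES[0]) ^ (c[:, 1] * PRIMES[1]) \
                        ^ (c[:, 2] * PRIMES[2])
                    idx = h % np.uint64(table_size)
                    weight = (np.where(dx, w[:, 0], 1 - w[:, 0]) *
                              np.where(dy, w[:, 1], 1 - w[:, 1]) *
                              np.where(dz, w[:, 2], 1 - w[:, 2]))
                    acc += weight[:, None] * tables[lvl][idx]
        feats.append(acc)
    # Concatenate coarse-to-fine features: shape (N, num_levels * feat_dim).
    return np.concatenate(feats, axis=-1)
```

In the actual method, features like these would feed a small MLP that regresses the TSDF (and, for the tracker, a 4D variant hashes space-time coordinates to predict per-frame displacements); the lookup-table structure is what makes both querying and training fast enough to reconstruct surfaces in minutes.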
