Paper Title
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation
Paper Authors
Paper Abstract
We infer and generate three-dimensional (3D) scene information from a single input image and without supervision. This problem is under-explored, with most prior work relying on supervision from, e.g., 3D ground-truth, multiple images of a scene, image silhouettes, or key-points. We propose Pix2Shape, an approach to solve this problem with four components: (i) an encoder that infers the latent 3D representation from an image, (ii) a decoder that generates an explicit 2.5D surfel-based reconstruction of a scene from the latent code, (iii) a differentiable renderer that synthesizes a 2D image from the surfel representation, and (iv) a critic network trained to discriminate between images generated by the decoder-renderer and those from a training distribution. Pix2Shape can generate complex 3D scenes that scale with the view-dependent on-screen resolution, unlike representations that capture world-space resolution, i.e., voxels or meshes. We show that Pix2Shape learns a consistent scene representation in its encoded latent space and that the decoder can then be applied to this latent representation in order to synthesize the scene from a novel viewpoint. We evaluate Pix2Shape with experiments on the ShapeNet dataset as well as on a novel benchmark we developed, called 3D-IQTT, to evaluate models based on their ability to enable 3D spatial reasoning. Qualitative and quantitative evaluations demonstrate Pix2Shape's ability to solve scene reconstruction, generation, and understanding tasks.
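The four-component pipeline described in the abstract (encoder → latent code → surfel decoder → differentiable renderer, with a critic scoring the result) can be sketched as follows. This is a minimal illustrative NumPy mock-up, not the paper's implementation: the networks are replaced by random linear maps, the renderer by toy Lambertian shading, and all dimensions (32×32 images, a 64-dim latent) are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 32x32 images, 64-dim latent.
H = W = 32
LATENT = 64

# (i) Encoder: image -> latent 3D code (a linear map as a stand-in for a CNN).
W_enc = rng.standard_normal((LATENT, H * W)) * 0.01
def encode(image):
    return np.tanh(W_enc @ image.ravel())

# (ii) Decoder: latent code -> per-pixel 2.5D surfels (depth + unit normal).
# One surfel per on-screen pixel, so detail scales with view resolution
# rather than with a fixed world-space grid (unlike voxels or meshes).
W_dec = rng.standard_normal((H * W * 4, LATENT)) * 0.01
def decode(z):
    out = (W_dec @ z).reshape(H, W, 4)
    depth = np.exp(out[..., 0])                    # strictly positive depth
    normals = out[..., 1:]
    normals = normals / (np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8)
    return depth, normals

# (iii) Differentiable renderer: Lambertian shading of the surfel map under a
# fixed directional light (a toy stand-in for the paper's renderer).
light = np.array([0.0, 0.0, 1.0])
def render(depth, normals):
    shading = np.clip(normals @ light, 0.0, 1.0)
    return shading / (1.0 + depth)                 # simple depth attenuation

# (iv) Critic: scores an image as real vs. rendered (linear stand-in).
w_critic = rng.standard_normal(H * W) * 0.01
def critic(image):
    return float(w_critic @ image.ravel())

# One forward pass of the full pipeline on a random "input image".
img = rng.random((H, W))
z = encode(img)
depth, normals = decode(z)
fake = render(depth, normals)
score = critic(fake)
```

In training, the critic's score would drive an adversarial loss so that rendered images become indistinguishable from the training distribution; here the pass only illustrates how the four components compose.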