Paper Title

Decomposing NeRF for Editing via Feature Field Distillation

Paper Authors

Sosuke Kobayashi, Eiichi Matsumoto, Vincent Sitzmann

Paper Abstract

Emerging neural radiance fields (NeRF) are a promising scene representation for computer graphics, enabling high-quality 3D reconstruction and novel view synthesis from image observations. However, editing a scene represented by a NeRF is challenging, as the underlying connectionist representations such as MLPs or voxel grids are not object-centric or compositional. In particular, it has been difficult to selectively edit specific regions or objects. In this work, we tackle the problem of semantic scene decomposition of NeRFs to enable query-based local editing of the represented 3D scenes. We propose to distill the knowledge of off-the-shelf, self-supervised 2D image feature extractors such as CLIP-LSeg or DINO into a 3D feature field optimized in parallel to the radiance field. Given a user-specified query of various modalities such as text, an image patch, or a point-and-click selection, 3D feature fields semantically decompose 3D space without the need for re-training and enable us to semantically select and edit regions in the radiance field. Our experiments validate that the distilled feature fields (DFFs) can transfer recent progress in 2D vision and language foundation models to 3D scene representations, enabling convincing 3D segmentation and selective editing of emerging neural graphics representations.
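The query-based selection step the abstract describes can be illustrated with a minimal sketch: once a feature field has been distilled, each 3D sample point carries a feature vector in the teacher's embedding space (e.g. CLIP-LSeg for text queries), so selecting a region reduces to thresholding the cosine similarity between per-point features and the query embedding. The function below is a hypothetical simplification for illustration, not the paper's implementation; the feature dimensions, threshold, and toy data are assumptions.

```python
import numpy as np

def select_region(point_features, query_embedding, threshold=0.5):
    """Select 3D points whose distilled feature matches the query.

    point_features: (N, D) features sampled from the 3D feature field.
    query_embedding: (D,) embedding of the user query (e.g. a CLIP text embedding).
    Returns a boolean mask over the N points; masked points can then be
    selectively edited (recolored, deleted, moved) in the radiance field.
    """
    f = point_features / np.linalg.norm(point_features, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    sim = f @ q  # cosine similarity of each point's feature with the query
    return sim > threshold

# Toy example: two clusters of point features, query aligned with the first.
rng = np.random.default_rng(0)
a = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.05, size=(5, 3))  # "matching" points
b = rng.normal(loc=[0.0, 1.0, 0.0], scale=0.05, size=(5, 3))  # background points
feats = np.vstack([a, b])
query = np.array([1.0, 0.0, 0.0])
mask = select_region(feats, query, threshold=0.8)
print(mask)
```

In the full method the feature field is optimized alongside the radiance field by volume-rendering features along camera rays and matching them to the 2D teacher's feature maps; at edit time, a mask like the one above gates which densities or colors are modified.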
