Title
NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields
Authors
Abstract
Neural Radiance Fields (NeRFs) have become a widely applied scene representation technique in recent years, showing advantages for robot navigation and manipulation tasks. To further advance the utility of NeRFs for robotics, we propose NeRF-Loc, a transformer-based framework that extracts 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and a camera view as input and produces labeled, oriented 3D bounding boxes of objects as output. Using current NeRF training tools, a robot can train a NeRF environment model in real time and, using our algorithm, identify 3D bounding boxes of objects of interest within the NeRF for downstream navigation or manipulation tasks. Concretely, we design a pair of parallel transformer encoder branches, namely the coarse stream and the fine stream, to encode both the context and the details of target objects. The encoded features are then fused with attention layers to alleviate ambiguities and localize objects accurately. We compare our method with conventional RGB(-D)-based methods that take RGB images and depth maps rendered from NeRFs as input, and our method outperforms these baselines.
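To make the two-stream idea concrete, the following is a minimal, dependency-free sketch of a coarse stream and a fine stream that are each encoded with self-attention and then fused with cross-attention before a box head predicts one oriented 3D box. It is not the NeRF-Loc implementation: the abstract does not specify token sources, feature dimensions, layer counts, learned query/key/value projections, the fusion direction, or the box parameterization. Everything here is an assumption for illustration, including the hypothetical names `encoder_branch`, `fuse_streams`, `box_head`, and `BOX_PARAMS`, the choice to let fine tokens query coarse context, and the center/size/yaw box format.

```python
import math
import random

# Hypothetical oriented-box parameterization: center, size, and yaw about the up axis.
BOX_PARAMS = ("cx", "cy", "cz", "w", "h", "l", "yaw")


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention (no learned projections, for brevity)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def encoder_branch(tokens):
    """Hypothetical one-layer encoder: self-attention plus a residual connection."""
    attended = attention(tokens, tokens, tokens)
    return [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]


def fuse_streams(coarse_tokens, fine_tokens):
    """Encode both streams in parallel, then let fine tokens attend to coarse context."""
    coarse = encoder_branch(coarse_tokens)
    fine = encoder_branch(fine_tokens)
    context = attention(fine, coarse, coarse)  # assumed fusion direction
    return [[f + c for f, c in zip(fv, cv)] for fv, cv in zip(fine, context)]


def box_head(fused, weights):
    """Mean-pool fused tokens and linearly regress one oriented 3D box."""
    dim = len(fused[0])
    pooled = [sum(tok[j] for tok in fused) / len(fused) for j in range(dim)]
    raw = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    box = dict(zip(BOX_PARAMS, raw))
    for k in ("w", "h", "l"):
        box[k] = math.exp(box[k])  # exponentiate so box sizes stay positive
    # Wrap yaw into [-pi, pi] so equivalent orientations share one representation.
    box["yaw"] = math.atan2(math.sin(box["yaw"]), math.cos(box["yaw"]))
    return box


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    coarse = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]   # few low-resolution tokens
    fine = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]    # more high-resolution tokens
    head = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in BOX_PARAMS]
    print(box_head(fuse_streams(coarse, fine), head))
```

The residual connections keep each stream's own features while the cross-attention step adds context from the other stream; a real model would add learned projections, multiple heads and layers, normalization, and a classification output for the box label.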