Paper Title
3D Object Aided Self-Supervised Monocular Depth Estimation
Paper Authors
Paper Abstract
Monocular depth estimation has been actively studied in fields such as robot vision, autonomous driving, and 3D scene understanding. Given a sequence of color images, unsupervised learning methods based on the Structure-from-Motion (SfM) framework simultaneously predict depth and relative camera pose. However, dynamically moving objects in the scene violate the static-world assumption, resulting in inaccurate depths for dynamic objects. In this work, we propose a new method that accounts for such dynamic object motion through monocular 3D object detection. Specifically, we first detect 3D objects in the images and build per-pixel correspondences for the dynamic pixels using the detected object poses, while the static pixels corresponding to the rigid background remain modeled by camera motion. In this way, the depth of every pixel can be learned via a meaningful geometric model. In addition, objects are detected as cuboids with absolute scale, which is used to eliminate the scale ambiguity inherent in monocular vision. Experiments on the KITTI depth dataset show that our method achieves state-of-the-art performance for depth estimation. Furthermore, joint training of depth, camera motion, and object pose also improves monocular 3D object detection performance. To the best of our knowledge, this is the first work that allows a monocular 3D object detection network to be fine-tuned in a self-supervised manner.
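The core geometric idea in the abstract, splitting per-pixel correspondence between camera ego-motion for static background pixels and an additional object pose change for dynamic pixels, can be sketched as below. This is a minimal illustration, not the paper's actual implementation: the function names, the simple pose composition `T_obj @ T_cam`, and the single-pixel interface are all assumptions for clarity.

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with its predicted depth into 3D camera coordinates."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def project(P, K):
    """Project a 3D camera-frame point back to pixel coordinates."""
    p = K @ P
    return p[:2] / p[2]

def warp_pixel(u, v, depth, K, T_cam, T_obj=None, is_dynamic=False):
    """Warp one pixel into the adjacent frame.

    Static pixels follow only the relative camera pose T_cam; dynamic pixels
    (those inside a detected 3D object) are additionally transformed by the
    object's own motion T_obj. Both poses are 4x4 homogeneous transforms.
    """
    P = np.append(backproject(u, v, depth, K), 1.0)  # homogeneous 3D point
    # Hypothetical composition: apply object motion first for dynamic pixels.
    T = (T_obj @ T_cam) if (is_dynamic and T_obj is not None) else T_cam
    P_next = (T @ P)[:3]
    return project(P_next, K)
```

The photometric self-supervision would then compare the source image sampled at the warped coordinates against the target frame, so every pixel, static or dynamic, receives a geometrically meaningful training signal.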