Paper Title
MonoDistill: Learning Spatial Features for Monocular 3D Object Detection
Paper Authors
Paper Abstract
3D object detection is a fundamental and challenging task for 3D scene understanding, and monocular methods can serve as an economical alternative to stereo-based or LiDAR-based methods. However, accurately detecting objects in 3D space from a single image is extremely difficult due to the lack of spatial cues. To mitigate this issue, we propose a simple and effective scheme that introduces spatial information from LiDAR signals into monocular 3D detectors without adding any extra cost in the inference phase. In particular, we first project the LiDAR signals onto the image plane and align them with the RGB images. After that, we use the resulting data to train a 3D detector (LiDAR Net) with the same architecture as the baseline model. Finally, the LiDAR Net serves as a teacher that transfers its learned knowledge to the baseline model. Experimental results show that the proposed method significantly boosts the performance of the baseline model and ranks $1^{st}$ among all monocular-based methods on the KITTI benchmark. Besides, extensive ablation studies are conducted, which further demonstrate the effectiveness of each component of our design and illustrate what the baseline model has learned from the LiDAR Net. Our code will be released at \url{https://github.com/monster-ghost/MonoDistill}.
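The abstract describes two concrete steps: projecting LiDAR signals onto the image plane so they are aligned with the RGB image, and transferring the LiDAR Net teacher's knowledge to the monocular baseline. The sketch below illustrates the first step under the usual KITTI calibration convention (Tr_velo_to_cam, R0_rect, P2); it is a minimal illustration of the idea, not the authors' released code, and the function name and image-size arguments are assumptions.

```python
import numpy as np

def project_lidar_to_image(points, Tr_velo_to_cam, R0_rect, P2, img_h, img_w):
    """Project LiDAR points (N, 3) into a sparse depth map aligned with the RGB image.

    Tr_velo_to_cam: (3, 4) LiDAR-to-camera extrinsics, R0_rect: (3, 3) rectification,
    P2: (3, 4) camera projection matrix (KITTI-style calibration).
    """
    # Homogeneous LiDAR coordinates: (N, 4).
    pts_hom = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])

    # LiDAR frame -> rectified camera frame; keep only points in front of the camera.
    pts_cam = (R0_rect @ (Tr_velo_to_cam @ pts_hom.T)).T
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Rectified camera frame -> pixel coordinates via the projection matrix P2.
    pts_cam_hom = np.hstack([pts_cam, np.ones((pts_cam.shape[0], 1))])
    proj = (P2 @ pts_cam_hom.T).T
    u = (proj[:, 0] / proj[:, 2]).astype(np.int32)
    v = (proj[:, 1] / proj[:, 2]).astype(np.int32)
    depth = pts_cam[:, 2]

    # Splat depths of in-image projections into a sparse, pixel-aligned depth map.
    valid = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    depth_map = np.zeros((img_h, img_w), dtype=np.float32)
    depth_map[v[valid], u[valid]] = depth[valid]
    return depth_map
```

For the second step, one common way to realize teacher-to-student transfer between two detectors with identical architectures is to have the student imitate the teacher's intermediate feature maps alongside its own detection loss; the following is a hedged sketch of such a feature-imitation term (the exact losses used in the paper may differ, and the optional foreground mask is an assumption).

```python
import torch

def feature_imitation_loss(student_feat, teacher_feat, fg_mask=None):
    """L2 distance between student and teacher feature maps of the same shape.

    fg_mask: optional (B, 1, H, W) mask in {0, 1} restricting the loss to
    foreground regions, a common choice in detection distillation.
    """
    diff = (student_feat - teacher_feat.detach()) ** 2
    if fg_mask is not None:
        diff = diff * fg_mask
        return diff.sum() / fg_mask.sum().clamp(min=1.0)
    return diff.mean()
```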