Paper Title

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

Authors

Xiaofeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang

Abstract

Self-supervised monocular methods can efficiently learn depth information for weakly textured surfaces and reflective objects. However, their depth accuracy is limited by the inherent ambiguity of monocular geometric modeling. In contrast, multi-frame depth estimation methods improve depth accuracy thanks to the success of Multi-View Stereo (MVS), which directly makes use of geometric constraints. Unfortunately, MVS often suffers from texture-less regions, non-Lambertian surfaces, and moving objects, especially in real-world video sequences without known camera motion or depth supervision. Therefore, we propose MOVEDepth, which exploits MOnocular cues and VElocity guidance to improve multi-frame Depth learning. Unlike existing methods that enforce consistency between MVS depth and monocular depth, MOVEDepth boosts multi-frame depth learning by directly addressing the inherent problems of MVS. The key to our approach is to use monocular depth as a geometric prior to construct the MVS cost volume, and to adjust the depth candidates of the cost volume under the guidance of the predicted camera velocity. We further fuse monocular depth and MVS depth by learning the uncertainty in the cost volume, which yields depth estimates that are robust to the ambiguity of multi-view geometry. Extensive experiments show that MOVEDepth achieves state-of-the-art performance: compared with Monodepth2 and PackNet, our method relatively improves depth accuracy by 20% and 19.8% on the KITTI benchmark. MOVEDepth also generalizes to the more challenging DDAD benchmark, relatively outperforming ManyDepth by 7.2%. The code is available at https://github.com/JeffWang987/MOVEDepth.
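
The two ideas in the abstract, sampling MVS depth candidates around a monocular prior with a velocity-scaled range, and fusing monocular and MVS depth with a learned uncertainty weight, can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, tensor shapes, and the specific scaling and fusion rules below are illustrative assumptions.

import torch

def build_depth_candidates(mono_depth, velocity, num_candidates=16, base_range=0.2):
    # mono_depth: (B, 1, H, W) monocular depth prior from the single-frame branch.
    # velocity: (B,) predicted camera speed; an assumed heuristic widens the
    # candidate range for faster motion (larger expected parallax).
    range_scale = base_range * (1.0 + velocity.view(-1, 1, 1, 1))
    offsets = torch.linspace(-1.0, 1.0, num_candidates, device=mono_depth.device)
    offsets = offsets.view(1, num_candidates, 1, 1)
    # Candidates have shape (B, D, H, W), centered on the monocular prior.
    return mono_depth * (1.0 + range_scale * offsets)

def fuse_depths(mono_depth, mvs_depth, uncertainty):
    # uncertainty: (B, 1, H, W) in [0, 1], e.g. derived from the cost-volume
    # matching distribution; high uncertainty falls back to the monocular depth.
    return uncertainty * mono_depth + (1.0 - uncertainty) * mvs_depth

A downstream MVS head would warp source-frame features onto these candidate depths to build the cost volume and regress the multi-frame depth; that step is omitted here.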
