Paper Title
Semantically-Guided Representation Learning for Self-Supervised Monocular Depth
Paper Authors
Paper Abstract
Self-supervised learning is showing great promise for monocular depth estimation, using geometry as the only source of supervision. Depth networks are indeed capable of learning representations that relate visual appearance to 3D properties by implicitly leveraging category-level patterns. In this work we investigate how to more directly leverage this semantic structure to guide geometric representation learning, while remaining in the self-supervised regime. Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture that leverages fixed pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions. Furthermore, we propose a two-stage training process that overcomes a common semantic bias on dynamic objects via resampling. Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, on fine-grained details, and per semantic category.
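To make the guidance mechanism concrete, below is a minimal PyTorch sketch of a pixel-adaptive convolution in the style of Su et al. (2019): a spatially shared filter is modulated per pixel by a Gaussian kernel over guidance-feature differences, so the effective filter adapts to semantic content. This is an illustrative reconstruction, not the paper's exact implementation; the guidance features are assumed to come from a fixed pretrained segmentation network, and `sigma` is an assumed kernel bandwidth.

```python
import torch
import torch.nn.functional as F

def pixel_adaptive_conv(x, guide, weight, sigma=1.0):
    """Sketch of a pixel-adaptive convolution (Su et al., 2019).

    x:      (B, C_in, H, W)     depth-network features
    guide:  (B, C_g, H, W)      guidance features (e.g., from a fixed
                                pretrained semantic segmentation network)
    weight: (C_out, C_in, k, k) spatially shared convolution filter
    """
    B, C, H, W = x.shape
    C_out, _, k, _ = weight.shape
    pad = k // 2
    # Extract k*k patches around every pixel: (B, C, k*k, H*W).
    x_patches = F.unfold(x, k, padding=pad).view(B, C, k * k, H * W)
    g_patches = F.unfold(guide, k, padding=pad).view(B, guide.shape[1], k * k, H * W)
    g_center = guide.reshape(B, guide.shape[1], 1, H * W)
    # Gaussian kernel on guidance-feature differences: (B, k*k, H*W).
    # Neighbors that look semantically different from the center get down-weighted.
    adapt = torch.exp(-0.5 * ((g_patches - g_center) ** 2).sum(1) / sigma ** 2)
    modulated = (x_patches * adapt.unsqueeze(1)).reshape(B, C * k * k, H * W)
    # Apply the shared filter to the per-pixel modulated patches.
    out = torch.einsum('oi,bil->bol', weight.reshape(C_out, C * k * k), modulated)
    return out.view(B, C_out, H, W)

# Toy usage with random tensors (shapes only; not trained weights).
x = torch.randn(2, 8, 32, 32)
guide = torch.randn(2, 4, 32, 32)
weight = torch.randn(16, 8, 3, 3)
print(pixel_adaptive_conv(x, guide, weight).shape)  # torch.Size([2, 16, 32, 32])
```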
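The second ingredient, resampling against the semantic bias on dynamic objects, could look roughly like the sketch below: up-weighting training images that contain dynamic-object pixels in the second training stage. The abstract gives no implementation details, so everything here is an assumption, including the hypothetical `dynamic_fractions` statistic and the `boost` knob.

```python
import torch
from torch.utils.data import WeightedRandomSampler

def make_resampler(dynamic_fractions, boost=5.0):
    """Over-sample images containing dynamic objects (illustrative only).

    dynamic_fractions: 1-D tensor; per-image fraction of pixels that a
        pretrained segmentation network labels as a dynamic class
        (car, pedestrian, cyclist, ...). Hypothetical precomputed statistic.
    boost: assumed knob controlling how strongly dynamic content is up-weighted.
    """
    weights = 1.0 + boost * dynamic_fractions
    return WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)

# Stage 1 would train on the ordinary shuffled dataset; stage 2 would
# fine-tune with a DataLoader built on this sampler, e.g.:
#   loader = DataLoader(dataset, batch_size=4,
#                       sampler=make_resampler(dynamic_fractions))
```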