Self-Supervised Human Depth Estimation from Monocular Videos

Paper Title

Self-Supervised Human Depth Estimation from Monocular Videos

Authors

Feitong Tan, Hao Zhu, Zhaopeng Cui, Siyu Zhu, Marc Pollefeys, Ping Tan

Abstract

Previous methods for estimating detailed human depth often require supervised training with "ground truth" depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which simplifies training data collection and improves the generalization of the learned network. Self-supervision is achieved by minimizing a photo-consistency loss, evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To handle this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning of the shape details. Experiments demonstrate that our method generalizes better and performs much better on in-the-wild data.
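
The photo-consistency idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `photo_consistency_loss`, the pinhole intrinsics `K`, the dense per-pixel `motion` field (a stand-in for the SMPL-derived non-rigid motion), the nearest-neighbour sampling, and the L1 error are all assumptions made for illustration. A real training setup would use a differentiable sampler inside a deep learning framework.

```python
import numpy as np

def photo_consistency_loss(target, source, depth, K, motion):
    """Mean L1 error between `target` and `source` sampled at warped pixels.

    target, source : (H, W) grayscale frames
    depth          : (H, W) estimated depth of the target frame
    K              : (3, 3) pinhole intrinsics (assumed known)
    motion         : (H, W, 3) per-pixel 3D displacement from the target
                     frame to the source frame (hypothetical stand-in for
                     the non-rigid body motion computed from SMPL fits)
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)

    # Back-project each target pixel to a 3D point using its depth.
    points = (pix @ np.linalg.inv(K).T) * depth[..., None]

    # Apply the non-rigid motion, then project into the source frame.
    proj = (points + motion) @ K.T
    z = proj[..., 2]
    valid = z > 1e-6
    safe_z = np.where(valid, z, 1.0)
    us = np.rint(proj[..., 0] / safe_z).astype(int)
    vs = np.rint(proj[..., 1] / safe_z).astype(int)

    # Ignore pixels that land outside the source image.
    valid &= (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
    if not valid.any():
        return 0.0

    warped = source[vs[valid], us[valid]]
    return float(np.abs(warped - target[valid]).mean())
```

With a correct depth map and motion field, the sampled source pixels match the target and the loss approaches zero; errors in the estimated depth shift the sampling locations and raise the loss, which is the signal a self-supervised network would be trained on.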
