Paper Title
Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation
Paper Authors
Paper Abstract
Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses. In contrast, we propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets. To drive such networks towards supporting 3D pose estimation during the pre-training step, we introduce a novel self-supervised feature learning task designed to focus on the 3D structure in an image. We exploit images extracted from videos captured with a multi-view camera system. The task is to classify whether two images depict two views of the same scene up to a rigid transformation. In a multi-view data set, where objects deform in a non-rigid manner, a rigid transformation occurs only between two views taken at the exact same time, i.e., when they are synchronized. We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
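The pre-text task described in the abstract amounts to sampling image pairs from a multi-view video collection and labeling them by whether the two views were captured at the same instant. The sketch below is a hypothetical illustration of such a pair sampler (the function name `make_sync_pairs` and the `frames[t][v]` indexing convention are assumptions, not the authors' implementation); frames at the same timestamp from two different cameras are positives (related by a rigid transformation), while frames from different timestamps are negatives, since the subject deforms non-rigidly over time.

```python
import random

def make_sync_pairs(frames, num_pairs, seed=0):
    """Build (view_a, view_b, label) pairs for the synchronization task.

    `frames[t][v]` is the frame captured at time t by camera v.
    Label 1: synchronized pair (same timestamp, two distinct cameras),
             so the two views are related by a rigid transformation.
    Label 0: unsynchronized pair (two distinct timestamps), so the
             non-rigidly deforming subject differs between the views.
    """
    rng = random.Random(seed)
    num_times = len(frames)
    num_views = len(frames[0])
    pairs = []
    for _ in range(num_pairs):
        if rng.random() < 0.5:
            # Positive: same time, two distinct views.
            t = rng.randrange(num_times)
            va, vb = rng.sample(range(num_views), 2)
            pairs.append((frames[t][va], frames[t][vb], 1))
        else:
            # Negative: two distinct times, any views.
            ta, tb = rng.sample(range(num_times), 2)
            va = rng.randrange(num_views)
            vb = rng.randrange(num_views)
            pairs.append((frames[ta][va], frames[tb][vb], 0))
    return pairs
```

A Siamese network would then be trained on these pairs to predict the binary label, and its backbone subsequently fine-tuned on the small annotated 3D pose data set.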