Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats

Paper title

Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats

Authors

István Sárándi, Alexander Hermans, Bastian Leibe

Abstract

Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor is that different datasets provide different skeleton formats, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how to best supervise one model with such discrepant labels. We show that simply using separate output heads for different skeletons results in inconsistent depth estimates and insufficient information sharing across skeletons. As a remedy, we propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks. The discovered latent 3D points capture the redundancy among skeletons, enabling enhanced information sharing when used for consistency regularization. Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model, which outperforms prior work on a range of benchmarks, including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and models are available for research purposes.
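
To make the phrase "affine-combining" in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each latent 3D point is a weighted sum of the input landmarks with weights that sum to one, and that each reconstructed landmark is likewise an affine combination of the latent points. The landmark counts, the random (untrained) weights, and the names `affine_weights`, `encode`, and `decode` are hypothetical placeholders chosen for illustration.

```python
import numpy as np

def affine_weights(raw):
    """Scale each row to sum to 1, so each output point is an affine combination."""
    return raw / raw.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_joints, n_latent = 20, 8  # hypothetical sizes, not taken from the paper

# Untrained placeholder weights; positive values keep the row sums away from zero.
W_enc = affine_weights(rng.uniform(0.1, 1.0, size=(n_latent, n_joints)))
W_dec = affine_weights(rng.uniform(0.1, 1.0, size=(n_joints, n_latent)))

def encode(pose):
    """Map (n_joints, 3) landmarks to (n_latent, 3) latent 3D points."""
    return W_enc @ pose

def decode(latent):
    """Map (n_latent, 3) latent points back to (n_joints, 3) landmarks."""
    return W_dec @ latent

pose = rng.normal(size=(n_joints, 3))   # synthetic pose, for illustration only
shift = np.array([1.0, -2.0, 0.5])      # an arbitrary translation

# Because every row of weights sums to 1, translating the input pose
# translates the latent points and the reconstruction by the same vector.
assert np.allclose(encode(pose + shift), encode(pose) + shift)
assert np.allclose(decode(encode(pose + shift)), decode(encode(pose)) + shift)
```

Under this assumption the mapping commutes with translations (and, being linear in the points, with rotations), which is one plausible reading of "geometry-aware" in the title. In practice the weights would be learned from pose data rather than drawn at random.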
