FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER

Paper Title

FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER

Authors

Ce Zheng, Matias Mendieta, Taojiannan Yang, Guo-Jun Qi, Chen Chen

Abstract

Recently, vision transformers have shown great success in a set of human reconstruction tasks such as 2D human pose estimation (2D HPE), 3D human pose estimation (3D HPE), and human mesh reconstruction (HMR). In these tasks, feature map representations of the human structural information are often first extracted from the image by a CNN (such as HRNet), and then further processed by a transformer to predict the heatmaps for HPE or HMR (a heatmap encodes each joint's location as a feature map with a Gaussian distribution). However, existing transformer architectures are not able to process these feature map inputs directly, forcing an unnatural flattening of the location-sensitive human structural information. Furthermore, much of the performance benefit in recent HPE and HMR methods has come at the cost of ever-increasing computation and memory needs. Therefore, to simultaneously address these problems, we propose FeatER, a novel transformer design that preserves the inherent structure of feature map representations when modeling attention while reducing memory and computational costs. Taking advantage of FeatER, we build an efficient network for a set of human reconstruction tasks including 2D HPE, 3D HPE, and HMR. A feature map reconstruction module is applied to improve the performance of the estimated human pose and mesh. Extensive experiments demonstrate the effectiveness of FeatER on various human pose and mesh datasets. For instance, FeatER outperforms the SOTA method MeshGraphormer on the Human3.6M and 3DPW datasets while requiring only 5% of the parameters (Params) and 16% of the MACs. The project webpage is https://zczcwh.github.io/feater_page/.
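The heatmap encoding and the flattening step mentioned in the abstract can be illustrated with a minimal sketch. This is not FeatER's implementation: the function names `gaussian_heatmap` and `decode_joint`, the 64x48 heatmap size, and `sigma=2.0` are assumptions chosen only for illustration.

```python
import math

def gaussian_heatmap(height, width, joint_xy, sigma=2.0):
    """Encode one joint location as a 2D Gaussian heatmap (peak value 1.0).

    Hypothetical helper for illustration; size and sigma are assumed values.
    """
    x0, y0 = joint_xy
    return [
        [math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_joint(heatmap):
    """Recover the (x, y) joint location as the position of the heatmap peak."""
    _, x, y = max(
        (value, x, y)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
    )
    return x, y

height, width = 64, 48
heatmap = gaussian_heatmap(height, width, joint_xy=(20, 30))

# Flattening turns the 2D map into a 1D vector. The peak is still present,
# but its row/column position is only implicit in the flat index.
flat = [value for row in heatmap for value in row]
peak_index = flat.index(max(flat))
```

In the flattened vector the joint at column 20, row 30 sits at index `30 * 48 + 20`, so the two-dimensional neighbourhood around the peak is no longer contiguous. This is the loss of location-sensitive structure that the abstract attributes to flattening feature maps before attention.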
