EfficientPose: Efficient Human Pose Estimation with Neural Architecture Search

Paper title

EfficientPose: Efficient Human Pose Estimation with Neural Architecture Search

Authors

Wenqiang Zhang, Jiemin Fang, Xinggang Wang, Wenyu Liu

Abstract

Human pose estimation from image and video is a vital task in many multimedia applications. Previous methods achieve great performance but rarely take efficiency into consideration, which makes it difficult to implement the networks on resource-constrained devices. Nowadays real-time multimedia applications call for more efficient models for better interactions. Moreover, most deep neural networks for pose estimation directly reuse the networks designed for image classification as the backbone, which are not yet optimized for the pose estimation task. In this paper, we propose an efficient framework targeted at human pose estimation including two parts, the efficient backbone and the efficient head. By implementing the differentiable neural architecture search method, we customize the backbone network design for pose estimation and reduce the computation cost with negligible accuracy degradation. For the efficient head, we slim the transposed convolutions and propose a spatial information correction module to promote the performance of the final prediction. In experiments, we evaluate our networks on the MPII and COCO datasets. Our smallest model has only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model has only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model, i.e., HRNet with 9.5 GFLOPs.
