Pose for Everything: Towards Category-Agnostic Pose Estimation

Paper title

Pose for Everything: Towards Category-Agnostic Pose Estimation

Authors

Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

Abstract

Existing work on 2D pose estimation mainly focuses on a single category, e.g., humans, animals, or vehicles. However, many application scenarios require detecting the poses/keypoints of unseen object classes. In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definitions. To achieve this goal, we formulate the pose estimation problem as a keypoint matching problem and design a novel CAPE framework, termed POse Matching Network (POMNet). A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images. We also introduce the Multi-category Pose (MP-100) dataset, a 2D pose dataset of 100 object categories containing over 20K instances, which is designed for developing CAPE algorithms. Experiments show that our method outperforms other baseline approaches by a large margin. Code and data are available at https://github.com/luminxu/Pose-for-Everything.
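
To make the "pose estimation as keypoint matching" formulation concrete, here is a minimal NumPy sketch. It is not POMNet: it has no learned backbone and no transformer-based KIM, and it simply assumes feature maps are already available. The function name `match_keypoints`, its arguments, and the use of cosine similarity are all illustrative assumptions, not taken from the paper or its code release.

```python
import numpy as np

def match_keypoints(support_feat, support_kpts, query_feat):
    """Locate support keypoints in a query feature map by similarity.

    Hypothetical helper for illustration only (not the authors' method).
    support_feat, query_feat: float arrays of shape (C, H, W).
    support_kpts: integer array of shape (K, 2) holding (x, y) positions
                  on the support feature map.
    Returns the predicted (x, y) positions, shape (K, 2), and the
    similarity maps, shape (K, H, W).
    """
    C, H, W = query_feat.shape

    # One feature vector per support keypoint: shape (K, C).
    protos = np.stack([support_feat[:, y, x] for x, y in support_kpts])
    protos = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)

    # Flatten the query map to (C, H*W) and normalize each location.
    q = query_feat.reshape(C, H * W)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)

    # Cosine similarity between every keypoint and every query location.
    sim = protos @ q  # (K, H*W)

    # The best-matching location is taken as the predicted keypoint.
    ys, xs = np.unravel_index(sim.argmax(axis=1), (H, W))
    return np.stack([xs, ys], axis=1), sim.reshape(-1, H, W)
```

As a sanity check, matching a feature map against itself should return the support keypoints unchanged, since each location is most similar to itself. In the paper's framing, the support image supplies the keypoint definitions for a new category and the query image is where those keypoints are to be found.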
