Title


Depth-Aware Action Recognition: Pose-Motion Encoding through Temporal Heatmaps

Authors

Mattia Segu, Federico Pirovano, Gianmario Fumagalli, Amedeo Fabris

Abstract


Most state-of-the-art methods for action recognition rely only on 2D spatial features encoding appearance, motion, or pose. However, 2D data lacks depth information, which is crucial for recognizing fine-grained actions. In this paper, we propose a depth-aware volumetric descriptor that encodes pose and motion information in a unified representation for action classification in the wild. Our framework is robust to many challenges inherent to action recognition, e.g. variation in viewpoint, scene, clothing, and body shape. The key component of our method is the Depth-Aware Pose Motion representation (DA-PoTion), a new video descriptor that encodes the 3D movement of semantic keypoints of the human body. Given a video, we produce human joint heatmaps for each frame using a state-of-the-art 3D human pose regressor, and we give each of them a unique color code according to its relative time in the clip. We then aggregate these 3D time-encoded heatmaps over all human joints to obtain a fixed-size descriptor (DA-PoTion), which is suitable for classifying actions using a shallow 3D convolutional neural network (CNN). The DA-PoTion alone defines a new state-of-the-art on the Penn Action Dataset. Moreover, we leverage the intrinsic complementarity of our pose-motion descriptor with appearance-based approaches by combining it with the Inflated 3D ConvNet (I3D) to define a new state-of-the-art on the JHMDB Dataset.
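The aggregation the abstract describes (per-frame volumetric joint heatmaps, colorized by their relative time in the clip, then summed into a fixed-size descriptor) can be sketched in NumPy. This is an illustrative sketch only, not the authors' implementation: the function names, the choice of a triangular interpolation kernel for the time-to-color mapping (as in the original 2D PoTion colorization), and the per-joint max normalization are assumptions.

```python
import numpy as np

def temporal_color_weights(T, C):
    # Map each frame's relative time t/(T-1) to C color channels by
    # linear (triangular-kernel) interpolation between channel anchors,
    # in the style of PoTion colorization (assumed here).
    ts = np.linspace(0.0, 1.0, T)        # relative time of each frame
    anchors = np.linspace(0.0, 1.0, C)   # "color" anchor of each channel
    w = np.maximum(0.0, 1.0 - (C - 1) * np.abs(ts[:, None] - anchors[None, :]))
    return w                             # shape (T, C); rows sum to 1

def da_potion(heatmaps, C=3):
    """Aggregate per-frame 3D joint heatmaps into a fixed-size descriptor.

    heatmaps: array (T, J, D, H, W) of volumetric heatmaps for T frames
              and J joints (D, H, W are the depth/height/width of the volume).
    Returns:  array (J, C, D, H, W), time-color-encoded and normalized.
    """
    T = heatmaps.shape[0]
    w = temporal_color_weights(T, C)     # (T, C)
    # Weighted sum over time: each color channel accumulates the frames
    # whose relative time is close to that channel's anchor.
    desc = np.einsum('tc,tjdhw->jcdhw', w, heatmaps)
    # Normalize each joint/channel volume to [0, 1] so the descriptor
    # size and scale are independent of clip length.
    peak = desc.max(axis=(2, 3, 4), keepdims=True)
    return desc / np.clip(peak, 1e-8, None)
```

Because the output shape `(J, C, D, H, W)` is fixed regardless of the number of frames, the descriptor can be fed directly to a shallow 3D CNN for classification, as the abstract states.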
