Paper Title
PAN: Towards Fast Action Recognition via Learning Persistence of Appearance
Authors
Abstract
Efficiently modeling dynamic motion information in videos is crucial for action recognition. Most state-of-the-art methods rely heavily on dense optical flow as the motion representation. Although combining optical flow with RGB frames as input achieves excellent recognition performance, optical flow extraction is very time-consuming, which undoubtedly works against real-time action recognition. In this paper, we shed light on fast action recognition by lifting the reliance on optical flow. Our motivation lies in the observation that small displacements at motion boundaries are the most critical ingredient for distinguishing actions, so we design a novel motion cue called Persistence of Appearance (PA). In contrast to optical flow, PA focuses on distilling motion information at boundaries. It is also more efficient, accumulating only pixel-wise differences in feature space instead of performing an exhaustive patch-wise search over all possible motion vectors. In terms of motion modeling speed, our PA is over 1000x faster than conventional optical flow (8196 fps vs. 8 fps). To further aggregate the short-term dynamics in PA into long-term dynamics, we also devise a global temporal fusion strategy called Various-timescale Aggregation Pooling (VAP) that can adaptively model long-range temporal relationships across various timescales. We finally incorporate the proposed PA and VAP into a unified framework with strong temporal modeling ability, called the Persistent Appearance Network (PAN). Extensive experiments on six challenging action recognition benchmarks verify that PAN outperforms recent state-of-the-art methods at low FLOPs. Code and models are available at: https://github.com/zhang-can/PAN-PyTorch.
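To make the core idea of PA concrete: the abstract states that motion is captured by accumulating pixel-wise differences between adjacent frames in feature space, with no motion-vector search. Below is a minimal PyTorch sketch of that idea; the single shallow convolution and the channel-wise L2 accumulation are illustrative assumptions rather than the authors' exact design (see the linked repository for the official implementation).

```python
# A minimal sketch of the pixel-wise difference accumulation described in the
# abstract. The shallow conv embedding and the L2-norm accumulation over
# channels are assumptions for illustration, not the paper's exact layers.
import torch
import torch.nn as nn


class PASketch(nn.Module):
    def __init__(self, in_channels=3, feat_channels=8):
        super().__init__()
        # Hypothetical shallow embedding of each frame into feature space.
        self.embed = nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1)

    def forward(self, frames):
        # frames: (B, T, C, H, W) -- a short clip of adjacent RGB frames.
        B, T, C, H, W = frames.shape
        feats = self.embed(frames.view(B * T, C, H, W)).view(B, T, -1, H, W)
        # Pixel-wise differences between adjacent frames in feature space,
        # accumulated over channels (L2 norm) -- no patch-wise search over
        # candidate motion vectors, which is what makes PA cheap to compute.
        diffs = feats[:, 1:] - feats[:, :-1]             # (B, T-1, C', H, W)
        pa = torch.sqrt((diffs ** 2).sum(dim=2) + 1e-8)  # (B, T-1, H, W)
        return pa


# Usage: 8 frames of a 224x224 clip yield 7 single-channel PA maps,
# which can then stand in for optical flow as the motion stream input.
clip = torch.randn(2, 8, 3, 224, 224)
pa_maps = PASketch()(clip)
print(pa_maps.shape)  # torch.Size([2, 7, 224, 224])
```

Because large feature differences concentrate where appearance changes between frames, the resulting maps naturally emphasize motion boundaries, consistent with the abstract's claim that PA distills boundary motion rather than estimating dense per-pixel displacement.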