Paper Title

MotionSqueeze: Neural Motion Feature Learning for Video Understanding

Paper Authors

Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho

Paper Abstract

Motion plays a crucial role in understanding videos and most state-of-the-art neural models for video classification incorporate motion information typically using optical flows extracted by a separate off-the-shelf method. As the frame-by-frame optical flows require heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding. In this work, we replace external and heavy computation of optical flows with internal and light-weight learning of motion features. We propose a trainable neural module, dubbed MotionSqueeze, for effective motion feature extraction. Inserted in the middle of any neural network, it learns to establish correspondences across frames and convert them into motion features, which are readily fed to the next downstream layer for better prediction. We demonstrate that the proposed method provides a significant gain on four standard benchmarks for action recognition with only a small amount of additional cost, outperforming the state of the art on Something-Something-V1&V2 datasets.
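To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of a MotionSqueeze-style module: it correlates each frame's features with a local window in the next frame, reads the correlation volume out as a dense displacement map via soft-argmax, and converts that map into motion features fused back into the backbone. The class name, window size, soft-argmax readout, convolutional head, and residual fusion are all illustrative assumptions for this sketch, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionSqueezeSketch(nn.Module):
    """Sketch of a MotionSqueeze-style motion feature module (assumed design)."""

    def __init__(self, channels: int, window: int = 7):
        super().__init__()
        self.window = window
        # Hypothetical head: lifts the 2-channel displacement map to the
        # backbone's channel width so it can be added residually.
        self.to_motion = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W) intermediate features from a frame-wise backbone.
        B, T, C, H, W = feats.shape
        src = feats[:, :-1].reshape(-1, C, H, W)  # frames t
        tgt = feats[:, 1:].reshape(-1, C, H, W)   # frames t+1
        pad = self.window // 2

        # Local correlation volume: each source location is compared with a
        # (window x window) neighbourhood in the next frame.
        patches = F.unfold(F.pad(tgt, (pad,) * 4), self.window)
        patches = patches.view(-1, C, self.window ** 2, H * W)
        corr = (src.view(-1, C, 1, H * W) * patches).sum(dim=1)  # (N, w*w, HW)

        # Soft-argmax over the window turns the correlation volume into a
        # dense 2-channel, flow-like displacement map.
        prob = corr.softmax(dim=1)
        offs = torch.arange(self.window, device=feats.device,
                            dtype=feats.dtype) - pad
        dy = offs.repeat_interleave(self.window)  # row offset per window slot
        dx = offs.repeat(self.window)             # col offset per window slot
        disp = torch.stack([(prob * dx.view(1, -1, 1)).sum(dim=1),
                            (prob * dy.view(1, -1, 1)).sum(dim=1)], dim=1)
        disp = disp.view(-1, 2, H, W)

        # Convert displacements into motion features, repeat the last step so
        # the time axis matches, and fuse residually into the input stream.
        motion = self.to_motion(disp).view(B, T - 1, C, H, W)
        motion = torch.cat([motion, motion[:, -1:]], dim=1)
        return feats + motion


# Usage: the module is shape-preserving, so it can sit between any two
# layers of a video backbone, as the abstract describes.
msq = MotionSqueezeSketch(channels=64)
clip = torch.randn(2, 8, 64, 28, 28)  # 2 clips, 8 frames, 64-channel features
print(msq(clip).shape)                # torch.Size([2, 8, 64, 28, 28])
```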
