Paper Title


Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos

Authors

Yuan Tian, Guangtao Zhai, Zhiyong Gao

Abstract


Video action recognition has been partially addressed by CNNs stacking fixed-size 3D kernels. However, these methods may under-perform because they capture only rigid spatial-temporal patterns in single-scale spaces, while neglecting the scale variances across different action primitives. To overcome this limitation, we propose to learn the optimal-scale kernels from the data. More specifically, an \textit{action perceptron synthesizer} is proposed to generate the kernels from a bag of fixed-size kernels that interact through dense routing paths. To guarantee the interaction richness and the information capacity of the paths, we design the novel \textit{optimized feature fusion layer}. This layer establishes a principled universal paradigm that suffices to cover most of the current feature fusion techniques (e.g., channel shuffling and channel dropout) for the first time. By inserting the \textit{synthesizer}, our method can easily adapt traditional 2D CNNs to video understanding tasks such as action recognition with marginal additional computation cost. The proposed method is thoroughly evaluated over several challenging datasets (i.e., Something-Something, Kinetics, and Diving48) that highly require temporal reasoning or appearance discrimination, achieving new state-of-the-art results. Particularly, our low-resolution model outperforms the recent strong baseline methods, i.e., TSM and GST, with less than 30\% of their computation cost.
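The core idea of the synthesizer — generating a kernel from a bag of fixed-size kernels combined through routing — can be illustrated with a minimal sketch. Everything below (kernel count, shapes, softmax routing) is an illustrative assumption of mine, not the paper's actual architecture:

```python
import numpy as np

# Hypothetical sketch: synthesize one convolution kernel as a routed
# combination of a "bag" of fixed-size 3D kernels, loosely in the spirit
# of the abstract's action perceptron synthesizer. All names and shapes
# are illustrative assumptions, not the paper's design.

rng = np.random.default_rng(0)

K = 4                       # number of kernels in the bag (assumed)
kernel_shape = (3, 3, 3)    # fixed-size 3D (t, h, w) kernels (assumed)

bag = rng.standard_normal((K, *kernel_shape))   # bag of fixed-size kernels
routing_logits = rng.standard_normal(K)         # learned routing parameters

# Softmax routing weights decide how strongly each bag member contributes.
weights = np.exp(routing_logits - routing_logits.max())
weights /= weights.sum()

# The synthesized kernel is a convex combination of the bag members.
synthesized = np.tensordot(weights, bag, axes=1)

assert synthesized.shape == kernel_shape
assert np.isclose(weights.sum(), 1.0)
```

In a real network the routing weights would be predicted from the input features and trained end-to-end, so the effective kernel scale adapts per action primitive rather than being fixed.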
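The abstract's claim that the optimized feature fusion layer subsumes techniques like channel shuffling and channel dropout can be made concrete: both are special cases of multiplying the channel dimension by a mixing matrix. The matrix forms below are my own illustration, not the paper's formulation:

```python
import numpy as np

# Hypothetical sketch: channel shuffling and channel dropout viewed as
# special cases of one linear channel-mixing operation, in the spirit of
# the "optimized feature fusion layer". Illustrative only.

rng = np.random.default_rng(1)
C = 6
x = rng.standard_normal((C, 4, 4))   # feature map: (channels, H, W)

def mix_channels(x, M):
    """Fuse channels with an arbitrary C x C mixing matrix M."""
    return np.einsum('ij,jhw->ihw', M, x)

# Channel shuffling == a permutation mixing matrix.
perm = rng.permutation(C)
P = np.eye(C)[perm]
shuffled = mix_channels(x, P)
assert np.allclose(shuffled, x[perm])

# Channel dropout == a diagonal 0/1 mixing matrix.
keep = (rng.random(C) > 0.5).astype(float)
D = np.diag(keep)
dropped = mix_channels(x, D)
assert np.allclose(dropped, x * keep[:, None, None])
```

A general learned mixing matrix interpolates between such fixed schemes, which is one way to read the abstract's "principled universal paradigm" for feature fusion.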
