Paper title
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
Paper authors
Paper abstract
Weakly supervised temporal action localization aims to localize temporal boundaries of actions and simultaneously identify their categories with only video-level category labels. Many existing methods seek to generate pseudo labels for bridging the discrepancy between classification and localization, but usually only make use of limited contextual information for pseudo label generation. To alleviate this problem, we propose a representative snippet summarization and propagation framework. Our method seeks to mine the representative snippets in each video for propagating information between video snippets to generate better pseudo labels. For each video, its own representative snippets and the representative snippets from a memory bank are propagated to update the input features in an intra- and inter-video manner. The pseudo labels are generated from the temporal class activation maps of the updated features to rectify the predictions of the main branch. Our method obtains superior performance in comparison to the existing methods on two benchmarks, THUMOS14 and ActivityNet1.3, achieving gains as high as 1.2% in terms of average mAP on THUMOS14.
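The pipeline described in the abstract (summarize representative snippets, propagate them within and across videos, then derive pseudo labels from temporal class activation maps) can be illustrated with a rough sketch. The abstract does not say how summarization or propagation is computed, so everything below is an assumption made for illustration: soft clustering stands in for summarization, softmax attention stands in for propagation, a linear classifier produces the class activation map, and all function names and parameters (`summarize`, `propagate`, `pseudo_labels`, `num_reps`, `mix`) are hypothetical rather than taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def summarize(features, num_reps=4, iters=3, seed=0):
    """Assumed summarization: soft-cluster (T, D) snippet features
    into num_reps representative vectors of shape (num_reps, D)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=num_reps, replace=False)
    reps = features[idx]
    for _ in range(iters):
        assign = softmax(features @ reps.T, axis=1)            # (T, K)
        weights = assign.sum(axis=0)[:, None] + 1e-8           # (K, 1)
        reps = (assign.T @ features) / weights                 # (K, D)
    return reps


def propagate(features, reps, mix=0.5):
    """Assumed propagation: blend each snippet with an
    attention-weighted average of the representative snippets."""
    attn = softmax(features @ reps.T, axis=1)                  # (T, K)
    return (1.0 - mix) * features + mix * (attn @ reps)        # (T, D)


def pseudo_labels(features, classifier_w, video_label):
    """Temporal class activation map from a linear classifier (an assumption),
    with classes absent from the video-level label zeroed out."""
    tcam = softmax(features @ classifier_w, axis=1)            # (T, C)
    return tcam * video_label[None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(20, 8))       # 20 snippets, 8-dim features (toy data)
    memory = rng.normal(size=(6, 8))       # stand-in for memory-bank snippets
    w = rng.normal(size=(8, 5))            # toy classifier for 5 classes
    label = np.array([1, 0, 0, 1, 0])      # video-level multi-hot label

    own = summarize(feats)                              # intra-video representatives
    updated = propagate(propagate(feats, own), memory)  # intra-, then inter-video update
    labels = pseudo_labels(updated, w, label)
    print(labels.shape)
```

In this toy version the pseudo labels come from the updated features only; how they would be used to rectify a main branch's predictions (for example, as a training target) is left out because the abstract gives no detail on the loss.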