Paper Title

Exploring Global Diversity and Local Context for Video Summarization

Authors

Yingchao Pan, Ouhan Huang, Qinghao Ye, Zhongjin Li, Wenjiang Wang, Guodun Li, Yuxing Chen

Abstract

Video summarization aims to automatically generate a diverse and concise summary, which is useful in large-scale video processing. Most methods adopt a self-attention mechanism across video frames, which fails to model the diversity of video frames. To alleviate this problem, we revisit the pairwise similarity measurement in the self-attention mechanism and find that the existing inner-product affinity leads to discriminative features rather than diversified features. In light of this phenomenon, we propose global diverse attention, which instead uses the squared Euclidean distance to compute the affinities. Moreover, we model local contextual information with a novel local contextual attention to remove redundancy in the video. By combining these two attention mechanisms, we develop a video SUMmarization model with a Diversified Contextual Attention scheme, namely SUM-DCA. Extensive experiments on benchmark data sets verify the effectiveness and superiority of SUM-DCA in terms of F-score and rank-based evaluation, without any bells and whistles.
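
To make the contrast between the two affinity measures concrete, the following is a minimal NumPy sketch and not the authors' implementation. The abstract only states that the squared Euclidean distance replaces the inner product; everything else here is an assumption made for illustration: the function names, the use of raw frame features as both queries and keys (no learned projections), the softmax normalization of the distances, and the fixed-radius window used to stand in for local contextual attention.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def inner_product_affinity(queries, keys):
    """Standard self-attention affinity: dot product of every query-key pair."""
    return queries @ keys.T


def squared_euclidean_affinity(queries, keys):
    """Affinity as the squared Euclidean distance ||q_i - k_j||^2 per pair."""
    q_sq = (queries ** 2).sum(axis=1, keepdims=True)   # shape (n, 1)
    k_sq = (keys ** 2).sum(axis=1, keepdims=True).T    # shape (1, m)
    dist = q_sq + k_sq - 2.0 * (queries @ keys.T)
    return np.maximum(dist, 0.0)  # clip tiny negatives caused by round-off


def local_window_mask(num_frames, radius):
    """Boolean mask: frame i may attend to frame j only if |i - j| <= radius.

    Assumption: a fixed window is one simple way to model local context;
    the abstract does not specify how the local attention is defined.
    """
    idx = np.arange(num_frames)
    return np.abs(idx[:, None] - idx[None, :]) <= radius


def attend(affinity, values, mask=None):
    """Turn affinities into row-normalized weights and aggregate the values.

    Assumption: weights are a softmax over the affinities, so with the
    distance-based affinity, more dissimilar frames receive larger weights.
    """
    if mask is not None:
        affinity = np.where(mask, affinity, -np.inf)
    weights = softmax(affinity, axis=-1)
    return weights @ values, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(6, 4))  # 6 hypothetical frame features, dim 4

    _, w_dot = attend(inner_product_affinity(frames, frames), frames)
    _, w_div = attend(squared_euclidean_affinity(frames, frames), frames)
    _, w_loc = attend(inner_product_affinity(frames, frames), frames,
                      mask=local_window_mask(len(frames), radius=1))

    print("inner-product weights, frame 0:", np.round(w_dot[0], 3))
    print("distance-based weights, frame 0:", np.round(w_div[0], 3))
    print("local-window weights, frame 0:", np.round(w_loc[0], 3))
```

Under these assumptions, the inner-product version tends to concentrate weight on frames similar to the query (including the frame itself), whereas the distance-based version assigns zero affinity to the frame itself and shifts weight toward dissimilar frames, which is one plausible reading of why it favors diversified features.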
