Paper Title

ReAct: Temporal Action Detection with Relational Queries

Authors

Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, Dacheng Tao

Abstract

This work aims to advance temporal action detection (TAD) using an encoder-decoder framework with action queries, similar to DETR, which has shown great success in object detection. However, the framework suffers from several problems if directly applied to TAD: insufficient exploration of inter-query relations in the decoder, inadequate classification training due to the limited number of training samples, and unreliable classification scores at inference. To this end, we first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations. Moreover, we propose two losses to facilitate and stabilize the training of action classification. Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries. The proposed method, named ReAct, achieves state-of-the-art performance on THUMOS14, with much lower computational costs than previous methods. In addition, extensive ablation studies are conducted to verify the effectiveness of each proposed component. The code is available at https://github.com/sssste/React.
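To make the idea of using a predicted localization quality at inference more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each action query yields a segment, a classification score, and a predicted quality value, and that the two scores are combined by a simple product; that combination rule, the dictionary keys, and both function names are hypothetical. Temporal IoU (tIoU) is included only as the standard overlap measure between a predicted and a ground-truth segment.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


def rank_queries(queries):
    """Rank query predictions by class score weighted by predicted quality.

    Assumption: the final score is cls_score * quality. The source only
    states that a localization quality is predicted per query.
    """
    return sorted(
        queries,
        key=lambda q: q["cls_score"] * q["quality"],
        reverse=True,
    )


# Hypothetical predictions for two action queries.
predictions = [
    {"segment": (2.0, 9.0), "cls_score": 0.9, "quality": 0.2},
    {"segment": (3.0, 8.0), "cls_score": 0.6, "quality": 0.8},
]
ranked = rank_queries(predictions)
```

In this toy example the second query ranks first despite its lower classification score, which is the kind of reordering a quality estimate is meant to enable.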
