Paper Title

SegTAD: Precise Temporal Action Detection via Semantic Segmentation

Paper Authors

Chen Zhao, Merey Ramazanova, Mengmeng Xu, Bernard Ghanem

Paper Abstract

Temporal action detection (TAD) is an important yet challenging task in video analysis. Most existing works draw inspiration from image object detection and tend to reformulate TAD as a proposal generation and classification problem. However, there are two caveats with this paradigm. First, proposals are not equipped with annotated labels, which have to be empirically compiled; thus, the information in the annotations is not necessarily employed precisely during model training. Second, actions vary greatly in temporal scale, and neglecting this fact may lead to deficient representations in the video features. To address these issues and model temporal action detection precisely, we formulate the task from a novel perspective of semantic segmentation. Owing to the one-dimensional nature of TAD, we can convert the coarse-grained detection annotations into fine-grained semantic segmentation annotations for free. We take advantage of them to provide precise supervision and thereby mitigate the impact of imprecise proposal labels. We propose SegTAD, an end-to-end framework composed of a 1D semantic segmentation network (1D-SSN) and a proposal detection network (PDN).
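The core idea of converting coarse detection annotations into dense segmentation labels can be illustrated with a minimal sketch. The helper below is hypothetical (not the authors' released code): given segment-level annotations of the form (start snippet, end snippet, class id), it assigns every snippet inside a segment that segment's class and marks all remaining snippets as background, yielding the kind of fine-grained, per-snippet supervision a 1D semantic segmentation network could be trained with.

```python
import numpy as np

def detection_to_segmentation_labels(annotations, num_snippets, background_id=0):
    """Hypothetical sketch: turn segment-level detection annotations
    (start, end, class_id) into dense per-snippet labels for a video
    timeline of `num_snippets` snippets."""
    labels = np.full(num_snippets, background_id, dtype=np.int64)
    for start, end, class_id in annotations:
        # Every snippet covered by an annotated action segment inherits
        # that segment's class, so the coarse segment annotation becomes
        # fine-grained supervision at no extra labeling cost.
        labels[start:end] = class_id
    return labels

# Example: a 100-snippet video with two annotated actions (end index exclusive).
labels = detection_to_segmentation_labels([(10, 35, 3), (60, 80, 7)], num_snippets=100)
print(labels[8:12])  # [0 0 3 3] -> background switches to class 3 at snippet 10
```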
