Unsupervised Domain Adaptation for Spatio-Temporal Action Localization

Paper Title

Unsupervised Domain Adaptation for Spatio-Temporal Action Localization

Authors

Nakul Agarwal, Yi-Ting Chen, Behzad Dariush, Ming-Hsuan Yang

Abstract

Spatio-temporal action localization is an important problem in computer vision that involves detecting where and when activities occur, and therefore requires modeling both spatial and temporal features. This problem is typically formulated in the context of supervised learning, where the learned classifiers operate on the premise that training and test data are sampled from the same underlying distribution. However, this assumption does not hold when there is a significant domain shift, leading to poor generalization on the test data. To address this, we focus on the hard and novel task of generalizing trained models to test samples without access to any of their labels, and propose an end-to-end unsupervised domain adaptation algorithm for spatio-temporal action localization. We extend a state-of-the-art object detection framework to localize and classify actions. To minimize the domain shift, we design and integrate three domain adaptation modules: two at the image level (temporal and spatial) and one at the instance level (temporal). We design a new experimental setup and evaluate the proposed method and the individual adaptation modules on the UCF-Sports, UCF-101 and JHMDB benchmark datasets. We show that significant performance gains can be achieved when spatial and temporal features are adapted separately, and that adapting them jointly gives the most effective results.
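
The abstract does not say how the three modules reduce the domain shift. A common design in detector-level adaptation (for example, Domain Adaptive Faster R-CNN) is adversarial: each module gets a small domain classifier that tries to tell source from target features, and a gradient reversal layer (GRL) flips the gradient that reaches the feature extractor so that the features become domain-invariant. The sketch below assumes that design and is not taken from the paper. The module names (image_spatial, image_temporal, instance_temporal), the 2-D toy features, the weight lam and the logistic classifier are all hypothetical stand-ins for the convolutional features and classifiers a real model would use.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def domain_classifier_step(
    feature: Sequence[float],
    domain: int,        # 0 = source (labeled), 1 = target (unlabeled)
    w: Sequence[float],
    b: float,
    lam: float,         # gradient-reversal weight (hypothetical hyperparameter)
) -> Tuple[float, Vector, float, Vector]:
    """Binary cross-entropy of a logistic domain classifier, plus gradients.

    grad_w and grad_b minimize the domain loss (the classifier learns to
    separate domains). grad_feature has passed through a gradient reversal
    layer, so the feature extractor is pushed to maximize the same loss,
    i.e. to make source and target features indistinguishable.
    """
    z = sum(wi * fi for wi, fi in zip(w, feature)) + b
    p = sigmoid(z)                      # predicted probability of "target"
    eps = 1e-12                         # guards against log(0)
    loss = -(domain * math.log(p + eps) + (1 - domain) * math.log(1.0 - p + eps))
    dz = p - domain                     # dL/dz for sigmoid + BCE
    grad_w = [dz * fi for fi in feature]
    grad_b = dz
    # The GRL is the identity in the forward pass; backward multiplies by -lam.
    grad_feature = [-lam * dz * wi for wi in w]
    return loss, grad_w, grad_b, grad_feature


def adaptation_loss(
    batches: Dict[str, List[Tuple[Vector, int]]],
    classifiers: Dict[str, Tuple[Vector, float]],
    lam: float = 1.0,
) -> Dict[str, float]:
    """Mean domain-classification loss for each (hypothetical) module."""
    losses: Dict[str, float] = {}
    for name, samples in batches.items():
        w, b = classifiers[name]
        per_sample = [domain_classifier_step(f, d, w, b, lam)[0] for f, d in samples]
        losses[name] = sum(per_sample) / len(per_sample)
    return losses


# Toy usage: one source and one target feature per module.
modules = {
    "image_spatial": [([0.2, -0.1], 0), ([0.4, 0.3], 1)],
    "image_temporal": [([0.0, 0.5], 0), ([-0.3, 0.1], 1)],
    "instance_temporal": [([1.0, 0.0], 0), ([0.5, 0.5], 1)],
}
clfs = {name: ([0.0, 0.0], 0.0) for name in modules}
losses = adaptation_loss(modules, clfs)
print(losses)
```

With untrained (all-zero) classifiers every prediction is 0.5, so each module's loss starts at log 2. Keeping one classifier per module mirrors the abstract's finding that spatial and temporal features can be adapted separately or jointly: dropping a key from `modules` disables that module, and summing the remaining losses (added to the detection loss) would adapt them jointly.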
