Paper Title

Correlation-Aware Deep Tracking

Paper Authors

Fei Xie, Chunyu Wang, Guangting Wang, Yue Cao, Wankou Yang, Wenjun Zeng

Paper Abstract

Robustness and discrimination power are two fundamental requirements in visual object tracking. In most tracking paradigms, we find that the features extracted by the popular Siamese-like networks cannot fully discriminate between the tracked target and distractor objects, hindering them from simultaneously meeting these two requirements. While most methods focus on designing robust correlation operations, we propose a novel target-dependent feature network inspired by the self-/cross-attention scheme. In contrast to Siamese-like feature extraction, our network deeply embeds cross-image feature correlation in multiple layers of the feature network. By extensively matching the features of the two images through multiple layers, it is able to suppress non-target features, resulting in instance-varying feature extraction. The output features of the search image can be used directly to predict target locations without an extra correlation step. Moreover, our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than existing methods. Extensive experiments show that our method achieves state-of-the-art results while running in real time. Our feature network can also be applied seamlessly to existing tracking pipelines to improve tracking performance. Code will be available.
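As a rough illustration of the idea the abstract describes (embedding cross-image correlation inside feature extraction rather than applying it as a final correlation step), the sketch below implements one cross-attention step in plain NumPy. This is a hypothetical toy example, not the authors' code: the function names, token shapes, and the single residual mixing step are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(search, template):
    """Mix template (target) features into search-image features.

    search:   (Ns, d) array of search-image tokens
    template: (Nt, d) array of target-template tokens

    Each search token attends to the template tokens; tokens that do not
    match the target receive diffuse attention, which is how repeated
    layers of this operation can suppress non-target features.
    """
    d = search.shape[-1]
    scores = search @ template.T / np.sqrt(d)   # (Ns, Nt) similarity
    attn = softmax(scores, axis=-1)             # rows sum to 1
    return search + attn @ template             # residual mixing, Transformer-style

# Toy usage with random features.
rng = np.random.default_rng(0)
search = rng.standard_normal((16, 8))    # 16 search tokens, dim 8
template = rng.standard_normal((4, 8))   # 4 template tokens, dim 8
out = cross_attention(search, template)
print(out.shape)  # (16, 8): same shape as the search features
```

In the paper's design this kind of correlation happens in multiple layers of the backbone, so the search features that come out are already target-dependent and can feed the localization head directly.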
