Visual Tracking by TridentAlign and Context Embedding

Paper title

Visual Tracking by TridentAlign and Context Embedding

Authors

Janghoon Choi, Junseok Kwon, Kyoung Mu Lee

Abstract

Recent advances in Siamese network-based visual tracking methods have enabled high performance on numerous tracking benchmarks. However, large scale variations of the target object and distractor objects of similar categories continue to pose challenges in visual tracking. To address these persistent issues, we propose novel TridentAlign and context embedding modules for Siamese network-based visual tracking methods. The TridentAlign module improves adaptability to large scale variations and deformations of the target by pooling the target object's feature representation into multiple spatial dimensions to form a feature pyramid, which is then used in the region proposal stage. Meanwhile, the context embedding module aims to discriminate the target from distractor objects by accounting for global context information among objects. It extracts the global context information of a given frame and embeds it into a local feature representation so that this information can be used in the final classification stage. Experimental results on multiple benchmark datasets show that the proposed tracker performs comparably to state-of-the-art trackers while running at real-time speed.
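The abstract describes both modules only at a high level. The following is a minimal NumPy sketch of the two ideas under stated assumptions, not the authors' implementation. TridentAlign is approximated by adaptive average pooling of the target's (C, H, W) feature map into several square grids (the grid sizes 3, 5 and 7 are hypothetical, and average pooling stands in for whatever pooling operator the paper actually uses). Context embedding is approximated by concatenating a globally pooled frame descriptor onto every local feature position, a deliberately simple stand-in that may differ from the paper's module. The function names `adaptive_avg_pool2d`, `trident_pool` and `context_embed` are hypothetical.

```python
import numpy as np


def adaptive_avg_pool2d(feat, out_h, out_w):
    """Average-pool a (C, H, W) map into a (C, out_h, out_w) grid of bins.

    Bin edges follow the common adaptive-pooling rule:
    start = floor(i * H / out_h), end = ceil((i + 1) * H / out_h).
    """
    c, h, w = feat.shape
    out = np.empty((c, out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        y0 = (i * h) // out_h
        y1 = -((-(i + 1) * h) // out_h)  # ceiling division
        for j in range(out_w):
            x0 = (j * w) // out_w
            x1 = -((-(j + 1) * w) // out_w)
            out[:, i, j] = feat[:, y0:y1, x0:x1].mean(axis=(1, 2))
    return out


def trident_pool(target_feat, sizes=(3, 5, 7)):
    """Hypothetical TridentAlign stand-in: pool the same target features
    at several spatial resolutions to form a small feature pyramid."""
    return [adaptive_avg_pool2d(target_feat, s, s) for s in sizes]


def context_embed(local_feat, frame_feat):
    """Hypothetical context-embedding stand-in: summarize the whole frame
    by global average pooling and append that vector to every local position."""
    _, h, w = local_feat.shape
    global_vec = frame_feat.mean(axis=(1, 2))  # shape (C_frame,)
    tiled = np.broadcast_to(global_vec[:, None, None], (global_vec.shape[0], h, w))
    return np.concatenate([local_feat, tiled], axis=0)


rng = np.random.default_rng(0)
target = rng.standard_normal((256, 15, 15))
pyramid = trident_pool(target)
print([p.shape for p in pyramid])  # [(256, 3, 3), (256, 5, 5), (256, 7, 7)]

frame = rng.standard_normal((256, 31, 31))
print(context_embed(pyramid[2], frame).shape)  # (512, 7, 7)
```

Pooling one target feature map at several output sizes gives a set of templates at different spatial granularities, which is the intuition behind using a pyramid to cope with scale changes and deformation; how the paper matches these templates against the search region is not reproduced here.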
