Paper Title

RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss

Authors

Andong Lu, Chenglong Li, Yuqing Yan, Jin Tang, Bin Luo

Abstract

RGBT tracking has attracted increasing attention since RGB and thermal infrared data have strong complementary advantages that could enable trackers to work all day and in all weather. However, how to effectively represent RGBT data for visual tracking remains understudied. Existing works usually focus on extracting modality-shared or modality-specific information, but the potential of these two cues is not well explored and exploited in RGBT tracking. In this paper, we propose a novel multi-adapter network to jointly perform modality-shared, modality-specific, and instance-aware target representation learning for RGBT tracking. To this end, we design three kinds of adapters within an end-to-end deep learning framework. Specifically, we use a modified VGG-M as the generality adapter to extract modality-shared target representations. To extract modality-specific features while reducing computational complexity, we design a modality adapter, which adds a small block to the generality adapter in each layer and each modality in a parallel manner. Such a design can learn multilevel modality-specific representations with a modest number of parameters, as the vast majority of parameters are shared with the generality adapter. We also design an instance adapter to capture the appearance properties and temporal variations of a specific target. Moreover, to enhance the shared and specific features, we employ a multiple-kernel maximum mean discrepancy (MK-MMD) loss to measure the distribution divergence of different modal features and integrate it into each layer for more robust representation learning. Extensive experiments on two RGBT tracking benchmark datasets demonstrate the outstanding performance of the proposed tracker against state-of-the-art methods.
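The hierarchical divergence loss in the title builds on multiple-kernel maximum mean discrepancy (MK-MMD), which the abstract says is applied at each layer to measure how far apart the RGB and thermal feature distributions are. A minimal sketch of a biased MK-MMD estimate with a sum of Gaussian kernels is shown below; the bandwidth set, feature shapes, and variable names (`rgb_feat`, `tir_feat`) are illustrative assumptions, not details from the paper.

```python
import numpy as np

def mk_mmd(x, y, sigmas=(1.0, 2.0, 4.0)):
    """Biased MK-MMD estimate between two feature sets of shape (n, d),
    using a sum of Gaussian kernels over the bandwidth set `sigmas`."""
    def gram(a, b):
        # pairwise squared Euclidean distances between rows of a and b
        d2 = (np.sum(a**2, axis=1)[:, None]
              + np.sum(b**2, axis=1)[None, :]
              - 2.0 * a @ b.T)
        # multi-kernel: sum of Gaussian kernels with different bandwidths
        return sum(np.exp(-d2 / (2.0 * s**2)) for s in sigmas)
    # MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

# Hypothetical per-layer features from the two modalities (not the paper's data)
rng = np.random.default_rng(0)
rgb_feat = rng.normal(0.0, 1.0, size=(64, 128))
tir_feat = rng.normal(0.5, 1.0, size=(64, 128))

print(mk_mmd(rgb_feat, rgb_feat))  # identical sets: divergence is ~0
print(mk_mmd(rgb_feat, tir_feat))  # shifted distribution: positive divergence
```

In the paper's setting, one such term would be computed per network layer and summed into the hierarchical loss; here a single call simply shows that the estimate is near zero for identical feature sets and grows when the two modal distributions differ.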
