Improving Siamese Based Trackers with Light or No Training through Multiple Templates and Temporal Network

Paper Title

Improving Siamese Based Trackers with Light or No Training through Multiple Templates and Temporal Network

Authors

Sekhavati, Ali; Lee, Won-Sook

Abstract

Training a deep-learning-based tracker on large datasets usually requires high computational power and significant time. Depending on many factors, training might not always be an option. In this paper, we propose a framework built on two ideas for Siamese-based trackers: (i) extending the number of templates in a way that removes the need to retrain the network, and (ii) a lightweight temporal network with a novel architecture, focusing on both local and global information, that can be used independently of the tracker. Most Siamese-based trackers rely only on the first frame as the ground truth for the object and struggle when the target's appearance changes significantly in subsequent frames in the presence of similar distractors. Some trackers use multiple templates, but these mostly rely on constant thresholds for updates, or they replace templates that have low similarity scores only with more similar ones. Unlike previous works, we use adaptive thresholds that update the bag with similar templates as well as templates that are slightly diverse. Adaptive thresholds also yield an overall improvement over constant ones. In addition, mixing the feature maps obtained from each template in the last stage of the network removes the need to retrain trackers. Our proposed lightweight temporal network, CombiNet, learns the path history of different objects using only object coordinates and predicts the target's potential location in the next frame. It is tracker-independent, and applying it to new trackers requires no further training. With these ideas implemented, tracker performance improved on all datasets tested, including LaSOT, LaSOT extension, TrackingNet, OTB100, OTB50, UAV123 and UAV20L. Experiments indicate that the proposed framework works well with both convolutional and transformer-based trackers. The official Python code for this paper will be made publicly available upon publication.
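The abstract does not say how the adaptive thresholds or the feature-map mixing are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the threshold is the mean of recent similarity scores minus one standard deviation, and that per-template response maps are mixed by a plain element-wise average. The names `TemplateBag`, `offer`, `fuse`, `floor` and `band` are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class TemplateBag:
    """Hypothetical template bag with an adaptive acceptance threshold."""

    def __init__(self, first_template, capacity=5, history=30, floor=0.5, band=1.0):
        self.first = first_template              # first-frame template is never evicted
        self.extra = deque(maxlen=capacity - 1)  # later templates, oldest dropped first
        self.scores = deque(maxlen=history)      # recent similarity scores
        self.floor = floor                       # fallback threshold before any history exists
        self.band = band                         # threshold slack, in standard deviations

    def threshold(self):
        """Assumed rule: mean of recent scores minus `band` standard deviations."""
        if len(self.scores) < 2:
            return self.floor
        return mean(self.scores) - self.band * pstdev(self.scores)

    def offer(self, template, score):
        """Add the template if its score clears the current threshold."""
        accepted = score >= self.threshold()
        if accepted:
            self.extra.append(template)
        self.scores.append(score)  # every score feeds the threshold, accepted or not
        return accepted

    def templates(self):
        return [self.first, *self.extra]


def fuse(response_maps):
    """Element-wise average of equally sized 2D response maps (assumed mixing rule)."""
    n = len(response_maps)
    rows, cols = len(response_maps[0]), len(response_maps[0][0])
    return [[sum(m[r][c] for m in response_maps) / n for c in range(cols)]
            for r in range(rows)]
```

Under these assumptions the threshold tightens while scores are stable and relaxes once they start to vary, so a slightly diverse template can be admitted where a constant threshold of the same starting value would reject it. Because `fuse` runs on the outputs produced for each template, the backbone itself is left untouched, which is the property the abstract attributes to mixing in the last stage.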

