Paper Title


Self-supervised Video Object Segmentation

Paper Authors

Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie

Paper Abstract


The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking). We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching, which resolves the challenge caused by the disappearance and reappearance of objects; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity, e.g., occlusions or dis-occlusions and fast motions; (iii) we explore the efficiency of self-supervised representation learning for dense tracking; surprisingly, we show that a powerful tracking model can be trained with as few as 100 raw video clips (equivalent to a duration of 11 minutes), indicating that low-level statistics are already effective for tracking tasks; (iv) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS, as well as surpassing most methods trained with millions of manual segmentation annotations, further bridging the gap between self-supervised and supervised learning. Code is released to foster further research (https://github.com/fangruizhu/self_sup_semiVOS).
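The core mechanism the abstract describes is correspondence matching against a memory of past frames: the mask of the current frame is predicted by matching its features to features of stored memory frames and copying their labels. The sketch below illustrates that general idea of memory-based label propagation; it is not the authors' released implementation, and the function name, tensor shapes, temperature, and top-k value are all illustrative assumptions.

import torch
import torch.nn.functional as F

def propagate_labels(query_feat, memory_feats, memory_masks,
                     temperature=0.07, topk=10):
    """Illustrative memory-based label propagation (not the paper's code).

    query_feat:   (C, H, W)    features of the current frame
    memory_feats: (T, C, H, W) features of T stored memory frames
    memory_masks: (T, K, H, W) soft masks (K object channels) of memory frames
    returns:      (K, H, W)    predicted soft masks for the current frame
    """
    C, H, W = query_feat.shape
    T = memory_feats.shape[0]
    K = memory_masks.shape[1]

    # L2-normalise features so the dot product is a cosine similarity.
    q = F.normalize(query_feat.reshape(C, -1), dim=0)               # (C, HW)
    m = F.normalize(memory_feats.reshape(T, C, -1), dim=1)          # (T, C, HW)
    m = m.permute(1, 0, 2).reshape(C, -1)                           # (C, T*HW)

    # Affinity between every query pixel and every memory pixel.
    affinity = (m.t() @ q) / temperature                            # (T*HW, HW)

    # Keep only the top-k memory pixels per query pixel (sparse matching),
    # then turn their scores into propagation weights.
    vals, idx = affinity.topk(topk, dim=0)                          # (topk, HW)
    weights = F.softmax(vals, dim=0)                                # (topk, HW)

    # Copy the memory masks of the selected pixels, weighted by affinity.
    labels = memory_masks.permute(1, 0, 2, 3).reshape(K, -1)        # (K, T*HW)
    picked = labels[:, idx]                                         # (K, topk, HW)
    out = (picked * weights.unsqueeze(0)).sum(dim=1)                # (K, HW)
    return out.reshape(K, H, W)

In this reading, the paper's memory mechanism governs which past frames populate memory_feats/memory_masks (allowing re-identification after an object disappears and reappears), while the online adaptation module would refine the prediction at test time; both are beyond this minimal sketch.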
