Paper Title


Self-supervised Video Object Segmentation

Paper Authors

Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie

Paper Abstract


The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking). We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching, which resolves the challenge caused by the disappearance and reappearance of objects; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity, e.g., occlusions or dis-occlusions and fast motions; (iii) we explore the efficiency of self-supervised representation learning for dense tracking; surprisingly, we show that a powerful tracking model can be trained with as few as 100 raw video clips (equivalent to a duration of 11 minutes), indicating that low-level statistics are already effective for tracking tasks; (iv) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS, as well as surpassing most methods trained with millions of manual segmentation annotations, further bridging the gap between self-supervised and supervised learning. Code is released to foster further research (https://github.com/fangruizhu/self_sup_semiVOS).
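The core mechanism the abstract describes is correspondence matching against a memory of past frames: the mask of the current frame is predicted by matching its features to features of stored memory frames and copying their labels. The sketch below illustrates that general idea of memory-based label propagation; it is not the authors' released implementation, and the function name, tensor shapes, temperature, and top-k value are all illustrative assumptions.

import torch
import torch.nn.functional as F

def propagate_labels(query_feat, memory_feats, memory_masks,
                     temperature=0.07, topk=10):
    """Illustrative memory-based label propagation (not the paper's code).

    query_feat:   (C, H, W)    features of the current frame
    memory_feats: (T, C, H, W) features of T stored memory frames
    memory_masks: (T, K, H, W) soft masks (K object channels) of memory frames
    returns:      (K, H, W)    predicted soft masks for the current frame
    """
    C, H, W = query_feat.shape
    T = memory_feats.shape[0]
    K = memory_masks.shape[1]

    # L2-normalise features so the dot product is a cosine similarity.
    q = F.normalize(query_feat.reshape(C, -1), dim=0)               # (C, HW)
    m = F.normalize(memory_feats.reshape(T, C, -1), dim=1)          # (T, C, HW)
    m = m.permute(1, 0, 2).reshape(C, -1)                           # (C, T*HW)

    # Affinity between every query pixel and every memory pixel.
    affinity = (m.t() @ q) / temperature                            # (T*HW, HW)

    # Keep only the top-k memory pixels per query pixel (sparse matching),
    # then turn their scores into propagation weights.
    vals, idx = affinity.topk(topk, dim=0)                          # (topk, HW)
    weights = F.softmax(vals, dim=0)                                # (topk, HW)

    # Copy the memory masks of the selected pixels, weighted by affinity.
    labels = memory_masks.permute(1, 0, 2, 3).reshape(K, -1)        # (K, T*HW)
    picked = labels[:, idx]                                         # (K, topk, HW)
    out = (picked * weights.unsqueeze(0)).sum(dim=1)                # (K, HW)
    return out.reshape(K, H, W)

In this reading, the paper's memory mechanism governs which past frames populate memory_feats/memory_masks (allowing re-identification after an object disappears and reappears), while the online adaptation module would refine the prediction at test time; both are beyond this minimal sketch.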
