Refine and Represent: Region-to-Object Representation Learning

Paper Title

Refine and Represent: Region-to-Object Representation Learning

Authors

Akash Gokul, Konstantinos Kallidromitis, Shufan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J. Reed

Abstract

Recent work in self-supervised learning has demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives. In this paper, we present Region-to-Object Representation Learning (R2O), which unifies region-based and object-centric pretraining. R2O trains an encoder to dynamically refine region-based segments into object-centric masks and then jointly learns representations of the contents within each mask. R2O uses a "region refinement module" to group small image regions, generated using a region-level prior, into larger regions that tend to correspond to objects by clustering region-level features. As pretraining progresses, R2O follows a region-to-object curriculum that encourages learning region-level features early on and gradually progresses to training object-centric representations. Representations learned using R2O lead to state-of-the-art performance in semantic segmentation on PASCAL VOC (+0.7 mIoU) and Cityscapes (+0.4 mIoU) and in instance segmentation on MS COCO (+0.3 mask AP). Further, after pretraining on ImageNet, R2O-pretrained models surpass the existing state of the art in unsupervised object segmentation on the Caltech-UCSD Birds 200-2011 dataset (+2.9 mIoU) without any further training. We provide the code and models from this work at https://github.com/KKallidromitis/r2o.
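
The refinement step and curriculum described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of plain k-means as the clustering method, the linear schedule, and the default values `k_start=16` and `k_end=2` are all assumptions made for illustration. The sketch pools encoder features inside each prior region, clusters the pooled features, and maps every pixel to the cluster of its region, so that fewer clusters yield larger, more object-like masks.

```python
import numpy as np

def num_clusters(step, total_steps, k_start=16, k_end=2):
    """Hypothetical linear curriculum: many small segments early, few large ones late."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return int(round(k_start + frac * (k_end - k_start)))

def pool_region_features(features, regions):
    """Average per-pixel features inside each prior region.

    features: (H, W, D) float array; regions: (H, W) int array with ids 0..R-1.
    Returns an (R, D) array of region-level features.
    """
    n_regions = int(regions.max()) + 1
    flat_f = features.reshape(-1, features.shape[-1])
    flat_r = regions.reshape(-1)
    sums = np.zeros((n_regions, flat_f.shape[1]))
    np.add.at(sums, flat_r, flat_f)  # unbuffered scatter-add per region id
    counts = np.bincount(flat_r, minlength=n_regions).reshape(-1, 1)
    return sums / np.maximum(counts, 1)

def kmeans_labels(x, k, iters=10, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(x))
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = x[labels == j].mean(0)
    # Final assignment against the updated centers.
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1)

def refine_regions(features, regions, k):
    """Group prior regions into k clusters and return an (H, W) refined mask."""
    region_feats = pool_region_features(features, regions)
    region_labels = kmeans_labels(region_feats, k)
    return region_labels[regions]

# Toy example: four quadrant regions whose features differ only left vs. right.
regions = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [2, 2, 3, 3],
                    [2, 2, 3, 3]])
features = np.zeros((4, 4, 2))
features[:, 2:] = 10.0
mask = refine_regions(features, regions, k=2)
```

In the toy example the four quadrant regions are merged into a left mask and a right mask, because the pooled features of the two left regions (and of the two right regions) coincide. During pretraining, `num_clusters(step, total_steps)` would supply `k`, so that early steps keep many small segments and later steps merge them into a few object-level masks.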
